
Obfuscating Protocols

One wide class of bot protections which is, IMNSHO, drastically underestimated (and as a result – grossly underused) is the time-based ones.

Let’s separate all time-based protections into two categories: “local” ones (which do NOT involve going to the Server-Side), and “Server-Side” ones.

Local Time-Based Protection

First, let’s see what we can do while staying completely on our Client-Side. Apparently, a lot.

Time-Based Debugging Detection

What we can do rather easily with time measurements, is to detect debugging. For example, if we have a piece of non-blocking code, and mark it as such – then, whenever we detect that the time between entering such a piece-of-non-blocking-code and leaving it, is more than a few seconds – then, in any realistic scenario, we can safely say that one of the following occurred:

Either the system is hopelessly swapping
Or we’re being debugged.

If the system is swapping that hopelessly, our game is unplayable anyway, so whatever-we-do-with-it, it won’t realistically change anything (so false positives are not a problem¹). And if we’re being debugged – well, this is what we’re looking for in the first place.

Practically, in C++ it can be implemented as a class ObfNonBlockingCode, with its constructor measuring and storing the current time, and its destructor measuring the time again, subtracting the two, and then using the measured time difference. In the ithare::obf library (NoBugs2018), the measured time difference is used as follows:

The measured time difference is divided by some pre-defined threshold (with the threshold pre-calculated in such a manner that normally, after the division, the result is zero).² In other words, at this point we have a value which is either zero (indicating that we’re not debugged), or non-zero (which indicates that we ARE being debugged).
- Speaking of thresholds, ithare::obf currently uses something-of-the-order-of-15-seconds; this is MUCH more than usually-experienced-delays-due-to-thread-rescheduling (which are usually within hundreds-of-milliseconds), and on the other hand, is still small enough so that it is rather likely for a human being to spend more-than-that in debugging.
Then, this “should-be-zero-if-not-debugged” value can be used either directly, or as data-to-mix-into-the-obfuscation-as-supposed-zero-constant (so, if our variable is not zero, the data will be corrupted and will likely lead to a crash sooner rather than later).
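To make the constructor/destructor idea a bit more tangible, here is a minimal sketch of such a guard. It is NOT the actual ithare::obf code: the names NonBlockingCodeGuard, should_be_zero and kThresholdShift are hypothetical, the threshold of 2^14 ms (about 16.4 seconds) is merely an assumption in the same ballpark as the 15 seconds mentioned above, and std::chrono::steady_clock is used only to keep the sketch portable (for real obfuscation, a less obvious time source would be preferable).

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical sketch; NOT the actual ithare::obf implementation.
// The threshold is 2^14 ms (about 16.4 s), so the division becomes a right shift.
constexpr unsigned kThresholdShift = 14;

// Yields 0 while the elapsed time stays below 2^14 ms, non-zero otherwise.
constexpr std::uint64_t should_be_zero(std::uint64_t elapsed_ms) {
    return elapsed_ms >> kThresholdShift;
}

// RAII guard: the constructor records the time, the destructor measures the
// difference and mixes the "should-be-zero" value into caller-owned data.
class NonBlockingCodeGuard {
public:
    explicit NonBlockingCodeGuard(std::uint64_t& mix_into)
        : start_(std::chrono::steady_clock::now()), mix_into_(mix_into) {}

    ~NonBlockingCodeGuard() {
        const auto elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(
            std::chrono::steady_clock::now() - start_).count();
        // XOR with zero leaves the data intact; any non-zero value corrupts it.
        mix_into_ ^= should_be_zero(static_cast<std::uint64_t>(elapsed));
    }

    NonBlockingCodeGuard(const NonBlockingCodeGuard&) = delete;
    NonBlockingCodeGuard& operator=(const NonBlockingCodeGuard&) = delete;

private:
    std::chrono::steady_clock::time_point start_;
    std::uint64_t& mix_into_;
};
```

Usage would be along the lines of declaring a NonBlockingCodeGuard at the top of a non-blocking scope, passing it some value which is used later on (say, a decryption key); if somebody sits in a debugger within that scope for longer than the threshold, the value gets silently corrupted.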

¹ Make sure to check it with your legal department, because it MAY depend on the-action-you’ll-be-taking (!)

² and of course, as our calculations are very approximate – we can always replace division with a right-shift-for-appropriate-number-of-bits

Ways to Measure Time

Our next question is “how can we measure the time for our obfuscation purposes?” In general, at least for x86/x64, I know of three different approaches:

System-level calls such as GetTickCount(), QueryPerformanceCounter(), or gettimeofday(). TBH, I am not fond of using system-level calls for obfuscation purposes (they are waaaay too obvious in the binary code). If there is nothing else - they might do, but given any other choice - I'd avoid them (again - for obfuscation purposes).
RDTSC instruction (either using __rdtsc() or inline asm). RDTSC is very lightweight (which is a plus), but on the negative side – it is still rather obvious in binary code.
- As for other risks of using RDTSC, MSDN says: “We strongly discourage using the RDTSC or RDTSCP processor instruction to directly query the TSC because you won't get reliable results on some versions of Windows, across live migrations of virtual machines, and on hardware systems without invariant or tightly synchronized TSCs.” Well, let’s discuss it one-by-one:
  - I have never seen those “some versions of Windows” where TSC wasn’t reliable. Moreover, I have never heard of somebody seeing them. [[TODO: some ACPI-related issues were reported; not that we REALLY care, but it should be possible to gather stats and ignore one-off issues]]
  - Hardware systems causing problems with TSC, did indeed exist at least in the past, but they were limited to multi-socket boxes (in fact, all such occurrences I know about, were related to motherboard failing to synchronize TSC across different sockets). As (a) multi-sockets are extremely unlikely to be encountered on Client-Side, and (b) as inter-socket discrepancies are extremely unlikely to compare to 15 seconds (hey, even if motherboard doesn’t synchronize CPUs-in-different-sockets properly, they will still start within some tens-of-milliseconds) – I don’t see it as a problem.³
  - As for live migrations of virtual machines, for our MOG-related Client-Side stuff, it is very unlikely to happen.
- As a result, ithare::obf still takes its chances with RDTSC – unless the better memory-reading option mentioned below is available.
Direct reading of system-provided memory. At least under Windows, there is a not-so-obvious way of obtaining time. The thing is that, as noted in [email protected], GetTickCount() is nothing more than reading from two-fixed-addresses with some basic math afterwards. Perfect – we can do it ourselves without ever going to system library, easily <wink />. A relevant fragment from ithare::obf:

#define ITHARE_OBF_TIME_NOW() ((uint64_t((*(uint32_t*)(0x7FFE'0320))) \
    * uint64_t((*(uint32_t*)(0x7FFE'0004)))) >> 0x18)

If, on top of it, we obfuscate those rather-obvious 0x7FFE’XXXX constants (along the lines discussed in "Obfuscating literals" section above) – we’ll get a very non-obvious-in-binary-code way to read current time (and as an option, to have the-program-being-debugged, crash in an extremely non-obvious manner). Oh, and as a side benefit – even if we’re this obvious as listed above, to the best of my knowledge⁴ ScyllaHide isn’t able to fix it automagically.
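For those who prefer to see the “basic math” of the macro above spelled out, here is the same arithmetic on plain values (so it can be played with on any platform). The function name ticks_to_ms is hypothetical; the reading that the second value is a fixed-point multiplier with 24 fractional bits is my interpretation of the >>0x18 in the macro; and the sample multiplier 0x0FA00000 (which would correspond to 15.625 ms per tick) is an assumption used purely for illustration.

```cpp
#include <cstdint>

// Same arithmetic as the macro, but on plain values instead of fixed addresses.
// 'multiplier' is treated as a fixed-point number with 24 fractional bits
// (an interpretation of the ">> 0x18", not a documented guarantee).
constexpr std::uint64_t ticks_to_ms(std::uint32_t tick_count_low,
                                    std::uint32_t multiplier) {
    return (std::uint64_t(tick_count_low) * std::uint64_t(multiplier)) >> 24;
}
```

With the assumed multiplier of 0x0FA00000, 64 ticks translate to exactly 1000 milliseconds, as 64 * 15.625 = 1000.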

³ Though until it is tested on a million-size player population, there is certainly some risk involved.

⁴ Which isn’t much in this case TBH, so make sure to test my claim yourself if trying to rely on it.

Detecting ScyllaHide

When dealing with such programs as ScyllaHide, it is possible to detect them using timing. In particular, discrepancies between elapsed time as measured by GetTickCount(), RDTSC, and values-read-from-SharedUserData, can be used for detection. Very briefly – whenever ScyllaHide installs timer-based hooks – it can be detected fairly easily. The first line of detection would be comparing those-values-read-from-SharedUserData (and/or RDTSC), with (hooked-by-Scylla) GetTickCount(). But even if Scylla manages to hide from this one – we’ll still be able to detect it using more-heavy-weight techniques which are more typical for VM detection (and described below).
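The first line of detection described above boils down to comparing elapsed time as seen by two different sources over the same interval. A minimal sketch of such a comparison follows; ElapsedPair, clocks_disagree, and the tolerance parameter are hypothetical names, and how exactly the two elapsed values are obtained (SharedUserData, RDTSC, GetTickCount()) is deliberately left outside of the sketch.

```cpp
#include <cstdint>

// Elapsed time for the same interval, as seen by two independent sources.
struct ElapsedPair {
    std::uint64_t trusted_ms;   // e.g. derived from shared memory or RDTSC
    std::uint64_t hookable_ms;  // e.g. derived from a (possibly hooked) system call
};

// Returns true if the two sources differ by more than the given tolerance.
inline bool clocks_disagree(ElapsedPair e, std::uint64_t tolerance_ms) {
    const std::uint64_t diff = e.trusted_ms > e.hookable_ms
        ? e.trusted_ms - e.hookable_ms
        : e.hookable_ms - e.trusted_ms;
    return diff > tolerance_ms;
}
```

The tolerance has to be chosen generously enough to absorb the different granularity of the time sources; any specific value would be a guess until measured on real Clients.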

Detecting Being Run under VM

If our Client app is running under a virtual machine (and VMs are increasingly commonly used to run bots <sad-face />), we can still use timing to detect it. In particular:

If the VM does not virtualize RDTSC, then – at least in case of suspend/resume – we’re likely to see discrepancies between non-virtualized RDTSC and virtualized GetTickCount()/whatever-else. In addition, we’re likely to see spikes in the time which RDTSC itself takes (see Ortega).
If VM does virtualize RDTSC, it is even simpler – we’ll see a much more consistent picture of RDTSC-taking-MUCH-longer than it should.

For a detailed discussion on using-RDTSC-to-detect-running-under-VM – see Ortega. Just make sure to account for sporadic huuuge delays due to thread context switches which happen right between our measurements (i.e. one single delay – or actually, any-delay-happening-once-in-a-blue-moon – is NOT sufficient evidence of running under VM, but an average over a dozen such measurements easily can be).
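The “one delay is not evidence, a dozen of them is” logic can be sketched as follows. Note that this sketch substitutes a median for the plain average mentioned above (so that a single context-switch outlier cannot dominate the result); this is my own substitution, not something taken from Ortega or ithare::obf. The names median_cost and looks_virtualized are hypothetical, and the samples are assumed to be costs (in cycles) of back-to-back RDTSC reads collected elsewhere; any threshold value would need calibration on real hardware.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Median of per-call cost samples (e.g. cycles between two back-to-back reads).
// Takes the vector by value because nth_element reorders it.
inline std::uint64_t median_cost(std::vector<std::uint64_t> samples) {
    if (samples.empty()) return 0;
    const auto mid = samples.begin() + samples.size() / 2;
    std::nth_element(samples.begin(), mid, samples.end());
    return *mid;
}

// Heuristic: refuses to decide on fewer than a dozen samples, then compares
// the median cost against a caller-supplied (hypothetical) threshold.
inline bool looks_virtualized(const std::vector<std::uint64_t>& samples,
                              std::uint64_t threshold_cycles) {
    constexpr std::size_t kMinSamples = 12;
    if (samples.size() < kMinSamples) return false;
    return median_cost(samples) > threshold_cycles;
}
```

With eleven cheap samples and one huuuge one, the median stays cheap and no flag is raised; with all twelve samples being consistently expensive, the flag goes up.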

On ithare::obf: time-based detection of ScyllaHide and VMs is on the list, but is not currently high in priority (read: “it is going to take a looooong while…” <sad-face />).

Remote Time-Based Protection

As we can see, even when running under purely local conditions (and even under VM(!)), time measurements still can help us to detect that we’re being debugged – or that we run under VM. But if we can involve our Server-Side, our possibilities expand further:

First, even simply dropping the connection on the player timeout (which we should do anyway – see Vol. IV’s chapter on Communications) will make the hacker’s life much more unpleasant. Indeed – if sitting-within-debugger for longer than 15-seconds-or-so causes you to restart from scratch, it is quite annoying.
In addition, we can have our Server-Side send challenges to the Client, and measure the response times of the Client. Then, we can collect statistics about the timing of these responses – and use these statistics (to raise red flags, or in some cases – even to ban the player outright).⁵
- BTW, the challenges can either come from Server to Client – or from Server all the way to the player (such as captchas). From what I’ve seen and heard – both tend to work pretty well to detect bots <wink />.
- In theory, it can even be generalized to the point where we can guess exactly which piece of code is currently being debugged on the other side <wink />.
Moreover, with Server-Side timing available, we can go even further than simple debugger/VM detection, and get to real-life bot detection. What if we send not just a challenge, but a “challenge which includes some piece of code to be executed on the Client-Side”? This way, decompiling this piece-of-code within the time-necessary-to-reply, becomes perfectly impossible, so we should be able to catch the cheater (either he doesn't try to block our code and gets caught by the code, or he does block the code and gets caught on the Server-Side) – and without that much hassle…
- This can become your Ultimate Tool for catching cheaters.
- OTOH, such an Ultimate Tool (and any Ultimate Tool) still has to be used very carefully. In particular, the information about kinds of data/system info this code will try reading to calculate the reply, is of very significant value (if the hacker knows an exhaustive list of such data – he’ll be able to avoid modifying it, staying “clean” during the real game session).
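As for collecting statistics on challenge response times on the Server-Side, a very rough sketch of per-player bookkeeping might look as follows. Everything here is hypothetical (the class name ResponseTimeStats, the notion of a “slow” response, and all the thresholds); a real system would probably want something more elaborate, and which action to take on a red flag is a separate (and not purely technical) question.

```cpp
#include <cstddef>
#include <cstdint>

// Per-player running statistics over challenge round-trip times.
class ResponseTimeStats {
public:
    // Records one response; anything above slow_threshold_ms counts as "slow".
    void add_sample(std::uint32_t rtt_ms, std::uint32_t slow_threshold_ms) {
        ++total_;
        if (rtt_ms > slow_threshold_ms) ++slow_;
    }

    // Raises a red flag once enough samples exist and the share of slow
    // responses exceeds max_slow_percent.
    bool red_flag(std::size_t min_samples, unsigned max_slow_percent) const {
        if (total_ < min_samples) return false;
        return slow_ * 100 > total_ * max_slow_percent;
    }

private:
    std::size_t total_ = 0;
    std::size_t slow_ = 0;
};
```

Requiring a minimum number of samples before raising the flag serves the same purpose as with the local measurements: one slow reply can be a network hiccup, while a consistent pattern of them is much more difficult to explain away.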

[[TODO: describe risks, pitfalls - and ways to mitigate. Very briefly - it falls under the same high-risk category as Client self-updates - and has to be treated accordingly (with LOTS of attention paid to ensure proper handling of signatures). In particular - private-key-used-for-signing, should stay on an air-gapped box, i.e. on a machine which has-never-been-exposed-to-the-Internet(!).]]

⁵ In such cases, it MIGHT be more beneficial to ban the offender right away, rather than to wait for the “ban wave” to come.

[[To Be Continued...

This concludes beta Chapter 29(j) from the upcoming book "Development and Deployment of Multiplayer Online Games (from social games to MMOFPS, with stock exchanges in between)".

Stay tuned for Chapter 29(k), where we'll discuss how my beloved 😉 (Re)Actors tend to help us with anti-cheating]]