Random Number Generation

bbbook_cover_vol05_-330.png

[[This is Chapter 19 from "beta" Volume VI of the upcoming book "Development&Deployment of Multiplayer Online Games", which is currently being beta-tested. Beta-testing is intended to improve the quality of the book, and provides free e-copy of the "release" book to those who help with improving; for further details see "Book Beta Testing". All the content published during Beta Testing, is subject to change before the book is published.

To navigate through the book, you may want to use Development&Deployment of MOG: Table of Contents.]]

Random number generation is often seen as trivial by game developers (“hey, just call rand() and you’re done!”). Well, it is not really trivial. It is quite easy to make a very silly mistake, which can go unnoticed for years, and when you learn about it – the Pretty Bad Damage has already been done to your game/company/… 🙁 . 

Psychological aspects of RNG

One important thing to keep in mind is that humans tend to have very selective memory when it comes to good and bad occurrences. Which translates into:

even if your RNG is statistically perfect, people will still complain🙁

As Jeffrey Kaplan from Blizzard has put it: "We found that this had a lot of problems where players would run into streaks, and they only remembered the shitty streaks. " ShackNews.OnBlizzard.

surprised2Some years ago, I made an interesting exercise and found that I myself was guilty of it too.

Some years ago, I made an interesting exercise and found that I myself was guilty of it too. When I was playing Civ IV and my stronger unit got killed by a much weaker unit by AI opponent, I (as probably most of the people out there) was complaining about the whole random thing being rigged in favor of AI. Intuitively, I was perfectly sure that the thing is extremely unfair. However, I started to write down the stats of the occurrences. And found that while they didn’t pass certain spectral tests, overall averages were within expectations. In other words, while the RNG wasn’t rigged as such1 (though spectral correlations MIGHT have been somewhat unfair given the play patterns of me and AI), it certainly did feel unfair.

This might seem as an excuse to use poor RNGs, but it is not. Because

If your RNG is flawed, people will complain even more🙁

Even more importantly, if your RNG is flawed and it is important enough for players, they may gather stats proving that you are not doing a good job. Also, you will lose an option to prove them that your RNG is good (which is a separate challenge, see discussion in "Dealing with Perception Problems" section below). 

1 Of course, all Civ games at levels above, IIRC, Noble, are heavily rigged towards AI, but that’s by design and is well-known

RNG Alternatives

Some people in the industry have argued removing an element of luck altogether. Examples of such techniques to avoid removing randomicity include Blizzard's "progressive percentages" ShackNews.OnBlizzard and Prismata's concept of "DeckHand" Grant.

However, it should be noted that avoiding randomicity is certainly not a silver bullet and has its own drawbacks. As Jeffrey Kaplan has put it: "The problem was... we found that while it was great that it got rid of the bad streaks, it also got rid of all the good streaks".

BB_emotion_0008b.png My job here is to tell how to implement RNG properly if you decided that you do need randomicity

And Prismata, while avoiding randomicity after the game is started, still relies on random choices to start the game with (which means RNG, and also means player complaints Grant).

To summarise - whether you need an RNG, is a gameplay choice, and in this book I'm trying to avoid arguing pro or contra  specific gameplay choices. My job here is to tell how to implement RNG properly if you decided that you do need randomicity (and for most of the games out there you will, believe me 😉 ).

Inherent Difficulties with Finding RNG Faults

Before we start discussing RNGs, let's make one unpleasant observation:

Faulty RNG can easily sit there for years without you noticing it 🙁

Faulty RNG may easily produce a bit stream which looks good enough at the first glance ("hey, it certainly looks as a random hex string every time I look at it"), and deficiencies may surface only after serious tool-based analysis as described below.

On the first glance, such well-hidden failures may look as a non-issue, but it is not the case, because such failures can be (a) exploited and (b) proven after the fact.

For example, your players may notice the problem, will start writing about it (and even may start exploiting it - we'll see an example of it later). Or, some day somebody may gather stats and prove that all the time your RNG wasn't fair, and was favouring some playing style. Effects of these things surely depend on specifics of your game, but for some games it can easily be devastating, as we'll see below. 

Three RNG Spectacular Failures

BB_emotion_0005b.png Just to describe what kind of problems a poorly implemented RNG can easily cause, I will provide three real-world examples

Just to describe what kind of problems a poorly implemented RNG can easily cause, I will provide three real-world examples. The first two are not directly related to games, but they are still very characteristic ones to show the scale of potential problems caused by poor RNGs. The third example is a real-world horror story of the game falling victim of its RNG.

Two Big Fat Problems with Crypto RNGs

Spectacular Failure #1. In 2006, Debian developer has made a change to RNG initialization code.2 The change effectively reduced amount of RNG seed material drastically, which in turn has lead to lots of the keys generated by Debian, being insecure. The problem was not discovered until 2008 (illustrating my point above about "sitting there for years without noticing it"), and has caused a whole lot of security problems worldwide. When your keys generated two years ago become vulnerable all of a sudden, it means that all your supposedly secure communications over last two years can be read by the attacker. Ouch!

Spectacular Failure #2 is a more recent one. In 2013, it has been found that Java class SecureRandom as it was implemented on Android, was not exactly secure and often generated the same “randoms” (much more often than with probability of 2^-n which we should expect).  This, in turn, has caused a vulnerability which allows to extract private keys used to sign Bitcoin wallets (see, for example, StackExchange for explanation). From non-technical point of view – this bug in RNG allowed to steal all the money in Bitcoin wallets… Double Ouch! 

While I have a rather strong opinion on “who’s the one to blame” in this disaster, I don’t want to go into this discussion here

Numerous Problems in a Real-World Game-Critical RNG

Spectacular Failure #3 is probably the most interesting one for several reasons. First of all, it is not about cryptography and is directly applied to games. In addition, it illustrates several very typical bugs in custom-made random generators.

This spectacular failure is all about the RNG which was used by Planet Poker, the company which used to run real-money(!) games at the time; while it happened around 2000, the bugs made there are still very typical these days. The story is very-well described in nauseating detail in the original article Cigital, so I will just provide a short list of Big Fat Mistakes Planet Poker gamedevs have made with their RNG:

  • First and foremost, they used a pseudo-random generator initialized ONLY WITH SYSTEM TIME (they used the one with a millisecond precision).
    • Moreover, they were doing it FOR EACH AND EVERY HAND(!!)
      • BB_emotion_0003b.png attackers were able to know the cards of their opponents at the table (!!!)
        These two mistakes has made their RNGs vulnerable to a simple attack, demonstrated in Cigital. In short – attackers were able to know the cards of their opponents at the table (!!!). The idea behind the attack was simple - as soon as we know when the RNG is seeded, and can synchronise our clock with the server clock, we have just a few hundreds of thousands of potential seeds. Then, when we see three cards on the flop (and have our own 2 cards too), we can run these few hundreds of thousands of potential seeds against the RNG, and to find the seed which would generate this flop and our own cards too. This matching seed defines all the cards of the opponents.
    • They were using a pseudo-RNG with a merely 32-bit internal state to produce 52-card decks of cards. However, a properly shuffled 52-card deck can have 52! ~= 2^226 different outcomes. And as pseudo-RNG with a 32-bit state can possibly have only 2^32 different patterns, it means that they would never get a vast majority of the potentially available hands (!)
    • They had a flawed shuffling algorithm which has caused a flaw in statistical shuffling distribution (see Cigital for explanation)
    • And they had an off-by-one error (a silly one, but it has also caused a skew in statistical distribution, see Cigital).
    • One thing AFAIR not mentioned in Cigital but still affecting distribution: when you have N bits, obtaining a random number in range of 0 to M where M is not 2^N -1, via taking a reminder, the random number is inherently skewed (!) - just because the number of states-you're-converting (which is 2^N), is not exactly even for all the outcomes. Though admittedly it has MUCH milder effect than any of the flaws mentioned above, it is still a flaw.
    • BB_emotion_0025b.png they DID publish their shuffling algorithm without making a closed review of it first
      Last but not least – they DID publish their shuffling algorithm without making a closed review of it first. Yes, I know I will be beaten hard for this statement (and yes, unless they’ve published the algorithm, they would still run that flawed software), but TBH – for that company it would be MUCH better not to publish the algorithm at all, then to publish it and get all the embarrassment, loss of players, etc. etc. etc.
      • On the other hand, it IS important to review your RNG algorithms, and MAYBE even to publish them. However, before pushing the button “upload code to your website”, you need to make sure that your RNG is really good. And you won’t be able to check it yourself (even if you’re a security pro) – just because it is YOUR code, it always feels better than it should; in other words, you “know” how it works instead of scrutinizing it. That’s why it is paramount to have a THIRD-party review. Ideally – by a different company specializing in security, though you MIGHT be able to get away with a completely different team within your own company.

BTW, if your game calls for a serious RNG, make sure to read Cigital – it provides a Very Good Lesson on “how NOT to implement your RNGs”. 

What Qualifies as a Good RNG

juryisout now as we’ve seen what a good RNG is certainly not, we can start thinking about what a good RNG is.


Ok, now as we’ve seen what a good RNG is certainly not, we can start thinking about what a good RNG is. Well, unfortunately, you cannot really prove that RNG is good;3 the only thing you can really do, is to prove that RNG is bad.In short – to perform a black-box test of your RNG, the best thing you can do is to run a series of tests looking for certain patterns within your output. 

3 technically, for some of the PRNGs you can kinda-prove it (see, for example, Blum-Blum-Shub generator, for which there is a kinda-proof – that is, there is a strict proof under an assumption that certain math problem is non-solvable in reasonable time, which is itself not proven, and is not likely to be proven). However, even if there would be a really strict proof that certain PRNG is really good, we’ll still be stuck with a question “how to seed this PRNG”, which leads us into an inherently non-provable field. 

Testing bit stream – Marsaglia DieHard, NIST, and TestU01

In particular, at some stage most of RNGs out there produce supposedly completely random bit stream one way or another (even if you have an RNG which generates numbers rather than bit stream, it is usually trivial to make it generate bit stream instead – just request integers which are uniformly distributed from 0 to 2^N-1 inclusive).

arguingfor You should run BOTH of them if you’re using your RNG in an anywhere serious manner

As soon as you have your supposedly random bit stream (a looooong one – usually in millions of bits), there is a way to get it tested for certain patterns to be present. In fact, there are two sets of tests which can indicate any problems with your bit stream. Three most useful of these tests are Marsaglia.Diehard,  NIST, and TestU01. You should run AT LEAST TWO of them (better - all three) if you’re using your RNG in an anywhere serious manner (i.e. if it can affect the fairness of your games – or, if we put it in a different perspective – if your players are complaining about your RNG). Note though that interpreting results of these tests can be rather tricky (and may require certain understanding of stuff such as statistically expected outcomes and p-values). 

Testing beyond bit stream – Chi Square

As we’ve seen above on the example of Planet Poker, it is very easy to make a mistake in converting your bit stream into the game events (these guys has managed to make four(!) separate mistakes on that way). In turn, it means that

even after your bit stream is good, your game events may still be skewed
pretty badly

As a result, for RNG-critical games, I strongly insist on perform testing on game-level data too (i.e. after bit stream is converted to whatever-game-level-events you have – ranging from a shuffled deck to sequences of critical-hit decisions).

The simplest form of such analysis does not require a math degree, and can be done using good ol’ chi square goodness-of-fit test Wikipedia.Chi.Square.Goodness.of.Fit. Very briefly, the process, when applied to games and RNGs, usually goes as follows:

  • We’re gathering stats of our game-level events (ideally – both before production, and while our game is running in production). For example, if, like Planet, we have a game of poker, we can record all the decks dealt.
  • dreaming We’re making a hypothesis that distribution of our game-level events is as it should be according to rules of our game.
    We’re making a hypothesis that distribution of our game-level events is as it should be according to rules of our game. For example, we’re making a hypothesis that for the 1st card dealt, probability of all the 52 cards is the same.4 To test this hypothesis, we calculate “how many times each of the cards was dealt as the 1st card of the deck”. In other words, we’re throwing our statistical events into different “bins”, and calculating the number of occurrences within of each “bin”. These numbers will be close, but almost-never will be exactly the same (neither they should). The role of chi-square test is to find out whether the observed deviations from completely-flat distribution are statistically acceptable or not.
  • For each card (from aces to deuces), we’re calculating a value of x = (O-E)^2/ E, where O is an observed value of number of occurrences for this card, and E – is an expected value (which, under our hypothesis of flat distribution, is the same for all the cards, and equal to the total-number-of-cards-dealt / 52). Then, we calculate the sum of all x, which becomes the chi-square value for our experiment.
  • Then we take a look at the table such as Chi.Square.Critical.Values, and look for a row with an appropriate number of bins “Degrees of freedom” a.k.a. ν (for equal distribution of poker cards, we’ll be looking for 51 “Degrees of freedom”, as a number of bins = 52 minus 1, as we’re calculating one parameter – the average – from the data itself).
  • Therefore, if we find that for our test of equal card distribution, if our calculated chi-square value is below 38.6 – then “Probability less than the critical value” is less than 10%, i.e. our hypothesis that everything is fine, stands with probability of 90%. And if our calculated chi-square value is above 65.4 – then “Probability less than the critical value” is over 90%, i.e. our hypothesis that everything is fine, stands with probability of less than 10%☹️.
    • surprised In science, the “magic number” to challenge hypothesis is accepted to be 5%
      In science, the “magic number” to challenge hypothesis is accepted to be 5%. In other words, to challenge the hypothesis about equal distribution of the cards, your chi-square value should be above 69.8 in at least one of the bins. However, in practice, as soon as you get close to 10% of your hypothesis being correct (i.e. to 90% “probability less than critical value” number), I strongly suggest to make another experiment analyzing more decks, to see whether you’re getting closer to the dangerous zone or not. If, while adding more decks to your experiment, the chi-square tends to lower – it means that you’ve experienced a fluke in your first experiment, and everything is fine. OTOH, if , while adding more decks to your experiment, the chi-square tends to raise – it can easily indicate a Big Fat Problem with your algorithms🙁 .

Of course, it is by far NOT the only statistical test which is possible to run, but it is the one which can (and IMO SHOULD) be performed yourself. However, for Really RNG-Critical Environments I strongly suggest to hire a company which will be able to perform a much more sophisticated analysis for you. 

4 Note that while for the game of Holdem used on Planet, equal-distribution hypothesis should stand for all the pocket cards, for cards-on-the-flop and further, equal distribution hypothesis is inherently invalid, as it becomes conditional on player actions, which can easily skew the distribution at least in some cases(!)

 

Obtaining random bit stream: On PRNGs and hardware RNGs

Ok, we’ve more or less defined “what is good RNG”, so we can start discussing “how to implement these good RNGs”.

In general, there are two very different types of RNG: the first one is PRNG (pseudo-RNG), and the second one is “true” RNG.

PRNGs

BB_emotion_0007b.pngOf course, if somebody can learn internal state, they can immediately get all the bits coming from PRNG, so at least some parts of the internal state need to be kept secret.

PRNG is essentially an algorithm, which keeps some internal state, and generates numbers which look “as if” they’re completely random, though the numbers itself are derived from the internal state in a perfectly deterministic manner. Of course, if somebody can learn internal state, they can immediately get all the bits coming from PRNG, so at least some parts of the internal state need to be kept secret.

Linear Congruential Generator and Mersenne Twister: Reversibility is a Potentially Mortal Sin

Mersenne Twister
The Mersenne Twister is a pseudorandom number generator... It is by far the most widely used general-purpose PRNG.
— Wikipedia —

There are quite a few different PRNGs out there. The simplest one – linear congruential generator (which can be written as a one-liner, see Wikipedia.LCG) – fell out of favour quite a while ago, as the one exhibiting certain statistical patterns. There is a replacement for it (and a very popular one in non-crypto part of the academy) – it is Mersenne Twister.

I won’t say anything bad about Mersenne Twister as long as it is used for Monte-Carlo simulations and any other similar applications, but I certainly do not really like it for games; in particular, as a non-crypto generator, it it may easily be reversible. And as soon as your RNG is reversible, one may reconstruct the state of your RNG from its outputs, with potentially devastating-for-your-game results. Even if you're using your RNG “just” to calculate a chance of critical hit, Mersenne Twister is potentially dangerous ☹️, and if you're using it to determine AI actions in an area which is supposed to be covered by fog-of-war at least for some players - you may be easily granting a Game-Critical Advantage to somebody who managed to reverse your RNG (see more discussion on implications of AI and fog-of-war below). In short -

Especially as long as there are much better and easily available
alternatives - DON'T use Mersenne Twister for games (see a potential
exception below though)

Yes, there are situations where you might be able to get away with Mersenne Twister, but well - analysis whether it can be abused or not, is rarely worth the trouble of discussing it. It is just simpler to use something like "Poor Man's Crypto PRNG" discussed below.

Exception: Potential Case for non-crypto-secure RNGs

One potential use case (I've neither seen nor heard of such things in real world, but well - I certainly didn't see everything) for non-crypto-PRNGs MIGHT theoretically arise in games if you're running something like MMORTS with hundreds of thousands of dice rolls per network tick. In such cases, you MAY indeed need something faster than AES-CTR (see "Poor Man's Crypto PRNG" below). If you ever find yourself in such a situation, I suggest to play the following game:

  • Run AES-CTR once per network tick (or frame)
  • Take its output to seed Mersenne Twister (or some other non-crypto PRNG, more on them below)
  • Use Mersenne Twister throughout the network tick/frame
  • Throw ALL the state of Mersenne Twister away at the end of the network tick/frame
  • While you're at it - make sure that the number of different states provided by your AES-CTR (for AES256 with a randomly initialized counter, it is 2^384) is MUCH more than the number of different states you hope to generate from your Mersenne Twister. In other words, trying to get some 100-card deck perfectly shuffled from an MT19997, which has been initialised from a single AES-CTR, is probably a pretty bad idea :-(. If you run into problems here - you can run several AES-CTRs to ensure that all the different states you need, can be indeed generated while using this trickery.

This way, impact of predictability of Mersenne Twister will be minimised (as its potentially predictable state is NOT carried across network ticks, and any inter-tick interaction, while still being a PRNG, is protected by crypto-level AES-CTR). On the other hand, even if we're staying within the same tick, if our game involves some fog-of-war (and/or walls), it MAY be possible to get some information out of it; for example, it MIGHT be possible to reconstruct the state of PRNG, and to reconstruct actions of AI in areas covered by fog-of-war 🙁. In such a case, the best thing would be to revert back to crypto PRNG. And if using of the crypto PRNG is not possible for performance reasons - it MIGHT be a good idea to use several different non-crypto-PRNGs for different characters of AI. For example, if we split the whole map into 100x100 zones, with a separate non-crypto-PRNG-per-tick-per-zone, then these effects will be much smaller, and if size of zone being MUCH smaller than area of the map visible anyway - the effects will be pretty much negligible. Alternatively, we MAY need to take a look at not-so-obviously-reversible non-crypto PRNGs (see below).

Other non-crypto-secure RNGs

IF (and I insist it is still a Big If) you need a non-crypto PRNG to make things faster (and once again - I DO insist that you SHOULD NOT use it as a single PRNG, but rather reinitialise it completely on each "network tick"/frame from a crypto-secure PRNG), you have more than one alternative.

Besides Mersenne Twister, there are quite a few other non-crypto-safe RNGs of different popularity and quality. Out of these, I'd suggest to keep away from linear congruential stuff, and to consider one of the following:

  • xorshift family of algorithms, especially xorshift+. Has good statistical properties, BUT is trivially reversible (just from one single output(!)), so if "fog-of-war"-like attacks described above are of any concern, I strongly suggest to avoid it.
  • aforementioned Mersenne Twister. Has reasonably good statistical properties. Reversible; restoring state of popular MT19937 requires 624 outputs.
  • PCG Has good statistical properties. In addition, while I am NOT buying the argument by its creator that it qualifies as a middle ground between non-crypto and crypto PRNGs in terms of reversibility (at least until it is subjected to security scrutiny by a Big Portion of crypto community), PCG DOES qualify as a good non-crypto PRNG, AND it IS more difficult to reverse than both xorshift and Mersenne Twister. If I am ever forced to use non-crypto PRNG, PCG is the one I am likely to pick.

Blum-Blum-Shub

Blum Blum Shub
Blum Blum Shub (B.B.S.) is a pseudorandom number generator proposed in 1986 by Lenore Blum, Manuel Blum and Michael Shub that is derived from Michael O. Rabin's oblivious transfer mapping.
— Wikipedia —

Coming from a very-different-than-Mersenne-Twister (but still very academical) field, there is a Blum-Blum-Shub PRNG. It is well-known as one of those algorithms which have a proof attached; note though that the proof itself hinges on the unproven intractability of a so-called quadratic residuosity problem. Of course, most likely this problem is indeed intractable. On the other hand, most likely AES128 is not breakable during the time while our game is being played. As a result, I don't consider the Blum-Blum-Shub to be any better than any of the crypto algorithms which are currently considered secure (even though there is usually no nice theorem attached to the latter).

In other words – I do NOT think that the “proof” behind the Blum-Blum-Shub qualifies as a sufficient reason to use it. Yes, it would be fine from reversibility perspective, but not more than that. And taking into account how slow Blum-Blum-Shub is – we can usually throw it out just because of poor performance.

“Poor Man’s” Crypto PRNG - AES-CTR

As a result, I generally suggest to use the following construct for a PRNG (often referred to as AES-CTR; for formal/more complete/more secure description, look, for example, for CTR_DRBG in SP800-90A):5

  • State of our PRNG consists of two fields: key (such as an AES128 key) and counter (such as a 128-bit counter)
  • PRNG is seeded with the key (needs to be of Crypto-Random Quality (!), see below); at the same point counter is initialized with either a different(!) random value, or a fixed one
  • to get next 16 bytes of random bit stream, we simply encrypt the counter with AES128 using the key as the crypto key (if somebody asks you about the mode - in ECB (sic!) mode).

This description is pretty much the same as saying that you're simply encrypting a stream of zero bytes (i.e. plaintext for encryption ALWAYS consisting of zero bytes) using your block cipher in CTR mode (that's why the name of AES-CTR). To achieve effect of initial counter being random, you may use random CTR's IV/nonce.

BTW, of course, you can use your favorite block cipher instread of AES128 too. Alternatively, you may use your favourite stream cipher (such as Chacha20) directly (without CTR). NB: On recent x64 CPUs by Intel AES is generally faster than Chacha20 (at least in part, due to AES-NI instructions).

thumbup On modern x86 CPUs, single core can generate 150M+ random bytes/second this way (and this is a Damn Lot).

First, let’s note that this construct is pretty fast. On modern x86 CPUs, single core can generate 150M+ random bytes/second this way (and this is a Damn Lot for most of the games; however, if you happen to need more - you MAY need a non-crypto PRNG, see discussion in "Exception: Potential Case for non-crypto-secure RNGs" section above).

Now let’s discuss the security (very sketchy, and only at intuitive level; much more formal reasoning does exist, but is way too far beyond the scope of this book). The construct is secure (at least as secure as AES128 in CTR mode provided that known-plaintext attacks are allowed) as essentially it is a bit stream which is used to XOR our data when we run AES128 in a CTR mode. As a result, any ability to predict it given the history or any chance of statistical skew, would make original AES-in-CTR-mode non-secure. So, as long as we consider AES-CTR encryption to be secure - so we should consider AES-CTR RNG.

Good stuff aside, let's discuss limitations of the "Poor Man's PRNG". For crypto-purposes, such an AES-CTR-based thing can generate up to 2^64 random blocks before it requires re-seeding (mechanics of this limitation is rather complicated, but is related to birthday paradox). With 150M+ random bytes/second I've mentioned above, it means that we could run our PRNG continuously on one single core for 2^64 blocks * 16 bytes/block / 150Mbytes/second / 86400 seconds/day / 365 days/year ~= 62394 years. Good enough for a vast majority of the games out there even without reseeding.

One potential issue with such a “Poor Man’s PRNG” is that it’s state is relatively limited. In particular, if trying to use it to deal decks of 52 cards, and initializing counter with a predefined value, we will be able to get only 2^128 different combinations,6 while we need at least 52! ~= 2^226. Therefore, if using such a “Poor Man’s PRNG” for card shuffling purposes, we need at least to upgrade it from AES128 to AES256, and preferably also to seed 128-bit counter too (making the whole space of reachable states 2^384, which provides a not-so-bad reserve over 2^226 we need). NB: these calculations are given just to illustrate the problem; in practice, for such RNG-critical  tasks I do NOT recommend “Poor Man’s Crypto”, see "Recommendations for Bit Stream" section below.

BB_emotion_0007b.pngStill, I’d rather not use “Poor Man’s” Crypto for Really Crypto-Sensitive Stuff, such as key generation. Also, we still need something to seed it too.

Still, I’d rather not use “Poor Man’s” Crypto for Really Crypto-Sensitive Stuff, such as key generation. Also, we still need something to seed it too. Which leads us to the much more murky field of “true” RNGs. 

5 note that as discussed below, any PRNG, including this one, needs to be properly seeded, so PRNG itself is not enough

6 actually, a little bit more than that, but this “little bit” is not likely to be worth more than ~30-40 bits in practice, as space of counter values is going to be limited

Determinism of PRNGs

It should be noted that with PRNGs, their output, while appearing completely random, is in fact perfectly deterministic (and yes, it stands for ALL PRNGs, however secure they are). This observation has some interesting implications:

  • you CAN use PRNG for reproducible testing (as long as you seed it with the same value)
    • as a result, you DON'T need to write PRNG's inputs and outputs into inputs-log when implementing deterministic Reactors (those discussed in Chapter V).

“True” RNGs

assertive2Even if we have the best-in-the-world PRNG, its output is still only as good as its seed.

Even if we have the best-in-the-world PRNG, its output is still only as good as its seed. For example, even if the guys from Planet Poker would use the-best-possible-PRNG, but would still seed it with time-with-millisecond-precision, they would still be easily breakable (to the extent that cards of the other players at the table can be predicted(!)).

To seed a PRNG, we DO need some “true” entropy. For this, we need some “true” RNG. Or from a different perspective: if our system as a whole is perfectly deterministic, we will have absolutely-nothing to get randomicity from.

“Physics-based” Hardware RNGs

One family of “true” RNGs is based on physical effects which are essentially random (such as Zener noise, thermal noise, and so on).

Unfortunately, good physics-based RNGs are difficult to find, and quite a bit of the stuff you can Google, is garbage 🙁. Here goes a very short guide for selecting your hardware RNG (that is, if you need it, which is NOT always the case):

  • Check whether you’re ok with physics-based Hardware RNG's advertised bandwidth and price.
  • Check whether the vendor claims that they run NIST, DieHard, and/or TestU01 tests (BTW, two are better than one, and three are better than two). And while we’re at it – any kind of FIPS-140 tests (unless they’re run in real-time to detect sudden failures) is pretty much useless for any kind of serious RNG testing.
  • Get their unit and run AT LEAST TWO of {NIST,Diehard,TestU01} tests yourself, in your lab.
    • There is a chance that you'll need to throw the RNG away at this point 🙁
  • Keep running the tests on random samples of their bit stream while you’re in production
BB_emotion_0023b.pngthe idea of throwing a single photon and detecting where it got reflected (which is a process guaranteed to be truly random by quantum mechanics), is IMHO fascinating

One physics-based RNG which I personally had Very Good experience with, is the one manufactured by ID Quantique (and while we're at it - no, I am not affiliated with them in any manner). Besides from their Quantis true RNGs working well in production and passing all the tests we were able to throw at it (including both Die-Hard and NIST, though we didn't try TestU01 at that time), the whole idea of throwing a single photon and detecting where it got reflected (which is a process guaranteed to be truly random by quantum mechanics), is IMHO fascinating. Moreover, due to independence of such single-photon experiments, such construct allows to avoid spectrum-related interdependencies (which are MUCH more difficult to deal with in other true RNGs).

Ongoing re-seeding, Fortuna and /dev/urandom

“Physics-based” RNG MAY look as “just the ticket” (and, if done properly, they are for some kinds of games), but in practice for many applications (many games included), they’re NOT strictly required.

pointingoutin addition to the usual PRNG functionality, these are specially designed for on-going re-seeding

There is an interesting family of PRNGs out there, represented, in particular, by Yarrow and later Wikipedia.Fortuna. While these are technically PRNGs (or more accurately - CSPRNGs as in "Crypto-Secure Pseudo-Random Generators"), in addition to the usual PRNG functionality, these are specially designed for on-going re-seeding. In other words - they DO accept all the entropy you can throw at them, and change their state accordingly.7 Moreover, if such a PRNG-with-ongoing-reseeding is integrated into an operating system, operating system will have a LOT of entropy to throw at it: information from interrupt handlers, disk I/O, and user input – everything can be used as a source of randomness… This will make such a PRNG non-predictable pretty soon even if its state has leaked once by some black magic(!). Oh, and as soon as the machine has run for long enough (which is usually in "< 1second" range) – such a PRNG-with-ongoing-reseeding-from-OS-events has enough state to be indistinguishable from True RNG.

However, as usual, there are certain considerations:

  • Out of all such PRNGs-with-ongoing-re-seeding, I strongly prefer Fortuna by Bruce Schneier
    • Yes, it is the one used by FreeBSD for it’s /dev/urandom
  • /dev/urandom in Linux uses a simpler Crypto-Secure PRNG than Fortuna, but it is criticized for weaknesses DodisEtAlBernstein
    • OTOH, unless your stuff is Really Critical, it is usually fine to use /dev/urandom
  • CryptGenRandom() by MS has also been criticised Dorrendorf
    • Same as with Linux and /dev/urandom, IMHO it is fine except for most-RNG-critical applications
  • If using /dev/urandom etc. directly, they MAY be rather slow (performance being system-dependent)
    • In this case, using their output as an input for "Poor Man's PRNG" described above, usually helps
  • If using /dev/urandom etc. directly while trying to use deterministic goodies from Reactors (as discussed in Chapter V), you DO need to record their data as inputs for your Reactor, which may additionally hurt performance further under certain scenarios

 7 Yes, at least for these PRNGs “re-seeding” does NOT mean “throw away all the state and start anew”, but rather means “mix existing state with a new state”

Recommendations for Bit Stream

Here goes my personal list of recommendations for bit stream RNGs:

  • For not-so-RNG-critical games:
    • DO use /dev/urandom to seed your PRNGs, to generate long-term keys etc. Under Windows, DO use CryptGenRandom()
      • If you run into performance issues - DO use “Poor Man’s PRNG” (seeded from /dev/urandom); this should satisfy all your RNG needs with a very healthy reserve
        • If you run into performance issues even with "Poor Man's PRNG" - I can safely bet that you're using TONS of random bits per network tick (or per frame). Then, use "Poor Man's PRNG" to initialise simple non-crypto PRNG (such as PCG) once per network tick/frame, and then use non-crypto PRNG throughout the tick, without carrying any of non-crypto PRNG state to the next tick. This will give you another boost in RNG performance, while minimizing any potential impact of the reversibility of non-crypto PRNG.
          • OTOH, if there is "fog-of-war" involved, this type of trickery MAY allow for nasty attacks which MAY partially lift "fog-of-war". To avoid them, the best is to use a crypto-PRNG, but other than that - splitting your map in smaller independent zones, or picking a non-crypto-PRNG which is more difficult to reverse, MAY help a bit.
    • judgingDO run your Marsaglia and NIST tests on your bit stream, and Chi-Square analysis at your game event level before deployment, and once in a while
      DO run at least two of {Marsaglia,NIST,TestU01} tests on your bit stream, and Chi-Square analysis at your game event level before deployment, and once in a while
  • For Really-RNG-Critical games (think lotteries, casinos, decisions defined as “chosen randomly” in stock exchanges, etc.):
    • DO use multiple sources of entropy.
      • This includes BOTH /dev/urandom AND at least one independent Hardware RNG
        • using more than one independent Hardware RNG is recommended
      • If your Hardware RNG is NOT used by /dev/urandom, use simple XOR (byte-for-byte) between output from /dev/urandom and output from your Hardware RNG(s); I prefer this configuration to the second one below.
      • If your HW RNG is already used by /dev/urandom, DO NOT mix output from your Hardware RNG with output from your /dev/urandom again; mixing things twice may easily be devastating for the quality and security of your bit stream(!)
      • DO consider running your own crypto-safe PRNG (such as Fortuna), using a completely independent seed, and XOR-ing its output with outputs of all the other streams mentioned above. Doing it this way would prevent several RNG disasters (including, for example, Debian-related one). The key question here is "where to obtain that completely independent seed". I would try to access several sites-providing-supposedly-true-random-data (with random.org, HotBits, and qrng.anu.edu.au coming to mind immediately), mix them (either via XOR-ing, or via hashing), and use the result to initialise this your own crypto-safe PRNG (NB: you may either to do it on each restart, or may do it offline and initialise Fortuna's seed file, just using it on subsequent restarts). In addition, you MAY also do ongoing re-seeding of your own PRNG such as Fortuna with your own Client-Side data (such as resulting from user inputs, but NOT coming from the server itself, as it may violate independence requirements). It should be noted that I tend to consider such your own PRNG as a kind of "safety net", which is XOR-ed with the others all the time (and as long as it is independent and simply XOR-ed, it cannot possibly get things worse), but which start being important only when/if everything else fails (such as in a doomsday scenario of all HW generators failing and /dev/urandom being improperly initialised such as it has happened with Debian).
    • DO run simple tests (such as monobit/poker/runs/long-runs which were crossed out from FIPS140-2, see FIPS140-2) over each of your HW generator outputs in real-time while they are running. Hardware RNGs DO fail accidentally, and knowing about it as soon as possible, IS important.
    • DO hire a company (ideally - one of those certifying casino RNG stuff) to perform analysis of your RNG. DO it again after ANY change you make to your RNG.
    • DO run Marsaglia and NIST and TestU01 on samples from your bitstream, and Chi-Square analysis at your game event level, on daily basis. Libraries you're relying on, DO change (and not always to the good 🙁), hardware generators DO fail, and so on... And running with a faulty RNG in a Really-RNG-Critical environment is tantamount to a Mortgage-Crisis-Size disaster 🙁🙁 .

From Bit Stream to Real World

BB_emotion_0011b.pngIn addition to the perfect bit stream, we also need a way to convert it to our precious randomised game events

Ok, we’ve got our perfectly random bitstream. However, as we can see from the mistakes made by Planet Poker devs, it is certainly not the end of the story. In addition to the perfect bit stream, we also need a way to convert it to our precious randomised game events (answering questions ranging from "what is the next card in the deck?" to "whether this hit should be a critical hit?").

As discussing ALL the random things you may want to do, would take too long, I’ll take a closer look only at two most common primitives (which are still subject to Big Fat Bugs 🙁 ).

Bit Stream to Random Number in Range

One very common operation requested from RNG, is “give me a random integer from 0 to N” (to be specific, let’s assume it is ‘N inclusive’). The very common way to handle it, is just to get next 32 bits from the bit stream, and to use something like

//WARNING: BIAS AHEAD!
int my_random(uint32_t randBits, int to) {
// Converts 32 random bits into a number from 0 to 'to' inclusive
return randBits % (to + 1);
}

Unfortunately, even for small values of to, there is a small but potentially nasty problem with this code: if to is not 2^N-1, returned value is NOT exactly evenly distributed. This happens just because 2^32 cannot be evenly divided into number-of-buckets which is not 2^N 🙁 . Sure, for small values of to the bias will be very small, but for Really RNG-Critical applications, this DOES qualify as an unacceptable bias 🙁 .

To deal with it (that is, for Really RNG-Critical stuff), I usually prefer to use the following bullet-proof algorithm:

  • Take a number-of-bits N (which is enough-to-represent-to-parameter-plus-one), from bit stream
    • For example, if to is 51, we need to take N=6 bits
  • Convert these bits into a number X (it will be from 0 to 2^N-1)
  • If X is less than or equal to to – we’ve got our random value
  • If X is more than to – we throw away all those N bits of random data, and go back to square one
arguingforgiven that bugs in this kind of code can be Very Difficult to find - I STRONGLY prefer to Keep It Simple and just throw bits away

Sure, it is not the most optimal way (and it DOES waste quite a few random bits unnecessarily – usually around 25%, but if your to is like 2^N, it may be as high as 50%); however, it has exactly zero bias, AND it is simple enough so that it is rather difficult to make a mistake while implementing it. If you're really concerned about those wasted bits of randomicity (which you usually shouldn't, as generation is extremely fast for AES-CTR-like stuff) - it IS possible to reuse them (see, for example, CheckMyWorking), but the whole thing will become MUCH more complicated (and error-prone too); given that bugs in this kind of code can be Very Difficult to find - I STRONGLY prefer to Keep It Simple and just throw bits away.

Shuffling

Another algorithm which is easy to mis-implement, is shuffling. As we’ve seen above, it IS quite easy to make a mistake in shuffling. And there ARE ways to implement shuffling properly (see, for example, Wikipedia.Fisher-Yates.Shuffle, look for Durstenfeld variation there).

BB_emotionM_0027b.png developers with absolutely no clue about the subject area are allowed to 'optimise' Major Standard Libraries

It is fairly simple, but it is still not too difficult to make a mistake (at least Planet Poker devs managed to do it). On the other hand, from my experience in development, I've learned to distrust those third-party implementations which are written by NON-experts in the field. This, in particular, applies to std:: libraries when it comes to math-related stuff8 In one of the versions of an std:: library (shipped to millions of developers and compiled into Who Knows how many programs), I've seen an implementation of std::hash() which was "optimised" to split the long string into 16 or so parts of the equal length, and to calculate "hash" only from bytes around these split points (effectively ignoring most of the string when calculating such "hash"). Yes, this "optimisation" is long gone now, but it was out there for 2 years, and illustrates that developers with absolutely no clue about the subject area are allowed to "optimise" Major Standard Libraries🙁🙁 . In turn, this raises a Pretty Bad Question: how can I be 100% sure that such an enterprising-and-optimizing-developer won't "optimise" shuffle in some pretty-bad-but-not-immediately-visible way? Reviewing 3rd-party code after each update is even worse than writing it ourselves🙁 .

As a result, here we’re facing a strange and unpleasant dilemma between writing shuffle ourselves and relying on existing stuff such as std::random_shuffle(). With this regard, my personal position (given certain things which I’ve seen in certain std:: implementations), goes as follows (and yes, I know that once again I will be beaten Really Hard for saying it):

  • For the not-so-RNG-critical stuff, it is ok to use third-party library by non-experts such as std::
  • However, for really-RNG-critical games, I’d rather take it from the library which specialises in random stuff (I don't know such a library though), or implement it myself (and BOTH review AND test it thoroughly). I am in particular afraid of scenarios when “newer better” version of the third-party library gets broken but it goes unnoticed for a while because the results are not obviously broken, but rather skewed instead. I certainly do NOT want to fall victim of such an “optimization” for a Really Critical part of the system.

 8 and BTW, an aforementioned bug in java.SecureRandom is another example of this kind of problems

Chi-Square Testing

Last but not least about the algorithms which convert bit streams into game-levels events (such as obtaining random number from the bit stream, or shuffling):

Make sure to run Chi-Square Testing (as described above) on your
game-level events

While these tests are NOT guaranteed to catch ALL the bugs, they will catch at least most egregious biases (including some of the bugs made by Planet Poker). Without this, I won't be comfortable to release any kind of the RNG for use even in the not-so-RNG-critical environments.

Dealing with Perception Problems

Ok, you've spent a lot of time and made your perfect RNG, which passes all the tests, code reviews, and third-party audits. Let's even assume that it is indeed perfect. However, it is not the end of the RNG story; to the contrary, in many RNG-critical cases it may be just a beginning.

BB_emotionM_0030b.pngthey will invent all kinds of the conspiracy theories why you intentionally biasing your RNG to make money

As noted above, players will complain about biased RNG even if your RNG is perfect. Even worse, they will invent all kinds of the conspiracy theories why you intentionally biasing your RNG to make money (see, for example, Blizzard.ConspiracyTheory using a poetic wording of "the 'invisible' hand messing around with their decks"). As noted above, this is largely related to the phenomenon of remembering-only-bad-stuff 🙁. But we cannot change this, so what we can do to deal with those disgruntled players and conspiracy theories?

Once upon a time, I've seen a company which took these things very seriously. And they were very interested in countering those conspiracy theories. They even considered hiring a mathematician to disprove (statistically) the prevalent conspiracy theory. However, they quickly realised that while disproving the theory would take half a year, inventing a new one would take just another 10 minutes, so it would be a never-winning situation.

Then, they've done an IMHO brilliant thing: they've started saying to their disgruntled players:

Hey, we can give you all the history of all the random events for your
account, and you can analyse it to your heart's content. If you find a proof
of something bad going on here - you're Very Welcome to come back to
us. (BTW, we've already handed out histories to
players, and nobody has come back yet).

The brilliance of this solution is that it shifts "burden of proof" to the player. Company doesn't need to do prove that it is innocent (which is pretty much hopeless even if you ARE innocent, see above), and it is now responsibility of the player to prove that something fishy IS going on.

However, keep in mind that to play such games, your RNG DOES need to
be perfect

Otherwise, the damage can be Really Bad🙁.

Yet Another Real-World Horror Story

And to wrap this Chapter up, yet another real-world story to illustrate how elusive those randomicity issues are.

Once upon a time, there was another poker software company. And, as it is customary in these circles, they have had RNG.

thumbdownOn one not-so-bright day, there came a report that their tournaments have had exactly the same deck dealt on most of the first tables in the tournament.

On one not-so-bright day, there came a report that their tournaments have had exactly the same deck dealt on most of the first tables in the tournament. Most likely, they were using the seed initialized by time at the moment when the table was created; as the tables within the tournament are created at almost the same time – it meant that the first deck was (almost) the same.

That’s not the end of the story, however. After the flaw became a public knowledge, it has caused quite a stir among players, and as a result, their CEO has made a public statement along the lines of “from now on, we will take special measures to guarantee that starting hands on different tables of the tournament are never the same”.

So much for randomicity (NB: any "guarantees" on the hands "never" being the same introduce bias, as all the hands are supposed to be perfectly independent; moreover, this bias can be potentially used to get an unfair advantage - and degree of this advantage depends on how they implemented these "guarantees").

I rest my case.

[[TODO: Separating Game RNG from Transport-Security RNG]]

[[To Be Continued...

tired



This concludes beta Chapter 19 from the upcoming book "Development and Deployment of Multiplayer Online Games (from social games to MMOFPS, with social games in between)". Stay tuned for beta Chapter 15, where we'll start discussing basic security as it applies to games.]]

References

[FIPS140-2] "SECURITY REQUIREMENTS FOR CRYPTOGRAPHIC MODULES"

[ShackNews.OnBlizzard] "Blizzard Details Secret World of Warcraft 'Progressive Percentage' Item Drop Mechanic"

[Grant] Elyot Grant, "The Role of Luck: Why RNG isn't the answer"

[StackExchange] "Technical details of attack on Android bitcoin usage of SecureRandom"

[Cigital] Brad Arkin, Frank Hill, Scott Marks, Matt Schmid, Thomas John Walls, and Gary McGraw, "How we Learned to Cheat in Online Poker: A Study in Software Security"

[Marsaglia.Diehard] "The Marsaglia Random Number CDROM including the Diehard Battery of Tests of Randomness"

[NIST] "Random Number Generation"

[Wikipedia.Chi.Square.Goodness.of.Fit] "Pearson's chi-squared test"

[Chi.Square.Critical.Values] "Critical Values of the Chi-Square Distribution"

[Wikipedia.LCG] "Linear congruential generator"

[SP800-90A] "Recommendation for Random Number Generation Using Deterministic Random Bit Generators"

[Wikipedia.Fortuna] "Fortuna (PRNG)"

[DodisEtAl] Yevgeniy Dodis, David Pointcheval, Sylvain Ruhault, Damien Vergnaud, and Daniel Wichs, "Security Analysis of Pseudo-Random Number Generators with Input: /dev/random is not Robust"

[Bernstein] D. J. Bernstein, "Entropy Attacks!"

[Dorrendorf] Leo Dorrendorf, "Cryptanalysis of the Random Number Generator of the Windows Operating System"

[CheckMyWorking] "Converting a stream of binary digits to a stream of base n digits"

[Wikipedia.Fisher-Yates.Shuffle] "Fisher-Yates Shuffle"

[Blizzard.ConspiracyTheory] "Blizzard stop abusing the game - RNG, algorithms etc"

[PCG] "PCG"

[xorshift] "Xorshift"

[TestU01] "TestU01"