In a recent blog post, “A fast 16-bit random number generator?”, Daniel Lemire gives a proof-of-concept PRNG with 16 bits of state. The empirical search strategy was to estimate the “Avalanche Effect” of each candidate parameter and choose the best. In the comments someone questioned why using a hash measure is reasonable for a PRNG. One purpose of this post is to answer that question. Additionally I’ll eventually get around to using some of this stuff.

Let’s forget all about hashing and say we have some bit mixing function $f$ which transforms a register-sized integer value into a same-sized integer value (not a requirement but KISS), and let’s just say 32-bit integers (for bit mixing bigger really is better…but that’s another story). Avalanche is vaguely defined as “small changes in input produce large changes in output”. Since that’s totally useless we define a couple of desirable properties (stated slightly differently from the common form):

strict avalanche criterion (SAC): If we apply $f$ to an input $x$ and to $x$ with the $i^{\text{th}}$ bit flipped then output bits flip with a probability of 1/2.

So the code for computing the changes for the $i^{\text{th}}$ bit might look like this: bits = f(x) ^ f(x^(1<<i)).

bit independence criterion (BIC): If we apply $f$ to an input $x$ and to $x$ with the $i^{\text{th}}$ bit flipped then all other pairs of bit positions should flip independently in the output.

BIC is a stricter and harder measure to pass, but after gathering counts (wait for it) it proceeds like a SAC-based test, so I’m going to ignore it. Also there are variants of SAC tests (such as flipping multiple input bits or using nearby inputs) which I’m also going to (mostly) ignore. Changing a single bit has the most value since we’re looking at how well our mixing function propagates a single-bit change in the input to all output bits.


ASIDE: I’m going to state that it’s highly desirable for our function $f$ to be a bijection, or “every input maps to a unique output”. (I’m gonna keep ignoring different-sized input/output.) If $f$ is not a bijection then for a given output there are two or more inputs which produce it, and this introduces a bias (both for SAC measures and pigeonhole-principle flavored ones as well).

Now Lemire’s 16-bit generator does not use a bijection for the mix and the downside of this is that the generator is biased. The upside is it performs amazingly well on a number of PRNG statistics (Yo, this is really hard with 16 bits of state). So the point of this aside is to remember Goodhart’s law: desirable properties can be blown off (have weak measures) if we meet our overall goal(s).


Back on topic. SAC: output bits flip with probability of 1/2. For a PRNG we expect the output bits to be set with a probability of 1/2. So feeding a sequence of numbers into $f$ should produce a random number (and yeah, there are a bunch of other tests we want to run to see how well it’s working). In fact many state-of-the-art random number generators do exactly this. PCG generators advance an LCG and run the state through a mix, and there are quite a few generators that use a Weyl sequence (additive recurrence) followed by a mix. A quick reasoning why this is popular: a PRNG state update needs to be a full-period bijection (this is a hard requirement), while the mixing function has no effect on the period at all. So we can use a crappy (in terms of randomness measures) state update and follow it with a very good mixing function.
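Here’s a minimal sketch of that pattern (my own illustration, not any specific generator from above): a Weyl sequence state update followed by some mixing function f, e.g. the MurmurHash3 finalizer shown later. The additive constant just needs to be odd for full period; the one below is an arbitrary (if popular) choice.

uint32_t state = 0;                    // generator state

uint32_t next(void)
{
  state += 0x9e3779b9;                 // Weyl step: trivially full period, terrible randomness
  return f(state);                     // all the statistical quality comes from the mix 'f'
}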


The remainder of this is spitballing how to make tests out of the desired property. Let’s jump in and look at a stripped-down version of what Lemire used:

{
  uint32_t sum = 0;

  // brute-force walk all 16-bit input values
  for (uint32_t x=0; x<=0xffff; x++) {

    // hash of the input sample 'x'
    uint32_t h = f(x);

    // walk all single bit flips of 'x'
    for (uint32_t i=0; i<16; i++) {
      // popcount = population count (e.g. __builtin_popcount); signed so
      // the subtraction below can go negative
      int32_t pop = (int32_t)popcount(h ^ f(x ^ (1 << i))) - 16/2;
      pop  = (pop >= 0) ? pop : -pop;
      sum += (uint32_t)pop;
    }
  }
  // sum is the "measure"
}


So he just walks all legal inputs, flips each bit in turn, finds the changed output bits, then:

  • computes the population count (aka Hamming weight)
  • subtracts 8 (that is, 16/2). Why? Each bit should flip with probability 1/2 so, on average, half of the 16 output bits should flip at each step
  • takes the absolute value of the previous result
  • adds this value to the running sum

Nice, short and simple estimation of SAC. Small values are good and big are bad.


Let’s run with a skeleton for a 32-bit to 32-bit function, which is not so short:

  // array to hold the counts, initialized to zero
  uint32_t counts[32][32] = { { 0 } };

  // array to hold the population count histogram
  uint32_t popcount[33] = { 0 };

  // walk through 'n' inputs
  for (uint32_t i=0; i<n; i++) {
    uint32_t x = ...;                 // x = the ith input to test
    uint32_t h = f(x);                // hash of x

    // walk through flipping each bit of 'x'
    for (uint32_t j=0; j<32; j++) {
      uint32_t pop = 0;

      // compute the output bits which flipped
      uint32_t bits = h ^ f(x ^ (1<<j));

      // update counts individually:
      for (uint32_t k=0; k<32; k++) {
        uint32_t b = (bits >> k) & 1;
        pop          += b;
        counts[j][k] += b;
      }
      popcount[pop]++;
    }
  }


The outer structure is pretty much the same. Bullet-point time:

  1. We’re not brute-forcing all inputs; instead we’re sampling $n$ values. There are various sampling strategies that can be used, but we’re going to ignore that in this post.
  2. The other change is that last loop. We instead walk individually through the bits of our “which ones flipped” variable bits, updating a “matrix” of counts in which we track how each output bit flips for each flipped input bit position. Additionally this skeleton computes the population count and tracks how many of each we see.

Once this process is done we need to compute a measure of how well we’re meeting the expected property. Let’s say we walk through each element of counts and divide by the number of samples $\left(n\right)$ we used. We’d have a number on $\left[0,1\right]$: zero if the given input bit never flipped the given output bit and one if it always flipped it. Since the expected result is $\frac{1}{2}$, let’s subtract that and multiply by two to remap the range to $\left[-1,1\right]$ and call that value the bias. If we were to walk through all the bins of counts and track the maximum magnitude bias we could return that as our measure, and we’ll call that the max bias (this is the $\ell^{\infty}$ or uniform/supremum/infinity/Chebyshev norm of the bias). This gives us the magnitude of the worst case bias (just to state the obvious).
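A minimal sketch of that computation (assuming the counts array and sample count n from the skeleton above):

  double bias    = 0.0;               // per-cell bias
  double maxBias = 0.0;               // running max magnitude (the l-infinity norm)

  for (uint32_t j=0; j<32; j++) {
    for (uint32_t k=0; k<32; k++) {
      // empirical flip probability remapped from [0,1] to [-1,1]
      bias = 2.0*((double)counts[j][k]/(double)n) - 1.0;

      if (bias < 0.0)     bias = -bias;
      if (bias > maxBias) maxBias = bias;
    }
  }
  // maxBias is the "max bias" measure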

Something completely different we could do with our values in counts is to perform a goodness-of-fit test. The SAC property says all the counts should come out (roughly) equal, each with an expected value of $n/2$, if the function is a good bit mixer.
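One way this could look (my choice of statistic, not something prescribed above) is Pearson’s chi-squared against the expected n/2 flips per cell of counts; each cell is a binomial with p = 1/2:

  double chi2 = 0.0;
  double e    = 0.5*(double)n;        // expected flip count per cell

  for (uint32_t j=0; j<32; j++) {
    for (uint32_t k=0; k<32; k++) {
      double d = (double)counts[j][k] - e;
      // per-cell contribution: (c-e)^2/e + ((n-c)-e)^2/e = 4(c-e)^2/n
      chi2 += 4.0*d*d/(double)n;
    }
  }
  // compare chi2 against a chi-squared distribution (the cells aren't truly
  // independent, so treat the result as a rough screen rather than a proper p-value)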

Another thing we could do with the bias is to visualize it directly as a heat map. As a baseline let’s look at how the identity function would look:

uint32_t f(uint32_t h) { return h; }


The legend is off to the right with positive values in red, negative in blue and values approaching zero in black. The identity function doesn’t mix anything so the only bits that flip are those flipped by the test itself. Reading the figure: left-to-right is least to most significant output bit, and bottom-to-top is least to most significant bit flipped in the input sample.

Now let’s look at the MurmurHash3 32-bit finalizer at each step. There are quite a few hash functions with an identical structure, just with different choices of constants.

uint32_t f(uint32_t h)
{
  h ^= h >> 16; h *= 0x85ebca6b;
  h ^= h >> 13; h *= 0xc2b2ae35;
  h ^= h >> 16;
  return h;
}




The figures are in the same order as the code: top-left is the first xorshift, top-right the multiply, second row the next xorshift/multiply pair, and the last row the final xorshift. The samples are high entropy (a PRNG source) and the number of samples ($n=1048575$) is a middle-ground choice (enough that the bias is clear but not so many that the final figure becomes all black).

An older hashing function (the Wang hash) is similar in style to new-jack MurmurHash3 finalizers and is sometimes still recommended. Let’s look at that:

uint32_t f(uint32_t h)
{
  h = (h ^ 61) ^ (h >> 16);
  h *= 9;
  h ^= h >> 4;
  h *= 0x27d4eb2d;
  h ^= h >> 15;
  return h;
}


Let’s very quickly run through what we could do with the population count data. SAC says bits should flip with probability of 1/2. If we think of each bit as a coin tossed in the air (tails=0 and heads=1) then the probability of getting a specific population count is described by the binomial distribution:

\[\binom{n}{k} p^{k}(1-p)^{n-k}\]


where $p$ is the probability (which is 1/2), $n$ is the number of trials (the number of bits) and $k$ the number of successes (the population count). Renaming $n$ (we’re already using that) to $b$ and plugging in our fixed $p$ gives:

\[2^{-b} \binom{b}{k}\]

At this point we have enough information to perform a goodness-of-fit test.
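A hedged sketch of what that could look like (again my own arrangement, assuming the popcount histogram from the skeleton and letting trials be the number of (input, flipped bit) pairs fed in, which is 32*n there): build the expected count for each population count from the binomial and form the chi-squared statistic.

  double chi2 = 0.0;
  double p    = exp2(-32.0);            // 2^(-b) * C(b,k) starting at k=0 (needs <math.h>)

  for (uint32_t k=0; k<=32; k++) {
    double expected = p*(double)trials;
    double d        = (double)popcount[k] - expected;
    chi2 += d*d/expected;
    p *= (double)(32-k)/(double)(k+1);  // C(b,k+1) = C(b,k)*(b-k)/(k+1)
  }
  // compare chi2 to a chi-squared distribution with 32 degrees of freedom
  // (in practice the extreme bins have tiny expected counts and should be pooled)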




© 2023 Marc B. Reynolds
all original content is public domain under UNLICENSE