Overview

Standard random number generators allow you to get the current member of the sequence and update the position to the next. They do not allow you to walk backwards, skip around, or simply ask: what's the nth member?

This public domain single header file library (prns.h) implements basic functions for random access reads of a pseudo-random sequence. It is specifically designed for this purpose and all functions are roughly the same complexity. Although it is relatively fast and has quite reasonable statistical quality, there are state-of-the-art generators which are both faster and higher quality, so it is only of interest if you need a generator which can perform these uncommon operations in constant time. It additionally supports a second independent sequence.

The structure is explained in: Implementation details

If seeking and skipping are of reduced interest, the base generation technique can be changed from the one initially presented (SEE: An alternate solution).


Documentation

The library is composed of one typedef prns_t which holds the state data for a given generator and a handful of inline functions which provide basic operations.


Position in sequence manipulation


Returns the current position in the sequence.

uint64_t prns_tell(prns_t* gen)


Set the current position in the sequence to pos.

void prns_set(prns_t* gen, uint64_t pos)


Moves the current position by the relative value offset.

void prns_seek(prns_t* gen, int64_t offset)

Reading sequence members


Returns the nth member of the sequence.

uint64_t prns_at(uint64_t n)


Returns the member of the sequence at the current position.

uint64_t prns_peek(prns_t* gen)


Returns the member of the sequence at the current position and increments.

uint64_t prns_next(prns_t* gen)


Returns the member of the sequence at the current position and decrements.

uint64_t prns_prev(prns_t* gen)
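
As a quick illustration, here is a minimal usage sketch built only from the functions above. Zero-initializing the structure and then positioning it with prns_set is an assumption about setup; check the header for the intended initialization.

#include <stdio.h>
#include <stdint.h>
#include "prns.h"

int main(void)
{
  prns_t gen = {0};               // assumption: zero-init, then position with prns_set
  prns_set(&gen, 10);             // jump straight to position 10

  uint64_t a = prns_next(&gen);   // member 10, position becomes 11
  uint64_t b = prns_next(&gen);   // member 11, position becomes 12
  uint64_t c = prns_prev(&gen);   // member 12, position becomes 11

  prns_seek(&gen, -2);            // relative move: back to position 9
  uint64_t d = prns_at(9);        // the 9th member, no generator state needed

  printf("%016llx %016llx %016llx %016llx pos = %llu\n",
         (unsigned long long)a, (unsigned long long)b,
         (unsigned long long)c, (unsigned long long)d,
         (unsigned long long)prns_tell(&gen));

  return 0;
}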

Down sequence


In addition to the main random number sequence used by the functions above, the library supports a second independent sequence. Its state is initialized from the current position in the main sequence by:

uint64_t prns_start_down(prns_t* gen)


the result of which is not a random number but a state value. To read from the sequence:

uint64_t prns_down(uint64_t* state)


As a strawman example, the following queries exactly one value from the main sequence and either increments or decrements the position depending on the boolean forward. It then reads some unknown number of values from the down sequence to perform its function.

void consume_one(prns_t* gen, bool_t forward)
{
  uint64_t val   = prns_peek(gen);        // only needed if 'val' is used
  uint64_t state = prns_start_down(gen);  // initial state of the down sequence

  prns_seek(gen, forward ? 1 : -1);       // move pos by direction

  do {
    uint64_t d   = prns_down(&state);     // next in down sequence and update its state
    ...
  } while(...);
}

Utils


The bit finalizing function. (SEE: bit finalizing)

uint64_t prns_mix(uint64_t x)

Implementation details and configuration options


The generator is structured like most modern PRNGs and hashing functions: one function updates the state and a second performs mixing before returning the final result.


State update

Making all position-related operations constant time requires a simple closed-form state update. Using the members of a Weyl sequence[1] as the state fills this requirement. If we call $ n $ the position in the sequence, $ s_n $ the state at that position, $ W $ the Weyl constant and $ k $ the signed offset by which we're updating the position, then the state update can be expressed as:

\[\begin{equation} \label{eq:statePos} s_{n+k}=s_{n}+k\thinspace W \end{equation}\]

The natural value for $ s_0 $ is zero, so setting $ n=0 $ in $ \eqref{eq:statePos} $ gives the state directly as a function of position: $ s_k = k\thinspace W $. Solving the closed form for the position $ n $:

\[\begin{equation} \label{eq:position} n=W^{-1}\thinspace s_n \end{equation}\]

where $ W^{-1} $ is the modulo inverse of $ W $. So all state operations become modulo integer addition and/or multiplication.

The down sequence follows the same structure, simply with a different Weyl constant.

The constants for the main sequence are macro defines: PRNS_WEYL defines the forward $ W $ and PRNS_WEYL_I defines $ W^{-1} $. The down sequence only supports query and increment and its constant is defined by PRNS_WEYL_D.
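
In code terms, the three position operations reduce to a handful of modular operations. The sketch below is illustrative only; the actual prns_t layout and the function bodies in prns.h may differ.

// illustrative only: assumes the generator state is just the current Weyl value s_n = n*W
typedef struct { uint64_t state; } sketch_t;

static inline void     sketch_set (sketch_t* g, uint64_t pos)   { g->state  = PRNS_WEYL * pos; }              // s_n = n*W
static inline uint64_t sketch_tell(sketch_t* g)                 { return PRNS_WEYL_I * g->state; }            // n = W^-1 * s_n
static inline void     sketch_seek(sketch_t* g, int64_t offset) { g->state += PRNS_WEYL * (uint64_t)offset; } // s_{n+k} = s_n + k*W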


Bit finalizing

The state update is so simple that the burden of producing a sequence with statistical random properties falls on the bit finalizing function. David Stafford posted a blog entry[2] in which he presents fourteen variants of cascading two right xorshift-multiplies followed by a final right xorshift:

  x ^= (x >> S0); x *= M0;
  x ^= (x >> S1); x *= M1;
  x ^= (x >> S2);

and two of the variants, mix01 and mix13, have good performance for low-entropy input (and a Weyl sequence is low entropy). A right xorshift can be expressed as a matrix equation over $ \mathbb{F_2} $:

\[x' = (I+R^a)x\]

where $ I $ is the identity, $ R $ is the (right) shift matrix and $ a $ is the shift amount.

So the overall structure of producing a random number alternates between operations in $ \mathbb{Z_{2^{64}}} $ (the Weyl sequence and the products in the mixing function) and operations in $ \mathbb{F_2} $ (the xorshifts).

The logic of this kind of construction is informally as follows. A modulo integer product with an odd constant produces no mixing of the lowest bit and the most mixing of the highest; each bit position from lowest to highest gets increasingly mixed. If this isn't obvious, consider the long-hand product in binary. Right xorshifts are typically formed so the shift amount is around half the working bit-width (~16 for 32-bit, ~32 for 64-bit, etc). This leaves the high (shift amount) bits unchanged and mixes the high-order bits into the low-order bits.


This means the last right xorshift of the mixing function only affects the low-order bits of the result (the default S2 is 33), so it can be dropped without affecting statistical quality provided the consuming code accounts for it. For example, the standard equispaced conversion to singles only uses the top 24 bits. Likewise integer results can be formed using only the high bits of the result.
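
For example, an equispaced conversion to single precision from the top 24 bits might look like the following. This is a common construction shown as an illustration, not necessarily the helper the header itself provides.

// equispaced single in [0,1) built from the top 24 bits of a 64-bit result
static inline float u64_to_f32(uint64_t x)
{
  return (float)(x >> 40) * 0x1p-24f;   // (x >> 40) is in [0, 2^24)
}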

The mixing function's constants are defined as: PRNS_MIX_S0, PRNS_MIX_S1, PRNS_MIX_S2, PRNS_MIX_M0, PRNS_MIX_M1.

The final right-xorshift is disabled by defining: PRNS_NO_FINAL_XORSHIFT
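
Putting the pieces together, the default finalizer has the following shape when written in terms of the macros above. This is a sketch; the concrete default constants and the exact body of prns_mix live in prns.h.

// shape of the finalizer expressed with its configuration macros
static inline uint64_t mix_sketch(uint64_t x)
{
  x ^= (x >> PRNS_MIX_S0); x *= PRNS_MIX_M0;
  x ^= (x >> PRNS_MIX_S1); x *= PRNS_MIX_M1;
#if !defined(PRNS_NO_FINAL_XORSHIFT)
  x ^= (x >> PRNS_MIX_S2);               // dropped when PRNS_NO_FINAL_XORSHIFT is defined
#endif
  return x;
}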

The mixing function of the main sequence is additionally never called directly; instead it is a macro expansion which defaults to: #define PRNS_MIX(X) prns_mix(X)

The down sequence likewise uses: #define PRNS_MIX_D(X) prns_mix(X)

The default bit-mixing function is intended to be a middle-ground solution for people who do not want to bother with providing their own. It performs respectably in the generalized test battery Crush[3] (report), which is massive overkill.

If your usage requires fewer than a couple of hundred thousand random numbers per simulation step per entity, then a lighter-weight mixing function targeting SmallCrush[4] is more appropriate: defining PRNS_SMALLCRUSH sets the mixing function to a single right xorshift-multiply (report).

These tests were run using the provided driver (SEE: Resources).



An alternate solution for (M)LCGs and PCGs

If we reduce the problem statement to focus on the ability to query the current member and either increment or decrement the position, then we could instead use a power-of-two LCG. The forward state update looks like:

\[\begin{equation} \label{eq:lcgf} u_{i+1} = \left( a\thinspace u_{i} + b \right) \bmod 2^{w} \end{equation}\]

reversing the recurrence gives us (dropping the explicit mod):

\[\begin{equation} \label{eq:lcgr} u_{i-1} = \left( a^{-1}\left( u_{i} - b \right) \right) = \left( a^{-1} u_{i} - a^{-1}b \right) = \left( a_r\thinspace u_{i} + b_r \right) \end{equation}\]

where $ a^{-1} $ is the modulo inverse of $a$. So we have two standard power-of-two LCG state updates where one simply walks the sequence in the opposite direction.
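
A minimal sketch of the pair of updates follows. The multiplier and additive constant are the well-known MMIX LCG values, used purely as placeholders, and $ a^{-1} $ is computed by Newton iteration rather than hard-coded.

#include <stdint.h>

// placeholder full-period constants (Knuth's MMIX LCG), used only for illustration
static const uint64_t LCG_A = 6364136223846793005ull;
static const uint64_t LCG_B = 1442695040888963407ull;

// modular inverse of an odd 64-bit integer: each Newton step doubles the
// number of correct low bits (3 -> 6 -> 12 -> 24 -> 48 -> 96)
static inline uint64_t mod_inverse(uint64_t a)
{
  uint64_t x = a;
  for (int i = 0; i < 5; i++) x *= 2 - a*x;
  return x;
}

// forward: u_{i+1} = a*u_i + b
static inline uint64_t lcg_next(uint64_t u) { return LCG_A*u + LCG_B; }

// reverse: u_{i-1} = a^-1*(u_i - b) = a_r*u_i + b_r
static inline uint64_t lcg_prev(uint64_t u)
{
  uint64_t ai = mod_inverse(LCG_A);      // a^-1 (in practice a precomputed constant)
  return ai*(u - LCG_B);
}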

For the mixing function we could turn to one of those provided by O'Neill in her PCG paper[5]. In this case we would have identical performance and statistical quality to the equivalent PCG, with the added ability to walk in both directions at the same cost.

The recurrence relation $ \eqref{eq:lcgf} $ can be expressed in closed form (without the modulo) for $ k $ steps forward as:

\[\begin{equation} \label{eq:lcgcf} u_{i+k} = a^k\thinspace u_i + \left( \frac{a^k-1}{a-1} \right) b \end{equation}\]

Equation $ \eqref{eq:lcgcf} $ can be evaluated using divide-and-conquer. If $ k $ is negative we could still evaluate directly with the forward form, but that would ensure we perform the maximum number of iterations; effectively we would be walking the long way around the circle between the two points. Instead it is possible to use the reversed-sequence constants. Obviously we can also compute the member at a given position using $ \eqref{eq:lcgcf} $.

Note that the C++ version of PCG includes skipping features, although it does not appear that the code takes advantage of being able to walk the minimal path.

Equation $ \eqref{eq:lcgcf} $ can be rewritten as:

\[u_{i+k} = M \thinspace u_i + C \thinspace b\]

where $ M = a^k $, which has various well-known divide-and-conquer forms, and:

\[C = \left( \frac{a^k-1}{a-1} \right) = 1 + a + a^2 + ... + a^{k-1}\]
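
Both $ M $ and the additive term can be accumulated together in $ O(\log k) $ with a square-and-multiply style recurrence over the affine map. A sketch of this standard technique, with $ a $ and $ b $ passed in as parameters:

// jump a power-of-two LCG forward by k steps in O(log k): the accumulator
// (acc_a, acc_b) builds up M and C*b from the equations above
static inline uint64_t lcg_skip(uint64_t u, uint64_t k, uint64_t a, uint64_t b)
{
  uint64_t acc_a = 1, acc_b = 0;        // identity map
  uint64_t cur_a = a, cur_b = b;        // map for the current power-of-two step count

  while (k != 0) {
    if (k & 1) {                        // fold this power of two into the jump
      acc_b = cur_a*acc_b + cur_b;
      acc_a = cur_a*acc_a;
    }
    cur_b = cur_a*cur_b + cur_b;        // double the step count of the current map
    cur_a = cur_a*cur_a;
    k >>= 1;
  }
  return acc_a*u + acc_b;               // = M*u_i + C*b
}

A negative offset can then be handled by calling the same routine with $ |k| $ and the reversed-sequence constants $ (a_r, b_r) $, which walks the short way around.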

The basics are pretty straightforward and are implemented in: (lcgs.h)

This currently defaults to the dumbest mixing on the planet; I need to sit down and think about it. The source includes PCG mixing functions which require setting macros to be included… these are not public domain.


Generalization to any bijection state update

I would argue that all interesting state updates are bijections which can be expressed as a matrix equation over some field:

\[u_{i+1} = M\ u_{i}\]

so $ M $ is invertible and there exists some scheme to produce $ M^{k} $, say by precomputing code for various $ k $ values. So there are O(1) formulations for reversed-sequence walking and O(log n) for skipping. However there are no obvious choices such that forward and reverse walking are roughly the same complexity. As examples consider binary-to-Gray code conversion vs. Gray-to-binary, or this paper[6].
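
To make that asymmetry concrete: converting binary to Gray code is a single xorshift, while the inverse requires a prefix-xor across the whole word.

// binary -> Gray code: one right xorshift
static inline uint64_t to_gray(uint64_t x)   { return x ^ (x >> 1); }

// Gray -> binary: a parallel prefix-xor, log2(64) = 6 xorshifts
static inline uint64_t from_gray(uint64_t x)
{
  x ^= x >> 1;  x ^= x >> 2;  x ^= x >> 4;
  x ^= x >> 8;  x ^= x >> 16; x ^= x >> 32;
  return x;
}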



Resources

  • Mathematica notebook for goofing around with constants: here
  • TestU01 (SmallCrush/Crush) test driver: here



References and Footnotes

  1. Weyl sequence overview (local post)

  2. Better Bit Mixing - Improving on MurmurHash3's 64-bit Finalizer, David Stafford (blog post)

  3. Crush is a TestU01[7] battery; roughly speaking, passing it means you're good for ~$2 \times 10^6$ samples per problem.

  4. SmallCrush is a TestU01[7] battery; roughly speaking, passing it means you're good for ~$2 \times 10^5$ samples per problem.

  5. PCG: A Family of Simple Fast Space-Efficient Statistically Good Algorithms for Random Number Generation, Melissa E. O'Neill (paper)

  6. Efficient Jump Ahead for $\mathbb{F_2}$-Linear Random Number Generators, Haramoto, Matsumoto, Nishimura, Panneton, L'Ecuyer, 2008 (PDF)

  7. TestU01: A C library for empirical testing of random number generators, Pierre L'Ecuyer and Richard Simard, 2007 (original source/paper) (hacked source 2)



© 2023 Marc B. Reynolds
all original content is public domain under UNLICENSE