A quick description of a 32bit state Weyl sequence^{1} (logically additive recurrence) followed by a bitfinializing operation which can be used for generating permutations of $\left[0,~2^{32}\right)$ or for random number generation which supports random access queries. The goal is to find a sweetspot of time complexity, statistical quality and number of unique statistically independent permutations generated.
Bulletpoints:
 Random access is the key point. If that’s not needed then there are a number of other options.
 Structurally equivalent to the 64bit version described HERE.
 Passes^{2} Smallcrush^{3}.
 Single and/or multiple stream (aka different permutations)
 Example implementation as a public domain single header file library (lprns.h).
 Approximately the same complexity as the roughly equivalent PCG^{4} generator: godbolt. We’re trading statistical quality for random access operations.
Briefly the state is updated by a Weyl sequence which allows very fast queries. To provide reasonable statistical quality this is then passed through a bit finalizing function. Example from the library:
Nonparamaterized version loses the second input parameter and m
becomes a constant. This is commonish finalizer used (for example) in murmurhash 3, xxhash, etc. In an ideal world I’d have run a search/optimization to provide the constants. Constants from murmur/xxhash are designed for highentropy input with a goodness measure of bit avalanche. Here we have a low entropy input (a lowdiscrepancy sequence) which we want to transform into a reasonable pseudorandom sequence. Since it’s doing good enough at testing…I couldn’t be bothered.
The example code above requires each stream to have a pretty good constant m
otherwise statistical quality will be decreased^{5}. The library provides an inital value and function to create the next from it, etc. The function simply drops the bottom 3 bits (fixed at 101), computes the next larger integer with the same population count and sets the bottom 3 bits to 101. Minimal testing indicates that this scheme works well but that doesn’t mean much. Frankly I don’t consider it very interesting. Getting the m
values is a pain, its probably one more extra thing to store and it limits the number of unique permutations. Tossing it out since these downsides might not be an issue for some cases.
Recently I discovered that my 64bit variant is virtually identical to a generator in the Java system library informally called splitmix^{6}. This generator supports multiple streams by parameterizing the Weyl constant. Although somewhat easier than the version above all of the same problems apply here as well. Instead let’s toss together one that allows any unique integer to define a stream:
The previous is less than ideal in my eyes since we have two $\mathbb{F}_2$ operations in a row (xor with m
followed by an xorshift) but I’m attempting to not extended the operation chain any longer than needed. If lower quality is fine we can drop one xorshiftmultiply (and say just grab murmurhash 2 constants) or even lower by dropping all after then xor by m
and returning a product with a good (M)LCG constant. The last becomes equivalent to a minimal 2D white noise generator I previously posted^{7} about.
Footnotes

“Weyl sequence overview” (local post) ↩

Statistics police might call me on declaring a pass:
 TestU01 requires pvalues within a 99.9% confidence level to declare a test passes. Otherwise it reports the questionable pvalue or a complete fail.
 Test driver can be found here and runs of 100 batteries for fixed and streaming version here and here respectively. Streaming test changes the permutation using the library provided function each battery run. If you inspect these runs you’ll find one questionable pvalue in each just outside of the confidence range. On about 20k runs hitting a questionable pvalue about 2.2% of the time and no fails. The questionable pvalues are scattered across eight of the fifteen tests in the battery.
 Ideally I’d rerun test with questionable pvalues with increased sample sizes and rinceandrepeat until it moves into pass or fail. Oh wait! I did do that. SmallCrushAdaptive (source)

Smallcrush is a TestU01^{8} battery, roughly speaking passing this means you’re good for ~$2 \times 10^5$ samples per problem. ↩

“PCG: A Family of Simple Fast SpaceEfficient Statistically Good Algorithms for Random Number Generation”, Melissa E. O’Neill (web site) ↩

Questionable pvalues rate increases from ~2.2% to ~3.7% with a set of constants I expect to be pretty good. ↩

java.util.SplittableRandom
originally described in “Fast splittable pseudorandom number generators.”, Guy Steele Jr., Doug Lea, and Christine Flood, 2014 ↩ 
The original version in “minimal quality 2D hashing with Weyl” (local post) ↩

“TestU01: A C library for empirical testing of random number generators”, Pierre L’Ecuyer and Richard Simard, 2007. (original source/paper) (hacked source) ↩