Can an involution be a competitive bit finalizer?

August 20th, 2019

You know, if someone had asked me this question a few months ago I would have responded that it was very unlikely, and my facial expression would have looked like the person was eating huge globs of mayonnaise straight out of the jar. Let’s jump backward and ballpark some defs:

An involution is a function that’s its own inverse: $f\left(f\left(x\right)\right) = f^2\left(x\right) = x$
A bijection is a function that has an inverse: $f^{-1}\left(f\left(x\right)\right) = f\left(f^{-1}\left(x\right)\right) = x$

where the $n$ in $f^n$ denotes $n$ iterations of $f$. I’m using “bit finalizer” to mean a bit mixing function like those typically used as the final step of a hash function. I’m going to be using the terminology and heat map figures from the previous post “Comments on the Avalanche Effect”.

Now back to the beginning. I have a couple of issues with involutions as bit finalizers. The first is that there aren’t many problems that need such a thing (roughly: encoding and decoding are the same operation). The second is that it’s more computational work to hit reasonable properties, but less than I would have previously thought, so here we are…me typing.

Code for this post can be found here and for quick eye-balling of functions on godbolt.

The set up

What got me thinking about this problem was a question on twitter by paniq where he wanted to pack a 2D coordinate in an integer, perform an involution and unpack back to 2D. This would allow one to “statelessly” swap the data between pairs of coordinates for an entire grid in a random manner. My suggested attempt at a solution was something like this:

// bad idea #1
uint32_t g0(uint32_t x)
{
  if (x & 1) {
    // call this f(x)
    x *= 0xac564b05;
    x += 0x85ebca77;
  }
  else {
    // and this f^{-1}(x)
    x -= 0x85ebca77;
    x *= 0xdc33c9cd;
  }
  return x;
}

This is a pair of bijections (one’s the inverse of the other) in the same form as an LCG. These have the known property that they map odd to even and even to odd (both the multiplier and the additive constant are odd). By inspecting the value of the low bit (to determine if odd or even) we can choose $f\left(x\right)$ for odd and $f^{-1}\left(x\right)$ for even inputs, and this composite function is an involution. The “problem” of paniq has two additional wrinkles. First, we need functions that work on the same size as the texture being manipulated, and small sizes make finding a good bit mixer harder. As an example, if the texture were 256x256 we’d need an involution for a 16-bit integer. The second wrinkle is that the function needs to be parameterized so this swapping operation can be performed differently in some number of ways…all of which need to be somewhat independent. These move the problem out of the territory of “fun little thing to think about” to “that sounds like a lot of work”. So I’m going to blissfully ignore these extra requirements.
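If you'd rather test the round trip than take my word for it, something like the following works. It copies g0 from above and adds a hypothetical helper, g0_roundtrip_failures (the name and the stride-based sampling are just my choices for this sketch), which counts inputs where the parity fails to flip or where applying g0 twice doesn't return the input.

```c
#include <stdint.h>

// g0 as defined above: affine map for odd inputs, its inverse for even ones
uint32_t g0(uint32_t x)
{
  if (x & 1) { x *= 0xac564b05; x += 0x85ebca77; }
  else       { x -= 0x85ebca77; x *= 0xdc33c9cd; }
  return x;
}

// hypothetical helper: walk n inputs spaced 'stride' apart and count
// every violation of the two properties we care about
uint32_t g0_roundtrip_failures(uint32_t n, uint32_t stride)
{
  uint32_t fails = 0;
  uint32_t x = 0;

  for (uint32_t i = 0; i < n; i++, x += stride) {
    uint32_t y = g0(x);
    if (((x ^ y) & 1) == 0) fails++;   // low bit must flip (odd <-> even)
    if (g0(y) != x)         fails++;   // applying twice must give x back
  }
  return fails;
}
```

Calling g0_roundtrip_failures(1u << 20, 0x9e3779b9) should come back with zero; an odd stride makes the walk alternate between odd and even inputs so both branches get exercised.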

The visualization of the strict avalanche criterion (SAC) bias of the above function:

The left-hand figure is sampled using a Sobol sequence and the right-hand one using a pseudo-random sequence.

The plot thickens

Harold Aptroot spiced things up by noting that a similarity transform of an involution is an involution. Sweet! This is something I can work with. Let’s run through the justification in painful detail using a matrix-like notation where $A$ is a bijection and $B$ an involution:

\[\begin{align*} \left( ABA^{-1}\right)^2 & = \left( ABA^{-1} \right) \left( ABA^{-1} \right) \\ & = AB \left( A^{-1}A \right) BA^{-1} \\ & = ABBA^{-1} \\ & = AB^2A^{-1} \\ & = AA^{-1} \\ & = I \end{align*}\]

Bullet point time:

We’re saying some function $ABA^{-1}$ is an involution which has a period of 2 so squaring it should yield the identity $I$. Start expanding.
$A$ is a bijection so it has an inverse $A^{-1}$; multiplied together they give $I$, which reduces the middle to a 70s Swedish pop group.
$B$ is an involution, so squaring it reduces to $I$, and finally $A$ and its inverse cancel each other out.
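The same argument in code form, as a minimal sketch: the conjugate helper and the three toy maps (add a constant, subtract it, flip all bits) are made up for illustration and are not from the post's code. Note that the code applies A first and its inverse last, which is the matrix expression with the roles of $A$ and $A^{-1}$ swapped; the result is a conjugation either way.

```c
#include <stdint.h>

typedef uint32_t (*map32)(uint32_t);

// apply the bijection A, then the involution B, then A's inverse
uint32_t conjugate(map32 A, map32 B, map32 A_inv, uint32_t x)
{
  return A_inv(B(A(x)));
}

// toy building blocks (assumptions for this sketch only)
uint32_t add_const(uint32_t x) { return x + 0x9e3779b9u; }  // bijection A
uint32_t sub_const(uint32_t x) { return x - 0x9e3779b9u; }  // its inverse
uint32_t flip_all(uint32_t x)  { return ~x; }               // involution B

// the resulting involution
uint32_t example(uint32_t x)
{
  return conjugate(add_const, flip_all, sub_const, x);
}
```

Here example(example(x)) gives back x for any x, even though add_const on its own is nowhere near being its own inverse.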

Okay..but does it do anything?

The rest of this is just a quick feasibility study to see if we can construct an involution that is competitive against standard methods of bit mixing both in terms of statistical measures and runtime costs. It is pretty much a dump of what I walked through to get a feel for the answer. No real search methods were used..just me hacking stuff and seeing what happened.

First we need a fast involution to be transformed and hopefully one that performs some bit mixing as well. The most obvious choice (at least to me) is a right xorshift by at least half the bit-width of the register. Since we’re using a bit operation $\left(\mathbb{F}_2\right)$ for the involution I want to choose a modulo integer operation for the bijection (alternating fields between each logical operation). So about the cheapest thing we can do here is multiply by an odd integer (the inverse of which can be found by mod inverse). So I simply grabbed an odd constant from L’Ecuyer’s table paper and tweaked the xorshift amount until the bias was low.

#define M0 0x5f356495
#define M1 0x32c446bd

uint32_t f0(uint32_t x)
{
  x *= M0;
  x ^= (x>>25);
  x *= M1;
  return x;
}

This is doing surprisingly well for a short sequence involution. And again the above constants are going to be sub-optimal, as are all the functions that follow.
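For anyone wanting to try other multipliers: M1 has to be the multiplicative inverse of M0 modulo $2^{32}$. One common way to get it is Newton's iteration, sketched below. The function name mod_inverse_u32 is my own and this isn't necessarily how the constants above were produced.

```c
#include <stdint.h>

// multiplicative inverse of an odd 'a' modulo 2^32 via Newton's iteration;
// each step doubles the number of correct low bits
uint32_t mod_inverse_u32(uint32_t a)
{
  uint32_t x = a;     // a*a = 1 (mod 8) for any odd a: 3 correct bits

  x *= 2 - a * x;     // 6 bits
  x *= 2 - a * x;     // 12 bits
  x *= 2 - a * x;     // 24 bits
  x *= 2 - a * x;     // 48 bits, more than the 32 we need
  return x;
}
```

Feeding it M0 (0x5f356495) returns M1 (0x32c446bd), and the result is only meaningful for odd inputs since even numbers have no inverse modulo a power of two.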

Instead of beefing up the bijection I wanted to see how changing the involution to an xor-rotate would play out. First we need some helper functions:

inline uint32_t rot(uint32_t x, uint32_t i)
{
#if defined(__clang__)
  return __builtin_rotateleft32(x,i);
#elif defined(_MSC_VER)
  return _rotl(x,i);
#else
  return (x << i) | (x >> (-i & 31));
#endif
}

// I + C^a + C^b
inline uint32_t xor_rot2(uint32_t x, uint32_t a, uint32_t b)
{
  return x^rot(x,a)^rot(x,b);
}

I kept the multiplicative constants the same and tweaked the xor-rotate choice:

uint32_t f1(uint32_t x)
{
  x *= M0;
  x = xor_rot2(x,6,22);
  x *= M1;
  return x;
}
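Why is the (6,22) pair an involution? Squaring $I + C^a + C^b$ over $\mathbb{F}_2$ kills the cross terms and leaves $I + C^{2a} + C^{2b}$, and when $b = a + 16$ the two rotations by $2a$ and $2b$ agree modulo 32 and cancel. The sketch below checks that for the first rotation amounts 0 through 15; rot here is a portable stand-in for the helper above and the other two function names are hypothetical.

```c
#include <stdint.h>

// portable left rotate; the masks keep both shift counts in 0..31
uint32_t rot(uint32_t x, uint32_t i)
{
  return (x << (i & 31)) | (x >> (-i & 31));
}

// I + C^a + C^(a+16): the second rotation is forced by the first
uint32_t xor_rot_involution(uint32_t x, uint32_t a)
{
  return x ^ rot(x, a) ^ rot(x, a + 16);
}

// hypothetical check: count the rotation amounts a in 0..15 for which the
// map fails to undo itself on the test value x
uint32_t xor_rot_involution_failures(uint32_t x)
{
  uint32_t fails = 0;

  for (uint32_t a = 0; a < 16; a++) {
    uint32_t y = xor_rot_involution(x, a);
    if (xor_rot_involution(y, a) != x) fails++;
  }
  return fails;
}
```

With a = 6 this is the same map as xor_rot2(x,6,22) used in f1.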

This was a nice little jump. We’re still on the cheap side and at this point I’d seen about enough. To kill off the bias on the low bits I just added (didn’t bother to tweak) a right xorshift by half the bit-width on each end.

uint32_t f2(uint32_t x)
{
  x ^= x >> 16;
  x *= M0;
  x = xor_rot2(x,6,22);
  x *= M1;
  x ^= x >> 16;
  return x;
}

At this point we’re looking good and we’re also at almost exactly the same runtime cost as current standard methods that are structured like this:

// 32-bit MurmurHash3, xxHash et al. look like this:
uint32_t common_bijective_finalizers(uint32_t x)
{
  x ^= x >> S0; x *= K0;
  x ^= x >> S1; x *= K1;
  x ^= x >> S2;
  return x;
}
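As one concrete instance of that shape, here is the 32-bit finalizer of MurmurHash3 with the shifts and constants as I remember them from the reference implementation (16, 13, 16). Treat the values as something to double-check against the original source rather than as gospel; the function name is mine.

```c
#include <stdint.h>

// MurmurHash3-style 32-bit finalizer: S0,S1,S2 = 16,13,16
uint32_t murmur3_fmix32(uint32_t x)
{
  x ^= x >> 16; x *= 0x85ebca6bu;   // xorshift, then odd multiply
  x ^= x >> 13; x *= 0xc2b2ae35u;   // and again with new constants
  x ^= x >> 16;                     // final xorshift
  return x;
}
```

Each step is invertible (a right xorshift and a multiply by an odd constant), so the whole thing is a bijection, but it is not its own inverse. Like the involutions in this post it maps zero to zero.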

As a final “I want to see what happens” test I swapped out the outer xorshift with an xor-rotate randomly chosen from those that have a period of 32 and a 3 rotate inverse:

// C^a + C^b + C^c
inline uint32_t xor_rot3(uint32_t x, uint32_t a, uint32_t b, uint32_t c)
{
  return rot(x,a)^rot(x,b)^rot(x,c);
}

uint32_t f3(uint32_t x)
{
  x = xor_rot2(x,11,16); x *= M0;
  x = xor_rot2(x, 6,22); x *= M1;
  x = xor_rot3(x,10,21,26);
  return x;
}
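Two things need to hold for f3 to be an involution: the trailing xor_rot3(10,21,26) has to undo the leading xor_rot2(11,16), and the whole function has to undo itself. A sketch of a spot check follows, with a portable rot standing in for the helper above and a hypothetical f3_failures counter.

```c
#include <stdint.h>

#define M0 0x5f356495u
#define M1 0x32c446bdu

// portable left rotate
uint32_t rot(uint32_t x, uint32_t i)
{
  return (x << (i & 31)) | (x >> (-i & 31));
}

uint32_t xor_rot2(uint32_t x, uint32_t a, uint32_t b)
{
  return x ^ rot(x,a) ^ rot(x,b);
}

uint32_t xor_rot3(uint32_t x, uint32_t a, uint32_t b, uint32_t c)
{
  return rot(x,a) ^ rot(x,b) ^ rot(x,c);
}

uint32_t f3(uint32_t x)
{
  x = xor_rot2(x,11,16); x *= M0;
  x = xor_rot2(x, 6,22); x *= M1;
  x = xor_rot3(x,10,21,26);
  return x;
}

// hypothetical helper: count samples where either identity fails
uint32_t f3_failures(uint32_t n, uint32_t stride)
{
  uint32_t fails = 0;
  uint32_t x = 0;

  for (uint32_t i = 0; i < n; i++, x += stride) {
    if (xor_rot3(xor_rot2(x,11,16),10,21,26) != x) fails++;  // outer pair
    if (f3(f3(x)) != x)                            fails++;  // whole thing
  }
  return fails;
}
```

The outer pair can also be checked by hand: multiplying $I + C^{11} + C^{16}$ by $C^{10} + C^{21} + C^{26}$ leaves $C^{10} + C^{32} + C^{42}$ after cancellation, and with exponents taken modulo 32 that is just $I$.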

Crush-ing some numbers

This section spews out the results you’d see from running the example code. It measures SAC using three different sampling methods:

$\text{cn}$: counting numbers (well programmer version thereof aka starting from zero) for low entropy input.
$\text{ss}$: Sobol sequence to get good coverage of input space.
$\text{he}$: random numbers for high entropy input.

Each set is sampled $2^{23}$ times. The max bias is measured as described in the Avalanche Effect post and then multiplied by 100 to give a percent bias of the worst case. A goodness-of-fit (GOF) test is also performed to get a global estimate. Specifically, Pearson’s chi-squared is computed, then its square root is taken and multiplied by 100.
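To give a rough idea of what a max bias measurement looks like, here is a stripped-down sketch for the counting-number case. The exact definition lives in the Avalanche Effect post and the example code, so take the formula below (worst deviation of the flip rate from one half, scaled to the range 0 to 1) as my assumption for illustration; sac_max_bias and identity32 are hypothetical names.

```c
#include <stdint.h>

typedef uint32_t (*hash32)(uint32_t);

// Assumed bias measure: for every (input bit i, output bit j) pair count how
// often flipping bit i of the input flips bit j of the output. An ideal mixer
// flips each output bit half the time, so report max |2*count/n - 1|.
// Requires n > 0.
double sac_max_bias(hash32 h, uint32_t n)
{
  static uint32_t count[32][32];

  for (int i = 0; i < 32; i++)
    for (int j = 0; j < 32; j++)
      count[i][j] = 0;

  for (uint32_t x = 0; x < n; x++) {          // counting numbers 0,1,2,...
    uint32_t hx = h(x);
    for (int i = 0; i < 32; i++) {
      uint32_t d = hx ^ h(x ^ (UINT32_C(1) << i));
      for (int j = 0; j < 32; j++)
        count[i][j] += (d >> j) & 1;
    }
  }

  double worst = 0.0;
  for (int i = 0; i < 32; i++) {
    for (int j = 0; j < 32; j++) {
      double b = 2.0 * count[i][j] / n - 1.0;
      if (b < 0.0)   b = -b;
      if (b > worst) worst = b;
    }
  }
  return worst;
}

// trivial reference: the identity never mixes, so its bias is total
uint32_t identity32(uint32_t x) { return x; }
```

Under this assumed definition, multiplying the result by 100 gives a percentage; the identity function scores 100, the same figure the failed functions show in the second table below.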

Additionally three batteries of TestU01 are run on the input. Each hash function is treated like a random number generator where we walk through counting numbers and apply the hash.

Rabbit: computes 33 statistics from 26 base tests drawing approximately a user-defined number of samples (specified in bits). My runs draw 1024 samples of the function, so it is set up for 1024*32 bits.
SmallCrush: computes 15 statistics from 10 base tests drawing approximately $51,320,000$ samples
Crush: computes 144 statistics from 96 base tests drawing approximately $2^{35}$ samples

The reported numbers are how many statistics of the battery failed.
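Treating a hash as a generator just means hashing a counter. A sketch of that wrapper for f2 is below; next_f2 and the file-scope counter are my own names, the portable rot stands in for the helper from earlier, and the TestU01 calls mentioned in the comment are how I'd expect to hook it up rather than a copy of the post's actual harness.

```c
#include <stdint.h>

#define M0 0x5f356495u
#define M1 0x32c446bdu

// portable left rotate
uint32_t rot(uint32_t x, uint32_t i)
{
  return (x << (i & 31)) | (x >> (-i & 31));
}

uint32_t xor_rot2(uint32_t x, uint32_t a, uint32_t b)
{
  return x ^ rot(x,a) ^ rot(x,b);
}

uint32_t f2(uint32_t x)
{
  x ^= x >> 16;
  x *= M0;
  x = xor_rot2(x,6,22);
  x *= M1;
  x ^= x >> 16;
  return x;
}

uint32_t counter = 0;   // the "counting numbers" input stream

// One draw of the generator: hash the counter, then advance it.
// With TestU01 installed, a function of this shape can be wrapped with
// unif01_CreateExternGenBits("f2", next_f2) and the resulting generator
// handed to a battery such as bbattery_SmallCrush.
unsigned int next_f2(void)
{
  return f2(counter++);
}
```

The first draw is f2(0), which is zero for every function in this post since they all map zero to zero.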

For reference I’ve included the 32-bit finalizers of MurmurHash3 and xxHash and a pair by Chris Wellons (lowbias and triple32). All except triple32 are two xorshift-multiplies followed by an xorshift; triple32 adds an additional xorshift-multiply.


hash	% max bias (cn)	% max bias (ss)	% max bias (he)	GOF (cn)	GOF (ss)	GOF (he)	Rabbit	SmallCrush	Crush
murmur3	0.229263	0.518417	0.207162	0.052966	0.092238	0.043021	1	2	24
xxhash32	0.377083	0.579166	0.433731	0.069322	0.090209	0.066725	1	2	39
triple32	0.135088	0.156140	0.130367	0.044136	0.045361	0.034147	1	0	5
lowbias	0.169849	0.266051	0.122666	0.047634	0.068301	0.040458	2	2	44
f2	0.409937	0.393176	0.398207	0.054149	0.070380	0.045054	1	0	13
f3	0.591612	0.445747	0.516152	0.056496	0.052190	0.045781	1	0	3

The three initial functions are failures but here’s their data as well (minus Crush since it’s a waste of time).

hash	% max bias (cn)	% max bias (ss)	% max bias (he)	GOF (cn)	GOF (ss)	GOF (he)	Rabbit	SmallCrush
g0	100.000000	100.000000	100.000000	76.090304	82.050457	76.090831	7	15
f0	100.000000	100.000000	100.000000	23.667056	38.865972	23.663481	8	14
f1	100.000000	100.000000	100.000000	20.454587	20.904367	20.456410	1	7

Conclusions

I normally don’t have a conclusion section. I talk some trash, you can dig through with a stick and decide if there’s anything you want to take home to try out. In this post I’m goofing around a bit so I’m going to toss out my takeaway.

The answer to the title is yes.

Bonus takeaways:

xor-rotates might be worth investigating for various bit mixing operations
Chris Wellons’ finalizers perform rather well

Crush output …so you don’t have to

Running the Crush battery takes about an hour so to save you the PITA here is what you would get:

========= Summary results of Crush =========

 Version:          TestU01 1.2.3
 Generator:        f1
 Number of statistics:  144
 Total CPU time:   00:26:31.17
 The following tests gave p-values outside [0.001, 0.9990]:
 (eps  means a value < 1.0e-300):
 (eps1 means a value < 1.0e-15):

       Test                          p-value
 ----------------------------------------------
SerialOver, t = 2                eps  
SerialOver, t = 4               9.4e-4
CollisionOver, t = 2           6.2e-11
CollisionOver, t = 2             eps  
CollisionOver, t = 4             eps  
CollisionOver, t = 8             eps  
CollisionOver, t = 20            eps  
BirthdaySpacings, t = 2          eps  
BirthdaySpacings, t = 4       2.4e-227
BirthdaySpacings, t = 7          eps  
BirthdaySpacings, t = 8          eps  
BirthdaySpacings, t = 8          eps  
ClosePairs NP, t = 2            6.7e-4
ClosePairs mNP1, t = 2          2.9e-6
ClosePairs mNP2, t = 2          7.3e-5
ClosePairs NJumps, t = 2        2.7e-5
SimpPoker, d = 16               6.2e-6
SimpPoker, d = 16                eps  
SimpPoker, d = 64                eps  
CouponCollector, d = 4           eps  
CouponCollector, d = 16          eps  
Gap, r = 27                      eps  
Gap, r = 0                      6.8e-7
Gap, r = 22                      eps  
Run of U01, r = 15               eps  
MaxOft, t = 5                  1 - eps1
MaxOft AD, t = 5               1.4e-10
MaxOft, t = 10                 1 - eps1
MaxOft, t = 20                 1 - eps1
MaxOft, t = 30                 1 - eps1
AppearanceSpacings, r = 0        eps  
AppearanceSpacings, r = 20       eps  
WeightDistrib, r = 8           1.1e-12
WeightDistrib, r = 24            eps  
GCD, r = 0                       eps  
GCD, r = 10                      eps  
RandomWalk1 H (L = 90)           eps  
RandomWalk1 M (L = 90)           eps  
RandomWalk1 H (L = 90)           eps  
RandomWalk1 M (L = 90)           eps  
RandomWalk1 J (L = 90)           eps  
RandomWalk1 R (L = 90)           eps  
RandomWalk1 C (L = 90)           eps  
RandomWalk1 H (L = 1000)         eps  
RandomWalk1 M (L = 1000)         eps  
RandomWalk1 R (L = 1000)         eps  
RandomWalk1 C (L = 1000)         eps  
RandomWalk1 H (L = 1000)         eps  
RandomWalk1 M (L = 1000)         eps  
RandomWalk1 J (L = 1000)       4.0e-12
RandomWalk1 R (L = 1000)         eps  
RandomWalk1 C (L = 1000)         eps  
RandomWalk1 H (L = 10000)        eps  
RandomWalk1 M (L = 10000)        eps  
RandomWalk1 R (L = 10000)      1.1e-16
RandomWalk1 C (L = 10000)        eps  
RandomWalk1 H (L = 10000)        eps  
RandomWalk1 M (L = 10000)        eps  
RandomWalk1 R (L = 10000)        eps  
RandomWalk1 C (L = 10000)        eps  
LinearComp, r = 29             1 - 4.7e-14
Fourier3, r = 0                  eps  
Fourier3, r = 20                 eps  
LongestHeadRun, r = 20          7.2e-4
HammingWeight2, r = 0          1 - eps1
HammingWeight2, r = 20         1 - eps1
HammingCorr, L = 30            1 - eps1
HammingCorr, L = 300           1 - eps1
HammingCorr, L = 1200          1 - eps1
HammingIndep, L = 30             eps  
HammingIndep, L = 30             eps  
HammingIndep, L = 300            eps  
HammingIndep, L = 300            eps  
HammingIndep, L = 1200           eps  
HammingIndep, L = 1200           eps  
Run of bits, r = 20              eps  
AutoCor, d = 30                  eps  
AutoCor, d = 10                  eps  
 ----------------------------------------------
 All other tests were passed


========= Summary results of Crush =========

 Version:          TestU01 1.2.3
 Generator:        f2
 Number of statistics:  144
 Total CPU time:   00:26:56.26
 The following tests gave p-values outside [0.001, 0.9990]:
 (eps  means a value < 1.0e-300):
 (eps1 means a value < 1.0e-15):

       Test                          p-value
 ----------------------------------------------
SerialOver, t = 2                eps  
BirthdaySpacings, t = 2        6.2e-82
SimpPoker, d = 64               6.1e-8
MaxOft, t = 5                  1 - 5.9e-10
MaxOft AD, t = 5                2.9e-9
MaxOft, t = 10                 1 - eps1
MaxOft, t = 20                 1 - 1.7e-11
MaxOft, t = 30                 1 - eps1
AppearanceSpacings, r = 0      1 - eps1
WeightDistrib, r = 8            2.5e-7
HammingCorr, L = 30            1 -  8.4e-6
HammingIndep, L = 30            5.2e-4
AutoCor, d = 30                9.5e-33
 ----------------------------------------------
 All other tests were passed


========= Summary results of Crush =========

 Version:          TestU01 1.2.3
 Generator:        f3
 Number of statistics:  144
 Total CPU time:   00:27:52.02
 The following tests gave p-values outside [0.001, 0.9990]:
 (eps  means a value < 1.0e-300):
 (eps1 means a value < 1.0e-15):

       Test                          p-value
 ----------------------------------------------
MaxOft, t = 10                 1 - 9.9e-12
MaxOft, t = 20                 1 -  4.6e-6
MaxOft, t = 30                 1 - 2.6e-14
 ----------------------------------------------
 All other tests were passed


========= Summary results of Crush =========

 Version:          TestU01 1.2.3
 Generator:        xxhash32
 Number of statistics:  144
 Total CPU time:   00:27:14.14
 The following tests gave p-values outside [0.001, 0.9990]:
 (eps  means a value < 1.0e-300):
 (eps1 means a value < 1.0e-15):

       Test                          p-value
 ----------------------------------------------
SerialOver, t = 2                eps  
SerialOver, t = 4              8.9e-14
CollisionOver, t = 2          4.2e-281
BirthdaySpacings, t = 2        1.1e-93
BirthdaySpacings, t = 4          eps  
ClosePairs NP, t = 2           2.1e-10
ClosePairs mNP, t = 2          1.0e-60
ClosePairs mNP1, t = 2         1.5e-66
ClosePairs mNP2, t = 2         3.4e-20
ClosePairs NJumps, t = 2       6.5e-87
SimpPoker, d = 16                eps  
SimpPoker, d = 16                eps  
SimpPoker, d = 64                eps  
SimpPoker, d = 64                eps  
CouponCollector, d = 4           eps  
CouponCollector, d = 16         3.7e-9
Gap, r = 0                       eps  
Gap, r = 27                      eps  
Gap, r = 0                     1.1e-14
Gap, r = 22                      eps  
Run of U01, r = 0              6.8e-14
Run of U01, r = 15               eps  
MaxOft, t = 5                  1 - eps1
MaxOft, t = 10                 1 - eps1
MaxOft, t = 20                 1 -  1.4e-9
MaxOft, t = 30                 1 - 3.6e-15
SampleCorr                      6.8e-7
AppearanceSpacings, r = 0        eps  
WeightDistrib, r = 8            1.9e-4
WeightDistrib, r = 24          1.3e-12
MatrixRank, 60 x 60             0.9993 
HammingWeight2, r = 20         1 -  3.7e-6
HammingCorr, L = 30            1 -  4.4e-5
HammingCorr, L = 1200          1 -  7.2e-6
HammingIndep, L = 30           1.5e-14
Run of bits, r = 20             8.3e-4
AutoCor, d = 30                1.6e-31
AutoCor, d = 10                2.8e-25
 ----------------------------------------------
 All other tests were passed


========= Summary results of Crush =========

 Version:          TestU01 1.2.3
 Generator:        murmurhash3
 Number of statistics:  144
 Total CPU time:   00:27:08.43
 The following tests gave p-values outside [0.001, 0.9990]:
 (eps  means a value < 1.0e-300):
 (eps1 means a value < 1.0e-15):

       Test                          p-value
 ----------------------------------------------
SerialOver, t = 2                eps  
CollisionOver, t = 2             eps  
CollisionOver, t = 2            6.7e-7
BirthdaySpacings, t = 2        4.7e-42
BirthdaySpacings, t = 4          eps  
ClosePairs NP, t = 2           1.4e-15
ClosePairs mNP, t = 2         3.2e-157
ClosePairs mNP1, t = 2        8.9e-284
ClosePairs mNP2, t = 2        8.0e-123
ClosePairs NJumps, t = 2         eps  
ClosePairsBitMatch, t = 2       4.4e-4
SimpPoker, d = 64                eps  
SimpPoker, d = 64                eps  
Gap, r = 0                       eps  
Gap, r = 0                     2.8e-11
Gap, r = 22                      eps  
Run of U01, r = 15              1.4e-7
MaxOft, t = 5                  2.8e-37
MaxOft, t = 10                 1 - eps1
MaxOft, t = 20                 1 - 1.2e-11
MaxOft, t = 30                 1 - 1.1e-14
AppearanceSpacings, r = 0      1 - eps1
WeightDistrib, r = 16           1.8e-5
WeightDistrib, r = 24            eps  
 ----------------------------------------------
 All other tests were passed


========= Summary results of Crush =========

 Version:          TestU01 1.2.3
 Generator:        lowbias
 Number of statistics:  144
 Total CPU time:   00:27:03.56
 The following tests gave p-values outside [0.001, 0.9990]:
 (eps  means a value < 1.0e-300):
 (eps1 means a value < 1.0e-15):

       Test                          p-value
 ----------------------------------------------
SerialOver, t = 2                eps  
SerialOver, t = 4                eps  
CollisionOver, t = 2             eps  
CollisionOver, t = 2           7.4e-33
BirthdaySpacings, t = 2       4.6e-170
BirthdaySpacings, t = 4          eps  
BirthdaySpacings, t = 7       1.6e-156
BirthdaySpacings, t = 8        2.3e-53
ClosePairs NP, t = 2           1.2e-13
ClosePairs mNP, t = 2         3.2e-157
ClosePairs mNP1, t = 2        1.6e-180
ClosePairs mNP2, t = 2         1.3e-70
ClosePairs NJumps, t = 2      3.9e-249
ClosePairsBitMatch, t = 2       1.1e-4
SimpPoker, d = 16                eps  
SimpPoker, d = 16                eps  
SimpPoker, d = 64                eps  
SimpPoker, d = 64                eps  
CouponCollector, d = 4           eps  
CouponCollector, d = 16          eps  
CouponCollector, d = 16          eps  
Gap, r = 0                       eps  
Gap, r = 27                      eps  
Gap, r = 0                       eps  
Gap, r = 22                      eps  
Run of U01, r = 0                eps  
Run of U01, r = 15              4.3e-4
Permutation, r = 0              1.1e-6
MaxOft AD, t = 5               5.6e-11
MaxOft AD, t = 10              6.4e-15
MaxOft AD, t = 20              1 -  4.8e-8
MaxOft, t = 30                 1 - 3.1e-15
MaxOft AD, t = 30              1 -  6.0e-9
SampleProd, t = 30              0.9992 
SampleMean                      4.5e-6
SampleCorr                     1 -  6.5e-9
AppearanceSpacings, r = 0        eps  
WeightDistrib, r = 8             eps  
SumCollector                     eps  
Fourier3, r = 20                5.7e-5
HammingWeight2, r = 20          0.9992 
HammingCorr, L = 1200          1 -  9.2e-5
HammingIndep, L = 30            1.2e-6
AutoCor, d = 10                 1.1e-8
 ----------------------------------------------
 All other tests were passed


========= Summary results of Crush =========

 Version:          TestU01 1.2.3
 Generator:        triple32
 Number of statistics:  144
 Total CPU time:   00:27:30.04
 The following tests gave p-values outside [0.001, 0.9990]:
 (eps  means a value < 1.0e-300):
 (eps1 means a value < 1.0e-15):

       Test                          p-value
 ----------------------------------------------
MaxOft, t = 5                  1 - 1.8e-13
MaxOft, t = 10                 1 -  1.2e-9
MaxOft, t = 20                 1 -  3.2e-9
MaxOft, t = 30                 1 - eps1
RandomWalk1 H (L = 10000)       7.6e-4
 ----------------------------------------------
 All other tests were passed
