August 20th, 2019 You know if someone had asked me this question a few months ago I would have responded that it’s was very unlikely and my facial expression would have looked like the person was eating huge globs of mayonnaise straight out of the jar. Let’s jump backward and ballpark some defs: An involution is a function that’s its own inverse: $f\left(f\left(x\right)\right) = f^2\left(x\right) = x$ A bijection is a function that has an inverse: $f^{-1}\left(f\left(x\right)\right) = f\left(f^{-1}\left(x\right)\right) = x$ where the $n$ in $f^n$ is denoting $n$ iterations of $f$. And I’m using “bit finalizer” to mean a bit mixing function like that typically used as the final step of a hash function. I’m going to be using the terminology and heat map figures from the previous post “Comments on the Avalanche Effect”. Now back to the beginning. I have a couple of issues with involutions as bit finalizers. The first is there aren’t many problems that need such a thing (roughly encoding and decoding are the same thing). The second issue it’s more computational work to hit reasonable properties but less that I would have previously thought so here we are…me typing. Code for this post can be found here and for quick eye-balling of functions on godbolt. The set up What got me thinking about this problem was a question on twitter by paniq where he wanted to pack a 2D coordinate in a integer, perform an involution and unpack back to 2D. This would allow to “statelessly” swap the data between pairs of coordinates for an entire grid in a random manner. My suggested attempt at a solution was something like this: // bad idea #1 uint32_t g0(uint32_t x) { if (x & 1) { // call this f(x) x *= 0xac564b05; x += 0x85ebca77; } else { // and this f^{-1}(x) x -= 0x85ebca77; x *= 0xdc33c9cd; } return x; } This is a pair of bijections (one’s the inverse of the other) in the same form as a LCG. These have the know property that they map odd to even and even to odd. By inspecting the value of the low bit (determine if odd or even) we can choose $f\left(x\right)$ for odd and $f^{-1}\left(x\right)$ for even inputs and this composite function is an involution. The “problem” of paniq has two additional wrinkles. First we functions that work on the same size of the texture being manipulated and small sized make finding a good bit mixer harder. As an example if the texture were 256x256 we’d need an involution for a 16-bit integer. The second trick is the function needs to be parameterized so this swapping operation can been performed differently in some number of way…all of which need to be somewhat independent. These move the problem out of the territory of “fun little thing to think about” to “that sounds like a lot of work”. So I’m going to blissfully ignore these extra requirements. The visualization of the strict avalanche criterion (SAC) bias of the above function: The left hand is sampled using Sobol and right hand a pseudo-random sequence. The plot thickens Harold Aptroot spiced things up by noting that a similarity transform of an involution is an involution. Sweet! This is something I can work with. Let’s run through the justification in painful details using a matrix like notation where $A$ is a bijection and $B$ an involution: \[\begin{align*} \left( ABA^{-1}\right)^2 & = \left( ABA^{-1} \right) \left( ABA^{-1} \right) \\ & = AB \left( A^{-1}A \right) BA^{-1} \\ & = ABBA^{-1} \\ & = AB^2A^{-1} \\ & = AA^{-1} \\ & = I \end{align*}\] Bullet point it time: We’re saying some function $ABA^{-1}$ is an involution which has a period of 2 so squaring it should yield the identity $I$. Start expanding. $A$ is a bijection so it has an inverse $A^{-1}$, multipled together gives $I$ which reduces to a 70s Swedish pop group. $B$ is an involution, squaring it reduces to $I$ and finally $A$ and it’s inverse cancel each other out. Okay..but does it do anything? The rest of this is just a quick feasibility study to see if we can construct an involution that is competitive against standard methods of bit mixing both in terms of statistical measures and runtime costs. It is pretty much a dump of what I walked through to get a feel for the answer. No real search methods were used..just me hacking stuff and seeing what happened. First we need a fast involution to be transformed and hopefully one that performs some bit mixing as well. The most obvious choice (at least to me) is a right xorshift by at least half the bit-width of the register. Since we’re using a bit operation $\left(\mathbb{F}_2\right)$ for the involution I want to choose a modulo integer operation for the bijection (alternating fields between each logical operation). So about the cheapest thing we can do here is multiply by an odd integer (the inverse of which can be found by mod inverse). So I simply grabbed an odd constant from L’Ecuyer’s table paper and tweaked the xorshift amount until the bias was low. #define M0 0x5f356495 #define M1 0x32c446bd uint32_t f0(uint32_t x) { x *= M0; x ^= (x>>25); x *= M1; return x; } This is doing suprisingly well for a short sequence involution. And again the above constants are going to be sub-optimal as are all the functions that follow. Instead of beefing up the bijection I wanted to see how changing the involution to an xor-rotate would play out. First we need some helper functions: inline uint32_t rot(uint32_t x, uint32_t i) { #if defined(__clang__) return __builtin_rotateleft32(x,i); #elif defined(_MSC_VER) return _rotl(x,i); #else return (x << i) | (x >> (-i & 31)); #endif } // I + C^a + C^b inline uint32_t xor_rot2(uint32_t x, uint32_t a, uint32_t b) { return x^rot(x,a)^rot(x,b); } I kept the multiplicative constants the same and tweaked the xor-rotate choice: uint32_t f1(uint32_t x) { x *= M0; x = xor_rot2(x,6,22); x *= M1; return x; } This was a nice little jump, we’re still on the cheap side and at this point I’d seen about enough. To kill off the bias on the low bits I just added (didn’t bother to tweak) right xorshift by half the bitwidth. uint32_t f2(uint32_t x) { x ^= x >> 16; x *= M0; x = xor_rot2(x,6,22); x *= M1; x ^= x >> 16; return x; } At this point we’re looking good and we’re also at about exactly the same runtime cost as current standard methods that are structured like this: // 32-bit MurmurHash3, xxHash et al. look like this: uint32_t common_bijective_finalizers(uint32_t x) { x ^= x >> S0; x *= K0; x ^= x >> S1; x *= K1; x ^= x >> S2; return x; } As a final “I want to see what happens” test I swapped out the outer xorshift with an xor-rotate randomly chosen from those that have a period of 32 and a 3 rotate inverse: // C^a + C^b + C^c inline uint32_t xor_rot3(uint32_t x, uint32_t a, uint32_t b, uint32_t c) { return rot(x,a)^rot(x,b)^rot(x,c); } uint32_t f3(uint32_t x) { x = xor_rot2(x,11,16); x *= M0; x = xor_rot2(x, 6,22); x *= M1; x = xor_rot3(x,10,21,26); return x; } Crush-ing some numbers This section spews out the results you’d see from running the example code. It measures SAC using three different sampling methods: $\text{cn}$: counting numbers (well programmer version thereof aka starting from zero) for low entropy input. $\text{ss}$: Sobol sequence to get good coverage of input space. $\text{he}$: random numbers for high entropy input. Each set is sampled $2^{23}$ times. The max bias measured as described in the Avalanche Effect post then multipled by 100 to give a percent bias of the worst case. Also a goodness-of-fit (GOF) test is performed to get a global estimate. Specifically Pearson’s chi-squared is computed, takes the square root and multiplied by 100. Additionally three batteries of TestU01 are run on the input. Each hash function is treated like a random number generator where we walk through counting numbers and apply the hash. Rabbit: computes 33 statistics from 26 base tests drawing approximately a user defined number of sample (specified in bits). My runs draw 1024 samples of the function so is setup for 1024*32 bits. SmallCrush: computes 15 statistics from 10 base tests drawing approximately $51,320,000$ samples Crush: computes 144 statistics from 96 base tests drawing approximately $2^{35}$ samples The reported numbers are how many statistics of the battery failed. For reference I’ve include 32-bit finalizer of MurmurHash3, xxHash and a pair by Chris Wellons. All except triple32 are two xorshift-multiplies followed by an xorshift and triple32 adds an additional xorshift-multiply. Clicking on the headers sorts the data. hash % max bias (cn) % max bias (ss) % max bias (he) GOF (cn) GOF (ss) GOF (he) Rabbit SmallCrush Crush murmur3 0.229263 0.518417 0.207162 0.052966 0.092238 0.043021 1 2 24 xxhash32 0.377083 0.579166 0.433731 0.069322 0.090209 0.066725 1 2 39 triple32 0.135088 0.156140 0.130367 0.044136 0.045361 0.034147 1 0 5 lowbias 0.169849 0.266051 0.122666 0.047634 0.068301 0.040458 2 2 44 f2 0.409937 0.393176 0.398207 0.054149 0.070380 0.045054 1 0 13 f3 0.591612 0.445747 0.516152 0.056496 0.052190 0.045781 1 0 3 The three inital functions are failures but here’s their data as well (minus Crush since it’s a waste of time). hash % max bias (cn) % max bias (ss) % max bias (he) GOF (cn) GOF (ss) GOF (he) Rabbit SmallCrush g0 100.000000 100.000000 100.000000 76.090304 82.050457 76.090831 7 15 f0 100.000000 100.000000 100.000000 23.667056 38.865972 23.663481 8 14 f1 100.000000 100.000000 100.000000 20.454587 20.904367 20.456410 1 7 Conclusions I normally don’t have a conclusion section. I talk some trash, you can dig through with a stick and decide if there’s anything you want to take home to try out. In this post I goofing around abit so I’m going to toss out my take-a-way. The answer to the title is yes. Bonus takeaways: xor-rotates might be worth investigating for various bit mixing operations Chris Wellons’ finalizers perform rather well Crush output …so you don’t have to Running the Crush battery takes about an hour so to save you the PITA here is what you would get: ========= Summary results of Crush ========= Version: TestU01 1.2.3 Generator: f1 Number of statistics: 144 Total CPU time: 00:26:31.17 The following tests gave p-values outside [0.001, 0.9990]: (eps means a value < 1.0e-300): (eps1 means a value < 1.0e-15): Test p-value ---------------------------------------------- 1 SerialOver, t = 2 eps 2 SerialOver, t = 4 9.4e-4 3 CollisionOver, t = 2 6.2e-11 4 CollisionOver, t = 2 eps 6 CollisionOver, t = 4 eps 8 CollisionOver, t = 8 eps 10 CollisionOver, t = 20 eps 11 BirthdaySpacings, t = 2 eps 13 BirthdaySpacings, t = 4 2.4e-227 15 BirthdaySpacings, t = 7 eps 16 BirthdaySpacings, t = 8 eps 17 BirthdaySpacings, t = 8 eps 18 ClosePairs NP, t = 2 6.7e-4 18 ClosePairs mNP1, t = 2 2.9e-6 18 ClosePairs mNP2, t = 2 7.3e-5 18 ClosePairs NJumps, t = 2 2.7e-5 23 SimpPoker, d = 16 6.2e-6 24 SimpPoker, d = 16 eps 26 SimpPoker, d = 64 eps 28 CouponCollector, d = 4 eps 30 CouponCollector, d = 16 eps 32 Gap, r = 27 eps 33 Gap, r = 0 6.8e-7 34 Gap, r = 22 eps 36 Run of U01, r = 15 eps 41 MaxOft, t = 5 1 - eps1 41 MaxOft AD, t = 5 1.4e-10 42 MaxOft, t = 10 1 - eps1 43 MaxOft, t = 20 1 - eps1 44 MaxOft, t = 30 1 - eps1 49 AppearanceSpacings, r = 0 eps 50 AppearanceSpacings, r = 20 eps 52 WeightDistrib, r = 8 1.1e-12 54 WeightDistrib, r = 24 eps 63 GCD, r = 0 eps 64 GCD, r = 10 eps 65 RandomWalk1 H (L = 90) eps 65 RandomWalk1 M (L = 90) eps 66 RandomWalk1 H (L = 90) eps 66 RandomWalk1 M (L = 90) eps 66 RandomWalk1 J (L = 90) eps 66 RandomWalk1 R (L = 90) eps 66 RandomWalk1 C (L = 90) eps 67 RandomWalk1 H (L = 1000) eps 67 RandomWalk1 M (L = 1000) eps 67 RandomWalk1 R (L = 1000) eps 67 RandomWalk1 C (L = 1000) eps 68 RandomWalk1 H (L = 1000) eps 68 RandomWalk1 M (L = 1000) eps 68 RandomWalk1 J (L = 1000) 4.0e-12 68 RandomWalk1 R (L = 1000) eps 68 RandomWalk1 C (L = 1000) eps 69 RandomWalk1 H (L = 10000) eps 69 RandomWalk1 M (L = 10000) eps 69 RandomWalk1 R (L = 10000) 1.1e-16 69 RandomWalk1 C (L = 10000) eps 70 RandomWalk1 H (L = 10000) eps 70 RandomWalk1 M (L = 10000) eps 70 RandomWalk1 R (L = 10000) eps 70 RandomWalk1 C (L = 10000) eps 72 LinearComp, r = 29 1 - 4.7e-14 74 Fourier3, r = 0 eps 75 Fourier3, r = 20 eps 77 LongestHeadRun, r = 20 7.2e-4 80 HammingWeight2, r = 0 1 - eps1 81 HammingWeight2, r = 20 1 - eps1 82 HammingCorr, L = 30 1 - eps1 83 HammingCorr, L = 300 1 - eps1 84 HammingCorr, L = 1200 1 - eps1 85 HammingIndep, L = 30 eps 86 HammingIndep, L = 30 eps 87 HammingIndep, L = 300 eps 88 HammingIndep, L = 300 eps 89 HammingIndep, L = 1200 eps 90 HammingIndep, L = 1200 eps 92 Run of bits, r = 20 eps 95 AutoCor, d = 30 eps 96 AutoCor, d = 10 eps ---------------------------------------------- All other tests were passed ========= Summary results of Crush ========= Version: TestU01 1.2.3 Generator: f2 Number of statistics: 144 Total CPU time: 00:26:56.26 The following tests gave p-values outside [0.001, 0.9990]: (eps means a value < 1.0e-300): (eps1 means a value < 1.0e-15): Test p-value ---------------------------------------------- 1 SerialOver, t = 2 eps 11 BirthdaySpacings, t = 2 6.2e-82 25 SimpPoker, d = 64 6.1e-8 41 MaxOft, t = 5 1 - 5.9e-10 41 MaxOft AD, t = 5 2.9e-9 42 MaxOft, t = 10 1 - eps1 43 MaxOft, t = 20 1 - 1.7e-11 44 MaxOft, t = 30 1 - eps1 49 AppearanceSpacings, r = 0 1 - eps1 52 WeightDistrib, r = 8 2.5e-7 82 HammingCorr, L = 30 1 - 8.4e-6 85 HammingIndep, L = 30 5.2e-4 95 AutoCor, d = 30 9.5e-33 ---------------------------------------------- All other tests were passed ========= Summary results of Crush ========= Version: TestU01 1.2.3 Generator: f3 Number of statistics: 144 Total CPU time: 00:27:52.02 The following tests gave p-values outside [0.001, 0.9990]: (eps means a value < 1.0e-300): (eps1 means a value < 1.0e-15): Test p-value ---------------------------------------------- 42 MaxOft, t = 10 1 - 9.9e-12 43 MaxOft, t = 20 1 - 4.6e-6 44 MaxOft, t = 30 1 - 2.6e-14 ---------------------------------------------- All other tests were passed ========= Summary results of Crush ========= Version: TestU01 1.2.3 Generator: xxhash32 Number of statistics: 144 Total CPU time: 00:27:14.14 The following tests gave p-values outside [0.001, 0.9990]: (eps means a value < 1.0e-300): (eps1 means a value < 1.0e-15): Test p-value ---------------------------------------------- 1 SerialOver, t = 2 eps 2 SerialOver, t = 4 8.9e-14 3 CollisionOver, t = 2 4.2e-281 11 BirthdaySpacings, t = 2 1.1e-93 13 BirthdaySpacings, t = 4 eps 18 ClosePairs NP, t = 2 2.1e-10 18 ClosePairs mNP, t = 2 1.0e-60 18 ClosePairs mNP1, t = 2 1.5e-66 18 ClosePairs mNP2, t = 2 3.4e-20 18 ClosePairs NJumps, t = 2 6.5e-87 23 SimpPoker, d = 16 eps 24 SimpPoker, d = 16 eps 25 SimpPoker, d = 64 eps 26 SimpPoker, d = 64 eps 28 CouponCollector, d = 4 eps 30 CouponCollector, d = 16 3.7e-9 31 Gap, r = 0 eps 32 Gap, r = 27 eps 33 Gap, r = 0 1.1e-14 34 Gap, r = 22 eps 35 Run of U01, r = 0 6.8e-14 36 Run of U01, r = 15 eps 41 MaxOft, t = 5 1 - eps1 42 MaxOft, t = 10 1 - eps1 43 MaxOft, t = 20 1 - 1.4e-9 44 MaxOft, t = 30 1 - 3.6e-15 48 SampleCorr 6.8e-7 49 AppearanceSpacings, r = 0 eps 52 WeightDistrib, r = 8 1.9e-4 54 WeightDistrib, r = 24 1.3e-12 57 MatrixRank, 60 x 60 0.9993 81 HammingWeight2, r = 20 1 - 3.7e-6 82 HammingCorr, L = 30 1 - 4.4e-5 84 HammingCorr, L = 1200 1 - 7.2e-6 85 HammingIndep, L = 30 1.5e-14 92 Run of bits, r = 20 8.3e-4 95 AutoCor, d = 30 1.6e-31 96 AutoCor, d = 10 2.8e-25 ---------------------------------------------- All other tests were passed ========= Summary results of Crush ========= Version: TestU01 1.2.3 Generator: murmurhash3 Number of statistics: 144 Total CPU time: 00:27:08.43 The following tests gave p-values outside [0.001, 0.9990]: (eps means a value < 1.0e-300): (eps1 means a value < 1.0e-15): Test p-value ---------------------------------------------- 1 SerialOver, t = 2 eps 3 CollisionOver, t = 2 eps 4 CollisionOver, t = 2 6.7e-7 11 BirthdaySpacings, t = 2 4.7e-42 13 BirthdaySpacings, t = 4 eps 18 ClosePairs NP, t = 2 1.4e-15 18 ClosePairs mNP, t = 2 3.2e-157 18 ClosePairs mNP1, t = 2 8.9e-284 18 ClosePairs mNP2, t = 2 8.0e-123 18 ClosePairs NJumps, t = 2 eps 21 ClosePairsBitMatch, t = 2 4.4e-4 25 SimpPoker, d = 64 eps 26 SimpPoker, d = 64 eps 31 Gap, r = 0 eps 33 Gap, r = 0 2.8e-11 34 Gap, r = 22 eps 36 Run of U01, r = 15 1.4e-7 41 MaxOft, t = 5 2.8e-37 42 MaxOft, t = 10 1 - eps1 43 MaxOft, t = 20 1 - 1.2e-11 44 MaxOft, t = 30 1 - 1.1e-14 49 AppearanceSpacings, r = 0 1 - eps1 53 WeightDistrib, r = 16 1.8e-5 54 WeightDistrib, r = 24 eps ---------------------------------------------- All other tests were passed ========= Summary results of Crush ========= Version: TestU01 1.2.3 Generator: lowbias Number of statistics: 144 Total CPU time: 00:27:03.56 The following tests gave p-values outside [0.001, 0.9990]: (eps means a value < 1.0e-300): (eps1 means a value < 1.0e-15): Test p-value ---------------------------------------------- 1 SerialOver, t = 2 eps 2 SerialOver, t = 4 eps 3 CollisionOver, t = 2 eps 4 CollisionOver, t = 2 7.4e-33 11 BirthdaySpacings, t = 2 4.6e-170 13 BirthdaySpacings, t = 4 eps 15 BirthdaySpacings, t = 7 1.6e-156 17 BirthdaySpacings, t = 8 2.3e-53 18 ClosePairs NP, t = 2 1.2e-13 18 ClosePairs mNP, t = 2 3.2e-157 18 ClosePairs mNP1, t = 2 1.6e-180 18 ClosePairs mNP2, t = 2 1.3e-70 18 ClosePairs NJumps, t = 2 3.9e-249 21 ClosePairsBitMatch, t = 2 1.1e-4 23 SimpPoker, d = 16 eps 24 SimpPoker, d = 16 eps 25 SimpPoker, d = 64 eps 26 SimpPoker, d = 64 eps 27 CouponCollector, d = 4 eps 29 CouponCollector, d = 16 eps 30 CouponCollector, d = 16 eps 31 Gap, r = 0 eps 32 Gap, r = 27 eps 33 Gap, r = 0 eps 34 Gap, r = 22 eps 35 Run of U01, r = 0 eps 36 Run of U01, r = 15 4.3e-4 37 Permutation, r = 0 1.1e-6 41 MaxOft AD, t = 5 5.6e-11 42 MaxOft AD, t = 10 6.4e-15 43 MaxOft AD, t = 20 1 - 4.8e-8 44 MaxOft, t = 30 1 - 3.1e-15 44 MaxOft AD, t = 30 1 - 6.0e-9 46 SampleProd, t = 30 0.9992 47 SampleMean 4.5e-6 48 SampleCorr 1 - 6.5e-9 49 AppearanceSpacings, r = 0 eps 52 WeightDistrib, r = 8 eps 55 SumCollector eps 75 Fourier3, r = 20 5.7e-5 81 HammingWeight2, r = 20 0.9992 84 HammingCorr, L = 1200 1 - 9.2e-5 85 HammingIndep, L = 30 1.2e-6 96 AutoCor, d = 10 1.1e-8 ---------------------------------------------- All other tests were passed ========= Summary results of Crush ========= Version: TestU01 1.2.3 Generator: triple32 Number of statistics: 144 Total CPU time: 00:27:30.04 The following tests gave p-values outside [0.001, 0.9990]: (eps means a value < 1.0e-300): (eps1 means a value < 1.0e-15): Test p-value ---------------------------------------------- 41 MaxOft, t = 5 1 - 1.8e-13 42 MaxOft, t = 10 1 - 1.2e-9 43 MaxOft, t = 20 1 - 3.2e-9 44 MaxOft, t = 30 1 - eps1 70 RandomWalk1 H (L = 10000) 7.6e-4 ---------------------------------------------- All other tests were passed Comments math (33) integer (7) , hash (2) , random (17)