This is a small table of reversible integer functions that can be implemented in a small number of opcodes. This is adapted from a similar table by Bret Mulvey^{1} which I ran across from a Pixar technical note^{2}. I’m basically going to fill in what I think are some interesting points. I’m going to assume all operations are on whole registers, otherwise you need to sprinkle in AND masks as needed.
Basic table
The table is intended to rather minimal. Some forward notes:
 Mostly avoiding duplicate entries so no need to list the inverse functions seperately.
 The field indicates the mathematical field where the operation is most easily expressed. I’m using $\mathbb{Z}$ to indicate integers modulo some power of two^{3} and $\mathbb{F_2}$ for informally bitwise operations.
field  $f\left(x\right)$  $f^{1}\left(x\right)$  notes 

$\mathbb{Z} $  x = x 
x = x 

$\mathbb{F}_2$  x ^= k 
x ^= k 

$\mathbb{Z} $  x += k 
x = k 

$\mathbb{Z} $  x *= k 
x *= mod_inverse(k) 
1, k must be odd 
$\mathbb{Z} $  x += x << k 
x *= mod_inverse(1+(1<<k)) 
1,2 
$\mathbb{Z} $  x = x << k 
x *= mod_inverse(1(1<<k)) 
1,2 
$\mathbb{F}_2$  x ^= x >> k 
— see below  3 
$\mathbb{F}_2$  x ^= x << k 
— see below  3,5 
$\mathbb{F}_2$  x = rotl(x,k) 
x = rotr(x,k) 
4 
$\mathbb{F}_2$  x = permute(x) 
— see below  3,4 
$\mathbb{F}_2$  — various  — see below  6 
 Where
mod_inverse
is the multiplicative inverse^{4}.  Ok the forward operations are multiplication.
 See below for inverse function.
 Bit rotations are a special case of bit permutations. Other examples include byteswaps and bit reversals.
 Special case of “carryless multiplication”.
 See below.
Since all of these operations are invertible^{5}, any composition of them is likewise invertible. So if we perform three in a row:
\[\begin{eqnarray*} x' &=& f_2\left(f_1\left(f_0\left(x\right)\right)\right) \\ &=& \left(f_2 \circ f_1 \circ f_0\right)\left(x\right) \end{eqnarray*}\]
then the inverse is simply solving the equation which is chaining the inverse functions in the opposite order.
Numbers aside: The total number of invertible functions is equal to the number of permutations. For a bitwidth of $\beta$ there are $2^{\beta}!$ unique invertible functions. A big number but most are boring and worse not fast to compute.
Quick bitwise operations as $\mathbb{F_2}$
We can consider a bit to be an integer modulo 2 and can describe basic bit operations as the two element finite field^{6} $\mathbb{F}_2$. As a field we need to support the operations: addition, subtraction, multiplication and division. If have two computer integer values a
and b
where the lowest bit of each are the values we’re performing operations on:
$\mathbb{F}_2$ operation  Clike operation  Clike reduced  notes 

$a$  (a) & 1 
a 
since $1+1=0$, $1=1$ also $0=0$ 
$a+b$  (a+b) & 1 
a^b 

$ab$  (ab) & 1 
a^b 
or via negate: $a+\left(b\right) = a+b$ 
$a~b$  (a*b) & 1 
a&b 

$a^{1}$  mod_inverse(a) 
a 
In turn a collection of bits (like a register) can treated as a vector of $\mathbb{F}_2$ elements. So the Clike reduced are the vector operations (multiplication and multiplicative inverse taken to be componentwise). We can go a step further and consider our vectors as column matrices which can be transformed by a matrix. Taking 32bit registers then we can represent a transform by a 32x32 matrix. Well we can’t represent vector addition this way, we’d have to increase the matrix size to 33x33, ignoring this here since XORing by a constant isn’t super interesting. Taking the convention that our 32bit integer x
in matrix form is $x=\left(x_0,x_1,\ldots\right)^{T}$ where $x_0$ is the low bit, then there are two basic transform matrices we need to consider: permutations and shifts (bit rotations can be represented by either). Transforming $x$ to $x’$ becomes:
And since all the functions are reversible then the inverse of $M$ exists. And from group theory (Cayley’s theorem^{7}) we have:
where $n$ is some positive integer. The smallest $n$ is the (finite) order (period/cardinality) of $M$. For computation we don’t necessarily need the smallest $n$ if the op in question is $I$ for any greater power. This gives us:
Aside: If the dimensions of our matrix is $\beta$ (using example of $\beta=32$) then there are $2^{\beta^2}$ matrices (so example is $2^{1024}$). The number of invertible matrices is:
The percentage of invertible matrices approaches a limit of 28.8788%. (Mathematica told me about the RHS above. QPochhammer^{8} is news to me.)
Bit permutations
Bit permutations can be represented by, well, a permutation matrix^{9} which has exactly one ‘1’ entry in each column and row and everything else is zero. Permutation matrices are orthogonal so if $P$ is a permutation matrix then $P^{1} = P^{T}$. The identity matrix $I$ qualifies. Bittwiddling resources are loaded with examples, like the public available “Matters Computational”^{10}. Since I mentioned numbers before, we have $\beta!$ of these (big, but very small compared to total number of invertible). Some examples:
 Identity matrix
 Exchange matrix^{11} $J$ represents bitreversal. It’s a self inverse so $J^2=I$.
 Various other swap operations, notably byteswap (shown below), which are also involutions.
 Rotating left can be represented by a circulant matrix^{12}, specifically the cyclicpermutation matrix let’s call that $C$. Raising $C$ to a positive integer $k$ is a left rotate by $k$, negative $k$ becomes right rotate and $C^{\beta} = I$ (any $k \bmod \beta = 0$ as well, showing order).
PDEP
 BMI2 ops parallel bits deposit (
PDEP
) and parallel bits extract (PEXT
) can be used to build bit permutations.
Bit shifts
Given the stated conventions then a right shift of one bit is represented by an upper shift matrix^{13} let’s call that $R$ and a left shift of one bit by lower shift matrix $L$. These alone are not invertible since we have information loss of the shifted away bits. Raise each to positive integer power k
is a shift by k
bit positions and if $k \geq \beta$ the result becomes zero (your computer language may not like this in code). These two matrices are mutual transposes.
Xorshifts
The right xorshift: x = x ^ (x>>k)
which RHS translates into $x + R^k~x = \left(I + R^k\right)x $, so the matrix is:
if $k\geq \beta$ then $M=I$.
PDEP
The left xorshift: x = x ^ (x<<k)
likewise becomes the transformation matrix:
PDEP
Inverse of xorshifts
If you don’t happen to know how to find the inverse of an xorshift then it’s pretty straightword to understand. Let’s just look at some code for the standard gray code:
After the binary to gray conversion the top bit is left unchanged. To reverse we repeat the forward transform and we’re back to two bits, next step we’ve got four, etc. Verifying all of the bits get restored is tedious but not very difficult. For any right xorshift by k
, the inverse is a sequence of $i$ right xorshifts with shifts of: $2^i~k$ (each twice of the previous…we’re double number of restored bits with each), where $i$ is enough that the next shift would be greaterthan or equal to the bit width. Examples for three, five and nine:
If the shift amount is half the width or greater then the xorshift is an involution.
Let’s play the matrices a bit. Mechanically expand a composition of two right xorshifts x ^= x>>a; x^= x>>b
:
so same direction xorshifts commute. The wikipedia list of gray to binary runs in the opposite order of that above. For dead horse thing this expression is the same as above: x = x ^ (x>>a) ^ (x>>b) ^ (x>>(a+b))
Setting $a=b=k$ and plugging in gives squaring:
\[\begin{eqnarray*} \left(I + R^k\right)^2 & = & I + 2 R^k + R^{2k} \\ & = & I + R^{2k} \end{eqnarray*}\]which give the ‘bitdoubling’ operation we used above for the inverse. Combine this with the Cayley’s theorem result and we can directly form the inverse.
\[M^{1} = M^{n1} = \left(I + R^k\right)\left(I + R^{2k}\right)\left(I + R^{4k}\right)\ldots\]
I go into the details a bit more in the companion XorRotate post^{14}. Everything said about right xorshift apply to left.
Carryless multiplies and their inverses
Carryless multiplies opcodes are now^{15} fast enough to be usable on Intelalikes. Let’s just jump in with a quick foolishly naive software implementation for 32x32 mod 2^32:
If we were to change r ^= a
to r += a
we’d have a really bad integer product flipping to long multiplication in modulo $\mathbb{Z}$ from modulo $\mathbb{F}_2$ (where addition is XOR). If we were to fix either to a constant (let’s choose b
) then we could expand this based on the bits of b
which are set: (x<<b0)^(x<<b1)...
where bn
is the position of the n^{th} set bit. You might be shocked to discover that if the lowest bit is set then this carryless product has a multiplicative inverse. The forward transform in matrix would then look like:
our now oldfriend the left xorshift. And to find a generic inverse we simply need to compute $M^{\beta1}$.
The code above is structured for clarity and will produce unneeded operations. An actually implementation for current Intel intrinics would look like:
Others
 Basics of XOR rotates are in another post^{14}.
 AND XOR rotates (not motived yet)
 CRC32C (not motived yet)
References and Footnotes

“Correlated MultiJittered Sampling”, Andrew Kensler 2013 (PDF) ↩

Mentally replace that by $\mathbb{Z}_{2^b}$ or $\mathbb{Z}/2^{b}$ if you’re offended. ↩

“Integer multiplicative inverse via Newton’s method”, 2017 (page) ↩

Well $\mathbb{F}_{2}$ is more general than math on bits…don’t need any of that here. ↩

Carryless multiplies on Intel Haswell+ and AMD Jaguar+ microarchitecture. SEE: Fog’s tables ↩