Harold Aptroot posted a method to compute the nearest integer with the same population count (popcount) as the input (see below). This got me thinking about the various enumeration methods so I’ll toss out some comments. I’ll show 32-bit example code (64-bit follows directly) and use 16-bit for visual examples.

Implementations in 32 & 64 bit with visual examples (like below) can be found HERE.

## Next popcount

Visting the next greater integer with the same popcount is the commonly seen method. Let’s walk through to a method (found in “Hacker’s Delight”, second edition in exercises. Warren credits David de Kloet) which has the interesting properly that if the input is the maximum integer (of the given popcount) then all ones is returned.

First let’s look at what happens if we were to just increment until we find the result:

with the following example input and after a “few” iterations (.=0 x=doesn’t matter)

$\begin{array}{lll} \texttt{x} & = & \texttt{xxxxxx.1.111....} \\ \texttt{a} & = & \texttt{............1111} \\ \texttt{x+a} & = & \texttt{xxxxxx.1.1111111} \end{array}$

all values of $a$ up to and including the above (as many 1s as there are trailing zeroes in the input) result in $(x+a)$ having a greater popcount than $x$ so we speed up the increment method by initializing $a$ to the value of the lowest set bit in $x$ which can be expressed as (x & -x).

Our example then looks like:

$\begin{array}{lll} \texttt{x} & & &=& \texttt{xxxxxx.1.111.... : } \text{input} \\ \texttt{a} &=& \texttt{x & -x} &=& \texttt{...........1.... : } \text{isolate lowest set bit} \\ \texttt{b} &=& \texttt{x + a} &=& \texttt{xxxxxx.11....... : } \text{closer!} \\ \end{array}$

which generally has a smaller popcount than $x$. Specifically if we call $L$ the length of the lowest run of 1s (here 3) and target popcount $p$ then $\text{popcount}\left(b\right) = p-(L-1)$ because the $L$ 1s of the run flip to zero and the first zero after flips to 1. So to get to our desired result we’d need to keep adding until the bottom $(L-1)$ bits are set.

Now let’s note that we can isolate the lowest run of 1s in a integer using our temporary value $b$ as (x & ~b). We simply need to right shift this value by one more than it’s “trailing zero count” to have the needed number of 1s in place:

$\begin{array}{lll} \texttt{x} & & & = & \texttt{xxxxxx.1.111.... : } \text{input}\\ \texttt{a} & = & \texttt{x & -x} & = & \texttt{...........1.... : } \text{isolate lowest set bit} \\ \texttt{b} & = & \texttt{x + a} & = & \texttt{xxxxxx.11....... : } \text{cascade the run} \\ \texttt{c} & = & \texttt{x & ~b} & = & \texttt{.........111.... : } \text{isolate lowest run of 1s} \\ \texttt{c} & = & \texttt{c >> ctz(c)} & = & \texttt{.............111 : } \text{ * move to bottom} \\ \texttt{c} & = & \texttt{c >> 1} & = & \texttt{..............11 : } \text{ * lose one more bit} \\ \texttt{r} & = & \texttt{b ^ c} & = & \texttt{xxxxxx.11.....11 : } \text{result (add/or/xor all work)} \end{array}$

Note that this shifting twice thing is to sidestep undefined behavior (UB) and we’d have to add one anyway to flip it to a single shift which is a zero sum gain. A direct translation into code then looks like:

This currently returns a funky result when the input is the maximum value of a given popcount. The max value is when the lowest bit string ($b$ above) includes the sign bit and our pair of shifts then result in the smallest number with a popcount of $\left(p-1\right)$ mod the bitwidth.

If instead we use signed (arithmetic) shifts then we get the above mentioned “feature” of max value input returning all 1s.

Example lowering into Intel and ARM (which can probably be improved by an ARM smart person):

## Previous popcount

I couldn’t find any efficient methods to walk to the next smaller. The only constant time version I found was a mod of R.W. Gosper HAKMEM method which looks like this:

Needing to perform an integer division is very unfortunate. However since we already have a version that walks to the next we can simply apply a linear transform to map/unmap the input/result. The ones complement (~x) reverses the notion of unsigned integer ordering so the follow function walks to the previous:

Which isn’t too bad at +2 basic logical ops. We could “carry through the derivation”: pull the complements inside, apply identities and win! But’s lets just re-run through the pop_next example:

$\begin{array}{lll} \texttt{x} & & & = & \texttt{xxxxxx1.1...1111 : } \text{input: ~x of 'next' example}\\ \texttt{a} & = & \texttt{~x & (x+1)} & = & \texttt{...........1.... : } \text{isolate lowest clear bit} \\ \texttt{b} & = & \texttt{ x - a} & = & \texttt{xxxxxx.11....... : } \text{cascade the run} \\ \texttt{c} & = & \texttt{~x & b} & = & \texttt{.........111.... : } \text{isolate lowest run of 0s} \\ \texttt{c} & = & \texttt{c >> ctz(c)} & = & \texttt{.............111 : } \text{ * move to bottom} \\ \texttt{c} & = & \texttt{c >> 1} & = & \texttt{..............11 : } \text{ * lose one more bit} \\ \texttt{r} & = & \texttt{b ^ c} & = & \texttt{xxxxxx1..11111.. : } \text{zero out bottom bits} \end{array}$

List of modifications needed:

• x is changed to ~x of pop_next case because we’re interested in odd inputs here and even there (more below)
• a goes from lowest set (x & -x) to lowest clear ~x & (x+1)
• b goes from x+a to x-a. Both flip the run and the next higher bit
• c remains the same
• r remains the same but requires using XOR to flip the needed bits

Example lowering into Intel for transformed next vs. reworked:

## Popcount toward some target direction

Given the transform version of pop_prev above it’s straightword to create runtime selection of the walk direction.

## Nearest popcount

Now back to Harold’s post:

Let’s write out a 32-bit version in C:

The first temp value $a$ gives the “first bit that differs from bit-0”. So for even $x$ it isolates the lowest set bit and for odd the lowest clear bit. If $x$ is either all zeros or all ones then $a$ is zero. Otherwise $a$ is always a single set bit with the lowest clear.

The second temp value $b$ looks like it’s computing a Gray code but given the possible values of $a$ it’s simply creating a number with 2 adjacent bits set (so the above code mod).

If in the by-example derivation of pop_next we’d used an odd integer then we would have came up with the pop_nearest above except for the computation of a would have been ‘isolate the lowest clear bit’ (~x & (x+1)) and if we’d carried through with an even integer for pop_prev then a would have been ‘isolate the lowest set bit’ (x & -x). So this neat little thing just marries together the simple cases of next and prev. Nice!

Let’s look at way happens with 5 trailing zeroes (x0) and 5 trailing ones (x1):

xxxxxxxxxx100000 : x0 = even x
xxxxxxxxxx011111 : x1 = odd  x
0000000000100000 : both yield this as 'a'
0000000000110000 : both yield this as 'b'

xxxxxxxxxx010000 : x0 ^ b
xxxxxxxxxx101111 : x1 ^ b


Notice that both decrease the length of the low run by 1 so we can percolate down:

xxxxxxxxxx000010 : x0 = even x
xxxxxxxxxx111101 : x1 = odd  x
0000000000000010 : both yield this as 'a'
0000000000000011 : both yield this as 'b'

xxxxxxxxxx000001 : x0 ^ b
xxxxxxxxxx111110 : x1 ^ b
-----  : each leave behind the original bit-0
and lowest 2 bits toggle each step from now on