Negate a number by flipping all bits then adding 1. This means addition hardware works identically for positive and negative numbers — no special case needed. The MSB is the sign bit: 0 = positive, 1 = negative. Range for N-bit signed: −2N−1 to 2N−1−1.
Hex │ Binary Hex │ Binary
────┼──────── ────┼────────
0 │ 0000 8 │ 1000
1 │ 0001 9 │ 1001
2 │ 0010 A │ 1010
3 │ 0011 B │ 1011
4 │ 0100 C │ 1100
5 │ 0101 D │ 1101
6 │ 0110 E │ 1110
7 │ 0111 F │ 1111
Conversion Examples
0xB5 = 1011 0101 = 181
0xDEAD = 1101 1110 1010 1101
Java: use L suffix for long literals: 0xFFFFFFFFL
C: always use 0x prefix — 0xFFFF is 65535.
INTEGER PROMOTION RULES (C)
Silent widening — the source of many bitmasking bugs
When you use a value smaller than int in a bitwise expression, C automatically promotes it to int before the operation. Bitwise NOT on a uint8_t produces a 32-bit result — not an 8-bit one.
uint8_t x = 0x01;
uint8_t result = ~x; // Danger: C promotes to int first!
// Step 1 — x promoted to int: 0x00000001
// Step 2 — ~(0x00000001) = 0xFFFFFFFE (32-bit result)
// Step 3 — truncated to u8 = 0xFE (OK by accident here)
// In a mask expression this silently bleeds through:
uint8_t mask = 0x0F;
if (~mask & 0xFF00FF00) { } // ~mask = 0xFFFFFFF0 — upper bits live!
// Safe fix: cast explicitly back
uint8_t safe = (uint8_t)(~x); // Forces back to 8 bits
Expression
What C actually does
Safe?
~(uint8_t)x
Promote to int, flip 32 bits → 0xFFFFFFFE
⚠️ Depends on use
(uint8_t)(~x)
Same, but truncates result → 0xFE
✅ Safe
1 << 31
Signed int shift — UB if overflow
❌ UB! Use 1U << 31
1ULL << 40
64-bit unsigned shift — safe
✅ Safe
ENDIANNESS — NETWORK CODE GOTCHA
Big-Endian vs Little-Endian
Little-endian (x86, ARM default): LSB stored at lowest address. Big-endian (network byte order): MSB stored at lowest address.
Network protocols always use big-endian. Masking a multi-byte field without calling ntohs()/ntohl() first will produce wrong results on x86.
Value 0x1234 in memory:
Little-endian (x86): [0x34][0x12] ← LSB at lower address
Big-endian (network): [0x12][0x34] ← MSB at lower address
Reading IPv4 total_length on x86 without conversion:
uint16_t len = *(uint16_t*)ptr; // gives 0x3412 instead of 0x1234!
Correct:
uint16_t len = ntohs(*(uint16_t*)ptr); // always right
// Rule: convert BEFORE masking, convert back AFTER buildinguint16_t raw = *(uint16_t*)ptr;
uint16_t host = ntohs(raw); // host byte orderuint16_t offset = host & 0x1FFF; // NOW safe to mask fragment offsetuint16_t df = host & 0x4000; // Don't Fragment bit// Writing back: convert to network order
pkt->frag_off = htons(host);
OVERFLOW AND WRAPAROUND
Unsigned Overflow — Well-Defined Wraparound
Unsigned integer overflow wraps modulo 2N. Exploited intentionally in checksums, ring buffers, and TCP sequence number arithmetic.
Signed Overflow — Undefined Behaviour (C)
Signed integer overflow is UB in C. The compiler may assume it never happens and optimise away your overflow guards. Always use unsigned types for bit manipulation.
uint8_t x = 255; x++; // wraps to 0 — defined for unsigned ✓
int8_t y = 127; y++; // UB in C — may silently corrupt logic ✗
// Safe patterns:
uint32_t a = UINT32_MAX;
uint32_t sum = a + 1; // wraps to 0 — defined ✓
// To detect: check BEFORE adding
if (a > UINT32_MAX - b) { /* overflow */ }
// Shift safety:
int x = 1 << 31; // UB on signed int!
uint32_t x = 1U << 31; // Fine — unsigned ✓
x = 0x43 = 0100 0011
mask = 0x20 = 0010 0000 ← (1 << 5)
─────────────
result = 0x63 = 0110 0011
↑ bit 5 is now 1, rest unchanged
XOR — Toggle bit 3
x = 1011 0101
mask = 0000 1000 ← (1 << 3)
─────────────
= 1011 1101 ← bit 3 flipped
Properties: x ^ x = 0 (self-cancel)
x ^ 0 = x (identity)
x ^ y ^ y = x (self-inverse)
NOT — Inverse mask
mask = (1 << 5) = 0000 0000 0010 0000
~mask = 1111 1111 1101 1111
x &= ~mask ← clears bit 5, rest unchanged
In C: ~0 on signed int = -1 (all bits set)
In Java: ~0 is also -1 (int always 32-bit)
SHIFTS — LOGICAL vs ARITHMETIC
Left Shift (<<): zeros fill from right — always same behavior
0b00000001 << 3 = 0b00001000 (1 × 8 = 8)
1 << n builds a mask with exactly bit n set.
Right Shift (>>): LOGICAL (unsigned) fills 0; ARITHMETIC (signed) fills sign bit
LOGICAL: 0b10110100 >> 2 → 0b00101101 (fills 0)
ARITHMETIC: 0b10110100 >> 2 → 0b11101101 (fills sign bit)
C: >> on unsigned = logical. >> on signed = implementation-defined (usually arithmetic)
Java: >> = arithmetic, >>> = logical (unsigned right shift)
⚠️ Java gotcha: There is no unsigned right shift in C — you must cast to unsigned first. In Java, always use >>> when you want logical (zero-fill) shift on potentially-negative values.
OPERATOR PRECEDENCE — THE HIDDEN BUG
⚠️ Critical gotcha: Bitwise operators have lower precedence than comparison operators (==, !=, <, >). The expression x & mask == 0 is parsed as x & (mask == 0) — almost certainly not what you want.
WRONG — == binds tighter than &:
if (x & mask == 0) { ... } // parsed as: x & (mask == 0) ← always 0
CORRECT — add parentheses:
if ((x & mask) == 0) { ... } // what you actually mean
Precedence (high → low):
~ NOT (unary — highest bitwise)
<< >> Shifts
& AND
^ XOR
| OR
== != < > Comparisons ← sit ABOVE &, ^, | ← GOTCHA!
&& || Logical AND/OR
= |= &= ^= Assignment (lowest)
Expression
Parsed as
Correct form
x & 0xFF == 0
x & (0xFF == 0) = x & 0 = always 0
(x & 0xFF) == 0
x | y > 0
x | (y > 0) — adds 0 or 1 to x
(x | y) > 0
a ^ b == c
a ^ (b == c) — XORs with boolean
(a ^ b) == c
flags & FLAG_A != 0
flags & (FLAG_A != 0) = flags & 1
(flags & FLAG_A) != 0
✅ Rule: Always wrap every bitwise sub-expression in its own parentheses when mixing with comparisons or logical operators.
THE 4 CORE MASK OPERATIONS
Operation
Formula
Effect
SET
x |= (1 << n)
Force bit n to 1
CLEAR
x &= ~(1 << n)
Force bit n to 0
TOGGLE
x ^= (1 << n)
Flip bit n
CHECK
(x >> n) & 1
Read bit n — returns 0 or 1
uint16_t x = 0b0000010011010010; // = 1234
x |= (1 << 7); // SET bit 7
x &= ~(1 << 6); // CLEAR bit 6
x ^= (1 << 4); // TOGGLE bit 4int b3 = (x >> 3) & 1; // CHECK bit 3 → 0 or 1
// Extract width bits starting at position start// Formula: (x >> start) & ((1 << width) - 1)int start = 13, width = 3;
uint16_t field = (x >> start) & ((1 << width) - 1);
// Extracts bits 13–15 as a standalone 0–7 value
⚠️ C gotcha:~mask on a 16-bit value may sign-extend to 32-bit. Cast explicitly: (uint16_t)(~mask) to be safe in mixed-width expressions.
BITMASK AS A SET — UNION, INTERSECTION, DIFFERENCE
A bitmask of N bits represents a subset of N elements
Bit i = 1 means element i is in the set. Bit i = 0 means it is absent. This abstraction drives subset-sum DP, scheduling, permutation states, and flag systems.
Set Operation
Formula
Meaning
Union
A | B
Elements in A or B (or both)
Intersection
A & B
Elements in both A and B
Difference A − B
A & ~B
Elements in A but not in B
Complement
~A & FULL
All elements NOT in A (where FULL = (1<<N)−1)
Is subset?
(A & B) == A
Every element of A is also in B
Is empty?
A == 0
Set has no elements
Full set (N bits)
(1 << N) - 1
All N elements present
Add element i
A | (1 << i)
Include element i in A
Remove element i
A & ~(1 << i)
Exclude element i from A
// 4 tasks {T0, T1, T2, T3} mapped to bits 0–3int A = 0b0101; // {T0, T2} selectedint B = 0b0110; // {T1, T2} selectedint u = A | B; // 0b0111 = {T0,T1,T2} — unionint i = A & B; // 0b0100 = {T2} — intersectionint d = A & ~B; // 0b0001 = {T0} — A minus Bint all = (1 << 4) - 1; // 0b1111 = {T0,T1,T2,T3} — full set// Iterate all non-empty subsets of set S (classic DP trick):for (int sub = S; sub > 0; sub = (sub - 1) & S) {
// process subset 'sub' — covers all 2^|S| non-empty subsets
}
📌 Subset enumeration:sub = (sub−1) & S iterates all non-empty subsets of S efficiently. This is the cornerstone of bitmask DP for problems like TSP, task scheduling, and subset-sum variants.
x = 1011 0100
x-1 = 1011 0011 (borrow cascades up)
x&(x-1)= 1011 0000 ← lowest set bit gone
// Brian Kernighan bit countintcountBits(int x) {
int c = 0;
while (x) { x &= (x-1); c++; }
return c; // O(set bits) not O(32)
}
🔵 Isolate Lowest Set Bit
x = 1011 0100
-x = 0100 1100 (two's complement: ~x+1)
x&(-x)= 0000 0100 ← only lowest bit remains
Why: -x flips all bits above the lowest 1,
leaves the 1 intact; AND zeros the rest.
int lowest = x & (-x);
🔶 Find MSB Position
// GCC built-in (fastest)int msb = 31 - __builtin_clz(x);
// clz = count leading zeros// Portable — bit-smearinguint32_tsmear(uint32_t x) {
x |= x >> 1; x |= x >> 2;
x |= x >> 4; x |= x >> 8;
x |= x >> 16;
return x - (x >> 1);
}
MORE CLASSIC ONE-LINERS
// XOR swap — no temp variable (AVOID in production)
a ^= b; b ^= a; a ^= b; // Fails if a and b are same memory location!// Branchless sign detection (signed 32-bit)int sign = x >> 31; // 0 if positive, -1 (0xFFFFFFFF) if negative// Branchless absolute valueint mask = x >> 31; // all-zeros or all-onesint abs = (x + mask) ^ mask; // For x=-5: (-5-1)^0xFFFFFFFF = 5// Rotate left k bits (32-bit)uint32_trotl(uint32_t x, int k) { return (x << k) | (x >> (32-k)); }
uint32_trotr(uint32_t x, int k) { return (x >> k) | (x << (32-k)); }
📌 Rotations wrap bits around instead of discarding them — unlike shifts. Essential in cryptography (AES, SHA) and networking checksums. The formula (x << k) | (x >> (32-k)) is what most compilers recognize and compile to a single ROL / ROR instruction.
🟣 Find Position of Lowest Set Bit
x = 1011 0100
x & (-x) = 0000 0100 ← isolates bit 2
__builtin_ctz(x) = count trailing zeros = bit position
ctz(0b10110100) = 2 ← position of lowest '1'
// GCC/Clang — compiles to single BSF/TZCNT on x86int pos = __builtin_ctz(x); // count trailing zeros (C)// Java equivalentint pos = Integer.numberOfTrailingZeros(x);
// Portable C (no builtins) — use isolate-then-popcount:int pos = __builtin_popcount((x & -x) - 1);
Next Power of Two
Goal: round x UP to nearest power of 2.
Key: smear all bits below MSB, then add 1.
x = 0b0101 1100 (92)
After smearing: 0b0111 1111 (127)
Add 1: 0b1000 0000 (128) ✓
uint32_tnextPow2(uint32_t x) {
if (x == 0) return1;
x--; // handle exact powers of 2
x |= x >> 1; x |= x >> 2;
x |= x >> 4; x |= x >> 8;
x |= x >> 16;
return x + 1;
}
// nextPow2(100)=128 nextPow2(128)=128 nextPow2(0)=1
UNIVERSAL BITFIELD FORMULA REFERENCE
EXTRACT a bitfield (read bits [start .. start+width-1]):
field = (value >> start) & ((1 << width) - 1)
INSERT a bitfield (write pattern into bits [start .. start+width-1]):
mask = ((1 << width) - 1) << start
value = (value & ~mask) | ((pattern & ((1 << width)-1)) << start)
CHECK if any bit in range is set:
(value & mask) != 0
COUNT set bits in range:
popcount((value >> start) & ((1 << width) - 1))
Each level swaps progressively smaller chunks: 16-bit halves → 8-bit bytes → 4-bit nibbles → 2-bit pairs → individual bits. Five passes, no loops. The masks are alternating patterns: 0xAAAAAAAA = 1010...1010, 0x55555555 = 0101...0101.
GRAY CODE
// Binary → Gray code (consecutive values differ by exactly 1 bit)inttoGray(int n) { return n ^ (n >> 1); }
// Gray code → BinaryintfromGray(int g) {
int n = 0;
for (; g > 0; g >>= 1) n ^= g;
return n;
}
Gray Code — consecutive values differ by exactly 1 bit:
Dec │ Binary │ Gray
────┼────────┼──────
0 │ 000 │ 000
1 │ 001 │ 001 ← 1 bit different
2 │ 010 │ 011 ← 1 bit different
3 │ 011 │ 010 ← 1 bit different
4 │ 100 │ 110 ← 1 bit different
5 │ 101 │ 111
6 │ 110 │ 101
7 │ 111 │ 100
⚡ Real-World Context: DPDK rte_mbuf.ol_flags
ol_flags is a 64-bit integer on every DPDK packet with 30+ defined flag bits controlling TX/RX offloads, tunneling, timestamps, and security metadata. Every packet in a SASE dataplane engine uses exactly this pattern.
// Define flag constants — each is a power-of-2 bit position#define PKT_RX_VLAN (1 << 0) // bit 0 — VLAN stripped#define PKT_RX_RSS_HASH (1 << 1) // bit 1 — RSS hash valid#define PKT_TX_OFFLOAD_IP (1 << 2) // bit 2 — IP checksum offload#define PKT_TX_TCP_SEG (1 << 3) // bit 3 — TCP segmentationuint64_t ol_flags = 0;
ol_flags |= PKT_RX_VLAN | PKT_RX_RSS_HASH; // Set multiple flagsif (ol_flags & PKT_TX_TCP_SEG) { /* TSO path */ } // Check one flag
ol_flags &= ~PKT_RX_VLAN; // Clear a flaguint64_t both = PKT_RX_VLAN | PKT_RX_RSS_HASH;
if ((ol_flags & both) == both) { /* both set */ } // Check if BOTH set
IPv4 HEADER BITFIELDS
IPv4 Flags + Fragment Offset (16-bit word at header offset 6):
Bit: 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
┌───┬───┬───┬────────────────────────────┐
│ 0 │DF │MF │ Fragment Offset (13 bits) │
└───┴───┴───┴────────────────────────────┘
Reserved More Frags
Don't Fragment
Extract Fragment Offset: offset = ntohs(ip->frag_off) & 0x1FFF
Check Don't Fragment bit: df = ntohs(ip->frag_off) & 0x4000
C BITFIELD STRUCTS vs EXPLICIT MASKS
// C bitfield struct — concise but NOT portable for network protocolsstruct ipv4_hdr_bits {
uint32_t ihl : 4; // bits 0-3uint32_t version : 4; // bits 4-7uint32_t ecn : 2; // bits 8-9uint32_t dscp : 6; // bits 10-15uint32_t total_len : 16; // bits 16-31
};
⚠️ Portability trap: Bit ordering within bytes in C structs is compiler and architecture-dependent. For network protocol parsing, always prefer explicit bit manipulation with masks and shifts over bitfield structs. The struct version above is readable for documentation but unsafe for cross-platform packet parsing.
#define TCP_FIN 0x01#define TCP_SYN 0x02#define TCP_RST 0x04#define TCP_PSH 0x08#define TCP_ACK 0x10#define TCP_URG 0x20#define TCP_ECE 0x40#define TCP_CWR 0x80uint8_t flags = tcp_hdr->th_flags;
if (flags & TCP_SYN) { // new connection }
if ((flags & (TCP_SYN | TCP_ACK)) == (TCP_SYN | TCP_ACK)) { // handshake }
if (flags & TCP_RST) { // reset — drop state }
if (flags & TCP_FIN) { // graceful close }
VLAN 802.1Q TAG — COMPLETE EXTRACTION
802.1Q VLAN Tag — 16 bits (follows EtherType 0x8100):
Bit: 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
┌──────────┬───┬──────────────────────────────────────┐
│ PCP (3) │DEI│ VID (12 bits) │
└──────────┴───┴──────────────────────────────────────┘
Priority Drop VLAN ID (0–4095)
Code Point Elig.
PCP = bits [15:13] — 3-bit 802.1p QoS priority (0–7, 7 = highest)
DEI = bit [12] — Drop Eligible Indicator
VID = bits [11:0] — VLAN ID (VID 0=untagged, 4095=reserved)
uint16_t tag = ntohs(*(uint16_t*)(pkt + 14)); // after 14-byte Ethernet hdruint8_t pcp = (tag >> 13) & 0x07; // bits 15–13: priority 0–7uint8_t dei = (tag >> 12) & 0x01; // bit 12: drop eligibleuint16_t vid = tag & 0x0FFF; // bits 11–0: VLAN ID 0–4095// Rebuild tag from fields:
tag = ((uint16_t)pcp << 13) | ((uint16_t)dei << 12) | (vid & 0x0FFF);
📌 VID 0 = priority-tagged (no VLAN). VID 4095 = reserved. Valid user VLANs: 1–4094. Always call ntohs() before extracting fields from a live packet.
PATTERN RECOGNITION CHEAT SHEET
When You See...
Think...
Key Formula
Find unique / missing element
XOR pair cancellation
result ^= each element
No extra space, O(n)
XOR trick
x ^ x = 0, x ^ 0 = x
Is power of two?
Clear lowest set bit
x > 0 && (x & (x-1)) == 0
Count set bits
Brian Kernighan
x &= x-1 in loop
Set/clear/toggle one bit
OR / AND~mask / XOR
1 << n as mask
Extract N bits from position P
Shift + mask
(x >> P) & ((1<<N)-1)
Insert pattern at position P
Clear-then-OR
(x & ~mask) | (pat << P)
Consecutive elements differ by 1 bit
Gray code
n ^ (n >> 1)
Hamming distance
popcount of XOR
__builtin_popcount(x^y)
CORE INTERVIEW SOLUTIONS
🧠 Find Single Number — LeetCode 136
Every element appears twice except one. XOR cancels pairs: x^x=0.
intsingleNumber(int[] nums) {
int result = 0;
for (int n : nums) result ^= n;
return result; // O(n) time · O(1) space
}
🧠 Find Missing Number
Array contains 0..n with one missing. XOR all indices with all values.
intmissingNumber(int[] nums) {
int xor = nums.length; // start with nfor (int i = 0; i < nums.length; i++)
xor ^= i ^ nums[i]; // paired indices and values cancelreturn xor;
}
🧠 Two Elements Appearing Once (others twice)
XOR all → get x^y. Use rightmost differing bit to partition array into two groups.
int[] twoSingleNumbers(int[] nums) {
int xorAll = 0;
for (int n : nums) xorAll ^= n; // xorAll = x ^ yint diffBit = xorAll & (-xorAll); // rightmost bit where x≠yint a = 0, b = 0;
for (int n : nums) {
if ((n & diffBit) != 0) a ^= n; // group with that bit setelse b ^= n; // group without
}
returnnewint[]{a, b};
}
🧠 Count Bits for 0..n — LeetCode 338
DP: dp[i] = dp[i >> 1] + (i & 1). Removes LSB and adds it back.
int[] countBits(int n) {
int[] dp = newint[n + 1];
for (int i = 1; i <= n; i++)
dp[i] = dp[i >> 1] + (i & 1); // O(n)return dp;
}
MORE INTERVIEW PATTERNS
🧠 Single Number II — LeetCode 137 (each element appears 3 times)
XOR cancels pairs but not triplets. Use two variables (ones, twos) to track per-bit modulo-3 count. The single number accumulates in ones.
intsingleNumberII(int[] nums) {
int ones = 0, twos = 0;
for (int n : nums) {
ones = (ones ^ n) & ~twos; // add to ‘seen once’ if not in ‘seen twice’
twos = (twos ^ n) & ~ones; // add to ‘seen twice’ if not in ‘seen once’
}
return ones; // bits that appeared exactly 1 time (mod 3)
}
Number of positions where two values differ = popcount of their XOR.
// Single pair — O(1)inthammingDistance(int x, int y) {
returnInteger.bitCount(x ^ y); // Java; C: __builtin_popcount(x^y)
}
// Total Hamming distance across all pairs in array — O(32n)inttotalHammingDistance(int[] nums) {
int total = 0;
for (int bit = 0; bit < 32; bit++) {
int ones = 0;
for (int n : nums) ones += (n >> bit) & 1;
total += ones * (nums.length - ones); // pairs that differ at this bit
}
return total;
}
🧠 Bitwise AND of Range [m, n] — LeetCode 201
AND of all numbers in [m, n]. Any bit that flips across the range becomes 0. Only the common prefix of m and n survives.
intrangeBitwiseAnd(int m, int n) {
int shift = 0;
while (m != n) { // right-shift until equal = find common prefix
m >>= 1;
n >>= 1;
shift++;
}
return m << shift; // restore prefix to original bit position
}
// Example: [5,7] = 101 & 110 & 111 = 100 → common prefix = 1, shift=2 → 4
OPERATOR SUMMARY
Operator
Symbol
Effect
Example
AND
&
1 only if both 1 — masks/filters
0b1100 & 0b1010 = 0b1000
OR
|
1 if at least one 1 — sets bits
0b1100 | 0b0011 = 0b1111
XOR
^
1 if bits differ — toggles/compares
0b1100 ^ 0b1010 = 0b0110
NOT
~
Flips all bits
~0b00001111 = 0b11110000
Left Shift
<<
Shift left, fill 0 from right
0b0001 << 3 = 0b1000
Right Shift
>>
Shift right (logical/arithmetic)
0b1000 >> 2 = 0b0010
CORE PATTERNS CHEATSHEET
Goal
Code
Set bit n
x |= (1 << n)
Clear bit n
x &= ~(1 << n)
Toggle bit n
x ^= (1 << n)
Check bit n
(x >> n) & 1
Is power of two?
x > 0 && (x & (x-1)) == 0
Clear lowest set bit
x & (x - 1)
Isolate lowest set bit
x & (-x)
Count set bits
while(x) { x &= x-1; count++; }
Extract field [p, p+w)
(x >> p) & ((1<<w)-1)
Insert pattern at p
(x & ~mask) | (pat << p)
Rotate left k
(x << k) | (x >> (32-k))
XOR unique element
result=0; for n: result ^= n
COMMON MASKS
Mask
Hex
Purpose
Lower 4 bits (nibble)
0x0F
Extract low nibble
Upper 4 bits
0xF0
Extract high nibble
Lower byte
0x00FF
Byte 0
Upper byte (16-bit)
0xFF00
Byte 1
Lower 16 bits
0x0000FFFF
Low word
Upper 16 bits
0xFFFF0000
High word
All ones (32-bit)
0xFFFFFFFF
Full mask / -1 (signed)
Alternating 01...
0x55555555
Even bits
Alternating 10...
0xAAAAAAAA
Odd bits
MASTERY CHECKLIST
Can convert between binary, hex, and decimal in both directions
Can represent negative numbers using two's complement (flip + add 1)
Can apply all 4 core mask operations: SET, CLEAR, TOGGLE, CHECK
Can build a multi-bit mask for any width/position using ((1<<w)-1)<<p
Can insert a bit pattern using 3-step clear-then-OR
Can extract a bitfield using shift-then-mask
Can explain why x & (x-1) detects powers of two
Can explain why x & (-x) isolates the lowest set bit
Can solve Single Number (LeetCode 136) using XOR in one pass
Can solve Count Bits (LeetCode 338) using DP in O(n)
Can write rotate-left / rotate-right for 32-bit unsigned
Can explain the difference between arithmetic and logical right shift and when each applies
SHIFT-BASED MULTIPLY AND DIVIDE
Shifts are cheaper than multiply/divide on many architectures
Left shift by n = multiply by 2n. Right shift by n = divide by 2n (integer, rounds toward zero for unsigned). Compilers do this automatically, but recognising the pattern matters for bitfield arithmetic.
// Multiply by powers of 2int x8 = n << 3; // n * 8int x12 = (n << 3) + (n << 2); // n*8 + n*4 = n*12// Divide by powers of 2 (unsigned — logical shift)uint32_t d4 = x >> 2; // x / 4 (exact if divisible, floors otherwise)// Signed division: arithmetic right shift rounds toward -inf, not zero!// For correct signed floor-divide by power of 2 use:int div4 = (x + 3) >> 2; // round toward zero for positive x
MEMORY ALIGNMENT — THE MOST IMPORTANT TRICK IN SYSTEMS
Why alignment matters
Cache lines (64 bytes), DPDK buffers (128 bytes), DMA descriptors, SIMD vectors — all require naturally aligned addresses. Mis-aligned access causes bus errors on strict architectures and performance penalties everywhere else.
A pointer p is aligned to A bytes when: (p & (A-1)) == 0
(Works only when A is a power of 2 — then A-1 is an all-ones mask)
Round DOWN to alignment A: aligned = ptr & ~(A-1)
Round UP to alignment A: aligned = (ptr + A - 1) & ~(A-1)
Example: A=64, ptr=0x1003
Round down: 0x1003 & ~63 = 0x1003 & 0xFFFFFFC0 = 0x10C0
Round up: (0x1003 + 63) & ~63 = 0x1042 & 0xFFFFFFC0 = 0x1040
// Check if ptr is aligned to A (A must be power of 2)bool isAligned(uintptr_t ptr, size_t A) {
return (ptr & (A - 1)) == 0;
}
// Round address up to next multiple of Auintptr_t alignUp(uintptr_t ptr, size_t A) {
return (ptr + A - 1) & ~(A - 1);
}
// Round address downuintptr_t alignDown(uintptr_t ptr, size_t A) {
return ptr & ~(A - 1);
}
// Verify at runtime:
assert(((uintptr_t)buf & 63) == 0); // must be 64-byte aligned
MODULO BY POWER OF TWO
x % (2^n) == x & (2^n - 1) — when x is unsigned
This is why ring buffers, hash tables, and DPDK use power-of-2 sizes: the wrapping step becomes a single AND instead of a costly division.
// Expensive: requires integer divide instructionint idx = pos % capacity; // UB if capacity is 0, slow in any case// Cheap: single AND — valid ONLY when capacity is a power of 2int idx = pos & (capacity - 1);
// Ring buffer wrap (DPDK-style):uint32_t mask = size - 1; // pre-computed once at init
head = (head + 1) & mask; // wrap head cheaply
tail = (tail + 1) & mask; // wrap tail cheaply// HASH bucket index (power-of-2 table size):uint32_t bucket = hash(key) & (TABLE_SIZE - 1);
⚠️ x & (n-1) gives wrong results if n is not a power of two. Always validate with assert((n & (n-1)) == 0) at initialisation.
POPULATION COUNT (HAMMING WEIGHT)
Count the number of 1-bits in an integer
Used in checksums, error correction, set cardinality checks, and cryptography. Hardware instruction on modern CPUs; software fallback uses the divide-and-conquer approach.
// Fastest — single hardware instruction on x86 (SSE4.2+)int cnt = __builtin_popcount(x); // C/GCC (32-bit)int cnt = __builtin_popcountll(x); // C/GCC (64-bit)int cnt = Integer.bitCount(x); // Java// Brian Kernighan — O(set bits), no hardware neededintpopcount(uint32_t x) {
int c = 0;
while (x) { x &= x - 1; c++; } // clear lowest set bit each iterationreturn c;
}
// Parallel popcount — O(log bits), no loopuint32_tpopcount32(uint32_t x) {
x = x - ((x >> 1) & 0x55555555); // count pairs
x = (x & 0x33333333) + ((x >> 2) & 0x33333333); // count nibbles
x = (x + (x >> 4)) & 0x0F0F0F0F; // count bytesreturn (x * 0x01010101) >> 24; // sum all byte counts
}
BITMASK DP — SUBSETS AND STATE COMPRESSION
When to use Bitmask DP
Bitmask DP represents state as a bitmask where each bit = one element in-or-out. Applicable when N ≤ 20 (so 2N states fit in memory). Common problems: TSP, assignment, task scheduling, game states.
Pattern:
dp[mask] = optimal cost/result considering exactly the elements in mask
Transitions:
For each mask, try adding element i (not yet in mask):
newmask = mask | (1 << i)
dp[newmask] = min/max(dp[newmask], dp[mask] + cost(i))
Base case: dp[0] = 0
Answer: dp[(1<<N)-1] (full set)
// Minimum cost to visit all N nodes (TSP-style DP)// dp[mask][i] = min cost to visit nodes in mask, ending at node iint[][] dp = newint[1 << N][N];
// Fill with INF:for (int[] row : dp) Arrays.fill(row, Integer.MAX_VALUE / 2);
dp[1][0] = 0; // start at node 0, mask = 0b0001for (int mask = 1; mask < (1 << N); mask++) {
for (int u = 0; u < N; u++) {
if ((mask & (1 << u)) == 0) continue; // u not in maskfor (int v = 0; v < N; v++) {
if ((mask & (1 << v)) != 0) continue; // v already visitedint newmask = mask | (1 << v);
dp[newmask][v] = Math.min(dp[newmask][v], dp[mask][u] + dist[u][v]);
}
}
}
// Enumerate subsets of mask in O(3^N) total:for (int sub = mask; sub > 0; sub = (sub - 1) & mask) { /*..*/ }
SWAR — SIMD WITHIN A REGISTER
Process multiple values simultaneously with integer arithmetic
SWAR packs multiple sub-word values into a single 64-bit register and uses carefully chosen masks to prevent carry-propagation between adjacent logical fields. Enables 8 parallel byte operations with a single integer add.
Example: add 1 to each of 8 bytes packed in a uint64_t
64-bit word: [B7][B6][B5][B4][B3][B2][B1][B0] (8 bytes)
Naive: requires 8 separate additions
SWAR: use overflow-guard mask to prevent carry from byte N into byte N+1
Mask 0x7F7F7F7F7F7F7F7F:
Clear the MSB of every byte (guard bit)
Allows carry within each byte but blocks inter-byte carry propagation
uint64_tswar_add1_to_each_byte(uint64_t v) {
// Add 1 to each byte field without inter-byte carryuint64_t lo = v & 0x7F7F7F7F7F7F7F7FULL; // clear MSB of each byte
lo = lo + 0x0101010101010101ULL; // add 1 to each byte (no overflow into guard)uint64_t hi = (v ^ lo) & 0x8080808080808080ULL; // recover original MSBsreturn lo | hi;
}
// Population count via SWAR (classic Hacker's Delight)uint64_tpopcount64_swar(uint64_t x) {
x = x - ((x >> 1) & 0x5555555555555555ULL);
x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
return (x * 0x0101010101010101ULL) >> 56;
}
📌 SWAR is used in high-performance parsers, DPDK, network classification engines, and wherever vectorized operations are needed without SIMD intrinsics. The key discipline: choose masks so the "guard bit" pattern prevents fields from bleeding into each other.