Bit Manipulation — First-Principles Deep Dive

DSA MASTERY · CH 11 · BIT MANIPULATION

⚙️ Bit Manipulation & Masking

Binary representation · 6 operators · masking patterns · classic tricks · interview problems · systems flags

Binary / Two's Complement AND · OR · XOR · NOT · Shifts Masking · Bitfields FAANG Patterns C / Java / C++

WHAT IS A BIT?

Memory Layout — Value 45 in 8 bits

Each bit position n carries weight 2ⁿ. Bit 0 = LSB (Least Significant Bit). Bit 7 = MSB (Most Significant Bit).

Bit position: 7 6 5 4 3 2 1 0 ┌────┬────┬────┬────┬────┬────┬────┬────┐ Value = 45: │ 0 │ 0 │ 1 │ 0 │ 1 │ 1 │ 0 │ 1 │ └────┴────┴────┴────┴────┴────┴────┴────┘ Bit weight: 128 64 32 16 8 4 2 1 Value = 0×128 + 0×64 + 1×32 + 0×16 + 1×8 + 1×4 + 0×2 + 1×1 = 45

DATA SIZES

C Type	Java Type	Size	Bit Width	Unsigned Range
`uint8_t`	`byte`	1 byte	8 bits	0 – 255
`uint16_t`	`short`	2 bytes	16 bits	0 – 65,535
`uint32_t`	`int`	4 bytes	32 bits	0 – 4,294,967,295
`uint64_t`	`long`	8 bytes	64 bits	0 – 18,446,744,073,709,551,615

TWO'S COMPLEMENT — SIGNED INTEGERS

Why Two's Complement?

Negate a number by flipping all bits then adding 1. This means addition hardware works identically for positive and negative numbers — no special case needed. The MSB is the sign bit: 0 = positive, 1 = negative. Range for N-bit signed: −2^N−1 to 2^N−1−1.

Represent -45 in 8-bit two's complement: Step 1 — Start with +45: 0 0 1 0 1 1 0 1 Step 2 — Flip all bits: 1 1 0 1 0 0 1 0 Step 3 — Add 1: 1 1 0 1 0 0 1 1 ← -45 = 0xD3 Verify: 128+64+16+2+1 = 211. 211 - 256 = -45 ✓

HEX ↔ BINARY QUICK REFERENCE

Hex │ Binary Hex │ Binary ────┼──────── ────┼──────── 0 │ 0000 8 │ 1000 1 │ 0001 9 │ 1001 2 │ 0010 A │ 1010 3 │ 0011 B │ 1011 4 │ 0100 C │ 1100 5 │ 0101 D │ 1101 6 │ 0110 E │ 1110 7 │ 0111 F │ 1111

Conversion Examples

0xB5 = 1011 0101 = 181

0xDEAD = 1101 1110 1010 1101

Java: use L suffix for long literals: 0xFFFFFFFFL

C: always use 0x prefix — 0xFFFF is 65535.

INTEGER PROMOTION RULES (C)

Silent widening — the source of many bitmasking bugs

When you use a value smaller than int in a bitwise expression, C automatically promotes it to int before the operation. Bitwise NOT on a uint8_t produces a 32-bit result — not an 8-bit one.

uint8_t x = 0x01; uint8_t result = ~x; // Danger: C promotes to int first! // Step 1 — x promoted to int: 0x00000001 // Step 2 — ~(0x00000001) = 0xFFFFFFFE (32-bit result) // Step 3 — truncated to u8 = 0xFE (OK by accident here) // In a mask expression this silently bleeds through: uint8_t mask = 0x0F; if (~mask & 0xFF00FF00) { } // ~mask = 0xFFFFFFF0 — upper bits live! // Safe fix: cast explicitly back uint8_t safe = (uint8_t)(~x); // Forces back to 8 bits

Expression	What C actually does	Safe?
`~(uint8_t)x`	Promote to int, flip 32 bits → 0xFFFFFFFE	⚠️ Depends on use
`(uint8_t)(~x)`	Same, but truncates result → 0xFE	✅ Safe
`1 << 31`	Signed int shift — UB if overflow	❌ UB! Use `1U << 31`
`1ULL << 40`	64-bit unsigned shift — safe	✅ Safe

ENDIANNESS — NETWORK CODE GOTCHA

Big-Endian vs Little-Endian

Little-endian (x86, ARM default): LSB stored at lowest address.
Big-endian (network byte order): MSB stored at lowest address.

Network protocols always use big-endian. Masking a multi-byte field without calling ntohs()/ntohl() first will produce wrong results on x86.

Value 0x1234 in memory: Little-endian (x86): [0x34][0x12] ← LSB at lower address Big-endian (network): [0x12][0x34] ← MSB at lower address Reading IPv4 total_length on x86 without conversion: uint16_t len = *(uint16_t*)ptr; // gives 0x3412 instead of 0x1234! Correct: uint16_t len = ntohs(*(uint16_t*)ptr); // always right

// Rule: convert BEFORE masking, convert back AFTER building uint16_t raw = *(uint16_t*)ptr; uint16_t host = ntohs(raw); // host byte order uint16_t offset = host & 0x1FFF; // NOW safe to mask fragment offset uint16_t df = host & 0x4000; // Don't Fragment bit // Writing back: convert to network order pkt->frag_off = htons(host);

OVERFLOW AND WRAPAROUND

Unsigned Overflow — Well-Defined Wraparound

Unsigned integer overflow wraps modulo 2^N. Exploited intentionally in checksums, ring buffers, and TCP sequence number arithmetic.

Signed Overflow — Undefined Behaviour (C)

Signed integer overflow is UB in C. The compiler may assume it never happens and optimise away your overflow guards. Always use unsigned types for bit manipulation.

uint8_t x = 255; x++; // wraps to 0 — defined for unsigned ✓ int8_t y = 127; y++; // UB in C — may silently corrupt logic ✗ // Safe patterns: uint32_t a = UINT32_MAX; uint32_t sum = a + 1; // wraps to 0 — defined ✓ // To detect: check BEFORE adding if (a > UINT32_MAX - b) { /* overflow */ } // Shift safety: int x = 1 << 31; // UB on signed int! uint32_t x = 1U << 31; // Fine — unsigned ✓

THE 6 BITWISE OPERATORS

Operator	Symbol	Rule	Key Use
AND	`&`	1 only when BOTH bits are 1	Filter/mask — forces bits to 0
OR	`\|`	1 when AT LEAST ONE bit is 1	Set (force to 1) specific bits
XOR	`^`	1 when bits are DIFFERENT	Toggle bits; self-cancellation tricks
NOT	`~`	Flips ALL bits (unary)	Create inverse masks to CLEAR bits
Left Shift	`<<`	Move bits toward MSB; fill 0s from right	Multiply by 2ⁿ; build masks with `1 << n`
Right Shift	`>>`	Move bits toward LSB	Divide by 2ⁿ; extract upper fields

AND — Extract lower nibble

x = 0xB7 = 1011 0111 mask = 0x0F = 0000 1111 ───────────── result = 0x07 = 0000 0111 ↑ lower 4 bits preserved, upper 4 zeroed

OR — Set bit 5

x = 0x43 = 0100 0011 mask = 0x20 = 0010 0000 ← (1 << 5) ───────────── result = 0x63 = 0110 0011 ↑ bit 5 is now 1, rest unchanged

XOR — Toggle bit 3

x = 1011 0101 mask = 0000 1000 ← (1 << 3) ───────────── = 1011 1101 ← bit 3 flipped Properties: x ^ x = 0 (self-cancel) x ^ 0 = x (identity) x ^ y ^ y = x (self-inverse)

NOT — Inverse mask

mask = (1 << 5) = 0000 0000 0010 0000 ~mask = 1111 1111 1101 1111 x &= ~mask ← clears bit 5, rest unchanged In C: ~0 on signed int = -1 (all bits set) In Java: ~0 is also -1 (int always 32-bit)

SHIFTS — LOGICAL vs ARITHMETIC

Left Shift (<<): zeros fill from right — always same behavior 0b00000001 << 3 = 0b00001000 (1 × 8 = 8) 1 << n builds a mask with exactly bit n set. Right Shift (>>): LOGICAL (unsigned) fills 0; ARITHMETIC (signed) fills sign bit LOGICAL: 0b10110100 >> 2 → 0b00101101 (fills 0) ARITHMETIC: 0b10110100 >> 2 → 0b11101101 (fills sign bit) C: >> on unsigned = logical. >> on signed = implementation-defined (usually arithmetic) Java: >> = arithmetic, >>> = logical (unsigned right shift)

⚠️ Java gotcha: There is no unsigned right shift in C — you must cast to unsigned first. In Java, always use >>> when you want logical (zero-fill) shift on potentially-negative values.

OPERATOR PRECEDENCE — THE HIDDEN BUG

⚠️ Critical gotcha: Bitwise operators have lower precedence than comparison operators (==, !=, <, >). The expression x & mask == 0 is parsed as x & (mask == 0) — almost certainly not what you want.

WRONG — == binds tighter than &: if (x & mask == 0) { ... } // parsed as: x & (mask == 0) ← always 0 CORRECT — add parentheses: if ((x & mask) == 0) { ... } // what you actually mean Precedence (high → low): ~ NOT (unary — highest bitwise) << >> Shifts & AND ^ XOR | OR == != < > Comparisons ← sit ABOVE &, ^, | ← GOTCHA! && || Logical AND/OR = |= &= ^= Assignment (lowest)

Expression	Parsed as	Correct form
`x & 0xFF == 0`	`x & (0xFF == 0)` = `x & 0` = always 0	`(x & 0xFF) == 0`
`x \| y > 0`	`x \| (y > 0)` — adds 0 or 1 to x	`(x \| y) > 0`
`a ^ b == c`	`a ^ (b == c)` — XORs with boolean	`(a ^ b) == c`
`flags & FLAG_A != 0`	`flags & (FLAG_A != 0)` = `flags & 1`	`(flags & FLAG_A) != 0`

✅ Rule: Always wrap every bitwise sub-expression in its own parentheses when mixing with comparisons or logical operators.

THE 4 CORE MASK OPERATIONS

Operation	Formula	Effect
SET	`x \|= (1 << n)`	Force bit n to 1
CLEAR	`x &= ~(1 << n)`	Force bit n to 0
TOGGLE	`x ^= (1 << n)`	Flip bit n
CHECK	`(x >> n) & 1`	Read bit n — returns 0 or 1

uint16_t x = 0b0000010011010010; // = 1234 x |= (1 << 7); // SET bit 7 x &= ~(1 << 6); // CLEAR bit 6 x ^= (1 << 4); // TOGGLE bit 4 int b3 = (x >> 3) & 1; // CHECK bit 3 → 0 or 1

MULTI-BIT MASK CONSTRUCTION

Formula: mask for width bits starting at pos

mask = ((1 << width) - 1) << pos

Goal: Build mask for bits 13, 14, 15 (3 bits wide, starting at bit 13) Step 1 — (1 << 3) = 0b1000 Step 2 — subtract 1 = 0b0111 ← 3 consecutive ones Step 3 — shift to pos 13: Bit: 15 14 13 12 11 ... 0 mask: 1 1 1 0 0 ... 0 = 0xE000 mask = ((1 << 3) - 1) << 13 = 0xE000

PATTERN INSERTION — READ/WRITE A BITFIELD

3-Step: Clear target bits → OR in new pattern

Goal: Replace bits 15–13 with pattern 0b110

x = 1234 = 0000 0100 1101 0010 (bits 15,14,13 = 000) STEP 1 — Build mask: mask = ((1<<3)-1) << 13 = 0xE000 STEP 2 — Clear target: x & ~mask = 0000 0100 1101 0010 STEP 3 — OR in pattern: pattern = 0b110 << 13 = 0xC000 result = 1100 0100 1101 0010 bits 15,14,13 = 1,1,0 ✓

uint16_t x = 1234; uint16_t mask = ((1 << 3) - 1) << 13; // 0xE000 — covers bits 13,14,15 uint16_t pattern = 0b110 << 13; // 0xC000 x = (x & ~mask) | pattern; // Result = 0xC4D2

EXTRACT A BITFIELD

// Extract width bits starting at position start // Formula: (x >> start) & ((1 << width) - 1) int start = 13, width = 3; uint16_t field = (x >> start) & ((1 << width) - 1); // Extracts bits 13–15 as a standalone 0–7 value

⚠️ C gotcha: ~mask on a 16-bit value may sign-extend to 32-bit. Cast explicitly: (uint16_t)(~mask) to be safe in mixed-width expressions.

BITMASK AS A SET — UNION, INTERSECTION, DIFFERENCE

A bitmask of N bits represents a subset of N elements

Bit i = 1 means element i is in the set. Bit i = 0 means it is absent. This abstraction drives subset-sum DP, scheduling, permutation states, and flag systems.

Set Operation	Formula	Meaning
Union	`A \| B`	Elements in A or B (or both)
Intersection	`A & B`	Elements in both A and B
Difference A − B	`A & ~B`	Elements in A but not in B
Complement	`~A & FULL`	All elements NOT in A (where FULL = (1<<N)−1)
Is subset?	`(A & B) == A`	Every element of A is also in B
Is empty?	`A == 0`	Set has no elements
Full set (N bits)	`(1 << N) - 1`	All N elements present
Add element i	`A \| (1 << i)`	Include element i in A
Remove element i	`A & ~(1 << i)`	Exclude element i from A

// 4 tasks {T0, T1, T2, T3} mapped to bits 0–3 int A = 0b0101; // {T0, T2} selected int B = 0b0110; // {T1, T2} selected int u = A | B; // 0b0111 = {T0,T1,T2} — union int i = A & B; // 0b0100 = {T2} — intersection int d = A & ~B; // 0b0001 = {T0} — A minus B int all = (1 << 4) - 1; // 0b1111 = {T0,T1,T2,T3} — full set // Iterate all non-empty subsets of set S (classic DP trick): for (int sub = S; sub > 0; sub = (sub - 1) & S) { // process subset 'sub' — covers all 2^|S| non-empty subsets }

📌 Subset enumeration: sub = (sub−1) & S iterates all non-empty subsets of S efficiently. This is the cornerstone of bitmask DP for problems like TSP, task scheduling, and subset-sum variants.

THE CANONICAL BIT TRICKS

🟢 Is Power of Two?

x = 16 = 0001 0000 x-1 = 15 = 0000 1111 x&(x-1) = 0000 0000 ← zero! Non-power (20): 0001 0100 19: 0001 0011 AND: 0001 0000 ← non-zero

bool isPow2(int x) { return x > 0 && (x & (x-1)) == 0; }

🔴 Clear Lowest Set Bit

x = 1011 0100 x-1 = 1011 0011 (borrow cascades up) x&(x-1)= 1011 0000 ← lowest set bit gone

// Brian Kernighan bit count int countBits(int x) { int c = 0; while (x) { x &= (x-1); c++; } return c; // O(set bits) not O(32) }

🔵 Isolate Lowest Set Bit

x = 1011 0100 -x = 0100 1100 (two's complement: ~x+1) x&(-x)= 0000 0100 ← only lowest bit remains Why: -x flips all bits above the lowest 1, leaves the 1 intact; AND zeros the rest.

int lowest = x & (-x);

🔶 Find MSB Position

// GCC built-in (fastest) int msb = 31 - __builtin_clz(x); // clz = count leading zeros // Portable — bit-smearing uint32_t smear(uint32_t x) { x |= x >> 1; x |= x >> 2; x |= x >> 4; x |= x >> 8; x |= x >> 16; return x - (x >> 1); }

MORE CLASSIC ONE-LINERS

// XOR swap — no temp variable (AVOID in production) a ^= b; b ^= a; a ^= b; // Fails if a and b are same memory location! // Branchless sign detection (signed 32-bit) int sign = x >> 31; // 0 if positive, -1 (0xFFFFFFFF) if negative // Branchless absolute value int mask = x >> 31; // all-zeros or all-ones int abs = (x + mask) ^ mask; // For x=-5: (-5-1)^0xFFFFFFFF = 5 // Rotate left k bits (32-bit) uint32_t rotl(uint32_t x, int k) { return (x << k) | (x >> (32-k)); } uint32_t rotr(uint32_t x, int k) { return (x >> k) | (x << (32-k)); }

📌 Rotations wrap bits around instead of discarding them — unlike shifts. Essential in cryptography (AES, SHA) and networking checksums. The formula (x << k) | (x >> (32-k)) is what most compilers recognize and compile to a single ROL / ROR instruction.

🟣 Find Position of Lowest Set Bit

x = 1011 0100 x & (-x) = 0000 0100 ← isolates bit 2 __builtin_ctz(x) = count trailing zeros = bit position ctz(0b10110100) = 2 ← position of lowest '1'

// GCC/Clang — compiles to single BSF/TZCNT on x86 int pos = __builtin_ctz(x); // count trailing zeros (C) // Java equivalent int pos = Integer.numberOfTrailingZeros(x); // Portable C (no builtins) — use isolate-then-popcount: int pos = __builtin_popcount((x & -x) - 1);

🯤 Next Power of Two

Goal: round x UP to nearest power of 2. Key: smear all bits below MSB, then add 1. x = 0b0101 1100 (92) After smearing: 0b0111 1111 (127) Add 1: 0b1000 0000 (128) ✓

uint32_t nextPow2(uint32_t x) { if (x == 0) return 1; x--; // handle exact powers of 2 x |= x >> 1; x |= x >> 2; x |= x >> 4; x |= x >> 8; x |= x >> 16; return x + 1; } // nextPow2(100)=128 nextPow2(128)=128 nextPow2(0)=1

UNIVERSAL BITFIELD FORMULA REFERENCE

EXTRACT a bitfield (read bits [start .. start+width-1]): field = (value >> start) & ((1 << width) - 1) INSERT a bitfield (write pattern into bits [start .. start+width-1]): mask = ((1 << width) - 1) << start value = (value & ~mask) | ((pattern & ((1 << width)-1)) << start) CHECK if any bit in range is set: (value & mask) != 0 COUNT set bits in range: popcount((value >> start) & ((1 << width) - 1))

REVERSE BITS — DIVIDE AND CONQUER

uint32_t reverseBits(uint32_t n) { n = ((n & 0xFFFF0000) >> 16) | ((n & 0x0000FFFF) << 16); // swap halves n = ((n & 0xFF00FF00) >> 8) | ((n & 0x00FF00FF) << 8); // swap bytes n = ((n & 0xF0F0F0F0) >> 4) | ((n & 0x0F0F0F0F) << 4); // swap nibbles n = ((n & 0xCCCCCCCC) >> 2) | ((n & 0x33333333) << 2); // swap pairs n = ((n & 0xAAAAAAAA) >> 1) | ((n & <span class="x55555555</span>) << 1); // swap bits return n; }

Strategy: Divide-and-Conquer Swap

Each level swaps progressively smaller chunks: 16-bit halves → 8-bit bytes → 4-bit nibbles → 2-bit pairs → individual bits. Five passes, no loops. The masks are alternating patterns: 0xAAAAAAAA = 1010...1010, 0x55555555 = 0101...0101.

GRAY CODE

// Binary → Gray code (consecutive values differ by exactly 1 bit) int toGray(int n) { return n ^ (n >> 1); } // Gray code → Binary int fromGray(int g) { int n = 0; for (; g > 0; g >>= 1) n ^= g; return n; }

Gray Code — consecutive values differ by exactly 1 bit: Dec │ Binary │ Gray ────┼────────┼────── 0 │ 000 │ 000 1 │ 001 │ 001 ← 1 bit different 2 │ 010 │ 011 ← 1 bit different 3 │ 011 │ 010 ← 1 bit different 4 │ 100 │ 110 ← 1 bit different 5 │ 101 │ 111 6 │ 110 │ 101 7 │ 111 │ 100

⚡ Real-World Context: DPDK rte_mbuf.ol_flags

ol_flags is a 64-bit integer on every DPDK packet with 30+ defined flag bits controlling TX/RX offloads, tunneling, timestamps, and security metadata. Every packet in a SASE dataplane engine uses exactly this pattern.

// Define flag constants — each is a power-of-2 bit position #define PKT_RX_VLAN (1 << 0) // bit 0 — VLAN stripped #define PKT_RX_RSS_HASH (1 << 1) // bit 1 — RSS hash valid #define PKT_TX_OFFLOAD_IP (1 << 2) // bit 2 — IP checksum offload #define PKT_TX_TCP_SEG (1 << 3) // bit 3 — TCP segmentation uint64_t ol_flags = 0; ol_flags |= PKT_RX_VLAN | PKT_RX_RSS_HASH; // Set multiple flags if (ol_flags & PKT_TX_TCP_SEG) { /* TSO path */ } // Check one flag ol_flags &= ~PKT_RX_VLAN; // Clear a flag uint64_t both = PKT_RX_VLAN | PKT_RX_RSS_HASH; if ((ol_flags & both) == both) { /* both set */ } // Check if BOTH set

IPv4 HEADER BITFIELDS

IPv4 Flags + Fragment Offset (16-bit word at header offset 6): Bit: 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 ┌───┬───┬───┬────────────────────────────┐ │ 0 │DF │MF │ Fragment Offset (13 bits) │ └───┴───┴───┴────────────────────────────┘ Reserved More Frags Don't Fragment Extract Fragment Offset: offset = ntohs(ip->frag_off) & 0x1FFF Check Don't Fragment bit: df = ntohs(ip->frag_off) & 0x4000

C BITFIELD STRUCTS vs EXPLICIT MASKS

// C bitfield struct — concise but NOT portable for network protocols struct ipv4_hdr_bits { uint32_t ihl : 4; // bits 0-3 uint32_t version : 4; // bits 4-7 uint32_t ecn : 2; // bits 8-9 uint32_t dscp : 6; // bits 10-15 uint32_t total_len : 16; // bits 16-31 };

⚠️ Portability trap: Bit ordering within bytes in C structs is compiler and architecture-dependent. For network protocol parsing, always prefer explicit bit manipulation with masks and shifts over bitfield structs. The struct version above is readable for documentation but unsafe for cross-platform packet parsing.

TCP FLAGS FIELD

TCP Flags — 9 bits at TCP header offset byte 13: Bit: 8 7 6 5 4 3 2 1 0 ┌────┬────┬────┬────┬────┬────┬────┬────┬────┐ │ NS │CWR │ECE │URG │ACK │PSH │RST │SYN │FIN │ └────┴────┴────┴────┴────┴────┴────┴────┴────┘ Common patterns: SYN = 0x002 ← connection initiation SYN+ACK = 0x012 ← server handshake reply ACK = 0x010 ← data acknowledgement FIN+ACK = 0x011 ← graceful connection close RST = 0x004 ← abrupt reset / port closed

#define TCP_FIN 0x01 #define TCP_SYN 0x02 #define TCP_RST 0x04 #define TCP_PSH 0x08 #define TCP_ACK 0x10 #define TCP_URG 0x20 #define TCP_ECE 0x40 #define TCP_CWR 0x80 uint8_t flags = tcp_hdr->th_flags; if (flags & TCP_SYN) { // new connection } if ((flags & (TCP_SYN | TCP_ACK)) == (TCP_SYN | TCP_ACK)) { // handshake } if (flags & TCP_RST) { // reset — drop state } if (flags & TCP_FIN) { // graceful close }

VLAN 802.1Q TAG — COMPLETE EXTRACTION

802.1Q VLAN Tag — 16 bits (follows EtherType 0x8100): Bit: 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 ┌──────────┬───┬──────────────────────────────────────┐ │ PCP (3) │DEI│ VID (12 bits) │ └──────────┴───┴──────────────────────────────────────┘ Priority Drop VLAN ID (0–4095) Code Point Elig. PCP = bits [15:13] — 3-bit 802.1p QoS priority (0–7, 7 = highest) DEI = bit [12] — Drop Eligible Indicator VID = bits [11:0] — VLAN ID (VID 0=untagged, 4095=reserved)

uint16_t tag = ntohs(*(uint16_t*)(pkt + 14)); // after 14-byte Ethernet hdr uint8_t pcp = (tag >> 13) & 0x07; // bits 15–13: priority 0–7 uint8_t dei = (tag >> 12) & 0x01; // bit 12: drop eligible uint16_t vid = tag & 0x0FFF; // bits 11–0: VLAN ID 0–4095 // Rebuild tag from fields: tag = ((uint16_t)pcp << 13) | ((uint16_t)dei << 12) | (vid & 0x0FFF);

📌 VID 0 = priority-tagged (no VLAN). VID 4095 = reserved. Valid user VLANs: 1–4094. Always call ntohs() before extracting fields from a live packet.

PATTERN RECOGNITION CHEAT SHEET

When You See...	Think...	Key Formula
Find unique / missing element	XOR pair cancellation	`result ^= each element`
No extra space, O(n)	XOR trick	`x ^ x = 0, x ^ 0 = x`
Is power of two?	Clear lowest set bit	`x > 0 && (x & (x-1)) == 0`
Count set bits	Brian Kernighan	`x &= x-1` in loop
Set/clear/toggle one bit	OR / AND~mask / XOR	`1 << n` as mask
Extract N bits from position P	Shift + mask	`(x >> P) & ((1<<N)-1)`
Insert pattern at position P	Clear-then-OR	`(x & ~mask) \| (pat << P)`
Consecutive elements differ by 1 bit	Gray code	`n ^ (n >> 1)`
Hamming distance	popcount of XOR	`__builtin_popcount(x^y)`

CORE INTERVIEW SOLUTIONS

🧠 Find Single Number — LeetCode 136

Every element appears twice except one. XOR cancels pairs: x^x=0.

int singleNumber(int[] nums) { int result = 0; for (int n : nums) result ^= n; return result; // O(n) time · O(1) space }

🧠 Find Missing Number

Array contains 0..n with one missing. XOR all indices with all values.

int missingNumber(int[] nums) { int xor = nums.length; // start with n for (int i = 0; i < nums.length; i++) xor ^= i ^ nums[i]; // paired indices and values cancel return xor; }

🧠 Two Elements Appearing Once (others twice)

XOR all → get x^y. Use rightmost differing bit to partition array into two groups.

int[] twoSingleNumbers(int[] nums) { int xorAll = 0; for (int n : nums) xorAll ^= n; // xorAll = x ^ y int diffBit = xorAll & (-xorAll); // rightmost bit where x≠y int a = 0, b = 0; for (int n : nums) { if ((n & diffBit) != 0) a ^= n; // group with that bit set else b ^= n; // group without } return new int[]{a, b}; }

🧠 Count Bits for 0..n — LeetCode 338

DP: dp[i] = dp[i >> 1] + (i & 1). Removes LSB and adds it back.

int[] countBits(int n) { int[] dp = new int[n + 1]; for (int i = 1; i <= n; i++) dp[i] = dp[i >> 1] + (i & 1); // O(n) return dp; }

MORE INTERVIEW PATTERNS

🧠 Single Number II — LeetCode 137 (each element appears 3 times)

XOR cancels pairs but not triplets. Use two variables (ones, twos) to track per-bit modulo-3 count. The single number accumulates in ones.

int singleNumberII(int[] nums) { int ones = 0, twos = 0; for (int n : nums) { ones = (ones ^ n) & ~twos; // add to ‘seen once’ if not in ‘seen twice’ twos = (twos ^ n) & ~ones; // add to ‘seen twice’ if not in ‘seen once’ } return ones; // bits that appeared exactly 1 time (mod 3) }

🧠 Hamming Distance — LeetCode 461 & Total Hamming Distance LC 477

Number of positions where two values differ = popcount of their XOR.

// Single pair — O(1) int hammingDistance(int x, int y) { return Integer.bitCount(x ^ y); // Java; C: __builtin_popcount(x^y) } // Total Hamming distance across all pairs in array — O(32n) int totalHammingDistance(int[] nums) { int total = 0; for (int bit = 0; bit < 32; bit++) { int ones = 0; for (int n : nums) ones += (n >> bit) & 1; total += ones * (nums.length - ones); // pairs that differ at this bit } return total; }

🧠 Bitwise AND of Range [m, n] — LeetCode 201

AND of all numbers in [m, n]. Any bit that flips across the range becomes 0. Only the common prefix of m and n survives.

int rangeBitwiseAnd(int m, int n) { int shift = 0; while (m != n) { // right-shift until equal = find common prefix m >>= 1; n >>= 1; shift++; } return m << shift; // restore prefix to original bit position } // Example: [5,7] = 101 & 110 & 111 = 100 → common prefix = 1, shift=2 → 4

OPERATOR SUMMARY

Operator	Symbol	Effect	Example
AND	`&`	1 only if both 1 — masks/filters	`0b1100 & 0b1010 = 0b1000`
OR	`\|`	1 if at least one 1 — sets bits	`0b1100 \| 0b0011 = 0b1111`
XOR	`^`	1 if bits differ — toggles/compares	`0b1100 ^ 0b1010 = 0b0110`
NOT	`~`	Flips all bits	`~0b00001111 = 0b11110000`
Left Shift	`<<`	Shift left, fill 0 from right	`0b0001 << 3 = 0b1000`
Right Shift	`>>`	Shift right (logical/arithmetic)	`0b1000 >> 2 = 0b0010`

CORE PATTERNS CHEATSHEET

Goal	Code
Set bit n	`x \|= (1 << n)`
Clear bit n	`x &= ~(1 << n)`
Toggle bit n	`x ^= (1 << n)`
Check bit n	`(x >> n) & 1`
Is power of two?	`x > 0 && (x & (x-1)) == 0`
Clear lowest set bit	`x & (x - 1)`
Isolate lowest set bit	`x & (-x)`
Count set bits	`while(x) { x &= x-1; count++; }`
Extract field [p, p+w)	`(x >> p) & ((1<<w)-1)`
Insert pattern at p	`(x & ~mask) \| (pat << p)`
Rotate left k	`(x << k) \| (x >> (32-k))`
XOR unique element	`result=0; for n: result ^= n`

COMMON MASKS

Mask	Hex	Purpose
Lower 4 bits (nibble)	`0x0F`	Extract low nibble
Upper 4 bits	`0xF0`	Extract high nibble
Lower byte	`0x00FF`	Byte 0
Upper byte (16-bit)	`0xFF00`	Byte 1
Lower 16 bits	`0x0000FFFF`	Low word
Upper 16 bits	`0xFFFF0000`	High word
All ones (32-bit)	`0xFFFFFFFF`	Full mask / -1 (signed)
Alternating 01...	`0x55555555`	Even bits
Alternating 10...	`0xAAAAAAAA`	Odd bits

MASTERY CHECKLIST

Can convert between binary, hex, and decimal in both directions
Can represent negative numbers using two's complement (flip + add 1)
Can apply all 4 core mask operations: SET, CLEAR, TOGGLE, CHECK
Can build a multi-bit mask for any width/position using ((1<<w)-1)<<p
Can insert a bit pattern using 3-step clear-then-OR
Can extract a bitfield using shift-then-mask
Can explain why x & (x-1) detects powers of two
Can explain why x & (-x) isolates the lowest set bit
Can solve Single Number (LeetCode 136) using XOR in one pass
Can solve Count Bits (LeetCode 338) using DP in O(n)
Can write rotate-left / rotate-right for 32-bit unsigned
Can explain the difference between arithmetic and logical right shift and when each applies

SHIFT-BASED MULTIPLY AND DIVIDE

Shifts are cheaper than multiply/divide on many architectures

Left shift by n = multiply by 2ⁿ. Right shift by n = divide by 2ⁿ (integer, rounds toward zero for unsigned). Compilers do this automatically, but recognising the pattern matters for bitfield arithmetic.

// Multiply by powers of 2 int x8 = n << 3; // n * 8 int x12 = (n << 3) + (n << 2); // n*8 + n*4 = n*12 // Divide by powers of 2 (unsigned — logical shift) uint32_t d4 = x >> 2; // x / 4 (exact if divisible, floors otherwise) // Signed division: arithmetic right shift rounds toward -inf, not zero! // For correct signed floor-divide by power of 2 use: int div4 = (x + 3) >> 2; // round toward zero for positive x

MEMORY ALIGNMENT — THE MOST IMPORTANT TRICK IN SYSTEMS

Why alignment matters

Cache lines (64 bytes), DPDK buffers (128 bytes), DMA descriptors, SIMD vectors — all require naturally aligned addresses. Mis-aligned access causes bus errors on strict architectures and performance penalties everywhere else.

A pointer p is aligned to A bytes when: (p & (A-1)) == 0 (Works only when A is a power of 2 — then A-1 is an all-ones mask) Round DOWN to alignment A: aligned = ptr & ~(A-1) Round UP to alignment A: aligned = (ptr + A - 1) & ~(A-1) Example: A=64, ptr=0x1003 Round down: 0x1003 & ~63 = 0x1003 & 0xFFFFFFC0 = 0x10C0 Round up: (0x1003 + 63) & ~63 = 0x1042 & 0xFFFFFFC0 = 0x1040

// Check if ptr is aligned to A (A must be power of 2) bool isAligned(uintptr_t ptr, size_t A) { return (ptr & (A - 1)) == 0; } // Round address up to next multiple of A uintptr_t alignUp(uintptr_t ptr, size_t A) { return (ptr + A - 1) & ~(A - 1); } // Round address down uintptr_t alignDown(uintptr_t ptr, size_t A) { return ptr & ~(A - 1); } // Verify at runtime: assert(((uintptr_t)buf & 63) == 0); // must be 64-byte aligned

MODULO BY POWER OF TWO

x % (2^n) == x & (2^n - 1) — when x is unsigned

This is why ring buffers, hash tables, and DPDK use power-of-2 sizes: the wrapping step becomes a single AND instead of a costly division.

// Expensive: requires integer divide instruction int idx = pos % capacity; // UB if capacity is 0, slow in any case // Cheap: single AND — valid ONLY when capacity is a power of 2 int idx = pos & (capacity - 1); // Ring buffer wrap (DPDK-style): uint32_t mask = size - 1; // pre-computed once at init head = (head + 1) & mask; // wrap head cheaply tail = (tail + 1) & mask; // wrap tail cheaply // HASH bucket index (power-of-2 table size): uint32_t bucket = hash(key) & (TABLE_SIZE - 1);

⚠️ x & (n-1) gives wrong results if n is not a power of two. Always validate with assert((n & (n-1)) == 0) at initialisation.

POPULATION COUNT (HAMMING WEIGHT)

Count the number of 1-bits in an integer

Used in checksums, error correction, set cardinality checks, and cryptography. Hardware instruction on modern CPUs; software fallback uses the divide-and-conquer approach.

// Fastest — single hardware instruction on x86 (SSE4.2+) int cnt = __builtin_popcount(x); // C/GCC (32-bit) int cnt = __builtin_popcountll(x); // C/GCC (64-bit) int cnt = Integer.bitCount(x); // Java // Brian Kernighan — O(set bits), no hardware needed int popcount(uint32_t x) { int c = 0; while (x) { x &= x - 1; c++; } // clear lowest set bit each iteration return c; } // Parallel popcount — O(log bits), no loop uint32_t popcount32(uint32_t x) { x = x - ((x >> 1) & 0x55555555); // count pairs x = (x & 0x33333333) + ((x >> 2) & 0x33333333); // count nibbles x = (x + (x >> 4)) & 0x0F0F0F0F; // count bytes return (x * 0x01010101) >> 24; // sum all byte counts }

BITMASK DP — SUBSETS AND STATE COMPRESSION

When to use Bitmask DP

Bitmask DP represents state as a bitmask where each bit = one element in-or-out. Applicable when N ≤ 20 (so 2^N states fit in memory). Common problems: TSP, assignment, task scheduling, game states.

Pattern: dp[mask] = optimal cost/result considering exactly the elements in mask Transitions: For each mask, try adding element i (not yet in mask): newmask = mask | (1 << i) dp[newmask] = min/max(dp[newmask], dp[mask] + cost(i)) Base case: dp[0] = 0 Answer: dp[(1<<N)-1] (full set)

// Minimum cost to visit all N nodes (TSP-style DP) // dp[mask][i] = min cost to visit nodes in mask, ending at node i int[][] dp = new int[1 << N][N]; // Fill with INF: for (int[] row : dp) Arrays.fill(row, Integer.MAX_VALUE / 2); dp[1][0] = 0; // start at node 0, mask = 0b0001 for (int mask = 1; mask < (1 << N); mask++) { for (int u = 0; u < N; u++) { if ((mask & (1 << u)) == 0) continue; // u not in mask for (int v = 0; v < N; v++) { if ((mask & (1 << v)) != 0) continue; // v already visited int newmask = mask | (1 << v); dp[newmask][v] = Math.min(dp[newmask][v], dp[mask][u] + dist[u][v]); } } } // Enumerate subsets of mask in O(3^N) total: for (int sub = mask; sub > 0; sub = (sub - 1) & mask) { /*..*/ }

SWAR — SIMD WITHIN A REGISTER

Process multiple values simultaneously with integer arithmetic

SWAR packs multiple sub-word values into a single 64-bit register and uses carefully chosen masks to prevent carry-propagation between adjacent logical fields. Enables 8 parallel byte operations with a single integer add.

Example: add 1 to each of 8 bytes packed in a uint64_t 64-bit word: [B7][B6][B5][B4][B3][B2][B1][B0] (8 bytes) Naive: requires 8 separate additions SWAR: use overflow-guard mask to prevent carry from byte N into byte N+1 Mask 0x7F7F7F7F7F7F7F7F: Clear the MSB of every byte (guard bit) Allows carry within each byte but blocks inter-byte carry propagation

uint64_t swar_add1_to_each_byte(uint64_t v) { // Add 1 to each byte field without inter-byte carry uint64_t lo = v & 0x7F7F7F7F7F7F7F7FULL; // clear MSB of each byte lo = lo + 0x0101010101010101ULL; // add 1 to each byte (no overflow into guard) uint64_t hi = (v ^ lo) & 0x8080808080808080ULL; // recover original MSBs return lo | hi; } // Population count via SWAR (classic Hacker's Delight) uint64_t popcount64_swar(uint64_t x) { x = x - ((x >> 1) & 0x5555555555555555ULL); x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL); x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL; return (x * 0x0101010101010101ULL) >> 56; }

📌 SWAR is used in high-performance parsers, DPDK, network classification engines, and wherever vectorized operations are needed without SIMD intrinsics. The key discipline: choose masks so the "guard bit" pattern prevents fields from bleeding into each other.

← DSA Hub ↑ DSA Roadmap ⚙️ Systems Problems 🐛 Debugging Guide Practice Problems →