WHY VPPINFRA EXISTS
vppinfra is VPP's Standard Library
vppinfra replaces the C standard library for VPP's dataplane. It provides memory management, data structures, I/O formatting, and timers specifically designed for the demands of high-performance packet processing: deterministic allocation, cache-line awareness, zero-copy design, and macro-heavy APIs for maximum inlining.
Every plugin and node you write will use vppinfra primitives. Understanding these data structures deeply - not just their API but their memory layout - is what separates engineers who write correct VPP code from those who write fast, correct VPP code.
vec: Heap-allocated array with a hidden header storing length and capacity. The pointer points to element 0, not the header - fully compatible with C array indexing. Use everywhere you'd use std::vector or realloc-based arrays.
vec_add1 · vec_add · vec_len · vec_free · vec_foreach
pool: Pre-allocated array of fixed-size objects with a free-list bitmap. O(1) alloc/free. Objects are addressed by index - never store pointers to pool elements (the pool can realloc). This is how sessions, interfaces, and FIB entries are stored.
pool_get · pool_put · pool_elt_at_index · pool_foreach
bihash: Two-level hash table with bounded worst-case lookup: a bucket array (L1) pointing to fixed-size pages (L2) of key-value pairs. Designed for concurrent read with a single writer. Used in NAT, ACL, FIB, session tables - essentially every fast-path lookup.
BV(clib_bihash_add_del) · BV(clib_bihash_search)
clib_mem: Wrapper around dlmalloc with NUMA awareness and heap introspection. Supports multiple heaps (main heap, per-NUMA heaps). clib_mem_alloc_aligned guarantees cache-line alignment. Never use malloc/free in VPP code.
clib_mem_alloc · clib_mem_free · clib_mem_alloc_aligned
format / unformat: Extensible printf/scanf replacement. format returns a u8 * vec (not a fixed buffer). Custom format functions are registered via %U. All VPP CLI output and packet tracing use this - learn it to write readable trace functions.
format(0, "%U", format_ip4_address, &addr)
fifo / ring: FIFO and ring-buffer primitives built on top of vec. Used for work queues, event rings, and inter-thread communication at the framework level. Power-of-2 sized for fast modulo via bitmask.
clib_fifo_add1 · clib_fifo_sub1 · clib_ring_new
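The "power-of-2 sized for fast modulo" point is worth seeing concretely. Below is a minimal plain-C sketch of the trick - the demo_ring_* names are invented for illustration, not the real clib_fifo/clib_ring API: with a power-of-2 capacity, index & (capacity - 1) replaces the division implied by index % capacity, and monotonically increasing head/tail counters make the full/empty tests a single subtraction.

```c
#include <assert.h>

/* Hypothetical sketch of the power-of-2 indexing trick (NOT the clib API). */
enum { RING_SIZE = 8 };                  /* must be a power of 2 */

typedef struct
{
  unsigned head;                         /* monotonically increasing write count */
  unsigned tail;                         /* monotonically increasing read count  */
  int slots[RING_SIZE];
} demo_ring_t;

static int
demo_ring_push (demo_ring_t *r, int v)
{
  if (r->head - r->tail == RING_SIZE)    /* unsigned wrap-around safe */
    return -1;                           /* full */
  r->slots[r->head & (RING_SIZE - 1)] = v;  /* bitmask instead of % */
  r->head++;
  return 0;
}

static int
demo_ring_pop (demo_ring_t *r, int *v)
{
  if (r->head == r->tail)
    return -1;                           /* empty */
  *v = r->slots[r->tail & (RING_SIZE - 1)];
  r->tail++;
  return 0;
}
```

The bitmask only works because RING_SIZE is a power of 2 - which is exactly why the clib primitives round their sizes up.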
- vec ≈ rte_malloc + manual realloc tracking - but with automatic growth and a type-safe foreach macro
- pool ≈ rte_mempool for fixed-size objects - but pool objects stay in place (no dequeue/enqueue) and are addressed by index, not pointer. VPP sessions are stored in pools exactly like DPDK mbufs in a mempool
- bihash ≈ a hash table you'd build over rte_hash - but bihash is specifically designed for read-mostly concurrent access without locking on the read path
- clib_mem ≈ rte_malloc - both support NUMA-local allocation and cache-line alignment. VPP uses clib_mem everywhere; never mix it with rte_malloc inside VPP
VEC - DYNAMIC ARRAY (src/vppinfra/vec.h)
Memory Layout - The Hidden Header
The key to understanding vec is its memory layout. The header lives before the data in memory, so the pointer you hold points directly to element[0]. This makes vec transparent to any C code expecting a plain array.
```
/* Memory layout of a vec_t */
+──────────────────────────────────────────────────────+
| vec_header_t   | element[0] | element[1] | ...       |
|   len (u32)    |            |            |           |
|   dlmalloc_hdr |            |            |           |
+──────────────────────────────────────────────────────+
                 ↑ your pointer lives here
```

```c
/* The pointer IS the array - C-array compatible */
u32 *my_vec = 0;           /* NULL == empty vec, NOT uninitialised */
vec_add1 (my_vec, 42);     /* grows by 1, may realloc */
vec_add1 (my_vec, 99);
/* my_vec[0] == 42, my_vec[1] == 99 - plain array access */
u32 len = vec_len (my_vec);   /* == 2 */
```
- NULL is a valid empty vec - always initialise to 0, never leave the pointer uninitialised
- Never hold pointers to vec elements - vec_add1 may realloc, invalidating all element addresses. Hold indices instead
- vec_free does not NULL the pointer - use vec_free(v); v = 0; to be safe
Complete vec API Reference
| Function / Macro | Signature / Usage | Notes |
|---|---|---|
| vec_add1(v, e) | Append single element e to vec v | May realloc. v updated in place (macro takes its address) |
| vec_add(v, p, n) | Append n elements from array p | Bulk append - faster than N × vec_add1 |
| vec_add2(v, p, n) | Reserve space for n elements, return pointer to first | Use when you want to write directly into the vec |
| vec_len(v) | Returns u32 element count; 0 for a NULL vec | Safe to call on NULL - does not crash |
| vec_bytes(v) | Returns total byte size of the vec data region | vec_len(v) * sizeof(v[0]) |
| vec_free(v) | Free vec memory | Does NOT set v = 0. Do that manually |
| vec_reset_length(v) | Set length to 0 without freeing memory | Reuses the allocation - faster than free + realloc |
| vec_foreach(var, v) | Iterate: vec_foreach(ep, entries) { ... } | var is a pointer to each element |
| vec_foreach_index(i, v) | Iterate by index: i goes 0..vec_len(v)-1 | When you need the index inside the loop |
| vec_dup(v) | Return a copy of the vec | Heap-allocates a new vec with the same content |
| vec_validate(v, i) | Ensure vec has at least i+1 elements, zero-filling new slots | Use to grow to a known index safely |
| vec_validate_init_empty(v, i, val) | Like vec_validate but fills new slots with val instead of 0 | Useful for flag arrays initialised to ~0 |
| vec_insert(v, n, i) | Insert n zero elements at position i | O(n) - shifts elements right |
| vec_del1(v, i) | Delete element at i by replacing it with the last element | O(1) - order NOT preserved |
| _vec_len(v) | Direct header field access - no NULL check | Use only when v is guaranteed non-NULL |
| vec_set_len(v, n) | Force-set the length field | Advanced: use after manual direct writes to vec memory |
```c
/* Typical plugin usage: building a list of sw_if_index values */
u32 *sw_if_indices = 0;            /* NULL = empty vec */

/* Collect all interfaces matching a condition */
pool_foreach (hw, im->hw_interfaces)
  {
    if (hw->flags & VNET_HW_INTERFACE_FLAG_LINK_UP)
      vec_add1 (sw_if_indices, hw->sw_if_index);
  }

/* Process them */
u32 *si;
vec_foreach (si, sw_if_indices)
  {
    vnet_sw_interface_t *swif = vnet_get_sw_interface (vnm, *si);
    /* ... do something with swif */
  }

/* Reuse without realloc */
vec_reset_length (sw_if_indices);

/* Or free completely */
vec_free (sw_if_indices);
sw_if_indices = 0;
```
⚠️ Critical pitfall - storing pointers to vec elements: Any operation that can grow the vec (vec_add1, vec_add, vec_validate) may call realloc, which moves the entire array to a new address. Any pointer you saved to an element is now a dangling pointer. Always store the index into the vec, not a pointer to the element.
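To see why indices survive growth while pointers do not, here is a self-contained mock of a realloc-backed growable array - plain C with invented demo_vec_* names, not vppinfra itself. The realloc call is exactly the hazard the pitfall above describes: it may move the whole array, so only the index is safe to keep.

```c
#include <assert.h>
#include <stdlib.h>

/* Plain-C illustration of the vec-growth pitfall (NOT vppinfra code). */
typedef struct
{
  int *data;
  size_t len, cap;
} demo_vec_t;

/* Append one element; returns the element's index, the only stable handle */
static size_t
demo_vec_add1 (demo_vec_t *v, int e)
{
  if (v->len == v->cap)
    {
      v->cap = v->cap ? 2 * v->cap : 1;
      v->data = realloc (v->data, v->cap * sizeof (int));  /* may MOVE the array */
    }
  v->data[v->len] = e;
  return v->len++;          /* hand back an index, never &v->data[...] */
}
```

After any growth, re-resolve the element as v.data[idx]; a pointer saved before the growth may reference freed memory.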
POOL - FIXED-SIZE OBJECT ALLOCATOR (src/vppinfra/pool.h)
Pool Memory Layout and Design
A pool is a pre-allocated contiguous array of fixed-size objects. It maintains a free-list as a bitmap of free slots. Allocation (pool_get) finds the first free bit and marks it used - O(1). Free (pool_put) marks the slot free again - O(1). Crucially, object addresses are stable as long as the pool does not grow - the pool never moves existing elements on alloc.
```
/* Pool memory model */
pool = [ obj[0] | obj[1] | obj[2] | obj[3] | ... ]
        (in use)  (FREE)  (in use)  (FREE)

free_bitmap = 0b...1010   (bits set = free slots)

/* pool_get: find lowest set bit, clear it, return pointer */
/* pool_put: set the bit at this index */
```

```c
/* Declaration */
typedef struct
{
  u32 conn_id;
  u32 sw_if_index;
  ip4_address_t src, dst;
  u8 state;
} my_session_t;

my_session_t *session_pool = 0;   /* NULL = empty pool */
```
Complete pool API
| Macro / Function | Usage | Returns / Effect |
|---|---|---|
| pool_get(P, E) | Allocate one element from pool P, set pointer E | E points to the new element. Memory is NOT zeroed - use pool_get_zero for that |
| pool_get_zero(P, E) | pool_get plus zero-fill of the element | Preferred when the struct must start cleared |
| pool_get_aligned(P, E, align) | pool_get with an alignment guarantee | Use for SIMD-aligned structs |
| pool_put(P, E) | Return the element pointed to by E to the pool | Marks the slot free. E still points to valid memory until the next pool_get |
| pool_put_index(P, i) | Free by index rather than pointer | Equivalent to pool_put(P, pool_elt_at_index(P, i)) |
| pool_elt_at_index(P, i) | Return pointer to the element at index i | No bounds check - undefined if slot i is free |
| pool_is_free_index(P, i) | Return 1 if slot i is free | Always check before dereferencing by index |
| pool_elts(P) | Count of currently in-use elements | Returns u32 |
| pool_len(P) | Capacity - total allocated slots (used + free) | Returns u32 |
| pool_foreach(E, P) | Iterate over all in-use elements | E is a pointer to each live element |
| pool_foreach_index(i, P) | Iterate by index over all in-use elements | i is the index of each live element |
| pool_free(P) | Free the entire pool | Frees the backing store; does not set P = 0 |
| pool_validate_index(P, i) | Assert that index i is valid (not free) | Debug helper - crashes on invalid access |
| pool_alloc(P, n) | Pre-allocate capacity for n objects | Avoids repeated realloc during initial population |
```c
/* Complete example: per-flow session pool */
my_session_t *sessions = 0;          /* pool */
uword *session_by_key = 0;           /* hash: key → pool index */

/* Create a session */
my_session_t *s;
pool_get_zero (sessions, s);         /* allocate + zero-fill */
u32 session_index = s - sessions;    /* derive index from pointer arithmetic */
s->conn_id = next_conn_id++;
s->state = SESSION_STATE_INIT;

/* Store in hash by key for fast lookup */
hash_set (session_by_key, flow_key, session_index);

/* Fast-path lookup: key → index → pointer */
uword *val = hash_get (session_by_key, flow_key);
if (val)
  {
    s = pool_elt_at_index (sessions, *val);
    /* s is now valid - use it */
  }

/* Destroy a session */
hash_unset (session_by_key, flow_key);
pool_put (sessions, s);              /* marks slot free */

/* Walk all active sessions (e.g., for timeout sweep) */
pool_foreach (s, sessions)
  {
    if (now - s->last_seen > SESSION_TIMEOUT)
      expire_session (sessions, session_by_key, s);
  }
```
💡 Deriving an index from a pointer: The idiom index = element_ptr - pool_base_ptr is ubiquitous in VPP. It works because pool elements are contiguous and fixed-size. The index stays valid even if the pool grows (existing elements don't move). Always store the index in inter-subsystem references (e.g., in a buffer's opaque field), never the pointer.
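The pointer-difference idiom can be shown on a plain C array standing in for a pool (hypothetical demo_* types, not the real pool macros). C pointer subtraction between elements of the same array is automatically scaled by the element size, so it yields the index directly - no division needed.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a pool element (illustrative, not a VPP struct) */
typedef struct
{
  unsigned conn_id;
  unsigned char state;
} demo_session_t;

/* The idiom: element pointer minus base pointer == element index.
 * The compiler scales the difference by sizeof (demo_session_t). */
static size_t
demo_index_of (demo_session_t *base, demo_session_t *elt)
{
  return (size_t) (elt - base);
}
```

The round trip index → pool_elt_at_index → pointer → index is lossless, which is why VPP stores indices in buffer opaque data and hash values.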
BIHASH - BOUNDED-INDEX EXTENSIBLE HASHING (src/vppinfra/bihash_*.h)
Bihash Architecture - Two-Level Lookup
Bihash is VPP's primary hash table for dataplane lookups. Its design is optimised for the read-heavy, write-rare workload of packet forwarding: millions of lookups per second with occasional control-plane insertions.
```
/* Two-level structure */

Level 1 - Bucket Array (always in memory, fits in L2 cache):
  bucket[0]  → page pointer + lock bit
  bucket[1]  → page pointer + lock bit
  ...
  bucket[N-1]→ page pointer + lock bit

  hash(key) & (N-1) → selects bucket index

Level 2 - KV Pages (per-bucket, allocated on demand):
  page = [ kvp[0] | kvp[1] | kvp[2] | kvp[3] | ... ]
         (BIHASH_KVP_PER_PAGE entries, default 4 or 8)

Lookup:
  1. bucket_idx = hash(key) & (N-1)       O(1) - bitmask
  2. page = bucket[bucket_idx].page       O(1) - pointer deref
  3. linear scan page for matching key    O(1) - ≤8 compares
```

When a bucket fills (all KVP slots taken), it overflows to a chain of pages. The chain length is bounded by the load factor set at init time. This gives worst-case O(chain_len) lookup - not O(n) like an unbounded chaining hash table.
The lock bit in the bucket enables a single-writer, multi-reader protocol: readers spin on the lock bit; the writer sets it, modifies the page, clears it. Readers detect inconsistency via the version counter and retry.
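A toy model makes the two-level bounded lookup concrete. This is not the real bihash - there is no locking, no version counter, and no page chaining here, and all demo_* names are invented - it only shows the bucket-select plus bounded page scan from the diagram above:

```c
#include <assert.h>
#include <string.h>

/* Toy two-level lookup (NOT the real bihash). */
enum { N_BUCKETS = 16, KVP_PER_PAGE = 4 };   /* N_BUCKETS: power of 2 */

typedef struct { unsigned long key, value; int in_use; } demo_kvp_t;
typedef struct { demo_kvp_t kvp[KVP_PER_PAGE]; } demo_page_t;
typedef struct { demo_page_t pages[N_BUCKETS]; } demo_table_t;

static unsigned long
demo_hash (unsigned long key)
{
  key *= 0x9e3779b97f4a7c15UL;    /* cheap multiplicative mix */
  return key >> 32;
}

static int
demo_add (demo_table_t *t, unsigned long key, unsigned long value)
{
  demo_page_t *p = &t->pages[demo_hash (key) & (N_BUCKETS - 1)];  /* level 1 */
  for (int i = 0; i < KVP_PER_PAGE; i++)                          /* level 2 */
    if (!p->kvp[i].in_use)
      {
        p->kvp[i] = (demo_kvp_t) { key, value, 1 };
        return 0;
      }
  return -1;    /* page full: the real bihash chains another page here */
}

static int
demo_search (demo_table_t *t, unsigned long key, unsigned long *value)
{
  demo_page_t *p = &t->pages[demo_hash (key) & (N_BUCKETS - 1)];
  for (int i = 0; i < KVP_PER_PAGE; i++)   /* bounded linear scan */
    if (p->kvp[i].in_use && p->kvp[i].key == key)
      {
        *value = p->kvp[i].value;
        return 0;
      }
  return -1;    /* miss */
}
```

Note the worst case: KVP_PER_PAGE comparisons, fixed at compile time - never a walk over all n entries.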
Bihash Variants - Choosing the Right One
Bihash is a template implemented via macros. The type name encodes key_size_value_size in bytes:
| Type | Key | Value | Typical Use in VPP |
|---|---|---|---|
| bihash_8_8 | 8 bytes (u64) | 8 bytes (u64) | NAT44 simple lookup, sw_if_index tables, ARP cache |
| bihash_16_8 | 16 bytes | 8 bytes | NAT44 endpoint-dependent (src_ip+src_port+protocol+vrf) |
| bihash_24_8 | 24 bytes | 8 bytes | MPLS, L2 FIB with bridge domain |
| bihash_40_8 | 40 bytes | 8 bytes | VXLAN tunnel tables (src+dst+vni) |
| bihash_48_8 | 48 bytes | 8 bytes | NAT66, IPv6 session tables, IPv6 5-tuple flow tables |
```c
/* Include the specific variant you need */
#include "vppinfra/bihash_8_8.h"
#include "vppinfra/bihash_template.h"   /* defines the BV() macro */

/* BV() appends the type suffix: BV(clib_bihash_init) → clib_bihash_init_8_8 */
typedef clib_bihash_8_8_t my_hash_t;

/* Key-value pair type */
clib_bihash_kv_8_8_t kv;   /* kv.key (u64), kv.value (u64) */

/* Initialise (call once, control plane) */
clib_bihash_8_8_t h;
u32 nbuckets = 64 * 1024;               /* power of 2, tuned to expected entries */
u32 mem_bytes = 128 * 1024 * 1024;      /* 128 MB backing store */
clib_bihash_init_8_8 (&h, "my-flow-table", nbuckets, mem_bytes);
```
Fast-Path Lookup and Insert Patterns
```c
/* ── LOOKUP (fast path - called per packet) ── */
clib_bihash_kv_8_8_t kv;

/* Pack key: for 5-tuple flows you'd pack into 8 bytes */
kv.key = ((u64) src_addr << 32) | dst_addr;    /* example: src+dst IP */

if (PREDICT_TRUE (clib_bihash_search_8_8 (&h, &kv, &kv) == 0))
  {
    /* Hit - kv.value is the stored value (e.g. session pool index) */
    u32 session_idx = kv.value;
    my_session_t *s = pool_elt_at_index (sessions, session_idx);
    /* ... process packet using s */
  }
else
  {
    /* Miss - new flow, create session */
    goto slow_path;
  }

/* ── INSERT (slow path / control plane) ── */
kv.key = ((u64) src_addr << 32) | dst_addr;
kv.value = session_idx;                        /* pool index */
clib_bihash_add_del_8_8 (&h, &kv, 1 /* is_add */);

/* ── DELETE ── */
kv.key = ((u64) src_addr << 32) | dst_addr;
kv.value = 0;                                  /* value irrelevant for delete */
clib_bihash_add_del_8_8 (&h, &kv, 0 /* is_add=0 means delete */);
```
- rte_hash uses Cuckoo hashing with SIMD key comparison - excellent for fixed-size lookups but requires explicit locking for concurrent writers
- bihash uses a per-bucket lock bit so the read path is lock-free - readers never acquire a lock, they just check the version counter. This matters at 10 Mpps where lock contention would be catastrophic
- Both are pre-allocated with fixed backing memory. If bihash fills beyond capacity, lookups degrade (longer page chains) but do not crash - rte_hash returns -ENOSPC
- For your session table work: bihash_16_8 fits a full 5-tuple IPv4 flow key (src_ip 4B + dst_ip 4B + src_port 2B + dst_port 2B + proto 1B = 13B, padded to 16B); reserve bihash_48_8 for IPv6 5-tuples, whose 37B key needs the larger variant
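One practical detail when packing keys: bihash compares raw key bytes, so any padding inside the key struct must be zeroed or two logically identical keys will miss each other. A hypothetical 16-byte IPv4 5-tuple layout (field names are illustrative, not a VPP struct) shows the pattern:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Sketch of a 16-byte, bihash_16_8-style IPv4 5-tuple key (hypothetical
 * layout). Requires C11 anonymous struct/union members. */
typedef struct
{
  union
  {
    struct
    {
      uint32_t src_ip, dst_ip;
      uint16_t src_port, dst_port;
      uint8_t proto;          /* 13 bytes of payload, 3 bytes of padding */
    };
    uint64_t as_u64[2];       /* the raw bytes the hash/compare actually see */
  };
} demo_flow_key_t;

static void
demo_make_key (demo_flow_key_t *k, uint32_t sip, uint32_t dip,
               uint16_t sp, uint16_t dp, uint8_t proto)
{
  memset (k, 0, sizeof (*k));   /* zero padding FIRST - critical for lookups */
  k->src_ip = sip;
  k->dst_ip = dip;
  k->src_port = sp;
  k->dst_port = dp;
  k->proto = proto;
}
```

Skipping the memset leaves the 3 padding bytes holding stack garbage, producing keys that hash differently on insert and lookup - a classic, hard-to-debug session-table bug.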
MEMORY, FORMAT/UNFORMAT, TIMERS
format / unformat - Extensible I/O
format is VPP's printf replacement. Instead of writing to a fixed buffer, it appends to a u8 * vec, growing as needed. The %U specifier allows any function with the right signature to be used as a format directive - this is how VPP achieves composable trace output.
```c
/* format signature: u8 *format (u8 *s, const char *fmt, ...); */
/* Returns the u8-vec with the formatted output appended */

u8 *s = 0;                        /* start with an empty vec */
s = format (s, "Interface %d IP: %U\n",
            sw_if_index, format_ip4_address, &my_addr);  /* %U calls format_ip4_address */
vlib_cli_output (vm, "%v", s);    /* %v = print u8-vec */
vec_free (s);

/* Writing your own format function */
static u8 *
format_my_flow (u8 *s, va_list *args)
{
  my_flow_t *f = va_arg (*args, my_flow_t *);
  s = format (s, "[%U:%d → %U:%d proto %d]",
              format_ip4_address, &f->src, f->src_port,
              format_ip4_address, &f->dst, f->dst_port, f->proto);
  return s;
}

/* Use it anywhere */
s = format (0, "Flow: %U\n", format_my_flow, &my_flow);

/* unformat - parsing */
unformat_input_t input;
unformat_init_string (&input, "192.168.1.1", strlen ("192.168.1.1"));
ip4_address_t addr;
if (unformat (&input, "%U", unformat_ip4_address, &addr))
  vlib_cli_output (vm, "Parsed: %U\n", format_ip4_address, &addr);
```
- format(0, ...) allocates a new vec - the caller must vec_free it
- format(existing_vec, ...) appends to the existing vec
- %v - print a u8 * vec as a string
- %U - call a custom format function
- Every packet trace function uses format - learn this before writing your first plugin
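Under the hood, %U is just "pull a function pointer plus its arguments off the va_list and call it". A stripped-down mock shows the moving parts - plain C with invented demo_* names, using a fixed buffer instead of a u8 * vec for brevity; the real format grows its vec and handles all the standard directives too:

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* A user formatter takes the output cursor plus the caller's va_list,
 * consumes its own arguments from it, and returns the bytes written. */
typedef int (*demo_fmt_fn_t) (char *out, size_t n, va_list *args);

static void
demo_format (char *out, size_t n, const char *fmt, ...)
{
  va_list args;
  size_t used = 0;
  va_start (args, fmt);
  for (const char *p = fmt; *p && used + 1 < n; p++)
    {
      if (p[0] == '%' && p[1] == 'U')
        {
          /* The %U trick: the next vararg is a formatter function;
           * it pulls its OWN arguments off the shared va_list. */
          demo_fmt_fn_t fn = va_arg (args, demo_fmt_fn_t);
          used += fn (out + used, n - used, &args);
          p++;                       /* skip the 'U' */
        }
      else
        out[used++] = *p;            /* ordinary character: copy through */
    }
  out[used] = 0;
  va_end (args);
}

/* Example user formatter: an IPv4 address passed as 4 bytes */
static int
demo_format_ip4 (char *out, size_t n, va_list *args)
{
  unsigned char *a = va_arg (*args, unsigned char *);
  return snprintf (out, n, "%u.%u.%u.%u", a[0], a[1], a[2], a[3]);
}
```

Passing the va_list by pointer is the same idiom VPP's format functions use (u8 *fn (u8 *s, va_list *args)) - it lets each directive consume exactly the arguments it owns.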
clib_time and Timing Primitives
```c
/* High-resolution time - based on TSC (rdtsc) */
clib_time_t ct;
clib_time_init (&ct);
f64 now = clib_time_now (&ct);       /* seconds since init, f64 */
u64 cycles = clib_cpu_time_now ();   /* raw TSC cycles */

/* In graph nodes: use vlib_time_now(), which is pre-computed per dispatch */
f64 now = vlib_time_now (vm);        /* preferred in node functions */

/* Timer wheel (tw_timer_*.h) for protocol timeouts */
/* Used for TCP retransmit timers, NAT session expiry */
#include "vppinfra/tw_timer_2t_1w_2048sl.h"
/* Parameters: 2 timers/object, 1 wheel, 2048 slots */

TWT (tw_timer_wheel) tw;
tw_timer_wheel_init_2t_1w_2048sl (&tw, expired_cb, 1.0, ~0);
```
For session timeouts in your Stateful Connection Tracker project (Mini-Project 7), use tw_timer_* - it handles expiry callbacks at O(1) per tick regardless of the number of active timers.
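To demystify the O(1)-per-tick claim, here is a toy single-wheel sketch - invented demo_* names, no overflow rounds, no timer cancellation; the real tw_timer templates handle both. Each slot holds the timers due at that tick, and advancing the wheel visits exactly one slot, so per-tick cost depends only on the timers expiring now, not on the total number armed.

```c
#include <string.h>

/* Minimal single-wheel timer sketch (NOT the tw_timer API). */
enum { N_SLOTS = 8, MAX_TIMERS_PER_SLOT = 4 };

typedef struct
{
  unsigned current_tick;
  int n_in_slot[N_SLOTS];
  int handles[N_SLOTS][MAX_TIMERS_PER_SLOT];
} demo_wheel_t;

/* Arm a timer `ticks` ticks in the future (ticks < N_SLOTS for this toy) */
static void
demo_timer_start (demo_wheel_t *w, int handle, unsigned ticks)
{
  unsigned slot = (w->current_tick + ticks) % N_SLOTS;
  w->handles[slot][w->n_in_slot[slot]++] = handle;
}

/* Advance one tick; copy expired handles to `out`, return how many.
 * Only ONE slot is touched regardless of total armed timers. */
static int
demo_wheel_advance (demo_wheel_t *w, int *out)
{
  unsigned slot = ++w->current_tick % N_SLOTS;
  int n = w->n_in_slot[slot];
  memcpy (out, w->handles[slot], n * sizeof (int));
  w->n_in_slot[slot] = 0;
  return n;
}
```

Compare with a per-node sweep, which walks every session on every pass - fine for thousands of sessions, ruinous for millions.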
clib_mem - Memory Allocation
```c
/* Always use clib_mem_*, never malloc/free in VPP code */

/* Basic allocation */
void *p = clib_mem_alloc (size);
void *q = clib_mem_alloc_aligned (size, CLIB_CACHE_LINE_BYTES);  /* 64-byte aligned */
clib_mem_free (p);

/* NUMA-aware allocation */
clib_mem_set_numa_affinity (numa_node);    /* set before alloc */
void *r = clib_mem_alloc_aligned (size, CLIB_CACHE_LINE_BYTES);
clib_mem_set_default_numa_affinity ();     /* reset */

/* Heap introspection */
clib_mem_usage_t usage;
clib_mem_get_heap_usage (clib_mem_get_heap (), &usage);
/* usage.bytes_used, usage.bytes_free */

/* For per-worker allocations: use per-thread heaps */
/* vlib sets up per-thread heaps automatically */
void *old_heap = clib_mem_set_heap (vm->thread_main->heap);
/* allocate on the worker-local heap */
clib_mem_set_heap (old_heap);              /* restore */
```
CLIB_CACHE_LINE_BYTES is 64 on x86_64. Always align per-worker data structures to cache lines to avoid false sharing between worker threads - a common source of hidden performance problems in multi-threaded VPP plugins.
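The false-sharing fix is purely a layout property, which standard C11 can demonstrate without VPP - here alignas stands in for VPP's cache-line alignment macros, and the struct/variable names are illustrative. Aligning each worker's counters to its own 64-byte line guarantees that adjacent workers never write the same cache line:

```c
#include <stdalign.h>
#include <stddef.h>

/* Sketch of the per-worker padding/alignment pattern (C11 alignas standing
 * in for VPP's cache-line macros; names are illustrative). */
#define DEMO_CACHE_LINE_BYTES 64

typedef struct
{
  /* Aligning the first member forces the whole struct to 64-byte alignment,
   * and sizeof rounds up to a multiple of that alignment - so each array
   * element occupies exactly one cache line. */
  alignas (DEMO_CACHE_LINE_BYTES) unsigned long rx_packets;
  unsigned long rx_bytes;
} demo_worker_counters_t;

/* One element per worker: no two workers ever touch the same line */
demo_worker_counters_t per_worker[4];
```

Without the alignment, two workers' counters could straddle one line and every increment would ping-pong that line between cores - invisible in the code, visible only in perf counters.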
P2A COMPLETION CHECKLIST
- Understand vec memory layout: hidden header, pointer to element[0], NULL == empty
- Can write vec_add1, vec_foreach, vec_reset_length patterns from memory
- Know why you must never store pointers to vec elements (realloc risk)
- Understand pool memory model: contiguous array, bitmap free-list, stable indices
- Can implement the pool_get → store index → pool_elt_at_index pattern
- Know the difference between pool_foreach and pool_foreach_index
- Understand bihash two-level structure: bucket array → KV pages
- Can choose the right bihash variant (8_8, 16_8, 48_8) for a given key size
- Can implement bihash init, insert, lookup, and delete correctly
- Understand bihash single-writer/multi-reader protocol (lock bit + version)
- Can write a custom format function using the %U registration pattern
- Know when to use clib_mem_alloc_aligned vs plain clib_mem_alloc
- Can explain how timer wheels work and when to use tw_timer vs a simpler per-node sweep
✅ When complete: move to P2B - vlib. You now know the data structures. vlib is where you learn how those structures are used in the graph dispatcher - the engine that drives VPP.