WHY VPPINFRA EXISTS
vppinfra is VPP's Standard Library
vppinfra replaces the C standard library for VPP's dataplane. It provides memory management, data structures, I/O formatting, and timers specifically designed for the demands of high-performance packet processing: deterministic allocation, cache-line awareness, zero-copy design, and macro-heavy APIs for maximum inlining.
Every plugin and node you write will use vppinfra primitives. Understanding these data structures deeply - not just their API but their memory layout - is what separates engineers who write correct VPP code from those who write fast, correct VPP code.
vec: Heap-allocated array with a hidden header storing length and capacity. The pointer points to element 0, not the header - fully compatible with C array indexing. Use everywhere you'd use std::vector or realloc-based arrays.
vec_add1 · vec_add · vec_len · vec_free · vec_foreach
pool: Pre-allocated array of fixed-size objects with a free-list bitmap. O(1) alloc/free. Objects are addressed by index - never store pointers to pool elements (the pool can realloc). This is how sessions, interfaces, and FIB entries are stored.
pool_get · pool_put · pool_elt_at_index · pool_foreach
bihash: Two-level hash table with bounded worst-case lookup: a bucket array (L1) pointing to fixed-size pages (L2) of key-value pairs. Designed for concurrent read with a single writer. Used in NAT, ACL, FIB, session tables - essentially every fast-path lookup.
BV(clib_bihash_add_del) · BV(clib_bihash_search)
clib_mem: Wrapper around dlmalloc with NUMA awareness and heap introspection. Supports multiple heaps (main heap, per-NUMA heaps). clib_mem_alloc_aligned guarantees cache-line alignment. Never use malloc/free in VPP code.
clib_mem_alloc · clib_mem_free · clib_mem_alloc_aligned
format / unformat: Extensible printf/scanf replacement. format returns a u8 * vec (not a fixed buffer). Custom format functions are registered via %U. All VPP CLI output and packet tracing use this - learn it to write readable trace functions.
format(0, "%U", format_ip4_address, &addr)
fifo / ring: FIFO and ring-buffer primitives built on top of vec. Used for work queues, event rings, and inter-thread communication at the framework level. Power-of-2 sized for fast modulo via bitmask.
clib_fifo_add1 · clib_fifo_sub1 · clib_ring_new
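The "power-of-2 sized for fast modulo" point is worth seeing concretely. Below is a minimal plain-C sketch of the trick - the demo_ring_* names are invented for illustration, not the real clib_fifo/clib_ring API: with a power-of-2 capacity, index & (capacity - 1) replaces the division implied by index % capacity, and monotonically increasing head/tail counters make the full/empty tests a single subtraction.

```c
#include <assert.h>

/* Hypothetical sketch of the power-of-2 indexing trick (NOT the clib API). */
enum { RING_SIZE = 8 };                  /* must be a power of 2 */

typedef struct
{
  unsigned head;                         /* monotonically increasing write count */
  unsigned tail;                         /* monotonically increasing read count  */
  int slots[RING_SIZE];
} demo_ring_t;

static int
demo_ring_push (demo_ring_t *r, int v)
{
  if (r->head - r->tail == RING_SIZE)    /* unsigned wrap-around safe */
    return -1;                           /* full */
  r->slots[r->head & (RING_SIZE - 1)] = v;  /* bitmask instead of % */
  r->head++;
  return 0;
}

static int
demo_ring_pop (demo_ring_t *r, int *v)
{
  if (r->head == r->tail)
    return -1;                           /* empty */
  *v = r->slots[r->tail & (RING_SIZE - 1)];
  r->tail++;
  return 0;
}
```

The bitmask only works because RING_SIZE is a power of 2 - which is exactly why the clib primitives round their sizes up.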
- vec ≈ rte_malloc + manual realloc tracking - but with automatic growth and a type-safe foreach macro
- pool ≈ rte_mempool for fixed-size objects - but pool objects stay in place (no dequeue/enqueue) and are addressed by index, not pointer. VPP sessions are stored in pools exactly like DPDK mbufs in a mempool
- bihash ≈ a hash table you'd build over rte_hash - but bihash is specifically designed for read-mostly concurrent access without locking on the read path
- clib_mem ≈ rte_malloc - both support NUMA-local allocation and cache-line alignment. VPP uses clib_mem everywhere; never mix it with rte_malloc inside VPP
VEC - DYNAMIC ARRAY (src/vppinfra/vec.h)
Memory Layout - The Hidden Header
The key to understanding vec is its memory layout. The header lives before the data in memory, so the pointer you hold points directly to element[0]. This makes vec transparent to any C code expecting a plain array.
```
/* Memory layout of a vec_t */
+──────────────────────────────────────────────────────+
| vec_header_t   | element[0] | element[1] | ...       |
|   len (u32)    |            |            |           |
|   dlmalloc_hdr |            |            |           |
+──────────────────────────────────────────────────────+
                 ↑ your pointer lives here
```

```c
/* The pointer IS the array - C-array compatible */
u32 *my_vec = 0;           /* NULL == empty vec, NOT uninitialised */
vec_add1 (my_vec, 42);     /* grows by 1, may realloc */
vec_add1 (my_vec, 99);
/* my_vec[0] == 42, my_vec[1] == 99 - plain array access */
u32 len = vec_len (my_vec);   /* == 2 */
```
- NULL is a valid empty vec - always initialise to 0, never leave the pointer uninitialised
- Never hold pointers to vec elements - vec_add1 may realloc, invalidating all element addresses. Hold indices instead
- vec_free does not NULL the pointer - use vec_free(v); v = 0; to be safe
Complete vec API Reference
| Function / Macro | Signature / Usage | Notes |
|---|---|---|
| vec_add1(v, e) | Append single element e to vec v | May realloc. v updated in place (macro takes its address) |
| vec_add(v, p, n) | Append n elements from array p | Bulk append - faster than N × vec_add1 |
| vec_add2(v, p, n) | Reserve space for n elements, return pointer to first | Use when you want to write directly into the vec |
| vec_len(v) | Returns u32 element count; 0 for a NULL vec | Safe to call on NULL - does not crash |
| vec_bytes(v) | Returns total byte size of the vec data region | vec_len(v) * sizeof(v[0]) |
| vec_free(v) | Free vec memory | Does NOT set v = 0. Do that manually |
| vec_reset_length(v) | Set length to 0 without freeing memory | Reuses the allocation - faster than free + realloc |
| vec_foreach(var, v) | Iterate: vec_foreach(ep, entries) { ... } | var is a pointer to each element |
| vec_foreach_index(i, v) | Iterate by index: i goes 0..vec_len(v)-1 | When you need the index inside the loop |
| vec_dup(v) | Return a copy of the vec | Heap-allocates a new vec with the same content |
| vec_validate(v, i) | Ensure vec has at least i+1 elements, zero-filling new slots | Use to grow to a known index safely |
| vec_validate_init_empty(v, i, val) | Like vec_validate but fills new slots with val instead of 0 | Useful for flag arrays initialised to ~0 |
| vec_insert(v, n, i) | Insert n zero elements at position i | O(n) - shifts elements right |
| vec_del1(v, i) | Delete element at i by replacing it with the last element | O(1) - order NOT preserved |
| _vec_len(v) | Direct header field access - no NULL check | Use only when v is guaranteed non-NULL |
| vec_set_len(v, n) | Force-set the length field | Advanced: use after manual direct writes to vec memory |
```c
/* Typical plugin usage: building a list of sw_if_index values */
u32 *sw_if_indices = 0;            /* NULL = empty vec */

/* Collect all interfaces matching a condition */
pool_foreach (hw, im->hw_interfaces)
  {
    if (hw->flags & VNET_HW_INTERFACE_FLAG_LINK_UP)
      vec_add1 (sw_if_indices, hw->sw_if_index);
  }

/* Process them */
u32 *si;
vec_foreach (si, sw_if_indices)
  {
    vnet_sw_interface_t *swif = vnet_get_sw_interface (vnm, *si);
    /* ... do something with swif */
  }

/* Reuse without realloc */
vec_reset_length (sw_if_indices);

/* Or free completely */
vec_free (sw_if_indices);
sw_if_indices = 0;
```
⚠️ Critical pitfall - storing pointers to vec elements: Any operation that can grow the vec (vec_add1, vec_add, vec_validate) may call realloc, which moves the entire array to a new address. Any pointer you saved to an element is now a dangling pointer. Always store the index into the vec, not a pointer to the element.
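To see why indices survive growth while pointers do not, here is a self-contained mock of a realloc-backed growable array - plain C with invented demo_vec_* names, not vppinfra itself. The realloc call is exactly the hazard the pitfall above describes: it may move the whole array, so only the index is safe to keep.

```c
#include <assert.h>
#include <stdlib.h>

/* Plain-C illustration of the vec-growth pitfall (NOT vppinfra code). */
typedef struct
{
  int *data;
  size_t len, cap;
} demo_vec_t;

/* Append one element; returns the element's index, the only stable handle */
static size_t
demo_vec_add1 (demo_vec_t *v, int e)
{
  if (v->len == v->cap)
    {
      v->cap = v->cap ? 2 * v->cap : 1;
      v->data = realloc (v->data, v->cap * sizeof (int));  /* may MOVE the array */
    }
  v->data[v->len] = e;
  return v->len++;          /* hand back an index, never &v->data[...] */
}
```

After any growth, re-resolve the element as v.data[idx]; a pointer saved before the growth may reference freed memory.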
POOL - FIXED-SIZE OBJECT ALLOCATOR (src/vppinfra/pool.h)
Pool Memory Layout and Design
A pool is a pre-allocated contiguous array of fixed-size objects. It maintains a free-list as a bitmap of free slots. Allocation (pool_get) finds the first free bit and marks it used - O(1). Free (pool_put) marks the slot free again - O(1). Crucially, object addresses are stable as long as the pool does not grow - the pool never moves existing elements on alloc.
```
/* Pool memory model */
pool = [ obj[0] | obj[1] | obj[2] | obj[3] | ... ]
        (in use)  (FREE)  (in use)  (FREE)

free_bitmap = 0b...1010   (bits set = free slots)

/* pool_get: find lowest set bit, clear it, return pointer */
/* pool_put: set the bit at this index */
```

```c
/* Declaration */
typedef struct
{
  u32 conn_id;
  u32 sw_if_index;
  ip4_address_t src, dst;
  u8 state;
} my_session_t;

my_session_t *session_pool = 0;   /* NULL = empty pool */
```
Complete pool API
| Macro / Function | Usage | Returns / Effect |
|---|---|---|
| pool_get(P, E) | Allocate one element from pool P, set pointer E | E points to the new element. Memory is NOT zeroed - use pool_get_zero for that |
| pool_get_zero(P, E) | pool_get plus zero-fill of the element | Preferred when the struct must start cleared |
| pool_get_aligned(P, E, align) | pool_get with an alignment guarantee | Use for SIMD-aligned structs |
| pool_put(P, E) | Return the element pointed to by E to the pool | Marks the slot free. E still points to valid memory until the next pool_get |
| pool_put_index(P, i) | Free by index rather than pointer | Equivalent to pool_put(P, pool_elt_at_index(P, i)) |
| pool_elt_at_index(P, i) | Return pointer to the element at index i | No bounds check - undefined if slot i is free |
| pool_is_free_index(P, i) | Return 1 if slot i is free | Always check before dereferencing by index |
| pool_elts(P) | Count of currently in-use elements | Returns u32 |
| pool_len(P) | Capacity - total allocated slots (used + free) | Returns u32 |
| pool_foreach(E, P) | Iterate over all in-use elements | E is a pointer to each live element |
| pool_foreach_index(i, P) | Iterate by index over all in-use elements | i is the index of each live element |
| pool_free(P) | Free the entire pool | Frees the backing store; does not set P = 0 |
| pool_validate_index(P, i) | Assert that index i is valid (not free) | Debug helper - crashes on invalid access |
| pool_alloc(P, n) | Pre-allocate capacity for n objects | Avoids repeated realloc during initial population |
```c
/* Complete example: per-flow session pool */
my_session_t *sessions = 0;          /* pool */
uword *session_by_key = 0;           /* hash: key → pool index */

/* Create a session */
my_session_t *s;
pool_get_zero (sessions, s);         /* allocate + zero-fill */
u32 session_index = s - sessions;    /* derive index from pointer arithmetic */
s->conn_id = next_conn_id++;
s->state = SESSION_STATE_INIT;

/* Store in hash by key for fast lookup */
hash_set (session_by_key, flow_key, session_index);

/* Fast-path lookup: key → index → pointer */
uword *val = hash_get (session_by_key, flow_key);
if (val)
  {
    s = pool_elt_at_index (sessions, *val);
    /* s is now valid - use it */
  }

/* Destroy a session */
hash_unset (session_by_key, flow_key);
pool_put (sessions, s);              /* marks slot free */

/* Walk all active sessions (e.g., for timeout sweep) */
pool_foreach (s, sessions)
  {
    if (now - s->last_seen > SESSION_TIMEOUT)
      expire_session (sessions, session_by_key, s);
  }
```
💡 Deriving an index from a pointer: The idiom index = element_ptr - pool_base_ptr is ubiquitous in VPP. It works because pool elements are contiguous and fixed-size. The index stays valid even if the pool grows (existing elements don't move). Always store the index in inter-subsystem references (e.g., in a buffer's opaque field), never the pointer.
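The pointer-difference idiom can be shown on a plain C array standing in for a pool (hypothetical demo_* types, not the real pool macros). C pointer subtraction between elements of the same array is automatically scaled by the element size, so it yields the index directly - no division needed.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a pool element (illustrative, not a VPP struct) */
typedef struct
{
  unsigned conn_id;
  unsigned char state;
} demo_session_t;

/* The idiom: element pointer minus base pointer == element index.
 * The compiler scales the difference by sizeof (demo_session_t). */
static size_t
demo_index_of (demo_session_t *base, demo_session_t *elt)
{
  return (size_t) (elt - base);
}
```

The round trip index → pool_elt_at_index → pointer → index is lossless, which is why VPP stores indices in buffer opaque data and hash values.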
BIHASH - BOUNDED-INDEX EXTENSIBLE HASHING (src/vppinfra/bihash_*.h)
Bihash Architecture - Two-Level Lookup
Bihash is VPP's primary hash table for dataplane lookups. Its design is optimised for the read-heavy, write-rare workload of packet forwarding: millions of lookups per second with occasional control-plane insertions.
```
/* Two-level structure */

Level 1 - Bucket Array (always in memory, fits in L2 cache):
  bucket[0]  → page pointer + lock bit
  bucket[1]  → page pointer + lock bit
  ...
  bucket[N-1]→ page pointer + lock bit

  hash(key) & (N-1) → selects bucket index

Level 2 - KV Pages (per-bucket, allocated on demand):
  page = [ kvp[0] | kvp[1] | kvp[2] | kvp[3] | ... ]
         (BIHASH_KVP_PER_PAGE entries, default 4 or 8)

Lookup:
  1. bucket_idx = hash(key) & (N-1)       O(1) - bitmask
  2. page = bucket[bucket_idx].page       O(1) - pointer deref
  3. linear scan page for matching key    O(1) - ≤8 compares
```

When a bucket fills (all KVP slots taken), it overflows to a chain of pages. The chain length is bounded by the load factor set at init time. This gives worst-case O(chain_len) lookup - not O(n) like an unbounded chaining hash table.
The lock bit in the bucket enables a single-writer, multi-reader protocol: readers spin on the lock bit; the writer sets it, modifies the page, clears it. Readers detect inconsistency via the version counter and retry.
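A toy model makes the two-level bounded lookup concrete. This is not the real bihash - there is no locking, no version counter, and no page chaining here, and all demo_* names are invented - it only shows the bucket-select plus bounded page scan from the diagram above:

```c
#include <assert.h>
#include <string.h>

/* Toy two-level lookup (NOT the real bihash). */
enum { N_BUCKETS = 16, KVP_PER_PAGE = 4 };   /* N_BUCKETS: power of 2 */

typedef struct { unsigned long key, value; int in_use; } demo_kvp_t;
typedef struct { demo_kvp_t kvp[KVP_PER_PAGE]; } demo_page_t;
typedef struct { demo_page_t pages[N_BUCKETS]; } demo_table_t;

static unsigned long
demo_hash (unsigned long key)
{
  key *= 0x9e3779b97f4a7c15UL;    /* cheap multiplicative mix */
  return key >> 32;
}

static int
demo_add (demo_table_t *t, unsigned long key, unsigned long value)
{
  demo_page_t *p = &t->pages[demo_hash (key) & (N_BUCKETS - 1)];  /* level 1 */
  for (int i = 0; i < KVP_PER_PAGE; i++)                          /* level 2 */
    if (!p->kvp[i].in_use)
      {
        p->kvp[i] = (demo_kvp_t) { key, value, 1 };
        return 0;
      }
  return -1;    /* page full: the real bihash chains another page here */
}

static int
demo_search (demo_table_t *t, unsigned long key, unsigned long *value)
{
  demo_page_t *p = &t->pages[demo_hash (key) & (N_BUCKETS - 1)];
  for (int i = 0; i < KVP_PER_PAGE; i++)   /* bounded linear scan */
    if (p->kvp[i].in_use && p->kvp[i].key == key)
      {
        *value = p->kvp[i].value;
        return 0;
      }
  return -1;    /* miss */
}
```

Note the worst case: KVP_PER_PAGE comparisons, fixed at compile time - never a walk over all n entries.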
Bihash Variants - Choosing the Right One
Bihash is a template implemented via macros. The type name encodes key_size_value_size in bytes:
| Type | Key | Value | Typical Use in VPP |
|---|---|---|---|
| bihash_8_8 | 8 bytes (u64) | 8 bytes (u64) | NAT44 simple lookup, sw_if_index tables, ARP cache |
| bihash_16_8 | 16 bytes | 8 bytes | NAT44 endpoint-dependent (src_ip+src_port+protocol+vrf) |
| bihash_24_8 | 24 bytes | 8 bytes | MPLS, L2 FIB with bridge domain |
| bihash_40_8 | 40 bytes | 8 bytes | VXLAN tunnel tables (src+dst+vni) |
| bihash_48_8 | 48 bytes | 8 bytes | NAT66, IPv6 session tables, IPv6 5-tuple flow tables |
```c
/* Include the specific variant you need */
#include "vppinfra/bihash_8_8.h"
#include "vppinfra/bihash_template.h"   /* defines the BV() macro */

/* BV() appends the type suffix: BV(clib_bihash_init) → clib_bihash_init_8_8 */
typedef clib_bihash_8_8_t my_hash_t;

/* Key-value pair type */
clib_bihash_kv_8_8_t kv;   /* kv.key (u64), kv.value (u64) */

/* Initialise (call once, control plane) */
clib_bihash_8_8_t h;
u32 nbuckets = 64 * 1024;               /* power of 2, tuned to expected entries */
u32 mem_bytes = 128 * 1024 * 1024;      /* 128 MB backing store */
clib_bihash_init_8_8 (&h, "my-flow-table", nbuckets, mem_bytes);
```
Fast-Path Lookup and Insert Patterns
```c
/* ── LOOKUP (fast path - called per packet) ── */
clib_bihash_kv_8_8_t kv;

/* Pack key: for 5-tuple flows you'd pack into 8 bytes */
kv.key = ((u64) src_addr << 32) | dst_addr;    /* example: src+dst IP */

if (PREDICT_TRUE (clib_bihash_search_8_8 (&h, &kv, &kv) == 0))
  {
    /* Hit - kv.value is the stored value (e.g. session pool index) */
    u32 session_idx = kv.value;
    my_session_t *s = pool_elt_at_index (sessions, session_idx);
    /* ... process packet using s */
  }
else
  {
    /* Miss - new flow, create session */
    goto slow_path;
  }

/* ── INSERT (slow path / control plane) ── */
kv.key = ((u64) src_addr << 32) | dst_addr;
kv.value = session_idx;                        /* pool index */
clib_bihash_add_del_8_8 (&h, &kv, 1 /* is_add */);

/* ── DELETE ── */
kv.key = ((u64) src_addr << 32) | dst_addr;
kv.value = 0;                                  /* value irrelevant for delete */
clib_bihash_add_del_8_8 (&h, &kv, 0 /* is_add=0 means delete */);
```
- rte_hash uses Cuckoo hashing with SIMD key comparison - excellent for fixed-size lookups but requires explicit locking for concurrent writers
- bihash uses a per-bucket lock bit so the read path is lock-free - readers never acquire a lock, they just check the version counter. This matters at 10 Mpps where lock contention would be catastrophic
- Both are pre-allocated with fixed backing memory. If bihash fills beyond capacity, lookups degrade (longer page chains) but do not crash - rte_hash returns -ENOSPC
- For your session table work: bihash_16_8 fits a full 5-tuple IPv4 flow key (src_ip 4B + dst_ip 4B + src_port 2B + dst_port 2B + proto 1B = 13B, padded to 16B); reserve bihash_48_8 for IPv6 5-tuples, whose 37B key needs the larger variant
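One practical detail when packing keys: bihash compares raw key bytes, so any padding inside the key struct must be zeroed or two logically identical keys will miss each other. A hypothetical 16-byte IPv4 5-tuple layout (field names are illustrative, not a VPP struct) shows the pattern:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Sketch of a 16-byte, bihash_16_8-style IPv4 5-tuple key (hypothetical
 * layout). Requires C11 anonymous struct/union members. */
typedef struct
{
  union
  {
    struct
    {
      uint32_t src_ip, dst_ip;
      uint16_t src_port, dst_port;
      uint8_t proto;          /* 13 bytes of payload, 3 bytes of padding */
    };
    uint64_t as_u64[2];       /* the raw bytes the hash/compare actually see */
  };
} demo_flow_key_t;

static void
demo_make_key (demo_flow_key_t *k, uint32_t sip, uint32_t dip,
               uint16_t sp, uint16_t dp, uint8_t proto)
{
  memset (k, 0, sizeof (*k));   /* zero padding FIRST - critical for lookups */
  k->src_ip = sip;
  k->dst_ip = dip;
  k->src_port = sp;
  k->dst_port = dp;
  k->proto = proto;
}
```

Skipping the memset leaves the 3 padding bytes holding stack garbage, producing keys that hash differently on insert and lookup - a classic, hard-to-debug session-table bug.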
MEMORY, FORMAT/UNFORMAT, TIMERS
format / unformat - Extensible I/O
format is VPP's printf replacement. Instead of writing to a fixed buffer, it appends to a u8 * vec, growing as needed. The %U specifier allows any function with the right signature to be used as a format directive - this is how VPP achieves composable trace output.
```c
/* format signature: u8 *format (u8 *s, const char *fmt, ...); */
/* Returns the u8-vec with the formatted output appended */

u8 *s = 0;                        /* start with an empty vec */
s = format (s, "Interface %d IP: %U\n",
            sw_if_index, format_ip4_address, &my_addr);  /* %U calls format_ip4_address */
vlib_cli_output (vm, "%v", s);    /* %v = print u8-vec */
vec_free (s);

/* Writing your own format function */
static u8 *
format_my_flow (u8 *s, va_list *args)
{
  my_flow_t *f = va_arg (*args, my_flow_t *);
  s = format (s, "[%U:%d → %U:%d proto %d]",
              format_ip4_address, &f->src, f->src_port,
              format_ip4_address, &f->dst, f->dst_port, f->proto);
  return s;
}

/* Use it anywhere */
s = format (0, "Flow: %U\n", format_my_flow, &my_flow);

/* unformat - parsing */
unformat_input_t input;
unformat_init_string (&input, "192.168.1.1", strlen ("192.168.1.1"));
ip4_address_t addr;
if (unformat (&input, "%U", unformat_ip4_address, &addr))
  vlib_cli_output (vm, "Parsed: %U\n", format_ip4_address, &addr);
```
- format(0, ...) allocates a new vec - the caller must vec_free it
- format(existing_vec, ...) appends to the existing vec
- %v - print a u8 * vec as a string
- %U - call a custom format function
- Every packet trace function uses format - learn this before writing your first plugin
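Under the hood, %U is just "pull a function pointer plus its arguments off the va_list and call it". A stripped-down mock shows the moving parts - plain C with invented demo_* names, using a fixed buffer instead of a u8 * vec for brevity; the real format grows its vec and handles all the standard directives too:

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* A user formatter takes the output cursor plus the caller's va_list,
 * consumes its own arguments from it, and returns the bytes written. */
typedef int (*demo_fmt_fn_t) (char *out, size_t n, va_list *args);

static void
demo_format (char *out, size_t n, const char *fmt, ...)
{
  va_list args;
  size_t used = 0;
  va_start (args, fmt);
  for (const char *p = fmt; *p && used + 1 < n; p++)
    {
      if (p[0] == '%' && p[1] == 'U')
        {
          /* The %U trick: the next vararg is a formatter function;
           * it pulls its OWN arguments off the shared va_list. */
          demo_fmt_fn_t fn = va_arg (args, demo_fmt_fn_t);
          used += fn (out + used, n - used, &args);
          p++;                       /* skip the 'U' */
        }
      else
        out[used++] = *p;            /* ordinary character: copy through */
    }
  out[used] = 0;
  va_end (args);
}

/* Example user formatter: an IPv4 address passed as 4 bytes */
static int
demo_format_ip4 (char *out, size_t n, va_list *args)
{
  unsigned char *a = va_arg (*args, unsigned char *);
  return snprintf (out, n, "%u.%u.%u.%u", a[0], a[1], a[2], a[3]);
}
```

Passing the va_list by pointer is the same idiom VPP's format functions use (u8 *fn (u8 *s, va_list *args)) - it lets each directive consume exactly the arguments it owns.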
clib_time and Timing Primitives
```c
/* High-resolution time - based on TSC (rdtsc) */
clib_time_t ct;
clib_time_init (&ct);
f64 now = clib_time_now (&ct);       /* seconds since init, f64 */
u64 cycles = clib_cpu_time_now ();   /* raw TSC cycles */

/* In graph nodes: use vlib_time_now(), which is pre-computed per dispatch */
f64 now = vlib_time_now (vm);        /* preferred in node functions */

/* Timer wheel (tw_timer_*.h) for protocol timeouts */
/* Used for TCP retransmit timers, NAT session expiry */
#include "vppinfra/tw_timer_2t_1w_2048sl.h"
/* Parameters: 2 timers/object, 1 wheel, 2048 slots */

TWT (tw_timer_wheel) tw;
tw_timer_wheel_init_2t_1w_2048sl (&tw, expired_cb, 1.0, ~0);
```
For session timeouts in your Stateful Connection Tracker project (Mini-Project 7), use tw_timer_* - it handles expiry callbacks at O(1) per tick regardless of the number of active timers.
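To demystify the O(1)-per-tick claim, here is a toy single-wheel sketch - invented demo_* names, no overflow rounds, no timer cancellation; the real tw_timer templates handle both. Each slot holds the timers due at that tick, and advancing the wheel visits exactly one slot, so per-tick cost depends only on the timers expiring now, not on the total number armed.

```c
#include <string.h>

/* Minimal single-wheel timer sketch (NOT the tw_timer API). */
enum { N_SLOTS = 8, MAX_TIMERS_PER_SLOT = 4 };

typedef struct
{
  unsigned current_tick;
  int n_in_slot[N_SLOTS];
  int handles[N_SLOTS][MAX_TIMERS_PER_SLOT];
} demo_wheel_t;

/* Arm a timer `ticks` ticks in the future (ticks < N_SLOTS for this toy) */
static void
demo_timer_start (demo_wheel_t *w, int handle, unsigned ticks)
{
  unsigned slot = (w->current_tick + ticks) % N_SLOTS;
  w->handles[slot][w->n_in_slot[slot]++] = handle;
}

/* Advance one tick; copy expired handles to `out`, return how many.
 * Only ONE slot is touched regardless of total armed timers. */
static int
demo_wheel_advance (demo_wheel_t *w, int *out)
{
  unsigned slot = ++w->current_tick % N_SLOTS;
  int n = w->n_in_slot[slot];
  memcpy (out, w->handles[slot], n * sizeof (int));
  w->n_in_slot[slot] = 0;
  return n;
}
```

Compare with a per-node sweep, which walks every session on every pass - fine for thousands of sessions, ruinous for millions.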
clib_mem - Memory Allocation
```c
/* Always use clib_mem_*, never malloc/free in VPP code */

/* Basic allocation */
void *p = clib_mem_alloc (size);
void *q = clib_mem_alloc_aligned (size, CLIB_CACHE_LINE_BYTES);  /* 64-byte aligned */
clib_mem_free (p);

/* NUMA-aware allocation */
clib_mem_set_numa_affinity (numa_node);    /* set before alloc */
void *r = clib_mem_alloc_aligned (size, CLIB_CACHE_LINE_BYTES);
clib_mem_set_default_numa_affinity ();     /* reset */

/* Heap introspection */
clib_mem_usage_t usage;
clib_mem_get_heap_usage (clib_mem_get_heap (), &usage);
/* usage.bytes_used, usage.bytes_free */

/* For per-worker allocations: use per-thread heaps */
/* vlib sets up per-thread heaps automatically */
void *old_heap = clib_mem_set_heap (vm->thread_main->heap);
/* allocate on the worker-local heap */
clib_mem_set_heap (old_heap);              /* restore */
```
CLIB_CACHE_LINE_BYTES is 64 on x86_64. Always align per-worker data structures to cache lines to avoid false sharing between worker threads - a common source of hidden performance problems in multi-threaded VPP plugins.
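The false-sharing fix is purely a layout property, which standard C11 can demonstrate without VPP - here alignas stands in for VPP's cache-line alignment macros, and the struct/variable names are illustrative. Aligning each worker's counters to its own 64-byte line guarantees that adjacent workers never write the same cache line:

```c
#include <stdalign.h>
#include <stddef.h>

/* Sketch of the per-worker padding/alignment pattern (C11 alignas standing
 * in for VPP's cache-line macros; names are illustrative). */
#define DEMO_CACHE_LINE_BYTES 64

typedef struct
{
  /* Aligning the first member forces the whole struct to 64-byte alignment,
   * and sizeof rounds up to a multiple of that alignment - so each array
   * element occupies exactly one cache line. */
  alignas (DEMO_CACHE_LINE_BYTES) unsigned long rx_packets;
  unsigned long rx_bytes;
} demo_worker_counters_t;

/* One element per worker: no two workers ever touch the same line */
demo_worker_counters_t per_worker[4];
```

Without the alignment, two workers' counters could straddle one line and every increment would ping-pong that line between cores - invisible in the code, visible only in perf counters.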
P2A COMPLETION CHECKLIST
- Understand vec memory layout: hidden header, pointer to element[0], NULL == empty
- Can write vec_add1, vec_foreach, vec_reset_length patterns from memory
- Know why you must never store pointers to vec elements (realloc risk)
- Understand pool memory model: contiguous array, bitmap free-list, stable indices
- Can implement the pool_get → store index → pool_elt_at_index pattern
- Know the difference between pool_foreach and pool_foreach_index
- Understand bihash two-level structure: bucket array → KV pages
- Can choose the right bihash variant (8_8, 16_8, 48_8) for a given key size
- Can implement bihash init, insert, lookup, and delete correctly
- Understand bihash single-writer/multi-reader protocol (lock bit + version)
- Can write a custom format function using the %U registration pattern
- Know when to use clib_mem_alloc_aligned vs plain clib_mem_alloc
- Can explain how timer wheels work and when to use tw_timer vs a simpler per-node sweep
✅ When complete: move to P2B - vlib. You now know the data structures. vlib is where you learn how those structures are used in the graph dispatcher - the engine that drives VPP.