NETWORKING MASTERY · PHASE 4 · MODULE 18 · WEEK 16 · PHASE 4 FINAL
⚡ VPP and Data Plane Development
Vector packet processing · Graph node framework · VPP plugins · FIB · VAPI · NGFW data plane
Advanced · Prerequisite: M17 DPDK · FD.io VPP 23.x · Your Team's R&D Platform · 3 Labs

VPP — VECTOR PACKET PROCESSOR (FD.io)

What VPP Is and Why Your Team Uses It

OVERVIEW

VPP (Vector Packet Processor, FD.io project by Cisco/Linux Foundation) is a full-featured userspace network stack built on DPDK. Where DPDK is a toolkit for packet I/O, VPP is a complete forwarding engine with L2/L3/L4 processing, routing, NAT, ACL, GRE, VxLAN, MPLS, IPsec, and a plugin framework — running at tens to hundreds of millions of packets per second.

VPP is ideal for NGFW development because it provides the fast data plane and rich protocol support you'd otherwise spend years building from scratch, while leaving the door open for custom processing nodes via its plugin system.

System                 | Mpps/core (64B) | Features available
Linux kernel           | 1–3             | Everything, but slow
DPDK bare (basicfwd)   | 30–80           | Only what you code
VPP (L3 forwarding)    | 20–100          | Full routing, NAT, ACL, tunnels — built-in
VPP + ACL plugin       | 15–60           | + stateful conntrack
VPP + IPsec            | 5–20            | + encryption (DPDK crypto offload available)
⚙️

VPP Startup Configuration

SETUP
# /etc/vpp/startup.conf — key sections

unix {
  nodaemon
  log /var/log/vpp/vpp.log
  full-coredump
  cli-listen /run/vpp/cli.sock   # vppctl connects here
}

dpdk {
  dev 0000:01:00.0 { name eth0 num-rx-queues 4 num-tx-queues 4 }
  dev 0000:01:00.1 { name eth1 num-rx-queues 4 num-tx-queues 4 }
  num-mbufs 131072
  socket-mem 2048,0   # 2GB hugepages on NUMA 0
}

cpu {
  main-core 0                   # main thread (management)
  corelist-workers 2,3,4,5      # 4 worker threads
}

buffers {
  buffers-per-numa 131072
  default data-size 2048
}
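
# Rough memory arithmetic for the buffer settings above:
#   131072 buffers × 2048 B data-size ≈ 256 MB of packet-buffer memory,
#   comfortably inside the 2 GB of NUMA-0 hugepages reserved in the dpdk section.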

# Start VPP
sudo systemctl start vpp
sudo vppctl show version
sudo vppctl show interface

VECTOR PROCESSING — VPP'S CORE INNOVATION

🔢

Why Processing Vectors Beats One-at-a-Time

CONCEPT

VPP's central innovation is processing a batch (vector) of packets through each graph node at once, rather than pushing each packet individually through the whole pipeline. This exploits the CPU microarchitecture in four ways (a contrasting code sketch follows these four points):

I-Cache Efficiency

When the same code path executes for 32 packets in a row, the instruction cache stays warm throughout. With one-at-a-time processing, a node's instructions are evicted during the long gap between packets and must be refetched on every invocation. In practice, VPP's runtime statistics show vector sizes of 16–64 packets as the sweet spot.

Branch Predictor Accuracy

Processing 32 IPv4 packets in a row means the same branches (version==4, ihl==5, no options) execute with the same outcome repeatedly. The CPU branch predictor achieves near-100% accuracy across the vector.

Prefetch Pipelining

While processing packet N, you prefetch packet N+4. The 100ns DRAM latency is hidden behind actual computation. The canonical VPP 4x unrolled loop with prefetch is specifically designed to fill the memory latency gap.

SIMD Opportunity

Processing multiple identical structures (IP headers) in sequence creates opportunities for AVX2/AVX512 SIMD optimisation — operating on 4–8 headers simultaneously. The VPP checksum and hash inner loops exploit this.
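To make the contrast concrete, here is a shape-of-the-code sketch (illustrative pseudo-C, not VPP source; parse/lookup/rewrite are hypothetical stage functions):

/* Scalar: each packet runs the whole pipeline before the next one starts;
   every stage's instructions have gone cold by its next invocation. */
for (int i = 0; i < n_pkts; i++) {
    parse (pkt[i]);      /* I-cache and branch predictor cold each time */
    lookup (pkt[i]);
    rewrite (pkt[i]);
}

/* Vector: each stage sweeps the whole batch while its code is hot. */
for (int i = 0; i < n_pkts; i++) parse (pkt[i]);     /* warm after pkt[0] */
for (int i = 0; i < n_pkts; i++) lookup (pkt[i]);    /* warm after pkt[0] */
for (int i = 0; i < n_pkts; i++) rewrite (pkt[i]);   /* warm after pkt[0] */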

/* Vector size measurement */
show run
# Thread 1 vpp_wk_0:
#  Name               Calls  Vectors  Clocks   Vectors/Call
#  dpdk-input           100   3200    8.7e3     32.0
#  ip4-input            100   3200    1.9e3     32.0
#  ip4-lookup           100   3200    2.8e3     32.0
#  ip4-rewrite          100   3200    1.4e3     32.0
#
# Vectors/Call = average batch size (32 = a full, healthy batch on most hardware)
# Clocks in this sample are per call (one frame); divide by Vectors/Call for per-packet cost:
#   ip4-lookup: 2800 clocks / 32 packets = 87.5 clocks/packet = ~30ns at 3GHz

GRAPH NODE FRAMEWORK — PACKET PIPELINE ARCHITECTURE

🕸️

Nodes, Frames, and Packet Flow

GRAPH
/* VPP graph: directed graph of processing nodes (cycles are legal: tunnel decap re-enters ip4-input) */
/* Each edge carries a vlib_frame_t — an array of buffer indices */

Default IP4 forwarding path:
  dpdk-input → ethernet-input → ip4-input → ip4-lookup → ip4-rewrite → interface-output

With ACL and NAT inserted:
  dpdk-input → ethernet-input → ip4-input
    → [ip4-unicast feature arc]:
        acl-plugin-in-ip4-fa     (ingress ACL + conntrack)
        nat44-ed-in2out           (NAT inbound)
    → ip4-lookup → ip4-rewrite
    → [ip4-output feature arc]:
        nat44-ed-out2in-worker    (NAT outbound)
        acl-plugin-out-ip4-fa     (egress ACL)
    → interface-output

/* Node types */
VLIB_NODE_TYPE_INPUT:    Poll loop entry (dpdk-input, tap-inject)
VLIB_NODE_TYPE_INTERNAL: Processing nodes (ip4-lookup, acl-plugin)
VLIB_NODE_TYPE_PRE_INPUT: Runs before INPUT (for scheduling)
VLIB_NODE_TYPE_PROCESS:  Background process threads
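
/* A minimal PROCESS-node sketch: a cooperative background task
 * (illustrative names; e.g. aging out NGFW sessions once a second) */
static uword
session_ager_fn (vlib_main_t *vm, vlib_node_runtime_t *rt, vlib_frame_t *f)
{
    while (1) {
        vlib_process_wait_for_event_or_clock (vm, 1.0 /* seconds */);
        /* walk the session table here, expire idle entries */
    }
    return 0;
}
VLIB_REGISTER_NODE (session_ager_node) = {
    .function = session_ager_fn,
    .type     = VLIB_NODE_TYPE_PROCESS,
    .name     = "session-ager",
};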

/* vlib_frame_t — the unit of work between nodes */
typedef struct {
    u16  n_vectors;         /* number of packets in this frame */
    u32  vector_offset;     /* offset to u32[] array of buffer indices */
} vlib_frame_t;

/* Get the array of buffer indices from a frame */
u32 *bufs = vlib_frame_vector_args(frame);
/* bufs[0..n_vectors-1] are indices into vlib_main.buffer_pool */

/* Get packet data from a buffer index */
vlib_buffer_t *b = vlib_get_buffer(vm, bufs[0]);
ip4_header_t  *ip = vlib_buffer_get_current(b);
/* vlib_buffer_get_current(b) = b->data + b->current_data */

/* Key node commands */
show vlib graph           # all nodes and their next-node connections
show vlib graph ip4-input # next nodes of ip4-input
show run                  # per-node performance (vectors, clocks)
show errors               # error counters per node

WRITING A VPP PLUGIN — THE CANONICAL PATTERN

🔌

Minimal Plugin with the 4x Unroll Pattern

PLUGIN
/* my_node.c — packet counter plugin with canonical 4x loop */
#include <vnet/vnet.h>
#include <vnet/plugin/plugin.h>
#include <vpp/app/version.h>

VLIB_PLUGIN_REGISTER() = {
    .version     = VPP_BUILD_VER,
    .description = "Packet counter plugin",
};

typedef enum { MY_NEXT_IP4_LOOKUP, MY_NEXT_DROP, MY_N_NEXT } my_next_t;

typedef struct {
    u64 pkt_count[VLIB_MAX_WORKERS + 1];  /* per-thread, no locking */
} my_main_t;
my_main_t my_main;

VLIB_NODE_FN(my_counter_node)(vlib_main_t *vm,
                               vlib_node_runtime_t *node,
                               vlib_frame_t *frame)
{
    u32 n_left = frame->n_vectors;
    u32 *from  = vlib_frame_vector_args(frame);
    u16 nexts[VLIB_FRAME_SIZE], *next = nexts;
    u64 pkts = 0;

    /* ── 4x unrolled loop with prefetch ─────────────── */
    while (n_left >= 8) {
        /* Prefetch packet data 4 ahead */
        vlib_prefetch_buffer_with_index(vm, from[4], LOAD);
        vlib_prefetch_buffer_with_index(vm, from[5], LOAD);
        vlib_prefetch_buffer_with_index(vm, from[6], LOAD);
        vlib_prefetch_buffer_with_index(vm, from[7], LOAD);

        /* Get 4 buffers (a real node would inspect headers here;
           this counter never touches packet data, so they go unused) */
        vlib_buffer_t *b0 = vlib_get_buffer(vm, from[0]);
        vlib_buffer_t *b1 = vlib_get_buffer(vm, from[1]);
        vlib_buffer_t *b2 = vlib_get_buffer(vm, from[2]);
        vlib_buffer_t *b3 = vlib_get_buffer(vm, from[3]);
        (void)b0; (void)b1; (void)b2; (void)b3;   /* silence unused-variable warnings */

        next[0] = next[1] = next[2] = next[3] = MY_NEXT_IP4_LOOKUP;
        from += 4; next += 4; n_left -= 4; pkts += 4;
    }
    /* ── Scalar tail ─────────────────────────────────── */
    while (n_left > 0) {
        next[0] = MY_NEXT_IP4_LOOKUP;
        from++; next++; n_left--; pkts++;
    }

    my_main.pkt_count[vm->thread_index] += pkts;

    vlib_buffer_enqueue_to_next(vm, node,
        vlib_frame_vector_args(frame), nexts, frame->n_vectors);
    return frame->n_vectors;
}

VLIB_REGISTER_NODE(my_counter_node) = {
    .name          = "my-counter",
    .vector_size   = sizeof(u32),
    .type          = VLIB_NODE_TYPE_INTERNAL,
    .n_next_nodes  = MY_N_NEXT,
    .next_nodes    = {
        [MY_NEXT_IP4_LOOKUP] = "ip4-lookup",
        [MY_NEXT_DROP]       = "error-drop",
    },
};

/* Insert into ip4-unicast feature arc on an interface */
/* vnet_feature_enable_disable("ip4-unicast", "my-counter", sw_if_index, 1, 0, 0); */

/* CMakeLists.txt */
# add_vpp_plugin(my_plugin SOURCES my_node.c API_FILES my_plugin.api)
# Plugins auto-loaded from /usr/lib/vpp_plugins/ at VPP startup

💡 The 4x unroll + prefetch pattern is canonical VPP. Every performance-critical node in VPP core uses this exact structure. The prefetch distance of 4 is tuned for typical L1/L2 miss latency (~60–100ns) on Intel Xeon. Copy this template when writing your own DPI or NGFW nodes.
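A quick sanity check on that distance (rough arithmetic, assuming ~3 GHz so 100ns ≈ 300 cycles):

# per-packet work in a typical node:  ~80 cycles (see the show run sample above)
# prefetch issued 4 packets early:    4 × ~80 ≈ 320 cycles of other work elapse
# DRAM miss to hide:                  ~300 cycles
# → the prefetched line lands just in time; a distance of 4 barely covers the miss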

VPP FIB — THE MULTI-LAYER FORWARDING DATABASE

🗺️

VPP FIB Architecture

FIB
/* VPP FIB is a three-layer structure */

Layer 1: IP4 FIB table (per VRF)
  Hash table → O(1) exact match for /32 host routes
  mtrie       → LPM for all other prefixes (4-level trie, 8 bits/level)

Layer 2: Load-Balance (LB) object
  Created when a prefix has multiple equal-cost next-hops (ECMP)
  Contains N hash buckets, each pointing to an adjacency
  Flow-hash over 5-tuple selects bucket (consistent per flow)

Layer 3: Adjacency
  Pre-built rewrite string: "dst_mac src_mac ethertype" (14 bytes)
  Stored as raw bytes — ip4-rewrite just memcpy's directly into packet
  Interface index for output
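
Why the adjacency layer makes ip4-rewrite so cheap, as a toy sketch (toy_adjacency_t and its fields are invented for illustration; VPP's real ip_adjacency_t lives in vnet/adj/ and the rewrite code in vnet/ip/ip4_forward.c):

/* Toy model of the rewrite step */
typedef struct {
    u8  rewrite[14];    /* pre-built: dst_mac(6) + src_mac(6) + ethertype(2) */
    u32 sw_if_index;    /* output interface */
} toy_adjacency_t;

static inline void
toy_ip4_rewrite (vlib_buffer_t *b, toy_adjacency_t *adj)
{
    vlib_buffer_advance (b, -14);   /* step back: make room for the new L2 header */
    clib_memcpy_fast (vlib_buffer_get_current (b), adj->rewrite, 14);
    vnet_buffer (b)->sw_if_index[VLIB_TX] = adj->sw_if_index;  /* choose egress */
}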

/* FIB inspection commands */
show ip fib                     # entire IPv4 FIB (can be huge)
show ip fib table 0             # VRF 0 (default)
show ip fib 10.0.0.0/8         # specific prefix details
show ip fib 8.8.8.8/32         # host route
show ip fib summary             # count of prefixes by length
show ip adjacency               # all adjacency objects
show ip adjacency 42            # specific adjacency: rewrite bytes, interface
show ip adjacency summary       # count by type (glean/rewrite/midchain)

/* Route management */
ip route add 10.0.0.0/8 via 192.168.1.1 GigabitEthernet0/8/0
ip route del 10.0.0.0/8 via 192.168.1.1 GigabitEthernet0/8/0

/* ECMP: add same prefix twice = LB with 2 buckets */
ip route add 10.0.0.0/8 via 192.168.1.1 GigabitEthernet0/8/0
ip route add 10.0.0.0/8 via 192.168.1.2 GigabitEthernet0/8/1
show ip fib 10.0.0.0/8
# Displays: load-balance [index N] buckets 2
#             [0]: adj[via 192.168.1.1 GigE0/8/0]
#             [1]: adj[via 192.168.1.2 GigE0/8/1]

/* Null routes — blackhole */
ip route add 192.0.2.0/24 drop
ip route add 198.51.100.0/24 local  # deliver to local stack

/* Multiple VRFs (for tenant isolation in NGFW) */
ip table add 100
ip route add table 100 0.0.0.0/0 via 10.100.0.1 GigabitEthernet0/8/0
set interface ip table GigabitEthernet0/8/2 100  # assign interface to VRF 100

VAPI AND CLI — CONTROLLING VPP FROM CODE AND SCRIPTS

🔌

VPP Control Plane Interfaces

VAPI
/* Three control interfaces */

1. vppctl CLI — interactive and scripted
   vppctl show version
   vppctl ip route add 0.0.0.0/0 via 10.0.0.1
   echo "show ip fib summary" | vppctl
   vppctl exec /etc/vpp/setup.vpp   # run a config file

2. Python VAPI — programmatic automation
import vpp_papi
from vpp_papi import VPP

vpp = VPP(['/usr/share/vpp/api/vpe.api.json',
           '/usr/share/vpp/api/interface.api.json',
           '/usr/share/vpp/api/ip.api.json'])
vpp.connect('my-control-app')

# Show version
rv = vpp.api.show_version()
print(f"VPP version: {rv.version.decode()}")

# Add an IP route
from ipaddress import ip_address, ip_network
rv = vpp.api.ip_route_add_del(
    is_add=1,
    route={
        'prefix': {'address': {'af': 0, 'un': {'ip4': b'\x00\x00\x00\x00'}},
                   'len': 0},
        'n_paths': 1,
        'paths': [{'nh': {'address': {'af': 0,
                                       'un': {'ip4': b'\x0a\x00\x00\x01'}}},
                   'sw_if_index': 1,
                   'proto': 0}]
    }
)

# Create loopback interface
rv = vpp.api.create_loopback()
sw_if_index = rv.sw_if_index
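
# While still connected, dump all interfaces (one reply message per interface).
# Note: interface_name is str in recent vpp_papi releases, bytes in older ones.
for intf in vpp.api.sw_interface_dump():
    print(intf.sw_if_index, intf.interface_name)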

vpp.disconnect()

3. VAT2 (JSON-based API test tool)
   # vat2 show_version
   # vat2 show_interface sw_if_index 0

/* Useful diagnostic commands */
show interface                    # all interfaces, TX/RX stats
show hardware-interfaces          # NIC capabilities, link state
show run                          # node performance (vectors/call, clocks)
show run summary                  # top CPU-consuming nodes
show errors                       # drop counters per node
show buffers                      # mempool usage
show threads                      # worker thread info and CPU pinning
show plugins                      # loaded plugins
show log                          # VPP log buffer

VPP AS AN NGFW DATA PLANE

🛡️

Building an NGFW Data Plane on VPP

NGFW

VPP's feature arc system lets you insert custom processing nodes into the packet pipeline without modifying VPP core. The ip4-unicast arc is the primary insertion point for NGFW functions on inbound IPv4 traffic.

/* NGFW pipeline using VPP feature arcs */

ip4-input
  ↓ [ip4-unicast feature arc — ordered by feature weight]
  ├── acl-plugin-in-ip4-fa      (stateful conntrack + ACL rules)
  ├── nat44-ed-in2out            (DNAT / inbound NAT)
  ├── ipsec-input-ip4            (IPsec decrypt)
  └── YOUR-NGFW-DPI-NODE         (your custom DPI plugin)
  ↓
ip4-lookup → ip4-rewrite
  ↓ [ip4-output feature arc]
  ├── nat44-ed-out2in-worker     (SNAT / outbound NAT)
  └── acl-plugin-out-ip4-fa     (egress ACL)
  ↓
interface-output

/* Enable your plugin on an interface */
vnet_feature_enable_disable("ip4-unicast", "my-ngfw-dpi", sw_if_index, 1, 0, 0);
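
/* How the node gets onto the arc in the first place: a registration
 * sketch using VPP's VNET_FEATURE_INIT macro (the ordering constraint
 * shown is one plausible choice, not mandated): */
VNET_FEATURE_INIT (my_ngfw_dpi_feature, static) = {
    .arc_name    = "ip4-unicast",
    .node_name   = "my-ngfw-dpi",
    .runs_before = VNET_FEATURES ("ip4-lookup"),
};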

/* VPP ACL plugin — built-in stateful firewall */
# Create an ACL (permit HTTPS, permit HTTP, deny all)
acl_add_replace acl_index 0 r {is_permit 1 proto 6 dst_port 443 443 dst_ip 0.0.0.0/0},
                               {is_permit 1 proto 6 dst_port 80  80  dst_ip 0.0.0.0/0},
                               {is_permit 0}

# Apply to interface (inbound = filter traffic entering through eth0)
set acl-list interface GigabitEthernet0/8/0 input 0

/* VPP NAT44 — stateful NAT */
nat44 enable sessions 65536
set interface nat44 in GigabitEthernet0/8/0 out GigabitEthernet0/8/1
nat44 add interface address GigabitEthernet0/8/1

/* Connection tracking for custom node */
/* Access conntrack state from within your node: */
clib_bihash_kv_16_8_t kv;
/* Key = packed 5-tuple; value = index into the session pool */
if (!clib_bihash_search_16_8(&ngfw_main.session_table, &kv, &kv)) {
    ngfw_session_t *s = pool_elt_at_index(ngfw_main.sessions, kv.value);
    /* session found — check state, increment counters */
}

💡 VPP clib_bihash is your primary data structure for session tables. It's a cache-friendly concurrent hash table (lockless readers, fine-grained bucket locking for writers) that VPP uses internally for ARP, FIB, and conntrack. For your NGFW session table keyed on 5-tuple, clib_bihash_16_8 (16-byte key = 5-tuple, 8-byte value = session index) achieves ~100ns lookups at millions of sessions, far beyond typical kernel-side conntrack performance.
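
A minimal setup sketch for such a table (struct layout, sizing, and the ngfw_main names are illustrative assumptions; include <vppinfra/bihash_16_8.h>):

/* Key must be exactly 16 bytes for clib_bihash_16_8 */
typedef struct {
    u32 src_ip, dst_ip;        /* IPv4 addresses, network byte order */
    u16 src_port, dst_port;    /* L4 ports */
    u8  proto;
    u8  pad[3];                /* zero this: padding bytes are hashed too */
} ngfw_5tuple_t;               /* 4+4+2+2+1+3 = 16 bytes */
STATIC_ASSERT_SIZEOF (ngfw_5tuple_t, 16);

/* One-time init, e.g. in the plugin's VLIB_INIT_FUNCTION */
clib_bihash_init_16_8 (&ngfw_main.session_table, "ngfw-sessions",
                       64 << 10 /* buckets */, 1ULL << 30 /* 1 GB heap */);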

VPP PERFORMANCE ANALYSIS TOOLS

📊

Reading show run and Diagnosing Bottlenecks

PERF TOOLS
/* show run output — interpreting the numbers */
vppctl show run
# Thread 1 vpp_wk_0 (lcore 2):
#   Name               State  Calls  Vectors  Clocks      Vec/Call  Clk/Vec
#   dpdk-input         active  1000  32000    8.70e+06     32.0     272
#   ip4-input          active  1000  32000    1.92e+06     32.0      60
#   ip4-lookup         active  1000  32000    2.84e+06     32.0      89
#   ip4-rewrite        active  1000  32000    1.44e+06     32.0      45
#   my-ngfw-dpi        active  1000  32000    9.60e+06     32.0     300

# Clk/Vec = CPU cycles per packet in this node (at 3GHz: 300 cycles = 100ns)
# Sum of all Clk/Vec = total cycles per packet through the pipeline
# my-ngfw-dpi is the bottleneck here (300 cycles vs 60-89 for built-ins)
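# Worked budget from the sample above (assuming a 3 GHz worker core):
#   total pipeline cost = 272 + 60 + 89 + 45 + 300 = 766 cycles/packet
#   max rate ≈ 3.0e9 / 766 ≈ 3.9 Mpps/core
#   halving my-ngfw-dpi to 150 cycles → 3.0e9 / 616 ≈ 4.9 Mpps/core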

/* Optimisation workflow */
1. Run: vppctl clear run; sleep 5; vppctl show run
2. Identify highest Clk/Vec node (your bottleneck)
3. Check: are we prefetching? 4x unrolled? NUMA-local memory?
4. Profile: perf stat -e cycles,cache-misses -C 2 sleep 5
5. Check vector sizes: Vec/Call < 8 = under-loaded (frames dispatch before a full batch builds up; raise offered load before judging per-packet cost)

/* show errors — drop counter diagnosis */
vppctl show errors
# ip4-input: ip4 src address is multicast    12
# ip4-input: ip4 spoofed local-address       5
# acl-plugin-in-ip4-fa: ACL deny packets  4821

/* Buffer pressure — detect mempool exhaustion */
vppctl show buffers
# If "allocated" approaches "total": mempool running low → increase num-mbufs

/* Per-interface counters */
vppctl show interface GigabitEthernet0/8/0
# RX packets/bytes, TX packets/bytes, drops, errors
vppctl clear interfaces   # reset counters

/* Packet capture in VPP (pcap trace) */
pcap dispatch trace on max 1000 file vpp.pcap   # filename only; VPP writes /tmp/vpp.pcap
# ... generate traffic ...
pcap dispatch trace off
# Open /tmp/vpp.pcap in Wireshark — shows packet at each graph node!

LAB 1

VPP from Zero to Forwarding Packet

Objective: Install VPP, configure interfaces and routing, verify packet forwarding, explore the FIB and graph.

1
Install VPP: sudo apt install vpp vpp-plugin-core vpp-plugin-dpdk. Start VPP and verify: sudo vppctl show version. Use tap interfaces for testing (no physical NIC required): vppctl create tap id 0 and vppctl create tap id 1 create VPP-side interfaces named tap0 and tap1, with matching Linux-side taps.
2
Configure interfaces: vppctl set interface state tap0 up, vppctl set interface ip address tap0 10.1.0.1/24. Add a static route: vppctl ip route add 10.2.0.0/24 via 10.1.0.2 tap0. Inspect the FIB: vppctl show ip fib. Find the adjacency for your route: vppctl show ip adjacency.
3
Explore the graph: vppctl show vlib graph ip4-input — note the next nodes. Generate traffic (ping through tap interface) and run vppctl show run. Identify which nodes execute and their Clk/Vec values. Calculate: at your measured Clk/Vec, what is the maximum Mpps per core?
4
Test ACL: create a deny-all ACL and apply to tap0 inbound: vppctl acl_add_replace acl_index 0 r {is_permit 0}, vppctl set acl-list interface tap0 input 0. Verify pings are dropped. Check: vppctl show errors — see the ACL deny counter increment.
LAB 2

Write a Custom VPP Counter Plugin

Objective: Write, build, and load a VPP plugin that counts packets per source IP using clib_bihash.

1
Set up the VPP development environment: sudo apt install vpp-dev. Create a plugin directory structure: my_plugin/CMakeLists.txt and my_plugin/my_node.c. Use the plugin template from the "Writing a VPP Plugin" section above.
2
Extend the template: add a clib_bihash_8_8_t (key = source IP as u64, value = packet count u64). In the processing loop, extract the source IP from the IP header, look it up (or insert it) in the hash table, and increment the count. Handle IPv4 only; pass all packets to ip4-lookup. A sketch of this logic follows the lab steps.
3
Add a CLI command to display the top-10 source IPs by packet count. Register with: VLIB_CLI_COMMAND(show_top_sources_cmd, static) = { .path = "show ngfw top-sources", .function = show_top_sources_fn }.
4
Build: mkdir build && cd build && cmake .. && make. Copy the .so to VPP plugin directory. Restart VPP and verify the plugin loads: vppctl show plugins | grep my. Enable on an interface, generate traffic, and run your CLI command.
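
A possible shape for step 2's per-packet logic (a sketch assuming a clib_bihash_8_8_t named src_table in my_main, initialised once in the plugin's init function; b0 is the buffer from the node's loop):

/* Count packets by source IP (scalar form; fold into the 4x loop) */
ip4_header_t *ip = vlib_buffer_get_current (b0);
clib_bihash_kv_8_8_t kv;
kv.key = (u64) ip->src_address.as_u32;
if (clib_bihash_search_8_8 (&my_main.src_table, &kv, &kv) == 0)
    kv.value += 1;              /* known source: bump its count */
else
    kv.value = 1;               /* first packet from this source */
clib_bihash_add_del_8_8 (&my_main.src_table, &kv, 1 /* is_add */);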
LAB 3

NGFW Prototype — ACL + NAT + Custom Node

Objective: Assemble a minimal NGFW data plane with VPP's ACL plugin, NAT44, and your custom counter node all operating in the same pipeline.

1
Configure VPP with two interfaces: inside (tap0, 10.1.0.1/24) and outside (tap1, 203.0.113.1/24). Enable NAT44: nat44 enable sessions 1024, set interface nat44 in tap0 out tap1, nat44 add interface address tap1.
2
Apply an ACL on the inside interface: permit TCP 443, permit TCP 80, permit ICMP, deny all else. Test: verify HTTP/HTTPS traffic passes, Telnet (port 23) is dropped. Check show errors for ACL deny counts.
3
Enable your counter plugin from Lab 2 on the inside interface. Generate mixed traffic (ICMP, TCP 80, TCP 443). Run your show ngfw top-sources command and verify counts. Use show run to confirm your node's Clk/Vec — compare it to the built-in ACL node.
4
Capture packets at each stage using VPP's pcap trace: pcap dispatch trace on max 500 file vpp.pcap (VPP writes it to /tmp/vpp.pcap). Open in Wireshark and identify the same packet at different graph nodes. Observe: pre-NAT vs post-NAT IP addresses confirming NAT rewrote the packet.

M18 MASTERY CHECKLIST

🎉 Phase 4 Complete — Linux Networking and Socket Programming

You have completed all 5 modules of Phase 4: Linux Network Stack (M14), Socket Programming (M15), eBPF and XDP (M16), DPDK (M17), and VPP (M18). You now have a complete and deep understanding of the Linux networking toolkit from kernel internals to the most advanced data-plane frameworks. Move to Phase 5 — Security Protocols, starting with M19 - Cryptography Foundations.
