THE FUNDAMENTAL SHIFT
Scalar vs Vector Packet Processing
Scalar (traditional stacks): One packet enters the stack, traverses all processing stages, exits. Then the next packet starts. Every packet re-warms the CPU instruction cache from scratch.
Vector (VPP's model): A batch of packets - the vector - enters a single graph node together. That node processes all N packets before any packet moves to the next node. The first packet in the batch warms the I-cache; every subsequent packet in the batch benefits at zero cost.
```
// Scalar processing - per-packet cache thrash
for each packet:
    ip4_lookup(pkt)        // I-cache warm
    ip4_rewrite(pkt)       // I-cache cold again
    ethernet_output(pkt)

// Vector processing - VPP's model
ip4_lookup(pkt[0..255])        // warm once, amortised over 256 pkts
ip4_rewrite(pkt[0..255])       // warm once, amortised over 256 pkts
ethernet_output(pkt[0..255])   // warm once, amortised over 256 pkts
```
This single architectural decision - processing a vector of packets per node invocation - gives VPP its performance edge. It enables prefetching, SIMD vectorisation, and well-trained branch prediction - optimisations that simply cannot happen one packet at a time.
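To see why, here is a minimal standalone C sketch (stage functions and names are invented for illustration, not VPP source) contrasting the two dispatch orders. The vector version also shows the next-packet prefetch that only batching makes possible:

```c
#include <stddef.h>

#define VEC_SZ 256

typedef struct { unsigned char data[64]; } pkt_t;

static void ip4_lookup (pkt_t *p)      { p->data[0] ^= 1; } /* stand-in work */
static void ip4_rewrite (pkt_t *p)     { p->data[1] ^= 1; }
static void ethernet_output (pkt_t *p) { p->data[2] ^= 1; }

/* Scalar: every packet runs all stages before the next packet starts,
 * so each stage's instructions are re-fetched per packet. */
static void scalar_run (pkt_t *pkts, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      ip4_lookup (&pkts[i]);
      ip4_rewrite (&pkts[i]);
      ethernet_output (&pkts[i]);
    }
}

/* Vector: one stage sweeps the whole batch. The first packet warms the
 * I-cache for all that follow, and prefetching packet i+1 while packet i
 * is processed hides the D-cache miss - impossible one packet at a time. */
static void vector_run (pkt_t *pkts, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      if (i + 1 < n)
        __builtin_prefetch (&pkts[i + 1]);
      ip4_lookup (&pkts[i]);
    }
  for (size_t i = 0; i < n; i++)
    ip4_rewrite (&pkts[i]);
  for (size_t i = 0; i < n; i++)
    ethernet_output (&pkts[i]);
}

int main (void)
{
  pkt_t batch[VEC_SZ] = { 0 };
  scalar_run (batch, VEC_SZ);
  vector_run (batch, VEC_SZ);
  return 0;
}
```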
- `rte_eth_rx_burst()` is VPP's equivalent of "get a vector of packets" - you already use burst RX for the same reason
- The PMD poll loop maps to VPP's input-node polling: both spin on hardware without interrupts
- The `rte_mbuf**` array from `rx_burst` ≈ VPP's `vlib_frame_t` of buffer indices - a batch of packet references processed together
- VPP generalises the single DPDK burst loop into a chain of N graph nodes, each processing the same batch (see the sketch after this list)
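For DPDK readers, the correspondence in code - a hedged sketch that assumes DPDK headers on the include path; the loop body is a placeholder:

```c
#include <stdint.h>
#include <rte_ethdev.h>  /* rte_eth_rx_burst(), struct rte_mbuf */

#define BURST 256

/* Classic DPDK app: one burst, then a per-packet loop over every stage. */
static void dpdk_style_poll (uint16_t port)
{
  struct rte_mbuf *pkts[BURST];
  uint16_t n = rte_eth_rx_burst (port, 0 /* queue */, pkts, BURST);

  for (uint16_t i = 0; i < n; i++)
    {
      /* lookup + rewrite + tx: all stages for pkts[i] before pkts[i+1].
       * VPP instead hands the whole n-packet frame to node 1, then the
       * same frame to node 2, and so on down the graph. */
    }
}
```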
The Packet Processing Graph - Core Mental Model
VPP's dataplane is a directed graph of processing nodes. Each node is a C function. Packets (as buffer indices) flow along graph edges. A single packet traversal from RX to TX typically looks like:
```
dpdk-input → ethernet-input → ip4-input → ip4-lookup (FIB lookup → next-hop)
  → ip4-rewrite (rewrite L2 header) → dpdk-output (TX to NIC)
```
The graph is not acyclic - a packet can re-visit ip4-lookup multiple times (e.g., MPLS label push/pop). For every packet, a node emits a next index that selects the outgoing edge.
- Nodes communicate via `vlib_frame_t`: arrays of u32 buffer indices, not pointers
- All nodes for a given phase run to completion before the next phase begins
- The graph dispatcher (`vlib_main_loop`) drives everything - you never write a main loop (a toy sketch of the dispatch mechanics follows this list)
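A toy model of those mechanics - all names invented for illustration (the real structures live in src/vlib/node.h): a node consumes a frame of u32 buffer indices and steers each packet onto an outgoing edge via a next index:

```c
#include <stdint.h>
#include <stdio.h>

enum { NEXT_REWRITE, NEXT_DROP, N_NEXT }; /* this node's outgoing edges */

/* A frame: a batch of buffer indices, as in vlib_frame_t. */
typedef struct { uint32_t buffers[4]; uint32_t n; } frame_t;

/* The node body: classify each packet and append its buffer index to the
 * frame of the chosen next node. */
static void ip4_lookup_node (frame_t *in, frame_t out[N_NEXT])
{
  for (uint32_t i = 0; i < in->n; i++)
    {
      /* hypothetical routing decision: odd buffer indices have no route */
      uint32_t next = (in->buffers[i] & 1) ? NEXT_DROP : NEXT_REWRITE;
      out[next].buffers[out[next].n++] = in->buffers[i];
    }
}

int main (void)
{
  frame_t in = { { 10, 11, 12, 13 }, 4 };
  frame_t out[N_NEXT] = { 0 };

  ip4_lookup_node (&in, out);
  printf ("to ip4-rewrite: %u pkts, to error-drop: %u pkts\n",
          (unsigned) out[NEXT_REWRITE].n, (unsigned) out[NEXT_DROP].n);
  return 0;
}
```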
💡 Key insight - why u32 indices, not pointers? A u32 is 4 bytes; a pointer is 8. A frame of 256 packet references is 1 KB with indices vs 2 KB with pointers, so the whole frame stays resident in far fewer cache lines. Buffer pool base address + index recovers the pointer at any time - the dereference costs one multiply and one add.
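The arithmetic in miniature - pool layout and stride are invented here; in VPP the real conversion is `vlib_get_buffer()` in src/vlib/buffer_funcs.h:

```c
#include <stdint.h>
#include <stdio.h>

#define BUF_STRIDE 2048                     /* hypothetical per-buffer spacing */

static unsigned char pool[16 * BUF_STRIDE]; /* stand-in for a buffer pool */

/* base + index * stride: the u32 index becomes a pointer with one
 * multiply (a shift, for power-of-two strides) and one add. */
static inline void *get_buffer (uint32_t bi)
{
  return pool + (size_t) bi * BUF_STRIDE;
}

int main (void)
{
  uint32_t bi = 3;
  printf ("buffer %u lives at %p\n", (unsigned) bi, get_buffer (bi));
  return 0;
}
```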
IMPLEMENTATION TAXONOMY
VPP is organised in five layers, top to bottom:

- **vpp** - Container application: the `vpp` binary itself. Ties all layers together, runs the main loop, loads plugins. Source: `src/vpp/`
- **plugins** - Shared libraries loaded at startup. DPDK, memif, NAT, ACL, GTP, QUIC - all of these are plugins, and your own features go here too. Source: `src/plugins/`
  Key plugins: `dpdk_plugin.so`, `memif_plugin.so`, `nat_plugin.so`, `acl_plugin.so`, `af_xdp_plugin.so`
- **vnet** - Networking layer: L2/L3/L4 graph nodes, interface abstraction (`sw_if_index`), FIB, ARP, neighbour tables, session layer. Source: `src/vnet/`
  Key subdirs: `src/vnet/ip/`, `src/vnet/ethernet/`, `src/vnet/fib/`, `src/vnet/devices/`
- **vlib** - Vector processing library: graph node scheduler, buffer management, cooperative threads (process nodes), CLI, packet tracing, counters. Source: `src/vlib/`
  Key files: `src/vlib/main.c` (dispatch loop), `src/vlib/node.h`, `src/vlib/buffer.h`
- **vppinfra** - Core library, VPP's "libc": memory allocators, vectors, pools, hash tables, ring buffers, format/unformat, timers. Everything else is built on top of this (see the sketch below). Source: `src/vppinfra/`
  Key files: `pool.h`, `vec.h`, `hash.h`, `bihash_8_8.h`, `clib.h`, `format.h`
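As a first taste of that bottom layer, a vppinfra vector in action - a sketch that compiles only inside a VPP build tree (vppinfra on the include path) and assumes the usual clib heap initialisation:

```c
#include <vppinfra/mem.h>
#include <vppinfra/vec.h>

int main (void)
{
  clib_mem_init (0, 64 << 20);  /* vppinfra allocators need a heap first */

  u32 *v = 0;          /* a vppinfra vector starts life as a null pointer */
  vec_add1 (v, 42);    /* append one element, reallocating as needed */
  vec_add1 (v, 43);
  ASSERT (vec_len (v) == 2);  /* the length lives in a header before v[0] */
  vec_free (v);
  return 0;
}
```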
Source Repository Layout
```
github.com/FDio/vpp
├── src/vppinfra/   # Core library: vec.h, pool.h, hash.h, bihash_*.h
├── src/vlib/       # Graph dispatcher: main.c, node.h, buffer.h, threads.c
├── src/vnet/       # Networking: ip/, ethernet/, fib/, devices/, feature/
├── src/plugins/    # Plugins: dpdk/, memif/, nat/, acl/, af_xdp/, linux-cp/
├── src/vpp/        # Container binary: app/vpe_cli.c
├── src/vpp-api/    # API bindings: python/vpp_papi/, .api.json files
├── src/svm/        # Shared virtual memory
├── src/examples/   # Sample plugin, handoff demo
└── test/           # Python test framework: test_*.py
```
When you explore a new VPP subsystem, start by reading the .h file - it contains the data structures and macro definitions. The .c file contains the implementations. API definitions live in .api files alongside each plugin.
BUILD FROM SOURCE
Building VPP
Always build from source for development. Binary packages hide important details. The VPP build system is CMake-based with a convenience Makefile wrapper.
```bash
# Clone the repo
git clone https://github.com/FDio/vpp.git && cd vpp

# Install build dependencies (Ubuntu 22.04)
make install-dep

# Debug build - has symbols, ASAN-compatible, slower
make build

# Release/optimised build - production performance
make build-release

# Run debug VPP interactively (reads /etc/vpp/startup.conf)
make run

# Run under GDB for debugging
make run-gdb

# Run full test suite
make test

# Run a specific test
make test TEST=test_nat
```
- Debug binary lives at: `build-root/install-vpp_debug-native/vpp/bin/vpp`
- Release binary: `build-root/install-vpp-native/vpp/bin/vpp`
- Plugins: compiled as `.so` files, loaded from the plugin directory at startup
DOCKER + AMD + MELLANOX SETUP
Container Setup for Mellanox Ports
Your environment: Docker containers on an AMD server with Mellanox Ethernet ports. VPP needs privileged access to hugepages, VFIO devices, and the PCI bus. The following setup gives VPP everything it needs.
```bash
# Step 1: Allocate hugepages on the host (2 MB pages)
echo 2048 | sudo tee /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
sudo mkdir -p /dev/hugepages
sudo mount -t hugetlbfs nodev /dev/hugepages

# Step 2: Bind ports to vfio-pci (use the PCI addresses from lspci).
# Note: mlx5 NICs can stay on mlx5_core - see the first bullet below.
sudo dpdk-devbind.py --status                     # find PCI addresses
sudo dpdk-devbind.py --bind vfio-pci 0000:03:00.0
sudo dpdk-devbind.py --bind vfio-pci 0000:03:00.1

# Step 3: Run the VPP container with all required resources
docker run --privileged --network host \
  -v /dev/hugepages:/dev/hugepages \
  -v /sys/bus/pci:/sys/bus/pci \
  -v /run/vpp:/run/vpp \
  -v /dev/vfio:/dev/vfio \
  -v /etc/vpp:/etc/vpp \
  -it ubuntu:22.04 /bin/bash
```
- mlx5 PMD: Mellanox ConnectX-4/5/6 use the mlx5 poll-mode driver, and VPP's DPDK plugin includes mlx5 support. No vfio-pci binding is needed for mlx5 - the PMD works through the kernel `mlx5_core` driver (DPDK's bifurcated model)
- IOVA mode: for Mellanox with DPDK, use VA mode - set `dpdk { iova-mode va }` in startup.conf
- SR-IOV VFs: for multi-container setups, create VFs on the PF and pass one VF per container - the same workflow as standard DPDK SR-IOV
- No KNI: VPP does not use DPDK KNI. Use a tap v2 interface or the linux-cp plugin for Linux kernel access
STARTUP CONFIGURATION
startup.conf - Every Stanza Explained
startup.conf is VPP's single configuration file, read at launch. It controls process behaviour, CPU pinning, DPDK ports, buffer pools, and plugin loading. Here is a production-annotated example for your environment:
```
unix {
  nodaemon                            # run in foreground (good for containers)
  log /var/log/vpp/vpp.log
  full-coredump                       # core dumps on crash
  cli-listen /run/vpp/cli.sock        # vppctl connects here
  startup-config /etc/vpp/setup.gate  # CLI commands run at startup
}

api-trace {
  on                                  # record API calls (for replay debugging)
}

cpu {
  main-core 0                         # pin main thread to core 0
  corelist-workers 2-5                # 4 workers on cores 2-5
  # corelist-workers 2,4,6,8          # non-contiguous cores also OK
}

dpdk {
  dev 0000:03:00.0 {                  # Mellanox port 0
    num-rx-queues 4                   # 1 queue per worker thread
    num-tx-queues 4
    num-rx-desc 1024
    num-tx-desc 1024
  }
  dev 0000:03:00.1 {                  # Mellanox port 1
    num-rx-queues 4
    num-tx-queues 4
  }
  uio-driver vfio-pci
  iova-mode va                        # required for Mellanox mlx5
  socket-mem 1024,1024                # 1 GB per NUMA socket
  no-multi-seg                        # disable jumbo unless needed
  log-level notice
}

buffers {
  buffers-per-numa 128000             # buffer pool size per NUMA node
  default-data-size 2048              # buffer data area in bytes
                                      # use 10240 for jumbo/MTU 9000
}

plugins {
  path /usr/lib/x86_64-linux-gnu/vpp_plugins
  plugin dpdk_plugin.so { enable }
  plugin memif_plugin.so { enable }
  # plugin some_plugin.so { disable }
}

statseg {
  size 128m                           # stats segment size
  per-node-counters on
}
```
Key rules:

- `corelist-workers` - size workers against RX queues: the total RX queue count across all interfaces should be a multiple of the worker count, so every worker polls the same number of queues (above: 2 ports × 4 queues = 8 queues over 4 workers, i.e. 2 queues per worker)
- `socket-mem` uses hugepages - they must be pre-allocated on the host before the container starts
- `buffers-per-numa` - if you see buffer allocation failures in the logs, increase this
- `startup-config` - put CLI commands here (set interface state, add routes) for auto-configuration at boot
ESSENTIAL CLI COMMANDS
vppctl - Your Primary Interface
vppctl connects to VPP's Unix socket (/run/vpp/cli.sock) and sends CLI commands. You can use it interactively or pipe commands:
```bash
vppctl                      # interactive shell
vppctl show version         # single command
echo "show run" | vppctl    # pipe
```
| Command | What It Shows / Does | Use When |
|---|---|---|
| `show version` | VPP version, build date, plugins loaded | First thing after starting VPP |
| `show plugins` | All loaded plugins with versions | Verify dpdk_plugin, memif_plugin loaded |
| `show interface` | All interfaces: state, RX/TX packet+byte counters, error counts | Check interface is up, count packets |
| `show run` | Per-node stats: calls, vectors processed, suspends, clocks/vector | Most important perf view - check vectors/call |
| `show buffers` | Buffer pool utilisation per NUMA node | Check for buffer starvation (free < 20%) |
| `show error` | Error counter table: which nodes are dropping and why | Debug drops - e.g. "ip4 source lookup miss" |
| `show ip fib` | FIB routing table: all prefixes and their DPO chains | Verify routes are programmed correctly |
| `show ip neighbors` | ARP/ND neighbour table | Check ARP resolution |
| `trace add dpdk-input 100` | Capture the next 100 packets entering from DPDK input | Start trace before sending test traffic |
| `show trace` | Full per-packet trace: every node the packet visited, with timestamps | After trace capture - shows complete packet path |
| `clear trace` | Clear the trace buffer | Before a new capture |
| `show interface rx-placement` | Which worker thread handles which interface RX queue | Verify NUMA-local queue assignments |
| `set interface rx-placement <if> queue 0 worker 0` | Assign an interface queue to a specific worker | Manual NUMA-aware pinning |
| `set interface state <if> up` | Bring an interface up | After creating an interface |
| `set interface ip address <if> 10.0.0.1/24` | Assign an IP address | Configure an L3 interface |
| `show dpdk interface` | DPDK-specific interface info: queues, link speed, driver | Verify mlx5 link is up at correct speed |
| `show dpdk interface xstats <if>` | Extended NIC statistics from the DPDK ethdev layer | Deep NIC-level counters |
| `show log` | VPP internal log messages | Troubleshoot startup and plugin errors |
| `event-logger on` | Enable the high-resolution event logger | Timing analysis - use with the g2 viewer |
💡 The most important command: show run - look at vectors/call for your input node. A value of 32–256 means VPP is batching well. A value of 1–4 means the system is lightly loaded or misconfigured. Clocks/vector is your per-packet CPU cost - lower is better.
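Worked example (assuming a 2 GHz worker core - a figure not from this guide): 100 clocks/vector ≈ 50 ns of CPU per packet, which caps that core at roughly 20 Mpps if this node were its only work. Watching clocks/vector fall is watching per-core headroom grow.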
VPP Container Lab - First Packet
Objective: Spin up a VPP instance inside Docker with Mellanox ports, configure two interfaces, send traffic, and fully trace the packet path through the graph.
1. Confirm with `show plugins` that `dpdk_plugin.so` is loaded.
2. Write a `startup.conf` with your Mellanox PCI addresses, 1 GB hugepages per socket, and 2 worker threads pinned to non-overlapping cores.
3. Check `show interface`. Both Mellanox ports should appear as GigabitEthernet or Ethernet devices. Bring them up: `set interface state <if> up`.
4. Add a route: `ip route add 192.168.2.0/24 via 192.168.1.2`.
5. Start a trace: `trace add dpdk-input 100`. Then send 10 ICMP pings to VPP's interface IP.
6. Inspect `show trace`. For each captured packet, identify every graph node it visited and the time spent (in clock ticks) at each node.
7. Check `show run`. Record: vectors/call for `dpdk-input`, clocks/vector for `ip4-lookup` and `ip4-rewrite`. This is your baseline performance fingerprint.
8. Increase the traffic rate and compare the `show run` output. Does throughput scale linearly?
9. Run `show error` and verify there are no unexpected drops. If there are, trace a dropped packet and identify the error node.

PHASE 1 COMPLETION CHECKLIST
- Can explain scalar vs vector processing and why vector processing improves I-cache utilisation
- Know the 5 VPP layers (VPPInfra, vlib, vnet, plugins, VPP binary) and what each is responsible for
- Can build VPP from source (`make build` and `make build-release`) and know where the binaries are
- Can run a VPP container on the AMD/Mellanox environment with correct hugepage and VFIO setup
- Can write a complete `startup.conf` from scratch with DPDK stanza, CPU pinning, and buffer sizing
- Know the difference between `main-core` and `corelist-workers` and how to size them for NIC queues
- Can use `vppctl` to bring up interfaces, assign IPs, add routes
- Can capture and interpret a packet trace - identify each graph node in the trace output
- Understand what `show run` shows: vectors/call, clocks/vector, and what good values look like
- Completed Mini-Project 1: first packet traced end-to-end through the VPP graph
✅ When complete: ready for Phase 2 - Core VPP Internals. Start with vppinfra - every data structure you'll use in plugins is defined there.