Module 09 — Mempool + mbuf
Reference code — requires DPDK installed.
What you learn
How DPDK manages packet memory via pre-allocated mempools and rte_mbuf
structures — why mempools exist, how the mbuf memory layout works, the
full alloc/fill/process/free lifecycle, in-place packet modification
(the pattern used by the DNS sinkhole), hardware checksum offload flags,
and multi-segment (chained) mbufs.
Why mempools — the core insight
Without mempools (naive approach):
Packet arrives → malloc(1500) → process → free()
At 2M pkts/sec: 2M malloc() calls/sec
malloc has locks + may call brk() → becomes the bottleneck
With mempools (DPDK approach):
Startup: pre-allocate 65535 mbufs in hugepage memory
Runtime: "alloc" = pop pointer from lockless ring (~10 ns)
"free" = push pointer back to ring (~10 ns)
No syscalls, no locks in the fast path
The mempool object ring uses the same design as Module 03 (rte_ring).
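The pop/push idea can be shown without DPDK at all. The following is a minimal single-threaded sketch, not the real rte_mempool: the `toy_pool` type and its functions are hypothetical names, and a plain array used as a stack stands in for the lockless ring.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_POOL_SIZE 4     /* tiny on purpose; a real pool holds thousands */
#define TOY_BUF_SIZE  2048

/* Hypothetical stand-in for a mempool: every buffer exists up front,
 * and a stack of free pointers replaces malloc()/free(). */
struct toy_pool {
    unsigned char storage[TOY_POOL_SIZE][TOY_BUF_SIZE];
    void *free_list[TOY_POOL_SIZE];
    size_t free_count;
};

/* Startup cost: paid once, before any packet arrives. */
static void toy_pool_init(struct toy_pool *p)
{
    for (size_t i = 0; i < TOY_POOL_SIZE; i++)
        p->free_list[i] = p->storage[i];
    p->free_count = TOY_POOL_SIZE;
}

/* "alloc" = pop a pointer; returns NULL when the pool is exhausted. */
static void *toy_pool_get(struct toy_pool *p)
{
    if (p->free_count == 0)
        return NULL;
    return p->free_list[--p->free_count];
}

/* "free" = push the pointer back; no memory is returned to the OS. */
static void toy_pool_put(struct toy_pool *p, void *buf)
{
    assert(p->free_count < TOY_POOL_SIZE);
    p->free_list[p->free_count++] = buf;
}
```

Neither `toy_pool_get` nor `toy_pool_put` can trigger a syscall, which is the property the fast path depends on. Unlike malloc, an exhausted pool fails immediately, so callers need a drop path.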
mbuf memory layout
One mbuf object in the pool:
┌─────────────────────────────┐ ← m (pointer to the mbuf object)
│ struct rte_mbuf (~128B) │ metadata: data_off, data_len, pkt_len,
│ │ nb_segs, port, ol_flags, hash.rss, ...
├─────────────────────────────┤
│ private area (0B default) │ app-specific metadata per mbuf
├─────────────────────────────┤ ← buf_addr (start of the data buffer, in hugepage memory)
│ headroom (128B) │ reserved for prepending headers
├─────────────────────────────┤ ← buf_addr + data_off
│ │ ← rte_pktmbuf_mtod(m, T*)
│ packet data │
│ (data_len bytes) │
│ │
├─────────────────────────────┤
│ tailroom │ available for appending (answer section!)
└─────────────────────────────┘ ← buf_addr + buf_len
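The offsets in the diagram reduce to a few additions and subtractions. The sketch below uses a hypothetical `toy_mbuf` struct that carries only the four fields involved; it is not the real struct rte_mbuf, and the `toy_*` helpers only mirror the arithmetic of the corresponding DPDK accessors.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of the mbuf metadata; not the real struct rte_mbuf. */
struct toy_mbuf {
    void    *buf_addr;  /* start of the data buffer (headroom begins here) */
    uint16_t data_off;  /* bytes from buf_addr to the first packet byte */
    uint16_t data_len;  /* packet bytes held in this buffer */
    uint16_t buf_len;   /* total buffer size: headroom + data + tailroom */
};

/* Same arithmetic as rte_pktmbuf_mtod(): buf_addr + data_off. */
static void *toy_mtod(const struct toy_mbuf *m)
{
    return (char *)m->buf_addr + m->data_off;
}

/* Space in front of the packet, available for prepending headers. */
static uint16_t toy_headroom(const struct toy_mbuf *m)
{
    return m->data_off;
}

/* Space behind the packet, available for appending bytes. */
static uint16_t toy_tailroom(const struct toy_mbuf *m)
{
    assert(m->data_off + m->data_len <= m->buf_len);
    return (uint16_t)(m->buf_len - m->data_off - m->data_len);
}
```

For a 2176-byte buffer holding a 64-byte packet behind the default 128-byte headroom, `toy_tailroom` returns 2176 − 128 − 64 = 1984 bytes.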
Where this fits in the real application
Startup:
mbuf_pool = rte_pktmbuf_pool_create("pktmbuf_pool_s0",
65535, 256, 0, RTE_MBUF_DEFAULT_BUF_SIZE,
rte_socket_id())
rte_eth_rx_queue_setup(port, queue, nb_desc, socket, &cfg, mbuf_pool)
RX lcore:
nb_rx = rte_eth_rx_burst(port, queue, mbufs, BURST_SIZE)
Worker lcore:
eth = rte_pktmbuf_mtod(mbufs[i], eth_hdr_t *) ← zero-copy header access
DNS sinkhole (Module 18):
→ rewrite packet in-place via rte_pktmbuf_mtod()
→ rte_pktmbuf_append() for answer section bytes
→ m->ol_flags |= RTE_MBUF_F_TX_IPV4 | RTE_MBUF_F_TX_IP_CKSUM | RTE_MBUF_F_TX_UDP_CKSUM
TX lcore:
nb_tx = rte_eth_tx_burst(port, queue, mbufs, nb_fwd)
for dropped mbufs: rte_pktmbuf_free(m)
Key concepts in the code
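1. Pool size: prefer 2^k − 1
#define POOL_NUM_MBUFS 8191 /* 2^13 - 1: fills an 8192-slot ring exactly */
/* #define POOL_NUM_MBUFS 8192    2^13: works, but wasteful. The backing ring needs
   n + 1 slots, so it is rounded up to 16384 and almost half of it stays empty. */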
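The sizing rule follows from how a power-of-two ring stores objects: a ring with N slots holds at most N − 1 entries. The sketch below assumes the default ring-backed mempool sizes its ring to the next power of two at or above n + 1; `next_pow2` and `ring_slots_for` are hypothetical helper names, not DPDK functions.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest power of two >= v, valid for 1 <= v <= 2^31. */
static uint32_t next_pow2(uint32_t v)
{
    uint32_t p = 1;
    while (p < v)
        p <<= 1;
    return p;
}

/* Assumption: a ring with N slots stores N - 1 objects, so holding
 * n objects needs the next power of two >= n + 1 slots. */
static uint32_t ring_slots_for(uint32_t n)
{
    assert(n > 0 && n < (1u << 31));
    return next_pow2(n + 1);
}
```

Under that assumption, 8191 objects need 8192 slots, while 8192 objects need 16384 slots: one extra mbuf doubles the ring.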
2. Per-lcore cache — reducing ring contention
Without cache: every alloc/free touches the pool's central ring
→ lcores contend on the ring's shared head/tail indices (atomic CAS retries, cache-line bouncing)
With cache_size=256: each lcore keeps a local stash of roughly 256 mbufs
→ alloc/free hit the local cache (no ring access) until it empties or overflows
→ ring only touched in bulk, when the cache is refilled or flushed
→ typical alloc is ~3 ns instead of ~10 ns
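A small counting model makes the amortisation visible. This is a hypothetical sketch, not the rte_mempool cache implementation: `toy_lcore_cache` only counts how often the shared ring would be touched, and integer ids stand in for mbuf pointers.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_SIZE 4   /* tiny on purpose; the module uses 256 */

/* Hypothetical model: counts bulk transfers to/from the shared ring. */
struct toy_lcore_cache {
    int    objs[CACHE_SIZE];  /* object ids held locally */
    size_t len;               /* how many ids are in the cache now */
    size_t ring_ops;          /* times the shared ring was touched */
    int    next_id;           /* stand-in for objects taken from the ring */
};

static int cached_get(struct toy_lcore_cache *c)
{
    if (c->len == 0) {
        /* Cache empty: one bulk dequeue refills it. */
        for (size_t i = 0; i < CACHE_SIZE; i++)
            c->objs[i] = c->next_id++;
        c->len = CACHE_SIZE;
        c->ring_ops++;
    }
    return c->objs[--c->len];
}

static void cached_put(struct toy_lcore_cache *c, int obj)
{
    if (c->len == CACHE_SIZE) {
        /* Cache full: one bulk enqueue flushes it (the model only counts it). */
        c->len = 0;
        c->ring_ops++;
    }
    assert(c->len < CACHE_SIZE);
    c->objs[c->len++] = obj;
}
```

In this model a steady alloc-then-free loop touches the ring once, for the initial refill, no matter how many packets pass through. Only bursts larger than the cache force further ring accesses.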
3. rte_pktmbuf_mtod — the most-used macro
/* Get typed pointer to packet start */
eth_hdr_t *eth = rte_pktmbuf_mtod(m, eth_hdr_t *);
/* Expands to: */
eth_hdr_t *eth = (eth_hdr_t *)((char *)(m)->buf_addr + (m)->data_off);
4. rte_pktmbuf_append — used in DNS sinkhole
/* Reserve space at the tail and return pointer to it */
char *answer_section = rte_pktmbuf_append(m, dns_answer_len);
if (!answer_section) { /* tailroom exhausted: drop rather than write through NULL */
    rte_pktmbuf_free(m);
    return; /* or continue, when inside a burst loop */
}
memcpy(answer_section, answer_bytes, dns_answer_len);
/* Update IP total_len and UDP dgram_len to include the new bytes */
ip4->total_len = htons(ntohs(ip4->total_len) + dns_answer_len);
udp->dgram_len = htons(ntohs(udp->dgram_len) + dns_answer_len);
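The contract of the append step can be sketched on a toy buffer. The `toy_mbuf` struct and `toy_append` below are hypothetical and handle a single segment only; they mirror the behaviour described above (return the old tail on success, NULL when tailroom is too small) rather than reproduce the DPDK implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical single-segment buffer; not the real struct rte_mbuf. */
struct toy_mbuf {
    uint8_t *buf_addr;  /* start of the data buffer */
    uint16_t data_off;  /* headroom in front of the packet */
    uint16_t data_len;  /* bytes of packet data in this buffer */
    uint16_t buf_len;   /* total buffer size */
    uint32_t pkt_len;   /* total packet length (equals data_len here) */
};

/* Grow the packet by len bytes at the tail.
 * Returns a pointer to the new bytes, or NULL if tailroom is too small;
 * on failure the buffer is left unchanged. */
static char *toy_append(struct toy_mbuf *m, uint16_t len)
{
    assert(m->data_off + m->data_len <= m->buf_len);
    uint16_t tailroom = (uint16_t)(m->buf_len - m->data_off - m->data_len);
    if (len > tailroom)
        return NULL;
    char *tail = (char *)m->buf_addr + m->data_off + m->data_len;
    m->data_len = (uint16_t)(m->data_len + len);
    m->pkt_len += len;
    return tail;
}
```

The lengths in the buffer metadata grow here, but the IP and UDP length fields inside the packet are untouched, which is why the caller has to patch them separately.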
5. Hardware checksum offload
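m->ol_flags |= RTE_MBUF_F_TX_IPV4 | RTE_MBUF_F_TX_IP_CKSUM
| RTE_MBUF_F_TX_UDP_CKSUM;
m->l2_len = RTE_ETHER_HDR_LEN; /* the NIC locates the headers from these lengths */
m->l3_len = sizeof(struct rte_ipv4_hdr); /* assumes no IPv4 options */
ip4->checksum = 0;
udp->checksum = rte_ipv4_phdr_cksum(ip4, m->ol_flags); /* pseudo-header sum only; the NIC completes it */
Software checksum would cost ~50 ns per packet; HW offload costs ~0, provided the port was configured with the matching TX checksum offloads (RTE_ETH_TX_OFFLOAD_IPV4_CKSUM and RTE_ETH_TX_OFFLOAD_UDP_CKSUM).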
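For a sense of the work the NIC takes over, here is a plain-C sketch of the Internet checksum (RFC 1071) that a software path would run over the IPv4 header. `inet_checksum` is a hypothetical helper name, not a DPDK function, and the sample header in the usage note is an illustrative 20-byte IPv4 header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 Internet checksum over len bytes, reading the data as
 * big-endian 16-bit words. Returns the checksum in host byte order.
 * The 32-bit accumulator cannot overflow for len <= 65535. */
static uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    assert(data != NULL && len <= 65535);
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];
    if (i < len)                        /* odd trailing byte, zero-padded */
        sum += (uint32_t)data[i] << 8;

    while (sum >> 16)                   /* fold carries back into 16 bits */
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)~sum;
}
```

Run over the header bytes `45 00 00 73 00 00 40 00 40 11 00 00 c0 a8 00 01 c0 a8 00 c7` (checksum field zeroed), the function yields 0xb861. Running it again with that value written into the checksum field yields 0, which is how a receiver verifies the header.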
6. Multi-segment guard in the real app
nb_rx = rte_eth_rx_burst(port, queue, mbufs, BURST_SIZE);
for (int i = 0; i < nb_rx; i++) {
if (unlikely(mbufs[i]->nb_segs > 1)) {
/* Frame spans several mbufs (e.g. a jumbo frame): the DP application drops it rather than paying the linearise cost */
rte_pktmbuf_free(mbufs[i]);
continue;
}
eth = rte_pktmbuf_mtod(mbufs[i], eth_hdr_t *);
}
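The linearise cost that the guard avoids is a copy of every segment into one contiguous buffer. The sketch below shows that copy on a hypothetical `toy_seg` chain; it is not the real chained rte_mbuf, and `toy_nb_segs` and `toy_linearize` are illustrative names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a chained mbuf: each segment owns part of the packet. */
struct toy_seg {
    const struct toy_seg *next;      /* NULL on the last segment */
    const uint8_t        *data;      /* this segment's bytes */
    uint16_t              data_len;  /* number of bytes in this segment */
};

/* Number of segments in the chain (what nb_segs records on the first mbuf). */
static unsigned toy_nb_segs(const struct toy_seg *head)
{
    unsigned n = 0;
    for (; head != NULL; head = head->next)
        n++;
    return n;
}

/* Copy the whole chain into dst.
 * Returns the bytes copied, or 0 if dst cannot hold the packet. */
static size_t toy_linearize(const struct toy_seg *head, uint8_t *dst, size_t cap)
{
    assert(dst != NULL);
    size_t off = 0;
    for (; head != NULL; head = head->next) {
        if (head->data_len > cap - off)
            return 0;
        memcpy(dst + off, head->data, head->data_len);
        off += head->data_len;
    }
    return off;
}
```

Header parsing through a single typed pointer is only safe when the headers sit in one contiguous buffer, so an application either pays this copy or, like the guard above, drops multi-segment packets.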
Pool sizing reference
| Deployment | num_mbufs | Rationale |
|---|---|---|
| Dev/test (VM) | 4095 (2^12-1) | Small hugepage allocation |
| Single port, 4 queues | 16383 (2^14-1) | 4×1024 descriptors + 3× headroom |
| 4 ports, 4 queues each | 65535 (2^16-1) | 16×1024 + 3× headroom |
| DP application (production) | 65535 (2^16-1) | Used in app_main.c |
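The two multi-queue rows appear to follow one rule of thumb: total RX descriptors, plus three times that again as headroom, rounded to the 2^k − 1 form. The helper below is a sketch of that reading of the table; `pool_size_for` is a hypothetical name and the 4× factor is an assumption inferred from the rationale column.

```c
#include <assert.h>
#include <stdint.h>

/* Rule of thumb read from the table (an assumption, not a DPDK API):
 * wanted = ports * queues * descriptors * 4   (descriptors + 3x headroom),
 * then take the power of two at or above it, minus one. */
static uint32_t pool_size_for(uint32_t ports, uint32_t queues_per_port,
                              uint32_t descs_per_queue)
{
    uint64_t wanted = (uint64_t)ports * queues_per_port * descs_per_queue * 4;
    assert(wanted > 0 && wanted <= (1u << 31));

    uint32_t p = 1;
    while (p < wanted)
        p <<= 1;
    return p - 1;   /* 2^k - 1, so the backing ring is filled exactly */
}
```

With 1024 descriptors per queue this gives 16383 for one port with four queues and 65535 for four ports with four queues each, matching the table. When the wanted count is itself a power of two, the result is one mbuf short of it, a trade the table also makes.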
Next module
Module 10 — Port Init: Configure a DPDK NIC port — set up RX/TX queues, descriptor rings, link speed, promiscuous mode, and RSS.
Source files
| File | Download |
|---|---|
| mempool_mbuf.c | mempool_mbuf.c |
| Makefile | Makefile |