Module 10 — NIC Port Initialization
Reference code — requires DPDK and a DPDK-bound NIC (or
--vdev net_null0).
What you learn
How to configure a DPDK NIC port end-to-end: detect capabilities, set up
RX/TX queues and descriptor rings, enable RSS for multi-queue distribution,
configure hardware offloads (checksum, VLAN), start the port, and verify
link state. This is the last setup step before worker lcores can call
rte_eth_rx_burst() / rte_eth_tx_burst().
Port init sequence
rte_eth_dev_count_avail() → how many NICs available
rte_eth_dev_info_get(port, &info) → query NIC capabilities
│
├─ mask requested offloads against info.rx_offload_capa
├─ mask RSS hash types against info.flow_type_rss_offloads
│
rte_eth_dev_configure(port, nb_rx_q, nb_tx_q, &port_conf)
│
rte_eth_dev_adjust_nb_rx_tx_desc() → align descriptor counts to NIC limits
│
rte_eth_rx_queue_setup() × nb_rx_q → each queue gets a slice of mbuf pool
rte_eth_tx_queue_setup() × nb_tx_q
│
rte_eth_dev_start() → NIC starts link negotiation + DMA
rte_eth_promiscuous_enable() → accept all frames (not just own MAC)
│
check_port_link_status() → poll until UP (timeout = 9 sec)
Files
| File | Purpose |
|---|---|
| port_init.c | Full port init: capability check, configure, queues, start, link poll, stats |
| Makefile | DPDK pkg-config build |
Key concepts in the code
1. Capability intersection — the most common bug
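/* WRONG on its own: request offloads blindly */
port_conf.rxmode.offloads = RTE_ETH_RX_OFFLOAD_CHECKSUM;
/* CORRECT: then mask the request against what the NIC actually supports */
port_conf.rxmode.offloads &= dev_info.rx_offload_capa;
port_conf.txmode.offloads &= dev_info.tx_offload_capa; /* same rule for TX */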
If you request CHECKSUM on a NIC that doesn’t support it,
rte_eth_dev_configure() returns -EINVAL. Always intersect against
dev_info capabilities.
The same applies to RSS hash types:
port_conf.rx_adv_conf.rss_conf.rss_hf &= dev_info.flow_type_rss_offloads;
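A plain `&=` drops unsupported bits silently, which makes a missing offload hard to notice later. The sketch below is plain C with no DPDK dependency: `intersect_offloads` and the `DEMO_*` bit values are hypothetical stand-ins for the `RTE_ETH_RX_OFFLOAD_*` flags and `dev_info.rx_offload_capa`. It returns the usable subset and reports what was dropped so the caller can log it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for RTE_ETH_RX_OFFLOAD_* bits (values are illustrative). */
#define DEMO_RX_OFFLOAD_VLAN_STRIP  (UINT64_C(1) << 0)
#define DEMO_RX_OFFLOAD_IPV4_CKSUM  (UINT64_C(1) << 1)
#define DEMO_RX_OFFLOAD_UDP_CKSUM   (UINT64_C(1) << 2)
#define DEMO_RX_OFFLOAD_TCP_CKSUM   (UINT64_C(1) << 3)

/*
 * Return the subset of `requested` that `capa` supports.
 * Bits that had to be dropped are written to `dropped` (may be NULL)
 * so the caller can log them instead of failing later in configure.
 */
uint64_t
intersect_offloads(uint64_t requested, uint64_t capa, uint64_t *dropped)
{
    if (dropped != NULL)
        *dropped = requested & ~capa;
    return requested & capa;
}
```

In `port_init()` the result would be assigned to `port_conf.rxmode.offloads`, and a non-zero `dropped` value is worth a warning at startup, since it usually means the fast path needs a software fallback.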
2. RSS — why it’s critical for multi-lcore scaling
Without RSS (single RX queue):
All packets → queue 0 → RX lcore → ring → single worker lcore
↑ bottleneck
With RSS (4 RX queues, one per worker lcore):
DNS from 198.51.100.x → hash=0xA3 % 4 = 3 → queue 3 → worker lcore 6
DNS from 10.0.0.x → hash=0x51 % 4 = 1 → queue 1 → worker lcore 4
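The RSS hash is computed by the NIC hardware (Toeplitz algorithm over src/dst IP and port). The "% 4" above is a simplification: the NIC uses the low-order bits of the hash to index a redirection table (RETA), and each RETA entry names the target queue. Each worker lcore polls its own dedicated queue, so there is no contention.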
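To reason about which queue a flow lands on, the hash can be reproduced in software. The following is a minimal plain-C sketch of a Toeplitz hash plus a RETA lookup, not the code from port_init.c: the function names are hypothetical, the key is the 40-byte key from Microsoft's RSS verification suite, and the 128-entry round-robin RETA is an assumed layout (the real table size comes from `dev_info.reta_size`).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 40-byte key from Microsoft's RSS verification suite. */
const uint8_t demo_rss_key[40] = {
    0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2,
    0x41, 0x67, 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0,
    0xd0, 0xca, 0x2b, 0xcb, 0xae, 0x7b, 0x30, 0xb4,
    0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30, 0xf2, 0x0c,
    0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa,
};

/* Toeplitz hash over `data`, which holds the tuple in network byte order. */
uint32_t
toeplitz_hash(const uint8_t *key, size_t key_len, const uint8_t *data, size_t len)
{
    /* The window holds the 32 key bits aligned with the current input bit. */
    uint32_t window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
                      ((uint32_t)key[2] << 8) | key[3];
    uint32_t hash = 0;

    assert(key_len >= len + 4);   /* 32 key bits are needed past the last input bit */
    for (size_t i = 0; i < len; i++) {
        for (int bit = 7; bit >= 0; bit--) {
            if (data[i] & (1u << bit))
                hash ^= window;
            /* Slide the window left by one bit, pulling in the next key bit. */
            window <<= 1;
            if (key[i + 4] & (1u << bit))
                window |= 1u;
        }
    }
    return hash;
}

/* Fill the redirection table round-robin across the RX queues. */
void
reta_fill_round_robin(uint16_t *reta, size_t reta_size, uint16_t nb_queues)
{
    assert(nb_queues != 0);
    for (size_t i = 0; i < reta_size; i++)
        reta[i] = (uint16_t)(i % nb_queues);
}

/* The low-order hash bits index the RETA; reta_size must be a power of two. */
uint16_t
rss_queue(uint32_t hash, const uint16_t *reta, size_t reta_size)
{
    assert(reta_size != 0 && (reta_size & (reta_size - 1)) == 0);
    return reta[hash & (reta_size - 1)];
}
```

The input is the tuple laid out as source IP, destination IP, source port, destination port. For Microsoft's published test vector (66.9.149.187:2794 to 161.142.100.80:1766) the hash is 0x51ccc178, which selects RETA index 0x78 and therefore queue 0 with four queues in round-robin layout.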
3. Descriptor ring depth and mbuf pool sizing
RX ring (1024 descriptors):
NIC pre-fills these with mbuf pointers from the pool.
If the software is slow and all 1024 descriptors are full:
→ new packets are dropped → stats.imissed increments
Pool must always have more mbufs than the sum of all RX descriptor rings:
4 queues × 1024 desc × 1 port = 4096 mbufs held by the rings; × 2 for headroom = 8192
→ pool of 8191 (2^13 - 1, the memory-optimal mempool size) is just enough
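The sizing rule can be written down as a small helper. This is a sketch under stated assumptions, in plain C with no DPDK dependency: the function names are hypothetical, and it only accounts for RX rings as the text above does. It relies on the rte_mempool guidance that a pool of 2^q - 1 elements is the most memory-efficient size.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest value of the form 2^q - 1 that is >= n (n must be below 2^31). */
uint32_t
pow2_minus_one_at_least(uint32_t n)
{
    uint32_t v = 1;

    assert(n < (UINT32_C(1) << 31));
    while (v - 1 < n)
        v <<= 1;
    return v - 1;
}

/* Total number of mbufs the RX rings can hold at once. */
uint32_t
rx_descriptors_total(uint16_t nb_ports, uint16_t nb_rxq, uint16_t nb_rxd)
{
    return (uint32_t)nb_ports * nb_rxq * nb_rxd;
}

/* Pool size: strictly more than all RX rings, rounded up to 2^q - 1. */
uint32_t
suggest_pool_size(uint16_t nb_ports, uint16_t nb_rxq, uint16_t nb_rxd)
{
    return pow2_minus_one_at_least(rx_descriptors_total(nb_ports, nb_rxq, nb_rxd) + 1);
}
```

For 1 port, 4 queues, and 1024 descriptors this yields 8191. A production pool usually also has to cover TX rings, per-lcore mempool caches, and mbufs in flight between lcores, so treat the result as a lower bound.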
4. rte_eth_dev_adjust_nb_rx_tx_desc
Different NICs have different constraints on descriptor counts.
Always call this after rte_eth_dev_configure():
uint16_t nb_rxd = 1024, nb_txd = 1024;
rte_eth_dev_adjust_nb_rx_tx_desc(port, &nb_rxd, &nb_txd);
/* nb_rxd / nb_txd are now adjusted to valid values for this NIC */
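To see what "adjusted" means, here is an illustrative plain-C model of that kind of clamping. It is not the library's implementation: `demo_desc_lim` and `demo_adjust_nb_desc` are hypothetical names that mirror the limits a driver reports in `dev_info.rx_desc_lim` and `dev_info.tx_desc_lim` (maximum, minimum, alignment).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of the per-ring limits a driver reports. */
struct demo_desc_lim {
    uint16_t nb_max;    /* largest ring the NIC accepts, 0 = no limit      */
    uint16_t nb_min;    /* smallest ring the NIC accepts                   */
    uint16_t nb_align;  /* ring size must be a multiple of this, 0 = any   */
};

/* Round the request up to the alignment, then clamp it into [nb_min, nb_max]. */
uint16_t
demo_adjust_nb_desc(uint16_t nb_desc, const struct demo_desc_lim *lim)
{
    uint32_t n = nb_desc;   /* wider type so the round-up cannot overflow */

    if (lim->nb_align != 0)
        n = (n + lim->nb_align - 1) / lim->nb_align * lim->nb_align;
    if (lim->nb_max != 0 && n > lim->nb_max)
        n = lim->nb_max;
    if (n < lim->nb_min)
        n = lim->nb_min;
    assert(n <= UINT16_MAX);
    return (uint16_t)n;
}
```

With limits of max 4096, min 64, and alignment 32, a request of 1000 becomes 1024 and a request of 8000 becomes 4096. Because the final counts can differ from the request, size the mbuf pool from the adjusted values, not from the constants in the source.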
5. Promiscuous mode — why it’s necessary
The DP application sits inline between clients and the internet, so packets are addressed to the router's MAC, not to the appliance's. Promiscuous mode makes the NIC accept every frame regardless of destination MAC.
6. TX hardware checksum offload
After DNS sinkhole response is built in Module 18:
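m->l2_len = sizeof(struct rte_ether_hdr); /* NIC needs the header offsets */
m->l3_len = sizeof(struct rte_ipv4_hdr); /* assumes no IPv4 options */
m->ol_flags |= RTE_MBUF_F_TX_IPV4 | RTE_MBUF_F_TX_IP_CKSUM
| RTE_MBUF_F_TX_UDP_CKSUM;
ip4->hdr_checksum = 0; /* NIC fills in the IPv4 header checksum */
udp->dgram_cksum = rte_ipv4_phdr_cksum(ip4, m->ol_flags); /* pseudo-header seed */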
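For this to work, the port must have been configured with the matching offloads
(RTE_ETH_TX_OFFLOAD_IPV4_CKSUM and RTE_ETH_TX_OFFLOAD_UDP_CKSUM) in
port_conf.txmode.offloads. If either offload is missing, the behavior is driver-dependent;
typically rte_eth_tx_burst() silently ignores the corresponding flag and the frame goes
out with the placeholder value (IPv4 header checksum 0, UDP checksum holding only the
pseudo-header sum), so receivers drop the packet.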
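The value written into the UDP checksum field is not a finished checksum: it is the ones' complement sum of the IPv4 pseudo-header, which the NIC then extends over the UDP header and payload. The plain-C sketch below shows that arithmetic in host byte order for readability. The function names are hypothetical, and real code should keep using `rte_ipv4_phdr_cksum()`, which works on the network-order header.

```c
#include <assert.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit ones' complement sum. */
uint16_t
fold16(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffffu) + (sum >> 16);
    return (uint16_t)sum;
}

/*
 * Ones' complement sum of the IPv4 pseudo-header, NOT inverted.
 * All arguments and the result are in host byte order.
 */
uint16_t
ipv4_pseudo_header_sum(uint32_t src_ip, uint32_t dst_ip, uint8_t proto, uint16_t l4_len)
{
    uint32_t sum = 0;

    sum += src_ip >> 16;        /* source address, high and low 16-bit words */
    sum += src_ip & 0xffffu;
    sum += dst_ip >> 16;        /* destination address, high and low words */
    sum += dst_ip & 0xffffu;
    sum += proto;               /* zero byte + protocol form one 16-bit word */
    sum += l4_len;              /* UDP header + payload length in bytes */
    return fold16(sum);
}
```

For 192.0.2.1 to 198.51.100.7, protocol 17 (UDP), and a 40-byte datagram, the sum is 0xEC75. A receiver that sees only this seed in the checksum field, because the NIC never completed it, computes a mismatch and discards the datagram.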
7. stats.imissed — the first metric to check
struct rte_eth_stats stats;
rte_eth_stats_get(port, &stats);
if (stats.imissed > 0) /* imissed is uint64_t: print with PRIu64 from <inttypes.h> */
LOG_WARN("Port %u: %" PRIu64 " packets dropped (NIC ring full)", port, stats.imissed);
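imissed increments when a packet arrives and the NIC has no free RX descriptor
to place it in. That happens when software polls too slowly, or when the mbuf pool
runs dry and the ring cannot be refilled (the allocation failures themselves are
counted in stats.rx_nombuf). In the DP application, imissed was the first indicator
when the product went live at 50 Gbps: it revealed that one lcore was occasionally
taking too long in the Hyperscan path.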
Testing without a physical NIC
/* Add to eal_args in main(): */
"--vdev", "net_null0,copy=1",
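net_null is a virtual NIC PMD that frees every TX packet and, by default, returns
bursts of empty dummy packets on RX (pass no-rx=1 to make RX return nothing).
The port_init() code path runs unchanged, although offloads the null device does not
advertise are masked off by the capability check. That makes it useful for CI/CD.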
Next module
Module 11 — Multi-lcore RX/TX Pipeline: Wire together everything from Modules 08–10 into a complete multi-lcore packet processing pipeline skeleton.
Source files
| File | Download |
|---|---|
| port_init.c | port_init.c |
| Makefile | Makefile |