NETWORKING MASTERY · PHASE 1 · MODULE 03 · WEEK 2
🌐 IPv4 Deep Dive
IP addressing · Subnetting · CIDR · Header fields · Fragmentation · TTL · ICMP · Routing basics
Beginner → Intermediate · Prerequisite: M01, M02 · RFC 791 · Subnetting · ICMP · 3 Labs

THE INTERNET PROTOCOL — WHY IT EXISTS

🌐

IP — The Language of the Internet

FOUNDATION

The Internet Protocol (IP) is the fundamental protocol that makes the internet work. Defined in RFC 791 (1981), it gives every device a logical address and defines how data is packaged into packets and routed across interconnected networks.

Without IP, you could only talk to devices on your same physical network — your switch's MAC table handles that. IP is what lets your laptop in Mumbai send data to a server in Frankfurt through dozens of intermediate networks and routers, none of which need to know anything about your laptop or the server directly.

IP's three core jobs:

  • Logical addressing — Every device gets an IP address. Unlike MAC addresses (hardware), IP addresses are logical and can be assigned, changed, and hierarchically organised for efficient routing
  • Packet fragmentation and reassembly — If a packet is too large for a network link, IP splits it into smaller fragments and reassembles at the destination
  • Best-effort delivery — IP makes its best effort to deliver packets but makes no guarantees. Packets can be lost, duplicated, reordered, or corrupted. Reliability is left to upper layers (TCP at L4)
📮 Analogy — The Postal System

IP is like the postal service. Your letter (packet) has a destination address (IP address). The postal system (internet) routes it through intermediate sorting offices (routers) without you needing to know the route. Each sorting office reads the destination address, decides which direction to send it, and passes it along. If the letter is too thick for a slot (MTU exceeded), it gets split into multiple envelopes (fragmentation). The postal service doesn't guarantee delivery — letters can get lost, arrive late, or arrive out of order. If you need guarantees, you use registered mail (TCP).

📊

IP in Context — The Protocol Stack

POSITION IN STACK

IP sits at Layer 3 of the OSI model — above Ethernet (L2) and below TCP/UDP (L4). Every TCP connection, every UDP datagram, every DNS query, every HTTP request — they all travel inside IP packets.

/* Stack position of IPv4 */

Application layer:  HTTP data ("GET /index.html...")
                         ↓ TCP wraps with segment header
Transport layer:    [TCP hdr: sport=52341 dport=80] + [HTTP data]
                         ↓ IP wraps with packet header
Network layer:      [IP hdr: src=10.0.0.5 dst=93.184.216.34] + [TCP] + [HTTP]
                         ↓ Ethernet wraps with frame header
Data Link layer:    [Eth hdr: dst_mac src_mac 0x0800] + [IP] + [TCP] + [HTTP] + [CRC]
                         ↓ NIC transmits as bits
Physical layer:     01001000 01010100 01010100...

The IP header's Protocol field (1 byte) tells the receiver what L4 protocol lives inside the packet: 6 = TCP, 17 = UDP, 1 = ICMP, 89 = OSPF, 50 = ESP (IPsec). This is how the kernel knows which protocol handler to pass the packet to after stripping the IP header.

IPv4 HEADER — 20 BYTES MINIMUM, EVERY FIELD EXPLAINED

📦

IPv4 Header Layout

HEADER FORMAT

The IPv4 header is a minimum of 20 bytes (160 bits). It precedes the payload (TCP segment, UDP datagram, ICMP message, etc.). Each row below represents 32 bits (4 bytes) as transmitted on the wire.

Row 1:  Ver (4 bits) | IHL (4 bits) | DSCP / ECN (8 bits) | Total Length (16 bits)
Row 2:  Identification (16 bits) | Flags (3 bits) | Fragment Offset (13 bits)
Row 3:  TTL (8 bits) | Protocol (8 bits) | Header Checksum (16 bits)
Row 4:  Source IP Address (32 bits, 4 bytes)
Row 5:  Destination IP Address (32 bits, 4 bytes)
Row 6+: Options (if IHL > 5) + Padding (0–40 bytes, rare in practice)

🔍

Every Field — What It Does and Why It Matters

FIELD REFERENCE

Version (4 bits)

Always 0100 = 4 for IPv4. IPv6 uses 0110 = 6. The receiver checks this first to confirm which IP version it's dealing with. In DPDK/VPP, this is the first thing ip4-input validates.

IHL — Internet Header Length (4 bits)

Specifies the header length in 32-bit words. Minimum value is 5 (5 × 4 bytes = 20 bytes, the minimum header with no options). Maximum is 15 (15 × 4 = 60 bytes). IHL tells the receiver where the payload starts: payload offset = IHL × 4.

/* C: find where IP payload begins */
uint8_t *ip_hdr = packet_start;
uint8_t  ihl     = (ip_hdr[0] & 0x0F);     /* low nibble of first byte */
uint8_t *payload = ip_hdr + (ihl * 4);     /* jump over header */

DSCP / ECN (8 bits — formerly TOS)

Originally called Type of Service, now split into two fields:

  • DSCP (Differentiated Services Code Point, 6 bits): QoS marking. Routers and firewalls use this to prioritise packets. Common values: 0 = Best Effort, 46 = Expedited Forwarding (EF, typically voice), 34 = AF41 (an Assured Forwarding class, often used for interactive video). NGFW policy engines can mark and classify traffic using DSCP.
  • ECN (Explicit Congestion Notification, 2 bits) — allows congestion notification without packet drops. Routers mark ECN bits when they're near capacity; the receiver signals the sender to slow down.
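
The split of the old TOS byte can be sketched in a few lines of Python. This is an illustrative sketch only: the helper names split_tos and make_tos are made up for this example, and the only assumption is the layout described above (DSCP in the upper 6 bits, ECN in the lower 2).

```python
def split_tos(tos):
    """Split the 8-bit DSCP/ECN byte into (dscp, ecn)."""
    dscp = (tos >> 2) & 0x3F   # upper 6 bits
    ecn = tos & 0x03           # lower 2 bits
    return dscp, ecn


def make_tos(dscp, ecn=0):
    """Combine a DSCP value and ECN bits back into one byte."""
    return ((dscp & 0x3F) << 2) | (ecn & 0x03)


print(split_tos(0xB8))     # (46, 0): Expedited Forwarding, no ECN
print(hex(make_tos(34)))   # 0x88: AF41 with ECN bits clear
```

When reading a capture, a TOS byte of 0xB8 is therefore the usual sign of EF-marked traffic.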

Total Length (16 bits)

The total size of the IP packet in bytes — header + payload. Maximum value: 65535. Practical maximum on standard Ethernet: 1500 (MTU). This field is critical: receivers use it to know how many bytes to read, and it allows detection of truncated packets.

Identification (16 bits)

A unique ID assigned by the sender to identify all fragments of the same original packet. When a large packet is fragmented, all fragments get the same Identification value — the receiver uses it to reassemble them. Not used for non-fragmented packets (but still set by the OS).

Flags (3 bits)

Bit 0: Reserved (always 0) | Bit 1: DF (Don't Fragment) | Bit 2: MF (More Fragments)
  • DF (Don't Fragment) — tells routers not to fragment this packet. If the packet is too large for a link and DF=1, the router drops it and sends an ICMP "Fragmentation Needed" message back. Used by Path MTU Discovery (PMTUD)
  • MF (More Fragments): set to 1 on all fragments except the last. The receiver uses this to know when it has collected all fragments

Fragment Offset (13 bits)

Position of this fragment's data within the original packet, measured in units of 8 bytes. A value of 185 means this fragment's data starts at byte offset 185 × 8 = 1480 in the original packet. The receiver uses Identification + Fragment Offset to put fragments back in order.
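
As a rough sketch, the Flags and Fragment Offset fields can be pulled out of the 16-bit word at bytes 6–7 of the header as shown here. The function name decode_frag_word is hypothetical; the bit positions follow the layout described above.

```python
def decode_frag_word(word):
    """Decode the 16-bit Flags + Fragment Offset word (header bytes 6-7).

    Returns (df, mf, byte_offset).
    """
    df = bool(word & 0x4000)       # Don't Fragment
    mf = bool(word & 0x2000)       # More Fragments
    offset_units = word & 0x1FFF   # low 13 bits, counted in 8-byte units
    return df, mf, offset_units * 8


print(decode_frag_word(0x20B9))   # (False, True, 1480): a middle fragment
print(decode_frag_word(0x4000))   # (True, False, 0): unfragmented, DF set
```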

TTL — Time To Live (8 bits)

A counter decremented by 1 at each router hop. When TTL reaches 0, the router discards the packet and sends an ICMP Time Exceeded message back to the sender. Purpose: prevent packets from looping forever in a routing loop. Starting TTL is typically 64 (Linux), 128 (Windows), or 255 (some routers). We cover TTL in detail in the TTL and Routing tab.

Protocol (8 bits)

Identifies the L4 protocol inside the payload:

  • 1 — ICMP
  • 6 — TCP
  • 17 — UDP
  • 41 — IPv6-in-IPv4 (6in4 tunnel)
  • 47 — GRE
  • 50 — ESP (IPsec)
  • 51 — AH (IPsec)
  • 89 — OSPF
  • 132 — SCTP

Header Checksum (16 bits)

A checksum computed over the IP header only (not the payload — TCP/UDP have their own checksums). Each router must recompute it after decrementing TTL. If the checksum fails, the packet is silently dropped. Modern NICs (including your Mellanox) offload checksum verification to hardware.
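
The checksum itself is the one's-complement sum of the header's 16-bit words, inverted. Here is a minimal Python sketch of that calculation; the sample header bytes and the function name ipv4_checksum are illustrative, not taken from a real capture in this module.

```python
import struct


def ipv4_checksum(header):
    """One's-complement sum of 16-bit words, folded to 16 bits and inverted."""
    if len(header) % 2:
        header += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(header) // 2), header))
    while total >> 16:                       # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Sample 20-byte header with the checksum field (bytes 10-11) zeroed
hdr = bytes.fromhex("45000073000040004011" "0000" "c0a80001c0a800c7")
csum = ipv4_checksum(hdr)
print(hex(csum))                             # 0xb861

# A receiver sums the header including the checksum; a valid header yields 0
filled = hdr[:10] + struct.pack("!H", csum) + hdr[12:]
print(ipv4_checksum(filled))                 # 0
```

A router that decrements TTL changes one byte of the header, which is why the stored checksum has to be updated before forwarding.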

Source IP Address (32 bits) and Destination IP Address (32 bits)

The 4-byte IPv4 addresses of sender and receiver. These are the primary fields routers use for forwarding decisions. In NAT, both source and destination addresses may be rewritten by the firewall/NAT device.

💡 In DPDK/VPP code, the IPv4 header is accessed via ip4_header_t (VPP) or a manual struct. Key fields accessed in the fast path: ip4->dst_address (FIB lookup), ip4->protocol (dispatch to TCP/UDP), ip4->ttl (decrement), ip4->checksum (recompute after TTL change). These are the fields your graph nodes will read millions of times per second.
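
To tie the field reference together, here is a small Python sketch that unpacks the fixed 20-byte header with the struct module. It is an illustration only: parse_ipv4_header and the sample bytes are hypothetical, and options (IHL > 5) are not parsed.

```python
import socket
import struct


def parse_ipv4_header(raw):
    """Unpack the fixed 20-byte part of an IPv4 header into a dict."""
    (ver_ihl, tos, total_len, ident, flags_frag,
     ttl, proto, csum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    return {
        "version": ver_ihl >> 4,
        "header_len": (ver_ihl & 0x0F) * 4,   # IHL is in 32-bit words
        "dscp": tos >> 2,
        "ecn": tos & 0x03,
        "total_length": total_len,
        "id": ident,
        "df": bool(flags_frag & 0x4000),
        "mf": bool(flags_frag & 0x2000),
        "frag_offset": (flags_frag & 0x1FFF) * 8,
        "ttl": ttl,
        "protocol": proto,
        "checksum": csum,
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
    }


sample = bytes.fromhex("45000073000040004011b861c0a80001c0a800c7")
fields = parse_ipv4_header(sample)
for name, value in fields.items():
    print(name, value)
```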

IPv4 ADDRESSING — 32-BIT ADDRESSES, NOTATION, CLASSES

🏷️

IPv4 Address Structure

CORE CONCEPT

An IPv4 address is a 32-bit number — four groups of 8 bits (octets) separated by dots. We write it in dotted-decimal notation where each octet is expressed as a decimal number from 0 to 255.

Binary representation of 192.168.1.100
Decimal:  192      . 168      . 1        . 100
Binary:   11000000 . 10101000 . 00000001 . 01100100
Full 32-bit binary: 11000000.10101000.00000001.01100100

Every IP address has two parts — a network portion and a host portion. The subnet mask tells you which bits are the network part (1s) and which are the host part (0s).

  • All devices in the same network have identical network bits
  • Each device has a unique host portion within its network
  • Routers forward packets based on the network portion — they don't care about individual host bits
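
The conversion between dotted-decimal and the underlying 32-bit number can be sketched as follows. The helper names ip_to_int and ip_to_binary are made up for this example, and no input validation is attempted.

```python
def ip_to_int(addr):
    """Pack four dotted-decimal octets into one 32-bit integer."""
    a, b, c, d = (int(part) for part in addr.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d


def ip_to_binary(addr):
    """Render each octet as 8 binary digits, keeping the dots."""
    return ".".join(format(int(part), "08b") for part in addr.split("."))


print(hex(ip_to_int("192.168.1.100")))   # 0xc0a80164
print(ip_to_binary("192.168.1.100"))     # 11000000.10101000.00000001.01100100
```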
📚

Classful Addressing — Historical but Still Referenced

BACKGROUND

Before CIDR (1993), IPv4 addresses were divided into fixed classes. You still hear these terms in networking conversations:

Class | First Bits | Range | Default Mask | Networks | Hosts/Network | Use
A | 0xxxxxxx | 1.0.0.0 – 126.255.255.255 | /8 (255.0.0.0) | 126 | 16,777,214 | Large orgs
B | 10xxxxxx | 128.0.0.0 – 191.255.255.255 | /16 (255.255.0.0) | 16,384 | 65,534 | Medium orgs
C | 110xxxxx | 192.0.0.0 – 223.255.255.255 | /24 (255.255.255.0) | 2,097,152 | 254 | Small orgs
D | 1110xxxx | 224.0.0.0 – 239.255.255.255 | N/A | N/A | N/A | Multicast
E | 1111xxxx | 240.0.0.0 – 255.255.255.255 | N/A | N/A | N/A | Reserved/Experimental

Classful addressing wasted enormous numbers of IP addresses (a company needing 300 hosts got a Class B with 65,534 addresses — 65,234 wasted). CIDR replaced classful addressing, but the Class A/B/C terminology persists in configuration and documentation.

🔧

Subnet Mask — The Network/Host Boundary

SUBNET MASK

A subnet mask is a 32-bit number where all network bits are 1 and all host bits are 0. Two notation forms:

  • Dotted-decimal: 255.255.255.0 — easier to read for humans
  • CIDR prefix length: /24 — count of 1-bits. Much more compact.
/* Example: 192.168.1.100/24 */
IP address:   192.168.1.100  =  11000000.10101000.00000001.01100100
Subnet mask:  255.255.255.0  =  11111111.11111111.11111111.00000000
                                 ←──── Network portion ────→ ←Host→

/* AND operation: IP & mask = Network address */
Network addr: 192.168.1.0    =  11000000.10101000.00000001.00000000

/* Broadcast: network with all host bits = 1 */
Broadcast:    192.168.1.255  =  11000000.10101000.00000001.11111111

/* Usable hosts: from .1 to .254 (254 hosts for /24) */
First host:   192.168.1.1
Last host:    192.168.1.254

Three critical addresses in every subnet:

  • Network address — host bits all 0. Identifies the subnet itself, not assignable to a host (e.g., 192.168.1.0)
  • Broadcast address — host bits all 1. Sends to all hosts in the subnet, not assignable (e.g., 192.168.1.255)
  • Usable host range — everything between. For /24: 192.168.1.1 to 192.168.1.254 = 254 usable hosts
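
The same AND/OR arithmetic can be tried interactively in Python. This is a sketch under simple assumptions: subnet_summary is a hypothetical helper, the ipaddress module is used only to convert between text and integers, and the first/last host logic only applies to prefixes of /30 or shorter.

```python
import ipaddress


def subnet_summary(cidr):
    """Return (network, broadcast, first_host, last_host) for e.g. '192.168.1.100/24'."""
    addr, prefix = cidr.split("/")
    prefix = int(prefix)
    ip = int(ipaddress.IPv4Address(addr))
    mask = (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF   # prefix 1-bits, then 0-bits
    network = ip & mask                                 # host bits cleared
    broadcast = network | (~mask & 0xFFFFFFFF)          # host bits all set

    def to_str(n):
        return str(ipaddress.IPv4Address(n))

    return to_str(network), to_str(broadcast), to_str(network + 1), to_str(broadcast - 1)


print(subnet_summary("192.168.1.100/24"))
print(subnet_summary("192.168.10.100/26"))
```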

SUBNETTING AND CIDR — DIVIDING ADDRESS SPACE EFFICIENTLY

✂️

Why Subnetting Exists

MOTIVATION

Subnetting takes a large network and divides it into smaller sub-networks. This is done for three reasons:

  • Security isolation — different departments/zones in different subnets, firewall between them (your NGFW use case)
  • Performance — smaller broadcast domains mean less broadcast noise
  • Address efficiency — allocate exactly as many IPs as you need, no wastage

When you subnet, you borrow bits from the host portion and add them to the network portion — increasing the prefix length. More network bits = smaller subnets = fewer hosts per subnet.

📐

CIDR Prefix Reference Table

REFERENCE
Prefix | Subnet Mask | Hosts (total addresses) | Usable Hosts | Typical Use
/8 | 255.0.0.0 | 16,777,216 | 16,777,214 | ISP, large org backbone
/16 | 255.255.0.0 | 65,536 | 65,534 | Large campus, cloud VPC
/20 | 255.255.240.0 | 4,096 | 4,094 | Medium office, data centre zone
/24 | 255.255.255.0 | 256 | 254 | Standard office LAN, server subnet
/25 | 255.255.255.128 | 128 | 126 | Split /24 into two halves
/26 | 255.255.255.192 | 64 | 62 | Department subnets
/27 | 255.255.255.224 | 32 | 30 | Small team subnet
/28 | 255.255.255.240 | 16 | 14 | Small server cluster
/29 | 255.255.255.248 | 8 | 6 | Router-to-router links
/30 | 255.255.255.252 | 4 | 2 | Point-to-point links (2 hosts only)
/31 | 255.255.255.254 | 2 | 2* | P2P links (RFC 3021: no network/broadcast address)
/32 | 255.255.255.255 | 1 | 1 | Host route, loopback, BGP next-hop

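Formula: Hosts = 2^(32-prefix), the total number of addresses in the subnet. Usable = Hosts - 2 (subtract the network and broadcast addresses). Exceptions: a /31 has no network or broadcast address, so both addresses are usable on a point-to-point link (RFC 3021), and a /32 is a single host address.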
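
A quick way to check any row of the table is to compute it. The sketch below uses a hypothetical helper, address_counts, and treats /31 and /32 as the two special cases noted above.

```python
def address_counts(prefix):
    """Return (total_addresses, usable_hosts) for an IPv4 prefix length."""
    total = 2 ** (32 - prefix)
    if prefix >= 31:
        usable = total        # /31 point-to-point link or /32 host route
    else:
        usable = total - 2    # minus network and broadcast addresses
    return total, usable


for prefix in (20, 24, 26, 30, 31, 32):
    print("/%d" % prefix, address_counts(prefix))
```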

🧮

Subnetting by Hand — Step-by-Step Method

TECHNIQUE

Problem: You have 192.168.10.0/24 and need to create 4 equal subnets. What are the subnets?

Step 1 — How many bits to borrow?

You need 4 subnets = 2² → borrow 2 bits from the host portion. New prefix = /24 + 2 = /26.

Step 2 — What is the block size?

Block size = 256 - subnet_mask_last_octet = 256 - 192 = 64. (For /26: mask = 255.255.255.192, last octet = 192.)

Step 3 — List the subnets (increment by block size in the last octet):

Subnet | Network Addr | First Host | Last Host | Broadcast
/26 #1 | 192.168.10.0 | 192.168.10.1 | 192.168.10.62 | 192.168.10.63
/26 #2 | 192.168.10.64 | 192.168.10.65 | 192.168.10.126 | 192.168.10.127
/26 #3 | 192.168.10.128 | 192.168.10.129 | 192.168.10.190 | 192.168.10.191
/26 #4 | 192.168.10.192 | 192.168.10.193 | 192.168.10.254 | 192.168.10.255
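
To check a hand calculation like this one, Python's standard ipaddress module can generate the same rows. This is a verification sketch; the variable names are arbitrary.

```python
import ipaddress

parent = ipaddress.ip_network("192.168.10.0/24")

rows = []
# prefixlen_diff=2 borrows 2 host bits: /24 -> four /26 subnets
for subnet in parent.subnets(prefixlen_diff=2):
    hosts = list(subnet.hosts())
    rows.append((str(subnet), str(hosts[0]), str(hosts[-1]),
                 str(subnet.broadcast_address)))

for row in rows:
    print(*row)
```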

Visual — network vs host bits for /26:

Network bits (26 bits, fixed): 192.168.10.xx | Host bits (6 bits, variable): 0–63 per subnet

💡 NGFW application: In a typical enterprise NGFW deployment you'll design security zones as subnets: 10.0.1.0/24 = Inside LAN, 10.0.2.0/24 = DMZ servers, 10.0.3.0/24 = Management. The firewall sits between these subnets and applies policy at the IP layer. Knowing subnetting lets you write precise ACL rules like permit ip 10.0.1.0/24 10.0.2.0/24.

💻

Subnet Arithmetic in C

CODE
#include <stdio.h>
#include <arpa/inet.h>
#include <stdint.h>

int main() {
    /* IP and prefix */
    uint32_t ip     = inet_addr("192.168.10.100");  /* network byte order */
    uint32_t prefix = 26;

    /* Build mask: ~0 shifted left by (32-prefix) bits */
    uint32_t mask = htonl(~0u << (32 - prefix));    /* 0xFFFFFFC0 = /26 */

    /* Network address = ip AND mask */
    uint32_t network   = ip & mask;

    /* Broadcast = network OR (NOT mask) */
    uint32_t broadcast = network | ~mask;

    /* First and last host */
    uint32_t first = htonl(ntohl(network) + 1);
    uint32_t last  = htonl(ntohl(broadcast) - 1);

    /* Usable host count */
    uint32_t hosts = ntohl(broadcast) - ntohl(network) - 1;

    char buf[INET_ADDRSTRLEN];
    printf("Network:   %s\n", inet_ntop(AF_INET, &network,   buf, sizeof(buf)));
    printf("Broadcast: %s\n", inet_ntop(AF_INET, &broadcast, buf, sizeof(buf)));
    printf("First:     %s\n", inet_ntop(AF_INET, &first,     buf, sizeof(buf)));
    printf("Last:      %s\n", inet_ntop(AF_INET, &last,      buf, sizeof(buf)));
    printf("Hosts:     %u\n", hosts);
    return 0;
}

SPECIAL IPv4 ADDRESS RANGES — KNOW THESE BY HEART

🗺️

Reserved and Special Address Ranges

REFERENCE
Range | Name | Purpose | RFC | NGFW Relevance
10.0.0.0/8 | Private Class A | Internal networks, not routed on internet | RFC 1918 | Typically "inside" zone, allow policy
172.16.0.0/12 | Private Class B | Internal networks, covers 172.16–172.31.x.x | RFC 1918 | Often used for DMZ / management
192.168.0.0/16 | Private Class C | Internal networks, common in SOHO/offices | RFC 1918 | Home/branch office subnets
127.0.0.0/8 | Loopback | Local host communication. 127.0.0.1 = "localhost" | RFC 5735 | Never route this, drop at perimeter
169.254.0.0/16 | Link-Local / APIPA | Auto-assigned when DHCP fails. Not routable | RFC 3927 | Indicator of DHCP failure on host
100.64.0.0/10 | Shared Address Space | CGN (Carrier-Grade NAT), ISP internal use | RFC 6598 | Treat like RFC 1918, don't route externally
0.0.0.0/8 | Unspecified | 0.0.0.0 = "this host", used before IP assigned | RFC 1122 | Drop all packets with source 0.0.0.0
255.255.255.255/32 | Broadcast | Limited broadcast, all hosts on local network | RFC 919 | Drop at firewall, never route
224.0.0.0/4 | Multicast | Group communication (OSPF, video streaming) | RFC 5771 | Allow selectively (OSPF: 224.0.0.5 and 224.0.0.6)
240.0.0.0/4 | Reserved | Reserved for future use, treat as invalid | RFC 1112 | Drop all packets in this range
192.0.2.0/24 | TEST-NET-1 | Documentation and examples, never real traffic | RFC 5737 | Drop at perimeter
198.51.100.0/24 | TEST-NET-2 | Documentation, as above | RFC 5737 | Drop at perimeter
203.0.113.0/24 | TEST-NET-3 | Documentation, as above | RFC 5737 | Drop at perimeter
🛡️

Bogon Filtering — NGFW First Line of Defence

SECURITY

A bogon is an IP address that should never appear as a source on the public internet — either because it's reserved (RFC 1918 private, loopback, link-local) or unallocated. An NGFW at the internet perimeter should drop all packets with bogon source addresses — they indicate either misconfiguration or deliberate spoofing (attack).

/* Bogon filter — drop these source IPs at internet-facing interface */
/* These are source addresses that should NEVER arrive from the internet */

Bogon source ranges to block:
  10.0.0.0/8          RFC 1918 private
  172.16.0.0/12       RFC 1918 private
  192.168.0.0/16      RFC 1918 private
  127.0.0.0/8         Loopback
  169.254.0.0/16      Link-local
  100.64.0.0/10       Shared address space
  0.0.0.0/8           Unspecified
  240.0.0.0/4         Reserved
  224.0.0.0/4         Multicast (as source — invalid)
  192.0.2.0/24        TEST-NET-1
  198.51.100.0/24     TEST-NET-2
  203.0.113.0/24      TEST-NET-3

/* Unicast Reverse Path Forwarding (uRPF) — a smarter bogon filter */
/* Router drops packets if the source IP has no route back via the */
/* same interface the packet arrived on — prevents spoofed sources */

In VPP, bogon filtering is implemented as an IP feature arc plugin with a bihash lookup of source address against a prefix table. You'll build a version of this in Phase 6 (NGFW Development).
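
Before building the real thing, the logic can be prototyped in Python. This is a sketch only: it does a linear scan over the ranges listed above rather than the hash or prefix-table lookup a dataplane would use, and is_bogon_source is a hypothetical name.

```python
import ipaddress

BOGON_PREFIXES = [ipaddress.ip_network(p) for p in (
    "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",       # RFC 1918 private
    "127.0.0.0/8", "169.254.0.0/16", "100.64.0.0/10",      # loopback, link-local, CGN
    "0.0.0.0/8", "240.0.0.0/4", "224.0.0.0/4",             # unspecified, reserved, multicast
    "192.0.2.0/24", "198.51.100.0/24", "203.0.113.0/24",   # TEST-NET-1/2/3
)]


def is_bogon_source(addr):
    """True if addr falls inside any listed bogon range."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in BOGON_PREFIXES)


for src in ("10.1.2.3", "8.8.8.8", "172.32.0.1", "100.64.0.1"):
    print(src, "DROP" if is_bogon_source(src) else "pass")
```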

TTL, ROUTING BASICS, AND HOW ROUTERS FORWARD PACKETS

⏱️

TTL — Time To Live

TTL

TTL is an 8-bit counter in the IP header that starts at a value set by the sender (typically 64 for Linux, 128 for Windows, 255 for many routers) and is decremented by 1 at every router hop. When TTL reaches 0, the router discards the packet and sends an ICMP Time Exceeded message back to the original sender.

Why TTL exists: Without TTL, a packet caught in a routing loop (two routers sending it back and forth) would circulate forever, consuming bandwidth indefinitely. TTL guarantees every packet has a finite lifetime.

/* TTL trace: packet from your laptop to 8.8.8.8 */
Hop 1: Your router     TTL: 64 → 63   (decremented, forwarded)
Hop 2: ISP router 1   TTL: 63 → 62   (decremented, forwarded)
Hop 3: ISP router 2   TTL: 62 → 61   (decremented, forwarded)
...
Hop 12: Google router  TTL: 53 → 52   (decremented, forwarded)
Hop 13: 8.8.8.8        TTL: 52        (received, destination reached)

/* If TTL hits 0 at an intermediate router: */
Router discards packet + sends ICMP Type 11, Code 0 (Time Exceeded)
Sender receives ICMP with source IP of the discarding router
→ This is how traceroute works! (see ICMP tab)

/* Default TTL values by OS */
Linux:   64    (set in /proc/sys/net/ipv4/ip_default_ttl)
Windows: 128
Cisco:   255
macOS:   64

NGFW use of TTL: the TTL of an arriving packet supports passive OS fingerprinting. A packet arriving with TTL=127 most likely started at 128 (Windows) and crossed 1 router. Firewalls can use this for passive OS detection, and some NGFW features normalise TTL values on forwarded traffic so that outside observers cannot fingerprint internal hosts.
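
The fingerprinting heuristic can be sketched as follows. It assumes the sender used one of the three common defaults listed above, which is a guess rather than a guarantee (any host can be configured with a different initial TTL); guess_initial_ttl is a hypothetical name.

```python
COMMON_INITIAL_TTLS = (64, 128, 255)   # Linux/macOS, Windows, many routers


def guess_initial_ttl(observed):
    """Return (likely_initial_ttl, estimated_hops) for an observed TTL."""
    for initial in COMMON_INITIAL_TTLS:
        if observed <= initial:
            return initial, initial - observed
    raise ValueError("TTL must be in the range 0-255")


print(guess_initial_ttl(127))   # (128, 1): probably Windows, 1 hop away
print(guess_initial_ttl(52))    # (64, 12): probably Linux/macOS, 12 hops away
```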

🗺️

How Routers Make Forwarding Decisions

ROUTING BASICS

Every router maintains a routing table. The forwarding plane works from a lookup-optimised copy of it called the FIB (Forwarding Information Base). When a packet arrives, the router looks up the destination IP address in the FIB using Longest Prefix Match (LPM): find the most specific route that covers the destination.

/* Example routing table on a Linux router */
$ ip route show

10.0.0.0/8       via 192.168.1.1 dev eth0        # Match any 10.x.x.x
10.10.0.0/16     via 192.168.1.2 dev eth0        # More specific match
10.10.1.0/24     dev eth1 proto kernel scope link # Most specific — local
0.0.0.0/0        via 203.0.113.1 dev eth2         # Default route (catch-all)

/* LPM example: packet destined for 10.10.1.55 */
Matches 0.0.0.0/0    → /0  — too broad
Matches 10.0.0.0/8   → /8  — candidate
Matches 10.10.0.0/16 → /16 — more specific
Matches 10.10.1.0/24 → /24 — MOST SPECIFIC → this one wins

/* Router actions after lookup: */
1. Decrement TTL (if TTL becomes 0: drop + send ICMP Time Exceeded)
2. Recompute IP header checksum (TTL changed)
3. ARP-resolve next-hop MAC if not cached
4. Rewrite Ethernet header: new dst MAC (next-hop) + src MAC (this router's outgoing port)
5. Transmit on outgoing interface
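
The LPM lookup above can be reproduced with a short Python sketch. It scans every route and keeps the longest matching prefix, which is fine for illustration; real FIBs use tries or similar structures. The route list mirrors the example table, and lookup is a hypothetical helper.

```python
import ipaddress

ROUTES = [
    ("10.0.0.0/8", "via 192.168.1.1 dev eth0"),
    ("10.10.0.0/16", "via 192.168.1.2 dev eth0"),
    ("10.10.1.0/24", "dev eth1"),
    ("0.0.0.0/0", "via 203.0.113.1 dev eth2"),
]
TABLE = [(ipaddress.ip_network(prefix), nexthop) for prefix, nexthop in ROUTES]


def lookup(dst):
    """Return (prefix, next_hop) of the most specific matching route, or None."""
    ip = ipaddress.ip_address(dst)
    matches = [(net, nexthop) for net, nexthop in TABLE if ip in net]
    if not matches:
        return None
    net, nexthop = max(matches, key=lambda m: m[0].prefixlen)
    return str(net), nexthop


for dst in ("10.10.1.55", "10.10.2.1", "10.200.0.1", "8.8.8.8"):
    print(dst, "->", lookup(dst))
```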

💡 What a plain router does NOT touch: source IP, destination IP, and the entire IP payload (TCP/UDP/application data). IP routing is transparent to endpoints: your laptop doesn't know or care how many routers handled its packet. When simply forwarding, a router only touches the Ethernet header and the TTL/checksum fields of the IP header. NAT, fragmentation, and DSCP re-marking are separate functions that do modify other header fields.

IP FRAGMENTATION AND PATH MTU DISCOVERY

✂️

Why Fragmentation Exists

CONCEPT

Every network link has a Maximum Transmission Unit (MTU) — the largest IP packet it can carry. Standard Ethernet: 1500 bytes. Some links are smaller (PPPoE adds 8 bytes overhead, reducing effective MTU to 1492). When a packet larger than a link's MTU needs to cross that link, IP fragments it into smaller pieces.

Fragmentation can happen at any router along the path whose outgoing link is too small for the packet (not just at the sender), and reassembly happens only at the destination host, never at intermediate routers. This design choice avoids reassembly overhead at every hop.

✂️

How Fragmentation Works

MECHANICS

Scenario: a 4000-byte IP packet arrives at a router whose outgoing link has MTU 1500. The router fragments it into three pieces.

Original packet: 4000 bytes (IP header 20B + data 3980B), too large for MTU 1500
  [IP Hdr 20B] [Data: 3980 bytes]
        ↓ Router fragments at MTU 1500 boundary
Fragment 1 (1500B total): [IP Hdr 20B: ID=x MF=1 off=0]   + data bytes 0–1479    (1480 bytes)
Fragment 2 (1500B total): [IP Hdr 20B: ID=x MF=1 off=185] + data bytes 1480–2959 (1480 bytes)
Fragment 3 (1040B total): [IP Hdr 20B: ID=x MF=0 off=370] + data bytes 2960–3979 (1020 bytes)

Fragment field values explained:

  • Identification = x — same value in all 3 fragments (receiver uses this to group them)
  • MF=1 — More Fragments — set on first two fragments, MF=0 on the last
  • Fragment Offset — in units of 8 bytes: 0 / 185 (1480÷8) / 370 (2960÷8)
  • Fragment data size: must be a multiple of 8 bytes in every fragment except the last, because the Fragment Offset field counts in 8-byte units
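
The arithmetic behind these values can be sketched in Python. The function below is an illustration with a hypothetical name (fragment); it assumes a 20-byte header with no options on every fragment.

```python
def fragment(total_len, mtu, header_len=20):
    """Compute the fragments a router would emit for one oversized packet.

    Returns a list of dicts with the offset (in 8-byte units), the MF flag,
    the data size and the total size of each fragment.
    """
    data_len = total_len - header_len
    max_data = (mtu - header_len) // 8 * 8   # round down to a multiple of 8
    frags = []
    offset = 0
    while offset < data_len:
        size = min(max_data, data_len - offset)
        more = offset + size < data_len
        frags.append({"offset": offset // 8, "mf": int(more),
                      "data": size, "total": size + header_len})
        offset += size
    return frags


for frag in fragment(4000, 1500):
    print(frag)
```

Running it for a 4000-byte packet and MTU 1500 gives offsets 0, 185 and 370, matching the scenario above.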
⚠️

Fragmentation Problems and PMTUD

ISSUES

Fragmentation causes several real-world problems:

  • Performance overhead — reassembly at the destination consumes CPU and memory. Fragments must be buffered until all arrive.
  • Firewall complexity — stateful firewalls must reassemble fragments before inspecting the transport header (TCP/UDP ports are only in the first fragment). This is a significant processing cost.
  • Fragment attacks: attackers exploit fragmentation with overlapping fragments (Teardrop), a tiny first fragment (hides TCP flags from the firewall), or a missing last fragment (ties up the reassembly buffer until the reassembly timer expires).
  • ICMP filtering — some networks block ICMP, which breaks Path MTU Discovery (see below).

Path MTU Discovery (PMTUD)

Modern systems avoid fragmentation by discovering the smallest MTU on the path before sending large packets:

1. Sender sets DF=1 on all packets. Don't Fragment bit = 1 tells routers not to fragment: drop instead.
2. Router with smaller MTU drops packet. The router drops the oversized packet and sends ICMP Type 3, Code 4 "Fragmentation Needed" with the MTU of its link.
   ICMP: Type=3 Code=4 Next-Hop-MTU=1492
3. Sender reduces packet size. The sender receives the ICMP and records the reduced MTU for this destination. Future packets use the smaller size. TCP adjusts its MSS (Maximum Segment Size) accordingly.

⚠️ ICMP Black Holes: If a firewall blocks ICMP (a common but misguided practice), PMTUD breaks. The sender never receives the "Fragmentation Needed" message, packets keep getting dropped silently, and connections hang. This manifests as "large downloads hang after a few KB". NGFW policy must allow ICMP Type 3, Code 4 through for PMTUD to work correctly.
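
The PMTUD feedback loop can be simulated without touching the network. In this sketch the path is just a list of link MTUs, every router is assumed to return the ICMP message (no black holes), and both function names are hypothetical.

```python
def discover_path_mtu(link_mtus, first_guess=1500):
    """Simulate PMTUD along a path given as a list of link MTUs.

    Returns (path_mtu, icmp_hints), where icmp_hints lists the Next-Hop MTU
    values reported by ICMP Type 3, Code 4 messages along the way.
    """
    size = first_guess
    icmp_hints = []
    while True:
        # First link too small for a DF=1 packet of this size, if any
        bottleneck = next((mtu for mtu in link_mtus if mtu < size), None)
        if bottleneck is None:
            return size, icmp_hints
        icmp_hints.append(bottleneck)   # router drops, reports its link MTU
        size = bottleneck               # sender retries at the smaller size


def tcp_mss(path_mtu, ip_hdr=20, tcp_hdr=20):
    """MSS that fits in one packet, assuming no IP or TCP options."""
    return path_mtu - ip_hdr - tcp_hdr


print(discover_path_mtu([1500, 1492, 1400]))   # (1400, [1492, 1400])
print(tcp_mss(1492))                           # 1452
```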

ICMP — INTERNET CONTROL MESSAGE PROTOCOL (RFC 792)

📨

What ICMP Is and Why It Exists

OVERVIEW

ICMP is IP's built-in diagnostic and error-reporting protocol. It travels inside IP packets (Protocol = 1) and is used by routers and hosts to report errors and exchange control information. ICMP itself has no concept of ports — it operates below TCP/UDP.

ICMP is essential for:

  • Ping — testing reachability (Echo Request/Reply)
  • Traceroute — discovering the path to a destination (abuses TTL expiry)
  • Error reporting — telling senders why their packets were dropped
  • Path MTU Discovery — informing senders of MTU limitations (Type 3, Code 4)

ICMP format: 8-byte fixed header (Type, Code, Checksum, + 4 bytes of type-specific data) followed by optional additional data.
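
As an illustration of that layout, the sketch below builds an Echo Request in Python, where the 4 type-specific bytes are an Identifier and a Sequence number. The function names and the payload are arbitrary, and nothing is sent on the wire.

```python
import struct


def inet_checksum(data):
    """Internet checksum: one's-complement sum of 16-bit words, inverted."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(ident, seq, payload=b""):
    """Build an ICMP Echo Request: Type 8, Code 0, checksum, id, sequence."""
    header = struct.pack("!BBHHH", 8, 0, 0, ident, seq)   # checksum zeroed first
    csum = inet_checksum(header + payload)                # covers header + data
    return struct.pack("!BBHHH", 8, 0, csum, ident, seq) + payload


msg = build_echo_request(0x1234, 1, b"ping")
print(msg.hex())
print(inet_checksum(msg))   # 0 means the checksum verifies
```

Unlike the IP header checksum, the ICMP checksum covers the whole ICMP message, data included.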

📋

ICMP Message Types — Complete Reference

REFERENCE
Type | Code | Name | Direction | Caused By / Use
0 | 0 | Echo Reply | Host → Pinger | Response to ping (Type 8)
3 | 0 | Dest Unreachable: Net | Router → Sender | No route to destination network
3 | 1 | Dest Unreachable: Host | Router → Sender | No route to specific host
3 | 2 | Dest Unreachable: Protocol | Host → Sender | Protocol not supported on destination
3 | 3 | Dest Unreachable: Port | Host → Sender | UDP port not listening (no process bound)
3 | 4 | Fragmentation Needed | Router → Sender | Packet too large + DF=1 set. Includes next-hop MTU.
3 | 9 | Dest Unreachable: Filtered | Router/FW → Sender | Firewall rejected the packet (admin filter)
5 | 0-3 | Redirect | Router → Host | Better route exists via different gateway
8 | 0 | Echo Request | Sender → Host | Ping, tests reachability
11 | 0 | Time Exceeded (TTL) | Router → Sender | TTL reached 0, used by traceroute
11 | 1 | Time Exceeded (Reassembly) | Host → Sender | Fragment reassembly timer expired
12 | 0-2 | Parameter Problem | Router/Host → Sender | IP header field is invalid
🔍

Traceroute — How It Works

TECHNIQUE

Traceroute exploits TTL and ICMP Time Exceeded messages to discover every router on the path to a destination. It sends packets with progressively increasing TTL values (1, 2, 3, ...) and each router that drops the packet (TTL=0) sends back its IP address in the ICMP Time Exceeded message.

/* Traceroute algorithm */
Round 1: Send 3 packets with TTL=1
  → First router decrements to 0, drops, sends ICMP Time Exceeded
  → Reveal: first hop IP = 192.168.1.1 (your gateway)

Round 2: Send 3 packets with TTL=2
  → Second router decrements to 0, drops, sends ICMP Time Exceeded
  → Reveal: second hop IP = 10.10.1.1 (ISP edge router)

Round 3: Send 3 packets with TTL=3
  → Third router... and so on until destination replies

/* Two implementations */
traceroute on Linux: sends UDP packets to high port (33434+)
                     destination replies with ICMP Port Unreachable (type 3, code 3)
tracert   on Windows: sends ICMP Echo Requests
                     destination replies with Echo Reply (type 0)

/* Run it */
$ traceroute -n 8.8.8.8    # -n skips DNS resolution (faster)
$ traceroute -I 8.8.8.8    # -I uses ICMP instead of UDP
$ mtr 8.8.8.8              # live updating traceroute

Interpreting traceroute output:

  • * * * — router doesn't respond to probes (rate-limited or blocks ICMP) — does NOT mean the path is broken there
  • Increasing RTT — normal as you move further away
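  • RTT jumping down: a hop that shows a higher RTT than the hops after it is usually slow to generate ICMP replies (routers deprioritise or rate-limit ICMP generation), not slow to forward traffic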
  • Asymmetric routing — forward and return path may be different (explains apparent RTT anomalies)
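
The probing loop described above can be simulated offline in Python. This sketch models the path as a plain list of router IPs ending in the destination; it sends no packets and ignores timeouts and rate limiting. The names probe and traceroute are used here for the simulation only.

```python
def probe(path, ttl):
    """Simulate one probe along `path` (router IPs, then the destination IP).

    Returns (responder_ip, kind), kind being 'time-exceeded' or 'reached'.
    """
    for hop_ip in path[:-1]:
        ttl -= 1                      # each router decrements TTL
        if ttl == 0:
            return hop_ip, "time-exceeded"
    return path[-1], "reached"


def traceroute(path, max_ttl=30):
    """Send probes with TTL=1,2,3,... until the destination answers."""
    hops = []
    for ttl in range(1, max_ttl + 1):
        responder, kind = probe(path, ttl)
        hops.append((ttl, responder))
        if kind == "reached":
            break
    return hops


example_path = ["192.168.1.1", "10.10.1.1", "8.8.8.8"]
for ttl, responder in traceroute(example_path):
    print(ttl, responder)
```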
🛡️

ICMP and NGFW — What to Allow, What to Block

NGFW POLICY

A common mistake is to block all ICMP at the firewall. This breaks PMTUD and troubleshooting. Here's the correct NGFW policy for ICMP:

ICMP Type/Code | Direction | NGFW Action | Reason
Type 8 (Echo Request) | Inbound from internet | Block or rate-limit | Reduces attack surface, prevents mapping
Type 0 (Echo Reply) | Inbound (reply to outbound ping) | Allow (stateful) | Return traffic for initiated pings
Type 3, Code 4 (Frag Needed) | Inbound | Always allow | PMTUD: blocking this breaks connections
Type 3, Code 0-3 (Dest Unreach) | Inbound | Allow (stateful) | Error replies for existing connections
Type 11 (TTL Exceeded) | Inbound | Allow | Traceroute return path, debugging
Type 5 (Redirect) | Inbound | Block | ICMP redirect attacks can reroute traffic
All ICMP | Outbound | Allow | Internal users need full diagnostic capability
LAB 1

Dissect an IPv4 Header with Scapy and Wireshark

Objective: Craft raw IP packets with specific field values and observe exactly how each field appears in Wireshark. Build deep familiarity with every byte of the IP header.

1. Install Scapy: pip3 install scapy. Open Python3 as root: sudo python3. Import: from scapy.all import *.
2. Construct a minimal IP packet and inspect its fields: p = IP(dst="8.8.8.8") / ICMP() then p.show2(). (Plain p.show() leaves computed fields such as ihl, len and chksum as None; show2() displays the packet as it will be built.) Observe every field Scapy has automatically set: version=4, ihl=5, ttl=64, proto=1 (ICMP), src (your IP), dst (8.8.8.8).
3. Examine the raw bytes: bytes(p). Count them: 20 bytes of IP header + 8 bytes ICMP = 28 bytes. Identify which bytes correspond to which fields (e.g., bytes 8–9 = TTL + Protocol).
4. Start a Wireshark capture. Send the packet: send(p). Find it in Wireshark. Expand the Internet Protocol layer. Verify every field matches what Scapy showed.
5. Now deliberately set unusual field values and observe: p = IP(dst="8.8.8.8", ttl=1, flags="DF", id=0xBEEF) / ICMP(). Send it. In Wireshark: (a) Does TTL appear as 1? (b) Is the DF flag set? (c) Is the Identification 0xBEEF (decimal 48879)? (d) What ICMP error did you get back: Time Exceeded?
6. Bonus, hexdump analysis: hexdump(p) in Scapy shows the raw hex. Manually decode the first 20 bytes: byte 0 = version+IHL (0x45 = version 4, IHL 5), bytes 2-3 = total length, byte 8 = TTL, byte 9 = protocol. Cross-reference with the IP header diagram in Tab 1.
LAB 2

Subnetting Practice — Design an NGFW Network

Objective: Design a complete network layout for an NGFW deployment from scratch using subnetting. This simulates real-world work.

1. Task: You have the network 10.0.0.0/16 and need to create: (a) Inside LAN for 500 hosts, (b) DMZ for 50 servers, (c) Management network for 20 devices, (d) NGFW to router link (2 hosts only). Design subnets with the minimum waste. Calculate network address, mask, broadcast, first host, last host for each.
2. Write the answer before checking. Then verify using: python3 -c "import ipaddress; n = ipaddress.ip_network('10.0.0.0/23'); print(list(n.hosts())[0], list(n.hosts())[-1], n.broadcast_address)". Use Python's ipaddress module to validate all your subnet calculations.
3. Configure your subnets on Linux loopback interfaces to test them: sudo ip addr add 10.0.0.1/23 dev lo label lo:0. Add routes for each subnet, for example a /26 for the 50-server DMZ: sudo ip route add 10.0.2.0/26 dev lo. Verify routing with ip route get 10.0.2.50.
4. Write iptables rules reflecting your NGFW policy between zones: allow all from Inside LAN to internet (MASQUERADE), allow only HTTP/HTTPS from internet to DMZ, deny Inside LAN to DMZ except port 443, deny all to Management except SSH from specific host. Verify with iptables -L -n -v.
LAB 3

ICMP and Traceroute Analysis

Objective: Capture and fully decode ICMP messages including ping, Time Exceeded (traceroute), and Destination Unreachable. Understand the complete ICMP interaction with IP.

1. Start Wireshark capture with filter icmp. Run: ping -c 4 8.8.8.8. Identify Type 8 (Echo Request) and Type 0 (Echo Reply) packets. In the hex dump, counting from the start of the IP header (add 14 bytes if the dump begins with the Ethernet header), find: byte 20 = ICMP Type, byte 21 = ICMP Code, bytes 22-23 = Checksum, bytes 24-27 = Identifier + Sequence number.
2. Run traceroute while capturing: sudo traceroute -n -I 8.8.8.8. Capture Type 11 (Time Exceeded) replies from intermediate routers. Expand an ICMP Time Exceeded packet and notice that it embeds the original IP header plus the first 8 bytes of the original IP payload, so the sender knows which packet caused the error.
3. Generate a Destination Unreachable (Port Unreachable): nc -u 8.8.8.8 9999 then type anything and press Enter. If the target answers, you'll get an ICMP Type 3, Code 3 (Port Unreachable) back, since nothing listens on UDP 9999. Many public servers silently drop such probes, so if nothing comes back, repeat the test against a host you control. Capture and decode it in Wireshark.
4. Use Scapy to build a custom ICMP packet: force a specific TTL to trigger Time Exceeded from a known router hop. Use ans, unans = sr(IP(dst="8.8.8.8", ttl=3)/ICMP(), timeout=2). Then ans.show() to see the response from hop 3 on your path to 8.8.8.8.
5. Bonus: Write a 20-line Python traceroute using Scapy. Send ICMP with TTL=1,2,3,... until you reach the destination or TTL=30. Print the IP of each responding router. This is the core of how traceroute works, and you're implementing it from scratch.

M03 MASTERY CHECKLIST

When complete: Move to M04 - IPv6. You now have deep IPv4 knowledge. IPv6 keeps the same layered approach but changes addressing fundamentally: 128-bit addresses, no broadcast, SLAAC address autoconfiguration, and Neighbor Discovery (carried in ICMPv6) in place of ARP. Much of what you learned here maps directly.
