THE INTERNET PROTOCOL — WHY IT EXISTS
IP — The Language of the Internet
FOUNDATION
The Internet Protocol (IP) is the fundamental protocol that makes the internet work. Defined in RFC 791 (1981), it gives every device a logical address and defines how data is packaged into packets and routed across interconnected networks.
Without IP, you could only talk to devices on your same physical network — your switch's MAC table handles that. IP is what lets your laptop in Mumbai send data to a server in Frankfurt through dozens of intermediate networks and routers, none of which need to know anything about your laptop or the server directly.
IP's three core jobs:
- Logical addressing — Every device gets an IP address. Unlike MAC addresses (hardware), IP addresses are logical and can be assigned, changed, and hierarchically organised for efficient routing
- Packet fragmentation and reassembly — If a packet is too large for a network link, IP splits it into smaller fragments and reassembles at the destination
- Best-effort delivery — IP makes its best effort to deliver packets but makes no guarantees. Packets can be lost, duplicated, reordered, or corrupted. Reliability is left to upper layers (TCP at L4)
IP is like the postal service. Your letter (packet) has a destination address (IP address). The postal system (internet) routes it through intermediate sorting offices (routers) without you needing to know the route. Each sorting office reads the destination address, decides which direction to send it, and passes it along. If the letter is too thick for a slot (MTU exceeded), it gets split into multiple envelopes (fragmentation). The postal service doesn't guarantee delivery — letters can get lost, arrive late, or arrive out of order. If you need guarantees, you use registered mail (TCP).
IP in Context — The Protocol Stack
POSITION IN STACK
IP sits at Layer 3 of the OSI model, above Ethernet (L2) and below TCP/UDP (L4). Every TCP connection, every UDP datagram, every DNS query, and every HTTP request travels inside IP packets.
/* Stack position of IPv4 */
Application layer: HTTP data ("GET /index.html...")
↓ TCP wraps with segment header
Transport layer: [TCP hdr: sport=52341 dport=80] + [HTTP data]
↓ IP wraps with packet header
Network layer: [IP hdr: src=10.0.0.5 dst=93.184.216.34] + [TCP] + [HTTP]
↓ Ethernet wraps with frame header
Data Link layer: [Eth hdr: dst_mac src_mac 0x0800] + [IP] + [TCP] + [HTTP] + [CRC]
↓ NIC transmits as bits
Physical layer: 01001000 01010100 01010100...
The IP header's Protocol field (1 byte) tells the receiver what L4 protocol lives inside the packet: 6 = TCP, 17 = UDP, 1 = ICMP, 89 = OSPF, 50 = ESP (IPsec). This is how the kernel knows which protocol handler to pass the packet to after stripping the IP header.
IPv4 HEADER — 20 BYTES MINIMUM, EVERY FIELD EXPLAINED
IPv4 Header Layout
HEADER FORMAT
The IPv4 header is a minimum of 20 bytes (160 bits). It precedes the payload (TCP segment, UDP datagram, ICMP message, etc.). Each row of the header layout represents 32 bits (4 bytes) as transmitted on the wire.
Every Field — What It Does and Why It Matters
FIELD REFERENCE
Version (4 bits)
Always 0100 = 4 for IPv4. IPv6 uses 0110 = 6. The receiver checks this first to confirm which IP version it's dealing with. In DPDK/VPP, this is the first thing ip4-input validates.
IHL — Internet Header Length (4 bits)
Specifies the header length in 32-bit words. Minimum value is 5 (5 × 4 bytes = 20 bytes, the minimum header with no options). Maximum is 15 (15 × 4 = 60 bytes). IHL tells the receiver where the payload starts: payload offset = IHL × 4.
/* C: find where IP payload begins */
uint8_t *ip_hdr = packet_start;
uint8_t ihl = ip_hdr[0] & 0x0F;        /* low nibble of first byte */
uint8_t *payload = ip_hdr + (ihl * 4); /* jump over header */
DSCP / ECN (8 bits — formerly TOS)
Originally called Type of Service, now split into two fields:
- DSCP (Differentiated Services Code Point, 6 bits): QoS marking. Routers and firewalls use this to prioritise packets. Common values: 0 = Best Effort, 46 = Expedited Forwarding (voice/video), 34 = Assured Forwarding (class AF41). NGFW policy engines can mark and classify traffic using DSCP.
- ECN (Explicit Congestion Notification, 2 bits): allows congestion notification without packet drops. Routers mark ECN bits when they're near capacity; the receiver signals the sender to slow down.
Total Length (16 bits)
The total size of the IP packet in bytes — header + payload. Maximum value: 65535. Practical maximum on standard Ethernet: 1500 (MTU). This field is critical: receivers use it to know how many bytes to read, and it allows detection of truncated packets.
Identification (16 bits)
A unique ID assigned by the sender to identify all fragments of the same original packet. When a large packet is fragmented, all fragments get the same Identification value — the receiver uses it to reassemble them. Not used for non-fragmented packets (but still set by the OS).
Flags (3 bits)
- Bit 0 (Reserved): always 0
- DF (Don't Fragment): tells routers not to fragment this packet. If the packet is too large for a link and DF=1, the router drops it and sends an ICMP "Fragmentation Needed" message back. Used by Path MTU Discovery (PMTUD)
- MF (More Fragments): set to 1 on all fragments except the last. The receiver uses this to know when it has collected all fragments
Fragment Offset (13 bits)
Position of this fragment's data within the original packet, measured in units of 8 bytes. A value of 185 means this fragment's data starts at byte offset 185 × 8 = 1480 in the original packet. The receiver uses Identification + Fragment Offset to put fragments back in order.
TTL — Time To Live (8 bits)
A counter decremented by 1 at each router hop. When TTL reaches 0, the router discards the packet and sends an ICMP Time Exceeded message back to the sender. Purpose: prevent packets from looping forever in a routing loop. Starting TTL is typically 64 (Linux), 128 (Windows), or 255 (some routers). We cover TTL in detail in the TTL and Routing tab.
Protocol (8 bits)
Identifies the L4 protocol inside the payload:
- 1 = ICMP
- 6 = TCP
- 17 = UDP
- 41 = IPv6-in-IPv4 (6in4 tunnel)
- 47 = GRE
- 50 = ESP (IPsec)
- 51 = AH (IPsec)
- 89 = OSPF
- 132 = SCTP
Header Checksum (16 bits)
A checksum computed over the IP header only (not the payload — TCP/UDP have their own checksums). Each router must recompute it after decrementing TTL. If the checksum fails, the packet is silently dropped. Modern NICs (including your Mellanox) offload checksum verification to hardware.
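The header checksum is the standard Internet checksum: a one's-complement sum of the header's 16-bit words, then inverted. The following is a minimal sketch, not how any particular kernel or NIC implements it. The function name `ipv4_header_checksum` and the sample header bytes are illustrative choices.

```python
import struct

def ipv4_header_checksum(header: bytes) -> int:
    """Internet checksum over an IPv4 header.

    To compute a fresh checksum, the checksum field (bytes 10-11) must be zero.
    To verify a received header, pass it unchanged: a valid header yields 0.
    """
    if len(header) % 2:                      # pad to a whole number of 16-bit words
        header += b"\x00"
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:                       # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

# Illustrative 20-byte header (192.168.0.1 -> 192.168.0.199, UDP) with checksum zeroed
sample = bytes.fromhex("45000073000040004011" "0000" "c0a80001c0a800c7")
checksum = ipv4_header_checksum(sample)

# Insert the checksum and verify: the sum over the complete header folds to 0
complete = sample[:10] + struct.pack("!H", checksum) + sample[12:]
verified = ipv4_header_checksum(complete) == 0
print(hex(checksum), verified)
```

A router that decrements TTL changes one header byte, so it has to recompute (or incrementally update) this value before forwarding.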
Source IP Address (32 bits) and Destination IP Address (32 bits)
The 4-byte IPv4 addresses of sender and receiver. These are the primary fields routers use for forwarding decisions. In NAT, both source and destination addresses may be rewritten by the firewall/NAT device.
💡 In DPDK/VPP code, the IPv4 header is accessed via ip4_header_t (VPP) or a manual struct. Key fields accessed in the fast path: ip4->dst_address (FIB lookup), ip4->protocol (dispatch to TCP/UDP), ip4->ttl (decrement), ip4->checksum (recompute after TTL change). These are the fields your graph nodes will read millions of times per second.
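To tie the field reference together, here is a small sketch that unpacks the 20 fixed header bytes with Python's `struct` module. The function name `parse_ipv4_header`, the dictionary keys, and the sample bytes are illustrative assumptions; the sketch ignores IP options beyond reporting the header length.

```python
import ipaddress
import struct

def parse_ipv4_header(packet: bytes) -> dict:
    """Decode the fixed 20-byte part of an IPv4 header (network byte order)."""
    (ver_ihl, dscp_ecn, total_length, ident, flags_frag,
     ttl, protocol, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", packet[:20])
    ihl = ver_ihl & 0x0F
    return {
        "version": ver_ihl >> 4,              # high nibble of byte 0
        "ihl": ihl,                           # header length in 32-bit words
        "payload_offset": ihl * 4,            # where the L4 payload starts
        "dscp": dscp_ecn >> 2,
        "ecn": dscp_ecn & 0x03,
        "total_length": total_length,
        "id": ident,
        "df": bool(flags_frag & 0x4000),      # Don't Fragment
        "mf": bool(flags_frag & 0x2000),      # More Fragments
        "frag_offset": flags_frag & 0x1FFF,   # in units of 8 bytes
        "ttl": ttl,
        "protocol": protocol,
        "checksum": checksum,
        "src": str(ipaddress.IPv4Address(src)),
        "dst": str(ipaddress.IPv4Address(dst)),
    }

# Illustrative header: version 4, IHL 5, DF set, TTL 64, protocol 17 (UDP)
sample = bytes.fromhex("45000073000040004011b861c0a80001c0a800c7")
fields = parse_ipv4_header(sample)
for name, value in fields.items():
    print(f"{name:15} {value}")
```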
IPv4 ADDRESSING — 32-BIT ADDRESSES, NOTATION, CLASSES
IPv4 Address Structure
CORE CONCEPT
An IPv4 address is a 32-bit number: four groups of 8 bits (octets) separated by dots. We write it in dotted-decimal notation where each octet is expressed as a decimal number from 0 to 255.
Every IP address has two parts — a network portion and a host portion. The subnet mask tells you which bits are the network part (1s) and which are the host part (0s).
- All devices in the same network have identical network bits
- Each device has a unique host portion within its network
- Routers forward packets based on the network portion — they don't care about individual host bits
Classful Addressing — Historical but Still Referenced
BACKGROUND
Before CIDR (1993), IPv4 addresses were divided into fixed classes. You still hear these terms in networking conversations:
| Class | First Bits | Range | Default Mask | Networks | Hosts/Network | Use |
|---|---|---|---|---|---|---|
| A | 0xxxxxxx | 1.0.0.0 – 126.255.255.255 | /8 (255.0.0.0) | 126 | 16,777,214 | Large orgs |
| B | 10xxxxxx | 128.0.0.0 – 191.255.255.255 | /16 (255.255.0.0) | 16,384 | 65,534 | Medium orgs |
| C | 110xxxxx | 192.0.0.0 – 223.255.255.255 | /24 (255.255.255.0) | 2,097,152 | 254 | Small orgs |
| D | 1110xxxx | 224.0.0.0 – 239.255.255.255 | N/A | N/A | N/A | Multicast |
| E | 1111xxxx | 240.0.0.0 – 255.255.255.255 | N/A | N/A | N/A | Reserved/Experimental |
Classful addressing wasted enormous numbers of IP addresses (a company needing 300 hosts got a Class B with 65,534 addresses — 65,234 wasted). CIDR replaced classful addressing, but the Class A/B/C terminology persists in configuration and documentation.
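The "First Bits" column of the class table can be checked mechanically. The sketch below classifies an address purely from the leading bits of its first octet; `ipv4_class` is a hypothetical helper name, and the function deliberately ignores special cases such as 0.x.x.x and 127.x.x.x.

```python
def ipv4_class(address: str) -> str:
    """Return the historical class letter based on the leading bits of the first octet."""
    first_octet = int(address.split(".")[0])
    if (first_octet & 0x80) == 0x00:   # 0xxxxxxx
        return "A"
    if (first_octet & 0xC0) == 0x80:   # 10xxxxxx
        return "B"
    if (first_octet & 0xE0) == 0xC0:   # 110xxxxx
        return "C"
    if (first_octet & 0xF0) == 0xE0:   # 1110xxxx
        return "D"
    return "E"                         # 1111xxxx

for addr in ("10.1.2.3", "172.16.0.1", "192.168.1.1", "224.0.0.5", "240.0.0.1"):
    print(addr, "-> Class", ipv4_class(addr))
```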
Subnet Mask — The Network/Host Boundary
SUBNET MASK
A subnet mask is a 32-bit number where all network bits are 1 and all host bits are 0. Two notation forms:
- Dotted-decimal: 255.255.255.0 (easier for humans to read)
- CIDR prefix length: /24 (the count of 1-bits; much more compact)
/* Example: 192.168.1.100/24 */
IP address:   192.168.1.100 = 11000000.10101000.00000001.01100100
Subnet mask:  255.255.255.0 = 11111111.11111111.11111111.00000000
                              ←──── Network portion ────→ ←Host→
/* AND operation: IP & mask = Network address */
Network addr: 192.168.1.0   = 11000000.10101000.00000001.00000000
/* Broadcast: network with all host bits = 1 */
Broadcast:    192.168.1.255 = 11000000.10101000.00000001.11111111
/* Usable hosts: from .1 to .254 (254 hosts for /24) */
First host:   192.168.1.1
Last host:    192.168.1.254
Three critical addresses in every subnet:
- Network address: host bits all 0. Identifies the subnet itself, not assignable to a host (e.g., 192.168.1.0)
- Broadcast address: host bits all 1. Sends to all hosts in the subnet, not assignable (e.g., 192.168.1.255)
- Usable host range: everything between. For /24: 192.168.1.1 to 192.168.1.254 = 254 usable hosts
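The same three addresses can be derived with Python's standard `ipaddress` module, which is a convenient way to check hand calculations. This is a short sketch; the variable names are illustrative.

```python
import ipaddress

iface = ipaddress.ip_interface("192.168.1.100/24")
net = iface.network

# The AND from the worked example, done on the 32-bit integer forms
manual_network = ipaddress.IPv4Address(int(iface.ip) & int(net.netmask))

hosts = list(net.hosts())          # usable addresses, excluding network and broadcast
print("Mask:      ", net.netmask)
print("Network:   ", net.network_address, "(manual AND:", str(manual_network) + ")")
print("Broadcast: ", net.broadcast_address)
print("First host:", hosts[0])
print("Last host: ", hosts[-1])
print("Usable:    ", len(hosts))
```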
SUBNETTING AND CIDR — DIVIDING ADDRESS SPACE EFFICIENTLY
Why Subnetting Exists
MOTIVATION
Subnetting takes a large network and divides it into smaller sub-networks. This is done for three reasons:
- Security isolation — different departments/zones in different subnets, firewall between them (your NGFW use case)
- Performance — smaller broadcast domains mean less broadcast noise
- Address efficiency: size each subnet to the smallest power-of-two block that fits your hosts, so far fewer addresses are wasted than with classful allocation
When you subnet, you borrow bits from the host portion and add them to the network portion — increasing the prefix length. More network bits = smaller subnets = fewer hosts per subnet.
CIDR Prefix Reference Table
REFERENCE
| Prefix | Subnet Mask | Hosts | Usable Hosts | Typical Use |
|---|---|---|---|---|
| /8 | 255.0.0.0 | 16,777,216 | 16,777,214 | ISP, large org backbone |
| /16 | 255.255.0.0 | 65,536 | 65,534 | Large campus, cloud VPC |
| /20 | 255.255.240.0 | 4,096 | 4,094 | Medium office, data centre zone |
| /24 | 255.255.255.0 | 256 | 254 | Standard office LAN, server subnet |
| /25 | 255.255.255.128 | 128 | 126 | Split /24 into two halves |
| /26 | 255.255.255.192 | 64 | 62 | Department subnets |
| /27 | 255.255.255.224 | 32 | 30 | Small team subnet |
| /28 | 255.255.255.240 | 16 | 14 | Small server cluster |
| /29 | 255.255.255.248 | 8 | 6 | Router-to-router links |
| /30 | 255.255.255.252 | 4 | 2 | Point-to-point links (2 hosts only) |
| /31 | 255.255.255.254 | 2 | 2* | P2P links (RFC 3021: no network/broadcast address) |
| /32 | 255.255.255.255 | 1 | 1 | Host route, loopback, BGP next-hop |
Formula: Hosts = 2^(32-prefix). Usable = Hosts - 2 (subtract network and broadcast). Exception: /31 and /32 have special rules.
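The formula and its two exceptions fit in a few lines. The sketch below reproduces the Hosts and Usable Hosts columns of the table; `address_counts` is a hypothetical helper name.

```python
def address_counts(prefix: int) -> tuple:
    """Return (total addresses, usable host addresses) for an IPv4 prefix length."""
    if not 0 <= prefix <= 32:
        raise ValueError("prefix length must be between 0 and 32")
    total = 2 ** (32 - prefix)
    if prefix >= 31:
        usable = total        # /31 (RFC 3021) and /32 have no network/broadcast to subtract
    else:
        usable = total - 2    # subtract the network and broadcast addresses
    return total, usable

for prefix in (8, 16, 24, 26, 30, 31, 32):
    total, usable = address_counts(prefix)
    print(f"/{prefix}: {total} addresses, {usable} usable")
```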
Subnetting by Hand — Step-by-Step Method
TECHNIQUE
Problem: You have 192.168.10.0/24 and need to create 4 equal subnets. What are the subnets?
Step 1 — How many bits to borrow?
You need 4 subnets = 2² → borrow 2 bits from the host portion. New prefix = /24 + 2 = /26.
Step 2 — What is the block size?
Block size = 256 - subnet_mask_last_octet = 256 - 192 = 64. (For /26: mask = 255.255.255.192, last octet = 192.)
Step 3 — List the subnets (increment by block size in the last octet):
| Subnet | Network Addr | First Host | Last Host | Broadcast |
|---|---|---|---|---|
| /26 #1 | 192.168.10.0 | 192.168.10.1 | 192.168.10.62 | 192.168.10.63 |
| /26 #2 | 192.168.10.64 | 192.168.10.65 | 192.168.10.126 | 192.168.10.127 |
| /26 #3 | 192.168.10.128 | 192.168.10.129 | 192.168.10.190 | 192.168.10.191 |
| /26 #4 | 192.168.10.192 | 192.168.10.193 | 192.168.10.254 | 192.168.10.255 |
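Network vs host bits for /26: in 192.168.10.x, the top 2 bits of the last octet are borrowed as network bits, leaving 6 host bits, so each subnet spans 64 values of the last octet (0–63 in the first subnet).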
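The hand calculation can be cross-checked with the standard `ipaddress` module. In this sketch, `prefixlen_diff=2` corresponds to the two borrowed bits; the `rows` list is just an illustrative way to collect the same columns as the subnet table.

```python
import ipaddress

parent = ipaddress.ip_network("192.168.10.0/24")

rows = []
for subnet in parent.subnets(prefixlen_diff=2):   # borrow 2 bits: /24 becomes four /26s
    hosts = list(subnet.hosts())
    rows.append((str(subnet), str(hosts[0]), str(hosts[-1]), str(subnet.broadcast_address)))

for network, first, last, broadcast in rows:
    print(f"{network:18} first={first:15} last={last:15} broadcast={broadcast}")
```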
💡 NGFW application: In a typical enterprise NGFW deployment you'll design security zones as subnets: 10.0.1.0/24 = Inside LAN, 10.0.2.0/24 = DMZ servers, 10.0.3.0/24 = Management. The firewall sits between these subnets and applies policy at the IP layer. Knowing subnetting lets you write precise ACL rules like permit ip 10.0.1.0/24 10.0.2.0/24.
Subnet Arithmetic in C
CODE
#include <stdio.h>
#include <arpa/inet.h>
#include <stdint.h>

int main(void) {
    /* IP and prefix */
    uint32_t ip = inet_addr("192.168.10.100"); /* network byte order */
    uint32_t prefix = 26;

    /* Build mask: ~0 shifted left by (32-prefix) bits */
    uint32_t mask = htonl(~0u << (32 - prefix)); /* 0xFFFFFFC0 = /26 */

    /* Network address = ip AND mask */
    uint32_t network = ip & mask;

    /* Broadcast = network OR (NOT mask) */
    uint32_t broadcast = network | ~mask;

    /* First and last host */
    uint32_t first = htonl(ntohl(network) + 1);
    uint32_t last = htonl(ntohl(broadcast) - 1);

    /* Usable host count */
    uint32_t hosts = ntohl(broadcast) - ntohl(network) - 1;

    char buf[INET_ADDRSTRLEN];
    printf("Network: %s\n", inet_ntop(AF_INET, &network, buf, sizeof(buf)));
    printf("Broadcast: %s\n", inet_ntop(AF_INET, &broadcast, buf, sizeof(buf)));
    printf("First: %s\n", inet_ntop(AF_INET, &first, buf, sizeof(buf)));
    printf("Last: %s\n", inet_ntop(AF_INET, &last, buf, sizeof(buf)));
    printf("Hosts: %u\n", hosts);
    return 0;
}
SPECIAL IPv4 ADDRESS RANGES — KNOW THESE BY HEART
Reserved and Special Address Ranges
REFERENCE
| Range | Name | Purpose | RFC | NGFW Relevance |
|---|---|---|---|---|
| 10.0.0.0/8 | Private Class A | Internal networks, not routed on the internet | RFC 1918 | Typically "inside" zone: allow policy |
| 172.16.0.0/12 | Private Class B | Internal networks, covers 172.16.x.x to 172.31.x.x | RFC 1918 | Often used for DMZ / management |
| 192.168.0.0/16 | Private Class C | Internal networks, common in SOHO/offices | RFC 1918 | Home/branch office subnets |
| 127.0.0.0/8 | Loopback | Local host communication. 127.0.0.1 = "localhost" | RFC 5735 | Never route this: drop at perimeter |
| 169.254.0.0/16 | Link-Local / APIPA | Auto-assigned when DHCP fails. Not routable | RFC 3927 | Indicator of DHCP failure on host |
| 100.64.0.0/10 | Shared Address Space | CGN (Carrier-Grade NAT), ISP internal use | RFC 6598 | Treat like RFC 1918: don't route externally |
| 0.0.0.0/8 | Unspecified | 0.0.0.0 = "this host", used before an IP is assigned | RFC 1122 | Drop all packets with source 0.0.0.0 |
| 255.255.255.255/32 | Broadcast | Limited broadcast: all hosts on local network | RFC 919 | Drop at firewall, never route |
| 224.0.0.0/4 | Multicast | Group communication (OSPF, video streaming) | RFC 5771 | Allow selectively (OSPF: 224.0.0.5/6) |
| 240.0.0.0/4 | Reserved | Reserved for future use, treat as invalid | RFC 1112 | Drop all packets in this range |
| 192.0.2.0/24 | TEST-NET-1 | Documentation and examples, never real traffic | RFC 5737 | Drop at perimeter |
| 198.51.100.0/24 | TEST-NET-2 | Documentation, as above | RFC 5737 | Drop at perimeter |
| 203.0.113.0/24 | TEST-NET-3 | Documentation, as above | RFC 5737 | Drop at perimeter |
Bogon Filtering — NGFW First Line of Defence
SECURITY
A bogon is an IP address that should never appear as a source on the public internet, either because it's reserved (RFC 1918 private, loopback, link-local) or unallocated. An NGFW at the internet perimeter should drop all packets with bogon source addresses: they indicate either misconfiguration or deliberate spoofing (attack).
/* Bogon filter: drop these source IPs at internet-facing interface */
/* These are source addresses that should NEVER arrive from the internet */
Bogon source ranges to block:
10.0.0.0/8        RFC 1918 private
172.16.0.0/12     RFC 1918 private
192.168.0.0/16    RFC 1918 private
127.0.0.0/8       Loopback
169.254.0.0/16    Link-local
100.64.0.0/10     Shared address space
0.0.0.0/8         Unspecified
240.0.0.0/4       Reserved
224.0.0.0/4       Multicast (as source: invalid)
192.0.2.0/24      TEST-NET-1
198.51.100.0/24   TEST-NET-2
203.0.113.0/24    TEST-NET-3
/* Unicast Reverse Path Forwarding (uRPF): a smarter bogon filter */
/* Router drops packets if the source IP has no route back via the */
/* same interface the packet arrived on, which prevents spoofed sources */
In VPP, bogon filtering is implemented as an IP feature arc plugin with a bihash lookup of source address against a prefix table. You'll build a version of this in Phase 6 (NGFW Development).
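As a rough illustration of the source check itself, here is a sketch using Python's `ipaddress` module. It is not the VPP bihash implementation; it does a simple linear scan over the ranges listed above, and `is_bogon_source` is a hypothetical helper name. A production data path would use a hash or trie lookup instead.

```python
import ipaddress

# Source ranges that should never arrive from the internet (from the list above)
BOGON_SOURCES = [ipaddress.ip_network(cidr) for cidr in (
    "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",        # RFC 1918 private
    "127.0.0.0/8", "169.254.0.0/16", "100.64.0.0/10",       # loopback, link-local, CGN
    "0.0.0.0/8", "240.0.0.0/4", "224.0.0.0/4",              # unspecified, reserved, multicast
    "192.0.2.0/24", "198.51.100.0/24", "203.0.113.0/24",    # TEST-NET-1/2/3
)]

def is_bogon_source(address: str) -> bool:
    """True if the source address falls inside any listed bogon range."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in BOGON_SOURCES)

for src in ("10.1.2.3", "172.31.255.255", "172.32.0.1", "8.8.8.8", "203.0.113.7"):
    print(src, "DROP" if is_bogon_source(src) else "pass")
```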
TTL, ROUTING BASICS, AND HOW ROUTERS FORWARD PACKETS
TTL — Time To Live
TTL
TTL is an 8-bit counter in the IP header that starts at a value set by the sender (typically 64 for Linux, 128 for Windows, 255 for many routers) and is decremented by 1 at every router hop. When TTL reaches 0, the router discards the packet and sends an ICMP Time Exceeded message back to the original sender.
Why TTL exists: Without TTL, a packet caught in a routing loop (two routers sending it back and forth) would circulate forever, consuming bandwidth indefinitely. TTL guarantees every packet has a finite lifetime.
/* TTL trace: packet from your laptop to 8.8.8.8 */
Hop 1:  Your router     TTL: 64 → 63 (decremented, forwarded)
Hop 2:  ISP router 1    TTL: 63 → 62 (decremented, forwarded)
Hop 3:  ISP router 2    TTL: 62 → 61 (decremented, forwarded)
...
Hop 12: Google router   TTL: 53 → 52 (decremented, forwarded)
Hop 13: 8.8.8.8         TTL: 52 (received, destination reached)
/* If TTL hits 0 at an intermediate router: */
Router discards packet + sends ICMP Type 11, Code 0 (Time Exceeded)
Sender receives ICMP with source IP of the discarding router
→ This is how traceroute works! (see ICMP tab)
/* Default TTL values by OS */
Linux:   64 (set in /proc/sys/net/ipv4/ip_default_ttl)
Windows: 128
Cisco:   255
macOS:   64
NGFW use of TTL: TTL enables passive OS fingerprinting. A packet arriving with TTL=127 likely came from Windows (started at 128, crossed one router). Firewalls can use this for passive OS detection. Some NGFW features normalise TTL values so that hosts behind the firewall cannot be fingerprinted this way.
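The fingerprinting heuristic can be sketched as follows. It assumes the sender used one of the three common defaults (64, 128, 255) and picks the smallest one that is not below the observed value; both the function name `guess_initial_ttl` and that assumption are illustrative, and the guess is wrong for hosts with non-default TTLs.

```python
COMMON_INITIAL_TTLS = (64, 128, 255)   # Linux/macOS, Windows, many routers

def guess_initial_ttl(observed_ttl: int) -> tuple:
    """Return (likely initial TTL, estimated router hops) for an observed TTL."""
    if not 1 <= observed_ttl <= 255:
        raise ValueError("TTL must be between 1 and 255")
    for initial in COMMON_INITIAL_TTLS:
        if observed_ttl <= initial:
            return initial, initial - observed_ttl
    raise AssertionError("unreachable: 255 is the largest possible TTL")

for ttl in (127, 52, 250):
    initial, hops = guess_initial_ttl(ttl)
    print(f"observed TTL {ttl}: probably started at {initial}, about {hops} hop(s) away")
```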
How Routers Make Forwarding Decisions
ROUTING BASICS
Every router maintains a routing table; the copy optimised for per-packet lookups is the FIB (Forwarding Information Base). When a packet arrives, the router looks up the destination IP address in the FIB using Longest Prefix Match (LPM): find the most specific route that covers the destination.
/* Example routing table on a Linux router */
$ ip route show
10.0.0.0/8 via 192.168.1.1 dev eth0            # Match any 10.x.x.x
10.10.0.0/16 via 192.168.1.2 dev eth0          # More specific match
10.10.1.0/24 dev eth1 proto kernel scope link  # Most specific, local
0.0.0.0/0 via 203.0.113.1 dev eth2             # Default route (catch-all)
/* LPM example: packet destined for 10.10.1.55 */
Matches 0.0.0.0/0    → /0: too broad
Matches 10.0.0.0/8   → /8: candidate
Matches 10.10.0.0/16 → /16: more specific
Matches 10.10.1.0/24 → /24: MOST SPECIFIC → this one wins
/* Router actions after lookup: */
1. Decrement TTL (if TTL becomes 0: drop + send ICMP Time Exceeded)
2. Recompute IP header checksum (TTL changed)
3. ARP-resolve next-hop MAC if not cached
4. Rewrite Ethernet header: new dst MAC (next-hop) + src MAC (this router's outgoing port)
5. Transmit on outgoing interface
💡 What the router does NOT touch: Source IP, destination IP, and the entire IP payload (TCP/UDP/application data). IP routing is transparent to endpoints: your laptop doesn't know or care how many routers handled its packet. Unless it performs NAT or has to fragment, a router only touches the Ethernet header and the TTL/checksum fields of the IP header.
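Longest Prefix Match over the example routing table can be sketched in a few lines. This version scans every route and keeps the one with the longest prefix, which is fine for illustration; real FIBs use tries or hash tables. The `ROUTES` list and `lookup` name are illustrative.

```python
import ipaddress

# The example routing table: (prefix, next hop / interface)
ROUTES = [
    (ipaddress.ip_network("10.0.0.0/8"), "via 192.168.1.1 dev eth0"),
    (ipaddress.ip_network("10.10.0.0/16"), "via 192.168.1.2 dev eth0"),
    (ipaddress.ip_network("10.10.1.0/24"), "dev eth1"),
    (ipaddress.ip_network("0.0.0.0/0"), "via 203.0.113.1 dev eth2"),
]

def lookup(destination: str):
    """Return (prefix, next hop) of the most specific matching route, or None."""
    ip = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in ROUTES if ip in net]
    if not matches:
        return None
    net, hop = max(matches, key=lambda match: match[0].prefixlen)
    return str(net), hop

for dst in ("10.10.1.55", "10.10.2.1", "10.99.0.1", "8.8.8.8"):
    print(dst, "->", lookup(dst))
```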
IP FRAGMENTATION AND PATH MTU DISCOVERY
Why Fragmentation Exists
CONCEPT
Every network link has a Maximum Transmission Unit (MTU): the largest IP packet it can carry. Standard Ethernet: 1500 bytes. Some links are smaller (PPPoE adds 8 bytes overhead, reducing effective MTU to 1492). When a packet larger than a link's MTU needs to cross that link, IP fragments it into smaller pieces.
Fragmentation can happen at the sender or at any router along the path, but reassembly happens only at the destination host, never at intermediate routers. This design choice avoids reassembly overhead at every hop.
How Fragmentation Works
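MECHANICS
Scenario: a 4000-byte IP packet (20-byte header + 3980 bytes of data) arrives at a router whose outgoing link has MTU 1500. The router fragments it into three pieces.
| Fragment | IP Header | ID | MF | Fragment Offset | Data |
|---|---|---|---|---|---|
| 1 | 20 B | x | 1 | 0 | 1480 bytes |
| 2 | 20 B | x | 1 | 185 | 1480 bytes |
| 3 | 20 B | x | 0 | 370 | 1020 bytes |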
Fragment field values explained:
- Identification = x — same value in all 3 fragments (receiver uses this to group them)
- MF=1 — More Fragments — set on first two fragments, MF=0 on the last
- Fragment Offset — in units of 8 bytes: 0 / 185 (1480÷8) / 370 (2960÷8)
- Fragment data size: must be a multiple of 8 bytes (except in the last fragment), because the offset field counts in 8-byte units
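The fragment values in this scenario follow from simple arithmetic, sketched below. The sketch assumes a 20-byte header with no options and an original packet that is not itself a fragment; the function name `fragment` and the dictionary keys are illustrative.

```python
def fragment(total_length: int, mtu: int, header_len: int = 20) -> list:
    """Split an IP packet into fragments that fit the MTU; returns one dict per fragment."""
    payload = total_length - header_len
    max_data = (mtu - header_len) // 8 * 8    # data per fragment, rounded down to a multiple of 8
    fragments = []
    offset = 0
    while offset < payload:
        size = min(max_data, payload - offset)
        fragments.append({
            "offset": offset // 8,                        # Fragment Offset field (8-byte units)
            "data_len": size,
            "mf": 1 if offset + size < payload else 0,    # More Fragments flag
            "total_length": header_len + size,
        })
        offset += size
    return fragments

frags = fragment(total_length=4000, mtu=1500)
for f in frags:
    print(f)
```

For the 4000-byte packet this yields offsets 0, 185, and 370 with data sizes 1480, 1480, and 1020.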
Fragmentation Problems and PMTUD
ISSUES
Fragmentation causes several real-world problems:
- Performance overhead — reassembly at the destination consumes CPU and memory. Fragments must be buffered until all arrive.
- Firewall complexity — stateful firewalls must reassemble fragments before inspecting the transport header (TCP/UDP ports are only in the first fragment). This is a significant processing cost.
- Fragment attacks: attackers exploit fragmentation with overlapping fragments (Teardrop), a tiny first fragment (hides TCP flags from the firewall), or a missing last fragment (ties up a reassembly buffer until the reassembly timer expires).
- ICMP filtering — some networks block ICMP, which breaks Path MTU Discovery (see below).
Path MTU Discovery (PMTUD)
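Modern systems avoid fragmentation by discovering the smallest MTU on the path. The sender sets DF=1 on its packets; any router whose outgoing link is too small drops the packet and returns ICMP Type 3, Code 4 (Fragmentation Needed) carrying the next-hop MTU; the sender lowers its packet size to that value and retries until packets get through.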
⚠️ ICMP Black Holes: If a firewall blocks ICMP (a common but misguided practice), PMTUD breaks. The sender never receives the "Fragmentation Needed" message, packets keep getting dropped silently, and connections hang. This manifests as "large downloads hang after a few KB". NGFW policy must allow ICMP Type 3, Code 4 through for PMTUD to work correctly.
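The discovery loop can be simulated without touching the network. In this sketch the path is just a list of link MTUs, and every router is assumed to return the ICMP message; the function name `discover_path_mtu` and the example path are hypothetical. If the ICMP reply were filtered, a real sender would never learn the smaller MTU, which is the black-hole case described above.

```python
def discover_path_mtu(link_mtus: list, initial_size: int) -> tuple:
    """Simulate PMTUD: returns (path MTU, list of (size sent, MTU reported by ICMP or None))."""
    size = initial_size
    attempts = []
    while True:
        # The first link too small for the packet drops it and reports its MTU (Type 3, Code 4)
        reported = next((mtu for mtu in link_mtus if mtu < size), None)
        attempts.append((size, reported))
        if reported is None:          # packet crossed every link: discovery is complete
            return size, attempts
        size = reported               # retry at the reported next-hop MTU

# Hypothetical path: Ethernet, PPPoE, a tunnel with MTU 1400, Ethernet
path_mtu, attempts = discover_path_mtu([1500, 1492, 1400, 1500], initial_size=1500)
print("Path MTU:", path_mtu)
for sent, reported in attempts:
    print(f"sent {sent} bytes with DF=1 ->", "delivered" if reported is None else f"ICMP frag needed, MTU {reported}")
```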
ICMP — INTERNET CONTROL MESSAGE PROTOCOL (RFC 792)
What ICMP Is and Why It Exists
OVERVIEW
ICMP is IP's built-in diagnostic and error-reporting protocol. It travels inside IP packets (Protocol = 1) and is used by routers and hosts to report errors and exchange control information. ICMP has no concept of ports: it is carried directly in IP, alongside TCP and UDP rather than on top of them.
ICMP is essential for:
- Ping — testing reachability (Echo Request/Reply)
- Traceroute — discovering the path to a destination (abuses TTL expiry)
- Error reporting — telling senders why their packets were dropped
- Path MTU Discovery — informing senders of MTU limitations (Type 3, Code 4)
ICMP format: 8-byte fixed header (Type, Code, Checksum, + 4 bytes of type-specific data) followed by optional additional data.
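To make the 8-byte layout concrete, the sketch below builds an Echo Request (Type 8, Code 0), where the 4 type-specific bytes are a 16-bit identifier and a 16-bit sequence number. The ICMP checksum covers the whole ICMP message. The function names and the sample identifier and payload are illustrative, and the sketch only builds bytes; it does not send anything.

```python
import struct

def internet_checksum(data: bytes) -> int:
    """One's-complement sum of 16-bit words, inverted."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def build_echo_request(identifier: int, sequence: int, payload: bytes = b"") -> bytes:
    """Type (1 byte), Code (1), Checksum (2), Identifier (2), Sequence (2), then payload."""
    header = struct.pack("!BBHHH", 8, 0, 0, identifier, sequence)   # checksum zero for now
    checksum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, checksum, identifier, sequence) + payload

message = build_echo_request(identifier=0x1234, sequence=1, payload=b"ping")
icmp_type, icmp_code = message[0], message[1]
print(message.hex(), "type", icmp_type, "code", icmp_code)
print("checksum valid:", internet_checksum(message) == 0)
```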
ICMP Message Types — Complete Reference
REFERENCE
| Type | Code | Name | Direction | Caused By / Use |
|---|---|---|---|---|
| 0 | 0 | Echo Reply | Host → Pinger | Response to ping (Type 8) |
| 3 | 0 | Dest Unreachable: Net | Router → Sender | No route to destination network |
| 3 | 1 | Dest Unreachable: Host | Router → Sender | No route to specific host |
| 3 | 2 | Dest Unreachable: Protocol | Host → Sender | Protocol not supported on destination |
| 3 | 3 | Dest Unreachable: Port | Host → Sender | UDP port not listening (no process bound) |
| 3 | 4 | Fragmentation Needed | Router → Sender | Packet too large + DF=1 set. Includes next-hop MTU. |
| 3 | 9 | Dest Unreachable: Filtered | Router/FW → Sender | Destination network administratively prohibited (admin filter); codes 10 and 13 are the host and generic variants |
| 5 | 0-3 | Redirect | Router → Host | Better route exists via different gateway |
| 8 | 0 | Echo Request | Sender → Host | Ping: tests reachability |
| 11 | 0 | Time Exceeded (TTL) | Router → Sender | TTL reached 0; used by traceroute |
| 11 | 1 | Time Exceeded (Reassembly) | Host → Sender | Fragment reassembly timer expired |
| 12 | 0-2 | Parameter Problem | Router/Host → Sender | IP header field is invalid |
Traceroute — How It Works
TECHNIQUE
Traceroute exploits TTL and ICMP Time Exceeded messages to discover every router on the path to a destination. It sends packets with progressively increasing TTL values (1, 2, 3, ...); each router that decrements a probe's TTL to 0 drops it and returns an ICMP Time Exceeded message whose source address reveals that router's IP.
/* Traceroute algorithm */
Round 1: Send 3 packets with TTL=1
→ First router decrements to 0, drops, sends ICMP Time Exceeded
→ Reveal: first hop IP = 192.168.1.1 (your gateway)
Round 2: Send 3 packets with TTL=2
→ Second router decrements to 0, drops, sends ICMP Time Exceeded
→ Reveal: second hop IP = 10.10.1.1 (ISP edge router)
Round 3: Send 3 packets with TTL=3
→ Third router... and so on until destination replies
/* Two implementations */
traceroute on Linux: sends UDP packets to high port (33434+)
  destination replies with ICMP Port Unreachable (type 3, code 3)
tracert on Windows: sends ICMP Echo Requests
  destination replies with Echo Reply (type 0)
/* Run it */
$ traceroute -n 8.8.8.8   # -n skips DNS resolution (faster)
$ traceroute -I 8.8.8.8   # -I uses ICMP instead of UDP
$ mtr 8.8.8.8             # live updating traceroute
Interpreting traceroute output:
- * * *: the router doesn't respond to probes (rate-limited or blocks ICMP). This does NOT mean the path is broken there
- Increasing RTT: normal as you move further away
- RTT dropping at a later hop: usually an artefact. The earlier router rate-limits its ICMP replies, or its TTL-exceeded replies take a longer path back than those of the later hop
- Asymmetric routing: forward and return path may be different (explains apparent RTT anomalies)
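The algorithm can be simulated offline to see why increasing the TTL reveals one hop at a time. The sketch models a UDP-style traceroute over a hypothetical path (the last entry is the destination, everything before it is a router). The function names and the path are illustrative, and no packets are sent.

```python
def send_probe(path: list, ttl: int) -> tuple:
    """Walk a probe along the path; return (responder IP, ICMP reply it would generate)."""
    for router in path[:-1]:
        ttl -= 1                                  # every router decrements TTL
        if ttl == 0:
            return router, "time-exceeded (11/0)"
    return path[-1], "port-unreachable (3/3)"     # probe reached the destination host

def simulate_traceroute(path: list, max_hops: int = 30) -> list:
    """Send probes with TTL=1,2,3... until the destination itself answers."""
    hops = []
    for ttl in range(1, max_hops + 1):
        responder, reply = send_probe(path, ttl)
        hops.append((ttl, responder, reply))
        if responder == path[-1]:
            break
    return hops

# Hypothetical path: home gateway, ISP edge router, destination
hops = simulate_traceroute(["192.168.1.1", "10.10.1.1", "8.8.8.8"])
for ttl, responder, reply in hops:
    print(f"{ttl:2}  {responder:15} {reply}")
```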
ICMP and NGFW — What to Allow, What to Block
NGFW POLICY
A common mistake is to block all ICMP at the firewall. This breaks PMTUD and troubleshooting. Here's the correct NGFW policy for ICMP:
| ICMP Type/Code | Direction | NGFW Action | Reason |
|---|---|---|---|
| Type 8 (Echo Request) | Inbound from internet | Block or rate-limit | Reduces attack surface, prevents mapping |
| Type 0 (Echo Reply) | Inbound (reply to outbound ping) | Allow (stateful) | Return traffic for initiated pings |
| Type 3, Code 4 (Frag Needed) | Inbound | Always allow | PMTUD — blocking this breaks connections |
| Type 3, Code 0-3 (Dest Unreach) | Inbound | Allow (stateful) | Error replies for existing connections |
| Type 11 (TTL Exceeded) | Inbound | Allow | Traceroute return path, debugging |
| Type 5 (Redirect) | Inbound | Block | ICMP redirect attacks — can reroute traffic |
| All ICMP | Outbound | Allow | Internal users need full diagnostic capability |
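The policy table can be read as a small decision function, sketched below. This is not a real firewall API: `icmp_action`, its parameters, and the returned strings are hypothetical, the `related_to_session` flag stands in for the stateful match, and the final default-deny for unlisted inbound types is an assumption the table does not state.

```python
def icmp_action(icmp_type: int, code: int, direction: str, related_to_session: bool = False) -> str:
    """Return 'allow', 'block', or 'rate-limit' following the policy table."""
    if direction == "outbound":
        return "allow"                                # internal users keep full diagnostics
    if icmp_type == 3 and code == 4:
        return "allow"                                # Fragmentation Needed: required for PMTUD
    if icmp_type == 5:
        return "block"                                # Redirect: can be used to reroute traffic
    if icmp_type == 8:
        return "rate-limit"                           # inbound Echo Request: block or rate-limit
    if icmp_type == 11:
        return "allow"                                # Time Exceeded: traceroute return path
    if icmp_type == 0 or (icmp_type == 3 and code in (0, 1, 2, 3)):
        return "allow" if related_to_session else "block"   # stateful: only for known flows
    return "block"                                    # assumed default for anything unlisted

print(icmp_action(3, 4, "inbound"))                             # allow
print(icmp_action(0, 0, "inbound", related_to_session=True))    # allow
print(icmp_action(5, 1, "inbound"))                             # block
```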
Dissect an IPv4 Header with Scapy and Wireshark
Objective: Craft raw IP packets with specific field values and observe exactly how each field appears in Wireshark. Build deep familiarity with every byte of the IP header.
- Install Scapy: pip3 install scapy. Open Python3 as root: sudo python3. Import: from scapy.all import *.
- Build a packet: p = IP(dst="8.8.8.8") / ICMP() then p.show(). Observe every field Scapy has automatically set: version=4, ihl=5, ttl=64, proto=1 (ICMP), src (your IP), dst (8.8.8.8). Fields that Scapy computes only at build time, such as ihl and the checksums, are displayed by p.show2().
- Dump the raw bytes: bytes(p). Count them: 20 bytes of IP header + 8 bytes ICMP = 28 bytes. Identify which bytes correspond to which fields (e.g., bytes 8–9 = TTL + Protocol).
- Send it: send(p). Find it in Wireshark. Expand the Internet Protocol layer. Verify every field matches what Scapy showed.
- Set specific fields: p = IP(dst="8.8.8.8", ttl=1, flags="DF", id=0xBEEF) / ICMP(). Send it. In Wireshark: (a) Does TTL appear as 1? (b) Is the DF flag set? (c) Is the Identification 0xBEEF (decimal 48879)? (d) What ICMP error did you get back: Time Exceeded?
- Decode by hand: hexdump(p) in Scapy shows the raw hex. Manually decode the first 20 bytes: byte 0 = version+IHL (0x45 = version 4, IHL 5), bytes 2-3 = total length, byte 8 = TTL, byte 9 = protocol. Cross-reference with the IP header diagram in Tab 1.
Subnetting Practice: Design an NGFW Network
Objective: Design a complete network layout for an NGFW deployment from scratch using subnetting. This simulates real-world work.
- You have 10.0.0.0/16 and need to create: (a) Inside LAN for 500 hosts, (b) DMZ for 50 servers, (c) Management network for 20 devices, (d) NGFW to router link (2 hosts only). Design subnets with the minimum waste. Calculate network address, mask, broadcast, first host, last host for each.
- Validate: python3 -c "import ipaddress; n = ipaddress.ip_network('10.0.0.0/23'); print(list(n.hosts())[0], list(n.hosts())[-1], n.broadcast_address)". Use Python's ipaddress module to validate all your subnet calculations.
- Configure on Linux: sudo ip addr add 10.0.0.1/23 dev lo label lo:0. Add routes for each subnet: sudo ip route add 10.0.2.0/25 dev lo. Verify routing with ip route get 10.0.2.50.
- List your iptables rules: iptables -L -n -v.
ICMP and Traceroute Analysis
Objective: Capture and fully decode ICMP messages including ping, Time Exceeded (traceroute), and Destination Unreachable. Understand the complete ICMP interaction with IP.
- Start a Wireshark capture with the display filter icmp. Run: ping -c 4 8.8.8.8. Identify Type 8 (Echo Request) and Type 0 (Echo Reply) packets. In the hex dump, counting from the start of the IP header (with no IP options), find: byte 20 = ICMP Type, byte 21 = ICMP Code, bytes 22-23 = Checksum, bytes 24-27 = Identifier + Sequence number.
- Run: sudo traceroute -n -I 8.8.8.8. Capture Type 11 (Time Exceeded) replies from intermediate routers. Expand an ICMP Time Exceeded packet: notice it embeds the original IP header plus the first 8 bytes of the original IP payload, so the sender knows which packet caused the error.
- Run: nc -u 8.8.8.8 9999 then type anything and press Enter. If the target answers, you'll get an ICMP Type 3, Code 3 (Port Unreachable) back since nothing listens on UDP 9999. Many internet hosts suppress or rate-limit these replies, so a host on your own LAN may be a more reliable target. Capture and decode it in Wireshark.
- In Scapy: ans, unans = sr(IP(dst="8.8.8.8", ttl=3)/ICMP(), timeout=2). Then ans.show(): you'll see the response from hop 3 on your path to 8.8.8.8.
M03 MASTERY CHECKLIST
- Can explain IP's three core jobs: logical addressing, fragmentation, best-effort delivery
- Know the IPv4 header is 20 bytes minimum (5 × 32-bit rows) and can draw it from memory
- Know every header field: Version, IHL, DSCP/ECN, Total Length, ID, Flags, Fragment Offset, TTL, Protocol, Checksum, Src IP, Dst IP
- Know the key Protocol field values: 1=ICMP, 6=TCP, 17=UDP, 47=GRE, 50=ESP, 89=OSPF
- Know how to find the payload start using IHL: payload_offset = IHL × 4
- Know the two flag bits: DF (Don't Fragment) and MF (More Fragments) and what each does
- Know the three classful address classes (A/B/C) and their default masks (/8, /16, /24)
- Can convert between dotted-decimal and CIDR notation for any prefix length
- Can manually calculate network address, broadcast address, first host, last host, and host count for any given CIDR prefix
- Can subnet a given network into N equal subnets by hand using the block-size method
- Know all RFC 1918 private ranges by heart: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16
- Know at least 8 special address ranges: loopback, link-local, multicast, broadcast, shared (CGN), TEST-NETs
- Understand bogon filtering and can list the key ranges an NGFW should drop at the internet perimeter
- Know how TTL works: decremented at each router, ICMP Time Exceeded when 0, default values per OS
- Understand how LPM routing works: router looks up destination IP in FIB, most specific match wins
- Know what a router does and does NOT modify: decrements TTL, recomputes checksum, rewrites Ethernet header — never touches IP src/dst or payload
- Can explain IP fragmentation: what triggers it, Identification/MF/Offset fields, reassembly at destination only
- Understand Path MTU Discovery (PMTUD): DF=1 + ICMP Type 3 Code 4 — and why blocking ICMP breaks it
- Know key ICMP types: 0 (Echo Reply), 3 (Unreachable), 5 (Redirect), 8 (Echo Request), 11 (Time Exceeded)
- Know which ICMP types to allow at NGFW: always allow Type 3 Code 4, block Type 5 (Redirect)
- Can explain how traceroute works: sends TTL=1,2,3... probes, collects ICMP Time Exceeded from each hop
- Completed Lab 1: crafted IPv4 packets in Scapy, decoded header bytes, verified fields in Wireshark
- Completed Lab 2: designed an NGFW subnet layout for 4 zones, configured on Linux, wrote iptables rules
- Completed Lab 3: captured ping, traceroute, and Destination Unreachable ICMP messages, wrote custom traceroute in Scapy
✅ When complete: Move to M04 - IPv6. You now have deep IPv4 knowledge. IPv6 keeps the same layered approach but changes addressing fundamentally: 128-bit addresses, no broadcast, SLAAC autoconfiguration, and Neighbor Discovery over ICMPv6 in place of ARP. Much of what you learned here maps directly.