NETWORKING MASTERY · PHASE 3 · MODULE 13 · WEEK 11 · PHASE 3 FINAL
🔗 MPLS, VxLAN, GRE and Tunneling
Label switching · Overlay networks · VxLAN VTEP · GRE encapsulation · IPsec tunnels · Tunnel comparison
Intermediate → Advanced · Prerequisites: M10, M12 · RFC 3032 · RFC 7348 · RFC 2784 · Data Centre and VPN Core · 2 Labs

WHY TUNNELING EXISTS — OVERLAY OVER UNDERLAY

🔗

The Tunneling Concept

OVERVIEW

Tunneling encapsulates one network protocol inside another — creating a virtual link between two endpoints that may be separated by many intermediate hops that don't need to understand the inner protocol. The underlay is the physical/IP network; the overlay is the virtual network running on top.

Core use cases for tunneling:

  • Carry non-IP traffic over IP — legacy protocols (IPX, SNA) encapsulated in IP/GRE for transport over modern IP networks
  • Connect private networks over public internet — VPN tunnels (GRE+IPsec, WireGuard) connect branch offices over the internet as if they were directly connected
  • Scale L2 over L3 — VxLAN extends Layer 2 Ethernet broadcast domains across Layer 3 IP networks — essential for data centre multi-tenancy and VM migration
  • Traffic engineering — MPLS labels allow routers to forward packets along pre-computed explicit paths, bypassing normal IP routing
  • Network virtualisation — SDN overlays (OVN, NSX, ACI) use tunnels to implement virtual networks with arbitrary topology on top of physical hardware
📦

Encapsulation Overhead Comparison

OVERHEAD
Tunnel Type            Added Headers                          Total Overhead   Effective MTU (from 1500)
GRE (basic)            IP(20) + GRE(4)                        24 bytes         1476 bytes
GRE + IPsec (ESP)      IP(20) + GRE(4) + ESP(~50)             ~74 bytes        ~1426 bytes
VxLAN                  Eth(14) + IP(20) + UDP(8) + VxLAN(8)   50 bytes         1450 bytes
MPLS (1 label)         MPLS label(4)                          4 bytes          1496 bytes
MPLS (2 labels)        MPLS labels(8)                         8 bytes          1492 bytes
WireGuard              IP(20) + UDP(8) + WireGuard(~32)       ~60 bytes        ~1440 bytes
IPsec (ESP transport)  ESP(~40)                               ~40 bytes        ~1460 bytes

⚠️ MTU fragmentation is the #1 tunneling operational problem. When the effective MTU is reduced by tunnel overhead, packets that filled the original MTU now exceed the tunnel's MTU. If DF=1 is set (common with TCP), they get dropped. Solutions: MSS clamping (TCP only), Path MTU Discovery, configuring tunnel endpoints with reduced MTU, jumbo frames on the underlay.
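The overhead arithmetic above — effective MTU per tunnel type and the matching TCP MSS clamp — can be sketched in Python. A toy calculator whose values mirror the table, not a production tool:

```python
# Fixed per-packet bytes each encapsulation adds (from the table above)
TUNNEL_OVERHEAD = {
    "gre": 20 + 4,               # outer IP + GRE base header
    "vxlan": 14 + 20 + 8 + 8,    # outer Eth + IP + UDP + VxLAN
    "mpls_1label": 4,
    "mpls_2label": 8,
    "ipip": 20,                  # IP-in-IP
}

def effective_mtu(underlay_mtu: int, tunnel: str) -> int:
    """Largest inner packet that fits without fragmentation."""
    return underlay_mtu - TUNNEL_OVERHEAD[tunnel]

def mss_clamp(underlay_mtu: int, tunnel: str) -> int:
    """TCP MSS to advertise: effective MTU minus IP(20) + TCP(20) headers."""
    return effective_mtu(underlay_mtu, tunnel) - 40

print(effective_mtu(1500, "gre"))    # 1476
print(effective_mtu(1500, "vxlan"))  # 1450
print(mss_clamp(1500, "gre"))        # 1436
```

MSS clamping works precisely because of this subtraction: a TCP endpoint that advertises MSS 1436 can never emit a segment that, once GRE-encapsulated, exceeds the 1500-byte underlay MTU.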

MPLS — MULTIPROTOCOL LABEL SWITCHING

🏷️

MPLS Architecture and Label Forwarding

MPLS

MPLS (RFC 3032) inserts one or more 32-bit labels between the Layer 2 header and the IP header — hence the nickname "Layer 2.5". Labels let routers forward packets with a fixed-length exact-match lookup (O(1)) rather than an IP longest-prefix-match (LPM) lookup, and enable traffic engineering by pre-computing explicit paths through the network.

/* MPLS label stack entry (32 bits, RFC 3032) */
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                Label                  | TC  |S|      TTL      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Label:  20-bit forwarding label (0–15 = reserved; 3 = Implicit NULL)
TC:     3-bit Traffic Class for QoS (formerly called "EXP", renamed by RFC 5462)
S bit:  Bottom of Stack — set on the innermost label
TTL:    copied from IP TTL on ingress, decremented at each LSR hop

/* MPLS packet structure */
[Ethernet hdr][MPLS label 1][MPLS label 2][IP hdr][TCP hdr][Data]
                ↑ outer label  ↑ inner label
                (multiple labels = "label stack")

/* Label operations */
PUSH:   Ingress LER adds label(s) to packet
SWAP:   Transit LSR replaces label with new label (the forwarding operation)
POP:    Egress LER removes label, exposes inner packet

/* MPLS forwarding table (LFIB) */
Incoming label | Operation | Outgoing label | Outgoing interface
100            | SWAP→200  | 200            | eth1
200            | POP       | (none)         | eth2  → IP routing takes over
300            | PUSH 400  | 400            | eth3  → add outer label

💡 Penultimate Hop Popping (PHP): The second-to-last router in an MPLS path removes the label (POP) before forwarding to the egress router. This allows the egress router to process the packet as pure IP without needing a label lookup. Signalled by the egress router advertising label 3 (Implicit NULL) to its upstream neighbour.
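The 32-bit label format above maps directly onto a few lines of bit arithmetic. A minimal Python encoder/decoder (field layout per RFC 3032; the label/TTL values are arbitrary examples):

```python
import struct

def pack_mpls_label(label: int, tc: int, s: bool, ttl: int) -> bytes:
    """Encode one 32-bit label stack entry: Label(20) | TC(3) | S(1) | TTL(8)."""
    assert label < 2**20 and tc < 8 and ttl < 256
    word = (label << 12) | (tc << 9) | (int(s) << 8) | ttl
    return struct.pack("!I", word)

def unpack_mpls_label(data: bytes) -> dict:
    """Decode the first 4 bytes of a label stack."""
    word, = struct.unpack("!I", data[:4])
    return {"label": word >> 12, "tc": (word >> 9) & 0x7,
            "s": bool((word >> 8) & 1), "ttl": word & 0xFF}

entry = pack_mpls_label(label=100, tc=0, s=True, ttl=64)
print(unpack_mpls_label(entry))  # {'label': 100, 'tc': 0, 's': True, 'ttl': 64}
```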

🛣️

MPLS Traffic Engineering and VPNs

APPLICATIONS

MPLS has two dominant applications in service-provider networks:

MPLS-TE (Traffic Engineering)

RSVP-TE (or, in newer deployments, Segment Routing) establishes explicit Label Switched Paths (LSPs) along a pre-computed route — not necessarily the shortest IGP path. Plain LDP, by contrast, only builds LSPs along the IGP shortest path and cannot do traffic engineering. MPLS-TE allows bandwidth reservation, fast reroute (~50 ms failover), and load distribution across parallel paths.

MPLS L3VPN (BGP/MPLS VPN)

Service providers use MPLS+BGP to provide isolated virtual private networks to customers. Customer routes are carried in BGP with a Route Distinguisher (RD) to separate them. The MPLS label stack (outer=transport, inner=VPN) directs packets to the correct customer VRF at the egress PE router.
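The two-level stack (outer=transport, inner=VPN) is delimited by the S bit: a parser walks entries until it finds the bottom of stack. A small illustrative sketch — the label values here are made up, not real allocations:

```python
import struct

def pack_label(label: int, s: int) -> bytes:
    """Pack a label stack entry with TC=0, TTL=64 (illustrative values)."""
    return struct.pack("!I", (label << 12) | (s << 8) | 64)

def parse_label_stack(data: bytes):
    """Walk entries until the bottom-of-stack (S) bit; return (stack, inner)."""
    stack, off = [], 0
    while True:
        word = int.from_bytes(data[off:off + 4], "big")
        stack.append({"label": word >> 12, "s": bool((word >> 8) & 1)})
        off += 4
        if stack[-1]["s"]:
            return stack, data[off:]

# outer transport label (S=0) + inner VPN label (S=1) + placeholder inner packet
pkt = pack_label(16001, 0) + pack_label(24001, 1) + b"inner-ip"
stack, inner = parse_label_stack(pkt)
print([e["label"] for e in stack])  # [16001, 24001]
```

At the egress PE, the inner (VPN) label is what selects the customer VRF for `inner`.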

GRE — GENERIC ROUTING ENCAPSULATION (RFC 2784)

🔧

GRE Header and Operation

GRE

GRE (Generic Routing Encapsulation) is the simplest tunnel protocol. It encapsulates any L3 protocol packet inside an IP packet with a small GRE header. GRE itself provides no encryption or authentication — it's just a wrapper. Encryption is typically added by combining GRE with IPsec.

/* GRE packet structure */
[Outer IP hdr: src=tunnel_src dst=tunnel_dst proto=47]
[GRE header: 4 bytes minimum]
  C(1b) | Reserved(12b) | Version(3b) | Protocol Type(16b)
  [Optional, if C=1: Checksum(16b) + Reserved(16b)]
  [Optional, RFC 2890 K flag: Key(32b)]
  [Optional, RFC 2890 S flag: Sequence Number(32b)]
[Inner IP packet: src=orig_src dst=orig_dst]
[Original payload]

/* GRE Protocol Type field — what's inside */
0x0800 = IPv4 (most common)
0x86DD = IPv6
0x0806 = ARP
0x8847 = MPLS

/* Linux GRE tunnel setup */
# Create GRE tunnel interface
ip tunnel add gre1 mode gre local 203.0.113.1 remote 198.51.100.1 ttl 255
ip link set gre1 up
ip addr add 10.100.0.1/30 dev gre1

# Route traffic through tunnel
ip route add 192.168.2.0/24 via 10.100.0.2 dev gre1

# Verify
ip tunnel show
ping 10.100.0.2   # ping tunnel endpoint

/* GRE keepalives (Cisco extension) */
# GRE itself has no keepalive — run a routing protocol (OSPF) or BFD over the
# tunnel for failure detection. Cisco's GRE keepalive extension nests a pre-built
# reply packet inside the keepalive, so the remote end simply decapsulates it
# and sends it back without needing keepalive support of its own.
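For reference, the minimal 4-byte RFC 2784 GRE header shown above can be built and parsed in a few lines of Python (base header only — no optional checksum/key/sequence fields):

```python
import struct

GRE_PROTO_IPV4 = 0x0800

def build_gre_header(proto: int = GRE_PROTO_IPV4) -> bytes:
    """Minimal RFC 2784 GRE header: C=0, reserved=0, version 0, protocol type."""
    flags_and_version = 0x0000
    return struct.pack("!HH", flags_and_version, proto)

def parse_gre_header(data: bytes):
    """Split a GRE-encapsulated buffer into (header fields, inner payload)."""
    flags_ver, proto = struct.unpack("!HH", data[:4])
    meta = {"checksum_present": bool(flags_ver >> 15),
            "version": flags_ver & 0x7,
            "protocol": hex(proto)}
    return meta, data[4:]

hdr = build_gre_header()
print(len(hdr))  # 4 — plus the 20-byte outer IP gives the 24-byte total overhead
meta, payload = parse_gre_header(hdr + b"inner-ip-bytes")
print(meta["protocol"])  # 0x800
```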

💡 GRE + IPsec is the classic site-to-site VPN. GRE provides the tunnel (any-protocol encapsulation, routing over the tunnel), and IPsec provides encryption and authentication. Most enterprise VPN gateways still use this combination. Modern alternatives: WireGuard (simpler, faster), IPsec IKEv2 (no GRE needed), OpenVPN.

VxLAN — VIRTUAL EXTENSIBLE LAN (RFC 7348)

🏗️

Why VxLAN Exists — Scaling L2 Over L3

VXLAN

Traditional VLANs have a fundamental limitation: they cannot cross a Layer 3 boundary. Two VMs in the same VLAN must sit on the same contiguous L2 segment — VLAN 100 cannot span multiple data centre buildings connected by IP routing. With cloud and hyperscale data centres needing millions of isolated tenant networks, the 4094-network limit of the 12-bit VLAN ID was a further constraint.

VxLAN solves both problems: it encapsulates entire Ethernet frames (including VLAN tags) inside UDP/IP packets, allowing L2 segments to span any IP network. The VxLAN Network Identifier (VNI) is 24 bits — supporting 16 million isolated networks.

📦

VxLAN Encapsulation and VTEP

VXLAN DETAILS
/* VxLAN packet structure */
[Outer Ethernet: src=VTEP_MAC dst=next-hop_MAC type=0x0800]
[Outer IP: src=VTEP_IP dst=remote_VTEP_IP proto=17 (UDP)]
[Outer UDP: src=hash of inner flow (for ECMP entropy) dst=4789 (IANA VxLAN port)]
[VxLAN header: 8 bytes]
  Flags(8b) | Reserved(24b) | VNI(24b) | Reserved(8b)
  (I flag = 1 when VNI is valid)
[Inner Ethernet frame: src=VM_MAC dst=dest_VM_MAC type=0x0800]
[Inner IP packet]
[Payload]

Total overhead: 50 bytes → effective MTU 1450 from standard 1500-byte underlay

/* VNI — VxLAN Network Identifier */
24 bits → 16,777,216 unique overlay networks
Equivalent to VLAN ID but vastly larger scale
Each VNI is a separate L2 broadcast domain

/* VTEP — VxLAN Tunnel End Point */
The device that encapsulates/decapsulates VxLAN:
  On ingress (from VM): Ethernet frame → wrap in VxLAN/UDP/IP
  On egress (to VM):    VxLAN/UDP/IP → unwrap → deliver Ethernet frame
VTEPs can be:
  - Hypervisor (Linux bridge/OVS with VXLAN)
  - Hardware switch (ToR switch with VxLAN support)
  - Dedicated gateway appliance

/* Linux VxLAN setup */
# Create VxLAN tunnel interface
ip link add vxlan100 type vxlan id 100 dstport 4789 \
    local 10.0.0.1 remote 10.0.0.2 dev eth0

ip link set vxlan100 up
ip addr add 192.168.100.1/24 dev vxlan100

# Add static FDB entry (tell Linux: MAC xx is at remote VTEP 10.0.0.2)
bridge fdb add aa:bb:cc:dd:ee:ff dev vxlan100 dst 10.0.0.2

# Multicast VxLAN (learning mode)
ip link add vxlan100 type vxlan id 100 group 239.1.1.1 dev eth0
# BUM (Broadcast, Unknown unicast, Multicast) traffic → multicast group
# VTEPs join the multicast group — learn each other's MACs via flooding
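The 8-byte VxLAN header described above packs and unpacks directly with bit shifts (flag and field layout per RFC 7348; the VNI value is an example):

```python
import struct

VXLAN_PORT = 4789
FLAG_VNI_VALID = 0x08  # the "I" flag in the 8-bit flags field

def build_vxlan_header(vni: int) -> bytes:
    """8-byte VxLAN header: Flags(8) | Reserved(24) | VNI(24) | Reserved(8)."""
    assert vni < 2**24
    return struct.pack("!BBHI", FLAG_VNI_VALID, 0, 0, vni << 8)

def parse_vni(header: bytes) -> int:
    """Extract the 24-bit VNI; the I flag must be set for it to be valid."""
    assert header[0] & FLAG_VNI_VALID, "I flag must be set"
    vni_word, = struct.unpack("!I", header[4:8])
    return vni_word >> 8

hdr = build_vxlan_header(100)
print(len(hdr), parse_vni(hdr))  # 8 100
```

The inner Ethernet frame follows immediately after these 8 bytes, which is why the total encapsulation overhead comes to 14 + 20 + 8 + 8 = 50 bytes.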
🎛️

EVPN — BGP Control Plane for VxLAN

EVPN

Traditional VxLAN floods BUM (Broadcast, Unknown unicast, Multicast) traffic to discover MACs — this doesn't scale. EVPN (Ethernet VPN, RFC 7432) uses BGP as a control plane to distribute MAC-to-IP-to-VTEP mappings, eliminating flooding:

/* EVPN Route Types (the key ones) */
Type 2 (MAC/IP Advertisement):
  "MAC aa:bb:cc:dd:ee:ff, IP 192.168.1.5 is at VTEP 10.0.0.1, VNI 100"
  → VTEPs learn MAC/IP locations via BGP, no flooding needed

Type 3 (Inclusive Multicast):
  "VTEP 10.0.0.1 participates in VNI 100 BUM forwarding"
  → Ingress replication list instead of multicast

/* Symmetric IRB — Integrated Routing and Bridging */
# Layer 3 routing between VNIs without leaving the VxLAN fabric
# Each VTEP acts as a distributed gateway for its local VMs
# No hairpinning through a central gateway router

/* Modern data centre: Leaf-Spine with VxLAN+EVPN */
Spine switches:  pure IP underlay + iBGP route reflector for EVPN
Leaf switches:   VTEPs + EVPN BGP speakers
VMs/containers:  connected to leaf switches, in VxLAN VNIs

/* FRR VxLAN+EVPN config */
router bgp 65001
  address-family l2vpn evpn
    neighbor SPINE activate
    advertise-all-vni
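The control-plane idea — Type-2 routes populate a MAC table, Type-3 routes give the ingress-replication list — can be sketched as a toy VTEP forwarding decision. All addresses here are hypothetical:

```python
# MAC table mirroring EVPN Type-2 routes: (vni, mac) -> remote VTEP IP
evpn_mac_table = {
    (100, "aa:bb:cc:dd:ee:ff"): "10.0.0.2",
}
# Flood list mirroring Type-3 routes: vni -> participating VTEPs
evpn_flood_list = {
    100: ["10.0.0.2", "10.0.0.3"],
}

def forward(vni: int, dst_mac: str):
    """Known unicast → one VTEP; broadcast/unknown → ingress replication."""
    if dst_mac == "ff:ff:ff:ff:ff:ff" or (vni, dst_mac) not in evpn_mac_table:
        return ("replicate", evpn_flood_list[vni])
    return ("unicast", evpn_mac_table[(vni, dst_mac)])

print(forward(100, "aa:bb:cc:dd:ee:ff"))  # ('unicast', '10.0.0.2')
print(forward(100, "ff:ff:ff:ff:ff:ff"))  # ('replicate', ['10.0.0.2', '10.0.0.3'])
```

The more Type-2 routes BGP distributes, the fewer frames fall through to the replicate branch — which is exactly how EVPN suppresses flooding.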

OTHER TUNNEL TYPES — GENEVE, WIREGUARD, 6IN4

🔧

Tunnel Protocol Reference

REFERENCE
Protocol        RFC          Transport       Overhead    Use Case
GRE             RFC 2784     IP proto 47     24 B        Site-to-site VPN (with IPsec), multi-protocol transport, GRE keepalives
IP-in-IP        RFC 2003     IP proto 4      20 B        Simple IPv4-in-IPv4; no options/encryption, minimum overhead
6in4            RFC 4213     IP proto 41     20 B        IPv6-in-IPv4 tunnels; connect IPv6 islands over IPv4 backbone
VxLAN           RFC 7348     UDP 4789        50 B        Data centre overlay, VM mobility, L2 over L3, cloud networking
GENEVE          RFC 8926     UDP 6081        50 B+       Next-gen overlay (OpenStack, OVN); extensible TLV options in header
MPLS            RFC 3032     between L2/L3   4 B/label   Service-provider TE, L3VPN, L2VPN, fast reroute
IPsec (tunnel)  RFC 4303     IP proto 50/51  ~50 B       Encrypted site-to-site and remote-access VPN; mandatory encryption
WireGuard       (no RFC)     UDP (custom)    ~60 B       Modern VPN: simple, fast, strong crypto (ChaCha20/Poly1305/Curve25519)
VLAN (802.1Q)   IEEE 802.1Q  Ethernet tag    4 B         L2 segmentation; not strictly a tunnel but a virtual L2 overlay
PPPoE           RFC 2516     Ethernet        8 B         ISP DSL access; encapsulates PPP in Ethernet; reduces MTU to 1492

WHEN TO USE WHICH TUNNEL — DECISION GUIDE

🎯

Tunnel Selection Decision Guide

DECISION
/* Which tunnel to use — decision tree */

Need to connect two office networks over internet securely?
  → IPsec IKEv2 (standard, vendor-interoperable)
  → WireGuard (modern, simple, fast — if both ends are Linux/modern)
  → GRE + IPsec (if you need routing protocols over the tunnel)

Need to carry non-IP traffic (e.g., IPX, MPLS) over IP?
  → GRE (supports any EtherType in Protocol Type field)

Need to scale L2 (VMs, containers) across IP data centre fabric?
  → VxLAN (with EVPN for control plane)
  → GENEVE (if you need extensible metadata in the header)

Need traffic engineering and bandwidth reservation in SP network?
  → MPLS-TE with RSVP-TE

Need the absolute minimum overhead (no encryption needed)?
  → IP-in-IP (20 bytes overhead, IPv4 only)

Connecting IPv6 islands over an IPv4 network?
  → 6in4 (static), 6to4 (automatic), Teredo (through NAT)

Need a simple test or diagnostic tunnel?
  → GRE (easiest to configure on Linux with ip tunnel add)
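The decision tree above can be encoded as a toy first-match function — a simplification for study purposes, not a recommendation engine:

```python
def pick_tunnel(secure_site_to_site=False, non_ip_payload=False,
                l2_over_l3=False, traffic_engineering=False,
                minimum_overhead=False) -> str:
    """First-match encoding of the decision tree above (teaching toy)."""
    if secure_site_to_site:
        return "IPsec IKEv2 / WireGuard (GRE+IPsec if routing over the tunnel)"
    if non_ip_payload:
        return "GRE"
    if l2_over_l3:
        return "VxLAN (+EVPN)"
    if traffic_engineering:
        return "MPLS-TE (RSVP-TE)"
    if minimum_overhead:
        return "IP-in-IP"
    return "GRE"  # simple default for tests and diagnostics

print(pick_tunnel(l2_over_l3=True))  # VxLAN (+EVPN)
print(pick_tunnel())                 # GRE
```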

NGFW CHALLENGES WITH TUNNELED TRAFFIC

🛡️

The Tunnel Inspection Problem

NGFW

Tunnels present a fundamental challenge for NGFWs: the firewall sees the outer packet (which may be innocuous — UDP to port 4789, or IP proto 47) but not the inner packet (which may contain malicious traffic). An attacker can use a tunnel to bypass firewall rules by hiding prohibited traffic inside permitted tunnel traffic.

Tunnel Type        What NGFW Sees Without Inspection      Inspection Approach
GRE                IP proto 47 to tunnel endpoint         Decapsulate GRE at the firewall, inspect the inner IP packet against policy, re-encapsulate or forward
VxLAN              UDP port 4789 between VTEPs            Decapsulate at hypervisor/switch level before the NGFW, or deploy the NGFW as a VTEP; EVPN allows policy attachment to VNIs
IPsec (encrypted)  Opaque ESP/AH packets                  Terminate IPsec at the NGFW → inspect decrypted content → re-encrypt; or split-tunnel trusted traffic around the NGFW
DNS tunnelling     Legitimate-looking UDP 53 traffic      Deep DNS inspection: entropy analysis, label length, query frequency (see M07)
HTTPS tunnels      TLS-encrypted traffic on 443           SSL inspection (see M08)
ICMP tunnels       ICMP Echo Request/Reply                Inspect the ICMP data field for non-standard content (see M06)
/* GRE decapsulation in NGFW (VPP-style) */
/* Packet arrives: outer IP → GRE → inner IP → TCP → payload */

1. ip4-input: outer IP validated, routed to gre-input graph node
2. gre-input: outer IP and GRE header stripped
3. Inner packet injected back into ip4-input
4. ip4-input: inner IP subject to full policy (ACL, conntrack, DPI)
5. If policy permits: route inner packet; NGFW logs both
   outer (IP src/dst of tunnel endpoints) and inner (actual src/dst)

/* VxLAN inspection flow */
Outer UDP dst=4789 → vxlan-input → strip outer Eth+IP+UDP+VxLAN
Inner Ethernet frame → subject to L2/L3 policy per VNI
VNI 100 = "tenant network A" → apply tenant A's security policy
VNI 200 = "tenant network B" → apply tenant B's security policy
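Steps 1–3 of the GRE flow above reduce to a small amount of header parsing: read the outer IPv4 header, verify protocol 47, strip the GRE header, and hand the inner packet to policy. A parser sketch only — IPv4 base header, no GRE options; the addresses reuse this module's examples:

```python
import struct

def decapsulate_gre(outer: bytes):
    """Strip outer IPv4 + base GRE; return (outer_src, outer_dst, inner)."""
    ihl = (outer[0] & 0x0F) * 4            # outer IP header length in bytes
    assert outer[9] == 47, "not GRE"       # outer IP protocol field
    src = ".".join(str(b) for b in outer[12:16])
    dst = ".".join(str(b) for b in outer[16:20])
    _flags, gre_proto = struct.unpack("!HH", outer[ihl:ihl + 4])
    assert gre_proto == 0x0800, "inner is not IPv4"
    return src, dst, outer[ihl + 4:]       # inner packet → full policy check

# Fake packet: 20-byte outer IP (proto=47) + 4-byte GRE + 20-byte inner IP hdr
outer_ip = bytes([0x45, 0, 0, 44, 0, 0, 0, 0, 64, 47, 0, 0,
                  203, 0, 113, 1, 198, 51, 100, 1])
pkt = outer_ip + struct.pack("!HH", 0, 0x0800) + bytes([0x45]) + b"\x00" * 19
src, dst, inner = decapsulate_gre(pkt)
print(src, dst, len(inner))  # 203.0.113.1 198.51.100.1 20
```

An NGFW would log both pairs: the outer src/dst (tunnel endpoints) and the inner src/dst extracted from `inner`.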
LAB 1

GRE Tunnel Setup and Analysis

Objective: Create a GRE tunnel between two Linux VMs, route traffic through it, and capture the encapsulated packets to understand the header structure.

1
On VM1 (outer IP 10.0.0.1): sudo ip tunnel add gre1 mode gre local 10.0.0.1 remote 10.0.0.2 ttl 255; sudo ip link set gre1 up; sudo ip addr add 172.16.0.1/30 dev gre1. On VM2 (outer IP 10.0.0.2): same commands with reversed IPs. Test: ping 172.16.0.2.
2
Capture the traffic: on VM1, run sudo tcpdump -i eth0 ip proto 47 -v while pinging through the tunnel. You should see GRE packets (IP proto 47) with an outer IP src/dst and an inner ICMP payload. Note the double IP header in the capture.
3
Open the capture in Wireshark. Expand the GRE packet: outer Ethernet, outer IP (proto=47), GRE header (protocol type=0x0800 = IPv4), inner IP, inner ICMP. Identify the tunnel overhead: how many extra bytes vs a direct ICMP ping?
4
Test MTU: ping with large packets: ping -M do -s 1448 172.16.0.2. The effective MTU through GRE is 1476 (1500−20−4), so the largest ICMP payload that fits is 1448 (1476 − 20 IP − 8 ICMP). With -s 1449 (a 1477-byte inner IP packet, exceeding the 1476-byte GRE MTU) you should get "Frag needed". Add a route to a remote subnet through the tunnel and verify end-to-end connectivity.
LAB 2

VxLAN Overlay Network

Objective: Create a VxLAN overlay that allows two VMs on different physical hosts (different subnets) to appear as if they're on the same L2 segment.

1
On Host1 (underlay IP 10.0.0.1): create VxLAN interface with VNI 100: sudo ip link add vxlan100 type vxlan id 100 dstport 4789 local 10.0.0.1 remote 10.0.0.2 dev eth0; sudo ip link set vxlan100 up; sudo ip addr add 192.168.100.1/24 dev vxlan100. On Host2: same with .2 addresses.
2
Capture VxLAN traffic: on Host1, sudo tcpdump -i eth0 udp port 4789 -v while pinging 192.168.100.2. In Wireshark, expand the packet: outer Ethernet, outer IP (UDP), VxLAN header (VNI=100), inner Ethernet, inner ICMP.
3
Verify the inner Ethernet frame: the inner Ethernet dst/src are the VxLAN interface MAC addresses, not the physical interface MACs. This is the key insight: to the overlay network, the VxLAN interfaces appear directly connected at L2 regardless of the physical topology.
4
Bonus — multiple VNIs: Add a second VxLAN interface with VNI 200 on both hosts with a different /24 overlay subnet. Verify VNI 100 and VNI 200 are completely isolated — ping from VNI 100 cannot reach VNI 200 addresses (no inter-VNI routing configured). This is L2 isolation between tenants.

M13 MASTERY CHECKLIST

🎉 Phase 3 Complete — Routing and Forwarding

You have completed all 4 modules of Phase 3: Routing and FIB (M10), OSPF (M11), BGP (M12), and Tunneling (M13). You can now design, analyse, and implement the routing infrastructure an enterprise or service-provider network requires. Move to Phase 4 — Linux Networking and Socket Programming, starting with M14 - Linux Network Stack.

← M12 BGP 🗺️ Roadmap Next: M14 - Linux Network Stack →