Cluster Architecture
How Kubernetes manages state across thousands of independent machines.
The Goal: Abstract away individual machines. Instead of saying "Deploy this Node.js app to Server A, Server B, and Server C," you tell Kubernetes, "Ensure there are always 3 copies of this Node.js app running somewhere in the cluster, evenly distributed."
Control Plane (The Brain)
The master node(s) that manage cluster state and make global decisions (scheduling, responding to cluster events). Contains:
API Server: The front door. All commands (`kubectl`) go here.
etcd: Highly-available Key-Value store holding cluster state.
Scheduler: Assigns newly created Pods to Worker Nodes based on CPU/RAM.
Controller Manager: Runs background loops to constantly match current state to desired state.
Worker Node (The Muscle)
The VMs or physical servers where your application containers actually run. Contains:
Kubelet: The agent running on each node. Ensures containers are healthy and running in their Pods.
Kube-Proxy: Maintains network rules on the node, allowing network communication to your Pods.
Container Runtime: The software that actually runs containers (e.g., containerd, CRI-O).
The Hierarchy of K8s Primitives
The abstractions you use to define your infrastructure as code.
Pod
The smallest deployable unit. Usually contains one container (e.g., your Spring Boot app), but can contain sidecars. Pods are ephemeral — if a node dies, the Pod dies and is not revived. A new one takes its place elsewhere.
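As a sketch, a minimal Pod manifest looks like this (the name, label, and image are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app            # placeholder name
  labels:
    app: my-app
spec:
  containers:
    - name: app
      image: my-registry/my-app:1.0   # hypothetical image
      ports:
        - containerPort: 8080
```

In practice you rarely create bare Pods like this; you let a Deployment or StatefulSet create them for you.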
ReplicaSet
Ensures a specified number of identical Pod replicas are running at any given time. Usually not managed directly by humans, but by Deployments.
Deployment
State: Stateless Apps
Manages ReplicaSets to provide declarative updates. You define the desired state ("I want v2 of my app with 3 replicas"), and the Deployment handles the rolling update from v1 to v2 with zero downtime.
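A hedged sketch of such a Deployment (all names and the image are placeholders). Setting `maxUnavailable: 0` is one way to keep the full replica count serving traffic during the rolling update:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3                  # "I want 3 copies running somewhere"
  selector:
    matchLabels:
      app: my-app
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0        # never drop below 3 ready Pods
      maxSurge: 1              # add one v2 Pod at a time
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: app
          image: my-registry/my-app:v2   # bump this tag to trigger the rollout
```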
Service
An abstract way to expose an application running on a set of Pods as a network service. Because Pod IPs constantly change, the Service provides a single stable IP and DNS name that load-balances across the healthy Pods.
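A minimal Service sketch (names and ports are placeholders) — the label selector is what ties the stable Service IP to whichever Pods currently carry that label:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app
spec:
  selector:
    app: my-app        # routes to all healthy Pods with this label
  ports:
    - port: 80         # stable Service port
      targetPort: 8080 # the containerPort inside each Pod
```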
StatefulSet
State: Stateful Apps (DBs)
Like a Deployment, but provides guarantees about the ordering and uniqueness of Pods. Pods get sticky, persistent identities (e.g., `mysql-0`, `mysql-1`) and persistent storage volumes that survive Pod restarts. Essential for running databases in K8s.
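A sketch of a StatefulSet (assumes a headless Service named `mysql` exists for the per-Pod DNS names; storage size and image are placeholders). The `volumeClaimTemplates` block is what gives each Pod its own persistent volume that survives restarts:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  serviceName: mysql           # headless Service providing mysql-0, mysql-1 DNS
  replicas: 2
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
        - name: mysql
          image: mysql:8.0
          volumeMounts:
            - name: data
              mountPath: /var/lib/mysql
  volumeClaimTemplates:        # each Pod gets its own PersistentVolumeClaim
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
```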
Autoscaling Dimensions
Reacting to traffic spikes automatically without human intervention.
HPA (Horizontal Pod Autoscaler)
Scales: Out (More Pods)
Watches metrics like CPU or memory utilization. If CPU > 75%, it automatically increases the `# of replicas` in the Deployment. Essential tool for handling variable daily traffic.
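A sketch of an HPA targeting a Deployment (the target name and replica bounds are placeholders):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 75   # scale out when average CPU exceeds 75%
```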
VPA (Vertical Pod Autoscaler)
Scales: Up (Bigger Pods)
Automatically adjusts the CPU and memory requests for your Pods over time. If your Java app keeps getting OOM-killed, VPA will recreate it with a higher memory limit automatically.
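A sketch of a VPA object — note that VPA is an add-on (the Vertical Pod Autoscaler project), not part of core Kubernetes, so this assumes it is installed in the cluster; the target name is a placeholder:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto"   # evict and recreate Pods with updated resource requests
```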
Cluster Autoscaler
Scales: Worker Nodes (Infrastructure)
When HPA creates new Pods but all Worker Nodes are full, those Pods sit in a "Pending" state. Cluster Autoscaler notices this, talks to AWS/GCP to provision new underlying VMs, and joins them to the cluster.
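Cluster Autoscaler runs as a Deployment inside the cluster itself. A hedged sketch of its container spec for AWS (the image tag and the Auto Scaling Group name are placeholders; node-group bounds use the `min:max:name` flag format):

```yaml
containers:
  - name: cluster-autoscaler
    image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.28.0  # placeholder tag
    command:
      - ./cluster-autoscaler
      - --cloud-provider=aws
      - --nodes=2:10:my-worker-asg   # min nodes : max nodes : node-group name
```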
Interview Tip: When asked "How does the system handle a massive spike?", explain that HPA plus Cluster Autoscaler together let the system burst seamlessly without manual ops intervention.
Service Mesh & Ingress
Managing complex L7 network routing, security, and observability.
Ingress Controller (e.g., NGINX Ingress)
Ingress is an API object that manages external access to the services in a cluster, providing HTTP/HTTPS routing based on hostnames and URL paths; the Ingress Controller is the component that actually implements those rules. E.g., `api.example.com/billing` routes to the Billing Service, while `api.example.com/users` routes to the Users Service. The controller typically also handles SSL termination.
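A sketch of that routing as an Ingress resource (hostname, Service names, and the TLS secret are placeholders; assumes an NGINX Ingress Controller is installed):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api
spec:
  ingressClassName: nginx
  tls:
    - hosts: [api.example.com]
      secretName: api-tls          # certificate used for SSL termination
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /billing         # api.example.com/billing -> Billing Service
            pathType: Prefix
            backend:
              service:
                name: billing
                port:
                  number: 80
          - path: /users           # api.example.com/users -> Users Service
            pathType: Prefix
            backend:
              service:
                name: users
                port:
                  number: 80
```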
Service Mesh (Istio / Linkerd)
The Sidecar Pattern
As your microservices grow to 50+, managing retries, timeouts, circuit breakers, and mutual TLS between every service inside the application code becomes an unmaintainable nightmare.
A Service Mesh solves this by injecting a Sidecar Proxy (like Envoy) into every single Pod. Your application code only talks to `localhost`. The sidecar intercepts all traffic and handles:
- Transparent mTLS encryption between all services.
- L7 Load Balancing (e.g., canary routing 10% traffic to v2).
- Automatic retries, timeouts, and circuit breaking.
- Distributed tracing and metrics collection (Prometheus/Jaeger integration).
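As a sketch of the canary routing point above, an Istio VirtualService can split traffic by weight. This assumes Istio is installed and a DestinationRule already defines the `v1` and `v2` subsets; the service name is a placeholder:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: my-app
spec:
  hosts:
    - my-app
  http:
    - route:
        - destination:
            host: my-app
            subset: v1
          weight: 90
        - destination:
            host: my-app
            subset: v2
          weight: 10   # canary: 10% of traffic goes to v2
```

The application never sees this split — the sidecar proxies apply it transparently.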
Quick Answers
K8s concepts commonly tested in system design interviews.
Should you run a relational database (PostgreSQL/MySQL) in Kubernetes?
Technically possible using StatefulSets and PVs (Persistent Volumes). However, in most enterprise environments, it is far safer and less operationally burdensome to use managed databases (like AWS RDS or Aurora). Use K8s for stateless compute; use cloud providers for stateful storage. If you MUST run it in K8s, use an Operator (e.g., a Postgres Operator built on Patroni) to manage failover.
What is the difference between a Liveness Probe and a Readiness Probe?
Readiness Probe: Is this container ready to receive traffic (e.g., has it established its DB connections)? If it fails, K8s stops routing Service traffic to the Pod, but doesn't kill it.
Liveness Probe: Is this container deadlocked or crashed? If it fails, K8s restarts the container.
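Both probes are declared per container in the Pod spec. A sketch (the `/ready` and `/healthz` endpoints, port, and image are placeholders your app would have to expose):

```yaml
containers:
  - name: app
    image: my-registry/my-app:1.0
    readinessProbe:
      httpGet:
        path: /ready        # returns 200 only once DB connections are established
        port: 8080
      periodSeconds: 5
    livenessProbe:
      httpGet:
        path: /healthz      # fails if the process is deadlocked or crashed
        port: 8080
      initialDelaySeconds: 10
      periodSeconds: 10
      failureThreshold: 3   # restart the container after 3 consecutive failures
```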
What is the "Split-Brain" problem in etcd?
etcd uses the Raft consensus algorithm, which strictly requires a majority quorum of `floor(N/2)+1` nodes to commit writes. Running a 2-node etcd cluster is dangerous: if the network partitions, neither node has a majority, and the cluster can no longer accept writes. Always run etcd (and thus the Control Plane) in odd numbers: 3, 5, or 7.
Module B14 Progress
Understand the separation of Control Plane and Worker Nodes.
Can differentiate Pods, Deployments, Services, and StatefulSets.
Know the three dimensions of autoscaling (HPA, VPA, Cluster Autoscaler).
Understand what Ingress is and how it routes L7 HTTP traffic.
Can explain the Sidecar pattern and the primary uses for a Service Mesh.
Completed HLD Track and ready for full System Design mock interviews.