M07 — NoSQL: Redis, MongoDB & Cassandra
NoSQL taxonomy, Redis data structures with complexity guarantees, caching patterns, persistence & pub/sub, Lua scripting, rate limiting, MongoDB aggregation pipeline, Cassandra data modeling by access pattern, and hiredis in C.
Phase 2 — Databases & Storage · ~5 hrs

| Category | Model | Flagship | Strength | Weakness | Sweet Spot |
|---|---|---|---|---|---|
| Key-Value | Hash map | Redis, DynamoDB | Sub-ms reads; simple | No relational query | Sessions, caches, counters |
| Document | JSON/BSON tree | MongoDB, Couchbase | Flexible schema; rich queries | Multi-doc transactions costly | Catalogs, user profiles, CMS |
| Wide-Column | Partition → rows | Cassandra, HBase | Write-optimized; linear scale | Query-driven design required | Time-series, IoT, activity feeds |
| Graph | Vertices + edges | Neo4j, Amazon Neptune | Relationship traversal | Doesn't scale as wide | Social graphs, recommendations |
| Time-Series | Timestamped metrics | InfluxDB, TimescaleDB | Compression, retention policies | Poor ad-hoc relational queries | Monitoring, telemetry |
CAP theorem: in a network partition you must choose between Consistency and Availability. You can never have all three (Consistency, Availability, Partition tolerance) simultaneously.
              Consistency
           (every read sees
             latest write)
                  /\
                 /  \
                /    \
               /  CA  \   ← no real-world distributed system
              /--------\
             /  CP | AP \
            /      |     \
Partition ─────────────────── Availability
Tolerance                     (always responds,
(nodes can                     may be stale)
 fail/split)
CP examples: HBase, Zookeeper, Redis Cluster (default)
AP examples: Cassandra (tunable), DynamoDB, CouchDB
CA example: Single-node PostgreSQL (no partition tolerance)
Choose SQL (relational) when:
- Data is naturally relational with many joins
- ACID transactions across multiple entities are required
- Schema is stable and well-understood
- Complex ad-hoc reporting or analytics
- Team is familiar with SQL tooling
Choose NoSQL when:
- Access pattern is known and narrow (read by key)
- Horizontal scale > vertical scale (write throughput)
- Schema evolves rapidly (document stores)
- Geo-distributed with tunable consistency
- Specific data model fits: graph, time-series, cache
Client 1 ──┐
Client 2 ──┤ TCP socket ┌─────────────────────────────────┐
Client 3 ──┼───────────────►│ I/O Multiplexer (epoll/kqueue) │
... │ │ ───────────────────────────── │
└───────────────►│ Command Queue (FIFO) │
│ ───────────────────────────── │
│ Single Worker Thread │
│ executes commands serially │
│ → no locking needed │
│ ───────────────────────────── │
│ In-Memory Data Structures │
│ (dict, quicklist, listpack, │
│ skiplist, rax, stream) │
└─────────────────────────────────┘
│
Background threads:
• AOF fsync
• RDB fork + write
• Lazy free (UNLINK)
Because a single thread executes every command, one slow command (e.g. KEYS * on a large dataset) blocks all other clients. Never run KEYS in production — use SCAN with a cursor instead.

| Type | Key Commands | Complexity | Internal Encoding | Use Case |
|---|---|---|---|---|
| String | SET/GET, INCR, MSET/MGET, SETNX, SETEX | O(1) | SDS (simple dynamic string) | Counters, cache values, distributed lock tokens |
| Hash | HSET/HGET, HMSET, HGETALL, HINCRBY, HDEL | O(1) field ops; O(n) HGETALL | listpack (small) → dict (large) | User profile fields, config objects |
| List | LPUSH/RPUSH, LPOP/RPOP, LRANGE, BLPOP | O(1) push/pop; O(n) LRANGE | listpack (small) → quicklist | Task queues, recent activity feeds |
| Set | SADD/SREM, SISMEMBER, SUNION/SINTER/SDIFF | O(1) SADD/SREM; O(n) SUNION | listpack (small) → intset → dict | Unique visitors, tags, friend lists |
| Sorted Set | ZADD, ZRANGE, ZRANGEBYSCORE, ZRANK, ZINCRBY | O(log n) ZADD/ZRANK; O(log n + m) ZRANGE | listpack (small) → skiplist + dict | Leaderboards, delayed job queues, rate limiting windows |
| Stream | XADD, XREAD, XRANGE, XGROUP CREATE, XACK | O(1) XADD; O(n) XRANGE | listpack + rax (radix tree) | Event log, message bus, audit trail |
| Bitmap | SETBIT/GETBIT, BITCOUNT, BITOP | O(1) SETBIT; O(n) BITCOUNT | String (bit-addressed) | Feature flags, daily active user tracking |
| HyperLogLog | PFADD, PFCOUNT, PFMERGE | O(1), uses ~12KB max | Probabilistic sketch | Unique count estimates (± 0.81% error) |
If you store a profile in a user:42:profile hash, setting a TTL on the hash expires ALL fields at once. There is no per-field TTL.

| Dangerous Command | Why Dangerous | Safe Alternative |
|---|---|---|
| KEYS pattern | O(n) — scans all keys; blocks event loop | SCAN 0 MATCH pattern COUNT 100 |
| FLUSHALL | Deletes every key in all databases | SCAN + targeted DEL; or FLUSHDB ASYNC |
| DEBUG SLEEP | Explicitly blocks the server | Never in production |
| DEL big-key | O(n) — synchronous deletion blocks loop | UNLINK big-key (async lazy-free) |
| LRANGE 0 -1 on huge list | Transfers entire list over network | Paginate with LRANGE 0 99, then next page |
| SMEMBERS big-set | O(n) — returns all members | SSCAN with cursor |
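As a concrete illustration of the SCAN alternative, here is a minimal cursor-iteration sketch, assuming the Node ioredis client (any client that exposes SCAN works the same way):

```ts
import Redis from "ioredis";

const redis = new Redis();

// Iterate the keyspace in batches; each SCAN call is short, so the event loop
// never blocks the way a single KEYS * call would.
async function scanKeys(pattern: string): Promise<string[]> {
  const found: string[] = [];
  let cursor = "0";
  do {
    const [next, keys] = await redis.scan(cursor, "MATCH", pattern, "COUNT", 100);
    found.push(...keys);
    cursor = next;
  } while (cursor !== "0");
  return found;
}
```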
Application Cache (Redis) Database
│ │ │
│──── GET user:42 ────────────►│ │
│ │ MISS │
│◄─── nil ─────────────────────│ │
│ │ │
│──── SELECT * FROM users ───────────────────────────►│
│◄─── row {id:42, name:...} ──────────────────────────│
│ │ │
│──── SET user:42 ... EX 300 ─►│ │
│ │ stored │
│ (later) │ │
│──── GET user:42 ────────────►│ │
│◄─── {id:42, name:...} ───────│ HIT — no DB call │
| Pattern | Who Manages Cache | On Read Miss | On Write | Consistency | Best For |
|---|---|---|---|---|---|
| Cache-Aside | Application | App reads DB, populates cache | App updates DB, deletes/updates cache | Eventual (TTL-bounded) | General-purpose, read-heavy |
| Read-Through | Cache library/proxy | Cache fetches from DB automatically | App writes to cache; cache syncs DB | Eventual | When you can plug in a cache provider |
| Write-Through | Cache library/proxy | Cache fetches from DB | Write to cache AND DB synchronously | Strong (but higher write latency) | Read-heavy, strong consistency needed |
| Write-Behind | Cache library/proxy | Cache fetches from DB | Write to cache only; async flush to DB | Weak (data loss risk on crash) | Write-heavy, can tolerate brief loss |
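A minimal cache-aside sketch matching the sequence diagram above, assuming an ioredis client; queryUserFromDb and updateUserInDb are hypothetical data-access helpers:

```ts
import Redis from "ioredis";

const redis = new Redis();

interface User { id: number; name: string }
declare function queryUserFromDb(id: number): Promise<User>;                      // hypothetical helper
declare function updateUserInDb(id: number, patch: Partial<User>): Promise<void>; // hypothetical helper

async function getUser(id: number): Promise<User> {
  const key = `user:${id}`;
  const cached = await redis.get(key);
  if (cached) return JSON.parse(cached);                  // HIT: no DB call

  const user = await queryUserFromDb(id);                 // MISS: read the source of truth
  await redis.set(key, JSON.stringify(user), "EX", 300);  // TTL bounds staleness
  return user;
}

async function updateUser(id: number, patch: Partial<User>): Promise<void> {
  await updateUserInDb(id, patch);
  await redis.del(`user:${id}`);                          // invalidate; next read repopulates
}
```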
| Policy | Behavior | When to Use |
|---|---|---|
| noeviction | Return error on write when memory full | When data loss is unacceptable (primary store) |
| allkeys-lru | Evict least-recently-used key across all keys | General cache — frequently accessed items stay hot |
| volatile-lru | Evict LRU key only from keys with TTL set | Mix of persistent + cached keys in same instance |
| allkeys-lfu | Evict least-frequently-used (Redis 4+) | Better than LRU for skewed access distributions |
| volatile-ttl | Evict key with shortest remaining TTL first | When shorter-lived items are more "disposable" |
| allkeys-random | Evict random key — no intelligence | Uniform access patterns (rare) |
Scenario: popular cache key expires at T=0
T=0: 100 requests hit cache → all MISS
→ all 100 simultaneously query DB
→ DB overwhelmed, latency spikes
Fix 1: Probabilistic early re-computation
When remaining TTL < threshold: randomly re-cache
→ one request re-caches while others still get old value
Fix 2: Lock / Mutex (Redis SET NX)
First miss acquires distributed lock → fetches DB
Others wait → then all read from cache (or retry)
Fix 3: Background refresh
Scheduled job refreshes cache before TTL expires
→ cache never actually empty for popular keys
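A sketch of Fix 2, assuming ioredis; recomputeValue stands in for the expensive DB query, and the 10 s lock TTL / 100 ms retry delay are illustrative numbers:

```ts
import Redis from "ioredis";
import { randomUUID } from "node:crypto";

const redis = new Redis();

async function getWithLock(key: string, recomputeValue: () => Promise<string>): Promise<string> {
  const cached = await redis.get(key);
  if (cached !== null) return cached;

  // Only one client wins the NX lock and recomputes; the rest wait and re-read.
  const token = randomUUID();
  const gotLock = await redis.set(`lock:${key}`, token, "EX", 10, "NX");
  if (gotLock === "OK") {
    const fresh = await recomputeValue();          // single trip to the DB
    await redis.set(key, fresh, "EX", 300);
    await redis.del(`lock:${key}`);                // ideally a token-checked Lua delete
    return fresh;
  }

  await new Promise((resolve) => setTimeout(resolve, 100));
  return getWithLock(key, recomputeValue);         // retry: the cache is usually warm by now
}
```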
| | RDB (Redis Database) | AOF (Append-Only File) |
|---|---|---|
| Mechanism | Periodic fork + full memory snapshot to .rdb file | Log every write command; replay on restart |
| Trigger | save 900 1 (after 1 change in 15 min), BGSAVE | Every write; fsync configurable |
| Restart speed | Fast (load binary snapshot) | Slow if AOF is huge (replay all commands) |
| Data loss risk | Up to snapshot interval (minutes) | Up to 1 second (appendfsync everysec) |
| File size | Compact binary | Grows; periodically compacted with BGREWRITEAOF |
| Production rec. | Use for backups / fast restarts | Use for durability (near-zero data loss) |
Production recommendation: enable both. Use RDB for backups and fast restarts, and AOF with appendfsync everysec for durability. Redis docs call this "the best of both worlds."
Publisher Redis Subscribers
│ │ │
│── PUBLISH notifications │ │
│ "{"type":"like",...}" ─►│ │
│ │──► "{"type":"like",...}" ─►│ Sub A
│ │──► "{"type":"like",...}" ─►│ Sub B
│ │ │
│ (publisher doesn't know │ (no message persistence │
│ who is subscribed) │ — if sub is offline, │
│ │ message is LOST) │
If subscribers must not miss messages, use Redis Streams instead (XADD/XREAD/XGROUP).

Redis executes Lua scripts atomically — no other command runs between script operations. This is the safe way to implement read-modify-write patterns without transactions.
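A sketch of an atomic read-modify-write via EVAL, assuming ioredis; the stock:sku-123 key and the reserve-3-units argument are illustrative:

```ts
import Redis from "ioredis";

const redis = new Redis();

// GET and DECRBY run inside one script, so no other client can interleave between them.
const RESERVE_STOCK = `
  local stock = tonumber(redis.call('GET', KEYS[1]) or '0')
  if stock >= tonumber(ARGV[1]) then
    redis.call('DECRBY', KEYS[1], ARGV[1])
    return 1
  end
  return 0
`;

const reserved = await redis.eval(RESERVE_STOCK, 1, "stock:sku-123", 3); // 1 = reserved, 0 = insufficient stock
```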
Pattern 1 — Fixed Window (INCR + EXPIRE)
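A minimal fixed-window sketch, assuming ioredis; the key layout and limits are illustrative:

```ts
import Redis from "ioredis";

const redis = new Redis();

// One counter per (client, window). INCR creates the key; EXPIRE is set on the first
// hit so abandoned windows clean themselves up.
async function isAllowedFixed(ip: string, limit = 100, windowSec = 60): Promise<boolean> {
  const window = Math.floor(Date.now() / 1000 / windowSec);
  const key = `ratelimit:fixed:${ip}:${window}`;
  const count = await redis.incr(key);
  if (count === 1) await redis.expire(key, windowSec);
  return count <= limit;
}
```

Fixed windows can admit up to twice the limit across a window boundary (a burst at the end of one window plus a burst at the start of the next), which is the weakness the sliding-window variant addresses.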
Pattern 2 — Sliding Window (Sorted Set)
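A sliding-window sketch with a sorted set, assuming ioredis; it matches the isAllowedSliding(ip, limit, windowMs) signature used in the lab below:

```ts
import Redis from "ioredis";

const redis = new Redis();

// Members are request timestamps and scores are the same timestamps, so trimming by
// score drops everything that has slid out of the window.
async function isAllowedSliding(ip: string, limit = 10, windowMs = 60_000): Promise<boolean> {
  const key = `ratelimit:sliding:${ip}`;
  const now = Date.now();
  const replies = await redis
    .multi()
    .zremrangebyscore(key, 0, now - windowMs)    // evict timestamps outside the window
    .zadd(key, now, `${now}:${Math.random()}`)   // record this request (unique member)
    .zcard(key)                                  // how many requests remain in the window
    .pexpire(key, windowMs)                      // let idle keys expire
    .exec();
  const count = replies![2][1] as number;        // result of ZCARD
  return count <= limit;
}
```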
Embedding vs Referencing — the core schema decision:
| Embed when… | Reference when… |
|---|---|
| Data is always accessed together (post + author preview) | Data has its own lifecycle independent of parent |
| The embedded array has bounded size (≤ a few hundred items) | Array could grow unbounded (post comments → millions) |
| Update pattern writes the whole document | Many documents share the same sub-document |
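Two hypothetical document shapes illustrating the table above (field names are made up):

```ts
// Embed: addresses are small, bounded, and always read with the user.
const user = {
  _id: 42,
  name: "Ada",
  addresses: [{ label: "home", city: "London" }],
};

// Reference: comments grow without bound and have their own lifecycle,
// so they live in a separate collection keyed by postId.
const post = { _id: "p1", title: "Hello", authorId: 42 };
const comment = { _id: "c1", postId: "p1", body: "Nice post", createdAt: new Date() };
```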
Collection → [$match] → [$lookup] → [$unwind] → [$group] → [$sort] → [$limit] → Result
              filter     join        flatten     aggregate order     paginate
Place $match stages as early as possible in the pipeline to reduce the number of documents flowing through subsequent stages. MongoDB can use indexes for the first $match stage.
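A sketch of such a pipeline using the Node mongodb driver, assuming a posts collection with status, tags, and views fields (the names are assumptions):

```ts
import { MongoClient } from "mongodb";

const client = new MongoClient("mongodb://localhost:27017");
await client.connect();
const posts = client.db("blog").collection("posts");

// Per-tag post counts and total views for published posts.
const topTags = await posts
  .aggregate([
    { $match: { status: "published" } },  // first stage: can use an index on status
    { $unwind: "$tags" },                 // one document per (post, tag) pair
    { $group: { _id: "$tags", count: { $sum: 1 }, totalViews: { $sum: "$views" } } },
    { $sort: { count: -1 } },
    { $limit: 10 },
  ])
  .toArray();
```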
Cassandra Cluster (3 nodes, replication_factor=3)
Write path:
Client → Coordinator Node
→ hash(partition_key) → token ring → target nodes
→ Commit Log (WAL) + Memtable
→ Memtable flush → SSTable on disk
Read path:
Client → Coordinator → target nodes
→ Row Cache (if enabled)
→ Bloom Filter (fast "definitely not here" check)
→ Key Cache → SSTable index → SSTable data
Compaction:
SSTables merge periodically → remove tombstones (deletes)
→ smaller read amplification
Token Ring (consistent hashing):
Each node owns a range of tokens
Replication: each row copied to RF=3 consecutive nodes
Coordinator routes any write to correct nodes
Cassandra schema design is query-driven: design your table for one specific query. Joins do not exist; denormalization is expected.
Partition key:
- Determines which node(s) store the row
- All rows with the same partition key → same partition
- Must appear in every query (no full-table scans)
- Keep partitions balanced — hot partition = hot node
- Partition size limit: ~100 MB recommended
Clustering key:
- Defines sort order within a partition
- Enables range queries on clustering columns, e.g. WHERE created_at > X within a partition
- Cannot skip clustering keys in the WHERE clause
- Choose DESC if you mostly read recent data first
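A table-per-query sketch using the Node cassandra-driver, assuming a hypothetical events_by_device table that serves exactly one query ("latest events for a device, newest first"); the bucket column bounds partition growth:

```ts
import { Client } from "cassandra-driver";

const client = new Client({
  contactPoints: ["127.0.0.1"],
  localDataCenter: "datacenter1",
  keyspace: "iot",
});
await client.connect();

// Partition key: (device_id, bucket). Clustering key: created_at, newest first.
await client.execute(`
  CREATE TABLE IF NOT EXISTS events_by_device (
    device_id  text,
    bucket     text,       -- e.g. '2024-06', keeps any one partition bounded
    created_at timestamp,
    payload    text,
    PRIMARY KEY ((device_id, bucket), created_at)
  ) WITH CLUSTERING ORDER BY (created_at DESC)
`);

// Every query names the full partition key; the range predicate is on the clustering key.
const { rows } = await client.execute(
  "SELECT created_at, payload FROM events_by_device WHERE device_id = ? AND bucket = ? AND created_at > ? LIMIT 100",
  ["sensor-42", "2024-06", new Date(Date.now() - 3600_000)],
  { prepare: true }
);
```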
| Level | Writes to | Reads from | Tradeoff |
|---|---|---|---|
| ONE | 1 node | 1 node | Fastest; may read stale data |
| QUORUM | RF/2+1 nodes | RF/2+1 nodes | Strong consistency (write+read quorum overlap); balanced |
| LOCAL_QUORUM | Quorum in local DC | Quorum in local DC | Strong consistency within DC; avoids cross-DC latency |
| ALL | All RF nodes | All RF nodes | Strongest; unavailable if any node down |
| ANY | At least 1 (hint OK) | N/A (write only) | Highest availability; weakest durability |
Example with RF=3: QUORUM write (2) + QUORUM read (2) = 4 > 3 ✓ → guaranteed to see latest write.
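A per-query consistency sketch (cassandra-driver, continuing the hypothetical events_by_device table from the previous sketch): writing and reading at QUORUM with RF=3 means the two replica sets always overlap in at least one node.

```ts
import { Client, types } from "cassandra-driver";

const client = new Client({ contactPoints: ["127.0.0.1"], localDataCenter: "datacenter1", keyspace: "iot" });
const { consistencies } = types;

// QUORUM write: acknowledged by 2 of the 3 replicas.
await client.execute(
  "INSERT INTO events_by_device (device_id, bucket, created_at, payload) VALUES (?, ?, ?, ?)",
  ["sensor-42", "2024-06", new Date(), '{"temp":21.5}'],
  { prepare: true, consistency: consistencies.quorum }
);

// QUORUM read: answered by 2 of the 3 replicas, so it must include a node that took the write.
const result = await client.execute(
  "SELECT payload FROM events_by_device WHERE device_id = ? AND bucket = ? LIMIT 1",
  ["sensor-42", "2024-06"],
  { prepare: true, consistency: consistencies.quorum }
);
```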
| Anti-Pattern | Why It Fails | Fix |
|---|---|---|
| ALLOW FILTERING in queries | Forces full partition scan; slow at scale | Redesign table for the query; use secondary index carefully |
| Unbounded partition growth | Single partition → single-node bottleneck; partitions over ~2 GB perform badly | Add time bucket to partition key (user_id + year_month) |
| High-cardinality secondary indexes | Distributed index = scatter-gather on every node | Materialize a separate table for each query pattern |
| Large IN queries | Coordinator fans out to many nodes; serial waits | Async parallel queries; smaller batch sizes |
| Logged batches for performance | Batches add coordinator overhead; not for performance, only for atomicity across tables | Use unlogged batches only for same-partition multi-row writes |
| | Redis | MongoDB | Cassandra |
|---|---|---|---|
| Model | Key-value / data structures | Document (BSON) | Wide-column (partitioned rows) |
| Query | Key lookup; limited range | Rich ad-hoc; aggregation pipeline | Query-driven; CQL; no ad-hoc |
| Transactions | MULTI/EXEC; Lua scripts; limited | Multi-document ACID (v4+) | Lightweight transactions (LWT); limited |
| Scale | Cluster mode (hash slots) | Replica sets; sharded clusters | Linear horizontal scale; no master |
| Consistency | Strong within shard | Strong (primary); eventual (secondaries) | Tunable per query (ONE → ALL) |
| Best for | Caching, sessions, rate limiting | Flexible catalogs, CMS, user data | Time-series, activity feeds, IoT |
hiredis is the official, lightweight C client for Redis. It provides a synchronous API for simple use cases and an async API (libevent/libev/libuv adapters) for non-blocking I/O.
Without pipelining (N commands = N round-trips):
  Client ──SET──► Server ──OK──► Client ──INCR──► Server ──1──► Client ...
  RTT: N × (50ms) = 500ms for 10 commands

With pipelining (N commands = 1 round-trip):
  Client ──[SET, INCR, HSET, ...]──► Server
  Server ──[OK, 1, 3, ...]──────────► Client
  RTT: 1 × 50ms = 50ms for 10 commands
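For comparison, here is the same batching idea as a sketch in the Node ioredis client (an assumption, kept in one language with the other examples); hiredis exposes the equivalent through its redisAppendCommand / redisGetReply pair:

```ts
import Redis from "ioredis";

const redis = new Redis();

// All three commands are buffered locally and written to the socket in one flush.
const replies = await redis
  .pipeline()
  .set("user:42:name", "Ada")
  .incr("user:42:logins")
  .hset("user:42:profile", "plan", "pro")
  .exec();   // -> [[null, "OK"], [null, 1], [null, 1]]
```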
Goal: Add cache-aside caching to an Express.js API, measure cache hit rate.
- Start Redis: docker run -p 6379:6379 redis:7-alpine
- Build an endpoint GET /users/:id that hits a PostgreSQL DB.
- Add cache-aside reads; track cache:hits and cache:misses counters in Redis.
- Load test with wrk -t4 -c100 -d30s http://localhost:3000/users/42.
- Compare redis-cli GET cache:hits vs GET cache:misses. Expect >95% hits after warm-up.

Goal: Implement a sliding-window rate limiter in Redis; test boundary behavior.
- Write an isAllowedSliding(ip, limit=10, windowMs=60000) function using Redis sorted sets.
- Return 429 Too Many Requests with a Retry-After header when over the limit.
- Check with ZSCORE ratelimit:sliding:127.0.0.1 — confirm timestamps are in the sorted set.

Goal: Build an aggregation pipeline that computes per-tag post counts and total views.
- Seed posts in mongosh using a seed script. Include varied tags and view counts.
- Build the pipeline: $match (published) → $unwind (tags) → $group (count, sumViews) → $sort → $limit 10.
- Run .explain("executionStats") — confirm the initial $match uses an index.
- Add an index on {status:1, publishedAt:-1} and re-run explain — compare totalDocsExamined.
- Cache the result as report:top-tags with TTL 3600. Serve from cache on repeat requests.

You should be able to:
- Explain the five NoSQL categories and give a use case for each
- Describe CAP theorem and classify Redis, MongoDB, and Cassandra under it
- List all six Redis data types, their O() complexities, and one use case each
- Implement cache-aside pattern with correct TTL invalidation on write
- Contrast cache-aside vs write-through vs write-behind consistency guarantees
- Explain cache stampede and implement the SET NX lock pattern to prevent it
- Configure Redis maxmemory and choose appropriate eviction policy
- Describe RDB vs AOF persistence trade-offs and recommend correct production config
- Implement a fixed-window rate limiter using INCR + EXPIRE
- Implement a sliding-window rate limiter using a Sorted Set
- Use Lua scripting in Redis for an atomic read-modify-write operation
- Explain MongoDB embedding vs referencing decision criteria
- Write a MongoDB aggregation pipeline with $match, $lookup, $group, $sort
- Define Cassandra partition key, clustering key, and explain query-driven design
- Calculate whether QUORUM reads + QUORUM writes guarantee strong consistency for a given RF