M07 — NoSQL: Redis, MongoDB & Cassandra

NoSQL taxonomy, Redis data structures with complexity guarantees, caching patterns, persistence & pub/sub, Lua scripting, rate limiting, MongoDB aggregation pipeline, Cassandra data modeling by access pattern, and hiredis in C.

Phase 2 — Databases & Storage ~5 hrs
🗺️ NoSQL Taxonomy — When to Choose What
| Category | Model | Flagship | Strength | Weakness | Sweet Spot |
|---|---|---|---|---|---|
| Key-Value | Hash map | Redis, DynamoDB | Sub-ms reads; simple | No relational query | Sessions, caches, counters |
| Document | JSON/BSON tree | MongoDB, Couchbase | Flexible schema; rich queries | Multi-doc transactions costly | Catalogs, user profiles, CMS |
| Wide-Column | Partition → rows | Cassandra, HBase | Write-optimized; linear scale | Query-driven design required | Time-series, IoT, activity feeds |
| Graph | Vertices + edges | Neo4j, Amazon Neptune | Relationship traversal | Doesn't scale as wide | Social graphs, recommendations |
| Time-Series | Timestamped metrics | InfluxDB, TimescaleDB | Compression, retention policies | Poor ad-hoc relational queries | Monitoring, telemetry |
⚖️ CAP Theorem — The Impossibility Triangle

When a network partition occurs, you must choose between Consistency and Availability — no distributed system can guarantee all three of consistency, availability, and partition tolerance simultaneously.

          Consistency
          (every read sees
           latest write)
              /\
             /  \
            /    \
           /  CA  \        ← no real-world distributed system
          /--------\
         / CP  | AP \
        /      |     \
Partition ─────────── Availability
Tolerance              (always responds,
(nodes can             may be stale)
 fail/split)

CP examples: HBase, Zookeeper, Redis Cluster (default)
AP examples: Cassandra (tunable), DynamoDB, CouchDB
CA example:  Single-node PostgreSQL (no partition tolerance)
PACELC extension: Even without a partition (P), there is a latency (L) vs consistency (C) tradeoff. Cassandra lets you tune this per-query with consistency levels (ONE → QUORUM → ALL).
🔄 SQL vs NoSQL — Decision Checklist
Choose SQL when:
  • Data is naturally relational with many joins
  • ACID transactions across multiple entities are required
  • Schema is stable and well-understood
  • Complex ad-hoc reporting or analytics
  • Team is familiar with SQL tooling
Choose NoSQL when:
  • Access pattern is known and narrow (read by key)
  • Horizontal scale > vertical scale (write throughput)
  • Schema evolves rapidly (document stores)
  • Geo-distributed with tunable consistency
  • Specific data model fits: graph, time-series, cache
Analogy: SQL is a Swiss Army knife — powerful for unknown problems. NoSQL tools are surgical instruments — each optimized for one job. Use the right tool.
🏗️ Redis Architecture — Single-Threaded Event Loop
Client 1 ──┐
Client 2 ──┤   TCP socket   ┌─────────────────────────────────┐
Client 3 ──┼───────────────►│  I/O Multiplexer (epoll/kqueue) │
   ...      │               │  ─────────────────────────────  │
            └───────────────►│  Command Queue (FIFO)           │
                            │  ─────────────────────────────  │
                            │  Single Worker Thread            │
                            │    executes commands serially    │
                            │    → no locking needed          │
                            │  ─────────────────────────────  │
                            │  In-Memory Data Structures       │
                            │  (dict, quicklist, listpack,    │
                            │   skiplist, rax, stream)         │
                            └─────────────────────────────────┘
                                        │
                                  Background threads:
                                  • AOF fsync
                                  • RDB fork + write
                                  • Lazy free (UNLINK)
Because the main thread is single-threaded, a slow command (e.g., KEYS * on a large dataset) blocks all other clients. Never run KEYS in production — use SCAN with a cursor instead.
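The SCAN replacement can be sketched as a cursor loop. This is an illustrative helper (not a library API), assuming an ioredis-style client whose `scan()` resolves to `[nextCursor, keys]`:

```javascript
// Drain a SCAN cursor instead of calling KEYS (illustrative helper).
// Assumes an ioredis-style client: scan() resolves to [nextCursor, keys].
async function scanKeys(redis, pattern) {
  const found = [];
  let cursor = '0';
  do {
    // Each call examines only a bounded slice of the keyspace,
    // so the single-threaded event loop is never blocked for long
    const [next, keys] = await redis.scan(cursor, 'MATCH', pattern, 'COUNT', 100);
    found.push(...keys);
    cursor = next;
  } while (cursor !== '0'); // cursor '0' signals a completed full cycle
  // Note: SCAN may return a key more than once; dedupe if exactness matters
  return found;
}
```

SCAN guarantees every key present for the entire scan is returned at least once, but duplicates are possible, so dedupe before counting.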
📦 Data Structures — Commands & Complexity
| Type | Key Commands | Complexity | Internal Encoding | Use Case |
|---|---|---|---|---|
| String | SET/GET, INCR, MSET/MGET, SETNX, SETEX | O(1) | SDS (simple dynamic string) | Counters, cache values, distributed lock tokens |
| Hash | HSET/HGET, HMSET, HGETALL, HINCRBY, HDEL | O(1) field ops; O(n) HGETALL | listpack (small) → dict (large) | User profile fields, config objects |
| List | LPUSH/RPUSH, LPOP/RPOP, LRANGE, BLPOP | O(1) push/pop; O(n) LRANGE | listpack (small) → quicklist | Task queues, recent activity feeds |
| Set | SADD/SREM, SISMEMBER, SUNION/SINTER/SDIFF | O(1) SADD/SREM; O(n) SUNION | listpack (small) → intset → dict | Unique visitors, tags, friend lists |
| Sorted Set | ZADD, ZRANGE, ZRANGEBYSCORE, ZRANK, ZINCRBY | O(log n) ZADD/ZRANK; O(log n + m) ZRANGE | listpack (small) → skiplist + dict | Leaderboards, delayed job queues, rate limiting windows |
| Stream | XADD, XREAD, XRANGE, XGROUP CREATE, XACK | O(1) XADD; O(n) XRANGE | listpack + rax (radix tree) | Event log, message bus, audit trail |
| Bitmap | SETBIT/GETBIT, BITCOUNT, BITOP | O(1) SETBIT; O(n) BITCOUNT | String (bit-addressed) | Feature flags, daily active user tracking |
| HyperLogLog | PFADD, PFCOUNT, PFMERGE | O(1); ~12 KB max | Probabilistic sketch | Unique count estimates (±0.81% error) |
🔑 Key Patterns & Naming Conventions
```
# Naming: use colon-separated namespaces
user:42:profile        # Hash — user 42's profile fields
session:abc123         # String — session token → user_id mapping
post:7:views           # String — view counter for post 7
leaderboard:2026-03    # Sorted Set — monthly leaderboard
queue:email            # List — email job queue

# Setting a value with TTL (session expires in 30 min)
SET session:abc123 42 EX 1800

# Atomic increment — safe without transactions
INCR post:7:views                   # returns new value, atomic

# Hash — store object fields separately (partial updates)
HSET user:42:profile name "Alice" email "alice@example.com" age 28
HGET user:42:profile name           # → "Alice"
HINCRBY user:42:profile age 1       # happy birthday — no read-modify-write

# Sorted set leaderboard
ZADD leaderboard:2026-03 1500 "alice"
ZADD leaderboard:2026-03 2200 "bob"
ZRANGE leaderboard:2026-03 0 -1 REV WITHSCORES
# → [bob 2200, alice 1500] (descending)
```
Key expiry gotcha: Redis TTL applies to the top-level key, not fields. If you store user fields in user:42:profile hash, setting TTL on the hash expires ALL fields at once. There is no per-field TTL.
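When fields genuinely need independent lifetimes, one hedged workaround is to store each field under its own key so each gets its own TTL. A minimal sketch, assuming an ioredis-style client; the key shape and helper names are illustrative:

```javascript
// Per-field TTL workaround: one key per field instead of one hash.
// Trades HGETALL convenience for independent expiry (illustrative sketch).
function fieldKey(userId, field) {
  return `user:${userId}:${field}`; // e.g. user:42:email
}

async function setFieldWithTtl(redis, userId, field, value, ttlSec) {
  // Each field key expires on its own schedule
  await redis.set(fieldKey(userId, field), value, 'EX', ttlSec);
}
```

The cost is that reading a whole "object" now takes MGET over several keys instead of one HGETALL.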
🚫 Commands to Avoid in Production
| Dangerous Command | Why Dangerous | Safe Alternative |
|---|---|---|
| `KEYS pattern` | O(n) — scans all keys; blocks event loop | `SCAN 0 MATCH pattern COUNT 100` |
| `FLUSHALL` | Deletes every key in all databases | SCAN + targeted DEL; or `FLUSHDB ASYNC` |
| `DEBUG SLEEP` | Explicitly blocks the server | Never in production |
| `DEL big-key` | O(n) — synchronous deletion blocks loop | `UNLINK big-key` (async lazy-free) |
| `LRANGE 0 -1` on huge list | Transfers entire list over network | Paginate with `LRANGE 0 99`, then next page |
| `SMEMBERS big-set` | O(n) — returns all members | `SSCAN` with cursor |
🔄 Cache-Aside (Lazy Loading) — Most Common Pattern
Application                   Cache (Redis)           Database
     │                              │                     │
     │──── GET user:42 ────────────►│                     │
     │                              │ MISS                │
     │◄─── nil ─────────────────────│                     │
     │                              │                     │
     │──── SELECT * FROM users ───────────────────────────►│
     │◄─── row {id:42, name:...} ──────────────────────────│
     │                              │                     │
     │──── SET user:42 ... EX 300 ─►│                     │
     │                              │ stored              │
     │  (later)                     │                     │
     │──── GET user:42 ────────────►│                     │
     │◄─── {id:42, name:...} ───────│   HIT — no DB call  │
```javascript
// Node.js pseudo-code
async function getUser(userId) {
  const key = `user:${userId}`;
  const cached = await redis.get(key);
  if (cached) return JSON.parse(cached);

  const user = await db.query('SELECT * FROM users WHERE id = $1', [userId]);
  await redis.set(key, JSON.stringify(user), 'EX', 300); // TTL 5 min
  return user;
}

// On update: invalidate the cache
async function updateUser(userId, data) {
  await db.query('UPDATE users SET name=$1 WHERE id=$2', [data.name, userId]);
  await redis.del(`user:${userId}`); // evict stale entry
}
```
📊 All Four Caching Patterns Compared
| Pattern | Who Manages Cache | On Read Miss | On Write | Consistency | Best For |
|---|---|---|---|---|---|
| Cache-Aside | Application | App reads DB, populates cache | App updates DB, deletes/updates cache | Eventual (TTL-bounded) | General-purpose, read-heavy |
| Read-Through | Cache library/proxy | Cache fetches from DB automatically | App writes to cache; cache syncs DB | Eventual | When you can plug in a cache provider |
| Write-Through | Cache library/proxy | Cache fetches from DB | Write to cache AND DB synchronously | Strong (but higher write latency) | Read-heavy, strong consistency needed |
| Write-Behind | Cache library/proxy | Cache fetches from DB | Write to cache only; async flush to DB | Weak (data loss risk on crash) | Write-heavy, can tolerate brief loss |
🗑️ Eviction Policies — What Happens When Memory Is Full
| Policy | Behavior | When to Use |
|---|---|---|
| noeviction | Return error on write when memory full | When data loss is unacceptable (primary store) |
| allkeys-lru | Evict least-recently-used key across all keys | General cache — frequently accessed items stay hot |
| volatile-lru | Evict LRU key only from keys with TTL set | Mix of persistent + cached keys in same instance |
| allkeys-lfu | Evict least-frequently-used (Redis 4+) | Better than LRU for skewed access distributions |
| volatile-ttl | Evict key with shortest remaining TTL first | When shorter-lived items are more "disposable" |
| allkeys-random | Evict random key — no intelligence | Uniform access patterns (rare) |
```
# redis.conf
maxmemory 2gb
maxmemory-policy allkeys-lru
```
⚠️ Cache Stampede (Thundering Herd) & Fixes
Scenario: popular cache key expires at T=0

T=0:  100 requests hit cache → all MISS
      → all 100 simultaneously query DB
      → DB overwhelmed, latency spikes

Fix 1: Probabilistic early re-computation
  When remaining TTL < threshold: randomly re-cache
  → one request re-caches while others still get old value

Fix 2: Lock / Mutex (Redis SET NX)
  First miss acquires distributed lock → fetches DB
  Others wait → then all read from cache (or retry)

Fix 3: Background refresh
  Scheduled job refreshes cache before TTL expires
  → cache never actually empty for popular keys
```javascript
// Redis distributed lock for cache stampede prevention
const lockKey = `lock:user:${userId}`;
const token = crypto.randomUUID();

// SET NX EX — atomic: only succeeds if key doesn't exist
const acquired = await redis.set(lockKey, token, 'NX', 'EX', 5);

if (acquired) {
  const user = await db.fetchUser(userId);
  await redis.set(`user:${userId}`, JSON.stringify(user), 'EX', 300);
  await redis.del(lockKey); // release lock
} else {
  // Another request is fetching — wait & retry
  await sleep(50);
  return getUser(userId); // retry
}
```
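Fix 1 (probabilistic early re-computation) can be sketched as a pure predicate evaluated on each cache hit: once the key's remaining TTL drops below a threshold, a small random fraction of requests recompute early. The function and its parameters are illustrative (a simplified cousin of the XFetch approach):

```javascript
// Fix 1 sketch: should this request refresh the cache entry early?
// Illustrative helper, not a library API.
function shouldRefreshEarly(ttlRemainingSec, thresholdSec, refreshProb, rand = Math.random) {
  if (ttlRemainingSec < 0) return true;              // key missing/expired: must refresh
  if (ttlRemainingSec >= thresholdSec) return false; // plenty of TTL left: serve cached
  return rand() < refreshProb;                       // inside window: refresh with prob p
}
```

Because only ~`refreshProb` of requests in the window recompute, one request repopulates the key while the rest keep serving the still-valid cached value, so the key never expires under load.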
💾 Persistence — RDB vs AOF
| | RDB (Redis Database) | AOF (Append-Only File) |
|---|---|---|
| Mechanism | Periodic fork + full memory snapshot to .rdb file | Log every write command; replay on restart |
| Trigger | `save 900 1` (after 1 change in 15 min), BGSAVE | Every write; fsync configurable |
| Restart speed | Fast (load binary snapshot) | Slow if AOF is huge (replay all commands) |
| Data loss risk | Up to snapshot interval (minutes) | Up to 1 second (appendfsync everysec) |
| File size | Compact binary | Grows; periodically compacted with BGREWRITEAOF |
| Production rec. | Use for backups / fast restarts | Use for durability (near-zero data loss) |
Best practice: Run both. RDB for point-in-time backups; AOF with appendfsync everysec for durability. Redis docs call this "the best of both worlds."
```
# redis.conf — recommended production settings

# RDB
save 900 1        # snapshot if ≥1 change in 900s
save 300 10       # snapshot if ≥10 changes in 300s
save 60 10000     # snapshot if ≥10000 changes in 60s

# AOF
appendonly yes
appendfsync everysec             # balance: 1s max loss
auto-aof-rewrite-percentage 100  # rewrite when AOF doubles
auto-aof-rewrite-min-size 64mb
```
📢 Pub/Sub — Fire-and-Forget Messaging
Publisher                    Redis                     Subscribers
    │                          │                           │
    │── PUBLISH notifications  │                           │
    │   '{"type":"like",...}' ─►│                           │
    │                          │──► '{"type":"like",...}' ─►│ Sub A
    │                          │──► '{"type":"like",...}' ─►│ Sub B
    │                          │                           │
    │ (publisher doesn't know  │ (no message persistence   │
    │  who is subscribed)      │  — if sub is offline,     │
    │                          │  message is LOST)         │
```javascript
// Publisher (Node.js)
await redis.publish('notifications', JSON.stringify({
  type: 'like', postId: 7, userId: 42
}));

// Subscriber (must use a separate connection — SUBSCRIBE blocks it)
const sub = redis.duplicate();
await sub.subscribe('notifications', (message) => {
  const event = JSON.parse(message);
  console.log('Received:', event);
});
```
Pub/Sub vs Streams: Pub/Sub has no persistence and no consumer groups. If a subscriber is down, messages are lost. For reliable messaging with replay and consumer groups, use Redis Streams (XADD/XREAD/XGROUP).
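The Streams alternative can be sketched with the same command family. The command names (XADD, XGROUP CREATE, XREADGROUP, XACK) are real Redis commands; the stream, group, and helper names here are illustrative, and an ioredis-style pass-through client is assumed:

```javascript
// XADD takes alternating field/value arguments; flatten a plain object
function toFieldValues(obj) {
  return Object.entries(obj).flatMap(([k, v]) => [k, String(v)]);
}

async function publishEvent(redis, event) {
  // '*' lets Redis assign the entry ID; entries persist until trimmed
  return redis.xadd('events', '*', ...toFieldValues(event));
}

async function consumeEvents(redis) {
  // Create the group once; MKSTREAM creates the stream if missing.
  // BUSYGROUP error on re-create is expected, hence the swallow.
  await redis.xgroup('CREATE', 'events', 'workers', '$', 'MKSTREAM').catch(() => {});
  const res = await redis.xreadgroup(
    'GROUP', 'workers', 'consumer-1', 'COUNT', 10, 'STREAMS', 'events', '>'
  );
  // Acknowledge after processing so entries are not redelivered
  for (const [, entries] of res ?? []) {
    for (const [id] of entries) await redis.xack('events', 'workers', id);
  }
}
```

Unlike PUBLISH, an XADD entry survives subscriber downtime: a consumer that reconnects picks up its pending and new entries rather than losing them.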
⚙️ Lua Scripting — Atomic Multi-Command Operations

Redis executes Lua scripts atomically — no other command runs between script operations. This is the safe way to implement read-modify-write patterns without transactions.

```lua
-- Lua: atomic check-and-set with condition
-- KEYS[1] = key, ARGV[1] = expected value, ARGV[2] = new value
local current = redis.call('GET', KEYS[1])
if current == ARGV[1] then
  redis.call('SET', KEYS[1], ARGV[2])
  return 1
end
return 0
```
```javascript
// Node.js: run the Lua script (EVAL)
const script = `
  local current = redis.call('GET', KEYS[1])
  if current == ARGV[1] then
    redis.call('SET', KEYS[1], ARGV[2])
    return 1
  end
  return 0
`;
const result = await redis.eval(script, 1, 'mykey', 'old-value', 'new-value');
```
🚦 Rate Limiting with Redis

Pattern 1 — Fixed Window (INCR + EXPIRE)

```javascript
// Allow 100 requests per minute per IP
async function isAllowed(ip) {
  const window = Math.floor(Date.now() / 60000); // 1-minute window
  const key = `ratelimit:${ip}:${window}`;
  const count = await redis.incr(key);
  if (count === 1) await redis.expire(key, 60); // set TTL on first request
  return count <= 100;
}
```

Pattern 2 — Sliding Window (Sorted Set)

```javascript
// More accurate: tracks exact timestamps of requests
async function isAllowedSliding(ip, limit = 100, windowMs = 60000) {
  const now = Date.now();
  const key = `ratelimit:sliding:${ip}`;
  const pipeline = redis.multi();
  pipeline.zremrangebyscore(key, 0, now - windowMs); // evict old entries
  pipeline.zadd(key, now, now.toString());           // add current request
  pipeline.zcard(key);                               // count in window
  pipeline.expire(key, Math.ceil(windowMs / 1000));  // auto-cleanup
  const results = await pipeline.exec();
  const count = results[2][1]; // ZCARD result
  return count <= limit;
}
```
Fixed window is simpler but has a boundary burst problem: 100 requests at 0:59 and 100 at 1:01 = 200 requests in 2 seconds. Sliding window prevents this at the cost of more memory per key.
📄 Document Model — BSON & Schema Design
```javascript
// BSON document example (stored as blog post)
{
  _id: ObjectId("65e3f1a2b4c8d9e0f1234567"), // 12-byte: timestamp+machine+pid+counter
  title: "Understanding Redis",
  slug: "understanding-redis",
  author: {
    id: ObjectId("..."),
    name: "Alice"        // denormalized — avoid join
  },
  tags: ["redis", "backend", "caching"],
  publishedAt: ISODate("2026-03-27T10:00:00Z"),
  stats: { views: 1502, likes: 87 },
  status: "published"
}
```

Embedding vs Referencing — the core schema decision:

| Embed when… | Reference when… |
|---|---|
| Data is always accessed together (post + author preview) | Data has its own lifecycle independent of parent |
| The embedded array has bounded size (≤ a few hundred items) | Array could grow unbounded (post comments → millions) |
| Update pattern writes the whole document | Many documents share the same sub-document |
16 MB document limit: MongoDB caps documents at 16 MB. Embedding unbounded arrays (e.g., all comments in a post document) will hit this limit. Use references + separate collection for comments.
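The referencing side can be sketched with the Node.js driver: each comment stores its parent post's `_id`, and reads paginate. A hedged sketch; the collection name `comments` and the field names are illustrative assumptions:

```javascript
// Referencing pattern for unbounded comments: separate collection,
// each document pointing back at its post (illustrative sketch).
async function addComment(db, postId, authorId, body) {
  return db.collection('comments').insertOne({
    postId,                  // reference back to the post document
    authorId,
    body,
    createdAt: new Date(),
  });
}

async function getComments(db, postId, page = 0, pageSize = 20) {
  // Pair with an index like { postId: 1, createdAt: -1 } for this query
  return db.collection('comments')
    .find({ postId })
    .sort({ createdAt: -1 })
    .skip(page * pageSize)
    .limit(pageSize)
    .toArray();
}
```

Each comment is its own small document, so neither the post nor any single comment approaches the 16 MB cap no matter how popular the post becomes.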
🔍 Indexes in MongoDB
```javascript
// Single field index — ascending (1) or descending (-1)
db.posts.createIndex({ slug: 1 }, { unique: true });

// Compound index — left-prefix rule applies (same as SQL)
db.posts.createIndex({ status: 1, publishedAt: -1 });
// Supports: {status}, {status, publishedAt}   NOT: {publishedAt} alone

// Multikey index — automatically created when field is an array
db.posts.createIndex({ tags: 1 });
// Allows: db.posts.find({ tags: "redis" })  ← single element match

// Text index — full-text search
db.posts.createIndex({ title: "text", body: "text" });
db.posts.find({ $text: { $search: "redis caching" } });

// Partial index — only index documents matching filter (saves space)
db.posts.createIndex(
  { publishedAt: -1 },
  { partialFilterExpression: { status: "published" } }
);

// Explain query plan
db.posts.find({ status: "published" }).explain("executionStats");
```
🔗 Aggregation Pipeline — Multi-Stage Transforms
Collection → [$match] → [$lookup] → [$unwind] → [$group] → [$sort] → [$limit] → Result
               filter    join        flatten     aggregate   order     paginate
```javascript
db.posts.aggregate([
  // Stage 1: filter published posts from 2026
  { $match: {
      status: "published",
      publishedAt: { $gte: new Date("2026-01-01") }
  }},
  // Stage 2: join with users collection
  { $lookup: {
      from: "users",
      localField: "author.id",
      foreignField: "_id",
      as: "authorDoc"
  }},
  // Stage 3: group by tag to count posts per tag
  { $unwind: "$tags" },
  { $group: {
      _id: "$tags",
      count: { $sum: 1 },
      totalViews: { $sum: "$stats.views" }
  }},
  // Stage 4: sort by count descending, return top 10
  { $sort: { count: -1 } },
  { $limit: 10 }
]);
```
$match early: Always put $match stages as early as possible in the pipeline to reduce documents flowing through subsequent stages. MongoDB can use indexes for the first $match stage.
✏️ Write Operations & Operators
```javascript
// insertOne / insertMany
await db.collection('posts').insertOne({ title: "New Post", status: "draft" });

// updateOne — $set updates specific fields, $inc increments atomically
await db.collection('posts').updateOne(
  { _id: postId },
  {
    $set: { status: "published", publishedAt: new Date() },
    $inc: { 'stats.views': 1 },  // atomic increment
    $push: { tags: "featured" }  // append to array
  }
);

// findOneAndUpdate — atomic read + update
const updated = await db.collection('tasks').findOneAndUpdate(
  { status: "pending" },
  { $set: { status: "processing", lockedAt: new Date() } },
  { sort: { createdAt: 1 }, returnDocument: "after" } // FIFO queue claim
);
```
🏛️ Cassandra Architecture — Write-Optimized, Distributed
Cassandra Cluster (3 nodes, replication_factor=3)

Write path:
  Client → Coordinator Node
    → hash(partition_key) → token ring → target nodes
    → Commit Log (WAL) + Memtable
    → Memtable flush → SSTable on disk

Read path:
  Client → Coordinator → target nodes
    → Row Cache (if enabled)
    → Bloom Filter (fast "definitely not here" check)
    → Key Cache → SSTable index → SSTable data

Compaction:
  SSTables merge periodically → remove tombstones (deletes)
  → smaller read amplification

Token Ring (consistent hashing):
  Each node owns a range of tokens
  Replication: each row copied to RF=3 consecutive nodes
  Coordinator routes any write to correct nodes
🔑 Data Modeling — Partition Key, Clustering Key

Cassandra schema design is query-driven: design your table for one specific query. Joins do not exist; denormalization is expected.

```sql
-- Schema for: "get user's posts, ordered by date descending"
-- Query pattern: WHERE user_id = ? ORDER BY created_at DESC LIMIT 20
CREATE TABLE posts_by_user (
    user_id    uuid,
    created_at timestamp,   -- clustering key: sorted on disk
    post_id    uuid,
    title      text,
    status     text,
    PRIMARY KEY ((user_id), created_at, post_id)
    --          ─────────── ─────────────────────
    --          partition   clustering keys
    --          key         (sort order)
) WITH CLUSTERING ORDER BY (created_at DESC, post_id ASC)
  AND compaction = {'class': 'TimeWindowCompactionStrategy',
                    'compaction_window_size': 1,
                    'compaction_window_unit': 'DAYS'};
-- TWCS: optimized for time-series (SSTable per time window)
```
Partition Key
  • Determines which node(s) store the row
  • All rows with same partition key → same partition
  • Must appear in every query (no full-table scans)
  • Keep partitions balanced — hot partition = hot node
  • Partition size limit: ~100 MB recommended
Clustering Key
  • Defines sort order within a partition
  • Enables range queries on clustering columns
  • Can query WHERE created_at > X within a partition
  • Cannot skip clustering keys in WHERE clause
  • Choose DESC if you mostly read recent data first
📊 Consistency Levels — Tunable per Query
| Level | Writes to | Reads from | Tradeoff |
|---|---|---|---|
| ONE | 1 node | 1 node | Fastest; may read stale data |
| QUORUM | RF/2+1 nodes | RF/2+1 nodes | Strong consistency (write+read quorum overlap); balanced |
| LOCAL_QUORUM | Quorum in local DC | Quorum in local DC | Strong consistency within DC; avoids cross-DC latency |
| ALL | All RF nodes | All RF nodes | Strongest; unavailable if any node down |
| ANY | At least 1 (hint OK) | N/A (write only) | Highest availability; weakest durability |
Strong consistency formula: Write CL + Read CL > RF
Example with RF=3: QUORUM write (2) + QUORUM read (2) = 4 > 3 ✓ → guaranteed to see latest write.
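The overlap rule is easy to encode as a pair of helpers (illustrative, not a driver API):

```javascript
// Quorum size for a given replication factor: floor(RF/2) + 1
const QUORUM = rf => Math.floor(rf / 2) + 1;

// Reads overlap writes in at least one replica iff writeCL + readCL > RF,
// guaranteeing some replica serving the read already has the latest write
function isStronglyConsistent(writeNodes, readNodes, rf) {
  return writeNodes + readNodes > rf;
}
```

For RF=3: QUORUM is 2, and 2 + 2 = 4 > 3, so QUORUM/QUORUM is consistent, while ONE/ONE (1 + 1 = 2) is not.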
```sql
-- CQL: set consistency level per query in cqlsh
CONSISTENCY QUORUM;
SELECT * FROM posts_by_user WHERE user_id = abc123 LIMIT 20;
```
⚠️ Common Cassandra Anti-Patterns
| Anti-Pattern | Why It Fails | Fix |
|---|---|---|
| ALLOW FILTERING in queries | Forces full partition scan; slow at scale | Redesign table for the query; use secondary index carefully |
| Unbounded partition growth | Single partition → single node bottleneck; >2 GB is pathological | Add time bucket to partition key (user_id + year_month) |
| High-cardinality secondary indexes | Distributed index = scatter-gather on every node | Materialize a separate table for each query pattern |
| Large IN queries | Coordinator fans out to many nodes; serial waits | Async parallel queries; smaller batch sizes |
| Logged batches for performance | Batches add coordinator overhead; not for performance, only for atomicity across tables | Use unlogged batches only for same-partition multi-row writes |
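The time-bucket fix for unbounded partition growth amounts to deriving part of the partition key from the timestamp. A minimal sketch; the bucket granularity (month) and key shape are illustrative assumptions:

```javascript
// Derive a month bucket so each (user, month) pair is its own partition,
// bounding partition size no matter how long a user stays active
function monthBucket(date) {
  const y = date.getUTCFullYear();
  const m = String(date.getUTCMonth() + 1).padStart(2, '0');
  return `${y}-${m}`; // e.g. "2026-03"
}

// Matches a schema like: PRIMARY KEY ((user_id, bucket), created_at)
function partitionKey(userId, date) {
  return { user_id: userId, bucket: monthBucket(date) };
}
```

Reads for "recent posts" then query the current bucket first and walk backward a bucket at a time until the page is filled.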
🆚 MongoDB vs Cassandra vs Redis — Quick Reference
| | Redis | MongoDB | Cassandra |
|---|---|---|---|
| Model | Key-value / data structures | Document (BSON) | Wide-column (partitioned rows) |
| Query | Key lookup; limited range | Rich ad-hoc; aggregation pipeline | Query-driven; CQL; no ad-hoc |
| Transactions | MULTI/EXEC; Lua scripts; limited | Multi-document ACID (v4+) | Lightweight transactions (LWT); limited |
| Scale | Cluster mode (hash slots) | Replica sets; sharded clusters | Linear horizontal scale; no master |
| Consistency | Strong within shard | Strong (primary); eventual (secondaries) | Tunable per query (ONE → ALL) |
| Best for | Caching, sessions, rate limiting | Flexible catalogs, CMS, user data | Time-series, activity feeds, IoT |
⚙️ hiredis — Redis Client in C

hiredis is the official, lightweight C client for Redis. It provides a synchronous API for simple use cases and an async API (libevent/libev/libuv adapters) for non-blocking I/O.

```c
/* hiredis_demo.c — connect, set, get, expire, hash ops */
#include <hiredis/hiredis.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Helper: check reply type and abort on error */
static void check(redisReply *r, const char *label) {
    if (!r) { fprintf(stderr, "%s: null reply\n", label); exit(1); }
    if (r->type == REDIS_REPLY_ERROR) {
        fprintf(stderr, "%s error: %s\n", label, r->str);
        freeReplyObject(r);
        exit(1);
    }
}

int main(void) {
    /* Connect */
    redisContext *c = redisConnect("127.0.0.1", 6379);
    if (!c || c->err) {
        fprintf(stderr, "Connect error: %s\n", c ? c->errstr : "OOM");
        exit(1);
    }
    printf("Connected to Redis\n");

    redisReply *reply;

    /* SET with EX (expire in 300 seconds) */
    reply = (redisReply *)redisCommand(c, "SET user:42 Alice EX 300");
    check(reply, "SET");
    printf("SET: %s\n", reply->str); /* "OK" */
    freeReplyObject(reply);

    /* GET */
    reply = (redisReply *)redisCommand(c, "GET user:42");
    check(reply, "GET");
    printf("GET user:42 = %s\n",
           reply->type == REDIS_REPLY_NIL ? "(nil)" : reply->str);
    freeReplyObject(reply);

    /* INCR — atomic counter */
    reply = (redisReply *)redisCommand(c, "INCR post:7:views");
    check(reply, "INCR");
    printf("post:7:views = %lld\n", reply->integer);
    freeReplyObject(reply);

    /* HSET — store object fields */
    reply = (redisReply *)redisCommand(c,
        "HSET user:42:profile name Alice email alice@example.com age 28");
    check(reply, "HSET");
    printf("HSET: added %lld fields\n", reply->integer);
    freeReplyObject(reply);

    /* HGETALL — read all hash fields */
    reply = (redisReply *)redisCommand(c, "HGETALL user:42:profile");
    check(reply, "HGETALL");
    printf("Profile fields:\n");
    for (size_t i = 0; i + 1 < reply->elements; i += 2)
        printf("  %s = %s\n", reply->element[i]->str, reply->element[i+1]->str);
    freeReplyObject(reply);

    redisFree(c);
    return 0;
}
```
```sh
# Compile: link against hiredis
gcc -o hiredis_demo hiredis_demo.c -lhiredis
```
🔢 Pipelining — Batch Commands Without Round-Trips
Without pipelining (N commands = N round-trips):
  Client ──SET──► Server ──OK──► Client ──INCR──► Server ──1──► Client ...
  RTT: N × (50ms) = 500ms for 10 commands

With pipelining (N commands = 1 round-trip):
  Client ──[SET, INCR, HSET, ...]──► Server
  Server ──[OK, 1, 3, ...]──────────► Client
  RTT: 1 × 50ms = 50ms for 10 commands
```c
/* hiredis pipelining — queue commands, flush once */
void pipeline_demo(redisContext *c) {
    /* Queue commands without waiting for reply */
    redisAppendCommand(c, "SET key1 val1");
    redisAppendCommand(c, "SET key2 val2");
    redisAppendCommand(c, "INCR counter");
    redisAppendCommand(c, "EXPIRE key1 3600");

    /* Flush and collect replies */
    redisReply *r;
    for (int i = 0; i < 4; i++) {
        redisGetReply(c, (void **)&r);
        if (r) {
            if (r->type == REDIS_REPLY_INTEGER)
                printf("reply[%d] = %lld\n", i, r->integer);
            else if (r->type == REDIS_REPLY_STATUS)
                printf("reply[%d] = %s\n", i, r->str);
            freeReplyObject(r);
        }
    }
}
```
🔒 Distributed Lock in C (Redlock-lite)
```c
/* Simple Redis distributed lock using SET NX EX */
#include <hiredis/hiredis.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if lock acquired, 0 otherwise.
   token must be unique per lock-holder (used to safely release) */
int redis_lock(redisContext *c, const char *key, const char *token, int ttl_sec) {
    redisReply *r = (redisReply *)redisCommand(c, "SET %s %s NX EX %d",
                                               key, token, ttl_sec);
    int acquired = (r && r->type == REDIS_REPLY_STATUS &&
                    strcmp(r->str, "OK") == 0);
    if (r) freeReplyObject(r);
    return acquired;
}

/* Release only if our token matches (Lua ensures atomicity) */
void redis_unlock(redisContext *c, const char *key, const char *token) {
    const char *lua =
        "if redis.call('GET',KEYS[1])==ARGV[1] then "
        "  return redis.call('DEL',KEYS[1]) "
        "else return 0 end";
    redisReply *r = (redisReply *)redisCommand(c, "EVAL %s 1 %s %s",
                                               lua, key, token);
    if (r) freeReplyObject(r);
}

/* Usage */
int main(void) {
    redisContext *c = redisConnect("127.0.0.1", 6379);
    const char *lock_key = "lock:job:42";
    const char *lock_token = "unique-token-abc"; /* use UUID in practice */

    if (redis_lock(c, lock_key, lock_token, 5)) {
        printf("Lock acquired — doing work\n");
        /* ... critical section ... */
        redis_unlock(c, lock_key, lock_token);
        printf("Lock released\n");
    } else {
        printf("Could not acquire lock — another process holds it\n");
    }
    redisFree(c);
    return 0;
}
```
🧪 Lab 1 — Redis Caching Layer for a REST API

Goal: Add cache-aside caching to an Express.js API, measure cache hit rate.

1. Spin up Redis locally: `docker run -p 6379:6379 redis:7-alpine`
2. Create an Express endpoint `GET /users/:id` that hits a PostgreSQL DB.
3. Wrap the handler with cache-aside logic: check Redis first, populate on miss, TTL = 300s.
4. Add a middleware that increments `cache:hits` and `cache:misses` counters in Redis.
5. Load test with `wrk -t4 -c100 -d30s http://localhost:3000/users/42`.
6. Check hit rate: `redis-cli GET cache:hits` vs `GET cache:misses`. Expect >95% hits after warm-up.
7. Test invalidation: update the user in the DB, verify the Redis key is deleted, and confirm the next request repopulates it.
🧪 Lab 2 — Rate Limiter Middleware (Sliding Window)

Goal: Implement a sliding-window rate limiter in Redis; test boundary behavior.

1. Implement the `isAllowedSliding(ip, limit=10, windowMs=60000)` function using Redis sorted sets.
2. Wire it as Express middleware: return 429 Too Many Requests with a Retry-After header when over the limit.
3. Verify: send 10 rapid requests — all succeed. Send the 11th — it gets 429.
4. Verify the sliding window: wait 30s, send 5 more requests — all succeed (window slid, old entries purged).
5. Inspect Redis: `ZRANGE ratelimit:sliding:127.0.0.1 0 -1 WITHSCORES` — confirm request timestamps are in the sorted set.
🧪 Lab 3 — MongoDB Aggregation: Top Tags Report

Goal: Build an aggregation pipeline that computes per-tag post counts and total views.

1. Insert 50 sample posts with `mongosh` using a seed script. Include varied tags and view counts.
2. Write the pipeline: `$match` (published) → `$unwind` (tags) → `$group` (count, sumViews) → `$sort` → `$limit` 10.
3. Run with `.explain("executionStats")` — confirm the initial `$match` uses an index.
4. Create the compound index `{status:1, publishedAt:-1}` and re-run explain — compare `totalDocsExamined`.
5. Cache the result in Redis as `report:top-tags` with TTL 3600. Serve from cache on repeat requests.

✅ Module Mastery Checklist
  • Explain the five NoSQL categories and give a use case for each
  • Describe CAP theorem and classify Redis, MongoDB, and Cassandra under it
  • List all six Redis data types, their O() complexities, and one use case each
  • Implement cache-aside pattern with correct TTL invalidation on write
  • Contrast cache-aside vs write-through vs write-behind consistency guarantees
  • Explain cache stampede and implement the SET NX lock pattern to prevent it
  • Configure Redis maxmemory and choose appropriate eviction policy
  • Describe RDB vs AOF persistence trade-offs and recommend correct production config
  • Implement a fixed-window rate limiter using INCR + EXPIRE
  • Implement a sliding-window rate limiter using a Sorted Set
  • Use Lua scripting in Redis for an atomic read-modify-write operation
  • Explain MongoDB embedding vs referencing decision criteria
  • Write a MongoDB aggregation pipeline with $match, $lookup, $group, $sort
  • Define Cassandra partition key, clustering key, and explain query-driven design
  • Calculate whether QUORUM reads + QUORUM writes guarantee strong consistency for a given RF