Mental Models for Caching: Stop Breaking Production with Cache Logic
TL;DR: Caching is easy to add and hard to get right. Think like human memory (recency, frequency, and graceful forgetting) to avoid most of the cache bugs that take down production.
The Cache Bug That Cost $2.8M in Revenue
Black Friday, 11:47 PM. The e-commerce site’s cache “optimization” decides to refresh all product prices simultaneously. Database connections max out. Cache misses cascade. The site goes down for 18 minutes during peak shopping hours.
The bug wasn’t in the cache implementation—it was in the mental model.
Most developers think of caches as “fast storage.” But caches are forgetting systems. Understanding when and how they forget prevents most cache-related production incidents.
Why Smart Engineers Write Broken Cache Logic
Traditional caching approaches create hidden failure modes:
- The thundering herd: A single hot key expires and every concurrent request recomputes it against the database at once
- Cache stampede: Keys that were populated together expire together, sending waves of misses to the database during traffic spikes
- Stale-but-broken: Serving cached errors instead of fresh data
- Consistency nightmares: Multiple cache layers with different expiration times
The core problem: optimizing for hit rate instead of failure isolation.
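A cheap first defense against synchronized expiry, independent of the bigger patterns below, is to add random jitter to every TTL so keys cached together don't expire together. A minimal sketch (the cache client in the trailing comment is a placeholder, not a specific library's API):

```typescript
// Spread expirations out so keys cached at the same moment don't all
// expire at the same moment.
function ttlWithJitter(baseTtlSeconds: number, jitterRatio = 0.1): number {
  // e.g. a 3600s base with 10% jitter lands somewhere in [3600, 3960)
  const jitter = Math.random() * baseTtlSeconds * jitterRatio;
  return Math.floor(baseTtlSeconds + jitter);
}

// Hypothetical usage with any client exposing set(key, value, ttlSeconds):
// cache.set(`price:${productId}`, price, ttlWithJitter(3600));
```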
The Core Insight: Model Caches Like Human Memory
Great caches behave like healthy human memory: they remember important things longer, forget gracefully, and don’t let forgetting break the whole system.
Mental Model: The Three Memory Systems
```
Human Memory            →   Cache Design
─────────────────────       ─────────────
Sensory (immediate)     →   L1: In-process
Short-term (seconds)    →   L2: Redis/Memcached
Long-term (lifetime)    →   L3: Database/CDN

Forgetting Strategy:
├── Recency: Recently used = important
├── Frequency: Often used = important
└── Graceful degradation: Forgetting doesn't break thinking
```
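To make the tier mapping concrete, here is a minimal sketch of a read-through lookup that walks the layers in order and backfills the faster ones on a hit. The CacheTier interface is an assumption for illustration; any client that exposes get and set by key fits the shape:

```typescript
// A tier is anything that can answer "do you have this key?" quickly.
interface CacheTier<T> {
  get(key: string): Promise<T | undefined>;
  set(key: string, value: T): Promise<void>;
}

// Read-through lookup: L1 (in-process) -> L2 (e.g. Redis) -> L3 (source of truth).
async function tieredGet<T>(
  key: string,
  tiers: CacheTier<T>[],            // ordered fastest to slowest
  loadFromOrigin: () => Promise<T>, // L3: database/CDN/origin call
): Promise<T> {
  for (let i = 0; i < tiers.length; i++) {
    const hit = await tiers[i].get(key);
    if (hit !== undefined) {
      // Backfill the faster tiers we missed on the way down
      await Promise.all(tiers.slice(0, i).map(t => t.set(key, hit)));
      return hit;
    }
  }
  const value = await loadFromOrigin();
  await Promise.all(tiers.map(t => t.set(key, value)));
  return value;
}
```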
Implementation: From Naive Caching to Memory-Inspired Design
Step 1: Stale-While-Revalidate (Like Human Memory)
```typescript
// ❌ Naive caching: Everyone waits for fresh data
async function getUser(id: string): Promise<User> {
  const cached = cache.get(`user:${id}`);
  if (cached && !isExpired(cached)) {
    return cached.value;
  }

  // Problem: Everyone waits here during cache miss
  const fresh = await database.users.findById(id);
  cache.set(`user:${id}`, fresh, { ttl: 3600 });
  return fresh;
}
```
```typescript
// ✅ Stale-while-revalidate: Return fast, update in background
interface CacheEntry<T> {
  value: T;
  freshUntil: number; // serve as-is until this timestamp
  staleUntil: number; // serve stale (while refreshing) until this timestamp
}

class MemoryLikeCache<T> {
  private cache = new Map<string, CacheEntry<T>>();
  private refreshPromises = new Map<string, Promise<T>>();

  constructor(private freshTtlMs = 60_000, private staleTtlMs = 600_000) {}

  async get(key: string, loadFn: () => Promise<T>): Promise<T> {
    const entry = this.cache.get(key);
    const now = Date.now();

    // Fresh hit: return immediately
    if (entry && entry.freshUntil > now) return entry.value;

    // Stale hit: return stale, refresh in background
    if (entry && entry.staleUntil > now) {
      this.backgroundRefresh(key, loadFn);
      return entry.value;
    }

    // Miss: load fresh (but prevent thundering herd)
    return this.loadWithSingleFlight(key, loadFn);
  }

  private backgroundRefresh(key: string, loadFn: () => Promise<T>): void {
    if (this.refreshPromises.has(key)) return; // Already refreshing
    this.loadWithSingleFlight(key, loadFn).catch(error => {
      // Keep serving stale on refresh failure
      console.warn(`Background refresh failed for ${key}:`, error);
    });
  }

  private loadWithSingleFlight(key: string, loadFn: () => Promise<T>): Promise<T> {
    // Reuse any in-flight load so concurrent misses share one database call
    const existing = this.refreshPromises.get(key);
    if (existing) return existing;

    const loading = loadFn()
      .then(value => {
        const now = Date.now();
        this.cache.set(key, {
          value,
          freshUntil: now + this.freshTtlMs,
          staleUntil: now + this.staleTtlMs,
        });
        return value;
      })
      .finally(() => this.refreshPromises.delete(key));

    this.refreshPromises.set(key, loading);
    return loading;
  }
}
```
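Wiring this into the earlier getUser example mostly means moving the database call into the loader callback. The user:${id} key scheme and the database.users.findById call are carried over from the naive snippet above as placeholders:

```typescript
// One cache instance per value type; the TTLs here are illustrative.
const userCache = new MemoryLikeCache<User>(60_000 /* fresh */, 600_000 /* stale */);

async function getUser(id: string): Promise<User> {
  // Fresh hits return instantly, stale hits refresh in the background,
  // and concurrent misses collapse into a single database query.
  return userCache.get(`user:${id}`, () => database.users.findById(id));
}
```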
Step 2: Frequency-Based Retention (Hot/Cold Data)
```typescript
// Track access patterns like human memory
interface FrequencyEntry<T> {
  value: T;
  cachedAt: number;
}

class FrequencyAwareCache<T> {
  private entries = new Map<string, FrequencyEntry<T>>();
  private accessLog = new Map<string, number[]>();

  async get(key: string, loadFn: () => Promise<T>): Promise<T> {
    this.recordAccess(key);

    const entry = this.entries.get(key);
    if (entry && !this.shouldEvict(key)) {
      return entry.value;
    }

    const value = await loadFn();
    this.set(key, value);
    return value;
  }

  private set(key: string, value: T): void {
    this.entries.set(key, { value, cachedAt: Date.now() });
  }

  private recordAccess(key: string): void {
    const now = Date.now();
    // Keep only the last minute of accesses so the log stays bounded
    const history = (this.accessLog.get(key) ?? []).filter(t => now - t < 60_000);
    history.push(now);
    this.accessLog.set(key, history);
  }

  private shouldEvict(key: string): boolean {
    const accessHistory = this.accessLog.get(key) ?? [];
    const now = Date.now();

    // Recent high frequency = keep longer
    const recentAccesses = accessHistory.filter(time => now - time < 60_000);
    const accessFrequency = recentAccesses.length;

    // Adaptive TTL based on access pattern
    const baseTtl = 60_000; // 1 minute
    const frequencyMultiplier = Math.min(accessFrequency, 10);
    const adaptiveTtl = baseTtl * (1 + frequencyMultiplier);

    const entry = this.entries.get(key);
    return !entry || now - entry.cachedAt > adaptiveTtl;
  }
}
```
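To see the adaptive TTL in action, here is a small hypothetical usage example (loadProduct is a stand-in loader): a key read ten or more times within a minute holds its entry for roughly eleven minutes (60s base × (1 + 10)), while a rarely read key expires after a minute or two.

```typescript
// Hypothetical loader standing in for a real database query.
const loadProduct = async (id: string) => ({ id, price: 19.99 });

const productCache = new FrequencyAwareCache<{ id: string; price: number }>();

async function demo() {
  // Hot key: ten reads inside a minute stretch its adaptive TTL to ~11 minutes.
  for (let i = 0; i < 10; i++) {
    await productCache.get('product:hot', () => loadProduct('hot'));
  }
  // Cold key: a single read keeps only a minute or two of retention.
  await productCache.get('product:cold', () => loadProduct('cold'));
}
```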
Real-World Case Study: Netflix’s Multi-Tier Caching
Challenge: Serve 200M+ users with sub-100ms response times globally.
Solution: Memory-inspired cache hierarchy:
- L1: JVM heap cache (1-10ms)
- L2: SSD-based cache (10-50ms)
- L3: Origin servers (50-200ms)
Results:
- 99.9% availability during cache failures
- 40ms average response time globally
- $100M+ annual infrastructure savings
Your Caching Transformation Checklist
Before Implementing Caching
- What’s the read/write ratio?
- How stale can data be before it’s wrong?
- What happens if the cache is completely down?
Cache Design Review
- Does it handle thundering herds?
- Can it serve stale data gracefully?
- Are invalidation semantics clear?
Conclusion: Cache Like Your Brain, Not Like a Database
- Today: Identify your highest-traffic read operation and add stale-while-revalidate
- This week: Implement frequency-based TTL for your hottest data
- This month: Add circuit breakers to prevent cache failures from cascading (see the sketch below)
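One way a circuit breaker around the cache layer might look is sketched below. The class name, thresholds, and fall-back-to-loader behavior are illustrative assumptions rather than any particular library's API:

```typescript
// Trip open after repeated cache-layer failures; while open, skip the cache
// entirely and go straight to the loader so one sick cache node can't
// drag every request down with it.
class CacheCircuitBreaker {
  private failures = 0;
  private openUntil = 0;

  constructor(private maxFailures = 5, private cooldownMs = 30_000) {}

  async getOrLoad<T>(
    cacheGet: () => Promise<T | undefined>,
    loadFn: () => Promise<T>,
  ): Promise<T> {
    if (Date.now() < this.openUntil) {
      return loadFn(); // breaker open: bypass the cache layer entirely
    }
    try {
      const hit = await cacheGet();
      this.failures = 0; // a healthy call closes the breaker again
      if (hit !== undefined) return hit;
    } catch (error) {
      if (++this.failures >= this.maxFailures) {
        this.openUntil = Date.now() + this.cooldownMs;
      }
      console.warn('Cache layer error, falling back to loader:', error);
    }
    return loadFn();
  }
}
```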
Remember: Perfect cache hit rates don’t matter if cache misses take down production.
Design for graceful forgetting, not perfect memory.
References & Deep Dives
- Caffeine Cache - Java cache with memory-inspired eviction
- Facebook’s TAO - Social graph caching at scale
- Redis Best Practices - Production caching patterns