Mental Models for Caching: Stop Breaking Production with Cache Logic
TL;DR: Caching is easy to add and hard to get right. Think like human memory (recency, frequency, and graceful forgetting) to avoid most of the cache bugs that take down production.
The Cache Bug That Cost $2.8M in Revenue
Black Friday, 11:47 PM. The e-commerce site’s cache “optimization” decides to refresh all product prices simultaneously. Database connections max out. Cache misses cascade. The site goes down for 18 minutes during peak shopping hours.
The bug wasn’t in the cache implementation—it was in the mental model.
Most developers think of caches as “fast storage.” But caches are forgetting systems. Understanding when and how they forget prevents most cache-related production incidents.
Why Smart Engineers Write Broken Cache Logic
Traditional caching approaches create hidden failure modes:
- The thundering herd: A single hot key expires and every concurrent request recomputes it against the database at once
- Cache stampede: Keys that were populated together expire together, sending waves of misses to the database during traffic spikes
- Stale-but-broken: Serving cached errors instead of fresh data
- Consistency nightmares: Multiple cache layers with different expiration times
The core problem: optimizing for hit rate instead of failure isolation.
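A cheap first defense against synchronized expiry, independent of the bigger patterns below, is to add random jitter to every TTL so keys cached together don't expire together. A minimal sketch (the cache client in the trailing comment is a placeholder, not a specific library's API):

```typescript
// Spread expirations out so keys cached at the same moment don't all
// expire at the same moment.
function ttlWithJitter(baseTtlSeconds: number, jitterRatio = 0.1): number {
  // e.g. a 3600s base with 10% jitter lands somewhere in [3600, 3960)
  const jitter = Math.random() * baseTtlSeconds * jitterRatio;
  return Math.floor(baseTtlSeconds + jitter);
}

// Hypothetical usage with any client exposing set(key, value, ttlSeconds):
// cache.set(`price:${productId}`, price, ttlWithJitter(3600));
```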
The Core Insight: Model Caches Like Human Memory
Great caches behave like healthy human memory: they remember important things longer, forget gracefully, and don’t let forgetting break the whole system.
Mental Model: The Three Memory Systems
```
Human Memory            →   Cache Design
─────────────────────       ─────────────
Sensory (immediate)     →   L1: In-process
Short-term (seconds)    →   L2: Redis/Memcached
Long-term (lifetime)    →   L3: Database/CDN

Forgetting Strategy:
├── Recency: Recently used = important
├── Frequency: Often used = important
└── Graceful degradation: Forgetting doesn't break thinking
```
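To make the tier mapping concrete, here is a minimal sketch of a read-through lookup that walks the layers in order and backfills the faster ones on a hit. The CacheTier interface is an assumption for illustration; any client that exposes get and set by key fits the shape:

```typescript
// A tier is anything that can answer "do you have this key?" quickly.
interface CacheTier<T> {
  get(key: string): Promise<T | undefined>;
  set(key: string, value: T): Promise<void>;
}

// Read-through lookup: L1 (in-process) -> L2 (e.g. Redis) -> L3 (source of truth).
async function tieredGet<T>(
  key: string,
  tiers: CacheTier<T>[],            // ordered fastest to slowest
  loadFromOrigin: () => Promise<T>, // L3: database/CDN/origin call
): Promise<T> {
  for (let i = 0; i < tiers.length; i++) {
    const hit = await tiers[i].get(key);
    if (hit !== undefined) {
      // Backfill the faster tiers we missed on the way down
      await Promise.all(tiers.slice(0, i).map(t => t.set(key, hit)));
      return hit;
    }
  }
  const value = await loadFromOrigin();
  await Promise.all(tiers.map(t => t.set(key, value)));
  return value;
}
```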
Implementation: From Naive Caching to Memory-Inspired Design
Step 1: Stale-While-Revalidate (Like Human Memory)
```typescript
// ❌ Naive caching: Everyone waits for fresh data
async function getUser(id: string): Promise<User> {
  const cached = cache.get(`user:${id}`);
  if (cached && !isExpired(cached)) {
    return cached.value;
  }

  // Problem: Everyone waits here during cache miss
  const fresh = await database.users.findById(id);
  cache.set(`user:${id}`, fresh, { ttl: 3600 });
  return fresh;
}
```
```typescript
// ✅ Stale-while-revalidate: Return fast, update in background
interface CacheEntry<T> {
  value: T;
  freshUntil: number; // serve as-is until this timestamp
  staleUntil: number; // serve stale (while refreshing) until this timestamp
}

class MemoryLikeCache<T> {
  private cache = new Map<string, CacheEntry<T>>();
  private refreshPromises = new Map<string, Promise<T>>();

  constructor(private freshTtlMs = 60_000, private staleTtlMs = 600_000) {}

  async get(key: string, loadFn: () => Promise<T>): Promise<T> {
    const entry = this.cache.get(key);
    const now = Date.now();

    // Fresh hit: return immediately
    if (entry && entry.freshUntil > now) return entry.value;

    // Stale hit: return stale, refresh in background
    if (entry && entry.staleUntil > now) {
      this.backgroundRefresh(key, loadFn);
      return entry.value;
    }

    // Miss: load fresh (but prevent thundering herd)
    return this.loadWithSingleFlight(key, loadFn);
  }

  private backgroundRefresh(key: string, loadFn: () => Promise<T>): void {
    if (this.refreshPromises.has(key)) return; // Already refreshing
    this.loadWithSingleFlight(key, loadFn).catch(error => {
      // Keep serving stale on refresh failure
      console.warn(`Background refresh failed for ${key}:`, error);
    });
  }

  private loadWithSingleFlight(key: string, loadFn: () => Promise<T>): Promise<T> {
    // Reuse any in-flight load so concurrent misses share one database call
    const existing = this.refreshPromises.get(key);
    if (existing) return existing;

    const loading = loadFn()
      .then(value => {
        const now = Date.now();
        this.cache.set(key, {
          value,
          freshUntil: now + this.freshTtlMs,
          staleUntil: now + this.staleTtlMs,
        });
        return value;
      })
      .finally(() => this.refreshPromises.delete(key));

    this.refreshPromises.set(key, loading);
    return loading;
  }
}
```
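Wiring this into the earlier getUser example mostly means moving the database call into the loader callback. The user:${id} key scheme and the database.users.findById call are carried over from the naive snippet above as placeholders:

```typescript
// One cache instance per value type; the TTLs here are illustrative.
const userCache = new MemoryLikeCache<User>(60_000 /* fresh */, 600_000 /* stale */);

async function getUser(id: string): Promise<User> {
  // Fresh hits return instantly, stale hits refresh in the background,
  // and concurrent misses collapse into a single database query.
  return userCache.get(`user:${id}`, () => database.users.findById(id));
}
```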
Step 2: Frequency-Based Retention (Hot/Cold Data)
```typescript
// Track access patterns like human memory
interface FrequencyEntry<T> {
  value: T;
  cachedAt: number;
}

class FrequencyAwareCache<T> {
  private entries = new Map<string, FrequencyEntry<T>>();
  private accessLog = new Map<string, number[]>();

  async get(key: string, loadFn: () => Promise<T>): Promise<T> {
    this.recordAccess(key);

    const entry = this.entries.get(key);
    if (entry && !this.shouldEvict(key)) {
      return entry.value;
    }

    const value = await loadFn();
    this.set(key, value);
    return value;
  }

  private set(key: string, value: T): void {
    this.entries.set(key, { value, cachedAt: Date.now() });
  }

  private recordAccess(key: string): void {
    const now = Date.now();
    // Keep only the last minute of accesses so the log stays bounded
    const history = (this.accessLog.get(key) ?? []).filter(t => now - t < 60_000);
    history.push(now);
    this.accessLog.set(key, history);
  }

  private shouldEvict(key: string): boolean {
    const accessHistory = this.accessLog.get(key) ?? [];
    const now = Date.now();

    // Recent high frequency = keep longer
    const recentAccesses = accessHistory.filter(time => now - time < 60_000);
    const accessFrequency = recentAccesses.length;

    // Adaptive TTL based on access pattern
    const baseTtl = 60_000; // 1 minute
    const frequencyMultiplier = Math.min(accessFrequency, 10);
    const adaptiveTtl = baseTtl * (1 + frequencyMultiplier);

    const entry = this.entries.get(key);
    return !entry || now - entry.cachedAt > adaptiveTtl;
  }
}
```
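To see the adaptive TTL in action, here is a small hypothetical usage example (loadProduct is a stand-in loader): a key read ten or more times within a minute holds its entry for roughly eleven minutes (60s base × (1 + 10)), while a rarely read key expires after a minute or two.

```typescript
// Hypothetical loader standing in for a real database query.
const loadProduct = async (id: string) => ({ id, price: 19.99 });

const productCache = new FrequencyAwareCache<{ id: string; price: number }>();

async function demo() {
  // Hot key: ten reads inside a minute stretch its adaptive TTL to ~11 minutes.
  for (let i = 0; i < 10; i++) {
    await productCache.get('product:hot', () => loadProduct('hot'));
  }
  // Cold key: a single read keeps only a minute or two of retention.
  await productCache.get('product:cold', () => loadProduct('cold'));
}
```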
Real-World Case Study: Netflix’s Multi-Tier Caching
Challenge: Serve 200M+ users with sub-100ms response times globally.
Solution: Memory-inspired cache hierarchy:
- L1: JVM heap cache (1-10ms)
- L2: SSD-based cache (10-50ms)
- L3: Origin servers (50-200ms)
Results:
- 99.9% availability during cache failures
- 40ms average response time globally
- $100M+ annual infrastructure savings
Your Caching Transformation Checklist
Before Implementing Caching
- What’s the read/write ratio?
- How stale can data be before it’s wrong?
- What happens if the cache is completely down?
Cache Design Review
- Does it handle thundering herds?
- Can it serve stale data gracefully?
- Are invalidation semantics clear?
Conclusion: Cache Like Your Brain, Not Like a Database
- Today: Identify your highest-traffic read operation and add stale-while-revalidate
- This week: Implement frequency-based TTL for your hottest data
- This month: Add circuit breakers to prevent cache failures from cascading (see the sketch below)
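One way a circuit breaker around the cache layer might look is sketched below. The class name, thresholds, and fall-back-to-loader behavior are illustrative assumptions rather than any particular library's API:

```typescript
// Trip open after repeated cache-layer failures; while open, skip the cache
// entirely and go straight to the loader so one sick cache node can't
// drag every request down with it.
class CacheCircuitBreaker {
  private failures = 0;
  private openUntil = 0;

  constructor(private maxFailures = 5, private cooldownMs = 30_000) {}

  async getOrLoad<T>(
    cacheGet: () => Promise<T | undefined>,
    loadFn: () => Promise<T>,
  ): Promise<T> {
    if (Date.now() < this.openUntil) {
      return loadFn(); // breaker open: bypass the cache layer entirely
    }
    try {
      const hit = await cacheGet();
      this.failures = 0; // a healthy call closes the breaker again
      if (hit !== undefined) return hit;
    } catch (error) {
      if (++this.failures >= this.maxFailures) {
        this.openUntil = Date.now() + this.cooldownMs;
      }
      console.warn('Cache layer error, falling back to loader:', error);
    }
    return loadFn();
  }
}
```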
Remember: Perfect cache hit rates don’t matter if cache misses take down production.
Design for graceful forgetting, not perfect memory.
References & Deep Dives
- Caffeine Cache - Java cache with memory-inspired eviction
- Facebook’s TAO - Social graph caching at scale
- Redis Best Practices - Production caching patterns