Cache Invalidation

Last Updated: January 9, 2026

Ashish Pratap Singh

"There are only two hard things in Computer Science: cache invalidation and naming things."

This quote from Phil Karlton has become a cliché, but it endures because it captures a real truth. Cache invalidation is genuinely difficult, and most developers underestimate it until they have been burned.

The moment you introduce a cache, you create two sources of truth for the same data: the database and the cache. Keeping them in sync is the cache invalidation problem.

Get it wrong, and users see stale data, experience inconsistencies, or worse, make decisions based on outdated information.

In this chapter, we will explore:

  • What is cache invalidation?
  • Why is it so hard?
  • Invalidation strategies
  • Race conditions and how to prevent them
  • Cache stampede and mitigation techniques
  • Invalidation in distributed systems
  • Best practices for production systems

What is Cache Invalidation?

Cache invalidation is the process of removing or updating cached data when the underlying source data changes. The goal is to ensure that applications never serve stale data beyond an acceptable threshold.

When data is updated in the database, the cached copy becomes stale. Without proper invalidation, the application continues serving the old value. The challenge is detecting when data has changed and ensuring the cache reflects those changes promptly.

Why is Cache Invalidation Hard?

Cache invalidation would be simple if you only had to deal with one cache, one writer, and synchronous operations. In reality, you face:

Multiple Writers

When multiple application instances can update data, coordinating cache invalidation becomes complex. Each writer needs to invalidate the cache, but they might do so in different orders or at different times.

Distributed Systems

In a distributed environment, you have multiple cache nodes, network partitions, and varying latencies. A cache invalidation message might arrive at different nodes at different times, or not arrive at all.

Race Conditions

The most insidious problem is race conditions between reads and writes. Consider this timeline:

  1. Thread A reads key X, misses the cache, and queries the database (getting the old value)
  2. Thread B updates X in the database
  3. Thread B deletes X from the cache
  4. Thread A writes the old value it read into the cache

Thread A started its read before the update but completed its cache write after the invalidation. The cache now contains stale data with no scheduled expiration.

Dependency Chains

Cached data often depends on other data. When a user's profile changes, you might need to invalidate:

  • The user profile cache
  • The user's posts cache (if it includes profile info)
  • The followers' feed caches
  • Search results containing the user
  • Any aggregations that include the user

Tracking these dependencies is error-prone and difficult to maintain.

Invalidation Strategies

There is no perfect invalidation strategy. Each approach trades off between consistency, complexity, and performance. Understanding these trade-offs helps you choose the right strategy for your use case.

1. Time-To-Live (TTL)

The simplest approach: cached data expires automatically after a fixed duration. You do not explicitly invalidate anything; you just wait for entries to expire.
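A minimal in-memory sketch of TTL expiry in pure Python (the class and key names are illustrative; a production system would rely on something like Redis's built-in key expiration instead):

```python
import time

class TTLCache:
    """Toy in-memory cache where every entry expires after a fixed TTL."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.store = {}  # key -> (value, expires_at)

    def set(self, key, value):
        self.store[key] = (value, time.monotonic() + self.ttl)

    def get(self, key):
        entry = self.store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self.store[key]   # lazily evict the expired entry
            return None
        return value

cache = TTLCache(ttl_seconds=0.1)
cache.set("user:42", {"name": "Alice"})
assert cache.get("user:42") == {"name": "Alice"}   # fresh hit
time.sleep(0.15)
assert cache.get("user:42") is None                # expired: next read hits the DB
```

Note that nothing ever invalidates an entry explicitly; staleness is bounded only by the TTL you choose.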

Pros

  • Simple to implement
  • No coordination required
  • Automatic cleanup of stale data
  • Works across distributed systems

Cons

  • Data can be stale for up to TTL duration
  • No control over when data refreshes
  • Short TTLs increase database load
  • Long TTLs increase staleness

When to use:

  • Data that changes infrequently
  • Staleness is acceptable (recommendations, analytics)
  • You cannot track all places where data is cached

2. Event-Driven Invalidation (Delete on Write)

When data changes, explicitly delete the corresponding cache entry. The next read will repopulate the cache with fresh data.
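A sketch of delete-on-write combined with cache-aside reads. Plain dicts stand in for a real database and a real cache, and the function names are hypothetical:

```python
# `db` and `cache` are dicts standing in for a real database and cache.
db = {"user:1": {"name": "Alice"}}
cache = {}

def read_user(key):
    if key in cache:            # cache hit
        return cache[key]
    value = db[key]             # cache miss: load from the database
    cache[key] = value          # populate for the next reader
    return value

def update_user(key, value):
    db[key] = value             # write goes to the database first
    cache.pop(key, None)        # then delete (not update) the cache entry

read_user("user:1")                             # warms the cache
update_user("user:1", {"name": "Bob"})
assert "user:1" not in cache                    # invalidated on write
assert read_user("user:1") == {"name": "Bob"}   # next read repopulates
```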

Pros

  • Immediate consistency after invalidation
  • Cache only stores data that has been read
  • Simple mental model

Cons

  • Must track all cache keys affected by a change
  • First read after invalidation hits database
  • Race conditions possible (covered later)

When to use:

  • Data must be fresh after updates
  • You can enumerate all affected cache keys
  • Read-heavy workloads where occasional cache misses are acceptable

3. Write-Through

Update the cache and database together as a single operation. The cache is always up-to-date because every write updates both.
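A write-through sketch under the same dict stand-ins (illustrative names, not a real client library):

```python
db = {}
cache = {}

def write_through(key, value):
    db[key] = value      # persist first...
    cache[key] = value   # ...then update the cache as part of the same operation
    # If the cache write failed here, db and cache would diverge -- one
    # reason write-through is usually paired with a TTL as a safety net.

def read(key):
    if key in cache:
        return cache[key]
    return db.get(key)

write_through("user:1", "Alice")
assert cache["user:1"] == "Alice"   # no cache miss after the write
```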

Pros

  • Cache always contains latest data
  • No cache miss after writes
  • Simple consistency model

Cons

  • Higher write latency (two writes)
  • Cache stores data that may never be read
  • If cache write fails after DB write, inconsistency occurs

When to use:

  • Reads typically follow writes (user updates their profile, then views it)
  • Write latency is acceptable
  • Data is frequently accessed after being written

4. Write-Behind (Write-Back)

Write to the cache immediately, then persist to the database asynchronously. This prioritizes write performance over durability.
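A minimal write-behind sketch using a queue and a background worker thread (a real implementation would batch writes and handle worker failures; the names here are illustrative):

```python
import queue
import threading

db = {}
cache = {}
pending = queue.Queue()        # writes waiting to be persisted

def write_behind(key, value):
    cache[key] = value         # acknowledge the write immediately
    pending.put((key, value))  # persist asynchronously

def flusher():
    while True:
        item = pending.get()
        if item is None:       # sentinel: stop the worker
            break
        key, value = item
        db[key] = value        # real systems would batch these writes

worker = threading.Thread(target=flusher)
worker.start()
write_behind("counter", 1)     # returns before the DB write happens
pending.put(None)
worker.join()
assert db["counter"] == 1      # eventually persisted
```

Anything still sitting in `pending` when the cache process dies is lost, which is exactly the durability risk described above.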

Pros

  • Very low write latency
  • Can batch multiple writes to database
  • Absorbs write spikes

Cons

  • Data loss risk if cache fails before persistence
  • Complex failure handling
  • Eventually consistent
  • Not suitable for critical data

When to use:

  • Write-heavy workloads
  • Some data loss is acceptable
  • Need to absorb traffic spikes

5. Pub/Sub Invalidation

Broadcast invalidation messages to all cache instances using a publish-subscribe mechanism. This ensures all caches are notified when data changes.
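A toy in-process pub/sub fan-out: each app server keeps a local cache and subscribes to an invalidation channel. A real deployment would use Redis Pub/Sub or Kafka rather than this in-memory bus, and the class name is hypothetical:

```python
class InvalidationBus:
    """In-memory stand-in for a message broker's invalidation channel."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, local_cache):
        self.subscribers.append(local_cache)

    def publish_invalidation(self, key):
        for local_cache in self.subscribers:
            local_cache.pop(key, None)   # every node drops its local copy

bus = InvalidationBus()
server_a = {"user:1": "stale"}   # local caches on two app servers
server_b = {"user:1": "stale"}
bus.subscribe(server_a)
bus.subscribe(server_b)

bus.publish_invalidation("user:1")
assert "user:1" not in server_a and "user:1" not in server_b
```

With a real broker, delivery is asynchronous, so each server is briefly inconsistent while the message propagates.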

Pros

  • Works with local caches on multiple servers
  • Decouples writers from cache locations
  • Scalable to many subscribers

Cons

  • Additional infrastructure (message broker)
  • Messages can be lost or delayed
  • Eventual consistency

When to use:

  • Multiple application servers with local caches
  • Need to invalidate across a cluster
  • Can tolerate brief inconsistency during message propagation

Strategy Comparison

| Strategy        | Consistency | Complexity | Write Latency | Best For                       |
|-----------------|-------------|------------|---------------|--------------------------------|
| TTL             | Eventual    | Low        | None          | Infrequently changing data     |
| Delete on Write | Strong      | Medium     | Low           | Read-heavy, consistency needed |
| Write-Through   | Strong      | Medium     | High          | Read-after-write patterns      |
| Write-Behind    | Eventual    | High       | Very Low      | Write-heavy, some loss OK      |
| Pub/Sub         | Eventual    | High       | Low           | Distributed local caches       |

Race Conditions and Solutions

Race conditions are the primary source of cache invalidation bugs. Let us examine the most common scenarios and their solutions.

The Read-Update Race

This is the classic race condition described earlier: a reader loads an old value from the database, is delayed, and writes that value into the cache after a concurrent update has already deleted the entry.

Solution 1: Delayed Double Deletion

Delete the cache before and after the database update, with a delay to catch in-flight reads.

The delay must exceed the time for a read operation (DB query + cache write). This is not perfect but significantly reduces the race window.
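A hedged sketch of delayed double deletion using a timer thread (the 0.05 s delay and all names are illustrative; in production the delay would be tuned to your observed read latency):

```python
import threading
import time

cache = {}
db = {}

def update_with_double_delete(key, value, delay_seconds=0.5):
    cache.pop(key, None)   # first delete: drop the current entry
    db[key] = value        # commit the database write
    # The second delete fires after the longest plausible read
    # (DB query + cache write) to catch in-flight readers.
    threading.Timer(delay_seconds, lambda: cache.pop(key, None)).start()

cache["user:1"] = "old"
update_with_double_delete("user:1", "new", delay_seconds=0.05)
cache["user:1"] = "stale"      # an in-flight reader races in with old data
time.sleep(0.1)                # wait out the delayed second delete
assert "user:1" not in cache   # the stale entry was cleaned up
assert db["user:1"] == "new"
```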

Solution 2: Version-Based Caching

Include a version number with cached data. Only accept writes with the current version.
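A sketch of version-gated cache writes: store a monotonically increasing version alongside the value and refuse any write carrying an older version (function and key names are hypothetical):

```python
cache = {}   # key -> (version, value)

def cache_set_versioned(key, version, value):
    """Accept the write only if it is at least as new as what is cached."""
    current = cache.get(key)
    if current is not None and current[0] >= version:
        return False               # stale write: a newer version is cached
    cache[key] = (version, value)
    return True

assert cache_set_versioned("user:1", 2, "Bob")         # v2 accepted
assert not cache_set_versioned("user:1", 1, "Alice")   # late v1 rejected
assert cache["user:1"] == (2, "Bob")
```

The delayed reader from the race above would carry the old version and be rejected instead of overwriting fresh data.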

Solution 3: Compare-And-Set (CAS)

Only update the cache if the current value matches what you expect.
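A minimal compare-and-set sketch. A lock makes the check-and-write atomic here; real caches expose CAS natively (for example, Memcached's `cas` command or a Redis `WATCH`/`MULTI` transaction). The class name is illustrative:

```python
import threading

class CASCache:
    """Toy cache where a write succeeds only if the current value still
    matches what the caller last observed."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()   # makes check-and-write atomic

    def get(self, key):
        return self._data.get(key)

    def compare_and_set(self, key, expected, new_value):
        with self._lock:
            if self._data.get(key) != expected:
                return False            # someone changed it since we read
            self._data[key] = new_value
            return True

c = CASCache()
assert c.compare_and_set("k", None, "v1")       # initial set succeeds
assert not c.compare_and_set("k", None, "v2")   # stale expectation fails
assert c.get("k") == "v1"
```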

The Update-Update Race

When two concurrent updates happen, the cache might end up with the older value.

Solution: Delete Instead of Update

When invalidating on write, always delete rather than update the cache. Let the next read populate it.

This works because delete operations are idempotent. Two concurrent deletes result in the same state (cache entry removed), whereas two concurrent sets result in whoever runs last "winning."

The Database-Cache Ordering Problem

What if the database update succeeds but the cache invalidation fails?

Solution 1: Transactional Outbox

Write invalidation events to a database table within the same transaction. A background process reads the outbox and performs invalidations.
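A sketch of the outbox pattern using SQLite, where the data change and the invalidation event commit in one transaction (table and function names are hypothetical; in production a separate relay process drains the outbox):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT, "
             "cache_key TEXT)")

def update_user(user_id, name):
    with conn:   # one transaction: both rows commit, or neither does
        conn.execute("INSERT OR REPLACE INTO users (id, name) VALUES (?, ?)",
                     (user_id, name))
        conn.execute("INSERT INTO outbox (cache_key) VALUES (?)",
                     (f"user:{user_id}",))

def drain_outbox(cache):
    # A background relay would run this in a loop; we call it inline here.
    rows = conn.execute("SELECT id, cache_key FROM outbox").fetchall()
    for row_id, key in rows:
        cache.pop(key, None)   # perform the invalidation
        conn.execute("DELETE FROM outbox WHERE id = ?", (row_id,))
    conn.commit()

cache = {"user:1": "stale"}
update_user(1, "Alice")
drain_outbox(cache)
assert "user:1" not in cache
```

Because the event row is written transactionally with the data, the invalidation can be delayed by a crash but never silently lost.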

Solution 2: TTL as Safety Net

Always set a TTL even when using explicit invalidation. If invalidation fails, the cache entry will eventually expire.
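Cache Stampede and Mitigation

A related failure mode is the cache stampede (thundering herd): when a popular entry expires, every concurrent request misses at once and hammers the database. Common mitigations are per-key locking so only one caller recomputes, probabilistic early refresh, and serving stale data while revalidating in the background. A minimal sketch of the locking ("single flight") approach, with purely illustrative names:

```python
import threading
import time

cache = {}
locks = {}
locks_guard = threading.Lock()   # protects the per-key lock registry
db_queries = 0

def load_from_db(key):
    global db_queries
    db_queries += 1
    time.sleep(0.01)             # simulate a slow database query
    return f"value-for-{key}"

def get(key):
    if key in cache:
        return cache[key]
    with locks_guard:
        lock = locks.setdefault(key, threading.Lock())
    with lock:
        if key in cache:         # another thread may have refilled it
            return cache[key]
        value = load_from_db(key)
        cache[key] = value
        return value

# Ten concurrent misses on a hot key...
threads = [threading.Thread(target=get, args=("hot",)) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert db_queries == 1           # ...but only one database query
```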

Invalidation in Distributed Systems

Distributed systems add layers of complexity to cache invalidation. You must deal with network partitions, message ordering, and clock skew.

Multi-Region Challenges

When your application spans multiple regions, each region might have its own cache. Invalidating across regions introduces latency and potential message loss.

Solutions:

  1. Accept regional staleness: Use TTL and accept that remote regions will be eventually consistent
  2. Sticky sessions: Route users to their "home" region where their writes occur
  3. Global invalidation channel: Use a reliable messaging system (Kafka) to propagate invalidations

Message Ordering

Invalidation messages can arrive out of order, causing a newer value to be overwritten by an older invalidation.

Solution: Use Timestamps or Versions

Attach the source's commit timestamp (or a version number) to every invalidation message, and have each cache node ignore any message older than the state it already holds.
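A sketch of discarding out-of-order invalidations by timestamp, with illustrative names (real systems would use the database's commit timestamp or a log sequence number rather than these integers):

```python
cache = {}   # key -> (timestamp, value)

def apply_invalidation(key, msg_timestamp):
    """Drop the cached entry only if the message is as new as the entry."""
    entry = cache.get(key)
    if entry is not None and entry[0] > msg_timestamp:
        return False          # out-of-order message: ignore it
    cache.pop(key, None)
    return True

cache["user:1"] = (100, "Bob")                # value written at t=100
assert not apply_invalidation("user:1", 90)   # late message, ignored
assert apply_invalidation("user:1", 110)      # newer message, applied
assert "user:1" not in cache
```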

Network Partitions

During a network partition, invalidation messages might not reach all nodes. When the partition heals, some caches have stale data.

Solutions:

  1. TTL safety net: Always set TTLs so stale data eventually expires
  2. Periodic reconciliation: Background job that checks cache against database
  3. Read repair: On cache hit, occasionally verify against database
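The read-repair idea can be sketched as a probabilistic check on cache hits (all names illustrative; the verification rate trades extra database load against how quickly drift is detected):

```python
import random

db = {"user:1": "Bob"}
cache = {"user:1": "stale-Alice"}   # drifted after a lost invalidation

def get_with_read_repair(key, verify_probability=0.1):
    value = cache.get(key)
    if value is None:
        value = db[key]             # ordinary cache-aside miss path
        cache[key] = value
        return value
    # On a small fraction of hits, verify against the database and
    # repair the cache if it has drifted.
    if random.random() < verify_probability:
        fresh = db[key]
        if fresh != value:
            cache[key] = fresh
            return fresh
    return value

value = get_with_read_repair("user:1", verify_probability=1.0)
assert value == "Bob" and cache["user:1"] == "Bob"   # drift repaired
```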

Best Practices

These practices come from production experience with cache invalidation at scale.

1. Prefer Delete Over Update

When invalidating, delete the cache entry rather than updating it. Deletes are idempotent; updates can race.

2. Always Use TTL as a Backstop

Even with explicit invalidation, set a TTL. If invalidation fails for any reason, the TTL ensures eventual consistency.

3. Track Cache Dependencies

Document which cache keys are affected by each type of data change. Consider using a naming convention that makes dependencies clear.

4. Invalidate Close to the Source

Invalidate as close to the database write as possible. The longer the gap, the greater the risk of stale reads.

5. Monitor Invalidation Health

Track metrics to detect invalidation problems:

| Metric                 | What It Tells You                           |
|------------------------|---------------------------------------------|
| Cache hit rate         | Sudden drop may indicate over-invalidation  |
| Stale read rate        | Sample reads and compare cache vs DB        |
| Invalidation latency   | Time from DB write to cache invalidation    |
| Invalidation failures  | Failed attempts to delete/update cache      |

6. Test Invalidation Logic

Write tests specifically for race conditions and failure scenarios.

7. Consider Cache-Aside Pattern

The cache-aside pattern minimizes invalidation complexity because the cache only contains data that has been read. Writes go directly to the database and invalidate the cache.

Summary

Cache invalidation is hard because it requires coordinating state between two systems (cache and database) in the presence of concurrency, failures, and distributed systems challenges.

The key takeaways:

TTL is your foundation. Even with explicit invalidation, always set a TTL. It is your safety net when everything else fails.

Delete, do not update. When invalidating on write, delete the cache entry rather than updating it. Deletes are idempotent and avoid update races.

Race conditions are real. The read-update race is the most common bug. Solutions include delayed double deletion, versioning, and locking.

Prevent stampedes. When popular cache entries expire, use locking, probabilistic early refresh, or stale-while-revalidate to prevent thundering herds.

Distributed systems amplify the problem. Network partitions, message ordering, and cross-region latency all complicate invalidation. Design for eventual consistency and use TTLs as a backstop.

Monitor and test. Track cache hit rates and stale read rates. Write tests for concurrent scenarios.

The goal is not perfect consistency, which is often impossible without sacrificing performance. The goal is bounded staleness: knowing the maximum time your cache can be out of sync and ensuring that time is acceptable for your use case.