What is Caching?

Last Updated: January 16, 2026

Ashish Pratap Singh

3 min read

Caching is one of those ideas that feels almost too simple, until you realize it powers the speed of nearly every modern app you use.

At a high level, a cache is a fast storage layer that keeps copies of frequently used data so future requests can be served quickly, without repeatedly hitting slower systems like databases or external APIs.

Done well, caching can dramatically reduce latency, lower infrastructure cost, and improve reliability under traffic spikes.

Why Caching Matters

Consider a social media application. When a user opens their feed, the application must:

  1. Authenticate the user (database query)
  2. Fetch the user's profile (database query)
  3. Retrieve the list of followed accounts (database query)
  4. Fetch recent posts from followed accounts (many database queries)
  5. Get like counts and comment counts for each post (more queries)
  6. Retrieve profile pictures for all post authors (even more queries)

Without caching, a single feed load might trigger 50+ database queries. Multiply by millions of users, and no database can keep up.

Performance Impact:

| Metric | Without Cache | With Cache | Improvement |
| --- | --- | --- | --- |
| Response time | 500ms | 50ms | 10x faster |
| Database queries/sec | 500,000 | 50,000 | 10x reduction |
| Database CPU | 95% | 30% | Headroom for growth |
| Infrastructure cost | $50,000/month | $20,000/month | 60% savings |

The Anatomy of a Cache

At its core, a cache is a key-value store optimized for fast lookups.

Key Components

Keys: Unique identifiers for cached data. Good keys are:

  • Deterministic: The same request always produces the same key
  • Descriptive: You can understand what data the key represents
  • Collision-free: Different data should never share a key
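
For instance, a minimal key-builder in Python might look like the sketch below (the `feed:` prefix and version segment are illustrative conventions, not requirements):

```python
def feed_cache_key(user_id: int, page: int) -> str:
    """Build a deterministic, descriptive key for one page of a user's feed."""
    # The version segment lets you change the cached format later without
    # colliding with entries written by older application code.
    return f"feed:v2:user:{user_id}:page:{page}"

# Deterministic: the same request always produces the same key.
assert feed_cache_key(42, 1) == "feed:v2:user:42:page:1"
```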

Values: The cached data itself. Can be:

  • Serialized objects (JSON, Protocol Buffers, MessagePack)
  • Raw strings or binary data
  • Pre-computed results (HTML fragments, aggregated statistics)

Metadata: Information about the cached entry:

  • Creation timestamp
  • Expiration time (TTL)
  • Access count (for eviction decisions)
  • Size in bytes
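
Taken together, a bare-bones in-memory representation of an entry and its metadata could look like this sketch (field names are illustrative):

```python
import time
from dataclasses import dataclass, field

@dataclass
class CacheEntry:
    value: bytes                  # serialized payload (JSON, MessagePack, ...)
    ttl_seconds: float            # expiration time (TTL)
    created_at: float = field(default_factory=time.time)
    access_count: int = 0         # input for eviction decisions

    def is_expired(self) -> bool:
        return time.time() - self.created_at > self.ttl_seconds
```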

Cache Layers

Caching happens at multiple levels in a system. Each layer has different characteristics:

Browser Cache

The closest cache to the user. Stores static assets and API responses based on HTTP headers.

| Characteristic | Value |
| --- | --- |
| Location | User's device |
| Latency | ~0ms |
| Size | 50MB to 2GB, typically |
| Control | HTTP headers (Cache-Control, ETag) |
| Best for | Static assets, infrequently changing data |
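
Since the browser cache obeys whatever the response headers say, the server's only job is to set them. A minimal sketch using Flask (the framework, route, and one-hour max-age are assumptions for illustration):

```python
from flask import Flask, send_file

app = Flask(__name__)

@app.get("/logo.png")
def logo():
    resp = send_file("static/logo.png")
    # Any cache (browser or CDN) may store this asset and reuse it for an hour.
    resp.headers["Cache-Control"] = "public, max-age=3600"
    return resp
```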

CDN Cache

Content Delivery Networks cache content at edge locations worldwide.

| Characteristic | Value |
| --- | --- |
| Location | Edge servers globally |
| Latency | 10-50ms (depends on geographic proximity) |
| Size | Terabytes across the network |
| Control | HTTP headers, CDN configuration |
| Best for | Static content, media, public pages |

Application Cache

In-memory cache within the application process itself.

| Characteristic | Value |
| --- | --- |
| Location | Application server memory |
| Latency | <1ms (in-process) |
| Size | Limited by server RAM |
| Control | Application code |
| Best for | Hot data, computed values, session state |

The downside: each application instance has its own cache, leading to inconsistency and memory duplication across instances.
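
In Python, the standard library's `functools.lru_cache` is a ready-made in-process cache, and it also makes the duplication problem concrete: memoized results live only inside one process. A runnable sketch (the 50ms sleep and dummy rate stand in for a slow query):

```python
import time
from functools import lru_cache

@lru_cache(maxsize=10_000)       # bounded; least-recently-used entries evicted
def get_exchange_rate(currency: str) -> float:
    time.sleep(0.05)             # stand-in for a slow database or API call
    return 1.08                  # dummy value for the sketch

get_exchange_rate("EUR")   # miss: takes ~50ms, result stored in process memory
get_exchange_rate("EUR")   # hit: returns in well under a millisecond
# Caveat: every application instance builds its own private copy of this cache.
```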

Distributed Cache

A separate caching service shared by all application instances. Redis and Memcached are the most common choices.

| Characteristic | Value |
| --- | --- |
| Location | Dedicated cache servers |
| Latency | 1-5ms (network hop) |
| Size | Hundreds of GB to TB |
| Control | Application code, cache client |
| Best for | Shared state, session storage, database query results |
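
With a distributed cache, every instance talks to the same store over the network. A sketch using the redis-py client (the hostname, key, and 30-minute TTL are assumptions):

```python
import redis

# One shared cache, reachable from every application instance.
r = redis.Redis(host="cache.internal", port=6379)

r.setex("session:abc123", 1800, "user_id=42")   # write with a 30-minute TTL
value = r.get("session:abc123")                 # one ~1-5ms network round trip
```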

Database Buffer Pool

Databases maintain their own cache of frequently accessed pages in memory.

| Characteristic | Value |
| --- | --- |
| Location | Database server memory |
| Latency | <1ms for cached pages |
| Size | Configured (often 70-80% of RAM) |
| Control | Database configuration |
| Best for | Hot table and index pages; automatic and transparent to the application |

Cache Hit and Miss

When the application requests data from the cache, two things can happen:

Cache Hit: The data exists in the cache. Return it immediately.

Cache Miss: The data is not in the cache. Fetch from the source, optionally store in cache, then return.
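
The hit/miss flow is easiest to see in code. A minimal get-or-load sketch, where a plain dict stands in for a real cache client and `load_user_from_db` is a hypothetical loader:

```python
cache: dict[str, object] = {}

def get_user(user_id: int):
    key = f"user:{user_id}"
    if key in cache:
        return cache[key]                  # cache hit: return immediately
    value = load_user_from_db(user_id)     # cache miss: fetch from the source
    cache[key] = value                     # store so the next request hits
    return value
```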

Cache Hit Ratio

The percentage of requests served from cache versus total requests:

| Hit Ratio | Interpretation |
| --- | --- |
| > 95% | Excellent. Cache is highly effective. |
| 80-95% | Good. Normal for most applications. |
| 50-80% | Moderate. May need tuning or a different caching strategy. |
| < 50% | Poor. Cache may be undersized or the data may not be cache-friendly. |

A 90% hit ratio means 90% of requests avoid the database entirely. If your database can handle 10,000 QPS, the system as a whole can then serve roughly 100,000 QPS.
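
The arithmetic behind that claim: only misses reach the database, so the sustainable request rate is the database's capacity divided by the miss ratio.

```python
db_capacity_qps = 10_000
hit_ratio = 0.90

# Only the misses (1 - hit_ratio) ever reach the database.
effective_qps = db_capacity_qps / (1 - hit_ratio)
print(f"{effective_qps:,.0f}")  # 100,000
```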

What to Cache

Not all data benefits equally from caching. Good candidates:

High Cache Value

| Data Type | Why It Works |
| --- | --- |
| Read-heavy data | Same data requested many times |
| Expensive computations | Aggregations, joins, transformations |
| Slow data sources | External APIs, legacy systems |
| Stable data | Configuration, reference data |

Poor Cache Candidates

| Data Type | Why It Does Not Work |
| --- | --- |
| Write-heavy data | Cache invalidation overhead exceeds benefit |
| Unique requests | Each request needs different data |
| Large objects | Consume cache memory quickly |
| Time-sensitive data | Stale data is unacceptable |

The 80/20 Rule

In most applications, 20% of the data serves 80% of the requests. Focus caching efforts on that hot 20%.

Cache Consistency

The hardest problem in caching is keeping cached data consistent with the source of truth. When the underlying data changes, the cache can become stale.

Consistency Approaches

| Approach | How It Works | Trade-off |
| --- | --- | --- |
| TTL-based | Data expires after a set time period | Simple, but allows staleness up to the TTL |
| Invalidation | Explicitly remove or update cache entries on changes | Consistent, but complex to implement |
| Write-through | Update cache and database together | Consistent, but higher write latency |
| Eventual consistency | Accept temporary staleness | High performance, but requires tolerance for stale reads |

The right approach depends on your consistency requirements. A product price being stale for 5 minutes might be acceptable. A bank account balance being stale for 5 seconds is not.
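
As an illustration of the write-through row above, both stores are updated on the same code path, so readers never observe the cache lagging behind the database. A sketch with hypothetical `db` and `cache` clients:

```python
def update_price(db, cache, product_id: int, price: float) -> None:
    # Write-through: update the source of truth and the cached copy together,
    # trading extra write latency for read-time consistency.
    db.save_price(product_id, price)                    # hypothetical DB call
    cache.set(f"price:{product_id}", price, ttl=3600)   # hypothetical cache call
```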

Caching Anti-Patterns

Caching is not free. These patterns often cause more problems than they solve:

Cache Everything

Blindly caching all data leads to:

  • Memory exhaustion
  • Low hit ratios (cache filled with rarely accessed data)
  • Increased complexity without proportional benefit

Infinite TTL

Data that never expires:

  • Becomes stale indefinitely
  • Requires explicit invalidation for every change
  • Creates subtle bugs when invalidation is missed

Cache as Primary Storage

Treating the cache as the source of truth:

  • Data loss on cache failure or eviction
  • No durability guarantees
  • Recovery becomes impossible

The Thundering Herd

When a popular cache entry expires, many requests simultaneously hit the database.

Solutions include locking (only one request fetches), probabilistic early expiration, and background refresh.
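
A sketch of the locking approach: the first request to miss acquires a per-key lock and rebuilds the entry while the others wait, so the database sees one query instead of thousands. (An in-process lock only protects a single instance; coordinating across servers would need a distributed lock.)

```python
import threading
from collections import defaultdict

cache: dict[str, object] = {}
locks = defaultdict(threading.Lock)   # one lock per cache key

def get_with_lock(key: str, load):
    if key in cache:
        return cache[key]
    with locks[key]:                # only one thread rebuilds this key
        if key in cache:            # re-check: another thread may have finished
            return cache[key]
        value = load()              # single trip to the database
        cache[key] = value
        return value
```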

Cache in Distributed Systems

In distributed systems, caching introduces additional considerations:

Consistency Across Nodes

When multiple application servers share a cache, or when cached data is replicated across nodes, entries must be invalidated on every node, and replication introduces a delay before all nodes are consistent.

Data Partitioning

Large caches partition data across multiple nodes. Consistent hashing minimizes data movement when nodes are added or removed.
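
A compact consistent-hash ring, as a sketch (the virtual-node count and the MD5 hash are illustrative choices):

```python
import bisect
import hashlib

class HashRing:
    """Map keys to nodes so adding or removing a node moves only ~1/N of keys."""

    def __init__(self, nodes: list[str], vnodes: int = 100):
        self.ring: list[tuple[int, str]] = []
        for node in nodes:
            # Virtual nodes smooth out the key distribution across nodes.
            for i in range(vnodes):
                self.ring.append((self._hash(f"{node}#{i}"), node))
        self.ring.sort()

    @staticmethod
    def _hash(s: str) -> int:
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def node_for(self, key: str) -> str:
        # First ring position clockwise from the key's hash, wrapping at the end.
        i = bisect.bisect(self.ring, (self._hash(key), ""))
        return self.ring[i % len(self.ring)][1]

ring = HashRing(["cache-a", "cache-b", "cache-c"])
print(ring.node_for("user:42"))  # the node that owns this key
```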

Failure Handling

What happens when the cache is unavailable?

| Strategy | Behavior | Use Case |
| --- | --- | --- |
| Fail open | Bypass the cache, hit the database directly | Cache is an optional optimization |
| Fail closed | Return an error to the user | Cache data is critical |
| Graceful degradation | Serve stale data if available | Availability over consistency |
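
A sketch of the fail-open strategy from the table: any cache error is logged and treated as a miss, so users get a slower response instead of an error page (`cache_client` and `load_from_db` are hypothetical):

```python
import logging

def get_fail_open(cache_client, key: str):
    try:
        value = cache_client.get(key)    # hypothetical cache client
        if value is not None:
            return value                 # normal hit path
    except Exception:
        # Deliberately broad: any cache failure becomes a miss, not an outage.
        logging.exception("cache unavailable; falling back to the database")
    return load_from_db(key)             # hypothetical loader
```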

Measuring Cache Performance

Key metrics to monitor:

| Metric | What It Tells You |
| --- | --- |
| Hit ratio | Cache effectiveness |
| Latency (p50, p99) | Response time impact |
| Memory usage | Capacity utilization |
| Eviction rate | Whether the cache is undersized |
| Miss latency | Database impact on misses |

A sudden drop in hit ratio or spike in evictions signals a problem: either traffic patterns changed, the cache is undersized, or keys are being invalidated too aggressively.

Summary

Caching is fundamental to building scalable systems. Key takeaways:

  • Caching stores frequently accessed data in a faster storage layer to reduce latency and backend load.
  • Multiple cache layers exist: browser, CDN, application, distributed cache, and database buffer pool. Each has different characteristics and use cases.
  • Hit ratio is the key metric. A 90% hit ratio means 10x effective capacity improvement.
  • Cache the right data: read-heavy, expensive to compute, stable data. Avoid caching write-heavy or unique request data.
  • Consistency is the hard problem. Choose between TTL-based expiration, explicit invalidation, or write-through patterns based on your requirements.
  • Watch for anti-patterns: caching everything, infinite TTLs, treating cache as primary storage, and thundering herd.

Understanding what caching is and why it matters sets the stage for the next question: how exactly should your application interact with the cache? This brings us to caching patterns, starting with the most common one, the cache-aside pattern, where the application explicitly manages what goes in and out of the cache.