
Kappa Architecture

Last Updated: January 8, 2026


Ashish Pratap Singh

Lambda Architecture solves the latency-accuracy trade-off by running batch and stream processing in parallel. It works, but maintaining two separate codebases for the same logic is painful. Every change requires updates to both systems. Testing is complex. Bugs can cause the two implementations to diverge.

In 2014, Jay Kreps (co-creator of Apache Kafka) proposed a simpler alternative: Kappa Architecture. The core idea is radical. Instead of running batch and stream in parallel, use only stream processing. If you need to reprocess historical data, replay the event log through the same streaming system.

Kappa Architecture trades the complexity of dual systems for the challenge of making stream processing robust enough to handle all use cases. With modern streaming engines and event logs, this is increasingly practical.

In this chapter, you will learn:

  • The core idea of Kappa Architecture
  • How event log replay enables reprocessing
  • How Kappa compares with Lambda Architecture
  • When Kappa works and when it does not
  • Implementing Kappa with modern tools

The Core Idea

Kappa Architecture takes the messy “two pipelines” problem and replaces it with a single, consistent path.

The Key Insight

Everything is a stream.

Batch processing is just a special case. It is a stream with an end. If your stream processor can handle a continuous stream, it can also handle a bounded stream.

Kappa works when you have three things:

  1. An immutable log of all events: Every event is appended and never modified.
  2. A stream processor that can replay from any point: If your logic changes, you re-run the same computation by replaying events.
  3. Views that can be rebuilt: Your outputs are materialized views, not the source of truth.

If those conditions hold, you do not need a separate batch layer. Reprocessing is simply replaying history through the same pipeline.
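
As a sketch of what condition 2 looks like in practice, here is a replay loop using the confluent-kafka Python client. The broker address, the `events` topic, and the `apply_event` function are placeholders, not part of any standard:

```python
from confluent_kafka import Consumer, TopicPartition, OFFSET_BEGINNING

def apply_event(payload: bytes) -> None:
    """Placeholder: the same transformation logic your live job runs."""
    ...

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",   # assumed broker address
    "group.id": "rebuild-view-v2",           # fresh group: no committed offsets
    "enable.auto.commit": False,
})

# Look up the topic's partitions and pin every one back to offset 0.
metadata = consumer.list_topics("events", timeout=10)
partitions = [
    TopicPartition("events", p, OFFSET_BEGINNING)
    for p in metadata.topics["events"].partitions
]
consumer.assign(partitions)

# Replay history; once it catches up, the same loop keeps processing live events.
while True:
    msg = consumer.poll(1.0)
    if msg is None:
        continue
    if msg.error():
        raise RuntimeError(msg.error())
    apply_event(msg.value())
```

Because the group id is new, the live job's committed offsets are untouched.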

Reprocessing in Kappa

Reprocessing is the moment Kappa becomes real.

When you find a bug, change business rules, or want to add a new derived field, you do not spin up a separate batch job. You do a controlled cutover.

The common pattern

  1. Start a new streaming job with the updated code (v2)
  2. Replay events from the log (often from the beginning, sometimes from a checkpoint date)
  3. Write results into new output views (v2 views)
  4. Wait until v2 catches up to real-time
  5. Switch reads to v2
  6. Delete v1 views (optional, after validation)

Why this is powerful

  • You keep one codepath for transformation logic.
  • You avoid the classic Lambda problem where batch and stream versions drift apart.
  • You can iterate faster because every change follows the same operational playbook.

The cost is that replay can be expensive if your history is huge and your pipeline is heavy.

Lambda vs Kappa


Detailed Comparison

| Aspect | Lambda | Kappa |
| --- | --- | --- |
| Processing systems | Two (batch + stream) | One (stream only) |
| Codebase | Duplicate logic | Single codebase |
| Reprocessing | Re-run batch job | Replay events through stream |
| Complexity | Operational (two systems) | Conceptual (stream everything) |
| Latency | Batch: hours; stream: seconds | Always streaming latency |
| Accuracy | Batch is authoritative | Stream must be robust |
| Event log retention | Optional for batch | Required for reprocessing |
| Maturity | Proven at scale | Newer, evolving |

When Lambda Still Wins

Kappa is not automatically better. There are situations where Lambda is still the more practical choice.

| Scenario | Why Lambda |
| --- | --- |
| Very complex aggregations | Some computations need batch semantics |
| Petabyte-scale history | Replaying years of data takes too long |
| Team expertise | Team knows batch; streaming is new |
| Existing infrastructure | Hadoop cluster already running |
| Strict accuracy requirements | Batch provides stronger guarantees |

When Kappa Wins

Kappa shines when simplicity and speed of iteration matter, and when your system is already event-driven.

| Scenario | Why Kappa |
| --- | --- |
| Simpler operations | One pipeline to deploy, monitor, and debug |
| Unified logic | Single codebase, no divergence |
| Fast iteration | One deployment path for changes |
| Stream-first use case | You are building real-time features anyway |
| Modern stack | Using Kafka, Flink, cloud-native tooling |

The Event Log

The event log is the foundation of Kappa. It is the system of record.

Your views can be deleted and rebuilt. Your processors can be upgraded. Your source of truth remains the log.

Requirements for the Event Log

| Requirement | Why |
| --- | --- |
| Immutable | Events never change; they are only appended |
| Ordered | Within a partition, events have a deterministic order |
| Durable | Replicated storage that survives failures |
| Replayable | Can seek to any offset and replay |
| Long retention | History must be kept for reprocessing |

For true Kappa, you typically want long retention, often “effectively infinite” through tiered storage.

Apache Kafka as Event Log

Kafka is the most common implementation because it supports the event-log model naturally:

  • Partitioned log with ordering per partition
  • Consumer groups and offsets
  • Ability to replay from any offset
  • High throughput reads and writes
  • Retention controls

It also supports multiple consumers at once, for example:

  • the main streaming job
  • a separate backfill or reprocessing job
  • downstream consumers such as analytics or monitoring
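
Because each consumer group tracks its own offsets, all three can read the same topic side by side. A minimal sketch, with illustrative group names:

```python
from confluent_kafka import Consumer

def make_consumer(group_id: str) -> Consumer:
    c = Consumer({
        "bootstrap.servers": "localhost:9092",
        "group.id": group_id,            # each group keeps independent offsets
        "auto.offset.reset": "earliest",
    })
    c.subscribe(["events"])
    return c

live_job   = make_consumer("analytics-v1")    # serving current traffic
backfill   = make_consumer("analytics-v2")    # replaying history in parallel
monitoring = make_consumer("ops-monitoring")  # independent downstream reader
```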

Retention Strategies

Retention is a design decision. It shapes what kind of “Kappa” you can realistically support.

| Strategy | Description | Use Case |
| --- | --- | --- |
| Time-based | Keep 7 days, 30 days, etc. | Logs, metrics |
| Size-based | Keep 1 TB per partition | When storage is limited |
| Compacted | Keep the latest value per key | Changelog streams |
| Infinite | Never delete | Full reprocessing capability |

For true Kappa Architecture, you need long or infinite retention.
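
As an illustration, retention can be set per topic with the AdminClient. The topic names and values here are examples only; note that `alter_configs` overwrites unlisted topic configs on older brokers, and newer clients offer `incremental_alter_configs`:

```python
from confluent_kafka.admin import AdminClient, ConfigResource

admin = AdminClient({"bootstrap.servers": "localhost:9092"})

# Time-based retention on a raw event topic: keep 30 days.
events = ConfigResource(
    ConfigResource.Type.TOPIC, "events",
    set_config={"retention.ms": str(30 * 24 * 3600 * 1000)},
)

# Compaction on a changelog topic: keep the latest value per key indefinitely.
changelog = ConfigResource(
    ConfigResource.Type.TOPIC, "user-profiles-changelog",
    set_config={"cleanup.policy": "compact"},
)

for future in admin.alter_configs([events, changelog]).values():
    future.result()   # raise if the broker rejected the change
```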

Tiered Storage

Tiered storage makes long retention practical by splitting data across cost tiers:

  • Hot tier for recent data on fast local disks
  • Cold tier for older data in object storage (S3, GCS, ADLS)

Modern Kafka supports tiered storage natively (KIP-405).
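
A sketch of creating a topic that opts into it, assuming a cluster where broker-side remote storage is already configured; the property names follow KIP-405 but vary by Kafka version, so treat them as illustrative:

```python
from confluent_kafka.admin import AdminClient, NewTopic

admin = AdminClient({"bootstrap.servers": "localhost:9092"})

# Hypothetical event-log topic: recent segments stay on local disk,
# older segments are offloaded to object storage by the broker.
topic = NewTopic(
    "events",
    num_partitions=100,
    replication_factor=3,
    config={
        "remote.storage.enable": "true",                  # per-topic opt-in
        "retention.ms": "-1",                             # keep full history
        "local.retention.ms": str(7 * 24 * 3600 * 1000),  # ~7 days hot
    },
)

for future in admin.create_topics([topic]).values():
    future.result()
```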

You still get the same log abstraction. Consumers can replay old data even if it lives in the cold tier. You get reprocessing capability without paying premium storage costs forever.

Reprocessing in Practice

Kappa sounds simple on paper: replay the log, rebuild the views, move on. In practice, the trick is doing it without breaking production, confusing users, or losing confidence in your numbers.

The good news is that the workflow is well understood and repeatable.

The Reprocessing Workflow

Think of reprocessing as a controlled migration from v1 to v2 of your streaming job and its output views.

Phase 1: Normal processing

  • The application reads from Old View (v1)
  • Old Stream Job (v1) consumes current events from the event log
  • Old View stays up to date

Phase 2: Deploy new version

  • You deploy New Stream Job (v2) with the fixed logic
  • It starts consuming from the event log at offset 0 (or another chosen point)
  • It writes to a New View (v2)

At this moment, you are running two pipelines:

  • v1 continues serving users
  • v2 is replaying history in the background

Phase 3: Catch up

  • v2 replays historical events as fast as possible
  • Eventually, it reaches “now”
  • At that point, v2 is producing results at the same pace as v1

Phase 4: Cutover

  • You switch the application to query New View (v2)
  • This switch should be atomic from the application’s point of view

Phase 5: Decommission

  • You stop Old Stream Job (v1)
  • You delete or archive Old View (v1)
  • v2 becomes the new normal

The key principle: never mutate the old view in place. Build the new one side-by-side, then switch.
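
One common way to make the switch atomic is a level of indirection: readers resolve the active view through a single pointer. A sketch with Redis as the pointer store; every key and view name here is made up:

```python
import redis

r = redis.Redis()  # assumed pointer store; any transactional store works

POINTER = "views:events_by_hour:active"

def active_view() -> str:
    """Readers never hardcode a view name; they follow the pointer."""
    return r.get(POINTER).decode()

def read_metric(hour: int):
    return r.hget(active_view(), hour)

def cutover() -> None:
    """One atomic write flips every reader from v1 to v2."""
    r.set(POINTER, "events_by_hour_v2")
```

Rolling back is the same single write in the other direction, since the old view is untouched by the switch.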

Challenges with Reprocessing

| Challenge | Solution |
| --- | --- |
| Time to reprocess | Parallelize; add more resources |
| Storage for two views | Accept the temporary cost during the transition |
| Cutover coordination | Atomic switch in the load balancer or application |
| Consistency during transition | Accept brief inconsistency |
| Ordering guarantees | Use the same partitioning as the original job |

A practical mindset: during reprocessing, you are intentionally running in a “migration mode.” You plan for temporary cost and temporary complexity, but you make the cutover clean.

Parallel Reprocessing

Reprocessing speed is often limited by how fast you can read and process the log. Partitioning is your friend.

If the event log has 100 partitions, you can assign them across multiple instances of the new job:

  • Instance 1: partitions 0–24
  • Instance 2: partitions 25–49
  • Instance 3: partitions 50–74
  • Instance 4: partitions 75–99

All instances write into the same New View, usually through a sink that can handle concurrent writers safely.
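
A sketch of that static range assignment, again with the confluent-kafka client; the partition counts, topic, and broker address are placeholders:

```python
from confluent_kafka import Consumer, TopicPartition, OFFSET_BEGINNING

NUM_PARTITIONS = 100
NUM_INSTANCES = 4

def partitions_for(instance_id: int) -> range:
    """Static range assignment: instance 0 gets 0-24, instance 1 gets 25-49, ..."""
    per_instance = NUM_PARTITIONS // NUM_INSTANCES
    start = instance_id * per_instance
    return range(start, start + per_instance)

def start_replay(instance_id: int) -> Consumer:
    consumer = Consumer({
        "bootstrap.servers": "localhost:9092",
        "group.id": f"reprocess-v2-{instance_id}",  # manual assignment, so the
        "enable.auto.commit": False,                # group is just a label here
    })
    consumer.assign([
        TopicPartition("events", p, OFFSET_BEGINNING)
        for p in partitions_for(instance_id)
    ])
    return consumer
```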

Kappa Architecture Implementation

A complete Kappa setup typically has these pieces:

1. Event log

Kafka (or similar) with long retention. This is your system of record.

2. Stream processing engine

Flink, Kafka Streams, or Spark Streaming, often doing stateful processing and windowing.

3. Serving views

Stores optimized for query patterns:

  • OLAP: Druid, Pinot
  • Key-value or wide-column: Cassandra
  • Low-latency: Redis for hot aggregates

4. Cache layer

Optional but common, especially for dashboards and APIs that need very fast reads.

5. API and dashboards

The application queries the serving views, not the event log.
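
To make the shape concrete, here is a deliberately tiny end-to-end sketch under the assumptions above: Kafka as the log, a plain Python loop standing in for the stream processor, and Redis as the serving view. The `pageviews` topic and event schema are invented for illustration:

```python
import json

import redis
from confluent_kafka import Consumer

view = redis.Redis()  # serving view: hourly pageview counts per page
consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "pageviews-by-hour-v1",
    "auto.offset.reset": "earliest",   # a fresh deployment replays the log
})
consumer.subscribe(["pageviews"])

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    event = json.loads(msg.value())            # {"timestamp": ..., "page": ...}
    hour = event["timestamp"] // 3600 * 3600   # tumbling one-hour bucket
    # The view is derived state: delete it, replay the topic, and it comes back.
    view.hincrby(f"pageviews:{hour}", event["page"], 1)
```

Dashboards and APIs read the Redis hashes; nothing downstream queries Kafka directly.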

Technology Stack

| Component | Options |
| --- | --- |
| Event Log | Kafka, Pulsar, Kinesis, Event Hubs |
| Stream Processor | Flink, Kafka Streams, Spark Streaming |
| Serving Store | Cassandra, Druid, Pinot, Redis |
| Schema Registry | Confluent Schema Registry, AWS Glue |

Stream Processing Requirements

Kappa moves the burden of correctness to streaming. If your stream processing is flimsy, your entire architecture becomes fragile.

A practical Kappa-capable processor needs:

| Requirement | Why |
| --- | --- |
| Exactly-once semantics | Accurate results despite failures |
| Stateful processing | Maintain aggregations across events |
| Checkpointing | Recover from failures without reprocessing everything |
| Windowing | Handle time-based aggregations |
| Late event handling | Process events that arrive out of order |

Handling Edge Cases

Late Arriving Events

Events can arrive out of order. How do you handle an event from 2 hours ago?

Common strategies:

  • Watermarks: estimate event-time progress and allow lateness
  • Allowed lateness window: accept events up to N minutes late and update results
  • Late data side stream: route very late events to another topic for separate handling
  • Continuous correction: update views when late events arrive, even if it changes past windows

Which one you choose depends on what “correctness” means for your product. Dashboards often allow corrections. Billing systems often require strict cutoffs and explicit adjustments.
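
To see how these strategies fit together, here is a stripped-down, plain-Python illustration of a watermark with allowed lateness and a side output. A real engine such as Flink implements the same idea with far more machinery; all constants are arbitrary:

```python
WINDOW = 3600            # one-hour tumbling windows (seconds)
ALLOWED_LATENESS = 600   # accept events up to 10 minutes late

windows = {}             # window start -> event count
late_events = []         # side output for events that missed the cutoff
watermark = 0            # naive estimate of event-time progress

def on_event(event_time: int) -> None:
    global watermark
    watermark = max(watermark, event_time)   # watermark only moves forward
    start = event_time // WINDOW * WINDOW
    if watermark > start + WINDOW + ALLOWED_LATENESS:
        late_events.append(event_time)       # too late: route to side stream
    else:
        # May update a window that already "closed": continuous correction.
        windows[start] = windows.get(start, 0) + 1
```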

Windowed Aggregations

For queries like "events per hour," you need to define windows over the stream.

Common window types:

  • Tumbling windows: fixed, non-overlapping intervals. Example: 09:00–10:00, 10:00–11:00
  • Sliding windows: overlapping intervals. Example: last 60 minutes updated every minute
  • Session windows: dynamic, based on activity gaps. Example: a session ends after 30 minutes of inactivity

Window choice is not just a technical detail. It directly affects how users interpret the metric.
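
The differences are easiest to see as window-assignment functions. A plain-Python sketch with arbitrary sizes; `sessions` assumes a non-empty, sorted list of timestamps:

```python
def tumbling(ts: int, size: int = 3600) -> list[int]:
    """Each event falls into exactly one fixed bucket, e.g. 09:00-10:00."""
    return [ts // size * size]

def sliding(ts: int, size: int = 3600, step: int = 60) -> list[int]:
    """Overlapping buckets: a 60-minute window advancing every minute."""
    first = (ts - size + step) // step * step
    return list(range(max(first, 0), ts + 1, step))

def sessions(timestamps: list[int], gap: int = 1800) -> list[list[int]]:
    """Split sorted timestamps into sessions at gaps over 30 minutes."""
    out, current = [], [timestamps[0]]
    for prev, ts in zip(timestamps, timestamps[1:]):
        if ts - prev > gap:
            out.append(current)
            current = []
        current.append(ts)
    out.append(current)
    return out
```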

State Management

Stream processors maintain state for aggregations, such as running counts and open windows, and that state must survive failures.

The typical pattern:

  • keep state locally (often RocksDB) for fast reads and updates
  • periodically checkpoint to durable storage (S3 or HDFS)
  • on failure, restore from the checkpoint and replay from the last consistent point

This is what makes long-running streaming jobs reliable. Without state plus checkpointing, you either lose accuracy or you are forced to replay huge ranges after every incident.
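
A toy version of that pattern, with a local dict as state and a JSON file standing in for RocksDB plus S3; the shape, not the storage, is the point:

```python
import json
import os

STATE_FILE = "checkpoint.json"
state = {"counts": {}, "offset": 0}   # in-memory stand-in for RocksDB

def restore() -> int:
    """On startup, load the last checkpoint and return the offset to resume from."""
    global state
    if os.path.exists(STATE_FILE):
        with open(STATE_FILE) as f:
            state = json.load(f)
    return state["offset"]            # replay from here, not from zero

def checkpoint() -> None:
    """Persist state together with the offset it corresponds to, atomically."""
    tmp = STATE_FILE + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, STATE_FILE)       # atomic rename: never a torn checkpoint

def process(offset: int, key: str) -> None:
    state["counts"][key] = state["counts"].get(key, 0) + 1
    state["offset"] = offset
    if offset % 10_000 == 0:          # checkpoint periodically, not per event
        checkpoint()
```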

Kappa vs Lambda: Making the Choice

Choosing between Kappa and Lambda is not about picking the “modern” one. It is about matching the architecture to your constraints.

A good decision comes down to two questions:

  1. Do you need real-time results?
  2. If yes, can streaming handle the full workload with the level of correctness you need?

If the answer to the second question is uncertain, Lambda often becomes the safer choice. If the answer is clearly yes, Kappa usually wins on simplicity.


Summary Comparison

| Factor | Choose Kappa | Choose Lambda |
| --- | --- | --- |
| Operational simplicity | ✓ One system | Two systems |
| Code simplicity | ✓ One codebase | Duplicate logic |
| Reprocessing speed | Slower (replay) | ✓ Batch is fast |
| Accuracy guarantees | Stream must be robust | ✓ Batch is authoritative |
| Team expertise | Need streaming skills | Batch is familiar |
| Petabyte scale | Challenging | ✓ Batch handles it well |

Summary

Kappa Architecture simplifies data processing by using only stream processing:

  • Core idea: Everything is a stream. Reprocess by replaying events, not running batch.
  • Event log: The foundation. Kafka with long retention enables reprocessing.
  • Single codebase: Same logic for real-time and reprocessing, no dual maintenance.
  • Reprocessing: Deploy new job, replay from event log, switch views, decommission old.
  • Trade-offs: Simpler operations but requires robust streaming and can be slow for massive replays.
  • Best for: Stream-first use cases, teams with streaming expertise, modern cloud-native stacks.
  • Not for: Petabyte-scale historical processing, complex batch-only computations.