Last Updated: January 8, 2026
Throughout this section, we have discussed stream processing as a concept. We have seen it as the speed layer in Lambda Architecture and as the sole processing paradigm in Kappa Architecture.
But how do streaming engines actually work? What makes them capable of processing millions of events per second while maintaining exactly-once semantics?
Streaming engines are complex distributed systems that handle continuous data flows. They must maintain state across events, handle failures gracefully, and provide the primitives for building real-time applications. Understanding how they work helps you choose the right engine and use it effectively.
In this chapter, you will learn:

- The core concepts every streaming engine shares: event time, watermarks, windows, and delivery guarantees
- How Apache Flink, Kafka Streams, and Spark Structured Streaming work under the hood
- How to choose the right engine for your workload
Before diving into specific engines, let us establish the foundational concepts they all share. The engine changes, but the mental model stays the same.
In streaming, time is surprisingly tricky because events rarely arrive exactly when they happen.
| Time Type | Definition | Use Case |
|---|---|---|
| Event time | When the event actually occurred | Accurate analytics |
| Processing time | When the engine processes it | Simple, low-latency |
| Ingestion time | When event enters the system | Compromise |
Most production analytics pipelines prefer event time because correctness matters, even if it adds complexity.
If you use event time, you immediately hit a problem: events can arrive late. So when do you decide a window is “done”?
That is what watermarks solve. Watermarks track progress through event time, signaling that no events earlier than the watermark will arrive.
So if your watermark is 10:00, the engine is saying it does not expect any more events earlier than 10:00. If something earlier arrives, it is considered late.
In real systems, you usually set an allowed lateness window and accept that sometimes data arrives outside that window.
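The most common strategy is a bounded-out-of-orderness watermark: track the maximum event time seen so far and subtract a fixed tolerance. A minimal sketch (the class and method names here are illustrative, not any engine's real API):

```python
# Bounded-out-of-orderness watermark: watermark = max event time seen
# minus a fixed tolerance for out-of-order arrival.

class BoundedWatermark:
    def __init__(self, max_out_of_orderness: float):
        self.max_out_of_orderness = max_out_of_orderness
        self.max_event_time = float("-inf")

    def observe(self, event_time: float) -> None:
        # Watermarks only ever move forward as event time advances.
        self.max_event_time = max(self.max_event_time, event_time)

    @property
    def watermark(self) -> float:
        # "No events earlier than this are expected anymore."
        return self.max_event_time - self.max_out_of_orderness

    def is_late(self, event_time: float) -> bool:
        return event_time < self.watermark

wm = BoundedWatermark(max_out_of_orderness=5.0)
for t in [10.0, 12.0, 11.0, 20.0]:  # note the out-of-order 11.0
    wm.observe(t)
print(wm.watermark)      # 15.0
print(wm.is_late(14.0))  # True: arrived behind the watermark
```

The tolerance is the knob you tune: a larger value waits longer for stragglers (more correct, higher latency), a smaller one emits results sooner but classifies more events as late.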
Streaming systems often compute aggregates over time: revenue per hour, clicks per minute, error rate in the last five minutes.
Windows define how events are grouped.
| Window Type | Description | Use Case |
|---|---|---|
| Tumbling | Fixed, non-overlapping | Hourly reports |
| Sliding | Fixed, overlapping | Moving averages |
| Session | Dynamic, activity-based | User sessions |
A good way to think about it: tumbling windows are calendars, sliding windows are rolling cameras, and session windows follow user behavior.
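The three assignment rules are simple enough to sketch directly. A toy model (timestamps in seconds; all names are illustrative):

```python
# Toy window assignment for the three window types.

def tumbling_window(ts: int, size: int) -> tuple:
    """The single fixed, non-overlapping window containing ts."""
    start = (ts // size) * size
    return (start, start + size)

def sliding_windows(ts: int, size: int, slide: int) -> list:
    """Every overlapping window whose [start, start + size) contains ts."""
    first = ((ts - size) // slide + 1) * slide  # smallest qualifying start
    return [(s, s + size) for s in range(first, ts + 1, slide)]

def session_windows(timestamps: list, gap: int) -> list:
    """Merge events separated by at most `gap` seconds into sessions."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][1] <= gap:
            sessions[-1] = (sessions[-1][0], ts)  # extend current session
        else:
            sessions.append((ts, ts))             # start a new session
    return sessions

print(tumbling_window(7, size=4))                 # (4, 8)
print(sliding_windows(7, size=4, slide=2))        # [(4, 8), (6, 10)]
print(session_windows([1, 2, 3, 10, 11], gap=3))  # [(1, 3), (10, 11)]
```

Note that a single event lands in exactly one tumbling window, in several sliding windows, and that session boundaries depend on the data itself rather than the clock.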
Engines also differ in how they behave under failures.
| Guarantee | Description | Complexity |
|---|---|---|
| At-most-once | Events may be lost | Simple |
| At-least-once | Events may be duplicated | Medium |
| Exactly-once | Each event processed once | Complex |
Distributed systems fail in awkward ways: crashes mid-write, retries, partial commits, network partitions. The challenge is not processing an event. The challenge is producing results without double-counting or losing updates.
Most “exactly-once” implementations rely on two mechanisms:

- **Checkpointing**: periodically snapshotting operator state so that after a failure, processing can rewind to a consistent point and replay from there.
- **Transactional or idempotent sinks**: committing outputs so that replayed events do not produce duplicate effects downstream.
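An idempotent sink is the simplest of these building blocks: even with at-least-once delivery, upserting by a unique key means a replayed event overwrites rather than double-counts. A toy sketch (the in-memory "sink" is a stand-in for a keyed store):

```python
# At-least-once delivery + an idempotent (keyed upsert) sink
# = effectively-once results.

sink = {}  # keyed store: event id -> value

def deliver(event_id: str, value: int) -> None:
    # Upsert by key: reprocessing the same event overwrites;
    # it never double-counts.
    sink[event_id] = value

# Simulate a retry after a failure: the same event arrives twice.
deliver("evt-1", 42)
deliver("evt-1", 42)  # duplicate from the retry
print(len(sink))  # 1 -- the duplicate is harmless
```

This trick only works when outputs are naturally keyed; aggregations that increment counters need the transactional route instead.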
Flink is one of the most widely used open-source engines for stream processing, especially when you need low latency, event-time correctness, and strong state management.
It is often described as “true streaming” because it processes events continuously rather than in micro-batches.
Flink is a distributed system with a control plane and a data plane.
- **JobManager**: the control plane. It schedules tasks, coordinates checkpoints, and drives recovery after failures.
- **TaskManagers**: the data plane. Worker processes that execute operator tasks and hold local state.
- **Client**: not part of the runtime. It compiles your program and submits it to the JobManager.
Flink processes events one at a time (true streaming) rather than collecting them into micro-batches. The client compiles your program into a dataflow graph of sources, operators, and sinks, and events flow through that graph continuously as they arrive. This is why Flink can deliver very low latency while still supporting complex stateful computations.
Real streaming applications almost always need state: running counts and aggregates, window contents waiting to fire, and buffered rows for joins.

Flink handles state transparently through a pluggable state backend: a heap-based backend keeps state as objects in JVM memory, while the RocksDB backend spills state to local disk.
In practice, production Flink deployments with large state often rely on RocksDB because it scales beyond memory.
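The key idea is that operator code is written against a small get/put interface, and the backend choice is configuration. A sketch of that abstraction (both classes are illustrative stand-ins, not Flink's API):

```python
# The operator only sees get/put; whether state lives on the heap or
# in RocksDB is a backend detail hidden behind the same interface.

class InMemoryBackend:
    """Stand-in for a heap-based state backend."""
    def __init__(self):
        self._data = {}
    def get(self, key, default=0):
        return self._data.get(key, default)
    def put(self, key, value):
        self._data[key] = value

def count_events(events, backend):
    # Operator logic is identical regardless of which backend is used.
    for key in events:
        backend.put(key, backend.get(key) + 1)

backend = InMemoryBackend()
count_events(["a", "b", "a"], backend)
print(backend.get("a"))  # 2
```

Swapping in a disk-backed implementation with the same interface is what lets state grow beyond memory without touching application code.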
Flink achieves exactly-once through distributed snapshots (asynchronous barrier snapshotting, a variant of the Chandy–Lamport algorithm): the JobManager periodically injects checkpoint barriers into the sources; as a barrier flows through the dataflow graph, each operator snapshots its state at that point; the checkpoint completes once every operator has acknowledged its snapshot.
The important point is that the snapshot represents a consistent global state of the entire pipeline, not just one operator.
Exactly-once also depends on the sink. Flink can maintain consistent internal state, but you still need sinks that commit outputs in a way that matches checkpoint boundaries.
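The barrier mechanism can be illustrated with a toy single-operator model: when a barrier arrives in the operator's input stream, it snapshots its state before processing anything after the barrier, so the snapshot reflects exactly the events that preceded it:

```python
# Toy barrier-based snapshotting for a single counting operator.
# BARRIER is a marker flowing through the stream alongside events.

BARRIER = object()

def run_operator(stream):
    state = {"count": 0}
    snapshots = []
    for item in stream:
        if item is BARRIER:
            # Snapshot BEFORE any post-barrier event is processed:
            # a consistent cut of this operator's state.
            snapshots.append(dict(state))
        else:
            state["count"] += 1
    return state, snapshots

state, snaps = run_operator([1, 2, BARRIER, 3])
print(snaps[0])  # {'count': 2} -- state as of the barrier
print(state)     # {'count': 3}
```

In the real system the same barrier flows through every operator, which is what makes the combined snapshot a consistent global state rather than a per-operator one.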
| Feature | Description |
|---|---|
| Exactly-once | Checkpoint-based, with transactional sinks |
| Event time | First-class support with watermarks |
| Savepoints | Manual snapshots for upgrades/migration |
| State TTL | Automatic expiration of state entries |
| Queryable state | Query state directly without sink |
| SQL/Table API | High-level APIs alongside DataStream |
Kafka Streams is not a cluster or a separate service. It is a library you embed into your application. You scale it the same way you scale any stateless service: run more instances.
Each application instance runs the Kafka Streams library. Kafka is both the input and the backing system for state and recovery.
The key difference from Flink: there is no separate cluster to operate. Kafka Streams reads only from Kafka, writes only to Kafka, and scales by distributing Kafka partitions across application instances.
That makes Kafka Streams appealing when you want operational simplicity and you are already fully committed to Kafka.
Kafka Streams stores state locally, usually in RocksDB, and makes it fault tolerant by logging changes back into Kafka.
On restart, an instance replays its changelog topics from Kafka to rebuild its local state stores, then resumes processing from the last committed offsets.
This design is elegant because Kafka is both the event log and the recovery mechanism.
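A sketch of the changelog pattern, with a Python list standing in for the Kafka changelog topic and a dict for the local store (RocksDB in practice):

```python
# Every local-store update is also appended to a changelog; after a
# crash, replaying the changelog rebuilds the store.

changelog = []   # stand-in for the compacted Kafka changelog topic
store = {}       # local state store (RocksDB in practice)

def put(key, value):
    store[key] = value
    changelog.append((key, value))  # logged for recovery

put("user-1", 10)
put("user-2", 5)
put("user-1", 11)

# Simulate a crash: all local state is lost.
store = {}

# Recovery: replay the changelog in order to rebuild the store.
for key, value in changelog:
    store[key] = value
print(store["user-1"])  # 11 -- the latest value wins
```

Because the changelog topic is log-compacted in real deployments, replay cost stays proportional to the number of keys rather than the full update history.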
Kafka Streams achieves exactly-once through Kafka transactions. A single transaction atomically covers three things: writes to output topics, writes to state changelog topics, and the commit of input offsets.
If the transaction commits, the update is visible and offsets move forward. If it fails, none of it becomes visible.
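A toy model of that read-process-write transaction (this is not the Kafka client API; it just shows the all-or-nothing visibility rule):

```python
# Output records and the input-offset commit either all become
# visible together, or none of them do.

class Transaction:
    def __init__(self, topic_log, offsets):
        self.topic_log, self.offsets = topic_log, offsets
        self.pending_records, self.pending_offsets = [], {}

    def send(self, record):
        self.pending_records.append(record)       # buffered, not visible

    def commit_offset(self, partition, offset):
        self.pending_offsets[partition] = offset  # buffered, not visible

    def commit(self):
        # Atomic: outputs and offsets are applied together.
        self.topic_log.extend(self.pending_records)
        self.offsets.update(self.pending_offsets)

    def abort(self):
        # Nothing pending ever became visible.
        self.pending_records, self.pending_offsets = [], {}

log, offsets = [], {}
txn = Transaction(log, offsets)
txn.send("result-1")
txn.commit_offset("input-0", 42)
txn.commit()
print(log, offsets)  # ['result-1'] {'input-0': 42}
```

If the application crashes before `commit`, the offsets never advance, so the input is reprocessed and the aborted outputs are never read by transactional consumers.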
| Feature | Description |
|---|---|
| No cluster | Library, deploy with application |
| Exactly-once | Kafka transactions |
| Interactive queries | Query local state via API |
| Fault tolerance | Changelog topics for state |
| Simple deployment | Standard application deployment |
| Tight Kafka integration | Native Kafka performance |
Spark Structured Streaming treats streams as a sequence of small batches. It is streaming built on top of Spark’s batch engine.
Instead of processing each event as soon as it arrives, Spark collects events for a short interval, then runs a batch computation.
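The processing loop can be sketched as: buffer arrivals until the trigger interval elapses, then run one batch job over the buffer (a toy model; names and the fixed-interval trigger are illustrative):

```python
# Toy micro-batch loop: events are buffered for a trigger interval,
# then handed to the batch engine as one unit.

def micro_batch(events_with_arrival, trigger_interval):
    """events_with_arrival: list of (arrival_time, value) pairs."""
    batches, current, batch_end = [], [], trigger_interval
    for arrival, value in sorted(events_with_arrival):
        while arrival >= batch_end:  # the trigger fires
            batches.append(current)  # process everything buffered so far
            current, batch_end = [], batch_end + trigger_interval
        current.append(value)
    batches.append(current)
    return batches

batches = micro_batch([(0.1, "a"), (0.4, "b"), (1.2, "c")],
                      trigger_interval=1.0)
print(batches)  # [['a', 'b'], ['c']]
```

The latency floor is visible in the model: an event arriving just after a trigger waits nearly a full interval before it is processed, which is why micro-batch latency is bounded below by the trigger interval.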
Spark also introduced a continuous processing mode aimed at lower latency. It can get closer to true streaming latency, but it comes with constraints and is not as broadly used as the standard micro-batch model.
Continuous mode trades throughput for lower latency.
Spark maintains state for aggregations and joins, and persists it via checkpointing to a durable location. It also supports event-time features such as watermarks and allowed lateness.
| Feature | Description |
|---|---|
| Unified API | Same code for batch and stream |
| Mature ecosystem | SparkSQL, MLlib, GraphX |
| High throughput | Batch efficiency for each micro-batch |
| Late data handling | Watermarks and allowed lateness |
| Output modes | Append, Complete, Update |
| Managed cloud options | Databricks, EMR, Dataproc |
| Feature | Flink | Kafka Streams | Spark Streaming |
|---|---|---|---|
| Processing model | True streaming | True streaming | Micro-batch |
| Latency | Milliseconds | Milliseconds | 100ms+ |
| Deployment | Cluster | Library | Cluster |
| State size | Very large | Large | Large |
| Exactly-once | Checkpointing | Kafka transactions | Checkpointing |
| SQL support | Yes | KSQL separate | Yes |
| Source flexibility | Any source | Kafka only | Many sources |
| Engine | Best For |
|---|---|
| Flink | Low latency, complex event processing, large state |
| Kafka Streams | Kafka-centric, simple deployment, microservices |
| Spark Streaming | Unified batch/stream, ML pipelines, SQL users |
Streaming engines power real-time data processing with sophisticated mechanisms: event-time processing with watermarks, windowed aggregation, managed state, and exactly-once guarantees built on snapshots or transactions. Flink offers true streaming with low latency and very large state; Kafka Streams trades source flexibility for operational simplicity in a Kafka-centric stack; Spark Structured Streaming brings streaming into a unified batch ecosystem. Choose based on your latency requirements, existing infrastructure, and operational appetite.