Last Updated: January 9, 2026
In the previous chapter, we learned about delivery semantics and how at-least-once delivery retries failed messages. But what happens when a message keeps failing?
If a message is malformed, is missing data, or references a resource that does not exist, no amount of retrying will help. Without a solution, these "poison" messages would either block the queue forever or consume resources through endless retries.
Dead letter queues (DLQs) solve this problem.
A dead letter queue is a special queue that stores messages that cannot be successfully processed. Instead of retrying forever or losing the message, the system moves it to the DLQ after a configured number of attempts.
In this chapter, you will learn:
A dead letter queue is a queue that stores messages that could not be processed successfully. It acts as a holding area for problematic messages that need investigation.
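To make the "retry a configured number of times, then move to the DLQ" behavior concrete, here is a minimal in-memory sketch. It is not tied to any particular broker: `MAX_ATTEMPTS`, `handle`, and the use of `deque` objects as queues are assumptions made for illustration only.

```python
from collections import deque

MAX_ATTEMPTS = 3  # hypothetical limit; real brokers make this configurable


def handle(message: dict) -> None:
    """Hypothetical handler: rejects orders that have no customer_id."""
    if "customer_id" not in message:
        raise ValueError("missing customer_id")


def consume(main_queue: deque, dlq: deque) -> None:
    """Drain main_queue, moving messages that keep failing to dlq."""
    while main_queue:
        message, attempts = main_queue.popleft()
        try:
            handle(message)
        except Exception as exc:
            attempts += 1
            if attempts >= MAX_ATTEMPTS:
                # Give up: park the message instead of retrying forever.
                dlq.append(
                    {"message": message, "attempts": attempts, "last_error": str(exc)}
                )
            else:
                # Put the message back for another attempt.
                main_queue.append((message, attempts))


# Each queue item is a (message, attempts_so_far) pair.
main_queue = deque([({"order_id": 1, "customer_id": 7}, 0), ({"order_id": 2}, 0)])
dlq = deque()
consume(main_queue, dlq)
```

After the run, the valid order has been processed, the main queue is empty, and the order without a `customer_id` sits in `dlq` with its attempt count and last error attached.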
Messages fail for various reasons. Understanding these helps you diagnose and fix issues.
The consumer cannot parse or validate the message.
The message references a resource that does not exist.
The message is valid but the operation cannot complete.
The problem is in the consumer code, not the message.
An external service was down during all retry attempts.
Some systems move messages to the DLQ when they exceed their time-to-live (TTL), even without a processing failure.
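TTL-based dead-lettering can be sketched as a sweep over the queue. This is an illustrative model only: the `enqueued_at` field, the `ttl_expired` reason string, and the function name are hypothetical, and real brokers perform this step internally.

```python
def expire_to_dlq(queue: list, dlq: list, now: float, ttl_seconds: float) -> int:
    """Move messages older than ttl_seconds from queue to dlq; return how many moved."""
    live = []
    moved = 0
    for message in queue:
        if now - message["enqueued_at"] > ttl_seconds:
            # The message was never processed; it simply waited too long.
            dlq.append({"message": message, "reason": "ttl_expired"})
            moved += 1
        else:
            live.append(message)
    queue[:] = live  # keep only messages still within their TTL
    return moved


queue = [{"id": "a", "enqueued_at": 0.0}, {"id": "b", "enqueued_at": 90.0}]
dlq = []
moved = expire_to_dlq(queue, dlq, now=100.0, ttl_seconds=60.0)
```

Recording a reason such as `ttl_expired` lets whoever inspects the DLQ distinguish expired messages from ones that failed in a consumer.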
When a message lands in a DLQ, you need context to debug it. Most systems attach metadata:
| Field | Purpose |
|---|---|
| Original message | The actual payload |
| Source queue/topic | Where it came from |
| First received timestamp | When processing started |
| Dead lettered timestamp | When moved to DLQ |
| Receive/attempt count | How many tries |
| Last error | What went wrong |
| Consumer ID | Which instance failed |
This information is essential for debugging and deciding how to handle the message.
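One way to carry the fields in the table above is a small envelope type wrapped around the original payload. The sketch below is an assumption about how an application might model this itself. The class name, field names, and helper function are hypothetical, and actual brokers expose this metadata in their own formats.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class DeadLetter:
    original_message: bytes       # the actual payload, untouched
    source_queue: str             # where it came from
    first_received_at: datetime   # when processing started
    dead_lettered_at: datetime    # when it was moved to the DLQ
    attempt_count: int            # how many tries were made
    last_error: str               # what went wrong on the final attempt
    consumer_id: str              # which consumer instance failed


def to_dead_letter(payload, source_queue, first_received_at,
                   attempt_count, error, consumer_id) -> DeadLetter:
    """Wrap a failed message with the context needed to debug it later."""
    return DeadLetter(
        original_message=payload,
        source_queue=source_queue,
        first_received_at=first_received_at,
        dead_lettered_at=datetime.now(timezone.utc),
        attempt_count=attempt_count,
        last_error=f"{type(error).__name__}: {error}",
        consumer_id=consumer_id,
    )


first_seen = datetime(2025, 6, 1, 12, 0, tzinfo=timezone.utc)
entry = to_dead_letter(
    b'{"order_id": 2}', "orders", first_seen,
    3, ValueError("missing customer_id"), "worker-1",
)
print(entry.last_error)  # ValueError: missing customer_id
```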
Messages in a DLQ need attention. Here is a systematic approach:
What to monitor:
Alert thresholds:
| Failure Type | Typical Action |
|---|---|
| Malformed message | Drop (after fixing producer) |
| Missing dependency | Create missing data, then replay |
| Consumer bug | Fix bug, then replay |
| Transient failure | Replay immediately |
| Business validation | Manual intervention or drop |
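The table above can be encoded as a lookup so that tooling suggests the same action every time. This is a sketch, not a prescribed design: the enum, the `triage` function, and the choice to treat only transient failures as safe for unattended replay are assumptions layered on top of the table.

```python
from enum import Enum


class FailureType(Enum):
    MALFORMED_MESSAGE = "malformed message"
    MISSING_DEPENDENCY = "missing dependency"
    CONSUMER_BUG = "consumer bug"
    TRANSIENT_FAILURE = "transient failure"
    BUSINESS_VALIDATION = "business validation"


# Typical actions, copied from the table above.
TYPICAL_ACTION = {
    FailureType.MALFORMED_MESSAGE: "drop (after fixing producer)",
    FailureType.MISSING_DEPENDENCY: "create missing data, then replay",
    FailureType.CONSUMER_BUG: "fix bug, then replay",
    FailureType.TRANSIENT_FAILURE: "replay immediately",
    FailureType.BUSINESS_VALIDATION: "manual intervention or drop",
}

# Assumption: every other failure type needs a fix or a human decision first.
AUTO_REPLAYABLE = {FailureType.TRANSIENT_FAILURE}


def triage(failure_type: FailureType) -> tuple[str, bool]:
    """Return (typical action, whether replay can happen without a human)."""
    return TYPICAL_ACTION[failure_type], failure_type in AUTO_REPLAYABLE


action, automatic = triage(FailureType.CONSUMER_BUG)
```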
Replay: Move the message back to the main queue for reprocessing.
Drop: Delete the message (after investigation).
Always log dropped messages for compliance and debugging.
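The two operations, including the logging of drops, might look like the following in-memory sketch. The function names, the entry layout, and the logger name are hypothetical. A real system would call its broker's own move and delete operations instead of mutating lists.

```python
import logging

logger = logging.getLogger("dlq")  # hypothetical logger name


def replay(entry: dict, dlq: list, main_queue: list) -> None:
    """Move a dead-lettered message back to the main queue for reprocessing."""
    dlq.remove(entry)
    main_queue.append(entry["message"])


def drop(entry: dict, dlq: list, reason: str) -> None:
    """Delete a dead-lettered message, leaving a log record behind."""
    dlq.remove(entry)
    # Log the full message and the reason so the decision can be audited later.
    logger.warning("dropped DLQ message %r: %s", entry["message"], reason)


fixed = {"message": {"order_id": 2}, "last_error": "consumer bug"}
garbage = {"message": {"ordr": "??"}, "last_error": "cannot parse"}
dlq = [fixed, garbage]
main_queue = []

replay(fixed, dlq, main_queue)           # the bug has been fixed, so try again
drop(garbage, dlq, reason="malformed")   # investigated and found unrecoverable
```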
Each queue has its own DLQ:
Pros: Clear ownership, isolated failures

Cons: Many queues to monitor
Multiple queues share one DLQ:
Pros: Simpler monitoring, fewer resources

Cons: Need metadata to identify source, mixed message types
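With a shared DLQ, the source metadata is what lets you separate the mixed messages again. A small sketch, assuming each entry carries a `source_queue` field (a hypothetical name for the source queue/topic metadata):

```python
from collections import Counter


def count_by_source(entries: list) -> Counter:
    """Count dead-lettered messages per originating queue."""
    return Counter(entry["source_queue"] for entry in entries)


def entries_from(entries: list, source_queue: str) -> list:
    """Select only the entries that came from one queue."""
    return [entry for entry in entries if entry["source_queue"] == source_queue]


shared_dlq = [
    {"source_queue": "orders", "message": {"order_id": 2}},
    {"source_queue": "payments", "message": {"payment_id": 9}},
    {"source_queue": "orders", "message": {"order_id": 5}},
]

counts = count_by_source(shared_dlq)
order_failures = entries_from(shared_dlq, "orders")
```

Without the source field, neither function could be written, which is the trade-off the shared pattern accepts.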
Automated DLQ processing:
This automates common cases while alerting on exceptions.
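A DLQ processor along these lines could be sketched as follows. The rule used here (replay transient failures a limited number of times, alert on everything else) is one possible policy chosen for illustration. `MAX_AUTO_REPLAYS`, the `failure_type` and `auto_replays` fields, and the `alert` callback are all hypothetical.

```python
MAX_AUTO_REPLAYS = 2  # hypothetical cap so a message cannot loop forever


def process_dlq(dlq: list, main_queue: list, alert) -> None:
    """Replay the common, safe cases automatically; alert on everything else."""
    remaining = []
    for entry in dlq:
        is_transient = entry["failure_type"] == "transient"
        if is_transient and entry["auto_replays"] < MAX_AUTO_REPLAYS:
            entry["auto_replays"] += 1
            main_queue.append(entry)
        else:
            # Unknown cause or replay budget exhausted: a human needs to look.
            alert(entry)
            remaining.append(entry)
    dlq[:] = remaining


alerts = []
dlq = [
    {"id": "a", "failure_type": "transient", "auto_replays": 0},
    {"id": "b", "failure_type": "consumer_bug", "auto_replays": 0},
    {"id": "c", "failure_type": "transient", "auto_replays": 2},
]
main_queue = []
process_dlq(dlq, main_queue, alerts.append)
```

Message `a` is replayed, while `b` and `c` stay in the DLQ and raise alerts. The replay counter matters: without it, a message misclassified as transient would bounce between the two queues indefinitely.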
Always include:
Rule: A DLQ should normally be empty. Any message in it is worth investigating.
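The rule translates directly into a depth check: anything above zero deserves a look. In the sketch below, the escalation level `PAGE_DEPTH` and the level names are assumptions for illustration, not recommended values.

```python
PAGE_DEPTH = 100  # hypothetical depth at which a warning escalates to a page


def dlq_alert_level(depth: int) -> str:
    """Map the number of messages in a DLQ to an alert level."""
    if depth < 0:
        raise ValueError("depth cannot be negative")
    if depth == 0:
        return "ok"      # the normal state: nothing to investigate
    if depth < PAGE_DEPTH:
        return "warn"    # any message at all is worth a look
    return "page"        # a large backlog suggests a systemic failure


levels = [dlq_alert_level(d) for d in (0, 1, 250)]
```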
Messages should not live forever in the DLQ:
DLQ messages may contain sensitive data:
Dead letter queues are essential for reliable messaging systems:
Key insight: DLQs are not just error buckets. They are an operational tool that provides visibility into system health and a path to recovery. A well-designed DLQ strategy includes monitoring, alerting, investigation tooling, and clear procedures for replay or drop decisions.