Last Updated: January 7, 2026
A user reports that their order failed. You open your log aggregation system and search for errors around that time. You find thousands of log entries. Some are from the order service, some from payments, some from inventory. Which ones are related to this user's failed order?
Without a way to connect them, you are reduced to guessing based on timestamps. This is where correlation IDs come in.
A correlation ID is a unique identifier that follows a request through every service it touches. Every log entry, every database query, every external API call includes this ID. When something goes wrong, you search for that single ID and see the complete story of what happened.
Correlation IDs are simple in concept but transformative in practice. They turn a haystack of unrelated logs into a coherent narrative.
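In this chapter, you will learn:
- Why logs from different services are hard to connect without a shared identifier
- How to choose a correlation ID format
- How to propagate IDs over HTTP, across threads and async code, and through message queues
- How correlation IDs relate to session, user, trace, and span IDs
- Best practices and common pitfalls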
This technique works hand-in-hand with the logging practices we covered earlier. Structured logs with correlation IDs become far more useful, because a filter on a single field isolates one request's entries from everything else.
Consider a simple request that touches multiple services:
Each service logs its activity:
But at 10:23:45, your system handled 500 requests per second. These 7 log lines are mixed with 3,000 others from the same time window. How do you know which API Gateway request led to which Order Service log, which led to which Payment failure?
Without correlation IDs, you cannot. You are left matching timestamps and hoping for the best.
A correlation ID (also called request ID, trace ID, or transaction ID) is a unique identifier assigned at the entry point of a request and propagated through all downstream services.
Now the same logs become traceable:
Query: correlation_id = "abc-123" returns exactly these 7 logs, showing the complete request flow.
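Here is a minimal Python sketch of that query, assuming logs are stored as JSON lines with a correlation_id field. The entries and the logs_for helper are hypothetical. A real log aggregation system would run the same filter as an indexed query.

```python
import json

# Hypothetical JSON-lines log output from several services; values are illustrative.
RAW_LOGS = """\
{"ts": "10:23:45.101", "service": "api-gateway", "correlation_id": "abc-123", "msg": "POST /orders"}
{"ts": "10:23:45.102", "service": "api-gateway", "correlation_id": "xyz-789", "msg": "GET /products"}
{"ts": "10:23:45.130", "service": "order-service", "correlation_id": "abc-123", "msg": "Creating order"}
{"ts": "10:23:45.131", "service": "inventory-service", "correlation_id": "xyz-789", "msg": "Stock check"}
{"ts": "10:23:45.180", "service": "payment-service", "correlation_id": "abc-123", "msg": "Payment failed"}
"""


def logs_for(correlation_id: str, raw: str) -> list:
    """Return every log entry carrying the given correlation ID, in original order."""
    entries = (json.loads(line) for line in raw.splitlines() if line.strip())
    return [entry for entry in entries if entry.get("correlation_id") == correlation_id]


for entry in logs_for("abc-123", RAW_LOGS):
    print(entry["ts"], entry["service"], entry["msg"])
```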
A correlation ID must be unique across all requests and easy to propagate across every hop. The most common formats are UUID v4, ULID, custom formats, and Snowflake IDs.
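UUID v4: a 128-bit identifier with 122 random bits, written as 36 characters (32 hex digits and 4 hyphens).
Example: 550e8400-e29b-41d4-a716-446655440000
Best for: the default choice when you want maximum compatibility and minimal effort.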
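In Python, the standard library's uuid module is enough. Here is a minimal sketch, where new_correlation_id is a hypothetical helper name:

```python
import uuid


def new_correlation_id() -> str:
    """Return a random UUID v4 in its canonical 36-character form."""
    return str(uuid.uuid4())


cid = new_correlation_id()
print(cid)  # different on every call
```

Any system that can parse a UUID can store, index, and forward this value, which is where the compatibility comes from.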
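ULID: a 128-bit ID made of a 48-bit millisecond timestamp followed by 80 random bits, encoded as 26 characters of Crockford's Base32. Because the timestamp comes first, ULIDs sort by creation time.
Example: 01ARZ3NDEKTSV4RRFFQ69G5FAV
Best for: systems where you often sort or scan by time and want IDs that work well in logs and indexes.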
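Python's standard library has no ULID generator. The code below is a minimal sketch of the ULID layout, not a full implementation of the spec, and in production you would likely use an established library. The names encode_base32 and new_ulid are hypothetical.

```python
import os
import time
from typing import Optional

# Crockford's Base32 alphabet, as used by the ULID spec (no I, L, O, or U).
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def encode_base32(value: int, length: int) -> str:
    """Encode a non-negative integer as a fixed-width Crockford Base32 string."""
    chars = []
    for _ in range(length):
        chars.append(CROCKFORD[value & 0x1F])  # take the lowest 5 bits
        value >>= 5
    return "".join(reversed(chars))


def new_ulid(timestamp_ms: Optional[int] = None) -> str:
    """Return a 26-character ULID: 48-bit millisecond timestamp + 80 random bits.

    Simplification: no monotonic mode, so IDs created within the same
    millisecond are unique but not ordered among themselves.
    """
    if timestamp_ms is None:
        timestamp_ms = time.time_ns() // 1_000_000
    if not 0 <= timestamp_ms < 2**48:
        raise ValueError("timestamp does not fit in 48 bits")
    randomness = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    return encode_base32(timestamp_ms, 10) + encode_base32(randomness, 16)


print(new_ulid())
```

The timestamp occupies the first 10 characters at a fixed width, and the Crockford alphabet is in ASCII order. As a result, plain string comparison sorts these IDs by creation time, to the millisecond.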
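Custom format: a readable prefix (such as a service or environment name) combined with a timestamp and a random or otherwise unique component.
Best for: internal systems where readability matters and volume is moderate, or when you add a readable wrapper around a truly unique base ID.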
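Here is one possible custom format, sketched in Python: a readable prefix, a UTC timestamp, and a random suffix that carries the uniqueness. The exact layout and the new_readable_id helper are assumptions for illustration, not a standard.

```python
import secrets
from datetime import datetime, timezone
from typing import Optional


def new_readable_id(prefix: str, now: Optional[datetime] = None) -> str:
    """Hypothetical format: <prefix>-<UTC timestamp to the second>-<16 random hex chars>."""
    now = now or datetime.now(timezone.utc)
    stamp = now.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    suffix = secrets.token_hex(8)  # 64 random bits keep IDs unique within the same second
    return f"{prefix}-{stamp}-{suffix}"


print(new_readable_id("checkout"))
```

String sorting only orders these IDs by time when they share a prefix. This is why the table below lists sortability as optional for custom formats.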
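Snowflake ID: a 64-bit, time-ordered numeric ID generated in a distributed way, commonly written as 18 to 19 decimal digits. Twitter's original layout packs a 41-bit millisecond timestamp, a 10-bit machine ID, and a 12-bit per-machine sequence number into one integer.
Best for: high-volume distributed systems that already use time-ordered numeric identifiers across data stores.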
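Here is a minimal Snowflake-style generator in Python, assuming Twitter's original bit layout. The custom epoch, the class name, and the choice to raise an error when the clock moves backwards are all assumptions of this sketch. Production generators also need a reliable way to assign each worker a unique ID.

```python
import threading
import time

EPOCH_MS = 1_577_836_800_000  # custom epoch for this sketch: 2020-01-01T00:00:00Z
WORKER_BITS = 10
SEQUENCE_BITS = 12
MAX_WORKER = (1 << WORKER_BITS) - 1
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1


class SnowflakeGenerator:
    """41-bit timestamp | 10-bit worker ID | 12-bit sequence, in one 64-bit integer."""

    def __init__(self, worker_id: int):
        if not 0 <= worker_id <= MAX_WORKER:
            raise ValueError("worker_id must fit in 10 bits")
        self.worker_id = worker_id
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    @staticmethod
    def _now_ms() -> int:
        return time.time_ns() // 1_000_000

    def next_id(self) -> int:
        with self.lock:
            now = self._now_ms()
            if now < self.last_ms:
                raise RuntimeError("clock moved backwards; refusing to generate IDs")
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & MAX_SEQUENCE
                if self.sequence == 0:
                    # 4096 IDs already issued this millisecond: wait for the next one.
                    while now <= self.last_ms:
                        now = self._now_ms()
            else:
                self.sequence = 0
            self.last_ms = now
            return (
                ((now - EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS))
                | (self.worker_id << SEQUENCE_BITS)
                | self.sequence
            )


gen = SnowflakeGenerator(worker_id=7)
print(gen.next_id())
```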
| Format | Length (chars) | Sortable | Readability | Best For |
|---|---|---|---|---|
| UUID v4 | 36 | No | Low | General use, compatibility |
| ULID | 26 | Yes | Medium | Time-series queries |
| Custom | Variable | Optional | High | Human debugging |
| Snowflake ID | 18-19 | Yes | Low | High-volume distributed systems |
The correlation ID must flow through every hop in your system. If one service drops the ID, your “single request story” breaks and logs become scattered again.
For synchronous service-to-service calls, HTTP headers are the most common approach.
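Typical header names:
- X-Correlation-ID: a very common custom header
- X-Request-ID: a popular alternative name (often used by proxies)
- traceparent: the W3C standard for tracing context (designed for tracing, but it can double as a correlation ID)
A good propagation flow looks like this:
1. The entry point (for example, the API gateway) reads the correlation ID header from the incoming request, or generates a new ID if there is none.
2. Each downstream service reads the header, includes the ID in its logs, and sets the same header on every outgoing call.
3. The response carries the ID back to the client.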
This turns correlation IDs into a system-wide “breadcrumb trail.”
Every service should implement middleware (or filters/interceptors) that handles correlation IDs automatically. You want developers to get correlation IDs “for free,” not by remembering to add them everywhere.
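The core loop has five steps: extract the ID (or generate it at the entry point), store it in request context, attach it to every log entry, forward it on outgoing calls, and return it in the response. Once you have this in place, correlation IDs become a standard part of the request lifecycle.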
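Here is that loop as a minimal Python WSGI middleware sketch. It assumes X-Correlation-ID as the header and a contextvars.ContextVar as the request context. CorrelationIdMiddleware, outgoing_headers, and the entry_point flag are hypothetical names, not a library API.

```python
import contextvars
import logging
import uuid

HEADER = "X-Correlation-ID"         # one common header name
WSGI_KEY = "HTTP_X_CORRELATION_ID"  # how WSGI exposes that request header in environ

correlation_id_var = contextvars.ContextVar("correlation_id", default="")
log = logging.getLogger("correlation")


class CorrelationIdMiddleware:
    """Extract (or, at the entry point only, generate) the ID; store it; echo it back."""

    def __init__(self, app, entry_point: bool):
        self.app = app
        self.entry_point = entry_point

    def __call__(self, environ, start_response):
        cid = environ.get(WSGI_KEY, "")
        if not cid and self.entry_point:
            cid = str(uuid.uuid4())
        elif not cid:
            log.warning("request arrived without %s: upstream propagation bug", HEADER)
        token = correlation_id_var.set(cid)

        def start_response_with_id(status, headers, exc_info=None):
            if cid:
                headers = list(headers) + [(HEADER, cid)]  # return the ID to the caller
            return start_response(status, headers, exc_info)

        try:
            # Assumes the app does its work before returning, not in a lazy body iterator.
            return self.app(environ, start_response_with_id)
        finally:
            correlation_id_var.reset(token)


def outgoing_headers() -> dict:
    """Headers to merge into every downstream HTTP call made while handling a request."""
    cid = correlation_id_var.get()
    return {HEADER: cid} if cid else {}


# Usage with a hypothetical app and a stand-in for the server's start_response.
def app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [outgoing_headers().get(HEADER, "").encode()]  # what a downstream call would send


captured = {}


def fake_start_response(status, headers, exc_info=None):
    captured["headers"] = dict(headers)


edge = CorrelationIdMiddleware(app, entry_point=True)
print(edge({WSGI_KEY: "abc-123"}, fake_start_response))  # [b'abc-123']
print(captured["headers"][HEADER])                       # abc-123
```

Only the instance configured as the entry point creates new IDs. A downstream service that receives no header logs a warning instead, matching the rule that a missing ID is a propagation bug.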
Propagation is straightforward in a single synchronous thread. It gets tricky when execution hops across threads, async boundaries, or message queues.
In synchronous request handling, thread-local storage works well:
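- A request arrives and is handled on a single thread from start to finish.
- Middleware stores the correlationId in a ThreadLocal and in the logging MDC.
- Any code on that thread, including every log statement, can read the ID without it being passed around.
This is why correlation IDs often “just work” in simple web apps.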
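Here is a Python analog of the same setup, as a minimal sketch. The built-in threading.local stands in for Java's ThreadLocal, and a logging.Filter plays the role of the MDC by copying the ID onto every log record. The helper names are hypothetical.

```python
import io
import logging
import threading

_request_state = threading.local()  # one slot per thread, like a ThreadLocal


def set_correlation_id(cid: str) -> None:
    _request_state.correlation_id = cid


def get_correlation_id() -> str:
    return getattr(_request_state, "correlation_id", "-")


class CorrelationIdFilter(logging.Filter):
    """Copy the current thread's correlation ID onto every log record (MDC-style)."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.correlation_id = get_correlation_id()
        return True


stream = io.StringIO()  # captured here so the output is easy to inspect
handler = logging.StreamHandler(stream)
handler.addFilter(CorrelationIdFilter())
handler.setFormatter(logging.Formatter("%(correlation_id)s %(levelname)s %(message)s"))
logger = logging.getLogger("orders")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False

set_correlation_id("abc-123")  # done once per request, e.g. by middleware
logger.info("Creating order")  # the ID is never passed explicitly
print(stream.getvalue(), end="")  # abc-123 INFO Creating order
```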
Async breaks the thread-local model because the work may resume on a different thread:
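Here is a small Python demonstration of the problem, assuming the ID lives in a threading.local (Python's counterpart to a ThreadLocal). Work submitted to a thread pool runs on another thread, so it cannot see the value.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

state = threading.local()  # stands in for a ThreadLocal


def current_id() -> str:
    return getattr(state, "correlation_id", "<missing>")


state.correlation_id = "abc-123"  # set by middleware on the request thread

with ThreadPoolExecutor(max_workers=1) as pool:
    on_request_thread = current_id()
    on_worker_thread = pool.submit(current_id).result()  # runs on a pool thread

print(on_request_thread, on_worker_thread)  # abc-123 <missing>
```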
There are three common solutions.
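Solution 1: Pass the ID explicitly. Every function and task takes the correlation ID as a parameter. This is simple and explicit, but easy to forget and messy in deep call chains.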
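Here is a minimal Python sketch of explicit passing. The functions process_order and reserve_stock are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def reserve_stock(order_id: str, correlation_id: str) -> str:
    return f"[{correlation_id}] reserved stock for {order_id}"


def process_order(order_id: str, correlation_id: str) -> str:
    # Every layer must accept the ID and forward it by hand; forgetting once breaks the chain.
    return reserve_stock(order_id, correlation_id)


with ThreadPoolExecutor(max_workers=2) as pool:
    future = pool.submit(process_order, "order-42", "abc-123")

print(future.result())  # [abc-123] reserved stock for order-42
```

Thread switches do not matter here because the ID travels as an ordinary argument. The cost is that every signature in the chain has to carry it.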
Solution 2: Use a context-propagating executor. This is the clean “make it automatic” approach. The executor captures context at submission time and restores it when running the task.
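In Python, one way to sketch this pattern is to subclass ThreadPoolExecutor. The subclass's submit() snapshots the caller's contextvars with copy_context() and runs the task inside that snapshot. ContextPropagatingExecutor is a hypothetical name.

```python
import contextvars
from concurrent.futures import Future, ThreadPoolExecutor

correlation_id_var = contextvars.ContextVar("correlation_id", default="<missing>")


class ContextPropagatingExecutor(ThreadPoolExecutor):
    """Capture the submitter's context at submit() time; restore it while the task runs."""

    def submit(self, fn, /, *args, **kwargs) -> Future:
        ctx = contextvars.copy_context()  # snapshot taken on the submitting thread
        return super().submit(ctx.run, fn, *args, **kwargs)


def current_id() -> str:
    return correlation_id_var.get()


with ContextPropagatingExecutor(max_workers=1) as pool:
    correlation_id_var.set("abc-123")
    first = pool.submit(current_id)
    correlation_id_var.set("def-456")
    second = pool.submit(current_id)

print(first.result(), second.result())  # abc-123 def-456
```

Both tasks run on the same worker thread, yet each one sees the ID that was current when it was submitted. This is the capture-at-submission behavior the solution relies on.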
Solution 3: Use the runtime's built-in context. Many modern runtimes have a built-in concept of request context:
- Context
- CoroutineContext (Kotlin)
- context.Context (Go)
- AsyncLocalStorage (Node.js)
If your stack already uses one of these, integrate correlation IDs into that context rather than reinventing it.
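Python's counterpart is the contextvars module. asyncio integrates with it so that each task runs in a copy of the context that was current when the task was created. Here is a minimal sketch with hypothetical handler names:

```python
import asyncio
import contextvars

correlation_id_var = contextvars.ContextVar("correlation_id", default="<missing>")


async def call_payment_service() -> str:
    await asyncio.sleep(0.01)  # yield to the event loop, as real I/O would
    return f"payment call carries {correlation_id_var.get()}"


async def handle_request(cid: str) -> str:
    correlation_id_var.set(cid)  # done once, e.g. by middleware
    # create_task copies the current context, so the child task sees the same ID.
    return await asyncio.create_task(call_payment_service())


async def main() -> list:
    # gather wraps each coroutine in its own task, so the two requests stay isolated.
    return await asyncio.gather(handle_request("abc-123"), handle_request("def-456"))


results = asyncio.run(main())
print(results)  # ['payment call carries abc-123', 'payment call carries def-456']
```

Each request sets the ID once. Concurrent requests on the same event loop thread never see each other's values, which is exactly what a thread-local could not guarantee here.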
Once you introduce queues, you no longer have HTTP headers. The same principle still applies: put the correlation ID in message metadata.
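Here is a minimal Python sketch of the idea. It uses an in-memory queue.Queue as a stand-in for a real broker, where the metadata would map to something like Kafka record headers or AMQP message properties. The message layout and function names are assumptions.

```python
import contextvars
import json
import queue

correlation_id_var = contextvars.ContextVar("correlation_id", default="")

# In-memory stand-in for a broker; "metadata" plays the role of message headers.
broker = queue.Queue()


def publish(event_type: str, payload: dict) -> None:
    """Producer side: copy the current correlation ID into message metadata."""
    message = {
        "metadata": {"correlation_id": correlation_id_var.get(), "event_type": event_type},
        "payload": payload,
    }
    broker.put(json.dumps(message))


def consume_one() -> str:
    """Consumer side: restore the ID from metadata before doing any work."""
    message = json.loads(broker.get())
    token = correlation_id_var.set(message["metadata"]["correlation_id"])
    try:
        # Real work goes here; logs and outgoing calls now see the original request's ID.
        return f"[{correlation_id_var.get()}] processing {message['metadata']['event_type']}"
    finally:
        correlation_id_var.reset(token)  # avoid leaking this ID into the next message


correlation_id_var.set("abc-123")  # set while handling the original user request
publish("payment.requested", {"order_id": "order-42"})
correlation_id_var.set("")  # the consumer normally runs elsewhere, with no ID
result = consume_one()
print(result)  # [abc-123] processing payment.requested
```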
This way, an async payment job is still tied back to the original user request.
In real systems, one ID is rarely enough. Different IDs answer different questions, and mixing them up leads to confusion. A good observability design uses a small set of IDs with clear meanings and consistent propagation rules.
The key idea is: each ID has a scope. Some identify a single request. Others connect many requests into a user journey. Others exist only for tracing.
Here are the IDs you will see most often:
| ID Type | Scope | Purpose |
|---|---|---|
| Correlation ID | Single request across services | Link logs for debugging |
| Session ID | Multiple requests from one session | Track user journey |
| User ID | All activity by one user | User-centric debugging |
| Trace ID | Single request (tracing systems) | Distributed tracing |
| Span ID | Single operation within a trace | Trace hierarchy |
| Request ID | Single HTTP request | Per-service request tracking |
Debugging a single failed request: Search by correlation ID to see all services involved.
Investigating a user's experience: Search by user ID to see all their requests over time.
Analyzing a session: Search by session ID to see the sequence of user actions.
Performance analysis: Use trace ID with distributed tracing tools for timing data.
A good log entry often includes multiple IDs because each one helps in a different way. You are not adding noise; you are making logs searchable from multiple angles.
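Here is a minimal Python sketch of such an entry, assuming JSON log output produced by a custom logging.Formatter. The JsonFormatter class, the field names, and the ID values are hypothetical. In practice, a filter or middleware would attach the IDs rather than passing them by hand.

```python
import io
import json
import logging
import time


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object, copying any known ID fields it carries."""

    ID_FIELDS = ("correlation_id", "trace_id", "span_id", "user_id", "session_id")

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "ts": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(record.created)),
            "level": record.levelname,
            "service": record.name,
            "msg": record.getMessage(),
        }
        for field in self.ID_FIELDS:
            if hasattr(record, field):
                entry[field] = getattr(record, field)
        return json.dumps(entry)


stream = io.StringIO()  # captured here so the output is easy to inspect
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("payment-service")
logger.addHandler(handler)
logger.propagate = False

# Hypothetical ID values, passed by hand only to keep the sketch short.
logger.error("Payment failed", extra={
    "correlation_id": "abc-123",
    "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
    "user_id": "user-42",
    "session_id": "sess-9f8e",
})
print(stream.getvalue(), end="")
```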
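This lets you:
- Follow a single request across services (correlation_id)
- Open the matching trace for timing data (trace_id)
- See everything one user did over time (user_id)
- Reconstruct the sequence of actions in one session (session_id)
Correlation IDs are the foundation. They give you a shared identifier across services. Distributed tracing builds on that by adding structure and timing.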
If you use W3C Trace Context (traceparent) for propagation, you get compatibility with modern tracing systems (OpenTelemetry, Jaeger, Zipkin, many managed APMs).
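Here is a minimal Python sketch of reading and forwarding a version-00 traceparent header. The header has four hyphen-separated fields: version, a 32-hex-digit trace ID, a 16-hex-digit parent ID, and 2 hex digits of flags. The sketch keeps the trace ID and generates a new parent ID for the next hop. The function names are hypothetical, the parser rejects versions other than 00 for simplicity, and a real service would normally let its tracing library handle this.

```python
import re
import secrets
from typing import NamedTuple, Optional

# version-traceid-parentid-flags, all lowercase hex
TRACEPARENT_RE = re.compile(r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


class TraceParent(NamedTuple):
    version: str
    trace_id: str
    parent_id: str
    flags: str


def parse_traceparent(header: str) -> Optional[TraceParent]:
    """Parse a version-00 traceparent; return None if it is malformed."""
    match = TRACEPARENT_RE.fullmatch(header.strip())
    if match is None:
        return None
    tp = TraceParent(*match.groups())
    if tp.version != "00" or tp.trace_id == "0" * 32 or tp.parent_id == "0" * 16:
        return None  # unsupported version or all-zero (invalid) IDs
    return tp


def outgoing_traceparent(incoming: Optional[TraceParent]) -> str:
    """Keep the trace ID (or start a new trace) and use a fresh parent ID for this hop."""
    trace_id = incoming.trace_id if incoming else secrets.token_hex(16)
    flags = incoming.flags if incoming else "01"  # "01" = sampled
    return f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"


incoming = parse_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
print(incoming.trace_id)               # 4bf92f3577b34da6a3ce929d0e0e4736
print(outgoing_traceparent(incoming))  # same trace ID, new parent ID
```

Because the trace ID survives every hop unchanged, it can serve as the correlation ID when your conventions allow it.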
A practical approach many teams adopt:
- Use traceparent for tracing context propagation.
- Let the tracing library add trace_id and span_id to log entries automatically.
- Optionally keep an X-Correlation-ID header for clients and support workflows.
Whether you keep correlation ID separate or reuse the trace ID depends on your tooling and org conventions. The most important thing is that engineers can reliably search and correlate.
Create the correlation ID at the first entry point:
Do not generate new IDs in downstream services. If they do not receive one, that is a bug in propagation.
Every log entry must include the correlation ID. Use logging framework features to automate this:
Include in:
Include the correlation ID in response headers so clients can reference it:
When a user reports an issue, they can provide this ID for faster debugging.
Prefer standard or widely-used header names:
| Header | Usage |
|---|---|
| traceparent | W3C Trace Context (best for tracing compatibility) |
| X-Request-ID | Common convention |
| X-Correlation-ID | Common convention |
| X-B3-TraceId | Zipkin B3 format |
Avoid creating custom names unless you have a specific reason.
Correlation IDs are simple in theory: generate once, propagate everywhere, log consistently. In practice, most failures come from a few predictable mistakes. Fixing them early saves hours during incidents.
If any service fails to propagate the ID, everything downstream becomes disconnected. You will see logs that look correct inside a service, but you cannot stitch the full request together.
Fix: Add tests that verify correlation ID propagation. Use tracing tools to detect missing context.
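A propagation test can be as small as the following Python unittest sketch. A fake HTTP client records outgoing headers, and the test asserts that the incoming ID reaches every downstream call. create_order, RecordingClient, and the payments URL are hypothetical stand-ins for your own handler and client.

```python
import contextvars
import unittest

correlation_id_var = contextvars.ContextVar("correlation_id", default="")


class RecordingClient:
    """Fake HTTP client that records the URL and headers of every outgoing call."""

    def __init__(self):
        self.calls = []

    def post(self, url: str, headers: dict) -> None:
        self.calls.append((url, dict(headers)))


def create_order(client: RecordingClient, incoming_headers: dict) -> None:
    """Hypothetical handler under test: it must forward the incoming ID downstream."""
    correlation_id_var.set(incoming_headers.get("X-Correlation-ID", ""))
    client.post("http://payments/charge", headers={"X-Correlation-ID": correlation_id_var.get()})


class CorrelationIdPropagationTest(unittest.TestCase):
    def test_every_downstream_call_carries_the_incoming_id(self):
        client = RecordingClient()
        create_order(client, {"X-Correlation-ID": "abc-123"})
        self.assertTrue(client.calls, "expected at least one downstream call")
        for url, headers in client.calls:
            self.assertEqual(headers.get("X-Correlation-ID"), "abc-123", f"ID dropped on {url}")
```

Run it with python -m unittest. The same shape works for message publishers: record the outgoing metadata and assert that the ID is present.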
This is one of the most common mistakes. Each service generates its own correlation ID, so no one can correlate across services.
Fix: Only the entry point generates the ID. All other services receive and propagate it.
Thread-local and request-scoped context works fine in synchronous flows. Async breaks it because work often resumes on a different thread.
Fix: Use context-aware async primitives or explicitly pass the ID.
The correlation ID is most valuable when something goes wrong, because it gives support and engineers a handle to find the exact logs fast.
Make sure it appears in:
If a client reports “checkout failed,” you want them to paste an ID, not a screenshot.
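Correlation IDs should not be used as metric labels. Every unique label value creates a new time series, so millions of unique IDs cause a cardinality explosion that can overwhelm your metrics system.
Fix: Keep correlation IDs in logs and traces, and limit metric labels to low-cardinality values such as service, endpoint, or status code.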
Correlation IDs transform distributed debugging from guesswork to precision:
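Key implementation points:
- Generate the ID once, at the entry point.
- Propagate it in headers on every synchronous call and in metadata on every message.
- Handle it in middleware so every log entry includes it automatically.
- Carry the context explicitly across threads and async boundaries.
- Return it in response headers so clients and support can reference it.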
Common ID formats: UUID v4, ULID, Snowflake IDs, or custom formats with timestamps. Use standard or widely used headers such as traceparent or X-Correlation-ID for compatibility.
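Pitfalls to avoid:
- Breaking the propagation chain in any service
- Generating a new ID in every service
- Losing context across async boundaries
- Hiding the ID from error responses and support workflows
- Using correlation IDs as metric labels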