Last Updated: February 3, 2026
Let's say your metrics dashboard shows that p99 latency spiked from 200ms to 2 seconds. You know something is wrong, but where? The request touches 8 services. Is it the database? The payment gateway? Network latency between services? A slow cache lookup?
Logs can tell you what happened in each service, but piecing together the timeline across 8 services is tedious. Metrics show aggregate latency but not which component is slow. Correlation IDs link logs together but do not show timing.
Distributed tracing solves this by recording the journey of each request through your system, including exactly how long each step took. It shows you a timeline of every service call, database query, and external API request. When latency spikes, you can look at slow traces and immediately see where the time went.
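To make "recording the journey of each request" concrete, here is a minimal in-process sketch, not a real tracing library. It assumes a single process and uses hypothetical names (`Tracer`, `Span`, `span()`). Each `with tracer.span(...)` block records one timed step, and nesting is tracked with a stack so every step knows which step called it. Real tracers also carry the trace and parent IDs across service boundaries, typically in request headers such as the W3C `traceparent` header; this sketch leaves that out.

```python
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass
class Span:
    """One timed step of a request (hypothetical structure for illustration)."""
    trace_id: str            # shared by every span in the same request
    span_id: str             # unique ID for this step
    parent_id: Optional[str]  # the step that called this one (None for the root)
    name: str
    start: float
    end: float = 0.0

    @property
    def duration_ms(self) -> float:
        return (self.end - self.start) * 1000


class Tracer:
    """Records spans for a single request; nesting is tracked with a stack."""

    def __init__(self) -> None:
        self.trace_id = uuid.uuid4().hex
        self.spans: List[Span] = []
        self._stack: List[Span] = []

    @contextmanager
    def span(self, name: str) -> Iterator[Span]:
        # The innermost open span (if any) is this span's parent.
        parent = self._stack[-1].span_id if self._stack else None
        s = Span(self.trace_id, uuid.uuid4().hex[:16], parent, name, time.perf_counter())
        self._stack.append(s)
        try:
            yield s
        finally:
            # Record the end time even if the wrapped code raised.
            s.end = time.perf_counter()
            self._stack.pop()
            self.spans.append(s)


# Usage: sleeps stand in for a database query and a payment call.
tracer = Tracer()
with tracer.span("GET /checkout"):
    with tracer.span("db.query"):
        time.sleep(0.01)
    with tracer.span("payment.charge"):
        time.sleep(0.05)

# Spans are appended as they finish, so the root comes last.
for s in tracer.spans:
    print(f"{s.name:<16} {s.duration_ms:7.1f} ms  parent={s.parent_id}")
```

Because each span stores its parent's ID, the flat list can later be reassembled into the call tree that a tracing UI draws as a timeline.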
In this chapter, you will learn:
This chapter builds directly on correlation IDs. Distributed tracing is correlation IDs with structure, timing, and visualization.
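The three additions in that sentence map onto fields of a span record. The sketch below is illustrative only: the `Span` fields and the `render_timeline` helper are hypothetical, and the service names and millisecond values are invented. `trace_id` is the correlation ID, `parent_id` adds structure, `start_ms`/`end_ms` add timing, and the text waterfall stands in for the visualization a tracing UI would draw.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Span:
    trace_id: str             # the correlation ID shared by every span in the request
    span_id: str              # unique ID for this step
    parent_id: Optional[str]  # structure: which step called this one
    name: str
    start_ms: float           # timing: offsets from the start of the request
    end_ms: float


def render_timeline(spans: List[Span], width: int = 40) -> List[str]:
    """Render spans as an indented text waterfall, children under their parent."""
    children: Dict[Optional[str], List[Span]] = {}
    for s in spans:
        children.setdefault(s.parent_id, []).append(s)
    origin = min(s.start_ms for s in spans)
    total = (max(s.end_ms for s in spans) - origin) or 1.0  # avoid dividing by zero
    lines: List[str] = []

    def walk(parent_id: Optional[str], depth: int) -> None:
        for s in sorted(children.get(parent_id, []), key=lambda x: x.start_ms):
            # Bar offset and length are proportional to start time and duration.
            left = round((s.start_ms - origin) / total * width)
            length = max(1, round((s.end_ms - s.start_ms) / total * width))
            bar = " " * left + "#" * length
            label = "  " * depth + s.name
            lines.append(f"{label:<22}{bar:<{width}} {s.end_ms - s.start_ms:.0f}ms")
            walk(s.span_id, depth + 1)

    walk(None, 0)
    return lines


# Usage with invented values: one request ("t1") crossing four hypothetical components.
spans = [
    Span("t1", "a", None, "api-gateway", 0, 400),
    Span("t1", "b", "a", "orders-service", 10, 390),
    Span("t1", "c", "b", "db.query", 20, 60),
    Span("t1", "d", "b", "payment-gateway", 70, 380),
]
for line in render_timeline(spans):
    print(line)
```

With only correlation IDs, these four records would be unordered log lines; the parent links and timestamps are what let the long `payment-gateway` bar stand out at a glance.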
A distributed trace is a record of a request's journey through a system. It captures every service, database call, and external API request, along with timing information.
From this trace, you can immediately see:
Without tracing, finding this information would require correlating logs across 5 services and manually calculating timing differences.