Last Updated: January 6, 2026
When discussing system performance, three terms come up repeatedly: latency, throughput, and bandwidth. These concepts are often confused or used interchangeably, but they measure fundamentally different things.
Understanding these metrics is crucial for diagnosing performance problems and making sound design trade-offs.
In this chapter, we will break down each concept, explore how they relate to each other, and discuss when each metric matters most.
Before diving into definitions, let us use a simple analogy. Think of a highway connecting two cities:

- Bandwidth is the number of lanes: how many cars the road could carry at once.
- Throughput is how many cars actually pass through per hour.
- Latency is how long a single car takes to complete the journey.
A highway might have 4 lanes (high bandwidth), but if there is an accident, only 100 cars per hour pass through (low throughput). Meanwhile, each car might take 2 hours to complete the journey (high latency).
This analogy helps explain why these metrics do not always move together.
Latency is the time it takes for a single request to travel from source to destination and back. It measures delay.
In networking, latency is often called round-trip time (RTT), the time from sending a request to receiving a response.
Latency is not a single value. It is the sum of multiple delays:

- Propagation delay: the time a signal needs to physically travel the distance.
- Transmission delay: the time to push all of a packet's bits onto the link.
- Processing delay: the time routers and hosts spend handling the packet.
- Queueing delay: the time spent waiting in buffers along the path.
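To make these components concrete, here is a minimal Python sketch. The link length, packet size, and the processing and queueing values are illustrative assumptions, not measurements:

```python
# Illustrative breakdown of one-way latency for a hypothetical
# cross-country fiber link (all numbers are assumptions for this sketch).

SPEED_OF_LIGHT_FIBER_KM_S = 200_000  # light travels at roughly 2/3 c in fiber

def propagation_delay_ms(distance_km: float) -> float:
    """Time for the signal to physically traverse the link."""
    return distance_km / SPEED_OF_LIGHT_FIBER_KM_S * 1000

def transmission_delay_ms(packet_bits: int, bandwidth_bps: float) -> float:
    """Time to push all of the packet's bits onto the wire."""
    return packet_bits / bandwidth_bps * 1000

# Example: a 1500-byte packet over a 4000 km, 1 Gbps link,
# plus assumed processing and queueing delays.
propagation = propagation_delay_ms(4000)             # ~20 ms
transmission = transmission_delay_ms(1500 * 8, 1e9)  # ~0.012 ms
processing = 0.05   # assumed router/host processing time, ms
queueing = 2.0      # assumed time waiting in buffers, ms

total = propagation + transmission + processing + queueing
print(f"one-way latency ~= {total:.2f} ms")
```

Note how propagation dominates on a long link: no amount of extra bandwidth shortens the 20 ms the signal spends in the fiber.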
Latency is typically measured using percentiles:

- p50 (median): half of all requests are faster than this.
- p95: 95% of requests are faster; the slowest 5% exceed it.
- p99: 99% of requests are faster; this captures the tail.
- p99.9: one request in a thousand is slower than this.
Why percentiles matter: average latency hides outliers. A system with a 10 ms average might have a p99 of 500 ms, meaning 1% of requests take half a second or longer. And since an active user makes many requests, far more than 1% of users feel that tail.
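A short sketch of this effect, using fabricated sample data, shows how the average can look healthy while the p99 reveals the slow tail:

```python
# Minimal sketch: computing latency percentiles from raw samples.
# The sample data here is fabricated purely for illustration.
import random

random.seed(42)
# Simulate 10,000 request latencies in ms: mostly fast, occasionally slow.
samples = [random.gauss(10, 2) for _ in range(9900)] + \
          [random.gauss(400, 50) for _ in range(100)]

def percentile(values: list[float], p: float) -> float:
    """Return the p-th percentile (0-100) by indexing the sorted samples."""
    ordered = sorted(values)
    index = min(len(ordered) - 1, int(len(ordered) * p / 100))
    return ordered[index]

avg = sum(samples) / len(samples)
print(f"avg: {avg:.1f} ms")                      # looks healthy
print(f"p50: {percentile(samples, 50):.1f} ms")  # also looks healthy
print(f"p99: {percentile(samples, 99):.1f} ms")  # exposes the slow tail
```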
Throughput is the amount of work completed per unit of time. It measures volume.
For web systems, throughput is often expressed as requests per second (RPS) or transactions per second (TPS).
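A minimal way to measure achieved throughput is to count completed requests over a fixed window. The `handle_request` function below is a stand-in assumption for real work:

```python
# Minimal sketch: measuring achieved throughput (requests per second)
# of some handler. `handle_request` is a placeholder for real work.
import time

def handle_request() -> None:
    time.sleep(0.001)  # pretend each request takes ~1 ms

def measure_rps(duration_s: float = 2.0) -> float:
    completed = 0
    start = time.monotonic()
    while time.monotonic() - start < duration_s:
        handle_request()
        completed += 1
    return completed / (time.monotonic() - start)

print(f"throughput ~= {measure_rps():.0f} requests/second")
```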
A common confusion: bandwidth is the theoretical maximum capacity, while throughput is the actual achieved rate.
You can never have throughput higher than bandwidth, but throughput is almost always lower due to:

- Protocol overhead: headers, acknowledgements, and handshakes consume capacity.
- Congestion and packet loss, which force retransmissions.
- Latency limits: a small TCP window over a long round trip caps the rate.
- Processing limits at either endpoint.
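As a concrete illustration of protocol overhead alone, the sketch below estimates TCP goodput on a 1 Gbps Ethernet link. The header and framing sizes are typical values, not an exact accounting:

```python
# Sketch: protocol overhead alone keeps throughput below bandwidth.
PAYLOAD = 1460      # bytes of application data per packet (typical TCP MSS)
OVERHEAD = 40 + 38  # TCP/IP headers + Ethernet framing, preamble, and gap

efficiency = PAYLOAD / (PAYLOAD + OVERHEAD)
link_bps = 1e9  # 1 Gbps link
print(f"max goodput ~= {link_bps * efficiency / 1e6:.0f} Mbps "
      f"({efficiency:.1%} of bandwidth)")
```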
For a single-threaded system, throughput is simply the inverse of latency:

Throughput = 1 / Latency

For a multi-threaded system, concurrency multiplies that rate:

Throughput = Concurrency / Latency

This is Little's Law (L = λW) rearranged to solve for the arrival rate.
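Both formulas reduce to a couple of lines of code. The latency and concurrency values here are assumptions chosen for round numbers:

```python
# Sketch of the throughput formulas above (all inputs are assumptions).

def single_threaded_rps(latency_s: float) -> float:
    # One request at a time: throughput is the inverse of latency.
    return 1 / latency_s

def multi_threaded_rps(concurrency: int, latency_s: float) -> float:
    # Little's Law rearranged: throughput = concurrency / latency.
    return concurrency / latency_s

print(single_threaded_rps(0.050))      # 50 ms per request -> 20 RPS
print(multi_threaded_rps(100, 0.050))  # 100 workers -> 2000 RPS
```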
The bottleneck determines maximum throughput. A system is only as fast as its slowest component.
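A quick sketch, with assumed per-stage capacities, makes the point:

```python
# Sketch: end-to-end throughput of a pipeline is capped by its slowest
# stage. The per-stage capacities below are assumed numbers.
stages_rps = {
    "load balancer": 50_000,
    "app server": 8_000,
    "database": 2_000,  # the bottleneck
}
bottleneck = min(stages_rps, key=stages_rps.get)
print(f"max throughput: {stages_rps[bottleneck]} RPS, "
      f"limited by the {bottleneck}")
```

Adding capacity anywhere other than the database changes nothing; only relieving the bottleneck raises end-to-end throughput.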
Bandwidth is the maximum rate at which data can be transferred. It measures capacity.
Bandwidth is typically expressed in bits per second (bps): Kbps, Mbps, Gbps. Note the lowercase b: bits, not bytes, so a 1 Gbps link moves at most 125 megabytes per second.
An important concept that connects bandwidth and latency is the bandwidth-delay product (BDP):

BDP = Bandwidth × Round-Trip Time
BDP represents how much data can be "in flight" at any moment.
Example: a 1 Gbps link with a 100 ms round-trip time gives BDP = 1 Gbps × 0.1 s = 100 Mb = 12.5 MB.
This means 12.5 MB of data can be traveling through the pipe at any instant. If your TCP window size is smaller than BDP, you will not fully utilize available bandwidth.
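The BDP arithmetic is simple enough to verify in a few lines; the function below assumes bandwidth in bits per second and RTT in seconds:

```python
# Sketch: computing the bandwidth-delay product for the example above.

def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> float:
    """Bytes that can be in flight: bandwidth (bits/s) x RTT, over 8."""
    return bandwidth_bps * rtt_s / 8

# 1 Gbps link with 100 ms RTT:
bdp = bdp_bytes(1e9, 0.100)
print(f"BDP = {bdp / 1e6:.1f} MB")  # 12.5 MB
```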