Latency vs Throughput vs Bandwidth

Last Updated: January 6, 2026

Ashish Pratap Singh

When discussing system performance, three terms come up repeatedly: latency, throughput, and bandwidth. These concepts are often confused or used interchangeably, but they measure fundamentally different things.

Understanding these metrics is crucial for:

  • Diagnosing performance bottlenecks
  • Making informed architectural decisions
  • Setting realistic expectations with stakeholders
  • Answering system design interview questions

In this chapter, we will break down each concept, explore how they relate to each other, and discuss when each metric matters most.

The Highway Analogy

Before diving into definitions, let us use a simple analogy. Think of a highway connecting two cities:

  • Bandwidth is the number of lanes on the highway. More lanes mean more cars can travel simultaneously.
  • Throughput is how many cars actually pass through per hour. This depends on traffic conditions, not just the number of lanes.
  • Latency is the time it takes for a single car to travel from one city to the other.

A highway might have 4 lanes (high bandwidth), but if there is an accident, only 100 cars per hour pass through (low throughput). Meanwhile, each car might take 2 hours to complete the journey (high latency).

This analogy helps explain why these metrics do not always move together.

Latency

Latency is the time it takes for a single request to travel from source to destination and back. It measures delay.

In networking, latency is often called round-trip time (RTT), the time from sending a request to receiving a response.

Components of Latency

Latency is not a single value. It is the sum of multiple delays (estimated in the sketch after this list):

  1. Propagation delay: Time for signals to travel through the medium. Light in fiber travels at ~200,000 km/s. A cross-Atlantic request (6,000 km) takes ~30ms just for propagation.
  2. Transmission delay: Time to push bits onto the wire. Depends on packet size and link bandwidth.
  3. Processing delay: Time for routers, load balancers, and servers to process packets.
  4. Queuing delay: Time spent waiting in buffers when components are busy.
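
To make these delays concrete, here is a minimal back-of-envelope sketch in Python. The function, the packet size, and the fixed processing and queuing figures are illustrative assumptions, not measurements:

```python
# Back-of-envelope one-way latency as the sum of its four components.
# All defaults are illustrative assumptions.

SPEED_IN_FIBER_KM_PER_S = 200_000  # light in fiber, roughly 2/3 the speed of light

def one_way_latency_ms(distance_km, packet_bits, link_bps,
                       processing_ms=0.5, queuing_ms=0.0):
    propagation_ms = distance_km / SPEED_IN_FIBER_KM_PER_S * 1000
    transmission_ms = packet_bits / link_bps * 1000
    return propagation_ms + transmission_ms + processing_ms + queuing_ms

# Cross-Atlantic example from above: 6,000 km, one 1,500-byte packet, 1 Gbps link
print(one_way_latency_ms(6_000, 1_500 * 8, 1e9))  # ~30.5 ms, dominated by propagation
```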

Measuring Latency

Latency is typically measured using percentiles:

Metric         Description
p50 (median)   50% of requests are faster than this
p95            95% of requests are faster than this
p99            99% of requests are faster than this
p99.9          99.9% of requests are faster than this

Why percentiles matter: Average latency hides outliers. A system with 10ms average might have p99 of 500ms, meaning 1% of users experience terrible performance.
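
A quick way to see this effect is to compare the mean with percentiles on a synthetic latency sample. The distribution below is invented purely for illustration:

```python
import math
import random
import statistics

random.seed(42)
# Synthetic sample: ~98.5% fast requests (~10 ms) plus a 1.5% slow tail (~500 ms)
latencies = ([random.gauss(10, 2) for _ in range(985)] +
             [random.gauss(500, 50) for _ in range(15)])

def percentile(sample, p):
    """Nearest-rank percentile: the value that p% of the sample falls at or below."""
    ordered = sorted(sample)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

print(f"mean: {statistics.mean(latencies):.1f} ms")  # ~17 ms, looks healthy
print(f"p50:  {percentile(latencies, 50):.1f} ms")   # ~10 ms
print(f"p99:  {percentile(latencies, 99):.1f} ms")   # ~500 ms, what tail users actually feel
```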

What Affects Latency?

Factor                Impact
Geographic distance   More distance = more propagation delay
Network congestion    Causes queuing delays
Server load           Increases processing time
Database queries      Slow queries add latency
DNS resolution        Cold requests need DNS lookup
TLS handshake         Adds 1-2 round trips

Reducing Latency

  • Use CDNs: Serve content from edge locations closer to users
  • Caching: Eliminate round trips by caching at multiple layers (see the sketch after this list)
  • Connection pooling: Avoid repeated connection setup
  • Database optimization: Add indexes, optimize queries
  • Geographic distribution: Deploy servers closer to users
  • Protocol optimization: Use HTTP/2, HTTP/3 (QUIC)
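
As a toy illustration of the caching bullet above, an in-process cache turns a repeated round trip into a local lookup. The fetch_user function and its simulated 50 ms network call are hypothetical:

```python
import time
from functools import lru_cache

@lru_cache(maxsize=1024)
def fetch_user(user_id: int) -> str:
    time.sleep(0.05)  # stand-in for a ~50 ms network round trip
    return f"user-{user_id}"

start = time.perf_counter()
fetch_user(1)  # cold call: pays the round trip
fetch_user(1)  # warm call: served from the in-process cache
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"two calls took ~{elapsed_ms:.0f} ms")  # ~50 ms, not ~100 ms
```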

Throughput

Throughput is the amount of work completed per unit of time. It measures volume.

For web systems, throughput is often expressed as requests per second (RPS) or transactions per second (TPS).

Throughput vs Bandwidth

A common confusion: bandwidth is the theoretical maximum capacity, while throughput is the actual rate achieved.

Metric       Definition                            Example
Bandwidth    Maximum possible data transfer rate   1 Gbps network link
Throughput   Actual data transfer achieved         600 Mbps actual transfer

Throughput can never exceed bandwidth, and in practice it is almost always lower due to:

  • Protocol overhead (headers, acknowledgments); the sketch after this list puts a rough number on it
  • Congestion and packet loss
  • Processing limitations
  • Inefficient resource utilization
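
To quantify protocol overhead, here is a back-of-envelope calculation for full-size TCP/IPv4 packets on Ethernet. The header sizes are standard; treating headers as the only loss is a deliberate simplification:

```python
LINK_BPS = 1e9                  # 1 Gbps link
MTU = 1500                      # bytes per IP packet
IP_TCP_HEADERS = 20 + 20        # IPv4 + TCP headers, no options
ETH_OVERHEAD = 14 + 4 + 8 + 12  # Ethernet header, FCS, preamble, inter-frame gap

payload = MTU - IP_TCP_HEADERS  # 1,460 bytes of application data per packet
goodput = LINK_BPS * payload / (MTU + ETH_OVERHEAD)
print(f"max goodput ≈ {goodput / 1e6:.0f} Mbps")  # ≈ 949 Mbps before any congestion or loss
```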

Calculating Throughput

For a single-threaded system:

Throughput = 1 / Average latency per request

For a multi-threaded system:

Throughput = Number of concurrent workers / Average latency per request
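
Expressed as a minimal sketch (the latencies and worker counts are illustrative):

```python
def single_threaded_throughput(latency_s: float) -> float:
    """Requests per second when one worker handles requests one at a time."""
    return 1.0 / latency_s

def multi_threaded_throughput(workers: int, latency_s: float) -> float:
    """Ideal RPS with N concurrent workers; ignores contention and coordination."""
    return workers / latency_s

print(single_threaded_throughput(0.05))    # 20 RPS at 50 ms per request
print(multi_threaded_throughput(8, 0.05))  # 160 RPS with 8 workers, in the ideal case
```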

What Limits Throughput?

The bottleneck determines maximum throughput. A system is only as fast as its slowest component.
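
A hypothetical pipeline makes this concrete; the per-stage capacities below are invented for illustration:

```python
# End-to-end throughput is capped by the slowest stage in the pipeline.
stage_capacity_rps = {
    "load balancer": 50_000,
    "app servers":   12_000,
    "database":       3_000,
}
bottleneck = min(stage_capacity_rps, key=stage_capacity_rps.get)
print(f"max throughput ≈ {stage_capacity_rps[bottleneck]} RPS, limited by the {bottleneck}")
```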

Improving Throughput

  • Horizontal scaling: Add more servers
  • Vertical scaling: Add more CPU, memory
  • Async processing: Do not block on slow operations
  • Batching: Process multiple items together (modeled in the sketch after this list)
  • Caching: Reduce work by reusing results
  • Connection pooling: Reuse expensive connections
  • Load balancing: Distribute work evenly
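
Here is a toy model of the batching bullet: a fixed per-call overhead amortized across many items. Both cost constants are hypothetical:

```python
PER_CALL_OVERHEAD_S = 0.005  # hypothetical fixed cost per call (network, parsing)
PER_ITEM_COST_S = 0.0001     # hypothetical work per item

def total_time_s(num_items: int, batch_size: int) -> float:
    """Simulated cost of processing items in fixed-size batches."""
    calls = -(-num_items // batch_size)  # ceiling division
    return calls * PER_CALL_OVERHEAD_S + num_items * PER_ITEM_COST_S

print(f"batch size 1:   {total_time_s(10_000, 1):.1f} s")    # ~51 s, overhead dominates
print(f"batch size 500: {total_time_s(10_000, 500):.2f} s")  # ~1.1 s, overhead amortized
```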

Bandwidth

Bandwidth is the maximum rate at which data can be transferred. It measures capacity.

Bandwidth is typically expressed in bits per second (bps): Kbps, Mbps, Gbps.

Types of Bandwidth

Type                Description
Network bandwidth   Capacity of network links (1 Gbps Ethernet)
Memory bandwidth    Rate of data transfer to/from RAM (DDR4: ~25 GB/s)
Disk bandwidth      Read/write speed of storage (SSD: ~500 MB/s)
Bus bandwidth       Internal data transfer rate (PCIe 4.0: ~64 GB/s)

Bandwidth-Delay Product

An important concept connects bandwidth and latency: the bandwidth-delay product (BDP).

BDP = Bandwidth × Round-trip latency

BDP represents how much data can be "in flight" at any moment.

Example:

  • Bandwidth: 1 Gbps = 125 MB/s
  • Latency: 100ms (coast-to-coast US)
  • BDP: 125 MB/s × 0.1s = 12.5 MB

This means 12.5 MB of data can be traveling through the pipe at any instant. If your TCP window size is smaller than BDP, you will not fully utilize available bandwidth.
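
A short sketch ties the example together and shows why the window size matters. The 64 KB figure is the classic TCP window maximum without window scaling; the helper function itself is illustrative:

```python
def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes that can be in flight at once."""
    return bandwidth_bps / 8 * rtt_s

print(f"BDP = {bdp_bytes(1e9, 0.1) / 1e6:.1f} MB")  # 12.5 MB for 1 Gbps at 100 ms RTT

# A TCP window smaller than the BDP caps throughput at window / RTT
window_bytes = 64 * 1024  # 64 KB window, the pre-window-scaling maximum
max_bps = window_bytes / 0.1 * 8
print(f"max throughput ≈ {max_bps / 1e6:.1f} Mbps")  # ≈ 5.2 Mbps on that same link
```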