AlgoMaster Logo

External Sort

Last Updated: May 29, 2026

Ashish

Ashish Pratap Singh

10 min read

Every sorting algorithm covered so far assumes the data fits in RAM. External Sort is the algorithm that runs when it does not. Sorting a 1 TB log file on a machine with 16 GB of memory, or merging the per-shard output of a distributed MapReduce job, or processing a CSV that is too large for pandas to load, all reduce to External Sort.

The algorithm is conceptually simple and has been the standard answer since the 1950s, when "external" meant "magnetic tape." The two phases:

  1. Split-and-sort. Read the input in chunks that fit in memory, sort each chunk with any in-memory algorithm, write the sorted chunk back to disk as a temporary file.
  2. Multi-way merge. Read from all the sorted chunks simultaneously, use a min-heap to merge them into the final sorted output.

This chapter covers the algorithm, the heap-based merge that makes it efficient, the I/O complexity, the tuning knobs that matter in practice (chunk size, fan-in), and the systems that ship External Sort variants internally.

Premium Content

Subscribe to unlock full access to this content and more premium articles.