Last Updated: May 29, 2026
Every sorting algorithm covered so far assumes the data fits in RAM. External Sort is the algorithm that runs when it does not. Sorting a 1 TB log file on a machine with 16 GB of memory, or merging the per-shard output of a distributed MapReduce job, or processing a CSV that is too large for pandas to load, all reduce to External Sort.
The algorithm is conceptually simple and has been the standard answer since the 1950s, when "external" meant "magnetic tape." The two phases:
This chapter covers the algorithm, the heap-based merge that makes it efficient, the I/O complexity, the tuning knobs that matter in practice (chunk size, fan-in), and the systems that ship External Sort variants internally.