Last Updated: May 26, 2026
Replication is the simplest way to protect data: store multiple copies. If one copy is lost, read another one.
The problem is cost. With 3x replication, every 1 TB of user data consumes 3 TB of raw storage. That 200% overhead is acceptable for hot data, but painful for large archives, backups, data lakes, object stores, and cold file systems.
Erasure coding provides fault tolerance with less storage. Instead of storing full copies, it splits data into fragments, computes additional parity fragments, and stores all fragments across different failure domains. If some fragments are lost, the original data can be reconstructed from the surviving fragments.
The trade-off is not subtle: erasure coding saves storage, but makes writes, degraded reads, and repairs more expensive.