Erasure Coding

Last Updated: May 26, 2026

Ashish Pratap Singh

Replication is the simplest way to protect data: store multiple copies. If one copy is lost, read another one.

The problem is cost. With 3x replication, every 1 TB of user data consumes 3 TB of raw storage. That 200% overhead is acceptable for hot data, but painful for large archives, backups, data lakes, object stores, and cold file systems.
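The same arithmetic applies to any replication factor. Here is a minimal sketch; the function name `replication_cost` and its parameters are illustrative and not taken from any particular storage system.

```python
def replication_cost(user_tb: float, copies: int) -> tuple[float, float]:
    """Return (raw storage in TB, overhead as a percentage of user data)."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    raw_tb = user_tb * copies
    # Overhead counts only the extra copies beyond the first.
    overhead_pct = (copies - 1) * 100.0
    return raw_tb, overhead_pct


raw_tb, overhead_pct = replication_cost(user_tb=1.0, copies=3)
print(f"{raw_tb:.0f} TB raw, {overhead_pct:.0f}% overhead")
```

The overhead depends only on the number of copies, so it stays at 200% whether the data set is 1 TB or 1 PB.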

Erasure coding provides comparable fault tolerance with far less raw storage. Instead of storing full copies, it splits data into fragments, computes additional parity fragments, and stores all fragments across different failure domains. As long as the number of lost fragments stays within what the code tolerates (typically the number of parity fragments), the original data can be reconstructed from the surviving fragments.
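The split, parity, and reconstruct steps can be illustrated with the simplest possible code: a single XOR parity fragment. This is a deliberately simplified stand-in. It tolerates only one lost fragment, whereas production systems generally use codes such as Reed-Solomon that tolerate several. The function names `encode` and `reconstruct` are hypothetical, chosen for this sketch.

```python
from functools import reduce
from typing import Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> list[bytes]:
    """Split data into k equal fragments and append one XOR parity fragment."""
    frag_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(frag_len * k, b"\0")  # zero-pad the last fragment
    frags = [padded[i * frag_len:(i + 1) * frag_len] for i in range(k)]
    parity = reduce(xor_bytes, frags)
    return frags + [parity]


def reconstruct(frags: list[Optional[bytes]]) -> list[bytes]:
    """Rebuild at most one missing fragment (marked None) from the survivors."""
    missing = [i for i, f in enumerate(frags) if f is None]
    if len(missing) > 1:
        raise ValueError("single parity can recover only one lost fragment")
    rebuilt = list(frags)
    if missing:
        survivors = [f for f in frags if f is not None]
        # XOR of all surviving fragments equals the missing one.
        rebuilt[missing[0]] = reduce(xor_bytes, survivors)
    return rebuilt


original = b"erasure coding demo"
stored = encode(original, k=4)   # 4 data fragments + 1 parity fragment
stored[2] = None                 # simulate losing one failure domain
recovered = reconstruct(stored)
restored = b"".join(recovered[:4])[:len(original)]  # drop the zero padding
assert restored == original
```

With four data fragments and one parity fragment, this layout stores 1.25x the original data and survives the loss of any single fragment. Tolerating more losses requires more parity fragments and a stronger code than plain XOR.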

The trade-off is not subtle: erasure coding saves storage, but makes writes, degraded reads, and repairs more expensive.
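To put rough numbers on that trade-off, the sketch below compares 3x replication with a hypothetical layout of 6 data fragments plus 3 parity fragments. The (6, 3) parameters are an illustrative assumption, not values from this article. The repair figure assumes a conventional maximum-distance-separable code such as Reed-Solomon, where rebuilding one lost fragment means reading k surviving fragments.

```python
def ec_overhead_pct(k: int, m: int) -> float:
    """Extra raw storage for k data + m parity fragments, as % of user data."""
    return m / k * 100.0


def repair_read_amplification(k: int) -> int:
    """Bytes read per byte rebuilt when one fragment is lost.

    Assumes a classic MDS code: k surviving fragments are read to rebuild
    one fragment of the same size. Replication behaves like the k = 1 case,
    since a lost copy is rebuilt by reading one surviving copy.
    """
    return k


K, M = 6, 3  # hypothetical layout, chosen only for illustration

print(f"3x replication: 200% overhead, tolerates 2 losses, "
      f"repair reads {repair_read_amplification(1)}x the lost bytes")
print(f"EC({K},{M}): {ec_overhead_pct(K, M):.0f}% overhead, tolerates {M} losses, "
      f"repair reads {repair_read_amplification(K)}x the lost bytes")
```

Under these assumptions the erasure-coded layout needs 50% extra storage instead of 200%, but a single repair reads six fragments to rebuild one. Actual costs vary with the code, fragment size, and network layout.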

1. The Problem with Replication
