Last Updated: January 12, 2026
In the previous chapter, we explored key-value stores and their blazing-fast performance for simple lookups.
But what happens when you need that kind of scale for more structured data? What if you have billions of rows, each with potentially hundreds of columns, and you need to write millions of records per second while still querying efficiently?
This is the domain of wide-column databases. They originated from Google's Bigtable, created to handle the scale of Google's web indexing and analytics.
The ideas from Bigtable inspired Apache HBase and Apache Cassandra, which now power some of the largest systems in the world.
Wide-column databases sit somewhere between key-value stores and relational databases.
Like key-value stores, they partition data by key for horizontal scaling. But unlike simple key-value pairs, they organize data into column families, allowing you to store and retrieve groups of related columns efficiently.
The name "wide-column" can be confusing. It does not mean tables with many columns. It refers to a data model where each row can have a different set of columns, and columns are grouped into families that determine storage layout.
A wide-column table consists of:
Notice several key characteristics:
User 1001 has an email; user 1002 has a phone. User 1001 has three preferences; user 1002 has one. No storage is wasted on NULL values because columns are only stored when they have values.
You define column families at table creation. In the example, basic and preferences are the families.
You can add new columns (like phone or timezone) without schema changes. Different rows can have completely different columns within a family.
All columns in a family are stored together on disk. Reading the basic family does not require reading preferences.
| Aspect | Relational | Wide-Column |
|---|---|---|
| Schema | Fixed columns for all rows | Column families fixed, columns dynamic |
| Sparse data | NULL values stored | Only non-null values stored |
| Storage | Row-oriented (all columns together) | Column-family oriented |
| Queries | SQL with joins | Limited queries by row key and columns |
| Transactions | ACID across tables | Limited to single partition |
You can think of wide-column stores as two-dimensional key-value stores:
This two-level structure allows efficient access to specific columns without reading the entire row, which is valuable when rows have many columns.
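The two-level structure can be sketched as a nested map: row key, then column family, then column name. This is a hypothetical in-memory model for illustration, not a real database API; the data mirrors the user 1001/1002 example above.

```python
# Row key -> column family -> column name -> value.
# Only columns that have values are stored; there are no NULL slots.
users = {
    "1001": {
        "basic": {"name": "Alice", "email": "alice@example.com"},
        "preferences": {"theme": "dark", "lang": "en", "tz": "UTC"},
    },
    "1002": {
        "basic": {"name": "Bob", "phone": "+1-555-0100"},
        "preferences": {"theme": "light"},
    },
}

def get_columns(row_key, family, columns=None):
    """Read a whole family, or just the named columns, for one row."""
    data = users.get(row_key, {}).get(family, {})
    if columns is None:
        return data
    return {c: data[c] for c in columns if c in data}

print(get_columns("1001", "basic", ["email"]))  # touches only the basic family
print(get_columns("1002", "preferences"))       # Bob's single preference
```

Because families are addressed separately, fetching `basic` never has to load `preferences`, which is the storage-layout property described above.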
Wide-column databases are optimized for write throughput. The key innovation is the Log-Structured Merge-tree (LSM tree), which converts random writes into sequential writes.
Traditional databases (like relational) use B-trees, which require updating data in place. This means random I/O for each write. LSM trees take a different approach:
Reads are more complex because data might be in multiple locations:
Read amplification: In the worst case, a read might need to check multiple SSTables. Compaction reduces this by merging SSTables, but there is an inherent trade-off between write and read performance.
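The write and read paths can be sketched with a toy LSM tree: writes land in an in-memory memtable, which flushes to an immutable sorted SSTable when full; reads check the memtable first, then SSTables newest-first. The class name and flush threshold are illustrative, not from any real engine.

```python
import bisect

class ToyLSM:
    def __init__(self, memtable_limit=4):
        self.memtable = {}
        self.sstables = []  # each is a sorted list of (key, value), oldest first
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        # In a real engine this also appends to a write-ahead log (sequential I/O).
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # Flush the memtable as one immutable, sorted SSTable.
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        # Read amplification: we may have to probe every SSTable, newest first.
        for table in reversed(self.sstables):
            i = bisect.bisect_left(table, (key,))
            if i < len(table) and table[i][0] == key:
                return table[i][1]
        return None

db = ToyLSM()
for k, v in [("a", 1), ("b", 2), ("c", 3), ("d", 4), ("a", 10)]:
    db.put(k, v)
print(db.get("a"))  # 10: the memtable entry shadows the flushed value
```

Compaction would merge the entries in `sstables` into fewer sorted runs, which is exactly what bounds the number of probes `get` must make.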
Each column family is stored separately, like a mini key-value store:
This separation has important implications:
Reading basic columns does not touch preferences storage.

Apache Cassandra is the most widely deployed wide-column database. It was originally developed at Facebook for inbox search and is now used by Netflix, Apple, Uber, and many others.
Cassandra uses a peer-to-peer architecture with no single point of failure:
Key characteristics:
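The peer-to-peer design maps partition keys to nodes with a consistent-hash token ring, which can be sketched as follows. The virtual-node count and use of MD5 here are illustrative choices, not Cassandra's exact implementation.

```python
import bisect
import hashlib

class Ring:
    """A toy consistent-hash ring: each node owns several token positions."""

    def __init__(self, nodes, vnodes=8):
        # Each node gets `vnodes` tokens, spreading its ownership around the ring.
        self.ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node_for(self, partition_key):
        """Walk clockwise from the key's token to the first node token."""
        tokens = [t for t, _ in self.ring]
        i = bisect.bisect(tokens, self._hash(partition_key)) % len(self.ring)
        return self.ring[i][1]

ring = Ring(["node-a", "node-b", "node-c"])
print(ring.node_for("user:1001"))  # deterministic owner for this key
```

Because ownership is derived from the hash ring rather than assigned by a coordinator, any node can route any request, and adding a node moves only the tokens it takes over.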
Cassandra allows you to choose consistency on a per-query basis:
| Level | Write Behavior | Read Behavior |
|---|---|---|
| ONE | Wait for 1 replica | Read from 1 replica |
| QUORUM | Wait for majority | Read from majority |
| ALL | Wait for all replicas | Read from all replicas |
| LOCAL_QUORUM | Majority in local datacenter | Majority in local datacenter |
Strong consistency formula: If R + W > N, you get strong consistency, where R is the number of replicas a read must contact, W is the number of replicas a write must wait for, and N is the replication factor. When R + W exceeds N, every read quorum overlaps every write quorum, so at least one replica in the read set has the latest write.
For example, with RF=3 and QUORUM reads and writes (2 each), 2 + 2 = 4 > 3, so reads will always see the latest write.
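The rule is simple enough to check mechanically; a minimal sketch (function names are illustrative):

```python
def quorum(n):
    """Smallest majority of n replicas."""
    return n // 2 + 1

def is_strongly_consistent(r, w, n):
    """True when every read quorum must intersect every write quorum."""
    return r + w > n

rf = 3
# QUORUM reads and writes with RF=3: 2 + 2 = 4 > 3
print(is_strongly_consistent(quorum(rf), quorum(rf), rf))  # True
# ONE/ONE with RF=3: 1 + 1 = 2 <= 3, a read may miss the latest write
print(is_strongly_consistent(1, 1, rf))                    # False
```

The same check explains LOCAL_QUORUM's caveat: the majority is computed per datacenter, so the overlap guarantee holds only within that datacenter.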
Cassandra uses CQL, which looks like SQL but has significant limitations:
Key concepts:
The partition key (user_id) determines which node stores the data; the clustering columns activity_date, activity_time determine sort order within a partition.

| SQL Feature | CQL Support |
|---|---|
| Joins | Not supported |
| Subqueries | Not supported |
| Aggregations | Limited (COUNT, SUM, AVG on single partition) |
| WHERE on non-key columns | Requires secondary index or ALLOW FILTERING |
| OR conditions | Not supported |
| ORDER BY arbitrary columns | Only clustering columns, in defined order |
This is not a limitation of Cassandra. It is by design. These restrictions ensure queries can be executed efficiently in a distributed system.
Apache HBase is another major wide-column database, built on top of the Hadoop ecosystem. It was inspired directly by Google's Bigtable paper.
Unlike Cassandra's peer-to-peer design, HBase uses a master-replica architecture:
| Aspect | HBase | Cassandra |
|---|---|---|
| Architecture | Master-replica | Peer-to-peer |
| Consistency | Strong (per region) | Tunable |
| Storage | HDFS | Local disk |
| Ecosystem | Hadoop integration | Standalone |
| Multi-datacenter | Complex | Native support |
| Write availability | May be impacted by master | Always available |
| Use case | Hadoop analytics + real-time | High availability, global scale |
Choose HBase when:
Choose Cassandra when:
Wide-column data modeling is fundamentally different from relational modeling. You design tables around your queries, not around entities.
In relational databases, you normalize data and then figure out queries. In wide-column databases, you start with queries and design tables to serve them:
You will often have multiple tables storing the same data in different structures to serve different queries efficiently.
Wide-column databases excel at time-series data. A common pattern:
A composite partition key of (sensor_id, date) keeps each day's readings together.

You must denormalize data to avoid joins. Store data redundantly in multiple tables:
Trade-off: More storage, more writes, but reads are fast and self-contained.
Prevent hot partitions by distributing data:
This spreads a user's data across multiple partitions, enabling parallel reads and preventing any single partition from becoming too large.
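A common way to do this is bucketing: the partition key combines the entity id with a time bucket, so one user's data is split across many bounded partitions. The monthly bucket below is an illustrative choice; the right granularity depends on write rate.

```python
from datetime import date

def partition_key(user_id, event_day):
    """Combine user id with a time bucket: one partition per user per month."""
    bucket = event_day.strftime("%Y-%m")
    return (user_id, bucket)

# The same user's events land in different partitions as time advances,
# so no single partition grows without bound or becomes a write hot spot.
print(partition_key("u42", date(2026, 1, 12)))  # ('u42', '2026-01')
print(partition_key("u42", date(2026, 2, 3)))   # ('u42', '2026-02')
```

The trade-off is that a query spanning several months must read several partitions, which is why the bucket size should match the typical query window.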
Some wide-column databases support these to reduce denormalization:
Cassandra Materialized Views:
The database automatically keeps the view in sync with the base table. However, this adds write latency and can cause consistency issues.
Secondary Indexes:
Secondary indexes have performance implications. They work well for low-cardinality columns but poorly for high-cardinality or frequently updated columns.
Compaction is the process of merging SSTables to reduce read amplification and reclaim space from deleted data. Different strategies optimize for different workloads.
Groups SSTables of similar size and merges them:
Pros: Good for write-heavy workloads, low write amplification.
Cons: Space amplification (may need 2x storage during compaction), can lead to many SSTables.
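The grouping step can be sketched as follows: SSTables whose sizes fall within a tolerance of a bucket's average join that bucket, and any bucket with enough members becomes a compaction candidate. The 50% tolerance and minimum of 4 tables mirror common defaults but are illustrative here.

```python
def size_tiered_buckets(sizes, tolerance=0.5, min_threshold=4):
    """Group SSTable sizes into buckets of similar size; return compaction candidates."""
    buckets = []
    for size in sorted(sizes):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            if abs(size - avg) <= tolerance * avg:
                bucket.append(size)  # similar enough: merge into this tier
                break
        else:
            buckets.append([size])   # start a new tier
    return [b for b in buckets if len(b) >= min_threshold]

# Four ~100 MB SSTables form a candidate tier; the lone 800 MB table does not.
print(size_tiered_buckets([100, 110, 95, 105, 800]))  # [[95, 100, 105, 110]]
```

Merging a tier produces one larger SSTable that later joins a higher tier, which is where the space amplification noted above comes from: old and new copies coexist during the merge.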
Organizes SSTables into levels with non-overlapping key ranges:
Pros: Bounded read amplification (check fewer SSTables), consistent read latency.
Cons: Higher write amplification (data may be rewritten multiple times as it moves through levels).
Optimized for time-series data where old data is rarely modified:
Each time window gets its own SSTable. Old windows are never compacted together, making TTL-based deletion efficient (just delete the old SSTable file).
Pros: Efficient for time-series, great for TTL workloads. Cons: Requires data to be time-ordered, poor for updates to old data.
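The window assignment and TTL expiry can be sketched in a few lines. The 1-hour window is an illustrative setting; real deployments tune it to the data's TTL.

```python
from datetime import datetime, timezone

WINDOW_SECONDS = 3600  # illustrative: one SSTable window per hour

def window_for(ts):
    """Map a timestamp to the start of its time window (epoch seconds)."""
    epoch = int(ts.timestamp())
    return epoch - epoch % WINDOW_SECONDS

def expire_windows(windows, now, ttl_seconds):
    """Drop whole windows older than the TTL: delete files, not rows."""
    cutoff = window_for(now) - ttl_seconds
    return {w: name for w, name in windows.items() if w >= cutoff}

t1 = datetime(2026, 1, 12, 10, 30, tzinfo=timezone.utc)
t2 = datetime(2026, 1, 12, 10, 59, tzinfo=timezone.utc)
print(window_for(t1) == window_for(t2))  # True: same 1-hour window
```

Because expiry removes an entire window's SSTable at once, no tombstones or merge work are needed for TTL'd data, which is the efficiency claimed above.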
Partitions should be sized appropriately:
| Size | Implication |
|---|---|
| Too small (< 10 KB) | Overhead dominates, many partitions to manage |
| Optimal (10 KB - 100 MB) | Good balance of distribution and efficiency |
| Too large (> 100 MB) | Hot spots, slow reads, compaction issues |
Rule of thumb: Aim for partitions under 100 MB and less than 100,000 rows.
Uneven data distribution causes hot spots:
| Optimization | Write Impact | Read Impact |
|---|---|---|
| More column families | Slightly slower (multiple writes) | Faster (read only needed data) |
| More denormalization | More writes | Faster reads |
| Higher consistency level | Slower writes | Slower reads, but fresher data |
| Larger partitions | Faster writes | Slower reads |
Wide-column databases are the right choice when:
Wide-column databases may not fit when:
Wide-column databases extend the key-value model to handle structured data at massive scale:
| Aspect | Wide-Column Approach |
|---|---|
| Data model | Row key + column families + dynamic columns |
| Storage | LSM trees, column-family oriented |
| Writes | Optimized via sequential I/O |
| Reads | May check multiple SSTables (read amplification) |
| Consistency | Tunable (Cassandra) or strong (HBase) |
| Scaling | Horizontal via consistent hashing or regions |
The next chapter explores graph databases, which take a fundamentally different approach by making relationships first-class citizens.