Every application needs to store data somewhere. But not all data is the same, and not all storage needs are equal. A social media app storing billions of posts has very different requirements from a banking system tracking account balances.
This diversity in data and access patterns has led to a rich ecosystem of database types, each optimized for specific use cases. Understanding these types is not just academic. The database you choose fundamentally shapes your application's architecture, performance characteristics, and operational complexity.
In this chapter, you will learn:
- The major categories of databases and what makes each unique
- Key characteristics and trade-offs of different database types
- How to match database types to specific use cases
- Real-world examples of where each type excels
- How to think about database selection in system design interviews
This chapter provides the landscape. The following chapters will dive deeper into specific comparisons and concepts like ACID transactions and data modeling that cut across database types.
Why Different Database Types Exist
For decades, relational databases dominated: Oracle, MySQL, PostgreSQL, and SQL Server handled everything from inventory systems to websites. For many use cases, they still do.
But as the internet scaled, new problems emerged that relational databases struggled with:
- Massive scale: The data behind social networks with billions of users could not fit on a single machine
- Flexible schemas: Agile development needed databases that did not require schema migrations for every change
- Specialized access patterns: Full-text search, graph traversals, and time-series analytics needed specialized data structures
- Geographic distribution: Global applications needed data replicated across continents with low latency
These pressures birthed the NoSQL movement and a proliferation of specialized databases. Today, the question is not "which database" but "which databases" since many systems use multiple types together.
When asked about database selection, do not jump to a specific technology. First, understand the data model, access patterns, scale requirements, and consistency needs. The right database type emerges from these requirements.
The Database Landscape
Let us explore each category in detail.
Relational Databases (RDBMS)
Relational databases store data in tables with rows and columns. They use SQL (Structured Query Language) for queries and enforce schemas that define the structure of data. Relationships between tables are expressed through foreign keys.
Key Characteristics
How Data is Organized
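As a minimal sketch of tables, a foreign key, and a join, the following uses Python's built-in sqlite3 module with an in-memory database. The users and orders tables and their rows are hypothetical, chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE users (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE orders (
        id      INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id),
        total   REAL NOT NULL
    );
    INSERT INTO users  VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0);
""")

# A join follows the foreign key from orders back to users.
rows = conn.execute("""
    SELECT u.name, COUNT(o.id), SUM(o.total)
    FROM users AS u
    JOIN orders AS o ON o.user_id = u.id
    GROUP BY u.id, u.name
    ORDER BY u.name
""").fetchall()

# The schema rejects an order that points at a user who does not exist.
try:
    conn.execute("INSERT INTO orders VALUES (13, 99, 5.0)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True
```

Here rows holds one summary tuple per user, and fk_rejected records that the database itself, not the application, refused the dangling reference.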
Strengths
- ACID guarantees: Transactions ensure data integrity, critical for financial systems
- Complex queries: Joins across tables enable sophisticated analytics
- Mature ecosystem: Decades of tooling, optimization, and operational knowledge
- Strong consistency: Reads on the primary return the latest committed data, though asynchronous read replicas can lag behind
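To make the atomicity part of ACID concrete, here is a small sketch of a money transfer using Python's sqlite3 module. The accounts table, the transfer helper, and the balances are assumptions made up for this example. The CHECK constraint stands in for a business rule that balances may never go negative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " id TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one transaction. Returns True on success."""
    try:
        # The connection context manager commits if the block succeeds
        # and rolls back if an exception escapes it.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK constraint, so the credit was undone too.
        return False


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))


ok = transfer(conn, "alice", "bob", 30)        # enough funds
after_ok = balances(conn)
failed = transfer(conn, "alice", "bob", 500)   # would overdraw alice
after_failed = balances(conn)
```

The second transfer credits bob before the debit fails, yet after_failed equals after_ok: the partial update never becomes visible.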
Weaknesses
- Scaling limitations: Sharding relational databases is complex and often sacrifices some SQL features
- Schema rigidity: Schema changes can be painful and require migrations
- Not ideal for: Hierarchical data, sparse data, or highly variable schemas
Popular Relational Databases
When to Choose Relational
- Your data has clear relationships and benefits from joins
- You need ACID transactions (financial systems, inventory)
- Your schema is relatively stable
- You need complex queries with aggregations and subqueries
- Your workload fits on a single primary server, optionally with a few read replicas
Key-Value Stores
Key-value stores are the simplest database type. They store data as a collection of key-value pairs, like a giant hash map. You look up values by their key, and that is basically it.
Key Characteristics
How Data is Organized
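The "giant hash map" picture can be sketched in a few lines of Python. The KeyValueStore class, its method names, and the sample keys are hypothetical and do not mirror the API of any particular product.

```python
from typing import Any, Callable, Dict, List, Optional


class KeyValueStore:
    """In-memory stand-in for a key-value store."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[Any]:
        # Direct hash lookup: the only fast access path.
        return self._data.get(key)

    def delete(self, key: str) -> bool:
        if key in self._data:
            del self._data[key]
            return True
        return False

    def scan_by_value(self, predicate: Callable[[Any], bool]) -> List[str]:
        # Without an index on values, this has to visit every entry.
        return [k for k, v in self._data.items() if predicate(v)]


store = KeyValueStore()
store.put("session:abc123", {"user_id": 42, "theme": "dark"})
store.put("session:def456", {"user_id": 7, "theme": "light"})

session = store.get("session:abc123")
dark_sessions = store.scan_by_value(lambda v: v["theme"] == "dark")
```

The value is opaque to the store, which is why get is cheap and scan_by_value is a full pass over the data.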
Strengths
- Blazing fast: Hash-based lookups are O(1) on average, often with sub-millisecond latency for in-memory stores
- Simple operations: Easy to understand and use
- Horizontal scaling: Partition by key for near-linear scalability
- Flexible values: Store whatever you need in the value
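Partitioning by key, mentioned in the list above, can be sketched as hashing each key to a shard number. The shard_for function and the shard count of 4 are assumptions for illustration. A stable hash from hashlib is used because Python's built-in hash() for strings changes between processes.

```python
import hashlib
from collections import Counter


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index in the range [0, num_shards)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


NUM_SHARDS = 4  # hypothetical cluster size

# The same key always lands on the same shard.
first = shard_for("user:1001", NUM_SHARDS)
second = shard_for("user:1001", NUM_SHARDS)

# Many keys spread roughly evenly across shards.
distribution = Counter(shard_for(f"user:{i}", NUM_SHARDS) for i in range(10_000))
```

Plain modulo hashing moves most keys when the number of shards changes, which is why production systems commonly use consistent hashing or fixed hash slots instead.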
Weaknesses
- Limited queries: Lookups go through the exact key, so querying by value means scanning everything unless the store offers secondary indexes
- No relationships: No joins or references between entries
- No complex operations: Aggregations require reading all data
Popular Key-Value Stores
When to Choose Key-Value
- Caching frequently accessed data
- Session storage
- User preferences and settings
- Shopping carts
- Real-time leaderboards and counters
- Any use case where you always know the key
Key-value stores are often the first line of defense before hitting a database. When discussing system design, consider adding a caching layer with Redis or Memcached to reduce load on your primary database.
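The caching layer described above is commonly implemented as the cache-aside pattern: check the cache, fall back to the database on a miss, then populate the cache. The sketch below uses a tiny in-process TTLCache as a stand-in for Redis or Memcached. TTLCache, get_user, and fake_db_lookup are hypothetical names, not part of any client library.

```python
import time


class TTLCache:
    """In-process stand-in for an external cache with per-entry expiry."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired entries count as misses
            return None
        return value

    def set(self, key, value):
        self._entries[key] = (self._clock() + self._ttl, value)


def get_user(user_id, cache, load_from_db):
    """Cache-aside read: try the cache, then the database, then fill the cache."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    user = load_from_db(user_id)
    if user is not None:
        cache.set(key, user)
    return user


db_calls = []


def fake_db_lookup(user_id):
    # Pretend this is an expensive query against the primary database.
    db_calls.append(user_id)
    return {"id": user_id, "name": "Alice"}


cache = TTLCache(ttl_seconds=60)
first = get_user(1, cache, fake_db_lookup)   # miss: goes to the database
second = get_user(1, cache, fake_db_lookup)  # hit: served from the cache
```

After both calls db_calls contains a single entry, which is the load reduction the pattern is meant to deliver. The TTL bounds how stale a cached value can get.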
Document Databases
Document databases store data as semi-structured documents, typically JSON or BSON. Each document is self-contained and can have a different structure from other documents in the same collection.
Key Characteristics
How Data is Organized
Notice how each document can have different fields. The laptop has cpu and ram, while the phone has screen and colors. This flexibility is a core strength of document databases.
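A minimal sketch of two such documents, modeled as Python dictionaries in a list that plays the role of a collection. The field values, the vendor sub-document, and the get_path and find helpers are assumptions for illustration, not the query API of any real document database.

```python
products = [
    {
        "_id": "p1",
        "type": "laptop",
        "price": 1200,
        "cpu": "8-core",
        "ram": "16GB",
        "vendor": {"name": "Acme", "country": "US"},
    },
    {
        "_id": "p2",
        "type": "phone",
        "price": 800,
        "screen": "6.1 inch",
        "colors": ["black", "blue"],
        "vendor": {"name": "Globex", "country": "DE"},
    },
]


def get_path(doc, path):
    """Follow a dotted path such as 'vendor.name' into nested dictionaries."""
    current = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def find(collection, query):
    """Return documents whose fields equal every value in the query."""
    return [
        doc
        for doc in collection
        if all(get_path(doc, path) == expected for path, expected in query.items())
    ]


laptops_with_16gb = find(products, {"type": "laptop", "ram": "16GB"})
acme_products = find(products, {"vendor.name": "Acme"})
phones_with_ram = find(products, {"type": "phone", "ram": "16GB"})
```

Documents that lack a queried field simply do not match, so the two shapes coexist in one collection without a shared schema.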
Strengths
- Flexible schema: Add fields without migrations, great for evolving data models
- Natural mapping: Documents map directly to objects in most programming languages
- Nested data: Embed related data in a single document, avoiding joins
- Developer productivity: Less impedance mismatch between code and database
Weaknesses
- Limited joins: Cross-document queries usually require multiple round trips, denormalization, or a limited join facility such as MongoDB's $lookup aggregation stage
- Potential data duplication: Embedding data leads to redundancy
- Schema management: Flexibility can become chaos without discipline
Popular Document Databases
When to Choose Document
- Your data is naturally hierarchical or nested
- Schema evolves frequently
- You want to avoid complex joins by embedding related data
- Building content management systems, catalogs, or user profiles
- Rapid prototyping where schema is not finalized
Wide-Column Stores
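Wide-column stores organize data into rows identified by a row key, where each row holds its own flexible set of columns, usually grouped into column families. Despite the name, they are not column-oriented analytical databases. They excel at handling massive amounts of data across many servers with high write throughput.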
Key Characteristics
How Data is Organized
Notice that user_1 has email, last_login, and pref_theme, while user_2 only has name and phone. Each row can have different columns.
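The user_1 and user_2 rows can be sketched as a mapping from row key to a per-row dictionary of columns, with row keys kept sorted so that range scans by key are cheap. The WideColumnTable class and the column values are hypothetical. Real systems add column families, timestamps, and on-disk structures that this sketch leaves out.

```python
import bisect


class WideColumnTable:
    """Rows keyed by row key; each row carries its own set of columns."""

    def __init__(self):
        self._rows = {}         # row key -> {column name -> value}
        self._sorted_keys = []  # row keys in sorted order for range scans

    def put(self, row_key, column, value):
        if row_key not in self._rows:
            bisect.insort(self._sorted_keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key][column] = value

    def get_row(self, row_key):
        return dict(self._rows.get(row_key, {}))

    def scan(self, start_key, end_key):
        """Return rows with start_key <= row key < end_key, in key order."""
        lo = bisect.bisect_left(self._sorted_keys, start_key)
        hi = bisect.bisect_left(self._sorted_keys, end_key)
        return [(key, dict(self._rows[key])) for key in self._sorted_keys[lo:hi]]


table = WideColumnTable()
table.put("user_1", "email", "alice@example.com")
table.put("user_1", "last_login", "2024-01-15")
table.put("user_1", "pref_theme", "dark")
table.put("user_2", "name", "Bob")
table.put("user_2", "phone", "555-0100")

user_1_columns = sorted(table.get_row("user_1"))
user_2_columns = sorted(table.get_row("user_2"))
only_user_1 = table.scan("user_1", "user_2")
```

Every access path here goes through the row key, which mirrors why queries in these systems have to be designed around the primary key.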
Strengths
- Massive scale: Built for petabytes of data across thousands of nodes
- High write throughput: Append-only writes, no read-before-write
- Flexible columns: Rows can have different columns without schema changes
- Geographic distribution: Multi-datacenter replication is built in
Weaknesses
- Limited query flexibility: Queries must follow primary key patterns
- No joins: Data must be denormalized
- Operational complexity: Running at scale requires expertise
- Eventual consistency: Depending on the system and the consistency level chosen, reads may return stale data
Popular Wide-Column Stores
When to Choose Wide-Column
- Massive scale (billions of rows, petabytes of data)
- High write throughput requirements
- Time-series data, event logging, audit trails
- Multi-datacenter deployments with tunable consistency
- IoT data ingestion at scale
Wide-column stores like Cassandra are often mentioned for time-series data and messaging systems. If the interviewer mentions "millions of writes per second" or "multi-region availability," consider wide-column stores.
Graph Databases
Graph databases model data as nodes (entities) and edges (relationships). They are optimized for traversing connections, making them ideal for social networks, recommendation engines, and fraud detection.
Key Characteristics
How Data is Organized
Queries naturally express traversals: "Find all friends of Alice who know Python and work at the same company."
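The traversal quoted above can be sketched with a small adjacency structure in Python. The Graph class, the edge labels FRIEND, KNOWS, and WORKS_AT, and all of the people and companies are made up for illustration. A real graph database would express this in its own query language.

```python
from collections import defaultdict


class Graph:
    """Labeled edges stored as (node, label) -> set of neighbor nodes."""

    def __init__(self):
        self._edges = defaultdict(set)

    def add_edge(self, src, label, dst, undirected=False):
        self._edges[(src, label)].add(dst)
        if undirected:
            self._edges[(dst, label)].add(src)

    def neighbors(self, node, label):
        return set(self._edges.get((node, label), set()))


def friends_with_skill_at_same_company(graph, person, skill):
    """Friends of person who know skill and share at least one employer."""
    companies = graph.neighbors(person, "WORKS_AT")
    return sorted(
        friend
        for friend in graph.neighbors(person, "FRIEND")
        if skill in graph.neighbors(friend, "KNOWS")
        and graph.neighbors(friend, "WORKS_AT") & companies
    )


def reachable(graph, start, label, max_hops):
    """Nodes reachable from start within max_hops edges of the given label."""
    seen = {start}
    frontier = {start}
    for _ in range(max_hops):
        frontier = {n for node in frontier for n in graph.neighbors(node, label)} - seen
        seen |= frontier
    return seen - {start}


g = Graph()
for a, b in [("Alice", "Bob"), ("Alice", "Carol"), ("Alice", "Dave"), ("Bob", "Erin")]:
    g.add_edge(a, "FRIEND", b, undirected=True)
for person, company in [("Alice", "Acme"), ("Bob", "Acme"), ("Carol", "Globex"), ("Dave", "Acme")]:
    g.add_edge(person, "WORKS_AT", company)
for person, skill in [("Bob", "Python"), ("Carol", "Python"), ("Dave", "Go")]:
    g.add_edge(person, "KNOWS", skill)

matches = friends_with_skill_at_same_company(g, "Alice", "Python")
two_hops = reachable(g, "Alice", "FRIEND", 2)
```

Each step only touches the neighbors of nodes already visited, which is the property that keeps multi-hop queries cheap in a graph model.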
Strengths
- Relationship-centric: Relationships are first-class citizens, not foreign keys
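- Fast traversals: Multi-hop queries follow stored edges directly, so their cost depends on the part of the graph visited rather than on repeated joins over whole tables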
- Intuitive modeling: Matches how we think about connected data
- Pattern matching: Find complex patterns in connected data
Weaknesses
- Not for everything: Simple CRUD without relationships does not benefit
- Scaling challenges: Splitting a graph across machines without cutting many edges is hard, so traversals that cross partitions become expensive
- Learning curve: Graph query languages are different from SQL
Popular Graph Databases
When to Choose Graph
- Social networks (friends, followers, connections)
- Recommendation engines (users who liked X also liked Y)
- Fraud detection (find suspicious transaction patterns)
- Knowledge graphs and semantic search
- Network and IT operations (dependencies, impact analysis)
- Supply chain and logistics optimization
Specialized Databases
Beyond the main categories, specialized databases optimize for specific data types and access patterns.
Time-Series Databases
Time-series databases are optimized for timestamped data points, which are common in monitoring, IoT, and financial applications.
Use when: You are storing metrics, sensor data, stock prices, or any time-ordered data that needs time-based aggregations.
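A typical time-based aggregation is downsampling: grouping raw points into fixed-width time buckets and averaging each bucket. The sketch below does this in plain Python. The downsample_mean function and the sample readings are hypothetical, and timestamps are assumed to be integer Unix seconds.

```python
from collections import defaultdict


def downsample_mean(points, bucket_seconds):
    """Average (timestamp, value) points per bucket of bucket_seconds.

    Returns a list of (bucket_start, mean) sorted by bucket_start.
    """
    totals = defaultdict(lambda: [0.0, 0])  # bucket_start -> [sum, count]
    for timestamp, value in points:
        bucket_start = timestamp - (timestamp % bucket_seconds)
        totals[bucket_start][0] += value
        totals[bucket_start][1] += 1
    return [
        (bucket_start, total / count)
        for bucket_start, (total, count) in sorted(totals.items())
    ]


# CPU readings in percent, one every 30 seconds or so.
readings = [(0, 10.0), (30, 20.0), (60, 30.0), (90, 50.0), (125, 7.0)]
per_minute = downsample_mean(readings, bucket_seconds=60)
# per_minute == [(0, 15.0), (60, 40.0), (120, 7.0)]
```

Time-series databases run this kind of rollup natively and at scale, usually alongside retention rules that discard or compact old raw points.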
Search Engines
Search engines are optimized for full-text search, relevance ranking, and faceted navigation.
Use when: Users need to search through text, you need faceted search (filter by category, price range), or you need log aggregation and analysis.
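The core data structure behind full-text search is the inverted index, which maps each term to the documents that contain it. The sketch below is a deliberately simplified version: the InvertedIndex class, the tokenizer, and the sample documents are assumptions, and ranking is a plain term-frequency sum rather than a production scoring function such as BM25.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    def __init__(self):
        # term -> {doc_id -> number of occurrences in that document}
        self._postings = defaultdict(dict)

    def add(self, doc_id, text):
        for term in tokenize(text):
            counts = self._postings[term]
            counts[doc_id] = counts.get(doc_id, 0) + 1

    def search(self, query):
        """Return ids of documents containing every query term, best match first."""
        terms = tokenize(query)
        if not terms:
            return []
        matching = None
        for term in terms:
            docs = set(self._postings.get(term, {}))
            matching = docs if matching is None else matching & docs
        scores = {
            doc_id: sum(self._postings[term][doc_id] for term in terms)
            for doc_id in matching
        }
        # Higher score first; ties broken by document id for stable output.
        return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


index = InvertedIndex()
index.add(1, "Redis is a fast key-value store")
index.add(2, "PostgreSQL is a relational database")
index.add(3, "A fast relational database with fast indexes")

fast_docs = index.search("fast")
relational_docs = index.search("relational database")
no_docs = index.search("graph")
```

Looking up a term costs one dictionary access regardless of how many documents exist, which is what makes text search fast compared with scanning every row.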
Vector Databases
Vector databases store and search high-dimensional vectors (embeddings), which makes them essential for AI applications like semantic search and recommendations.
Use when: Building semantic search, recommendation systems, image similarity, or RAG (Retrieval-Augmented Generation) applications.
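Similarity search boils down to finding the stored vectors closest to a query vector. The sketch below uses cosine similarity with a brute-force scan. The two-dimensional vectors and the cosine_similarity and top_k functions are toy assumptions, since real embeddings have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, vectors, k):
    """Return the k keys whose vectors are most similar to query."""
    scored = [(cosine_similarity(query, vector), key) for key, vector in vectors.items()]
    scored.sort(reverse=True)
    return [key for _, key in scored[:k]]


# Hypothetical embeddings: similar concepts point in similar directions.
embeddings = {
    "cat": [1.0, 0.0],
    "kitten": [0.9, 0.1],
    "car": [0.0, 1.0],
}

nearest = top_k([1.0, 0.0], embeddings, k=2)
# nearest == ["cat", "kitten"]
```

A brute-force scan compares the query against every vector. Vector databases avoid that cost with approximate nearest neighbor indexes such as HNSW, trading a little accuracy for much lower latency.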
Choosing the Right Database
Decision Framework
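One way to make the selection process concrete is a rule list that maps requirements to a database category. The sketch below is only a rough heuristic distilled from the "When to Choose" lists in this chapter. The Requirements fields, the rule order, and the fallback to relational are all assumptions, and real decisions weigh several factors at once.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    similarity_search: bool = False
    full_text_search: bool = False
    time_ordered_metrics: bool = False
    relationship_traversals: bool = False
    acid_transactions: bool = False
    key_only_lookups: bool = False
    massive_write_throughput: bool = False
    evolving_nested_schema: bool = False


# Checked top to bottom; the first matching rule wins.
# The ordering is an assumption made for this sketch.
RULES = [
    ("similarity_search", "vector database"),
    ("full_text_search", "search engine"),
    ("time_ordered_metrics", "time-series database"),
    ("relationship_traversals", "graph database"),
    ("acid_transactions", "relational database"),
    ("key_only_lookups", "key-value store"),
    ("massive_write_throughput", "wide-column store"),
    ("evolving_nested_schema", "document database"),
]


def suggest_database_type(requirements):
    """Return the first category whose trigger requirement is set."""
    for field_name, database_type in RULES:
        if getattr(requirements, field_name):
            return database_type
    # With no special requirement, fall back to the general-purpose default.
    return "relational database"


session_cache = suggest_database_type(Requirements(key_only_lookups=True))
payments = suggest_database_type(
    Requirements(acid_transactions=True, key_only_lookups=True)
)
default_choice = suggest_database_type(Requirements())
```

A first-match rule list returns a single answer, whereas systems with several of these requirements often end up combining more than one database type.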
Quick Reference Table
Polyglot Persistence
Modern systems often use multiple database types, each for what it does best.
This approach is called polyglot persistence. It adds complexity but lets you use the right tool for each job.
Summary
The database landscape has evolved from relational-database-for-everything to a rich ecosystem of specialized tools:
- Relational databases remain the workhorse for structured data with complex queries and ACID requirements
- Key-value stores provide blazing-fast lookups for caching and simple data
- Document databases offer schema flexibility for evolving applications
- Wide-column stores handle massive scale and high write throughput
- Graph databases excel at relationship-heavy data and traversals
- Specialized databases optimize for specific patterns like time-series, search, and vectors
The key insight is that database selection is not about finding the "best" database. It is about matching the database characteristics to your specific requirements. Often, the answer is multiple databases working together.