What is WhatsApp?
WhatsApp is a widely used instant messaging application that enables real-time communication between users. Users can send text messages, media files, and other content to individuals or groups, with messages typically delivered within milliseconds.
The core idea is deceptively simple: User A sends a message, and User B receives it instantly. However, achieving this at scale with billions of users, while ensuring message delivery guarantees, handling offline users, and supporting group conversations, introduces significant distributed systems challenges.
Other Popular Examples: Facebook Messenger, Telegram, Signal, WeChat
In this chapter, we will dive into the high-level design of a messaging system like WhatsApp.
This problem is a favorite in system design interviews because it tests your understanding of real-time communication, connection management, message ordering, and handling the complexities of a truly global-scale system.
Let's start by clarifying the requirements.
1. Clarifying Requirements
Before diving into the design, it's important to ask thoughtful questions to uncover hidden assumptions, clarify ambiguities, and define the system's scope more precisely.
Here is an example of how a discussion between the candidate and the interviewer might unfold:
Candidate: "What is the expected scale? How many users and messages per day should the system support?"
Interviewer: "Let's design for 500 million daily active users (DAU) sending an average of 40 messages per day."
Candidate: "Should we support only one-on-one messaging, or also group chats?"
Interviewer: "Both. Group chats should support up to 500 members."
Candidate: "What types of content should messages support? Text only, or also media like images and videos?"
Interviewer: "Focus on text messages for the core design. You can mention media handling at a high level, but detailed media processing is out of scope."
Candidate: "Do we need to show online/offline status and typing indicators?"
Interviewer: "Yes, presence indicators (online/offline/last seen) are important. Typing indicators are nice-to-have."
Candidate: "What about message delivery guarantees? Should users see read receipts?"
Interviewer: "Yes. Users should see when their message is delivered and when it's read. Messages should never be lost."
Candidate: "Should messages be stored permanently, or can they expire?"
Interviewer: "Messages should be stored until explicitly deleted by the user. We need to support message history sync across devices."
Candidate: "What about end-to-end encryption?"
Interviewer: "You can mention it conceptually, but detailed cryptographic implementation is out of scope."
After gathering the details, we can summarize the key system requirements.
1.1 Functional Requirements
- One-on-One Chat: Users can send and receive messages in real-time with other users.
- Group Chat: Users can create groups and send messages to multiple recipients (up to 500 members).
- Message Delivery Status: Users can see delivery receipts (sent, delivered, read).
- Online Presence: Users can see whether their contacts are online or offline, and their last seen time.
- Message History: Users can access their message history and sync across multiple devices.
- Push Notifications: Offline users receive push notifications for new messages.
Out of Scope:
- Media Messages: Images, videos, voice notes (mentioned conceptually only).
- Voice/Video Calls: Real-time audio and video communication.
- End-to-End Encryption: Detailed cryptographic implementation.
- Stories/Status Updates: Ephemeral content sharing.
1.2 Non-Functional Requirements
- Low Latency: Messages should be delivered within milliseconds for online users. Target: p99 < 100ms for message delivery.
- High Availability: The system must be highly available (99.99% uptime). Users expect messaging to work 24/7.
- Reliability: Messages must never be lost. Once sent, a message should eventually be delivered, even if the recipient is offline.
- Scalability: Support 500M+ daily active users and 20B+ messages per day.
- Ordering: Messages within a conversation should appear in the correct order.
- Consistency: Eventual consistency for presence; strong consistency for message delivery.
2. Back-of-the-Envelope Estimation
To understand the scale of our system, let's make some reasonable assumptions.
Assumptions
- Daily Active Users (DAU): 500 million
- Messages per user per day: 40
- Average message size: 100 bytes (text content + metadata)
- Average group size: 20 members
- Percentage of group messages: 30%
Message Throughput
- Total messages per day: 500M users x 40 messages = 20 billion messages/day
- Average messages per second: 20B / 86,400 = ~230,000 messages/second
- Peak load (3x factor): ~700,000 messages/second
Connection Load
- Concurrent connections: If 10% of DAU are online at any time = 50 million concurrent connections
- Peak concurrent connections: ~100 million
Each connection requires maintaining a persistent WebSocket, which is a significant infrastructure challenge.
Storage (Per Day)
- Message storage: 20B messages x 100 bytes = 2 TB/day
- Annual storage: 2 TB x 365 = 730 TB/year (just for messages)
Bandwidth
- Incoming bandwidth: 230K msg/sec x 100 bytes = ~23 MB/sec (inbound)
- Outgoing bandwidth: Higher due to group message fanout. For a message sent to a group of 20, it needs to be delivered 20 times.
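The arithmetic above can be double-checked with a short script. The constants are the assumptions stated at the start of this section:

```python
# Back-of-the-envelope estimation for the messaging system.
DAU = 500_000_000          # daily active users
MSGS_PER_USER = 40         # messages per user per day
MSG_SIZE = 100             # bytes per message (text + metadata)
SECONDS_PER_DAY = 86_400

msgs_per_day = DAU * MSGS_PER_USER                  # 20 billion/day
avg_msgs_per_sec = msgs_per_day // SECONDS_PER_DAY  # ~230K/sec
peak_msgs_per_sec = avg_msgs_per_sec * 3            # ~700K/sec (3x factor)

concurrent_conns = DAU // 10                        # 10% online: 50 million

storage_per_day_tb = msgs_per_day * MSG_SIZE / 1e12   # 2 TB/day
storage_per_year_tb = storage_per_day_tb * 365        # 730 TB/year

inbound_mb_per_sec = avg_msgs_per_sec * MSG_SIZE / 1e6  # ~23 MB/sec

print(msgs_per_day, avg_msgs_per_sec, peak_msgs_per_sec, concurrent_conns)
```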
3. Core APIs
The messaging system needs a minimal but powerful set of APIs. Below are the core APIs required for the basic functionality.
1. Send Message
Endpoint: WebSocket message or POST /messages
Sends a message from one user to another user or group.
Request Parameters:
- sender_id (required): ID of the user sending the message.
- recipient_id (required): ID of the recipient user or group.
- message_type (required): Type of recipient (user or group).
- content (required): Message content (text).
- client_message_id (required): Client-generated unique ID for deduplication.
- timestamp (required): Client-side timestamp when message was created.
Sample Response:
- message_id: Server-generated unique message ID.
- status: Current status (sent, delivered, read).
- server_timestamp: Server-side timestamp for ordering.
Error Cases:
- 400 Bad Request: Invalid message format or missing required fields.
- 403 Forbidden: User not authorized to send to this recipient.
- 429 Too Many Requests: Rate limit exceeded.
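To make the Send Message contract concrete, here is a sketch of the request and response shapes using the parameters listed above. The specific ID formats and values are illustrative, not part of the specification:

```python
import time
import uuid

# Hypothetical send-message request built from the parameters above.
# ID formats ("user_123", "srv_000001") are illustrative assumptions.
send_request = {
    "sender_id": "user_123",
    "recipient_id": "user_456",
    "message_type": "user",                      # "user" or "group"
    "content": "Hello!",
    "client_message_id": str(uuid.uuid4()),      # client-side dedup key
    "timestamp": int(time.time() * 1000),        # client clock, milliseconds
}

# Hypothetical server response.
send_response = {
    "message_id": "srv_000001",                  # server-generated unique ID
    "status": "sent",                            # sent | delivered | read
    "server_timestamp": int(time.time() * 1000), # authoritative ordering
}
```

The client-generated `client_message_id` is what makes retries safe: the server can deduplicate on it, as covered in the delivery-guarantees deep dive.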
2. Fetch Messages
Endpoint: GET /conversations/{conversation_id}/messages
Retrieves message history for a conversation.
Request Parameters:
- conversation_id (required): ID of the conversation.
- cursor (optional): Pagination cursor for fetching older messages.
- limit (optional): Number of messages to fetch (default: 50, max: 100).
Sample Response:
- messages: Array of message objects with id, sender, content, timestamp, status.
- next_cursor: Cursor for fetching the next page.
- has_more: Boolean indicating if more messages exist.
3. Update Message Status
Endpoint: POST /messages/{message_id}/status
Updates the delivery status of a message (delivered, read).
Request Parameters:
- message_id (required): ID of the message.
- status (required): New status (delivered or read).
- timestamp (required): When the status change occurred.
4. Get User Presence
Endpoint: GET /users/{user_id}/presence
Gets the online status and last seen time of a user.
Sample Response:
- user_id: ID of the user.
- status: Current status (online, offline).
- last_seen: Timestamp of last activity (if offline).
4. High-Level Design
At a high level, our system must satisfy three core requirements:
- Real-time Message Delivery: Messages should reach online recipients instantly.
- Offline Message Handling: Messages for offline users should be stored and delivered when they come online.
- Group Message Distribution: A single message should be efficiently distributed to all group members.
The key insight is that messaging is fundamentally a push-based system. Unlike request-response APIs, we need to maintain persistent connections with clients to push messages as they arrive.
Instead of presenting the full architecture at once, we'll build it incrementally by addressing one requirement at a time. This approach is easier to follow and mirrors how you would explain the design in an interview.
4.1 Requirement 1: Real-time One-on-One Messaging
Let's start with the core use case: User A sends a message to User B, who is currently online.
Components Needed
Chat Servers
These are stateful servers that maintain persistent WebSocket connections with clients. Each chat server handles thousands of concurrent connections.
Responsibilities:
- Maintain WebSocket connections with clients
- Receive messages from senders
- Route messages to recipients (directly or via other chat servers)
- Handle connection lifecycle (connect, disconnect, heartbeat)
Session Service
A fast lookup service that maps user IDs to their currently connected chat server.
Responsibilities:
- Track which chat server each online user is connected to
- Update mappings when users connect/disconnect
- Provide O(1) lookup for message routing
This is typically implemented using Redis for its speed and pub/sub capabilities.
Message Service
Handles message persistence and retrieval.
Responsibilities:
- Persist messages to the database
- Generate server-side message IDs and timestamps
- Handle message status updates
Flow: Sending a One-on-One Message
- User A sends a message through their WebSocket connection to Chat Server 1.
- Chat Server 1 sends the message to the Message Service for persistence.
- Message Service stores the message in the database and returns a server-generated message ID and timestamp.
- Chat Server 1 queries the Session Service to find which server User B is connected to.
- Session Service returns that User B is connected to Chat Server 2.
- Chat Server 1 forwards the message to Chat Server 2 (via internal RPC or message queue).
- Chat Server 2 pushes the message to User B through their WebSocket connection.
- User B's client sends an acknowledgment back.
- The delivery status is updated, and User A sees the "delivered" checkmark.
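The flow above can be sketched with a few small classes. This is a deliberately simplified in-process model, with dicts and lists standing in for Redis, the Message Service database, and inter-server RPC; all names are illustrative:

```python
# Minimal sketch of the one-on-one flow: the Session Service maps each
# online user to a chat server, and the sender's server forwards via it.

class SessionService:
    def __init__(self):
        self._sessions = {}                 # user_id -> ChatServer

    def register(self, user_id, server):
        self._sessions[user_id] = server

    def lookup(self, user_id):
        return self._sessions.get(user_id)  # None means offline

class ChatServer:
    def __init__(self, name, sessions, message_store):
        self.name = name
        self.sessions = sessions
        self.store = message_store          # stands in for Message Service
        self.inbox = {}                     # user_id -> delivered messages

    def send(self, sender_id, recipient_id, content):
        # Persist first, then look up where the recipient is connected.
        self.store.append((sender_id, recipient_id, content))
        target = self.sessions.lookup(recipient_id)
        if target is not None:
            target.deliver(recipient_id, content)   # forward to their server
            return "delivered"
        return "queued"                             # offline path (Section 4.2)

    def deliver(self, user_id, content):
        self.inbox.setdefault(user_id, []).append(content)

sessions, store = SessionService(), []
cs1 = ChatServer("cs1", sessions, store)
cs2 = ChatServer("cs2", sessions, store)
sessions.register("bob", cs2)               # Bob is online on Chat Server 2
status = cs1.send("alice", "bob", "hi bob") # Alice sends via Chat Server 1
```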
4.2 Requirement 2: Handling Offline Users
What happens when User B is offline? We need to store the message and deliver it when they come online.
Additional Components Needed
Message Queue
For offline users, messages are queued for later delivery.
Responsibilities:
- Store messages for offline users
- Ensure messages are delivered in order when user comes online
- Handle retry logic for failed deliveries
Push Notification Service
Sends push notifications to offline users' devices.
Responsibilities:
- Integrate with APNs (iOS) and FCM (Android)
- Send notifications for new messages
- Handle notification preferences and quiet hours
Flow: Message to Offline User
- User A sends a message to User B.
- Chat Server 1 persists the message in the database via Message Service.
- Chat Server 1 queries Session Service and finds User B is offline.
- The message is added to User B's message queue (pending delivery).
- Push Notification Service sends a push notification to User B's device.
- When User B comes online:
- They establish a WebSocket connection to a Chat Server
- The server fetches all pending messages from the queue
- Messages are delivered in order
- Queue entries are cleared after successful delivery
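The queue-and-drain behavior above can be sketched as follows; an in-memory deque stands in for the durable message queue, and the push-notification step is elided:

```python
from collections import defaultdict, deque

# Sketch of the offline path: messages for offline users are queued in
# order and drained when the user reconnects. Names are illustrative.

offline_queues = defaultdict(deque)

def send_to(user_id, message, online_users):
    if user_id in online_users:
        return "pushed"                  # normal WebSocket delivery
    offline_queues[user_id].append(message)
    return "queued"                      # plus a push notification

def on_reconnect(user_id):
    """Drain pending messages in order, then clear the queue."""
    pending = list(offline_queues[user_id])
    offline_queues[user_id].clear()
    return pending

online = set()                           # Bob is offline
send_to("bob", "msg-1", online)
send_to("bob", "msg-2", online)
delivered = on_reconnect("bob")          # both messages, in send order
```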
4.3 Requirement 3: Group Messaging
Group messaging introduces a fanout challenge. When a user sends a message to a group of 100 members, the message needs to be delivered to all 100 recipients.
Approaches to Group Message Fanout
Approach 1: Sender-Side Fanout
The sender's chat server handles delivering to all group members.
Pros: Simple to implement
Cons: Puts heavy load on a single server; doesn't scale for large groups
Approach 2: Message Queue Fanout
Use a message queue with pub/sub capabilities (like Kafka) to distribute the work.
How it works:
- Sender publishes message to a group topic
- Multiple consumers process the message
- Each consumer handles delivery to a subset of group members
Pros: Scales horizontally; work is distributed
Cons: Adds latency; more complex infrastructure
Approach 3: Hybrid Approach (Recommended)
- Small groups (< 100 members): Direct fanout from sender's server
- Large groups (100+ members): Use message queue for distributed fanout
Flow: Group Message Delivery
- User A sends a message to Group G.
- Chat Server 1 persists the message with group_id in the database.
- Chat Server 1 queries the Group Service to get the list of group members.
- For each member, it queries Session Service to find their chat server.
- Messages are batched by destination chat server and forwarded.
- Each chat server delivers to its connected group members.
- Offline members' messages go to the message queue for later delivery.
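The batching step (grouping recipients by destination chat server) and the hybrid size cutoff can be sketched together. The session map, member IDs, and the 100-member threshold name are illustrative:

```python
from collections import defaultdict

# Sketch of group fanout planning: pick a strategy from the group size,
# then batch online members by the chat server they are connected to so
# each destination server is contacted once. Offline members go to the
# message queue for later delivery.

SMALL_GROUP_LIMIT = 100   # below this, fan out directly from the sender

def plan_fanout(members, session_map):
    """Return (strategy, batches, offline).

    batches maps chat server -> list of connected members; members
    missing from session_map are offline.
    """
    strategy = "direct" if len(members) < SMALL_GROUP_LIMIT else "queue"
    batches, offline = defaultdict(list), []
    for member in members:
        server = session_map.get(member)
        if server is None:
            offline.append(member)
        else:
            batches[server].append(member)
    return strategy, dict(batches), offline

sessions = {"u1": "cs1", "u2": "cs1", "u3": "cs2"}   # u4 is offline
strategy, batches, offline = plan_fanout(["u1", "u2", "u3", "u4"], sessions)
```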
4.4 Putting It All Together
Here's the complete architecture combining all requirements:
| Component | Purpose |
|---|---|
| Load Balancer | Distributes WebSocket connections across chat servers |
| Chat Servers | Maintain persistent connections, route messages in real-time |
| API Gateway | Handles REST API requests for non-real-time operations |
| Session Service (Redis) | Maps users to their connected chat server |
| Message Service | Handles message persistence and retrieval |
| Group Service | Manages group membership and metadata |
| Message Queue (Kafka) | Buffers messages for offline users and handles fanout |
| Push Notification Service | Sends push notifications via APNs/FCM |
| Message Database | Stores message history (Cassandra for scale) |
| User Database | Stores user profiles and relationships (PostgreSQL) |
5. Database Design
5.1 SQL vs NoSQL
To choose the right database for messages, let's consider the access patterns:
- Write-heavy workload: 20 billion messages per day
- Simple queries: Fetch messages by conversation, ordered by time
- No complex joins: Messages are self-contained with sender/recipient IDs
- Time-series nature: Recent messages are accessed far more than old ones
- High availability required: Users expect messaging to always work
Given these points, a wide-column NoSQL database like Apache Cassandra or ScyllaDB is ideal for message storage due to:
- Excellent write performance
- Linear horizontal scalability
- Time-series data optimization
- Tunable consistency levels
For user and group data, a relational database like PostgreSQL works well due to the need for transactions and complex queries.
5.2 Database Schema
1. Messages Table (Cassandra)
Stores all messages with partition key optimized for conversation-based queries.
| Field | Type | Description |
|---|---|---|
| conversation_id | UUID (Partition Key) | Unique identifier for the conversation |
| message_id | TimeUUID (Clustering Key) | Time-based UUID for ordering |
| sender_id | UUID | ID of the message sender |
| content | Text | Message content |
| message_type | Text | Type: text, image, video |
| status | Text | Delivery status: sent, delivered, read |
| created_at | Timestamp | Server timestamp |
Partition Key: conversation_id ensures all messages in a conversation are stored together.
Clustering Key: message_id (TimeUUID) ensures messages are sorted by time within each partition.
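The reason a TimeUUID works as a clustering key is that it embeds a timestamp, so sorting by it yields creation order. Python's `uuid.uuid1` is a version-1, time-based UUID and demonstrates the idea (Cassandra's TimeUUID is the same UUID version):

```python
import uuid

# Version-1 UUIDs embed a 60-bit timestamp: sorting on it recovers
# creation order, which is exactly what a TimeUUID clustering key
# gives us within a conversation partition.
ids = [uuid.uuid1() for _ in range(5)]        # generated in order
by_time = sorted(ids, key=lambda u: u.time)   # sort on embedded timestamp
```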
2. User Conversations Table (Cassandra)
Index table to quickly find all conversations for a user.
| Field | Type | Description |
|---|---|---|
| user_id | UUID (Partition Key) | User ID |
| conversation_id | UUID (Clustering Key) | Conversation ID |
| last_message_at | Timestamp | Time of last message |
| unread_count | Integer | Number of unread messages |
| last_message_preview | Text | Preview of last message |
3. Groups Table (PostgreSQL)
Stores group metadata.
| Field | Type | Description |
|---|---|---|
| group_id | UUID (PK) | Unique group identifier |
| name | VARCHAR(100) | Group name |
| creator_id | UUID (FK) | User who created the group |
| created_at | Timestamp | Creation time |
| member_count | Integer | Number of members |
4. Group Members Table (PostgreSQL)
Maps users to groups.
| Field | Type | Description |
|---|---|---|
| group_id | UUID (PK, FK) | Group ID |
| user_id | UUID (PK, FK) | User ID |
| role | VARCHAR(20) | Role: admin, member |
| joined_at | Timestamp | When user joined |
6. Design Deep Dive
Now that we have the high-level architecture and database schema in place, let's dive deeper into some critical design choices.
6.1 WebSocket vs Long Polling vs Server-Sent Events
Real-time message delivery requires maintaining persistent connections between clients and servers. Let's compare the options.
Approach 1: HTTP Long Polling
The client makes an HTTP request, and the server holds it open until new data is available (or timeout).
How It Works
- Client sends HTTP request: "Any new messages?"
- Server holds the connection open (up to 30-60 seconds)
- When a message arrives, server responds immediately
- Client processes the response and immediately makes a new request
- If timeout occurs with no messages, server responds empty and client reconnects
Pros
- Works through all firewalls and proxies
- Simple to implement on the server side
- Compatible with existing HTTP infrastructure
Cons
- High overhead: new TCP connection for each polling cycle
- Latency: messages wait until next poll cycle
- Server resource waste: holding many idle connections
Approach 2: Server-Sent Events (SSE)
A one-way channel where the server can push data to the client over a single HTTP connection.
How It Works
- Client opens a persistent HTTP connection
- Server sends events as they occur
- Connection stays open indefinitely
- Client sends messages via separate HTTP POST requests
Pros
- Lower overhead than long polling
- Automatic reconnection built into the protocol
- Works with HTTP/2 for multiplexing
Cons
- Unidirectional: requires separate channel for client-to-server messages
- Limited browser support for certain features
- Not ideal for bidirectional real-time communication
Approach 3: WebSocket (Recommended)
A full-duplex, bidirectional communication channel over a single TCP connection.
How It Works
- Client initiates WebSocket handshake via HTTP upgrade request
- Server accepts and upgrades the connection
- Both sides can send messages at any time
- Connection stays open until explicitly closed
Pros
- True bidirectional: Both client and server can send messages anytime
- Low latency: No HTTP overhead after initial handshake
- Efficient: Single TCP connection for all messages
- Real-time: Messages delivered instantly
Cons
- Requires WebSocket-aware load balancers
- Stateful connections complicate horizontal scaling
- Connection management overhead (heartbeats, reconnection)
Summary and Recommendation
| Approach | Latency | Efficiency | Complexity | Best For |
|---|---|---|---|---|
| Long Polling | High | Low | Low | Legacy systems, simple notifications |
| SSE | Medium | Medium | Medium | One-way streaming (news feeds, stock tickers) |
| WebSocket | Low | High | High | Bidirectional real-time apps (chat, gaming) |
Recommendation: Use WebSocket for messaging systems. The bidirectional, low-latency nature is essential for chat applications. Implement fallback to long polling for environments where WebSocket is blocked.
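The "HTTP upgrade" handshake mentioned above has one interesting detail worth knowing: the server proves it understood the WebSocket request by hashing the client's `Sec-WebSocket-Key` with a fixed GUID defined in RFC 6455 and returning it as `Sec-WebSocket-Accept`. A minimal implementation:

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the WebSocket handshake.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

def websocket_accept(client_key: str) -> str:
    """Compute the Sec-WebSocket-Accept header for a client key."""
    digest = hashlib.sha1((client_key + WS_GUID).encode()).digest()
    return base64.b64encode(digest).decode()

# Example client key from RFC 6455 itself:
accept = websocket_accept("dGhlIHNhbXBsZSBub25jZQ==")
# accept == "s3pPLMBiTxaQ9kYGzzhZRbK+xOo="
```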
6.2 Message Delivery Guarantees
Users expect three levels of visibility into message status:
- Sent (single checkmark): Message reached the server
- Delivered (double checkmark): Message reached recipient's device
- Read (blue checkmarks): Recipient opened and viewed the message
How Delivery Confirmation Works
Ensuring At-Least-Once Delivery
Messages must never be lost, even if servers crash or networks fail.
Client-Side Retry with Idempotency
- Client generates a unique client_message_id before sending
- Client sends message to server
- If no ACK received within timeout, client retries with same client_message_id
- Server uses client_message_id to deduplicate
- Duplicate messages are acknowledged but not stored twice
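The server side of this retry protocol can be sketched as follows; an in-memory set stands in for a persistent dedup store, and the class name is illustrative:

```python
# Sketch of server-side deduplication: the client_message_id is
# remembered, so a retried send is acknowledged without being stored
# twice. Persist-before-ACK ensures a crash never loses an ACKed message.

class MessageServer:
    def __init__(self):
        self.stored = []
        self._seen = set()              # client_message_id dedup store

    def receive(self, client_message_id, content):
        if client_message_id in self._seen:
            return "ack-duplicate"      # acknowledged, not stored again
        self.stored.append(content)     # persist BEFORE acknowledging
        self._seen.add(client_message_id)
        return "ack"

server = MessageServer()
first = server.receive("c-1", "hello")   # normal send
retry = server.receive("c-1", "hello")   # client timed out and retried
```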
Server-Side Persistence Before Acknowledgment
Critical rule: Never acknowledge a message until it's persisted.
- Server receives message
- Server writes to database
- Only after successful write: Server sends ACK to client
If the server crashes between receiving and persisting, the client will retry.
Handling Out-of-Order Messages
Network conditions can cause messages to arrive out of order. Solutions:
- Sequence numbers per conversation: Each message gets an incrementing sequence number
- Server-side timestamp: Server assigns authoritative timestamp for ordering
- Client-side reordering: Client sorts messages by sequence number before display
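A client-side reorder buffer, combining the sequence-number and reordering ideas above, can be sketched like this (a min-heap buffers early arrivals until the next expected sequence number shows up):

```python
import heapq

# Sketch of client-side reordering: buffer out-of-order messages and
# release them for display only when the next expected sequence number
# has arrived.

class Reorderer:
    def __init__(self):
        self.next_seq = 1
        self._buffer = []                # min-heap of (seq, message)

    def receive(self, seq, message):
        """Return the messages that can now be displayed, in order."""
        heapq.heappush(self._buffer, (seq, message))
        ready = []
        while self._buffer and self._buffer[0][0] == self.next_seq:
            ready.append(heapq.heappop(self._buffer)[1])
            self.next_seq += 1
        return ready

r = Reorderer()
buffered = r.receive(2, "world")   # seq 1 missing: hold it back
released = r.receive(1, "hello")   # seq 1 arrives: release both in order
```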
6.3 Presence System (Online/Offline Status)
Presence indicates whether a user is currently online, offline, or their last seen time.
Challenges
- Scale: With 50 million concurrent users, presence updates are frequent
- Fanout: A user's presence change needs to reach all their contacts
- Consistency: Status should be reasonably accurate without being perfect
Approach 1: Heartbeat-Based Presence
Clients send periodic heartbeats (every 5-10 seconds) to indicate they're online.
How It Works
- Client connects and sends initial "online" signal
- Client sends heartbeat every 5 seconds
- Server marks user online, sets expiry (e.g., 30 seconds)
- If heartbeat stops, user becomes "offline" after expiry
- On disconnect, immediate "offline" status
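The expiry mechanics above can be sketched with plain timestamps; in production this maps naturally to a Redis key with a TTL, but here a dict stands in for it and the TTL value is illustrative:

```python
# Sketch of heartbeat-based presence: each heartbeat refreshes an expiry,
# and a user whose heartbeats stop reads as offline once the TTL passes.
# Explicit clock injection (the `now` parameter) keeps this testable.

HEARTBEAT_TTL = 30.0      # seconds of silence before a user goes offline

class Presence:
    def __init__(self):
        self._expires = {}             # user_id -> expiry time
        self.last_seen = {}            # user_id -> last activity time

    def heartbeat(self, user_id, now):
        self._expires[user_id] = now + HEARTBEAT_TTL
        self.last_seen[user_id] = now

    def status(self, user_id, now):
        if self._expires.get(user_id, 0.0) > now:
            return "online"
        return "offline"               # expired or never seen

p = Presence()
p.heartbeat("alice", now=100.0)
within_ttl = p.status("alice", now=110.0)   # heartbeats still fresh
after_ttl = p.status("alice", now=140.0)    # silence exceeded the TTL
```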
Pros
- Simple to implement
- Works across network disruptions (graceful degradation)
Cons
- Delay in detecting offline status (up to expiry time)
- Heartbeat overhead for millions of users
Approach 2: Presence Channels with Pub/Sub
Use Redis pub/sub to distribute presence updates to interested parties.
How It Works
- User A's contacts subscribe to channel presence:user_a
- When User A's status changes, publish to presence:user_a
- All subscribers receive the update in real-time
Fanout Optimization
For users with many contacts (e.g., 1000), full fanout is expensive. Solutions:
- Lazy presence: Only query presence when user opens a chat
- Presence batching: Batch multiple presence updates together
- Presence on demand: Contacts request presence only when viewing contact list
Last Seen Timestamp
Instead of binary online/offline, show "last seen at [time]":
- Update last_seen timestamp on every user action
- When queried, return the timestamp
- Client displays relative time ("last seen 5 minutes ago")
This provides useful information without real-time presence overhead.
Recommendation
Use heartbeat-based presence with lazy querying:
- Update presence on heartbeat (every 10 seconds)
- Store in Redis with TTL
- Query presence only when needed (opening chat, viewing contacts)
- Avoid broadcasting presence to all contacts
6.4 Message Synchronization Across Devices
Users expect their message history to be available across all their devices (phone, tablet, web).
Sync Strategies
Approach 1: Pull-Based Sync
Client pulls messages it doesn't have by requesting messages after a certain timestamp or sequence number.
Pros: Simple, client controls sync timing
Cons: May miss messages if client is offline for long periods
Approach 2: Push-Based Sync
Server pushes new messages to all connected devices in real-time.
Pros: Instant sync across devices
Cons: Requires tracking all device connections per user
Approach 3: Hybrid (Recommended)
Combine both approaches:
- Real-time push: When a device is connected, push new messages immediately
- Catch-up pull: When a device comes online, pull any messages it missed
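The catch-up pull reduces to a simple query: each device remembers the last sequence number it has seen and asks for everything after it. A sketch (the log structure is illustrative):

```python
# Sketch of catch-up sync: each device tracks the last per-conversation
# sequence number it has seen; on reconnect it pulls everything newer.

conversation_log = [
    (1, "hi"),
    (2, "how are you"),
    (3, "see you at 5"),
]

def catch_up(log, last_seen_seq):
    """Return messages the device has not yet seen, oldest first."""
    return [(seq, msg) for seq, msg in log if seq > last_seen_seq]

missed = catch_up(conversation_log, last_seen_seq=1)
# missed == [(2, "how are you"), (3, "see you at 5")]
```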
Multi-Device Message Delivery
When User A has 3 devices connected:
- Message arrives for User A
- Session Service returns all 3 device connections
- Message is pushed to all 3 devices
- Each device sends independent ACK
- "Delivered" status is set when any device acknowledges
- "Read" status is set when user opens the message on any device
6.5 Scaling Chat Servers
Chat servers are the most resource-intensive component because they maintain millions of persistent WebSocket connections.
Connection Limits
A single server can handle approximately 50,000-100,000 concurrent WebSocket connections (depending on hardware and message throughput).
For 50 million concurrent users, we need: 50M / 50K = 1,000 chat servers
Sticky Sessions
WebSocket connections are stateful. Once established, all messages for that user must go through the same server.
Load balancer configuration:
- Use consistent hashing based on user_id
- Or use connection-aware load balancing
Handling Server Failures
When a chat server crashes:
- All connected clients detect disconnection
- Clients automatically reconnect to another server
- New server registers the connection in Session Service
- Pending messages are fetched from the message queue
- Message delivery resumes
Graceful Shutdown
For planned maintenance:
- Stop accepting new connections
- Notify connected clients to reconnect elsewhere
- Wait for connections to drain (with timeout)
- Shutdown server
6.6 End-to-End Encryption (Conceptual)
End-to-end encryption ensures that only the sender and recipient can read messages. Not even the service provider can decrypt them.
High-Level Approach (Signal Protocol)
- Key Generation: Each device generates a public/private key pair
- Key Exchange: Users exchange public keys when starting a conversation
- Message Encryption: Sender encrypts message with recipient's public key
- Transmission: Encrypted message travels through servers
- Decryption: Only recipient's private key can decrypt
Server's Role
With E2E encryption, the server:
- Can: Route encrypted messages, store encrypted data, manage delivery status
- Cannot: Read message content, provide message content to third parties
Trade-offs
- Security: Strong privacy protection
- Complexity: Key management across devices, handling key changes
- Features limited: Server-side search, spam detection more difficult