WhatsApp is a widely used instant messaging application that allows users to exchange messages, media files, voice notes, and make audio or video calls over the internet. It started as a simple chat app but has evolved into a full-featured communication platform.
With over 2.5 billion active users and more than 100 billion messages exchanged daily, WhatsApp is the world’s most popular messaging app.
In this chapter, we will dive into the high-level design of a messaging system like WhatsApp.
Let’s begin by clarifying the requirements.
Before diving into the design, let's outline the functional and non-functional requirements.
Let’s assume the following traffic characteristics for our chat application:
Total Users: Assume 1 billion registered users.
Daily Active Users (DAU): Around 500 million users actively use the app each day.
Peak Concurrent Connections: Approximately 50 million users connected at peak times.
Average Messages per Day: If each active user sends an average of 10 messages daily, this results in 5 billion messages per day.
Assuming each message is around 1 KB, this amounts to roughly 5 TB of new message data per day.
With 50 million users connected concurrently during peak times, the system must hold tens of millions of persistent connections open at once.
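As a quick back-of-envelope check based only on the assumptions above, we can translate these numbers into an average write rate and a daily storage footprint (the 2x peak factor below is an illustrative assumption):

```python
# Back-of-envelope estimates derived from the assumed traffic numbers above.
MESSAGES_PER_DAY = 5_000_000_000      # 500M DAU * 10 messages per user per day
AVG_MESSAGE_SIZE_BYTES = 1_024        # ~1 KB per message (assumption)
SECONDS_PER_DAY = 24 * 60 * 60

avg_msgs_per_second = MESSAGES_PER_DAY / SECONDS_PER_DAY
daily_storage_tb = MESSAGES_PER_DAY * AVG_MESSAGE_SIZE_BYTES / 1e12

print(f"Average write rate : ~{avg_msgs_per_second:,.0f} messages/second")
print(f"Peak write rate    : ~{2 * avg_msgs_per_second:,.0f} messages/second (assuming ~2x average)")
print(f"Message data       : ~{daily_storage_tb:.1f} TB/day, ~{daily_storage_tb * 365:,.0f} TB/year")
```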
The Chat Servers manage a large number of concurrent connections, facilitate real-time communication, and ensure that messages are delivered efficiently between users with minimal latency.
To support seamless two-way messaging, a protocol like WebSockets—designed for native bidirectional communication between clients and servers—is ideal. (We'll delve into this in more detail later.)
The load balancer efficiently distributes incoming traffic from users across multiple instances of chat servers and user-facing services, such as the media service.
Here’s how the connection is established between the user and chat servers via the load balancer: the client sends a connection request to the load balancer, the load balancer forwards it to a healthy chat server, and that server upgrades the request to a persistent WebSocket connection that stays open for the session.
An alternative approach is to use service discovery, which enables users to connect directly to chat servers.
In this case, users first connect to the service discovery layer to identify the chat server they should connect to and then establish a WebSocket connection directly with that server.
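As a minimal client-side sketch (assuming the Python `websockets` library and a hypothetical `wss://chat.example.com` endpoint; with a load balancer the client dials this single address, while with service discovery it would first ask a discovery endpoint which server to dial):

```python
import asyncio
import json

import websockets  # third-party: pip install websockets


async def connect_and_chat(user_id: str, auth_token: str) -> None:
    # With a load balancer, the client always dials one public endpoint and the
    # LB forwards the upgrade request to a healthy chat server. With service
    # discovery, the client would first ask a discovery endpoint which chat
    # server to use, then connect to that server's address directly.
    uri = "wss://chat.example.com/ws"  # hypothetical public endpoint

    async with websockets.connect(uri) as ws:
        # Identify/authenticate over the freshly opened connection.
        await ws.send(json.dumps({"type": "hello", "user_id": user_id, "token": auth_token}))

        # The same connection is then reused for the whole chat session.
        await ws.send(json.dumps({"type": "message", "to": "user_b", "text": "hi!"}))
        reply = await ws.recv()
        print("server says:", reply)


if __name__ == "__main__":
    asyncio.run(connect_and_chat("user_a", "dev-token"))
```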
The User Connection Cache is a fast, in-memory cache (e.g., Redis) that stores each user's active connection details, such as the chat server they’re connected to and their last_active timestamp.
Clients periodically send heartbeat signals to their connected server, and each heartbeat updates the user’s last_active timestamp in the cache.
This setup enables efficient support for online/offline status and last seen functionality.
If the difference between the current time and the last_active timestamp is within a defined threshold (e.g., 3 seconds), the user is shown as online; otherwise, they are marked as offline.
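A minimal sketch of this presence logic, assuming Redis via the redis-py client and a hypothetical `conn:{user_id}` key layout:

```python
import time

import redis  # pip install redis

r = redis.Redis()  # assumes a local Redis instance

ONLINE_THRESHOLD_SECONDS = 3  # matches the example threshold above


def record_heartbeat(user_id: str, chat_server_id: str) -> None:
    # Store which server holds the user's connection and when we last heard from them.
    r.hset(f"conn:{user_id}", mapping={
        "server": chat_server_id,
        "last_active": time.time(),
    })


def is_online(user_id: str) -> bool:
    last_active = r.hget(f"conn:{user_id}", "last_active")
    if last_active is None:
        return False
    return time.time() - float(last_active) <= ONLINE_THRESHOLD_SECONDS
```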
The Notification Service is responsible for delivering real-time notifications to users, especially when they are offline or not actively using the application.
When a user is offline, the chat server forwards the message to the Notification Service.
To enhance efficiency, the chat server can send this message to a message queue rather than directly interacting with the Notification Service and waiting for a response.
The Notification service integrates with external push notification providers like Firebase Cloud Messaging (FCM) and Apple Push Notification Service (APNS) to deliver messages as push notifications to offline users.
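A simplified sketch of this handoff, using an in-process queue as a stand-in for the real message queue; the device-token lookup and the FCM/APNs call are hypothetical stubs, since real provider calls require credentials and registered device tokens:

```python
import json
import queue

# Stand-in for a distributed queue (e.g., a Kafka topic or RabbitMQ queue).
notification_queue: "queue.Queue[str]" = queue.Queue()


def on_message_for_offline_user(recipient_id: str, sender_id: str, text: str) -> None:
    # The chat server only enqueues an event; it never waits on the push provider.
    event = {"recipient_id": recipient_id, "sender_id": sender_id, "preview": text[:80]}
    notification_queue.put(json.dumps(event))


def lookup_device_token(user_id: str) -> str:
    return "device-token-placeholder"  # hypothetical: would come from a device registry


def send_push(device_token: str, body: str) -> None:
    print(f"push to {device_token}: {body}")  # hypothetical: would call FCM or APNs here


def notification_worker() -> None:
    # The Notification Service consumes events and hands them to a push provider.
    while True:
        event = json.loads(notification_queue.get())
        token = lookup_device_token(event["recipient_id"])
        send_push(token, f"New message from {event['sender_id']}: {event['preview']}")
```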
The Message Queue is a distributed, high-throughput queue (e.g., Kafka, RabbitMQ) that temporarily stores messages before they are consumed by the Message Storage Service.
By acting as an intermediary, the Message Queue decouples message storage from real-time message handling on chat servers, reducing latency and enhancing the scalability of the application.
The Message Storage Service is responsible for the reliable storage, fast retrieval, and efficient archival of chat messages.
It consumes incoming messages from the Message Queue and persists them in the Message DB for efficient storage and retrieval.
The Message DB stores all chat messages in a reliable and efficient manner, ensuring users can access past messages.
This database is designed for high-write throughput and efficient retrieval (e.g., Cassandra) to handle the large volume of messages in real-time chat applications.
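One reasonable sketch of this consumer, assuming a Kafka topic named `chat-messages` (via kafka-python) and a Cassandra table partitioned by conversation and clustered by a time-based `message_id`; the keyspace, topic, and column names are illustrative:

```python
import json

from cassandra.cluster import Cluster   # pip install cassandra-driver
from kafka import KafkaConsumer         # pip install kafka-python

# Partition by conversation, cluster by the time-based message_id (fixed-width string),
# so "latest N messages in a chat" is a single-partition range read.
CREATE_TABLE_CQL = """
CREATE TABLE IF NOT EXISTS chat.messages (
    conversation_id text,
    message_id      text,
    sender_id       text,
    body            text,
    PRIMARY KEY ((conversation_id), message_id)
) WITH CLUSTERING ORDER BY (message_id DESC)
"""


def run_storage_consumer() -> None:
    session = Cluster(["127.0.0.1"]).connect()
    session.execute("CREATE KEYSPACE IF NOT EXISTS chat WITH replication = "
                    "{'class': 'SimpleStrategy', 'replication_factor': 1}")
    session.execute(CREATE_TABLE_CQL)
    insert = session.prepare(
        "INSERT INTO chat.messages (conversation_id, message_id, sender_id, body) "
        "VALUES (?, ?, ?, ?)"
    )

    consumer = KafkaConsumer("chat-messages", bootstrap_servers="localhost:9092",
                             value_deserializer=lambda v: json.loads(v.decode()))
    for record in consumer:
        msg = record.value
        # Chat servers never wait on this write; persistence happens asynchronously.
        session.execute(insert, (msg["conversation_id"], msg["message_id"],
                                 msg["sender_id"], msg["body"]))
```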
The Group Service is responsible for handling all group-related functionalities, including creating groups, updating group details and managing group memberships.
When a message needs to be delivered to a group conversation, the Chat Servers query the Group Service to retrieve the current list of group members.
The Group DB stores and retrieves all data associated with group chats, including group IDs, member lists, admin roles, and group settings.
The Media Service handles the uploading and management of multimedia content, such as images, videos, and audio files.
It securely stores media files in a blob storage system, while maintaining metadata—such as file type, size, and upload timestamps—in a separate database for easy access and organization.
By offloading media storage from the main chat servers, the Media Service reduces bandwidth usage on the chat servers and enhances overall app performance.
The Blob Store is the storage backend for a chat application’s multimedia content, including images, videos, audio files, and documents.
It’s designed to handle large volumes of media content while ensuring fast, secure, and reliable access.
The blob store typically leverages cloud-based object storage solutions (such as Amazon S3, Google Cloud Storage, or Azure Blob Storage) that provide high durability, scalability, and cost-effectiveness.
To reduce latency when uploading or downloading multimedia content, files are distributed to locations geographically closer to users via a Content Delivery Network (CDN).
When a user shares a multimedia file, the client application uploads it directly to the CDN, storing it in a location close to the recipient.
Instead of sending the file itself, the client sends the file’s URL to the chat server as part of the message, allowing other users to download and access the content quickly and efficiently from the nearest CDN location.
Once files are uploaded to the CDN, the Media Service retrieves them and stores them in a blob store for long-term storage.
This approach reduces the load on chat servers, minimizes latency, and significantly improves media delivery speed for users.
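A client-side sketch of this flow, assuming the `requests` library and a hypothetical upload endpoint that returns the media URL; the chat message then carries only the URL and metadata:

```python
import mimetypes
import pathlib

import requests  # pip install requests

UPLOAD_ENDPOINT = "https://media.example.com/upload"  # hypothetical CDN-fronted upload URL


def build_media_message(recipient_id: str, file_path: str) -> dict:
    """Upload a file out-of-band, then return a chat message payload carrying only its URL."""
    path = pathlib.Path(file_path)
    content_type = mimetypes.guess_type(path.name)[0] or "application/octet-stream"

    # 1. Upload the bytes directly, so they never flow through the chat servers.
    with path.open("rb") as f:
        resp = requests.post(UPLOAD_ENDPOINT, files={"file": (path.name, f, content_type)})
    resp.raise_for_status()
    media_url = resp.json()["url"]  # hypothetical response shape

    # 2. The message sent over the WebSocket contains only the URL and metadata.
    return {
        "type": "media_message",
        "to": recipient_id,
        "media_url": media_url,
        "content_type": content_type,
        "size_bytes": path.stat().st_size,
    }
```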
For a chat application, the database needs to handle core entities like users, messages and groups.
Here’s a breakdown of a database design that could support a scalable and efficient chat application.
To store user, group, and conversation data, we can use a SQL database like PostgreSQL.
For message data, a NoSQL database like Cassandra is preferred due to its high write throughput.
For media files, an object store like AWS S3 provides scalable and secure storage.
To understand why WebSockets are ideal for real-time messaging, let’s examine other potential solutions and their limitations:
In polling, the client periodically sends HTTP requests to the server to check for new messages.
Drawbacks: Polling can be resource-intensive, especially with high polling frequency. Since the server responds with "no new messages" most times, this approach can add substantial overhead and waste server resources.
In long polling, the client holds an open connection with the server until a new message is available or a timeout occurs. When the server has new data, it responds, and the client immediately re-establishes the connection, restarting the process.
While this reduces the number of requests compared to standard polling, long polling has several limitations: each waiting client still holds a server connection open, connections must be torn down and re-established after every response or timeout, and delivery can still be delayed when a message arrives just after a timeout.
Overall, long polling’s connection and resource demands make it less suited for real-time chat applications.
WebSockets, on the other hand, eliminate the need for repeated HTTP handshakes, headers, and responses, reducing overhead and enhancing performance.
The client and server establish a connection once, and this connection stays open for the entire chat session, enabling seamless data transfer.
This persistent connection makes WebSocket ideal for real-time communication, where both the client and server need to exchange data frequently and in a timely manner.
When a user opens the chat app, it initiates a WebSocket connection with one of the chat servers. This persistent connection remains open throughout the chat session, enabling real-time communication between the client and server without requiring repeated HTTP requests.
Once connected, when User A sends a message to User B, the message travels through User A’s WebSocket connection to the server managing that connection (Server A).
Server A then looks up the user connection cache to determine whether User B is online and, if so, which server currently holds User B's connection.
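A rough sketch of this routing decision on Server A, reusing the Redis connection cache from earlier; the server-to-server forwarding and queue publishing are hypothetical stubs, since their transport is deployment-specific:

```python
import json
import time

import redis  # pip install redis

r = redis.Redis()
ONLINE_THRESHOLD_SECONDS = 3


def route_message(sender_id: str, recipient_id: str, payload: dict) -> None:
    """Called on Server A when a message arrives over the sender's WebSocket."""
    conn = r.hgetall(f"conn:{recipient_id}")

    online = (
        conn
        and time.time() - float(conn[b"last_active"]) <= ONLINE_THRESHOLD_SECONDS
    )

    if online:
        target_server = conn[b"server"].decode()
        # Forward to the server holding User B's WebSocket (internal RPC or pub/sub).
        forward_to_server(target_server, recipient_id, payload)
    else:
        # Recipient is offline: hand off to the Notification Service via the queue.
        enqueue_push_notification(recipient_id, payload)

    # Either way, publish to the message queue so the Message Storage Service persists it.
    enqueue_for_storage(sender_id, recipient_id, payload)


def forward_to_server(server_id, recipient_id, payload):      # hypothetical stub
    print(f"forward to {server_id}: {json.dumps(payload)}")


def enqueue_push_notification(recipient_id, payload):         # hypothetical stub
    print(f"notify {recipient_id}")


def enqueue_for_storage(sender_id, recipient_id, payload):    # hypothetical stub
    print("persist", payload)
```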
WebSockets enable real-time status updates for messages (e.g., “message sent,” “message delivered,” “message read”), providing users with instant feedback on message states.
When User A sends a message, it is transmitted over their WebSocket connection to the server handling their connection (Server A).
If User A is offline when attempting to send a message, the message won’t be sent until they are back online. The message remains in a pending state on User A's device until it reconnects and successfully sends the message to Server A.
Once User B receives the message, their app sends an acknowledgment to Server B.
If User B is offline, Server A will not receive an acknowledgment of delivery from Server B, so the message remains in the “sent” state for User A until User B reconnects.
When User B comes online and receives the pending message, their client acknowledges it to Server B, which then sends a “delivered” acknowledgment to Server A. User A’s app is then updated to reflect the “delivered” status.
When User B opens the chat window and views the message, their app sends a “read” acknowledgment to Server B.
If User B is offline, they cannot view the message, so it will not trigger a “read” status. When User B reconnects and opens the chat, their app will send a “read” acknowledgment to Server B.
Server B logs the “read” status in the message queue and forwards it to Server A. User A’s app then receives this update, marking the message as “read.”
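A small sketch of how a chat server might track these transitions while ignoring out-of-order acknowledgments; the in-memory store and the notify_sender stub are illustrative:

```python
from enum import Enum


class MessageStatus(str, Enum):
    SENT = "sent"            # accepted by Server A
    DELIVERED = "delivered"  # acked by User B's device via Server B
    READ = "read"            # User B opened the chat


# Illustrative in-memory store; a real system would persist this in the Message DB.
message_status: dict[str, MessageStatus] = {}

# Status can only move forward: sent -> delivered -> read.
_ORDER = [MessageStatus.SENT, MessageStatus.DELIVERED, MessageStatus.READ]


def apply_ack(message_id: str, new_status: MessageStatus) -> bool:
    """Apply an acknowledgment; ignore stale or duplicate acks."""
    current = message_status.get(message_id, MessageStatus.SENT)
    if _ORDER.index(new_status) <= _ORDER.index(current):
        return False  # e.g., a late 'delivered' arriving after 'read'
    message_status[message_id] = new_status
    notify_sender(message_id, new_status)  # push the update back over User A's WebSocket
    return True


def notify_sender(message_id: str, status: MessageStatus) -> None:  # hypothetical stub
    print(f"{message_id} -> {status.value}")
```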
In a group chat, every message sent must be distributed (or "fanned out") to each group member.
As group size increases, so does the fan-out workload. For instance, in a group of 500 members, each message requires 500 individual message deliveries, which can quickly overwhelm the server.
That’s why most chat applications put limits on the number of members a group can have (WhatsApp’s current limit is 1,024).
To ensure that all group members can access the message history, Server A pushes the message to the Message Queue for storage.
The Message Storage Service consumes the message from the queue and stores it in the Message DB, where it will be available for group members to retrieve later.
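A simplified fan-out sketch, assuming a Group Service lookup for the member list; `deliver_message` and `enqueue_for_storage` are hypothetical stubs standing in for the per-user delivery path and the message queue:

```python
def fan_out_group_message(sender_id: str, group_id: str, payload: dict) -> None:
    """Deliver one group message to every member except the sender."""
    members = get_group_members(group_id)  # call to the Group Service

    # Delivery work grows linearly with group size, which is why group sizes are capped.
    for member_id in members:
        if member_id == sender_id:
            continue
        deliver_message(member_id, payload)  # WebSocket push if online, push notification if not

    # Persist the message once per group (not once per member); members read
    # history back from the Message DB.
    enqueue_for_storage(sender_id, group_id, payload)


def get_group_members(group_id: str) -> list[str]:   # hypothetical Group Service call
    return ["user_a", "user_b", "user_c"]


def deliver_message(member_id: str, payload: dict) -> None:   # hypothetical stub
    print(f"deliver to {member_id}: {payload}")


def enqueue_for_storage(sender_id: str, group_id: str, payload: dict) -> None:  # hypothetical stub
    print(f"persist group message from {sender_id} in {group_id}")
```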
To retrieve recent messages efficiently, we can use a time-based message ID.
A time-based message ID typically combines a timestamp with a unique identifier for each message.
This allows us to order messages by timestamp, retrieve messages within a specific time range, and support pagination (“load more”), where you retrieve messages before or after a specific timestamp without re-sorting the dataset.
A common format might combine a millisecond-precision timestamp with a short per-millisecond sequence number.
For instance, a message sent at 2024-11-05 12:34:56.789 with sequence 001 could have the ID 20241105123456789001.
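A sketch of a generator for IDs in this format; the in-process sequence counter is illustrative, and a distributed deployment would also embed a node identifier (as Snowflake-style IDs do) to avoid collisions:

```python
import itertools
from datetime import datetime, timezone
from typing import Optional

_sequence = itertools.count(1)  # simple in-process counter, wrapped into 3 digits below


def new_message_id(now: Optional[datetime] = None) -> str:
    """Build an ID like 20241105123456789001: millisecond timestamp + 3-digit sequence."""
    now = now or datetime.now(timezone.utc)
    timestamp_part = now.strftime("%Y%m%d%H%M%S") + f"{now.microsecond // 1000:03d}"
    sequence_part = f"{next(_sequence) % 1000:03d}"
    return timestamp_part + sequence_part


# The example from the text:
example_time = datetime(2024, 11, 5, 12, 34, 56, 789_000, tzinfo=timezone.utc)
print(new_message_id(example_time))  # -> 20241105123456789001 on the first call
```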
In the event of a chat server failure, all clients connected to that server will lose their connection.
To recover, clients automatically attempt to reconnect, this time establishing a new connection with a different available server.
The load balancer continuously monitors the health of each chat server through regular health checks.
If a server goes down, the load balancer immediately stops directing traffic to it, ensuring new connections are routed to healthy servers only.
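A client-side reconnection sketch with exponential backoff and jitter, assuming the `websockets` library and the same hypothetical endpoint as before:

```python
import asyncio
import random

import websockets  # pip install websockets

CHAT_ENDPOINT = "wss://chat.example.com/ws"  # hypothetical; the LB picks a healthy server


async def run_with_reconnect(handle_session) -> None:
    """Keep a chat session alive; on failure, reconnect with exponential backoff."""
    backoff = 1.0
    while True:
        try:
            async with websockets.connect(CHAT_ENDPOINT) as ws:
                backoff = 1.0          # healthy connection: reset the backoff
                await handle_session(ws)
        except (OSError, websockets.ConnectionClosed):
            # The server crashed or the LB dropped it from rotation; wait, then retry.
            delay = backoff + random.uniform(0, backoff)  # jitter avoids thundering herds
            await asyncio.sleep(delay)
            backoff = min(backoff * 2, 60.0)
```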
To support horizontal scaling and efficient data access, we can implement sharding across different data types:
- User data: sharded by user_id. This allows us to distribute user records across multiple servers and scale as the user base grows.
- Message data: sharded by message_id, using a timestamp-based message_id to enable efficient time-based searches. This structure allows recent messages to be accessed quickly and older messages to be located based on timestamp.
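A tiny illustration of how these shard keys might route records, with an illustrative shard count and the time-based message ID format from above:

```python
import hashlib

NUM_USER_SHARDS = 16  # illustrative shard count


def user_shard(user_id: str) -> int:
    """Map a user record to a shard with a stable hash of user_id."""
    digest = hashlib.sha1(user_id.encode()).hexdigest()
    return int(digest, 16) % NUM_USER_SHARDS


def message_shard(message_id: str) -> str:
    """Route a message by the date prefix of its time-based ID (see the format above),
    so messages from the same time range live together and range scans stay local."""
    return f"messages_{message_id[:8]}"  # '20241105...' -> shard 'messages_20241105'


print(user_shard("alice"), message_shard("20241105123456789001"))
```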
With large volumes of messages and multimedia content, optimizing storage costs is essential. Here are some effective strategies: