Google Docs is a cloud-based word processing application that allows users to create, edit, and share documents online. Unlike traditional editors, it enables real-time collaboration where multiple users can work on the same document simultaneously.
Every change is saved automatically, and users can see edits, comments, and suggestions from others in near real time.
Designing a system like Google Docs means solving several hard problems:
Real-time editing with low latency
Consistent document state across multiple users
Conflict resolution when edits overlap
Efficiently storing the documents
Version history support
Fine-grained access control
All of this must scale to millions of users and thousands of concurrent document edits without sacrificing performance.
In this article, we will explore the high-level architecture, low-level details, and the database and API design of a real-time collaborative editing system that supports all these features.
Let’s begin by clarifying the requirements.
1. Requirement Gathering
Before diving into the architecture, let's summarize the core functional and non-functional requirements:
1.1 Functional Requirements
Create & Retrieve Documents: Users should be able to create new documents and retrieve them instantly.
Collaborative Editing: Multiple users should be able to edit the same document simultaneously and see each other’s changes in real time.
Rich Text Formatting: The system should support full document structure and formatting including headings, bold/italic text, lists, hyperlinks, etc.
Live Cursors & Presence: Users should be able to see the cursor positions and presence of others.
Offline Access and Sync: Users should be able to edit documents offline (i.e. without an internet connection), and the system should automatically sync their changes once they reconnect.
Access Control and Sharing: Users should be able to share documents with specific permissions (view-only, comment, or edit).
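To make the sharing requirement concrete, here is a minimal sketch of the three permission levels. It assumes the levels are hierarchical (edit includes comment, comment includes view), which the requirement implies but does not state outright. The names `Permission`, `can`, and `shares` are hypothetical and used only for illustration.

```python
from enum import IntEnum


class Permission(IntEnum):
    """Sharing levels from the requirement, ordered so each includes the ones below it."""
    VIEW = 1
    COMMENT = 2
    EDIT = 3


def can(granted: Permission, required: Permission) -> bool:
    """Return True when the granted level is at least the level an action requires."""
    return granted >= required


# Hypothetical per-document sharing table: user id -> granted level.
shares = {
    "alice": Permission.EDIT,
    "bob": Permission.COMMENT,
    "carol": Permission.VIEW,
}

print(can(shares["bob"], Permission.COMMENT))  # True
print(can(shares["carol"], Permission.EDIT))   # False
```

Ordering the levels numerically keeps the check to a single comparison. A real system would likely also need owner and link-sharing cases, which this sketch leaves out.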
1.2 Non-Functional Requirements
Real-Time Collaboration: Edits should be reflected to all participants within milliseconds.
Scalability: The system should handle millions of users and thousands of documents being edited concurrently.
Version History: The system should keep a history of changes for each document, allowing users to view or revert to earlier versions.
Data Consistency: Despite concurrent edits, all users should eventually see the same final document state.
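The consistency requirement can be illustrated with a toy example. The sketch below uses one well-known approach, operational transformation, restricted to insert operations only. It is an assumption made for illustration, not necessarily the technique this design ends up using, and the names `Insert`, `apply`, `transform`, and `site` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    pos: int   # index in the document where the text is inserted
    text: str
    site: str  # hypothetical client id, used only to break ties


def apply(doc: str, op: Insert) -> str:
    """Apply an insert operation to a document string."""
    return doc[:op.pos] + op.text + doc[op.pos:]


def transform(op: Insert, against: Insert) -> Insert:
    """Shift `op` so it can be applied after `against` has already been applied."""
    earlier = against.pos < op.pos
    tie_lost = against.pos == op.pos and against.site < op.site
    if earlier or tie_lost:
        return Insert(op.pos + len(against.text), op.text, op.site)
    return op


doc = "Hello world"
a = Insert(5, ",", "alice")  # Alice adds a comma after "Hello"
b = Insert(11, "!", "bob")   # Bob appends "!" concurrently

# Alice's replica applies her own edit first, then Bob's transformed edit.
replica_a = apply(apply(doc, a), transform(b, a))
# Bob's replica applies his own edit first, then Alice's transformed edit.
replica_b = apply(apply(doc, b), transform(a, b))

print(replica_a == replica_b)  # True
print(replica_a)               # Hello, world!
```

Both replicas apply the two edits in a different order yet end in the same state, which is the property the requirement asks for. Deletes, formatting changes, and more than two concurrent editors need considerably more machinery than this.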
2. Capacity Estimation
Users and Documents
Monthly Active Users (MAU): 100 million
Daily Active Users (DAU): ~50 million
Peak Concurrent Users: ~1 million
Average Documents per User: 20
Total Documents: 100M users × 20 docs = 2 billion documents
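The arithmetic behind these figures can be checked with a short script. It uses only the numbers listed above; the two ratios are derived values added for a quick sanity check, not additional estimates.

```python
# Figures taken from the estimates above.
MAU = 100_000_000
DAU = 50_000_000
PEAK_CONCURRENT_USERS = 1_000_000
DOCS_PER_USER = 20

total_documents = MAU * DOCS_PER_USER
dau_share = DAU / MAU
peak_share_of_dau = PEAK_CONCURRENT_USERS / DAU

print(f"Total documents: {total_documents:,}")            # 2,000,000,000
print(f"DAU / MAU: {dau_share:.0%}")                      # 50%
print(f"Peak concurrent / DAU: {peak_share_of_dau:.0%}")  # 2%
```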
Document Characteristics
Average Document Size: 100 KB (structured text with formatting, comments, metadata)