What is a Search Autocomplete System?

A search autocomplete system suggests possible queries to users as they type into a search bar. For example, typing "best re" might prompt completions like "best restaurants near me" or "best recipes for dinner."

Autocomplete improves user experience by reducing typing effort, guiding queries, and surfacing popular or trending searches.

In this chapter, we will explore the high-level design of a search autocomplete system.

Let’s begin by clarifying the requirements.

1. Clarifying Requirements

Before diving into the design, it’s important to clarify assumptions and define scope. Here’s an example of how a candidate–interviewer discussion might go:

Discussion

Candidate: Should the system suggest completions based only on historical queries or also on trending data?

Interviewer: Both. Suggestions should come from past queries, but trending searches should be prioritized.

Candidate: Do we need to personalize suggestions for each user?

Interviewer: Personalization is useful, but focus first on generic suggestions.

Candidate: Should suggestions update after each keystroke?

Interviewer: Yes, suggestions should update dynamically as the user types.

Candidate: How many suggestions should we return?

Interviewer: Assume 5–10 ranked suggestions per query.

Candidate: Do we need to support multiple languages?

Interviewer: Start with English, but the design should allow extensions.

Candidate: Should the system filter inappropriate or malicious queries?

Interviewer: Yes, filtering is required to maintain quality and safety.

Candidate: What about scale?

Interviewer: Assume millions of users and queries per second worldwide.

After gathering the details, we can summarize the key system requirements.

1.1 Functional Requirements

Top N Suggestions: The system must return the top N (e.g., 5-10) most relevant suggestions for a given input prefix.
Dynamic Updates: Suggestions should update instantly as the user types each character.
Ranking: Support flexible ranking mechanisms based on popularity, recency, or personalization.
Typo Tolerance: Ideally, the system should handle minor typos and offer corrections.
Multi-language Support: The system should be able to provide suggestions for multiple languages or locales.

1.2 Non-Functional Requirements

Low Latency: Suggestions must be returned within a very tight latency budget, typically < 100 ms per query.
High Scalability: The system needs to scale to millions of active users and billions of queries per day, especially for popular platforms.
Fault Tolerance & High Availability: The system should remain operational even if some components fail.
Near Real-time Updates: The list of popular or trending terms should be updated frequently (e.g., hourly or daily) to reflect current trends.

Scale Estimation

Assume we're building for a large-scale platform like Google Search or Amazon:

500 million daily active users
Each user performs 10 searches per day on average
Each search query generates 5-10 autocomplete requests (one per keystroke)
Total: 25-50 billion autocomplete queries per day
Peak QPS (queries per second): ~1 million QPS during peak hours
Storage: 100 million unique search terms, with metadata

These numbers guide our architectural decisions around caching, sharding, and replication.

2. Understanding the Problem

At its core, "autocomplete" is about efficiently finding words that start with a given prefix and then intelligently ranking them.

It's a blend of efficient data retrieval and smart relevance scoring.

Technically, this means:

Prefix Search: The system needs a way to rapidly search for all terms that begin with the characters the user has typed so far.
Relevance Ranking: Once potential completions are found, they need to be ordered by how likely or useful they are to the user.
Dynamic Updates: The underlying data, especially popularity scores, must be refreshed regularly to stay relevant.
Speed through Caching: Popular queries and their results should be stored closer to the user for even faster access.

Here's a typical workflow:

The user types "sp," the client immediately sends a request, the service looks up matching terms, ranks them based on relevance signals, and returns the top suggestions. This entire round trip needs to complete before the user finishes typing the next character.

3. High-Level Architecture

Premium Content

This content is for premium members only.

Design Search Autocomplete System

Ashish Pratap Singh