A search autocomplete system suggests possible queries to users as they type into a search bar. For example, typing "best re" might prompt completions like "best restaurants near me" or "best recipes for dinner."
Autocomplete improves user experience by reducing typing effort, guiding queries, and surfacing popular or trending searches.
In this chapter, we will explore the high-level design of a search autocomplete system.
Let’s begin by clarifying the requirements.
Before diving into the design, we need to pin down assumptions and define the scope. Here’s an example of how a candidate–interviewer discussion might go:
Candidate: Should the system suggest completions based only on historical queries or also on trending data?
Interviewer: Both. Suggestions should come from past queries, but trending searches should be prioritized.
Candidate: Do we need to personalize suggestions for each user?
Interviewer: Personalization is useful, but focus first on generic suggestions.
Candidate: Should suggestions update after each keystroke?
Interviewer: Yes, suggestions should update dynamically as the user types.
Candidate: How many suggestions should we return?
Interviewer: Assume 5–10 ranked suggestions per query.
Candidate: Do we need to support multiple languages?
Interviewer: Start with English, but the design should allow extensions.
Candidate: Should the system filter inappropriate or malicious queries?
Interviewer: Yes, filtering is required to maintain quality and safety.
Candidate: What about scale?
Interviewer: Assume millions of users and queries per second worldwide.
After gathering the details, we can summarize the key system requirements.
Assume we're building for a large-scale platform like Google Search or Amazon:
These numbers guide our architectural decisions around caching, sharding, and replication.
At its core, autocomplete is about efficiently finding candidate queries that start with a given prefix and then ranking them by relevance.
It blends fast prefix retrieval with relevance scoring.
Technically, this means:
Here's a typical workflow:
The user types "sp" and the client immediately sends a request. The service looks up matching terms, ranks them based on relevance signals, and returns the top suggestions. The entire round trip needs to complete before the user types the next character.
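The lookup-and-rank step of this workflow can be sketched in a few lines. The example below is a minimal illustration, not a production design: it assumes a trie as the prefix index (one common choice among several), and the class names `TrieNode` and `Autocomplete`, the method names, and the scores are all hypothetical. Each score stands in for whatever relevance signal the system combines, such as historical frequency or a trending boost.

```python
import heapq
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class TrieNode:
    children: Dict[str, "TrieNode"] = field(default_factory=dict)
    # Score of the query ending at this node; None if no query ends here.
    score: Optional[float] = None


class Autocomplete:
    """Hypothetical in-memory prefix index (illustrative only)."""

    def __init__(self) -> None:
        self.root = TrieNode()

    def add(self, query: str, score: float) -> None:
        """Insert a query with an assumed, precomputed relevance score."""
        node = self.root
        for ch in query.lower():
            node = node.children.setdefault(ch, TrieNode())
        node.score = score

    def suggest(self, prefix: str, k: int = 5) -> List[str]:
        """Return up to k queries starting with prefix, best score first."""
        prefix = prefix.lower()
        node = self.root
        # Step 1: walk down the trie to the node for the typed prefix.
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []  # nothing indexed under this prefix

        # Step 2: collect every complete query in the subtree.
        matches: List[Tuple[str, float]] = []
        stack = [(node, prefix)]
        while stack:
            current, text = stack.pop()
            if current.score is not None:
                matches.append((text, current.score))
            for ch, child in current.children.items():
                stack.append((child, text + ch))

        # Step 3: rank by score (descending), breaking ties alphabetically.
        top = heapq.nsmallest(k, matches, key=lambda m: (-m[1], m[0]))
        return [text for text, _ in top]


# Made-up data, purely for illustration.
index = Autocomplete()
index.add("spotify", 90)
index.add("sports news", 75)
index.add("spaghetti recipe", 60)
index.add("best restaurants near me", 80)

suggestions = index.suggest("sp", k=2)
print(suggestions)
```

This sketch scans the whole subtree under the prefix on every request, which keeps the code short but would likely be too slow for short prefixes at large scale. A real service would need some way to bound that work, for example by storing a precomputed top-k list at each node.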