Imagine you are building a ride-sharing app like Uber. A user opens the app looking for the nearest available driver. There are 100,000 active drivers in the city. How do you find the 10 closest ones in under 100 milliseconds?
The naive approach is straightforward: calculate the distance from the user to every single driver and return the closest 10. With 100,000 drivers, that means 100,000 distance calculations per request. At 1,000 requests per second, the system performs 100 million calculations per second. This does not scale.
You might think: just add a database index. But here is the problem. Traditional B-tree indexes work on one dimension at a time. They can efficiently answer "find all users with age > 25" but struggle with "find all drivers within 2km" because location has two dimensions that must be queried together.
This is the fundamental challenge of location data. It powers countless applications, from ride-sharing and food delivery to dating apps and real estate searches. Yet the queries it requires are fundamentally different from what traditional databases were designed to handle.
In this chapter, we will explore how to efficiently store, index, and query location data at scale.
These techniques appear in many system design interviews: