Top K Elements is a super useful pattern to find the K largest, k smallest or K most frequent elements in a dataset.
This pattern can be applied to problems like:
A straightforward approach to solve these types of problems is to sort the entire dataset and take the first k or last k elements.
But, this isn’t the most optimal approach since sorting takes O(nlogn) time.
The Top K Elements pattern allows us to solve this problem more optimally using heaps bringing down the time complexity down from O(nlogn) to O(nlogk)
In this chapter, you'll learn:
Here are some common scenarios where this pattern can be applied:
Lets understand this pattern with an example: Find the top k largest elements in an array.
Suppose you have an array:
You need to find the top 3 largest elements. The answer would be: [10, 15, 20].
There are multiple approaches to solve this problem:
The simplest approach is to sort the array in descending order and take the first K elements.
However, sorting the array takes O(n log n) time, which can be too slow for large datasets.
So while sorting works, it’s not the most optimal solution.
Instead, we can use heap data structure to solve this problem more efficiently.
There are two main types of heaps: max heap and min heap.
In a max heap, the largest element is at the top and each parent node is larger than its children.
A min heap is the opposite.
The smallest element is at the top, and every each node is smaller than its children.
The heap maintains this property whenever we add or remove elements from it.
We can solve this problem using both types of heaps, but each works a bit differently.
Let’s start with the max heap approach.
Here’s how it works:
heapq module only supports min heaps, we can simulate a max heap by pushing negative values into the heap.n takes O(log n) time.The space complexity of this approach is O(n) because we’re adding all the elements to the heap.
Now, let’s look at the more efficient approach using a min heap.
Here’s how it works:
Let’s break down the time and space complexity of this approach:
N-K elements, the heappushpop operation takes O(log K) time.The space complexity is O(K) because we’re only storing K elements in the heap.
Let’s walk through the min heap approach with the example array and find the top 3 largest elements.
1. Initialize the min heap with the first 3 elements: [7, 10, 4].
After heapifing it becomes ➔ [4, 10, 7].
Now process the rest of the array:
3 is smaller than 4, so we skip it.
20 is greater than 4, so we remove 4 and insert 20.
15 is greater than 7, so we remove 7 and insert 15
After processing all elements, the heap contains [10, 15, 20], which are the top 3 largest elements.
For this problem, using a min heap is more efficient compared to a max heap, because it only keeps track of K elements at a time, instead of storing all elements in the heap.
However, if you need to find the K smallest elements instead, you would use a max heap approach.