AlgoMaster Logo

Find Median from Data Stream

Last Updated: April 4, 2026

hard
4 min read

Understanding the Problem

Finding a median in a static, sorted array is trivial: just grab the middle element (or average the two middle elements if the length is even). The twist here is that numbers arrive one at a time as a stream, and after each insertion we might be asked for the current median. So the real challenge is maintaining some structure that lets us both insert efficiently and retrieve the median quickly.

A naive approach would be to keep a sorted list and insert each number into its correct position. That gives us O(1) median access but O(n) insertion because of shifting. On the other end, we could just append and sort on every findMedian call, but that is O(n log n) per query. Neither scales well when we have up to 50,000 operations.

The key observation is that we don't need the entire list sorted. We only need to know the one or two elements sitting right at the middle. If we could split the data into a "lower half" and an "upper half" and quickly access the largest element of the lower half and the smallest element of the upper half, we'd have the median instantly.

Key Constraints:

  • -10^5 <= num <= 10^5 → Values are bounded integers. This opens the door to counting sort approaches for the follow-up, but for the general solution we need a comparison-based structure.
  • At most 5 * 10^4 calls → With up to 50,000 operations, O(n) per operation gives 2.5 billion in the worst case, which is too slow. We need O(log n) per addNum to stay comfortable.
  • At least one element before findMedian → No need to handle the empty case for median queries.

Approach 1: Sort on Each Query

Intuition

The simplest way to find a median: keep all numbers in a list, sort it whenever someone asks for the median, and grab the middle. No fancy data structures, no balancing logic. Just brute force it.

This is exactly what you'd do if someone handed you a pile of index cards with numbers and asked for the median. You'd lay them all out in order and pick the middle one.

Algorithm

  1. Maintain a list of all numbers added so far.
  2. When addNum(num) is called, append num to the list.
  3. When findMedian() is called, sort the list.
  4. If the list has odd length, return the middle element.
  5. If the list has even length, return the average of the two middle elements.

Example Walkthrough

1Initialize: data = [], no elements yet
[]
1/6

Code

Every time findMedian is called, we re-sort the entire list from scratch. What if we maintained a structure that stays partially ordered as we insert, giving us instant access to the middle element(s)?

Approach 2: Two Heaps (Max-Heap + Min-Heap)

Intuition

Here's the key insight: to find the median, we only care about the element(s) in the middle. We don't need the entire collection sorted. We just need to know the boundary between the lower half and the upper half.

Imagine splitting all numbers into two groups. The "small" group holds the smaller half of all numbers, and the "large" group holds the larger half. If we could instantly access the largest number in the small group and the smallest number in the large group, we'd have the median.

That's exactly what two heaps give us. A max-heap for the lower half gives us the largest small number at its root. A min-heap for the upper half gives us the smallest large number at its root. The median is always at the boundary between these two heaps.

The balancing rule is simple: keep the heaps roughly equal in size. Specifically, let the max-heap (lower half) have at most one more element than the min-heap. This way, if the total count is odd, the median is the top of the max-heap. If the total count is even, the median is the average of both heap tops.

Algorithm

  1. Create two heaps: maxHeap (lower half) and minHeap (upper half).
  2. For addNum(num):
    • Add num to maxHeap.
    • Move the top of maxHeap to minHeap (this ensures the largest element from the lower half goes to the upper half if needed).
    • If minHeap has more elements than maxHeap, move the top of minHeap back to maxHeap.
  3. For findMedian():
    • If the heaps are equal size, return the average of both tops.
    • Otherwise, return the top of maxHeap (since it holds one extra element).

Example Walkthrough

1Initialize: maxHeap = [], minHeap = []
[]
1/6
1Initialize: minHeap = []
[]
1/6

Code