Find Median from Data Stream
Problem
Design a data structure that supports adding numbers from a data stream and finding the median of all elements added so far.
- At most 5 * 10⁴ calls will be made to addNum and findMedian.
- −10⁵ ≤ num ≤ 10⁵
Example
addNum(1), addNum(2), findMedian(), addNum(3), findMedian()1.5, 2After adding 1 and 2, the data structure contains [1, 2]. The median is (1 + 2) / 2 = 1.5. After adding 3, the data structure contains [1, 2, 3]. The median is 2, the middle element.
Approach
Straightforward Solution
A naive approach is to store all numbers in a list and sort it after each insertion. This leads to O(n log n) time per insertion, which is inefficient for large data streams.
Core Observation
The median divides a sorted list into two halves: the lower half and the upper half. Maintaining these halves separately allows efficient median retrieval. The lower half contains the smaller numbers, and the upper half contains the larger numbers. The median is either the maximum of the lower half, the minimum of the upper half, or the average of both, depending on the total number of elements.
Path to Optimal
PreviewThe key recognition signals are 'median' and 'data stream', which indicate the need for a data structure that supports efficient insertion and median retrieval…
Full step-by-step walkthrough on Pro →
Optimal Approach
PreviewUse two heaps: a max-heap to store the smaller half of the numbers and a min-heap to store the larger half. When adding a number, push it into the max-heap (inverted sign to simulate max-heap)…
Full step-by-step walkthrough on Pro →
Want the full reasoning chain?
Unlock the complete walkthrough, line-by-line analysis, and recall drill.
Unlock ProTime
O(log n) per insertion, O(1) per median retrieval
Each insertion involves pushing into a heap and possibly rebalancing, each operation costing O(log n). Median retrieval is O(1) by peeking at the heap tops.
Space
O(n)
All inserted elements are stored in the two heaps, so space grows linearly with the number of elements.
Pattern Spotlight
Heap / Priority Queue (Dual Heap Median Maintenance)
Maintain two heaps to represent the lower and upper halves of the data stream, balancing their sizes to ensure the median can be retrieved in O(1) time, while insertions remain O(log n).
Solution
| 1 | import heapq
|
| 2 |
|
| 3 | class MedianFinder:
|
| 4 | def __init__(self):
|
| 5 | self.small = []
|
| 6 | self.large = []
|
| 7 |
|
| 8 | def addNum(self, num: int) -> None:
|
| 9 | heapq.heappush(self.small, -1 * num)
|
| 10 |
|
| 11 | if (self.small and self.large and
|
| 12 | (-1 * self.small[0]) > self.large[0]):
|
| 13 | val = -1 * heapq.heappop(self.small)
|
| 14 | heapq.heappush(self.large, val)
|
| 15 |
|
| 16 | if len(self.small) > len(self.large) + 1:
|
| 17 | val = -1 * heapq.heappop(self.small)
|
| 18 | heapq.heappush(self.large, val)
|
| 19 | if len(self.large) > len(self.small) + 1:
|
| 20 | val = heapq.heappop(self.large)
|
| 21 | heapq.heappush(self.small, -1 * val)
|
| 22 |
|
| 23 | def findMedian(self) -> float:
|
| 24 | if len(self.small) > len(self.large):
|
| 25 | return -1 * self.small[0]
|
| 26 | if len(self.large) > len(self.small):
|
| 27 | return self.large[0]
|
| 28 |
|
| 29 | return (-1 * self.small[0] + self.large[0]) / 2.0 |
Step-by-Step Solution
Initialize Two Heaps for Median Maintenance
| 4 | def __init__(self):
|
| 5 | self.small = []
|
| 6 | self.large = []
|
Objective
To create the lower-half max-heap and upper-half min-heap used to maintain the stream median.
Key Insight
The median is determined by the boundary between the lower and upper halves. Keeping the lower half in a max-heap and the upper half in a min-heap gives constant-time access to the two middle candidates.
Interview Quick-Check
Core Logic
The lower half needs max-heap behavior; Python simulates it with negative values.
Core Logic
The upper half uses a normal min-heap.
Insert Number and Maintain Heap Order
To insert the new number and restore the invariant that every lower-half value is less than or equal to every upper-half value.
Balance Heap Sizes to Maintain Median Property
To ensure the two heaps differ in size by at most one element.
Retrieve Median Based on Heap Sizes
To return the correct median from heap tops in constant time.
3 more steps with full analysis available on Pro.
Line Analysis
This solution has 4 Critical lines interviewers watch for.
heapq.heappush(self.small, -1 * num)
Push the negative of the new number into the max-heap to simulate max-heap behavior.
Python's heapq is a min-heap by default; negating the number allows it to behave as a max-heap, ensuring the largest number in the lower half is accessible at the top.
(-1 * self.small[0]) > self.large[0]):
Compare the lower-half maximum against the upper-half minimum.
If the largest lower-half value exceeds the smallest upper-half value, the two heaps are ordered incorrectly and must be repaired.
return -1 * self.small[0]
Return the median as the negated top element of the max-heap.
Negating the stored negative value retrieves the actual median when the lower half is larger.
Full line-by-line criticality + rationale for all 21 lines available on Pro.
Test Your Understanding
Why are two heaps necessary instead of a single heap or a sorted list?
See the answer with Pro.
Related Problems
Heap / Priority Queue pattern
Don't just read it. Drill it.
Reconstruct Find Median from Data Stream from memory until it sticks. AlgoDrill blanks out key lines and makes you fill them back in, step by step.
Unlock the Find Median from Data Stream drill