LeetCode 295: Find Median from Data Stream
i recently solved the find median from data stream problem on leetcode, and it’s a great example of heap data structures and streaming algorithms. this hard problem tests your understanding of priority queues, data structure design, and efficient algorithm implementation.
Problem Statement
the median is the middle value in an ordered integer list. if the size of the list is even, there is no middle value and the median is the mean of the two middle values.
- for example, for arr = [2,3,4], the median is 3.
- for example, for arr = [2,3], the median is (2 + 3) / 2 = 2.5.
implement the medianfinder class:
- medianfinder() initializes the medianfinder object.
- void addnum(int num) adds the integer num from the data stream to the data structure.
- double findmedian() returns the median of all elements so far.
example:
input:
["medianfinder", "addnum", "addnum", "findmedian", "addnum", "findmedian"]
[[], [1], [2], [], [3], []]
output:
[null, null, null, 1.5, null, 2.0]
explanation:
medianfinder medianfinder = new medianfinder();
medianfinder.addnum(1); // arr = [1]
medianfinder.addnum(2); // arr = [1, 2]
medianfinder.findmedian(); // return 1.5 (i.e., (1 + 2) / 2)
medianfinder.addnum(3); // arr[1, 2, 3]
medianfinder.findmedian(); // return 2.0
My Approach
when i first saw this problem, i immediately thought about using a two-heap approach. the key insight is that we can maintain two heaps to efficiently track the median of a streaming data set without storing all elements in sorted order.
Initial Thoughts
i started by thinking about different approaches:
- two-heap approach - use max heap for lower half and min heap for upper half
- sorted array approach - maintain sorted array and find median
- balanced bst approach - use tree structure for median tracking
Solution Strategy
i decided to use a two-heap approach with the following strategy:
- max heap - store lower half of numbers
- min heap - store upper half of numbers
- balancing - keep heaps balanced (size difference ≤ 1)
- efficient insertion - O(log n) insertion time
- constant median retrieval - O(1) median access
My Solution
import "container/heap"
type medianfinder struct {
maxheap *maxheap // lower half
minheap *minheap // upper half
}
func constructor() medianfinder {
return medianfinder{
maxheap: &maxheap{},
minheap: &minheap{},
}
}
func (this *medianfinder) addnum(num int) {
// add to max heap first
heap.push(this.maxheap, num)
// move largest element from max heap to min heap
heap.push(this.minheap, heap.pop(this.maxheap))
// balance heaps: max heap should be larger or equal
if this.maxheap.len() < this.minheap.len() {
heap.push(this.maxheap, heap.pop(this.minheap))
}
}
func (this *medianfinder) findmedian() float64 {
if this.maxheap.len() > this.minheap.len() {
return float64((*this.maxheap)[0])
}
return float64((*this.maxheap)[0] + (*this.minheap)[0]) / 2.0
}
// maxheap implementation
type maxheap []int
func (h maxheap) len() int { return len(h) }
func (h maxheap) less(i, j int) bool { return h[i] > h[j] }
func (h maxheap) swap(i, j int) { h[i], h[j] = h[j], h[i] }
func (h *maxheap) push(x interface{}) {
*h = append(*h, x.(int))
}
func (h *maxheap) pop() interface{} {
old := *h
n := len(old)
x := old[n-1]
*h = old[0 : n-1]
return x
}
// minheap implementation
type minheap []int
func (h minheap) len() int { return len(h) }
func (h minheap) less(i, j int) bool { return h[i] < h[j] }
func (h minheap) swap(i, j int) { h[i], h[j] = h[j], h[i] }
func (h *minheap) push(x interface{}) {
*h = append(*h, x.(int))
}
func (h *minheap) pop() interface{} {
old := *h
n := len(old)
x := old[n-1]
*h = old[0 : n-1]
return x
}
Code Breakdown
let me walk through how this solution works:
1. Data Structure Design
type medianfinder struct {
maxheap *maxheap // lower half
minheap *minheap // upper half
}
we use two heaps:
- maxheap: stores the lower half of numbers (largest at root)
- minheap: stores the upper half of numbers (smallest at root)
2. Constructor
func constructor() medianfinder {
return medianfinder{
maxheap: &maxheap{},
minheap: &minheap{},
}
}
initialize both heaps as empty slices. go’s heap package will handle the heap operations.
3. Insertion Logic
func (this *medianfinder) addnum(num int) {
// add to max heap first
heap.push(this.maxheap, num)
// move largest element from max heap to min heap
heap.push(this.minheap, heap.pop(this.maxheap))
// balance heaps: max heap should be larger or equal
if this.maxheap.len() < this.minheap.len() {
heap.push(this.maxheap, heap.pop(this.minheap))
}
}
the insertion process:
- add to max heap - insert the new number
- move to min heap - move largest from max heap to min heap
- rebalance - ensure max heap is larger or equal in size
4. Median Calculation
func (this *medianfinder) findmedian() float64 {
if this.maxheap.len() > this.minheap.len() {
return float64((*this.maxheap)[0])
}
return float64((*this.maxheap)[0] + (*this.minheap)[0]) / 2.0
}
median calculation logic:
- odd count: return root of max heap
- even count: return average of both heap roots
5. Heap Implementation
type maxheap []int
func (h maxheap) len() int { return len(h) }
func (h maxheap) less(i, j int) bool { return h[i] > h[j] }
func (h maxheap) swap(i, j int) { h[i], h[j] = h[j], h[i] }
implement the heap.interface:
- len(): return slice length
- less(): define heap property (max heap: parent > children)
- swap(): swap elements
- push()/pop(): slice operations
Example Walkthrough
let’s trace through the example: [1, 2, 3]
initial: maxheap=[], minheap=[]
add 1:
- push to maxheap: maxheap=[1], minheap=[]
- move largest: maxheap=[], minheap=[1]
- rebalance: maxheap=[1], minheap=[]
- median = 1
add 2:
- push to maxheap: maxheap=[2,1], minheap=[]
- move largest: maxheap=[1], minheap=[2]
- rebalance: maxheap=[1], minheap=[2]
- median = (1+2)/2 = 1.5
add 3:
- push to maxheap: maxheap=[3,1], minheap=[2]
- move largest: maxheap=[1], minheap=[2,3]
- rebalance: maxheap=[2,1], minheap=[3]
- median = 2
Time and Space Complexity
- time complexity: O(log n) for insertion, O(1) for median retrieval
- space complexity: O(n) for storing all elements
the algorithm is efficient because:
- heap operations are O(log n)
- we only need to access heap roots for median
- no need to maintain full sorted order
Key Insights
- two-heap approach - maintain sorted order without full sorting
- heap balancing - ensure max heap is larger or equal
- constant median access - median is always at heap roots
- go heap package - provides efficient heap operations
Alternative Approaches
i also considered:
-
sorted array approach - maintain sorted array
- insertion: O(n) time
- median: O(1) time
- less efficient for large datasets
-
balanced bst approach - use tree structure
- insertion: O(log n) time
- median: O(log n) time
- more complex implementation
-
counting sort approach - for small integer ranges
- insertion: O(1) time
- median: O(n) time
- limited to small ranges
Edge Cases to Consider
- empty stream - handle no elements case
- single element - return the element itself
- duplicate elements - handle repeated numbers
- large numbers - handle integer overflow
- negative numbers - work with any integers
- streaming order - median changes with each insertion
Go-Specific Features
- heap package - container/heap provides heap operations
- interface implementation - heap.interface for custom heaps
- slice operations - efficient push/pop operations
- type embedding - struct composition for data structure
Lessons Learned
this problem taught me:
- heap data structures - efficient priority queue operations
- streaming algorithms - processing data incrementally
- data structure design - choosing appropriate structures
- go interfaces - implementing custom heap types
Real-World Applications
this problem has applications in:
- financial systems - tracking median prices in real-time
- sensor networks - monitoring median sensor readings
- streaming analytics - real-time data analysis
- load balancing - tracking median response times
Conclusion
the find median from data stream problem is an excellent exercise in heap data structures and streaming algorithms. the key insight is using two heaps to maintain the median efficiently without storing all elements in sorted order.
you can find my complete solution on leetcode.
this is part of my leetcode problem-solving series. i’m documenting my solutions to help others learn and to track my own progress.