I got handed a fun coding challenge the other day. Calculate the running median on a set of numbers as they are passed to our application. Not hard to do using simple patterns. You could just add your numbers to an array, sort it, and use some math to check the median every time your array grows right? But what about when your list hits 100,000 entries? 200,000? Well, performance falls through the floor. Re-sorting the list every add is expensive. So, how do we fix this?

For those interested in solving the problem themselves, this challenge is presented in Cracking the Coding Interview, and is available at HackerRank

Simple Heap Implementation

Lets start with a simple heap implementation. We need to efficiently find the middle of a data set so we'll want a max-heap and a min-heap, each containing half the data. Fortunately, there's not a whole lot of functional difference, so we can implement them with a single class.

Worth noting, is that JavaScript does some things for us that make this easier, such as automatically creating a .length property on arrays that's efficient to access. In other languages, such as Python, you won't have this. For speed, you'd want to hold the size of your heap in a property of your class and update it yourself.

Our Constructor

We want to allow the heap to be a min or max heap, so we'll let the constructor take an argument type. I'm don't want to get too into invalid parameters, so I'm just telling it to fall back to "min" if anything other than "max" is passed.

We'll also initialize our data set.

class MySuperSpecialHeapClass {
  constructor(type) {
    this.type = type == "max"
              ? "max"
              : "min"
    this.data = [0]
  }
}

Hang on a tick. That's just an array. Turns out, with a little bit of math, that's all you need. More on that next.

Keeping our heap straight

The defining trait of a binary heap is that everything is loosely sorted. For a max-heap, that means that the largest value is always the root node. For a min-heap, the smallest.

Let's write a couple of methods to help us keep our heap in line. We'll start with percolating a value up the heap. We'll be super creative here and call it upHeap.

upHeap(index) {
  let parent
    , temp
  while ( ~~(index / 2) > 0 ) {
    parent = ~~(index / 2)
    if (
       (this.type === 'min' && this.data[parent] > this.data[index])
    || (this.type === 'max' && this.data[parent] < this.data[index])
    ) {
      temp = this.data[parent]
      this.data[parent] = this.data[index]
      this.data[index] = temp
    }
    index = parent
  }
}

Quick note: ~~ is a double bit-wise inversion. Basically, we're flipping all the bits in the number and then flipping them back. In JavaScript, this has the effect of stripping off everything after the decimal. It's a fast, dirty version of Math.floor()

But, what does it mean? upHeap takes the array index of the node you want and shifts it up the heap as high as it will go. We do that, by first finding the node's parent and comparing the two. But how does any of that ↑↑ help us do this. Time to explain the math... and that [0] from earlier.

We're just storing everything in an array. An array is not a heap, but it can hold one. The [0] is an empty node that's just there to make everything else a little smoother. Our root node will live at data[1], it's two children will live at data[2] and data[3].

2 = 1 * 2
3 = 1 * 2 + 1

This holds true for every node as the heap gets bigger. data[2] can have 2 children which will live at data[4](2 × 2 = 4) and data[5](2 × 2 + 1 = 5), data[3] can have 2 children at data[6] and data[7], and so on. Any time we add a node to the end of our array, we are adding it to the next available leaf spot in our tree.

Say we perform these adds: 4, 3, 22, 34, 2, 7, 6 to a min heap without up-heaping. We'd end up with something that looked like this:
heap1
Not very useful. We'd have to scan half the heap to find our minimal value of 2, but if we upHeap each value as we add it, then the process looks like this this:
heap2-1
Which gives us:
heap3

That doesn't look too well organized, but that's okay. The goal of a heap isn't to keep everything in order, it's to keep everything sorted in a way that makes it efficient to always keep the minimal or maximal element at the root for quick access.

What if a value is inserted at the top of our heap though, or one changes? We need to also be able to shift values down the heap.

downHeap(index) {
  let mc
    , left
    , right
    , temp
  while (index * 2 <= this.data.length) {
    left = index * 2
    right = left + 1
    mc = this.type === 'min'
       ? !this.data[right] || this.data[left] < this.data[right]
         ? left
         : right
       : !this.data[right] || this.data[left] > this.data[right]
         ? left
         : right
    if (
       (this.type === 'min' && this.data[index] > this.data[mc])
    || (this.type === 'max' && this.data[index] < this.data[mc])
    ) {
      temp = this.data[index]
      this.data[index] = this.data[mc]
      this.data[mc] = temp
    }
    index = mc
  }
}

This one's a bit more complicated. Same basic operation, but now we're trying to swap the parent node down instead of swapping a child node up. To keep things straight, we want to check against the smallest child for a min-heap or the largest child for a max-heap. Remember, we're trying to get that value above the current one in our heap.

But what about keeping things DRY? You may have noticed that the swap code could easily be moved to a separate method so I don't have to repeat myself. That would add function call overhead to every iteration just to save 1 line of code. Don't do that.

Insertion

For this case, we want to be able to push values to our heap, but also pop the minimal/maximal value. We'll need 2 methods

push(value) {
  this.data.push(value)
  this.upHeap(this.data.length-1)
}
    
pushpop(value) {
  this.data.push(value)
  this.upHeap(this.data.length-1)
  let retval = this.data[1]
  this.data[1] = this.data[this.data.length-1]
  this.data.pop()
  this.downHeap(1)
  return retval
}

Yep, that's it. We add our new value to the bottom of our heap(end of the array), shift it up as high as it will go. Then we grab the minimal/maximal value and store it. Swap the last value into its place and shift it down as far as it will go. Then we return the min/max value.

Splitting Our Heap

I did say that we needed 2 heaps, right? Well, new class time.

class MyAwesomeSplitHeap {
  constructor() {
    this.upper = new MySuperSpecialHeapClass('min')
    this.lower = new MySuperSpecialHeapClass('max')
  },
  push(value) {
    value = this.upper.pushpop(value)
    value = this.lower.pushpop(value)
    if (this.upper.data.length > this.lower.data.length) {
      this.lower.push(value)
    } else {
      this.upper.push(value)
    }
  }
}

Awesome! We now have a split heap that will allow us quick access to our middle value(s). Our split heap class makes 2 heaps - one min and one max - and then shuffles data through them, keeping them roughly balanced. But weren't we trying to do something specific with this data? Oh, right, we wanted the median. Let's add a method for that.

median() {
  return parseFloat( this.upper.length > this.lower.length
                   ? this.upper[1]
                   : (this.upper[1] + this.lower[1]) / 2
                   ).toFixed(1)
}

Again, pretty basic. Because of the way we're pushing new values on to our split array, this.upper will always be the same size or one larger than this.lower. If it's larger, then we can just return its minimal value. If they're the same size, then we return the mean of this.upper's minimal value and this.lower's maximal value. Both of which, are at index 1. Finally, we turn it into a float rounded to the nearest tenth.

Aaaand, we're done. There are a couple of spots where this code can be improved. See if you can find them, just keep in mind, this is a large-scale data structure. Improved doesn't necessarily mean more readable or broken out into smaller chunks. When trying to parse records in the millions or billions, adding function call overhead or unnecessary language operators can grind your system to a halt. Here it means, faster. Just faster.

tenor