Merge Sort

Divide and conquer algorithms

Divide-and-conquer

Both merge sort and quicksort employ a common algorithmic paradigm based on recursion. This paradigm, divide-and-conquer, breaks a problem into subproblems that are similar to the original problem, recursively solves the subproblems, and finally combines the solutions to the subproblems to solve the original problem. Because divide-and-conquer solves subproblems recursively, each subproblem must be smaller than the original problem, and there must be a base case for subproblems. You should think of a divide-and-conquer algorithm as having three parts:

  1. Divide the problem into a number of subproblems that are smaller instances of the same problem.

  2. Conquer the subproblems by solving them recursively. If they are small enough, solve the subproblems as base cases.

  3. Combine the solutions to the subproblems into the solution for the original problem.

You can easily remember the steps of a divide-and-conquer algorithm as divide, conquer, combine. Here's how to view one step, assuming that each divide step creates two subproblems (though some divide-and-conquer algorithms create more than two):

If we expand out two more recursive steps, it looks like this:

Because divide-and-conquer creates at least two subproblems, a divide-and-conquer algorithm makes multiple recursive calls.

Overview of merge sort

  1. Conquer by recursively sorting the subarrays in each of the two subproblems created by the divide step. That is, recursively sort the subarray array[p..q] and recursively sort the subarray array[q+1..r].

  2. Combine by merging the two sorted subarrays back into the single sorted subarray array[p..r].

  • The conquer step has us sort the two subarrays array[0..3], which contains [14, 7, 3, 12], and array[4..7], which contains [9, 11, 6, 2]. When we come back from the conquer step, each of the two subarrays is sorted: array[0..3] contains [3, 7, 12, 14] and array[4..7] contains [2, 6, 9, 11], so that the full array is [3, 7, 12, 14, 2, 6, 9, 11].

  • Finally, the combine step merges the two sorted subarrays in the first half and the second half, producing the final sorted array [2, 3, 6, 7, 9, 11, 12, 14].

The subarrays array[0..0] and array[1..1] are base cases, since each contains fewer than two elements. Here is how the entire merge sort algorithm unfolds:

In the next challenge, you'll focus on implementing the overall merge sort algorithm, to make sure you understand how to divide and conquer recursively. After you've done that, we'll dive deeper into how to merge two sorted subarrays efficiently and you'll implement that in the later challenge.

// Takes in an array that has two sorted subarrays,
//  from [p..q] and [q+1..r], and merges the array
var merge = function(array, p, q, r) {
    var lowHalf = [], 
        highHalf = [], 
        k = p, d, e;
    // get everything on the left side
    for (d = 0; k <= q; d++, k++) {
        lowHalf[d] = array[k];
    }
    // get everything on the right side
    for (e = 0; k <= r; e++, k++) {
        highHalf[e] = array[k];
    }
    // switch elements if needed
    k = p;
    for (e = d = 0; d < lowHalf.length && e < highHalf.length;) {
        if (lowHalf[d] < highHalf[e]) {
            array[k] = lowHalf[d];
            d++;
        } else {
            array[k] = highHalf[e];
            e++;
        }
        k++;
    }
    // left side
    for (; d < lowHalf.length;) {
        array[k] = lowHalf[d];
        d++;
        k++;
    }
    // right side
    for (; e < highHalf.length;) {
        array[k] = highHalf[e];
        e++;
        k++;
    }
};

// Takes in an array and recursively merge sorts it
var mergeSort = function(array, p, r) {
    if (r > p) {
        var q = floor((r-p)/2) + p;
        mergeSort(array, p, q); 
        mergeSort(array, q+1, r);
        merge(array, p, q, r);
    }
};

var array = [14, 7, 3, 12, 9, 11, 6, 2];
mergeSort(array, 0, array.length-1);
println("Array after sorting: " + array);
// Program.assertEqual(array, [2, 3, 6, 7, 9, 11, 12, 14]);

Linear-time merging

In order to merge the sorted subarrays array[p..q] and array[q+1..r] and have the result in array[p..r], we first need to make temporary arrays and copy array[p..q] and array[q+1..r] into these temporary arrays. We can't write over the positions in array[p..r] until we have the elements originally in array[p..q] and array[q+1..r] safely copied.

The numbers in array are grayed out to indicate that although these array positions contain values, the "real" values are now in lowHalf and highHalf. We may overwrite the grayed numbers at will.

Next, we merge the two sorted subarrays, now in lowHalf and highHalf, back into array[p..r]. We should put the smallest value in either of the two subarrays into array[p]. Where might this smallest value reside? Because the subarrays are sorted, the smallest value must be in one of just two places: either lowHalf[0] or highHalf[0]. (It's possible that the same value is in both places, and then we can call either one the smallest value.) With just one comparison, we can determine whether to copy lowHalf[0] or highHalf[0]into array[p]. In our example, highHalf[0] was smaller. Let's also establish three variables to index into the arrays:

  • i indexes the next element of lowHalf that we have not copied back into array. Initially, i is 0.

  • j indexes the next element of highHalf that we have not copied back into array. Initially, j is 0.

  • k indexes the next location in array that we copy into. Initially, k equals p.

After we copy from lowHalf or highHalf into array, we must increment (add 1 to) k so that we copy the next smallest element into the next position of array. We also have to increment i if we copied from lowHalf, or increment j if we copied from highHalf. So here are the arrays before and after the first element is copied back into array:

We've grayed out highHalf[0] to indicate that it no longer contains a value that we're going to consider. The un-merged part of the highHalf array starts at index j, which is now 1. The value in array[p] is no longer grayed out, because we copied a "real" value into it.

Where must the next value to copy back into array reside? It's either first untaken element in lowHalf (lowHalf[0]) or the first untaken element in highHalf (highHalf[1]). With one comparison, we determine that lowHalf[0] is smaller, and so we copy it into array[k] and increment k and i:

Next, we compare lowHalf[1] and highHalf[1], determining that we should copy highHalf[1] into array[k]. We then increment k and j:

Keep going, always comparing lowHalf[i] and highHalf[j], copying the smaller of the two into array[k], and incrementing either i or j:

Eventually, either all of lowHalf or all of highHalf is copied back into array. In this example, all of highHalf is copied back before the last few elements of lowHalf. We finish up by just copying the remaining untaken elements in either lowHalf or highHalf:

  1. Copy each element in array[p..r] into either lowHalf or highHalf.

  2. As long as some elements are untaken in both lowHalf and highHalf, compare the first two untaken elements and copy the smaller one back into array.

  3. Once either lowHalf or highHalf has had all its elements copied back into array, copy each remaining untaken element from the other temporary array back into array.

Analysis of merge sort

Let's draw out the merging times in a "tree":

One other thing about merge sort is worth noting. During merging, it makes a copy of the entire array being sorted, with one half in lowHalf and the other half in highHalf. Because it copies more than a constant number of elements at some time, we say that merge sort does not work in place. By contrast, both selection sort and insertion sort do work in place, since they never make a copy of more than a constant number of array elements at any one time. Computer scientists like to consider whether an algorithm works in place, because there are some systems where space is at a premium, and thus in-place algorithms are preferred.

Last updated