Merge Sort
Divide and conquer algorithms
Divide-and-conquer
Both merge sort and quicksort employ a common algorithmic paradigm based on recursion. This paradigm, divide-and-conquer, breaks a problem into subproblems that are similar to the original problem, recursively solves the subproblems, and finally combines the solutions to the subproblems to solve the original problem. Because divide-and-conquer solves subproblems recursively, each subproblem must be smaller than the original problem, and there must be a base case for subproblems. You should think of a divide-and-conquer algorithm as having three parts:
Divide the problem into a number of subproblems that are smaller instances of the same problem.
Conquer the subproblems by solving them recursively. If they are small enough, solve the subproblems as base cases.
Combine the solutions to the subproblems into the solution for the original problem.
You can easily remember the steps of a divide-and-conquer algorithm as divide, conquer, combine. Here's how to view one step, assuming that each divide step creates two subproblems (though some divide-and-conquer algorithms create more than two):
If we expand out two more recursive steps, it looks like this:
Because divide-and-conquer creates at least two subproblems, a divide-and-conquer algorithm makes multiple recursive calls.
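To make the three parts concrete before we get to sorting, here is a small illustrative sketch (not from the original text; the function name and index parameters are our own) that finds the largest element of an array by dividing, conquering, and combining:

```python
def max_recursive(values, low, high):
    """Find the largest element in values[low..high] (inclusive) recursively."""
    # Base case: a one-element subproblem is trivially solved.
    if low == high:
        return values[low]
    mid = (low + high) // 2
    # Divide into two smaller instances of the same problem, and
    # conquer each one recursively.
    left_max = max_recursive(values, low, mid)
    right_max = max_recursive(values, mid + 1, high)
    # Combine: the answer is the larger of the two sub-answers.
    return left_max if left_max >= right_max else right_max

print(max_recursive([14, 7, 3, 12, 9, 11, 6, 2], 0, 7))  # → 14
```

Notice the two recursive calls: just as described above, each divide step here creates two subproblems.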
Overview of merge sort
Divide by computing the index `q` of the position midway between `p` and `r`, just as we found the midpoint in binary search: add `p` and `r`, divide by 2, and round down.
Conquer by recursively sorting the subarrays in each of the two subproblems created by the divide step. That is, recursively sort the subarray `array[p..q]` and recursively sort the subarray `array[q+1..r]`.
Combine by merging the two sorted subarrays back into the single sorted subarray `array[p..r]`.
The conquer step has us sort the two subarrays `array[0..3]`, which contains [14, 7, 3, 12], and `array[4..7]`, which contains [9, 11, 6, 2]. When we come back from the conquer step, each of the two subarrays is sorted: `array[0..3]` contains [3, 7, 12, 14] and `array[4..7]` contains [2, 6, 9, 11], so that the full array is [3, 7, 12, 14, 2, 6, 9, 11]. Finally, the combine step merges the two sorted subarrays in the first half and the second half, producing the final sorted array [2, 3, 6, 7, 9, 11, 12, 14].
The subarrays `array[0..0]` and `array[1..1]` are base cases, since each contains fewer than two elements. Here is how the entire merge sort algorithm unfolds:
In the next challenge, you'll focus on implementing the overall merge sort algorithm, to make sure you understand how to divide and conquer recursively. After you've done that, we'll dive deeper into how to merge two sorted subarrays efficiently, and you'll implement that in a later challenge.
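The recursive structure described above can be sketched as follows. This is a minimal Python rendering, not the article's official solution; it stands in for the combine step with `heapq.merge` from the standard library, since the explicit linear-time merge is developed in the next section:

```python
import heapq

def merge(array, p, q, r):
    """Combine step: merge the sorted runs array[p..q] and array[q+1..r].

    Stand-in using heapq.merge, which lazily merges already-sorted inputs;
    a hand-written linear-time merge is the subject of the next section.
    """
    array[p:r + 1] = list(heapq.merge(array[p:q + 1], array[q + 1:r + 1]))

def merge_sort(array, p, r):
    """Sort array[p..r] (inclusive bounds) in place."""
    if p < r:                        # base case: fewer than two elements
        q = (p + r) // 2             # divide: find the midpoint
        merge_sort(array, p, q)      # conquer: sort the first half
        merge_sort(array, q + 1, r)  # conquer: sort the second half
        merge(array, p, q, r)        # combine: merge the sorted halves

a = [14, 7, 3, 12, 9, 11, 6, 2]
merge_sort(a, 0, len(a) - 1)
print(a)  # → [2, 3, 6, 7, 9, 11, 12, 14]
```

The base case needs no explicit code: when `p >= r`, the subarray has at most one element and is already sorted, so the function simply returns.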
Linear-time merging
In order to merge the sorted subarrays `array[p..q]` and `array[q+1..r]` and have the result in `array[p..r]`, we first need to make temporary arrays and copy `array[p..q]` and `array[q+1..r]` into these temporary arrays. We can't write over the positions in `array[p..r]` until we have the elements originally in `array[p..q]` and `array[q+1..r]` safely copied.
The numbers in `array` are grayed out to indicate that although these array positions contain values, the "real" values are now in `lowHalf` and `highHalf`. We may overwrite the grayed numbers at will.
Next, we merge the two sorted subarrays, now in `lowHalf` and `highHalf`, back into `array[p..r]`. We should put the smallest value in either of the two subarrays into `array[p]`. Where might this smallest value reside? Because the subarrays are sorted, the smallest value must be in one of just two places: either `lowHalf[0]` or `highHalf[0]`. (It's possible that the same value is in both places, and then we can call either one the smallest value.) With just one comparison, we can determine whether to copy `lowHalf[0]` or `highHalf[0]` into `array[p]`. In our example, `highHalf[0]` was smaller. Let's also establish three variables to index into the arrays:
`i` indexes the next element of `lowHalf` that we have not copied back into `array`. Initially, `i` is 0.
`j` indexes the next element of `highHalf` that we have not copied back into `array`. Initially, `j` is 0.
`k` indexes the next location in `array` that we copy into. Initially, `k` equals `p`.
After we copy from `lowHalf` or `highHalf` into `array`, we must increment (add 1 to) `k` so that we copy the next smallest element into the next position of `array`. We also have to increment `i` if we copied from `lowHalf`, or increment `j` if we copied from `highHalf`. So here are the arrays before and after the first element is copied back into `array`:
We've grayed out `highHalf[0]` to indicate that it no longer contains a value that we're going to consider. The un-merged part of the `highHalf` array starts at index `j`, which is now 1. The value in `array[p]` is no longer grayed out, because we copied a "real" value into it.
Where must the next value to copy back into `array` reside? It's either the first untaken element in `lowHalf` (`lowHalf[0]`) or the first untaken element in `highHalf` (`highHalf[1]`). With one comparison, we determine that `lowHalf[0]` is smaller, and so we copy it into `array[k]` and increment `k` and `i`:
Next, we compare `lowHalf[1]` and `highHalf[1]`, determining that we should copy `highHalf[1]` into `array[k]`. We then increment `k` and `j`:
Keep going, always comparing `lowHalf[i]` and `highHalf[j]`, copying the smaller of the two into `array[k]`, and incrementing either `i` or `j`:
Eventually, either all of `lowHalf` or all of `highHalf` is copied back into `array`. In this example, all of `highHalf` is copied back before the last few elements of `lowHalf`. We finish up by just copying the remaining untaken elements in either `lowHalf` or `highHalf`:
To summarize, the merging procedure works as follows:
Copy each element in `array[p..r]` into either `lowHalf` or `highHalf`.
As long as some elements are untaken in both `lowHalf` and `highHalf`, compare the first two untaken elements and copy the smaller one back into `array`.
Once either `lowHalf` or `highHalf` has had all its elements copied back into `array`, copy each remaining untaken element from the other temporary array back into `array`.
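The three summary steps above translate directly into code. Here is a sketch in Python, with `low_half` and `high_half` standing in for the `lowHalf` and `highHalf` arrays in the walkthrough (only the naming convention is changed):

```python
def merge(array, p, q, r):
    """Merge sorted subarrays array[p..q] and array[q+1..r] into array[p..r]."""
    low_half = array[p:q + 1]       # copy out the first sorted run
    high_half = array[q + 1:r + 1]  # copy out the second sorted run

    i = 0  # next untaken element of low_half
    j = 0  # next untaken element of high_half
    k = p  # next location in array to copy into

    # As long as both halves have untaken elements, compare the first
    # untaken element of each and copy the smaller one back into array.
    while i < len(low_half) and j < len(high_half):
        if low_half[i] <= high_half[j]:
            array[k] = low_half[i]
            i += 1
        else:
            array[k] = high_half[j]
            j += 1
        k += 1

    # One half is exhausted; copy the remaining untaken elements of the
    # other half back into array. Only one of these loops does any work.
    while i < len(low_half):
        array[k] = low_half[i]
        i += 1
        k += 1
    while j < len(high_half):
        array[k] = high_half[j]
        j += 1
        k += 1

a = [3, 7, 12, 14, 2, 6, 9, 11]  # two sorted halves, as in the example
merge(a, 0, 3, 7)
print(a)  # → [2, 3, 6, 7, 9, 11, 12, 14]
```

Each element of `array[p..r]` is copied out once and copied back once, and every comparison consumes one element, so merging n elements takes time proportional to n.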
Analysis of merge sort
Let's draw out the merging times in a "tree": each level of the tree accounts for cn total merging time, since the subarrays at each level together contain all n elements. Because repeatedly halving n produces about lg n + 1 levels, the total merging time is cn(lg n + 1), and so merge sort runs in Θ(n lg n) time.
One other thing about merge sort is worth noting. During merging, it makes a copy of the entire array being sorted, with one half in `lowHalf` and the other half in `highHalf`. Because it copies more than a constant number of elements at some time, we say that merge sort does not work in place. By contrast, both selection sort and insertion sort do work in place, since they never make a copy of more than a constant number of array elements at any one time. Computer scientists like to consider whether an algorithm works in place, because there are some systems where space is at a premium, and thus in-place algorithms are preferred.