Merge Sort

Divide and conquer algorithms

The two sorting algorithms we've seen so far, selection sort and insertion sort, have worst-case running times of $\Theta(n^2)$. When the size of the input array is large, these algorithms can take a long time to run. In this tutorial and the next one, we'll see two other sorting algorithms, merge sort and quicksort, whose running times are better. In particular, merge sort runs in $\Theta(n \lg n)$ time in all cases, and quicksort runs in $\Theta(n \lg n)$ time in the best case and on average, though its worst-case running time is $\Theta(n^2)$. Here's a table of these four sorting algorithms and their running times:

| Algorithm | Worst-case running time | Best-case running time | Average-case running time |
| --- | --- | --- | --- |
| Selection sort | $\Theta(n^2)$ | $\Theta(n^2)$ | $\Theta(n^2)$ |
| Insertion sort | $\Theta(n^2)$ | $\Theta(n)$ | $\Theta(n^2)$ |
| Merge sort | $\Theta(n \lg n)$ | $\Theta(n \lg n)$ | $\Theta(n \lg n)$ |
| Quicksort | $\Theta(n^2)$ | $\Theta(n \lg n)$ | $\Theta(n \lg n)$ |

Divide-and-conquer

Both merge sort and quicksort employ a common algorithmic paradigm based on recursion. This paradigm, divide-and-conquer, breaks a problem into subproblems that are similar to the original problem, recursively solves the subproblems, and finally combines the solutions to the subproblems to solve the original problem. Because divide-and-conquer solves subproblems recursively, each subproblem must be smaller than the original problem, and there must be a base case for subproblems. You should think of a divide-and-conquer algorithm as having three parts:

  1. Divide the problem into a number of subproblems that are smaller instances of the same problem.

  2. Conquer the subproblems by solving them recursively. If they are small enough, solve the subproblems as base cases.

  3. Combine the solutions to the subproblems into the solution for the original problem.

You can easily remember the steps of a divide-and-conquer algorithm as divide, conquer, combine. One step of the recursion divides a problem into two subproblems (though some divide-and-conquer algorithms create more than two), and expanding two more recursive steps divides each of those subproblems in two again. Because divide-and-conquer creates at least two subproblems, a divide-and-conquer algorithm makes multiple recursive calls.
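As a tiny runnable illustration of the three parts (my example, not from the original page), here is a divide-and-conquer maximum finder:

```js
// A small divide-and-conquer example (illustrative only): find the
// maximum of array[p..r] recursively.
var maxDC = function(array, p, r) {
    if (p === r) {                 // base case: one element is its own max
        return array[p];
    }
    var q = Math.floor((p + r) / 2);        // divide at the midpoint
    var leftMax = maxDC(array, p, q);       // conquer: solve the left half
    var rightMax = maxDC(array, q + 1, r);  // conquer: solve the right half
    return Math.max(leftMax, rightMax);     // combine the two answers
};
console.log(maxDC([14, 7, 3, 12, 9, 11, 6, 2], 0, 7)); // 14
```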

Overview of merge sort

Because we're using divide-and-conquer to sort, we need to decide what our subproblems are going to look like. The full problem is to sort an entire array. Let's say that a subproblem is to sort a subarray. In particular, we'll think of a subproblem as sorting the subarray starting at index $p$ and going through index $r$. It will be convenient to have a notation for a subarray, so let's say that array[p..r] denotes this subarray of array. Note that this "two-dot" notation is not legal JavaScript; we're using it just to describe the algorithm, rather than a particular implementation of the algorithm in code. In terms of our notation, for an array of $n$ elements, we can say that the original problem is to sort array[0..n-1]. Here's how merge sort uses divide-and-conquer:

  1. Divide by finding the number $q$ of the position midway between $p$ and $r$. Do this step the same way we found the midpoint in binary search: add $p$ and $r$, divide by 2, and round down (see the sketch after this list).

  2. Conquer by recursively sorting the subarrays in each of the two subproblems created by the divide step. That is, recursively sort the subarray array[p..q] and recursively sort the subarray array[q+1..r].

  3. Combine by merging the two sorted subarrays back into the single sorted subarray array[p..r].

We need a base case. The base case is a subarray containing fewer than two elements, that is, when $p \geq r$, since a subarray with no elements or just one element is already sorted. So we'll divide-conquer-combine only when $p < r$.
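A minimal sketch of that divide step (the helper name `midpoint` is mine, not the original's):

```js
// Midpoint of indices p and r: add them, divide by 2, round down.
// Equivalent to the Math.floor((r - p) / 2) + p form used in the full code below.
var midpoint = function(p, r) {
    return Math.floor((p + r) / 2);
};
console.log(midpoint(0, 7)); // 3 -- the q computed in the example that follows
```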

Let's see an example. Let's start with array holding [14, 7, 3, 12, 9, 11, 6, 2], so that the first subarray is actually the full array, array[0..7] ($p=0$ and $r=7$). This subarray has at least two elements, and so it's not a base case.

  • In the divide step, we compute $q = 3$.

  • The conquer step has us sort the two subarrays array[0..3], which contains [14, 7, 3, 12], and array[4..7], which contains [9, 11, 6, 2]. When we come back from the conquer step, each of the two subarrays is sorted: array[0..3] contains [3, 7, 12, 14] and array[4..7] contains [2, 6, 9, 11], so that the full array is [3, 7, 12, 14, 2, 6, 9, 11].

  • Finally, the combine step merges the two sorted subarrays in the first half and the second half, producing the final sorted array [2, 3, 6, 7, 9, 11, 12, 14].

How did the subarray array[0..3] become sorted? The same way. It has at least two elements, and so it's not a base case. With $p=0$ and $r=3$, compute $q=1$, recursively sort array[0..1] ([14, 7]) and array[2..3] ([3, 12]), resulting in array[0..3] containing [7, 14, 3, 12], and merge the first half with the second half, producing [3, 7, 12, 14].

How did the subarray array[0..1] become sorted? With $p=0$ and $r=1$, compute $q=0$, recursively sort array[0..0] ([14]) and array[1..1] ([7]), resulting in array[0..1] still containing [14, 7], and merge the first half with the second half, producing [7, 14].

The subarrays array[0..0] and array[1..1] are base cases, since each contains fewer than two elements. Here is how the entire merge sort algorithm unfolds:
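(The original page shows this unfolding as a figure; here is the same thing as a text sketch for our example array.)

```text
Divide:                [14, 7, 3, 12, 9, 11, 6, 2]
                   [14, 7, 3, 12]         [9, 11, 6, 2]
                 [14, 7]   [3, 12]      [9, 11]   [6, 2]
Base cases:     [14] [7]  [3] [12]     [9] [11]  [6] [2]
Merge:           [7, 14]   [3, 12]      [9, 11]   [2, 6]
                   [3, 7, 12, 14]         [2, 6, 9, 11]
                       [2, 3, 6, 7, 9, 11, 12, 14]
```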

Most of the steps in merge sort are simple. You can check for the base case easily. Finding the midpoint $q$ in the divide step is also really easy. You have to make two recursive calls in the conquer step. The real work happens in the combine step, where you have to merge two sorted subarrays.

In the next challenge, you'll focus on implementing the overall merge sort algorithm, to make sure you understand how to divide and conquer recursively. After you've done that, we'll dive deeper into how to merge two sorted subarrays efficiently and you'll implement that in the later challenge.

// Takes in an array that has two sorted subarrays,
// from [p..q] and [q+1..r], and merges the array.
var merge = function(array, p, q, r) {
    var lowHalf = [],
        highHalf = [],
        k = p, d, e;
    // Copy array[p..q] into lowHalf.
    for (d = 0; k <= q; d++, k++) {
        lowHalf[d] = array[k];
    }
    // Copy array[q+1..r] into highHalf.
    for (e = 0; k <= r; e++, k++) {
        highHalf[e] = array[k];
    }
    // Repeatedly copy the smaller front element back into array[p..r].
    k = p;
    for (e = 0, d = 0; d < lowHalf.length && e < highHalf.length;) {
        if (lowHalf[d] < highHalf[e]) {
            array[k] = lowHalf[d];
            d++;
        } else {
            array[k] = highHalf[e];
            e++;
        }
        k++;
    }
    // Copy any elements remaining in lowHalf...
    for (; d < lowHalf.length;) {
        array[k] = lowHalf[d];
        d++;
        k++;
    }
    // ...or remaining in highHalf.
    for (; e < highHalf.length;) {
        array[k] = highHalf[e];
        e++;
        k++;
    }
};

// Takes in an array and recursively merge sorts it.
var mergeSort = function(array, p, r) {
    if (r > p) {
        // Divide: find the midpoint between p and r.
        var q = Math.floor((r - p) / 2) + p;
        // Conquer: recursively sort both halves.
        mergeSort(array, p, q);
        mergeSort(array, q + 1, r);
        // Combine: merge the sorted halves.
        merge(array, p, q, r);
    }
};

var array = [14, 7, 3, 12, 9, 11, 6, 2];
mergeSort(array, 0, array.length - 1);
console.log("Array after sorting: " + array);
// Expected: 2,3,6,7,9,11,12,14

Linear-time merging

The remaining piece of merge sort is the merge function, which merges two adjacent sorted subarrays, array[p..q] and array[q+1..r], into a single sorted subarray in array[p..r]. We'll see how to construct this function so that it's as efficient as possible. Let's say that the two subarrays have a total of $n$ elements. We have to examine each of the elements in order to merge them together, and so the best we can hope for would be a merging time of $\Theta(n)$. Indeed, we'll see how to merge a total of $n$ elements in $\Theta(n)$ time.

In order to merge the sorted subarrays array[p..q] and array[q+1..r] and have the result in array[p..r], we first need to make temporary arrays and copy array[p..q] and array[q+1..r] into these temporary arrays. We can't write over the positions in array[p..r] until we have the elements originally in array[p..q] and array[q+1..r] safely copied.

The first order of business in the merge function, therefore, is to allocate two temporary arrays, lowHalf and highHalf, to copy all the elements in array[p..q] into lowHalf, and to copy all the elements in array[q+1..r] into highHalf. How big should lowHalf be? The subarray array[p..q] contains $q-p+1$ elements. How about highHalf? The subarray array[q+1..r] contains $r-q$ elements. (In JavaScript, we don't have to give the size of an array when we create it, but since we do have to do that in many other programming languages, we often consider it when describing an algorithm.)
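For the example coming up ($p=0$, $q=3$, $r=7$), these sizes work out as follows (a tiny sketch, not from the original):

```js
var p = 0, q = 3, r = 7;
console.log(q - p + 1); // 4: lowHalf holds the 4 elements of array[0..3]
console.log(r - q);     // 4: highHalf holds the 4 elements of array[4..7]
```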

In our example array [14, 7, 3, 12, 9, 11, 6, 2], after we've recursively sorted array[0..3] and array[4..7] (so that $p=0$, $q=3$, and $r=7$) and copied these subarrays, lowHalf holds [3, 7, 12, 14] and highHalf holds [2, 6, 9, 11].

The numbers in array are grayed out to indicate that although these array positions contain values, the "real" values are now in lowHalf and highHalf. We may overwrite the grayed numbers at will.

Next, we merge the two sorted subarrays, now in lowHalf and highHalf, back into array[p..r]. We should put the smallest value in either of the two subarrays into array[p]. Where might this smallest value reside? Because the subarrays are sorted, the smallest value must be in one of just two places: either lowHalf[0] or highHalf[0]. (It's possible that the same value is in both places, and then we can call either one the smallest value.) With just one comparison, we can determine whether to copy lowHalf[0] or highHalf[0] into array[p]. In our example, highHalf[0] was smaller. Let's also establish three variables to index into the arrays:

  • i indexes the next element of lowHalf that we have not copied back into array. Initially, i is 0.

  • j indexes the next element of highHalf that we have not copied back into array. Initially, j is 0.

  • k indexes the next location in array that we copy into. Initially, k equals p.

After we copy from lowHalf or highHalf into array, we must increment (add 1 to) k so that we copy the next smallest element into the next position of array. We also have to increment i if we copied from lowHalf, or increment j if we copied from highHalf. So here are the arrays before and after the first element is copied back into array:

We've grayed out highHalf[0] to indicate that it no longer contains a value that we're going to consider. The un-merged part of the highHalf array starts at index j, which is now 1. The value in array[p] is no longer grayed out, because we copied a "real" value into it.

Where must the next value to copy back into array reside? It's either the first untaken element in lowHalf (lowHalf[0]) or the first untaken element in highHalf (highHalf[1]). With one comparison, we determine that lowHalf[0] is smaller, and so we copy it into array[k] and increment k and i:

Next, we compare lowHalf[1] and highHalf[1], determining that we should copy highHalf[1] into array[k]. We then increment k and j:

Keep going, always comparing lowHalf[i] and highHalf[j], copying the smaller of the two into array[k], and incrementing either i or j:

Eventually, either all of lowHalf or all of highHalf is copied back into array. In this example, all of highHalf is copied back before the last few elements of lowHalf. We finish up by just copying the remaining untaken elements in either lowHalf or highHalf:
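That finish-up loop is exactly what the merge function earlier implements. Restated with the i, j, k names from this walkthrough (a compact sketch of mine, with i playing the role of d and j the role of e):

```js
var mergeWalkthrough = function(array, p, q, r) {
    var lowHalf = array.slice(p, q + 1);       // copy of array[p..q]
    var highHalf = array.slice(q + 1, r + 1);  // copy of array[q+1..r]
    var i = 0, j = 0, k = p;
    // While both halves have untaken elements, copy back the smaller one.
    while (i < lowHalf.length && j < highHalf.length) {
        array[k++] = lowHalf[i] < highHalf[j] ? lowHalf[i++] : highHalf[j++];
    }
    // Copy whatever remains in the half that didn't run out.
    while (i < lowHalf.length) { array[k++] = lowHalf[i++]; }
    while (j < highHalf.length) { array[k++] = highHalf[j++]; }
};
```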

We claimed that merging $n$ elements takes $\Theta(n)$ time, and therefore the running time of merging is linear in the subarray size. Let's see why this is true. We saw three parts to merging:

  1. Copy each element in array[p..r] into either lowHalf or highHalf.

  2. As long as some elements are untaken in both lowHalf and highHalf, compare the first two untaken elements and copy the smaller one back into array.

  3. Once either lowHalf or highHalf has had all its elements copied back into array, copy each remaining untaken element from the other temporary array back into array.

How many lines of code do we need to execute for each of these steps? It's a constant number per element. Each element is copied from array into either lowHalf or highHalf exactly one time in step 1. Each comparison in step 2 takes constant time, since it compares just two elements, and each element "wins" a comparison at most one time. Each element is copied back into array exactly one time in steps 2 and 3 combined. Since we execute each line of code a constant number of times per element and we assume that the subarray array[p..r] contains $n$ elements, the running time for merging is indeed $\Theta(n)$.
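In symbols (my tally of the three steps, not in the original), a merge of $n$ elements costs:

$$\underbrace{n}_{\text{step 1: copy out}} + \underbrace{(\le n - 1)}_{\text{step 2: comparisons}} + \underbrace{n}_{\text{steps 2 and 3: copy back}} = \Theta(n)$$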

Analysis of merge sort

Given that the merge function runs in $\Theta(n)$ time when merging $n$ elements, how do we get to showing that mergeSort runs in $\Theta(n \log_2 n)$ time? We start by thinking about the three parts of divide-and-conquer and how to account for their running times. We assume that we're sorting a total of $n$ elements in the entire array.

  1. The divide step takes constant time, regardless of the subarray size. After all, the divide step just computes the midpoint $q$ of the indices $p$ and $r$. Recall that in big-Θ notation, we indicate constant time by $\Theta(1)$.

  2. The conquer step, where we recursively sort two subarrays of approximately $n/2$ elements each, takes some amount of time, but we'll account for that time when we consider the subproblems.

  3. The combine step merges a total of $n$ elements, taking $\Theta(n)$ time.

If we think about the divide and combine steps together, the $\Theta(1)$ running time for the divide step is a low-order term when compared with the $\Theta(n)$ running time of the combine step. So let's think of the divide and combine steps together as taking $\Theta(n)$ time. To make things more concrete, let's say that the divide and combine steps together take $cn$ time for some constant $c$.

To keep things reasonably simple, let's assume that if $n > 1$, then $n$ is always even, so that when we need to think about $n/2$, it's an integer. (Accounting for the case in which $n$ is odd doesn't change the result in terms of big-Θ notation.) So now we can think of the running time of mergeSort on an $n$-element subarray as being the sum of twice the running time of mergeSort on an ($n/2$)-element subarray (for the conquer step) plus $cn$ (for the divide and combine steps, really for just the merging).

Now we have to figure out the running time of two recursive calls on $n/2$ elements. Each of these two recursive calls takes twice the running time of mergeSort on an ($n/4$)-element subarray (because we have to halve $n/2$) plus $cn/2$ to merge. We have two subproblems of size $n/2$, and each takes $cn/2$ time to merge, and so the total time we spend merging for subproblems of size $n/2$ is $2 \cdot cn/2 = cn$.
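Written as a recurrence (the source describes it in words; this LaTeX form is my restatement):

$$T(n) = \begin{cases} \Theta(1) & \text{if } n = 1,\\ 2\,T(n/2) + cn & \text{if } n > 1, \end{cases} \qquad \text{so} \qquad T(n) = 2T(n/2) + cn = 4T(n/4) + 2cn = \cdots$$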

Let's draw out the merging times in a "tree":

Computer scientists draw trees upside-down from how actual trees grow. A tree is a graph with no cycles (paths that start and end at the same place). Convention is to call the vertices in a tree its nodes. The root node is on top; here, the root is labeled with the $n$ subarray size for the original $n$-element array. Below the root are two child nodes, each labeled $n/2$ to represent the subarray sizes for the two subproblems of size $n/2$.

Each of the subproblems of size $n/2$ recursively sorts two subarrays of size $(n/2)/2$, or $n/4$. Because there are two subproblems of size $n/2$, there are four subproblems of size $n/4$. Each of these four subproblems merges a total of $n/4$ elements, and so the merging time for each of the four subproblems is $cn/4$. Summed up over the four subproblems, we see that the total merging time for all subproblems of size $n/4$ is $4 \cdot cn/4 = cn$.

What do you think would happen for the subproblems of size $n/8$? There will be eight of them, and the merging time for each will be $cn/8$, for a total merging time of $8 \cdot cn/8 = cn$.

As the subproblems get smaller, the number of subproblems doubles at each "level" of the recursion, but the merging time halves. The doubling and halving cancel each other out, and so the total merging time is $cn$ at each level of recursion. Eventually, we get down to subproblems of size 1: the base case. We have to spend $\Theta(1)$ time to sort subarrays of size 1, because we have to test whether $p < r$, and this test takes time. How many subarrays of size 1 are there? Since we started with $n$ elements, there must be $n$ of them. Since each base case takes $\Theta(1)$ time, let's say that altogether, the base cases take $cn$ time:
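(The original diagram doesn't survive the export; here is the recursion tree as a text sketch, with the per-level merging totals from the paragraphs above.)

```text
Subproblem sizes                        Merging time per level
         n                              1 * cn        = cn
     n/2     n/2                        2 * (cn/2)    = cn
  n/4  n/4  n/4  n/4                    4 * (cn/4)    = cn
         ...                                 ...
 1  1  1  ...  1  1  1   (n of them)    n * Theta(1)  = cn
```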

Now we know how long merging takes for each subproblem size. The total time for mergeSort is the sum of the merging times for all the levels. If there are $l$ levels in the tree, then the total merging time is $l \cdot cn$. So what is $l$? We start with subproblems of size $n$ and repeatedly halve until we get down to subproblems of size 1. We saw this characteristic when we analyzed binary search, and the answer is $l = \log_2 n + 1$. For example, if $n = 8$, then $\log_2 n + 1 = 4$, and sure enough, the tree has four levels: $n = 8, 4, 2, 1$. The total time for mergeSort, then, is $cn(\log_2 n + 1)$. When we use big-Θ notation to describe this running time, we can discard the low-order term ($+1$) and the constant coefficient ($c$), giving us a running time of $\Theta(n \log_2 n)$, as promised.
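A quick sanity check of the level-count formula (a sketch of mine, not from the original):

```js
// Count recursion-tree levels by repeatedly halving n down to 1.
var levels = function(n) {
    var l = 1;
    while (n > 1) {
        n = n / 2;
        l++;
    }
    return l;
};
console.log(levels(8));     // 4, matching log2(8) + 1
console.log(8 * levels(8)); // 32 = n * (log2(n) + 1) units of merging work, times c
```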

One other thing about merge sort is worth noting. During merging, it makes a copy of the entire array being sorted, with one half in lowHalf and the other half in highHalf. Because it copies more than a constant number of elements at some time, we say that merge sort does not work in place. By contrast, both selection sort and insertion sort do work in place, since they never make a copy of more than a constant number of array elements at any one time. Computer scientists like to consider whether an algorithm works in place, because there are some systems where space is at a premium, and thus in-place algorithms are preferred.
