Merge sort is a divide-and-conquer sorting algorithm that recursively divides an array into half and then merges the subarrays back into a sorted array, hence the name. Merge sort can either be implemented in a top-down, where the array is recursively split unto it is of size 1, then merged or bottom-up, where the array is already assumed to contain 1-sized subarrays (known as runs) and then they get merged.
Process
Top Down
- Take an array and split it (by copying into new arrays) into a left and right subarray, and
- Recursively sort the subarrays, . The base case is when there is a single element left in each subarray, in which case no further sorting is required.
- The key assumption is that we can assume and to be sorted, but now they need to merged together.
- Merge the subarrays according to the process below:
- Initialise 3 pointers, . will be used to iterate through and
- Loop until one of the arrays are empty:
- If , then the element is placed into . Increment
- Else, , so is placed into . Increment
- In either cases, we’ve added an element to so increment
- Once either of the arrays are empty, copy the remaining elements from the other array into , since we know they are sorted.
Bottom Up
Pseudocode
Top Down
Algorithm MergeSort(A: Array of length n):
Function Merge(Al,Ar,A: Arrays of lengths n,m,n+m):
i <- 0; j <- 0; k <- 0;
//Add the smaller of the two elements to A
While i < n and j < m do:
If Al[i] <= Ar[j] then:
A[k] <- Al[i]
i <- i + 1
Else:
A[k] <- Ar[j]
j <- j + 1
End if
k <- k + 1
End do
//If one of the subarrays is empty, copy the remaining from the other into A
If i == n then:
Copy Ar[j...m - 1] to A[k...n + m- 1]
Else:
Copy Al[i...n - 1] to A[k...n + m- 1]
End if
End Function
If n > 1 then:
Al <- Array of length ⌊n/2⌋ - 1
Ar <- Array of length ⌊n/2⌋ - 1
//Creat two subarrays A left and A right, and copy to them
Copy A[0..⌊n/2⌋ - 1] to Al[0..⌊n/2⌋ - 1]
Copy A[⌊n/2⌋ to n - 1] to Ar[0..⌊n/2⌋ - 1]
MergeSort(Al)
MergeSort(Ar)
Merge(Al,Ar,A)
End if
End Algorithm
Properties
- Stable: Duplicates are kept in order, as the left array is given priority in the
Merge
cost. - Not in-place, as subarrays are created in the splitting stage.
Time Complexity
Case Scenario | Input Description | Growth Function | Big-O | Notes |
---|---|---|---|---|
Worst | Largest and second largest elements are in seperate halves of the array | See below | ||
Average | ||||
Best | Array is sorted |
Basic Operation: Key Comparison
Since merge sort splits an array of size into two arrays of size and , and uses both arrays in the merge sort, it can be modelled with the recurrence relation:
Where is the merge step, which we can find by observing the Merge()
function. The best case for the merge function is when contains elements all smaller than that of , or vice versa, since then the while loop exists early, and comparisons are reduced. The worst case is when the largest and second largest elements are in different arrays
%%🖋 Edit in Excalidraw, and the dark exported image%%
In the worst case, there are key comparisons done in an ‘interleaving’ pattern. In the best case, there are comparisons. Since these both have a linear tight bound, we can have the final recurrence relation as:
We have a condition of because a comparison requires at least two elements.
Using the Master Theorem: