Like the Quick Sort, the Merge Sort uses a divide and conquer method to help sort the data. In a given array, you start to subdivide your array into smaller arrays, sorting the data as you go along, only to merge it back together when you are done.
The way you sort the data is a bit interesting. You pick an element within the array, called a pivot. Usually the first, middle, or last. One could pick the mean value, however, that takes extra time to calculate. For simplicity sake, normally the first element is chosen.
From there, you push elements that are either less than, or greater than that pivot point into new arrays. Those arrays are then sorted again. This process repeats itself until there is only one element in the array and it is returned. For added ease, you can add an equal array, and store elements that are equal in that array. I recommend only adding this complexity if there is a decent chance of having duplicate values.
You can then join the elements from smallest to largest into one big array, and since you work from smallest to largest, they are sorted automatically as you merge them back together.
This process does require recursion, which isn’t as fast as a sorting algorithm which can stay in the array, but it is still relatively quick and easy to implement if you understand recursion.
Below is a block of Python code which performs a Merge Sort.
def merge_sort(array):
# Sort the array by using merge sort
if len(array) > 1:
less = []
equal = []
greater = []
pivot = array[0]
for x in array:
if x < pivot:
less.append(x)
elif x == pivot:
equal.append(x)
else:
greater.append(x)
# Don't forget to return something!
return merge_sort(less) + equal + merge_sort(greater)
# Just use the + operator to join lists
# Note that you want equal not pivot
else: # You need to handle the part at the end of the recursion - when you only have one element in your array, just return the array.
return array
If you look deeply at the code, you’ll notice a few things which will greatly affect this algorithm’s performance. For example, if you have a lot of duplicate values, then they can go into the equal array, and don’t have to be further tested.
As an example, I created an array of 500,000 elements, but limited their values to be between 1 and 255. When I sorted them, because there were so many duplicate values, it only took 0.49 seconds. However, when I allowed the values to be between 1 and 32,000 (which means there was still a lot of duplicate values), it jumped to 1.4 seconds. That number jumped to 1.9 seconds when I increased the possible range of values to be between 1 and 2,000,000 because almost all values were unique.
This is why understanding what type of data you are using, in addition to what types of values you will be using is important to picking the right data type as the merge sort can be very efficient if there will be lots of duplicate values, for example, if you wanted to sort by GPA, number of completed credit hours, or test grade scores, and very inefficient if you have lots of unique values, like student numbers that you wanted to sort on.
Merge Sort was originally found on Access 2 Learn