Merge Sort - Access 2 Learn

Like Quick Sort, Merge Sort uses a divide and conquer approach to sort data. You repeatedly subdivide the array into smaller arrays, sort those pieces, and then merge them back together into one sorted array when you are done.
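For comparison, here is a minimal sketch of a textbook top-down merge sort in Python. It splits the array by position (not by value) and does all of its comparisons while merging. The function name `textbook_merge_sort` is hypothetical, and this is not the code used later in this post.

```python
def textbook_merge_sort(array):
    """Return a new sorted list using a top-down merge sort (illustrative sketch)."""
    # Base case: zero or one element is already sorted.
    if len(array) <= 1:
        return array[:]

    # Divide: split by position into two halves and sort each one.
    middle = len(array) // 2
    left = textbook_merge_sort(array[:middle])
    right = textbook_merge_sort(array[middle:])

    # Merge: repeatedly take the smaller front element of the two sorted halves.
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:  # <= keeps equal elements in their original order
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # One half is used up; copy whatever remains of the other.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


print(textbook_merge_sort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
```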

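The way this version sorts the data is a bit interesting. You pick an element within the array, called a pivot, usually the first, middle, or last element. One could pick the median value instead, but that takes extra time to calculate. For simplicity's sake, the first element is normally chosen, although on data that is already sorted (or reverse sorted) that choice leaves one side of every split empty, which makes the sort much slower. Strictly speaking, splitting around a pivot is the defining step of Quick Sort; a textbook Merge Sort instead splits the array by position into two halves and does its comparisons while merging them.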
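The pivot choices above can be sketched as a small helper. Both `choose_pivot` and its strategy names are hypothetical. The median-of-three option (the median of the first, middle, and last elements) is a common, cheap stand-in for the true median; it is not something the original post uses.

```python
def choose_pivot(array, strategy="first"):
    """Return a pivot value from a non-empty list (hypothetical helper)."""
    if strategy == "first":
        return array[0]
    if strategy == "middle":
        return array[len(array) // 2]
    if strategy == "last":
        return array[-1]
    if strategy == "median_of_three":
        # Median of the first, middle, and last elements: a cheap
        # approximation of the true median that avoids a full scan.
        candidates = [array[0], array[len(array) // 2], array[-1]]
        return sorted(candidates)[1]
    raise ValueError(f"unknown strategy: {strategy!r}")


already_sorted = list(range(1, 11))
print(choose_pivot(already_sorted))                     # 1 (the smallest value)
print(choose_pivot(already_sorted, "median_of_three"))  # 6
```

On already-sorted input, "first" always returns the smallest value, so every split is lopsided, while median-of-three lands near the middle.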

From there, you push the elements that are less than the pivot into one new array and the elements that are greater than it into another. Each of those arrays is then sorted the same way, recursively. This repeats until an array holds at most one element, which is already sorted and is simply returned. Optionally, you can add a third, equal array that holds every element equal to the pivot, so those values need no further sorting. I recommend adding this only if there is a decent chance of having duplicate values.
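Here is a sketch of the version without the equal array, assuming the pivot itself is set aside and any duplicates of it go into the "greater or equal" side. The name `sort_without_equal` is hypothetical.

```python
def sort_without_equal(array):
    """Pivot-based sort using only 'less' and 'greater or equal' lists (sketch)."""
    if len(array) <= 1:
        return array
    # Set the pivot aside so every recursive call works on a shorter list.
    pivot, rest = array[0], array[1:]
    less = [x for x in rest if x < pivot]
    greater_or_equal = [x for x in rest if x >= pivot]  # duplicates of the pivot land here
    return sort_without_equal(less) + [pivot] + sort_without_equal(greater_or_equal)


print(sort_without_equal([3, 1, 3, 2, 3]))  # [1, 2, 3, 3, 3]
```

Because every copy of the pivot value is sent to another call, an array of n identical values nests n calls deep. In CPython, whose default recursion limit is 1000, that raises a `RecursionError` once n reaches roughly a thousand, which is the kind of case the equal array avoids.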

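You then join the arrays back together in order: the sorted less array, then the equal elements (or the pivot), then the sorted greater array. Because every element in the less array is smaller than every element in the greater array, the combined result is already sorted, and no further comparisons are needed while joining.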

This process relies on recursion and creates new arrays at each step, so it generally uses more memory and can be slower than an algorithm that stays in the original array. It is still relatively quick, and easy to implement if you understand recursion.
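As a contrast, here is a sketch of a sort that stays in the array: a quicksort using the Lomuto partition scheme, one common in-place technique swapped in here for illustration (it is not the author's code). It still recurses, but it rearranges elements by swapping instead of building new lists. The name `quick_sort_in_place` is hypothetical.

```python
def quick_sort_in_place(array, low=0, high=None):
    """Sort array[low..high] (inclusive) in place using Lomuto partitioning."""
    if high is None:
        high = len(array) - 1
    if low >= high:
        return  # zero or one element in this range: nothing to do

    # Use the last element as the pivot and move every smaller element
    # to the front of the range.
    pivot = array[high]
    boundary = low  # everything before `boundary` is less than the pivot
    for i in range(low, high):
        if array[i] < pivot:
            array[i], array[boundary] = array[boundary], array[i]
            boundary += 1
    # Place the pivot between the smaller and larger elements.
    array[boundary], array[high] = array[high], array[boundary]

    # Recurse on each side; no new lists are created.
    quick_sort_in_place(array, low, boundary - 1)
    quick_sort_in_place(array, boundary + 1, high)


numbers = [5, 2, 9, 1, 5, 6]
quick_sort_in_place(numbers)
print(numbers)  # [1, 2, 5, 5, 6, 9]
```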

Below is a block of Python code that performs the sort described above.

```python
def merge_sort(array):
    # Sort the array by splitting it around a pivot and recursing on each part
    if len(array) > 1:
        less = []
        equal = []
        greater = []

        pivot = array[0]  # use the first element as the pivot
        for x in array:
            if x < pivot:
                less.append(x)
            elif x == pivot:
                equal.append(x)
            else:
                greater.append(x)
        # Don't forget to return something!
        # Just use the + operator to join lists.
        # Note that you want equal, not pivot, so duplicate values are kept.
        return merge_sort(less) + equal + merge_sort(greater)
    else:
        # Handle the end of the recursion: an array with zero or one
        # element is already sorted, so just return it.
        return array
```

If you look closely at the code, you'll notice a few things that greatly affect this algorithm's performance. For example, if you have a lot of duplicate values, they all go into the equal array and don't have to be tested any further, because the equal array is never passed to another recursive call.
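To see the effect of duplicates directly, the sketch below wraps the same three-way partition in a hypothetical `count_calls` helper that counts how many recursive calls the sort makes. The data sizes and seed are arbitrary choices for illustration.

```python
import random


def count_calls(array):
    """Run the pivot-based sort and return (sorted_list, number_of_calls)."""
    calls = 0

    def sort(items):
        nonlocal calls
        calls += 1
        if len(items) <= 1:
            return items
        pivot = items[0]
        less = [x for x in items if x < pivot]
        equal = [x for x in items if x == pivot]
        greater = [x for x in items if x > pivot]
        # Values in `equal` are finished; they are never passed to another call.
        return sort(less) + equal + sort(greater)

    return sort(array), calls


random.seed(1)
few_values = [random.randint(1, 10) for _ in range(10_000)]
unique_values = random.sample(range(1_000_000), 10_000)

print(count_calls(few_values)[1])     # at most 21 calls
print(count_calls(unique_values)[1])  # at least 10,000 calls
```

Each call that splits an array uses up one distinct pivot value, so 10 distinct values allow at most 10 splitting calls, each making two child calls (21 calls in total at most). With 10,000 unique values the call count is at least the number of elements.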

As an example, I created an array of 500,000 elements but limited their values to be between 1 and 255. Because there were so many duplicate values, sorting them took only 0.49 seconds. When I allowed the values to be between 1 and 32,000 (still plenty of duplicates: on average, each value appeared about 15 times), the time jumped to 1.4 seconds. When I widened the range to between 1 and 2,000,000, so that almost all values were unique, it rose to 1.9 seconds.
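The post does not say how the data was generated or timed. The sketch below assumes random integers from `random.Random.randint` and wall-clock timing with `time.perf_counter`. `time_sort` is a hypothetical helper, and your timings will differ from the ones above.

```python
import random
import time


def merge_sort(array):
    # Same pivot-based sort as the article's code, repeated so this sketch runs on its own.
    if len(array) <= 1:
        return array
    pivot = array[0]
    less, equal, greater = [], [], []
    for x in array:
        if x < pivot:
            less.append(x)
        elif x == pivot:
            equal.append(x)
        else:
            greater.append(x)
    return merge_sort(less) + equal + merge_sort(greater)


def time_sort(size, max_value, seed=0):
    """Sort `size` random integers in 1..max_value and return elapsed seconds."""
    rng = random.Random(seed)
    data = [rng.randint(1, max_value) for _ in range(size)]
    start = time.perf_counter()
    result = merge_sort(data)
    elapsed = time.perf_counter() - start
    assert result == sorted(data)  # sanity check: the output really is sorted
    return elapsed


# Usage, mirroring the three ranges above (timings depend on the machine):
# for max_value in (255, 32_000, 2_000_000):
#     print(max_value, round(time_sort(500_000, max_value), 2))
```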

This is why it is important to understand what kind of data you are sorting, and what range of values it can hold, before picking a sorting algorithm. This version can be very efficient when there will be lots of duplicate values, for example if you want to sort by GPA, number of completed credit hours, or test scores, and noticeably less efficient when most values are unique, such as student numbers.

Merge Sort was originally published on Access 2 Learn.

Walter Wimberly

Assistant Professor

Walter Wimberly is an Assistant Professor at a regional college in Tennessee, teaching Computer Science in the Software Engineering track. He works as a student advisor, oversees curriculum changes, develops new courses, and manages the advisory panel.
Walter taught full time for about 7 years before going back into "industry" as a full-stack software developer for a dozen years. There he focused on web-based projects, coding in JavaScript/jQuery and using the Bootstrap CSS framework on the front end, and coding in PHP, ASP/ASP.NET, and SQL on the back end.

Since he loves teaching, he taught web and digital media classes as an adjunct for eight years while working in industry, and he has since returned to teaching full time.

He has been married for over 25 years and is the father of several special needs boys. As such, he is working on projects to help people with special needs become self-sufficient and to support their caregivers. Check out his Autism blog for more info.
