Sorting is a very important and fundamental problem in the study of computer algorithms.
When we’re sorting, we start with a list of n items, and when we’re done, we want the list to contain the same items, but rearranged into nondecreasing order. (I’d say “into increasing order,” but two or more items might be equal.)
The list items might be a type that already has a way to order it, such as numbers or strings. But they might be something else, such as references to objects. In general, we can say that each list item has a sort key, which is the part of the item that determines the correct sorted order. In the case of objects, an object might have satellite data that travels with the sort keys. (No, satellite data does not necessarily come from satellites.) For example, when I make the final grades in this course, I will sort. The sort key will be the percentage of available points that you have earned. The satellite data will be your name. (If I sorted only the points and didn’t move the names around, too, then those whose names are at one end of the alphabet all get A’s and those whose names are at the other end don’t do so well.)
Python provides built-in functions to sort items. There’s the sorted
function that creates a sorted copy of a list. There’s also a sort
method, which sorts a list in place:
But how does sorting work? There are dozens of different sorting algorithms. Although the results of each algorithm are the same (a sorted list), some algorithms are more efficient than others, depending on how the data is initially organized in the list.
It may seem silly to implement a sorting method when Python already has one built-in, but every card-carrying computer scientist knows the basic sorting algorithms. They are to the canon of computer science as The Iliad is to the canon of classical literature—and it takes a lot less time to understand how to sort than it takes to read The Iliad.
We’ve already seen one sorting algorithm: selection sort. Here it is again:
Selection sort looks over the remaining list of items, finds the smallest, and moves that item into place.
Objective: Predict the behavior of a sorting algorithm.
Here are some cards, from the Algorithms tutorials on Khan Academy (for which Balkcom and Cormen are co-authors). You can click on one card, and then another, to swap the locations of the cards. Sort the cards using the selection-sort algorithm.
The insertion sort algorithm takes a different approach. Insertion sort inspects only the next item, and then inserts that new item into the already-sorted part of the list.
Here is a visualization, from the Algorithms tutorials on Khan Academy (for which Balkcom and Cormen are co-authors):
In insertion sort, we have a list, say the_list
, and it’s initialized some values:
the_list = [5, 2, 4, 6, 1, 3]
Consider the sublist containing just the item in the_list[0]
. Is this sublist sorted? Of course! Any sublist containing just one item is sorted.
Next, we consider the item in the_list[1]
. We are going to figure out where in the sublist to its left it should go. Let’s give the item in the_list[1]
the name key
, so that key
has the value 2. We compare key
with the item immediately to the left of the_list[1]
, namely the_list[0]
, which has the value 5. We see that key
, since it has the value 2, is less than the_list[0]
, which has the value 5. So we know that key
should go somewhere to the left of the item in the_list[0]
. We shift the item in the_list[0]
over one position to the right, into the_list[1]
. At this point, we have hit the left end of the list, and we stop and drop key
into the_list[0]
. At this point the sublist of the_list
starting at index 0 and going through index 1 is sorted, and the_list
is [2, 5, 4, 6, 1, 3]
.
Next, we consider the item in the_list[2]
, with value 4. Again, we figure out where in the sublist to its left it should go. We set key
to 4 and compare it with the item immediately to its left, in the_list[1]
, with value 5. We see that key
is less than the_list[1]
, and so we know that key
will go somewhere to the left of the_list[1]
. We shift the item in the_list[1]
over one position to the right, into the_list[2]
. Next, we compare key
with the item in the_list[0]
. We see that key
is not less than this item, whose value is 2, and so we know that key
does not go its left. We stop comparing key
with items to its left, and we drop key
into the last position out of which we shifted an item, namely the_list[1]
. At this point, the sublist of the_list
starting at index 0 and going through index 2 is sorted, and the_list
is [2, 4, 5, 6, 1, 3]
.
We repeat this process:
Look at the next item in the list, and assign its value to key
.
Find where in the sublist to its left this item goes, by marching to the left. Whenever we find an item that is greater than key
, we shift this item over one position to the right. Eventually, we either hit the beginning of the list (the left end), or we find an item that is not greater than key
. In the latter case, because the sublist preceding the item that we assigned to key
is already sorted, we know that if we find an item that is not greater than key
, then all items to the left of this item will also not be greater than key
. In either case, drop key
into the last position of the list out of which we shifted an item. The sublist known to be sorted has increased by one item.
Stop once all items in the list have marched to the left and been placed into the appropriate position in the list.
If we look at our example after each item is processed, we see, in order
[5, 2, 4, 6, 1, 3] # initial list
[2, 5, 4, 6, 1, 3] # after processing the_list[1]
[2, 4, 5, 6, 1, 3] # after processing the_list[2]
[2, 4, 5, 6, 1, 3] # after processing the_list[3]: it's already greater than everything to its left
[1, 2, 4, 5, 6, 3] # after processing the_list[4]
[1, 2, 3, 4, 5, 6] # after processing the_list[5]: done
Notice that finding an item equal to key
is just as good as finding an item that is less than key
. In either case, there’s no need to shift this item to the right, and so we stop marching to the left.
Here is some code for insertion sort.
Objective: Factor out a function from a section of code.
The code for insertion sort includes an outer loop and an inner loop. The inner loop takes an item, the key, at some index in the list, and slides items one to the right until the correct space has been made available to place the key. Write a function insert
that takes as parameters a list and the index of the key, and inserts the key into the correct location. Re-write insertion sort to use insert
.
Here is a solution. No peeking until you have solved it yourself!
Next, we’re going to use recursion to sort a list.
We will see later than merge sort is faster than selection sort and insertion sort in what we call the “worst case.” Merge sort is not the sorting algorithm of choice for small problem sizes. Once the problem size is in the range 500–1000, merge sort beats the other two, and its advantage increases as the problem size increases from there.
Another potential disadvantage of merge sort is that it does not work in place. That is, it has to make complete copies of the entire input list. Contrast this feature with selection sort and insertion sort, which at any time keep a copy of only one list entry rather than copies of all the list entries. Thus, if space is at a premium, merge sort might not be the sorting algorithm of choice.
For most occasions, however, merge sort will be just fine.
The key to making merge sort work is a linear-time merging step. (By “linear-time,” we mean that the time it takes is proportional to the number of items being merged. We’ll see exactly what that means later.) The idea is follows. Suppose we have two sorted lists, say a[0:m]
, containing m items in a[0]
through a[m-1]
, and b[0:n]
, containing n items in b[0]
through b[n-1]
, and we wish to produce a sorted list c[0:m+n]
containing all m + n items in either a
or b
, ending up in c[0]
through c[m+n-1]
.
Here is an important observation: there are only two candidates for the value that should be in c[0]
(the smallest item of the output), and these candidates are a[0]
and b[0]
. Why? Because the lists a
and b
are sorted, the smallest item overall must be the smallest item of whichever list, a
or b
, it started in. So we take the smaller of a[0]
and b[0]
and copy it into c[0]
.
Let’s say that it turned out that a[0] < b[0]
, so that we copied a[0]
into c[0]
. The value that should go into c[1]
is the second-smallest item overall, and it is also the smallest item of those remaining in a[1:m]
and b[0:n]
. Using the same reasoning as before, there are only two candidates for this value: a[1]
and b[0]
. We pick the smaller of these values and copy it into c[1]
.
In general, as we are copying values into c
, we have to look at only two candidate values: the smallest remaining value in a
and the smallest remaining value in b
. We can quickly determine which of these values is smaller, to copy it into the correct position of c
, and to update any indices into lists a
, b
, and c
.
It would be a very good exercise to write merge yourself, but here is one version. In this version, both of the sorted lists are initially contained within a single list, at indices p…q (for the first sorted list), and q+1…r. So the function merge
takes the list, p, q, and r as inputs. After merging, the result is put back into the list at index p, overwriting the original lists.
Notice that we can copy part of one list into another using slice notation, that is the colon within the square brackets. So this line:
the_list[k : r+1] = left[i:]
copies the items in left
starting at i
into the_list
, starting at index k
.
The merge function merges the sorted sublists the_list[p : q+1]
(containing the items from index p
through index q
—remember that with slice notation, the index after the colon is 1 greater than the index of the last item in the sublist) and the_list[q+1 : r+1]
(containing the items from index q+1
through index r
) into the sorted sublist the_list[p : r+1]
. Notice that here the lists being merged are actually two consecutive sublists of the_list
, and that they are being merged into the same locations, indices p
through r
, that they started in. The merge
function creates two new temporary lists, left
and right
, copied from the original list. It then merges back into the original list. It repeatedly chooses the smaller of the unmerged items in left
and right
and copies it back into the_list
. The index i
always tells us the where to find the next unmerged item in left
; j
indexes the next unmerged item in right
, and k
indexes where the next merged item will go in the_list
. Eventually, we exhaust either left
or right
. At that time, the other list still has unmerged items, and we can just copy them back into the_list
directly.
Notice that the parameters p
and r
to merge
have default values of 0 and None
, respectively. Recall that in Python, None
is a special value that means “no value at all.” Here we use it so that if we just call merge_sort(some_list)
, the code assumes that you want to sort the entire list, by setting p
to 0 and r
to the last index in the list.
Now that we have the linear-time merging function merge
, we need to use it well. Merge sort works by a common computer-science paradigm known as divide-and-conquer:
For merge sort, the problem to be solved is sorting the sublist the_list[p]
through the_list[r]
. Initially, for a list with n items, we’ll have p
= 0 and r
= n–1, but these values will be different for the recursive calls. Merge sort follows the divide-and-conquer paradigm as follows:
q
of the midpoint of the sublist the_list
in ranges p…r, and considering the sublists p…q and (q+1)…r.Here’s an example of merge sort in action.
Objective: implement a recursive sorting algorithm.
Unfold the merge_sort
function in the code below by clicking on the small arrow next to the appropriate line number. Complete the merge_sort
function by writing the recursive case. You’ll need to compute the midpoint q of the sublist p..r (you have integer variables available), make two recursive calls to merge_sort
, and merge the results using the provided merge
function.
Here is a solution for merge sort. No peeking until you’ve solved it yourself!
The recursion “bottoms out” when the sublist to be sorted has fewer than 2 items, since a sublist with 0 or 1 items is already sorted.
The merge_sort
function shows how easy divide-and-conquer is for merge sort. It almost seems like cheating! But if you understand recursion, you’ll believe that it works.
If you don’t understand how the recursion works—I mean really understand it—I recommend that you
p
and r
to merge_sort
. (If you have trouble drawing out the recursion tree, please come see me during office hours.)The merge sort function we saw is fine for sorting lists of items that can be easily compared, like strings or ints. What if we wanted to sort a list that contained addresses of (references to) objects? How does Python know which object is “less than” another?
If you are using the Python .sort
method, it turns out that you can write a special method __lt__
in the class definition. __lt__
is then used to compare the objects when you use the less-than symbol <
.
But here, we’re not using the Python .sort
method. We’re writing our own sorting code. How can we specify how we want to compare objects?
Imagine that you have a list of planets. You might wish to sort by mass, by distance from the sun, or by mean orbital velocity. These quantities might be stored in instance variables of a Planet
object, or they might be computable using instance variables of a Planet
object.
There is only one place in the merge sort code where we compare items from the list:
if left[i] < right[l]:
We need a function to replace the <
. Let’s say we had two Planet
objects, each with a mass
instance variable. We could write the function like this:
def lessthan_mass(body1, body2):
return body1.mass < body2.mass
I included “mass” in the function name to make it clear that this “less than” function compares masses. Somehow I need to specify to the merge sort code that merge
should use the lessthan_mass
function instead of a simple <
sign in comparisons. Fortunately, we can pass a function as a parameter to another function. We rewrite the header for merge_sort
like this:
def merge_sort(the_list, p, r, compare_fn)
and for merge
:
def merge(the_list, p, q, r, compare_fn):
When merge_sort
is called, we just pass in lessthan_mass
(itself a function) as the second parameter. Within merge_sort
, we change the call to merge
to include compare_fn
as the second actual parameter. Finally, in the body of merge
, we can use compare_fn
to compare items rather than using the <
sign:
if compare_fn(left[i], right[j]):