Analysis of sorting algorithms

Now that we’ve seen how to analyze algorithms using big-O notation, let’s analyze the three sorting algorithms we’ve seen.

Selection sort

Recall the code for selection sort:

As in the code, let’s say that the_list contains n items. Selection sort has nested loops. The outer loop, with loop index i, runs from 0 to n − 2. For a given value of i, the inner loop, with index j runs from i to n − 1. Here’s an easy way to see that the running time of selection sort is O(n²):

The outer loop makes n − 1 iterations. (0 to n − 2 is n − 1 different values of i.)
For each iteration of the outer loop, the inner loop makes at most n iterations. (i is at least 0, so the inner loop runs at most from 0 to n − 1.)
So, over all iterations of the outer loop, the total number of iterations of the inner loop is at most (n − 1) × n, which equals n² − n.
Each iteration of the outer loop takes a constant amount of time, plus the time for the inner loop.
Each iteration of the inner loop takes a constant amount of time.
Therefore, the total amount of time is at most a constant times n² − n. When we use O-notation, we can drop the low-order term, and we get O(n²).

But remember that O-notation gives us only an upper bound. Does selection sort really take time proportional to n²? The answer is yes. Look at it this way:

In the last iteration of the outer loop, i equals n − 2, and the inner loop iterates twice (for j = n − 2 and j = n − 1).
In the next-to-last iteration of the outer loop, i equals n − 3, and the inner loop iterates three times (for j = n − 3, n − 2, and n − 1).
Each time we think of reducing i, the number of inner-loop iterations increases by 1.
By the time we get to the first iteration of the outer loop, for i = 0, the inner loop makes n iterations.
Therefore, the total number of iterations of the inner loop is given by the following sum: 1 + 2 + ⋯ + n.
Fact: For a given integer n, the sum 1 + 2 + ⋯ + n is called an arithmetic series, and it has the value $\frac{n \times (n+1)}{2}$.
If we ignore the constant factor (1/2) and the low-order term, we get O(n²).

Here’s another way to see that the time is proportional to n².

Let i be the index of any item in the leftmost half of the list. How many items do we need to examine to find the smallest item in the sublist starting at index i?
We know that for any position of the leftmost half of the list, we need to examine all the items in the rightmost half to find the smallest item in the sublist.
That is, for each of n/2 items in the left half, we will have to examine at least the n/2 items in the right half.
So we have to examine at least (n/2) × (n/2) items, or n²/4.
Again, ignoring the constant factor, we see that the number of items we have to examine is at least proportional to n².

Big-Θ notation

So how can we denote that the running time of an algorithm is not just bounded from above by some function of its input size n, but that the running time is also bounded from below by the same function of n, ignoring constant factors and low-order terms? We use big-Θ notation. (That’s the Greek letter Theta.)

We can say that the running time of selection sort is not only O(n²), but that it’s Θ(n²) in all cases. In the best case, the worst case, and all cases in between, selection sort’s running time is at least a constant times n² and at most a constant times n² (ignoring low-order terms).

We must be careful when we talk about which running time of an algorithm we mean. For selection sort, all cases take Θ(n²) time. But as we’re about to see, when we talk about the running time of insertion sort, the best case and worst case are quite different.

Insertion sort

Here’s the code for insertion sort:

In the worst case, each item has to move all the way to the left in the inner loop. So the item in position 1 moves 1 place to the left, the item in position 2 moves 2 places to the left, the item in position 3 moves 3 places to the left, and so on, up to the item in position n − 1, which has to move n − 1 positions to the left. So how many times do we have to move items in this case? Just like with selection sort, we get an arithmetic series: 1 + 2 + ⋯ + (n − 1). Now, the arithmetic series goes up to n − 1, not up to n. No matter, because using our formula for arithmetic series, we see that the sum is $\displaystyle\frac{(n-1) \times n}{2}$. Ignoring the constant factors and low-order term, we see that in the worst case, insertion sort’s running time is Θ(n²).

When does the worst case occur? If each item is less than or equal to all the items to its left. In other words: if the list starts out sorted in reverse order.

Unlike selection sort, the best case for insertion sort is quite different from its worst case. Suppose that each time we enter the inner loop of insertion sort, we find that the_list[j] is less than or equal to key. Then the inner loop iterates 0 times, and the total running time is just Θ(n), That’s a lot better than Θ(n²).

When does the best case occur? If each item is greater than or equal to all the items to its left. In other words: if the list starts out sorted in the correct order.

What if the list is “mostly” sorted? What if each item is within a constant number of places from where it belongs, let’s say c places. Then each iteration of the outer loop takes time at most proportional to c, but that’s constant. So insertion sort takes O(n) time in this case. Can you think of a situation in which the list is mostly sorted?

Let’s recap. The best-case running time of insertion sort is Θ(n), and the worst-case running time is Θ(n²). What if you had to make a blanket statement that applied to all cases? The strongest you could say would be that insertion sort’s running time is O(n²). That is, the running time is at most a constant times n² (ignoring low-order terms), but it could be less.

Merge sort

Finally, consider the code for merge sort.

When merging two sorted sublists, say with lengths m and n, each item is copied out of the sublist once, wins a comparison at most once, and is copied back into the sublist once. Each of these three operations takes constant time per item, and so with m + n items, the time to merge the two sublists is Θ(m + n).

Put another way, if we’re merging two sublists whose total size is n, then the merging time is Θ(n).

Now, how about the running time for merge sort? We’ll see that it is Θ(n log n).

Given that the merge function runs in Θ(n) time when merging n elements, how do we get to showing that merge_sort runs in Θ(n log n) time? We start by thinking about the three parts of divide-and-conquer and how to account for their running times. We assume that we’re sorting a total of n elements in the entire array.

The divide step takes constant time, regardless of the subarray size. After all, the divide step just computes the midpoint q of the indices p and r. In big-Θ notation, we indicate constant time by Θ(1).
The conquer step, where we recursively sort two subarrays of approximately n/2 elements each, takes some amount of time, but we’ll account for that time when we consider the subproblems.
The combine step merges a total of n elements, taking Θ(n) time.

If we think about the divide and combine steps together, the Θ(1) running time for the divide step is a low-order term when compared with the Θ(n) running time of the combine step. So let’s think of the divide and combine steps together as taking Θ(n) time. To make things more concrete, let’s say that the divide and combine steps together take cn time for some constant c.

To keep things reasonably simple, let’s assume that if n > 1, then n is always even, so that when we need to think about n/2, it’s an integer. (Accounting for the case in which n is odd doesn’t change the result in terms of big-Θ notation.) So now we can think of the running time of merge_sort on an n-element subarray as being the sum of twice the running time of merge_sort on an (n/2)-element subarray (for the conquer step) plus cn (for the divide and combine steps—really for just the merging).

Now we have to figure out the running time of two recursive calls on n/2 elements. Each of these two recursive calls takes twice of the running time of merge_sort on an (n/4)-element subarray (because we have to halve n/2) plus cn/2 to merge. We have two subproblems of size n/2, and each takes cn/2 time to merge, and so the total time we spend merging for subproblems of size n/2 is 2 × cn/2 = cn.

Let’s draw out the merging times in a tree:

Each of the subproblems of size n/2 recursively sorts two subarrays of size (n/2)/2, or n/4. Because there are two subproblems of size n/2, there are four subproblems of size n/4. Each of these four subproblems merges a total of n/4 elements, and so the merging time for each of the four subproblems is cn/4. Summed up over the four subproblems, we see that the total merging time for all subproblems of size n/4 is 4 × cn/4 = cn:

What do you think would happen for the subproblems of size n/8? There will be eight of them, and the merging time for each will be cn/8, for a total merging time of 8 × cn/8 = cn:

As the subproblems get smaller, the number of subproblems doubles at each “level” of the recursion, but the merging time halves. The doubling and halving cancel each other out, and so the total merging time is cn at each level of recursion. Eventually, we get down to subproblems of size 1: the base case. We have to spend Θ(1) time to sort subarrays of size 1, because we have to test whether p < r, and this test takes time. How many subarrays of size 1 are there? Since we started with n elements, there must be n of them. Since each base case takes Θ(1) time, let’s say that altogether, the base cases take cn time:

Now we know how long merging takes for each subproblem size. The total time for merge_sort is the sum of the merging times for all the levels. If there are l levels in the tree, then the total merging time is l × cn. So what is l? We start with subproblems of size n and repeatedly halve until we get down to subproblems of size 1. We saw this characteristic when we analyzed binary search, and the answer is l = log₂ n + 1. For example, if n = 8, then log₂ n + 1 = 4, and sure enough, the tree has four levels: n = 8, 4, 2, 1. The total time for merge_sort, then, is cn(log₂ n + 1). When we use big-Θ notation to describe this running time, we can discard the low-order term (+1), the constant coefficient (c), and the base of the logarithm, giving us a running time of Θ(n log n), as promised.