Analyzing algorithms

Computer science is more than about just programming. It’s also about understanding characteristics of the problems we study and comparing algorithms for solving problems. Whatever the problem we study, we will be concerned with two aspects when we analyze an algorithm:

Correctness: Showing that our algorithms produce the correct answer.
Efficiency: Understanding how efficient our algorithms are, in terms of resources used. The resource that we most often analyze is time. (After all, time is money, right?)

Linear search vs. binary search

One of the most basic problems in computer science is finding the index of an item in a list. Let’s assume that the list is sorted, so that we can use the technique of binary search on the list if we want to. Maybe it’s a list of country names, sorted alphabetically.

countries = ["Afghanistan", "Albania", "Algeria", "Andorra", "Angola",
  "Anguila", "Argentina", "Armenia", "Aruba", ..., "Zimbabwe"]

(The … means that I got tired of typing country names, but let’s assume that every country name is in the list.)

Let’s say you’d like to find the index of a country in the list. (Perhaps “Yemen” or “Afghanistan”.) I can think of at least three approaches.

Approach #1: Random search. Pick a random index. Check whether that location contains the item you were looking for. If not, repeat.
Approach #2: Linear search. Loop over the items in the list, starting at the item at index 0, then the item at index 1, index 2, and so on until you find the country name you are looking for or you have exhausted the entire list without finding it.
Approach #3: Binary search. This approach requires the list to be sorted. We did this algorithm in the first lecture, when we repeatedly ripped a phone book in half to find my friend’s phone number. Maintain indices into a sublist under consideration. If the sublist is ever empty, then the country name is not found. Otherwise, divide the current sublist in half. Check whether the item at the midpoint is what we’re looking for. If so, return that index. Otherwise, check to see whether the midpoint item is greater than what we’re looking for or less than what we’re looking for. Based on the results of the comparison, discard either the first or second half the sublist, updating the indices demarcating the sublist appropriately, and repeat with the new sublist.

Worst-case and best-case analysis

Which of these methods will find the item we are looking for most quickly? It depends on the input we are given and perhaps on other factors. Let’s consider the best and worst possible outcomes for each of our approaches.

Random search. In the best case, we get lucky, and the item is the first one we look at. The time taken would be however long it takes for Python to examine a single item in the list. In the worst case, however, random numbers are picked forever, and the item is never found. (As time approaches forever, the probability of this unfortunate outcome occuring approaches 0—but we will not do probabalistic analysis of running time in this class.) Moreover, if the item is not in the list at all, we’ll never figure that out by picking random indices.
Linear search. In the best case, the item we are looking for happens to be the first item of the list, just like the best case for random search. In the worst case, the item we want is at the end of the list or not present in the list at all. If the list contains n items, Python will have to examine n items.
Binary search. In the best case, the item we are looking for is at the midpoint of the entire list, and the item is found in the first place we look, just like the best cases of random search and linear search. In the worst case, the algorithm has to divide the list in half over and over again until the remaining sublist is empty. How many times can a list of length n be divided in half before the sublist is empty? The answer turns out to be one more than log₂ n, the “base-2” logarithm of n. This number is much less than n, for large n.

The worst-case running time is often of more interest than best-case running time, since it is nice to be able to guarantee the performance of software. If we can say “the worst case running time for this algorithm is OK, as long as the list is not too long,” that’s good information. It’s often less useful to say “this algorithm might be really fast, if you give it just exactly the right input and you’re really lucky.”

Sometimes, the worst-case and best-case running times are the same. For example, if I ask you to find the smallest item in an unsorted list of length n, the obvious (and best) algorithm of “look at every every item in the list, always keeping track of the smallest seen so far” will require looking at all n items. Even if the smallest item happens to be located in the first spot, you won’t know it’s the smallest until you’ve looked at every item. The best and worst cases behave exactly the same.

The worst-case running time often depends on the size of the input.

How fast an algorithm runs often depends on how much data there is. (This is not always true. For example, what if the problem were to simply return a randomly selected item from a list? Or if the problem were to find the smallest item in a sorted list?)

Review of logarithms

The binary-search method above is an example of a repeated halving approach to solving a problem. We divide the list (or phone book) in half. We consider each half of the original list and solve the problem for that sublist. One of the sublists is trivial to solve the problem for: the item cannot be in that sublist, and so we eliminate that entire sublist from further consideration. The item might be in the other sublist. We divide that sublist in half and solve the two smaller problems. We repeat this process until either we find what we’re looking for or the sublist under consideration is empty.

The worst-case running time depends on how many times we can divide the list in half before we get down to an empty sublist. The answer turns out to be the base-2 logarithm of the size of the original list. Let’s review logarithms in more detail. If you’re loga-phobic I would like you to be comfortable with them.

Suppose we take a number, say n, and we halve it, getting n/2. We halve it again, getting n/4. And again, getting n/8. We continue halving, until we get down to a number that is less than 1. Since we cannot have a sublist with, say, half an item, any sublist size that is less than 1 must be 0.

Now, in order for the sublist size to become 0, it must have been 1 the previous time. Let’s answer the following question:

How many times do we halve n until we get down to 1 (or less)?

The answer turns out to be the base-2 logarithm of n, which we write as log₂ n.

Let’s see what base-2 logarithms are, and why log₂ n turns out to be the answer to our repeated-halving problem.

Let’s try the opposite operation. Let’s start with 1 and keep doubling until we get up to n. Since doubling is the inverse operation of halving, the number of times we have to double, starting at 1, until we get to n, must be the same as the number of times we have to halve, starting at n, until we get to 1.

Let’s make a table:

doublings	result
0	1
1	2
2	4
3	8
4	16
5	32
6	64

It’s pretty easy to see the pattern here. Each time, we’re multiplying by another factor of 2. After doubling x times, we have the number 2^x.

The answer to our question of how many times we have to double, starting at 1, until we get to n is the same as the answer to the following question: for what value of x is 2^x = n?

That’s exactly what log₂ n is! It’s the exponent x, that we raise 2 to, in order to get n. Put precisely:

2^x = n if and only if log₂ n = x.

So, we see that if we start at 1, and we double log₂ n times, we get n. Therefore, if we start at n, and we halve log₂ n times, we get 1. So, the answer to the question, “How many times do we halve n until we get down to 1?” is log₂ n times.

The above discussion assumes that n is a power of 2. That won’t always be the case, and we can handle it with some math that we don’t cover in CS 1. We also haven’t dealt with the case in which we have to get the sublist size down to 0 (recall: that’s the worst case for binary search, because what we’re looking for is not found). But that’s not much of a problem, because once the sublist size gets down to 1, the next sublist size will be 0. So the number of halvings to get down to 0 is one more than the number of halvings to get down to 1.

Fortunately, the notation that we shall soon see for characterizing running times allows us to ignore such inconvenient issues.

When computer scientists use logarithms, they almost always use base-2 logarithms.

Analyzing the running time for a program

So far, we’ve only talked loosely about the running time of algorithms. We can see that binary search, given the worst possible input for a binary search, will require fewer “steps” to complete than a linear search, given the worst possible input for linear search. Let’s try to do a more precise analysis, given some code that implements an algorithm.

Here is an implementation of linear search; comments have been removed to save space.

Let’s say that the_list has the length n. Let’s analyze how much time the function will take to run in terms of n.

First, Python has to create a local variable index. That’s pretty fast, and it’s done only once, but let’s measure that time anyway, and call it c₁, where c₁ is some constant that does not depend in any way on the length n of the_list.

Now let’s look at the while-loop. In a worst-case analysis, we have to assume that the item is not in the list at all. The body of the while-loop will execute n times. It seems reasonable to assume that each execution of the body will take a constant amount of time (or at least an amount that is at most a constant); let’s call it c₂. So the total time spent executing the body of the while-loop, in the worst case, is c₂n.

But we’re not quite done. In the worst case, the last test in the while-loop header comes up False, and it’s when index equals len(the_list). That’s the (n + 1)st time we make that test (because we’ve already made the n tests for index equaling 0, 1, 2, …, n − 1). We also have to return None in the worst case, indicating that what we’re looking for does not appear anywhere in the list. That last test and returning None, together, take some constant amount of time; let’s call it c₃ (again, independent of n).

Now, as they say in a French restaurant, “L’addition s’il vous plait.” (“The check, please.”) The running time of linear search in the worst case is therefore no worse than

c₁ + c₂n + c₃ ,

c₂n + (c₁ + c₃) .

The exact values of c₁, c₂, and c₃ depend on the speed of the computer the program is run on and the efficiency of the Python implementation.

Running time of recursive binary search

How do you analyze the running time of a recursive program? Let’s look at an example, recursive binary search:

If we look at the time for each recursive call on its own, not counting the time for the recursive calls that it makes, we see that each recursive call takes a constant amount of time. We check whether we have an empty sublist, compute the midpoint, check whether the key is in the position given by the midpoint, and recursively call the function on the proper half. Each of these steps, within a given recursive call takes constant time; let’s call it c₄. And, as we’ve seen, in the worst case, the number of recursive calls is at most log₂ n + 1. So the total worst-case time for binary search is

c₄ (log₂ n + 1) ,

c₄ log₂ n + c₄ .

Comparing linear search and binary search

So which is faster, linear search or binary search? It depends: on the suitability of the input we get for the particular algorithm (linear search will find “Afganistan” faster than binary search, for the example list), on the length of the list, and on the size of the constants. In the worst case, we ask the following question: When is

c₄ log₂ n + c₄

smaller than

c₂n + (c₁ + c₃) ?

I claim that whatever the value of the constants, for large enough n, binary search will be faster, if you assume the worst-case input for each algorithm. Let’s say that linear search is run on a HAL 9000 computer that is blazingly fast and ultra-modern, and that binary search is run on a vintage Apple II. Let’s assume that the HAL 9000 executes each iteration of the while-loop in linear search in 0.00001 seconds, so that c₂ = 0.00001, and let’s say that c₁ + c₃ = 0.

Now let’s take the Apple II. It is much slower, and it takes 0.01 seconds to execute each recursive call of binary search, so that c₄ = 0.01.

How large does n have to be before the Apple II wins? I claim that log₂ 16384 = 14; you can check for yourself that 2¹⁴ = 16384. So, when n = 16384, the Apple II takes 0.01 × 14 + 0.01 = 0.14 + 0.01 = 0.15 seconds. For the same input size, the HAL 9000 takes 0.00001 × 16384 = 0.16384 seconds. The Apple II wins!

Perhaps you’re thinking that I rigged the game here by choosing constants c₂ and c₄ so that the Apple II would win. Let’s just choose a longer list. Consider a list of length 1 million. Since log₂ 10⁶ is about 20, the Apple II with binary search will take about 0.2 seconds for the search. The HAL 9000 will take 10 seconds. There’s no way that the difference between c₂ and c₄ will overcome the Apple II’s advantage of having a log₂ n factor where the HAL 9000 has a factor of n. From now on, we will ignore these leading constant coefficients, such as c₂ and c₄, when comparing algorithms, since once n gets large enough, the constants become less important than how the running time varies with n.

Orders of growth and big-O notation

We say that in the worst case, linear search takes linear time in n, the size of the input list because the running time scales linearly with n. If the length of the input list doubles, the running time of linear search doubles, in the worst case. If the size of the input list quadruples, the running time quadruples, in the worst case.

We say that in the worst case, binary search takes logarithmic time in n, the size of the input list because the running time scales like a log₂ function. If the original size of the list doubles from n to 2n, the running time increases from c₄ log₂ n + c₄ to c₄ log₂ (2n) + c₄. How much bigger is this running time? Forget about the additive c₄ term; it’s insignificant. We can analyze c₄ log₂ n vs. c₄ log₂ (2n) either by thinking about the algorithm or with some equations. Thinking about the algorithm, doubling the length of the list will require that binary search recurse just one more time. That’s not very expensive; it just costs c₄ more time, even though we doubled the size of the list. If you prefer to use equations, c₄ log₂ (2n) = c₄ log₂ n + c₄ log₂ 2. But, since 2¹ = 2, we see that log₂ 2 = 1, and so c₄ log₂ n + c₄ log₂ 2 = c₄ log₂ n + c₄. If we double the size of the list, we spend just an additional c₄ time—the time for a single iteration of the loop.

We can see that in general, algorithms that take logarithmic time are always faster than algorithms that take linear time, at least once n is large enough. As n gets even bigger, the algorithm with logarithmic time will win by even more. The function log₂ n grows much more slowly than the function n.

We are usually only concerned with running time of an algorithm once its input size n gets large. For large enough n, a logarithmic running time is better than a linear running time, regardless of the constants. Computer scientists use a standard notation to indicate that the running time grows “like” some function, ignoring the constants. If the running time is linear (grows linearly with n) in the worst case, we say that the running time is O(n), pronounced “big-Oh of n”. If the running time is logarithmic (grows with the logarithm of n), we say that the running time is O(log₂ n).

Big-O notation indicates that for large enough n, ignoring constants, the running time is no worse than the function in the parentheses. I can therefore say that binary search’s running time is O(log₂ n) and that linear search’s running time is O(n).

In fact, we can drop the base of the logarithm when we use big-O notation. Why? Because of this mathematical fact: for any constants a and b,

$\displaystyle \log_a n = \frac{\log_b n}{\log_b a}$ .

So the difference between taking the logarithm of n using base a vs. base b is just the constant factor log_b a, and we’ve already decided that constant factors don’t matter in big-O notation.

I slipped something in before, when I said that “Big-O notation indicates that for large enough n, ignoring constants, the running time is no worse than the function in the parentheses.” Big-O notation gives us an upper bound on the running time’s rate of growth, but it doesn’t say anything about how low the running time’s rate of growth might be. If I tell you that I have an amount of money in my wallet, and I guarantee that this amount is at most a thousand dollars, I might have a thousand dollars, or I might have only ten cents.

So if an algorithm’s running time is O(n), it might actually take cn time for some constant c, but it might be even faster. For example, it might take only c log₂ n time. In other words, any running time that is O(log n) is also O(n): it’s at least as good as linear, if not better. Put another way: it is true that binary search runs in O(n) time, but that is not the most accurate statement you can make; it would be more accurate to say that binary search runs in O(log n) time.

Constant and quadratic orders of growth

Are there algorithms that are O(n²)? Sure. Here’s one:

Let’s assume that the print statement takes some constant amount of time to execute, c seconds. How many times will that statement (the body of the innermost loop) be executed? It looks like n² times: for each of the n iterations of the outer loop, the inner loop iterates n times. So the runtime is roughly cn². Ignoring the constants, we say that the function has a running time of O(n²).

What if an algorithm takes constant time, regardless of the input?

We say in this case that the function is O(1), since 1 is what we get for the running time if we drop the constants.

Ranking algorithms by running time

It’s true that not all O(n²) algorithms take the same time. The constants might be different, and the best-case performance might differ for different algorithms. However, it is true that in the worst case, for large enough n, any O(n²)-time algorithm runs faster than any algorithm for which the runtime is any positive constant times n³. We therefore can think of all O(n²) algorithms as being roughly in the same family. We can rank running times:

$O(1) < O(\log_2 n) < O(\sqrt n) < O(n) < O(n \log_2 n) < O(n^2) < O(n^3) < O(2^n) < O(n!)$

An algorithm whose rate of growth depends on the factorial of the input n is going to run for a very long time indeed, for large values of n.

Sometimes, you will see a running time function that looks like this:

c₁n² + c₂n + c₃ .

In this case, we can drop all of the lower-order terms and say that the function is O(n²). Why? For large enough n, n² is much greater than n. Once n gets large enough, no matter what c₁, c₂, and c₃ are, c₁n² will be so much greater than c₂n + c₃ that c₂n + c₃ will be just a blip compared with c₁n².

The take home message: compute the running time for a function, using constants if you need to. Then drop the constants and just take the term in the expression that grows the fastest. The running time is big-O of that term.

What is this “n” anyway?

The value n represents some characteristic of the input that determines the running time. Maybe n is the length of a list that’s input to the function, or maybe n is the value of some parameter that is input to the function. We use n instead of saying specifically what the term is so that we can discuss the rate of growth of the running time of an algorithm in more general terms. However, we should always know specifically to what n refers before discussing the running time.

What you should know

From now on, given an algorithm or piece of code, you should be able to

Identify what quantity n the running time depends on (for example, n might be the length of a list input to the function)
Compute the running time in terms of constants and n.
Describe the running time using big-O notation.

We’ll see how to analyze the running times of selection sort, insertion sort, and merge sort in this style.

Exercise: running times

Objective: compute the running time for a section of code using constants, and asymptotically bound the running time using big-O notation.

Compute the total running time for each of the functions in the following code. First, compute the running time in terms of some constants a, b, c, d, e, etc. Show your work and give the answer in the text box. Then give the tightest big-O bound you can for each. (All of the functions are O(n!), but you can give a more informative bound for each.)

Type your answer here:

Here is a solution.

Type your answer here:

Here is a solution.

Type your answer here:

Here is a solution.

Type your answer here:

Here is a solution.

Type your answer here:

Here is a solution.

Type your answer here:

Here is a solution.