Analysis Example, CPSC 331, Winter 2017



 Analyzing the Running Time of a Simple Program

About This Example

As part of the self-study exercise on debugging software, you were encouraged to use a debugger to study the following sorting algorithm (included in the program MySort.java):

public static void sort ( int[] A ) {
    for (int i=1; i < A.length; i++) {
        /*
         * Loop Invariant:
         * a) The entries of A have been rearranged but are
         *    otherwise unchanged
         * b) i is an integer such that 1 <= i <= A.length
         * c) The entries of A[0], A[1], ..., A[i-1] are
         *    sorted in nondecreasing order, that is,
         *    A[h] <= A[h+1] for every integer h such that
         *    0 <= h <= i-2
         * Loop Variant: A.length - i
         */
        int j = i - 1;
        while ((j >= 0) && (A[j] > A[j+1])) {
            /*
             * This inner loop moves the entry that is initially
             * in position A[i] toward the front until the entries of
             * A[0], A[1], ..., A[i] are in sorted order.
             *
             * Parts (d), (e), (f) and (g) of the following
             * loop invariant say, essentially, that this
             * part of the array is almost sorted: only
             * A[j+1] is smaller than the entry in front of it,
             * and moving this entry forward by one or more
             * positions will sort this part of the array.
             *
             * Loop Invariant:
             * a) The entries of A have been rearranged but are
             *    otherwise unchanged
             * b) i is an integer such that 1 <= i <= A.length
             * c) j is an integer such that 0 <= j <= i-1
             * d) A[h] <= A[h+1] for every integer h such
             *    that 0 <= h <= j-1
             * e) A[h] <= A[h+1] for every integer h such that
             *    j+1 <= h <= i-1
             * f) if j < i-1 then A[j] <= A[j+2]
             * g) A[j] > A[j+1]
             * Loop Variant: j+1
             */
            int temp = A[j];
            A[j] = A[j+1];
            A[j+1] = temp;
            j--;
        }
    }
}

In this example, the methods introduced in lectures for analyzing the running times of algorithms will be used to analyze the running time of this algorithm.

Worst-Case Running Time

An Upper Bound

Our first goal is to find an upper bound on the “worst-case running time” of this algorithm. To keep things simple we will assume that the execution of each statement shown in the code (and the test for each loop) has “unit cost,” that is, we will charge 1 for the execution of each of these statements. We are now looking for an upper bound on the number of statements that are executed. As explained in the lecture notes on this topic, we will try to give this as a function of the input size, and, for this program, we consider the “input size” to be the length of the array that is given as input.

There is a problem right away: this program does not seem to correspond to any of the cases described in the lecture notes! The reason is that this uses a control structure that was not included in the “tiny programming language” described in the lecture notes, namely, a for loop.

In order to continue we need to convert this into a program that uses the control structures included in our programming language: The for loop should be replaced by a single initialization statement (setting the initial value of i to be 1), followed by a while loop whose test is the same as that of the for loop (that is, “i < A.length”), and whose loop body consists of the body of the for loop, followed by a single statement in which i is incremented.
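
For concreteness, here is a sketch of what this converted version might look like (the loop invariants and loop variants shown above are omitted here to save space):

public static void sort ( int[] A ) {
    int i = 1;                                  // initialization taken from the for loop
    while (i < A.length) {                      // same test as the for loop
        int j = i - 1;
        while ((j >= 0) && (A[j] > A[j+1])) {
            int temp = A[j];                    // swap A[j] and A[j+1]
            A[j] = A[j+1];
            A[j+1] = temp;
            j--;
        }
        i++;                                    // increment taken from the for loop
    }
}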

This version of the algorithm can be analyzed using the techniques presented in the lecture notes, and will be considered in the rest of this example.

This version of the program is a sequence of two subprograms. The first subprogram includes a single statement (in which i is initialized), so it is clear that its running time is one, regardless of the input. The worst-case running time of the entire program is therefore one more than the worst-case running time of the second subprogram, a while loop.

It is therefore sufficient (in order to complete the analysis) to analyze the worst-case running time of this while loop. Since it does exactly the same thing as the for loop from which it was produced, the loop invariant and loop variant that were given for the for loop can be used for this while loop too.

In particular, as suggested by the inline documentation, “A.length−i” is a loop variant for this loop: The loop test, “i < A.length,” clearly fails whenever the value of this expression is less than or equal to zero and, since A.length is never changed while the value of i is incremented during each execution of the loop body, the value of this expression is decreased by one every time the loop body is executed.

Since the initial value of this expression is A.length−1, we can conclude that the body of the outer loop is executed at most A.length−1 times. Furthermore it is clear, by inspection of the code, that it is executed once for each value of i between 1 and A.length−1, inclusive.

To continue the analysis we should next bound the number of steps executed by the body of this loop, for any such value of i. This subprogram is a sequence of three subprograms, such that the first and last subprograms are each a single statement and the middle one is another (“inner”) while loop. The worst case running time of this loop body will clearly be two more than the worst case running time of the inner while loop.

As suggested by the inline documentation in the code, “j+1” is a loop variant for this loop (please verify this, if it is not obvious). Since the initial value of this expression is i, we know that the body of the inner loop is executed at most i times.

Since the inner loop body is just a sequence of four statements, it is obvious that 4 statements are executed every time this inner loop body is executed. Using the formula given in the lecture notes for this topic, we may now conclude that

4i + (i+1) = 5i + 1

is an upper bound on the number of statements that are executed by this inner loop (for a given value of i) in the worst case.

Since the body of the outer loop is a sequence that includes the inner loop along with exactly two more statements (as noted above), we can conclude that 5i+3 is an upper bound on the cost of a single execution of the body of the outer loop.

Let n be the length of the input array (that is, n = A.length). Then, as noted above, the body of the outer loop is executed at most n−1 times, so that there are at most n executions of the outer loop’s test. Using the above analysis we can conclude that the total number of steps used by all executions of the body of the outer loop is at most the sum, as the value of i ranges from 1 to n−1 (inclusive), of 5i+3. Using a well-known formula for the sum of the first n−1 positive integers (you should have seen this in MATH 271) we can see that the value of this sum is

5n²/2 + n/2 − 3.
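
In more detail, using the identity 1 + 2 + ⋯ + (n−1) = n(n−1)/2, this sum can be evaluated as follows:

\sum_{i=1}^{n-1} (5i + 3) = 5 \cdot \frac{n(n-1)}{2} + 3(n-1) = \frac{5n^2}{2} - \frac{5n}{2} + 3n - 3 = \frac{5n^2}{2} + \frac{n}{2} - 3.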

Adding in the cost of at most n executions of the outer loop test, we see that the total cost of the outer loop is at most

5n²/2 + 3n/2 − 3

in the worst case.

Finally, adding in the cost of the first step that initializes the value of i, we may conclude that the worst case running time of this algorithm is less than or equal to

5n²/2 + 3n/2 − 2

where n is the input size (that is, the length of the input array), assuming that the array is nonempty (so that n ≥ 1).
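
Putting the pieces together, the whole upper-bound calculation can be summarized as follows, where the 1 counts the initialization of i, the n counts the executions of the outer loop’s test, and the sum counts the executions of the outer loop’s body:

1 + n + \sum_{i=1}^{n-1} (5i + 3) = 1 + n + \left( \frac{5n^2}{2} + \frac{n}{2} - 3 \right) = \frac{5n^2}{2} + \frac{3n}{2} - 2.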

A Lower Bound

As mentioned in class, the above does not imply that the worst case running time is also at least this value. In some cases there is no input that causes all loops to be executed the maximum number of times and, occasionally, the bound on running time that can be produced using the techniques that we’ve applied is not very good at all!

In order to produce a lower bound on worst case running time we should describe a family of inputs for our problem

I₁, I₂, I₃, …,

where Iₙ is an input of size n for every positive integer n (or, for some problems, every possible input size n). The time used by our algorithm on the input Iₙ that we have described can be used as a lower bound on the worst case running time of the algorithm for input size n.

For this particular algorithm we can get a very good result by choosing Iₙ to be an array with distinct entries that is sorted in decreasing order instead of increasing order. In particular, we can define Iₙ by setting A[i] to be n−i for each integer i between 0 and n−1. It turns out (and you should be able to check) that the body of each loop is executed its “maximal” number of times, so that the number of steps used by this algorithm on the input Iₙ is

5n²/2 + 3n/2 − 2,

the same as the upper bound on the worst case running time that we obtained.

This does not happen very often! When it does we can conclude that our “upper bound” on the worst case running time is “tight,” so that, in fact, the worst case running time of our algorithm is equal to

5n²/2 + 3n/2 − 2,

for input size n.
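
For reference, the family of inputs Iₙ described above could be produced by a small helper method like the following (the name worstCaseInput is ours; it is not part of MySort.java):

static int[] worstCaseInput ( int n ) {
    int[] A = new int[n];       // an array of length n
    for (int i = 0; i < n; i++) {
        A[i] = n - i;           // entries n, n-1, ..., 1: distinct and in decreasing order
    }
    return A;
}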

Best-Case Running Time

A Lower Bound

The process of finding a lower bound on “best case running time” has not been described in any detail in class, but it resembles the process of finding an upper bound on “worst case running time,” which we did describe.

A significant difference (probably the most important one) is that, when we consider loops, we need to find a lower bound on the minimum number of times that the loop body can be executed, instead of finding an upper bound on the maximum number of executions. Everything else is very similar.

With that noted, consider our program. The outer loop is (produced from) a for loop, so that its body is always executed exactly n−1 times, for values of i ranging from 1 to n−1, inclusive. Consequently the minimal number of executions of the body of this loop is the same as the maximal number of executions, and a part of our analysis (from above) can be used here too.

On the other hand, the situation is very different for the inner loop: it is possible that the loop test fails right away, so the body of the inner loop might never be executed at all! Given that, we must use 0 as our lower bound on the number of executions of the body of the inner loop, producing a lower bound of 1 for the best case running time of the inner loop (counting only the first, and only, execution of the loop test).

Since the outer loop body includes the inner loop, as well as two more statements, we obtain a lower bound of 3 for the best case running time of the body of the outer loop.

Using this (and remembering, again, that the body of the outer loop is executed exactly n−1 times), the lower bound we get for the best case running time of the outer loop is

3(n−1) + n = 4n−3.

Since the entire program includes only one more statement we obtain a lower bound of 4n−2 for the best case running time of this program, for input size n.
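
Written out as a single calculation, where the 1 counts the initialization of i, the n counts the executions of the outer loop’s test, and each of the n−1 executions of the outer loop’s body costs at least 3, this lower bound is

1 + n + 3(n - 1) = 4n - 2.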

An Upper Bound

The process of finding an upper bound on “best case running time” is almost exactly the same as the process of finding a lower bound on “worst case running time”: we should describe a family of inputs for our problem

I₁, I₂, I₃, …,

where Iₙ is an input of size n for every positive integer n (or, for some problems, every possible input size n). The time used by our algorithm on the input Iₙ that we have described can be used as an upper bound on the best case running time of the algorithm for input size n.

For this particular algorithm we can get a very good result by choosing Iₙ to be an array with distinct entries that is sorted in increasing order. In particular, we can define Iₙ by setting A[i] to be i for each integer i between 0 and n−1. It turns out (and you should be able to check) that the body of each loop is executed its “minimal” number of times, so that the number of steps used by this algorithm on the input Iₙ is

4n − 2,

the same as the lower bound on the best case running time that we obtained.

This does not happen very often! When it does we can conclude that our “lower bound” on the best case running time is “tight,” so that, in fact, the best case running time of our algorithm is equal to

4n − 2,

for input size n.
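
As in the worst case analysis, the inputs Iₙ described above could be produced by a small helper method like the following (again, the name bestCaseInput is ours):

static int[] bestCaseInput ( int n ) {
    int[] A = new int[n];       // an array of length n
    for (int i = 0; i < n; i++) {
        A[i] = i;               // entries 0, 1, ..., n-1: distinct and already sorted
    }
    return A;
}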

Average Case (or “Expected”) Running Time

Since the best case running time and worst case running time of this algorithm are so far apart, we cannot use them to say very much about the average case running time: all that we know is that it is greater than or equal to the best case running time, and less than or equal to the worst case running time.

Furthermore, the “average case” (or “expected”) running time is not even well defined without more information: we need to know something about the likelihood that each input of size n might appear!

This is, pretty much, all that can be said at this point. We will return to this subject later on in the course.

Concluding Remarks

At least two things should be noted at this point:

First, even for a program as short and simple as this one, the analysis was long and quite detailed.

Second, the exact expressions obtained depend on the simplifying assumption, made at the beginning, that the execution of each statement and loop test has “unit cost”; counting steps in a different way would change the constants appearing in these bounds.

Both of the above points illustrate the need for (and motivate the use of) the “asymptotic notation” that will be discussed next in class.

