Tutorial 11, CPSC 331, Fall 2010

Average-Case Analysis of Binary Search Trees

About This Exercise

The objectives of this exercise are to help students check that they understand the following:

concepts from basic probability theory that are will be useful in this course;
the application of these concepts to perform an average case analysis of a (deterministic) algorithm or data structure;
what such an analysis implies — including how and why the results of such an analysis are always of limited (or conditional) use
the results about the “expected” depth of a binary search tree of size n, and the expected performance of algorithms on binary search trees, that have been presented in class.

Students should read through and try to solve the problems on this exercise before attending the tutorial on Wednesday, October 27.

Expected Background

It is expected that students have attended the lecture in which the average-case analysis of binary search trees was discussed and carefully reviewed the online lecture notes that are available.

The reference Introduction to Algorithms, which you can also read online as an ebook, is a very good reference for this material: Material about basic probability (that is applied here) can be found in Appendix A.

Warmup Exercises — To Check That You Understand the Basics

It is to be expected that all of the questions in this first section can be answered by any student who has successfully completed the prerequisites for this course, attended the lecture in which the average-case analysis of binary search trees was discussed, and read through the material about this that is listed above.

Teaching assistants will not be spending very much time in tutorials discussing these questions! Students should look at them anyway to make sure that they know how to answer them.

Students who do have difficulty with these questions, after completing the suggested reading, should contact the course instructor to get extra help.

State a definition for each of the following terms.
1. a sample space
2. a probability distribution (sometimes also called a probability function)
3. a random variable
4. the expected value of a random variable
Say, as precisely as you can, what assumptions were made during the lecture on the “average case analysis of binary search trees” about the construction of binary search trees of size n.

Then make this more formal (if you have not already done so) by specifying both the sample space and the probability distribution that was considered during the analysis that was described during the lecture.

Finally, say (again, as precisely as you can) what you can conclude from this about binary search trees of size n, if the assumptions described in class are not valid.
Consider randomly constructed binary search trees with size four.
1. List the shapes of all binary search trees of size four, and the probability that each of these trees is generated, using the probability distributon that has been described. You should find that there are trees with 14 different shapes.
2. Compute the average depth of a binary search tree of size four.
3. Now consider a diffeent probabiltiy distribution in which each shape of tree appears with probability 1/14. What is the expected value of the depth when this probability distribution is used?
4. Finally, compute the expected value of the exponential-depth when both probability distributions are used as well. How are they related?
Here are two different ways to construct a binary search tree of size n that stores the integers 1, 2, ,…, n:
1. Choose one of the orderings of these integers (choosing each one of the n! orderings with the same probability) and insert these integers into an empty tree, using the ordering that has been selected as the ordering for insertions.
2. First, choose the root, choosing each integer i between 1 and n with the same probability. Choose a random subrtree with size i−1 (and storing the first i−1 positive integers) as described in (i) and use this as the left subtree as the root. Choose a random subtree of size n−i (and storing the first n−i positive integers) as described above, add i to the value at each node, and use this as the right subtree of the root.
It should be reasonably clear that you end up constructing trees from the same set (of binary search trees) in each case.
1. Prove something more, namely, that each tree is constructed with the same probability when either one of the above processes is used, as well.
2. This fact was used in the “average-case analysis of binary search trees — although this was stated extremely quickly in class. After a review of the online lecture notes, explain how this fact was used.

To Be Discussed in the Tutorial

Consider the following method, which returns the index of the first zero found in the array that has been supplied as input, throwing an exception of no zeroes are found. Suppose the entries of the input array A are either 0 or 1.

int firstZero (int[] A) { i = 0; while (i < A.length) { if (A[i] == 0) { return i; }; i++; }; throw noZeroException; }
1. Show that there exist positive constants c₁, c₂, c₃ and c₄ such that
  - the method uses c₁ steps if the first 0 is found at location 0,
  - for 1 ≤ i < A.length , the method uses c₂ i + c₃ steps if the first 0 is found at location i, and
  - the method uses c₁ n + c₄ steps, for n = A.length, if there is no 0 in the array at all.
2. Suppose now that we wish to compute the expected number of steps used by this algorithm when its input is an array A with length n.
  
  List the elements of a sample space that can be used for this analysis.
3. Suppose that each one of the 2ⁿ possible input arrays is chosen with the same probability.
  
  Prove that, for 0 ≤ i < n, the first 0 appears at location i of the input array with probabilty 2⁻⁽ⁱ⁺¹⁾.
  
  Prove, as well, that there are no 0’s in the input array at all with probability 2⁻ⁿ.
4. Using the above information, show that the expected number of steps used by this algorithm is in O(1), even though the number of steps used by this algorithm in the worst case is in Ω(n).
Finally, consider the average number of steps used for a successful search in a randomly constructed binary of size n. In order to analyze this it is necessary to to consider both the way that the tree is constructed and the node whose contents are searched for.
1. Describe the elements of the sample space you should use for this analysis. It should be n times as large as the sample space considered when analyzing the depth of a random binary search tree with size n.
2. Suppose that we search for each element of the set stored in the tree with the same probability, and that trees are generated using the probability distribution that has been described in the lecture notes.
  
  Write down the probability distribution (for this new problem) correponding to this information as precisely as you can.
3. Consider the random variable comparisons whose value is the number of comparisons made between the value searched for and the values at nodes in the tree.
  
  Explain why the expected value of comparisons is less than or equal to one plus the expected value of the depth of a binary search tree with size n (that is defined using the sample space and probability distribution described in the course notes).
  
  It follows that the expected value of comparisons is in O(log n).
4. Finally, here is a more challenging problem: First, show that the number of nodes in any search tree with depth less than or equal to (log₂ n)/2 is at at most 2√n.
  
  Then use this information to show that the expected value of comparisons is in Ω(log n) as well.

Last updated:
http://www.cpsc.ucalgary.ca/~jacobs/cpsc331/F10/tutorials/tutorial11.html