BST Operation Analysis, CPSC 331, Winter 2012


 Analysis of Operations on Binary Search Trees

Overview

During the lectures on (ordinary) binary search trees, recursive algorithms for searching in, inserting into, and deleting from binary search trees were described. In the case of search, sketches of a proof of its correctness and of an analysis of its worst-case running time were provided.

This lecture supplement provides more complete proofs of correctness and running time analyses, as well as proofs for a few other algorithms. These can serve as examples of proofs and analyses of simple recursive algorithms that access or modify data structures.

The Binary Search Tree Property

Recall that every binary search tree T satisfies the following Binary Search Tree Property:

If T is nonempty, then

Proofs of correctness of many algorithms that access or modify binary search trees, including the ones considered below, make heavy use of this property.

Searching in a Binary Search Tree

Problem To Be Solved

The first problem to be considered is that of searching for a given value x in a given binary search tree T. If E stands for the key type and V stands for the value type that can be stored, then the signature for the desired method is as follows

V search (BST<E,V> T, E x)

and the problem can be specified using the following precondition-postcondition pairs.

Precondition 1:

  1. T is a binary search tree storing values of some type V with keys of type E
  2. x is an element of type E which is the key of an element stored in T

Postcondition 1:

  1. Value returned is (a reference to) the value in T with key x
  2. T and x have not changed

Precondition 2:

  1. T is a binary search tree storing values of some type V with keys of type E
  2. x is a key of type E not stored in T

Postcondition 2:

  1. A notFoundException is thrown
  2. T and x have not changed

Algorithm

A recursive algorithm for the above problem is as follows.

V search (BST<E,V> T, E x) {
  if (T == null) {
    throw notFoundException
  } else if (x < T.key) {
    return search(T.left, x)
  } else if (x > T.key) {
    return search(T.right, x)
  } else {
    return T.value
  }
}
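The recursive algorithm above might be rendered in Java roughly as follows. This is only a sketch under stated assumptions: the class names BstSearchSketch and Node are hypothetical, the empty tree is represented by null, keys are compared with compareTo, and the standard NoSuchElementException stands in for the notFoundException named in the specification.

```java
import java.util.NoSuchElementException;

// Hypothetical names: BstSearchSketch and Node are illustrative only.
class BstSearchSketch {
    // A node of a binary search tree; null represents the empty tree.
    static final class Node<E extends Comparable<E>, V> {
        final E key;
        final V value;
        Node<E, V> left, right;

        Node(E key, V value, Node<E, V> left, Node<E, V> right) {
            this.key = key;
            this.value = value;
            this.left = left;
            this.right = right;
        }
    }

    // Returns the value stored with key x, or throws if x is not in the tree.
    // Neither the tree nor x is modified.
    static <E extends Comparable<E>, V> V search(Node<E, V> t, E x) {
        if (t == null) {
            // Assumption: NoSuchElementException plays the role of notFoundException.
            throw new NoSuchElementException("key not found: " + x);
        }
        int c = x.compareTo(t.key);
        if (c < 0) {
            return search(t.left, x);   // x can only be in the left subtree
        } else if (c > 0) {
            return search(t.right, x);  // x can only be in the right subtree
        } else {
            return t.value;             // x is the key at the root
        }
    }

    public static void main(String[] args) {
        Node<Integer, String> two = new Node<>(2, "two", null, null);
        Node<Integer, String> eight = new Node<>(8, "eight", null, null);
        Node<Integer, String> root = new Node<>(5, "five", two, eight);
        System.out.println(search(root, 8)); // prints "eight"
    }
}
```

Each recursive call receives a subtree of strictly smaller height, which is what the induction in the proof of correctness relies on.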

Proof of Correctness

The correctness of this algorithm can be proved using mathematical induction, specifically, using induction on the height of the given binary search tree T. That said, this proof will differ from a completely “standard” (or, most basic) proof by mathematical induction in two respects:

A proof of the correctness of this algorithm is as follows.

Claim: Let T be a binary search tree storing values of type V with keys of type E and let x be any value of type E. If the above algorithm is executed with inputs T and x, then the algorithm eventually terminates, and the following properties are satisfied on termination.

In other words, the algorithm is a correct solution for the “search” problem that is specified above.

Proof: This will be proved using the strong form of mathematical induction on the height of T. Since an empty tree has height −1, while the height of any nonempty tree is greater than or equal to zero, −1 will be the smallest value that is considered.

Suppose, first, that the height of T is −1. Then T is an empty tree, so that x is not stored in T. It is, therefore, necessary and sufficient to show that the algorithm halts with a notFoundException being thrown and with T and x not being changed. This is clear by inspection of the code: Since T is an empty tree the initial test, “T == null,” is passed, and a notFoundException is, indeed, thrown immediately after that.

Now let h be an integer that is greater than or equal to −1, and suppose that the algorithm works correctly whenever its input consists of a binary search tree whose height is less than or equal to h as well as an element of type E.

Let T be some binary search tree with height h+1 (storing values of type V with keys of type E) and let x be some element of type E. In order to complete the proof it is necessary and sufficient to show that the algorithm works correctly on inputs T and x (without assuming anything more about these).

Since T has height h+1 ≥ 0, T is not empty and there is some element with a key of type E stored at the root of T (whose key is accessible as T.key). Either x is less than T.key, equal to T.key, or greater than T.key. These three cases are considered separately, below.

Thus the algorithm works correctly on inputs T and x in all cases, as is required to complete the proof.

Analysis of Worst-Case Running Time

Let Steps(T) be the number of steps used by the above algorithm in the worst case, when it is given the binary search tree T and any key x of type E.

As mentioned in the lecture notes on this topic, one can see by inspection of the code that there exist positive constants c1, c2, and c3, such that

Exercise: Using the above, prove (by mathematical induction, using the strong form of induction on the height of T, as above), that

Steps(T) ≤ c3 × height(T) + max(c1, c2)

for every binary search tree T.

It follows that the number of steps needed to search in a binary search tree T by this approach is in O(height(T)) in the worst case.

We would like to have a lower bound on the worst-case running time of this algorithm, as a function of T, as well.

The following claim can also be proved using mathematical induction; this time, by induction on the nonnegative integer i that is mentioned in it.

Claim: Let T be a binary search tree and let x be a key stored at a node whose depth in T is a nonnegative integer i. Then the above algorithm uses at least i+1 steps when it is executed on inputs T and x.

Exercise: Prove the above claim (by induction on i).

Hint: Notice that if i ≥ 1 then x is not equal to the value stored at the root of T, and either x < T.key and x is stored at a node with depth i−1 in T.left, or x > T.key and x is stored at a node with depth i−1 in T.right.

Since any binary search tree T with height h ≥ 0 has a node at depth h, with some key x, it follows by the above claim that the number of steps used (by the above algorithm) to search in T is at least height(T)+1, in the worst case.

Thus Steps(T) ∈ Ω(height(T)).

It now follows that Steps(T) ∈ Θ(height(T)), as well.
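The relationship between the running time and the height can be observed on a small example. The sketch below is an illustration only: it counts calls to search rather than the "steps" used in the analysis, and the names BstSearchSteps, countCalls, and chain are hypothetical. It builds a degenerate tree (a chain of right children) and checks that a search for the deepest key makes exactly height(T)+1 calls, while an unsuccessful search past the deepest node makes one more.

```java
// Hypothetical names: BstSearchSteps, countCalls and chain are illustrative only.
class BstSearchSteps {
    static final class Node {
        final int key;
        Node left, right;

        Node(int key) {
            this.key = key;
        }
    }

    // Height convention used in the text: an empty tree has height -1.
    static int height(Node t) {
        return t == null ? -1 : 1 + Math.max(height(t.left), height(t.right));
    }

    // Number of calls to search (including the initial one) made when
    // searching for x in t; mirrors the structure of the search algorithm.
    static int countCalls(Node t, int x) {
        if (t == null) {
            return 1;                           // the call that would throw
        } else if (x < t.key) {
            return 1 + countCalls(t.left, x);
        } else if (x > t.key) {
            return 1 + countCalls(t.right, x);
        } else {
            return 1;                           // the call that finds x
        }
    }

    // Builds the chain 1 -> 2 -> ... -> n, each node the right child of the previous.
    static Node chain(int n) {
        Node root = null;
        for (int k = n; k >= 1; k--) {
            Node node = new Node(k);
            node.right = root;
            root = node;
        }
        return root;
    }

    public static void main(String[] args) {
        Node t = chain(6);
        System.out.println(height(t));        // prints 5
        System.out.println(countCalls(t, 6)); // prints 6, that is, height + 1
        System.out.println(countCalls(t, 7)); // prints 7, that is, height + 2
    }
}
```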

Listing the Elements in a Binary Search Tree in Order

Problem To Be Solved

The second problem to be considered is that of listing the elements that are stored in a binary search tree T in increasing order. We will consider a version of the problem in which the output is presented as an ArrayList.

A Related Problem

It will be useful to start by considering a slightly different problem, namely, one in which our goal is to add the keys stored in a tree onto the end of an ArrayList that already exists. A method solving this problem would have the signature

void appendTree (BST<E,V> T, ArrayList<E> A)

and the problem can be specified using the following precondition-postcondition pair.

Precondition:

  1. T is a binary search tree storing keys of some type E
  2. A is an ArrayList storing values of the same type E

Postcondition:

  1. The keys stored in T have been added to the end of A in increasing order; A is otherwise unchanged.
  2. T has not been changed.

Algorithm

A recursive algorithm that solves the above problem is given below.

void appendTree (BST<E,V> T, ArrayList<E> A) {
  if (T != null) {
    appendTree(T.left, A);
    A.add(T.key);
    appendTree(T.right, A);
  }
}
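A runnable Java version of this algorithm could look like the sketch below. The class names AppendTreeSketch and Node are hypothetical, the empty tree is represented by null, and the values stored with the keys are omitted because appendTree never reads them.

```java
import java.util.ArrayList;

// Hypothetical names: AppendTreeSketch and Node are illustrative only.
class AppendTreeSketch {
    // A node holding only a key; values are left out of this sketch.
    static final class Node<E> {
        final E key;
        Node<E> left, right;

        Node(E key) {
            this.key = key;
        }
    }

    // Appends the keys of t to the end of a by an inorder traversal.
    // If t is a binary search tree, the keys are appended in increasing order.
    static <E> void appendTree(Node<E> t, ArrayList<E> a) {
        if (t != null) {
            appendTree(t.left, a);   // all keys smaller than t.key
            a.add(t.key);            // then the key at the root
            appendTree(t.right, a);  // then all keys larger than t.key
        }
    }

    public static void main(String[] args) {
        Node<Integer> root = new Node<>(2);
        root.left = new Node<>(1);
        root.right = new Node<>(3);

        ArrayList<Integer> a = new ArrayList<>();
        a.add(0);                    // existing contents of A are preserved
        appendTree(root, a);
        System.out.println(a);       // prints [0, 1, 2, 3]
    }
}
```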

Proof of Correctness

Once again, correctness of the algorithm can be established using mathematical induction. This time we will use the strong form of mathematical induction on the size of T.

Claim: Let T be a binary search tree storing keys of some type E, and let A be an ArrayList storing values of the same type. If the above algorithm is executed with inputs T and A, then the algorithm eventually terminates. On termination, the keys stored in T have been added to the end of A in increasing order, and A is otherwise unchanged; T has not been changed.

Proof: This result will be proved using the strong form of mathematical induction on the size of T.

Suppose first that the size of T is 0. Then T is an empty tree, so that the first test, “T != null,” made by this algorithm, fails, and the algorithm ends immediately after that — with both T and A unchanged, as required.

Now let n be a nonnegative integer and suppose the algorithm works correctly when it is given any binary search tree with size between 0 and n, and any ArrayList storing values of the same type, as inputs.

Let T be a binary search tree with size n+1. In order to complete the proof it is necessary and sufficient to show that the algorithm also works correctly when given the inputs T and an ArrayList A (without assuming anything more about these inputs).

Since T has size n+1 > 0, T is not an empty tree, and the subtrees T.left and T.right are each smaller than T, so that each of these subtrees has size between 0 and n. It therefore follows, by the inductive hypothesis, and the Binary Search Tree Property, that

It should now be clear, by inspection of the code, that appendTree(T, A) appends the keys stored in T onto the end of A in increasing order, without changing T or making other changes to A, as required.

Analysis of Worst-Case Running Time

Once again, we will try to find upper and lower bounds on the worst-case cost to execute this algorithm on a binary search tree and ArrayList.

In particular, we will try to find bounds for these running times, assuming that A is empty before the first time this algorithm is called.

It will be useful, for this algorithm, to measure the running time as a function of the size of the given binary search tree.

Let Steps(T, A) be the number of steps used by the algorithm on inputs T and A. It will be useful to consider the following functions as well.

Notice that, by the above definitions,

Steps(T, A) = StepsDS(T, A) + StepsR(T, A).

We will find upper bounds for both StepsDS(T, A) and StepsR(T, A), and then use these to find an upper bound for Steps(T, A).

We will first consider StepsDS(T, A). Notice that the only instruction in the code that accesses or modifies this structure is the use of “A.add” to add another element onto the end. This increases the size of the ArrayList by exactly one so, since the size has been increased by exactly n (if this is the size of T), we may conclude that the execution of the algorithm includes a sequence of n = size(T) add operations on the ArrayList A.

Now, assuming that A is empty before the algorithm is applied, it follows by our previous amortized analysis of operations on ArrayLists that the total number of steps used by this sequence of operations is in O(n), so that

StepsDS(T, A) ∈ O(size(T)).

Now consider StepsR(T, A). One can see by inspection of the code that there exist positive constants c0 and c1 such that

Exercise: Use the above information, along with the fact that

size(T.left) + size(T.right) = size(T) − 1

whenever T is not empty, to prove that

StepsR(T, A) ≤ (c0 + c1) × size(T) + c0

for every binary search tree T and ArrayList A.

Thus StepsR(T, A) ∈ O(size(T)). Therefore, assuming again that A is empty when the algorithm is called, the number of steps used by the appendTree method on inputs T and A is in O(size(T)) as well.

It is easy to see, for this algorithm, that the best-case running time is in Ω(size(T)); in particular,

Steps(T, A) ≥ StepsDS(T, A) ≥ size(T).

The first inequality is obvious. The second follows from the fact that StepsDS(T, A) is the cost of exactly size(T) applications of the method “A.add,” and (since each of these changes the ArrayList) each of these applications requires at least one step.

Therefore the worst-case running time and the best-case running time of this method are both in Θ(size(T)) when T is the input tree.

Solution for the Original Problem

It should now be clear that the inorder traversal described above produces an ArrayList listing the keys stored in a given binary search tree T, sorted in increasing order, using time that is linear in the size of T in both the best and the worst case.
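One way to package this as a solution to the original problem is a small wrapper that creates an empty ArrayList and passes it to appendTree, so that the "A is initially empty" assumption of the analysis holds by construction. The sketch below uses int keys for brevity; the names InOrderListingSketch, listInOrder, and insert are hypothetical, and insert is an ordinary binary search tree insertion that is included only to build a test tree.

```java
import java.util.ArrayList;

// Hypothetical names: InOrderListingSketch, listInOrder and insert are illustrative only.
class InOrderListingSketch {
    static final class Node {
        final int key;
        Node left, right;

        Node(int key) {
            this.key = key;
        }
    }

    // Helper used only to build an example tree: ordinary BST insertion
    // that ignores duplicate keys.
    static Node insert(Node t, int key) {
        if (t == null) {
            return new Node(key);
        }
        if (key < t.key) {
            t.left = insert(t.left, key);
        } else if (key > t.key) {
            t.right = insert(t.right, key);
        }
        return t;
    }

    // Inorder traversal that appends the keys of t to the end of a.
    static void appendTree(Node t, ArrayList<Integer> a) {
        if (t != null) {
            appendTree(t.left, a);
            a.add(t.key);
            appendTree(t.right, a);
        }
    }

    // Returns a new ArrayList holding the keys of t in increasing order.
    static ArrayList<Integer> listInOrder(Node t) {
        ArrayList<Integer> a = new ArrayList<>(); // starts empty, as the analysis assumes
        appendTree(t, a);
        return a;
    }

    public static void main(String[] args) {
        Node t = null;
        for (int k : new int[] {5, 2, 8, 1, 9, 3}) {
            t = insert(t, k);
        }
        System.out.println(listInOrder(t)); // prints [1, 2, 3, 5, 8, 9]
    }
}
```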

Exercises

  1. The proofs that were included in the analysis of the first problem used mathematical induction on the height of a binary search tree, while the proofs included in the analysis of the second problem used induction on the size of a tree, instead.

    Verify that this was not really necessary: In particular, verify that the first set of proofs could have used induction on the size, and that the second set of proofs could have used induction on the height, as well.

  2. Confirm that the assumption “A is initially empty” really is necessary in order to claim that the number of steps used by appendTree(T, A) is linear in the size of T.

    Try to find the best bound that you can for the number of steps used by this method without assuming anything about A.


http://www.cpsc.ucalgary.ca/~jacobs/Courses/cpsc331/W12/handouts/lecture14-example.html