BST Operation Analysis, CPSC 331, Winter 2012

Analysis of Operations on Binary Search Trees

Overview

During the lectures on (ordinary) binary search trees, recursive algorithms for searching, inserting into, and deleting from binary search trees were described. In the case of search, sketches of a proof of its correctness and of an analysis of its worst-case running time were provided.

This lecture supplement provides more complete proofs of correctness and running time analyses, as well as proofs for a few other algorithms. These can serve as examples of proofs and analyses of simple recursive algorithms that access or modify data structures.

The Binary Search Tree Property

Recall that every binary search tree T satisfies the following Binary Search Tree Property:

If T is nonempty, then

The left subtree T_L is a binary search tree including all elements of the represented set that are less than the element at the root, and

The right subtree T_R is a binary search tree including all elements of the represented set that are greater than the element at the root.
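The property can be checked directly by a recursive method. The following is a minimal Java sketch, not part of the original notes, assuming a hypothetical nested Node class whose fields key, left and right mirror the T.key, T.left and T.right notation used below, with null standing for the empty tree. Each call carries a lower and an upper bound inherited from its ancestors, because comparing each node only with its own children would not enforce the property for whole subtrees.

```java
// Hypothetical sketch: checks the Binary Search Tree Property (keys must be distinct).
final class BstPropertyCheck {
    // Minimal node type; null represents the empty tree.
    static final class Node<E extends Comparable<E>> {
        final Node<E> left;
        final E key;
        final Node<E> right;
        Node(Node<E> left, E key, Node<E> right) {
            this.left = left;
            this.key = key;
            this.right = right;
        }
    }

    // Convenience factory for building example trees.
    static <E extends Comparable<E>> Node<E> node(Node<E> left, E key, Node<E> right) {
        return new Node<>(left, key, right);
    }

    // True if every key in t lies strictly between lo and hi (null means "no bound")
    // and the Binary Search Tree Property holds at every node of t.
    static <E extends Comparable<E>> boolean isBst(Node<E> t, E lo, E hi) {
        if (t == null) {
            return true; // the empty tree is a binary search tree
        }
        if (lo != null && t.key.compareTo(lo) <= 0) {
            return false;
        }
        if (hi != null && t.key.compareTo(hi) >= 0) {
            return false;
        }
        // Keys in the left subtree must be less than t.key; keys in the right must be greater.
        return isBst(t.left, lo, t.key) && isBst(t.right, t.key, hi);
    }

    static <E extends Comparable<E>> boolean isBst(Node<E> t) {
        return isBst(t, null, null);
    }
}
```

For example, a root with key 5 whose left child 2 has a right child 7 passes every parent-child comparison, but isBst rejects it because 7 appears in the left subtree of 5.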

Proofs of correctness of many algorithms that access or modify binary search trees, including the ones considered below, make heavy use of this property.

Searching in a Binary Search Tree

Problem To Be Solved

The first problem to be considered is that of searching for a given value x in a given binary search tree T. If E stands for the key type and V stands for the value type that can be stored, then the signature for the desired method is as follows

V search (BST<E,V> T, E x)

and the problem can be specified using the following precondition-postcondition pairs.

Precondition 1:

T is a binary search tree storing values of some type V with keys of type E

x is an element of type E which is the key of an element stored in T

Postcondition 1:

Value returned is (a reference to) the value in T with key x

T and x have not changed

Precondition 2:

T is a binary search tree storing values of some type V with keys of type E

x is a key of type E not stored in T

Postcondition 2:

A notFoundException is thrown

T and x have not changed

Algorithm

A recursive algorithm for the above problem is as follows.

V search (BST<E,V> T, E x) {
    if (T == null) {
        throw notFoundException;
    } else if (x < T.key) {
        return search(T.left, x);
    } else if (x > T.key) {
        return search(T.right, x);
    } else {
        return T.value;
    }
}
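The pseudocode above compares keys with < and >, which Java does not allow for a generic key type. A runnable sketch of the same three-way case analysis is shown below, assuming a hypothetical nested Node class (standing in for BST<E,V>) and using compareTo; java.util.NoSuchElementException is swapped in for notFoundException, which is not a standard Java class.

```java
import java.util.NoSuchElementException;

// Hypothetical sketch of the recursive search; Node stands in for BST<E,V>.
final class BstSearch {
    static final class Node<E extends Comparable<E>, V> {
        final Node<E, V> left;
        final E key;
        final V value;
        final Node<E, V> right;
        Node(Node<E, V> left, E key, V value, Node<E, V> right) {
            this.left = left;
            this.key = key;
            this.value = value;
            this.right = right;
        }
    }

    // Returns the value stored with key x in t, or throws if x is not a key in t.
    static <E extends Comparable<E>, V> V search(Node<E, V> t, E x) {
        if (t == null) {
            // Empty tree: x is not stored here (notFoundException in the notes).
            throw new NoSuchElementException("key not found: " + x);
        }
        int cmp = x.compareTo(t.key);
        if (cmp < 0) {
            return search(t.left, x);   // x < T.key: only T.left can contain x
        } else if (cmp > 0) {
            return search(t.right, x);  // x > T.key: only T.right can contain x
        } else {
            return t.value;             // x equals T.key
        }
    }
}
```

On the three-node tree with keys 1, 2, 3 rooted at 2, search(t, 3) makes one step to the right and returns the value stored with 3, while search(t, 4) reaches the empty right subtree of 3 and throws.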

Proof of Correctness

The correctness of this algorithm can be proved using mathematical induction, specifically, using induction on the height of the given binary search tree T. That said, this proof will differ from a completely “standard” (or most basic) proof by mathematical induction in two respects:

Since the height of an empty tree is actually −1, we will consider −1 (rather than 0) to be the smallest value for which a result must be proved.
We will need to use the strong form of mathematical induction (also sometimes called “complete induction”) in this case, because the height of the left (or right) subtree of a given binary search tree T could have any value from −1 up to one less than the height of T.

A proof of the correctness of this algorithm is as follows.

Claim: Let T be a binary search tree storing values of type V with keys of type E and let x be any value of type E. If the above algorithm is executed with inputs T and x, then the algorithm eventually terminates, and the following properties are satisfied on termination.

If x is a key in T, then a reference to the value in T with key x is returned.

If x is not stored in T, then a notFoundException is thrown.

Neither T nor x has been changed.

In other words, the algorithm is a correct solution for the “search” problem that is specified above.

Proof: This will be proved using the strong form of mathematical induction on the height of T. Since an empty tree has height −1, while the height of any nonempty tree is greater than or equal to zero, −1 will be the smallest value that is considered.

Suppose, first, that the height of T is −1. Then T is an empty tree, so that x is not stored in T. It is, therefore, necessary and sufficient to show that the algorithm halts with a notFoundException being thrown and with T and x not being changed. This is clear by inspection of the code: Since T is an empty tree the initial test, “T == null,” is passed, and a notFoundException is, indeed, thrown immediately after that.

Now let h be an integer that is greater than or equal to −1, and suppose that the algorithm works correctly whenever its input consists of a binary search tree whose height is less than or equal to h as well as an element of type E.

Let T be some binary search tree with height h+1 (storing values of type V with keys of type E) and let x be some element of type E. In order to complete the proof it is necessary and sufficient to show that the algorithm works correctly on inputs T and x (without assuming anything more about these).

Since T has height h+1 ≥ 0, T is not empty, and there is some element, with a key of type E, stored at the root of T (its key is accessible as T.key). Either x is less than T.key, equal to T.key, or greater than T.key. These three cases are considered separately, below.

If x is less than T.key, then the first test made by the algorithm, “T == null,” fails, but the second test, “x < T.key,” passes. At this point the algorithm is recursively applied to T.left and x (with the output of this computation being returned).

Now, either x is the key of a value stored in T or it is not. Each of these subcases is considered, below.

If x is the key of a value stored in T then it follows by the Binary Search Tree Property, given above, that x is one of the keys stored in T.left.

Since T has height h+1, the height of T.left is an integer between −1 and h. It now follows, by the inductive hypothesis, that the execution of the algorithm on inputs T.left and x terminates, and that a reference to the value in T.left with key x is returned by the algorithm as output, without either T.left or x being changed.

This output is immediately returned as the output of the algorithm on inputs T and x, so (since the node storing key x in T.left is also a node storing key x in T) the algorithm terminates and returns the expected output when it does so, without having changed either T or x.

Thus the algorithm terminates, returning the required output, in this case.

On the other hand, if x is not stored in T then it cannot be stored in T.left either.

Since the height of T.left is an integer between −1 and h, it follows by the inductive hypothesis that the execution of the algorithm on T.left and x terminates with a notFoundException being thrown, and without having changed T.left or x.

Since a notFoundException is not caught by this algorithm, it is clear that the execution of the algorithm on the inputs T and x ends in the same way, as required.

Thus the algorithm terminates, with the required output (or exception) being returned, whenever x < T.key.

If x is equal to T.key, then all three of the initial tests “T == null,” “x < T.key,” and “x > T.key” fail. A reference to the value stored at the root of T (T.value) is then returned, without either T or x being changed, as required.

Finally, if x is greater than T.key, then the first two tests, “T == null” and “x < T.key,” each fail, and the third test, “x > T.key,” is passed. The algorithm is now recursively applied to T.right and x, with the output of this computation immediately returned after that.

Either x is stored in T or it is not. Each subcase is considered separately, below.

If x is a key in T then it follows by the Binary Search Tree Property, given above, that x is one of the keys stored in T.right.

Since T has height h+1, the height of T.right is an integer between −1 and h. It now follows, by the inductive hypothesis, that the execution of the algorithm on inputs T.right and x terminates, and that a reference to the value in T.right with key x is returned by the algorithm as output, without either T.right or x having been changed.

This output is immediately returned as the output of the algorithm on inputs T and x, so (since the node with key x in T.right is also a node with key x in T) the algorithm terminates and returns the expected output when it does so, without having changed either T or x.

Thus the algorithm terminates, returning the required output, in this case.

On the other hand, if x is not a key in T then it cannot be in T.right either.

Since the height of T.right is an integer between −1 and h, it follows by the inductive hypothesis that the execution of the algorithm on T.right and x terminates with a notFoundException being thrown, and without having changed T.right or x.

Since a notFoundException is not caught by this algorithm, it is clear that the execution of the algorithm on the inputs T and x ends in the same way, as required.

Thus the algorithm terminates, with the required output (or exception) being returned, whenever x > T.key, as well.

Thus the algorithm works correctly on inputs T and x in all cases, as is required to complete the proof.

Analysis of Worst-Case Running Time

Let Steps(T) be the number of steps used by the above algorithm, when it is given the binary search tree T and any key x ∈ E, in the worst case.

As mentioned in the lecture notes on this topic, one can see by inspection of the code that there exist positive constants c₁, c₂, and c₃, such that

Steps(T) ≤ c₁ whenever T is an empty tree (that is, whenever height(T) = −1);
Steps(T) ≤ c₂ whenever the height of T is 0; and
Steps(T) ≤ c₃ + max(Steps(T.left), Steps(T.right)), otherwise.

Exercise: Using the above, prove (by the strong form of mathematical induction on the height of T, as above) that

Steps(T) ≤ c₃ × (height(T) + 1) + max(c₁, c₂)

for every binary search tree T.

It follows that the number of steps needed to search in a binary search tree T by this approach is in O(height(T)) in the worst case.

We would like to have a lower bound on the worst-case running time of this algorithm, as a function of T, as well.

The following claim can also be proved using mathematical induction, this time by induction on the nonnegative integer i that is mentioned in it.

Claim: Let T be a binary search tree and let x be a key stored at a node whose depth in T is a nonnegative integer i. Then the above algorithm uses at least i+1 steps when it is executed on inputs T and x.

Exercise: Prove the above claim (by induction on i).

Hint: Notice that if i ≥ 1 then x is not equal to the key stored at the root of T, and either x < T.key and x is stored at a node with depth i−1 in T.left, or x > T.key and x is stored at a node with depth i−1 in T.right.

Since any binary search tree T with height h ≥ 0 has a node at depth h, with some key x, it follows by the above claim that the number of steps used (by the above algorithm) to search in T is at least height(T)+1, in the worst case.

Thus Steps(T) ∈ Ω(height(T)).

Since Steps(T) is also in O(height(T)), it now follows that Steps(T) ∈ Θ(height(T)).
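The two bounds can also be observed on a small example. The sketch below is hypothetical instrumentation, assuming that each recursive call of search performs a constant number of steps, so that counting calls tracks Steps(T) up to constant factors; it uses int keys, returns a boolean instead of throwing, and includes a plain insertion helper only to build example trees. Inserting the keys 1, 2, …, n in increasing order produces a path of height n−1; searching for n then makes height(T)+1 = n calls, and an unsuccessful search for n+1 makes height(T)+2 calls, because it also visits the empty right subtree of the deepest node.

```java
// Hypothetical instrumentation: counts the recursive calls made by the search algorithm.
final class SearchCost {
    static final class Node {
        final int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    static int calls; // number of calls made by the most recent search

    // Plain (unbalanced) BST insertion, used only to build example trees.
    static Node insert(Node t, int k) {
        if (t == null) return new Node(k);
        if (k < t.key) t.left = insert(t.left, k);
        else if (k > t.key) t.right = insert(t.right, k);
        return t;
    }

    // height(empty tree) = -1, as in the notes.
    static int height(Node t) {
        return t == null ? -1 : 1 + Math.max(height(t.left), height(t.right));
    }

    // Same case analysis as search, but returns false instead of throwing.
    static boolean contains(Node t, int x) {
        calls++;
        if (t == null) return false;
        if (x < t.key) return contains(t.left, x);
        if (x > t.key) return contains(t.right, x);
        return true;
    }

    // Resets the counter, searches for x in t, and returns the number of calls made.
    static int countCalls(Node t, int x) {
        calls = 0;
        contains(t, x);
        return calls;
    }
}
```

For the path built from 1, 2, …, 10 (height 9), searching for 10 makes 10 calls and searching for 11 makes 11. Inserting the same keys in a different order can produce a much shorter tree, and the counts shrink with the height.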

Listing the Elements in a Binary Search Tree in Order

Problem To Be Solved

The second problem to be considered is that of listing the elements that are stored in a binary search tree T in increasing order. We will consider a version of the problem in which the output is presented as an ArrayList.

A Related Problem

It will be useful to start by considering a slightly different problem, namely, one in which our goal is to add the keys stored in a tree onto the end of an ArrayList that already exists. A method solving this problem would have the signature

void appendTree (BST<E,V> T, ArrayList<E> A)

and the problem can be specified using the following precondition-postcondition pair.

Precondition:

T is a binary search tree storing keys of some type E

A is an ArrayList storing values of the same type E

Postcondition:

The keys stored in T have been added to the end of A in increasing order; A is otherwise unchanged.

T has not been changed.

Algorithm

A recursive algorithm that solves the above problem is given below.

void appendTree (BST<E,V> T, ArrayList<E> A) {
    if (T != null) {
        appendTree(T.left, A);
        A.add(T.key);
        appendTree(T.right, A);
    }
}
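The method above is already close to Java. A self-contained version is sketched below, assuming a hypothetical generic Node class in place of BST<E,V> (values are omitted, since only keys are appended).

```java
import java.util.ArrayList;

// Hypothetical sketch; Node stands in for the course's BST<E,V> (keys only).
final class InorderAppend {
    static final class Node<E> {
        final Node<E> left;
        final E key;
        final Node<E> right;
        Node(Node<E> left, E key, Node<E> right) {
            this.left = left;
            this.key = key;
            this.right = right;
        }
    }

    // Appends the keys of t onto the end of a in increasing order (an inorder traversal),
    // assuming t satisfies the Binary Search Tree Property.
    static <E> void appendTree(Node<E> t, ArrayList<E> a) {
        if (t != null) {
            appendTree(t.left, a);   // keys less than t.key
            a.add(t.key);            // then t.key itself
            appendTree(t.right, a);  // then keys greater than t.key
        }
    }
}
```

Since appendTree only calls A.add, any elements already in A stay in front of the appended keys; for example, starting from A = [0] and the three-node tree with keys 1, 2, 3 rooted at 2 gives A = [0, 1, 2, 3].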

Proof of Correctness

Once again, correctness of the algorithm can be established using mathematical induction. This time we will use the strong form of mathematical induction on the size of T.

Claim: Let T be a binary search tree storing keys of some type E, and let A be an ArrayList storing values of the same type. If the above algorithm is executed with inputs T and A, then the algorithm eventually terminates. On termination, the keys stored in T have been added to the end of A in increasing order, and A is otherwise unchanged; T has not been changed.

Proof: This result will be proved using the strong form of mathematical induction on the size of T.

Suppose first that the size of T is 0. Then T is an empty tree, so that the first test, “T != null,” made by this algorithm, fails, and the algorithm ends immediately after that — with both T and A unchanged, as required.

Now let n be a nonnegative integer and suppose the algorithm works correctly when it is given any binary search tree with size between 0 and n, and any ArrayList storing values of the same type, as inputs.

Let T be a binary search tree with size n+1. In order to complete the proof it is necessary and sufficient to show that the algorithm also works correctly when given the inputs T and an ArrayList A (without assuming anything more about these inputs).

Since T has size n+1 > 0, T is not an empty tree, and the subtrees T.left and T.right are each smaller than T, so that each of these subtrees has size between 0 and n. It therefore follows, by the inductive hypothesis and the Binary Search Tree Property, that

appendTree(T.left, A) adds the keys stored in T that are less than T.key onto the end of the ArrayList A in increasing order by value, without changing T or making any other changes to A,

A.add(T.key) adds the value T.key onto the end of A without making any other changes, and

appendTree(T.right, A) adds the keys stored in T that are greater than T.key onto the end of the ArrayList A in increasing order by value, without changing T or making any other changes to A.

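It should now be clear, by inspection of the code, that appendTree(T, A) terminates, and that it appends the keys stored in T onto the end of A in increasing order (first those less than T.key, then T.key itself, then those greater than T.key), without changing T or making any other changes to A, as required.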

Analysis of Worst-Case Running Time

Once again, we will try to find upper and lower bounds on the worst-case cost to execute this algorithm on a binary search tree and ArrayList.

In particular, we will try to find bounds for these running times, assuming that A is empty before the first time this algorithm is called.

It will be useful, for this algorithm, to measure the running time as a function of the size of the given binary search tree.

Let Steps(T, A) be the number of steps used by the algorithm on inputs T and A. It will be useful to consider the following functions as well.

Steps_DS(T, A) is the number of steps that are used to perform operations on the ArrayList A.
Steps_R(T, A) is the number of all other steps (that is, steps that do not access or modify the ArrayList) used by this algorithm on inputs T and A.

Notice that, by the above definitions,

Steps(T, A) = Steps_DS(T, A) + Steps_R(T, A).

We will find upper bounds for both Steps_DS(T, A) and Steps_R(T, A), and then use these to find an upper bound for Steps(T, A).

We will first consider Steps_DS(T, A). Notice that the only instruction in the code that accesses or modifies A is the use of “A.add” to add another element onto the end. Each such call increases the size of the ArrayList by exactly one and, since the size of A increases by exactly n = size(T) in total, we may conclude that the execution of the algorithm includes a sequence of exactly n add operations on the ArrayList A.

Now, assuming that A is empty before the algorithm is applied, it follows by our previous amortized analysis of operations on ArrayLists that the total number of steps used by this sequence of operations is in O(n), so that

Steps_DS(T, A) ∈ O(size(T)).
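The O(n) bound for a sequence of n add operations can be illustrated with a simple model. The sketch below assumes a hypothetical array-backed list that starts with capacity 1 and doubles its capacity whenever it is full, and counts element writes (one per new element plus one per element copied during a resize); java.util.ArrayList uses its own growth policy, so this only models the amortized argument. Resizes happen when the size is 1, 2, 4, and so on, below n, so the copies total less than 2n and all n appends together cost less than 3n writes.

```java
// Hypothetical model of an array-backed list that doubles its capacity when it is full.
// java.util.ArrayList grows differently; this only models the amortized argument.
final class DoublingCost {
    // Total element writes for n appends to an initially empty list of capacity 1:
    // one write per new element, plus one write per element copied during a resize.
    static long costOfAppends(int n) {
        long writes = 0;
        int capacity = 1;
        int size = 0;
        for (int i = 0; i < n; i++) {
            if (size == capacity) {
                writes += size;   // copy every existing element into a new, larger array
                capacity *= 2;
            }
            writes += 1;          // store the new element
            size++;
        }
        return writes;
    }
}
```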

Now consider Steps_R(T, A). One can see by inspection of the code that there exist positive constants c₀ and c₁ such that

Steps_R(T, A) ≤ c₀ if T is an empty tree (that is, if size(T) = 0), and
Steps_R(T, A) ≤ c₁ + Steps_R(T.left, A) + Steps_R(T.right, A’) (where A’ is obtained by adding some elements onto the end of A) if size(T) > 0.

Exercise: Use the above information, along with the fact that

size(T.left) + size(T.right) = size(T) − 1

whenever T is not empty, to prove that

Steps_R(T, A) ≤ (c₀ + c₁) × size(T) + c₀

for every binary search tree T and ArrayList A.
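The recurrence for Steps_R can be made concrete by counting calls. Each call of appendTree on a nonempty tree makes exactly two further calls, so the number of calls made on a tree of size n is 2n + 1 (n calls on nonempty subtrees and n + 1 on empty ones), which agrees with the bound (c₀ + c₁) × size(T) + c₀ when c₀ = c₁ = 1. The sketch below is hypothetical instrumentation with int keys and a plain insertion helper used only to build example trees.

```java
import java.util.ArrayList;

// Hypothetical sketch: counts the calls made by appendTree on a given tree.
final class AppendCalls {
    static final class Node {
        final int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    static int calls; // number of appendTree calls since the last reset

    // Plain (unbalanced) BST insertion, used only to build example trees.
    static Node insert(Node t, int k) {
        if (t == null) return new Node(k);
        if (k < t.key) t.left = insert(t.left, k);
        else if (k > t.key) t.right = insert(t.right, k);
        return t;
    }

    // appendTree as in the notes, plus one counter increment per call.
    static void appendTree(Node t, ArrayList<Integer> a) {
        calls++;
        if (t != null) {
            appendTree(t.left, a);
            a.add(t.key);
            appendTree(t.right, a);
        }
    }

    // Runs appendTree on t with a fresh, empty list and returns the number of calls made.
    static int countCalls(Node t) {
        calls = 0;
        appendTree(t, new ArrayList<>());
        return calls;
    }
}
```

Both a path of 10 keys and a perfectly balanced tree of 7 keys give 2n + 1 calls (21 and 15), illustrating that, unlike search, the cost depends only on size(T) and not on the shape of the tree.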

Thus Steps_R(T, A) ∈ O(size(T)). Therefore, assuming again that A is empty when the algorithm is called, the number of steps used by the appendTree method on inputs T and A is in O(size(T)) as well.

It is easy to see, for this algorithm, that the best-case running time is in Ω(size(T)); in particular,

Steps(T, A) ≥ Steps_DS(T, A) ≥ size(T).

The first inequality is obvious. The second follows from the fact that Steps_DS(T, A) is the cost of exactly size(T) applications of the method “A.add,” and (since each of these changes the ArrayList) each of these applications requires at least one step.

Therefore the worst-case running time and the best-case running time of this method are both in Θ(size(T)) when T is the input tree.

Solution for the Original Problem

It should now be clear that the inorder traversal described above, applied to a given binary search tree T and a newly created (empty) ArrayList A, produces an ArrayList listing the keys stored in T, sorted in increasing order, using time that is linear in the size of T in both the best and the worst case.
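The section gives no separate method for the original problem. A minimal sketch is shown below, assuming a hypothetical generic Node class plus an insertion helper used only to build an example tree; it wraps appendTree so that it always starts from a new, empty ArrayList, as the running-time analysis assumes. The method name listInOrder is illustrative.

```java
import java.util.ArrayList;

// Hypothetical sketch: solves the original listing problem using appendTree.
final class SortedKeys {
    static final class Node<E extends Comparable<E>> {
        final E key;
        Node<E> left, right;
        Node(E key) { this.key = key; }
    }

    // Plain (unbalanced) BST insertion, used only to build example trees.
    static <E extends Comparable<E>> Node<E> insert(Node<E> t, E k) {
        if (t == null) return new Node<>(k);
        int cmp = k.compareTo(t.key);
        if (cmp < 0) t.left = insert(t.left, k);
        else if (cmp > 0) t.right = insert(t.right, k);
        return t;
    }

    // Appends the keys of t onto the end of a in increasing order.
    static <E extends Comparable<E>> void appendTree(Node<E> t, ArrayList<E> a) {
        if (t != null) {
            appendTree(t.left, a);
            a.add(t.key);
            appendTree(t.right, a);
        }
    }

    // Lists the keys of t in increasing order, starting from an empty ArrayList.
    static <E extends Comparable<E>> ArrayList<E> listInOrder(Node<E> t) {
        ArrayList<E> a = new ArrayList<>();
        appendTree(t, a);
        return a;
    }
}
```

Inserting 5, 2, 8, 1, 9, 3 and then calling listInOrder returns [1, 2, 3, 5, 8, 9], regardless of the insertion order.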

Exercises

The proofs that were included in the analysis of the first problem used mathematical induction on the height of a binary search tree, while the proofs included in the analysis of the second problem used induction on the size of a tree, instead.

Verify that this was not really necessary: In particular, verify that the first set of proofs could have used induction on the size, and that the second set of proofs could have used induction on the height, as well.
Confirm that the assumption “A is initially empty” really is necessary in order to claim that the number of steps used by appendTree(T, A) is linear in the size of T.

Try to find the best bound that you can for the number of steps used by this method without assuming anything about A.

http://www.cpsc.ucalgary.ca/~jacobs/Courses/cpsc331/W12/handouts/lecture14-example.html