Additional Kinds of Search Trees, CPSC 331, Fall 2010



 Additional Kinds of Search Trees

This course has introduced two kinds of search trees, namely, “ordinary” (or “regular”) binary search trees and red-black trees. Several other kinds of search trees have been studied and have interesting properties; a few notable ones are introduced below.

AVL Trees

AVL trees are named after their inventors, G. M. Adel’son-Vel’skiî and E. M. Landis. Like regular binary search trees, these are binary trees such that every node in the tree stores data and such that the binary search tree property is satisfied. Like red-black trees, they store a small amount of additional information at each node, and the depth of an AVL tree is never more than logarithmic in its size; this is another kind of “height-balanced” tree.

AVL trees also satisfy an additional property, just as red-black trees do. In particular, the heights of the left and right subtrees of an AVL tree never differ by more than one, and the left and right subtrees must themselves be AVL trees (so that they, and all of their subtrees, satisfy this property too). Just as with red-black trees, the algorithm to search in an AVL tree is essentially the same as the algorithm to search in a regular binary search tree. Algorithms for insertion and deletion begin with the corresponding algorithms for regular binary search trees; just as with red-black trees, the regular operation is followed by an update (including one or more rotations) in order to make sure that the tree’s special property is maintained.
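The following is a minimal sketch, in Java, of the height-balance condition described above; the class and method names are illustrative and are not taken from the textbook. A real implementation would cache each node’s height (or a balance factor) instead of recomputing it, and that cached value is the small amount of extra information mentioned above.

    // Hypothetical node class, for illustration only.
    class AVLNode {
        int key;
        AVLNode left, right;

        AVLNode(int key) {
            this.key = key;
        }
    }

    class AVLCheck {
        // Height of a subtree; an empty subtree has height -1 and a single leaf has height 0.
        static int height(AVLNode node) {
            if (node == null) {
                return -1;
            }
            return 1 + Math.max(height(node.left), height(node.right));
        }

        // The AVL property: at every node, the heights of the left and right
        // subtrees differ by at most one.
        static boolean isAVL(AVLNode node) {
            if (node == null) {
                return true;
            }
            return Math.abs(height(node.left) - height(node.right)) <= 1
                    && isAVL(node.left)
                    && isAVL(node.right);
        }
    }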

AVL Trees are introduced in Section 10.2 of the textbook; the Wikipedia article on AVL trees also provides a readable description of these trees and their algorithms.

2-3 Trees, and 2-3-4 Trees

2-3 trees and 2-3-4 trees are examples of search trees that are not binary.

Each node in a 2-3 tree stores either one or two values. If an internal node stores one value then it has two subtrees, a left and right subtree, and the usual binary search property is satisfied: The left subtree stores values that are less than the value at the root node while the right stores values that are greater. On the other hand, if an internal node stores two values then it has three subtrees, instead: The left subtree stores all values (in the set being represented) that are less than both of the values at the root, the right subtree stores all the values that are greater than both of these values, and the middle subtree stores the values that are greater than the smaller of the values at the root and that are also less than the larger of these two values.
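To make the search rule concrete, here is a small hypothetical Java sketch of a 2-3 tree node and its search operation; the field names (key1, key2, left, middle, right) are assumptions made for this illustration rather than the textbook’s notation.

    // A 2-node stores key1 only (key2 is null) and uses the left and right subtrees;
    // a 3-node stores key1 < key2 and uses all three subtrees.
    class TwoThreeNode {
        int key1;
        Integer key2;                        // null in a 2-node
        TwoThreeNode left, middle, right;    // middle is unused in a 2-node

        boolean isLeaf() {
            return left == null;
        }
    }

    class TwoThreeSearch {
        static boolean contains(TwoThreeNode node, int target) {
            if (node == null) {
                return false;
            }
            if (target == node.key1 || (node.key2 != null && target == node.key2)) {
                return true;                                   // found at this node
            }
            if (node.isLeaf()) {
                return false;                                  // nowhere left to look
            }
            if (target < node.key1) {
                return contains(node.left, target);            // smaller than every key here
            }
            if (node.key2 == null || target > node.key2) {
                return contains(node.right, target);           // larger than every key here
            }
            return contains(node.middle, target);              // strictly between the two keys of a 3-node
        }
    }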

Every leaf in a 2-3 tree is at exactly the same level in the tree. Once again, one can use this property to prove that the depth of the tree is at most logarithmic in the size. While the algorithms are (once again) rather complicated, it is possible to search for, insert and delete values in these trees using time that is linear in the depth in the worst case.

2-3-4 trees are similar, but they can also include nodes that store three values and have four subtrees. The logic of some operations is a little bit simpler as a result. As mentioned in the textbook, there is an interesting connection between 2-3-4 trees and red-black trees.

2-3-4 trees are discussed in Section 10.4 of the textbook. Additional information can also be found in the Wikipedia articles on 2-3 trees and 2-3-4 trees, as well as the references listed on those pages.

B-Trees

A B-tree is yet another search tree that is not binary; the number of values that can be stored at a node (and the number of subtrees of this node) can be much, much higher than is the case for any of the trees mentioned above. Indeed, this data structure is suitable for systems that read and write large blocks of data; as mentioned in the Wikipedia article on B-trees, these are commonly used in databases and file systems.
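As a rough illustration of this higher branching factor, the following hypothetical Java sketch shows the shape of a B-tree node; the minimum degree T below is an assumed parameter, chosen in practice so that a single node fills a disk block.

    import java.util.ArrayList;
    import java.util.List;

    // Sketch of a B-tree node with minimum degree T: apart from the root, every
    // node stores between T-1 and 2T-1 keys, so one node can hold an entire
    // block of data that is read or written in a single disk access.
    class BTreeNode {
        static final int T = 512;                           // assumed minimum degree

        final List<Integer> keys = new ArrayList<>();       // sorted; at most 2*T - 1 keys
        final List<BTreeNode> children = new ArrayList<>(); // keys.size() + 1 children if internal

        boolean isLeaf() {
            return children.isEmpty();
        }

        boolean isFull() {
            return keys.size() == 2 * T - 1;
        }
    }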

B-trees are discussed in Section 11.5 of the textbook.

Self-Adjusting Binary Search Trees

Splay trees (also called “self-adjusting binary search trees”) were invented and analyzed by Daniel Sleator and Robert Tarjan. On the one hand, they are not height-balanced trees, and operations may use time that is linear in the size of the tree in the worst case. On the other hand, these are binary trees that do not require additional storage, so they are (structurally) just regular binary search trees with somewhat different operations. Indeed, rotations are used after insertions, deletions, and searches to move recently accessed values closer to the root of the tree.
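The rotations involved are the same single rotations used by other binary search trees; a minimal hypothetical Java sketch is given below. A full splay operation combines such rotations into “zig”, “zig-zig” and “zig-zag” steps until the accessed node becomes the root.

    // Hypothetical splay tree node: no extra balance information is stored.
    class SplayNode {
        int key;
        SplayNode left, right;

        SplayNode(int key) {
            this.key = key;
        }
    }

    class SplayRotations {
        // Right rotation about 'root': its left child moves up one level and is
        // returned as the new root of this subtree.
        static SplayNode rotateRight(SplayNode root) {
            SplayNode pivot = root.left;
            root.left = pivot.right;
            pivot.right = root;
            return pivot;
        }

        // Left rotation is the mirror image.
        static SplayNode rotateLeft(SplayNode root) {
            SplayNode pivot = root.right;
            root.right = pivot.left;
            pivot.left = root;
            return pivot;
        }
    }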

As mentioned above, the worst-case cost of an operation is linear in the size of the tree. However, the operations are simpler than the insertion and deletion operations for the height-balanced trees mentioned above, and the amortized cost per operation, over any sequence of n operations that begins with an empty tree, is in O(log n) (and, hence, logarithmic in the maximal size of the tree).

Furthermore, if the probabilities that various elements are accessed (perhaps, by a long sequence of searches after the tree has been created) are fairly static, then the tree will gradually adjust itself so that the most frequently accessed elements are closer to the root, making future accesses of these elements less expensive.

As a result, if the probability that various elements are accessed is highly nonuniform (with some elements being accessed considerably more often than others), then these trees might be more efficient than any of the height-balanced trees mentioned above.

The Wikipedia article on splay trees includes more information about these search trees along with a description of their operations.

Treaps

Up until this point in the course we have considered deterministic algorithms, which are guaranteed to execute exactly the same sequence of operations whenever they are run on the same input. However, randomized algorithms, which also make use of (pseudo)random number generators (so that their behaviours might differ when they are run repeatedly with the same inputs), are also commonly used.

A treap is a kind of binary search tree whose operations are implemented using randomized algorithms — so that the shape and depth of a tree, constructed by applying a fixed sequence of operations to an empty tree, are not actually fixed — instead, they are random variables (corresponding to a sample space defined by the random choices made when algorithms are executed). The “worst case expected cost” of an operation on a treap with size n is logarithmic in n, just like the worst-case cost of an operation on a red-black tree with the same size.
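As a preview, the following is a small hypothetical Java sketch of a treap node; the names are illustrative. Each node stores a key (obeying the binary search tree property) together with a priority chosen at random when the node is created (obeying a heap property), and it is these random priorities that make the shape of the tree a random variable.

    import java.util.Random;

    // Hypothetical treap node: keys satisfy the binary search tree property,
    // while the randomly chosen priorities satisfy a (max-)heap property.
    class TreapNode {
        static final Random RNG = new Random();

        final int key;
        final int priority;        // fixed at random when the node is created
        TreapNode left, right;

        TreapNode(int key) {
            this.key = key;
            this.priority = RNG.nextInt();
        }
    }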

Treaps will be further described later on in the course.

