Tutorial 16, CPSC 331, Winter 2017

Hash Tables

About This Exercise

The objectives of this exercise are

to ensure that you understand the definition and basic properties of (various kinds of) hash tables
to ensure that you understand the assumptions that one must make when analyzing hash tables, as well as the way these assumptions are used
to ensure that you understand how fundamental ideas from probability theory are used to analyze hash tables and their algorithms
to ensure that you are aware of various issues that arise when hash tables are implemented
to ensure that you are aware of (some of) the support for hash tables provided by the Java software environment.

Students should read through and try to solve the problems on this exercise before attending the tutorial.

Expected Background

This exercise is based on material presented during the lectures on hash tables. It is expected that students have attended these lectures or carefully reviewed the notes for these classes that are now available online.

The exercise also makes use of material in the lecture supplement on hash functions and additional hashing strategies Students should review this supplemental material before spending time on this exercise as well.

Warmup Exercises — To Check That You Understand the Basics

It is to be expected that all of the questions in this first section can be answered by any student who has successfully completed the prerequisites for this course, attended the lectures in which hash tables ere discussed (reviewing the lecture notes, as needed, too), and completed the assigned reading mentioned above.

Students who do have difficulty with these questions, after completing the above, should contact the course instructor to get extra help.

Give a brief description of each of the following.
1. a hash table with chaining
2. a hash table with open addressing
3. a collision between two keys
4. the load factor in a hash table with chaining
5. the simple uniform hashing assumption
6. the uniform hashing assumption
7. a probe sequence
8. linear probing
9. quadratic probing
10. double hashing
11. primary clustering
12. secondary clustering
13. the division method
14. the multiplication method
15. universal hashing

To Be Discussed in the Tutorial

Suppose that you inserted the following values into a hash table of size m=59 (using “the division method”):

473, 418, 356, 312, 119, 237, 605, 307

You should assume that the “universe” U is the set of integers between 0 and 999 and that the hash table is empty before 473 is inserted into it.

Show the hash tables that would result from this sequence of insertions when each of the following kinds of hash tables are used.
1. a hash table with chaining
2. a hash table with open addressing; linear probing is used for collision resolution;
3. a hash table with open addressing; quadratic probing is used for collision resolution;
4. a hash table with open addressing; double hashing is used for collision resolution.
When using double hashing, please use

h₀(x) = x mod 59

as suggested above, and use

h₁(x) = (x mod 58) + 1.
Consider the following hash table (with size m=8).
1. Confirm that this table could be produced from an empty table by inserting the keys
  
  25, 2, 12, 14, 22
  
  using the hash function
  
  h(k, i) = k+i mod 8.
2. Draw the tables produced and mention the results you would obtain if the following sequence of operations was performed, starting with the table given above and using the above hash function.
  1. delete 14;
  2. search for 22;
  3. insert 20;
  4. insert 28.
Describe how the algorithm to insert a key into a hash table with open addressing could be modified in order to allow deleted to be overwritten with the key you wish to insert.

What could go wrong if you simply replaced the first copy of deleted with the key to be inserted, as soon as one such copy was found?
The worst-case performance of an operation on any of the (reasonably simple) kinds of hash tables, that have been described in lectures, is significantly worse than the worst-case performance of the corresponding operation on a balanced binary search tree. Yet, there are situations in which hash tables are considered to be preferable. Why?
Describe three (necessary or) desirable properties that hash functions should have.
Describe the hashCode method that is provided by every Object in Java.

Notice that this should not be used, all by itself, as the “hash function” that maps values to locations in a hash table. Why is that? Why is this function useful (when implementing hash tables) anyway?

Last updated:
http://www.cpsc.ucalgary.ca/~jacobs/Courses/cpsc331/W17/tutorials/tutorial16.html