Tutorial 9, CPSC 331, Fall 2008

Hash Tables

About This Exercise

This exercise is based on material presented during the lectures on October 22, October 24, and October 27.

The objectives of this exercise are

to ensure that you understand the definition and basic properties of (various kinds of) hash tables,
to ensure that you understand the assumptions that one must make when analyzing hash tables as well as the way these assumptions are used,
to ensure that you understand properties of hash functions and know how to use these,
to ensure that you appreciate and know how to properly handle various implementation issues that arise when hash functions are used,
to ensure that you can use a hash table to develop a class that implements a “Set” (or “Mapping”) interface in Java.

Please read through and try to solve the problems on this exercise before attending the tutorials on October 27-29.

Problems

Suppose that you inserted the following values into a hash table of size m=59 (using “the division method”):

473, 418, 356, 312, 119, 237, 605, 307

You should assume that the “universe” U is the set of integers between 0 and 999 and that the hash table is empty before 473 is inserted into it.

Show the hash tables that would result from this sequence of insertions when each of the following kinds of hash tables are used.
1. a hash table with chaining
2. a hash table with open addressing; linear probing is used for collision resolution;
3. a hash table with open addressing; quadratic probing is used for collision resolution;
4. a hash table with open addressing; double hashing is used for collision resolution.
When using double hashing, please use

h₀(x) = x mod 59

as suggested above, and use

h₁(x) = (x mod 58) + 1.
Consider the following hash table (with size m=8).
1. Confirm that this table could be produced from an empty table by inserting the keys
  
  25, 2, 12, 14, 22
  
  using the hash function
  
  h(k, i) = k+i mod 8.
2. Draw the tables produced and mention the results you would obtain if the following sequence of operations was performed, starting with the table given above and using the above hash function.
  1. delete 14;
  2. search for 22;
  3. insert 20;
  4. insert 28.
Repeat the first part of the first question, assuming that the hash function is produced using the multiplication method:

h(k) = ⌊mh⌋ for h = kA − ⌊kA⌋, m = 59,

and where

A = 2654435769 / 2³²,

the rational number of the form s / 2³² (for an integer s) that is closest in value to

(√5 − 1)/2

You might find it easiest to write a program in Java that computes this function. What type (or types) would you need to use, to represent integers, in order to make sure that this function is computed accurately?
For each of the following character strings s, compute the location

h(s) = n_s mod 123

for this string in a hash table of size 123, where n_s is the (generally, extremely large) integer that is obtained by mapping each symbol in s to a number between 0 and 127 and using these as a “digits” of a base-128 integer.

Note that this is essentially the method that was used during the lecture on March 7 to map the string “rabbit” to a location in a hash table.

The mapping from characters in the ASCII set to integers shown at the web sit

http://www.lookuptables.com

will be useful when you answer this question.
1. octopus
2. mouse
3. "a long string forcing a large integer to be involved"
4. "silly question"
In the last two parts of the question the string you should use is the string (including spaces) that is shown between quotation marks.

Recall that both a “good” way and a ”bad” way to perform this computation were discussed during the lecture!
Another way to map a character string to a location in a hash table is to map the string to a (possibly large) integer by computing the sum the values corresponding to individual symbols, and then apply a hash function expecting integer inputs to the sum.

For example, since the letters in the string “rabbit” correspond to the integers 114, 97, 98, 98, 105, and 116, one could map “rabbit” to a location in a hash table of size 123 by computing the sum

114 + 97 + 98 + 98 + 105 + 116 = 628

and then mapping “rabbit” to

628 mod 123 = 13.
1. Compute hash table locations (in a hash table of size 123 as above) for the strings given in the previous question, if this new method is used to map them to integers.
2. Recall that we had to be concerned about overflow when the original method was used to evaluate hash functions with character strings. Is overflow a problem when this new method is used, instead? Why (or why not)?
Note that many symbols in the ASCII character set are unlikely to be used in strings you might wish to store in a hash table. For example, you are unlikely to include the NULL character (which is mapped to zero).

Suppose we discarded the above symbol from our ASCII character set — mapping each remaining symbol to a value that is smaller, by one, than the value we used before, and working now with an alphabet with size 127 instead of 128.

How would this effect the usefulness of hash functions that are obtained using the “multiplication method” as described above (specifically, in Question 3, above)?

In order to start to think about this, it might be helpful to compare the hash table locations used for each of the following three strings, using the full ASCII alphabet (with size 2⁷) and using the reduced alphabet (with size 2⁷−1).
1. “a silly example”
2. “for example”
3. “by example”
What happens (in either case) if you begin by mapping strings to integers, using the method from Question 5 instead?

On the other hand, what happens — in all cases — when you compute hash table locations for the following strings, instead?
1. ababababab
2. aaaaabbbbb
3. bbbbbaaaaa
Suppose we are using integer-valued keys, and that we are using the division method to perform hashing into a hash table with open addressing — with quadratic probing used for collision resolution:

h(k, i) = k + i² mod m.

Consider the probe sequence, which includes

h(k, 0), h(k, 1), h(k, 2), …, h(k, m-1).
1. Does this include all m of the values between 0 and m−1 when m is prime? If it does not, then how many of these values can you expect to find in this probe sequence?
2. What is the answer for these questions when m is a power of two?
Design and implement (in Java) a hash table that could be used to store a set of at most one-hundred nonnegative integers that are each at most 31 bits long (so that they could be represented using the “int” type in Java).

If you have time, try to do this in several ways:
1. using a hash table with chaining,
2. using open addressing, with linear probing
3. using open addressing, with quadratic probing
In each case, try to design your hash table so that it does not use more space than necessary — but also so that the expected number of probes needed to find an element is at most five, under some plausible (or, at least, commonly used) assumption that the way your hash function will behave.

Please state and (if you can) describe the use of any assumption that is made in order to estimate the number of probes that will be used for a search in your hash table.
All hash tables using open addressing provide poor performance (both in the average- and worst-case) when they fill up — that is, when their load factors approach one.

Design an algorithm for “hash table expansion,” that is, an algorithm that can be used to take a given hash table with size m as input and to produce another hash table with size M as output, such that the input and output tables store the same set of values.

How many steps does your algorithm require?

How should the new (larger) table size, M, be related to the original size, m? Why?

This page last modified:
http://www.cpsc.ucalgary.ca/~jacobs/cpsc331/F08/tutorials/tutorial9.html