Maintaining External Memory Efficient Hash Tables

Philipp Woelfel

Abstract. In typical applications of hashing algorithms the amount of data to be stored is often too large to fit into internal memory. In this case it is desirable to find the data with as few as possible non-consecutive or at least non-oblivious probes into external memory. Extending a static scheme of Pagh (1999) we obtain new randomized algorithms for maintaining hash tables, where a hash function can be evaluated in constant time and by probing only one external memory cell or O(1) consecutive external memory cells. We describe a dynamic version of Pagh’s hashing scheme achieving 100% table utilization but requiring (2+ε)·n·log n space for the hash function encoding as well as (3+ε)·n·log n space for the auxiliary data structure. Update operations are possible in expected constant amortized time. Then we show how to reduce the space for the hash function encoding and the auxiliary data structure to O(n·loglog n). We achieve 100% utilization in the static version (and thus a minimal perfect hash function) and 1-ε utilization in the dynamic case.

Notes