A hash table utilises a hash function to assign an index to a value, allowing for constant search time.
Hash Function
A hash function is the function used to determine the bucket in which a value is stored.
However, there is an infinitely large amount of possible keys
Two distinct keys
k_1,k_2
collide ifh(k_1) = h(k_2)
.
Double Hashing
Double hashing is the practice of using two hash functions to minimise collisions.
Collisions
Generally, the cost of a hash function is considered as
O(1)
.
Chaining | Open Addressing |
---|---|
Utilises the Simple Uniform Hashing Assumption. | Utilises the Uniform Hashing Assumption. |
Can store infinite amount of elements. | Can only store up to the size of the table. |
Chaining
This approach stores a list of elements in one bucket, instead of one.
insert(key, value)
- The hash of the key is computed.
- The node is added to the linked list.
search(key)
- The hash of the key is computed
- Search through the linked list to find the key.
], the expected maximum chain length is
O(logn)
orO(loglogn)
.
Open-Addressing
For this approach, instead of storing a list of elements in one bucket, the hash function is redefined such that the key is mapped to a permutation, and goes through the permutation until it is able to find an empty bucket.
Linear probing uses a singular hash function to find the first value of the hash, then increments by 1 until it finds a bucket that is empty.
However, for this approach, if the table is full, there is no way to store more elements, other than to resize it, unlike chaining. Thus, the table has to be resized from time to time.
The resizing needs to be factored in when considering time complexity of the insertion of elements. The amortized cost needs to be considered.
An operation has amortized cost
T(n)
if for every integerk
, the cost ofk
operations is\leq kT(n)
.Amortized is not average. Another way to look at it is: At
k
operations,T(n)
is the value where the total running time should be lesser thankT(n)
for allk
.
Hashing Assumptions
Simple Uniform Hashing Assumption
- every key is equally likely to be mapped to every bucket
- keys are mapped independently
Uniform Hashing Assumption
- every key is equally likely to be mapped to every permutation independent of every other key.