Hash Tables

Hash tables are used for efficient searches of arrays when a linear search may take too long, or the maintaining of the sorting required for a binary search is too much processing (too often).

A hash table is an array in which elements are placed a certain place in the array based on a particular “hash” calculation. The array will start out a size that will satisfy the need, and consist of empty elements. It is then populated by having the hash calculation applied to the new elements, and putting them one-by-one right at (or close to) the index that equals their hash calculation result. In some cases each element in the array will have its subscript match exactly its hash algorithm result. Though most often, many elements are not actually stored right at, but close to the subscript that matches the results of their “hash algorithm” calculation.

So, for example, if the hash algorithm applied to the data element “Hello” evaluates to 7, then “Hello” will preferrably be held at the 7th subscript position [7] of the hash table array.

If a data item is being added to the array and the resulting integer result of the hash alogrithm on it equals an index already occupied, then one of several “tie-breaking” approaches will be utilized. The approach suggested by the IBO is simply searching for the next free space and putting the data item there. This isn’t the best solution, but it works.

Using this approach, the absolute worst case scenerio would be if only one element was left empty, and it was just ahead of the hash result element, so there would have to be a sequential search all the way through to the end of the array, and then from the beginning, up to that free space. But the chances of a long squential search like this ocurring would be extremely small, and so the benefits of not having to continually sort, and not having to seqential search much, outweigh the inefficiency of an occasional longish sequential search. Never-the-less in the case that a new element’s hash result is the same as an index already being occupied, it is actually preferrable to have a secondary hash algorithm that yields more possible numbers.

So once the hash table (array) is established, in order to perform a search for a specific data item in it, you first re-perform the exact same hash algorithm that was used in the hash table's creation Then you look at that index to see if the item you are searching for is there. If it is, voila! If it's not (in which case there must have been a tie), you would revert at that stage to the slower linear search beginning to the right of that index until you find it – it should be close.

A hashing technique that will commonly work for Strings is modulo division performed on the ASCII sum of the String's chars by a prime number close to the length of the array, to yield remainders which are in the range of 1 to approximately the length of the array.

See page 269 for a good hash table creation algorithm. You are only expected to be able to use it, not created it. But you'll need to add some lines to this one to place ties in the next available space.

Adding in these lines after the ht = ht + codes[i].charAt(j); line should work:

                                                int x = ht % 51;
                                                while !(hash[x].equals(“”);
                                                            x++;
                                                hash[x] = codes[i];