Algorithms

Spring 2007

march 8

sorting by counting

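The basic idea: count how many you have of each possible value, then write each value out that many times, in order.
This works particularly well if there is a small-ish number of different possible values and you know them in advance. For n items and k possible values it takes O(n + k) time and never compares two items against each other. It takes a huge amount of space if there are lots of possible values, since you need a counter for every one of them even though most won't appear in the list under consideration.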
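Here is a minimal Python sketch of sorting by counting. It assumes the items are integers in the range 0..k-1, with k known in advance. The names `counting_sort` and `k` are illustrative only and do not come from the notes. The `counts` array is where the space cost comes from: it has k entries however few distinct values actually appear.

```python
def counting_sort(items, k):
    """Sort a list of integers, each assumed to lie in range(k).

    Hypothetical helper for illustration; k must be known in advance.
    """
    # One counter per possible value: this list is the O(k) space cost.
    counts = [0] * k
    for x in items:
        counts[x] += 1
    # Write each value out as many times as it was counted, in increasing order.
    result = []
    for value, count in enumerate(counts):
        result.extend([value] * count)
    return result


print(counting_sort([3, 1, 4, 1, 5, 9, 2, 6, 5, 3], k=10))
# [1, 1, 2, 3, 3, 4, 5, 5, 6, 9]
```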

hashes

The idea is to reduce searching for something to an O(1) operation by jumping directly to where it is stored. To do this you need a "hash function" that tells you where to jump. This function is typically a map f(key) = integer that sends the keys you're searching over to fairly random (roughly evenly spread) integers in a range that's larger than your number of items, but small enough to index your storage array.

example

Suppose you want to store information on n=500 people, using their names as keys. You'd like to be able to find a specific person quickly using a hash table.
Choose as a hash function

    f(string) = (product of the ASCII codes of its characters) mod 7919

where I picked a prime number (the 1000th prime, actually) much bigger than n (about 16 times bigger, since 7919/500 ≈ 15.8) but not so big that I can't easily allocate storage space for my hash table, H[0..7918].
Then, given anyone's name, say "Jim Mahoney", I apply the hash function to calculate the corresponding index into H.
Using Mathematica's programming language (just for kicks):
    f[x_] := Mod[Apply[Times, ToCharacterCode[x]], 7919];
    f["Jim Mahoney"]    (* returns 5228 *)
The drawback to this method is that two different keys may give the same index, which is called a "collision". The algorithm you use to search, store, and delete items in the hash table needs to be able to deal with these collisions. That is the price you pay for O(1) lookup in most cases.
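Here is a rough Python translation of the product-of-codes hash. It is a sketch, not the notes' own code. `TABLE_SIZE`, the list `H`, and the sample names are made up for illustration. Multiplication is commutative, so any rearrangement of the same characters lands in the same slot. Different characters can collide too: 98 · 120 = 105 · 112 = 11760, so "bx" and "ip" hash alike. Storing straight into H with no collision handling silently overwrites the first entry.

```python
TABLE_SIZE = 7919  # the 1000th prime, as in the notes
H = [None] * TABLE_SIZE  # hypothetical hash table H[0..7918]


def f(name: str) -> int:
    """Product of the character codes of `name`, reduced mod TABLE_SIZE."""
    h = 1
    for c in name:
        # Reducing at every step gives the same answer as reducing once at
        # the end, but keeps the intermediate numbers small.
        h = (h * ord(c)) % TABLE_SIZE
    return h


# Rearranged characters give the same product, hence the same slot.
print(f("Jim Mahoney") == f("Mahoney Jim"))  # True
# Different characters can collide as well: 98 * 120 == 105 * 112.
print(f("bx") == f("ip"))  # True

# Naive storage with no collision handling: the second write clobbers the first.
H[f("Jim Mahoney")] = "record for Jim Mahoney"
H[f("Mahoney Jim")] = "record for Mahoney Jim"
print(H[f("Jim Mahoney")])  # record for Mahoney Jim
```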
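The two most common ways of dealing with collisions are:
  1. external: if there's a collision, store all the collided entries in a linked list (often called "chaining"), and search that list sequentially. This requires extra storage outside the hash table.
  2. internal: put the data somewhere else in the hash table, either in the next empty slot or in a quasi-random-but-deterministic spot (often called "open addressing"). Either way, the insertion/deletion/search methods need to "do the right thing". For example, a search has to follow the same sequence of slots that insertion did, and a deletion can't simply empty its slot, or later searches along that sequence would stop too early.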
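Here is a minimal Python sketch of both strategies, reusing the product-of-codes hash. It is an illustration, not a definitive implementation. The class names `ChainedTable` and `ProbingTable`, the `DELETED` marker, and the choice of linear probing for "the next empty slot" are all assumptions. The quasi-random alternative is not shown. Python lists stand in for the linked lists. The probing table marks deleted slots with a "tombstone" so that searches keep going past them.

```python
# Hypothetical names throughout: a sketch, not a library API.
TABLE_SIZE = 7919  # the 1000th prime, as in the notes


def f(key: str) -> int:
    """Product of character codes, mod TABLE_SIZE."""
    h = 1
    for c in key:
        h = (h * ord(c)) % TABLE_SIZE
    return h


class ChainedTable:
    """External: each slot holds a chain of (key, value) pairs that hash there."""

    def __init__(self):
        # Python lists stand in for linked lists.
        self.slots = [[] for _ in range(TABLE_SIZE)]

    def insert(self, key, value):
        chain = self.slots[f(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:  # key already stored: overwrite its value
                chain[i] = (key, value)
                return
        chain.append((key, value))

    def search(self, key):
        for k, v in self.slots[f(key)]:  # sequential search of one chain
            if k == key:
                return v
        return None

    def delete(self, key):
        chain = self.slots[f(key)]
        chain[:] = [(k, v) for k, v in chain if k != key]


DELETED = object()  # "tombstone" left behind by a deletion


class ProbingTable:
    """Internal (linear probing): on a collision, try the next slot, wrapping around."""

    def __init__(self):
        self.slots = [None] * TABLE_SIZE

    def _probe(self, key):
        start = f(key)
        return ((start + step) % TABLE_SIZE for step in range(TABLE_SIZE))

    def insert(self, key, value):
        target = None  # first reusable slot (empty or tombstone) on the probe path
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is None:  # end of the run: key is not in the table
                if target is None:
                    target = i
                break
            if slot is DELETED:
                if target is None:
                    target = i
            elif slot[0] == key:  # key already stored: overwrite in place
                self.slots[i] = (key, value)
                return
        if target is None:
            raise RuntimeError("hash table is full")
        self.slots[target] = (key, value)

    def search(self, key):
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is None:  # an empty slot ends the search
                return None
            if slot is not DELETED and slot[0] == key:
                return slot[1]
        return None

    def delete(self, key):
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is None:
                return
            if slot is not DELETED and slot[0] == key:
                # A tombstone, not None: searches for keys stored further
                # along this run must keep going past this slot.
                self.slots[i] = DELETED
                return


t = ProbingTable()
t.insert("Jim Mahoney", 1)
t.insert("Mahoney Jim", 2)  # same slot as "Jim Mahoney", so it goes in the next one
t.delete("Jim Mahoney")
print(t.search("Mahoney Jim"))  # 2, found by probing past the tombstone
```

If the deleted slot were simply reset to None, that last search would stop at the empty slot and wrongly report "Mahoney Jim" as missing.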
http://cs.marlboro.edu/courses/spring2007/algorithms/notes/march_8
last modified Thursday March 8 2007 12:59 pm EST