Algorithms

Spring 2011

Apr 5

homework

Discuss state of open-ended LZW or Huffman coding assignment. Easy? Hard? Finished? Need more time?
As we move forward with gathering tools, one of the keys is building up a tested, stable, usable set of utilities. For example, the graph algorithms use a priority queue, LZW uses a bit-access buffer and some sort of lookup table, and so on.
One way to make this homework manageable is to decide which parts you're going to code yourself, and which parts you're just going to find online (and cite).
Midterm project comments and grades are up: nice work.

sample implementation

Discuss Michael Dipperstein's libraries at http://michael.dipperstein.com/ . Note the style of the work: discussion, code, example, versions, ...
Included utilities:
See attached for an example of both in action.

hash tables

As listed at http://www.inf.ethz.ch/personal/wirth/ :
From discussion in section 5.1, pg 177 of Wirth :
Let the keys be 16-character names, so the total number of possible keys is 26^16. Let there be a thousand actual names, so the total number of entries to store is 10^3.
Then "hashing" or "a hash table" or a "python dictionary" or an "associative array" are all names for the same idea: store each person's information in an array of about that size, but still jump straight to the right person from their name.
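To make the size mismatch concrete, here is a minimal Python sketch of the arithmetic. It assumes the names use only the 26 letters a-z and are exactly 16 characters long; the variable names are made up for illustration.

```python
# Assumption: names use only the 26 letters a-z and are exactly 16 characters.
ALPHABET_SIZE = 26
KEY_LENGTH = 16
NUM_NAMES = 10**3                            # the thousand names actually stored

possible_keys = ALPHABET_SIZE ** KEY_LENGTH  # 26^16 distinct possible keys

# An array indexed directly by key would need one slot per possible key,
# and almost every slot would stay empty.
fraction_used = NUM_NAMES / possible_keys

print(f"possible keys : {possible_keys:.3e}")
print(f"names stored  : {NUM_NAMES}")
print(f"fraction used : {fraction_used:.3e}")
```

The point of hashing is to get away with an array whose size is on the order of NUM_NAMES rather than possible_keys.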
The idea has two parts.
1. Find a function Hash(key) = index that turns a 16-character name into a number in 0..999 (or a larger range, if the array is made double size or more), spread evenly and apparently randomly over that range. Anything that "mixes up" the keys can be used (hence the name "hash" in the first place); typically something like "ord(key) mod some_prime" works pretty well. Often some sort of XOR folding is used; see the discussion at http://en.wikipedia.org/wiki/Hash_function and http://en.wikipedia.org/wiki/List_of_hash_functions
   Desired properties:
   a) fast
   b) uniform across array indices
2. Since many strings map to the same number (Quick quiz: how many?), we need to deal with the situation where we get to the right location but find the wrong string there. (This also means we need to store the string itself, as well as any other data, at that array index; typically there's a pointer there to a data structure.) Two collision mechanisms are common:
   i) Put all items with the same index into a linked list (or other searchable thing) outside the hash's array.
   ii) Or use some other spot(s) in the array, looking (in some deterministic way) for one that isn't used yet. Typically we add an offset, mod the size of the table.
      a) variation 1: linear probing ... but can cluster entries
      b) variation 2: quadratic probing (offsets computed with a recurrence)
      These work best if the size of the table is prime, since otherwise the probe offsets may well not cover the table uniformly.
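The two parts can be sketched in Python roughly as follows. This is only an illustration under assumptions, not the sample code from the course: the table size 1009, the multiplier 31, and the names hash_name, ChainedTable and ProbingTable are all invented here, a Python list stands in for the linked list, and deleting entries is not handled.

```python
TABLE_SIZE = 1009  # assumed: a prime just above the 1000 names


def hash_name(key, size=TABLE_SIZE):
    """Mix the characters of key into one integer in 0..size-1."""
    h = 0
    for ch in key:
        h = (h * 31 + ord(ch)) % size  # 31 is an arbitrary multiplier
    return h


class ChainedTable:
    """Collision mechanism i): each array slot holds a list of (key, value)."""

    def __init__(self, size=TABLE_SIZE):
        self.size = size
        self.buckets = [[] for _ in range(size)]

    def put(self, key, value):
        bucket = self.buckets[hash_name(key, self.size)]
        for i, (stored_key, _) in enumerate(bucket):
            if stored_key == key:          # key already present: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))

    def get(self, key):
        for stored_key, value in self.buckets[hash_name(key, self.size)]:
            if stored_key == key:
                return value
        raise KeyError(key)


class ProbingTable:
    """Collision mechanism ii): look for another free slot in the same array."""

    def __init__(self, size=TABLE_SIZE, quadratic=False):
        self.size = size
        self.quadratic = quadratic
        self.slots = [None] * size         # None marks an unused slot

    def _probes(self, key):
        home = hash_name(key, self.size)
        for i in range(self.size):
            offset = i * i if self.quadratic else i
            yield (home + offset) % self.size

    def put(self, key, value):
        for index in self._probes(key):
            entry = self.slots[index]
            if entry is None or entry[0] == key:
                self.slots[index] = (key, value)
                return index
        raise RuntimeError("no free slot found along the probe sequence")

    def get(self, key):
        for index in self._probes(key):
            entry = self.slots[index]
            if entry is None:              # reached a gap: key was never stored
                break
            if entry[0] == key:
                return entry[1]
        raise KeyError(key)


if __name__ == "__main__":
    names = ["ann", "bob", "cat"]
    for table in (ChainedTable(7), ProbingTable(7), ProbingTable(7, quadratic=True)):
        for number, name in enumerate(names):
            table.put(name, number)
        print(type(table).__name__, [table.get(name) for name in names])
```

Note that both tables store the key alongside the value, which is what lets get() tell the right string from a colliding one. With quadratic offsets and a prime table size, the probe sequence reaches only about half of the slots, so put() can fail before the table is completely full.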
Here's some sample C code :
And here's a video lecture : MIT Intro to Algorithms course; see video lectures 7 and 8
http://cs.marlboro.edu/courses/spring2011/algorithms/notes/Apr_5
last modified Tuesday April 5 2011 3:30 am EDT

attachments

name                     last modified         size
dipperstein_notes.txt    Apr 4 2011 9:41 pm    1.65kB