Apr 5

homework

Discuss state of open-ended LZW or Huffman coding assignment. Easy? Hard? Finished? Need more time?

As we move forward with gathering tools, one of the keys is building up a tested, stable, usable set of utilities. For example, the graph algorithms use a priority queue, LZW uses a bit-access buffer and some sort of lookup table, and so on.

One way to make this homework manageable is to decide which parts you're going to code and which parts you're just going to find (and cite) from an online source.

Midterm project comments and grades are up: nice work.

sample implementation

Discuss Michael Dipperstein's libraries at http://michael.dipperstein.com/ . Note the style of the work: discussion, code, example, versions, ...

Included utilities:

bit processing
command line processing
binary tree (for string lookup table)
other?

See attached for an example of both in action.

hash tables

As listed at http://www.inf.ethz.ch/personal/wirth/ :

Wirth's Algorithms and Data Structures

From discussion in section 5.1, pg 177 of Wirth :

 Let keys = 16-character names ; total number of possible keys = 26<sup>16</sup>.
 Let there be a thousand names actually stored ; total number = 10<sup>3</sup>.

Then "hashing", "a hash table", a "python dictionary", and an "associative array" are all names for the same idea: store each person's information in an array of modest size, but still jump straight to the right person from their name.
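
To make those two counts concrete, here is a minimal Python sketch (Python is a choice made for this illustration; the constant names are hypothetical). It assumes names use only the 26 letters, as the 26<sup>16</sup> figure implies, and shows why an array with one slot per possible key is out of the question while a Python dictionary gives the name-to-record jump directly.

```python
# Assumed setup from the notes: 16-character names over a 26-letter alphabet,
# with about a thousand names actually stored.
KEY_LENGTH = 16
ALPHABET_SIZE = 26
NAMES_STORED = 10 ** 3

possible_keys = ALPHABET_SIZE ** KEY_LENGTH
print("possible keys:", possible_keys)
print("fraction of key space in use:", NAMES_STORED / possible_keys)

# A Python dict is a ready-made hash table: look up a record by name.
# The names and records here are made-up illustration data.
records = {"ada": {"office": 12}, "grace": {"office": 7}}
print(records["grace"]["office"])   # 7
```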

The idea has two parts.

 1. Find a function 
       Hash(key) = index 
    that turns a 16-character key into a number 0..999
    (or perhaps a larger array, such as double size or more)
    spread evenly and apparently randomly over that range.
 
    Anything that "mixes up" the keys can be used
    (hence the name "hash" in the first place); typically
    something like "ord(key) mod some_prime" works pretty well.
    Often some sort of XOR folding is used; see the discussion
    at http://en.wikipedia.org/wiki/Hash_function
    and http://en.wikipedia.org/wiki/List_of_hash_functions
 
    Desired properties:
      a) fast
      b) uniform across array indices

 2. Since many strings map to the same number (Quick quiz: how many?)
    we need to deal with the situation when we get to the right
    location but find the wrong string there. (Which also means
    we need to store the string as well as any other data at
    that array index; typically there's a pointer there
    to a data structure.)
 
    Two collision-handling mechanisms are common :
 
     i) put all items with same index into a linked list
        (or other searchable thing) outside the hash's array.

     ii) or use some other spot(s) in the array, looking
        (in some deterministic way) for one that isn't used yet.
        Typically we add an offset, mod the size of the table.
        a) variation 1: linear probing ... simple, but entries can cluster
        b) variation 2: quadratic probing (offsets computed with a recurrence)
        These work best if the size of the table is prime, 
        since otherwise the offsets may well not be uniform.
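
The two parts above can be sketched in Python as follows. This is an illustration under stated assumptions, not code from Wirth or from the linked C sample: the table size 1009 (a prime just above 1000), the multiplier 31, and all function and class names are choices made here. It shows one simple hash function, collision scheme (i) with a list per slot, collision scheme (ii) with linear probing, and the quadratic offsets 1, 4, 9, ... generated by adding successive odd numbers instead of squaring. Deletion and resizing are left out.

```python
TABLE_SIZE = 1009  # assumed: a prime just above the 1000 names we expect


def hash_index(key, size=TABLE_SIZE):
    """Mix the characters of key into an index in range(size)."""
    h = 0
    for ch in key:
        h = (h * 31 + ord(ch)) % size  # 31 is an arbitrary small multiplier
    return h


class ChainedTable:
    """Scheme (i): each slot holds a list of (key, value) pairs."""

    def __init__(self, size=TABLE_SIZE):
        self.size = size
        self.slots = [[] for _ in range(size)]

    def put(self, key, value):
        chain = self.slots[hash_index(key, self.size)]
        for i, (k, _) in enumerate(chain):
            if k == key:              # key already present: overwrite
                chain[i] = (key, value)
                return
        chain.append((key, value))

    def get(self, key):
        for k, v in self.slots[hash_index(key, self.size)]:
            if k == key:
                return v
        raise KeyError(key)


class ProbingTable:
    """Scheme (ii): linear probing, stepping by 1 mod the table size."""

    def __init__(self, size=TABLE_SIZE):
        self.size = size
        self.slots = [None] * size    # each slot is None or (key, value)

    def _find(self, key):
        """Index holding key, or the first empty slot on its probe path."""
        i = hash_index(key, self.size)
        for _ in range(self.size):
            entry = self.slots[i]
            if entry is None or entry[0] == key:
                return i
            i = (i + 1) % self.size
        raise RuntimeError("table is full")

    def put(self, key, value):
        self.slots[self._find(key)] = (key, value)

    def get(self, key):
        entry = self.slots[self._find(key)]
        if entry is None:
            raise KeyError(key)
        return entry[1]


def quadratic_probes(start, size, count):
    """First `count` quadratic-probe indices, via the recurrence step += 2."""
    index, step, out = start, 1, []
    for _ in range(count):
        out.append(index)
        index = (index + step) % size
        step += 2                     # offsets from start are 1, 4, 9, 16, ...
    return out


table = ProbingTable()
table.put("Ada Lovelace", "mathematician")
print(table.get("Ada Lovelace"))      # mathematician
print(quadratic_probes(0, 11, 5))     # [0, 1, 4, 9, 5]
```

Both tables store the key next to the value, which is what lets a lookup tell the right string from a different one that landed on the same index.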

Here's some sample C code :

http://task3.cc/308/hash-maps-with-linear-probing-and-separate-chaining/

And here's a video lecture : MIT Intro to Algorithms course; see video lectures 7 and 8

http://cs.marlboro.edu/courses/spring2011/algorithms/notes/Apr_5
last modified Tuesday April 5 2011 3:30 am EDT
