Algorithms

Spring 2011

Apr 7

Continue our discussion of hash tables

sources

assignment posted

Discuss the assignment for next week: not much C code to write; instead, you'll explore a hash table implementation, using my code to pull the words out of Moby Dick.

python dicts

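Discuss dictionaries in Python.
First, make note of Python's hash() function, which is defined for the built-in immutable types (numbers, strings, tuples of hashable items) but not for mutable ones such as lists and dicts.
Second, notice how central these dictionaries are to Python itself: for example, each object's attributes are stored in one.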
    $ python
    >>> class Foo:
    ...     pass
    ...
    >>> f = Foo()
    >>> f.name = "Fred"
    >>> f.age = 23
    >>> f.__dict__
Finally, look at how these dictionaries (hash tables) are implemented in CPython, the standard C implementation of Python; see the links at the top of the page.
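To make the first point about hash() concrete, here is a small sketch. The helper name is_hashable is made up for these notes (it is not part of Python); it just reports whether hash() accepts a value.

```python
def is_hashable(obj):
    """Return True if hash(obj) works, False if it raises TypeError.

    Hypothetical helper for illustration only.
    """
    try:
        hash(obj)
    except TypeError:
        return False
    return True


# Immutable built-ins can be hashed, so they can be dict keys.
print(is_hashable(42))           # True
print(is_hashable("moby"))       # True
print(is_hashable((1, "two")))   # True

# Mutable containers cannot.
print(is_hashable([1, 2]))       # False
print(is_hashable({"a": 1}))     # False

# A tuple is only hashable if everything inside it is.
print(is_hashable((1, [2])))     # False

# Values that compare equal must hash equal.
print(hash(1) == hash(1.0))      # True
```

The actual integers returned by hash() for strings vary between Python versions and, in recent versions, between runs, so the sketch only checks whether hashing succeeds.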
From "Beautiful Code", chap. 18, a sample PyDictObject mapping "aa", "bb", ... to 1, 2, 3, ... :
"The implementation of dictionaries in Python teaches several lessons about performance-critical code. First, one has to trade off the advantages of an optimization against the overhead it adds in space or calculation time. There were places where the Python developers found that a relatively naïve implementation was better in the long run than an extra optimization that seemed more appealing at first. In short, it often pays to keep things simple." - Andrew Kuchling in "Beautiful Code"
    int ma_fill 13
    int ma_mask 31
    PyDictEntry ma_table[]:
      [0]:  aa, 1      hash(aa) == -1549758592, -1549758592 & 31 = 0
      [1]:  ii, 9      hash(ii) == -1500461680, -1500461680 & 31 = 16
      [2]:  null, null
      [3]:  null, null
      [4]:  null, null
      [5]:  jj, 10     hash(jj) == 653184214, 653184214 & 31 = 22
      [6]:  bb, 2      hash(bb) == 603887302, 603887302 & 31 = 6
      [7]:  null, null
      [8]:  cc, 3      hash(cc) == -1537434360, -1537434360 & 31 = 8
      [9]:  null, null
      [10]: dd, 4      hash(dd) == 616211530, 616211530 & 31 = 10
      [11]: null, null
      [12]: null, null
      ...
      [28]: null, null
      [29]: ll, 12     hash(ll) == 665508394, 665508394 & 31 = 10
      [30]: mm, 13     hash(mm) == -1475813016, -1475813016 & 31 = 8
      [31]: null, null
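ma_used = number of slots occupied by keys (ma_fill also counts the dummy slots left behind by deleted keys); ma_mask = table size - 1 (here 32 - 1 = 31), so its low bits are all 1s and hash & ma_mask picks a slot index inside the table.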
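The masking step can be checked directly. The sketch below hard-codes a few hash values copied from the sample table above (they come from the book's 32-bit build, and a modern Python will not reproduce them with hash("aa")). The names sample_hashes and first_slot are made up for illustration.

```python
# Hash values copied from the sample table above; hard-coded because
# current Python versions give different (and per-run) string hashes.
sample_hashes = {
    "aa": -1549758592,
    "bb": 603887302,
    "cc": -1537434360,
    "dd": 616211530,
    "ll": 665508394,
    "mm": -1475813016,
}

table_size = 32
mask = table_size - 1  # 31, i.e. binary 11111


def first_slot(h, mask):
    """Slot tried first for hash h: keep only the low bits of the hash."""
    return h & mask


for key, h in sample_hashes.items():
    print(key, first_slot(h, mask))
# aa 0
# bb 6
# cc 8
# dd 10
# ll 10   <- same first slot as dd: a collision
# mm 8    <- same first slot as cc: a collision
```

This is why ll and mm end up at slots 29 and 30 in the table rather than at 10 and 8: their first-choice slots were already taken.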
Collision algorithm (as described in Beautiful Code)
    /* Starting slot */
    slot = hash;
    /* Initial perturbation value */
    perturb = hash;
    while (slot_is_full && item_in_slot_is_wrong) {
        slot = (5*slot) + 1 + perturb;
        perturb >>= 5;
    }
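Here is a rough Python sketch of the probe sequence in the pseudocode above, for experimenting in class. Two things are assumptions on my part rather than something the pseudocode states: the hash is treated as an unsigned 32-bit value before shifting (as C would on a 32-bit build), and each slot value is reduced with the mask to get a table index. The name probe_slots is made up.

```python
from itertools import islice


def probe_slots(h, mask, word_bits=32):
    """Yield the table indices tried, in order, for hash value h.

    Assumptions (not stated in the pseudocode): the hash is viewed as an
    unsigned word_bits-wide integer, and the table index is slot & mask.
    """
    word_mask = (1 << word_bits) - 1
    perturb = h & word_mask       # unsigned view of the hash
    slot = h & word_mask          # starting slot, before masking
    yield slot & mask
    while True:
        # Same recurrence as the pseudocode; word_mask mimics C overflow.
        slot = (5 * slot + 1 + perturb) & word_mask
        perturb >>= 5
        yield slot & mask


# Hashes copied from the sample table; mask 31 means a 32-slot table.
print(list(islice(probe_slots(665508394, 31), 2)))    # ll: [10, 29]
print(list(islice(probe_slots(-1475813016, 31), 4)))  # mm: [8, 17, 1, 30]
```

With these assumptions the output lines up with the sample table: ll finds slot 10 taken by dd and lands in 29. mm finds 8 taken by cc and 1 taken by ii before landing in 30, which presumably means slot 17 holds one of the entries elided from the listing.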
See the attached files (jimhash.c and python_hash_play.py); play around with 'em in class.
Update after class: jimhash.c works.
http://cs.marlboro.edu/courses/spring2011/algorithms/notes/Apr_7
last modified Thursday April 7 2011 3:11 pm EDT

attachments

     name                 last modified        size
     jimhash.c            Apr 7 2011 3:10 pm   3.95kB
     python_hash_play.py  Apr 7 2011 2:51 am   1.24kB