Apr 7
Continue our discussion of hash tables
sources
assignment posted
Discuss the assignment for next week: not much writing of C code;
instead, you'll explore a hash implementation using my
code that pulls the words out of Moby Dick.
python dicts
Discuss dictionaries in python.
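First, make note of python's hash() function,
which is defined for all built-in immutable types.
- Discuss why lists in python aren't allowed to be dictionary keys, but tuples are: a list can change after it is inserted, which would change its hash and strand it in the wrong slot.
- Discuss the syntax for making your own objects hashable by defining __hash__ (along with a matching __eq__).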
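A minimal sketch of the points above, assuming a current python 3 interpreter. The Point class is a made-up example, not something from the assignment.

```python
# hash() works on immutable built-ins, and equal values must hash equally.
assert hash(42) == hash(42.0)        # 42 == 42.0, so their hashes agree
assert hash((1, 2)) == hash((1, 2))

# A list is mutable, so python refuses to hash it.
try:
    {[1, 2]: "value"}
    list_key_allowed = True
except TypeError:                    # "unhashable type: 'list'"
    list_key_allowed = False

# The equivalent tuple is immutable and works fine as a key.
points = {(1, 2): "value"}


class Point:
    """Hypothetical class used only to illustrate __hash__ and __eq__."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Delegate to the hash of an immutable tuple of the same fields.
        return hash((self.x, self.y))


lookup = {Point(1, 2): "found"}
print(list_key_allowed)              # False
print(lookup[Point(1, 2)])           # found
```

Defining __eq__ and __hash__ together keeps the rule "equal objects have equal hashes" intact, which is what lets a second Point(1, 2) find the entry stored under the first.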
Second, notice how central these dictionaries are to python itself: an object's attributes are stored in one.
$ python
>>> class Foo:
...     pass
...
>>> f = Foo()
>>> f.name = "Fred"
>>> f.age = 23
>>> f.__dict__
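The same session as a script, with one extra step to show that the attribute dict is live. This is a sketch assuming python 3; the "city" attribute and its value are made up for illustration.

```python
class Foo:
    pass


f = Foo()
f.name = "Fred"
f.age = 23

# Instance attributes are stored in an ordinary dict.
assert f.__dict__ == {"name": "Fred", "age": 23}

# Writing a key into that dict creates an attribute on the object.
f.__dict__["city"] = "Springfield"   # hypothetical extra attribute
print(f.city)                        # Springfield

# vars(f) hands back the very same dict object.
assert vars(f) is f.__dict__
```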
Finally, look at how these dictionaries (hash tables)
are implemented in the standard C implementation of python;
see the links at the top of the page.
"The implementation of dictionaries in Python teaches several lessons about performance-critical code. First, one has to trade off the advantages of an optimization against the overhead it adds in space or calculation time. There were places where the Python developers found that a relatively naïve implementation was better in the long run than an extra optimization that seemed more appealing at first. In short, it often pays to keep things simple." - Andrew Kuchling in "Beautiful Code"
From "Beautiful Code", chap. 18, a sample PyDictObject
mapping "aa", "bb", ... to 1, 2, 3, ... :
int ma_fill 13
int ma_mask 31
PyDictEntry ma_table[]:
[0]: aa, 1 hash(aa) == -1549758592, -1549758592 & 31 = 0
[1]: ii, 9 hash(ii) == -1500461680, -1500461680 & 31 = 16
[2]: null, null
[3]: null, null
[4]: null, null
[5]: jj, 10 hash(jj) == 653184214, 653184214 & 31 = 22
[6]: bb, 2 hash(bb) == 603887302, 603887302 & 31 = 6
[7]: null, null
[8]: cc, 3 hash(cc) == -1537434360, -1537434360 & 31 = 8
[9]: null, null
[10]: dd, 4 hash(dd) == 616211530, 616211530 & 31 = 10
[11]: null, null
[12]: null, null
...
[28]: null, null
[29]: ll, 12 hash(ll) == 665508394, 665508394 & 31 = 10
[30]: mm, 13 hash(mm) == -1475813016, -1475813016 & 31 = 8
[31]: null, null
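ma_used = number of slots occupied by active keys (ma_fill also counts dummy slots left behind by deletions); ma_mask = table size - 1, so with 32 slots it is 31 (binary 11111) and hash & ma_mask picks the starting slot. Entries whose masked hash differs from their index (ii, jj, ll, mm) collided with an earlier key and were moved by the collision algorithm.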
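The masking arithmetic in the table can be checked directly. This sketch uses the hash values copied from the table rather than calling hash(), since a modern python 3 randomizes string hashes per run and won't reproduce them.

```python
MA_MASK = 31  # table size 32, so the mask is 32 - 1 = 0b11111

# (key, hash) pairs copied from the sample table
hashes = {
    "aa": -1549758592,
    "bb": 603887302,
    "cc": -1537434360,
    "dd": 616211530,
    "ll": 665508394,
    "mm": -1475813016,
}

# python's & on negative ints behaves like two's complement, matching C here.
first_slot = {key: h & MA_MASK for key, h in hashes.items()}
print(first_slot)

# Group keys by starting slot to see which ones collide.
by_slot = {}
for key, slot in first_slot.items():
    by_slot.setdefault(slot, []).append(key)
collisions = {slot: keys for slot, keys in by_slot.items() if len(keys) > 1}
print(collisions)   # {8: ['cc', 'mm'], 10: ['dd', 'll']}
```

So ll and mm start at slots already taken by dd and cc, which is why they end up elsewhere in the table.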
Collision algorithm (as described in Beautiful Code)
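/* Starting slot */
slot = hash;
/* Initial perturbation value (an unsigned size_t in CPython's dictobject.c) */
perturb = hash;
/* The table index actually examined is slot & ma_mask */
while (slot_is_full && item_in_slot_is_wrong) {
    slot = (5*slot) + 1 + perturb;
    perturb >>= 5;
}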
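One way to try the probing loop in python is as a generator that yields each slot in the order it would be tried. This is a sketch, not CPython's code: probe_slots is a made-up name, and it assumes the hash is reinterpreted as an unsigned 32-bit value (the hashes in the sample table all fit in 32 bits) so that perturb eventually shifts down to 0.

```python
from itertools import islice


def probe_slots(h, mask, hash_bits=32):
    """Yield the slots tried for hash h, in order."""
    perturb = h & ((1 << hash_bits) - 1)   # assumption: unsigned, as in C
    slot = h & mask
    while True:
        yield slot
        slot = (5 * slot + 1 + perturb) & mask
        perturb >>= 5


# Hash values copied from the sample table, with ma_mask = 31.
print(list(islice(probe_slots(665508394, 31), 2)))     # ll: [10, 29]
print(list(islice(probe_slots(-1475813016, 31), 4)))   # mm: [8, 17, 1, 30]
print(list(islice(probe_slots(653184214, 31), 2)))     # jj: [22, 5]
```

These sequences end at the slots where the table shows ll, mm, and jj (29, 30, and 5). For mm that implies slot 17 was already occupied by one of the entries hidden behind the "..." in the table. Once perturb reaches 0 the step is just 5*slot + 1 modulo the table size, which cycles through every slot, so an empty one is always found if it exists.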
See the attached files; play around with 'em in class
- try to implement these in python
- and/or check out my simplistic C implementation.
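For the python route, here is one possible starting point: a tiny fixed-size open-addressing table using the same probing rule. It is a teaching sketch under stated assumptions, not a port of CPython or of jimhash.c: TinyDict is a made-up name, there is no resizing or deletion, and the hash is treated as unsigned 32-bit for the perturbation.

```python
class TinyDict:
    """Fixed-size open-addressing hash table (sketch: no resize, no delete)."""

    def __init__(self, size=32):
        assert size & (size - 1) == 0, "size must be a power of two"
        self.mask = size - 1
        self.table = [None] * size   # each slot: None or (hash, key, value)
        self.used = 0

    def _find_slot(self, key, h):
        """Return the slot holding key, or the empty slot where it belongs."""
        perturb = h & 0xFFFFFFFF     # assumption: unsigned 32-bit hash
        slot = h & self.mask
        while True:
            entry = self.table[slot]
            if entry is None or (entry[0] == h and entry[1] == key):
                return slot
            slot = (5 * slot + 1 + perturb) & self.mask
            perturb >>= 5

    def __setitem__(self, key, value):
        h = hash(key)
        slot = self._find_slot(key, h)
        if self.table[slot] is None:
            # Always keep one slot empty so probing is guaranteed to stop.
            if self.used == self.mask:
                raise RuntimeError("table is full (this sketch never resizes)")
            self.used += 1
        self.table[slot] = (h, key, value)

    def __getitem__(self, key):
        entry = self.table[self._find_slot(key, hash(key))]
        if entry is None:
            raise KeyError(key)
        return entry[2]


d = TinyDict()
for i, key in enumerate(["aa", "bb", "cc", "dd"], start=1):
    d[key] = i
d["bb"] = 20                         # overwrite, not a new entry
print(d["bb"], d["dd"], d.used)      # 20 4 4
```

Storing the hash next to the key lets a lookup skip the (possibly expensive) key comparison whenever the hashes differ, and it is also what a resize would need in order to re-insert entries without rehashing.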
Update after class: jimhash.c works.