March 12
Distribution Counting Sort
7.1 #3 : sort {b,c,d,c,b,a,a,b} given that only {a,b,c,d} are the possible values.
The first pass through the list computes the frequency of each value, and from the frequencies a "distribution index" for each value, i.e., the position where the next element with that value should go.
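A minimal sketch of the idea, assuming the possible values are known and given in sorted order. The function name distribution_counting_sort is hypothetical. This version keeps, for each value, the position where the next element of that value goes and scans the input left to right; the textbook version may instead store the last position for each value and scan right to left, which gives the same sorted result.

```python
def distribution_counting_sort(items, values):
    """Sort items whose possible values are exactly `values` (given in sorted order)."""
    index = {v: i for i, v in enumerate(values)}  # value -> slot in the count arrays

    # First pass: frequency of each value.
    freq = [0] * len(values)
    for x in items:
        freq[index[x]] += 1

    # Distribution index: where the next element of each value should go.
    dist = [0] * len(values)
    for i in range(1, len(values)):
        dist[i] = dist[i - 1] + freq[i - 1]

    # Second pass: drop each element into its slot, then advance that slot.
    result = [None] * len(items)
    for x in items:
        k = index[x]
        result[dist[k]] = x
        dist[k] += 1
    return result


print(distribution_counting_sort(['b', 'c', 'd', 'c', 'b', 'a', 'a', 'b'], ['a', 'b', 'c', 'd']))
# ['a', 'a', 'b', 'b', 'b', 'c', 'c', 'd']
```

For 7.1 #3 the frequencies are a:2, b:3, c:2, d:1, so the starting distribution indices are a:0, b:2, c:5, d:7. No element is ever compared with another element.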
Note problem 7.1 #9 : store Tic-Tac-Toe "answers" ahead of time.
String matching: Horspool and Boyer-Moore
We're trying to match a short string (for example "BARBER") within a longer string
(for example "RED RUBBER PLANTS ARE BOUNCY"). As with brute-force matching, we slide the short string rightward along the long one. For example, the first attempt at a fit is
RED RUBBER PLANTS ARE BOUNCY
BARBER
However, this time we don't compare the leftmost characters ("R" above and "B" below); instead we compare the rightmost ones ("U" above and "R" below). Why? If we pre-analyze BARBER, we know that it contains no U at all, so no alignment that places any letter of BARBER under this U can match. We can therefore shift BARBER far enough to put its first "B" past this U (a shift of 6, the full length of BARBER). The next alignment we should check is
RED RUBBER PLANTS ARE BOUNCY
      BARBER
The idea is to construct a table that says how far we can shift the bottom string, based only on analyzing its letters and positions.
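As a sketch of such a table, assuming the Horspool rule: a character that does not occur among the first m-1 letters of the pattern gets a shift of m (the pattern length), and otherwise the shift is the distance from its rightmost occurrence among those first m-1 letters to the last letter. The names horspool_shift_table and horspool_shift are hypothetical.

```python
def horspool_shift_table(pattern):
    """Map each character in pattern[:-1] to its Horspool shift."""
    m = len(pattern)
    table = {}
    for j in range(m - 1):  # the last letter is deliberately skipped
        # A later occurrence overwrites an earlier one, so the rightmost one wins.
        table[pattern[j]] = m - 1 - j
    return table


def horspool_shift(table, c, m):
    """Shift for text character c; characters not in the table shift by m."""
    return table.get(c, m)


table = horspool_shift_table("BARBER")
print(table)                          # {'B': 2, 'A': 4, 'R': 3, 'E': 1}
print(horspool_shift(table, "U", 6))  # 6
```

The shift of 6 for "U" is exactly the jump from the first alignment to the second one above.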
The details can get tricky.
Do 7.2 #2 in class to make this clear.
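Putting the table to work, here is a sketch of Horspool's algorithm only (Boyer-Moore's extra good-suffix table is not shown). At each alignment it compares right to left, and on a mismatch (or a full match it chooses not to report) it shifts by the table entry for the text character under the pattern's last letter. The function name horspool is hypothetical; it returns the index of the first match, or -1 if there is none.

```python
def horspool(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1."""
    m, n = len(pattern), len(text)
    if m == 0:
        return 0

    # Shift table: rightmost occurrence among the first m-1 letters; default m.
    table = {}
    for j in range(m - 1):
        table[pattern[j]] = m - 1 - j

    i = m - 1  # text index aligned with the pattern's last letter
    while i < n:
        k = 0  # number of letters matched, counting from the right
        while k < m and pattern[m - 1 - k] == text[i - k]:
            k += 1
        if k == m:
            return i - m + 1
        # Shift based on the text character under the pattern's last letter.
        i += table.get(text[i], m)
    return -1


print(horspool("RED RUBBER PLANTS ARE BOUNCY", "BARBER"))  # -1
print(horspool("JIM_SAW_ME_IN_A_BARBERSHOP", "BARBER"))    # 16
```

On the RUBBER example the text characters under the last letter are U, P, space and O, none of which occur in BARBER, so every shift is the full 6.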
Hashing
I started discussing this last time in class.
Do 7.3 #2 in class.
B-trees
Like 2-3 trees, but with many more children allowed per node. All data goes at the bottom, in the leaves. Nodes above the leaves are "index" nodes, whose keys divide up the key ranges of the subtrees below them.
B-trees are designed to minimize the number of node accesses (each node is stored on disk, so an access is slow), not the number of key comparisons (done in memory, so fast).
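A minimal in-memory sketch of searching such a tree, assuming the "data only in leaves" layout described above and assuming that each index key is the smallest key in the subtree to its right. The Node class, the search function and the hand-built tree are hypothetical; a real B-tree would read each node from disk, which is what the access counter stands in for.

```python
from bisect import bisect_right


class Node:
    def __init__(self, keys, children=None):
        self.keys = keys          # sorted keys (index keys or leaf data)
        self.children = children  # None for a leaf


def search(root, key):
    """Return (found, number of nodes accessed)."""
    node, accesses = root, 0
    while True:
        accesses += 1  # each node visit models one (slow) disk read
        if node.children is None:
            # Comparisons inside a node are in memory, hence cheap.
            return key in node.keys, accesses
        # Child i holds keys in [keys[i-1], keys[i]).
        node = node.children[bisect_right(node.keys, key)]


# Hand-built tree: one root, two index nodes, six leaves.
leaves = [Node([2, 4, 6]), Node([8, 10, 12]), Node([14, 16, 18]),
          Node([20, 22, 24]), Node([26, 28, 30]), Node([32, 34, 36])]
left = Node([8, 14], leaves[0:3])
right = Node([26, 32], leaves[3:6])
root = Node([20], [left, right])

print(search(root, 22))  # (True, 3)
print(search(root, 5))   # (False, 3)
```

Every search touches exactly one node per level, so the number of disk accesses equals the height of the tree; allowing many children per node keeps that height small.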