March 28
news
midterm projects
Finish this week; next assignment due a week from today (will be posted by Thu).
review
Open discussion of what we've done so far.
- learned basic C syntax and ideas : pointers, memory allocation
- looked at data structures : linked lists, stacks, queues, trees, graphs (various implementations), heap (as priority queue)
- discussed analysis of efficiency of algorithms : O() notation
- big idea : O(1) << O(log n) << O(n^k) (polynomial) << O(2^n) (exponential)
- saw several examples of ideas behind algorithms : brute force, divide and conquer, greedy
- coded and worked through several classics : sorting (mergesort, quicksort, radix sort), FFT, several graph algorithms (graph analysis, tree searching, minimum distances, minimum spanning tree)
possible next topics
taste of compression algorithms : Huffman and LZW encoding
a common data structure : hash / dictionary / (key,value) storage with O(1) average-case lookup. How does that work?
Runge-Kutta type approach to differential equations
"hard" NP problems : approximate answers
Huffman coding
Compression algorithms are all about taking something
(a file, a multimedia stream, ...) and making it smaller.
There are two big categories: lossless (i.e. don't lose any data)
and lossy (i.e. throw away 'unimportant' stuff). Exercise 1:
give some examples of each that you use regularly. Exercise 2:
how big a factor is the compression? An ideal algorithm
has (a) a large compression factor and (b) runs fast.
Typically there is some tradeoff between these two things.
And many real applications/protocols use several types
of compression sequentially.
Huffman basic idea:
- Start with frequency table of symbols in your alphabet
- Invent a "code", giving each letter a bitstring, which makes the encoded message as short as possible.
- Symbols which are used more often (like "e") get a smaller bitstring encoding.
- Symbols that are used rarely (like "q") get a longer one.
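The first step, building the frequency table, can be sketched in C. This is a minimal sketch, assuming the message is a NUL-terminated string and the "alphabet" is all 256 byte values; the function name count_frequencies is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Step 1 of Huffman: count how often each byte value occurs.
   Assumes text is a NUL-terminated string of 8-bit chars, so the
   alphabet is the 256 possible byte values. */
void count_frequencies(const char *text, unsigned long freq[256])
{
    assert(text != NULL);
    for (int i = 0; i < 256; i++)
        freq[i] = 0;
    /* cast to unsigned char so bytes >= 128 index the table safely */
    for (const unsigned char *p = (const unsigned char *)text; *p != '\0'; p++)
        freq[*p]++;
}
```

For "abracadabra" this gives a:5, b:2, r:2, c:1, d:1, and 0 for every other byte.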
Morse code uses a similar notion: "e" is dot, "z" is "dash dash dot dot".
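Huffman (1952) is provably optimal for this sort of trick: among
codes that give each symbol its own bitstring, it minimizes the
total encoded length for the given frequencies. It produces a
"prefix code" (or prefix-free code: no codeword is the start of
another), which doesn't need any special markers between
characters, even though the codewords have different lengths(!)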
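Why no markers are needed can be seen by decoding with a small prefix code. The 4-symbol code below (a=0, b=10, c=110, d=111) is a hypothetical example, not one derived from any particular text, and prefix_decode is an illustrative name. Because no codeword is a prefix of another, at each position at most one codeword can match, so the decoder can commit to it immediately.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical prefix-free code: no codeword is a prefix of another. */
static const char  symbols[]   = {'a', 'b', 'c', 'd'};
static const char *codewords[] = {"0", "10", "110", "111"};
enum { NSYM = 4 };

/* Decode a string of '0'/'1' characters into out, which must hold at
   least strlen(bits) + 1 chars. Returns the number of symbols decoded,
   or -1 if the bits are not a valid sequence of codewords. */
int prefix_decode(const char *bits, char *out)
{
    int n = 0;
    size_t pos = 0, len = strlen(bits);
    while (pos < len) {
        int matched = 0;
        for (int s = 0; s < NSYM; s++) {
            size_t cl = strlen(codewords[s]);
            if (pos + cl <= len && strncmp(bits + pos, codewords[s], cl) == 0) {
                out[n++] = symbols[s];
                pos += cl;
                matched = 1;
                break; /* prefix-free: no other codeword can also match here */
            }
        }
        if (!matched)
            return -1;
    }
    out[n] = '\0';
    return n;
}
```

For example, "0100110111" splits uniquely as 0|10|0|110|111 and decodes to "abacd".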
Walk through an example of the code generation algorithm.
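The code generation algorithm is greedy: repeatedly merge the two lightest trees into one until a single tree remains; each symbol's code length is the depth of its leaf. The sketch below is one way to write it, assuming byte-valued symbols; the names HuffTree, pop_min and huffman_code_lengths are hypothetical. To keep it short it finds the minimum by linear scan (O(k^2) for k symbols); using a heap as the priority queue would make it O(k log k). It computes code lengths only; the actual bitstrings come from labeling the two branches of each merge 0 and 1 while walking the tree.

```c
#include <assert.h>

#define MAXSYM  256
#define MAXNODE (2 * MAXSYM - 1)   /* a tree with k leaves has 2k-1 nodes */

/* Tree stored in arrays: leaves first, internal nodes appended as created. */
typedef struct {
    unsigned long weight[MAXNODE];
    int parent[MAXNODE];   /* -1 for the current roots */
    int alive[MAXNODE];    /* 1 if the node is still waiting to be merged */
    int count;
} HuffTree;

/* Remove and return the alive node of smallest weight (-1 if none). */
static int pop_min(HuffTree *t)
{
    int best = -1;
    for (int i = 0; i < t->count; i++)
        if (t->alive[i] && (best < 0 || t->weight[i] < t->weight[best]))
            best = i;
    if (best >= 0)
        t->alive[best] = 0;
    return best;
}

/* Fill len[c] with the Huffman code length (in bits) of each byte c with
   freq[c] > 0; unused bytes get 0. Returns total encoded length in bits. */
unsigned long huffman_code_lengths(const unsigned long freq[256], int len[256])
{
    HuffTree t;
    int leaf_of[256];
    t.count = 0;
    for (int c = 0; c < 256; c++) {
        len[c] = 0;
        leaf_of[c] = -1;
        if (freq[c] > 0) {
            leaf_of[c] = t.count;
            t.weight[t.count] = freq[c];
            t.parent[t.count] = -1;
            t.alive[t.count] = 1;
            t.count++;
        }
    }
    if (t.count == 0)
        return 0;
    /* Greedy step: merge the two lightest trees until one tree remains. */
    for (int remaining = t.count; remaining > 1; remaining--) {
        int a = pop_min(&t);
        int b = pop_min(&t);
        assert(a >= 0 && b >= 0);
        int n = t.count++;
        t.weight[n] = t.weight[a] + t.weight[b];
        t.parent[n] = -1;
        t.alive[n] = 1;
        t.parent[a] = t.parent[b] = n;
    }
    /* Code length = depth of the leaf; a lone symbol still needs 1 bit. */
    unsigned long total = 0;
    for (int c = 0; c < 256; c++) {
        if (leaf_of[c] < 0)
            continue;
        int depth = 0;
        for (int v = leaf_of[c]; t.parent[v] >= 0; v = t.parent[v])
            depth++;
        len[c] = depth > 0 ? depth : 1;
        total += freq[c] * (unsigned long)len[c];
    }
    return total;
}
```

With the frequencies of "abracadabra" (a:5, b:2, r:2, c:1, d:1) this gives "a" a 1-bit code and reports a total of 23 bits. How ties are broken can change individual lengths but not the total.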
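This is called "fixed to variable" encoding: each input symbol has a fixed size (e.g. one 8-bit character), but it is mapped to a codeword with a variable number of bits.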
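A worked example of the savings (our own illustration, not from the lecture): "abracadabra" has 11 characters with frequencies a:5, b:2, r:2, c:1, d:1. One Huffman tree (ties broken one way) gives "a" a 1-bit codeword and the other four symbols 3-bit codewords. A fixed-length code for 5 symbols needs 3 bits per symbol, and plain ASCII uses 8.

```latex
\begin{aligned}
\text{Huffman: } & \sum_i f_i \ell_i = 5\cdot 1 + 2\cdot 3 + 2\cdot 3 + 1\cdot 3 + 1\cdot 3 = 23 \text{ bits} \\
\text{fixed 3-bit code: } & 11 \cdot 3 = 33 \text{ bits} \\
\text{8-bit ASCII: } & 11 \cdot 8 = 88 \text{ bits}
\end{aligned}
```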
Resources: