March 28

news

http://research.microsoft.com/en-us/projects/trinity/

midterm projects

Finish this week; next assignment due a week from today (will be posted by Thu).

review

Open discussion of what we've done so far.

learned basic C syntax and ideas : pointers, memory allocation
looked at data structures : linked lists, stacks, queues, trees, graphs (various implementations), heaps (as priority queues)
discussed analysis of efficiency of algorithms : O() notation
- big idea : O(1) << O(log n) << O(poly(n)) << O(exp(n))
saw several examples of ideas behind algorithms : brute force, divide and conquer, greedy
coded and worked through several classics : sorting (mergesort, quicksort, radix sort), FFT, several graph algorithms (graph analysis, tree searching, minimum distances, minimum spanning tree)
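
To make the "big idea" ordering concrete, here is a small sketch that tabulates one representative of each class as n grows. The function name growth_table and the choice of n^2 and 2^n as the "poly" and "exp" examples are just assumptions for illustration.

```python
import math

def growth_table(sizes):
    """Return rows of (n, 1, log2 n, n^2, 2^n) for each input size n."""
    rows = []
    for n in sizes:
        rows.append((n, 1, math.log2(n), n ** 2, 2 ** n))
    return rows

for n, const, log, poly, exp in growth_table([10, 20, 40, 80]):
    print(f"n={n:3d}  O(1)={const}  O(log n)={log:5.2f}  O(n^2)={poly:5d}  O(2^n)={exp}")
```

Doubling n adds only 1 to the log column, multiplies the n^2 column by 4, and squares the 2^n column.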

possible next topics

taste of compression algorithms : Huffman and LZW encoding

a common data structure : hash / dictionary / (key,value) storage with O(1) lookup on average. How does that work?

Runge-Kutta type approach to differential equations

"hard" NP problems : approximate answers

Huffman coding

Compression algorithms are all about taking something (a file, a multimedia stream, ...) and making it smaller. There are two big categories: lossless (i.e. the original can be recovered exactly) and lossy (i.e. throw away 'unimportant' stuff). Exercise 1: give some examples of each that you use regularly. Exercise 2: how big a factor is the compression? An ideal algorithm (a) has a large compression factor, and (b) runs fast. Typically there is some tradeoff between these two things, and many real applications/protocols apply several types of compression in sequence.
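
One rough way to poke at Exercise 2 for the lossless case is to measure the factor directly. The sketch below uses Python's standard zlib module; the helper name compression_factor and the sample text are made up for illustration, and real files will give very different numbers.

```python
import zlib

def compression_factor(data: bytes, level: int = 6) -> float:
    """Original size divided by compressed size (bigger is better)."""
    compressed = zlib.compress(data, level)
    # Lossless: decompressing must give back exactly the original bytes.
    assert zlib.decompress(compressed) == data
    return len(data) / len(compressed)

# Made-up, highly repetitive sample; real data usually compresses much less.
sample = b"the quick brown fox jumps over the lazy dog. " * 200
for level in (1, 6, 9):
    print(level, round(compression_factor(sample, level), 1))
```

The level argument is zlib's own knob for the tradeoff mentioned above: lower levels aim for speed, higher levels for a smaller result.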

Huffman basic idea:

Start with frequency table of symbols in your alphabet
Invent a "code", giving each letter a bitstring, which makes the encoded message as short as possible.
Symbols which are used more often (like "e") get a shorter bitstring encoding.
Symbols which are used rarely (like "q") get a longer one.
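
A quick sketch of why that assignment pays off: count the total bits for the same frequency table under two codes that differ only in which symbol gets the short bitstring. The counts and both codes here are invented for illustration, not taken from real English text.

```python
def total_bits(frequencies, code):
    """Length in bits of a message with these symbol counts under this code."""
    return sum(count * len(code[symbol]) for symbol, count in frequencies.items())

frequencies = {"e": 12, "t": 9, "q": 1}              # made-up counts
short_for_common = {"e": "0", "t": "10", "q": "11"}  # common "e" gets 1 bit
short_for_rare = {"q": "0", "t": "10", "e": "11"}    # rare "q" gets 1 bit

print(total_bits(frequencies, short_for_common))
print(total_bits(frequencies, short_for_rare))
```

With these numbers the first code needs 32 bits and the second needs 43, for exactly the same message.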

Morse code uses a similar notion: "e" is dot, "z" is "dash dash dot dot".

Huffman (1952) is provably the best way to do this sort of trick when each symbol gets its own codeword of whole bits; it gives a "prefix code" (or prefix-free code), in which no codeword is the start of another, so it doesn't need any special markers between characters, even though they're made of bitstrings of different lengths(!)
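
Here is a minimal sketch of what "prefix-free" buys you. The code table is a hypothetical one (not generated by Huffman's algorithm here), and the Morse fragment is included only to show a code that is not prefix-free, which is why Morse needs pauses between letters.

```python
def is_prefix_free(code):
    """True if no codeword is a prefix of a different codeword."""
    words = list(code.values())
    return not any(a != b and b.startswith(a) for a in words for b in words)

def decode(bits, code):
    """Decode a bitstring with no separators, reading left to right."""
    lookup = {word: symbol for symbol, word in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:        # a complete codeword: emit it, start over
            out.append(lookup[current])
            current = ""
    if current:
        raise ValueError("leftover bits do not form a codeword")
    return "".join(out)

code = {"a": "0", "b": "10", "c": "110", "d": "111"}   # hypothetical prefix code
morse = {"e": ".", "t": "-", "a": ".-"}                 # ".-" is "a" or "et"?

print(is_prefix_free(code), is_prefix_free(morse))
print(decode("0101100111", code))
```

Because no codeword starts another, the decoder can emit a symbol the moment the bits read so far match a codeword; it never has to look ahead or back up.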

Walk through an example of the code generation algorithm.
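
For reference alongside the in-class walk-through, here is one possible sketch of the code generation step, using a heap as the priority queue. The function name huffman_code, the tie-breaking counter, and the sample message are choices made for this sketch; it assumes at least one symbol, and other tie-breaking rules give different but equally short codes.

```python
import heapq
from collections import Counter
from itertools import count

def huffman_code(frequencies):
    """Build a Huffman code {symbol: bitstring} from {symbol: frequency}."""
    if len(frequencies) == 1:                 # degenerate one-symbol alphabet
        return {symbol: "0" for symbol in frequencies}
    tiebreak = count()                        # keeps heap entries comparable on ties
    # Each heap entry: (total frequency, tiebreak, {symbol: code so far})
    heap = [(f, next(tiebreak), {s: ""}) for s, f in frequencies.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Greedy step: merge the two least frequent subtrees.
        f0, _, left = heapq.heappop(heap)
        f1, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f0 + f1, next(tiebreak), merged))
    return heap[0][2]

message = "abracadabra"                       # arbitrary sample message
code = huffman_code(Counter(message))
encoded = "".join(code[s] for s in message)
print(code)
print(len(encoded), "bits, versus", 8 * len(message), "bits as 8-bit characters")
```

Each merge prepends one bit to every symbol in the two merged subtrees, so symbols merged early (the rare ones) end up with the longest codes. For this message the encoded length comes to 23 bits.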

This is called "fixed to variable" encoding: each fixed-size input symbol maps to a codeword of variable length. Explain.
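
A small sketch of the contrast, assuming a hypothetical variable-length code for the letters of "abracadabra" and, as the baseline, a fixed-width code that spends the same number of bits on every symbol.

```python
import math

code = {"a": "0", "b": "10", "r": "110", "c": "1110", "d": "1111"}  # hypothetical
message = "abracadabra"

# Fixed-to-fixed: every symbol gets the same number of bits.
fixed_width = math.ceil(math.log2(len(code)))      # 3 bits covers 5 symbols
fixed_bits = fixed_width * len(message)

# Fixed-to-variable: one input symbol at a time, codewords of differing length.
pieces = [code[s] for s in message]
variable_bits = sum(len(p) for p in pieces)

print(" ".join(pieces))
print(fixed_bits, "bits fixed-width versus", variable_bits, "bits variable-width")
```

The input is always consumed one symbol at a time (the "fixed" side); only the output pieces vary in length.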

Resources:

my 2001 links
textbook pg 153 (part of "greedy" chapter)
wikipedia: huffman coding
many others found via a google search

http://cs.marlboro.edu/courses/spring2011/algorithms/notes/March_28
last modified Tuesday March 29 2011 12:28 am EDT
