Jan 31
Continue discussion of Huffman coding:
How would you implement the Huffman algorithm?
What is the right data structure(s)?
If you wanted to do this with actual bits (not strings of '0' and '1'),
what coding techniques would you need?
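One possible answer to the implementation questions, as a minimal sketch rather than the definitive version: a priority queue (Python's heapq) holds the partial trees, and the two lightest are merged until one tree remains. The names huffman_code and pack_bits, the ("leaf", ...) / ("node", ...) tuple layout, and the zero-padding of the last byte are all assumptions made for this illustration. The "with the bits" part comes down to shifts and ORs on an accumulator.

```python
import heapq
from itertools import count

def huffman_code(weights):
    """Map each symbol in weights (symbol -> count or probability) to a bit string."""
    order = count()  # unique tie-breaker so the heap never compares trees
    # A tree is ("leaf", symbol) or ("node", left, right).
    heap = [(w, next(order), ("leaf", s)) for s, w in weights.items()]
    heapq.heapify(heap)
    if len(heap) == 1:  # degenerate one-symbol alphabet still needs one bit
        return {heap[0][2][1]: "0"}
    while len(heap) > 1:
        # Merge the two lightest trees into one node.
        w1, _, t1 = heapq.heappop(heap)
        w2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(order), ("node", t1, t2)))
    # Walk the finished tree: left edge appends "0", right edge appends "1".
    codes = {}
    stack = [(heap[0][2], "")]
    while stack:
        tree, prefix = stack.pop()
        if tree[0] == "leaf":
            codes[tree[1]] = prefix
        else:
            stack.append((tree[1], prefix + "0"))
            stack.append((tree[2], prefix + "1"))
    return codes

def pack_bits(bitstring):
    """Pack a string of '0'/'1' into bytes, zero-padding the last byte."""
    out = bytearray()
    acc, nbits = 0, 0
    for b in bitstring:
        acc = (acc << 1) | int(b == "1")  # shift in one bit at a time
        nbits += 1
        if nbits == 8:
            out.append(acc)
            acc, nbits = 0, 0
    if nbits:
        out.append(acc << (8 - nbits))  # left-align the leftover bits
    return bytes(out)

weights = {"a": 5, "b": 2, "c": 1, "d": 1}
codes = huffman_code(weights)            # code lengths: a=1, b=2, c=3, d=3
bits = "".join(codes[s] for s in "abacad")
packed = pack_bits(bits)                 # 11 bits, so 2 bytes
```

Because of the padding, a real decoder would also need the count of valid bits (or an end-of-message symbol), plus the code table itself.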
How much compression do you get with a Huffman code?
Go over "average bits per symbol" and its implications.
How many (fixed length) bits do you need to encode a set of symbols without using Huffman coding?
Invent an example and work through it.
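Here is one invented example, worked as a small sketch. The four-symbol alphabet and its probabilities are made up for illustration, and huffman_lengths is a hypothetical helper that tracks only code lengths (each merge adds one bit to every symbol in the two merged groups). It compares the fixed-length cost, the Huffman average bits per symbol (sum of p times code length), and the entropy.

```python
import heapq
import math

def huffman_lengths(probs):
    """Return {symbol: code length in bits} for a Huffman code on probs."""
    # Heap entries are (probability, unique index, symbols in this subtree).
    heap = [(p, i, [s]) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    depth = dict.fromkeys(probs, 0)
    next_index = len(heap)
    while len(heap) > 1:
        p1, _, g1 = heapq.heappop(heap)
        p2, _, g2 = heapq.heappop(heap)
        for s in g1 + g2:
            depth[s] += 1  # every symbol under the new node gains one bit
        heapq.heappush(heap, (p1 + p2, next_index, g1 + g2))
        next_index += 1
    return depth

# Invented example: probabilities are exact powers of 1/2.
probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}

# Fixed-length code: smallest k with 2**k >= number of symbols.
fixed_bits = (len(probs) - 1).bit_length()                   # 2 bits

lengths = huffman_lengths(probs)                             # a=1, b=2, c=3, d=3
avg_bits = sum(p * lengths[s] for s, p in probs.items())     # 1.75 bits/symbol
entropy = -sum(p * math.log2(p) for p in probs.values())     # 1.75 bits/symbol
```

In this example Huffman saves 0.25 bits per symbol over the fixed 2-bit code and matches the entropy exactly, which happens only because every probability is a power of 1/2. In general the Huffman average lies between the entropy and the entropy plus one bit.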
Can you use Huffman coding for a stream whose symbols have conditional probabilities, i.e. P(a|b) different from P(a)?
What are the issues to consider?
answers:
- Huffman is a clean, short idea, usually used for P(a) models. So you could just ignore the conditional structure and code with P(a) ...
- One way to get more compression with Huffman is to change what a "symbol" is. Longer symbols let you get better statistics, and give more options for fitting the power-of-two Huffman binary tree. (Discussed in chapter 4 in the text.)
- I haven't seen this discussed (so maybe it's a bad idea), but you could construct a different Huffman encoding for each x in P(a|x). Then you could switch codes after each symbol, choosing the code that matches the symbol x just seen. Of course, this would mean a *big* symbol table to send (one code per x) ... which is perhaps why it's a bad idea.
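To make the "change what a symbol is" idea concrete, here is a sketch with invented numbers: a two-symbol source with P(a) = 0.9 and P(b) = 0.1, assumed independent from one symbol to the next. The helpers huffman_lengths and avg_bits are hypothetical names for this illustration. Coding single symbols cannot go below 1 bit each, while coding pairs can.

```python
import heapq
import math
from itertools import product

def huffman_lengths(probs):
    """Return {symbol: code length in bits} for a Huffman code on probs."""
    # Heap entries are (probability, unique index, symbols in this subtree).
    heap = [(p, i, [s]) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    depth = dict.fromkeys(probs, 0)
    next_index = len(heap)
    while len(heap) > 1:
        p1, _, g1 = heapq.heappop(heap)
        p2, _, g2 = heapq.heappop(heap)
        for s in g1 + g2:
            depth[s] += 1  # every symbol under the new node gains one bit
        heapq.heappush(heap, (p1 + p2, next_index, g1 + g2))
        next_index += 1
    return depth

def avg_bits(probs):
    """Average Huffman code length: sum of p * length over all symbols."""
    lengths = huffman_lengths(probs)
    return sum(p * lengths[s] for s, p in probs.items())

# Invented source; successive symbols are assumed independent.
single = {"a": 0.9, "b": 0.1}

# New "symbols" are pairs, with P(xy) = P(x) * P(y) under that assumption.
pairs = {x + y: single[x] * single[y] for x, y in product(single, repeat=2)}

bits_single = avg_bits(single)       # 1.0 bit per symbol
bits_pairs = avg_bits(pairs) / 2     # about 0.645 bits per original symbol
entropy = -sum(p * math.log2(p) for p in single.values())  # about 0.469
```

The pair code gets closer to the entropy, but the table grows from 2 entries to 4, and in general to n**k entries for blocks of k symbols from an alphabet of n. The per-context scheme faces a similar cost: n separate codes of n entries each.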
Huffman is not the only game in town - I think that if you're going to consider a complicated variation, it's likely that other compression algorithms would do better.
Coming next: chapter 4 material and related topics:
- LZW compression
- arithmetic coding