By Bluefish@11a.nu.
The Huffman algorithms are based around the idea that a datastream which is to be compressed contains a high frequence of certain character, while others are not so common. A character which is common is assigned a short path in a tree, while an uncommon character is assigned a longer path.
An interresting fact is that the wildly known and used Morse-codes are based upon very similar ideas as the Huffman algorithms. (this paragraph given to aid understanding Huffman if you allready are familiar with Morse, if you aren't simply skip to the next paragraph)
Consider the message "ABBACCAAAAA". We find that we have seven "A", two "B" and two "C". As this message is so simple, we can design a suitable tree easily:
+ / \ A + / \ B C |
Thus we have that the path to the leaf "A" is "0", "B" is "10" and "C" is "11" (I'm using "0" for left, "1" for right). Thus "ABBACCAAAAA", eleven characters can be represented using the tree by "010100111100000" which is only 15 bits. If each character previously was stored in 8 bits, this example has given us an compression ratio of 83% ! |
As seen the example, how to use the trees to compress a known data is quite simple. And, when creating small trees, you can find a good tree by simply guessing. However, what happens if you are i.e working on compressing a chineese book, and there are thousands of different characters used? Obviously, an algorithmic solution to find a good tree is needed. David A Huffman developed a algorithm which is rather simple for this purpose.
In the algorithm, Huffman defines the concept of weight. A leaf has the same weight as the number of such characters to be compressed (thus, if there are 7 "A", "A" has the weight of 7. For a tree, the 'weight' is sum of all the leaves' weight. The Huffman algorithm can then be described in the following way:
while not all trees are connected { take the tree with the least weight, and combine it the tree with second least weight. }
3A 4B 11C 23D 37E |
We begin with A (weight 3), B (weight 4), and so on... |
7 11C 23D 37E / \ A B |
We then connect the two trees with least weights (3 and 4). |
18 23D 37E / \ 7 C / \ A B |
We then connect the two trees with least weights (7 and 11). |
41 37E / \ 18 D / \ 7 C / \ A B |
We then connect the two trees with least weights (18 and 23). |
78 / \ 41 E / \ 18 D / \ 7 C / \ A B |
We then connect the two trees with least weights (37 and 41). Tada! The tree is now finnished! |
3A 4B 11C 16D 17E |
We begin with A (weight 3), B (weight 4), and so on... |
7 11C 16D 17E / \ A B |
We then connect the two trees with least weights (3 and 4). |
18 16D 17E / \ 7 C / \ A B |
We then connect the two trees with least weights (7 and 11). |
18 33 / \ / \ 7 C D E / \ A B |
We then connect the two trees with least weights (16 and 17). |
51 / \ 18 33 / \ / \ 7 C D E / \ A B |
We then connect the two trees with least weights (18 and 33). Tada! The tree is now finnished! |