== Looking at Huffman and LZW utilities from michael.dipperstein.com ==

 $ ls -l moby_dick.txt
 -rw-r--r-- 1 mahoney mahoney 1198687 Nov 1 2006 moby_dick.txt

So the raw moby file has 1,198,687 bytes.

== LZW ==

 $ cd lzw.06
 $ make
 $ ./sample -c -i ../moby_dick.txt -o ../moby_dick.lzw       # encode
 $ ./sample -d -i ../moby_dick.lzw -o ../moby_dick_lzw.txt   # decode
 $ cd ..
 $ diff moby_dick.txt moby_dick_lzw.txt
 $

diff reports no differences, so M.D.'s LZW code does encode/decode successfully.

 $ ls -l moby_dick.lzw
 -rw-r--r-- 1 mahoney mahoney 485830 Apr 4 19:01 moby_dick.lzw

The LZW-compressed file is 485,830 bytes, for a compression ratio (compressed size / original size) of 485830 / 1198687 = 0.405.

== Huffman ==

 $ cd huffman-0.81
 $ make
 $ ./sample -C -c -i ../moby_dick.txt -o ../moby_dick.huff
 $ ./sample -C -d -i ../moby_dick.huff -o ../moby_dick_huff.txt

Look at the code table:

 $ ./sample -t -i ../moby_dick.txt -o ../moby_dick_hufftable.txt
 $ cd ..
 $ diff moby_dick.txt moby_dick_huff.txt
 $

Again, no differences: the Huffman code does encode/decode successfully.

 $ ls -l moby_dick.huff
 -rw-r--r-- 1 mahoney mahoney 676332 Apr 4 21:04 moby_dick.huff

This time the compression ratio is 676332 / 1198687 = 0.5642; not quite as good as LZW, but not at all shabby.

 $ more moby_dick_hufftable.txt
 Char  Count       Encoding
 ----- ----------  ----------------
 0x65  114117      000
 0x68  60487       0010
 0x69  61039       0011
 0x73  61139       0100
 0x6E  63573       0101
 0x6B  7798        0110000
 0x4E  981         0110001000
 0x50  991         0110001001
 0x78  991         0110001010
 0x3F  999         0110001011
 0x3B  4144        01100011
 0x70  15970       011001
 0x79  16321       011010
 0x76  8285        0110110
 ...
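
As a cross-check on the numbers above, here is a minimal Python sketch (not part of M.D.'s utilities) that recomputes both compression ratios from the byte counts that ls reported, and verifies that the code-table rows listed above form a prefix-free set, which is what lets a Huffman-coded bit stream be decoded without separators. The names `compression_ratio`, `is_prefix_free`, and `TABLE_ROWS` are made up for this illustration, and only the rows shown in the excerpt are included, not the full table.

```python
from itertools import combinations

# Byte counts copied from the `ls -l` output in the session above.
ORIGINAL_BYTES = 1_198_687
COMPRESSED_BYTES = {"lzw": 485_830, "huffman": 676_332}


def compression_ratio(compressed: int, original: int) -> float:
    """Compressed size divided by original size (smaller is better)."""
    return compressed / original


ratios = {
    name: compression_ratio(size, ORIGINAL_BYTES)
    for name, size in COMPRESSED_BYTES.items()
}

# (character code, count, encoding) rows copied from the table excerpt.
# The real table continues past these rows; this is only the visible part.
TABLE_ROWS = [
    (0x65, 114117, "000"),
    (0x68, 60487, "0010"),
    (0x69, 61039, "0011"),
    (0x73, 61139, "0100"),
    (0x6E, 63573, "0101"),
    (0x6B, 7798, "0110000"),
    (0x4E, 981, "0110001000"),
    (0x50, 991, "0110001001"),
    (0x78, 991, "0110001010"),
    (0x3F, 999, "0110001011"),
    (0x3B, 4144, "01100011"),
    (0x70, 15970, "011001"),
    (0x79, 16321, "011010"),
    (0x76, 8285, "0110110"),
]


def is_prefix_free(codes: list[str]) -> bool:
    """True if no code in the list is a prefix of another code."""
    return not any(
        a.startswith(b) or b.startswith(a) for a, b in combinations(codes, 2)
    )


if __name__ == "__main__":
    for name, ratio in ratios.items():
        print(f"{name}: {ratio:.4f}")
    codes = [code for _, _, code in TABLE_ROWS]
    print("listed codes prefix-free:", is_prefix_free(codes))
```

The listed rows also show the expected Huffman pattern: the most frequent byte (0x65, 'e') gets the shortest code, while the rare ones (0x4E, 0x50, 0x78, 0x3F) get 10-bit codes.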