April 6
for the good of the order
I propose that, for an end-of-semester project, each of you
do a quasi-real implementation of a (compressed, encoded, transmitted,
garbled, corrected, uncompressed) data flow.
The idea would be to simulate the following data flow:
  I)   start with an ascii text file      original.txt
  II)  compress it to                     compressed.xxx   |
                                                           | prepare
  III) error-correct encode to            encoded.yyy      |
  IV)  simulate transmission errors       transmitted.yyy
       by flipping random bits with
       probability e. (For a really
       good time, you could put in
       burst errors or malicious errors.)
  V)   correct errors                     decoded.xxx      |
                                                           | restore
  VI)  uncompress                         final.txt        |
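Step IV is the easiest piece to prototype. Here is a minimal sketch, assuming the encoded file is handled as raw bytes (not ascii 1's and 0's) and that errors are independent from bit to bit. The names flip_bits and count_bit_errors are made up for illustration; burst or malicious errors would need a different model.

```python
import random

def flip_bits(data: bytes, e: float, seed=None) -> bytes:
    """Return a copy of data with each bit flipped independently with probability e."""
    rng = random.Random(seed)          # seedable, so a run can be repeated
    out = bytearray(data)
    for i in range(len(out)):
        for bit in range(8):
            if rng.random() < e:
                out[i] ^= 1 << bit     # XOR with a one-bit mask flips that bit
    return bytes(out)

def count_bit_errors(a: bytes, b: bytes) -> int:
    """Number of bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))
```

Something like count_bit_errors(encoded, flip_bits(encoded, 0.01)) then tells you how many errors the decoder in step V has to deal with.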
I would prefer a two-error-correcting scheme, but a 1-error-correction
(as we have already seen) would be OK.
All the details would be up to you :
- choice of which compression algorithm
- choice of which error-correction algorithm
- how realistic transmitted file is (i.e. ascii 1's and 0's or bits)
And I'd also like a discussion of the information entropy of these files
and how it changes through the process.
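For the entropy discussion, one simple starting point is the entropy of the byte frequencies in each file. This is only a sketch, assuming a zeroth-order model (each byte treated as an independent symbol); it ignores correlations between neighboring bytes, so it overestimates the true entropy of English text. The function name is made up for illustration.

```python
import math
from collections import Counter

def entropy_bits_per_byte(data: bytes) -> float:
    """Shannon entropy of the byte-frequency distribution, in bits per byte (0 to 8)."""
    if not data:
        return 0.0
    n = len(data)
    counts = Counter(data)             # how many times each byte value occurs
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

Running it on the bytes of original.txt, compressed.xxx, encoded.yyy and so on gives one number per stage to compare.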
We have a month left. Given the discussion on Tuesday in which many
said that you'd seen some crypto before, perhaps we should do only a bit
of that and also talk about TCP/IP (that thing was wrong - send it again)
and take a look at what this stuff looks like in practice (ethernet, ECC memory, etc.)
Thoughts?
new business
The two other error correction codes I'd like to discuss are
- Hamming codes
- CRC (cyclic redundancy checks) codes
The material in chapter 9 of the text is very mathematical,
and while I do want to cover some of those topics, it may
well be helpful to look at other sources too :
Also related are
There are actually a bunch of these things, based on various fancy
math tricks to try to get enough space between the words. The basic
goal is the same in each case: k data bits, n coded bits, and
codewords some minimum hamming distance apart to fix errors.
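To make "minimum hamming distance" concrete, here is a brute-force sketch that checks every pair of codewords. It assumes codewords are given as equal-length strings of 0's and 1's, and the function names are made up for illustration. A code with minimum distance d can correct (d - 1) // 2 errors.

```python
from itertools import combinations

def hamming_distance(a: str, b: str) -> int:
    """Number of positions where two equal-length words differ."""
    if len(a) != len(b):
        raise ValueError("words must have the same length")
    return sum(x != y for x, y in zip(a, b))

def minimum_distance(codewords) -> int:
    """Smallest distance between any two distinct codewords (brute force over pairs)."""
    return min(hamming_distance(a, b) for a, b in combinations(codewords, 2))

code = ["00000", "01011", "10101", "11110"]   # a small linear code with k=2, n=5
d = minimum_distance(code)
print(d, "=> corrects", (d - 1) // 2, "error(s)")
```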
Here are some others :
I am not proposing we do all of these ...
Hamming
Readings :
Specifics :
- invented by Hamming in 1950
- binary linear code (like we've been discussing)
- (n, k, δ) = (bits in error-correcting code, bits of data, min hamming distance between no-error codewords)
- E is (n x k) encoding matrix (Biggs' notation)
- x = (k-bit info word, column vector k x 1)
- y = E x = (n-bit redundant error correcting word, column vector n x 1)
- H is (n-k x n) check matrix
- H y = 0 if y is a legit (no errors) codeword
- Hamming code : H is (m x (2**m - 1)), i.e. its columns are all possible nonzero m-bit columns
- So (n = 2**m - 1)
- And (n - k = m) means (k = n - m = 2**m - 1 - m)
- All (pure) Hamming codes have δ = 3, i.e. 1-error correcting.
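The E and H notation above can be tried out for the m = 3 case. This is a sketch under one particular assumption: a systematic layout where E is the 4 x 4 identity stacked on a 3 x 4 block A, and H is A next to the 3 x 3 identity, so that H E = 0 mod 2. The matrix A and the function names are my choices for illustration; the wikipedia article orders the bits differently.

```python
K, M = 4, 3                  # data bits, check bits
N = K + M                    # codeword length, 7

A = [[1, 1, 0, 1],
     [1, 0, 1, 1],
     [0, 1, 1, 1]]

# E is (n x k): identity on top (data bits pass through), A below (check bits).
E = [[int(i == j) for j in range(K)] for i in range(K)] + A
# H is ((n-k) x n): A followed by the identity. Its 7 columns are all nonzero 3-bit columns.
H = [A[i] + [int(i == j) for j in range(M)] for i in range(M)]

def mat_vec_mod2(mat, vec):
    """Matrix times column vector, with arithmetic mod 2."""
    return [sum(a * b for a, b in zip(row, vec)) % 2 for row in mat]

def encode(x):
    """y = E x : 4 data bits -> 7-bit codeword."""
    return mat_vec_mod2(E, x)

def syndrome(y):
    """H y : all zeros for a legit codeword."""
    return mat_vec_mod2(H, y)

def decode(y):
    """Fix at most one flipped bit, then return the 4 data bits."""
    s = syndrome(y)
    fixed = list(y)
    if any(s):
        columns = [[H[r][c] for r in range(M)] for c in range(N)]
        fixed[columns.index(s)] ^= 1   # the syndrome equals the column of the bad bit
    return fixed[:K]
```

The reason this works: a single flipped bit in position c changes H y from 0 to column c of H, and since all columns are different the syndrome identifies the position.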
The cool part about the Hamming codes is that they are "perfect" -
there are no gaps in the coverage of words: every n-bit word is either
a legit codeword or exactly 1 bit away from exactly one legit codeword.
That also means that they have the highest possible rate
of information for 1-error-correcting codes of that length n.
#   m  n  k
>>> for m in range(10):
...     print(m, 2**m - 1, 2**m - m - 1)
...
0 0 0
1 1 0
2 3 1
3 7 4 # (7, 4) code
4 15 11
5 31 26
6 63 57
7 127 120
8 255 247
9 511 502
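The "perfect" claim can be checked by counting. Each of the 2**k codewords owns itself plus its n one-bit-off neighbors, and for a perfect code those sets exactly fill all 2**n words, so 2**k * (1 + n) == 2**n. The sketch below checks that for each row of the table and also prints the rate k/n; the function names are made up for illustration.

```python
def hamming_parameters(m):
    """(n, k) for the Hamming code with m check bits."""
    n = 2**m - 1
    k = n - m
    return n, k

def is_perfect_single_error(n, k):
    """True if the radius-1 balls around the codewords exactly fill the space of n-bit words."""
    return 2**k * (1 + n) == 2**n

for m in range(2, 10):
    n, k = hamming_parameters(m)
    print(m, n, k, round(k / n, 3), is_perfect_single_error(n, k))
```

The rate k/n climbs toward 1 as m grows, while the code still only fixes one error per block.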
(from wikipedia "Hamming Code" article)
This extended Hamming code is popular in computer memory systems,
where it is known as SECDED ("single error correction, double error detection").
Particularly popular is the (72,64) code, a truncated (127,120) Hamming code
plus an additional parity bit, which has the same space overhead as a (9,8) parity code.
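The SECDED idea is easier to see at small scale. Here is a sketch of an [8,4] extended code, not the (72,64) memory code itself. It assumes the layout where check bits sit at positions 1, 2 and 4, data bits at positions 3, 5, 6 and 7, and the extra overall parity bit at index 0. The function names are made up for illustration.

```python
def encode_secded(data):
    """4 data bits -> 8-bit word; index 0 is overall parity, 1..7 are Hamming positions."""
    code = [0] * 8
    code[3], code[5], code[6], code[7] = data
    for p in (1, 2, 4):
        # check bit p makes the parity of all positions with bit p set come out even
        code[p] = sum(code[i] for i in range(1, 8) if i & p) % 2
    code[0] = sum(code[1:]) % 2        # makes the parity of the whole word even
    return code

def decode_secded(word):
    """Return (status, data). data is None when a double error is detected."""
    syndrome = 0
    for i in range(1, 8):
        if word[i]:
            syndrome ^= i              # XOR of the positions holding a 1
    overall = sum(word) % 2
    word = list(word)
    if overall == 1:
        # odd parity: one flipped bit; syndrome 0 means it was the parity bit itself
        word[syndrome] ^= 1
        status = "corrected"
    elif syndrome != 0:
        # even parity but nonzero syndrome: two bits flipped, cannot be fixed
        return "double error detected", None
    else:
        status = "ok"
    return status, [word[3], word[5], word[6], word[7]]
```

The overall parity bit is what separates the two cases: one flip makes the total parity odd, two flips leave it even but still disturb the syndrome.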
We should work through one of these, either by
setting up a python notebook, or on the board,
or by staring at the wikipedia article.
The homework for Tues asks you to do an encode/decode example.
[7,4]
Discuss the [7,4] code and its [8,4] extension,
as in the wikipedia article. (The "G" matrix
in wikipedia is the "E" matrix in the textbook.
Sometimes it's transposed.)
So what are the colored circles on that wiki page all about ?