Information Theory

Spring 2017

April 6

for the good of the order

I propose that for an end-of-semester project each of you do a quasi-real implementation of a (compressed, encoded, transmitted, garbled, corrected, uncompressed) data flow.
The idea would be to simulate the following data flow
I) start: an ascii text file, original.txt
II) compress it to compressed.xxx (prepare)
III) error-correct encode to encoded.yyy (prepare)
IV) simulate transmission errors, giving transmitted.yyy, by flipping random bits with probability e. (For a really good time, you could put in burst errors or malicious errors.)
V) correct errors, giving decoded.xxx (restore)
VI) uncompress to final.txt (restore)
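Step IV is the easiest one to sketch. Here is a minimal version, assuming the encoded file has been read into a Python bytes object and that errors are independent from bit to bit (no bursts, no malice). The names flip_bits and count_bit_errors are made up for this illustration.

```python
import random


def flip_bits(data: bytes, e: float, rng: random.Random) -> bytes:
    """Return a copy of data with each bit flipped independently with probability e."""
    out = bytearray(data)
    for i in range(len(out)):
        for bit in range(8):
            if rng.random() < e:
                out[i] ^= 1 << bit  # toggle this one bit
    return bytes(out)


def count_bit_errors(a: bytes, b: bytes) -> int:
    """Number of bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


# Small demo: garble a short message at e = 0.01 and count the damage.
rng = random.Random(2017)  # fixed seed so that a run can be repeated
sent = b"an ascii text file"
received = flip_bits(sent, 0.01, rng)
print(count_bit_errors(sent, received), "bit errors out of", 8 * len(sent), "bits")
```

Passing the random generator in explicitly makes it easy to rerun the same "transmission" while debugging the decoder.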
I would prefer a two-error-correcting scheme, but a 1-error-correcting one (as we have already seen) would be OK.
All the details would be up to you :
And I'd also like a discussion of the information entropy of these files and how it changes through the process.
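One possible starting point for the entropy discussion is the sketch below. It treats each byte of a file as an independent symbol and computes the entropy of the byte frequencies, which is only one of several reasonable definitions and ignores correlations between neighboring bytes. The function name entropy_bits_per_byte is invented here, and zlib is just a stand-in for whatever compressor you choose.

```python
import math
import zlib
from collections import Counter


def entropy_bits_per_byte(data: bytes) -> float:
    """Entropy of the byte-frequency distribution, in bits per byte (0 to 8)."""
    if not data:
        return 0.0
    n = len(data)
    counts = Counter(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Compare a repetitive text with its compressed form.
text = b"the quick brown fox jumps over the lazy dog " * 50
packed = zlib.compress(text)
for label, blob in (("original", text), ("compressed", packed)):
    print(label, len(blob), "bytes,", round(entropy_bits_per_byte(blob), 3), "bits/byte")
```

The same function can be pointed at each file in the pipeline (original, compressed, encoded, transmitted, decoded, final) to see where the bits per byte go up or down.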
We have a month left. Given Tuesday's discussion, in which many of you said that you'd seen some crypto before, perhaps we should do only a bit of that and also talk about TCP/IP (that thing was wrong - send it again) and take a look at what this stuff looks like in practice (Ethernet, ECC memory, etc.)
Thoughts?

new business

The two other error correction codes I'd like to discuss are
The material in chapter 9 of the text is very mathematical, and while I do want to cover some of those topics, it may well be helpful to look at other sources too :
Also related are
There are actually a bunch of these things, based on various fancy math tricks to try to get enough space between the words. The basic goal is the same in each case: k data bits, n coded bits, and codewords at least some minimum Hamming distance apart, so that a garbled word is still closest to the codeword that was sent.
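To make "space between the words" concrete, here is a small sketch that measures it by brute force. Codewords are written as strings of 0s and 1s, and the function names are made up for this example. It uses the standard fact that a code with minimum distance d can correct up to (d - 1) // 2 errors.

```python
from itertools import combinations


def hamming_distance(a: str, b: str) -> int:
    """Number of positions in which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("words must have the same length")
    return sum(x != y for x, y in zip(a, b))


def minimum_distance(codewords) -> int:
    """Smallest Hamming distance between any two distinct codewords."""
    return min(hamming_distance(a, b) for a, b in combinations(codewords, 2))


def correctable_errors(d_min: int) -> int:
    """How many bit errors a code with minimum distance d_min can correct."""
    return (d_min - 1) // 2


repetition = ["000", "111"]                # k = 1, n = 3
even_parity = ["000", "011", "101", "110"]  # k = 2, n = 3
for name, code in (("repetition", repetition), ("even parity", even_parity)):
    d = minimum_distance(code)
    print(name, "d_min =", d, "corrects", correctable_errors(d))
```

The triple-repetition code has minimum distance 3 and corrects one error; the parity code has minimum distance 2, so it can detect a single error but not fix it.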
Here are some others :
I am not proposing we do all of these ...

Hamming

Readings :
Specifics :
The cool part about the Hamming codes is that they are "perfect" - there are no gaps in the coverage of words: every possible n-bit word is either a legit codeword or exactly 1 bit flip away from exactly one legit codeword.
That also means that they have the highest possible rate of information (k/n) for a 1-error-correcting code of that length.
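The "no gaps" claim can be checked by counting. With n = 2**m - 1 coded bits and k = n - m data bits, each of the 2**k codewords owns itself plus its n one-bit-off neighbors, and those 2**k * (1 + n) words should add up to exactly all 2**n words. A quick sketch (the function name is invented for this illustration):

```python
def fills_space_exactly(n: int, k: int) -> bool:
    """True if 2**k radius-1 balls (1 + n words each) account for all 2**n words."""
    return 2**k * (1 + n) == 2**n


# Hamming parameters for m = 2 .. 9, with the rate k/n in the last column.
for m in range(2, 10):
    n = 2**m - 1
    k = n - m
    print(m, n, k, fills_space_exactly(n, k), round(k / n, 3))

# An (8, 4) code, by contrast, leaves words that belong to no radius-1 ball.
print(fills_space_exactly(8, 4))
```

Note that the rate k/n creeps toward 1 as m grows, while the code still only fixes one error per block.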
    # m  n  k
    >>> for m in range(10):
    ...     print(m, 2**m - 1, 2**m - m - 1)
    ...
    0 0 0
    1 1 0
    2 3 1
    3 7 4        # (7, 4) code
    4 15 11
    5 31 26
    6 63 57
    7 127 120
    8 255 247
    9 511 502

From the wikipedia "Hamming Code" article, on Hamming codes extended with one extra parity bit:

    This extended Hamming code is popular in computer memory systems, where it is known as SECDED ("single error correction, double error detection"). Particularly popular is the (72,64) code, a truncated (127,120) Hamming code plus an additional parity bit, which has the same space overhead as a (9,8) parity code.
We should work through one of these, either by setting up a python notebook, or on the board, or by staring at the wikipedia article.
The homework for Tues asks you to do an encode/decode example.

[7,4]

Discuss the [7,4] code and its [8,4] extension, as in the wikipedia article. (The "G" matrix in wikipedia is the "E" matrix in the textbook. Sometimes it's transposed.)
So what are the colored circles on that wiki page all about ?
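One way to read the circles, assuming the convention in the wikipedia Hamming(7,4) article: each circle is one parity bit together with the three data bits it covers, and the bits are sent in the order p1 p2 d1 p3 d2 d3 d4. The sketch below uses XOR directly rather than the G and H matrices, so it may not match the textbook's bit ordering. The names encode and decode are made up for this example.

```python
def encode(d):
    """Data bits [d1, d2, d3, d4] -> codeword [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # circle 1: covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # circle 2: covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # circle 3: covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def decode(received):
    """Fix at most one flipped bit; return (data bits, error position or 0)."""
    r = list(received)
    s1 = r[0] ^ r[2] ^ r[4] ^ r[6]  # recheck circle 1
    s2 = r[1] ^ r[2] ^ r[5] ^ r[6]  # recheck circle 2
    s3 = r[3] ^ r[4] ^ r[5] ^ r[6]  # recheck circle 3
    position = s1 + 2 * s2 + 4 * s3  # failed checks spell out the bad position
    if position:
        r[position - 1] ^= 1  # flip the offending bit back
    return [r[2], r[4], r[5], r[6]], position


word = encode([1, 0, 1, 1])
garbled = list(word)
garbled[4] ^= 1  # flip the bit at position 5
print(word, garbled, decode(garbled))
```

Which circles fail their parity check tells you which region of the diagram the flipped bit sits in, and with this bit ordering the three check results read as a binary number giving its position.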
http://cs.marlboro.edu/courses/spring2017/info/notes/April_6
last modified Thursday April 6 2017 9:00 am EDT