April 6

for the good of the order

I propose that, for an end-of-semester project, each of you do a quasi-real implementation of a (compressed, encoded, transmitted, garbled, corrected, uncompressed) data flow.

The idea would be to simulate the following data flow:

 I)   start with an ASCII text file  original.txt
 
 II)  compress it to                 compressed.xxx  |
                                                     | prepare 
 III) error-correct encode to        encoded.yyy     |
 
 IV)  simulate transmission errors   transmitted.yyy
      by flipping random bits with
      probability e. (For a really
      good time, you could put in
      burst errors or malicious errors.)
 
 V)   correct errors                 decoded.xxx     |
                                                     | restore
 VI)  uncompress                     final.txt       |
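Step IV is the easiest piece to pin down in code. Here is a minimal sketch, assuming the "real bits" option (flipping bits in raw bytes rather than in a file of ASCII '1' and '0' characters) and independent errors with probability e, i.e. a binary symmetric channel. The names `flip_bits` and `bit_errors` are hypothetical helpers, and burst or malicious errors are not modeled.

```python
import random

def flip_bits(data: bytes, e: float, seed=None) -> bytes:
    """Simulate transmission: flip each bit of `data` independently with probability e."""
    rng = random.Random(seed)          # seed it to get a repeatable garbling
    out = bytearray(data)
    for i in range(len(out)):
        for bit in range(8):
            if rng.random() < e:
                out[i] ^= 1 << bit     # flip this one bit
    return bytes(out)

def bit_errors(a: bytes, b: bytes) -> int:
    """Count how many bits differ between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

original = b"hello, noisy channel"
garbled = flip_bits(original, e=0.01, seed=1)
print(bit_errors(original, garbled), "bits flipped out of", 8 * len(original))
```

Reading encoded.yyy with `open(..., "rb")`, passing the bytes through `flip_bits`, and writing transmitted.yyy would give step IV.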

I would prefer a 2-error-correcting scheme, but a 1-error-correcting one (as we have already seen) would be OK.

All the details would be up to you :

choice of which compression algorithm
choice of which error-correction algorithm
how realistic the transmitted file is (i.e. ASCII '1' and '0' characters, or actual bits)

And I'd also like a discussion of the information entropy of these files and how it changes through the process.
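One simple starting point for that discussion is a sketch like the following. It assumes a zeroth-order model, treating each byte as an independent draw from the file's own byte frequencies, which ignores correlations between neighboring bytes. The function names are hypothetical.

```python
import math
from collections import Counter

def byte_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte, using the empirical byte distribution of `data`."""
    if not data:
        return 0.0
    counts = Counter(data)             # iterating over bytes yields ints 0..255
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def file_entropy(path: str) -> float:
    """Zeroth-order entropy estimate (bits per byte) of a file's contents."""
    with open(path, "rb") as f:
        return byte_entropy(f.read())
```

Running `file_entropy` on original.txt, compressed.xxx, encoded.yyy and so on would give one number per stage. You would expect the compressed file to sit close to the 8 bits-per-byte maximum, but a zeroth-order estimate is only a rough guide.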

We have a month left. In Tuesday's discussion many of you said you'd seen some crypto before, so perhaps we should do only a bit of that, and also talk about TCP/IP ("that thing was wrong, send it again") and take a look at what this stuff looks like in practice (Ethernet, ECC memory, etc.).

Thoughts?

new business

The two other codes I'd like to discuss are

Hamming codes (error correction)
CRC (cyclic redundancy check) codes (mostly used for error detection)

The material in chapter 9 of the text is very mathematical, and while I do want to cover some of those topics, it may well be helpful to look at other sources too :

http://en.wikipedia.org/wiki/Error_detection_and_correction (overview)
MacKay's book (see resources page) chapters 11 (theory) and 12 (practice)
http://en.wikipedia.org/wiki/Hamming_code
http://en.wikipedia.org/wiki/Cyclic_redundancy_check

There are actually a bunch of related codes, based on various fancy math tricks to get enough space between the codewords. The basic goal is the same in each case: k data bits, n coded bits, and codewords some minimum Hamming distance apart so that errors can be fixed.

Here are some others :

wikipedia: low-density parity-check code
wikipedia: Reed-Solomon error correction
wikipedia: Binary Golay code (NASA Voyager missions!)
wikipedia: Reed-Muller codes
wikipedia: Turbo Code
wikipedia: BCH code
wikipedia: Polynomial code
wikipedia: Cyclic code (e.g. 011 110 101 000)
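To see why that last example counts as a cyclic code, here is a small sketch. It assumes the usual definition: a linear code (closed under bitwise XOR) that is also closed under cyclic shifts of its codewords. The helper names are hypothetical.

```python
def rotations(word: str):
    """All cyclic shifts of a bit string, e.g. '011' -> '011', '110', '101'."""
    return [word[i:] + word[:i] for i in range(len(word))]

def xor(a: str, b: str) -> str:
    """Bitwise sum mod 2 of two equal-length bit strings."""
    return "".join("1" if x != y else "0" for x, y in zip(a, b))

def is_cyclic_code(code) -> bool:
    """Linear (closed under XOR) and closed under cyclic shifts."""
    words = set(code)
    linear = all(xor(a, b) in words for a in words for b in words)
    shift_closed = all(r in words for w in words for r in rotations(w))
    return linear and shift_closed

print(is_cyclic_code(["011", "110", "101", "000"]))   # True
```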

I am not proposing we do all of these ...

Hamming

Readings :

http://en.wikipedia.org/wiki/Hamming_code
beginning of chapter 9 in Biggs textbook

Specifics :

invented by Hamming in 1950
binary linear code (like we've been discussing)
- (n, k, δ) = (bits in error-correcting code, bits of data, min Hamming distance between no-error codewords)
- E is (n x k) encoding matrix (Biggs' notation)
- x = (k-bit info word, column vector k x 1)
- y = E x = (n-bit redundant error correcting word, column vector n x 1)
- H is the ((n-k) x n) check matrix
- H y = 0 exactly when y is a legit (no errors) codeword
Hamming code : H is (m x (2**m - 1)), i.e. its columns are all possible nonzero m-bit columns
- So (n = 2**m - 1)
- And (n - k = m) means (k = n - m = 2**m - 1 - m)
- All (pure) Hamming codes have δ = 3, i.e. 1-error correcting.
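These facts can be checked by brute force for small m. In this sketch H is built with column j equal to j written in binary. That ordering is an assumption: Biggs may order the columns differently, which only permutes bit positions. The sketch then collects every n-bit word with H y = 0 (mod 2). The function names are hypothetical.

```python
from itertools import product

def hamming_check_matrix(m: int):
    """m x (2**m - 1) check matrix H whose columns are all the nonzero m-bit vectors.
    Column j (1-based) is j in binary, least significant bit in row 0."""
    n = 2**m - 1
    return [[(j >> row) & 1 for j in range(1, n + 1)] for row in range(m)]

def syndrome(H, y):
    """H y over GF(2): all zeros exactly when y is a legit codeword."""
    return [sum(h * b for h, b in zip(row, y)) % 2 for row in H]

m = 3
H = hamming_check_matrix(m)
n = len(H[0])                                     # 2**m - 1 = 7
# Brute force: keep every n-bit word with H y = 0.
codewords = [y for y in product([0, 1], repeat=n) if not any(syndrome(H, y))]
k = len(codewords).bit_length() - 1               # 16 codewords -> k = 4 = n - m
delta = min(sum(c) for c in codewords if any(c))  # linear code: min nonzero weight
print(n, k, delta)                                # 7 4 3

# A single flipped bit makes the syndrome equal that bit's column, i.e. its position.
y = list(codewords[5])
y[4] ^= 1                                         # flip position 5 (1-based)
s = syndrome(H, y)
print(sum(bit << row for row, bit in enumerate(s)))   # 5
```

Ordering the columns by their binary value is what makes decoding so cheap: the syndrome read as a number is the position of the bad bit.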

The cool part about the Hamming codes is that they are "perfect": there are no gaps in the coverage of words. Every possible n-bit word is either a legit codeword or exactly 1 bit away from exactly one legit codeword.

That also means that they have the highest possible information rate of any 1-error-correcting code of the same length n.
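A counting argument makes "perfect" concrete. For a 1-error-correcting code, the radius-1 balls around the codewords must be disjoint, and each holds 1 + n words. So a length-n code has at most 2**n / (1 + n) codewords (the Hamming, or sphere-packing, bound). With n = 2**m - 1 and k = n - m, 2**k (1 + n) = 2**(n-m) 2**m = 2**n, so Hamming codes hit the bound exactly. A small sketch to check this (the helper names are hypothetical):

```python
from math import comb

def ball_size(n: int, radius: int) -> int:
    """Number of n-bit words within Hamming distance `radius` of a given word."""
    return sum(comb(n, i) for i in range(radius + 1))

def meets_hamming_bound(n: int, k: int, radius: int) -> bool:
    """True if 2**k balls of the given radius exactly fill all 2**n words ("perfect")."""
    return 2**k * ball_size(n, radius) == 2**n

for m in range(2, 10):
    n, k = 2**m - 1, 2**m - m - 1
    # rate k/n climbs toward 1 as m grows
    print(m, n, k, meets_hamming_bound(n, k, 1), round(k / n, 3))
```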

 # m n k
 >>> for m in range(10):
 ...   print(m, 2**m - 1, 2**m - m - 1)
 ... 
 0 0 0
 1 1 0
 2 3 1
 3 7 4        # (7, 4) code
 4 15 11
 5 31 26
 6 63 57
 7 127 120
 8 255 247
 9 511 502

  (from wikipedia "Hamming Code" article)
  This extended Hamming code is popular in computer memory systems, 
  where it is known as SECDED ("single error correction, double error detection"). 
  Particularly popular is the (72,64) code, a truncated (127,120) Hamming code 
  plus an additional parity bit, which has the same space overhead as a (9,8) parity code.
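The (72,64) arithmetic can be checked against the table above. 64 data bits don't fit in the m = 6 code (only 57), so m = 7 check bits are needed. Dropping 120 - 64 = 56 data bits from the (127,120) code gives (71,64), and one overall parity bit makes (72,64). A minimal sketch (`secded_length` is a hypothetical helper):

```python
def secded_length(data_bits: int):
    """Smallest Hamming code holding `data_bits` data bits, plus one overall parity bit.
    Returns (m check bits, total codeword length)."""
    m = 1
    while 2**m - m - 1 < data_bits:   # a length 2**m - 1 Hamming code carries 2**m - m - 1 data bits
        m += 1
    return m, data_bits + m + 1       # + 1 overall parity bit for double-error detection

m, total = secded_length(64)
print(m, total)                           # 7 72
print((total - 64) / 64, (9 - 8) / 8)     # 0.125 0.125: same overhead as (9,8) parity
```

The same helper gives `secded_length(4) == (3, 8)`, which is the [8,4] extended code.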

We should work through one of these, either by setting up a Python notebook, or on the board, or by staring at the Wikipedia article.

The homework for Tues asks you to do an encode/decode example.

[7,4]

http://en.wikipedia.org/wiki/Hamming(7,4)

Discuss the [7,4] code and its [8,4] extension, as in the Wikipedia article. (The "G" matrix in Wikipedia is the "E" matrix in the textbook. Sometimes it's transposed.)
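Here is a sketch of an encode/decode round trip. It assumes the bit order p1 p2 d1 p3 d2 d3 d4 (parity bits at positions 1, 2, 4), with G written as 7 x 4 so that y = G x, like Biggs' E. I believe these G and H match the Wikipedia Hamming(7,4) article, but check them against the page. All function and constant names are hypothetical. The [8,4] part appends one overall parity bit, giving single-error correction and double-error detection.

```python
G = [  # 7 x 4 generator (Biggs' E); codeword y = G x mod 2
    [1, 1, 0, 1],  # p1 = d1 + d2 + d4
    [1, 0, 1, 1],  # p2 = d1 + d3 + d4
    [1, 0, 0, 0],  # d1
    [0, 1, 1, 1],  # p3 = d2 + d3 + d4
    [0, 1, 0, 0],  # d2
    [0, 0, 1, 0],  # d3
    [0, 0, 0, 1],  # d4
]
H = [  # 3 x 7 check matrix; column j is j in binary (top row = 1s bit)
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]
DATA_POSITIONS = [2, 4, 5, 6]  # 0-based indices of d1..d4 within a codeword

def matvec(M, v):
    """Matrix-vector product over GF(2)."""
    return [sum(a * b for a, b in zip(row, v)) % 2 for row in M]

def encode74(data):
    return matvec(G, data)

def decode74(received):
    """Fix at most one flipped bit using the syndrome, then read off d1..d4."""
    y = list(received)
    s = matvec(H, y)
    pos = s[0] + 2 * s[1] + 4 * s[2]   # 1-based error position; 0 means no error
    if pos:
        y[pos - 1] ^= 1
    return [y[i] for i in DATA_POSITIONS]

def encode84(data):
    """[8,4] extension: append an overall parity bit so every codeword has even weight."""
    y = encode74(data)
    return y + [sum(y) % 2]

def decode84(received):
    """Correct one error, or return None for a detected (uncorrectable) double error."""
    y = list(received)
    s = matvec(H, y[:7])
    odd = sum(y) % 2                   # 1 means an odd number of bits flipped
    if any(s) and not odd:
        return None                    # two errors: detected but not correctable
    return decode74(y[:7])             # 0 or 1 error (possibly in the parity bit itself)

word = encode74([1, 0, 1, 1])          # [0, 1, 1, 0, 0, 1, 1]
word[4] ^= 1                           # garble position 5
print(decode74(word))                  # [1, 0, 1, 1]
```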

So what are the colored circles on that wiki page all about ?

http://cs.marlboro.edu/courses/spring2017/info/notes/April_6
last modified Thursday April 6 2017 9:00 am EDT

Information Theory
