Jan 24 : probability & start entropy

books

Discuss the course texts, all listed on the resources page:

Biggs "Codes: ..."
Shannon's "Mathematical Theory of Communication"
MacKay's "Information Theory ..."
wikipedia: information entropy ... and other topics & references there

First is very math-ish; 2nd is great but of limited scope; 3rd is very good at times but wordy and mostly aimed at another topic.

This subject can get very technical (e.g. Cover & Thomas).

overview

compression
error correction
crypto

We'll start with entropy (defining it, calculating it) and then move to compression (Huffman, LZW, etc).

python comments

I'll be using Jupyter Python notebooks for some of the numerical work and plotting.

I would encourage you to explore this platform if you're not familiar with it - quite nice for this sort of stuff.

See http://cs.marlboro.edu/courses/spring2017/algorithms/code/jupyter for some of my notes on it.

And for some fancier data libraries, check out

this week - entropy & preliminary concepts

(The text doesn't really finish entropy until after it discusses Huffman coding ... I'll be doing things the other way around.)

Discuss these terms and ideas :

 
 alphabet, string, message, word
 code
 uniquely decodable (UD)
 prefix-free (PF)
 optimal code & "average word length"
 information entropy = sum( - p[i] log(p[i]) ), where p[i] = probability of symbol i
 Huffman code
 source, probability, conditional probability
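
To make "prefix-free" and "average word length" concrete, here is a minimal sketch. The four-symbol code and its probabilities are made up for illustration, and the helper names `is_prefix_free` and `average_word_length` are my own, not from the texts.

```python
# Hypothetical binary code for a four-symbol alphabet (illustration only).
code = {"a": "0", "b": "10", "c": "110", "d": "111"}

# Assumed symbol probabilities; they sum to 1.
prob = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}

def is_prefix_free(code):
    """Return True if no codeword is a prefix of a different codeword."""
    words = list(code.values())
    return not any(
        i != j and w2.startswith(w1)
        for i, w1 in enumerate(words)
        for j, w2 in enumerate(words)
    )

def average_word_length(code, prob):
    """Mean codeword length: sum over symbols of p[i] * len(codeword[i])."""
    return sum(prob[s] * len(code[s]) for s in code)

print(is_prefix_free(code))
print(average_word_length(code, prob))
```

A code like {"a": "0", "b": "01"} fails the check, since "0" is a prefix of "01".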

In particular, go over conditional probability.

Here's a tiny example :

 Say you have 3 marbles: 
   (big red)
   (big blue)
   (small red)

Then what is

 P(red)  = ?
 P(blue) = ?
 P(red & big) = ?
 P(red | big) = ?
 P(big | red) = ?

How is P(big & red) related to P(big | red)?
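
One way to check your answers is to count directly. This is a sketch that assumes each of the three marbles is equally likely to be drawn; the function `P` and the predicates `red`, `blue`, `big` are names I've made up for this example.

```python
from fractions import Fraction

# The three marbles, as (size, color) pairs.
marbles = [("big", "red"), ("big", "blue"), ("small", "red")]

def P(event, given=lambda m: True):
    """P(event | given), assuming every marble is equally likely."""
    pool = [m for m in marbles if given(m)]   # marbles satisfying the condition
    hits = [m for m in pool if event(m)]      # ... that also satisfy the event
    return Fraction(len(hits), len(pool))

red = lambda m: m[1] == "red"
blue = lambda m: m[1] == "blue"
big = lambda m: m[0] == "big"
red_and_big = lambda m: red(m) and big(m)

print("P(red)       =", P(red))
print("P(blue)      =", P(blue))
print("P(red & big) =", P(red_and_big))
print("P(red | big) =", P(red, given=big))
print("P(big | red) =", P(big, given=red))

# The product rule: P(big & red) = P(big | red) * P(red)
print(P(red_and_big) == P(big, given=red) * P(red))
```

Using `Fraction` keeps the results exact, so the product rule can be checked with `==` rather than a floating-point tolerance.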

You'll need to understand this to do the homework due Thursday ... so let's talk about what I'm expecting you to do.

intuition

In basic physics, the entropy of a system is ln[number of states] (in units of Boltzmann's constant).

Why a logarithm? Answer: we want entropy to add when we combine 2 systems, but the number of states multiplies.
What is it, really? It measures how likely a set of states with the same macroscopic properties is.
Temperature measures how entropy changes with energy (1/T = dS/dE).
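
A quick numeric check of the "add vs. multiply" argument, as a sketch. The state counts 6 and 8 are made up for illustration.

```python
import math

n1, n2 = 6, 8            # hypothetical numbers of states for two independent systems
n_combined = n1 * n2     # the state counts multiply ...

s1, s2 = math.log(n1), math.log(n2)
s_combined = math.log(n_combined)

print(s1 + s2, s_combined)   # ... but the logarithms add (up to rounding)
```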

How does this connect with Shannon's entropy?

First idea: if something can happen in N equally likely ways, each with probability p = 1/N, then

 ln(N) = - ln(1/N) = - ln(p)

Second idea: not *total* entropy, but entropy *per symbol*. So we need to average.
Definition of average: (make sure this is clear)

 mean(x) = sum p(x) * x

Then mean entropy is

 H = - sum p_i * ln(p_i)
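
Here is that formula as a short Python sketch. The function name `entropy` and its `base` parameter are my own choices; base e matches the ln above, and base 2 gives the answer in bits.

```python
import math

def entropy(probs, base=math.e):
    """H = - sum p_i * log(p_i); terms with p_i = 0 contribute nothing."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

# N equally likely outcomes: H reduces to ln(N), the "first idea" above.
N = 8
print(entropy([1 / N] * N), math.log(N))

# A fair coin vs. a biased coin, in bits.
print(entropy([0.5, 0.5], base=2))
print(entropy([0.9, 0.1], base=2))
```

The biased coin has lower entropy than the fair one: its outcomes are more predictable, so each flip carries less information on average.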

An example

Suppose you have this string of bits :

 0 1 0 1 0 0 1 1 0 1 0 1 0 0 0 0 0 0 1 1 0 1

And suppose that you would like to calculate its entropy.

First issue: one specific string doesn't really have any entropy, any more than a given number can be random or not.

 Aside: 
   What does it mean for a number to be random?
 Answer: 
   The question misses the point. A process which chooses numbers can be random.
   And we often then say that the numbers it produces are random. But numbers
   themselves don't have the luxury of being random or not. So when someone 
   says "Pick a random number between 1 and 10", what they should be saying
   is "Randomly pick a number between 1 and 10".

So now imagine that that string of bits is one example of a much longer stream of bits, with one generated after another, left to right.

We can then use that example to try to find a model of a process that could generate those bits. And it's that model that we use to generate a number for the entropy.

By "process" here I mean a probabilistic model of what bits (or words, if the bits are chunked together) are generated with what probabilities. We'll use the knowledge of the bits seen previously to hopefully get the best model that we can.
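
As a sketch of the simplest such model (not necessarily the one we'll settle on in class), assume each bit is generated independently and estimate P(0) and P(1) from the observed frequencies in the example string.

```python
import math
from collections import Counter

# The example string of bits from above.
bits = "0 1 0 1 0 0 1 1 0 1 0 1 0 0 0 0 0 0 1 1 0 1".split()

# Assumption: bits are independent, so the model is just P(0) and P(1),
# estimated by counting.
counts = Counter(bits)
total = sum(counts.values())
probs = {b: n / total for b, n in counts.items()}

# Entropy per bit under this model, in bits (log base 2).
H = -sum(p * math.log2(p) for p in probs.values())

print(counts)
print(probs)
print(H)
```

This model ignores any dependence on earlier bits. A model that conditions on the previous bit or bits would use conditional probabilities instead and could give a different entropy.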

In the formal languages course last term, we saw another definition of the "information" of a string or set of strings, connected to the minimal size of a Turing machine needed to generate it. That definition was fundamentally satisfying but impossible to calculate.

Shannon's entropy is based on a probabilistic model of predicting the next bit in the stream. If the bits were the digits of pi, that would be a really bad model, since they look pretty random but in fact aren't. The notion of entropy we're developing here would not be able to tell that the digits of pi can be calculated and are therefore predictable and therefore carry little information.
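
To see the point about pi numerically, here is a sketch that treats the first 50 decimal digits of pi (hard-coded below) as if they came from a source emitting independent digits, and computes the entropy from the digit frequencies.

```python
import math
from collections import Counter

# First 50 decimal digits of pi, after the leading "3."
PI_DIGITS = "14159265358979323846264338327950288419716939937510"

counts = Counter(PI_DIGITS)
n = len(PI_DIGITS)

# Entropy per digit (in bits) under an independent-digit frequency model.
H = -sum((c / n) * math.log2(c / n) for c in counts.values())

# Compare with the maximum possible for 10 equally likely digits.
print(H, math.log2(10))
```

The frequency model gives a value close to the maximum of log2(10) bits per digit, even though a short program can generate every one of these digits.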

With all that in mind, let's do the best we can with this example to get a model of the probabilities that it implies:

 ... coming in class ...

http://cs.marlboro.edu/courses/spring2017/info/notes/Jan_24
last modified Monday January 23 2017 11:57 pm EST
