thirty-two:Desktop$ python dow.py

trans[row][col] is prob(row state at time k+1 | col state at time k) :
[[ 0.2  0.6  0.6]
 [ 0.3  0.2  0.3]
 [ 0.5  0.2  0.1]]

obs[row][col] is prob(row observe at time k | col state at time k) :
[[ 0.1  0.6  0.2]
 [ 0.6  0.3  0.1]
 [ 0.3  0.1  0.7]]

state0 is the state distribution at time t=0. (Often called "pi" in the HMM literature.) :
[ 0.5  0.3  0.2]

State and observables evolution without evidence :

time        down      up     same            coffee     oj      tea
----        ----      --     ----            ------     --      ---
  0   [ 0.50000  0.30000  0.20000 ]
  1   [ 0.40000  0.27000  0.33000 ]    [ 0.26800  0.35400  0.37800 ]
  2   [ 0.44000  0.27300  0.28700 ]    [ 0.26520  0.37460  0.36020 ]
  3   [ 0.42400  0.27270  0.30330 ]    [ 0.26668  0.36654  0.36678 ]
  4   [ 0.43040  0.27273  0.29687 ]    [ 0.26605  0.36975  0.36420 ]
  5   [ 0.42784  0.27273  0.29943 ]    [ 0.26631  0.36847  0.36523 ]
  6   [ 0.42886  0.27273  0.29841 ]    [ 0.26620  0.36898  0.36482 ]
  7   [ 0.42845  0.27273  0.29882 ]    [ 0.26625  0.36877  0.36498 ]
  8   [ 0.42862  0.27273  0.29865 ]    [ 0.26623  0.36885  0.36492 ]
  9   [ 0.42855  0.27273  0.29872 ]    [ 0.26624  0.36882  0.36494 ]
 10   [ 0.42858  0.27273  0.29869 ]    [ 0.26623  0.36883  0.36493 ]

Now suppose we have actual observations of the morning drink, where
(coffee=0, oj=1, tea=2), and the list covers time=1..10 ; seen is :
[0 0 0 2 2 1 0 2 0 0]

How do we figure out what the probabilities are for the dow sequences?

---------------------
Method 0: BRUTE FORCE

(This is completely impractical for non-trivial cases, but still
interesting conceptually.)

There are 3**10 = 59049 different state sequences; the probability of
each sequence is the product of 10 terms from the trans matrix and
state0. For example, for the three-step state sequence (up,up,up) the
probability is

  P(up t=3|up t=2) * P(up t=2|up t=1) * P(up t=1) = 0.2 * 0.2 * 0.27

Each of those 59049 state sequences has a nonzero probability of
producing any of the 3**10 = 59049 observation sequences. For example,
the probability of seeing (coffee, oj, tea) given that the states are
(up,up,up) is P(coffee|up) * P(oj|up) * P(tea|up). Therefore the
probability of getting (up,up,up) AND (coffee,oj,tea) is

  P(up t=3|up t=2) * P(up t=2|up t=1) * P(up t=1)
    * P(coffee|up) * P(oj|up) * P(tea|up)
  = 0.2 * 0.2 * 0.27 * 0.6 * 0.3 * 0.1

If we do a similar calculation for all 59049 possible state sequences,
each time using the drink sequence actually seen, we can add up the
resulting probabilities and normalize to get the probability of each
state sequence given the evidence.

Initializing seqs... done.
Number of entries is 59049
Updating seqs... done.
Sorting seqs... done.
Largest 10 are :
  3.70271e-08   [1, 1, 1, 0, 2, 0, 1, 0, 1, 1]
  3.08559e-08   [1, 1, 1, 0, 2, 0, 1, 0, 2, 1]
  2.87988e-08   [1, 1, 1, 0, 2, 0, 1, 2, 1, 1]
  2.77703e-08   [1, 0, 1, 0, 2, 0, 1, 0, 1, 1]
  2.31419e-08   [1, 0, 1, 0, 2, 0, 1, 0, 2, 1]
  2.26277e-08   [2, 1, 1, 0, 2, 0, 1, 0, 1, 1]
  2.15991e-08   [1, 0, 1, 0, 2, 0, 1, 2, 1, 1]
  2.05706e-08   [1, 1, 1, 0, 2, 0, 2, 0, 1, 1]
  1.88564e-08   [2, 1, 1, 0, 2, 0, 1, 0, 2, 1]
  1.85135e-08   [1, 2, 1, 0, 2, 0, 1, 0, 1, 1]
Total probability is 4.73132432584e-06

If we now wanted to know the probability of a particular state at a
particular time, we could sum the probabilities of all the applicable
state sequences.
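Here is a minimal sketch of this brute-force method. (The actual
dow.py source isn't shown above, so this is a reconstruction assuming
numpy; the variable names are just illustrative.)

from itertools import product
import numpy as np

# trans[row][col] = P(state row at time k+1 | state col at time k); columns sum to 1
trans = np.array([[0.2, 0.6, 0.6],
                  [0.3, 0.2, 0.3],
                  [0.5, 0.2, 0.1]])
# obs[row][col] = P(observing row | state col)
obs = np.array([[0.1, 0.6, 0.2],
                [0.6, 0.3, 0.1],
                [0.3, 0.1, 0.7]])
state0 = np.array([0.5, 0.3, 0.2])       # P(state) at t=0
seen = [0, 0, 0, 2, 2, 1, 0, 2, 0, 0]    # coffee=0, oj=1, tea=2 ; times 1..10

p_x1 = trans @ state0                    # P(state) at t=1 ; gives the P(up t=1) = 0.27 used above

results = []
for states in product(range(3), repeat=len(seen)):
    # joint probability P(this state sequence AND the drinks actually seen)
    p = p_x1[states[0]] * obs[seen[0], states[0]]
    for t in range(1, len(seen)):
        p *= trans[states[t], states[t - 1]] * obs[seen[t], states[t]]
    results.append((p, list(states)))

results.sort(reverse=True)               # largest joint probabilities first
for p, states in results[:10]:
    print("%.5e  %s" % (p, states))
print("Total probability is", sum(p for p, _ in results))

Summing all 59049 joint probabilities gives the "Total probability"
P(observations) above; dividing each entry by that total normalizes
the state-sequence probabilities.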
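As a sanity check on the same matrices, the no-evidence evolution
table further above is just two matrix-vector products per time step
(this snippet continues from the arrays defined in the sketch just
above):

state = state0
for k in range(1, 11):
    state = trans @ state    # P(state at time k) = trans . P(state at time k-1)
    drink = obs @ state      # implied P(coffee), P(oj), P(tea) at time k
    print(k, state, drink)   # k=1 should give [0.40 0.27 0.33] and [0.268 0.354 0.378]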
------------------------

HOWEVER, this approach blows up exponentially with the number of time
steps and states. There are several recursive algorithms which can do
these sorts of things in linear time and space :

Forward-Backward
 * is the backbone of all of these notions
 * uses two recursive formulas to calculate prob(state | evidence):
   one uses current & past evidence to get the probability of the
   current state; the other uses future evidence to get the
   probability of the current state.
 * The technique is to start at t=start, propagate evidence "messages"
   forward to t=end, then back from t=end to t=start, and (when
   learning, as in Baum-Welch below) repeat until some sort of desired
   convergence is found.
 * It can calculate
    * the likelihood of a given observation set for a given model
    * the probability of a particular state at a particular time
    * all "local" results, averaged over many paths

Baum-Welch
 * can be done along with forward-backward
 * is an EM ("expectation maximization") algorithm
 * learns improvements to the model, e.g. re-estimating the
   transition matrix from the expected P(state_i | state_j)
   transition counts found in the forward-backward pass

Viterbi
 * finds the most probable sequence of states
 * another recursive, linear-time algorithm
 * the best sequence ending at state_i extends one of the best
   sequences from the previous time step by one more step
 * ... so only the best path into each state needs to be remembered
 * ... and then the next evidence is folded in with a forward-like
   recurrence relation

Minimal sketches of all three follow.
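To make the Forward-Backward notes concrete, here is a sketch under
the same assumptions as before (numpy, plus the trans/obs/state0/seen
arrays from the brute-force sketch; illustrative code, not dow.py's
own):

T = len(seen)

# Forward pass: alpha[t][i] = P(seen[0..t], state i at time t+1)
alpha = np.zeros((T, 3))
alpha[0] = obs[seen[0]] * (trans @ state0)          # fold in the first drink
for t in range(1, T):
    alpha[t] = obs[seen[t]] * (trans @ alpha[t - 1])

# Backward pass: beta[t][i] = P(seen[t+1..] | state i at time t+1)
beta = np.ones((T, 3))
for t in range(T - 2, -1, -1):
    beta[t] = trans.T @ (obs[seen[t + 1]] * beta[t + 1])

likelihood = alpha[-1].sum()         # P(all 10 drinks) ; should match the
                                     # brute-force "Total probability" above
gamma = alpha * beta / likelihood    # gamma[t][i] = P(state i at time t+1 | all drinks)
print("likelihood =", likelihood)
print(gamma)

For long sequences the alphas and betas underflow, so real
implementations rescale at each step (or work in log space); with
only 10 steps that isn't needed here.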
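A single Baum-Welch re-estimation step for the transition matrix can
be sketched from those same quantities. (Illustrative only; a full
implementation would also re-estimate obs and state0, then re-run
forward-backward and repeat to convergence.)

# xi[j][i] = P(state i at time t+1 AND state j at time t+2 | all drinks)
new_trans = np.zeros((3, 3))
for t in range(T - 1):
    xi = (trans * alpha[t]) * (obs[seen[t + 1]] * beta[t + 1])[:, None] / likelihood
    new_trans += xi                      # accumulate expected transition counts
new_trans /= gamma[:-1].sum(axis=0)      # divide column i by expected visits to state i
print(new_trans)                         # columns should still sum to 1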
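And a Viterbi sketch in the same style, continuing with the arrays
above and working in log space:

best = np.log(obs[seen[0]] * (trans @ state0))   # log-prob of best path ending in each state
back = []                                        # back-pointers, one row per time step
for t in range(1, T):
    # cand[j][i] = (best path through state i) plus one more step to state j
    cand = best + np.log(trans) + np.log(obs[seen[t]])[:, None]
    back.append(cand.argmax(axis=1))             # best predecessor of each state j
    best = cand.max(axis=1)

path = [int(best.argmax())]                      # best final state...
for ptr in reversed(back):
    path.append(int(ptr[path[-1]]))              # ...then follow the back-pointers
path.reverse()
print(path)   # should match the top brute-force sequence [1, 1, 1, 0, 2, 0, 1, 0, 1, 1]

The recurrence mirrors the forward pass with max in place of sum,
which is why only the best path into each state has to be remembered.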