This explanation is almost entirely from
http://en.wikipedia.org/wiki/Bayes%27_theorem

----------------------------------------------------------

(1) The typical definition of probability is

      P(X) = (number of times X happens) / (total number of all events)

(2) Bayesian probability takes a different interpretation, in terms of
a *degree of belief* in proposition X. This is particularly useful in
AI learning systems, where one can use it to build a knowledge base
from observed evidence.

Let P(A|B) be the conditional probability of A given B, and let P(A,B)
be the joint probability of A and B. Then

      P(A|B) * P(B) = P(A,B) = P(B|A) * P(A)

which implies that

      P(A|B) = P(B|A) * P(A) / P(B)

which is called Bayes' Theorem.

In this last formula, P(A) is called the "prior probability" (or
"marginal probability") of A, before any knowledge of B. P(A|B) is
called the "posterior probability" of A, derived from or entailed by a
specific B.

A further embellishment is

      P(B) = P(A,B) + P(!A,B) = P(B|A)*P(A) + P(B|!A)*P(!A)

where !A is the event complementary to A. Then

      P(A|B) = P(B|A)*P(A) / ( P(B|A)*P(A) + P(B|!A)*P(!A) )

More generally, if the set {A_i} covers all possible events, then

      P(A_i|B) = P(B|A_i)*P(A_i) / sum_j ( P(B|A_j)*P(A_j) )

-----------------------------

Here's an example (still from the wikipedia page).

There are two bags full of marbles. Bag one has 10 black and 30 white
ones. Bag two has 20 of each. Suppose you pick a bag at random, and
from it pick a marble at random, which turns out to be white. What is
the probability that you picked bag one?

In terms of an AI learning procedure, we want to use the observation
(the chosen marble is white) to update our beliefs about the world
(the probability that we chose bag one). Let

      A1 = bag one was chosen
      A2 = bag two was chosen
      B  = observation that the marble is white

Since 30 of bag one's 40 marbles are white, P(B|A1) = 30/40 = 0.75;
similarly P(B|A2) = 20/40 = 0.5. Then

      P(A1|B) = probability that bag one was chosen,
                given that a white marble was picked
              = P(A1)*P(B|A1) / ( P(A1)*P(B|A1) + P(A2)*P(B|A2) )
              = 0.5 * 0.75 / ( 0.5*0.75 + 0.5*0.5 )
              = 0.6

Essentially we have in hand all the probabilities if we work forward
from which bag is chosen - but what we're trying to calculate is the
opposite question: what our belief in each bag should be, given a
final observation of the marble. This is often a way that "learning"
can be represented: with each new piece of evidence (B), we update all
the probabilities in our knowledge base according to this rule. Before
observing the marble, our belief that bag one was chosen is P(A1)=0.5.
After observing the white marble, we update that belief to
P(A1|B)=0.6. (A small numerical sketch of this update, and of
repeating it over a sequence of observations, appears at the end of
these notes.)

The wikipedia page has another nice illustration using medical false
positives.

-----------------------------------------------------------------

The more general AI situation is a Bayesian network (see
http://en.wikipedia.org/wiki/Bayesian_network), in which each node is
a variable whose probability depends on some number of "parent" nodes.
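
-----------------------------------------------------------------

To make the update rule concrete, here is a minimal Python sketch of
the general formula P(A_i|B) = P(B|A_i)*P(A_i) / sum_j P(B|A_j)*P(A_j),
applied to the marble example above. The function name bayes_update
and its argument layout are my own choices for illustration, not from
any particular library.

      def bayes_update(priors, likelihoods):
          """Return the posteriors P(A_i|B) for each hypothesis A_i.

          priors      -- list of P(A_i), summing to 1
          likelihoods -- list of P(B|A_i) for the observed evidence B
          """
          # Numerators P(B|A_i)*P(A_i); their sum is P(B).
          joint = [p * l for p, l in zip(priors, likelihoods)]
          evidence = sum(joint)                 # P(B)
          return [j / evidence for j in joint]  # P(A_i|B)

      # The marble example: A1 = bag one, A2 = bag two.
      priors = [0.5, 0.5]           # each bag equally likely a priori
      likelihoods = [30/40, 20/40]  # P(white|bag one), P(white|bag two)
      print(bayes_update(priors, likelihoods))  # [0.6, 0.4]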
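
The "learning" interpretation iterates this: each posterior becomes
the prior for the next observation. The sketch below (again just an
illustration) draws three marbles and updates the beliefs after each
draw; the assumption that marbles are drawn with replacement, so the
likelihoods stay fixed, is mine and not part of the original example.

      beliefs = [0.5, 0.5]         # P(bag one), P(bag two)
      white_prob = [30/40, 20/40]  # P(white|bag) for each bag

      for marble in ["white", "white", "black"]:
          likelihoods = (white_prob if marble == "white"
                         else [1 - p for p in white_prob])
          joint = [b * l for b, l in zip(beliefs, likelihoods)]
          evidence = sum(joint)
          beliefs = [j / evidence for j in joint]
          print(marble, [round(b, 3) for b in beliefs])

      # white [0.6, 0.4]
      # white [0.692, 0.308]
      # black [0.529, 0.471]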
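
In the same spirit, a tiny Bayesian network can be sketched as
conditional probability tables attached to nodes. The two-node network
and numbers below are invented for illustration (the wikipedia page
linked above has the standard "sprinkler" example); the point is that
working forward through the network gives P(B), and Bayes' theorem
then inverts the arrow.

      p_rain = 0.2                           # P(Rain)
      p_wet_given = {True: 0.9, False: 0.1}  # P(WetGrass|Rain)

      # P(WetGrass), summing over the parent as in the expansion of P(B):
      p_wet = (p_wet_given[True] * p_rain +
               p_wet_given[False] * (1 - p_rain))

      # Invert the arrow with Bayes' theorem: P(Rain|WetGrass).
      p_rain_given_wet = p_wet_given[True] * p_rain / p_wet
      print(round(p_wet, 3), round(p_rain_given_wet, 3))  # 0.26 0.692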