This explanation is almost entirely from
http://en.wikipedia.org/wiki/Bayes%27_theorem

----------------------------------------------------------

(1) The typical definition of probability is

      P(X) = (number of times X happens) / (total number of all events)

(2) Bayesian probability takes a different interpretation, in terms of
a *degree of belief* in proposition X. This is particularly useful in
AI learning systems, where one can use it to build a knowledge base
from observed evidence.

Let P(A|B) be the conditional probability of A given B, and let P(A,B)
be the joint probability of A and B. Then

      P(A|B) * P(B) = P(A,B) = P(B|A) * P(A)

which implies that

      P(A|B) = P(B|A) * P(A) / P(B)

which is called Bayes' Theorem.

In this last formula, P(A) is called the "prior probability" (or
"marginal probability") of A, before any knowledge of B. P(A|B) is
called the "posterior probability" of A, derived from or entailed by a
specific B.

A further embellishment is

      P(B) = P(A,B) + P(!A,B) = P(B|A)*P(A) + P(B|!A)*P(!A)

where !A is the event complementary to A. Then

      P(A|B) = P(B|A)*P(A) / ( P(B|A)*P(A) + P(B|!A)*P(!A) )

More generally, if the set {A_i} covers all possible events, then

      P(A_i|B) = P(B|A_i)*P(A_i) / sum_j ( P(B|A_j)*P(A_j) )

-----------------------------

Here's an example (still from the wikipedia page).

There are two bags full of marbles. Bag one has 10 black and 30 white
ones. Bag two has 20 of each. Suppose you pick a bag at random, and
from it pick a marble at random, which turns out to be white. What is
the probability that you picked bag one?

In terms of an AI learning procedure, we want to use the observation
(the chosen marble is white) to update our beliefs about the world
(the probability that we chose bag one). Let

      A1 = bag one was chosen
      A2 = bag two was chosen
      B  = observation that the marble is white

Since 30 of bag one's 40 marbles are white, P(B|A1) = 30/40 = 0.75;
similarly P(B|A2) = 20/40 = 0.5. Then

      P(A1|B) = probability that bag one was chosen,
                given that a white marble was picked
              = P(A1)*P(B|A1) / ( P(A1)*P(B|A1) + P(A2)*P(B|A2) )
              = 0.5 * 0.75 / ( 0.5*0.75 + 0.5*0.5 )
              = 0.6

Essentially we have in hand all the probabilities if we work forward
from which bag is chosen - but what we're trying to calculate is the
opposite question: what our belief in each bag should be, given a
final observation of the marble. This is often a way that "learning"
can be represented: with each new piece of evidence (B), we update all
the probabilities in our knowledge base according to this rule. Before
observing the marble, our belief that bag one was chosen is P(A1)=0.5.
After observing the white marble, we update that belief to
P(A1|B)=0.6. (A small numerical sketch of this update, and of
repeating it over a sequence of observations, appears at the end of
these notes.)

The wikipedia page has another nice illustration using medical false
positives.

-----------------------------------------------------------------

The more general AI situation is a Bayesian network (see
http://en.wikipedia.org/wiki/Bayesian_network), in which each node is
a variable whose probability depends on some number of "parent" nodes.
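
-----------------------------------------------------------------

To make the update rule concrete, here is a minimal Python sketch of
the general formula P(A_i|B) = P(B|A_i)*P(A_i) / sum_j P(B|A_j)*P(A_j),
applied to the marble example above. The function name bayes_update
and its argument layout are my own choices for illustration, not from
any particular library.

      def bayes_update(priors, likelihoods):
          """Return the posteriors P(A_i|B) for each hypothesis A_i.

          priors      -- list of P(A_i), summing to 1
          likelihoods -- list of P(B|A_i) for the observed evidence B
          """
          # Numerators P(B|A_i)*P(A_i); their sum is P(B).
          joint = [p * l for p, l in zip(priors, likelihoods)]
          evidence = sum(joint)                 # P(B)
          return [j / evidence for j in joint]  # P(A_i|B)

      # The marble example: A1 = bag one, A2 = bag two.
      priors = [0.5, 0.5]           # each bag equally likely a priori
      likelihoods = [30/40, 20/40]  # P(white|bag one), P(white|bag two)
      print(bayes_update(priors, likelihoods))  # [0.6, 0.4]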
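
The "learning" interpretation iterates this: each posterior becomes
the prior for the next observation. The sketch below (again just an
illustration) draws three marbles and updates the beliefs after each
draw; the assumption that marbles are drawn with replacement, so the
likelihoods stay fixed, is mine and not part of the original example.

      beliefs = [0.5, 0.5]         # P(bag one), P(bag two)
      white_prob = [30/40, 20/40]  # P(white|bag) for each bag

      for marble in ["white", "white", "black"]:
          likelihoods = (white_prob if marble == "white"
                         else [1 - p for p in white_prob])
          joint = [b * l for b, l in zip(beliefs, likelihoods)]
          evidence = sum(joint)
          beliefs = [j / evidence for j in joint]
          print(marble, [round(b, 3) for b in beliefs])

      # white [0.6, 0.4]
      # white [0.692, 0.308]
      # black [0.529, 0.471]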
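
In the same spirit, a tiny Bayesian network can be sketched as
conditional probability tables attached to nodes. The two-node network
and numbers below are invented for illustration (the wikipedia page
linked above has the standard "sprinkler" example); the point is that
working forward through the network gives P(B), and Bayes' theorem
then inverts the arrow.

      p_rain = 0.2                           # P(Rain)
      p_wet_given = {True: 0.9, False: 0.1}  # P(WetGrass|Rain)

      # P(WetGrass), summing over the parent as in the expansion of P(B):
      p_wet = (p_wet_given[True] * p_rain +
               p_wet_given[False] * (1 - p_rain))

      # Invert the arrow with Bayes' theorem: P(Rain|WetGrass).
      p_rain_given_wet = p_wet_given[True] * p_rain / p_wet
      print(round(p_wet, 3), round(p_rain_given_wet, 3))  # 0.26 0.692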