Artificial
Intelligence

Fall 2011

Nov 17

aside

More Stanford online classes:
nlp-class.com (natural language processing), pgm-class.com (probabilistic graphical models), and others

decision trees

I assigned 18.6 as a way to get you to look at the basic idea, without getting too far into the math details.
The basic idea is that one mechanism for drawing a conclusion from data is to make a series of sequential choices. This works particularly well when all the variables are discrete, for example:
    in1  in2  in3  output
     1    0    1     1
     2    0    3     1
    ...
In a machine learning context, each row is one training example, and the decision tree is the machine we're going to build from the examples.
First point: any ordering of the inputs gives you a possible tree which can give those outputs.
Second point: depending on how well the outputs match the inputs, some trees will be simpler (i.e. better) than others, giving some confidence that the tree is a good "model" for that data. Complicated trees are overtrained: they fit that specific set of data but don't represent general trends.
Third point: since we want a simple tree, we want to find an order for the choices that splits things as much as possible.
With that in mind, discuss the assigned problem, AIMA 18.6:

    A1  A2  A3  Output
    ------------------
     1   0   0    0
     1   0   1    0
     0   1   0    0
     1   1   1    1
     1   1   0    1
Look at what happens intuitively for various choices of using A1, A2, A3 first to divide things up, and what the good choices are after that.
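To make the "look at it intuitively" step concrete, here is a small Python sketch that groups the five 18.6 examples by each attribute and counts the outputs in each group. The names EXAMPLES, NAMES, and split_counts are made up for this illustration; they are not from the book.

```python
from collections import Counter, defaultdict

# The five training examples from exercise 18.6: (A1, A2, A3, output).
EXAMPLES = [
    (1, 0, 0, 0),
    (1, 0, 1, 0),
    (0, 1, 0, 0),
    (1, 1, 1, 1),
    (1, 1, 0, 1),
]
NAMES = ["A1", "A2", "A3"]  # hypothetical labels for the three input columns


def split_counts(examples, column):
    """Group the examples by their value in `column` and count
    how many times each output appears within each group."""
    groups = defaultdict(Counter)
    for row in examples:
        groups[row[column]][row[-1]] += 1
    return {value: dict(counts) for value, counts in groups.items()}


if __name__ == "__main__":
    for column, name in enumerate(NAMES):
        print(name, split_counts(EXAMPLES, column))
```

A group whose outputs are all the same needs no further questions, so a first split that produces such groups is a good sign. Here splitting on A2 leaves the A2 = 0 group with only 0 outputs.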
The math details of the best way to do this head into information theory, which I was just glossing over.
But Sam asked how the "importance" function works, which is at the heart of it, so here it is:
Discuss (briefly) the idea of information entropy, bits per symbol.
If p1, p2, p3, ... are the probabilities of each symbol, then

    H(p1, p2, p3, ...) = - sum( p[i] * log2(p[i]) )

For a boolean with only two probabilities (q, 1-q), and following the book's notation, this is

    B(q) = - q log2(q) - (1-q) log2(1-q)

which is how many bits of info there are. (Discuss briefly; draw the upside-down parabola sketch.)
Still following the book's notation, in one "clump" of things

    p = number of positive
    n = number of negative
    B(p/(n+p)) = bits of info

When we use one variable to split the data into a partition of clumps, the best split causes the biggest information gain (bits per symbol). So the technique is to look at B before and after the split:

    importance = B(p/(n+p))  -  weighted_sum B(pk/(nk+pk))
                 before split   after split

where

    pk = number of positive in k'th partition
    nk = number of negative in k'th partition
    weighting is over the number in that partition compared to the total, i.e. (pk+nk)/(p+n)
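As a rough check on the arithmetic, the B and importance formulas can be sketched in a few lines of Python and applied to the five 18.6 examples. This is only an illustration under my own naming (EXAMPLES, importance, clump_bits are hypothetical helpers), and it assumes boolean outputs coded as 0 and 1.

```python
from math import log2

# The five training examples from exercise 18.6: (A1, A2, A3, output).
EXAMPLES = [
    (1, 0, 0, 0),
    (1, 0, 1, 0),
    (0, 1, 0, 0),
    (1, 1, 1, 1),
    (1, 1, 0, 1),
]


def B(q):
    """Entropy in bits of a boolean that is true with probability q."""
    if q in (0.0, 1.0):
        return 0.0  # a certain outcome carries no information
    return -q * log2(q) - (1 - q) * log2(1 - q)


def clump_bits(rows):
    """B(p/(n+p)) for one clump, where p counts the positive outputs."""
    p = sum(1 for row in rows if row[-1] == 1)
    return B(p / len(rows))


def importance(examples, column):
    """Bits before the split minus the weighted sum of bits after it."""
    before = clump_bits(examples)
    after = 0.0
    for value in set(row[column] for row in examples):
        clump = [row for row in examples if row[column] == value]
        after += len(clump) / len(examples) * clump_bits(clump)
    return before - after


if __name__ == "__main__":
    for column, name in enumerate(["A1", "A2", "A3"]):
        print(name, round(importance(EXAMPLES, column), 4))
```

With these five examples the A2 split scores highest, roughly 0.42 bits, compared with about 0.17 for A1 and 0.02 for A3.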
Apply these numbers to 18.6, and compare with intuition. I put the solution in this folder.

more computer vision

OpenCV and processing.org
Examples I tried crashed on my Mac.
I did get some opencv + python working:
    $ sudo port install opencv +python26
    $ sudo port select --set python python26     # as opposed to python26-apple
    $ python
    >>> import cv
    >>> # works!
Then on to
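This worked:

    import cv

    # Load the image as a single-channel grayscale matrix.
    img = cv.LoadImageM("dime_building.jpeg", cv.CV_LOAD_IMAGE_GRAYSCALE)

    # Scratch matrices that GoodFeaturesToTrack needs, same size as the image.
    eig_image = cv.CreateMat(img.rows, img.cols, cv.CV_32FC1)
    temp_image = cv.CreateMat(img.rows, img.cols, cv.CV_32FC1)

    # Ask for up to 10 corners, quality level 0.04, minimum distance 1.0,
    # using the Harris corner detector.
    for (x, y) in cv.GoodFeaturesToTrack(img, eig_image, temp_image,
                                         10, 0.04, 1.0, useHarris=True):
        print "good feature at", x, y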
There are more Python examples in the OpenCV-2.2 source, including facedetect.py: see the attached screenshot for an example.
The heart of the Python code is a call to HaarDetectObjects(), which uses an XML description of a trained "frontal face detection" specification: a "cascade" of "Haar-like features".
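For a feel of what a single "Haar-like feature" is, here is a minimal pure-Python sketch: the sum of pixel values in one rectangle minus the sum in the adjacent rectangle, computed quickly from an integral image (summed-area table). This is not OpenCV's code, and the function names are invented for the illustration; the real detector evaluates many such features, at many positions and scales, through the trained cascade.

```python
def integral_image(img):
    """Return a (rows+1) x (cols+1) table where ii[y][x] is the sum of
    all pixels above and to the left of (x, y) in `img` (a list of rows)."""
    rows, cols = len(img), len(img[0])
    ii = [[0] * (cols + 1) for _ in range(rows + 1)]
    for y in range(rows):
        row_sum = 0
        for x in range(cols):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the pixels in the rectangle with top-left corner (x, y),
    width w and height h, using four lookups in the integral image."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii, x, y, w, h):
    """A two-rectangle feature: left half minus right half of a
    w x h window (w is assumed to be even)."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)


if __name__ == "__main__":
    # A tiny made-up "image": bright on the left, dark on the right.
    img = [[9, 9, 1, 1],
           [9, 9, 1, 1]]
    ii = integral_image(img)
    print(two_rect_feature(ii, 0, 0, 4, 2))
```

A large value means a strong bright-to-dark vertical edge inside the window; a value near zero means the two halves look alike.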
http://cs.marlboro.edu/courses/fall2011/ai/notes/Nov_17
last modified Thursday November 17 2011 8:13 am EST

attachments

    name                         last modified          size
    face_detect_screenshot.jpg   Nov 17 2011 4:00 am    106kB
    facedetect.py                Nov 17 2011 4:08 am    3.48kB