April 2

greedy algorithms

Review where we've been:

efficiency and O() notation, both theory and experiment
most important distinction is polynomial vs exponential
brute force (check every possibility exhaustively)
divide the problem into smaller chunks; often gives O(something log n)
pre-calculating part of the answer can make a big difference
various approaches to sorting
various tricky storage structures: heap, hash, tree, ...

We're going to skip dynamic programming in the interests of time.

Midterms: heap implementations. Discuss.

New chapter: "greedy" algorithms. We'll see where we are at this point - I may take more than a week with this chapter.

Basic idea: at each choice point, take the single best choice, the one that goes furthest towards the goal, and never go back to reconsider it. If you're lucky, that approach will also give the best overall solution.

Example: making change with (0.50, 0.25, 0.10, 0.05) using fewest coins. (Yes, it's simple, but do one of these just to be clear.)
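Here's a minimal sketch of the greedy approach to making change. The function name greedy_change and the choice to work in integer cents (50, 25, 10, 5 rather than 0.50, 0.25, ...) are my own assumptions for illustration; cents avoid floating-point round-off.

```python
def greedy_change(amount, coins):
    """Make change by always taking the largest coin that still fits.

    amount and coins are integers (cents).
    Returns the list of coins used, or None if the amount can't be reached.
    """
    used = []
    for coin in sorted(coins, reverse=True):   # biggest coin first
        while amount >= coin:
            amount -= coin
            used.append(coin)
    return used if amount == 0 else None


print(greedy_change(85, [50, 25, 10, 5]))   # [50, 25, 10]
```

For this particular set of coins the greedy answer also uses the fewest coins.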

Counter-example: making change with (7, 5, 1). If x=10, the greedy algorithm gives (7,1,1,1) with 4 coins, but (5,5) is only 2 coins.
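To see the failure concretely, here's a sketch that compares the greedy answer with a brute-force search for an amount of 10 and coins (7, 5, 1). Both function names are made up for this example, and the brute force simply tries every combination of 1 coin, then 2 coins, and so on, so it's only practical for small amounts.

```python
from itertools import combinations_with_replacement


def greedy_change(amount, coins):
    """Always take the largest coin that still fits."""
    used = []
    for coin in sorted(coins, reverse=True):
        while amount >= coin:
            amount -= coin
            used.append(coin)
    return used if amount == 0 else None


def brute_force_change(amount, coins):
    """Try every combination of n coins for n = 1, 2, 3, ...

    The first combination that adds up to amount uses the fewest coins.
    """
    coins = sorted(coins, reverse=True)
    for n in range(1, amount // min(coins) + 1):
        for combo in combinations_with_replacement(coins, n):
            if sum(combo) == amount:
                return list(combo)
    return None


print(greedy_change(10, [7, 5, 1]))        # [7, 1, 1, 1]
print(brute_force_change(10, [7, 5, 1]))   # [5, 5]
```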

Several of the examples are graph algorithms, which nearly always explode if you use brute force. Coding them is a bit tricky, since as we've seen even defining the problem takes connection matrices and what-not. As a result, the pseudo-code tends to be a bit vague and jargony, IMHO.

Summary:

Prim's algorithm

Given a set of points with possible connections between 'em, with a weight (for example a distance) for each connection, find the set of those connections with the smallest total weight that still gives a path between any two points. This is called the "minimum spanning tree".

The method is to start with an empty tree, pick any point to start, and continue to add one point at a time by choosing the unconnected point that can be joined to the tree by the connection with the smallest weight. The tricky part here is that at each step, the algorithm needs to find which of the remaining unconnected points is closest to any of those already connected.
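A rough sketch of Prim's method follows. The graph layout (a dictionary of dictionaries), the function name prim, and the little four-point example are all assumptions of mine. Finding "the closest unconnected point" is handled here with Python's heapq module, which is one reasonable choice rather than the only way to do it.

```python
import heapq


def prim(graph, start):
    """graph is {point: {neighbor: weight}}, with each connection listed
    in both directions. Returns the tree as a list of (from, to, weight).
    """
    in_tree = {start}
    # Heap of (weight, from, to) for connections leaving the tree.
    frontier = [(w, start, v) for v, w in graph[start].items()]
    heapq.heapify(frontier)
    tree = []
    while frontier and len(in_tree) < len(graph):
        w, u, v = heapq.heappop(frontier)    # cheapest connection out
        if v in in_tree:
            continue                         # both ends already connected
        in_tree.add(v)
        tree.append((u, v, w))
        for nxt, w2 in graph[v].items():
            if nxt not in in_tree:
                heapq.heappush(frontier, (w2, v, nxt))
    return tree


graph = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 2, "D": 5},
    "C": {"A": 4, "B": 2, "D": 3},
    "D": {"B": 5, "C": 3},
}
print(prim(graph, "A"))   # [('A', 'B', 1), ('B', 'C', 2), ('C', 'D', 3)]
```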

Kruskal's algorithm

Same problem with a different approach: sort all the edges by weight. From lowest to highest, add the edge to the list if it doesn't create a cycle. The tricky part here is that at each step the algorithm needs to determine whether a cycle (i.e. closed loop) would be formed.
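Here's a sketch of Kruskal's method. The cycle test is done with a "union-find" structure, which is a technique I'm bringing in for the example: each point remembers a parent, and two points are already connected exactly when they lead back to the same root. The names kruskal, find, and parent, and the sample edges, are illustrative assumptions.

```python
def kruskal(nodes, edges):
    """edges is a list of (weight, u, v).
    Returns the tree as a list of (u, v, weight).
    """
    parent = {n: n for n in nodes}    # every point starts as its own root

    def find(n):
        """Follow parents up to the root of n's connected group."""
        while parent[n] != n:
            parent[n] = parent[parent[n]]   # shorten the chain as we go
            n = parent[n]
        return n

    tree = []
    for w, u, v in sorted(edges):     # lowest weight first
        root_u, root_v = find(u), find(v)
        if root_u != root_v:          # different groups, so no cycle
            parent[root_u] = root_v   # merge the two groups
            tree.append((u, v, w))
    return tree


edges = [(1, "A", "B"), (4, "A", "C"), (2, "B", "C"),
         (5, "B", "D"), (3, "C", "D")]
print(kruskal("ABCD", edges))   # [('A', 'B', 1), ('B', 'C', 2), ('C', 'D', 3)]
```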

You can prove that both of these give an optimal solution, with the same total weight. (If some of the weights are equal, the two trees themselves may differ.)

Dijkstra's algorithm

Given the same set of points with possible weighted connections, this time find the shortest paths to all the other points from one given point.

The method here is to add one point at a time, always the next closest to the starting point, by considering only the connections that leave the ends of the paths already found.
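A sketch of Dijkstra's method, under the assumption that no weight is negative (the method isn't guaranteed to work otherwise). As before, the dictionary-of-dictionaries graph, the function name dijkstra, and the example points are my own, and heapq is used to pull out the next closest point.

```python
import heapq


def dijkstra(graph, source):
    """graph is {point: {neighbor: weight}} with non-negative weights.
    Returns {point: shortest distance from source}.
    """
    dist = {}                     # points whose distance is settled
    queue = [(0, source)]         # heap of (distance so far, point)
    while queue:
        d, node = heapq.heappop(queue)    # next closest candidate
        if node in dist:
            continue                      # already settled by a shorter path
        dist[node] = d
        for nxt, w in graph[node].items():
            if nxt not in dist:
                heapq.heappush(queue, (d + w, nxt))
    return dist


graph = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 2, "D": 5},
    "C": {"A": 4, "B": 2, "D": 3},
    "D": {"B": 5, "C": 3},
}
print(dijkstra(graph, "A"))   # {'A': 0, 'B': 1, 'C': 3, 'D': 6}
```

Note that the direct A-C connection (weight 4) loses to the path through B (1 + 2 = 3).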

Huffman trees

(See Wikipedia: Huffman coding)

This section introduces a new class of problems, namely that of "coding": how to assign a code for each character (or word) given some properties of the characters (like how probable they are).

Huffman coding is a particularly well-known version of this, which uses what's called "variable length coding", in which the code words have different lengths, like in Morse code.

The problem is this: given a set of letters and their probabilities, find a binary code that represents those letters using the fewest bits on average.

Huffman coding uses a "prefix code", in which no code word is the prefix of any other code word. This avoids needing any boundary markers between codes ... but the idea is a bit tricky, and will need some explaining.

Given all that (whew!), Huffman coding generates a tree of prefix codes from the bottom up, by repeatedly taking the two least probable ones that are left, assigning one a "0" and the other a "1" in the binary tree, and joining them into a single node whose probability is their sum. The letters with the highest probability end up near the top of the tree, with the shortest code strings.
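Here's a small sketch of the bottom-up construction. Rather than building explicit tree nodes, each heap entry carries the partial codes for the letters under it, and merging two entries puts a "0" in front of one group's codes and a "1" in front of the other's. The function name huffman_codes, the counter used to break ties, and the sample probabilities are assumptions for illustration; it expects at least two letters.

```python
import heapq
from itertools import count


def huffman_codes(probabilities):
    """probabilities is {letter: probability}.
    Returns {letter: code}, where each code is a string of 0s and 1s.
    """
    tiebreak = count()   # keeps the heap from ever comparing dictionaries
    heap = [(p, next(tiebreak), {letter: ""})
            for letter, p in probabilities.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, codes0 = heapq.heappop(heap)   # least probable
        p1, _, codes1 = heapq.heappop(heap)   # next least probable
        merged = {s: "0" + c for s, c in codes0.items()}
        merged.update({s: "1" + c for s, c in codes1.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]


codes = huffman_codes({"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125})
print(codes)   # {'a': '0', 'b': '10', 'c': '110', 'd': '111'}
```

With these codes the average length is 0.5*1 + 0.25*2 + 0.125*3 + 0.125*3 = 1.75 bits per letter, compared with 2 bits for a fixed-length code on four letters.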

We'll have to do an example to see this; it's very cool.

http://cs.marlboro.edu/courses/spring2007/algorithms/notes/april_2
last modified Monday April 2 2007 10:20 am EDT
