assignments
due Tue Jan 27
getting started
- Make sure you can log into this site. If you aren't on my roster yet, email me at mahoney@marlboro.edu .
- Read the "Introduction" chapter in Problem Solving with Algorithms and Data Structures, which reviews python.
- Use the Fraction class as described in the text to find exactly \( \sum_{n=1}^{100}\frac{1}{n^2} \) . (Anything interesting about that number? Hint: multiply by six and take the square root.)
- As one more python coding practice, find the ratio of consecutive Fibonnaci numbers for large n.
- bonus: Make a plot of n vs Fib[n] , fit to a curve of the form \( y = b x^a \) , and discuss.
due Tue Feb 3
analysis and "big-O" notation
- Read the Analysis chapter of our textbook.
- To give you some practice making a plot (I suggest IPython as discussed, but you can use any tool you like, as long as you submit a plot as part of your homework) and to think about the O() notion :
- Suppose the number of steps for two algorithms are \( f(n) = 10 n + 0.1 n \log(n) \) and \( g(n) = 0.01 n^2 \).
- Plot these functions.
- What are their O() behaviors?
- Which algorithm is faster?
- A sorting algorithm takes 1 second to sort 1000 items. About how long do you think it will take to do 10,000 items if
- the algorithm is O(n**2)
- the algorithm is O(n log(n))
- Consider the problem of implementing is_substring(a,b) which is True if and only if a which has m characters is a substring of b which has n characters . For example, if a="is" and b="My name is Jim.", then the answer to is_substring(a,b) is True. On the other hand, is_substring("one", "alpha beta gamma"), then it returns False.
- If the only operation you can preform is comparing characters(i.e. a[i]==b[j]), invent an algorithm that implements this algorithm. Any algorithm will do - it does not have to be the most efficient.
- Describe what you think the O() behavior of your algorithm is, in terms of m and n.
- Implement and test your algorithm.
- Measure the time it takes your code to run on random strings of varying length, and create a plot that explicitly shows the runtime for some choices of n and m.
- Repeat the previous numerical experiment using Python's built-in operation "a in b" which also returns True or False. (This will run very fast for most strings - you can time it by putting in a large loop to do it many times, then dividing the time for the whole loop by the number of times through the loop. Or check out python's "timeit" library.)
- What do you think is the O() behavior of Python's "a in b" operation?
- Discuss what you think are sorts of strings a and b give the best, worst, and average behavior of your is_substring algorithm. What do you think the O() behaviors are for these cases?
due Tue Feb 10
linked lists and C
- Read the Data Structures chapter in the text, particularly the Stack, Queue, and LinkedList parts.
- Implement and test a doubly-linked list in python which can serve as both a stack and a queue. Your API should include at least methods for __str__, push, pop (i.e. the stack API), enqueue and dequeue (i.e. the queue API), reverse, nth (i.e. return the n'th element of the list), and a way to insert a new element into the list. Describe the expected O() behavior of each method.
- Since there was strong support for playing around with the C programming language, browse through the C references, docs, and tutorials that I've posted on the resources page. Then:
- Write a FizzBuzz (google it) program in C. Can you translate directly to and from python? How are they similar and different?
- Write a C program that asks for five words, one at a time, and then prints them back. Optionally, try different ways of storing those words :
- as characters in a single block of memory
- as an array of pointers to strings
- as data in a linked list
due Tue Feb 17
recursion
- Read the chapter on recursion.
- Use the turtle graphics module to recursively draw a Koch curve. (Coding exercise 8 in the text.)
- Pascal's Triangle is a famous array of binomial coefficients. (Google it if you are unfamiliar). Each term can be determined recursively from two smaller ones. Write a program to print out n lines of these coefficients. (Coding exercise 13 in the text.)
- Read the descriptions of coding exercises 14 (knapsack problem) and 15 (string edit distance). For one of these two, implement an algorithm using dynamic programming. (Both of these are well studied, so if you don't see how to proceed without help, google to find a discussion. If so, quote your sorces.) Discuss briefly the O() behavior of your approach.
due Tue Feb 24
searching
- For your last dynamic programming / recursion task, explore (i.e. understand, implement, test) the string edit distance as described at string edit distance. Make sure you can manually do the algorithm on short strings first.
- Read the sequential, binary, and hashing parts of the Searching chapter in the text.
- Test functions that do sequential and binary search on an ordered list of random integers, and show that binary is much faster for big lists, at least waving your hands at seeing the O(n) vs O(log n) difference. (That means I want you to demonstrate what's going on but am not requiring lots of data with graphs and curve fitting and all that.) You can use the textbook code (if so quote your sources), but make sure you understand what's going on.
- Write and test a program that stores strings in a hash table. All the details are up to you, including the choice of a hash function and how to handle collisions. Do explain what's going on, and what the pros and cons are of your choices. Again, you can use code from the textbook - or parts of their code - but if so quote your sources and make sure you understand it.
due Tue Mar 3
sorting
- Read the text sections on sorting. Also check out the wikipedia article on the same topic.
- Explain how shell sort and quicksort work. Demonstrate manually how these work on the list [5, 3, 2, 8, 10, 12, 8, 9]. (You can write code or pseudo-code if you like, but the point here is to show your understanding, not your coding.)
- Start working on your midterm project (see below). Decide which algorithms you'll analyze, and turn in some code that shows you've put in some time.
due Tue Mar 10
trees and heaps
- Read the first part of the trees chapter of the text, through the sections on heaps.
- Also check out several wikipedia articles on these topics, which are more complete and in several places I think clearer.
- Also check out python's built-in heap functions.
- Using any binary tree representation (in python) that you like, build a tree that represents the following algebraic expression : 2 + 3 * 7 + 3 * 5 * (3 + 6 * (4 + 5)). Implement a post-traversal (if you like buzz-words) recursive function that returns the value of such a tree, or of any sub-tree. (You do not need to write a general parser for such expressions, which would create such a tree from a string. What I'm asking for is that you build that a tree for this expression, using something like the insert() operations in the text, and then write something which calculates the value with a recursive function.)
- Explain what a "min heap" is, and why the various (peek, push, pop) operations have (O(1), O(log n), O(log n)) costs. As explained in the wikipedia article, there are two common 'heapify' algorithms which convert an entire collection to a heap, one of which is O(n log(n)), and one of which is O(n), as described in the wikipedia article. Optional: explain what's going on there.
- Using the built-in python methods from the heapq module, implement and test your own heapsort sorting routine. (Note: the one listed in the python docs is *not* the most efficient. For all the brownie points, explain why.)
due Fri Mar 13
midterm sorting project
- Choose two sorting algorithms to implement, test, analyze, and compare. Any of the ones we've discussed are fair game. If you want to use something we haven't talked about, check with me first.
- For each one, your mission is to
- Explain how it works - enough to convince me you understand it. Also explain what its O() behavior should be on random lists, and why. Discuss when (if ever) this algorithm might be an appropriate choice.
- Implement it in python or C
- Test it, i.e. show it does sort lists successfully.
- Measure with numerical experiments the run time or number of steps it takes when sorting lists of random numbers of (at least) length 1e1, 1e3, 1e5, 1e7. Show explicitly with a graph that it has the O() behavior you expect.
- Rules of the road:
- I suggest an ipython notebook as the format ... but that's your choice, as long as all the pieces are turned in.
- Don't work with other students or a tutor on this - I want to see your work. You may use online sources, but if so, quote exactly what you took from where. Do not just copy and paste code that implements one of these algorithms in one fell swoop - that is not a good way to demonstrate your understanding.
- This will be graded, and will be 1/3 of your term grade. Your mission is to convince me you understand what's going on. Code and write-up clarity (API, docstrings, following conventions for the language) matters. Clear and thorough for a simple algorithm is better than a partly-finished and partially-understood complicated one.
- If you need an extension, talk to me. Late work will be given a lower grade.
- If you have questions about any of this, ask me.
due Tue Apr 7
graph basics
- Start reading the graphs chapter in the text, at least through the concepts of depth-first and breadth-first searches.
- Here is a tree
and a cyclic graph
- Write down two computer representations for each of these, using (a) an adjacency matrix and (b) an adjacency list as defined in class and our text. (That's four representations altogether, using some computerish pseudo-code or Python or whatever.)
- Write down the order that the vertices will be visited for both of these, for both (a) a depth-first and (b) a breadth-first search (i.e. traversal), starting at "A" and "a". (You can do this by hand if you like or with code - just make sure you understand why the order is what you say it is.) Now do the same starting at "D" and "d".
- Define the "diameter" of a graph as a the longest of the shortest paths between vertices. Here's one algorithm that tries to find the diameter:
Pick any node, call it X. From X, do a depth-first search, and from
that find the maximum depth and a vertex furtherst down, call it A.
Now starting at A, do another depth first search, and find the
deepest node from A, call it B. Return the diameter (we hope) as
the depth from A to B.
Consider running this on a tree (an acyclic undirected connected graph)
and a cyclic graph, like the two illustrated above. Does the algorithm
work in either of these cases? Explain why you think so. What is it's
O() behavior? Extra credit: prove your claim in both cases.
- (Optional) Implement any of the tasks above.
due Tue Apr 14
graph algorithms 1
- Choose any one of the graph algorithms in the April 7 notes to explore in more detail. Implement it, test it on some appropriate sample graphs, and come to class Tuesday ready to discuss your code and understanding of that algorithm.
- You may use or adapt code from other sources (i.e. the textbook, mine, or other you find online); if so make sure you (a) quote your sources, (b) understand how it works, (c) test its API on a simple case. Or you can roll your own. In either case, try to organize your implementation in reusable pieces for (i) the graph itself, (ii) any required data structures (such as a binary heap or a queue), and (iii) the core of the algorithm itself.
due Tue Apr 21
graph algorithms 2
- A second assignment like last week: pick another of the algorithms we've been discussing to implement and test. The new ones are A* and game search - you can do one of those or another from last week's.
due Tue Apr 28
compression 1
- Explain what a "prefix code" is.
- Explain how a "Huffman code" is derived.
- Pick an ascii text source of at least a page. (And include it with the assignment.) Derive a Huffman code for that text, either with a program or by hand or some combination.
- Discuss how much compression could be achieved using your code rather than the original text.
due Tue May 5
compression 2
- Read about the LZW algorithm using the links on the Apr 28 notes.
- Describe in your own words how the compression and decompression work for the LZW algorithm. Are there any tricky cases to worry about.
- (You should be working on your final project ... so I didn't put in any coding for this week.)
due Fri May 8
final project
- Choose one graph algorithm to implement, test, and analyze with a numerical experiment, and compare. Any of the ones we've discussed are fair game - I suggest Dijkstra's, Floyd's, Prim's, or Kruskal's. If you want to use something we haven't talked about, check with me first.
- Your mission is
- Explain how it works - enough to convince me you understand it.
- Explain what its expected O() behavior should be on a random graph, and why. (We didn't talk much about this in class - you'll need to do a bit of reading and thinking.)
- Implement it in python or C
- Test it and demonstrated that it works.
- Measure its behavior with numerical experiments on graphs of different dizes. You can use either the run time or number of steps it takes. (Don't include the time it takes to create the random graphs, only the time for the algorithm itself.) This is the main point of the project - don't skip it.
- Rules of the road - same as for the midterm. Please re-read.
- Extensions to the following Monday are available on request. (But my grades are due soon after that. The following Wed noon is the school's hard deadline beyond which no work for this term can be accepted.)
term grade
- A place for Jim to submit final comments and a semester grade.