April 21

projects

First order of business: show us what you've done on your project.

decision trees

I'll walk through what a "decision tree" is (yet another machine learning model), using the material in chapter 17 of the textbook.

Decision trees are like the game of twenty questions ... the trick is figuring out which questions to ask, and in which order.

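One mathy way to choose the questions is to minimize the "average partition entropy": the entropy of each piece of the split, averaged with weights proportional to the size of each piece. The algorithm is "greedy". At each step it picks the question whose split has the lowest partition entropy, without looking further ahead.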
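Here is a minimal sketch of that greedy step, assuming each row is a Python dict and each "question" is "what is the value of this attribute?". The function names (`entropy`, `partition_entropy`, `labels_by_value`, `best_question`) and the toy rows are hypothetical, made up for illustration, not taken from the textbook.

```python
import math
from collections import Counter
from typing import Any, Dict, List, Sequence

def entropy(labels: Sequence[Any]) -> float:
    """Shannon entropy, in bits, of a collection of class labels."""
    total = len(labels)
    probabilities = [count / total for count in Counter(labels).values()]
    return sum(-p * math.log2(p) for p in probabilities)

def partition_entropy(subsets: List[List[Any]]) -> float:
    """Entropy of each subset, averaged with weights proportional to subset size."""
    total = sum(len(subset) for subset in subsets)
    return sum(entropy(subset) * len(subset) / total for subset in subsets)

def labels_by_value(rows: List[Dict[str, Any]], attribute: str, label: str) -> List[List[Any]]:
    """Ask one question ("what is row[attribute]?") and group the labels by the answer."""
    groups: Dict[Any, List[Any]] = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[label])
    return list(groups.values())

def best_question(rows: List[Dict[str, Any]], attributes: List[str], label: str) -> str:
    """Greedy step: the attribute whose split has the lowest partition entropy."""
    return min(attributes, key=lambda a: partition_entropy(labels_by_value(rows, a, label)))

# Hypothetical toy data, made up for this sketch.
rows = [
    {"outlook": "sunny", "windy": False, "play": "no"},
    {"outlook": "sunny", "windy": True,  "play": "no"},
    {"outlook": "rainy", "windy": False, "play": "yes"},
    {"outlook": "rainy", "windy": True,  "play": "yes"},
]
print(entropy([r["play"] for r in rows]))                          # 1.0
print(partition_entropy(labels_by_value(rows, "windy", "play")))    # 1.0
print(partition_entropy(labels_by_value(rows, "outlook", "play")))  # 0.0
print(best_question(rows, ["windy", "outlook"], "play"))            # outlook
```

Asking about "windy" leaves each half a 50/50 mix (1 bit of uncertainty left), while asking about "outlook" leaves none, so the greedy rule asks about "outlook" first.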

So the model is a tree of questions, each of which splits the data either on a category label or on a range of numeric values.
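One way to represent such a tree in code is sketched below, assuming two kinds of internal node: a category question ("what is the outlook?") and a numeric threshold question ("is humidity <= 70?"). The class names, the `classify` helper, and the hand-built example tree are all hypothetical, chosen for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, Union

@dataclass
class Leaf:
    prediction: Any                  # the answer once no more questions are needed

@dataclass
class CategorySplit:
    attribute: str                   # question: "what is row[attribute]?"
    branches: Dict[Any, "Tree"]      # one subtree per category value
    default: Any                     # fallback for a value never seen in training

@dataclass
class ThresholdSplit:
    attribute: str                   # question: "is row[attribute] <= threshold?"
    threshold: float
    below: "Tree"                    # subtree when the answer is yes
    above: "Tree"                    # subtree when the answer is no

Tree = Union[Leaf, CategorySplit, ThresholdSplit]

def classify(tree: Tree, row: Dict[str, Any]) -> Any:
    """Start at the root and answer one question per node until reaching a leaf."""
    while not isinstance(tree, Leaf):
        if isinstance(tree, CategorySplit):
            value = row.get(tree.attribute)
            if value not in tree.branches:
                return tree.default
            tree = tree.branches[value]
        else:
            tree = tree.below if row[tree.attribute] <= tree.threshold else tree.above
    return tree.prediction

# A hypothetical hand-built tree mixing both kinds of question.
tree = CategorySplit(
    "outlook",
    {"sunny": Leaf("no"),
     "rainy": ThresholdSplit("humidity", 70.0, below=Leaf("yes"), above=Leaf("no"))},
    default="yes",
)
print(classify(tree, {"outlook": "rainy", "humidity": 65}))  # yes
print(classify(tree, {"outlook": "rainy", "humidity": 90}))  # no
print(classify(tree, {"outlook": "foggy"}))                  # yes (the default)
```

Predicting is cheap: the cost is just the depth of the tree, one question per level.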

Here's a (very) short illustration.

pick a number from 1 to 10
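As a rough sketch of that game in code (one possible strategy, not necessarily the one in the original illustration): always ask "is it <= the midpoint?", which splits the remaining candidates as evenly as possible. The function name `guess_number` is hypothetical. A uniform choice among 10 numbers has entropy log2(10), about 3.32 bits, which is a lower bound on the average number of yes/no questions any strategy needs.

```python
import math
from typing import Tuple

def guess_number(secret: int, low: int = 1, high: int = 10) -> Tuple[int, int]:
    """Play twenty questions on [low, high] by always asking "is it <= mid?".
    Returns (the number found, how many questions were asked)."""
    questions = 0
    while low < high:
        mid = (low + high) // 2
        questions += 1
        if secret <= mid:      # answer "yes": keep the lower half
            high = mid
        else:                  # answer "no": keep the upper half
            low = mid + 1
    return low, questions

counts = [guess_number(n)[1] for n in range(1, 11)]
print(counts)  # [4, 4, 3, 3, 3, 4, 4, 3, 3, 3]
print(sum(counts) / len(counts), math.log2(10))  # average 3.4; log2(10) is about 3.32
```

Each question here is a node in a small decision tree over the numbers 1 to 10; the even splits keep the average (3.4 questions) close to the 3.32-bit lower bound.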

The "from scratch" textbook uses that idea to build this model from this data:

a decision tree : data & model
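Putting the pieces together, a tree can be grown recursively: pick the best question greedily, split the rows, and repeat on each piece until a piece is pure or no questions are left. The sketch below is an ID3-style builder under those assumptions, handling only categorical attributes. The names and the toy weather rows are hypothetical, not the textbook's code or dataset.

```python
import math
from collections import Counter
from typing import Any, Dict, List

Row = Dict[str, Any]

def entropy(labels: List[Any]) -> float:
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return sum(-(c / total) * math.log2(c / total) for c in Counter(labels).values())

def split(rows: List[Row], attribute: str) -> Dict[Any, List[Row]]:
    """Group rows by their value of one attribute."""
    groups: Dict[Any, List[Row]] = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row)
    return groups

def partition_entropy(rows: List[Row], attribute: str, label: str) -> float:
    """Size-weighted average entropy of the groups produced by splitting on attribute."""
    return sum(entropy([r[label] for r in group]) * len(group) / len(rows)
               for group in split(rows, attribute).values())

def build_tree(rows: List[Row], attributes: List[str], label: str) -> Any:
    """Greedy, ID3-style construction. A tree is either a plain label (a leaf)
    or a tuple (attribute, {value: subtree}, majority_label)."""
    labels = [r[label] for r in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attributes:
        return majority                       # pure node, or no questions left
    best = min(attributes, key=lambda a: partition_entropy(rows, a, label))
    remaining = [a for a in attributes if a != best]
    branches = {value: build_tree(group, remaining, label)
                for value, group in split(rows, best).items()}
    return (best, branches, majority)

def classify(tree: Any, row: Row) -> Any:
    """Follow the branches; fall back to the node's majority label for unseen values."""
    while isinstance(tree, tuple):
        attribute, branches, majority = tree
        if row.get(attribute) not in branches:
            return majority
        tree = branches[row[attribute]]
    return tree

# Hypothetical toy data, made up for this sketch (not the textbook's dataset).
rows = [
    {"outlook": "sunny",    "windy": False, "play": "no"},
    {"outlook": "sunny",    "windy": True,  "play": "no"},
    {"outlook": "overcast", "windy": False, "play": "yes"},
    {"outlook": "overcast", "windy": True,  "play": "yes"},
    {"outlook": "rainy",    "windy": False, "play": "yes"},
    {"outlook": "rainy",    "windy": True,  "play": "no"},
    {"outlook": "overcast", "windy": False, "play": "yes"},
]
tree = build_tree(rows, ["outlook", "windy"], "play")
print(tree)  # root question is "outlook"; only the "rainy" branch asks about "windy"
```

Using the majority label as a fallback is one simple choice for values the tree never saw during training; other implementations handle that case differently.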

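These models tend to overfit the training data. One way to reduce this is to build many trees, each on a random resample of the data, and average their predictions (or, for classification, take a vote) ... the "random forest" approach.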
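A minimal random-forest sketch follows, assuming the same greedy tree builder with two sources of randomness: each tree is trained on a bootstrap sample (rows drawn with replacement), and each split only considers a random subset of `n_candidates` attributes. Predictions are combined by majority vote, since the labels here are categories; for numeric targets you would average instead. All names, parameter defaults, and data are hypothetical choices for this sketch.

```python
import math
import random
from collections import Counter
from typing import Any, Dict, List

Row = Dict[str, Any]

def entropy(labels: List[Any]) -> float:
    total = len(labels)
    return sum(-(c / total) * math.log2(c / total) for c in Counter(labels).values())

def split(rows: List[Row], attribute: str) -> Dict[Any, List[Row]]:
    groups: Dict[Any, List[Row]] = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row)
    return groups

def partition_entropy(rows: List[Row], attribute: str, label: str) -> float:
    return sum(entropy([r[label] for r in group]) * len(group) / len(rows)
               for group in split(rows, attribute).values())

def build_random_tree(rows, attributes, label, rng, n_candidates):
    """Greedy tree, but each split only considers a random subset of attributes."""
    labels = [r[label] for r in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attributes:
        return majority
    candidates = rng.sample(attributes, min(n_candidates, len(attributes)))
    best = min(candidates, key=lambda a: partition_entropy(rows, a, label))
    remaining = [a for a in attributes if a != best]
    branches = {value: build_random_tree(group, remaining, label, rng, n_candidates)
                for value, group in split(rows, best).items()}
    return (best, branches, majority)

def classify(tree, row):
    while isinstance(tree, tuple):
        attribute, branches, majority = tree
        if row.get(attribute) not in branches:
            return majority
        tree = branches[row[attribute]]
    return tree

def random_forest(rows, attributes, label, n_trees=25, n_candidates=1, seed=0):
    """Train n_trees trees, each on a bootstrap sample of the rows."""
    rng = random.Random(seed)                     # seeded so results are repeatable
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]  # draw len(rows) rows with replacement
        forest.append(build_random_tree(sample, attributes, label, rng, n_candidates))
    return forest

def forest_predict(forest, row):
    """Majority vote over the trees' predictions."""
    votes = Counter(classify(tree, row) for tree in forest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data, made up for this sketch.
rows = [
    {"outlook": "sunny",    "windy": False, "play": "no"},
    {"outlook": "sunny",    "windy": True,  "play": "no"},
    {"outlook": "overcast", "windy": False, "play": "yes"},
    {"outlook": "overcast", "windy": True,  "play": "yes"},
    {"outlook": "rainy",    "windy": False, "play": "yes"},
    {"outlook": "rainy",    "windy": True,  "play": "no"},
    {"outlook": "overcast", "windy": False, "play": "yes"},
]
forest = random_forest(rows, ["outlook", "windy"], "play")
print(forest_predict(forest, {"outlook": "sunny", "windy": True}))
```

No single tree sees all of the data or gets to pick freely among all questions, so the trees make different mistakes, and the vote tends to smooth those out.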

Something like this could be implemented for your projects, and might be a good alternative to nearest neighbors for datasets of manageable size.

next ?

Discuss what to do for Thursday ... perhaps look at this example of Kaggle Titanic data using decision trees ?

aside

Building an end-to-end Speech Recognition model in PyTorch | Hacker News discussion
xkcd - garbage in, garbage out ... or maybe not.
avatarify lets users run deepfakes on live video calls | Hacker News discussion

https://cs.marlboro.college/cours/spring2020/data/notes/apr21
last modified Fri January 24 2025 7:40 pm