Data Science

Spring 2020

Feb 27 - k nearest neighbors (KNN)

This week we're doing our first ML (machine learning) example, looking at probably the simplest approach : find the known data points closest to the unknown one, and guess its value from theirs.

Your mission is to :

Read about the k nearest neighbors algorithm in places like

By a week from today, Fri March 6, make a Jupyter notebook that applies this algorithm to one of these datasets, and come to class ready to describe what you did and how well it worked.

I'll discuss the basic ideas today, and show an example and/or answer questions on Tuesday.

concepts

the problem

Suppose our data looks like this :

 x1    x2    x3     y        z
 ---------------    -----    --
 1.2   2.3   10     red      50
 2.1   3.2   11     red      80
 20    20    500    green    90
 30    30    600    green    95

The problem is to guess either y or z for a new point (x1,x2,x3) that isn't in the table, e.g. (5,4,20).

Guessing y is a "classification" (or "categorization") problem, e.g. "Is this book a romance, western, or science_fiction?"

Guessing z is a "regression" problem, e.g. "What will the temperature be on July 11?"

algorithm

TLDR

Which movie will you like? Well, you liked Terminator. So you'll probably like Terminator II.
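
To make the idea concrete, here is a minimal sketch in plain Python that uses the toy table from "the problem". Several things in it are my assumptions rather than part of these notes: "closest" is taken to mean straight-line (Euclidean) distance, y is guessed by majority vote, z is guessed by averaging, and the names nearest, guess_y and guess_z are made up for illustration. It also leaves the columns unscaled, so x3 dominates the distance.

```python
import math
from collections import Counter

# Toy data from the table above: ((x1, x2, x3), y, z)
data = [
    ((1.2, 2.3, 10), "red", 50),
    ((2.1, 3.2, 11), "red", 80),
    ((20, 20, 500), "green", 90),
    ((30, 30, 600), "green", 95),
]

def nearest(query, k):
    """Return the k rows whose (x1, x2, x3) are closest to query.

    Assumption: "closest" means Euclidean distance on the raw columns.
    """
    return sorted(data, key=lambda row: math.dist(row[0], query))[:k]

def guess_y(query, k=3):
    """Classification: the most common label among the k nearest rows."""
    labels = [y for _, y, _ in nearest(query, k)]
    return Counter(labels).most_common(1)[0][0]

def guess_z(query, k=3):
    """Regression: the mean z of the k nearest rows."""
    values = [z for _, _, z in nearest(query, k)]
    return sum(values) / len(values)

query = (5, 4, 20)
print(guess_y(query))        # red
print(guess_z(query, k=2))   # 65.0
```

With k=1 this is exactly the Terminator argument: find the single most similar known example and copy its answer. Larger k trades that for a vote or an average over several neighbors.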

related

discussion

https://cs.marlboro.college/cours/spring2020/data/notes/feb27
last modified Tue October 15 2024 6:09 am