Jim's Tutorials

Fall 2018 course site

I wrote a spam filter for text messages, using the SMS Spam Collection dataset from Kaggle:

https://www.kaggle.com/uciml/sms-spam-collection-dataset

I used methods from the chapter, but also from this notebook:

https://www.kaggle.com/adrianciconte55/99-prediction-accuracy/notebook

I used methods from this article to clean the text by removing punctuation and capitalization:

https://medium.com/@chaimgluck1/have-messy-text-data-clean-it-with-simple-lambda-functions-645918fcc2fc
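
A minimal sketch of that kind of cleaning, using only the Python standard library. The helper name `clean_text` and the sample messages are made up for illustration; they are not taken from the notebook or the article.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_text(text):
    """Hypothetical helper: lowercase, strip punctuation, collapse whitespace."""
    return " ".join(text.lower().translate(_PUNCT_TABLE).split())


# Made-up sample messages, not rows from the dataset.
messages = ["WINNER!! You have won a FREE prize.", "Ok, see you at 5?"]
cleaned = [clean_text(m) for m in messages]
print(cleaned)
# ['winner you have won a free prize', 'ok see you at 5']
```

With the messages in a pandas column, the same function could be applied through `Series.apply`, in the spirit of the lambda approach from the article.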

I just tested a linear regression model with cross-validation and obtained these results: precision 95.26%, recall 100.00%.
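
For reference, k-fold cross-validation splits the data into k folds and scores the model on each fold after training on the rest. Below is a minimal pure-Python sketch of the index splitting only. The helper name `k_fold_indices` is hypothetical, and the notebook may well have used a library routine instead.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


# Illustration with 10 samples and 3 folds (made-up sizes).
folds = list(k_fold_indices(n_samples=10, k=3))
for train_idx, test_idx in folds:
    print(len(train_idx), test_idx)
```

In practice the rows would be shuffled before splitting, and with ham-heavy data it is usually worth stratifying by label so every fold contains some spam.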

I should probably set up a confusion matrix, because the data was heavily weighted towards ham, and test a few more models.
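
A rough sketch of what such a confusion matrix would report, in plain Python. The labels below are invented to mimic a ham-heavy split and are not results from the notebook; `confusion_counts` is a hypothetical helper name.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive="spam"):
    """Count true/false positives and negatives, treating `positive` as the target class."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts


# Invented labels: 8 ham and 2 spam, to imitate the class imbalance.
y_true = ["ham"] * 8 + ["spam"] * 2
y_pred = ["ham"] * 7 + ["spam"] + ["spam", "ham"]

c = confusion_counts(y_true, y_pred)
precision = c["tp"] / (c["tp"] + c["fp"])  # of predicted spam, how much was spam
recall = c["tp"] / (c["tp"] + c["fn"])     # of actual spam, how much was caught
accuracy = (c["tp"] + c["tn"]) / len(y_true)
print(dict(c), precision, recall, accuracy)
```

In this toy case accuracy is 0.8 while precision and recall are both 0.5, which is the kind of gap that a ham-heavy dataset can hide when only one number is reported.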

Attachment: spam.ipynb (11K, last modified Wed Apr 24 2024 01:02 pm)