I wrote a spam filter for a text message dataset from the Kaggle website.

I used methods from the chapter, but also from this notebook:

I used methods from this article to clean the text by removing punctuation and capitalization:
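A minimal sketch of that kind of cleaning step, using only the Python standard library. The function name `clean_text` and the sample message are made up for illustration, and the article's actual method may differ (for example, this version only strips ASCII punctuation):

```python
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_text(message: str) -> str:
    """Lowercase a message and strip ASCII punctuation (hypothetical helper)."""
    return message.lower().translate(_PUNCT_TABLE)


# Hypothetical sample message, not taken from the dataset.
print(clean_text("WINNER!! Claim your FREE prize now..."))
# prints: winner claim your free prize now
```

Lowercasing first means "FREE", "Free" and "free" all end up as the same token once the text is vectorized.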

I just tested a linear regression model with cross-validation and obtained these results: precision 95.26%, recall 100.00%.
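For reference, here is a rough sketch of how cross-validated precision and recall can be computed with scikit-learn. It is not the code from the notebook: the toy messages are invented, and `LogisticRegression` on bag-of-words counts is a stand-in classifier chosen for the example, so the numbers it prints will not match the results above.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import make_pipeline

# Hypothetical toy messages; the real Kaggle data is not reproduced here.
messages = [
    "win a free prize now", "free cash claim now",
    "urgent free offer call now", "claim your free reward",
    "win cash prize today", "free entry win now",
    "are we still on for lunch", "see you at home tonight",
    "can you call me later", "running late see you soon",
    "thanks for dinner last night", "what time is the meeting",
]
labels = [1] * 6 + [0] * 6  # 1 = spam, 0 = ham

# Bag-of-words features feeding a classifier (a stand-in model, see above).
model = make_pipeline(CountVectorizer(), LogisticRegression())

# With cv=3, each message is predicted by a model fitted on the other folds.
predictions = cross_val_predict(model, messages, labels, cv=3)

# zero_division=0 avoids a warning if no message is predicted as spam.
precision = precision_score(labels, predictions, zero_division=0)
recall = recall_score(labels, predictions, zero_division=0)
print(f"Precision: {precision:.2%}  Recall: {recall:.2%}")
```

Putting the vectorizer inside the pipeline means the vocabulary is rebuilt on each training fold, so nothing from the held-out messages leaks into training.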

I should probably set up a confusion matrix, because the data was heavily weighted towards ham, and test a few more models.
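A small sketch of what the confusion matrix would add, assuming scikit-learn's `confusion_matrix` and a made-up set of ten labels that is mostly ham. The labels and predictions below are hypothetical, not results from the notebook:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels for ten messages: 0 = ham, 1 = spam.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]

# Rows are true classes, columns are predicted classes, both ordered ham, spam.
matrix = confusion_matrix(y_true, y_pred, labels=[0, 1])
tn, fp, fn, tp = matrix.ravel()

accuracy = (tp + tn) / matrix.sum()
precision = tp / (tp + fp)
recall = tp / (tp + fn)

print(matrix)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

In this made-up case accuracy is 0.80 while precision and recall are both 0.50, which is the kind of gap that imbalanced data can hide when only a single summary score is reported.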

attachments

		last modified	size
	spam.ipynb	Wed Jul 03 2024 05:21 am	11K
