I wrote a spam filter for text messages, using the SMS Spam Collection dataset from Kaggle:
https://www.kaggle.com/uciml/sms-spam-collection-dataset
I used methods from the chapter, but also from this notebook:
https://www.kaggle.com/adrianciconte55/99-prediction-accuracy/notebook
I used methods from this article to clean the text by removing punctuation and capitalization:
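A minimal sketch of that kind of cleaning, using only the Python standard library. The helper name `clean_text` is made up for illustration, and it only handles ASCII punctuation; the actual notebook may do this differently.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_text(message: str) -> str:
    """Lowercase a message and strip ASCII punctuation (hypothetical helper)."""
    no_punct = message.translate(_PUNCT_TABLE)
    # split() + join() collapses any runs of whitespace left behind.
    return " ".join(no_punct.lower().split())


print(clean_text("WINNER!! You've won a FREE prize... Call now!"))
# prints: winner youve won a free prize call now
```

Note that stripping punctuation also removes apostrophes, so "You've" becomes "youve".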
I just tested a linear regression model with cross-validation and obtained these results: Precision: 95.26%, Recall: 100.00%.
I should probably set up a confusion matrix, because the data was heavily weighted towards ham, and test a few more models.
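A rough sketch of what the confusion matrix would show, in plain Python. It assumes "spam" is treated as the positive class, and the labels below are made up for illustration, not taken from the dataset. The function names are hypothetical.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive="spam"):
    """Return (tp, fp, fn, tn), treating `positive` as the positive class."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts["tp"], counts["fp"], counts["fn"], counts["tn"]


def precision_recall(tp, fp, fn):
    """Precision = tp / (tp + fp); recall = tp / (tp + fn)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up labels, weighted towards ham like the real data.
y_true = ["ham", "ham", "ham", "ham", "ham", "ham", "spam", "spam"]
y_pred = ["ham", "ham", "ham", "ham", "ham", "spam", "spam", "spam"]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision, recall = precision_recall(tp, fp, fn)
print(f"tp={tp} fp={fp} fn={fn} tn={tn}")
print(f"precision={precision:.2%} recall={recall:.2%}")
```

With imbalanced data the four counts say more than a single accuracy figure: a model that labels everything ham would still look accurate, but its tp count would be zero.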
file | last modified | size
spam.ipynb | Wed Nov 27 2024 04:21 am | 11K