binary text classification
Got a blind email this week from Adam Carrigan (carrigan@berkeley.edu):
Hi Jim,
My team and I are currently working on an open-source research project
out of UC Berkeley that makes it very simple for developers to use
the power of artificial neural networks in their projects.
I found your profile on GitHub and thought you would be the ideal
person to try MindsDB out and provide feedback so we can make it
even better.
If you have any feedback feel free to email me back or to
file a ticket on GitHub.
You can find our Repo here: https://github.com/mindsdb/main#mindsdb
and our website here http://mindsdb.com
Kind regards,
Adam & Jorge
I tried the Titanic dataset competition. However, I know next to nothing about how to 'massage' data using pandas, so I followed this YouTube video and then tried to fine-tune the model with for loops to find the optimal parameter value (a sketch of that kind of loop is below). Files are in my folder.
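Roughly the kind of tuning loop I mean, sketched from memory; the feature columns and the max_depth range here are assumptions for illustration, not what's actually in my notebook:

    # hypothetical sketch: loop over tree depths and keep the best CV score
    import pandas as pd
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    train = pd.read_csv("train.csv")  # Kaggle Titanic training file
    X = train[["Pclass", "SibSp", "Parch", "Fare"]].fillna(0)  # a few numeric columns
    y = train["Survived"]

    best_depth, best_score = None, 0.0
    for depth in range(1, 11):
        score = cross_val_score(DecisionTreeClassifier(max_depth=depth), X, y, cv=5).mean()
        if score > best_score:
            best_depth, best_score = depth, score
    print(best_depth, round(best_score, 3))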
Wrote a spam filter for text messages, using this dataset from the Kaggle website:
https://www.kaggle.com/uciml/sms-spam-collection-dataset
I used methods from the chapter, but also from this notebook:
https://www.kaggle.com/adrianciconte55/99-prediction-accuracy/notebook
I used methods from this article to clean the text by removing punctuation and lowercasing everything (a rough sketch follows the link):
https://medium.com/@chaimgluck1/have-messy-text-data-clean-it-with-simple-lambda-functions-645918fcc2fc
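Something like this lambda-style cleanup, per that article; the column name "v2" and the latin-1 encoding match the Kaggle CSV, but treat the details as assumptions:

    # hypothetical sketch: lowercase and strip punctuation from each message
    import string
    import pandas as pd

    df = pd.read_csv("spam.csv", encoding="latin-1")
    df["clean"] = df["v2"].apply(
        lambda msg: msg.lower().translate(str.maketrans("", "", string.punctuation))
    )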
I just tested a linear regression model with cross-validation and obtained these results: precision 95.26%, recall 100.00%.
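A sketch of how the cross-validated scoring could look; I'm using LogisticRegression as the linear model and plain bag-of-words features here, so this is an approximation of the notebook, not a copy:

    # hypothetical sketch: cross-validated precision/recall on bag-of-words features
    import pandas as pd
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_predict
    from sklearn.metrics import precision_score, recall_score

    df = pd.read_csv("spam.csv", encoding="latin-1")
    X = CountVectorizer().fit_transform(df["v2"])
    y = (df["v1"] == "spam").astype(int)  # spam = positive class

    pred = cross_val_predict(LogisticRegression(max_iter=1000), X, y, cv=5)
    print("precision:", precision_score(y, pred))
    print("recall:", recall_score(y, pred))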
Also tried a stochastic gradient descent classifier with slightly better results, and tried graphing a precision/recall curve, but that failed.
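For next time, the curve can be plotted from cross-validated decision scores; a sketch under the same assumed setup:

    # hypothetical sketch: precision/recall curve from SGDClassifier decision scores
    import pandas as pd
    import matplotlib.pyplot as plt
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import SGDClassifier
    from sklearn.model_selection import cross_val_predict
    from sklearn.metrics import precision_recall_curve

    df = pd.read_csv("spam.csv", encoding="latin-1")
    X = CountVectorizer().fit_transform(df["v2"])
    y = (df["v1"] == "spam").astype(int)

    scores = cross_val_predict(SGDClassifier(), X, y, cv=5, method="decision_function")
    precisions, recalls, _ = precision_recall_curve(y, scores)
    plt.plot(recalls, precisions)
    plt.xlabel("recall")
    plt.ylabel("precision")
    plt.show()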
Should probably set up a confusion matrix, since the data is heavily weighted towards ham, and test a few more models.
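A quick confusion-matrix sketch to check the ham/spam skew, under the same assumed setup as above:

    # hypothetical sketch: confusion matrix from cross-validated predictions
    import pandas as pd
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import SGDClassifier
    from sklearn.model_selection import cross_val_predict
    from sklearn.metrics import confusion_matrix

    df = pd.read_csv("spam.csv", encoding="latin-1")
    X = CountVectorizer().fit_transform(df["v2"])
    y = (df["v1"] == "spam").astype(int)

    pred = cross_val_predict(SGDClassifier(), X, y, cv=5)
    print(confusion_matrix(y, pred))  # rows = true (ham, spam), cols = predicted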
file | last modified | size
spam.csv | Fri Dec 13 2024 08:30 am | 504K
spam.ipynb | Fri Dec 13 2024 08:30 am | 30K