Jim's Tutorials

Fall 2018 course site

oct 11

binary text classification ....

aside

a blind email this week from Adam Carrigan (carrigan@berkeley.edu)

Hi Jim,

My team and I are currently working on an open-source research project 
out of UC Berkeley that makes it very simple for developers to use 
the power of artificial neural networks in their projects.

I found your profile on GitHub and thought you would be the ideal 
person to try MindsDB out and provide feedback so we can make it 
even better.  

If you have any feedback feel free to email me back or to 
file a ticket on GitHub.

You can find our Repo here: https://github.com/mindsdb/main#mindsdb

and our website here http://mindsdb.com 

Kind regards,

Adam & Jorge 

Jimmy

I tried the Titanic dataset competition. However, I know nothing about how to 'massage' data using pandas, so I followed this YouTube video and then tried to fine-tune the model, using for loops to find the optimal parameter value. Files are in my folder.
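As a rough illustration of tuning a parameter with a plain for loop, here is a minimal sketch. Everything in it is an assumption made for the example: the toy `samples`, the one-parameter threshold "model", and the helper names `accuracy` and `best_parameter`. It is not the model or the data from the Titanic work.

```python
# Toy data, made up for illustration: (feature value, label) pairs.
samples = [(0.1, 0), (0.35, 0), (0.4, 1), (0.6, 0), (0.7, 1), (0.9, 1)]

def accuracy(threshold, data):
    """Fraction of samples classified correctly by 'predict 1 if x >= threshold'."""
    correct = sum(1 for x, label in data if (1 if x >= threshold else 0) == label)
    return correct / len(data)

def best_parameter(candidates, data):
    """Try each candidate value in a plain for loop and keep the best-scoring one."""
    best_value, best_score = None, -1.0
    for value in candidates:
        score = accuracy(value, data)
        if score > best_score:  # strict '>' keeps the first of any tied candidates
            best_value, best_score = value, score
    return best_value, best_score

candidates = [0.2, 0.4, 0.5, 0.65, 0.8]  # hypothetical parameter values to try
best_value, best_score = best_parameter(candidates, samples)
print(best_value, round(best_score, 3))
```

Scoring on the same data used for fitting tends to overstate performance, so in practice the score inside the loop would usually come from held-out or cross-validated data.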

Nate

I wrote a spam filter for text messages, using a dataset from the Kaggle website:

https://www.kaggle.com/uciml/sms-spam-collection-dataset

I used methods from the chapter, but also from this notebook:

https://www.kaggle.com/adrianciconte55/99-prediction-accuracy/notebook

I used methods from this article to clean the text by removing punctuation and capitalization:

https://medium.com/@chaimgluck1/have-messy-text-data-clean-it-with-simple-lambda-functions-645918fcc2fc
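A minimal sketch of that kind of cleaning step, using only the standard library. The name `clean` and the sample messages are made up for illustration, and this is not necessarily the exact lambda from the article.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_DROP_PUNCT = str.maketrans("", "", string.punctuation)

# Lowercase the text, strip punctuation, and collapse runs of whitespace.
clean = lambda text: " ".join(text.lower().translate(_DROP_PUNCT).split())

# Made-up example messages, not rows from the dataset.
messages = ["WINNER!! Claim your FREE prize now...", "Ok, see you at 5."]
cleaned = [clean(m) for m in messages]
print(cleaned)
```

With pandas, the same function could be applied to a text column through `Series.apply(clean)`.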

I just tested a linear regression model with cross validation and obtained these results: Precision: 95.26% Recall: 100.00%
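For reference, precision and recall come straight from counts of true positives, false positives, and false negatives. The sketch below uses hypothetical counts, not the counts behind the figures above, and the function name `precision_recall` is made up for the example.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # of everything flagged, how much was right
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # of everything that should be flagged, how much was found
    return precision, recall

# Hypothetical counts chosen only to show the arithmetic.
p, r = precision_recall(tp=90, fp=10, fn=0)
print(f"Precision: {p:.2%}  Recall: {r:.2%}")
```

A recall of 100% means there were no false negatives. Which class counts as "positive" (spam or ham) changes both numbers, so it is worth stating alongside the results.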

I also tried the stochastic gradient filter, with slightly better results, and tried graphing a precision/recall matrix, but that failed.
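If the intended graph was a precision/recall curve (an assumption, since the note says "matrix"), the points to plot can be computed by sweeping a decision threshold over the classifier's scores. This is a minimal sketch with made-up scores and labels; `pr_points` is a hypothetical helper, not a library function.

```python
def pr_points(scores, labels, thresholds):
    """Return (threshold, precision, recall) for each threshold; label 1 is the positive class."""
    points = []
    for t in thresholds:
        preds = [1 if s >= t else 0 for s in scores]
        tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
        fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
        fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
        # Convention chosen here: precision is 1.0 when nothing is predicted positive.
        precision = tp / (tp + fp) if (tp + fp) else 1.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        points.append((t, precision, recall))
    return points

# Made-up decision scores and true labels, for illustration only.
scores = [-2.0, -0.5, 0.3, 0.8, 1.5, 2.2]
labels = [0, 0, 1, 0, 1, 1]
points = pr_points(scores, labels, [-1.0, 0.0, 1.0])
for t, p, r in points:
    print(f"threshold={t:+.1f}  precision={p:.2f}  recall={r:.2f}")
```

The resulting precision and recall columns can then be handed to any plotting library, one against the other or both against the threshold.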

I should probably set up a confusion matrix, because the data was heavily weighted towards ham, and test a few more models.
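A small sketch of why a confusion matrix helps when one class dominates. The labels below are made up (8 ham, 2 spam) and the "model" simply predicts ham every time. `confusion_matrix` here is a hand-rolled helper written for this example, not a library call.

```python
from collections import Counter

def confusion_matrix(actual, predicted, labels=("ham", "spam")):
    """Rows are actual classes, columns are predicted classes, in the order given by labels."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]

# Made-up, ham-heavy labels for illustration.
actual = ["ham"] * 8 + ["spam"] * 2
predicted = ["ham"] * 10  # a lazy model that always predicts ham

matrix = confusion_matrix(actual, predicted)
accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(actual)
print(matrix)    # [[ham->ham, ham->spam], [spam->ham, spam->spam]]
print(accuracy)
```

Here the accuracy looks respectable even though no spam is caught at all, which the bottom row of the matrix makes obvious.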

last modified Fri December 13 2024 8:30 am

attachments

  file        last modified             size
  spam.csv    Fri Dec 13 2024 08:30 am  504K
  spam.ipynb  Fri Dec 13 2024 08:30 am  30K