Fall 2018

Final Project: Titanic

My goal is to try different approaches to how I engineer features from the dataset and to see which algorithm fits this dataset best.


I attached the Jupyter Notebook that I edited on Kaggle using its kernel utility. I mapped the Gender feature so that female becomes 1 and male becomes 0, because I saw that females were more likely to survive than males. Then I filled the missing values in the Age feature with 30, because that is the median age in the whole dataset. Next, I binned the Age feature into 1, 2, 3, and 4. Looking at the dataset, I concluded that the older a passenger was, the lower their chance of survival. That is why people younger than 26 are represented as 4 and people older than 62 are represented as 1. Lastly, I binned the Fare feature, with 1 for the cheapest and free tickets and 4 for the most expensive tickets.
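
The feature preparation described above could be sketched roughly as follows. This is only an illustration, not the code from the attached notebook. The column names `Sex`, `Age`, and `Fare` follow the Kaggle Titanic CSV. The middle age cutoff (`middle=44`) and the use of quartiles for the Fare bins are assumptions, since the write-up gives only the 26 and 62 age cutoffs. The helper names `age_to_bin` and `prepare_features` are hypothetical.

```python
import pandas as pd


def age_to_bin(age, young=26, middle=44, old=62):
    """Map an age to a bin from 4 (youngest) down to 1 (oldest).

    Only the 26 and 62 cutoffs come from the write-up; `middle` is a
    hypothetical value chosen for illustration.
    """
    if age < young:
        return 4
    if age < middle:
        return 3
    if age <= old:
        return 2
    return 1


def prepare_features(df, age_fill=30):
    """Return a DataFrame holding the three engineered features."""
    out = pd.DataFrame(index=df.index)
    # female -> 1, male -> 0
    out["Gender"] = df["Sex"].map({"female": 1, "male": 0})
    # Fill missing ages with 30 (the median reported in the write-up), then bin.
    out["AgeBin"] = df["Age"].fillna(age_fill).apply(age_to_bin)
    # Assumption: quartile bins, 1 = cheapest (including free), 4 = most expensive.
    # This assumes Fare has no missing values.
    out["FareBin"] = pd.qcut(df["Fare"], 4, labels=[1, 2, 3, 4]).astype(int)
    return out
```

With the Kaggle training file loaded as `train = pd.read_csv("train.csv")`, calling `prepare_features(train)` would give the three columns used for modeling.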

With only 3 features, I trained 4 SVM models, each with different parameters. The results show that the polynomial model gives me 80% correct predictions. I am proud of that; previously I could hit the 88% mark with the YouTube tutorial guide.
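
A comparison of several SVM configurations like the one described above might look like the sketch below, using scikit-learn's `SVC`. The write-up only names the polynomial model, so the other three kernels, the `C`, `degree`, and `gamma` values, and the use of 5-fold cross-validation for scoring are all assumptions. The function name `compare_svm_models` is hypothetical.

```python
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC


def compare_svm_models(X, y, cv=5):
    """Return the mean cross-validated accuracy for four SVM configurations.

    The kernel choices and hyperparameters here are illustrative
    assumptions, not the settings from the attached notebook.
    """
    models = {
        "linear": SVC(kernel="linear", C=1.0),
        "poly": SVC(kernel="poly", degree=3, C=1.0, gamma="scale"),
        "rbf": SVC(kernel="rbf", C=1.0, gamma="scale"),
        "sigmoid": SVC(kernel="sigmoid", C=1.0, gamma="scale"),
    }
    scores = {}
    for name, model in models.items():
        # cross_val_score uses accuracy by default for classifiers.
        scores[name] = cross_val_score(model, X, y, cv=cv).mean()
    return scores
```

Here `X` would hold the three binned features (Gender, Age, Fare) and `y` the `Survived` column. Picking the entry with the highest score is one simple way to choose among the four models.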


I love talking about machine learning and how it could improve the use of big data. Recently, I was looking for a summer internship, and machine learning people are in huge demand on the market. So I hope there will be more tutorials or, even better, a class that focuses specifically on this field. However, I am aware that the tutorial did not run the way we expected at the beginning because of a lot of issues throughout the semester. Also, this is my first ever tutorial, and I had no idea what a tutorial was until now. Overall, I feel that learning Jupyter Notebook and machine learning models as tools to make something is amazing.

Jim says

I've converted it to .html (jupyter nbconvert kernel.ipynb) and uploaded it here to make it easier to view.

attachments

File          Last modified             Size
kernel.html   Sat Feb 04 2023 06:06 pm  272K
kernel.ipynb  Sat Feb 04 2023 06:06 pm  8.7K