My goal was to try different approaches to how I massage features from the dataset and to see which algorithm fits such a dataset best.
I have attached the Jupyter Notebook that I edited in Kaggle using its kernel utility. I mapped the Gender feature so that female and male become 1 and 0 respectively, because I saw that females were more likely to survive than males. Then I filled the missing values in the Age feature with 30, because that is the median age in the whole data set. Next, I binned the Age feature into 1, 2, 3, 4. Looking at the dataset, I concluded that the older you are, the lower your chance of survival. That is why people who are younger than 26 are represented as 4 and people who are older than 62 are represented as 1. Lastly, I binned the Fare feature, with 1 for the cheapest and free tickets and 4 for the most expensive tickets.
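The steps above can be sketched in pandas roughly as follows. This is only an illustration on a tiny made-up sample, not the code from my notebook: the column names follow Kaggle's Titanic data, the helper names `age_bin` and `fare_bin` are hypothetical, and the inner age cut point (44) and all fare cut points (8, 15, 31) are assumptions, since only the 26 and 62 boundaries are stated above.

```python
import numpy as np
import pandas as pd

# Tiny made-up sample; the real notebook works on Kaggle's Titanic data.
df = pd.DataFrame({
    "Sex":  ["female", "male", "male", "female", "male"],
    "Age":  [22.0, np.nan, 70.0, 40.0, 55.0],
    "Fare": [7.25, 0.0, 71.28, 13.0, 30.0],
})

# Gender: female -> 1, male -> 0.
df["Sex"] = df["Sex"].map({"female": 1, "male": 0})

# Fill missing ages with 30, the median quoted in the text.
df["Age"] = df["Age"].fillna(30)


def age_bin(age):
    """Younger than 26 -> 4, older than 62 -> 1 (from the text)."""
    if age < 26:
        return 4
    if age <= 44:  # assumed inner cut point, not from the notebook
        return 3
    if age <= 62:
        return 2
    return 1


def fare_bin(fare):
    """1 = cheapest or free, 4 = most expensive; cut points are assumed."""
    if fare <= 8:
        return 1
    if fare <= 15:
        return 2
    if fare <= 31:
        return 3
    return 4


df["AgeBin"] = df["Age"].apply(age_bin)
df["FareBin"] = df["Fare"].apply(fare_bin)

print(df[["Sex", "AgeBin", "FareBin"]])
```

Writing the bins as small functions keeps the boundary handling explicit, so it is easy to see exactly which ages land in group 4 or group 1.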
With only 3 features, I trained 4 SVM models, each with different parameters. The results show that the polynomial model gives me 80% correct predictions, which I am proud of, because previously I could hit the 88% mark with the YouTube tutorial guide.
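A comparison like this could look roughly like the sketch below, using scikit-learn's `SVC`. The four kernels chosen here (linear, polynomial, RBF, sigmoid) are an assumption about what "different parameters" means, and the data is synthetic with a made-up label rule, so the printed accuracies say nothing about the real Titanic results.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 200

# Synthetic stand-in for the three engineered features.
X = np.column_stack([
    rng.integers(0, 2, n),  # Sex: 0 or 1
    rng.integers(1, 5, n),  # AgeBin: 1..4
    rng.integers(1, 5, n),  # FareBin: 1..4
])
# Made-up label rule, only so the models have something to learn.
y = ((X[:, 0] == 1) | (X[:, 2] == 4)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Four SVM variants; the kernel choices are assumptions.
models = {
    "linear": SVC(kernel="linear"),
    "poly": SVC(kernel="poly", degree=3),
    "rbf": SVC(kernel="rbf"),
    "sigmoid": SVC(kernel="sigmoid"),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # score() returns the fraction of correct predictions on the held-out set.
    scores[name] = model.score(X_test, y_test)
    print(f"{name}: {scores[name]:.2f}")
```

Scoring every model on the same held-out split makes the accuracies directly comparable across the four variants.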
I love talking about machine learning and how it could improve the use of big data. Recently, I was looking for a summer internship, and machine learning people are in huge demand on the market. So I hope there will be more tutorials, or even better a class, that talk specifically about this field. However, I am aware that the tutorial did not run the way we expected in the beginning, due to a lot of issues throughout the semester. Also, this was my first ever tutorial, and I had no idea what a tutorial was until now. Overall, I feel that learning Jupyter Notebook and machine learning models as tools to make something is amazing.
I've converted it to .html (jupyter nbconvert kernel.ipynb) and uploaded it here to make it easier to view.
file | last modified | size
kernel.html | Thu Dec 26 2024 05:52 pm | 272K
kernel.ipynb | Thu Dec 26 2024 05:52 pm | 8.7K