Exploring the World's Most Renowned Shipwreck 🚢

In 1912, the Titanic set off on its first voyage across the Atlantic Ocean, carrying passengers ranging from the wealthy elite to emigrants seeking a new life. Tragically, the ship collided with an iceberg and sank, resulting in the loss of over 1,500 lives. This disaster not only shook the world but also sparked discussions about maritime safety and the social dynamics of the time.

This repository explores the factors affecting passenger survival on the Titanic and aims to build a predictive model that estimates survival probabilities from the available passenger characteristics. The dataset contains detailed records of the passengers aboard, including age, gender, passenger class, fare paid, and survival outcome. However, some key data points are missing, particularly in features such as age and cabin, which makes it harder to build accurate predictive models.

In this project, two different approaches are explored and compared based on model performance:

1. Removing Missing Data: This method involves deleting rows with missing values to clean the dataset. While it ensures that the remaining data is complete, it reduces the number of observations available for analysis.
2. Filling Missing Data: This approach fills in missing values in an effort to retain more data and potentially enhance the model's performance.
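
The two approaches above can be sketched with pandas on a tiny made-up sample. This is only an illustration: the sample values are hypothetical, and the fill strategy shown here (median for `Age`, an `"Unknown"` placeholder for `Cabin`) is an assumption, not necessarily what the notebook does.

```python
import pandas as pd

# Tiny hypothetical sample that mimics a few Titanic columns; not real passenger data.
sample = pd.DataFrame(
    {
        "Age": [22.0, None, 26.0, 35.0],
        "Cabin": ["C85", None, None, "C123"],
        "Fare": [7.25, 71.28, 7.92, 53.10],
    }
)

# Approach 1: drop every row that has at least one missing value.
dropped = sample.dropna()

# Approach 2: keep all rows and fill the gaps instead.
# Median and "Unknown" are assumed strategies chosen for illustration.
filled = sample.copy()
filled["Age"] = filled["Age"].fillna(filled["Age"].median())
filled["Cabin"] = filled["Cabin"].fillna("Unknown")

print(len(dropped), len(filled))
```

Dropping rows shrinks the sample, while filling keeps every row at the cost of introducing estimated values, which is the trade-off the comparison in this project is about.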

Overall, the second approach (filling in missing values) produced more robust models (Random Forest, XGBoost). A version of the developed model was also submitted to Kaggle's Titanic - Machine Learning from Disaster competition, where it ranked in the top 9.38% (1316 out of 14036).

Given that the true survival status of Titanic passengers is publicly available, some higher-ranked entries likely used manually crafted labels to achieve near-perfect accuracy. The actual position of the provided model could therefore be higher if all competitors strictly followed the competition rules. The notebook is also available on Kaggle.

Note that the leaderboard score (0.78947) was achieved with a slightly modified ensemble model and different parameter tuning than the provided notebook (0.78468). These exact details are not shared here, to encourage independent experimentation and to keep you from overfitting. 😜

Dataset Description

The Titanic dataset used in this project is divided into two main files: train.csv and test.csv. Below is a brief description of each file:

- `train.csv`: The primary training dataset, containing the labeled data used to train the model. It includes 891 records and 12 columns, with the `Survived` column indicating whether a passenger survived (1) or not (0). This dataset is used to build and validate the machine learning model.
- `test.csv`: The test dataset, containing 418 records and 11 columns. It does not have the `Survived` column. The goal is to predict `Survived` using a model trained on the provided training data.

The competition's data also includes the `gender_submission.csv` file, an example submission file (not the true labels) provided by Kaggle. This file shows the expected format of the predictions, containing only the `PassengerId` and `Survived` columns.
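
As a rough sketch of that two-column format, the snippet below writes predictions with Python's standard `csv` module. The helper name `write_submission` and the IDs and predictions passed to it are hypothetical and only serve to show the layout.

```python
import csv
import io


def write_submission(passenger_ids, predictions, handle):
    """Write predictions in the two-column PassengerId/Survived layout."""
    if len(passenger_ids) != len(predictions):
        raise ValueError("need exactly one prediction per passenger")
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["PassengerId", "Survived"])
    for pid, pred in zip(passenger_ids, predictions):
        # Cast to int so that values such as 1.0 or True are written as 1.
        writer.writerow([int(pid), int(pred)])


# Hypothetical IDs and predictions, for illustration only.
buffer = io.StringIO()
write_submission([892, 893, 894], [0, 1, 0], buffer)
print(buffer.getvalue())
```

To produce a file instead of an in-memory buffer, pass a handle opened with `open("submission.csv", "w", newline="")`; the file name is again just an example.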

The following table provides a detailed description of the columns found in train.csv and test.csv:

| Column Name | Data Type | Description |
| --- | --- | --- |
| `PassengerId` | Integer | Unique identifier for each passenger |
| `Survived` | Integer | Survival status (0 = No, 1 = Yes) |
| `Pclass` | Integer | Passenger class (1 = 1st, 2 = 2nd, 3 = 3rd) |
| `Name` | String | Name of the passenger |
| `Sex` | String | Gender of the passenger (`male`, `female`) |
| `Age` | Float | Age of the passenger |
| `SibSp` | Integer | Number of siblings/spouses aboard the Titanic |
| `Parch` | Integer | Number of parents/children aboard the Titanic |
| `Ticket` | String | Ticket number |
| `Fare` | Float | Passenger fare |
| `Cabin` | String | Cabin number |
| `Embarked` | String | Port of embarkation (`C` = Cherbourg; `Q` = Queenstown; `S` = Southampton) |
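
One way to guard against loading the wrong file is to compare a CSV header against the column names listed above. The sketch below does that in plain Python; the constant and function names (`TRAIN_COLUMNS`, `TEST_COLUMNS`, `missing_columns`) are hypothetical, and the partial header is made up for the example.

```python
# Column names taken from the table above.
TRAIN_COLUMNS = [
    "PassengerId", "Survived", "Pclass", "Name", "Sex", "Age",
    "SibSp", "Parch", "Ticket", "Fare", "Cabin", "Embarked",
]
# test.csv has the same columns minus the label.
TEST_COLUMNS = [name for name in TRAIN_COLUMNS if name != "Survived"]


def missing_columns(header, expected):
    """Return the expected column names that are absent from a CSV header."""
    present = set(header)
    return [name for name in expected if name not in present]


# Made-up partial header, for illustration only.
header = ["PassengerId", "Pclass", "Name", "Sex", "Age"]
print(missing_columns(header, TEST_COLUMNS))
```

An empty list means every expected column is present; anything else names the columns to investigate before training.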

Setup Instructions

Google Colab Setup

1. Download the required dataset from:
   - Kaggle - Titanic: Machine Learning from Disaster
2. Upload the `train.csv` and `test.csv` files to your own Google Drive in your preferred folder structure.
3. Update the file paths in the notebook to reflect your own Google Drive paths.
4. Run the notebook cells as instructed to reproduce the results.
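
Updating the paths can look roughly like the sketch below, which mounts Google Drive when running inside Colab and falls back to a local folder otherwise. The helper `resolve_data_dir` and both folder names (`titanic`, `data`) are hypothetical placeholders, not names used by the notebook; `drive.mount` is the helper provided by the `google.colab` package.

```python
from pathlib import Path


def resolve_data_dir(drive_subdir="titanic", local_dir="data"):
    """Return the folder expected to hold train.csv and test.csv.

    Both folder names are hypothetical defaults; change them to your layout.
    """
    try:
        # google.colab can only be imported inside a Colab runtime.
        from google.colab import drive
    except ImportError:
        return Path(local_dir)
    drive.mount("/content/drive")
    return Path("/content/drive/MyDrive") / drive_subdir


data_dir = resolve_data_dir()
train_path = data_dir / "train.csv"
test_path = data_dir / "test.csv"
print(train_path, test_path)
```

Keeping the folder in one variable means only a single line needs to change when the files move.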

Local Environment Setup

1. Download the required dataset from:
   - Kaggle - Titanic: Machine Learning from Disaster
2. Clone the repository and navigate to the cloned directory:
   ```
   git clone https://github.com/Dalageo/ML-TitanicShipwreck.git
   cd ML-TitanicShipwreck
   ```
3. Open `Exploring the World's Most Renowned Shipwreck 🚢.ipynb` in your preferred Jupyter-compatible environment (e.g., Jupyter Notebook, VS Code, or PyCharm).
4. Update the file paths for `train.csv` and `test.csv` as needed.
5. Run the cells sequentially to reproduce the results.

Acknowledgments

The dataset used in this project is provided by Kaggle as part of the Titanic - Machine Learning from Disaster competition. Special thanks to Kaggle's data science community and Will Cukierski for making this dataset available for educational and research purposes.

License

This work is licensed under the Apache License 2.0. This license was chosen to comply with the competition rules, which require an Open Source Initiative (OSI) approved license that permits commercial use while promoting open collaboration.
