Disaster Response Pipeline Project

Summary: I analysed the disaster data from Figure Eight and built a Random Forest model for an API that classifies disaster messages across 36 categories. The model performs relatively well, with an average weighted F1-score of 0.94.
Purpose: gain experience writing data engineering pipelines and machine learning pipelines, and in web development with Flask.
Task: create a multi-label classification model that predicts the emergency categories a message may belong to (one message can fall into several categories).
Demo: see the GIFs demonstrating the web app (classifier, graphs).
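
For readers unfamiliar with the metric quoted in the summary, here is a minimal sketch of a support-weighted average F1-score. It assumes the 0.94 figure is the kind of "weighted avg" that scikit-learn's classification report prints; the README does not say exactly how it was computed. The category names and counts below are made up for illustration.

```python
def f1(tp, fp, fn):
    """F1-score from true positives, false positives and false negatives."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def weighted_f1(per_category):
    """Average the per-category F1-scores, weighting each by its support.

    per_category maps a category name to (tp, fp, fn); support = tp + fn,
    i.e. the number of true examples of that category.
    """
    total_support = sum(tp + fn for tp, _, fn in per_category.values())
    if total_support == 0:
        return 0.0
    weighted_sum = sum((tp + fn) * f1(tp, fp, fn)
                       for tp, fp, fn in per_category.values())
    return weighted_sum / total_support


# Hypothetical counts: a frequent category and a rare one.
counts = {"water": (8, 2, 2), "medical": (1, 0, 1)}
print(round(weighted_f1(counts), 3))
```

Because each category is weighted by its support, frequent categories dominate the average, so a high weighted score can coexist with weak performance on rare categories.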

Project Components:

ETL Pipeline - process_data.py

Loads the messages and categories datasets
Merges the two datasets
Cleans the data
Stores it in a SQLite database
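
The four ETL steps above can be sketched roughly as follows. This is a minimal illustration, not the contents of process_data.py: the toy rows, the `id` join key, the semicolon-separated `name-value` encoding of the `categories` column and the `messages` table name are all assumptions, and an in-memory SQLite database stands in for DisasterResponse.db.

```python
import sqlite3

import pandas as pd

# Load: toy frames stand in for the two CSV files (hypothetical columns).
messages = pd.DataFrame({
    "id": [1, 2, 2, 3],
    "message": ["we need water", "send a doctor", "send a doctor", "road is blocked"],
})
categories = pd.DataFrame({
    "id": [1, 2, 3],
    "categories": ["water-1;medical-0", "water-0;medical-1", "water-0;medical-0"],
})

# Merge: join the two datasets on the shared id column.
df = messages.merge(categories, on="id")

# Clean: expand the categories string into one 0/1 column per category.
labels = df["categories"].str.split(";", expand=True)
labels.columns = [value.split("-")[0] for value in labels.iloc[0]]
for column in labels.columns:
    labels[column] = labels[column].str[-1].astype(int)
df = pd.concat([df.drop(columns="categories"), labels], axis=1).drop_duplicates()

# Store: write the cleaned table to SQLite and read it back as a check.
conn = sqlite3.connect(":memory:")
df.to_sql("messages", conn, index=False, if_exists="replace")
stored = pd.read_sql("SELECT * FROM messages", conn)
conn.close()
print(stored)
```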

ML Pipeline - train_classifier.py

Loads data from the SQLite database
Splits the dataset into training and test sets
Builds a text processing and machine learning pipeline
Trains and tunes a model using GridSearchCV
Outputs results on the test set
Exports the final model as a pickle file
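
The steps above could look roughly like the sketch below. Only the Random Forest model and GridSearchCV come from this README; the TF-IDF features, the MultiOutputClassifier wrapper, the parameter grid, the two toy categories and the temporary output path are assumptions made to keep the example small and runnable.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline

# Toy data (hypothetical): each row of Y flags the categories [water, medical].
messages = [
    "we need clean water", "no drinking water left", "water supply is cut",
    "people are injured", "we need a doctor", "medical help required",
    "injured and thirsty, need water and a doctor", "send water and medicine",
    "the road is blocked", "bridge has collapsed", "power is out", "need water now",
]
Y = np.array([
    [1, 0], [1, 0], [1, 0],
    [0, 1], [0, 1], [0, 1],
    [1, 1], [1, 1],
    [0, 0], [0, 0], [0, 0], [1, 0],
])

# Split into training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    messages, Y, test_size=0.25, random_state=0
)

# Text processing followed by one Random Forest per category.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", MultiOutputClassifier(RandomForestClassifier(random_state=0))),
])

# Tune a single hyperparameter with a small grid and 2-fold cross-validation.
param_grid = {"clf__estimator__n_estimators": [10, 20]}
search = GridSearchCV(pipeline, param_grid, cv=2)
search.fit(X_train, y_train)

# Report results on the held-out test set.
y_pred = search.predict(X_test)
print("weighted F1:", f1_score(y_test, y_pred, average="weighted", zero_division=0))

# Export the tuned pipeline as a pickle file and load it back.
model_path = Path(tempfile.mkdtemp()) / "classifier.pkl"
with open(model_path, "wb") as f:
    pickle.dump(search.best_estimator_, f)
with open(model_path, "rb") as f:
    restored = pickle.load(f)
```

Pickling the whole pipeline, rather than only the classifier, keeps the text vectoriser and the model together, so the web app can pass raw message strings straight to `predict`.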

Flask Web App - run.py

Classifies an input message using the pickled model
Includes 2 interactive visualisations
Lets you query a specific category for its top words
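
One way the top-words query could work is sketched below, using only the standard library. The `top_words` helper, the stop-word set and the toy rows are hypothetical; the README does not describe how run.py implements this feature.

```python
import re
from collections import Counter

# Hypothetical stop words to ignore when counting.
STOP_WORDS = frozenset({"a", "and", "is", "the", "we", "need"})


def top_words(rows, category, n=3):
    """Return the n most frequent non-stop words in messages tagged with category.

    rows is an iterable of (message_text, set_of_category_names) pairs.
    """
    counts = Counter()
    for text, labels in rows:
        if category in labels:
            words = re.findall(r"[a-z']+", text.lower())
            counts.update(word for word in words if word not in STOP_WORDS)
    return counts.most_common(n)


rows = [
    ("we need water", {"water"}),
    ("clean water is gone", {"water"}),
    ("water and doctor needed", {"water", "medical"}),
    ("send a doctor", {"medical"}),
]
print(top_words(rows, "water", n=1))
print(top_words(rows, "medical", n=1))
```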

Project structure:

- app
| - template
| |- master.html  # main page of web app
| |- go.html  # classification result page of web app
| - static  # folder with static data visualisations
|- run.py  # Flask file that runs app

- data
|- disaster_categories.csv  # data to process 
|- disaster_messages.csv  # data to process
|- process_data.py
|- DisasterResponse.db   # database to save clean data to

- models
|- train_classifier.py
|- classifier.pkl  # saved model 

- README.md

Requirements:

python==3.7.6
Flask==1.1.2
matplotlib==3.1.3
numpy==1.18.1
pandas==1.0.2
pickleshare==0.7.5
plotly==4.6.0
scikit-learn==0.22.1
SQLAlchemy==1.3.16
wordcloud==1.6.0
sys, re and json are also used; they are part of the Python standard library and need no installation.

Instructions:

Run the following commands in the project's root directory to set up your database and model.
- To run the ETL pipeline that cleans the data and stores it in the database:

  python data/process_data.py data/disaster_messages.csv data/disaster_categories.csv data/DisasterResponse.db
- To run the ML pipeline that trains the classifier and saves it:

  python models/train_classifier.py data/DisasterResponse.db models/classifier.pkl
Run the following command in the app's directory to run your web app:

  python run.py
Go to http://0.0.0.0:3001/

Further recommendations:

To further improve the model, I recommend more data cleaning and adding word2vec feature embeddings. I would also try to reduce the class imbalance to see whether that improves model performance.
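
As one possible starting point for the class-imbalance idea, the sketch below computes per-class weights that are inversely proportional to class frequency. This uses the same heuristic as scikit-learn's `class_weight="balanced"` option; it is an illustration, not something the current model does, and the label list is made up.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical labels for one rare category: 9 negatives, 1 positive.
labels = [0] * 9 + [1]
weights = balanced_class_weights(labels)
print(weights)
```

Rare classes get larger weights, so misclassifying them costs more during training. Resampling the data is an alternative that could be tried as well.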

Credits:

Thanks to Udacity and Figure Eight for providing the project idea and data to work with.
