Diabetes Health Indicators Prediction Model

Owner: Jacob McEwen Contact: [email protected]

This project trains machine learning models that classify each patient as either non-diabetic or prediabetic/diabetic.

The dataset used to train this model can be found here:

https://www.kaggle.com/datasets/julnazz/diabetes-health-indicators-dataset/data

The dataset includes 21 features and ~236,000 entries. The project starts with EDA (exploratory data analysis) and then builds two classification models: Random Forest and Logistic Regression.
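As a rough illustration of the two-model setup, the sketch below fits a Logistic Regression and a Random Forest on the same train/test split and reports accuracy for each. It is not the notebook's code: the data is a synthetic stand-in generated with make_classification (21 features, binary target), and the split ratio, scaling step, and model settings are assumptions chosen only to keep the example small.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data (assumption): 21 features, binary target.
X, y = make_classification(
    n_samples=2000, n_features=21, n_informative=8, random_state=0
)

# Hold out 20% for testing; stratify so both splits keep the class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Settings below are illustrative, not the notebook's actual configuration.
models = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

for name, acc in scores.items():
    print(f"{name}: accuracy = {acc:.3f}")
```

Swapping the synthetic X and y for the features and target of the real dataset should be the only change needed to try this on the actual data.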

The tools used for EDA and modeling are as follows:

Python
NumPy
Pandas
Matplotlib
Scikit-learn
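A first EDA pass with these tools might look like the sketch below, which collects row count, feature count, missing values, and class balance. The summarize helper is hypothetical, the column names Diabetes_binary, BMI, and HighBP are assumptions about the CSV, and the small random frame exists only so the example runs without the data file.

```python
import numpy as np
import pandas as pd


def summarize(df: pd.DataFrame, target: str) -> dict:
    """Return a few basic EDA facts about a DataFrame with a binary target."""
    return {
        "n_rows": len(df),
        "n_features": df.shape[1] - 1,  # every column except the target
        "missing_values": int(df.isna().sum().sum()),
        "class_share": df[target]
        .value_counts(normalize=True)
        .sort_index()
        .to_dict(),
    }


# Tiny synthetic frame (assumption) so the sketch runs without the CSV.
rng = np.random.default_rng(0)
demo = pd.DataFrame(
    {
        "Diabetes_binary": rng.integers(0, 2, size=100),
        "BMI": rng.normal(28, 5, size=100).round(1),
        "HighBP": rng.integers(0, 2, size=100),
    }
)
summary = summarize(demo, target="Diabetes_binary")
print(summary)

# With the real data (CSV file name as shipped in the repository;
# the target column name is an assumption):
# df = pd.read_csv("diabetes_binary_health_indicators_BRFSS2021.csv")
# print(summarize(df, target="Diabetes_binary"))
```

The class share is worth checking early, because an accuracy figure is easier to interpret when the proportion of the majority class is known.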

The environment also uses the following tools:

Anaconda
Jupyter Notebook
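Before opening the notebook, it can help to confirm that the active environment actually provides the packages listed above. The sketch below is one way to do that; missing_packages is a hypothetical helper, and the import names (sklearn for Scikit-learn, notebook for Jupyter Notebook) are the usual ones but are listed here as assumptions about the environment.

```python
from importlib.util import find_spec

# Import names for the tools listed above (assumed, not taken from the repo).
REQUIRED = ["numpy", "pandas", "matplotlib", "sklearn", "notebook"]


def missing_packages(names):
    """Return the subset of top-level package names that cannot be imported."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(REQUIRED)
print("missing:", missing or "none")
```

Anything reported as missing can be installed into the conda environment before launching Jupyter.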

How to Run

Clone the repo (or download and unzip it)
Launch Anaconda Prompt
cd into the directory containing the .ipynb file
Activate the conda environment
Run jupyter notebook and open diabetes-health-indicators-binary-classification.ipynb

How to improve and further tweak the accuracy of the models

Both models reach an accuracy of ~86%. Hyperparameter tuning may improve this further, and the project welcomes tuning of any part of the pipeline. Please feel free to experiment with the models and see what improvements or modifications you can make!
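One common way to approach hyperparameter tuning is a cross-validated grid search. The sketch below tunes a Random Forest with scikit-learn's GridSearchCV on synthetic stand-in data. The parameter grid, fold count, and data are assumptions for illustration, not values from the notebook, and the scores it prints say nothing about what tuning would achieve on the real dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the real data (assumption): 21 features, binary target.
X, y = make_classification(
    n_samples=600, n_features=21, n_informative=8, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Illustrative grid; sensible ranges depend on the real data.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [5, None],
    "min_samples_leaf": [1, 5],
}

# 3-fold cross-validation over every combination in the grid.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X_train, y_train)

print("best parameters:", search.best_params_)
print("cross-validated accuracy:", round(search.best_score_, 3))
print("held-out accuracy:", round(search.score(X_test, y_test), 3))
```

The same pattern applies to Logistic Regression by swapping the estimator and using a grid over, for example, the regularization strength C. On a dataset of ~236,000 rows a full grid can be slow, so RandomizedSearchCV or a smaller grid may be a more practical starting point.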

