
Machine learning project that detects diabetes using a number of different health indicators by binary classification.


jacobmcazure/Diabetes-Health-Indicators


Diabetes Health Indicators Prediction Model

Owner: Jacob McEwen
Contact: [email protected]

This is a machine learning model that classifies each patient as either non-diabetic or prediabetic/diabetic.

The dataset used to train this model can be found here:

https://www.kaggle.com/datasets/julnazz/diabetes-health-indicators-dataset/data

The dataset includes 21 features and ~236,000 entries. The project starts with EDA (exploratory data analysis) before any models are built. Two classification algorithms are then trained: Random Forest and Logistic Regression.
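As a rough illustration of the two-model setup described above, the sketch below trains a Logistic Regression and a Random Forest classifier with scikit-learn. It is not the notebook's actual code: the data is a synthetic stand-in generated with `make_classification` (21 features, binary target), and the 80/20 split, scaling step, and hyperparameter values are assumptions chosen for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic stand-in for the Kaggle data (21 features, binary target).
X, y = make_classification(
    n_samples=2000, n_features=21, n_informative=8, random_state=42
)

# Hold out 20% of the rows for evaluation, keeping the class ratio the same.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

models = {
    # Scaling helps Logistic Regression converge; tree ensembles do not need it.
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: accuracy = {scores[name]:.3f}")
```

Accuracy on synthetic data says nothing about the real dataset; the point is only the shape of the workflow (split, fit, score) for both classifiers.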

The tools used for EDA are as follows:

  • Python
  • NumPy
  • Pandas
  • Matplotlib
  • Scikit-learn
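A minimal sketch of the kind of first-pass EDA these tools support: checking the shape, missing values, and class balance of a table. The DataFrame here is synthetic, and the column names (`bmi`, `age_group`, `diabetes`) and the `summarize` helper are hypothetical, not taken from the Kaggle dataset or the notebook.

```python
import numpy as np
import pandas as pd


def summarize(df, target):
    """Return basic EDA summaries: shape, missing-value count, class balance."""
    return {
        "shape": df.shape,
        "missing": int(df.isna().sum().sum()),
        "class_balance": (
            df[target].value_counts(normalize=True).sort_index().to_dict()
        ),
    }


# Assumption: small synthetic table with hypothetical column names.
rng = np.random.default_rng(0)
df = pd.DataFrame(
    {
        "bmi": rng.normal(28, 6, size=1000),
        "age_group": rng.integers(1, 14, size=1000),
        "diabetes": np.repeat([0, 1], [850, 150]),
    }
)

summary = summarize(df, target="diabetes")
print(summary)
print(df.describe())
```

Class balance is worth checking early on a task like this, because a heavily imbalanced target makes raw accuracy harder to interpret.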

Other tools for the environment used are as follows:

  • Anaconda
  • Jupyter Notebook

How to Run

  • Clone the repo (or download and unzip it)
  • Launch Anaconda Prompt
  • cd into the directory of the .ipynb file
  • Activate the conda environment
  • Launch jupyter notebook
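Before launching the notebook, it can help to confirm that the activated environment actually contains the listed packages. The following is an optional sketch, not part of the repo: the `check_packages` helper is hypothetical, and the import names (`sklearn` for Scikit-learn, `notebook` for Jupyter Notebook) are the usual ones but may differ in your setup.

```python
from importlib.util import find_spec

# Import names for the tools listed in this README (assumed, see above).
REQUIRED = ["numpy", "pandas", "matplotlib", "sklearn", "notebook"]


def check_packages(names):
    """Map each top-level package name to True if it can be imported."""
    return {name: find_spec(name) is not None for name in names}


status = check_packages(REQUIRED)
for name, ok in status.items():
    print(f"{name}: {'found' if ok else 'MISSING'}")
```

Run it with the Python interpreter from the activated conda environment; any package reported as missing would need to be installed before opening the .ipynb file.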

How to improve and further tweak the accuracy of the models

The accuracy of both models is ~86%. This can likely be improved with hyperparameter tuning, which is encouraged for any part of the project. Please feel free to experiment with the models and see what improvements or modifications you can make!
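One common way to approach hyperparameter tuning is a cross-validated grid search. The sketch below uses scikit-learn's `GridSearchCV` on a Random Forest; the synthetic data, the parameter grid, and the 3-fold setting are assumptions picked to keep the example fast, not values from the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Assumption: synthetic stand-in for the real data (21 features, binary target).
X, y = make_classification(
    n_samples=1000, n_features=21, n_informative=8, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Deliberately small, illustrative grid; widen it for a real search.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [5, None],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X_train, y_train)

print("best parameters:", search.best_params_)
print(f"best cross-validated accuracy: {search.best_score_:.3f}")

# The refit best estimator is evaluated once on the held-out test split.
test_accuracy = search.score(X_test, y_test)
print(f"held-out accuracy: {test_accuracy:.3f}")
```

The same pattern applies to Logistic Regression, for example by searching over the regularization strength `C`. Keeping the test split out of the search avoids reporting an optimistic score.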
