The prevalence of diabetes in the United States is a pressing public health issue, and there is a need for comprehensive strategies for early detection and effective disease management. In this context, machine learning techniques have proven useful for finding correlations within large and complex datasets, particularly in the medical domain.
The dataset used for this project is the Pima Indians Diabetes Database, a collection of records from female patients who are at least 21 years old and of Pima Indian heritage. This dataset is the basis for our research and analysis.
A key part of this project was the exploration of the dataset, which included data cleaning and preprocessing steps aimed at ensuring the quality and reliability of the input data.
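The specific cleaning steps are not listed here, so the following is only a minimal sketch of one common approach for this dataset. It assumes that zeros in a physiological measurement stand for missing values, replaces them with the median of the observed values, and then standardizes the column. The helper names and the sample values are hypothetical and are not taken from the project.

```python
from statistics import mean, median, pstdev


def impute_zeros_with_median(column):
    """Replace zeros (assumed here to mean 'missing') with the median of non-zero values."""
    observed = [v for v in column if v != 0]
    if not observed:
        # Nothing to estimate the median from; return the column unchanged.
        return list(column)
    fill = median(observed)
    return [fill if v == 0 else v for v in column]


def standardize(column):
    """Rescale a column to zero mean and unit (population) standard deviation."""
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        return [0.0 for _ in column]
    return [(v - mu) / sigma for v in column]


# Illustrative values only, not rows from the actual dataset.
glucose = [148, 85, 0, 89, 137]
cleaned = impute_zeros_with_median(glucose)
scaled = standardize(cleaned)
```

Median imputation is used in this sketch because it is less sensitive to outliers than the mean; other strategies, such as dropping incomplete rows, would be equally plausible.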
The model trained on the preprocessed data was an artificial neural network with two hidden layers and a total of 4,801 parameters.
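The parameter count can be checked with a short calculation. The sketch below counts weights and biases for a fully connected network. The layer widths are an assumption: the text does not state them, but 8 input features (the number of predictors in the Pima dataset), two hidden layers of 64 units each, and a single output unit is one configuration that yields 4,801 parameters. Other widths can produce the same total.

```python
def count_dense_params(layer_sizes):
    """Count weights and biases in a fully connected network.

    layer_sizes lists the number of units per layer, starting with the
    input dimension and ending with the output dimension.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out  # weight matrix
        total += n_out         # bias vector
    return total


# Assumed architecture (not stated in the text): 8 -> 64 -> 64 -> 1.
assumed_layers = [8, 64, 64, 1]
n_params = count_dense_params(assumed_layers)
print(n_params)
```

Under this assumption the three layers contribute 576, 4,160, and 65 parameters respectively.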
The neural network achieved good accuracy in both the training and testing phases, with a test accuracy of 83.12%. The model was also evaluated using a confusion matrix and a classification report.
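To make these evaluation terms concrete, the following sketch computes a binary confusion matrix and the metrics usually shown in a classification report (precision, recall, F1) alongside accuracy. The function names are hypothetical, and the labels are toy values chosen for illustration; they are not the project's predictions and do not reproduce the reported 83.12%.

```python
def confusion_matrix(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels, where 1 is the positive class."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 0:
            tn += 1
        elif t == 0 and p == 1:
            fp += 1
        else:
            fn += 1
    return tn, fp, fn, tp


def summarize(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for the positive class."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred)
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels for illustration only.
y_true = [0, 0, 0, 0, 1, 1, 1, 0]
y_pred = [0, 0, 1, 0, 1, 0, 1, 0]
matrix = confusion_matrix(y_true, y_pred)
report = summarize(y_true, y_pred)
```

In a medical screening setting, recall for the positive class is often as important as overall accuracy, since a false negative corresponds to a missed diagnosis.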
This project demonstrates the value of researching machine learning techniques that can contribute to important medical solutions.