This repository contains the Jupyter Notebooks written to analyze open-source datasets on health parameters and diseases, namely: Breast Cancer, Diabetes, Heart Disease, Kidney Disease and Liver Disease.
Dataset | Logistic Regression | Naive Bayes | Support Vector Machines | Random Forest |
---|---|---|---|---|
Breast Cancer | 96.4912% | 92.3977% | 95.91% | NaN |
Liver Disease | 70.1149% | 53.4483% | 70.1149% | NaN |
Diabetes | 78.355% | 76.1905% | 78.355% | NaN |
Kidney Disease | 97.9167% | 100% | 100% | NaN |
Heart Disease | 80.2198% | NaN | 81.32% | 91.21% |
The Breast Cancer Wisconsin (Diagnostic) dataset available with sklearn was used to create the dataset, which has 569 rows (cases) with 30 numeric features. The outcome is either 1 (malignant) or 0 (benign).
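
As a rough illustration of the workflow described above, the sketch below loads the Breast Cancer data from sklearn and fits the four model families listed in the results table. It is not the notebooks' actual code: the 70/30 split, the `random_state` values, the feature scaling, and the default hyperparameters are all assumptions, so the scores it prints will not necessarily match the table. Note that `load_breast_cancer` itself encodes 0 as malignant and 1 as benign, so the labels are flipped here to follow the 1 = malignant convention used in this README.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Load the 569 x 30 feature matrix bundled with scikit-learn.
data = load_breast_cancer()
X = data.data

# scikit-learn encodes 0 = malignant, 1 = benign; flip so that 1 = malignant.
y = 1 - data.target

# Assumed split: 30% held out for testing, stratified by class.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Assumed model settings; scaling is applied where the model is scale-sensitive.
models = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "Naive Bayes": GaussianNB(),
    "Support Vector Machines": make_pipeline(StandardScaler(), SVC()),
    "Random Forest": RandomForestClassifier(random_state=0),
}

# Fit each model and record its accuracy on the held-out test set.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)
    print(f"{name}: {scores[name]:.2%}")
```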
The Liver Disease dataset was taken from an open-source Kaggle dataset. There are two outcomes: the patient either has liver disease or does not.
An open-source Kaggle dataset was used for our data analysis and machine learning processing.
An open-source dataset from the UCI Machine Learning Repository was used for our data analysis and machine learning processing.
An open-source Kaggle dataset was used for our data analysis and machine learning processing.