This project aims to develop a prediction system for osteoporosis using Logistic Regression and Random Forest models. The system predicts osteoporosis risk based on a variety of clinical and lifestyle factors. Below is a detailed explanation of the key components and scripts in this project.
Treball_Final_Bioinfo.R
This script contains the entire process of training and evaluating a Logistic Regression model for osteoporosis prediction. The key steps are:
- Package Installation
- Dataset Preparation:
  - Loads the osteoporosis.csv dataset, removes irrelevant columns, handles missing data, and converts categorical variables into factors.
- Model Training:
  - Trains a Logistic Regression model on 70% of the dataset (the training set) and evaluates it using a confusion matrix, ROC curve, and Precision-Recall curve.
- Feature Importance Evaluation:
  - Trains a Random Forest model to identify the most influential features for predicting osteoporosis.
- Model Saving:
  - Saves the trained Logistic Regression and Random Forest models for future use.
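The split, training, evaluation, and saving steps above can be sketched in base R as follows. This is a minimal illustration, not the project's script. The data frame is synthetic, and the column names (`Age`, `Gender`, `Osteoporosis`), the evaluation on the held-out 30%, the 0.5 decision threshold, and the output file name are all assumptions made for the example.

```r
set.seed(42)

# Synthetic stand-in for osteoporosis.csv (column names are assumptions).
n <- 200
df <- data.frame(
  Age    = sample(20:90, n, replace = TRUE),
  Gender = factor(sample(c("Female", "Male"), n, replace = TRUE))
)
# Toy outcome: risk rises with age, purely so the model has signal to fit.
df$Osteoporosis <- factor(
  ifelse(runif(n) < plogis((df$Age - 55) / 10), "Yes", "No"),
  levels = c("No", "Yes")
)

# 70/30 train/test split.
train_idx <- sample(seq_len(n), size = floor(0.7 * n))
train <- df[train_idx, ]
test  <- df[-train_idx, ]

# Logistic regression: a binomial GLM with the default logit link.
fit <- glm(Osteoporosis ~ Age + Gender, data = train, family = binomial)

# Predicted probability of the second factor level ("Yes").
prob <- predict(fit, newdata = test, type = "response")
pred <- factor(ifelse(prob >= 0.5, "Yes", "No"), levels = c("No", "Yes"))

# Confusion matrix: rows are predictions, columns are true labels.
conf_mat <- table(Predicted = pred, Actual = test$Osteoporosis)
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
print(conf_mat)

# Persist the fitted model so another script can reload it with readRDS().
model_path <- file.path(tempdir(), "logistic_model.rds")  # hypothetical name
saveRDS(fit, model_path)
```

Saving with `saveRDS()` keeps the model as a single object that a separate script can restore with `readRDS()`, which is one common way to hand a trained model to an API.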
API.R
This script sets up a REST API using the plumber package to provide osteoporosis predictions based on the trained Logistic Regression model. The key steps are:
- Library Loading
- API Endpoint Definition:
  - Defines the predict_logistic endpoint that accepts user inputs such as gender, age, family history, physical activity, etc., and returns the probability and prediction (Yes/No) of osteoporosis.
- Input Format:
  - Accepts both categorical (e.g., gender, family history) and numeric (e.g., age) input variables for making predictions.
- Prediction:
  - Uses the trained Logistic Regression model to make predictions based on the provided inputs.
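As a rough illustration of the endpoint described above, the sketch below shows a handler with plumber's `#*` annotation comments. It is not the project's API.R: the model is a toy fitted inline on synthetic data, only two inputs (`Age`, `Gender`) are used, and the HTTP verb, the 0.5 threshold, and the response field names are assumptions. The handler is given a name so it can also be called directly as a plain R function.

```r
# Hypothetical stand-in for the trained model. API.R would presumably load
# the saved model from disk instead of fitting one here.
set.seed(1)
toy <- data.frame(
  Age    = rep(c(30, 45, 60, 75), each = 10),
  Gender = factor(rep(c("Female", "Male"), times = 20))
)
toy$Osteoporosis <- factor(
  ifelse(runif(40) < plogis((toy$Age - 52) / 10), "Yes", "No"),
  levels = c("No", "Yes")
)
model <- glm(Osteoporosis ~ Age + Gender, data = toy, family = binomial)

#* Predict osteoporosis risk with the logistic model
#* @param Age Age in years
#* @param Gender "Female" or "Male"
#* @post /predict_logistic
predict_logistic <- function(Age, Gender) {
  # Request parameters may arrive as strings, so coerce them explicitly.
  new_data <- data.frame(
    Age    = as.numeric(Age),
    Gender = factor(Gender, levels = model$xlevels$Gender)
  )
  # Unparseable numbers and unknown factor levels both become NA.
  if (anyNA(new_data)) {
    stop("Age must be numeric and Gender must be one of: ",
         paste(model$xlevels$Gender, collapse = ", "))
  }
  prob <- unname(predict(model, newdata = new_data, type = "response"))
  list(probability = prob, prediction = ifelse(prob >= 0.5, "Yes", "No"))
}

# Direct call, without going through HTTP.
result <- predict_logistic("70", "Female")
```

Rebuilding the factor with the levels stored in the model keeps categorical inputs aligned with what the model saw during training, and turns unknown categories into an explicit error rather than a silent misprediction.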
Run_API.R
This script runs the API defined in API.R on a local server.
Shiny App (app.R)
The Shiny app provides an interactive user interface for predicting osteoporosis risk in real time. It allows users to input clinical and lifestyle variables and see immediate results.
Key Features:
- Live Predictions:
  - Results update as you change input values.
- Color-Coded Output:
  - Red: "Yes" (High risk of osteoporosis)
  - Green: "No" (Low risk of osteoporosis)
- Age Selection:
  - Uses a slider for easy age selection.
- Batch Predictions (CSV Upload):
  - Users can upload a CSV file containing data for multiple individuals. The app processes the data, predicts osteoporosis risk for each record, and allows users to download the results as a new CSV file.
How to Use the Batch Prediction Feature:
- Navigate to the Batch Prediction tab in the Shiny app.
- Upload a properly formatted CSV file (columns must match the API input requirements, e.g., Gender, Age, etc.).
- Click the "Process CSV" button.
- View the predictions in the table displayed on the app.
- Download the results using the "Download Results as CSV" button.
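The batch workflow above (read the CSV, check its columns, predict per record, write a results CSV) can be sketched as a small base R helper. Everything here is illustrative: `predict_batch` and `toy_predict` are hypothetical names, the required columns are limited to the two named above (`Gender`, `Age`), the output column names and the 0.5 threshold are assumptions, and the scoring rule is a placeholder rather than the project's model.

```r
# Hypothetical helper: validate an uploaded CSV, add predictions, write results.
# `predict_fun` takes a data frame and returns one probability per row.
predict_batch <- function(in_path, out_path, predict_fun,
                          required = c("Gender", "Age")) {
  records <- read.csv(in_path, stringsAsFactors = FALSE)

  # Fail early with a clear message if the file is not properly formatted.
  missing_cols <- setdiff(required, names(records))
  if (length(missing_cols) > 0) {
    stop("CSV is missing required columns: ",
         paste(missing_cols, collapse = ", "))
  }

  prob <- predict_fun(records)
  records$Probability <- prob
  records$Prediction  <- ifelse(prob >= 0.5, "Yes", "No")

  write.csv(records, out_path, row.names = FALSE)
  records
}

# Usage with a placeholder scoring rule (not a real risk model).
in_path  <- tempfile(fileext = ".csv")
out_path <- tempfile(fileext = ".csv")
write.csv(data.frame(Gender = c("Female", "Male"), Age = c(72, 35)),
          in_path, row.names = FALSE)
toy_predict <- function(d) plogis((d$Age - 55) / 10)
results <- predict_batch(in_path, out_path, toy_predict)
```

Passing the predictor in as a function keeps the file handling separate from the model, so the same helper could wrap either a local model or a call to the API.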
How to Run:
- Ensure the API is running on port 8000 by executing Run_API.R.
- Run the Shiny app script (app.R) in RStudio.
- Open the app in your browser, input values, and observe live predictions.
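From an R console, the first two steps might look like the sketch below. It assumes that API.R and app.R are in the working directory and that the API uses port 8000 as stated above. The wrapper names `run_api` and `run_app` are hypothetical, and the actual contents of Run_API.R may differ.

```r
# Start the plumber API defined in API.R on port 8000.
# Requires the plumber package. The call blocks until the server is stopped.
run_api <- function(port = 8000) {
  plumber::pr_run(plumber::pr("API.R"), port = port)
}

# Launch the Shiny app defined in app.R and open it in the browser.
# Requires the shiny package. This call blocks as well.
run_app <- function() {
  shiny::runApp("app.R", launch.browser = TRUE)
}
```

Because both calls block, `run_api()` and `run_app()` need to run in separate R sessions, for example two RStudio windows or two terminals.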
Alternatively, run docker-compose.yml using: docker-compose up. This starts the plumber API at http://0.0.0.0:8000 and the Shiny app at http://0.0.0.0:8180.
Make sure Docker and the docker-compose plug-in are installed and that you are running on Linux (we used Colima for this).
The following R packages are required to run the project:
- plumber: For creating REST APIs in R.
- shiny: For building interactive web applications.
- httr: For sending HTTP requests to the API.
- jsonlite: For parsing JSON data.
- caret: For building predictive models.
- dplyr: For data manipulation.
- PRROC: For Precision-Recall curve creation.
- pROC: For ROC curve creation and evaluation.
- smotefamily: For handling imbalanced data through SMOTE.
- randomForest: For building Random Forest models.
- shinycssloaders: Adds a spinner when the app is loading or processing data.
- shinyjs: Provides JavaScript capabilities to disable/enable buttons and improve interactivity.
- DT: For rendering interactive tables in the Shiny app.
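One way to install whichever of these packages are missing is sketched below. It assumes they are all installed from the configured CRAN mirror, and the variable names are illustrative. The project's own package installation step may be written differently.

```r
required_pkgs <- c(
  "plumber", "shiny", "httr", "jsonlite", "caret", "dplyr", "PRROC",
  "pROC", "smotefamily", "randomForest", "shinycssloaders", "shinyjs", "DT"
)

# Packages from the list that are not yet installed in this R library.
missing_pkgs <- setdiff(required_pkgs, rownames(installed.packages()))

# Install only in an interactive session, so that sourcing this snippet
# from a script never triggers downloads unexpectedly.
if (interactive() && length(missing_pkgs) > 0) {
  install.packages(missing_pkgs)
}
```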