-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathREADME.Rmd
66 lines (43 loc) · 2.78 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
# Osteoporosis Prediction Project
This project aims to develop a prediction system for osteoporosis using Logistic Regression and Random Forest models. The system predicts osteoporosis risk based on a variety of clinical and lifestyle factors. Below is a detailed explanation of the key components and scripts in this project.
## Documents Overview
**Treball_Final_Bioinfo.R**
This script contains the entire process of training and evaluating a Logistic Regression model for osteoporosis prediction. The key steps include:
1. Package Installation
2. Dataset Preparation:
- Loads the `osteoporosis.csv` dataset, removes irrelevant columns, handles missing data, and converts categorical variables into factors.
3. Model Training:
- Trains a Logistic Regression model using 70% of the dataset (training set) and evaluates it using confusion matrix, ROC curve, and Precision-Recall curve.
4. Feature Importance Evaluation:
- A Random Forest model is trained to identify the most influential features for predicting osteoporosis.
5. Model Saving:
- Saves the trained Logistic Regression and Random Forest models for future use.
**API.R**
This script sets up a **REST API** using the plumber package to provide osteoporosis predictions based on the trained Logistic Regression model. The key steps are:
1. Library Loading
2. API Endpoint Definition:
- Defines the predict_logistic endpoint that accepts user inputs such as gender, age, family history, physical activity, etc., and returns the probability and prediction (Yes/No) of osteoporosis.
3. Input Format:
- Accepts both categorical (e.g., gender, family history) and numeric (e.g., age) input variables for making predictions.
4. Prediction:
- Uses the trained Logistic Regression model to make predictions based on the provided inputs.
**Run_API.R**
This script runs the API defined in 'API.R' on a local server.
---
## How to Run the Project
1. **Train the model**:
- Run 'Treball_Final_Bioinfo.R' to train the Logistic Regression. Make sure the dataset (osteoporosis.csv) is placed correctly at the specified path.
2. **Run the API**:
- Run 'API.R' to load the trained Logistic Regression model and set up the API with the predict_logistic` endpoint.
3. **Start the server**:
- Run 'Run_API.R' to start the server and make the API available on port 8000.
---
## Required Packages
The following R packages are required to run the project:
- `plumber`: For creating REST APIs in R.
- `caret`: For building predictive models.
- `dplyr`: For data manipulation.
- `PRROC`: For Precision-Recall curve creation.
- `pROC`: For ROC curve creation and evaluation.
- `smotefamily`: For handling imbalanced data through SMOTE.
- `randomForest`: For building Random Forest models.