DM847

Our task was to implement a Random Forest algorithm on samples measured at Odense University Hospital. We chose the R language for this project.

Install all necessary libraries
Load all packages
Load raw measured data
Run PEAX on the data to generate peaks, or use our premeasured data (DEBUG has to be TRUE)
Load the peaks and calculate the best cluster number
Apply k-means on the dataset
Build training matrix
Split the data 75%-25% into train and test sets
Apply randomForest
Run 5-fold cross validation
Report mean accuracy, sensitivity and specificity
Apply the Gini index and pick the top 5 peaks
Extract decision tree with the best peaks
Load and build training matrix for unlabeled data
Apply the decision tree on the unlabeled data set
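
The split, random forest and Gini ranking steps above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the code in exam.R: the matrix size, the `peak_*` column names, the class labels and `ntree = 200` are all assumptions made for the example, and it assumes the `randomForest` package is installed.

```r
library(randomForest)  # assumed to be installed (see the install step)

set.seed(42)  # make the synthetic example reproducible

# Synthetic stand-in for the training matrix: 40 samples x 10 "peak" columns.
# Column names and class labels are hypothetical.
n <- 40
x <- as.data.frame(matrix(rnorm(n * 10), nrow = n))
colnames(x) <- paste0("peak_", 1:10)
y <- factor(rep(c("a", "b"), each = n / 2))
x$peak_1 <- x$peak_1 + ifelse(y == "a", 3, 0)  # make one peak informative

# 75%-25% train/test split
train_idx <- sample(seq_len(n), size = floor(0.75 * n))
x_train <- x[train_idx, ]
y_train <- y[train_idx]
x_test <- x[-train_idx, ]
y_test <- y[-train_idx]

# Fit the forest on the training part only
rf <- randomForest(x = x_train, y = y_train, ntree = 200, importance = TRUE)

# Rank peaks by mean decrease in Gini impurity and keep the top 5
gini <- importance(rf, type = 2)[, 1]
top_peaks <- names(sort(gini, decreasing = TRUE))[1:5]

# Evaluate on the held-out 25%
pred <- predict(rf, x_test)
accuracy <- mean(pred == y_test)
```

Here `top_peaks` holds the five column names with the largest mean decrease in Gini impurity; in the real pipeline these would be the peaks passed on to the decision tree step.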

With the current solution, we get 0 or 1 mismatches on average.
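
For reference, the accuracy, sensitivity and specificity reported per cross-validation fold can be computed from true and predicted labels as in the sketch below. It assumes a two-class problem; the function name `binary_metrics`, the labels and the example vectors are hypothetical and not taken from helper_functions.R.

```r
# Confusion-matrix metrics for a two-class problem (base R only).
# `positive` names the class treated as the positive one.
binary_metrics <- function(truth, pred, positive) {
  tp <- sum(truth == positive & pred == positive)  # true positives
  tn <- sum(truth != positive & pred != positive)  # true negatives
  fp <- sum(truth != positive & pred == positive)  # false positives
  fn <- sum(truth == positive & pred != positive)  # false negatives
  c(accuracy = (tp + tn) / length(truth),
    sensitivity = tp / (tp + fn),
    specificity = tn / (tn + fp))
}

# Hypothetical labels with exactly one mismatch (fourth sample)
truth <- c("a", "a", "a", "a", "b", "b", "b", "b")
pred  <- c("a", "a", "a", "b", "b", "b", "b", "b")
m <- binary_metrics(truth, pred, positive = "a")
```

In a 5-fold setup, one such vector would be computed per fold and the five vectors averaged to obtain the mean values.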

Quick link to exam.
