Our task was to implement a Random Forest classifier on samples measured at Odense University Hospital. We chose the R language for this project.
- Install all necessary libraries
- Load all packages
- Load raw measured data
- Run PEAX on the data to generate peaks, or use our premeasured data (DEBUG has to be TRUE)
- Load the peaks and calculate the best cluster number
- Apply k-means on the dataset
- Build training matrix
- Split the data 75%/25% into training and test sets
- Apply randomForest
- Run 5-fold cross validation
- Report mean accuracy, sensitivity and specificity
- Rank peaks by Gini index and pick the top 5 peaks
- Extract decision tree with the best peaks
- Load and build training matrix for unlabeled data
- Apply the decision tree on the unlabeled data set
- Open the exam.R
- Load exam_workspace.R (if you want to see the best result)
- Run the script (everything is automated)
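The split, forest, and Gini-ranking steps above can be sketched roughly as follows. This is not the code from exam.R: the synthetic matrix, the column names `peak_1` to `peak_10`, the labels `case`/`control`, and `ntree = 500` are all assumptions made for illustration. The forest part only runs if the `randomForest` package is installed.

```r
# Illustrative sketch only: synthetic data stands in for the real training matrix.
set.seed(42)
n <- 80
x <- as.data.frame(matrix(rnorm(n * 10), nrow = n))  # 10 hypothetical peak features
names(x) <- paste0("peak_", 1:10)
# Hypothetical labels that depend mostly on the first two features
y <- factor(ifelse(x$peak_1 + x$peak_2 + rnorm(n, sd = 0.5) > 0, "case", "control"))

# 75%/25% train/test split
train_idx <- sample(n, size = floor(0.75 * n))
x_train <- x[train_idx, ]
y_train <- y[train_idx]
x_test  <- x[-train_idx, ]
y_test  <- y[-train_idx]

if (requireNamespace("randomForest", quietly = TRUE)) {
  # Fit the forest; importance = TRUE keeps per-feature importance measures
  rf <- randomForest::randomForest(x = x_train, y = y_train,
                                   ntree = 500, importance = TRUE)

  # Accuracy on the held-out 25%
  pred <- predict(rf, x_test)
  test_accuracy <- mean(pred == y_test)

  # type = 2 selects the mean decrease in Gini impurity
  gini <- randomForest::importance(rf, type = 2)[, "MeanDecreaseGini"]
  top5 <- names(sort(gini, decreasing = TRUE))[1:5]
  print(top5)
}
```

The five names in `top5` would then be the features used to build the final decision tree.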
With the current solution we get 0 or 1 mismatches on average.
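For reference, the mismatch count and the reported accuracy, sensitivity and specificity can all be derived from predicted versus true labels. The snippet below is a minimal base-R sketch; the helper name `confusion_metrics`, the label values, and the example vectors are hypothetical and not taken from exam.R. It also shows one possible way to assign samples to 5 cross-validation folds.

```r
# Hypothetical helper: metrics for a binary classifier, given the positive class label.
confusion_metrics <- function(truth, pred, positive) {
  tp <- sum(pred == positive & truth == positive)  # true positives
  tn <- sum(pred != positive & truth != positive)  # true negatives
  fp <- sum(pred == positive & truth != positive)  # false positives
  fn <- sum(pred != positive & truth == positive)  # false negatives
  c(accuracy    = (tp + tn) / length(truth),
    sensitivity = tp / (tp + fn),
    specificity = tn / (tn + fp),
    mismatches  = fp + fn)
}

# Made-up labels for illustration
truth <- c("case", "case", "case", "control", "control", "control", "control", "case")
pred  <- c("case", "case", "control", "control", "control", "control", "case", "case")
m <- confusion_metrics(truth, pred, positive = "case")
print(m)

# Assign 23 hypothetical samples to 5 folds of near-equal size
set.seed(1)
fold_id <- sample(rep(seq_len(5), length.out = 23))
print(table(fold_id))
```

In a 5-fold run, the metrics would be computed once per held-out fold and then averaged.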
Quick link to exam.