QOL Fixes.
ChrisRackauckas committed Aug 2, 2013
1 parent efc7030 commit 4c10862
Showing 7 changed files with 49 additions and 36 deletions.
13 changes: 3 additions & 10 deletions .gitignore
@@ -2,16 +2,8 @@
 ## Binaries
 ##############
 
-Models/libFM/libFM
-Models/libFM/convert
-Models/libFM/transpose
-
-###############
-## Config Files
-###############
-
-*config.py
-test.R
+Models/libFM
+Models/SVDFeature
 
 ###############
 ## Data
@@ -42,6 +34,7 @@ Data/
 ## Idea
 #################
 *.idea
+*.iml
 
 #################
 ## Eclipse
12 changes: 0 additions & 12 deletions HybridMovieRecommendationSystem.iml

This file was deleted.

17 changes: 3 additions & 14 deletions README.md
@@ -1,7 +1,7 @@
-Hybrid Movie Recommendation System
+TBEEF - Triple Bagged Ensemble Ensemble Framework.
 ==============================================
 
-This program is a hybrid recommendation system, an implementation of the DubbKavee algorithm which utilizes various different statistical models and ensemble techniques, patching them together in an intelligent way to improve prediction accuracy. It was specifically developed to implement the methods from the various top competitors in the Baidu, Inc. movie recommendation algorithm contest into a single unified approach and is written to work with datasets which follow the format of the contest.
+This program is a hybrid recommendation system, an implementation of the TBEEF algorithm which utilizes various different statistical models and ensemble techniques, patching them together in an intelligent way to improve prediction accuracy. It was specifically developed to implement the methods from the various top competitors in the Baidu, Inc. movie recommendation algorithm contest into a single unified approach and is written to work with datasets which follow the format of the contest.
 
 Program Structure
 ----------------------------------------------
@@ -16,18 +16,7 @@ The structure of the program is as follows. There are three main phases of the p
 6. Synthesize - Aggregating of the ensemble models through gradient boosted regression
 6. Post-Processing - Fixing predictions and finding which of the random trials had the lowest cross-validation RMSE.
 
-This program requires two datasets: a training dataset and a test/prediction dataset. The training dataset is used to train the models and the prediction dataset is simply a file with predictors for which the program will generate predictions.
-
-ToDo
-----------------------------------------------
-
-1. Integrate ensemble methods:
-- Multinomial logistic regression
-- Item-Based Collaborative Filtering.
-2. Add CV set to synthesis.
-3. Fix up CV prediction readouts.
-4. Implement more features for SVD and FM.
-5. Document and clean parts of the code.
+This program requires two datasets: a training dataset and a test/prediction dataset. The training dataset is used to train the models and the prediction dataset is simply a file with predictors for which the program will generate predictions.
 
 Contributors
 ----------------------------------------------
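The post-processing step described in the README picks the random trial with the lowest cross-validation RMSE. A minimal Python sketch of that selection, assuming each trial's CV predictions and the true CV ratings are already loaded as equal-length lists of floats. The function names `rmse` and `best_trial` are hypothetical, and the trial labels (`t0`, `t1`, ...) only loosely follow the `_t0` suffix seen in the data file names; this is not the repository's own code.

```python
import math


def rmse(predictions, actual):
    """Root-mean-square error between two equal-length sequences."""
    if len(predictions) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predictions, actual)]
    return math.sqrt(sum(squared) / len(squared))


def best_trial(cv_predictions_by_trial, cv_actual):
    """Return (trial_label, rmse) for the trial with the lowest CV RMSE.

    cv_predictions_by_trial maps a trial label (hypothetical, e.g. "t0")
    to that trial's predictions on the cross-validation set.
    """
    scores = {
        trial: rmse(preds, cv_actual)
        for trial, preds in cv_predictions_by_trial.items()
    }
    # On ties, min() keeps the first label in dictionary order.
    winner = min(scores, key=scores.get)
    return winner, scores[winner]


if __name__ == "__main__":
    actual = [4.0, 3.0, 5.0, 2.0]
    trials = {
        "t0": [3.5, 3.0, 4.5, 2.5],
        "t1": [4.0, 3.5, 5.0, 2.0],
        "t2": [2.0, 2.0, 2.0, 2.0],
    }
    print(best_trial(trials, actual))  # prints ('t1', 0.25)
```

Scoring every trial first and then taking the minimum keeps the per-trial RMSEs available, which is handy if you also want to report how far the other trials were from the best one.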
1 change: 1 addition & 0 deletions clean.py
@@ -19,6 +19,7 @@
 os.system("find Data/ModelSetup ! -name README -type f -delete")
 os.system("rm *.Rout")
 os.system("rm *.Rhistory")
+os.system("rm TBEEF.*")
 os.system("find Data/HybridSetup ! -name README -type f -delete")
 os.system("find Data/HybridPredictions ! -name README -type f -delete")
 os.system("find Data/Output ! -name README -type f -delete")
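Each cleanup line in clean.py shells out through `os.system`. `find DIR ! -name README -type f -delete` removes every regular file under DIR except files named README, and the newly added `rm TBEEF.*` presumably targets the output of the new cluster job script: Grid Engine names a job's stdout/stderr files `<job name>.o<job id>` and `<job name>.e<job id>`, and jobScript.sh sets the name with `#$ -N TBEEF`. Below is a minimal pure-Python sketch of the same two operations, not code from the repository; the helper names `clear_dir_keep_readme` and `remove_matching` are hypothetical.

```python
import glob
import os


def clear_dir_keep_readme(root):
    """Delete every regular file under root, recursively, except files named README.

    Mirrors `find root ! -name README -type f -delete`: directories are kept,
    symlinks are skipped (find's -type f does not match them), and the list
    of deleted paths is returned.
    """
    deleted = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Only plain files whose basename is not exactly "README".
            if name != "README" and os.path.isfile(path) and not os.path.islink(path):
                os.remove(path)
                deleted.append(path)
    return deleted


def remove_matching(pattern):
    """Delete regular files matching a shell glob such as "TBEEF.*" (like `rm TBEEF.*`)."""
    removed = []
    for path in glob.glob(pattern):
        if os.path.isfile(path):
            os.remove(path)
            removed.append(path)
    return removed
```

Unlike `rm TBEEF.*`, which reports an error when nothing matches, `remove_matching` just returns an empty list, so repeated cleanups stay quiet.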
2 changes: 2 additions & 0 deletions scripts/README
@@ -4,4 +4,6 @@ findInitStd.py - Runs the program on small iterations on a list of initial stand
 
 installPackages.R - Installs the packages needed for the default ensemble models
 
+jobScript.sh - An example job script for use on a cluster.
+
 test.R - Can be used to easily test and develop ensemble models. Should be used interactively with an R interpreter. Already has paths setup to easily import data matrices.
9 changes: 9 additions & 0 deletions scripts/jobScript.sh
@@ -0,0 +1,9 @@
#!/bin/bash

#$ -N TBEEF
#$ -q mathbio5
#$ -m bea

module load python/3.2.2
module load R/3.0.1
python3 driver.py
31 changes: 31 additions & 0 deletions scripts/test.R
@@ -0,0 +1,31 @@
### This file can be used to easily test the ensemble models
### It is recommended to be done interactively through the interpreter
### via something like Rstudio. This way you can run your R commands
### on the files to see what exactly happens in real time!

trainPath = "../Data/HybridSetup/boot_train_t0"
CVPath = "../Data/HybridSetup/boot_CV_t0"
testPath = "../Data/HybridSetup/orig_test_t0"
predCV = "../Data/HybridPredictions/OLSR_CV_t0_tmp"
predTest = "../Data/HybridPredictions/OLSR_test_t0_tmp"
dataTrain = read.csv(trainPath, sep="\t")
dataCV = read.csv(CVPath, sep="\t")
dataTest = read.csv(testPath, sep="\t")
library(ipred)





fit = bagging(y~0+.,data=dataTrain)

fit = lm(y~0 + (.)^2,data=dataTrain)
summary(fit)

predict(fit,dataTest)
CVPredictions = predict(fit,dataCV)
TestPredictions= predict(fit,dataTest)

write(CVPredictions, file = predCV, ncolumns=1)
write(TestPredictions, file = predTest, ncolumns=1)
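The second model in test.R uses the R formula `y~0 + (.)^2`: `.` stands for every column except `y`, `(...)^2` expands to all main effects plus all pairwise interactions, and `0 +` drops the intercept. For numeric predictors a pairwise interaction `a:b` is simply the product of the two columns. As a hedged illustration, assuming every predictor column is numeric (R would dummy-code factor columns instead), here is a plain-Python sketch of the design row that `lm` then fits by least squares; `expand_pairwise` is a hypothetical helper, not something R or the repository provides.

```python
from itertools import combinations


def expand_pairwise(row, names):
    """Expand one numeric predictor row the way y ~ 0 + (.)^2 does.

    Returns (column_names, values): main effects first, then every
    pairwise product a:b (a listed before b), and no intercept column.
    """
    if len(row) != len(names):
        raise ValueError("row and names must have the same length")
    cols = list(names)
    vals = list(row)
    for (i, a), (j, b) in combinations(enumerate(names), 2):
        cols.append(f"{a}:{b}")
        vals.append(row[i] * row[j])
    return cols, vals


if __name__ == "__main__":
    cols, vals = expand_pairwise([2.0, 3.0, 5.0], ["x1", "x2", "x3"])
    print(cols)  # ['x1', 'x2', 'x3', 'x1:x2', 'x1:x3', 'x2:x3']
    print(vals)  # [2.0, 3.0, 5.0, 6.0, 10.0, 15.0]
```

With k numeric predictors this yields k + k(k-1)/2 coefficients, so the interaction model grows quadratically with the number of predictor columns in the setup files.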
