This repository contains an implementation of the survival models used by the CAMP team for the Prostate Cancer DREAM Challenge.
‼️ This repository is not actively maintained; please use sebp/scikit-survival instead ‼️
All code has only been tested on Linux-based operating systems; we therefore cannot guarantee that it will run on other platforms. The following instructions apply to Linux-based operating systems only.
- Python 3.3 or later
- IPython and IPython notebook 3.1 or later
- numexpr
- numpy 1.9 or later
- pandas 0.15.2 (patched, see below)
- scikit-learn 0.16.1
- scipy 0.15 or later
- six
- C/C++ compiler
- rpy2 2.6.0
- R 3.2 with the following packages installed:
  - randomForestSRC
  - mboost
  - timeROC
Recent versions of the pandas Python package changed how categorical variables are handled; consequently, this code is known to work only with a patched version of pandas 0.15.2. The following two patches need to be applied to pandas 0.15.2 (the sketch after this list illustrates the operations they affect):
- BUG: closes bug in apply when function returns categorical
- BUG: concat on axis=0 with categorical (GH10177)
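The following is a minimal sketch (not taken from this code base) of the two kinds of operations the patches address; with an unpatched pandas 0.15.2, the categorical dtype would be lost or an error raised:

```python
import pandas as pd

# Two frames that share a categorical column with the same categories.
a = pd.DataFrame({"risk": pd.Categorical(["low", "high"], categories=["low", "high"])})
b = pd.DataFrame({"risk": pd.Categorical(["high", "low"], categories=["low", "high"])})

# Second patch (GH10177): concat along axis=0 should preserve the categorical dtype.
merged = pd.concat([a, b], axis=0, ignore_index=True)
assert str(merged["risk"].dtype) == "category"

# First patch: apply() where the applied function returns a categorical,
# e.g. re-casting every column to the categorical dtype.
recast = a.apply(lambda col: col.astype("category"))
```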
Some non-essential parts of the code depend on additional libraries:
- MongoDB 2.4
- pymongo
- matplotlib
- seaborn 0.5.1
- VIM package in R
The easiest way to set up an R and Python environment is to use Anaconda to install all dependencies. The script below sets up a new environment from scratch under Linux; a short sanity check follows the script.
# Install Miniconda3
wget http://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
chmod +x miniconda.sh
./miniconda.sh -b
export PATH=~/miniconda3/bin:$PATH
conda update --yes conda
conda create --yes --name dream-env python=3.4
conda install --yes --name dream-env -c r --file requirements-conda.txt
source activate dream-env
# Install patched version of pandas
wget https://github.com/pydata/pandas/archive/v0.15.2.tar.gz -O pandas-0.15.2.tar.gz
tar xzvf pandas-0.15.2.tar.gz
cd pandas-0.15.2
wget https://github.com/pydata/pandas/commit/c98dcdf8479b879d2d77d7366109334ba125404b.patch -O bug1.patch
wget https://github.com/pydata/pandas/commit/c97238c2e3b9475b0e30ab7b68ebcf1239ddcc10.patch -O bug2.patch
patch -p1 -f -i bug1.patch
patch -p1 -f -i bug2.patch
python setup.py install
cd ..
# Install additional R packages
R -e 'install.packages(c("mboost", "timeROC", "randomForestSRC", "VIM"), repos="http://cran.r-project.org", dependencies=TRUE)'
# Install rpy2
pip install rpy2
# Install seaborn
# (do not install it from anaconda, since it would pull in a different pandas version)
pip install seaborn==0.5.1
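After the last step, a quick sanity check (a minimal sketch; the expected versions follow the requirements listed above) confirms that the patched pandas and the other core packages are importable:

```python
# Sanity check for the freshly created dream-env environment.
import pandas, rpy2, sklearn, scipy, seaborn

print(pandas.__version__)   # expected: 0.15.2 (with both patches applied)
print(rpy2.__version__)     # expected: 2.6.0
print(sklearn.__version__)  # expected: 0.16.1
print(scipy.__version__)    # expected: 0.15 or later
print(seaborn.__version__)  # expected: 0.5.1
```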
Once you have set up your build environment, compile the C/C++ extensions and install the package by running:
python setup.py install
API documentation can be generated from the source code using Sphinx 1.2.3. Note that version 1.3 or later is known not to work.
cd doc
PYTHONPATH="..:sphinxext" sphinx-autogen api.rst
make html
xdg-open _build/html/index.html
The `scripts` folder contains Python scripts that provide entry points for our analyses. All scripts print a list of available arguments when called with `--help` from the command line.
We provide scripts to perform cross-validation for various models and evaluate them using the challenge's preferred evaluation criterion. Validation is performed in parallel using `IPython.parallel`; therefore, access to an IPython cluster, which can run locally, is necessary. The easiest way to start a cluster is to run `ipcluster start` on the local machine.
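Before launching a validation run, you can verify that the cluster is reachable with a short check like the following (a sketch assuming the default local profile; `IPython.parallel` is the parallel-computing API of IPython 3.x):

```python
# Connect to the running IPython cluster and list the available engines.
from IPython.parallel import Client

rc = Client()  # uses the default local cluster profile
print(len(rc.ids), "engines available")
print(rc[:].apply_sync(lambda: "ok"))  # run a trivial task on every engine
```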
The following scripts are available:
- `validate-survival.py`: Evaluates models for survival analysis (subchallenge 1a).
- `validate-regression.py`: Evaluates models for predicting time of death (subchallenge 1b).
- `validate-classifier.py`: Evaluates models for classification (subchallenge 2).
For instance, to perform cross-validation on data from the ASCENT2 study using a random survival forest, the call would look like the following:
python validate-survival.py -m rsf --event DEATH --time LKADT_P --outcome "1" \
--metric timeroc -i data/q1/train_q1_ASCENT2-imputed.arff \
-p param_grid/q1a/rsf_param_grid.json
Our submission for subchallenge 1a was generated by the script `model_1a_survival.py`.
# Start MongoDB to cache results
mongod --bind_ip "127.0.0.1" --journal --nohttpinterface --dbpath ${DBPATH} --quiet &
# Start IPython cluster
ipcluster start --daemonize
# Train ensemble of models, write them to disk and perform prediction on test data
python scripts/model_1a_survival.py --event DEATH --time LKADT_P --models-dir ensemble_1a \
-i data/q1/train_q1_ASCENT2_CELGENE_EFC6546-imputed.arff \
-t data/test/test_ASCENT2_CELGENE_EFC6546-imputed.arff
Our submission for subchallenge 1b was generated by the script `model_1b_regression.py`.
# Start MongoDB to cache results
mongod --bind_ip "127.0.0.1" --journal --nohttpinterface --dbpath ${DBPATH} --quiet &
# Start IPython cluster
ipcluster start --daemonize
# Train ensemble of models, write them to disk and perform prediction on test data
python scripts/model_1b_regression.py --event DEATH --time LKADT_P --models-dir ensemble_1b \
-i data/q1/train_q1_ASCENT2_CELGENE_EFC6546-imputed.arff \
-t data/test/test_ASCENT2_CELGENE_EFC6546-imputed.arff
Our submission for subchallenge 2 was generated by the script `model_2_classification.py`.
# Start MongoDB to cache results
mongod --bind_ip "127.0.0.1" --journal --nohttpinterface --dbpath ${DBPATH} --quiet &
# Start IPython cluster
ipcluster start --daemonize
# Train ensemble of models, write them to disk and perform prediction on test data
python scripts/model_2_classification.py --event DISCONT --models-dir ensemble_2 \
-i data/q2/train_q2_ASCENT2_CELGENE_EFC6546-imputed.arff \
-t data/test/test_and_leaderboard_ASCENT2_CELGENE_EFC6546-imputed.arff
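All three pipelines above cache intermediate results in MongoDB. A quick way to check that the local instance accepts connections (a sketch assuming the default port 27017 and the `--bind_ip` used above) is:

```python
# Verify that the MongoDB instance used for caching is reachable.
from pymongo import MongoClient

client = MongoClient("127.0.0.1", 27017)
print(client.server_info()["version"])  # raises an error if mongod is not running
```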
Datasets generated from raw CSV files are available in the `data` directory, where each ARFF file contains one partition of the data with its respective set of features (a loading example follows the table):
| Study | Patients | Features (Testing) | Features (Imputation) | Complete Cases |
|---|---|---|---|---|
| ASCENT2 | 476 | 223 | 242 | 78.8% |
| CELGENE | 526 | 383 | 421 | 57.0% |
| EFC6546 | 598 | 350 | 388 | 64.0% |
| ASCENT2 + CELGENE | 1,002 | 221 | 237 | 92.7% |
| ASCENT2 + EFC6546 | 1,074 | 220 | 236 | 92.1% |
| CELGENE + EFC6546 | 1,124 | 345 | 366 | 77.0% |
| All | 1,600 | 217 | 233 | 93.9% |
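To inspect one of these partitions, the ARFF files can be loaded with scipy, for example (a minimal sketch; the file name is taken from the examples above, assuming scipy's ARFF reader can parse the attribute types used in these files):

```python
# Load an ARFF partition into a pandas DataFrame for inspection.
from scipy.io import arff
import pandas as pd

data, meta = arff.loadarff("data/q1/train_q1_ASCENT2-imputed.arff")
df = pd.DataFrame(data)

print(df.shape)          # (patients, features)
print(meta.names()[:5])  # first few attribute names
```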
Scripts contained in the `notebooks` folder can be used to generate datasets from raw data (run `ipython notebook`). First, one has to execute the notebook `DREAM_Prostate_Cancer.ipynb` by following the instructions within the notebook. Imputation is performed by the notebook `DREAM_Prostate_Cancer_Imputation.ipynb`.
The result is 7 ARFF files of training data for subchallenges 1a/b and 2, respectively, and 7 ARFF files of the challenge's test data.