The datahub is a repository for data storage only. It contains staging files which are validated and can be loaded directly into the cBioPortal.
Behind the scenes git-lfs is used to manage the large files. https://github.com/github/git-lfs
Validation status of all studies on Datahub master branch. This runs weekly using the validation code from the cBioPortal master branch. It also validates if the studies on cbioportal.org and on Datahub are in sync.
At cbioportal.org datasets page a zipped file with staging files from each study can be downloaded. These zip files are compressed versions of the study folders in the master branch of this repository.
It is also possible to download uncompressed staging files from this repository with git-lfs.
After you have installed git-lfs, configure it not to download all data files right away:
git lfs install --skip-repo --skip-smudge
Clone the git repository and install lfs hooks into it:
git clone https://github.com/cBioPortal/datahub.git
cd datahub
git lfs install --local --skip-smudge
Download the data files for a study folder, for example brca_tcga:
git lfs pull -I public/brca_tcga
git checkout master
git pull origin master
git checkout -b [name_of_your_new_branch]
For general background on creating and managing branches within GitHub, see: Git Branching and Merging.
[back to the root directory]
git add .
git commit -m '[notes_for_your_change]'
git push origin [name_of_your_new_branch]
For instructions on submitting a pull-request, please see: Using Pull Requests and Sending Pull Requests.
http://download.cbioportal.org/mysql-snapshots/mysql-snapshots-toc.html
The data are available under the ODC Open Database License (ODbL).You are free to share and modify the data as long as you attribute any public use of the database, or works produced from the database; keep the resulting data-sets open; and offer your shared or adapted version of the data-set under the same ODbL license.
TCGA data are availabe under Broad Institute GDAC TCGA Analysis Pipeline License. The Cancer Genome Atlas Consortium is pleased to provide the research community with preliminary data prior to publication. Users are requested to carefully consider that these data are preliminary and have yet to be validated. Researchers are warned that the preliminary data have a significant uncertainty, are likely to change, and should be used with caution.
For questions, please post on our user discussion group at: https://groups.google.com/g/cbioportal