Orphan non-coding RNAs (oncRNAs) are a class of cancer-specific, non-coding small RNAs. In this project directory, we include the scripts, code, and analysis notebooks used to systematically annotate a pancancer set of oncRNAs within The Cancer Genome Atlas (TCGA). Details and results can be found in our released preprint.
Running our oncRNA discovery pipeline requires the packages listed below (here we also include the versions used in our preprint). We recommend installing these packages and their dependencies using the conda/mamba ecosystem.
- mongo
- pymongo
- scipy
- pandas
- numpy
- statsmodels
- bedtools
- pysam
- joblib
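Before running the notebooks, it can help to confirm that the environment actually provides everything in the list above. The following is a minimal sketch, not part of the released pipeline: it assumes each Python package is imported under the name shown in the list, and that the MongoDB server and bedtools expose executables named `mongod` and `bedtools` on the `PATH`. The function name `find_missing` is hypothetical.

```python
import importlib.util
import shutil

# Python packages from the list above (assumed import names).
PYTHON_PACKAGES = [
    "pymongo", "scipy", "pandas", "numpy", "statsmodels", "pysam", "joblib",
]

# Command-line tools from the list above. The executable names are
# assumptions: "mongod" for the MongoDB server, "bedtools" for bedtools.
COMMAND_LINE_TOOLS = ["mongod", "bedtools"]


def find_missing(python_packages=PYTHON_PACKAGES, tools=COMMAND_LINE_TOOLS):
    """Return (missing_packages, missing_tools) for the current environment."""
    # find_spec returns None when a top-level module cannot be located.
    missing_packages = [
        name for name in python_packages
        if importlib.util.find_spec(name) is None
    ]
    # shutil.which returns None when no executable with that name is on PATH.
    missing_tools = [name for name in tools if shutil.which(name) is None]
    return missing_packages, missing_tools


if __name__ == "__main__":
    missing_packages, missing_tools = find_missing()
    print("Missing Python packages:", missing_packages or "none")
    print("Missing command-line tools:", missing_tools or "none")
```

An empty result for both lists suggests the environment is ready. The check only looks at whether each dependency is present, not at its version.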
Notebooks used for the systematic annotation of oncRNAs in TCGA datasets. Note: Access to the controlled TCGA sequencing files used in this project requires obtaining appropriate authorization.
- 01_preprocess_data.ipynb: notebook used to preprocess the TCGA sequencing files (BAM).
- 02_TCGA_oncRNA_analysis.ipynb: notebook containing our analytical framework for calling oncRNAs.
MIT license