How-to and script for converting Cosmic_MutantCensus and Cosmic_NonCodingVariants .tsv files into bigBed tracks that can be loaded in IGV.
The created files correspond to the UCSC COSMIC tracks described here.
- Create a conda environment with Python 3 ->
conda create -n py3 python=3.11
- Activate the environment ->
conda activate py3
- Clone this repository ->
git clone https://github.com/northwestwitch/cosmic2bed.git
- Enter cloned folder ->
cd cosmic2bed
- Install poetry ->
pip install poetry
- Install this software ->
poetry install
- Make sure the script works ->
poetry run cosmic2bed --help
COSMIC data must be downloaded from COSMIC. Note that you need to register as a non-commercial user or hold a commercial license in order to download it.
Demo data present in this repository consists of 2 files: Cosmic_MutantCensus_v100_GRCh38.tsv and Cosmic_NonCodingVariants_v100_GRCh38.tsv, both present in the .tar sample download in build 38 obtained from COSMIC.
These files can be found in the cosmic2bed/demo/infiles folder.
Demo outfiles were created in this way:
poetry run cosmic2bed -i cosmic2bed/demo/infiles/Cosmic_MutantCensus_v100_GRCh38.tsv -o cosmic2bed/demo/outfiles/Cosmic_MutantCensus_v100_GRCh38.bed --build 38
This command will convert the .tsv file to a 6+3 BED file.
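A "6+3" BED file has the 6 standard BED columns plus 3 extra ones, so each record should have 9 tab-separated fields. As a quick sanity check, a minimal sketch along these lines could be used. The function name `bed_column_counts` is hypothetical and not part of cosmic2bed, and the check assumes the file contains no header or track lines.

```shell
# Hypothetical helper (not part of cosmic2bed): print the distinct
# tab-separated column counts found in a BED file.
# Assumption: the file has no header or track lines.
bed_column_counts() {
    awk -F'\t' '{ print NF }' "$1" | sort -u
}
```

Running `bed_column_counts cosmic2bed/demo/outfiles/Cosmic_MutantCensus_v100_GRCh38.bed` should print a single value if every record has the same number of columns.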
The sorted BED file created in the step above can be converted to bigBed using the bedToBigBed utility from UCSC. The utility can also be installed using conda, for example as the ucsc-bedtobigbed package from the bioconda channel.
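bedToBigBed expects its input sorted by chromosome and then by start position. The BED file from the step above is described as already sorted, so this is only needed if the tool complains about ordering. A minimal sketch, following the sort command UCSC recommends; the function name `sort_bed` is hypothetical:

```shell
# Hypothetical helper: sort a BED file by chromosome (column 1),
# then numerically by start position (column 2).
# LC_ALL=C forces byte-order, case-sensitive collation.
sort_bed() {
    LC_ALL=C sort -k1,1 -k2,2n "$1" > "$2"
}
```

Usage would be `sort_bed unsorted.bed sorted.bed`, where both file names are placeholders.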
In this example I've used the script present in the cosmic2bed/scripts folder (don't use it yourself; download the version built for your architecture from UCSC instead) and ran the following command:
./cosmic2bed/scripts/bedToBigBed -type=bed6+3 -as=<path-to-bedplus-definitions> <path-to-sorted-bed-infile> <path-to-chrom-sizes> <path-to-sorted-bigbed-outfile> -tab
- path-to-bedplus-definitions: use the path to the bedPlus definitions -> cosmic2bed/resources/bedPlus_definitions.as
- path-to-sorted-bed-infile: the sorted BED file obtained in the step above
- path-to-chrom-sizes: a file with chromosome sizes is present in this repository under cosmic2bed/resources. Choose the right genome build.
- path-to-sorted-bigbed-outfile: the outfile, for instance Cosmic_MutantCensus_v100_GRCh38.bb
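To avoid retyping the fixed options, the command above could be wrapped in a small shell function. This is only a sketch: the function name `make_bigbed` and the `BEDTOBIGBED` variable are hypothetical, and it assumes you run it from the repository root so that the relative path to the bedPlus definitions resolves.

```shell
# Hypothetical wrapper around the bedToBigBed command shown above.
# Arguments: <sorted-bed-infile> <chrom-sizes> <bigbed-outfile>
# BEDTOBIGBED (an assumption, not a UCSC convention) can point to the
# binary; it falls back to "bedToBigBed" found on PATH.
make_bigbed() {
    "${BEDTOBIGBED:-bedToBigBed}" \
        -type=bed6+3 \
        -as=cosmic2bed/resources/bedPlus_definitions.as \
        "$1" "$2" "$3" -tab
}
```

It would be called as `make_bigbed <path-to-sorted-bed-infile> <path-to-chrom-sizes> <path-to-sorted-bigbed-outfile>`, with the chromosome sizes file chosen to match the genome build.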