This is a Docker wrapper for the Python package pysradb:
Choudhary, Saket. “pysradb: A Python Package to Query next-Generation Sequencing Metadata and Data from NCBI Sequence Read Archive.” F1000Research, vol. 8, F1000 (Faculty of 1000 Ltd), Apr. 2019, p. 532 (https://f1000research.com/articles/8-532/v1)
- Docker
git clone https://github.com/ablancomu/pysradb_docker.git
cd pysradb_docker
docker build -t pysradb .
Let's take the SRP256479 study as an example:
docker run --rm -it -v /path/to/repo/pysradb_docker:/home/docker/out -e SRP='SRP256479' -e THREADS='8' pysradb
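THREADS is an environment variable that sets the number of cores used to download the study samples. If it is not provided, the default is 4 cores.
To download only a subset of samples from the study, also pass the INPUT variable with the name of a file listing them: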
docker run --rm -it -v /path/to/repo/pysradb_docker:/home/docker/out -e SRP='SRP256479' -e INPUT='subset.txt' -e THREADS='8' pysradb
The subset.txt file should be plain text with one SRS/SRR code per line and no header. The GitHub repository comes with an example file, sra_sampleID.txt.
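As a minimal sketch of the format described above, a subset file could be written from the shell as follows. The accession codes are hypothetical placeholders, not real samples from SRP256479, and the location the wrapper expects the file in is not stated here (the mounted directory is an assumption).

```shell
# Hypothetical placeholder accessions; replace them with real SRS/SRR codes
# from the study. One code per line, no header row.
printf '%s\n' SRR0000001 SRR0000002 SRR0000003 > subset.txt

# Sanity check: print the number of non-empty lines in the file.
grep -c . subset.txt
```

The file name is then passed to the container with -e INPUT='subset.txt'.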
The wrapper creates a folder called pysradb_downloads in /path/to/repo/pysradb_docker (the directory mounted with -v) containing all the SRA files of the samples. It also saves a table with the metadata of the downloaded samples.
Currently, the wrapper supports only two tasks:
- Downloading all files from a study (SRP accession ID)
- Downloading a subset of samples from one study
In both cases, a table with the corresponding metadata is generated. To access the rest of the pysradb functions, run the pysradb Docker image with --entrypoint set to bash, so that the program can be used interactively inside the container.
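A sketch of such an interactive session, assuming the image was built with the tag pysradb as in the build step and that the pysradb executable is on the PATH inside the container (neither detail is confirmed beyond the commands shown in this README):

```shell
# Start an interactive bash shell instead of the image's default entrypoint.
# The volume mount is kept so that files written to /home/docker/out
# remain available on the host after the container exits.
docker run --rm -it --entrypoint bash \
  -v /path/to/repo/pysradb_docker:/home/docker/out \
  pysradb

# Inside the container, pysradb subcommands can then be called directly,
# for example (assumption: pysradb is on the PATH):
#   pysradb metadata SRP256479
```

The SRP and THREADS variables are left out here because they are read by the wrapper's default entrypoint, which this command bypasses.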