wf-module-ensembl

Snakemake workflow module to download reference data from ENSEMBL.

This repository provides a reproducible Snakemake pipeline intended primarily to be consumed by other workflows as a module, although it can also run independently.

Outputs:

GTF file (genes.gtf)
Genome FASTA file (genome.fa)
Transcriptome FASTA file (transcriptome.fa)
Table with transcript and gene ids (transcript-gene-ids.tab)
All outputs are organized under {assembly}/ (for example, GRCh38/).

Requirements

Snakemake (>=6 recommended)

How to use in other workflows

In the consuming workflow, add a section to the config that contains every parameter required by this workflow's config file.

In the consuming config.yml:

# Sample data
ENSEMBL: {
    OUT_DIR: 'ensembl',
    SITE_URL: 'http://may2025.archive.ensembl.org',
    VERSION: '114',
    FTP_URL: 'ftp://ftp.ensembl.org/pub',
    GTF_URL: '{FTP_URL}/release-{VERSION}/gtf/{SPECIES_NAME_LC}/{SPECIES_NAME}.{ASSEMBLY}.{VERSION}.gtf.gz',
    GENOME_URL: '{FTP_URL}/release-{VERSION}/fasta/{SPECIES_NAME_LC}/dna/{SPECIES_NAME}.{ASSEMBLY}.dna_sm.primary_assembly.fa.gz',
    ASSEMBLY_META: {
        "GRCh38": {
            "name": "Homo sapiens",
            "id": "hsapiens",
        },
    },
}
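The URL entries above are templates with placeholders. As a rough illustration of how one might expand them, here is a minimal Python sketch. It assumes that SPECIES_NAME is the "name" field with spaces replaced by underscores and that SPECIES_NAME_LC is its lowercase form; the module's actual derivation may differ, and `expand_url` is a hypothetical helper, not part of this repository.

```python
# Sketch: expand the GTF_URL template from the config for one assembly.
# ASSUMPTION: SPECIES_NAME is the "name" field with spaces replaced by
# underscores, and SPECIES_NAME_LC is the lowercase form of that string.
# The module itself may derive these values differently.

ENSEMBL = {
    "VERSION": "114",
    "FTP_URL": "ftp://ftp.ensembl.org/pub",
    "GTF_URL": (
        "{FTP_URL}/release-{VERSION}/gtf/{SPECIES_NAME_LC}/"
        "{SPECIES_NAME}.{ASSEMBLY}.{VERSION}.gtf.gz"
    ),
    "ASSEMBLY_META": {"GRCh38": {"name": "Homo sapiens", "id": "hsapiens"}},
}


def expand_url(template: str, cfg: dict, assembly: str) -> str:
    """Fill the URL placeholders for one assembly (hypothetical helper)."""
    species = cfg["ASSEMBLY_META"][assembly]["name"].replace(" ", "_")
    return template.format(
        FTP_URL=cfg["FTP_URL"],
        VERSION=cfg["VERSION"],
        ASSEMBLY=assembly,
        SPECIES_NAME=species,
        SPECIES_NAME_LC=species.lower(),
    )


gtf_url = expand_url(ENSEMBL["GTF_URL"], ENSEMBL, "GRCh38")
print(gtf_url)
```

Under these assumptions the GRCh38 entry resolves to a path under release-114/gtf/homo_sapiens/, which is a quick way to sanity-check a new ASSEMBLY_META entry before running the workflow.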

Then, in the consuming Snakefile:

module ensembl:
    snakefile:
        github("maragkakislab/wf-module-ensembl", path="workflow/Snakefile")
    config:
        config["ENSEMBL"]

use rule * from ensembl as ensembl_*

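# Assumes the module writes its outputs under the OUT_DIR set in config["ENSEMBL"]
OUT_DIR = config["ENSEMBL"]["OUT_DIR"]

rule run_all: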
    input:
        # GTF
        OUT_DIR + "/GRCh38/genes.gtf",
        # Genome
        OUT_DIR + "/GRCh38/genome.fa",
        # Transcriptome
        OUT_DIR + "/GRCh38/transcriptome.fa",
        # Table with transcript and gene ids
        OUT_DIR + "/GRCh38/transcript-gene-ids.tab",
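When several assemblies are needed, writing each target path by hand gets repetitive. The following is a minimal sketch of a helper that builds the same four paths per assembly. The function name `ensembl_targets` and the constant `OUTPUT_FILES` are hypothetical and not part of the module; the file names are the ones listed under Outputs.

```python
# Sketch: build the list of expected output paths for a set of assemblies.
# ASSUMPTION: outputs live at <out_dir>/<assembly>/<file>, as in the
# run_all example above. Names here are illustrative only.

OUTPUT_FILES = [
    "genes.gtf",
    "genome.fa",
    "transcriptome.fa",
    "transcript-gene-ids.tab",
]


def ensembl_targets(out_dir: str, assemblies: list[str]) -> list[str]:
    """Return every expected output path, grouped by assembly."""
    return [
        f"{out_dir}/{assembly}/{name}"
        for assembly in assemblies
        for name in OUTPUT_FILES
    ]


targets = ensembl_targets("ensembl", ["GRCh38"])
for path in targets:
    print(path)
```

Because a Snakefile is Python, a helper like this could be defined in the consuming Snakefile and its result passed as the input of run_all. Each assembly requested this way would presumably also need its own entry in ASSEMBLY_META.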
