VEuPathDB/iedb-epitope-mapping-nextflow

Immune Epitope analyses

This repository contains scripts and processes for identifying genes whose protein products contain epitope sequences identified by the Immune Epitope Database and Analysis Resource (IEDB).

The analysis begins by processing the epitopes. It takes as input a tab file containing the epitopes and a reference proteome. We divide the peptides into 3 categories: very small peptides, which require separate preprocessing of the reference genome; peptides annotated as matching the input taxon, for which we allow up to one mismatch in the alignment; and the rest, for which we require exact matches. We also retrieve the original protein sequences from GenBank for those peptides which match the input taxon. The full peptide sequences are also matched against the reference proteome.
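The three-way split described above can be sketched in Python as follows. This is only an illustration, not the code used by the workflow: the length cutoff `SMALL_PEPTIDE_MAX_LEN`, the `Peptide` fields, the category names, and the use of plain taxon ID equality are all assumptions, since this README does not specify them.

```python
from dataclasses import dataclass

# Hypothetical cutoff: the README does not say what counts as "very small".
SMALL_PEPTIDE_MAX_LEN = 4


@dataclass(frozen=True)
class Peptide:
    sequence: str
    taxon_id: int  # taxon the epitope is annotated with (assumed field)


def categorize(peptide: Peptide, input_taxon_id: int) -> str:
    """Return 'small', 'taxon_inexact', or 'exact' for one peptide."""
    # Assumption: the size check takes precedence over the taxon check.
    if len(peptide.sequence) <= SMALL_PEPTIDE_MAX_LEN:
        return "small"
    # Assumption: "matching the input taxon" is modelled as ID equality.
    if peptide.taxon_id == input_taxon_id:
        return "taxon_inexact"  # up to one mismatch allowed downstream
    return "exact"  # exact matching only


def partition(peptides, input_taxon_id):
    """Group peptides into the three categories, preserving input order."""
    groups = {"small": [], "taxon_inexact": [], "exact": []}
    for peptide in peptides:
        groups[categorize(peptide, input_taxon_id)].append(peptide)
    return groups
```

Each group would then be passed to its own matching step, which corresponds to the three PepMatch subworkflows in the workflow diagram.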

IEDB PEPMatch (https://github.com/IEDB/PEPMatch) is used for peptide matching.
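To make the difference between exact matching and matching with up to one mismatch concrete, here is a brute-force sketch in Python. It is not PEPMatch's API or algorithm (PEPMatch works on a preprocessed proteome and is far more efficient). The function name `find_matches` and its return shape are hypothetical and chosen only for this illustration.

```python
def find_matches(peptide: str, protein: str, max_mismatches: int = 0):
    """Return (start, mismatches) for every window within the mismatch budget.

    `start` is a 0-based offset into `protein`. Only substitutions are
    counted; insertions and deletions are not considered.
    """
    if not peptide:
        return []
    hits = []
    k = len(peptide)
    for start in range(len(protein) - k + 1):
        window = protein[start:start + k]
        # Hamming distance between the peptide and this window.
        mismatches = sum(a != b for a, b in zip(peptide, window))
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits
```

With `max_mismatches=0` this reports exact occurrences only. With `max_mismatches=1` it also reports windows that differ at a single position, which is the tolerance described above for peptides annotated as matching the input taxon.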

Get Started

To run the workflow, the following dependencies must be downloaded and installed.

  • Docker

https://docs.docker.com/engine/install/

  • Nextflow

curl https://get.nextflow.io | bash

  • Clone the GitHub repository with the command below

git clone https://github.com/VEuPathDB/iedb-epitope-mapping-nextflow.git

Then, from the cloned repository directory, run:

nextflow run main.nf

  • Alternatively, the workflow can be run directly from GitHub, in which case Nextflow pulls the repository and runs it.

nextflow run VEuPathDB/iedb-epitope-mapping-nextflow -with-trace -c <config_file> -r main

The -c <config_file> option points to the nextflow.config file; an example can be found in the repository directory.

Input Data

Examples of input data can be found in the data directory. The following files are required:

  • An epitope tab file containing the short epitope sequences from IEDB. See example in data/epitopes.tab
  • A file containing sequences for peptide source proteins. See example in data/peroepitopesGenpept.fasta
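For readers who want to inspect the source protein file before running the workflow, a minimal FASTA reader might look like the sketch below. It assumes the standard FASTA layout (a `>` header line followed by one or more sequence lines). The function name `parse_fasta` is hypothetical and is not part of this repository.

```python
def parse_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence lines are concatenated until the next header.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)
```

Because an open file is an iterable of lines, a handle opened on data/peroepitopesGenpept.fasta can be passed straight to `parse_fasta`.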

Output Data

The output of the workflow is an index file containing the locations of the matched peptides. It indicates whether each match is exact, whether the peptide matches the taxon, and the reference source protein. An example can be found in the output folder.

Nextflow workflow diagram

flowchart TB
    subgraph " "
    v0["taxonID"]
    v2["peptideTabfile"]
    v8["refFasta"]
    v9["peptideTabfile"]
    v11["fasta"]
    v16["fasta"]
    v21["fasta"]
    end
    subgraph epitopeMapping
    v1([fetchTaxon])
    v3([peptideProteinAccessionsFilteredByTaxa])
    v5([fetchProtein])
    v10([iedbExactMatches])
    subgraph smallExactPepMatch
    v12([preprocess])
    v13([match])
    v14([filterResults])
    end
    subgraph exactPepMatch
    v17([preprocess])
    v18([match])
    v19([filterResults])
    end
    subgraph inexactForTaxaPeptidesPepMatch
    v22([preprocess])
    v23([match])
    v24([filterResults])
    end
    v28([mergeResultsFiles])
    v30([indexResults])
    v4(( ))
    v6(( ))
    v15(( ))
    v20(( ))
    v25(( ))
    
    end
    v0 --> v1
    v1 --> v3
    v1 --> v10
    v2 --> v3
    v3 --> v13
    v3 --> v4
    v3 --> v15
    v3 --> v20
    v4 --> v5
    v5 --> v6
    v8 --> v10
    v9 --> v10
    v6 --> v10
    v10 --> v28
    v11 --> v12
    v12 --> v13
    v13 --> v14
    v14 --> v25
    v16 --> v17
    v17 --> v18
    v15 --> v18
    v18 --> v19
    v19 --> v25
    v21 --> v22
    v22 --> v23
    v20 --> v23
    v23 --> v24
    v24 --> v25
    v25 --> v28
    v28 --> v30