Parse a pangenome graph and overlap the given gene and transposable element (TE) annotations of every sample in the graph onto it. Define orthogroups for all genes based on their synteny in the pangenome graph. Plot the variation at the nucleotide level for the whole pangenome and for repeats, and at the orthogroup level for all genes and for a given subset of genes.
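To illustrate what overlapping annotations onto the graph can involve, here is a minimal sketch. It is not the repository's implementation. It assumes that each sample's path through the graph is available as an ordered list of `(node_id, node_length)` tuples starting at position 1 of the chromosome, ignores node orientation, and uses GFF-style 1-based inclusive coordinates. The function names `node_offsets` and `overlapping_nodes` are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


def node_offsets(path):
    """Return the 1-based start coordinate of each node along a sample path.

    `path` is a hypothetical list of (node_id, node_length) tuples in path order.
    """
    if not path:
        return []
    lengths = [length for _, length in path]
    # Cumulative lengths give each node's 0-based start; +1 converts to 1-based.
    return [start + 1 for start in accumulate([0] + lengths[:-1])]


def overlapping_nodes(path, feature_start, feature_end):
    """Return the ids of nodes overlapped by a feature [feature_start, feature_end].

    Coordinates are 1-based and inclusive, as in GFF3.
    """
    starts = node_offsets(path)
    # Index of the last node that starts at or before the feature start.
    first = max(bisect_right(starts, feature_start) - 1, 0)
    hits = []
    for i in range(first, len(path)):
        if starts[i] > feature_end:
            break  # this node and all later ones start after the feature ends
        node_end = starts[i] + path[i][1] - 1
        if node_end >= feature_start:
            hits.append(path[i][0])
    return hits
```

For a path `[("1", 10), ("2", 5), ("3", 20)]`, the nodes span positions 1-10, 11-15 and 16-35, so a gene at 8-12 overlaps nodes `"1"` and `"2"`. The binary search keeps each lookup logarithmic in the path length before walking only the overlapped nodes.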
Clone the repository:

```
git clone https://github.com/Umbel89/pangenome_analysis.git
```
The whole pipeline is executed by running the main.py script:
```
python main.py --help
usage: main.py [-h] -i FILE -g STR -r STR -e STR -o STR [-c STR] [-t FLOAT]

Parse and annotate a pangenome graph, and plot its variation.

  -h, --help            show this help message and exit

required arguments:
  -i FILE, --input_gfa FILE
                        File location of pangenome graph gfa.gz file.
  -g STR, --gene_gff_dir STR
                        Directory with the gene gff files for all samples in the graph. dir/[sample]*.gff3
  -r STR, --repeat_gff_dir STR
                        Directory with the repeat gff files for all samples in the graph. dir/[sample]*.gff3
  -e STR, --effector_dir STR
                        Directory with txt files of a subgroup of genes, one gene_id per line, for all samples in the
                        graph. dir/[sample]*.txt
  -o STR, --output_dir STR
                        Directory where output will be written.

optional arguments:
  -c STR, --input_chrom STR
                        Specify a chromosome to be parsed. [default=all]
  -t FLOAT, --cluster_threshold FLOAT
                        Define orthogroups of genes whose distance is below this threshold. [default=0.6]
```
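The `-t/--cluster_threshold` option controls how genes are grouped into orthogroups. Below is a minimal sketch of threshold-based grouping under stated assumptions: each gene is represented by the set of graph nodes it overlaps, the distance between two genes is the Jaccard distance of those sets, and groups are formed by single linkage (any pair below the threshold ends up in the same orthogroup). These choices, and the names `jaccard_distance` and `define_orthogroups`, are hypothetical; the distance and clustering method used by main.py may differ.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b| for two sets of graph node ids."""
    union = a | b
    if not union:
        return 1.0
    return 1.0 - len(a & b) / len(union)


def define_orthogroups(gene_nodes, threshold=0.6):
    """Group genes by single linkage on Jaccard distance.

    gene_nodes: hypothetical dict mapping gene id -> set of node ids it overlaps.
    Genes whose pairwise distance is below `threshold` share an orthogroup.
    """
    parent = {gene: gene for gene in gene_nodes}

    def find(gene):
        # Union-find lookup with path halving to keep trees shallow.
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]
            gene = parent[gene]
        return gene

    # Merge every pair of genes that is closer than the threshold.
    for g1, g2 in combinations(gene_nodes, 2):
        if jaccard_distance(gene_nodes[g1], gene_nodes[g2]) < threshold:
            parent[find(g1)] = find(g2)

    groups = {}
    for gene in gene_nodes:
        groups.setdefault(find(gene), []).append(gene)
    # Sort members and groups so the output is deterministic.
    return sorted(sorted(members) for members in groups.values())
```

With genes `{"s1_g1": {1, 2, 3}, "s2_g1": {1, 2, 3, 4}, "s1_g2": {10, 11}, "s2_g2": {11, 12}}`, the first pair has distance 0.25 and is merged at the default threshold of 0.6, while the second pair has distance of about 0.67 and stays split unless the threshold is raised. The all-pairs loop is quadratic, which is acceptable for a sketch but would need a smarter candidate search for large pangenomes.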