Assess Popgen48/scalepopgen #80

muffato · 2024-06-10T23:07:46Z

We need to review how many of our population genomics ideas Popgen48/scalepopgen already covers, to determine whether:

it could be used as is
it could be used with modifications
we'd rather extract and replicate some functionality here

Links: poster

Summary

All the different tools and analyses can be independently enabled.
The input VCF files needed a couple of preparation steps (splitting by chromosome and indexing with tabix), but after that the pipeline runs fine.
We'd want to clarify whether it's going to be part of nf-core or not.
We need to decide in which pipeline (scalepopgen or a new pipeline) the ROH and population size analyses should go.

Next developments

Based on the tests above, to use scalepopgen, we would want to:

Add automatic splitting of VCF files by chromosome
Handle non-numeric chromosome names when plotting Tajima's D and SweepFinder2 results

hangxue-wustl · 2024-07-02T15:44:16Z

Requirements for input files:

All VCF files need to be split by chromosome and indexed with tabix.
The sample map has two tab-delimited columns and no header line: individual IDs in the first column and population IDs in the second.

vcf_input.csv:
chrom,vcf,vcf_idx
chr1,chrom1.vcf.gz,chrom1.vcf.gz.tbi
chr2,chrom2.vcf.gz,chrom2.vcf.gz.tbi

sample.map:
ind1 pop1
ind2 pop1
ind3 pop2
ind4 pop2
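Since the sample map must have exactly two tab-separated columns and no header, a quick check before launching the pipeline can catch space-separated or malformed lines. The bash function below is a hypothetical helper (`check_sample_map` is not part of scalepopgen). It cannot tell a header line from a sample line, and flagging duplicate individual IDs is an extra assumption on our side, not a documented scalepopgen requirement.

```shell
# Hypothetical helper (not part of scalepopgen): check that every line of a
# sample map has exactly two non-empty, tab-separated columns and that no
# individual ID appears twice. Returns non-zero and reports problems on stderr.
check_sample_map() {
  local map=$1
  awk -F '\t' '
    # wrong number of tab-separated fields, or an empty field
    NF != 2 || $1 == "" || $2 == "" {
      printf "line %d: expected 2 tab-separated columns, got: %s\n", NR, $0 > "/dev/stderr"
      bad = 1
    }
    # second occurrence of the same individual ID
    seen[$1]++ == 1 {
      printf "line %d: duplicate individual ID %s\n", NR, $1 > "/dev/stderr"
      bad = 1
    }
    END { exit bad }
  ' "$map"
}
```

For example, `check_sample_map sample.map && echo OK` prints `OK` only when the file passes both checks.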

Splitting the VCF file by chromosomes
VCF=mLutLut_renamed_autosomes_bisnps.vcf.gz
bcftools index -s "$VCF" | cut -f 1 | while read -r C; do bcftools view -O z -o "split.${C}.vcf.gz" "$VCF" "${C}" && tabix -p vcf "split.${C}.vcf.gz"; done
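Once every chromosome has been split and indexed with tabix (e.g. `tabix -p vcf split.<chrom>.vcf.gz`), the vcf_input.csv rows can be generated instead of typed by hand. This is a minimal bash sketch, assuming the `split.<chrom>.vcf.gz` naming used by the loop above. `make_vcf_input` is a hypothetical helper, and whether scalepopgen needs absolute paths in this file has not been checked here.

```shell
# Hypothetical helper: write a vcf_input.csv (chrom,vcf,vcf_idx) for every
# split.<chrom>.vcf.gz found in a directory. Assumes the split.<chrom>.vcf.gz
# naming and that each file already has a .tbi index next to it.
make_vcf_input() {
  local dir=${1:-.} out=${2:-vcf_input.csv}
  local vcf chrom
  local -a files
  shopt -s nullglob              # an unmatched glob expands to nothing
  files=("$dir"/split.*.vcf.gz)
  shopt -u nullglob
  if (( ${#files[@]} == 0 )); then
    echo "no split.*.vcf.gz files in $dir" >&2
    return 1
  fi
  echo "chrom,vcf,vcf_idx" > "$out"
  for vcf in "${files[@]}"; do
    chrom=$(basename "$vcf" .vcf.gz)   # -> split.<chrom>
    chrom=${chrom#split.}              # -> <chrom> (dots in the name are kept)
    if [[ ! -f "${vcf}.tbi" ]]; then
      echo "missing tabix index for $vcf" >&2
      return 1
    fi
    echo "${chrom},${vcf},${vcf}.tbi" >> "$out"
  done
}
```

For example, `make_vcf_input . vcf_input.csv` run in the directory holding the split files. Rows follow the shell's glob order (so chr10 sorts before chr2), which should not matter for a per-chromosome sample sheet.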

hangxue-wustl · 2024-07-05T09:11:29Z

Downloaded the supplementary data from https://doi.org/10.1093/molbev/msad207 and followed EurasianOtter_PopGen.html to obtain the vcf.gz files, rename the samples, and select only autosomes and biallelic SNPs for the analyses. Split the VCF file by chromosome using bcftools, then ran "nextflow run scalepopgen -profile singularity -params-file /global/scratch/users/hangxue/otter/vcf_publication/jul4_parameters.yml -qs 10". See the output graphs at https://docs.google.com/presentation/d/1O8vFmYImrJd6p4pvSLyzwiMsf9fTAZSTaG_FJGLz8t8/edit#slide=id.p

hangxue-wustl · 2024-07-18T06:36:51Z

Tested PCA, ADMIXTURE, pairwise Fst and TreeMix in scalepopgen. These run successfully with minor modifications. scalepopgen can also compute Tajima's D and scan for selective sweeps (SweepFinder2), but plotting these two results requires chromosome names to be integers. Of all these analyses, SweepFinder2 takes the longest (~7 h on the otter data), followed by ADMIXTURE (~1 h).
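One possible workaround for the integer-chromosome requirement, until the plotting handles non-numeric names, is to number the chromosomes before running the pipeline and keep the mapping to translate results back. This is a minimal sketch, assuming the map is applied with `bcftools annotate --rename-chrs`, which reads whitespace-separated old/new name pairs. `make_chrom_rename_map` is hypothetical, and this workaround was not tested with scalepopgen.

```shell
# Hypothetical helper: read chromosome names on stdin (one per line) and print
# "old<TAB>new" pairs numbering them 1..N in first-seen order. Blank lines and
# repeated names are skipped, so the numbering is stable and one-to-one.
make_chrom_rename_map() {
  awk 'BEGIN { OFS = "\t" } NF && !($1 in idx) { idx[$1] = ++n; print $1, n }'
}
```

For example, `bcftools index -s in.vcf.gz | cut -f 1 | make_chrom_rename_map > chrom_map.tsv`, then `bcftools annotate --rename-chrs chrom_map.tsv -O z -o renamed.vcf.gz in.vcf.gz` before splitting. Numbering in index order keeps the map deterministic, and the same file can be read in reverse to relabel the Tajima's D and SweepFinder2 plots.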
Additional potential analyses:

ROH identification (e.g. RZooRoH)
Effective population size inference (e.g. GONE)

muffato · 2024-08-19T21:55:15Z

Regarding the otter data, here is more information about the sample confusion that occurred during that project.

The label swaps were very visible on the admixture plots: see left (labels corrected) vs right (wrong labels).

In your pipeline run, only K=2 is a bit messy; all the other values of K are clean. I think you may have the correct labels, and the differences are due to different methods/parameters?

hangxue-wustl · 2024-08-20T17:11:24Z

I have double-checked the labels. I think the ones I am working with are labeled correctly. Yes, I think the difference might be due to different software/parameters.

muffato added this to Genome After Party Jun 2, 2024

muffato converted this from a draft issue Jun 10, 2024

muffato removed this from Genome After Party Jun 10, 2024

muffato added this to variantcalling Jun 10, 2024

github-project-automation bot moved this to To do in variantcalling Jun 10, 2024

muffato added the question Further information is requested label Jun 10, 2024

github-project-automation bot added this to Genome After Party Jun 10, 2024

github-project-automation bot moved this to Todo in Genome After Party Jun 10, 2024

muffato removed this from Genome After Party Jun 10, 2024

muffato added this to Genome After Party Jun 10, 2024

github-project-automation bot moved this to Todo in Genome After Party Jun 10, 2024

muffato assigned hangxue-wustl Jun 12, 2024

muffato removed this from Genome After Party Jun 13, 2024

muffato moved this from To do to In progress in variantcalling Jun 13, 2024

muffato mentioned this issue Jun 13, 2024 in Population genomics #65 (Open, 3 tasks)

muffato added this to Genome After Party Jun 17, 2024

github-project-automation bot moved this to Todo in Genome After Party Jun 17, 2024

muffato moved this from Todo to In Progress in Genome After Party Jun 17, 2024

muffato mentioned this issue Jun 18, 2024 in Assess nf-core/phylonetwork #83 (Closed)

muffato closed this as completed Sep 18, 2024

github-project-automation bot moved this from In Progress to Done in Genome After Party Sep 18, 2024

github-project-automation bot moved this from In progress to Done in variantcalling Sep 18, 2024