COLO829 recall regression in v1.0 #15

Open
proteinosome opened this issue Apr 5, 2024 · 2 comments

Comments

@proteinosome

Hi @fenderglass and @aysegokce, this is Khi Pin. As you know, I've implemented Severus 0.1.1 in the somatic variants WDL pipeline. I've been looking into Severus 1.0, and while the majority of the truth variants remain, two BND events are filtered out in v1.0: truthset_1 and truthset_19. Looking into breakpoints_double.csv, truthset_1 is filtered out because it falls below the mapping quality threshold (MAPQ of 26 in the v0.1.1 output, but I can tell from the source code that there's a minimum of 30 now). For truthset_19, the reason given is FAIL_CONNS_CONNS.
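
To illustrate the truthset_1 case, here is a minimal sketch of a MAPQ cutoff. It is not taken from the Severus source: the function name, the constant name, and the choice of an inclusive comparison (`>=`) are all assumptions made for illustration. Only the two numbers (26 and 30) come from the report above.

```python
# Hypothetical illustration only; this is NOT Severus code.
# The constant and function names below are invented for this example.

MIN_MAPQ_V1_0 = 30  # minimum MAPQ reported above for v1.0


def passes_mapq(mapq: int, min_mapq: int) -> bool:
    """Return True when a breakpoint's MAPQ meets the minimum.

    An inclusive comparison is assumed here; the real tool may differ.
    """
    return mapq >= min_mapq


truthset_1_mapq = 26  # MAPQ reported for truthset_1 in the v0.1.1 output

# With a minimum of 30, a breakpoint at MAPQ 26 is rejected.
print(passes_mapq(truthset_1_mapq, MIN_MAPQ_V1_0))  # prints False
```

Under these assumptions, any cutoff above 26 would drop this event regardless of how many reads support it.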

Have you observed a similar regression when comparing v0.1.1 and v1.0? Additionally, v1.0 calls 145 SVs compared to 122 in v0.1.1, which seems counterintuitive if the filtering threshold is now stricter.

Command used with v1.0:

python ~/softwares/Severus/severus.py \
  --target-bam COLO829.tumor.aligned.hiphase.bam \
  --control-bam COLO829.normal.aligned.hiphase.bam \
  --phasing-vcf COLO829.tumor.clair3.small_variantscorrect_ref.hiphase.vcf.gz \
  --out-dir severus_results_withsupp_minsup3 \
  -t 32 --vntr-bed ~/references/hifisomatic_resources/human_GRCh38_no_alt_analysis_set.trf.bed \
  --use-supplementary-tag --min-support 3

With 0.1.1:

severus \
  --target-bam COLO829.tumor.aligned.hiphase.bam \
  --control-bam COLO829.normal.aligned.hiphase.bam \
  --phasing-vcf COLO829.tumor.clair3.small_variantscorrect_ref.hiphase.vcf.gz \
  --out-dir severus_results_0.1.1_minsupp3 \
  -t 32 --vntr-bed ~/references/hifisomatic_resources/human_GRCh38_no_alt_analysis_set.trf.bed \
  --min-support 3

The BAMs are identical and were aligned with pbmm2 version 1.10 to GCA_000001405.15_GRCh38_no_alt_analysis_set_maskedGRC_exclusions_v2.fasta using the full COLO829 BAM (no down-sampling).

@aysegokce
Contributor

Hello Khi-Pin,
Thank you for the feedback. We didn't observe any change in the number of TPs between v0.1 and v1.0, but we have a slightly different pipeline. In our tests, both SVs were FNs in both versions. We will work on it.

Thank you
Ayse

@proteinosome
Author

proteinosome commented Aug 6, 2024

@aysegokce I've just gotten a chance to look at version 1.1, and it looks like truthset_1 is now recovered. truthset_19 is still failing with FAIL_CONNS_CONNS, but given that it has just three reads supporting it, I think it's reasonable to miss. Nonetheless, do you have any explanation for "FAIL_CONNS_CONNS"?

If all goes well I'll update our somatic WDL workflow with 1.1 soon. Thanks!
