Algorithm Deep Dive: XD #16

Open
ktmeaton opened this issue Dec 8, 2023 · 2 comments

ktmeaton commented Dec 8, 2023

I want to write documentation about how the algorithm works (e.g. run.md) with a case study. The SARS-CoV-2 recombinant XD often confuses me, so I'll work through some of the results here.

image

  • XD is designated as a recombinant of B.1.617.2* and BA.1*.
  • The "majority" parent is B.1.617.2*, as only a ~3-5 kb section comes from a secondary parent.
  • Prior to designation, XD samples were classified as Delta 21J. However, the UShER phylogeny places them as BA.1.15 descendants, probably because the ~3-5 kb section is in the Spike, which is so mutation-rich.
Public UShER: image
GISAID UShER: image
  • rebar thinks B.1.617.2 and XS have more support.
  • Who is wrong, rebar or our prior knowledge? (Let's assume rebar for now, to critique the method)
  • Hypotheses:
    • DesignatedRecombinant (BA.1, B.1.617.2): score=20, conflict=18
    • NonRecursiveRecombinant (BA.1, B.1.617.2* consensus of various AY.*, BA.1): score=41, conflict=8
    • RecursiveRecombinant (XD, ???): No evidence
    • KnockoutRecombinant (XS, B.1.617.2* consensus of various AY.*): score=35, conflict=7

These results tell me that:

  • The primary parent is not strictly B.1.617.2; a consensus of various AY.* has much higher scores and less conflict.
  • Non-Recursive Recombinant (BA.1, B.1.617.2) seems like it should be "best", with the highest score (41) and almost the lowest conflict (8).
  • However, there is a large conflict range (18 - 7 = 11) between hypotheses. In cases such as this, rebar prefers the hypothesis that minimizes conflict rather than the one that maximizes support. This is why KnockoutRecombinant with XS was picked as best. This decision needs to be re-assessed, as I never liked it in the first place.
  • This min_conflict strategy was originally developed to deal with XBB* recursive recombinants, because the original recombination (XBB=BJ.1 and CJ.1) would often have the highest support but a LOT of conflict.
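To make the trade-off concrete, here is a minimal Python sketch of the two selection strategies applied to the three hypotheses above that have evidence. This is not rebar's actual code: the `Hypothesis` class, the `pick_best` function, and especially the `conflict_range_threshold` value of 10 are assumptions made for illustration. The issue only says the conflict range here (11) counts as "large".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    name: str
    score: int
    conflict: int


def pick_best(hypotheses, conflict_range_threshold=10):
    """Pick a hypothesis by min conflict or max score.

    ASSUMPTION: the threshold of 10 is hypothetical. When the spread in
    conflict between hypotheses is at least the threshold, prefer the
    hypothesis with the least conflict; otherwise prefer the highest score.
    """
    conflicts = [h.conflict for h in hypotheses]
    conflict_range = max(conflicts) - min(conflicts)
    if conflict_range >= conflict_range_threshold:
        # min_conflict strategy; ties broken by higher score
        return min(hypotheses, key=lambda h: (h.conflict, -h.score))
    # max_score strategy; ties broken by lower conflict
    return max(hypotheses, key=lambda h: (h.score, -h.conflict))


# Scores and conflicts copied from the hypotheses listed in this issue.
# RecursiveRecombinant is omitted because it had no evidence.
hypotheses = [
    Hypothesis("DesignatedRecombinant", score=20, conflict=18),
    Hypothesis("NonRecursiveRecombinant", score=41, conflict=8),
    Hypothesis("KnockoutRecombinant", score=35, conflict=7),
]

best_min_conflict = pick_best(hypotheses)
best_max_score = pick_best(hypotheses, conflict_range_threshold=float("inf"))
print(best_min_conflict.name, best_max_score.name)
```

With these numbers the min_conflict branch selects KnockoutRecombinant by a margin of a single conflict, while giving up 6 points of score relative to NonRecursiveRecombinant, which is the behaviour being questioned here.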

ktmeaton commented Dec 8, 2023

Dataset

rebar dataset download --name sars-cov-2 --tag 2023-12-06 --output-dir dataset/sars-cov-2/2023-12-06


ktmeaton commented Dec 8, 2023

Designated Recombinant

What is the evidence for BA.1.15 as a secondary parent?

  • The BA.1.15 region is 22578-25469 (2.9 kb), with 28 mutations in support and no conflict ALT or REF bases.
  • The B.1.617.2 regions are 210-21762 and 25584-29402, with 10 mutations in support and 18 conflict ALT bases.
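As a sanity check on the coordinates, the following Python sketch parses substitution strings such as `G22578A` and looks up which parental region a position falls in. The helper names (`position`, `region_length`, `parent_at`) are hypothetical and only illustrate the bookkeeping; they are not part of rebar. Region boundaries are treated as inclusive, which is an assumption.

```python
import re

# A substitution is written as REF base, 1-based position, ALT base.
SUBSTITUTION = re.compile(r"^([ACGT])(\d+)([ACGT])$")

# Region boundaries and parents copied from this issue (assumed inclusive).
REGIONS = [
    (210, 21762, "B.1.617.2"),
    (22578, 25469, "BA.1.15"),
    (25584, 29402, "B.1.617.2"),
]


def position(substitution):
    """Return the genomic coordinate of a substitution like 'G22578A'."""
    match = SUBSTITUTION.match(substitution)
    if match is None:
        raise ValueError(f"not a substitution: {substitution!r}")
    return int(match.group(2))


def region_length(start, end):
    """Length in bases of an inclusive region."""
    return end - start + 1


def parent_at(pos):
    """Return the parent whose region contains pos, or None between regions."""
    for start, end, parent in REGIONS:
        if start <= pos <= end:
            return parent
    return None


ba1_length = region_length(22578, 25469)
print(ba1_length)                       # 2892 bases, i.e. about 2.9 kb
print(parent_at(position("G22578A")))   # first BA.1.15 support mutation
print(parent_at(position("C25667T")))   # a B.1.617.2 conflict_alt mutation
```

Positions that fall between regions (for example 21763-22577) return `None`, since no parent is assigned there in the region list.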

Evidence

rebar run \
  --dataset-dir dataset/sars-cov-2/2023-12-06 \
  --output-dir output/sars-cov-2/2023-12-06/XD \
  --verbosity debug \
  --populations "XD" \
  --parents "B.1.617.2,BA.1.15"
  • Hypothesis: NonRecursiveRecombinant: score=20, conflict=18
  • Regions: 210-21762 : B.1.617.2, 22578-25469: BA.1.15, 25584-29402: B.1.617.2
score:
  - B.1.617.2: -8
  - BA.1.15: 28

support:
  - B.1.617.2 (10): G210T, G15451A, C16466T, C21618G, T26767C, T27638C, C27752T, A28461G, G28881T, G29402T
  - BA.1.15 (28): G22578A, T22673C, C22674T, T22679C, C22686T, G22813T, T22882G, G22898A, G22992A, C22995A, A23013C, A23040G, G23048A, A23055G, A23063T, T23075C, C23202A, A23403G, C23525T, T23599G, C23604A, C23854A, G23948T, C24130A, A24424T, T24469A, C24503T, C25000T

conflict_ref:
  - B.1.617.2 (0):
  - BA.1.15 (0):

conflict_alt:
  - B.1.617.2 (18): A1321C, G4181T, C6402T, C7124T, C7851T, A8723G, C8986T, G9053T, A11201G, A11332G, C14407T, T15264C, C19220T, G21641T, C25667T, G25855T, C27874T, G28916T
  - BA.1.15 (0):

private:
  - B.1.617.2 (18): A1321C, G4181T, C6402T, C7124T, C7851T, A8723G, C8986T, G9053T, A11201G, A11332G, C14407T, T15264C, C19220T, G21641T, C25667T, G25855T, C27874T, G28916T
  - BA.1.15 (0):
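The numbers in this output are consistent with each parent's score being its support count minus its conflict counts, and the hypothesis score being the sum over parents. That relationship is inferred from the values shown here, not taken from rebar's source, so treat the Python sketch below as an assumption that happens to reproduce the reported values. The `parent_score` function is a hypothetical name.

```python
def parent_score(support, conflict_ref, conflict_alt):
    """ASSUMED formula: support minus both kinds of conflict."""
    return support - conflict_ref - conflict_alt


# Counts copied from the support / conflict_ref / conflict_alt output above.
counts = {
    "B.1.617.2": {"support": 10, "conflict_ref": 0, "conflict_alt": 18},
    "BA.1.15": {"support": 28, "conflict_ref": 0, "conflict_alt": 0},
}

scores = {parent: parent_score(**c) for parent, c in counts.items()}
total_score = sum(scores.values())
total_conflict = sum(
    c["conflict_ref"] + c["conflict_alt"] for c in counts.values()
)

print(scores)          # per-parent scores
print(total_score)     # hypothesis score
print(total_conflict)  # hypothesis conflict
```

Under this reading, all of the conflict in the hypothesis comes from the 18 ALT bases in the B.1.617.2 regions, which is what drags that parent's score negative.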

Visualization

rebar plot --annotations dataset/sars-cov-2/2023-11-30/annotations.tsv --run-dir output/sars-cov-2/2023-12-06/XD --all-coords

image
