Proposal to withdraw AY.89 because it nests within AY.4 and is maybe not even monophyletic #398

Closed
corneliusroemer opened this issue Jan 10, 2022 · 9 comments
Labels
correction Highlight an error in the description or definition

Comments

@corneliusroemer
Contributor

AY.89 is defined on the UShER tree by C6402T > C21304A > G21305A > C21302T > C7851T

The last mutation, C7851T, is what defines AY.4. AY.89 is a UK lineage, just like AY.4.

AY.89 contains lineages with extra mutations that also appear within AY.4, like T17040C and 4237C. This homoplasy is suspicious.

The three mutations C21304A > G21305A > C21302T are on the list of problematic sites.

Considering all of the above, I feel that the way AY.89 clusters on the UShER tree is an artefact. This lineage is probably not monophyletic and should thus be withdrawn. Possibly related: pangoLEARN struggles a lot with AY.89, classifying almost none of the lineage-defining sequences correctly (according to UShER).

UShER tree with all the AY.89 defining mutations:
https://nextstrain.org/fetch/genome.ucsc.edu/trash/ct/singleSubtreeAuspice_genome_255b2_c2e4f0.json
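Editor's sketch of the check described above: it splits the AY.89 mutation path into substitutions and reports those that fall on problematic sites. The set of positions is taken from the three sites named in this thread and is not the full published list; the function names are hypothetical and not part of UShER or pangolin.

```python
import re

# Assumption: only the three positions named in this thread, not the full list.
PROBLEMATIC_POSITIONS = {21302, 21304, 21305}

# A nucleotide substitution such as "C7851T": reference base, position, new base.
MUTATION_RE = re.compile(r"^([ACGT])(\d+)([ACGT])$")


def parse_path(path):
    """Split a path like 'C6402T > C21304A' into (ref, position, alt) tuples."""
    mutations = []
    for token in path.split(">"):
        match = MUTATION_RE.match(token.strip())
        if match is None:
            raise ValueError(f"not a nucleotide substitution: {token!r}")
        ref, pos, alt = match.groups()
        mutations.append((ref, int(pos), alt))
    return mutations


def flag_problematic(path, problematic=PROBLEMATIC_POSITIONS):
    """Return the substitutions in the path that sit on a problematic position."""
    return [
        f"{ref}{pos}{alt}"
        for ref, pos, alt in parse_path(path)
        if pos in problematic
    ]


AY89_PATH = "C6402T > C21304A > G21305A > C21302T > C7851T"
print(flag_problematic(AY89_PATH))  # ['C21304A', 'G21305A', 'C21302T']
```

Three of the five steps on the path are flagged, which is the pattern that makes the clustering look like an artefact.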

@FedeGueli
Contributor

FedeGueli commented Jan 10, 2022

Thx @corneliusroemer, it was briefly discussed as off-topic in issue #300 with @AngieHinrichs

@corneliusroemer
Contributor Author

Oh I see, thanks @FedeGueli

Here's the comment #300 (comment)

@AngieHinrichs
Member

Ah, I forgot it had come up before, and as @FedeGueli anticipated in the next comment, @corneliusroemer indeed found some AY.4 diversity within AY.89 sequences.

So I agree, probably an error. If @chrisruis agrees then I can mask those sites within Delta, do a little reoptimization, and then UShER should place those sequences within AY.4.

@chrisruis
Collaborator

Thanks @AngieHinrichs. The masking and reoptimization sound great.

@AngieHinrichs
Member

OK, today's build will mask 21302, 21304 and 21305 in Delta -- it should complete tomorrow morning & then I'll check on it. It might not be available on the main site (genome.ucsc.edu) until the following day. A preliminary mask & optimize test with yesterday's tree went pretty well -- I spot-checked some sequences that were on the nodes for C21304A, G21305A, and C21302T, and they were moved next to similar sequences all over Delta (and of course the C7851T AY.89 sequences moved to AY.4).
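Editor's sketch of why masking is expected to move these sequences: once substitutions at the masked positions are ignored, the AY.89 path loses the three steps that set it apart and still carries C7851T. This is a toy filter on a list of mutation strings, not how UShER masks or reoptimizes a tree; `mask_path` and `position` are hypothetical helper names.

```python
# The three positions masked in Delta, as stated in the comment above.
MASKED_POSITIONS = {21302, 21304, 21305}


def position(mutation):
    """Return the genome position of a substitution such as 'C7851T'."""
    return int(mutation[1:-1])


def mask_path(path, masked=MASKED_POSITIONS):
    """Drop every substitution that falls on a masked position."""
    return [m for m in path if position(m) not in masked]


ay89_path = ["C6402T", "C21304A", "G21305A", "C21302T", "C7851T"]
remaining = mask_path(ay89_path)
print(remaining)  # ['C6402T', 'C7851T']
```

What remains includes C7851T, the mutation the opening post identifies as defining AY.4.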

@corneliusroemer
Contributor Author

corneliusroemer commented Jan 12, 2022 via email

@chrisruis
Collaborator

I agree that it looks like the AY.89 sequences should be in AY.4. It sounds like it won't take Angie very long to check with a reoptimized tree, so I'm happy to wait for that.

@AngieHinrichs
Member

Yes, in the 2022-01-11 tree with 21302, 21304 and 21305 masked in the Delta branch (not yet available on the public site), all samples in the 2022-01-10 branch for AY.89 were moved into the AY.4 branch as expected.

AY.32 had C21304T in the set of defining mutations that I use to annotate nodes as lineage root; since 21304 is now masked in the whole Delta branch, that may cause a few more sequences to be moved into the AY.32 branch but I haven't looked into that yet.

Anyway, all go for withdrawing AY.89, thanks for pointing it out @corneliusroemer!

@chrisruis
Collaborator

Thanks Angie. We've merged AY.89 into AY.4 in v1.2.123. AY.89 is therefore withdrawn.
