
8 new BA.2 sequences with private mutations from South Africa (September-October 2023) #2484

Closed
dikeledik opened this issue Feb 1, 2024 · 37 comments
Labels
BA.2 recommended Recommended for designation by pango team member Saltation Appears on long branch length with no intermediates


@dikeledik

dikeledik commented Feb 1, 2024

Good day everyone.

9 new sequences of a BA.2 strain were detected. Samples were collected between September and November 2023 from Gauteng, Mpumalanga and Limpopo, South Africa. Nextclade assigns them to clade 21L. Pangolin assigns them to lineage BA.2. The sequences have two long deletions in the spike (del_15-27 and del_136-146) as well as mutations in the RBD: S:K417T, S:K444N, S:V445G and S:L452M. Relative to BA.2, this cluster has >30 non-synonymous substitutions (concentrated in spike) and 7 deletions (3 in spike). The new BA.2 is basal and branches off directly from the ancestral BA.2.

Accession numbers:

hCoV-19/SouthAfrica/NICD-R13200/2023 EPI_ISL_18849984
hCoV-19/SouthAfrica/NICD-N56614/2023 EPI_ISL_18849985
hCoV-19/SouthAfrica/NICD-N56836/2023 EPI_ISL_18849986
hCoV-19/SouthAfrica/NICD-N57176/2023 EPI_ISL_18849987
hCoV-19/SouthAfrica/NICD-N57208/2023 EPI_ISL_18849988
hCoV-19/SouthAfrica/NICD-N57216/2023 EPI_ISL_18849989
hCoV-19/SouthAfrica/NICD-N57440/2023 EPI_ISL_18849990
hCoV-19/SouthAfrica/NICD-N57469/2023 EPI_ISL_18849991
hCoV-19/South Africa/NICD-R13515/2023 EPI_ISL_18845398

Mutations observed on branch:

Nt
Unique (43): C541T, G542A, C2536T, C4012T, G4354T, T4379G, C5184T, C7296T, T7480C, C7764T, G9190T, T9866C, A10037G, C10702T, C12008T, A12791C, C12896T, G15451A, C16575T, C17644A, C18744T, T19364C, C19698A, G21786A, C21855T, T21939C, G22017T, G22132T, T22552C, T23005G, T23487G, A23598G, T23633C, G23761A, C23934T, T23948C, A24369G, C25487A, T26311G, G26529A, C26833T, C28045T, A28715G

Homoplasies (3): A11782G, A22206G, C23423T

Reversions to root (1): T9866C

ORF1ab
Unique (14): E93K, E1363D, S1372A, P1640L, A2344V, S2500F, F3201L, T3258A, L3915F, K4176Q, P4211S, G5063S, Q5794K, L6367S

Reversions to root (1): F3201L

S
Unique (12): G75D, S98F, V126A, W152L, R190S, N481K, V642G, K679R, S691P, T791I, Y796H, D936G

Homoplasies (2): D215G, P621S

ORF3a
Unique (1): T32N

E
Unique (1): F23V

M
Unique (2): D3N, A104V

ORF8
Unique (1): A51V

N
Unique (1): T148A

Usher places it on a branch next to BA.2.83 (mutations observed on branch: Homoplasies (1): A22995C, Reversions to root (1): A22995C, Homoplasies (1): S:K478T, Reversions to root (1): S:K478T)

Usher tree below: https://nextstrain.org/fetch/genome.ucsc.edu/trash/ct/subtreeAuspice1_genome_24faa_9de150.json

image

Downsampled global tree:

https://nextstrain.org/fetch/genome.ucsc.edu/trash/ct/singleSubtreeAuspice_genome_24faa_9de150.json

Nextclade

image

Deletions in the spike

image

image

@FedeGueli
Contributor

Thx, first spotted by @Outpfmance yesterday.

@FedeGueli
Contributor

@dikeledik in the RBD it also has S:K417T, S:K444N, S:V445G and S:L452M, which are not mentioned above; please add them.

@FedeGueli
Contributor

FedeGueli commented Feb 1, 2024

9 now.

@FedeGueli
Contributor

@silcn @shay671 @thomasppeacock @corneliusroemer @AngieHinrichs

@FedeGueli
Contributor

Interesting: like the other saltation, it has ORF1b:G662S (a defining mutation of Delta, BA.2.75 and XBB).

@BorisUitham

BorisUitham commented Feb 1, 2024

Note: there is no T9866C reversion, as this lineage very likely stems from a BA.2 without C9866T; this can be seen in many of the early South African BA.2 samples. The same goes for BA.2.86 and the BA.2.15 saltation singlet.
Two nucleotide queries you can use (let's see how they hold up when more sequences are available):
C541T, T19364C, C28045T
T4379G, A10037G
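Each query above is meant as a simple conjunction: a sequence matches if it carries every listed nucleotide substitution. A minimal Python (3.9+) sketch of that logic, assuming each sequence has already been reduced to a set of substitutions relative to the reference (for example from Nextclade output); the function names and the two samples are hypothetical.

```python
def parse_query(query: str) -> set[str]:
    """Split a comma-separated query such as 'T4379G, A10037G' into a set of substitutions."""
    return {token.strip() for token in query.split(",") if token.strip()}


def matches_all(sample_mutations: set[str], query: str) -> bool:
    """True if the sample carries every substitution in the query (logical AND)."""
    return parse_query(query) <= sample_mutations


# The two queries suggested above.
QUERIES = ["C541T, T19364C, C28045T", "T4379G, A10037G"]

# Hypothetical samples, each reduced to a handful of substitutions for illustration.
samples = {
    "sample_a": {"C541T", "T19364C", "C28045T", "T4379G", "A10037G"},
    "sample_b": {"C541T", "C28045T"},  # lacks T19364C, T4379G and A10037G
}

for name, muts in samples.items():
    print(name, [q for q in QUERIES if matches_all(muts, q)])
```

A real search would also have to treat missing coverage (N) at a query site as "unknown" rather than "absent"; this sketch ignores that.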

@FedeGueli
Contributor

@corneliusroemer the most recent sample is misplaced due to dropouts; they all have ORF1b:G662S, S:S691P and M:A104V, and none of them has T28306C, so the whole branch is mistakenly placed on this flip-flop branch (@AngieHinrichs).
https://nextstrain.org/fetch/genome-test.gi.ucsc.edu/trash/ct/subtreeAuspice1_genome_test_3735f_b5b700.json?f_userOrOld=uploaded%20sample&label=id:node_11239036
Screenshot 2024-02-01 at 10 04 59

@FedeGueli
Contributor

T4379G, A10037G

Thx! Yes, exactly.

@thomasppeacock thomasppeacock added recommended Recommended for designation by pango team member BA.2 Saltation Appears on long branch length with no intermediates labels Feb 1, 2024
@thomasppeacock

Recommending for designation: even if it doesn't grow any further, it's an interesting lineage from an evolutionary perspective, and hopefully designating it will make it easier to pick up in ongoing surveillance data (particularly outside of South Africa).

@corneliusroemer
Contributor

Thanks everyone, I'll designate this

@shay671

shay671 commented Feb 1, 2024

At first look:
C9866, G21987A and C22995A, which are part of BA.2, have been reversed in this lineage.

C21618T and A23040G, which are part of BA.2, have been reversed, but on the larger branch before it.

G23948T, which is in BA.2, was changed here to C instead.

So, this might be BA.6?

@shay671

shay671 commented Feb 1, 2024

Good remark from @thomasppeacock: all are non-synonymous, most in S1, so BA.6 is less likely.

@c19850727

https://twitter.com/Tuliodna/status/1752996494905414026 Dr. de Oliveira mentioned on Twitter that NICD has been aware of 8 sequences, the latest one sampled on Nov 29th, 2023, and all 8 are labeled 'baseline surveillance' on GISAID.

Also they are the only sequences with such a label from South Africa since October 2023.

Meanwhile, EPI_ISL_18845398, the latest sequence, seems a bit different. It's not among the 8 sequences Dr. de Oliveira mentioned; it's labeled as pneumonia surveillance instead of baseline surveillance; and it's uploaded in a different batch.

There have been only 5 pneumonia surveillance sequences sampled since December 2023.

So I'd guess it could still be in circulation?

@ryhisner

ryhisner commented Feb 1, 2024

There's an extremely interesting TRS pattern in this lineage. It has C66T in the 5' UTR, which is the second nucleotide upstream of the core TRS (which I consider to be AAACGAAC). So the two nucleotides upstream of the TRS-L are now 'TT' instead of 'CT.'

There are five TRS-Bs that match the first two nucleotides upstream of the TRS-L, which creates a better match with the TRS-L (i.e. extended homology) and should increase their expression: Spike, M, ORF7a (in lineages with ORF6:D61L), ORF8, and N/ORF9b. Remarkably, the ORF7a and N/ORF9b TRS-Bs have mutated to match the C66T mutation. The mutations are C27384T (a reversion of the last, synonymous nucleotide in ORF6:D61L) and C28256T.
image
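The extended-homology argument can be made concrete by counting how many consecutive nucleotides immediately upstream of the core TRS (AAACGAAC) agree between the leader TRS and a body TRS. A minimal Python (3.9+) sketch; only the core sequence and the 'CT' to 'TT' change at the TRS-L come from the comment above. The body-TRS strings are hypothetical placeholders, and real flanks would have to be read from the reference genome.

```python
CORE_TRS = "AAACGAAC"  # core TRS as defined in the comment above


def upstream_flank(seq: str, n: int) -> str:
    """Return the n nucleotides immediately 5' of the first core TRS in seq."""
    idx = seq.find(CORE_TRS)
    if idx < n:  # also catches idx == -1 (core not found)
        raise ValueError("core TRS not found or insufficient upstream sequence")
    return seq[idx - n:idx]


def extended_homology(trs_l: str, trs_b: str, n: int = 2) -> int:
    """Count consecutive matches between TRS-L and TRS-B, moving 5'-ward from the core."""
    flank_l = upstream_flank(trs_l, n)
    flank_b = upstream_flank(trs_b, n)
    count = 0
    # Start at the nucleotide adjacent to the core and walk outward.
    for a, b in zip(reversed(flank_l), reversed(flank_b)):
        if a != b:
            break
        count += 1
    return count


leader_ancestral = "CT" + CORE_TRS  # 'CT' upstream of the TRS-L before C66T
leader_c66t = "TT" + CORE_TRS       # 'TT' upstream of the TRS-L after C66T
body_tt = "TT" + CORE_TRS           # hypothetical TRS-B with 'TT' upstream
body_ct = "CT" + CORE_TRS           # hypothetical TRS-B with 'CT' upstream

print(extended_homology(leader_c66t, body_tt))       # 2
print(extended_homology(leader_c66t, body_ct))       # 1
print(extended_homology(leader_ancestral, body_ct))  # 2
```

In this toy model, a TRS-B that previously matched 'CT' loses one nucleotide of extended homology once the leader carries C66T, unless it acquires a matching C-to-T change of its own.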

Since spike, M, and ORF8 no longer match the TRS-L at that position, their expression would be expected to decrease somewhat, though I'm not sure how significant this effect would be. If spike expression is decreased, this could be an immune-evasion tactic, as suggested to me by Ben Murrell.
image

@ryhisner

ryhisner commented Feb 1, 2024

Continuing on the theme above, C66T mutations are extremely rare. I looked at lineages with ORF6:D61L (to make the C27384T comparison relevant) and the statistics suggest that C66T and C28256T occur together with remarkable regularity.

image
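A co-occurrence claim like this can be checked with a 2x2 table that compares how often C28256T appears in sequences with C66T versus sequences without it. A minimal Python (3.9+) sketch, assuming the sequences have already been filtered (here, to those with ORF6:D61L) and reduced to sets of substitutions; the toy data are hypothetical, not real counts.

```python
from collections import Counter


def cooccurrence(samples: list[set[str]], a: str, b: str) -> Counter:
    """Tally the 2x2 table of (a present?, b present?) over all samples."""
    table = Counter()
    for muts in samples:
        table[(a in muts, b in muts)] += 1
    return table


def conditional_freq(table: Counter, given_a: bool) -> float:
    """Fraction of samples carrying b among those where a is present (or absent)."""
    with_b = table[(given_a, True)]
    total = with_b + table[(given_a, False)]
    return with_b / total if total else float("nan")


# Hypothetical toy data: four sequences with C66T, six without.
toy = [
    {"C66T", "C28256T"}, {"C66T", "C28256T"}, {"C66T", "C28256T"}, {"C66T"},
    {"C28256T"}, set(), set(), set(), set(), set(),
]
t = cooccurrence(toy, "C66T", "C28256T")
print(conditional_freq(t, True))   # 0.75
print(conditional_freq(t, False))  # 1 of 6 sequences
```

With real data, a large gap between the two conditional frequencies is what "occur together with remarkable regularity" would look like; a significance test (e.g. Fisher's exact test) could be applied to the same table.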

Similarly, the C27384T reversion commonly occurs in sequences with C66T.
image

C26469T, the corresponding mutation that causes the M TRS to match C66T, also sometimes occurs, though more rarely. The most striking example is in an HK.13 branch circulating in South Korea, which has C66T, C26469T, C27384T, and C28256T.

image

I looked at about 15 different Usher trees containing sequences with ORF6:D61L, C66T, C27384T, and C28256T, and in every case the order of mutations was the same as above: C66T first, then C28256T (sometimes simultaneously with C66T), and then C27384T. (Two trees made it appear as if C28256T occurred first, but further investigation revealed that the very small number of sequences causing this had no coverage at position 66.)
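Reading the order of mutations off a tree amounts to finding, for each mutation, the first branch on a root-to-tip path where it appears. A minimal Python (3.9+) sketch, assuming the path has already been extracted (for example from an UShER subtree) as an ordered list of per-branch mutation sets; the path below, including the placeholder mutation G1234A, is hypothetical.

```python
from typing import Optional


def first_branch(path: list[set[str]], mutation: str) -> Optional[int]:
    """Index of the first branch on the path (root side = 0) that carries the mutation."""
    for i, branch in enumerate(path):
        if mutation in branch:
            return i
    return None


def in_expected_order(path: list[set[str]], expected: list[str]) -> bool:
    """True if every mutation occurs on the path and none precedes its predecessor.

    Mutations on the same branch (i.e. simultaneous) count as consistent.
    """
    positions = [first_branch(path, m) for m in expected]
    if any(p is None for p in positions):
        return False
    return all(earlier <= later for earlier, later in zip(positions, positions[1:]))


# Hypothetical root-to-tip path: C66T and C28256T on one branch, C27384T further down.
path = [{"C66T", "C28256T"}, {"G1234A"}, {"C27384T"}]

print(in_expected_order(path, ["C66T", "C28256T", "C27384T"]))  # True
print(in_expected_order(path, ["C27384T", "C66T"]))             # False
```

As the parenthetical above shows, sites without coverage can make a mutation appear to arise later than it did, so the per-branch sets should ideally distinguish "absent" from "not covered".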

Also like the HK.13 branch pictured, these mutations often occur independently on different branches once C66T has been acquired. The largest such tree I could find is pictured below.

image

@shay671

shay671 commented Feb 1, 2024

ba.2.x.xlsx

Full characterization of this clade and a comparison to BA.2, based on our mutation prevalence and % tools and the tree in UShER.

@corneliusroemer
Contributor

Of the BA.2 with 9866C (i.e. without C9866T), around 4% have C18744T, so it's quite likely that this is the common ancestor lineage. I'll hence make BA.2.87 = BA.2 + C18744T and this lineage here BA.2.87.1.

See: https://cov-spectrum.org/explore/World/AllSamples/from%3D2021-11-01%26to%3D2024-01-25/variants?variantQuery=nextcladePangoLineage%3ABA.2+%26+9866C&nextcladePangoLineage1=BA.2&

These are the nucs that occur in 1% or more of those basal BA.2

image
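The 4% figure and the 1% cutoff are simple mutation frequencies within a subset of sequences. A minimal Python (3.9+) sketch, assuming the basal BA.2 sequences (those with 9866C) have been reduced to sets of substitutions; the 100-sequence toy input, including the placeholder mutation C12345T, is hypothetical, and the default threshold mirrors the 1% mentioned above.

```python
from collections import Counter


def mutation_frequencies(samples: list[set[str]]) -> dict[str, float]:
    """Fraction of samples carrying each substitution."""
    counts = Counter(m for muts in samples for m in muts)
    n = len(samples)
    return {m: c / n for m, c in counts.items()}


def above_threshold(freqs: dict[str, float], threshold: float = 0.01) -> list[str]:
    """Substitutions at or above the threshold, most frequent first."""
    return sorted((m for m, f in freqs.items() if f >= threshold), key=lambda m: -freqs[m])


# Hypothetical subset of 100 basal BA.2 sequences.
toy = [{"C18744T"} for _ in range(4)] + [{"C12345T"}] + [set() for _ in range(95)]

freqs = mutation_frequencies(toy)
print(freqs["C18744T"])              # 0.04
print(above_threshold(freqs))        # ['C18744T', 'C12345T']
print(above_threshold(freqs, 0.02))  # ['C18744T']
```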

@UnusualTimes

Totally unique E mutation (again!!): E:F23V, never before seen anywhere as far as I know

@dikeledik
Author

https://twitter.com/Tuliodna/status/1752996494905414026 Dr. de Oliveira mentioned on Twitter that NICD has been aware of 8 sequences, the latest one sampled on Nov 29th, 2023, and all 8 are labeled 'baseline surveillance' on GISAID.

Also they are the only sequences with such a label from South Africa since October 2023.

Meanwhile, EPI_ISL_18845398, the latest sequence, seems a bit different. It's not among the 8 sequences Dr. de Oliveira mentioned; it's labeled as pneumonia surveillance instead of baseline surveillance; and it's uploaded in a different batch.

There have been only 5 pneumonia surveillance sequences sampled since December 2023.

So I'd guess it could still be in circulation?

Yes, two of the 9 were sampled through pneumonia surveillance, the latest one sampled in December.

@oobb45729

This is not the first time we've seen S:V642G, S:S691P, S:T791I together:
https://cov-spectrum.org/explore/World/AllSamples/AllTimes/variants?variantQuery=S%3AV642G%26S%3AT791I%26S%3AS691P&
Also, we've seen S:V642G and S:T791I/P multiple times together in highly mutated sequences:
https://cov-spectrum.org/explore/World/AllSamples/AllTimes/variants?variantQuery=S%3AV642G%26%28S%3AT791I%7CS%3AT791P%29&
@ryhisner
Do you think those mutations are related?

@Sinickle

Sinickle commented Feb 1, 2024

The only two previous variants I'm aware of that were infectious and also deleted (not just shifted) the disulfide bond between S:C15 and S:C136 are C.1.2 and B.1.640.
Both of these variants ALSO have the same glycan addition with S:R190S.

S:R190S isn't rare in general, but it's rare enough that I'm surprised to see it happened 3/3 times here.

@shay671

shay671 commented Feb 3, 2024

BA.2.87.1 convergence analysis.xlsx

Here's the convergence analysis for BA.2.87.1
This analysis gives the convergence on 4 levels: AA substitution, nucleotide substitution, position, and codon (non-synonymous changes in the same codon of the ORF in different variants; e.g. Spike 417 will include every variant with an AA substitution in that codon, no matter the resulting AA).
The analysis is for emergence, meaning that a variant counted for convergence does not include its progeny, only the variant in which the mutation occurred independently.
There is also exclusion of intermediate branches, meaning that I include mutations for a variant based only on the last step in which this variant was made. If branches leading to this variant from its parent were not designated, they get excluded (as we designate them in our team).
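The AA-substitution and codon levels can be illustrated by grouping independent emergences under two different keys. A minimal Python (3.9+) sketch, assuming each variant is already mapped only to the substitutions that emerged on its own branch (progeny and intermediate branches excluded, as described above). The variant names and their substitutions are hypothetical, and the nucleotide-substitution and position levels would work the same way with nucleotide keys.

```python
import re
from collections import defaultdict

# Hypothetical emergences: variant -> amino-acid substitutions that arose on its own branch.
EMERGENCES = {
    "variant_1": {"S:K417T", "S:L452M"},
    "variant_2": {"S:K417N", "S:L452R"},
    "variant_3": {"S:K417T"},
}

AA_SUB = re.compile(r"^(?P<gene>[^:]+):(?P<ref>[A-Z*])(?P<pos>\d+)(?P<alt>[A-Z*])$")


def codon_key(sub: str) -> str:
    """Collapse an AA substitution to its codon, e.g. 'S:K417T' -> 'S:417'."""
    m = AA_SUB.match(sub)
    if m is None:
        raise ValueError(f"unrecognised substitution: {sub}")
    return f"{m['gene']}:{m['pos']}"


def convergence(emergences: dict[str, set[str]], key=lambda s: s) -> dict[str, set[str]]:
    """Map each key (exact substitution by default, or codon) to the variants it emerged in."""
    groups = defaultdict(set)
    for variant, subs in emergences.items():
        for sub in subs:
            groups[key(sub)].add(variant)
    return dict(groups)


by_sub = convergence(EMERGENCES)
by_codon = convergence(EMERGENCES, key=codon_key)
print(len(by_sub["S:K417T"]))  # 2: variant_1 and variant_3
print(len(by_codon["S:417"]))  # 3: any amino acid at Spike 417
```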

This data could not have been calculated without the UShER data from @AngieHinrichs. Thanks Angie, from me and everyone on my team.

@drmutaba

drmutaba commented Feb 4, 2024

Totally unique E mutation (again!!): E:F23V, never before seen anywhere as far as I know

I'd say it is very uncommon, but not completely new: there are around 60 sequences in GISAID showing the E:F23V mutation over the past years.

@FedeGueli
Contributor

C16575T

@corneliusroemer the most recent sample is misplaced due to dropouts; they all have ORF1b:G662S, S:S691P and M:A104V, and none of them has T28306C, so the whole branch is mistakenly placed on this flip-flop branch (@AngieHinrichs).

Thanks to a heads-up from @Over-There-Is, the most recent sample is now separated from the others not just by the three dropouts but also by several nucleotides. cc @corneliusroemer:
Screenshot 2024-02-04 at 23 18 08
https://nextstrain.org/fetch/genome-test.gi.ucsc.edu/trash/ct/subtreeAuspice1_genome_test_42666_aef0.json?c=gt-nuc_15451,66,6100,25791,27383&f_userOrOld=uploaded%20sample&gmax=28383&label=id:node_3047511

@corneliusroemer
Contributor

corneliusroemer commented Feb 6, 2024

I think the US traveller surveillance picked up BA.2.87.1 already in October 2023:

hCoV-19/USA/NJ-GBW-H20-347-8898/2023|EPI_ISL_18421941|2023-10-12

This sample is spotty: it has a lot of missing segments (~70% coverage). But it's remarkable that, despite the low coverage, it shares 9 SNPs (vs. BA.2) with BA.2.87.1:

4012T
10702T
12791C 
16575T
19698A
25416C
25487A
27128T
28715G

This is very unlikely to be due to chance. It looks like the US sample was at least a coinfection of EG.5 and BA.2.87.1.

One additional SNP is shared only with one of the 9 SA samples (hCoV-19/South_Africa/NICD-N57469/2023|EPI_ISL_18849991|2023-11-21): 18828T

On top of the 9/10 SNPs shared with BA.2.87.1, there are 5 reversions compared to EG.5, making it even less likely to be due to chance:

9866C -> wt/reversion to BA.2, almost all BA.2 and descendants have this mutation
12789C -> wt/reversion to pre-XBB.1.9
16342T -> wt/reversion to pre-XBB (also seen in XBB.2.3)
29625C -> wt/reversion to pre-EG.5

The annotation for the sample is: Traveler from United Arab Emirates

This shows how valuable traveler surveillance is. Maybe Ginkgo Bioworks can share the raw reads so we can see whether there's further evidence in the N stretches.

The spike in particular doesn't look very much like BA.2.87 and much more like EG.5, so the sample wasn't notable until BA.2.87.1 popped up as a distinctive cluster in high quality South African samples. But in retrospect, it seems pretty clear that this US travel surveillance sample is the earliest known BA.2.87.1 (thus far).

I found this sequence by doing an advanced covSpectrum search using the "5-of" operator on the list of nuc mutations unique to BA.2.87.1 among all BA.2 descendants. Previous searches I had done didn't specifically use only the "unique" mutations meaning the searches were less discriminative. Here's a query similar to the one I used: https://cov-spectrum.org/explore/World/AllSamples/AllTimes/variants?variantQuery=%5B5-of%3AC66T%2C+A309G%2C+G542A%2C+C4012T%2C+G4354T%2C+T4379G%2C+C7296T%2C+T7480C%2C+G9190T%2C+A10037G%2C+C10702T%2C+C12008T%2C+A12791C%2C+C12896T%2C+C15925T%2C+C17644A%2C+C18744T%2C+T19364C%2C+C19698A%2C+G21604A%2C+G21786A%2C+T21939C%2C+T22552C%2C+T22896G%2C+T23005G%2C+T23487G%2C+T23633C%2C+G23761A%2C+C23934T%2C+C25487A%2C+T26311G%2C+C28256T%2C+T28474C%2C+A28715G%5D&
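The covSpectrum "[5-of: ...]" operator matches sequences carrying at least 5 of the listed mutations, which is what makes it tolerant of spotty coverage. A minimal Python (3.9+) sketch of that matching logic (an illustration, not covSpectrum's implementation), assuming each sequence is reduced to a set of substitutions; only the first eight mutations from the query above are used, and the sample is hypothetical.

```python
def n_of(sample_mutations: set[str], candidates: list[str], n: int) -> bool:
    """True if the sample carries at least n of the candidate substitutions."""
    return sum(m in sample_mutations for m in candidates) >= n


# First eight BA.2.87.1-specific nucleotide mutations from the query above.
signature = ["C66T", "A309G", "G542A", "C4012T", "G4354T", "T4379G", "C7296T", "T7480C"]

# Hypothetical spotty sample: five signature sites called, plus one unrelated call.
spotty_sample = {"C4012T", "G4354T", "T4379G", "C7296T", "T7480C", "C25416T"}

print(n_of(spotty_sample, signature, 5))  # True: 5 of 8 present
print(n_of(spotty_sample, signature, 6))  # False
```

Restricting the candidate list to mutations unique to the lineage, as described above, is what keeps a low threshold like 5 from matching unrelated BA.2 descendants.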

Nextclade picture showing spottiness of the sample and nuc mutations:
image

@ryhisner

ryhisner commented Feb 6, 2024

Oh, wow, I'd noticed that sequence had N:T148A, ORF1a:K4176Q, and ORF3a:T32N when no other sequence before BA.2.87.1 had ever had any two of those three. ORF1a:K4176Q had only been in about 90 sequences ever, while N:T148A and ORF3a:T32N have only ever been in ~1000 sequences each.

@Ex3B

Ex3B commented Feb 8, 2024

"before the first known SA samples:"

EPI_ISL_18849985 has a collection date of 20 September.
So this October sequence was uploaded before the first SA sequence uploads, but the sample comes from after the first known SA samples.
Correct?

@corneliusroemer
Contributor

You are totally right! I had missed the September sample!

@ryhisner

ryhisner commented Feb 8, 2024

This shows how valuable traveler surveillance is. Maybe Ginkgo Bioworks can share the raw reads so we can see whether there's further evidence in the N stretches.

Given that GBW does a lot of pooled sequencing (which they then upload to GISAID against database guidelines), this sample is almost certainly a badly contaminated one. I think you've proven beyond doubt that BA.2.87.1 was present in this sequence. But I doubt that a company that clearly has no interest in competently doing the job they're being paid tens of millions of dollars to do will be willing or able to help us learn more about one of their embarrassingly bad sequences.

@corneliusroemer
Contributor

Update from the SPHERES call: the raw reads of the US traveller surveillance sequence are now on SRA.
image

@AngieHinrichs
Member

Accessions and direct link from the SPHERES call chat (IIRC from Cindy Friedman, CDC):

Additional accession information:
CDC's Traveler-Based Genomic Surveillance Program - PRJNA989177
SRA run - SRR27955152
BioSample - SAMN39931094

https://trace.ncbi.nlm.nih.gov/Traces/?view=run_browser&acc=SRR27955152&display=metadata

@corneliusroemer
Contributor

This is what the raw reads look like in IGV after some minimal processing (trimming, mapping). Note the coverage is on a log scale going up to 300k.

image

Some observations:

  • Wide range of coverage: most amplicons seem to have failed to amplify, about 10 have coverage >10k, then there are around 8 with coverage 300-10k. Around 10 amplicons have 10-300, and some segments have 0 coverage.
  • When restricting the analysis to the 10 amplicons with the highest coverage, I find that they are entirely compatible with BA.2.87.1, containing 7 signature mutations of BA.2.87.1 acquired on top of BA.2: C10702T, A12791C, C16575T, C19698A, C25487A, C27128T, A28715G. No mutation that should be in a high-coverage region is missing, so this is a full hit.
  • None of the 10 high-coverage amplicons have signature XBB mutations; e.g. T16342C and C25416T are missing, though they should be present if it were an XBB descendant.
  • In areas of low to medium coverage, a few sites look like BA.2.87.1, but most look like XBB

This leads me to the conclusion that what we're seeing is a sample that's clearly BA.2.87.1, but where amplification failed for almost all amplicons. The host was either coinfected with XBB, or there was some low-coverage contamination in the regions of the failed amplicons.
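The coverage tiers in the first observation, and the restriction to the highest-coverage amplicons, can be reproduced by binning per-amplicon depth. A minimal Python (3.9+) sketch, assuming mean depth per amplicon has already been computed from the mapped reads; the amplicon names and depths are hypothetical. The tier boundaries follow the comment (0, 10-300, 300-10k, >10k), and the extra "<10" tier is an added catch-all for depths the comment does not name.

```python
from collections import Counter


def coverage_tier(depth: float) -> str:
    """Assign an amplicon's mean depth to one of the tiers used above.

    The '<10' tier is an added catch-all not named in the comment.
    """
    if depth == 0:
        return "0"
    if depth < 10:
        return "<10"
    if depth < 300:
        return "10-300"
    if depth <= 10_000:
        return "300-10k"
    return ">10k"


def high_coverage(depths: dict[str, float], min_depth: float = 10_000) -> list[str]:
    """Amplicons whose depth exceeds min_depth, highest first."""
    return sorted((a for a, d in depths.items() if d > min_depth), key=lambda a: -depths[a])


# Hypothetical per-amplicon mean depths.
depths = {"amp_01": 250_000.0, "amp_02": 45_000.0, "amp_03": 1_200.0, "amp_04": 35.0, "amp_05": 0.0}

print(Counter(coverage_tier(d) for d in depths.values()))
print(high_coverage(depths))  # ['amp_01', 'amp_02']
```

Calling mutations only within the amplicons returned by `high_coverage` mirrors the restricted analysis in the second observation, where low-level contamination is least likely to dominate.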

I'd be curious what other people conclude from the raw reads.

It might be helpful to have a list of all mutations in BA.2 (the most recent common ancestor of both XBB and BA.2.87.1) and the mutations that distinguish BA.2.87.1 from BA.2, and those that distinguish XBB from BA.2

All BA.2 mutations (SNPs):

C241T, T670G, C2790T, C3037T, G4184A, C4321T, C9344T, A9424G, C9534T, C10029T, C10198T, G10447A, C10449A, C12880T, C14408T, C15714T, C17410T, A18163G, C19955T, A20055G, C21618T, G21987A, T22200G, G22578A, C22674T, T22679C, C22686T, A22688G, G22775A, A22786C, G22813T, T22882G, G22992A, C22995A, A23013C, A23040G, A23055G, A23063T, T23075C, A23403G, C23525T, T23599G, C23604A, C23854A, G23948T, A24424T, T24469A, C25000T, C25584T, C26060T, C26270T, C26577G, G26709A, C26858T, A27259C, G27382C, A27383T, T27384C, C27807T, A28271T, C28311T, G28881A, G28882A, G28883C, A29510C

BA.2.87.1 mutations on top of BA.2:

C66T, A309G, C541T, G542A, C2536T, C4012T, G4354T, T4379G, C5184T, C7296T, T7480C, C7764T, G9190T, A10037G, C10702T, A11782G, C12008T, A12791C, C12896T, G15451A, C15925T, C16575T, C17644A, C18744T, T19364C, C19698A, G21604A, G21786A, C21855T, T21939C, G22017T, G22132T, A22206G, T22552C, A22812C, G22894T, T22896G, C22916A, T22942G, A22995C, T23005G, G23040A, C23423T, T23487G, A23598G, T23633C, G23761A, C23934T, T23948C, A24369G, C25487A, T26311G, G26529A, C26833T, C27128T, C28045T, C28256T, T28474C, A28715G

XBB mutations on top of BA.2:

A405G, C9866T, G15451A, C15738T, T15939C, T16342C, T17859C, A19326G, T21810C, C22000A, C22109G, G22200A, G22577C, G22599C, C22664A, G22895C, T22896C, G22898A, T22942G, T23019C, T23031C, G23040A, C25416T, A26275G, T29759G

I got these from the prototypical pango sequence tree I maintain here: https://nextstrain.org/staging/nextclade/sars-cov-2
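With the lists above, the sites that discriminate between the two lineages follow from set operations: mutations in one lineage but not the other, mutations both acquired on top of BA.2, and sites where they carry different alleles. A minimal Python (3.9+) sketch using the BA.2.87.1 and XBB lists exactly as posted; reading the position as the digits between the reference and alternate base is an assumption that holds for these single-nucleotide substitutions.

```python
BA_2_87_1 = """C66T, A309G, C541T, G542A, C2536T, C4012T, G4354T, T4379G, C5184T, C7296T,
T7480C, C7764T, G9190T, A10037G, C10702T, A11782G, C12008T, A12791C, C12896T, G15451A,
C15925T, C16575T, C17644A, C18744T, T19364C, C19698A, G21604A, G21786A, C21855T, T21939C,
G22017T, G22132T, A22206G, T22552C, A22812C, G22894T, T22896G, C22916A, T22942G, A22995C,
T23005G, G23040A, C23423T, T23487G, A23598G, T23633C, G23761A, C23934T, T23948C, A24369G,
C25487A, T26311G, G26529A, C26833T, C27128T, C28045T, C28256T, T28474C, A28715G"""

XBB = """A405G, C9866T, G15451A, C15738T, T15939C, T16342C, T17859C, A19326G, T21810C,
C22000A, C22109G, G22200A, G22577C, G22599C, C22664A, G22895C, T22896C, G22898A, T22942G,
T23019C, T23031C, G23040A, C25416T, A26275G, T29759G"""


def parse(mutation_list: str) -> set[str]:
    """Turn a comma-separated list (line breaks allowed) into a set of substitutions."""
    return {t.strip() for t in mutation_list.split(",") if t.strip()}


def position(sub: str) -> int:
    """Genome position of a substitution such as 'T22896G' (assumes one ref/alt base)."""
    return int(sub[1:-1])


ba2871, xbb = parse(BA_2_87_1), parse(XBB)

shared = ba2871 & xbb       # acquired independently by both lineages on top of BA.2
only_ba2871 = ba2871 - xbb  # candidate BA.2.87.1 markers
only_xbb = xbb - ba2871     # candidate XBB markers

# Sites mutated in both lineages, but to different alleles.
pos_ba2871 = {position(s): s for s in ba2871}
pos_xbb = {position(s): s for s in xbb}
conflicting = {
    p: (pos_ba2871[p], pos_xbb[p])
    for p in pos_ba2871.keys() & pos_xbb.keys()
    if pos_ba2871[p] != pos_xbb[p]
}

print(sorted(shared, key=position))     # ['G15451A', 'T22942G', 'G23040A']
print(len(only_ba2871), len(only_xbb))  # 56 22
print(conflicting)                      # {22896: ('T22896G', 'T22896C')}
```

For a mixed or contaminated sample, the `only_ba2871` and `only_xbb` sets are the ones worth checking amplicon by amplicon; the shared sites cannot tell the two apart.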

@FedeGueli
Contributor

It might be helpful to have a list of all mutations in BA.2 (the most recent common ancestor of both XBB and BA.2.87.1) and the mutations that distinguish BA.2.87.1 from BA.2, and those that distinguish XBB from BA.2


I got these from the prototypical pango sequence tree I maintain here: https://nextstrain.org/staging/nextclade/sars-cov-2

I think @shay671 has it.

@HynnSpylor
Contributor

HynnSpylor commented Mar 11, 2024

Well... 3 wastewater samples were uploaded today from Bangkok, Thailand, collected from December 2023 to January 2024.
EPI_ISL_18968820-18968822

@ryhisner

EPI_ISL_18421941

@corneliusroemer, I was looking back on this sequence today, and the GISAID file now says it was from a female traveler from India. Do we know when or why this was changed? Has the metadata from the SRA sequence also been changed?
image

@FedeGueli
Contributor

originally:

The annotation for the sample is: Traveler from United Arab Emirates
