fix: Update to the latest version of nextclade. #1701

jgadling · 2024-08-16T22:12:15Z

Summary:

What: Upgrade to the latest version of nextclade, so our strain results are more accurate
Ticket:
Env:

Demos:

Notes:

Checklist:

I merged latest <base branch>
I manually verified the change
I added labels to my PR
I tested in multiple browsers
I added relevant unit tests
I have notified others of changes they need to make locally (migrations, jobs, package updates, etc)

danrlu

Thanks very much!

vincent-czi · 2024-08-16T22:26:27Z

src/backend/aspen/workflows/nextclade/prep_samples.py

@@ -127,6 +128,13 @@ def cli(
        # generalized case and we'll need to figure out how to handle that,
        # but right now the workflow is hardcoded to always expecting dataset.
        nextclade_dataset_name = target_pathogen.nextclade_dataset_name
+        # Nextclade 3.2.8 has new names for datasets vs the 2.1 names in the db.
+        new_nextclade_dataset_names = {


I'm not sure if normalizing the new names back to whatever the old one was is the right choice. I don't remember how the dataset name gets used, but if there's no logic based on the value elsewhere and it's just being held so we know what reference was used, I think we shouldn't standardize to the old name.

Oh, whoops, I see I misread the direction of the lookup var.

Yeah, I think this is just the minimal change possible -- we only use this value to download the right dataset via the nextclade cli, and nothing else changes anywhere in our system
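
For reference, the code-side approach discussed in this thread amounts to a lookup with a fallback just before the dataset is fetched. The following is only a minimal sketch: the dataset names are hypothetical placeholders (the real mapping is truncated in the diff), and `resolve_dataset_name` and `build_dataset_get_command` are illustrative helper names, not functions from the PR. It assumes the dataset is fetched with `nextclade dataset get --name ... --output-dir ...`.

```python
from typing import Dict, List

# Hypothetical placeholder names. The actual old -> new mapping from the PR
# is truncated in the diff and is not reproduced here.
NEW_NEXTCLADE_DATASET_NAMES: Dict[str, str] = {
    "old-dataset-name": "new-dataset-name",
}


def resolve_dataset_name(db_name: str) -> str:
    """Translate a dataset name stored in the db to its newer equivalent.

    Names without an entry in the mapping are returned unchanged.
    """
    return NEW_NEXTCLADE_DATASET_NAMES.get(db_name, db_name)


def build_dataset_get_command(db_name: str, output_dir: str) -> List[str]:
    """Build the argument list for downloading a dataset with the nextclade CLI."""
    return [
        "nextclade",
        "dataset",
        "get",
        "--name",
        resolve_dataset_name(db_name),
        "--output-dir",
        output_dir,
    ]
```

The returned list could be passed to `subprocess.run(..., check=True)`. Keeping the translation in one small function means only the mapping changes if more datasets are renamed.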

I honestly don't know: is there a reason we can't modify the row in the pathogens table so the nextclade_dataset_name for MPX is the new value instead? If that's doable, it seems preferable to go that way, but I also don't know if the refresh logic would freak out if old MPX and new MPX referenced different things.

I did a skim through the code and it looks like this is the one-and-only place where the nextclade dataset name value is used, so it should be safe to update the db values instead - I'll update the PR
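
The db-side alternative agreed on here boils down to a one-off update of the stored dataset name. The following is only a rough sketch and not the migration from the PR: it uses an in-memory SQLite database as a stand-in for the real database, assumes a `pathogens` table with a `nextclade_dataset_name` column as mentioned in the thread, and uses a hypothetical `slug` column and placeholder dataset names.

```python
import sqlite3


def rename_nextclade_dataset(
    conn: sqlite3.Connection, old_name: str, new_name: str
) -> int:
    """Point every pathogen row that uses old_name at new_name.

    Returns the number of rows that were updated.
    """
    cursor = conn.execute(
        "UPDATE pathogens SET nextclade_dataset_name = ? "
        "WHERE nextclade_dataset_name = ?",
        (new_name, old_name),
    )
    conn.commit()
    return cursor.rowcount


if __name__ == "__main__":
    # Stand-in schema and data; names here are placeholders, not real values.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE pathogens (slug TEXT PRIMARY KEY, nextclade_dataset_name TEXT)"
    )
    conn.execute("INSERT INTO pathogens VALUES ('MPX', 'old-dataset-name')")
    updated = rename_nextclade_dataset(conn, "old-dataset-name", "new-dataset-name")
    print(f"rows updated: {updated}")
```

With the value corrected at the source, the workflow code can pass the stored name straight to the CLI without any translation step.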

danrlu · 2024-08-19T14:19:10Z

The update is in:
Prod (old version)

Staging (new version and includes updated lineages)

Also tested SC2 samples and nothing is broken. Thank you both!

jgadling added 2 commits August 16, 2024 17:55

Update to the latest version of nextclade.

11d2b5b

Fix tests

224d3c0

jgadling requested review from danrlu and vincent-czi August 16, 2024 22:12

danrlu approved these changes Aug 16, 2024


vincent-czi reviewed Aug 16, 2024


jgadling added 3 commits August 16, 2024 18:28

Fix git error.

d6912a5

update the db instead of the code.

66a354c

Fix lint.

db1a67c

jgadling merged commit 8b44c4b into trunk Aug 16, 2024
13 checks passed

jgadling deleted the jgadling/upgrade-nextclade branch August 16, 2024 23:09
