Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unify data loading between narrative and non-narrative modes #1305

Merged
merged 4 commits into from
Sep 7, 2021

Conversation

jameshadfield
Copy link
Member

This PR represents in-progress work to address issue #1283.

Dataset loading, as of v2.24.0, differs dramatically between narrative & non-narrative ("normal") mode. Specifically, the narrative mode failed to fetch sidecar files (tip-frequencies and root-sequence) which resulted in #1283. To some extent the differences in logic between the modes is unavoidable, however there are steps in common which this PR aims to isolate into functions, thus reducing duplicated logic which may diverge.

I may not have a chance to revisit this for a week or so, however there are a few places where 👀 would be welcome:

  • Are there additions to narratives/test_multiple-datasets.md to capture examples not added via this PR?
  • @eharkins this PR should correctly populate the cache with sidecar files (for datasets which differ from that of the starting page), however when changing pages those sidecar files aren't requested from the cache. Do you have ideas here?
  • general comments on the abstractions started by 3c9d28c would be welcome

@jameshadfield jameshadfield temporarily deployed to auspice-bug-narratives--14d4pl March 16, 2021 23:13 Inactive
@jameshadfield jameshadfield changed the title Bug narratives frequencies Unify data loading between narrative and non-narrative modes Mar 16, 2021
@eharkins
Copy link
Contributor

eharkins commented Aug 6, 2021

Rebased on master and merged #1312 to address

sidecar files aren't requested from the cache

Will continue reviewing and testing here.

Copy link
Contributor

@eharkins eharkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @jameshadfield! Jumping back into the code, I left a few comments about specific lines. Here's my interpretation of what remains to be done:

Please add to the list here if I missed some TODO items you had in mind :)

* [1] {string | undefined} secondTreeUrl
* [2] {string | undefined} string of old syntax
*/
const collectDatasetFetchUrlsDeprecatedSyntax = (url) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the existing collectDatasetFetchUrls handle the deprecated syntax or does this mean we are dropping the ability to parse the deprecated syntax here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jameshadfield it seems like in its current state this PR drops the ability to parse the deprecated second tree syntax, e.g. /flu/seasonal/h3n2/ha:na/2y, which works on the master branch. Is that intentional?

src/actions/loadData.js Outdated Show resolved Hide resolved
src/actions/loadData.js Outdated Show resolved Hide resolved
src/actions/navigation.js Outdated Show resolved Hide resolved
This replaces the functionality of narratives/test_multiple-datasets.md
which is removed.
These changes were motivated by
[#1283](#1283) which arose
as we used different code paths for loading a dataset viz and a
narrative.

Here we represent each dataset by a `Dataset` object. This is used for
stand alone datasets, each dataset in a tangletree, and each dataset in
a narrative. Each dataset instance describes the various API endpoints
of datafiles for each dataset, manages fetching of these datafiles and,
where appropriate, can dispatch data to update redux state.

This has been tested on various single datasets, tangle-trees, and
narratives in this repo. Notably, this commit breaks narratives with
multiple datasets; this will be fixed in a subsequent commit to
reflect Eli's work in PR #1312.
eharkins and others added 2 commits August 11, 2021 17:51
This complements the previous commit to allow narrative slide-changes to
change datasets by querying the cache set up at narrative load.
Appropriate sidecar files are also loaded, and we ensure that
sidecar data from the previous dataset is not displayed.

This work is based upon PR #1312 by eharkins.

Co-authored-by: eharkins <[email protected]>
This fixes a longstanding (perhaps undescovered) bug where narratives
could not both define different datasets per slide and have one of those
datasets be a tanglegram.
@jameshadfield
Copy link
Member Author

Thanks for the detailed review @eharkins - this prompted me to rewrite my original commit to further simply things. The concept of a "dataset" and its associated api calls, fetch results etc is now an object, so dataset views and narratives can both simply pass around instances of Dataset. I think this simplifies things quite a bit and prevents somewhat duplicated code.

I leant heavily on http://localhost:4000/narratives/test/sidecar-files for testing, which identified numerous bugs (including stale sidecar data, broken tangletrees etc). I've also tested a large number of individual datasets, tangletrees and other narratives. The nextstrain.org review app should allow further testing still.

@eharkins would you mind re-reviewing?

pushState: true,
query
});
mainDataset.loadSidecars(dispatch);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense in terms of frequencies to only fetch for the main dataset and not the second one since it seems like only the main dataset's frequencies can be displayed in a tanglegram view. What about for root sequences? Sorry, I'm not as familiar with how that file works / is displayed.

Copy link
Contributor

@eharkins eharkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good. I still would like to test it more and get others to test it out as well before we merge but the code is much easier to understand in terms of logic and abstraction. A couple of abstraction notes:

  1. I know we talked about whether it would make sense to refactor changePage at this point but maybe that's best done in a separate PR. To maintain the best of both worlds, maybe we could make changePage only do case 3 where we actually change page. Then we could have a separate function called changeDatasetState or something like that which has all three options with appropriate option names like queryUpdate, narrativeUpdate, changePage to allow us some consistency if we do think these 3 cases belong under the same heading nominally.

  2. Since you might need to be able to dispatch a clean state with two datasets, i think the abstraction level of the dataset object makes sense as is, since dispatching inside a prototype attached to the dataset object would become complicated in that case.

@jameshadfield jameshadfield marked this pull request as ready for review September 7, 2021 08:28
@jameshadfield
Copy link
Member Author

Thanks Eli - I agree that changes to changePage would make for a nicer (internal) API. I've re-tested and think this is good to go, so I'm going to merge and release to dev.nextstrain.org for a final testing pass.

@jameshadfield jameshadfield merged commit 4aac28d into master Sep 7, 2021
@jameshadfield jameshadfield deleted the bug-narratives-frequencies branch September 7, 2021 08:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants