
Speed up nml import for nmls with many trees #4742

Merged: 4 commits into master from speed-up-tree-import on Jul 30, 2020

Conversation

@daniel-wer (Member) commented Jul 29, 2020

I noticed that the NML import for NMLs with many trees (20,000+) was very slow and had a look. It turns out I introduced this regression in #4541 by repeatedly calling the getMaximumTreeId function when importing trees. The getMaximumTreeId function was also rather slow, as it created a potentially large array every time it was called. I replaced that with a simple reduce (10x speedup for tracings with >20,000 trees) and also made sure to only call the getMaximumTreeId function when it is necessary (~100x speedup during NML import).
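As an aside, a rough sketch of the before/after pattern described here (not the actual PR diff): getMaximumTreeId and Constants.MIN_TREE_ID are real names from the codebase, while the import-loop structure, the two wrapper functions, and the MIN_TREE_ID value below are purely illustrative.

// Sketch only: the real import code lives in webKnossos' skeleton tracing logic.
const MIN_TREE_ID = 1; // assumed stand-in for Constants.MIN_TREE_ID

// The repo's helper in its new reduce form (mirrors the diff shown below).
function getMaximumTreeId(trees) {
  return Object.values(trees).reduce(
    (maxId, tree) => (tree.treeId > maxId ? tree.treeId : maxId),
    MIN_TREE_ID - 1,
  );
}

// Before (the regression): the maximum is recomputed for every imported tree,
// so importing n trees scans the trees object n times (roughly O(n^2)).
function importTreesSlow(existingTrees, importedTrees) {
  const trees = { ...existingTrees };
  for (const tree of importedTrees) {
    const newTreeId = getMaximumTreeId(trees) + 1; // full scan per tree
    trees[newTreeId] = { ...tree, treeId: newTreeId };
  }
  return trees;
}

// After (the fix): compute the maximum once, then keep a running counter.
function importTreesFast(existingTrees, importedTrees) {
  const trees = { ...existingTrees };
  let nextTreeId = getMaximumTreeId(trees) + 1; // single scan up front
  for (const tree of importedTrees) {
    trees[nextTreeId] = { ...tree, treeId: nextTreeId };
    nextTreeId += 1;
  }
  return trees;
}

Avoiding the repeated scan is what the ~100x import speedup mentioned above refers to.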

URL of deployed dev instance (used for testing):

  • https://___.webknossos.xyz

Steps to test:

  • Unzip and import synapses_0.80.zip into an existing tracing. The import should be fast and not take multiple minutes.

@daniel-wer self-assigned this Jul 29, 2020
Comment on lines 75 to 78
return Object.values(trees).reduce(
(maxId, tree) => (tree.treeId > maxId ? tree.treeId : maxId),
Constants.MIN_TREE_ID - 1,
);
Member commented:

Did you also compare this to a simple for-loop? I think I remember a case where that was another large performance boost, but maybe only because the accumulator was an object in that particular case. For a simple number there might be no benefit. Maybe worth testing, though.

Also: Did you compare line 76 to Math.max(tree.treeId, maxId)? If your current way is faster, that's fine, but maybe there is no difference. In that case, I'd prefer the Math.max version :)
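For reference, the two alternatives mentioned here would look roughly like this; Constants.MIN_TREE_ID is the constant used in the diff above, while the function names are made up for the comparison.

// Alternative 1: a plain for-loop instead of reduce.
function getMaximumTreeIdLoop(trees) {
  let maxId = Constants.MIN_TREE_ID - 1;
  for (const tree of Object.values(trees)) {
    if (tree.treeId > maxId) {
      maxId = tree.treeId;
    }
  }
  return maxId;
}

// Alternative 2: the same reduce, but with Math.max as the combiner.
function getMaximumTreeIdMathMax(trees) {
  return Object.values(trees).reduce(
    (maxId, tree) => Math.max(tree.treeId, maxId),
    Constants.MIN_TREE_ID - 1,
  );
}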

Member Author commented:

Very good remarks! I did some further testing and also noticed that I had worked with a trees array (instead of an object) in my benchmarks 😕. When using an object there doesn't seem to be a significant speedup over the existing lodash method, whether using reduce or a for-loop (because Object.keys/values needs to be called). I didn't spend much more time investigating, because not calling getMaximumTreeId 20,000+ times provided the much greater speedup, and optimizing the getMaximumTreeId function seems more like premature optimization at this point :)

Member commented:

> When using an object there doesn't seem to be a significant speedup over the existing lodash method, whether using reduce or a for-loop (because Object.keys/values needs to be called).

Did you notice the "not significant speedups" while micro-benchmarking getMaximumTreeId or while e2e-benchmarking the entire NML import? Since getMaximumTreeId is also used in the rest of the WK core, I'd find it worthwhile to swap the implementation to reduce/a for-loop in case there's at least some speed-up factor :)

Member Author commented:

I micro-benchmarked getMaximumTreeId and smaller speedups are noticeable only when calling getMaximumTreeId >10,000 times for a tracing with at least 20,000 trees. As we don't call it nearly that often (after my fix in this PR), I'm not convinced we need to optimize the function.
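A minimal sketch of what such a micro-benchmark could look like, not the one actually run: buildTrees and timeIt are hypothetical helpers, and getMaximumTreeId / getMaximumTreeIdLoop refer to the variants sketched earlier in this thread.

// Build a tracing-sized trees object of the shape { [treeId]: { treeId } }.
function buildTrees(count) {
  const trees = {};
  for (let treeId = 1; treeId <= count; treeId++) {
    trees[treeId] = { treeId };
  }
  return trees;
}

// Time `iterations` calls of a getMaximumTreeId implementation.
// performance.now() is available in browsers and in recent Node versions.
function timeIt(label, iterations, trees, getMaxFn) {
  const start = performance.now();
  let maxId = 0;
  for (let i = 0; i < iterations; i++) {
    maxId = getMaxFn(trees);
  }
  console.log(`${label}: ${(performance.now() - start).toFixed(1)} ms (max=${maxId})`);
}

const benchmarkTrees = buildTrees(20000);
timeIt("reduce", 10000, benchmarkTrees, getMaximumTreeId);
timeIt("for-loop", 10000, benchmarkTrees, getMaximumTreeIdLoop);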

Member commented:

Cool, totally agree then 👍

@philippotto (Member) left a comment:

Very cool that you spent the time to find this culprit :)

@daniel-wer merged commit f6a0d84 into master Jul 30, 2020
@daniel-wer deleted the speed-up-tree-import branch July 30, 2020 09:35