Speed up nml import for nmls with many trees #4742
Conversation
```js
// Fold over all trees to find the maximum treeId; the initial value
// ensures an empty trees object yields Constants.MIN_TREE_ID - 1.
return Object.values(trees).reduce(
  (maxId, tree) => (tree.treeId > maxId ? tree.treeId : maxId),
  Constants.MIN_TREE_ID - 1,
);
```
Did you also compare this to a simple for-loop? I think I remember a case where that was another large performance boost. But maybe only because the accumulator was an object in that particular case; for a simple number it might be without benefit. Maybe worth testing, though.

Also: did you compare line 76 to `Math.max(tree.treeId, maxId)`? If your current way is faster, that's fine, but maybe there is no difference. In that case, I'd prefer the `Math.max` version :)
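For reference, a minimal sketch of the two alternatives suggested above (`trees` and `Constants.MIN_TREE_ID` mirror the names in the diff; whether either is actually faster is exactly the open question):

```js
// Variant using Math.max as the reducer.
const maxViaMathMax = Object.values(trees).reduce(
  (maxId, tree) => Math.max(tree.treeId, maxId),
  Constants.MIN_TREE_ID - 1,
);

// Simple for-loop variant that avoids the per-element callback.
let maxViaLoop = Constants.MIN_TREE_ID - 1;
for (const tree of Object.values(trees)) {
  if (tree.treeId > maxViaLoop) {
    maxViaLoop = tree.treeId;
  }
}
```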
Very good remarks! I did some further testing and also noticed that I worked with a trees array (instead of an object) in my benchmarks 😕. When using an object, there doesn't seem to be a significant speedup over the existing lodash method, whether using reduce or a for-loop (because Object.keys/values needs to be called). I did not spend much more time investigating, because not calling `getMaximumTreeId` 20,000+ times provided the much greater speedup, and optimizing the `getMaximumTreeId` function itself seems more like premature optimization at this point :)
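A minimal sketch of the kind of micro-benchmark described here; everything in it is an assumption for illustration (the tree count, the use of Node's `console.time`, and the guess that the existing lodash method resembles `_.max(_.map(trees, "treeId"))`):

```js
// Hypothetical micro-benchmark comparing a lodash-based implementation
// with a plain reduce and a for-loop over a large trees object.
const _ = require("lodash");

const trees = {};
for (let i = 1; i <= 20000; i++) {
  trees[i] = { treeId: i };
}

const RUNS = 1000;

console.time("lodash map/max");
for (let run = 0; run < RUNS; run++) {
  _.max(_.map(trees, "treeId"));
}
console.timeEnd("lodash map/max");

console.time("reduce");
for (let run = 0; run < RUNS; run++) {
  Object.values(trees).reduce((maxId, tree) => Math.max(tree.treeId, maxId), 0);
}
console.timeEnd("reduce");

console.time("for-loop");
for (let run = 0; run < RUNS; run++) {
  let maxId = 0;
  for (const tree of Object.values(trees)) {
    if (tree.treeId > maxId) maxId = tree.treeId;
  }
}
console.timeEnd("for-loop");
```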
> When using an object, there doesn't seem to be a significant speedup over the existing lodash method, whether using reduce or a for-loop (because Object.keys/values needs to be called).

Did you notice the "not significant speedups" while micro-benchmarking `getMaximumTreeId` or while e2e-benchmarking the entire nml import? Since `getMaximumTreeId` is also used in the rest of the WK core, I'd find it worthwhile to swap the implementation to a reduce/for-loop in case there's at least some speed-up factor :)
I micro-benchmarked `getMaximumTreeId`, and smaller speedups are noticeable only when calling `getMaximumTreeId` >10,000 times for a tracing with at least 20,000 trees. As we don't call it nearly that often (after my fix in this PR), I'm not convinced we need to optimize the function.
Cool, totally agree then 👍
Very cool that you spent the time to find this culprit :)
I noticed that the NML import for NMLs with many trees (20,000+) was very slow and had a look. Turns out I introduced this regression in #4541 by repeatedly calling the `getMaximumTreeId` function when importing trees. The `getMaximumTreeId` function was also rather slow, as it created a potentially large array every time it was called. I replaced that with a simple reduce (10x speedup for tracings with >20,000 trees) and also made sure to only call the `getMaximumTreeId` function when it is necessary (~100x speedup during nml import). A rough sketch of the before/after is included below.

URL of deployed dev instance (used for testing):
Steps to test:
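For illustration, a minimal sketch of the shape of the fix, under the assumption that the import loop previously recomputed the maximum treeId per tree; the function and variable names here are hypothetical, not the actual webKnossos code:

```js
// Hypothetical helper, following the reduce-based approach from this PR.
function getMaximumTreeId(trees) {
  return Object.values(trees).reduce(
    (maxId, tree) => Math.max(tree.treeId, maxId),
    0,
  );
}

// Before (hypothetical shape of the regression): the maximum treeId is
// recomputed from scratch for every imported tree, which is O(n) per tree
// and O(n^2) overall for n imported trees.
function importTreesSlow(existingTrees, importedTrees) {
  for (const tree of importedTrees) {
    const newTreeId = getMaximumTreeId(existingTrees) + 1;
    existingTrees[newTreeId] = { ...tree, treeId: newTreeId };
  }
}

// After (hypothetical shape of the fix): compute the maximum once up front
// and increment it locally, so getMaximumTreeId is called only once.
function importTreesFast(existingTrees, importedTrees) {
  let nextTreeId = getMaximumTreeId(existingTrees) + 1;
  for (const tree of importedTrees) {
    existingTrees[nextTreeId] = { ...tree, treeId: nextTreeId };
    nextTreeId++;
  }
}
```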