Optimized `diff()` function which now uses `fastDiff()` function internally for large data sets #274

f1ames · 2019-02-11T14:26:14Z

Suggested merge commit message (convention)

Other: Optimized diff() function which now uses fastDiff() function internally for large data sets. Closes ckeditor/ckeditor5#5013.

Additional information

See ckeditor/ckeditor5#5013.

f1ames · 2019-02-11T14:41:27Z

I was thinking about adding some performance unit tests which checks how long setting data (with editor.setData()) takes. There is no sense in measuring how long diff() function takes, because it doesn't change very often and as you can see from https://github.com/ckeditor/ckeditor5-utils/issues/269#issuecomment-461116792, this issue was caused by some external code changes (not the diff() changes itself).

The problem with such tests is that exec time may vary (same test from 100 to 400 ms between runs in different browsers, or even 400 to 1100 ms) so I'm not sure if it makes sense as such tests are not very stable and the exec time depends on the environment they are run in. WDYT @Reinmar?

Anyway, these tests can be added separately (especially that they should be added to core IMHO) so this PR is ready for review.

Reinmar · 2019-02-11T15:02:16Z

I was thinking about adding some performance unit tests which checks how long setting data (with editor.setData()) takes.

In general, 👍 .

Anyway, these tests can be added separately (especially that they should be added to core IMHO) so this PR is ready for review.

Definitely to the core.

The problem with such tests is that exec time may vary (same test from 100 to 400 ms between runs in different browsers, or even 400 to 1100 ms) so I'm not sure if it makes sense as such tests are not very stable and the exec time depends on the environment they are run in. WDYT @Reinmar?

I assume we changed results from a scenario with e.g. 1000 nodes from 10s to 100ms, am I right? So it's easy to write a threshold value – anything below e.g. 1s will be fine. Such a test should be really stable.

However, tests should be fast. So ideally I'd love a test which always passes in <100ms. Is there such a node count which would generate such results now and before would fail?

Reinmar · 2019-02-11T15:03:59Z

Definitely to the core.

Actually... to the engine. The bug was in the engine.

src/diff.js

f1ames · 2019-02-11T15:47:46Z

I assume we changed results from a scenario with e.g. 1000 nodes from 10s to 100ms, am I right? So it's easy to write a threshold value – anything below e.g. 1s will be fine. Such a test should be really stable.

Yes and no :D Previously render() took around 23 seconds for the sample data, now it takes around 200 - 300ms. However, the other parts of the pipeline still takes visible amount of time

which means end to end tests for larger data sets still takes up to few seconds to be completed. Maybe we could test only some part of the pipeline (e.g. render()?), but the whole picture might get lost then.

Mgsy · 2019-02-12T08:23:18Z

Steps to reproduce

Copy content from the following Gist.
Call editor.setData() with the content from step 1.
Repeat step 2.

Current result

The editor crashes.

Error

fastdiff.js:176 Uncaught TypeError: arr.slice is not a function
    at cutAndReverse (fastdiff.js:176)
    at findChangeBoundaryIndexes (fastdiff.js:131)
    at Function.fastDiff (fastdiff.js:99)
    at diff (diff.js:37)
    at Renderer._diffNodeLists (renderer.js:584)
    at Renderer._updateChildrenMappings (renderer.js:264)
    at Renderer.render (renderer.js:176)
    at View._render (view.js:667)
    at View.on (view.js:182)
    at View.fire (emittermixin.js:204)

GIF

Reinmar · 2019-02-12T08:52:11Z

Yes and no :D Previously render() took around 23 seconds for the sample data, now it takes around 200 - 300ms. However, the other parts of the pipeline still takes visible amount of time

It's understandable and acceptable (at this stage) when loading a bigger blob of data takes that much time. We should be focused on the part which took the majority of that time and that's diff(). Going down from 23s to 200ms is the biggest win we could have – going down another 50ms will be really hard now.

which means end to end tests for larger data sets still takes up to few seconds to be completed.

I don't understand. Can't we use a data set of a size which will give now results <100ms and would take 10s before?

The point of an e2e test is to make sure the problem's never back. That it doesn't appear in some other place. The problem was that for a data set of a reasonable size the setData() call took 20s. It's down to 200ms. Let's make sure it stays there.

Mgsy · 2019-02-12T08:59:44Z

#274 (comment)

Now I can't reproduce it. Interesting, it would be nice if you could take a look at it 🤔

f1ames · 2019-02-12T10:22:21Z

#274 (comment)

@Mgsy I will take a look. Is it possible that you had some cached / not up-to-date version? The 2263658 commit changed how non-array / array like elements are detected and converted into arrays so it could affected the issue you have mentioned.

Mgsy · 2019-02-12T10:25:20Z

@Mgsy I will take a look. Is it possible that you had some cached / not up-to-date version?

I suppose it could be this. Now it works fine for me.

coveralls · 2019-02-12T13:07:12Z

Coverage remained the same at 100.0% when pulling a99c734 on t/269 into 54b8108 on master.

f1ames · 2019-02-12T14:01:47Z

I don't understand. Can't we use a data set of a size which will give now results <100ms and would take 10s before?

The point of an e2e test is to make sure the problem's never back. That it doesn't appear in some other place. The problem was that for a data set of a reasonable size the setData() call took 20s. It's down to 200ms. Let's make sure it stays there.

What I meant earlier is that for data which took around 10s before, render() took ~8s and rest of the pipeline took ~2s. Now for such cases render() time is probably ~100ms but the rest of the pipeline still takes ~2s. So it is not possible to create e2e test which now takes 100ms and before the changes took 10s (because of entire pipeline additional processing time). And for smaller data sets the difference is not that visible (in the overall time), something like 500ms (with fastDiff) vs 700ms (without fastDiff) on Chrome (and basically doubles on Firefox 😞).

Mgsy

Haven't found any issue.

f1ames · 2019-02-13T14:16:33Z

I have added rendering performance tests in ckeditor/ckeditor5-engine#1676.

f1ames added 4 commits February 6, 2019 13:33

The fastdiff() function now also accepts arrays as input.

fa6088f

Now diff() uses fastDiff() internally for longer inputs.

bebc61f

fastDiff() function improvements.

2263658

Docs: Rewording. [skip ci]

b857007

f1ames requested a review from Reinmar February 11, 2019 14:42

Reinmar requested a review from Mgsy February 11, 2019 14:57

Reinmar reviewed Feb 11, 2019

View reviewed changes

src/diff.js Outdated Show resolved Hide resolved

Mgsy added the status:review- label Feb 12, 2019

f1ames added 2 commits February 12, 2019 13:51

Smaller threshold for fastDiff().

81bfe2d

Naming and docs improvements in fastDiff().

484c07b

Mgsy approved these changes Feb 12, 2019

View reviewed changes

f1ames requested a review from Reinmar February 12, 2019 17:00

f1ames removed the status:review- label Feb 12, 2019

f1ames mentioned this pull request Feb 13, 2019

Tests: Rendering performance tests introduced ckeditor/ckeditor5-engine#1676

Merged

Docs.

a99c734

Reinmar force-pushed the t/269 branch from b24e1e0 to a99c734 Compare February 13, 2019 15:21

Reinmar merged commit ee9bed0 into master Feb 13, 2019

Reinmar deleted the t/269 branch February 13, 2019 15:27

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimized `diff()` function which now uses `fastDiff()` function internally for large data sets #274

Optimized `diff()` function which now uses `fastDiff()` function internally for large data sets #274

f1ames commented Feb 11, 2019 •

edited by pomek

Loading

f1ames commented Feb 11, 2019 •

edited

Loading

Reinmar commented Feb 11, 2019

Reinmar commented Feb 11, 2019

f1ames commented Feb 11, 2019

Mgsy commented Feb 12, 2019

Reinmar commented Feb 12, 2019

Mgsy commented Feb 12, 2019

f1ames commented Feb 12, 2019

Mgsy commented Feb 12, 2019

coveralls commented Feb 12, 2019 •

edited

Loading

f1ames commented Feb 12, 2019 •

edited

Loading

Mgsy left a comment

f1ames commented Feb 13, 2019

Optimized diff() function which now uses fastDiff() function internally for large data sets #274

Optimized diff() function which now uses fastDiff() function internally for large data sets #274

Conversation

f1ames commented Feb 11, 2019 • edited by pomek Loading

Suggested merge commit message (convention)

Additional information

f1ames commented Feb 11, 2019 • edited Loading

Reinmar commented Feb 11, 2019

Reinmar commented Feb 11, 2019

f1ames commented Feb 11, 2019

Mgsy commented Feb 12, 2019

Steps to reproduce

Current result

Error

GIF

Reinmar commented Feb 12, 2019

Mgsy commented Feb 12, 2019

f1ames commented Feb 12, 2019

Mgsy commented Feb 12, 2019

coveralls commented Feb 12, 2019 • edited Loading

f1ames commented Feb 12, 2019 • edited Loading

Mgsy left a comment

Choose a reason for hiding this comment

f1ames commented Feb 13, 2019

Optimized `diff()` function which now uses `fastDiff()` function internally for large data sets #274

Optimized `diff()` function which now uses `fastDiff()` function internally for large data sets #274

f1ames commented Feb 11, 2019 •

edited by pomek

Loading

f1ames commented Feb 11, 2019 •

edited

Loading

coveralls commented Feb 12, 2019 •

edited

Loading

f1ames commented Feb 12, 2019 •

edited

Loading