Skip to content
This repository has been archived by the owner on Jun 26, 2020. It is now read-only.

Optimized diff() function which now uses fastDiff() function internally for large data sets #274

Merged
merged 7 commits into from
Feb 13, 2019

Conversation

f1ames
Copy link
Contributor

@f1ames f1ames commented Feb 11, 2019

Suggested merge commit message (convention)

Other: Optimized diff() function which now uses fastDiff() function internally for large data sets. Closes ckeditor/ckeditor5#5013.


Additional information

See ckeditor/ckeditor5#5013.

@f1ames
Copy link
Contributor Author

f1ames commented Feb 11, 2019

I was thinking about adding some performance unit tests which checks how long setting data (with editor.setData()) takes. There is no sense in measuring how long diff() function takes, because it doesn't change very often and as you can see from https://github.com/ckeditor/ckeditor5-utils/issues/269#issuecomment-461116792, this issue was caused by some external code changes (not the diff() changes itself).

The problem with such tests is that exec time may vary (same test from 100 to 400 ms between runs in different browsers, or even 400 to 1100 ms) so I'm not sure if it makes sense as such tests are not very stable and the exec time depends on the environment they are run in. WDYT @Reinmar?

Anyway, these tests can be added separately (especially that they should be added to core IMHO) so this PR is ready for review.

@f1ames f1ames requested a review from Reinmar February 11, 2019 14:42
@Reinmar Reinmar requested a review from Mgsy February 11, 2019 14:57
@Reinmar
Copy link
Member

Reinmar commented Feb 11, 2019

I was thinking about adding some performance unit tests which checks how long setting data (with editor.setData()) takes.

In general, 👍 .

Anyway, these tests can be added separately (especially that they should be added to core IMHO) so this PR is ready for review.

Definitely to the core.

The problem with such tests is that exec time may vary (same test from 100 to 400 ms between runs in different browsers, or even 400 to 1100 ms) so I'm not sure if it makes sense as such tests are not very stable and the exec time depends on the environment they are run in. WDYT @Reinmar?

I assume we changed results from a scenario with e.g. 1000 nodes from 10s to 100ms, am I right? So it's easy to write a threshold value – anything below e.g. 1s will be fine. Such a test should be really stable.

However, tests should be fast. So ideally I'd love a test which always passes in <100ms. Is there such a node count which would generate such results now and before would fail?

@Reinmar
Copy link
Member

Reinmar commented Feb 11, 2019

Definitely to the core.

Actually... to the engine. The bug was in the engine.

src/diff.js Outdated Show resolved Hide resolved
@f1ames
Copy link
Contributor Author

f1ames commented Feb 11, 2019

I assume we changed results from a scenario with e.g. 1000 nodes from 10s to 100ms, am I right? So it's easy to write a threshold value – anything below e.g. 1s will be fine. Such a test should be really stable.

Yes and no :D Previously render() took around 23 seconds for the sample data, now it takes around 200 - 300ms. However, the other parts of the pipeline still takes visible amount of time

image

which means end to end tests for larger data sets still takes up to few seconds to be completed. Maybe we could test only some part of the pipeline (e.g. render()?), but the whole picture might get lost then.

@Mgsy
Copy link
Member

Mgsy commented Feb 12, 2019

Steps to reproduce

  1. Copy content from the following Gist.
  2. Call editor.setData() with the content from step 1.
  3. Repeat step 2.

Current result

The editor crashes.

Error

fastdiff.js:176 Uncaught TypeError: arr.slice is not a function
    at cutAndReverse (fastdiff.js:176)
    at findChangeBoundaryIndexes (fastdiff.js:131)
    at Function.fastDiff (fastdiff.js:99)
    at diff (diff.js:37)
    at Renderer._diffNodeLists (renderer.js:584)
    at Renderer._updateChildrenMappings (renderer.js:264)
    at Renderer.render (renderer.js:176)
    at View._render (view.js:667)
    at View.on (view.js:182)
    at View.fire (emittermixin.js:204)

GIF

bug_cke5

@Reinmar
Copy link
Member

Reinmar commented Feb 12, 2019

Yes and no :D Previously render() took around 23 seconds for the sample data, now it takes around 200 - 300ms. However, the other parts of the pipeline still takes visible amount of time

It's understandable and acceptable (at this stage) when loading a bigger blob of data takes that much time. We should be focused on the part which took the majority of that time and that's diff(). Going down from 23s to 200ms is the biggest win we could have – going down another 50ms will be really hard now.

which means end to end tests for larger data sets still takes up to few seconds to be completed.

I don't understand. Can't we use a data set of a size which will give now results <100ms and would take 10s before?

The point of an e2e test is to make sure the problem's never back. That it doesn't appear in some other place. The problem was that for a data set of a reasonable size the setData() call took 20s. It's down to 200ms. Let's make sure it stays there.

@Mgsy
Copy link
Member

Mgsy commented Feb 12, 2019

#274 (comment)

Now I can't reproduce it. Interesting, it would be nice if you could take a look at it 🤔

@f1ames
Copy link
Contributor Author

f1ames commented Feb 12, 2019

#274 (comment)

@Mgsy I will take a look. Is it possible that you had some cached / not up-to-date version? The 2263658 commit changed how non-array / array like elements are detected and converted into arrays so it could affected the issue you have mentioned.

@Mgsy
Copy link
Member

Mgsy commented Feb 12, 2019

@Mgsy I will take a look. Is it possible that you had some cached / not up-to-date version?

I suppose it could be this. Now it works fine for me.

@coveralls
Copy link

coveralls commented Feb 12, 2019

Coverage Status

Coverage remained the same at 100.0% when pulling a99c734 on t/269 into 54b8108 on master.

@f1ames
Copy link
Contributor Author

f1ames commented Feb 12, 2019

I don't understand. Can't we use a data set of a size which will give now results <100ms and would take 10s before?

The point of an e2e test is to make sure the problem's never back. That it doesn't appear in some other place. The problem was that for a data set of a reasonable size the setData() call took 20s. It's down to 200ms. Let's make sure it stays there.

What I meant earlier is that for data which took around 10s before, render() took ~8s and rest of the pipeline took ~2s. Now for such cases render() time is probably ~100ms but the rest of the pipeline still takes ~2s. So it is not possible to create e2e test which now takes 100ms and before the changes took 10s (because of entire pipeline additional processing time). And for smaller data sets the difference is not that visible (in the overall time), something like 500ms (with fastDiff) vs 700ms (without fastDiff) on Chrome (and basically doubles on Firefox 😞).

Copy link
Member

@Mgsy Mgsy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't found any issue.

@f1ames
Copy link
Contributor Author

f1ames commented Feb 13, 2019

I have added rendering performance tests in ckeditor/ckeditor5-engine#1676.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Optimize diff() for bigger data sets
4 participants