Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce fastDiff() #5000

Closed
f1ames opened this issue Apr 12, 2018 · 6 comments · Fixed by ckeditor/ckeditor5-utils#238
Closed

Introduce fastDiff() #5000

f1ames opened this issue Apr 12, 2018 · 6 comments · Fixed by ckeditor/ckeditor5-utils#238
Assignees
Labels
package:utils type:task This issue reports a chore (non-production change) and other types of "todos".
Milestone

Comments

@f1ames
Copy link
Contributor

f1ames commented Apr 12, 2018

Extracted from https://github.com/ckeditor/ckeditor5-engine/issues/403 (see https://github.com/ckeditor/ckeditor5-engine/issues/403#issuecomment-380467350).

The whole idea behind fastDiff is to introduce a faster diffing function (as our diff has some performance issues in specific cases) which will be able to diff two strings determining at what index was the first and last change and what text was changed.

The main purpose is to use it in the renderer while implementing smart text nodes updates (https://github.com/ckeditor/ckeditor5-engine/issues/403). So based on a fastDiff result we would like to use CharacterData.insertData and CharacterData.deleteData to smartly (partially) replace exisitng text node content when rendering new text.

@Reinmar
Copy link
Member

Reinmar commented Apr 12, 2018

LOL... when I first saw "fastdiff" I thought these are just random letters :D

BTW, fastDiff() would be more correct.

@f1ames f1ames changed the title Introduce fastdiff Introduce fastDiff Apr 12, 2018
@f1ames f1ames changed the title Introduce fastDiff Introduce fastDiff() Apr 12, 2018
@scofalik
Copy link
Contributor

LOL... when I first saw "fastdiff" I thought these are just random letters :D

Same here, I thought WTF.

@f1ames
Copy link
Contributor Author

f1ames commented Apr 12, 2018

Regarding the fastDiff output format, when thinking about first and last change we may go with something like:

// fastDiff( oldText, newText );
fastDiff( '123', 'abc123' ); // { firstChange: 0, lastChange: 3, text: 'abc' }

which means that in a newText, abc is the changed sequence between index 0 - 3 and rest stays untouched (basically lastChange and text are redundant and one can be calculated from the other, but let's leave it for now).

Similar could be for reverse situation:

fastDiff( '123', '123abc' ); // { firstChange: 3, lastChange: 6, text: 'abc' }

for insertion in the middle:

fastDiff( '123', '12abc3' ); // { firstChange: 2, lastChange: 5, text: 'abc' }

Splitting the old text with the new one, will require two sets of changes:

fastDiff( '123', 'ab123c' ); // [ { firstChange: 0, lastChange: 2, text: 'ab' }, { firstChange: 5, lastChange: 6, text: 'c' } ]

of course in this case we could go with:

fastDiff( '123', 'ab123c' ); // { firstChange: 0, lastChange: 6, text: 'ab123c' }

but I think case when entire old text is a substring of a new text should also be handled.


Such format makes some sense for insertions (and replacements in some cases), but it's rather useless for deletions. Maybe something like:

fastDiff( '12345', '123' ); // { firstChange: 3, lastChange: 5, text: '' }

when firstChange and lastChange refers to positions in oldText while in all previous examples it referred for positions/indexes in newText which made it inconsistent.

So to everywhere refer to oldText indexes, insertions should result in something like:

fastDiff( '123', 'abc123' ); // { firstChange: 0, lastChange: 0, text: 'abc' }
fastDiff( '123', '123abc' ); // { firstChange: 3, lastChange: 3, text: 'abc' }
fastDiff( '123', '12abc3' ); // { firstChange: 2, lastChange: 2, text: 'abc' }
fastDiff( '123', 'ab123c' ); // [ { firstChange: 0, lastChange: 0, text: 'ab' }, { firstChange: 3, lastChange: 3, text: 'c' } ]

What about replacements:

fastDiff( '12345', '123ab' ); // { firstChange: 3, lastChange: 5, text: 'ab' }
fastDiff( '12345', 'abc' ); // { firstChange: 0, lastChange: 5, text: 'abc' }

Edge cases:

fastDiff( '123', '' ); // { firstChange: 0, lastChange: 3, text: '' }
fastDiff( '', 'abc' ); // { firstChange: 0, lastChange: 0, text: 'abc' }

It makes some sense, but I'm not sure about such approach. Basically, number of insertions (text.length) and deletions ( lastChange - firstChange ) could be calculated so it makes some sense. Also using it with CharacterData.insertData and CharacterData.deleteData should be quite straightforward.

But maybe it would be better to go with something similar to what diffToChanges produces?
image

Do you have any thoughts on that @Reinmar?

@Reinmar
Copy link
Member

Reinmar commented Apr 12, 2018

But maybe it would be better to go with something similar to what diffToChanges produces?

I think it'd be better to go this way. We'll have two compatible algorithms which you can interchange if needed.

Also, even if fastDiff() will now always return just one entry I'd say that we can return an array. This will be future proof and compatible with diffToChanges(). The only smelly part will be the values property which should also be an array...

Finally, the format returned by diffToChanges() was useful when you have to transform the old text to new text by using remove() and insert() functions. I think that its format works better in this case than firstChange + lastChange.

fastDiff( '123', 'ab123c' ); // { firstChange: 0, lastChange: 6, text: 'ab123c' }

I don't think that this is a big deal. If this is the entire text node change, it means you used a spellchecker's suggestion to make such a replacement. You can see this as one word being replaced with another word. Rather this than two insertions.

So, we can optimise for this scenario, but we don't have to.

@f1ames
Copy link
Contributor Author

f1ames commented Apr 12, 2018

I think it'd be better to go this way. We'll have two compatible algorithms which you can interchange if needed.

Makes perfect sense.

Also, even if fastDiff() will now always return just one entry I'd say that we can return an array.

Yes, I was thinking the same (but forgot about it when writing above comment:D).

The only smelly part will be the values property which should also be an array...

I don't see a problem making it an array so it will be fully compatible with diffToChanges.


The only question is if we would like to have exactly same results in some specific cases. Consider for example:

const old = '123';
conts new = '12ab123c3';

diffToChanges( diff( old, new ), new ); 
// [ {
//  	"index": 2,
//  	"type": "insert",
//  	"values": "ab12"
// }, {
//  	"index": 7,
//  	"type": "insert",
//  	"values": "c3"
// } ]

so here the old text needs two insertions like 12[ab12]3[c3]. And TBH I was thinking more about something like [12ab]123[c3] because then you have the whole old string untouched (ofc you never know which part was changed) and it may be more efficient when working with composition (but we may check this in more details later on). The output would be:

fastDiff( old, new ); 
// [ {
//  	"index": 0,
//  	"type": "insert",
//  	"values": "12ab"
// }, {
//  	"index": 7,
//  	"type": "insert",
//  	"values": "c3"
// } ]

@Reinmar
Copy link
Member

Reinmar commented Apr 12, 2018

We can always tune up this algorithm later, to match eventual problems that we'll have. The simpler it will be initially, the sooner we'll see if we have any problems :) So for me, the following may happen:

const old = '123';
conts new = '12ab123c3';

fastDiff( old, new ); 
// [ {
//  	"index": 2,
//  	"type": "insert",
//  	"values": "ab123c"
// } ]

f1ames referenced this issue in ckeditor/ckeditor5-utils May 9, 2018
Other: Introduced `fastDiff` diffing function. Closes #235.
@mlewand mlewand transferred this issue from ckeditor/ckeditor5-utils Oct 9, 2019
@mlewand mlewand added this to the iteration 18 milestone Oct 9, 2019
@mlewand mlewand added status:confirmed type:task This issue reports a chore (non-production change) and other types of "todos". package:utils labels Oct 9, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
package:utils type:task This issue reports a chore (non-production change) and other types of "todos".
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants