Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modeling RichText with Automerge #193

Open
Gozala opened this issue Jul 31, 2019 · 39 comments
Open

Modeling RichText with Automerge #193

Gozala opened this issue Jul 31, 2019 · 39 comments

Comments

@Gozala
Copy link
Contributor

Gozala commented Jul 31, 2019

I would to figure out / invite to collaborate on a best practice for modeling a state for the rich text editor. Most popular rich text editors on the web (ProseMirror, Quill, Slate) use some for of tree representation. Quill is an exception as it represents document as array of formatted segments, but is still a subject to the same problem - One of the actors needs to split formatted segment of text (either insert something with a different format, or reformat) which ends up translating to deleting slice & inserting it afterwords, problem being that identity of the deleted & reinserted slice has different identity & there for any edits by other actors to the original slice no longer correspond to a new one.

I have discussed this to a greater length on slack but the best solution I could came up to was to represent rich text as sequence of tokens where tokens are either characters or formatting markers.

For example following state corresponds to:

{format:{italic:true}}Hello {format:{bold:true, italic:true}}beautiful{format:null} world\n

Following markdown:

**hello _beautiful_** wold

That is:

hello beautiful wold

This way splitting a formatted segment just means inserting a marker and causality is preserved, however it comes with a pain of mapping from / to text editor data model & making sure that overlapping markers are merged etc.. Furthermore Automerge.Text no longer seems to fit the bill.

I also have considered storing markers (offset, length as counters) separate from text but that seems even more troublesome so I have opted out.

I would very much love something like Automerge.RichText that bakes some some of these opinions internally and exposes higher level API like Automerge.Text but with methods like format(offset, length, format).

P.S.: I'm also working on above mentioned RichText` API, but also feel like discussing it here is probably a good idea.

@pvh
Copy link
Member

pvh commented Jul 31, 2019

Are you writing a new rich-text editor or trying to integrate with an existing one? I think the best approach here is going to be determined by that question.

I'd suggest you consider a couple of other features like cursors (which need to be tracked in the document but won't be permanently persisted), and comments (which might be special markers?)

The automerge-slate repo has an example of the pain in translating another text editor's OT-style into Automerge, and it models the document as a tree.

Having done a lot of markup manipulation in past lives I'm somewhat nervous about inventing new representations without doing a bunch of extra research but maybe I can spare some time to dig in on what you're working on.

@Gozala
Copy link
Contributor Author

Gozala commented Jul 31, 2019

Are you writing a new rich-text editor or trying to integrate with an existing one? I think the best approach here is going to be determined by that question.

Right now I'm integrating with quill but I think same general approach would work across the board & likely would apply even if I rolled out my own editor (which I don't want to do).

I'd suggest you consider a couple of other features like cursors (which need to be tracked in the document but won't be permanently persisted), and comments (which might be special markers?)

I have non rich text with cursors already which is actually fairly easy, they are stored separately because they don't get concurrent updates only actor owning that cursor does.

The automerge-slate repo has an example of the pain in translating another text editor's OT-style into Automerge, and it models the document as a tree.

I've seen it, but I think it suffers from the outlined problem.

Having done a lot of markup manipulation in past lives I'm somewhat nervous about inventing new representations without doing a bunch of extra research but maybe I can spare some time to dig in on what you're working on.

I'm nervous myself, also why I started this issue. It does seem that y-js author came to the same conclusion as myself in regards to marker tokens

That alone obviously does not mean much, but I also have not found anything else other than RGASplit (see #195) which may offer a way to deal with outlined problem.

Mostly I opened to invite others (more qualified than myself) to offer some thoughts, feedback, ideas.

@Gozala
Copy link
Contributor Author

Gozala commented Jul 31, 2019

I'd suggest you consider a couple of other features like cursors (which need to be tracked in the document but won't be permanently persisted), and comments (which might be special markers?)

Actually you're right, I was thinking of cursor as single position (which is easy to update base where inserted occurred before / after) but if you extend that to selections that becomes a lot more tricky.

@pvh
Copy link
Member

pvh commented Jul 31, 2019

Definitely worth considering how moving text will work too.

@Gozala
Copy link
Contributor Author

Gozala commented Jul 31, 2019

Definitely worth considering how moving text will work too.

You mean cut & paste ?

@Gozala
Copy link
Contributor Author

Gozala commented Jul 31, 2019

Side remark: this echos quite a bit constraints I had in https://github.com/gozala/dominion where node moves were preferred over remove / insert. I end up wish something like register where I could stash nodes and then insert them from. Obviously doing this in collaborative setting seems far more challenging.

@Gozala
Copy link
Contributor Author

Gozala commented Jul 31, 2019

I was hoping to get a basic demo working of what I was trying to build before sharing, but I'm also happy to do it much sooner as potentially it may save me from working towards the dead end

@ept
Copy link
Member

ept commented Jul 31, 2019

I have spent a while thinking about this too, and I also think that a single document sequence with marker characters is the way to go. If you represent a document as a tree, there are a lot of operations that require deleting and re-inserting nodes (e.g. hitting enter in the middle of a paragraph, causing it to split into two paragraphs), which don't merge well in a concurrent setting. If a document is just a flat sequence, hitting enter in the middle of a paragraph just means inserting a '\n' element into that sequence.

My intention was that Automerge.Text should be usable for this purpose. That is, I want Automerge.Text to be able to contain marker objects as well as characters from the text, along the lines of:

new Automerge.Text(['h', 'e', 'l', 'l', 'o', ' ', {bold:true}, 'w', 'o', 'r', 'l', 'd', {bold:false}])

At the moment I think that's not working, but that's a bug that should be fixed. And I think it's a good idea to create an Automerge.RichText, which I think could be done as a layer on top of Automerge.Text. Perhaps RichText could create a projection of the sequence of characters and marker characters, and map it into the tree structure used by rich text editors.

Regarding cursor positions, the safest would be to use the internal element ID to refer to a position within the text, since it remains stable, even as people insert and delete text before and after the position. The Text class has a .getElemId(index) method that looks up the element ID for a particular index; the translation in the opposite direction doesn't seem to be exposed in the API at the moment, but that would be easy to add.

@Gozala
Copy link
Contributor Author

Gozala commented Jul 31, 2019

Regarding cursor positions, the safest would be to use the internal element ID to refer to a position within the text, since it remains stable, even as people insert and delete text before and after the position. The Text class has a .getElemId(index) method that looks up the element ID for a particular index; the translation in the opposite direction doesn't seem to be exposed in the API at the moment, but that would be easy to add.

What happens if the element cursor was at got removed ? Would index somehow be restored via tombstone ?

Would you do the same for selections, meaning element IDs for begin / end indexes ?

This makes me also wonder what if actor receives cursor / selection for anchored to elements that are not (yet) present in the document.

@Gozala
Copy link
Contributor Author

Gozala commented Jul 31, 2019

My intention was that Automerge.Text should be usable for this purpose. That is, I want Automerge.Text to be able to contain marker objects as well as characters from the text, along the lines of:

new Automerge.Text(['h', 'e', 'l', 'l', 'o', ' ', {bold:true}, 'w', 'o', 'r', 'l', 'd', {bold:false}])

For what it's worth I chose to avoid open / close style markers that introduce complication due to nesting

{bold:true}hello {italic:true}wold{italic:false}{bold:false}

with a markers that imply flat structure:

{bold:true}hello {bold:true, italic:true}wold{}

I'd be interested to hear arguments in favor of the nested approach assuming there are some.

@ept
Copy link
Member

ept commented Jul 31, 2019

What happens if the element cursor was at got removed ? Would index somehow be restored via tombstone ?

Yes, currently tombstones are retained indefinitely anyway, so the element ID resolution will work even after the element has been deleted. And yes, representing ranges as a start-ID and end-ID should work fine.

@ept
Copy link
Member

ept commented Jul 31, 2019

For what it's worth I chose to avoid open / close style markers that introduce complication due to nesting

I agree with that: in a concurrent editing setting it will probably not be possible to ensure proper nesting, so whatever thing maps the markers into a DOM-like tree structure will have to handle arbitrarily ordered markers. It seems better to think of the markers as state transitions ("stuff from here onwards is bold! stuff from here onwards is non-bold!") rather than opening and closing tags.

@steida
Copy link

steida commented Jul 31, 2019

Maybe I am overlooking something, but "If a document is just a flat sequence, hitting enter in the middle of a paragraph just means inserting a '\n' element into that sequence." means we can not express tree structures. I can not imagine WYSIWYG without at least an anchor which has to be split as well.

@Gozala
Copy link
Contributor Author

Gozala commented Jul 31, 2019

Maybe I am overlooking something, but "If a document is just a flat sequence, hitting enter in the middle of a paragraph just means inserting a '\n' element into that sequence." means we can not express tree structures. I can not imagine WYSIWYG without at least an anchor which has to be split as well.

Depends on the editor, for example Quill treats every line of text as paragraph so there model would really be just inserting \n.

However it is true that in some cases you may need something more more complex that is insert \n and copy of of the formatting attributes. However most rich text editors would handle block splitting logic for you, so you'll just need to map it back to RichText model as discussed above.

@Gozala
Copy link
Contributor Author

Gozala commented Aug 1, 2019

I've published my current draft of the discussed RichText exploration here
https://github.com/Gozala/conspire/

At the moment I'm using array / list instead of Text because of the issues discussed in #194, but hopefully I'll be able to switch back.

Concurrent selections are broken right now, but shouldn't be too hard to fix once #198 is fixed.

Interesting / relevant bits are RichText class that just wraps Automerge.Text (well array currently). And a QuillRichText subclass that adds patch(delta) method which takes Quill Delta and applies changes to the RichText instance and toContent() method that encodes RichText state into Quill.Delta format (which is also how quill represents editor state) so it can be updated accordingly.

@j-f1
Copy link

j-f1 commented Aug 2, 2019

Here’s one way to do a link

before link { link: { to: "https://github.com", target: "_blank", etc } }link here{ link: "end" } after link

@Gozala
Copy link
Contributor Author

Gozala commented Aug 2, 2019

Here’s one way to do a link

before link { link: { to: "https://github.com", target: "_blank", etc } }link here{ link: "end" } after link

That is more or less what my current implementation encodes to, following to be precise:

before link {attributes:{link:"https://github.com"}, length:0}link here{attributes:{}} after link

P.S.: {attributes, length:0} is used for markers, but there are also notion of embeds e.g. {embed:{img:"https://..."}, attributes: {...}, length: 1 }

@pvh
Copy link
Member

pvh commented Oct 21, 2019

Okay, I've been working on this problem a bit and I think the current design is not going to work, but that we might have a way forward.

The fundamental problem is that an inline-control-characters design needs to apply formatting to spans either in some order (left-to-right?) or tries to consolidate all local formatting operations based on local knowledge... but can't predict what other users are doing.

This means that we either lose causal ordering in the first case, or fidelity in the second. Fortunately, we have a system already present in automerge that can manage keeping things ordered during merges: automerge.array!

My new design theory is that we should keep an ordered array of formatting spans. Each span will refer to a start and end character and will be applied in-order on the user's machine when the document is materialized. This is potentially slow, but should at least provide correct results and we can worry about optimization next.

@pvh
Copy link
Member

pvh commented Oct 21, 2019

Important: an Ordered Array of Formatting Spans will henceforth be known as the OAFS model.

@Gozala
Copy link
Contributor Author

Gozala commented Oct 21, 2019

My new design theory is that we should keep an ordered array of formatting spans. Each span will refer to a start and end character and will be applied in-order on the user's machine when the document is materialized

Is that separate array from the text characters? And I presume it refers to start & end chars by an id isn’t it ?

What happens when span range needs to change either grow or shrink ? I don’t know what you have in mind but I’m suspecting that If removing a span is ever necessary that is going to lead to some errors.

@pvh
Copy link
Member

pvh commented Oct 21, 2019

The structure will look like this, if you imagine a bold operation on "cada", and unbold of "raca" below:

['abracadabra']: Automerge.Text
[ 
  [<character_id for 'c' character>, <character_id for 'b' character>, { bold: true },
  [<character_id for 'r' character>, <character_id for 'd' character>, { bold: false },
]

@Gozala
Copy link
Contributor Author

Gozala commented Oct 21, 2019

@pvh I think primary problem with just computed formatting dictionaries (in front position only & no span length) is that could lead to multiple such dictionaries in the same position or is there something else ?

If that is only problem is there no way to handle that differently like collapse such dictionaries ? At the camp I remember we were talking about an idea of having a special operation like insertion of empty dict that would only insert one if it’s not already there.

Doing it in user space I’d imagine that by having separate dict where id of element prior to control char is used as a key and dict and as value so that all formatting attrs are changing same dict and (presumably) avoid multiple formatters next to each other.

Alternative user space solution I’ve considered was to use leftmost formatting dicts as the canonical and let actor that created second formatting dict do the merge + removal which I believe would converge

@pvh
Copy link
Member

pvh commented Oct 21, 2019

Yes, you need to collapse them. You can see a non-causal implementation here: https://github.com/inkandswitch/pushpin/blob/quill-rich-text/src/renderer/TextDelta.ts

The issue is that you can't (in an open system) know who or what is doing what formatting out there. I think this approach is likely a significant improvement in fidelity of intent preservation.

@Gozala
Copy link
Contributor Author

Gozala commented Nov 4, 2019

@pvh pointed major flaw with my approach on the call other day, that worth posting for anyone reading this, including future myself who already forgot this. So issue with collapsing all formatting into dictionaries is following:

Alice

v-a1

hello world!

Alice makes change to v-a2

hello {bold:true}world{}!

hello world!

Bob

Bob starts from v-a1

hello world!

And makes change v-b2

{italic:true}hello world!{}

hello world!

Merge

Once changes a2 & b2 reach both peers they converge to

{italic:true}hello {bold:true}world{}!{}

hello world!

Instead of intended

{italic:true}hello {bold:true, italic:true}world{italic:true}!{}

hello world!

That is because when each participant injected new formatting rules neither was aware of formatting that had to be copied over. So all the formatting needs to be preserved in data structure and collapsed on the fly, although I suspect we'd need to have collapsed cache for responsive local edits and more expensive re-computations could be performed when receiving changes from others, which might be deferred to be performed when local edits idle.

@Gozala
Copy link
Contributor Author

Gozala commented Nov 4, 2019

I would also like to encourage someone better qualified than myself to take a look at xi-editor text representation which seems to employ interesting approach that is promising fast operation and efficient memory representation while keeping append-only history.

@Gozala
Copy link
Contributor Author

Gozala commented Nov 14, 2019

@pvh I was reading on https://braid.news/ other day which took me to sync9 performance overview that GC semantics via acknowledgement messages. I have not looked into details, but I'm guessing it's similar to idea I have being entertaining in my head for quite some time - let an actor propose compaction block with pointers to the heads of all participants. If all the other actors agree / acknowledge history prior to compaction could be deleted.

However I believe there is contingency with approach you were proposing - storing formatting information as a separate structure that refers to ranges based on character id's with in the text.
Would not that imply that history can never be removed ? That being I'm not sure if current design of automerge already precludes that constraint & maybe that is irrelevant.

All this got me wondering if instead of anchoring fragments to text identifiers they were anchored to the offsets instead which could be adjusting on inbound operations. This starts to sound similar to what Xi Text engine does from my recollection, but I'd need to read that over to confirm.

Just wanted to throw this over here.

@pvh
Copy link
Member

pvh commented Nov 14, 2019

Acknowledgement messages are interesting, but acknowledgement only works if you know who is out there. In other words, you have to know the set of other users in the universe to safely compact which is tricky. If you're willing to simply move forward without knowing who else is out there then you don't need acknowledgements. You just post a compaction message and anyone else who wants to keep collaborating with you will have to "rebase" work they want to keep on top of that or be dropped.

I'm not too sure about the cost of tombstones in the rich text formatting. Presumably if you were compacting you could rectify those markers as well, but once you start doing offset math you're kind of back in OT land which is... fine? but reintroduces a lot of the complexity we've been trying to avoid I think.

Maybe @ept will have some input here.

@ept
Copy link
Member

ept commented Mar 23, 2020

Update: @pvh's initial implementation of the OAFS model is now in test/text_test.js as of #238. We can discuss later whether to make it available as part of Automerge, or whether it should be a separate layer on top of it.

@jclem
Copy link

jclem commented Jun 29, 2020

@Gozala I am curious if you think cases like automerge/automerge#193 (comment) can be avoided by using markers to represent picking up and dropping attributes, rather than representing the state of all attributes:

Alice

v-a1

hello world!

Alice makes change to v-a2

hello {pickup:['bold']}world{drop:['bold']}!

hello world!

Bob

Bob starts from v-a1

hello world!

And makes change v-b2

{pickup:['italic']}hello world!{drop:['italic']}

hello world!

Merge

Once changes a2 & b2 reach both peers they converge to

{pickup:['italic']}hello {pickup:['bold']}world{drop:['bold']}!{drop:['italic']}

hello world!

Potential improvements would be to allow picking up the adoption or removal of attributes, and then to drop a reference to the pickup, e.g. {pickup: {bold: false}, id: 1} followed by {drop: 1}. Essentially you're building a little stack/LWW register per-attribute based on the orderable IDs of pick-ups.

For example, I think @pvh's "abracadabra" edits could be represented by (assuming the formatting operations can have unique, orderable IDs (using integers here just to express the idea), so a site with both format operations knows not to apply 1 since 2 is a younger operation for the bold format—it would, however, remain in a stack so that it's picked up once 2 is dropped):

ab{pickup: {bold: false}, id: 2}ra{pickup: {bold: true}, id: 1}ca{drop: 2}da{drop: 1}bra

These obviously don't directly map to anything like a valid DOM tree, but presumably the view layer could handle the translation to the final state:

<p>abraca<b>da</b>bra</p>

@j-berman
Copy link

j-berman commented Feb 6, 2021

In case this helps anyone: I had a ton of trouble using markers, so I just decided to stick the attribute object on each letter (definitely space inefficient, but it works!*). The sample code provided in test/text_test.js wasn't working properly after certain combinations of formatting (like you bold 2 letters, then unbold 1, then bold the same 2 again.. odd combinations like that would mess up the text). I re-worked that code a bit and got it working sticking the attribute object on each letter here. I'm sure there's a clean, more efficient, and safe algorithm that uses markers, but for anyone seeking a quick and dirty solution, here it is :)

Edit*: I had a small one line bug in my original implementation too that would sometimes mess up copy pasting large sections. Should be fixed in the code linked, but please be warned it's not rigorously tested code!

@pvh
Copy link
Member

pvh commented Feb 6, 2021

Awesome, @j-berman! That code was never merged because I never solved those problems and then got pulled onto other tasks. This is a great contribution, even if it is inefficient. We can always make correct things faster but getting the wrong answer quickly is not much use at all.

@j-berman
Copy link

j-berman commented Feb 6, 2021

Glad I could give back something of value. Automerge is so awesome, I'm very grateful!

@strogonoff
Copy link

strogonoff commented Feb 8, 2021

identity of the deleted & reinserted slice has different identity & there for any edits by other actors to the original slice no longer correspond to a new one

I have a vague feeling that with ProseMirror it might be possible to overcome the limitations mentioned in the description by hooking into its low-level document transformation APIs.

Not involved in ProseMirror's development myself, but have been using it and had to poke around its innards for integrations I'm working on. It seems to be among the most flexible and mature editor frameworks right now and teaching it Automerge could be a major boost to both ecosystems.

@inventivejon
Copy link

Hi,
I am currently integrating Quill.js inside a Blazor Application with planned collaboration features based on WebRTC. After reading this thread I am wondering if the quick implementation from @j-berman is still the best way to go. But it is close to 4 months old now... Is there new development going on? Can I give a hand somewhere?
BR

@ept
Copy link
Member

ept commented May 24, 2021

Hi @inventivejon, yes, there is new development going on: next month, @geoffreylitt and @sliminality will be kicking off a research project to figure out the best way of handling rich text in CRDTs such as Automerge, with the support of the @inkandswitch research lab. I will let them reach out if they want contributions to this project.

@inventivejon
Copy link

inventivejon commented May 24, 2021

Hi @ept,
that is great news. Thank you for the update. Just answer here or ping me whenever we can help in some way.
BR

@burdiyan
Copy link

burdiyan commented Jul 5, 2021

Hi @inventivejon, yes, there is new development going on: next month, @geoffreylitt and @sliminality will be kicking off a research project to figure out the best way of handling rich text in CRDTs such as Automerge, with the support of the @inkandswitch research lab. I will let them reach out if they want contributions to this project.

Hi @ept! Do you have news about this research project? Did it end up starting?

@ept
Copy link
Member

ept commented Jul 5, 2021

@burdiyan Yes, the project is currently running. We will update this issue when there are results to share.

@ept
Copy link
Member

ept commented Nov 24, 2021

Update: we have now published our research on rich text CRDTs: https://www.inkandswitch.com/peritext/

The Peritext implementation is currently independent from Automerge (to allow simpler code and faster iteration), but we have designed it with a view towards integrating it into Automerge in the future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

10 participants