Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Editor crashes after inserting a specific HTML embed #8323

Closed
Mgsy opened this issue Oct 22, 2020 · 11 comments · Fixed by #8529
Closed

Editor crashes after inserting a specific HTML embed #8323

Mgsy opened this issue Oct 22, 2020 · 11 comments · Fixed by #8529
Assignees
Labels
package:html-embed squad:core Issue to be handled by the Core team. type:bug This issue reports a buggy (incorrect) behavior.

Comments

@Mgsy
Copy link
Member

Mgsy commented Oct 22, 2020

📝 Provide detailed reproduction steps (if any)

  1. Go to http://localhost:8080/build/docs/ckeditor5/latest/features/html-embed.html.
  2. Call editor.getData() and copy the string.
  3. Paste it to a new HTML snippet.
  4. Switch htmlEmbed.showPreviews to false.

❌ Actual result

The editor crashes permanently.

Error from Chrome:

snippet.js:4 Uncaught (in promise) Error: Failed to execute 'setAttribute' on 'Element': 'picture-in-picture\"' is not a valid attribute name.
    at b.viewToDom (snippet.js:4)
    at b.viewChildrenToDom (snippet.js:4)
    at viewChildrenToDom.next (<anonymous>)
    at b.viewToDom (snippet.js:4)
    at b.viewChildrenToDom (snippet.js:4)
    at viewChildrenToDom.next (<anonymous>)
    at b.viewToDom (snippet.js:4)
    at o.toData (snippet.js:4)
    at model (snippet.js:4)
    at snippet.js:4

Error from Safari:

Unhandled Promise Rejection: CKEditorError: The string contains invalid characters.
Read more: https://ckeditor.com/docs/ckeditor5/latest/framework/guides/support/error-codes.html#error-The string contains invalid characters.

📃 Other details

I was able to reproduce it only in docs, the manual test works fine for me.


If you'd like to see this fixed sooner, add a 👍 reaction to this post.

@Mgsy Mgsy added type:bug This issue reports a buggy (incorrect) behavior. package:html-embed labels Oct 22, 2020
@Mgsy
Copy link
Member Author

Mgsy commented Oct 22, 2020

Example HTML:

<h2>CKEditor 5 on socials</h2><div class=\"raw-html-embed\"><blockquote class=\"twitter-tweet\"><p dir=\"ltr\" lang=\"en\">Keeping CKEditor 5 integrations tidy and free of bugs 🐛 The new release for the official <a href=\"https://twitter.com/hashtag/Angular?src=hash&amp;ref_src=twsrc%5Etfw\">#Angular</a> WYSIWYG editor component works with Angular 9.1 and above. <a href=\"https://t.co/Umag7F9dq0\">https://t.co/Umag7F9dq0</a> <a href=\"https://t.co/cIPy5fFSDn\">pic.twitter.com/cIPy5fFSDn</a></p>— CKEditor Ecosystem (@ckeditor) <a href=\"https://twitter.com/ckeditor/status/1318575258153160707?ref_src=twsrc%5Etfw\">October 20, 2020</a></blockquote><script charset=\"utf-8\" src=\"https://platform.twitter.com/widgets.js\" async=\"\"></script></div><div class=\"raw-html-embed\"><iframe allowfullscreen=\"\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture\" frameborder=\"0\" src=\"https://www.youtube-nocookie.com/embed/DgM5DfAPQTI\" height=\"315\" width=\"560\"></iframe></div><p>Psst, we're tracking ya!</p><div class=\"raw-html-embed\"><script>(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){ (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o), m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m) })(window,document,'script','https://www.google-analytics.com/analytics.js','ga'); ga('create', 'UA-XXXXX-Y', 'auto'); ga('send', 'pageview'); </script></div>

@pomek
Copy link
Member

pomek commented Oct 23, 2020

It fails when we try to get raw HTML from the view element.

// Note: The below line has a side-effect – the children are *moved* to the DF so
// viewElement becomes empty. It's fine here.
const fragment = upcastWriter.createDocumentFragment( viewElement.getChildren() );
const innerHtml = htmlProcessor.toData( fragment );

The HTML used to test is invalid. Our tools don't parse it correctly:

image

Everything works fine with clean structure:

<h2>CKEditor 5 on socials</h2>
<div class="raw-html-embed">
	<blockquote class="twitter-tweet"><p dir="ltr" lang="en">Keeping CKEditor 5 integrations tidy and free of bugs 🐛 The new release
		for the official <a href="https://twitter.com/hashtag/Angular?src=hash&amp;ref_src=twsrc%5Etfw">#Angular</a> WYSIWYG editor
		component works with Angular 9.1 and above. <a href="https://t.co/Umag7F9dq0">https://t.co/Umag7F9dq0</a> <a
				href="https://t.co/cIPy5fFSDn">pic.twitter.com/cIPy5fFSDn</a></p>— CKEditor Ecosystem (@ckeditor) <a
			href="https://twitter.com/ckeditor/status/1318575258153160707?ref_src=twsrc%5Etfw">October 20, 2020</a></blockquote>
	<script charset="utf-8" src="https://platform.twitter.com/widgets.js" async=""></script>
</div>
<div class="raw-html-embed">
	<iframe allowfullscreen="" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" frameborder="0" src="https://www.youtube-nocookie.com/embed/DgM5DfAPQTI" height="315" width="560"></iframe></div><p>Psst,
	we're tracking ya!</p>
<div class="raw-html-embed">
	<script>( function ( i, s, o, g, r, a, m ) {
		i[ 'GoogleAnalyticsObject' ] = r;
		i[ r ] = i[ r ] || function () {
			( i[ r ].q = i[ r ].q || [] ).push( arguments )
		}, i[ r ].l = 1 * new Date();
		a = s.createElement( o ), m = s.getElementsByTagName( o )[ 0 ];
		a.async = 1;
		a.src = g;
		m.parentNode.insertBefore( a, m )
	} )( window, document, 'script', 'https://www.google-analytics.com/analytics.js', 'ga' );
	ga( 'create', 'UA-XXXXX-Y', 'auto' );
	ga( 'send', 'pageview' ); </script>
</div>

I'm wondering whether we should try to render invalid structures. Couldn't we display a container with an error message if something went wrong during parsing the input HTML?

@Reinmar
Copy link
Member

Reinmar commented Oct 23, 2020

I'm wondering whether we should try to render invalid structures. Couldn't we display a container with an error message if something went wrong during parsing the input HTML?

Yeah, our parser and HTML->View->HTML processing is a bit limited, so it's actually a good idea.

Also, which is a bigger problem, I've just realized that ideally, we should take the DOM structure directly from HtmlDataProcessor and avoid processing it to a view structure as it's going to:

  • Lose comments (relatively simple to fix).
  • Reformat some white-spaces.
  • Reorder some attributes (not sure about this).
  • Fix invalid HTML and strip out some stuff.

I think it could be doable to have an instance of the DOM converter in upcast converters. It'd require some changes in DataController#parse() and UpcastDispatcher#convert().

This might also help with tricky cases like what MathType guys do. Moreover, with some more work, it could lead to a nice optimization – we could configure the DomConverter to stop processing certain structures (e.g. inside Raw HTML embed and inside MathML).

cc @jodator

@Reinmar Reinmar added squad:dx squad:core Issue to be handled by the Core team. labels Oct 23, 2020
@Reinmar Reinmar added this to the nice-to-have milestone Oct 23, 2020
@Reinmar Reinmar removed the squad:core Issue to be handled by the Core team. label Oct 26, 2020
@Reinmar Reinmar modified the milestones: nice-to-have, iteration 38 Oct 26, 2020
@Reinmar Reinmar added the squad:core Issue to be handled by the Core team. label Nov 23, 2020
@niegowski niegowski self-assigned this Nov 23, 2020
@niegowski
Copy link
Contributor

I think it could be doable to have an instance of the DOM converter in upcast converters. It'd require some changes in DataController#parse() and UpcastDispatcher#convert().

This might also help with tricky cases like what MathType guys do. Moreover, with some more work, it could lead to a nice optimization – we could configure the DomConverter to stop processing certain structures (e.g. inside Raw HTML embed and inside MathML).

The HtmlDataProcessor is already using DomConverter to convert a DOM tree to the view DocumentFragment.

In branch https://github.com/ckeditor/ckeditor5/compare/i/8323 I prepared a WIP of the approach, where we are able to register some elements that should be treated by the DomConverter (while converting from DOM to view) as their content would be a CDATA and convert it to a single view Text node. This solves the original issue and also allows us to keep code comments and original HTML structure (at least DomConverter is not affecting it, it still can get some changes while creating a view Text node from data generated by domNode.innerHTML).

@jodator
Copy link
Contributor

jodator commented Nov 24, 2020

The idea behind it is nice IMO. It is easy to extend AFAICS. As the POC looks OK, I'd expose this API on editor.data.processor probably (as in your comment).

For the solution:

  1. I would change _cdataElementMatcher name to something more verbose and a bit separated from the actual <![CDATA[. My worry here is that is a bit too cryptic.
  2. How does Matcher's definition look for comments?
  3. A case for the MathML example would be nice later on.

@Reinmar
Copy link
Member

Reinmar commented Nov 24, 2020

I like the idea very much 👍

  1. I would change _cdataElementMatcher name to something more verbose and a bit separated from the actual <![CDATA[. My worry here is that is a bit too cryptic.

I'm also unsure whether using the cdata concept will convey the message to developers.

  • Pros:
    • I think this is a very special feature that's unlikely to be found without assistance or looking into existing features anyway, so it does not matter.
  • Cons:
    • CDATA is related to XML so it might look weird on different data processors. OTOH, will any non-XML/HTML data processor implement it?

My other worry is – where can we expose this API? If on the data processor, then only on XML/HTML ones? Or on the MD data processor too? How will a feature know whether it can use this API? By duck typing? Or will this be an abstract method of the base data processor class? Or should we move this API elsewhere?

My next worry is – what about performance? If every node that's being converted from DOM to View is being checked with our Matcher it may have a significant impact on performance, even if this matcher has no rules. Could you check it? Can we secure us from this? Or maybe we should not use Matcher but a simple callback check?

3. A case for the MathML example would be nice later on.

👍 cc @xaviripo

@niegowski
Copy link
Contributor

My other worry is – where can we expose this API? If on the data processor, then only on XML/HTML ones? Or on the MD data processor too? How will a feature know whether it can use this API? By duck typing? Or will this be an abstract method of the base data processor class? Or should we move this API elsewhere?

I think that every processor should implement it. For example code blocks in markdown could be processed in this way (this could reduce code in some other places).

My next worry is – what about performance? If every node that's being converted from DOM to View is being checked with our Matcher it may have a significant impact on performance, even if this matcher has no rules. Could you check it? Can we secure us from this? Or maybe we should not use Matcher but a simple callback check?

I'll check that, but by looking at the Matcher code I believe that it shouldn't affect performance (at least not to much).

@xaviripo
Copy link

  1. A case for the MathML example would be nice later on.

What do you need from our side?

@niegowski
Copy link
Contributor

My next worry is – what about performance? If every node that's being converted from DOM to View is being checked with our Matcher it may have a significant impact on performance, even if this matcher has no rules. Could you check it? Can we secure us from this? Or maybe we should not use Matcher but a simple callback check?

I checked the performance in /tests/manual/performance/setdata.html and I don't see any change in the performance of DomConverter#domToView().

@Reinmar
Copy link
Member

Reinmar commented Nov 26, 2020

  1. A case for the MathML example would be nice later on.

What do you need from our side?

Nothing :) Just wanted to let you know that we'll work on such a PoC. Perhaps you'll be able to update your implementation accordingly.

@xaviripo
Copy link

Oh, my bad.

Yeah, this looks promising. It's been a while since I've taken a look at our plugin's code but IIRC we had some possible improvements that depend on this issue.

If Wiris can help somehow, just let me know.

jodator added a commit that referenced this issue Nov 28, 2020
Feature (engine): Introduce the `DataProcessor#registerRawContentMatcher()` API that marks content sections that contains arbitrary character data and should not be parsed during conversion. See #8323.

Fix (html-embed): Editor will not crash after inserting a broken HTML. Closes #8323.

Fix (engine): `DomConverter` will not trim whitespaces in nodes that are siblings to inline raw content elements (e.g. MathML). Closes #5870.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
package:html-embed squad:core Issue to be handled by the Core team. type:bug This issue reports a buggy (incorrect) behavior.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants