Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Images with descriptions output HTML as well as image #44897

Closed
jordesign opened this issue Aug 13, 2020 · 19 comments
Closed

Images with descriptions output HTML as well as image #44897

jordesign opened this issue Aug 13, 2020 · 19 comments

Comments

@jordesign
Copy link
Contributor

jordesign commented Aug 13, 2020

Steps to reproduce

  1. Create a new blog post
  2. Add a featured image - making sure the image has a 'Description' in the metadata
  3. Publish post.
  4. Create a new page and insert a "Latest Post' block that includes the post you just created and shows featured images
  5. Set an alignment on the 'Latest Posts' block (left/right/center/wide width/full width).
  6. Publish the page

What I expected

I expect the most recent post to show the feature image and then post title

What happened instead

Underneath the feature image some HTML code of the image src is output
This appears to not occur on self-hosted sites but on both SImple and AT sites on WPCOM

Browser / OS version

Mac OSX/Chrome

Screenshot / Video

Screen Shot 2020-08-13 at 1 24 52 pm

Screen Shot 2020-08-13 at 1 24 59 pm

Context / Source

This was uncovered originally in #3220085-zen and then further tested by @JoshuaGoode and myself.

@lsl
Copy link
Contributor

lsl commented Aug 13, 2020

Somewhat debugged in D48055-code looks like it might be a wider issue, you can likely replicate this in more contexts using html e.g. <strong>Test</strong> in the image description.

@jordesign
Copy link
Contributor Author

Have tested this in a .org/JurassicNinja site and could not recreate - even with HTML in the image description.

@lsl
Copy link
Contributor

lsl commented Aug 14, 2020

It seems limited to the wpcom context, I was not able to repro the issue on .org with a default jetpack setup activated. (I was potentially missing an activating a JP module or something?)

I suspect this is related to the issues seen in jetpack here:

@lsl
Copy link
Contributor

lsl commented Aug 14, 2020

@jeherve would it make sense to move this issue into the Jetpack repo or is it ok to track this here too?

@jeherve
Copy link
Member

jeherve commented Aug 14, 2020

I was not able to repro the issue on .org with a default jetpack setup activated. (I was potentially missing an activating a JP module or something?)

If you still have your Jetpack site setup, could you try to activate the Carousel module under Jetpack > Settings?

image

If the issue appears with the module on, I think it's safe to close this issue in favor of Automattic/jetpack#13428

@jordesign
Copy link
Contributor Author

If you still have your Jetpack site setup, could you try to activate the Carousel module under Jetpack > Settings?

I've just tried this on my .org test - and can't reproduce even with the Carousel module active

@jeherve
Copy link
Member

jeherve commented Aug 14, 2020

Hm, that's an odd one. On WordPress.com Simple and Atomic sites we also run the Gutenberg plugin; maybe the Latest Posts Block was recently modified in Gutenberg and the bug was introduced then? What happens when you activate the Gutenberg plugin on your Jetpack site?

@jordesign
Copy link
Contributor Author

jordesign commented Aug 16, 2020

I've been able to recreate slightly different symptoms on
https://specific-guanaco.jurassic.ninja/testing-latest-posts-block/

This is with Jetpack Modules enabled and Gutenberg plugin activated. I've confirmed this for not take place when Gutenberg plugin is disabled - so I guess it is something there?

Screen Shot 2020-08-17 at 9 50 06 am

No code output - but they still seem broken - no links on titles when image has a description.

Screen Shot 2020-08-17 at 9 52 03 am

@jordesign
Copy link
Contributor Author

Further testing - the issue is not seen when Gutenberg plugin is active and Carousel mode is turned off. So a conflict between the two?

@jeherve
Copy link
Member

jeherve commented Aug 17, 2020

Yes, it would seem like it, and that's most likely something that would be fixed by Automattic/jetpack#13428

lsl added a commit to lsl/gutenberg that referenced this issue Aug 17, 2020
@lsl
Copy link
Contributor

lsl commented Aug 17, 2020

So turns out that note about requiring an alignment (even if its a center alignment) is quite important.

👋 @ockham, @jeherve

The jetpack carousel module augments the latest-posts block featured images with some extra attributes. I'm assuming this is to apply some jetpack styling/features.

This augmentation includes adding an image description which will by way of wpautop always include some html.

The html is being escaped here and injected here

When an alignment is added to this latest posts block, the alignment block supports code in Gutenberg is triggered leading to an mb_convert_encoding to make use of DomDocument.

This mb_convert_encoding call and subsequent reversal seems to be what causes the issue we see here.

The issue seems to be that the fix for WordPress/gutenberg/issues/24445 is decoding html entities it probably shouldn't be in its attempt to rollback the encoding it had to do to use DomDocument cleanly.

I can't really think of a good fix other than doing a dodgy temp replacement of anything that's already encoded but I'm not really sure how safe that is. Then again, it does work so there's that. lsl/gutenberg@126b1a3

Should I move this to a Wordpress/Gutenberg issue (if one doesn't exist?) - Or do you think we should fix this on ourside? (remove image descriptions from Jetpack's carousel data attributes?)

@jeherve
Copy link
Member

jeherve commented Aug 17, 2020

The jetpack carousel module augments the latest-posts block featured images with some extra attributes. I'm assuming this is to apply some jetpack styling/features.

Ideally, the Carousel module should not add extra attributes here, I think. This may be the best approach here.

@ockham
Copy link
Contributor

ockham commented Aug 17, 2020

Flagging for @sirreal @fullofcaffeine @WunderBart since this is related to our Gutenberg encoding fix 😕

@ockham
Copy link
Contributor

ockham commented Aug 17, 2020

Thanks for flagging @lsl!

I can't really think of a good fix other than doing a dodgy temp replacement of anything that's already encoded but I'm not really sure how safe that is. Then again, it does work so there's that. lsl/gutenberg@126b1a3

Dodgy doesn't sound great TBH 😬 As a rule of thumb, the further upstream we go, the cleaner our fixes should be. So: Gutenberg should handle things in a clean fashion; okay to have a workaround in Jetpack, if need be.

On a more practical level -- did you check whether Gutenberg unit tests still pass with your fix applied?

Should I move this to a Wordpress/Gutenberg issue (if one doesn't exist?) - Or do you think we should fix this on ourside? (remove image descriptions from Jetpack's carousel data attributes?)

Leaning towards Jetpack.

There's a bit of a chance that WordPress/gutenberg#24447 also affects other downstream projects that inject attributes into the rendered HTML, but I'm not sure we can construct a char replacement function that still fixes WordPress/gutenberg#24445 but doesn't break attributes injected by 3rd parties (such as Jetpack). If we can, all the better, but my intuition says no, so this might just be a pill that downstream projects have to swallow.

lsl added a commit to lsl/gutenberg that referenced this issue Aug 18, 2020
Sets the charset of the html passed into DomDocument to utf-8.

Replaces the mb_convert_encoding call replacing utf-8 with html entities
before handing off to DomDocument.

This avoids the need to later convert back to utf-8 from html entities
afterward. This secondary mb_convert_encoding call was converting not
only the utf-8 we converted earlier but also entity encoding html stored
inside data-* or other attributes of html elements.

Fixes Automattic/wp-calypso#44897
Maintains the fix for WordPress#24445 (WordPress#24447)
@lsl
Copy link
Contributor

lsl commented Aug 18, 2020

On a more practical level -- did you check whether Gutenberg unit tests still pass with your fix applied?

The expected tests do, a separate css test was failing (so yes, it was a dodgy fix after all 😬!)

Leaning towards Jetpack.

I found a better way to fix this (based on), all tests pass and I've confirmed the desired behavior for WordPress/gutenberg#24445 is maintained.

I would appreciate any ideas around what to cover when writing a couple of tests for this new fix.

Also is dropping LIBXML_HTML_NOIMPLIED a problem? I think we can keep this in if its required.

@ockham
Copy link
Contributor

ockham commented Aug 18, 2020

@lsl Great, thanks! I left a comment on the commit. Let's discuss there (or better yet, on a PR in the Gutenberg repo 🙂 )

ockham added a commit to WordPress/gutenberg that referenced this issue Aug 19, 2020
This change specifies the content type and charset of the html passed into `DomDocument` as `utf-8`.

Replaces the `mb_convert_encoding` call which encodes `UTF-8` as `HTML-ENTITIES` before handing off to `DomDocument`.

This change avoids the need to later revert the encoding back to `UTF-8` afterwards using `mb_convert_encoding`. This secondary `mb_convert_encoding` call was converting not only the `UTF-8` characters that were converted earlier but also any pre-existing entity encoded html stored inside block content.

This issue was originally raised here: Automattic/wp-calypso#44897 as I wasn't sure of the root cause at the time, originally thinking it may be because of the way [Jetpack is injecting](https://github.com/Automattic/jetpack/blob/dcfa5ca8bdfc31aacec107aec27bb24357d6cdac/modules/carousel/jetpack-carousel.php#L434) html into the [`data-image-description` attributes](https://github.com/Automattic/jetpack/blob/dcfa5ca8bdfc31aacec107aec27bb24357d6cdac/modules/carousel/jetpack-carousel.php#L485). 

There are more situations where this can be a problem such as encoded html entities existing inside block content then being decoded breaking html validation.

Co-authored-by: Bernie Reiter <[email protected]>
jorgefilipecosta pushed a commit to WordPress/gutenberg that referenced this issue Aug 19, 2020
This change specifies the content type and charset of the html passed into `DomDocument` as `utf-8`.

Replaces the `mb_convert_encoding` call which encodes `UTF-8` as `HTML-ENTITIES` before handing off to `DomDocument`.

This change avoids the need to later revert the encoding back to `UTF-8` afterwards using `mb_convert_encoding`. This secondary `mb_convert_encoding` call was converting not only the `UTF-8` characters that were converted earlier but also any pre-existing entity encoded html stored inside block content.

This issue was originally raised here: Automattic/wp-calypso#44897 as I wasn't sure of the root cause at the time, originally thinking it may be because of the way [Jetpack is injecting](https://github.com/Automattic/jetpack/blob/dcfa5ca8bdfc31aacec107aec27bb24357d6cdac/modules/carousel/jetpack-carousel.php#L434) html into the [`data-image-description` attributes](https://github.com/Automattic/jetpack/blob/dcfa5ca8bdfc31aacec107aec27bb24357d6cdac/modules/carousel/jetpack-carousel.php#L485). 

There are more situations where this can be a problem such as encoded html entities existing inside block content then being decoded breaking html validation.

Co-authored-by: Bernie Reiter <[email protected]>
@lsl
Copy link
Contributor

lsl commented Aug 20, 2020

This will be fixed in the next Gutenberg rollout of v8.8

@lsl lsl closed this as completed Aug 20, 2020
@sirreal
Copy link
Member

sirreal commented Aug 24, 2020

I was expecting to fix this for Simple and Atomic sites running 8.7.1, but I've been unable to reproduce the issue.

Is this issue still present on Simple and Atomic (not running 8.8)?

@lsl
Copy link
Contributor

lsl commented Aug 25, 2020

Seems to be https://testing705633960.wordpress.com/home/ 8.7.1 simple site business plan.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants