Zegnat (Member) commented May 9, 2020

As can clearly be seen in the html property of content, there are exactly 8 spaces between the \n and the start of the <img>. For some reason, the plain-text output was showing 9 spaces, so I removed one.

jgarber623 (Member) left a comment

I think the output may actually be correct as-is (that is, before this proposed change).

See this note in section 1.3.4 of the parsing specification:

value: the textContent of the element after:

  • dropping any nested <script> & <style> elements;
  • replacing any nested <img> elements with their alt attribute, if present; otherwise their src attribute, if present, adding a space at the beginning and end, resolving the URL if it’s relative;
  • removing all leading/trailing whitespace

The second item in the list replaces an <img> element with a text value and whitespace at the beginning and end.

The third rule removes the trailing whitespace, but the space inserted before the alt/src string is not at the start of the result, so it stays.

So… in this case, there are 8 whitespace characters in the original markup and, by my read of the spec, a ninth is inserted before the alt/src string.
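To make the arithmetic concrete, here is a minimal Python sketch of the quoted value rule, following the reading above (one space added on each side of the alt/src text). The class MF2TextValue, the helper mf2_text_value and the sample markup are hypothetical, not php-mf2's or any other parser's API. The HTML walk uses the standard library html.parser, and the leading/trailing whitespace trim is approximated with str.strip().

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class MF2TextValue(HTMLParser):
    """Hypothetical sketch of the plain-text `value` rule quoted above."""

    def __init__(self, base_url=""):
        super().__init__(convert_charrefs=True)
        self.base_url = base_url
        self.parts = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1  # rule 1: drop nested script/style
        elif tag == "img" and self.skip_depth == 0:
            self._replace_img(dict(attrs))

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)  # text nodes, whitespace kept verbatim

    def _replace_img(self, attrs):
        # Rule 2: alt if present, otherwise src resolved against the base URL.
        if attrs.get("alt") is not None:
            text = attrs["alt"]
        elif attrs.get("src") is not None:
            text = urljoin(self.base_url, attrs["src"])
        else:
            return
        # Assumption (the reading in this thread): pad with one space each side.
        self.parts.append(" " + text + " ")

    def value(self):
        # Rule 3: trim only the start and end of the whole string.
        return "".join(self.parts).strip()


def mf2_text_value(html, base_url=""):
    parser = MF2TextValue(base_url)
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return parser.value()


# Made-up fragment shaped like the case described: "\n", 8 spaces, then <img>.
html = '<p>Hello</p>\n        <img src="/photo.jpg" alt="A photo">\n    '
value = mf2_text_value(html, "https://example.com/")
second_line = value.split("\n")[1]
print(len(second_line) - len(second_line.lstrip(" ")))  # → 9 (8 from markup + 1 inserted)
```

The trailing space added after "A photo" is removed by the final trim, while the one added before it sits in the middle of the string and survives, which is where the ninth space comes from.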

Zegnat (Member, Author) commented May 9, 2020

Closing: this was actually a bug in the pre-2018-03-29 code of the PHP parser (re-introduced for running the tests in microformats/php-mf2@c2bbd9f). It is hopefully patched correctly in the testing branch as of microformats/php-mf2@d5a8f85.

Thanks for the sanity check @jgarber623 👍

Zegnat closed this May 9, 2020
Zegnat deleted the Zegnat-patch-1 branch May 9, 2020 17:40