Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure valid HTML is properly processed by refining regex handling #5697

Closed

Conversation

co6x0
Copy link

@co6x0 co6x0 commented Nov 22, 2023

Ensures valid HTML is worked correctly by wptexturize(), wp_html_split(), etc.
I started working on this PR when I noticed that using TailwindCSS children selector would break the layout of block theme (also reported in Trac ticket: 57381).

I have identified a problem with the regular expression defined in _get_wptexturize_split_regex() used in wptexturize().
This problem seemed to be affecting get_the_block_template_html() and causing the block theme layout collapse described above.
Changing this regex fixes the layout issue.

Also, wp_html_split() uses almost the same regex.
Other trac tickets caused by this function will also be fixed by updating to a similar regex.

According to the HTML reference at html.spec.whatwg.org, attribute values can contain a variety of characters.
With this in mind, I have modified the regex to exclude matching characters within quotation marks.
This fixes the misplacement of GREATER-THAN SIGN(>) and prevents other valid HTML structures from being mishandled.

I've included tests to cover these changes in tests/phpunit/tests/formatting/wpTexturize.php and tests/phpunit/tests/formatting/wpHtmlSplit.php. If there's anything I've missed, please let me know.

Trac ticket: https://core.trac.wordpress.org/ticket/57381
Trac ticket: https://core.trac.wordpress.org/ticket/45387
Trac ticket: https://core.trac.wordpress.org/ticket/43457


This Pull Request is for code review only. Please keep all other discussion in the Trac ticket. Do not merge this Pull Request. See GitHub Pull Requests for Code Review in the Core Handbook for more details.

Copy link

Hi @co6x0! 👋

Thank you for your contribution to WordPress! 💖

It looks like this is your first pull request to wordpress-develop. Here are a few things to be aware of that may help you out!

No one monitors this repository for new pull requests. Pull requests must be attached to a Trac ticket to be considered for inclusion in WordPress Core. To attach a pull request to a Trac ticket, please include the ticket's full URL in your pull request description.

Pull requests are never merged on GitHub. The WordPress codebase continues to be managed through the SVN repository that this GitHub repository mirrors. Please feel free to open pull requests to work on any contribution you are making.

More information about how GitHub pull requests can be used to contribute to WordPress can be found in this blog post.

Please include automated tests. Including tests in your pull request is one way to help your patch be considered faster. To learn about WordPress' test suites, visit the Automated Testing page in the handbook.

If you have not had a chance, please review the Contribute with Code page in the WordPress Core Handbook.

The Developer Hub also documents the various coding standards that are followed:

Thank you,
The WordPress Project

@co6x0
Copy link
Author

co6x0 commented Nov 22, 2023

Added commit.
Removed tranformation of & to & in HTML attribute values modified by https://core.trac.wordpress.org/ticket/35008.

This ticket seems to have been created because the W3C HTML Validator found it to be invalid HTML, but as of now, the & in the URL is valid.

@co6x0
Copy link
Author

co6x0 commented Nov 22, 2023

@dmsnell
Copy link
Member

dmsnell commented May 27, 2024

howdy! just wanted to stop by and mention that I've been exploring updating these same functions using the HTML API, which provides a full spec-compliant parse of the HTML stream.

You can find some rough notes on the broader roadmap

Copy link

The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the props-bot label.

Core Committers: Use this line as a base for the props when committing in SVN:

Props co6x0, dmsnell.

To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook.

@co6x0
Copy link
Author

co6x0 commented May 28, 2024

@dmsnell
Thank you for letting me know.
Handling this with regular expressions has been challenging, so it would be wonderful if we could address it using the HTML API.

Please let me know if there's anything I can help with!
I will close this PR now.

@co6x0 co6x0 closed this May 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants