Dennis' list of broad and interesting things. #62437
Labels: [Feature] HTML API, [Type] Enhancement
Overall values and goals.
Performance guidelines:
If it's not measured, it's neither faster nor slower.
Modern CPUs are incredible machines. Take advantage of every abstraction leak. PHP does not run the way it looks like it should.
Defer where possible.
step() or next_thing() functions, which communicate where they find their match and how long the match is. These functions can appear inside a loop to do a full parse, but they can also be used to find the first occurrence of something in a document, or to analyze a document with low overhead.
array(). This carries the added benefit that it's possible to add semantics and avoid pushing internal details out to all of the call sites for a given thing. For example, WP_HTML_Decoder::attribute_starts_with() is much more efficient than str_starts_with( WP_HTML_Decoder::decode( ) ) because it stops parsing as soon as it finds the given prefix or determines that the prefix cannot be there. This can save processing and allocating megabytes of data when applied to data URLs, which are the src of images pasted from other applications.
Static structures are much faster than array(), and they provide inline documentation too!
Block Parser
Replace the default everything-at-once block parser with a lazy, low-overhead parser.
next_delimiter() as a low-level utility. [#6760]
The current block parser has served WordPress well, but it must parse the entire document into an in-memory block tree all at once, and it's not particularly efficient. In one damaged post that was 3 MB in size, fully parsing the document took 14 GB of memory. This should not happen.
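A lazy parser in the spirit of the step() / next_thing() guideline only needs to report where the next block delimiter is and how long it is. The following is a hypothetical sketch, not the next_delimiter() API from #6760: find_next_block_delimiter() and Block_Delimiter are illustrative names. It never builds a tree, leaves the JSON attributes undecoded until a caller asks for them, and assumes "-->" never appears inside a delimiter's attributes. It requires PHP 8.0 or later.

```php
<?php
/** Hypothetical result type: where a delimiter starts, how long it is, and what it is. */
final class Block_Delimiter {
	public function __construct(
		public int $at,       // Byte offset of "<!--".
		public int $length,   // Byte length up to and including "-->".
		public string $type,  // 'opener', 'closer', or 'void'.
		public string $name   // Fully-qualified block name, e.g. "core/paragraph".
	) {}
}

/**
 * Finds the next block comment delimiter at or after $offset, or null if none.
 * Reads only as far as that delimiter; JSON attributes are not decoded.
 */
function find_next_block_delimiter( string $html, int $offset = 0 ): ?Block_Delimiter {
	$pattern = '~^\s+(/)?wp:([a-z][a-z0-9_-]*(?:/[a-z][a-z0-9_-]*)?)\s+(?:\{.*\}\s+)?(/)?$~s';

	while ( false !== ( $at = strpos( $html, '<!--', $offset ) ) ) {
		$end = strpos( $html, '-->', $at + 4 );
		if ( false === $end ) {
			return null; // Unterminated comment: nothing more to find.
		}

		$length = $end + 3 - $at;
		$body   = substr( $html, $at + 4, $end - $at - 4 );

		if ( 1 === preg_match( $pattern, $body, $m, PREG_UNMATCHED_AS_NULL ) ) {
			$type = null !== $m[1] ? 'closer' : ( null !== $m[3] ? 'void' : 'opener' );
			// Block names without a namespace belong to "core/".
			$name = str_contains( $m[2], '/' ) ? $m[2] : "core/{$m[2]}";
			return new Block_Delimiter( $at, $length, $type, $name );
		}

		$offset = $at + $length; // An ordinary HTML comment: keep scanning after it.
	}

	return null;
}

// Usage: walk a post one delimiter at a time without building a block tree.
$post   = '<!-- wp:paragraph --><p>Hi</p><!-- /wp:paragraph --><!-- wp:image {"id":5} /-->';
$found  = array();
$offset = 0;
while ( null !== ( $d = find_next_block_delimiter( $post, $offset ) ) ) {
	$found[] = "{$d->type} {$d->name} at {$d->at}";
	$offset  = $d->at + $d->length;
}
echo implode( "\n", $found ), "\n";
// opener core/paragraph at 0
// closer core/paragraph at 30
// void core/image at 52
```

Because each call returns a small value object instead of a nested array, memory use stays proportional to one delimiter rather than to the whole document.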
Core needs to be able to view blocks in isolation and only store in memory as much as it needs to properly render and process blocks. The need for less block structure has been highlighted by projects and needs such as:
Block API
block.json file. [#6388]
Block Hooks
HTML API
Overall Roadmap for the HTML API
There is no end in sight to the development of the HTML API, but development work largely falls into two categories: developing the API itself, and rewriting Core to take advantage of what the HTML API offers.
Further developing the HTML API.
New features and functionality.
Introduce safe-by-default HTML templating. [#5949]
Properly parse and normalize URLs. [#6666]
Introduce Bits, for server-replacement of dynamic tokens. [Make, Discussion]
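What "safe by default" means for templating can be shown in a few lines: every interpolated value is escaped, so forgetting to escape is not possible. This is a hypothetical sketch and not the API proposed in #5949; render_html_template() and its {{name}} placeholder syntax are illustrative assumptions. It escapes for HTML text and quoted attribute values only, and it does not validate URLs.

```php
<?php
/**
 * Hypothetical safe-by-default template renderer (illustration only).
 *
 * Replaces {{name}} placeholders with escaped values from $args. The escaping
 * is correct for HTML text and quoted attribute values, but not for
 * <script>/<style> contents or unquoted attributes, and URLs are not checked.
 */
function render_html_template( string $template, array $args ): string {
	return preg_replace_callback(
		'/\{\{\s*([a-zA-Z_][a-zA-Z0-9_]*)\s*\}\}/',
		static function ( array $m ) use ( $args ): string {
			if ( ! array_key_exists( $m[1], $args ) ) {
				// Fail loudly rather than silently rendering an empty value.
				throw new InvalidArgumentException( "Missing template value: {$m[1]}" );
			}
			return htmlspecialchars( (string) $args[ $m[1] ], ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML5, 'UTF-8' );
		},
		$template
	);
}

echo render_html_template(
	'<a href="{{url}}" title="{{title}}">{{label}}</a>',
	array(
		'url'   => 'https://example.com/?a=1&b=2',
		'title' => 'Say "hi"',
		'label' => '<script>alert(1)</script>',
	)
), "\n";
// <a href="https://example.com/?a=1&amp;b=2" title="Say &quot;hi&quot;">&lt;script&gt;alert(1)&lt;/script&gt;</a>
```

A javascript: URL would pass through this sketch untouched, which is one reason URL parsing and normalization sits on the same list.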
Encoding and Decoding of Text Spans
There is so much in Core that would benefit from clarifying all of these boundaries, or of creating a clear point of demarcation between encoded and decoded content.
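The boundary matters even for a question as simple as "does this attribute start with a prefix?". In encoded HTML, &#x6A;avascript: means javascript:, so testing the raw text gives the wrong answer, while decoding a multi-megabyte data URL only to inspect its first few characters wastes work. The sketch below decodes lazily, one character at a time, and stops at the first mismatch, which is the idea behind WP_HTML_Decoder::attribute_starts_with(). It is hypothetical and deliberately simplified: encoded_attribute_starts_with() is an illustrative name, the prefix must be ASCII, numeric character references must end in ";", and only five named references are recognized.

```php
<?php
/**
 * Hypothetical, simplified lazy prefix check on an *encoded* attribute value.
 *
 * Decodes only as many characters as needed to compare against $prefix.
 * Simplifications: $prefix must be ASCII; numeric references need a trailing
 * ';'; only the named references listed in $named are decoded.
 */
function encoded_attribute_starts_with( string $encoded, string $prefix, bool $ascii_case_insensitive = false ): bool {
	static $named = array( 'amp' => '&', 'lt' => '<', 'gt' => '>', 'quot' => '"', 'apos' => "'" );

	$at     = 0; // Byte offset into $encoded.
	$length = strlen( $encoded );

	for ( $i = 0, $n = strlen( $prefix ); $i < $n; $i++ ) {
		if ( $at >= $length ) {
			return false; // Input ended before the whole prefix matched.
		}

		$char = $encoded[ $at ];
		$step = 1;

		if ( '&' === $char && 1 === preg_match( '/\G&(?:#[xX]([0-9a-fA-F]+)|#([0-9]+)|([a-zA-Z][a-zA-Z0-9]*));/', $encoded, $m, PREG_UNMATCHED_AS_NULL, $at ) ) {
			if ( null !== $m[3] ) {
				if ( isset( $named[ $m[3] ] ) ) {
					$char = $named[ $m[3] ];
					$step = strlen( $m[0] );
				}
				// Unknown names stay a literal '&' in this sketch.
			} else {
				$code_point = null !== $m[1] ? hexdec( $m[1] ) : (int) $m[2];
				// Non-ASCII code points (and NUL, which HTML maps to U+FFFD)
				// can never equal a byte of an ASCII prefix.
				$char = ( $code_point > 0 && $code_point < 0x80 ) ? chr( $code_point ) : null;
				$step = strlen( $m[0] );
			}
		}

		if ( null === $char ) {
			return false;
		}

		$expected = $prefix[ $i ];
		if ( $ascii_case_insensitive ) {
			$char     = strtolower( $char );
			$expected = strtolower( $expected );
		}
		if ( $char !== $expected ) {
			return false; // First mismatch: the rest of the input is never read.
		}

		$at += $step;
	}

	return true;
}

$raw = '&#x6A;avascript:alert(1)';
var_dump( str_starts_with( $raw, 'javascript:' ) );               // bool(false): misses the encoded "j"
var_dump( encoded_attribute_starts_with( $raw, 'javascript:' ) ); // bool(true)

$data_url = 'data:image/png;base64,' . str_repeat( 'A', 1000000 );
var_dump( encoded_attribute_starts_with( $data_url, 'data:' ) );  // bool(true) after reading 5 bytes
```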
attribute_starts_with(), which is akin to str_starts_with() but only for attributes.
Decoding GET and POST args.
There is almost no consistency in how code decodes the values from $_GET and $_POST, and there is, and can be, incredible confusion over some basic transformations that occur:
Prior art
The HTML API can help here in coordination with other changes in core. Notably:
FORM elements add the accept-charset="utf-8" attribute, which overrides a user-preferred charset for a webpage (meaning that this is still necessary even if the <meta charset=utf-8> tag is present).
With these new specifications, the HTML API can ensure that whatever is decoded from $_GET and $_POST is what a browser or other HTTP client intended to communicate. In addition, it can provide helpers not present in existing WordPress idioms, such as default values.
Rewriting Core to take advantage of the HTML API.
Big Picture Changes
Create a final pass over the fully-rendered HTML for global filtering and processing. [#5662]
Mandate HTML5 and UTF-8 output everywhere. [#6536]
<meta charset="…"> that besides UTF-8. All escaping and encoding should occur as needed for HTML5. XML parsing, encoding, and decoding must take a completely different path. [See the section on the XML API.]
Create a new fundamental Search infrastructure for WordPress.
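The first big-picture item, a final pass over the fully-rendered HTML, can be approximated in plain PHP with output buffering. This is a hypothetical sketch and not necessarily how #5662 works: start_final_html_pass() is an illustrative name, and a real pass would process the page with the HTML API rather than with string replacement. The handler holds back every chunk until the buffer's final flush, so the processing step sees the whole page at once.

```php
<?php
/**
 * Hypothetical "final pass": buffer all output, then run $process once over
 * the complete HTML when the buffer is finally flushed.
 */
function start_final_html_pass( callable $process ): void {
	$html = '';
	ob_start(
		static function ( string $chunk, int $phase ) use ( $process, &$html ): string {
			$html .= $chunk;
			if ( 0 === ( $phase & PHP_OUTPUT_HANDLER_FINAL ) ) {
				return ''; // Intermediate flush: hold everything back for now.
			}
			return $process( $html );
		}
	);
}

// Usage: capture the processed page in an outer buffer so it can be inspected.
ob_start();
start_final_html_pass(
	// Illustration only: string replacement on HTML is what the HTML API should replace.
	fn( string $html ): string => str_replace( 'http://example.com', 'https://example.com', $html )
);
echo '<img src="http://example.com/a.png">';
echo '<a href="http://example.com/">Home</a>';
ob_end_flush(); // Final flush: the handler runs the pass and emits the result.
$page = ob_get_clean();
echo $page, "\n";
```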
Confusion of encoded and decoded text.
There's a dual nature to encoded text in HTML. WordPress itself frequently conflates the encoded domain and the decoded domain.
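The conflation is easiest to see by running one pattern over both domains. The sketch below copies the default pattern returned by WordPress's wp_spaces_regexp() as a literal so that it runs without WordPress; normalize_spaces() is a hypothetical helper. The same replacement is correct on encoded HTML, where &nbsp; is a character reference for a non-breaking space, and destructive on decoded text, where &nbsp; is six literal characters someone typed.

```php
<?php
// Default pattern from WordPress's wp_spaces_regexp(), copied as a literal.
// Single quotes keep \r, \n, \t, and \xC2\xA0 as PCRE escapes.
const SPACES_REGEXP = '[\r\n\t ]|\xC2\xA0|&nbsp;';

/** Hypothetical helper: replace every "space" the pattern finds with ' '. */
function normalize_spaces( string $s ): string {
	return preg_replace( '/(?:' . SPACES_REGEXP . ')/', ' ', $s );
}

// Encoded HTML: "&amp;nbsp;" is text about the entity, not a space.
$encoded = 'Type &amp;nbsp; to write a non-breaking space.';
// Decoded text: the same content, now containing the literal characters "&nbsp;".
$decoded = html_entity_decode( $encoded, ENT_QUOTES | ENT_HTML5, 'UTF-8' );

echo normalize_spaces( 'a&nbsp;b' ), "\n"; // "a b": correct in the encoded domain.
echo normalize_spaces( $encoded ), "\n";   // Unchanged: "&amp;nbsp;" does not match.
echo normalize_spaces( $decoded ), "\n";   // "Type   to write ...": the author's text is gone.
```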
Consider, for example, wp_spaces_regexp(), which by default returns the following pattern:
[\r\n\t ]|\xC2\xA0|&nbsp;
There are multiple things about this pattern that reflect the legacy of conflation:
So if the text is encoded we may find either, but if the text is decoded then this pattern will erroneously match on &nbsp;, which presumably started as &amp;nbsp; and might have been someone trying to write about the non-breaking space.
Parsing and performance.
In addition to confused and corrupted content, Core also stands to make significant performance improvements by adopting the values of the HTML API and the streaming parser interfaces. Some functions are themselves extremely susceptible to catastrophic backtracking or memory bloat.
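As a generic illustration of catastrophic backtracking (not one of the Core patterns listed below), nested quantifiers let PCRE try exponentially many ways to split the input before it can report a failure. PHP typically gives up at pcre.backtrack_limit, and preg_match() then returns false, which callers easily mistake for "no match". A single linear scan answers the same question without that risk. Both function names are hypothetical.

```php
<?php
/**
 * Hypothetical check: is $s a non-empty run of "a"? Uses a regex with nested
 * quantifiers, which backtracks exponentially on near-miss inputs.
 * Returns null when PCRE aborts (for example at pcre.backtrack_limit).
 */
function is_run_of_a_regex( string $s ): ?bool {
	$result = preg_match( '/^(a+)+$/', $s );
	return false === $result ? null : 1 === $result;
}

/** The same check as one linear pass over the bytes, with no backtracking. */
function is_run_of_a_scan( string $s ): bool {
	return '' !== $s && strspn( $s, 'a' ) === strlen( $s );
}

$near_miss = str_repeat( 'a', 25 ) . 'b';

var_dump( is_run_of_a_scan( $near_miss ) );  // bool(false)
var_dump( is_run_of_a_regex( $near_miss ) ); // Typically NULL with default limits: PCRE gave up.
echo preg_last_error_msg(), "\n";
```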
convert_smilies(). [#6762]
force_balance_tags(). [#5562]
normalize() method for constructing fully-normative HTML. But even this may not be necessary given the fact that the HTML Processor can properly navigate through a document structurally.
wp_html_split(). [#6651]
wp_kses_hair() and friends. [#6572]
wp_replace_in_html_tags(). [#6651]
wp_strip_tags().
wp_strip_all_tags(). [#6196]
wp_targeted_link_rel(). [#5590]
wp_kses_hair(), and passes around PCRE results.
Database
mysql_real_escape_string()? This calls a C function inside of MySQL that examines the currently set character sets for the connection/session/table/database. If we could reliably escape content from PHP, then we could eliminate a database round-trip per placeholder in prepared statements.
Sync Protocol
WordPress needs the ability to reliably synchronize data with other WordPresses and internal services. This depends on having two things:
While this works to synchronize resources between WordPresses, it also serves interesting purposes within a single WordPress, for any number of processes that rely on invalidating data or caches:
XML API
Overall Roadmap for the XML API
While less prominent than the HTML API, WordPress also needs to reliably read, modify, and write XML. XML handling appears in a number of places: