Updating from quick-xml v0.31.0 to v0.32.0, I was surprised to see my project's build suddenly take longer. While quick-xml v0.31.0 takes less than three seconds to compile with optimizations, v0.32.0 takes between 10 and 30 seconds depending on how the stars align. This turns it from "beneath notice" into "the one thing that holds up everything" -- especially because I use Cargo profile overrides to enable optimizations for quick-xml even for debug builds, so tests that have to deal with a lot of XML don't take forever. I did some digging and concluded that this is due to a single function (resolve_html5_entity) whose enormous control flow graph causes several LLVM passes to take several seconds each.
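For context, the profile override I mean looks roughly like this (a minimal sketch; the `opt-level` value is illustrative, not what my project necessarily uses):

```toml
# In Cargo.toml: compile quick-xml with optimizations even in debug builds,
# so XML-heavy tests stay fast. Everything else still builds unoptimized.
[profile.dev.package.quick-xml]
opt-level = 3
```

With this in place, quick-xml sits on the critical path of every debug build, which is why its compile time matters so much to me.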
This is not a new problem: the function has existed in the same form for a long time. I've only noticed it now because it used to be feature-gated until 10d1ff8 and I had never enabled the escape-html feature. But v0.31.0 with that feature enabled compiles just as slowly as v0.32.0 does by default. So the easy workaround (modulo semver concerns) would make that function feature-gated again. Another workaround would be slapping #[inline] on it so that it's not lowered to LLVM IR if downstream crates never call it, but that's silly for other reasons. I assume y'all had reasons to make this function available unconditionally, but as long as it has such a hefty build time impact, I'd appreciate a way to side-step it.
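To make the `#[inline]` workaround concrete, here is a minimal sketch (function name from the crate, but the body and arm count are illustrative, not quick-xml's actual code):

```rust
// Sketch of the #[inline] workaround: an #[inline] function's body is
// kept as MIR for cross-crate inlining and is only lowered to LLVM IR
// in crates that actually call it, so an unused copy costs no codegen
// time downstream. The match body here is a stand-in for the real one.
#[inline]
pub fn resolve_html5_entity(entity: &str) -> Option<&'static str> {
    match entity {
        "amp" => Some("&"), // the real match has on the order of 2000 arms
        "lt" => Some("<"),
        "gt" => Some(">"),
        _ => None,
    }
}

fn main() {
    assert_eq!(resolve_html5_entity("amp"), Some("&"));
    assert_eq!(resolve_html5_entity("zzz"), None);
    println!("ok");
}
```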
Of course, it would be best to address the root of the problem (the huge match statement), though that may be more involved. The standard solution in such cases is to convert it into a data structure of some sort, so the lookup doesn't require O(n) code for n key-value pairs. If the function is hot, a well-tuned perfect hash table will probably also improve performance. From skimming the code that gets generated right now, it seems to do a jump table on entity.len() followed by a lot of byte-by-byte comparisons/jump tables. While that probably works pretty well if you see the same couple of entities repeatedly, I wouldn't expect it to be competitive in more complex workloads.
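As a sketch of the data-structure approach (names and table contents are illustrative, not quick-xml's code), the match could become a lookup into a sorted static table, so each entry costs data rather than control flow; a perfect hash (e.g. via the phf crate) would be the tuned version of the same idea:

```rust
// A tiny excerpt of the HTML5 entity table, kept sorted by name so we
// can use binary search. The real table has roughly 2000 entries, but
// the code driving the lookup stays O(1) in table size.
static ENTITIES: &[(&str, &str)] = &[
    ("amp", "&"),
    ("gt", ">"),
    ("lt", "<"),
    ("quot", "\""),
];

fn resolve_entity(name: &str) -> Option<&'static str> {
    ENTITIES
        .binary_search_by_key(&name, |&(n, _)| n)
        .ok()
        .map(|i| ENTITIES[i].1)
}

fn main() {
    assert_eq!(resolve_entity("lt"), Some("<"));
    assert_eq!(resolve_entity("nope"), None);
    println!("ok");
}
```

This keeps the generated code small regardless of table size, which should fix the LLVM pass blow-up even before considering runtime performance.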
Hm. This is an interesting consequence of making a long function public; I'd never thought about that. It was made public just for the convenience of potential users, so we can hide it behind a feature again.