syntax: Optimize some literal parsing #53521
(rust_highfive has picked a reviewer for you, use r? to override)
cc @fitzgen (you're probably interested in this)
src/libsyntax/parse/mod.rs
Outdated
@@ -430,7 +430,7 @@ crate fn lit_token(lit: token::Lit, suf: Option<Symbol>, diag: Option<(Span, &Ha
         // There are some valid suffixes for integer and float literals,
         // so all the handling is done internally.
         token::Integer(s) => (false, integer_lit(&s.as_str(), suf, diag)),
-        token::Float(s) => (false, float_lit(&s.as_str(), suf, diag)),
+        token::Float(s) => (false, float_lit(s, suf, diag)),
Is this change necessary? It would be nice to keep `integer_lit` and `float_lit`'s arguments similar.
`integer_lit` could be changed to also take a `Symbol` instead of a `&str`.
src/libsyntax/parse/mod.rs
Outdated
                 -> Option<ast::LitKind> {
    debug!("float_lit: {:?}, {:?}", s, suffix);
    let sym = if s.as_str().contains('_') {
        Symbol::intern(&s.as_str().chars().filter(|&c| c != '_').collect::<String>())
This means that float literals lacking a '_' will not be interned. This is a significant change in behaviour, and one that seems unintentional?
Also, the `float_lit` and `integer_lit` changes in this patch are orthogonal to the `byte_str_lit` changes, and are not mentioned in the commit message. Put them in a separate commit?
> This means that float literals lacking a '_' will not be interned.

No, with the patch, the literal is not re-interned. We can keep the original `Symbol` value if it stays unchanged.
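The "keep the original value if it stays unchanged" idea can be sketched outside the compiler with `std::borrow::Cow`. This is only an illustration: `strip_underscores` is a hypothetical helper, and a plain `&str` stands in for the compiler's interned `Symbol`, which is not available here.

```rust
use std::borrow::Cow;

/// Hypothetical helper (not the compiler's code): remove `_` separators
/// from a numeric literal, allocating a new string only when at least
/// one underscore is actually present.
fn strip_underscores(s: &str) -> Cow<'_, str> {
    if s.contains('_') {
        // Slow path: build a new owned string without the underscores.
        Cow::Owned(s.chars().filter(|&c| c != '_').collect())
    } else {
        // Fast path: hand back the original text, no allocation.
        Cow::Borrowed(s)
    }
}
```

With this shape, a literal such as `1.5e10` comes back borrowed and untouched, while `1_000.0` takes the allocating path and yields `1000.0`.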
Heh, I did the "don't allocate when stripping out underscores" thing for \u{...} chars in |
I agree that the other changes should be mentioned in the commit message or moved to their own commit. Looks great, otherwise! r=me with the comments addressed.
Currently in the `wasm-bindgen` project we have a very very large crate that's procedurally generated, `web-sys`. To generate this crate we parse all of a browser's WebIDL and we then generate bindings for all of the APIs contained within.

The resulting Rust file is 18MB large (wow!) and currently takes a very long time to compile in debug mode. On the nightly compiler a *debug* build takes 90s for the crate to finish. I was curious what was taking so long and upon investigating a *massive* portion of the time was spent in the `lit_token` method of the compiler, primarily formatting strings via `format!`.

Upon some more investigation it looks like the `byte_str_lit` was allocating an error message once per byte, causing a very large number of allocations to happen for large literals, of which wasm-bindgen generates quite a few (some are MB large).

This commit fixes the issue by lazily allocating the error message, only doing so if the error message is actually needed (which should be never). As a result, the debug mode compilation time for our `web-sys` crate decreased from 90s to 20s, a very nice improvement! (although we've still got some work to do).
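As a rough illustration of the pattern described above, the sketch below contrasts building an error message eagerly on every byte with wrapping it in a closure that only runs on the error path. It is not the compiler's `byte_str_lit`; the function names, the ASCII check, and the message text are assumptions made up for the example.

```rust
/// Eager variant (hypothetical): formats the error message for every
/// byte, so a multi-megabyte literal triggers millions of allocations
/// even though the message is never used.
fn check_bytes_eager(lit: &str) -> Result<usize, String> {
    let mut count = 0;
    for (i, b) in lit.bytes().enumerate() {
        // Allocates a String on every iteration.
        let msg = format!("non-ASCII byte {:#x} at offset {}", b, i);
        if !b.is_ascii() {
            return Err(msg);
        }
        count += 1;
    }
    Ok(count)
}

/// Lazy variant (hypothetical): the closure captures `b` and `i` but
/// `format!` only runs if the error branch is taken.
fn check_bytes_lazy(lit: &str) -> Result<usize, String> {
    let mut count = 0;
    for (i, b) in lit.bytes().enumerate() {
        let msg = || format!("non-ASCII byte {:#x} at offset {}", b, i);
        if !b.is_ascii() {
            return Err(msg());
        }
        count += 1;
    }
    Ok(count)
}
```

Both functions return the same results; the difference is only in how much work is done on the success path, which is the path every well-formed literal takes.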
52d36ba to 5bf2ad3
Ok, I've backed out the changes for integer/float literals, which I wasn't specifically measuring for, so now it's just the one thing I know for sure benefits quite a lot! @bors: r=michaelwoerister
📌 Commit 5bf2ad3 has been approved by |
…michaelwoerister syntax: Optimize some literal parsing
Rollup of 17 pull requests
Successful merges:
- #53030 (Updated RELEASES.md for 1.29.0)
- #53104 (expand the documentation on the `Unpin` trait)
- #53213 (Stabilize IP associated constants)
- #53296 (When closure with no arguments was expected, suggest wrapping)
- #53329 (Replace usages of ptr::offset with ptr::{add,sub}.)
- #53363 (add individual docs to `core::num::NonZero*`)
- #53370 (Stabilize macro_vis_matcher)
- #53393 (Mark libserialize functions as inline)
- #53405 (restore the page title after escaping out of a search)
- #53452 (Change target triple used to check for lldb in build-manifest)
- #53462 (Document Box::into_raw returns non-null ptr)
- #53465 (Remove LinkMeta struct)
- #53492 (update lld submodule to include RISCV patch)
- #53496 (Fix typos found by codespell.)
- #53521 (syntax: Optimize some literal parsing)
- #53540 (Moved issue-53157.rs into src/test/ui/consts/const-eval/)
- #53551 (Avoid some Place clones.)
Failed merges:
r? @ghost