Improve code generated for starts_with(<literal char>)
#67249
The comparison can be performed on the raw bytes, because two chars can only match if their UTF-8 encodings match. This avoids the `is_char_boundary` checks and reduces the operation to a plain `u8` slice comparison, which is optimized to a `memcmp` or an inline comparison where appropriate.
This enables constant folding when matching a literal char. Fixes rust-lang#41993.
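A minimal sketch of the byte-level idea, assuming only public `std` APIs (`is_prefix_bytes` is a hypothetical helper, not the libcore code). It checks that a raw-byte prefix test agrees with `str::starts_with` on a few ASCII and multi-byte cases:

```rust
/// Hypothetical helper (not the libcore code): tests whether `needle` is a
/// prefix of `haystack` by comparing raw UTF-8 bytes, with no
/// `is_char_boundary` check.
fn is_prefix_bytes(needle: &str, haystack: &str) -> bool {
    haystack.as_bytes().starts_with(needle.as_bytes())
}

fn main() {
    let cases = [
        ("", "abc"),
        ("a", "abc"),
        ("é", "éa"),       // 2-byte char
        ("e", "éa"),       // ASCII 'e' is not a prefix of 'é'
        ("abc", "ab"),     // needle longer than haystack
        ("日本", "日本語"), // 3-byte chars
    ];
    for (needle, haystack) in cases {
        // The byte comparison agrees with `str::starts_with` on every case.
        assert_eq!(is_prefix_bytes(needle, haystack), haystack.starts_with(needle));
    }
    println!("byte comparison matches str::starts_with");
}
```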
r? @shepmaster (rust_highfive has picked a reviewer for you, use r? to override)
cc @kennytm r? @BurntSushi perhaps?
The change itself LGTM. Do you have benchmarks showing a difference here? Mostly just to confirm there aren't any regressions. Even with no benefit, I think these changes make the code simpler.
@@ -715,16 +715,13 @@ impl<'a, 'b> Pattern<'a> for &'b str {
/// Checks whether the pattern matches at the front of the haystack
#[inline]
fn is_prefix_of(self, haystack: &'a str) -> bool {
haystack.is_char_boundary(self.len()) &&
This is interesting. According to git blame, @bluss added this in 2015. But yeah, this looks unnecessary to me and I agree with the change.
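One way to convince yourself the boundary check is redundant, sketched with hypothetical helpers (the first one only mimics the shape of the old check; it is not the removed code). Because UTF-8 is prefix-free, a byte match against a complete, valid needle can only end on a char boundary of the haystack, so the extra check never changes the result:

```rust
/// Hypothetical reconstruction of the old shape: boundary check, then compare.
fn prefix_with_boundary_check(needle: &str, haystack: &str) -> bool {
    haystack.is_char_boundary(needle.len())
        && haystack.as_bytes().starts_with(needle.as_bytes())
}

/// New shape: byte comparison only.
fn prefix_bytes_only(needle: &str, haystack: &str) -> bool {
    haystack.as_bytes().starts_with(needle.as_bytes())
}

fn main() {
    let samples = ["", "a", "e", "é", "éa", "ab", "€", "日", "日本", "\u{10348}"];
    for needle in samples {
        for haystack in samples {
            let fast = prefix_bytes_only(needle, haystack);
            if fast {
                // A valid UTF-8 needle consists of complete chars, so a byte
                // match implies the boundary check would have passed.
                assert!(haystack.is_char_boundary(needle.len()));
            }
            assert_eq!(fast, prefix_with_boundary_check(needle, haystack));
        }
    }
    println!("the boundary check never changed the result");
}
```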
src/libcore/str/pattern.rs
false
}
let mut buffer = [0u8; 4];
self.encode_utf8(&mut buffer).is_prefix_of(haystack)
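The excerpt above uses the unstable `Pattern::is_prefix_of`. A rough stable-Rust equivalent, assuming a hypothetical free function `char_is_prefix_of`, encodes the char into a stack buffer and then reuses the byte comparison:

```rust
/// Hypothetical free-function form of the char `is_prefix_of` in the diff:
/// encode `c` into a stack buffer, then reuse the byte-prefix comparison.
fn char_is_prefix_of(c: char, haystack: &str) -> bool {
    // 4 bytes is the longest UTF-8 encoding of any `char`.
    let mut buffer = [0u8; 4];
    let encoded: &str = c.encode_utf8(&mut buffer);
    haystack.as_bytes().starts_with(encoded.as_bytes())
}

fn main() {
    let cases = [('x', "xyz"), ('x', "yxz"), ('é', "été"), ('e', "été"), ('€', "€5"), ('a', "")];
    for (c, haystack) in cases {
        // Agrees with the standard `str::starts_with(char)`.
        assert_eq!(char_is_prefix_of(c, haystack), haystack.starts_with(c));
    }
    println!("encode-then-compare matches str::starts_with(char)");
}
```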
I think this can just be simplified to `self.encode_utf8(&mut [0; 4]).is_prefix_of(haystack)`? And similarly for below.
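A sketch of the suggested one-expression style in stable Rust, covering both the prefix and the suffix ("below") case. The function names are hypothetical, and `0u8` is written out only to make the buffer's element type explicit. The temporary buffer lives until the end of the enclosing expression, which is long enough for the comparison that borrows it:

```rust
/// Hypothetical one-expression forms (not the exact libcore code). The
/// temporary `[0u8; 4]` lives until the end of the enclosing expression,
/// which outlasts the comparison that borrows the encoded bytes.
fn char_is_prefix_of(c: char, haystack: &str) -> bool {
    haystack.as_bytes().starts_with(c.encode_utf8(&mut [0u8; 4]).as_bytes())
}

/// The suffix case uses `ends_with` on the bytes instead.
fn char_is_suffix_of(c: char, haystack: &str) -> bool {
    haystack.as_bytes().ends_with(c.encode_utf8(&mut [0u8; 4]).as_bytes())
}

fn main() {
    let haystack = "élan vital€";
    assert!(char_is_prefix_of('é', haystack));
    assert!(char_is_suffix_of('€', haystack));
    assert!(!char_is_suffix_of('l', haystack));
    assert_eq!(char_is_suffix_of('l', "vital"), "vital".ends_with('l'));
    println!("prefix and suffix checks behave as expected");
}
```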
You are right; I added a cleanup commit.
I used this code to check the generated assembly.
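The code linked in that comment is not reproduced here. As a hypothetical stand-in, functions like the following can be compiled with `rustc -O --emit asm` (or pasted into an online compiler explorer) to compare the code generated for a literal char against a literal one-byte string; `#[inline(never)]` keeps each one as a separate symbol. The PR's expectation is that the char version becomes comparably simple, but check the output for your toolchain rather than assuming it:

```rust
// Hypothetical functions for inspecting codegen; names are illustrative only.
#[inline(never)]
pub fn starts_with_char(s: &str) -> bool {
    s.starts_with('x')
}

#[inline(never)]
pub fn starts_with_str(s: &str) -> bool {
    s.starts_with("x")
}

fn main() {
    for s in ["xyz", "abc", "", "é"] {
        // Both variants must agree; only their generated code may differ.
        assert_eq!(starts_with_char(s), starts_with_str(s));
    }
}
```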
@ranma42 Is there a place where those benchmarks can be added in this PR?
I added the benchmarks in 3de1923, but I am not completely sure if that is the correct/best way to do it.
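For a quick local comparison outside the libcore bench suite, a rough wall-clock sketch like the one below can work; it is not the benchmark added in 3de1923, and `time_it` is a hypothetical helper. Timing loops like this are noisy, so a proper harness (the unstable `#[bench]` attribute or the `criterion` crate) gives more reliable numbers:

```rust
use std::hint::black_box;
use std::time::Instant;

/// Hypothetical timing helper: runs `f` `iters` times and reports wall time.
/// Returns how many calls returned `true`, so the work is observable.
fn time_it<F: Fn() -> bool>(label: &str, iters: u32, f: F) -> u32 {
    let start = Instant::now();
    let mut hits = 0u32;
    for _ in 0..iters {
        if f() {
            hits += 1;
        }
    }
    println!("{label}: {:?} for {iters} iterations", start.elapsed());
    hits
}

fn main() {
    let haystack = "xyzzy";
    // `black_box` hides the haystack from the optimizer so the check is not folded away.
    let a = time_it("starts_with(char)", 1_000_000, || black_box(haystack).starts_with('x'));
    let b = time_it("starts_with(&str)", 1_000_000, || black_box(haystack).starts_with("x"));
    assert_eq!(a, b);
}
```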
LGTM. Thanks so much! @bors r+
📌 Commit 3de1923 has been approved by
🌲 The tree is currently closed for pull requests below priority 100, this pull request will be tested once the tree is reopened
…-char, r=BurntSushi Improve code generated for `starts_with(<literal char>)` This PR includes two minor improvements to the code generated when checking for string prefix/suffix. The first commit simplifies the str/str operation, by taking advantage of the raw UTF-8 representation. The second commit replaces the current str/char matching logic with a char->str encoding and then the previous method. The resulting code should be equivalent in the generic case (one char is being encoded versus one char being decoded), but it becomes easy to optimize in the case of a literal char, which in most cases a developer might expect to be at least as simple as that of a literal string. This PR should fix rust-lang#41993
Rollup of 8 pull requests
Successful merges:
- #67249 (Improve code generated for `starts_with(<literal char>)`)
- #67308 (Delete flaky test net::tcp::tests::fast_rebind)
- #67318 (Improve typeck & lowering docs for slice patterns)
- #67322 (use Self alias in place of macros)
- #67323 (make transparent enums more ordinary)
- #67336 (Fix JS error when loading page with search)
- #67344 (.gitignore: Don't ignore a file that exists in the repository)
- #67349 (Minor: update Unsize docs for dyn syntax)
Failed merges:
r? @ghost
This PR includes two minor improvements to the code generated when checking for string prefix/suffix.
The first commit simplifies the str/str operation, by taking advantage of the raw UTF-8 representation.
The second commit replaces the current str/char matching logic with a char->str encoding and then the previous method.
The resulting code should be equivalent in the generic case (one char is encoded instead of one char being decoded), but it becomes easy to optimize when the char is a literal, where a developer would typically expect the generated code to be at least as simple as for a literal string.
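To illustrate the "equivalent in the generic case" claim, here is a sketch comparing a decode-based check (decode the haystack's first or last char and compare it with `c`) against the encode-based check described above. The decode-based functions are illustrative only, not the exact code the PR replaced; all four function names are hypothetical:

```rust
/// Decode-based strategy (illustrative; not the exact code the PR removed).
fn prefix_by_decoding(c: char, haystack: &str) -> bool {
    haystack.chars().next() == Some(c)
}

fn suffix_by_decoding(c: char, haystack: &str) -> bool {
    haystack.chars().next_back() == Some(c)
}

/// Encode-based strategy: encode `c` once, then compare raw bytes.
fn prefix_by_encoding(c: char, haystack: &str) -> bool {
    let mut buf = [0u8; 4];
    haystack.as_bytes().starts_with(c.encode_utf8(&mut buf).as_bytes())
}

fn suffix_by_encoding(c: char, haystack: &str) -> bool {
    let mut buf = [0u8; 4];
    haystack.as_bytes().ends_with(c.encode_utf8(&mut buf).as_bytes())
}

fn main() {
    let chars = ['a', 'é', '日', '\u{10348}'];
    let haystacks = ["", "a", "abc", "éa", "aé", "日本", "本日", "\u{10348}x"];
    for c in chars {
        for h in haystacks {
            // Both strategies give the same answer for every input.
            assert_eq!(prefix_by_decoding(c, h), prefix_by_encoding(c, h));
            assert_eq!(suffix_by_decoding(c, h), suffix_by_encoding(c, h));
        }
    }
    println!("decode- and encode-based checks agree");
}
```

With a literal `c`, the encoded bytes are known at compile time, which is what lets the byte comparison fold down to a constant check.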
This PR should fix #41993