Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mark str::is_char_boundary and str::split_at* unstably const. #131520

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

zachs18
Copy link
Contributor

@zachs18 zachs18 commented Oct 10, 2024

Tracking issues: #131516, #131518

First commit implements const_is_char_boundary, second commit implements const_str_split_at (which depends on const_is_char_boundary)

I used const_eval_select for is_char_boundary since there is a comment about optimizations that would theoretically not happen with the simple const-compatible version (since slice::get is not constifiable) cc #84751. I have not checked if this code difference is still required for the optimization, so it might not be worth the code complication, but 🤷.

This changes str::split_at_checked to use a new private helper function split_at_unchecked (copied from split_at_mut_unchecked) that does pointer stuff instead of get_unchecked, since that is not currently constifiable due to using the SliceIndex trait.

@rustbot
Copy link
Collaborator

rustbot commented Oct 10, 2024

r? @Noratrieb

rustbot has assigned @Noratrieb.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Oct 10, 2024
@okaneco
Copy link
Contributor

okaneco commented Oct 11, 2024

I found the commit f621193 that added the optimization comments mentioning avoiding a panic branch and touching memory when index was 0 which you can see here, comparing 1.8, 1.9, and nightly
https://rust.godbolt.org/z/GTx4Mc3G4

I don't think the last comment line rules out const-incompatibility. It looks like it's trying to reinforce that slice.get(..i) relies on that specific early return to optimize out checks, not that using get(i) in is_char_boundary is required for the optimization.

I would assume you'd get the same optimizations if the match+get() was converted to an if-else statement, but there should probably be codegen tests to make sure the 0-constant slicing behavior is unchanged.

if index == 0 { return true; }
if index < self.as_bytes().len() {
    self.as_bytes()[index].is_utf8_char_boundary()
} else {
    index == self.len()
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants