Skip to content

Comments

[flake8-simplify] Fix incorrect fix for positive maxsplit without separator (SIM905)#20056

Merged
ntBre merged 2 commits intoastral-sh:mainfrom
RazerM:fix/sim905-maxsplit-whitespace-count
Sep 18, 2025
Merged

[flake8-simplify] Fix incorrect fix for positive maxsplit without separator (SIM905)#20056
ntBre merged 2 commits intoastral-sh:mainfrom
RazerM:fix/sim905-maxsplit-whitespace-count

Conversation

@RazerM
Copy link
Contributor

@RazerM RazerM commented Aug 23, 2025

Summary

Resolves #20033

Test Plan

unit tests added to the new split function, existing snapshot test updated.

@github-actions
Copy link
Contributor

github-actions bot commented Aug 23, 2025

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Copy link
Contributor

@ntBre ntBre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for jumping on this!

I'm kind of leaning away from defining our own iterator, if we can avoid it.

let start = self
.remaining
.rfind(py_unicode_is_whitespace)
.map(|pos| pos + 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is safe to do. rfind returns the byte offset of the first character of the match. I tried a case like this:

const fn py_unicode_is_whitespace(ch: char) -> bool {
    matches!(ch, '\u{3000}')
}

fn main() {
    let text = "\u{3000}next";
    let start = text.rfind(py_unicode_is_whitespace).unwrap();
    dbg!(&text[start + 1..]);
}

in the Rust playground, and it panics with a byte index is not a char boundary error. This would be the case for any of the multi-byte whitespace characters in our real py_unicode_is_whitespace function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I need to increment by char::len_utf8.

@RazerM
Copy link
Contributor Author

RazerM commented Aug 26, 2025

I'm kind of leaning away from defining our own iterator, if we can avoid it.

Depending how you feel about it we could just remove the fix, since it wasn't spotted in #19851 by me or in review. The iterator is a bit verbose but I think the logic is ok.

@RazerM RazerM force-pushed the fix/sim905-maxsplit-whitespace-count branch from 1a408cd to b69799f Compare August 26, 2025 19:22
Copy link
Contributor

@ntBre ntBre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I think you're right, I was being overly scared of the iterator. This looks good to me overall, just one idea for simplifying a bit by using (r)split_once. It seemed to work well for me locally, but let me know if I'm still missing something.

@ntBre ntBre added bug Something isn't working fixes Related to suggested fixes for violations labels Sep 9, 2025
Co-authored-by: Brent Westbrook <brentrwestbrook@gmail.com>
@RazerM RazerM force-pushed the fix/sim905-maxsplit-whitespace-count branch from b69799f to f4c9f98 Compare September 12, 2025 21:42
Copy link
Contributor

@ntBre ntBre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@ntBre
Copy link
Contributor

ntBre commented Sep 18, 2025

I took care of the merge conflicts. It looks like Direction mostly just got renamed to Method in #20459.

@ntBre ntBre changed the title [flake8-simplify] Fix incorrect fix for positive maxsplit without separator (SIM905) [flake8-simplify] Fix incorrect fix for positive maxsplit without separator (SIM905) Sep 18, 2025
@ntBre ntBre enabled auto-merge (squash) September 18, 2025 20:53
@ntBre ntBre merged commit bc89d03 into astral-sh:main Sep 18, 2025
34 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working fixes Related to suggested fixes for violations

Projects

None yet

Development

Successfully merging this pull request may close these issues.

SIM905 fix counts splits wrong for positive maxsplit and unspecified sep

2 participants