[refurb] Preserve digit separators in Decimal constructor (FURB157)#20588
[refurb] Preserve digit separators in Decimal constructor (FURB157)#20588ntBre merged 13 commits intoastral-sh:mainfrom
refurb] Preserve digit separators in Decimal constructor (FURB157)#20588Conversation
|
ntBre
left a comment
There was a problem hiding this comment.
Thanks! But I think we need to be more careful here. At least one of the new snapshots is incorrect.
crates/ruff_linter/src/rules/refurb/rules/verbose_decimal_constructor.rs
Outdated
Show resolved
Hide resolved
| let numeric_value = format!("{unary}{rest}"); | ||
| add_thousand_separators(&numeric_value) |
There was a problem hiding this comment.
Would it be better to just strip leading underscores, and otherwise preserve the placement and frequency, rather than assume that thousands are the only use of underscores in numeric literals?
There was a problem hiding this comment.
Leading underscores aren't the only problem. Duplicates in the middle cause a syntax error too, but not in the Decimal constructor:
>>> from decimal import Decimal
>>> 1__2
File "<python-input-2>", line 1
1__2
^
SyntaxError: invalid decimal literal
>>> Decimal("1__2")
Decimal('12')But I agree in principle, add_thousand_separators does not seem like the right approach to me.
I'm starting to think we should just mark the fix as unsafe if the value we're replacing the original with is different. The edge cases here seem pretty tricky.
There was a problem hiding this comment.
Thanks for catching that; I agree that marking it unsafe is probably the way to go to not break too much.
There was a problem hiding this comment.
Wouldn't stripping leading underscores and replacing duplicates with a single underscore be enough?
In my opinion keeping Decimal("10_000_000") -> Decimal(10_000_000) transition as safe would be very convenient.
There was a problem hiding this comment.
It would be enough to strip leading underscores, strip trailing underscores, strip leading zeros (unless all the digits are zeros), and collapse medial underscore sequences: Decimal(" _-_0_01__2_ ") would become Decimal(-1_2).
There was a problem hiding this comment.
That's the behavior I fight for ❤️ 🚀 :)
There was a problem hiding this comment.
That sounds reasonable to me, I gave up a bit too soon :) @danparizher do you want to give this a shot? We can still revert to marking the fix unsafe if the code gets too complicated, but this sounds feasible with the transformations we're already doing.
|
The current state of this PR is that underscores are normalized to be thousands separators, but cf. @amyreese’s earlier comment. If a user writes |
|
What's the status of this PR? Is there something left that needs addressing or is it good to go? |
|
I don't think we should normalize the digit separators to thousands separators, as @dscorbett pointed out. |
Update normalize_digit_separators to retain the original pattern of digit separators instead of forcing thousands separators. This change ensures that only syntax errors are fixed while the user's formatting is preserved.
| // Extract only digits from the trimmed string | ||
| let digits: String = trimmed.chars().filter(char::is_ascii_digit).collect(); | ||
|
|
||
| // If no digits found, return 0 | ||
| if digits.is_empty() { | ||
| return format!("{unary}0"); | ||
| } | ||
|
|
||
| let mut result = String::new(); | ||
| let chars: Vec<char> = trimmed.chars().collect(); | ||
| let mut i = 0; | ||
|
|
||
| while i < chars.len() { | ||
| if chars[i].is_ascii_digit() { | ||
| result.push(chars[i]); | ||
| } else if chars[i] == '_' { | ||
| // Only add underscore if the previous character was a digit | ||
| // and we haven't already added an underscore | ||
| if !result.is_empty() && !result.ends_with('_') { | ||
| result.push('_'); | ||
| } | ||
| } | ||
| i += 1; | ||
| } | ||
|
|
||
| // Remove trailing underscores | ||
| result = result.trim_end_matches('_').to_string(); |
There was a problem hiding this comment.
I think this ends up being overly verbose and defensive, and lines 225-251 could be replaced with just:
let result = trimmed
.chars()
.dedup_by(|a, b| {*a == '_' && a == b})
.collect::<String>();
Leading/trailing underscores were already trimmed by line 218, so there shouldn't be a need to both check that there are some number of digits and the result after trimming/deduplication is not empty.
| Decimal("15_000_000") # Safe fix: normalizes separators, becomes Decimal(15_000_000) | ||
| Decimal("1_234_567") # Safe fix: normalizes separators, becomes Decimal(1_234_567) | ||
| Decimal("-5_000") # Safe fix: normalizes separators, becomes Decimal(-5_000) | ||
| Decimal("+9_999") # Safe fix: normalizes separators, becomes Decimal(+9_999) |
There was a problem hiding this comment.
Would be nice to include test cases showing that non-thousands separators are maintained accordingly, eg "12_34_56_78" and "1234_5678".
| if has_digit_separators { | ||
| diagnostic.set_fix(Fix::unsafe_edit(Edit::range_replacement( | ||
| replacement, | ||
| value.range(), | ||
| ))); | ||
| } else { | ||
| diagnostic.set_fix(Fix::safe_edit(Edit::range_replacement( | ||
| replacement, | ||
| value.range(), | ||
| ))); | ||
| } |
There was a problem hiding this comment.
We should use Fix::applicable_edit here.
There was a problem hiding this comment.
Or wait, if we're going to preserve the original formatting, should it just be safe again?
| /// - Stripping leading and trailing underscores | ||
| /// - Collapsing medial underscore sequences to single underscores | ||
| /// - Do not force thousands separators | ||
| fn normalize_digit_separators(original_str: &str, unary: &str, _numeric_part: &str) -> String { |
There was a problem hiding this comment.
We should either remove _numeric_part if it's unused or rename it to numeric_part, if it's used.
| // If no digits found, return 0 | ||
| if digits.is_empty() { | ||
| return format!("{unary}0"); | ||
| } |
There was a problem hiding this comment.
These is_empty checks don't seem right to me at all. I don't think we should return an arbitrary value for invalid input, and I think we've already validated the input in verbose_decimal_constructor, but I could also be missing something.
| let without_unary = if original_str.starts_with('+') || original_str.starts_with('-') { | ||
| &original_str[1..] | ||
| } else { | ||
| original_str | ||
| }; |
There was a problem hiding this comment.
We already stripped the prefix once on line 99. Can we reuse more of the work from the parent function?
Updates the verbose_decimal_constructor rule to always mark fixes for digit separators in Decimal constructors as safe, removing the previous unsafe applicability. Adds test cases for non-thousands separators and updates related snapshots.
ntBre
left a comment
There was a problem hiding this comment.
I pushed a few commits simplifying things slightly to reuse the earlier unary prefix removal, but this looks good to me.
|
After a bit more thought, I came up with some additional edge cases with separators and leading zeros. Hopefully everything is covered now. |
* origin/main: (21 commits) [ty] Update "constraint implication" relation to work on constraints between two typevars (#21068) [`flake8-type-checking`] Fix `TC003` false positive with `future-annotations` (#21125) [ty] Fix lookup of `__new__` on instances (#21147) Fix syntax error false positive on nested alternative patterns (#21104) [`pyupgrade`] Fix false positive for `TypeVar` with default on Python <3.13 (`UP046`,`UP047`) (#21045) [ty] Reachability and narrowing for enum methods (#21130) [ty] Use `range` instead of custom `IntIterable` (#21138) [`ruff`] Add support for additional eager conversion patterns (`RUF065`) (#20657) [`ruff-ecosystem`] Fix CLI crash on Python 3.14 (#21092) [ty] Infer type of `self` for decorated methods and properties (#21123) [`flake8-bandit`] Fix correct example for `S308` (#21128) [ty] Dont provide goto definition for definitions which are not reexported in builtins (#21127) [`airflow`] warning `airflow....DAG.create_dagrun` has been removed (`AIR301`) (#21093) [ty] follow the breaking API changes made in salsa-rs/salsa#1015 (#21117) [ty] Rename `Type::into_nominal_instance` (#21124) [ty] Filter out "unimported" from the current module [ty] Add evaluation test for auto-import including symbols in current module [ty] Refactor `ty_ide` completion tests [ty] Render `import <...>` in completions when "label details" isn't supported [`refurb`] Preserve digit separators in `Decimal` constructor (`FURB157`) (#20588) ...
Summary
Fixes #20572