Skip to content

Conversation

@guillp
Copy link

@guillp guillp commented May 20, 2025

Summary

This adds a rule (with code RUF062) that automatically formats large numbers with underscore separators to make them more readable. This is as described in PEP515, and discussed in #12787 which I opened a while back.

E.g:
123456 becomes 123_456
.12345 becomes .123_45
0xDEADBEEF becomes 0xDEAD_BEEF
(see test snapshot for more examples)

This rule works for:

  • integers: if they are 5 digits or more (configurable), the rule will require underscores as thousands, millions, billions, etc. separators. This threshold avoid formatting relatively small integers (9999 or less) since they are already readable enough. 123456789 gives 123_456_789
  • floats: same rules as integers, and the float part is also formatted with those same rules, but the groups are formed left to right (thousandths, millionths, etc.)
    123456789.123456789 gives 123_456_789.123_456_789
  • hexadecimal notation (0xABCD): add underscores to form groups of 4 digits by default (configurable)
    0x1234567890ABCD gives 0x12_3456_7890_ABCD
  • octal notation (0o1234): add underscores to form groups of 4 digits by default (configurable)
    0o1234567 gives 0o123_4567
  • binary notation (0b1010101): add underscores to from groups of 8 bits (octets) by default (configurable)
    0o1010111100100101 gives 0o10101111_00100101
  • scientific notation (123e10): the leading part is formatted with the same rules as integers. The exponent part is untouched as it should never be more than 3 chars anyway.
  • positive or negative literals: the leading + or - sign is not part of Expr::NumberLiteral instances once parsed by ruff, so this rule does not modify them in any way, they just stay in place.
  • any kind of number already containing separators but in the wrong places: number will be reformatted with the defined configuration.

Support for indian-style number formatting:

According to https://randombits.dev/articles/number-localization/formatting , most of the world groups decimal digits 3 by 3, excepted for India who uses groups of 2 after the first group of 3 (so thousands, hundred of thousands, hundreds of hundreds of thousands, etc.). A configuration option allows enabling this kind of grouping.
I am however not sure about what is the practice for formatting the float part in India. I implemented a "reversed" logic, with separators on thousandth, then hundredth of thousandth, hundredth of hundredth of thousandth, etc. (not sure if anyone ever needed such a float precision ^^'). This may need to be adjusted.

So 123456789.123456789 formatted with indian mode will give 12_34_56_789.123_45_67_89

Test Plan

A new test file RUF062.py is part of the PR and is executed on cargo test.

TODO / to discuss

  • rule code: I took RUF062 as it is the next available in the RUFF group, and I selected this group because I did not see such a rule implemented in any other formatter/linter.
  • naming of configuration options should be reviewed and probably improved
  • default config values to be discussed
  • more tests?

@guillp guillp changed the title Rule/large number without separators Rule: large number without underscore separators (PEP 515) May 20, 2025
@MichaReiser MichaReiser added rule Implementing or modifying a lint rule needs-decision Awaiting a decision from a maintainer labels May 20, 2025
@guillp
Copy link
Author

guillp commented Jul 1, 2025

I rebased on current main and changed the code to RUF062 since 061 is now taken by another rule.

@ntBre
Copy link
Contributor

ntBre commented Jul 1, 2025

I think something went wrong with the rebase, as GitHub is now showing more than 500 commits and over 100,000 lines changed, which would make it pretty difficult to review!

@guillp guillp force-pushed the rule/large_number_without_separators branch from d005005 to c30fbe2 Compare July 3, 2025 07:02
@guillp
Copy link
Author

guillp commented Jul 3, 2025

Not sure what I did wrong. But I made some clean-up so there is a single commit on top of current main with all the changes now.
And also I added configuration options for digit group sizes, and also added support for indian-style number formatting.
Edited the PR description to reflect this. Thanks for reviewing this.

@ntBre
Copy link
Contributor

ntBre commented Jul 4, 2025

Thanks for tidying up and for your work on this! I think we'll still want to resolve the needs-decision label on the issue before giving this a full review.

@MichaReiser
Copy link
Member

I agree that we still need to answer the question if we want to have such an opinionated rule that enforces specific grouping of numbers.

I do like clippy's rule that are less opinionated but enforce good practice:

The first two seem very useful to me. It's less clear to me if we want to add any more opinionated rules.

@guillp
Copy link
Author

guillp commented Aug 7, 2025

I think the point of having an opinionated rule is to allow auto-fixing, which should be suitable for most cases. If some codebase has very specific requirements, the rule should be configurable enough to accommodate them (as long as those requirements are consistent across the whole code-base). Or the rule can be disabled otherwise.

Another different rule that only warns about inconsistent, uncommon or too large grouping (like in clippy) may also be created, and user may choose to activate one or the other. Or even both, because if the "enforcer" rule (from this PR) is ran before the "warning" rule, the former should fix any issue before any warning is issued by the later. Would that suit what you had in mind? I can implement that warning rule if needed.

@ntBre
Copy link
Contributor

ntBre commented Aug 7, 2025

Another different rule that only warns about inconsistent, uncommon or too large grouping (like in clippy) may also be created, and user may choose to activate one or the other. Or even both, because if the "enforcer" rule (from this PR) is ran before the "warning" rule, the former should fix any issue before any warning is issued by the later. Would that suit what you had in mind? I can implement that warning rule if needed.

I think it would be very unusual, probably unprecedented, to have two overlapping rules with different severities, so I don't think that's a viable option. We do plan to add a warning level/severity (#1256) in the future, though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

needs-decision Awaiting a decision from a maintainer rule Implementing or modifying a lint rule

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants