Add new lint: Mixed locale ident #7376
r? @phansch (rust-highfive has picked a reviewer for you, use r? to override)
What would the lint say about a value name like |
Good question and great test case! Added it, and yes, it does not spawn the lint (I'm not a specialist in unicode by any means, but AFAIK diaeresis doesn't move |
Ping @Manishearth since you maintain the unicode crates in question here. Do you think adding this dep to Clippy (with the same configuration as in rustc) is fine?
Co-authored-by: Philipp Krones
Strongly against this: I designed the lints that are in rustc for this and they were pretty carefully designed, over months of discussion. This particular thing was not considered to be a case worth addressing because idents like I don't understand why "handwriting the code is hard" is a concern at all; nor do I understand what you mean by the builtin getting implicitly shadowed. |
Oh, I see what you mean by "shadowing". Yes, this was also considered when designing the rustc lint, and also determined to not be worth it: if someone has chosen to use Cyrillic in their code it's not worth it to try and nitpick "good usage" from "bad usage" because those can be pretty linked.
There are some other potential designs here: e.g. warning about mixed locales when it's only mixed-script confusables and only when not separated by underscores. But the current proposed design will catch too many legitimate use cases. Probably more than illegitimate ones, even, this kind of situation is super rare to trigger by accident.
It does sound like a reasonable style guideline to not mix scripts across underscores in a way that one of the scripts is purely using confusables.
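A rough sketch of the "only when not separated by underscores" idea, to make the proposal concrete. This is not the code from this PR: `Script`, `script_of`, and `mixed_script_segments` are hypothetical names, and the code-point ranges are a coarse stand-in for a real script lookup such as the one provided by the `unicode-script` crate. It also does not check for confusables, only for mixing inside one segment.

```rust
/// Coarse script buckets (assumption: a real lint would query a Unicode
/// script database instead of hard-coded ranges).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Script {
    Latin,
    Greek,
    Cyrillic,
    Other,
}

/// Hypothetical helper: approximate the script of a character by its range.
/// Returns `None` for characters that do not belong to a single script.
fn script_of(c: char) -> Option<Script> {
    match c {
        // Digits, underscores, and combining diacritics are shared across scripts.
        '0'..='9' | '_' | '\u{0300}'..='\u{036F}' => None,
        'a'..='z' | 'A'..='Z' | '\u{00C0}'..='\u{024F}' => Some(Script::Latin),
        '\u{0370}'..='\u{03FF}' => Some(Script::Greek),
        '\u{0400}'..='\u{04FF}' => Some(Script::Cyrillic),
        _ => Some(Script::Other),
    }
}

/// Returns the underscore-separated segments that contain more than one script.
fn mixed_script_segments(ident: &str) -> Vec<&str> {
    ident
        .split('_')
        .filter(|segment| {
            let mut scripts = segment.chars().filter_map(script_of);
            match scripts.next() {
                // Mixed if any later character disagrees with the first script.
                Some(first) => scripts.any(|s| s != first),
                // Empty or script-neutral segment: nothing to report.
                None => false,
            }
        })
        .collect()
}
```

Under these assumptions, `mixed_script_segments("счётчик_counter")` returns an empty list because each segment is written in one script, while an identifier that hides a Cyrillic `с` (U+0441) inside an otherwise Latin word is returned as a mixed segment.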
That's why it's in clippy and not in the compiler, isn't it? It's completely OK to 'allow' lints that don't suit your style. I guess not every project will be happy with the approach in rustc (at least I personally plan to actively dogfood this lint in projects I participate in). Maybe just put this lint into the 'pedantic' category?
Yes, but adding it to clippy still makes it a value judgement; and I do not consider this a good value judgement in this case. I'm not making this comment as a personal comment of style, I'm making this comment as a clippy maintainer who thinks that we need to be certain of the value judgements we are making when we add lints and as the person who did the research and design of the non ascii idents RFC. The lint as currently posed would likely warn on a lot of good code, more than the bad code. Also 99% of the people who would enable this by default probably would also be okay with There's a way to do this better that I already proposed, which would work as a pedantic lint, though it's trickier to implement. |
Another thing that I would be fine with adding would be a |
Well, actually it makes sense. Gonna implement.
I think it's done. Does it look better now?
r? @Manishearth (reassigning since |
this should probably be a pedantic lint
still kinda feel like a restriction lint with a list of scripts is the more extensible way to go.
#[derive(Debug)]
enum Case {
    /// E.g. `SomeStruct`, delimiter is uppercase letter.
    Camel,
Okay so a problem with this is that not all writing systems have uppercase. More thought needs to be put into how that will work.
But perhaps using only mixed script confusables without underscores in a character is enough for us for now. Idk.
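To illustrate the uppercase concern: a minimal sketch, assuming the camel-case splitter starts a new word at every uppercase letter. `split_camel_case` is a hypothetical name, not the function from this PR. Scripts without letter case (Han, for example) never produce a word boundary, so the whole identifier ends up as one segment.

```rust
/// Splits an identifier into words, starting a new word at every
/// uppercase letter (hypothetical helper for illustration only).
fn split_camel_case(ident: &str) -> Vec<String> {
    let mut words: Vec<String> = Vec::new();
    for c in ident.chars() {
        // Open a new word on an uppercase letter, or for the very first character.
        if c.is_uppercase() || words.is_empty() {
            words.push(String::new());
        }
        words
            .last_mut()
            .expect("a word was pushed above")
            .push(c);
    }
    words
}
```

With this splitter, `SomeStruct` yields `Some` and `Struct`, but `日本語` stays a single word because `char::is_uppercase` is false for every character in it.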
I was going to implement it after this one gets merged, as the script-detection dependency is introduced here. I don't think that having two lints in one PR is going to make things easier, assuming that it's already kinda sensitive 😅
However, this is drifting farther and farther from my original intent and becoming more and more complicated. I'll close this PR and implement a restriction lint for locales instead.
Thank you!
New lint: `disallowed_script_idents`

This PR implements a new lint to restrict locales that can be used in the code, as proposed in #7376.

Current concerns / unresolved questions:
- ~~Mixed usage of `script` (as a Unicode term) and `locale` (as something that is easier to understand for the broad audience). I'm not sure whether these terms are fully interchangeable and whether in the current form it is more confusing than helpful.~~ `script` is now used everywhere.
- ~~Having to mostly copy-paste `AllowedScript`. Probably it's not a big problem, as the list of scripts is standardized and is unlikely to change, and even if we'd stick to the `unicode_script::Script`, we'll still have to implement custom deserialization, and I don't think that it will be shorter in terms of the amount of LoC.~~ `unicode::Script` is used together with a filtering deserialize function.
- Should we stick to the list of "recommended scripts" from [UAX #31](http://www.unicode.org/reports/tr31/#Table_Recommended_Scripts) in the configuration?

*Please write a short comment explaining your change (or "none" for internal only changes)*

changelog: ``[`disallowed_script_idents`]``

r? `@Manishearth`
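For readers who want the shape of the restriction approach, here is a minimal sketch of checking an identifier against a list of allowed scripts. It is not the merged implementation: `Script`, `script_of`, and `first_disallowed_script` are hypothetical names, and the hard-coded ranges are an assumption standing in for the script lookup that the `unicode-script` crate provides.

```rust
/// Coarse script buckets (assumption: the real lint uses a full Unicode
/// script table rather than these few ranges).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Script {
    /// Characters shared by all scripts (digits, underscore, combining marks).
    Shared,
    Latin,
    Greek,
    Cyrillic,
    Unknown,
}

/// Hypothetical helper: approximate the script of a character by its range.
fn script_of(c: char) -> Script {
    match c {
        '0'..='9' | '_' | '\u{0300}'..='\u{036F}' => Script::Shared,
        'a'..='z' | 'A'..='Z' | '\u{00C0}'..='\u{024F}' => Script::Latin,
        '\u{0370}'..='\u{03FF}' => Script::Greek,
        '\u{0400}'..='\u{04FF}' => Script::Cyrillic,
        _ => Script::Unknown,
    }
}

/// Returns the first script used in `ident` that is not in `allowed`,
/// or `None` if every character is allowed or script-neutral.
fn first_disallowed_script(ident: &str, allowed: &[Script]) -> Option<Script> {
    ident
        .chars()
        .map(script_of)
        .find(|script| *script != Script::Shared && !allowed.contains(script))
}
```

With only `Script::Latin` allowed, `counter` passes and `счётчик` is reported as `Script::Cyrillic`. A project that writes identifiers in Cyrillic would add that script to the allowed list instead of disabling the check.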
This PR adds a new lint that checks whether an identifier name mixes multiple locales.
I don't think this should normally happen, as it both makes hand-writing the code much harder and can lead to confusing errors (`rustc`'s built-in lint `mixed_script_confusables` can be implicitly shadowed, which makes it not really reliable).
`stderr` example:
Please write a short comment explaining your change (or "none" for internal only changes)
changelog: [`mixed_locale_idents`]