[`eradicate`] ignore `# language=` in commented-out-code rule (ERA001) #14069

fabiob · 2024-11-03T21:23:21Z

Fixes one common case cited on #6019

Summary

The commented-out-code rule (ERA001) from `eradicate` currently flags a very common idiom: a `# language=` comment that marks a Python string as containing another language, to help with syntax highlighting.
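A minimal sketch of the idiom, modeled on the language-injection comments described in the JetBrains documentation. The constant names, the query, and the `run_query` helper are illustrative assumptions, not taken from this PR:

```python
import sqlite3

# language=SQL
QUERY = "SELECT 1 + 1 AS total"

# language=HTML
SNIPPET = "<p>Hello</p>"


def run_query() -> int:
    """Run QUERY against an in-memory database and return the single value."""
    conn = sqlite3.connect(":memory:")
    try:
        return conn.execute(QUERY).fetchone()[0]
    finally:
        conn.close()


if __name__ == "__main__":
    print(run_query())
```

The `# language=...` lines are ordinary comments to Python; only the editor reads them, which is why a commented-out-code heuristic can mistake them for an assignment.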

This PR adds this idiom to the list of allowed exceptions to the rule.

Test Plan

I've added some additional test cases.

charliermarsh · 2024-11-03T21:24:54Z

Do you mind pushing the test cases? I only see the code changes.

For reference: https://www.jetbrains.com/help/pycharm/using-language-injections.html

fabiob · 2024-11-03T21:26:11Z

Sorry @charliermarsh, I forgot to force-push after amending the commit with the test cases.

charliermarsh · 2024-11-03T21:26:43Z

No prob, thanks for the PR.

fabiob · 2024-11-03T21:26:56Z

And looks like I need to fix the formatting... Working on it.

Fixes one common case cited on astral-sh#6019. Should work with the examples from the JetBrains documentation: https://www.jetbrains.com/help/pycharm/using-language-injections.html

fabiob · 2024-11-03T21:36:30Z

For reference: https://www.jetbrains.com/help/pycharm/using-language-injections.html

I've added some more test cases and fixed the regexp with the additional cases described here.

github-actions · 2024-11-03T21:50:06Z

`ruff-ecosystem` results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

charliermarsh · 2024-11-03T22:10:04Z

Thanks and welcome to the project.

InSyncWithFoo · 2024-11-04T14:32:35Z

crates/ruff_linter/src/rules/eradicate/detection.rs

@@ -16,7 +16,7 @@ static CODE_INDICATORS: LazyLock<AhoCorasick> = LazyLock::new(|| {

 static ALLOWLIST_REGEX: LazyLock<Regex> = LazyLock::new(|| {
    Regex::new(
-        r"^(?i)(?:pylint|pyright|noqa|nosec|region|endregion|type:\s*ignore|fmt:\s*(on|off)|isort:\s*(on|off|skip|skip_file|split|dont-add-imports(:\s*\[.*?])?)|mypy:|SPDX-License-Identifier:|(?:en)?coding[:=][ \t]*([-_.a-zA-Z0-9]+))",
+        r"^(?i)(?:pylint|pyright|noqa|nosec|region|endregion|type:\s*ignore|fmt:\s*(on|off)|isort:\s*(on|off|skip|skip_file|split|dont-add-imports(:\s*\[.*?])?)|mypy:|SPDX-License-Identifier:|language=[a-zA-Z](?: ?[-_.a-zA-Z0-9]+)+(?:\s+prefix=\S+)?(?:\s+suffix=\S+)?|(?:en)?coding[:=][ \t]*([-_.a-zA-Z0-9]+))",


This nested quantifier doesn't look so nice: `[a-zA-Z](?: ?[-_.a-zA-Z0-9]+)+`. The optional space at the start of the group leads to multiple ways of matching the same text.

I know Rust's regex engine guarantees linear time, but this is nevertheless a bad pattern. Prefer `[a-zA-Z][-_.a-zA-Z0-9]*(?: [-_.a-zA-Z0-9]+)*` instead.

Technically, language IDs can be anything, so it's perhaps better to drop the leading character requirement: `[-_.a-zA-Z0-9]+(?: [-_.a-zA-Z0-9]+)*`.
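To make the comparison concrete, here is a small Python sketch of the three language-ID patterns quoted above. It uses Python's `re` rather than Rust's `regex` crate, so it only illustrates which strings each pattern accepts, not engine performance. The names `NESTED`, `FLAT`, `RELAXED`, and `accepts` are labels invented for this example:

```python
import re

# The three language-ID patterns quoted in the review comment.
NESTED = re.compile(r"[a-zA-Z](?: ?[-_.a-zA-Z0-9]+)+")
FLAT = re.compile(r"[a-zA-Z][-_.a-zA-Z0-9]*(?: [-_.a-zA-Z0-9]+)*")
RELAXED = re.compile(r"[-_.a-zA-Z0-9]+(?: [-_.a-zA-Z0-9]+)*")


def accepts(pattern: re.Pattern, language_id: str) -> bool:
    """Return True if the whole string is a valid ID under `pattern`."""
    return pattern.fullmatch(language_id) is not None


SAMPLES = ["SQL", "JSON Lines", "x", "3D Model", "a  b"]

if __name__ == "__main__":
    for sample in SAMPLES:
        results = {
            name: accepts(pattern, sample)
            for name, pattern in (("nested", NESTED), ("flat", FLAT), ("relaxed", RELAXED))
        }
        print(f"{sample!r}: {results}")
```

In the nested form, `SQL` can be split as `S` + `QL` or `S` + `Q` + `L`, which is the ambiguity described above; the flat form has only one way to match it. Note that the two are not strictly equivalent: the nested form needs at least two characters, while the flat form also accepts a single letter such as `x`.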

Should we just use `language=`? All language identifiers will start with that, and false positives seem somewhat rare / not a huge deal here.

@InSyncWithFoo maybe I was extra cautious about false positives.

> Technically, language IDs can be anything,

Good catch! Maybe we can just require a non-space character after the equals sign. This should take care of most false positives anyway, as it's uncommon for Python users to omit spaces around assignment operators.

The only thing that worries me is failing to detect a commented-out named parameter in a multi-line method invocation or declaration. But maybe I'm just being overzealous.

    def check_spelling(
        text,
        # language=DEFAULT_LANGUAGE
    ) ...

A quick search shows that matching on `language=` alone would lead to many false negatives.

Language IDs are also displayed to the user as part of the UI, so I doubt they would contain non-ASCII characters. I would say limiting to `[-_.a-zA-Z0-9]` and spaces is the most balanced heuristic.

> [...] it's uncommon for Python users to omit spaces around assignment operators.

Don't forget keyword arguments, which are recommended to be written with no spaces around the equal sign:

    # frobnicate(
    #     language='en'  # Could also be `language=EN` with a predefined constant `EN`
    # )

I think both this and the false negative you mention are well within the acceptable error margin.
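The trade-off discussed in this thread can be sketched as follows. This is a Python approximation, not Ruff's implementation: `looks_like_language_injection` is a hypothetical helper, and the pattern is the character-class heuristic proposed above, applied to the comment text after the leading `#`.

```python
import re

# Hypothetical approximation of the heuristic: `language=` followed
# immediately by an ID made of [-_.a-zA-Z0-9] runs separated by single spaces.
# IGNORECASE mirrors the `(?i)` flag of the allowlist regex in the diff.
LANGUAGE_COMMENT = re.compile(
    r"language=[-_.a-zA-Z0-9]+(?: [-_.a-zA-Z0-9]+)*",
    re.IGNORECASE,
)


def looks_like_language_injection(comment: str) -> bool:
    """Return True if `comment` (including its leading '#') starts like a
    language-injection comment under this sketch's heuristic."""
    body = comment.lstrip("#").strip()
    # `match` anchors at the start of the body only, like `^` in the allowlist.
    return LANGUAGE_COMMENT.match(body) is not None


if __name__ == "__main__":
    for comment in (
        "# language=SQL",
        "# language=JSON Lines",
        "# language=DEFAULT_LANGUAGE",
        "# language='en'",
        "# language = sql",
    ):
        print(comment, "->", looks_like_language_injection(comment))
```

Under this sketch, `# language='en'` is rejected because the quote falls outside the character class, while `# language=DEFAULT_LANGUAGE` and `# language=EN` are accepted. Those accepted cases are the commented-out code that the heuristic would miss.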

So @InSyncWithFoo, will you write the PR with the new regexp and test cases, or should I?

I'll handle it.

Turns out language IDs in comments must be normalized.

This is JSON Lines (one value on each line, comments allowed):

This is pure JSON (one single top-level value, comments disallowed):

`jsonlines` must be used even though autocompletion popups use `json lines`:

Notably, this rule applies to some languages but not others (though Ruff doesn't need to care about this):
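As a hedged illustration of the normalized spelling mentioned above: the strings and the `parse_json_lines` helper below are made up for this example, and which IDs the IDE actually accepts is its concern, not Ruff's.

```python
import json

# language=jsonlines
EVENTS = """\
{"event": "start", "id": 1}
{"event": "stop", "id": 1}
"""

# language=json
CONFIG = '{"retries": 3, "verbose": false}'


def parse_json_lines(text: str) -> list:
    """Parse one JSON value per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    print(parse_json_lines(EVENTS))
    print(json.loads(CONFIG))
```

Both comments consist only of `[-_.a-zA-Z0-9]` characters after `language=`, so either spelling fits the heuristic discussed in this thread.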

charliermarsh approved these changes Nov 3, 2024

charliermarsh added the bug Something isn't working label Nov 3, 2024

fabiob force-pushed the adds-language-to-era001 branch from 96f0c10 to b94339b, November 3, 2024 21:25

fabiob force-pushed the adds-language-to-era001 branch from b94339b to 0d9abfe, November 3, 2024 21:30

[eradicate] ignore # language= in commented-out-code rule (ERA001)

58922e4

fabiob force-pushed the adds-language-to-era001 branch from 0d9abfe to 58922e4, November 3, 2024 21:35

charliermarsh merged commit 2b73a1c into astral-sh:main Nov 3, 2024
20 checks passed

fabiob deleted the adds-language-to-era001 branch November 3, 2024 22:12

InSyncWithFoo reviewed Nov 4, 2024

InSyncWithFoo mentioned this pull request Nov 4, 2024

[eradicate] Better detection of IntelliJ language injection comments (ERA001) #14094

Merged

BrewTestBot mentioned this pull request Nov 8, 2024

ruff 0.7.3 Homebrew/homebrew-core#197081

Merged
