feat(lint): add nursery rule useNamedCaptureGroup#9048
Conversation
🦋 Changeset detectedLatest commit: 4174fcb The changes in this PR will be included in the next version bump. Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughAdds a new nursery lint rule UseNamedCaptureGroup for the Biome JS analyser that detects unnamed capture groups in regex literals and RegExp constructor/call usages, and emits diagnostics encouraging named capture groups. Includes precise per-group ranges for simple string-literal patterns when possible, registration metadata, unit tests (valid and invalid cases), a new UseNamedCaptureGroupOptions type, and a changelog entry. Suggested reviewers
🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Merging this PR will not alter performance
Comparing Footnotes
|
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@crates/biome_js_analyze/src/lint/nursery/use_named_capture_group.rs`:
- Around line 229-249: The constructor path currently returns a single
diagnostic covering the whole expression in run_regexp_constructor (return
Box::new([node.range()])), which differs from the literal path that emits one
diagnostic per unnamed group; either (A) adjust run_regexp_constructor to mirror
the literal handling by parsing pattern_text (from extract_pattern_from_args) to
find each unnamed group span, convert those spans into TextRange values relative
to the argument/string literal and return a Box<[TextRange]> containing one
range per unnamed group (use
parse_regexp_node/extract_pattern_from_args/has_unnamed_capture_groups as hooks
to locate the pattern and compute offsets), or (B) if single-diagnostic behavior
is intentional, update the rule metadata where the rule is declared (change
.same() to .inspired() at the rule setup referenced near line 76) so the
reported behavior is documented as differing from ESLint; pick one and implement
accordingly.
🧹 Nitpick comments (2)
crates/biome_js_analyze/src/lint/nursery/use_named_capture_group.rs (2)
85-173: Helper functions should be placed below theimpl Ruleblock.Per project convention, all helper functions, structs, and enums must be placed below the
impl Ruleblock (the only exception being node union declarations used in theQuerytype).find_unnamed_capture_groups,has_unnamed_capture_groups,is_regexp_object,parse_regexp_node, andextract_pattern_from_argsshould all be moved below theimpl Rule for UseNamedCaptureGroupblock.Based on learnings: "In crates/biome_analyze/**/*.rs rule files, all helper functions, structs, and enums must be placed below the
impl Ruleblock."
175-208:Semantic<AnyJsExpression>as the query type is broad.This means
runis invoked for every expression node in the file. The earlymatch+Default::default()exit is cheap, so this isn't a blocker — just something to be aware of if profiling shows overhead. A narrower query viadeclare_node_union!combiningJsRegexLiteralExpression,JsNewExpression, andJsCallExpressioncould reduce invocations.
ematipico
left a comment
There was a problem hiding this comment.
Nice work! Left some comments
| while i < pattern.len() { | ||
| match pattern[i] { |
There was a problem hiding this comment.
See if you can use as_bytes instead. Then, evaluate the removal of the counter by using an iterator
There was a problem hiding this comment.
Thanks for the suggestion!
I’ve updated the code to use as_bytes and replaced the counter-based loop with an iterator.
| let raw_inner = &token_text[1..token_text.len() - 1]; | ||
| let inner_text = string_lit.inner_string_text().ok()?; | ||
| // If raw source and interpreted text differ, escapes are present | ||
| if raw_inner != inner_text.text() { | ||
| return None; | ||
| } |
There was a problem hiding this comment.
I don't understand when this is the case. Are there tests for this?
There was a problem hiding this comment.
Sorry for missing the test cases earlier.
raw_inner != inner_text becomes true when the string contains escape sequences.
Without this guard find_unnamed_capture_groups would run on the raw source and produce incorrect offsets.
To clarify the behavior I added tests covering both directions
-
valid:
new RegExp("\\(foo)")— using the raw source\\(foo)would incorrectly report(as an unnamed group, while the actual regex\(foo)has no capture group. -
invalid:
new RegExp("\\d+(foo)")— since escapes are present, precise mapping is skipped but the fallback path correctly detects the unnamed group(foo)using the interpreted string\d+(foo).
de6612f to
d020593
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
crates/biome_rule_options/src/use_named_capture_group.rs (1)
3-6: Add a rustdoc comment toUseNamedCaptureGroupOptions.Per the coding guidelines, options structs need inline rustdoc. Even if the struct is currently empty, a brief doc comment keeps things consistent with the rest of the codebase.
📝 Suggested doc comment
+/// Options for the [UseNamedCaptureGroup](https://biomejs.dev/linter/rules/use-named-capture-group) rule. #[derive(Default, Clone, Debug, Deserialize, Deserializable, Merge, Eq, PartialEq, Serialize)] #[cfg_attr(feature = "schema", derive(schemars::JsonSchema))] #[serde(rename_all = "camelCase", deny_unknown_fields, default)] pub struct UseNamedCaptureGroupOptions {}As per coding guidelines, "Use inline rustdoc documentation for rules, assists, and their options."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/biome_rule_options/src/use_named_capture_group.rs` around lines 3 - 6, Add an inline rustdoc comment above the UseNamedCaptureGroupOptions struct describing its purpose (e.g., "Options for the UseNamedCaptureGroup rule.") and any notes about default/empty options; ensure the comment follows rustdoc style (///) and matches other options structs' phrasing so documentation and tooling pick it up alongside the existing derives and serde attributes on UseNamedCaptureGroupOptions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/biome_rule_options/src/use_named_capture_group.rs`:
- Around line 1-6: Add a blank line between the import block (the two use
statements for biome_deserialize_macros and serde) and the struct declaration’s
attribute block so the #[derive(...)]/#[cfg_attr(...)] lines are separated from
the use statements; run the formatter (just f) after making this change to
ensure spacing and lint rules are applied for the UseNamedCaptureGroupOptions
struct.
---
Nitpick comments:
In `@crates/biome_rule_options/src/use_named_capture_group.rs`:
- Around line 3-6: Add an inline rustdoc comment above the
UseNamedCaptureGroupOptions struct describing its purpose (e.g., "Options for
the UseNamedCaptureGroup rule.") and any notes about default/empty options;
ensure the comment follows rustdoc style (///) and matches other options
structs' phrasing so documentation and tooling pick it up alongside the existing
derives and serde attributes on UseNamedCaptureGroupOptions.
|
@ematipico I've resolved the conflicts and updated the PR. |
.changeset/loose-cooks-report.md
Outdated
| Added the nursery rule [`useNamedCaptureGroup`](https://biomejs.dev/linter/rules/use-named-capture-group/). The rule enforces using named capture groups in regular expressions instead | ||
| of numbered ones. It supports both regex literals and `RegExp` constructor calls. |
There was a problem hiding this comment.
Interesting manual line-break, maybe force these breaks after a dot (.)
.changeset/loose-cooks-report.md
Outdated
| --- | ||
| "@biomejs/biome": patch |
There was a problem hiding this comment.
Unexpected white-spaces after every line? Something Claude added?
There was a problem hiding this comment.
Thank you for the suggestions. While using a translation tool during editing, some trailing whitespace was unintentionally introduced. I’ve adjusted the line breaks and removed the extra whitespace. The updates have been pushed.
dyc3
left a comment
There was a problem hiding this comment.
Looks good. Just need to rebase and rerun codegen to deal with the merge conflicts.
| /// | ||
| /// Returns a list of byte offsets (relative to pattern start) for each | ||
| /// unnamed capture group `(` found. | ||
| fn find_unnamed_capture_groups(pattern: &str) -> Vec<u32> { |
There was a problem hiding this comment.
nit: move all helpers below the impl Rule block
658c7c4 to
b017b25
Compare
Summary
Added the nursery rule
useNamedCaptureGroup, which enforces using named capture groups ((?<name>...)) in regular expressions instead of numbered ones ((...)).This rule corresponds to ESLint's
prefer-named-capture-group.Supports:
/(foo)/new RegExp("(foo)")/RegExp("(foo)")constructor callsnew RegExp(pattern)) are safely skippedRegExpidentifiers are correctly ignored via semantic analysisCloses #8737
Test Plan
just test-lintrule useNamedCaptureGroup— all spec tests passnew RegExp(...),RegExp(...)constructor callsRegExpDocs
Documentation is included as rustdoc examples in the rule implementation with
expect_diagnosticannotations.