Skip to content

feat: restructure codeowner_overrides.json for PR auto-labeling #2683

Draft
aleozlx wants to merge 3 commits into flashinfer-ai:main from aleozlx:auto-triage

Conversation

@aleozlx
Collaborator

@aleozlx aleozlx commented Mar 4, 2026

📌 Description

Restructure codeowner_overrides.json from a flat dict to a grouped array where each group has labels (for PR auto-labeling) and owners (path to owner mappings). All existing path→owner mappings are preserved.
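A minimal sketch of the grouped layout, and of a loader that flattens it into a single path→owners map, might look like this. It assumes Python 3.9+ (for `str.removeprefix`). The function name `load_grouped_overrides`, the sample labels, paths and handles, and the choice to reject duplicate paths are illustrative assumptions, not the PR's actual `load_overrides()`. The stricter type check and the `./`-only normalization follow suggestions made later in this review.

```python
import json
from typing import Dict, List

# Hypothetical sample of the grouped format: each group pairs PR labels
# with a mapping from repository paths to owner handles.
EXAMPLE = """
[
  {"labels": ["op: attention"],
   "owners": {"flashinfer/attention/": ["@alice"], "./tests/attention": ["@bob"]}},
  {"labels": ["op: gemm"],
   "owners": {"flashinfer/gemm/": ["@carol"]}}
]
"""


def load_grouped_overrides(text: str) -> Dict[str, List[str]]:
    """Flatten grouped overrides into {normalized_path: owners} (sketch)."""
    raw = json.loads(text)
    if not isinstance(raw, list):
        raise ValueError("Overrides must be a JSON array of groups")
    overrides: Dict[str, List[str]] = {}
    for group in raw:
        # Check the type of 'owners', not just that the key is present.
        if not isinstance(group, dict) or not isinstance(group.get("owners"), dict):
            raise ValueError("Each group must be an object whose 'owners' is a dictionary")
        for path, users in group["owners"].items():
            # Drop only a literal leading "./" so ".github/..." keeps its dot.
            key = path.removeprefix("./").rstrip("/")
            if key in overrides:
                raise ValueError(f"Duplicate override path across groups: {path!r}")
            overrides[key] = list(users)
    return overrides


if __name__ == "__main__":
    for path, owners in load_grouped_overrides(EXAMPLE).items():
        print(path, owners)
```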

Update & simplify codeowner_analyzer.py

Add pr-auto-label.yml workflow that reads the grouped overrides, matches changed files by path prefix, and additively applies labels. Uses pull_request_target for safe write access on external PRs. Skips PRs older than 21 days.
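The matching step can be approximated in Python as follows. The workflow itself does this in shell, reading the overrides with `jq` and calling `gh pr edit --add-label`. Function and sample names here are hypothetical. The sketch treats each override path as a directory boundary rather than a raw string prefix, which anticipates a review comment further down. It only ever collects labels to add, which mirrors the additive behavior described above.

```python
from typing import Iterable, List, Set

# Hypothetical groups in the same shape as codeowner_overrides.json.
GROUPS = [
    {"labels": ["op: attention"], "owners": {"flashinfer/attention/": ["@alice"]}},
    {"labels": ["op: gemm"], "owners": {"flashinfer/gemm/": ["@carol"]}},
]


def path_matches(changed: str, override_path: str) -> bool:
    """True if `changed` is the override path itself or lies beneath it."""
    base = override_path.rstrip("/")
    return changed == base or changed.startswith(base + "/")


def labels_to_add(groups: List[dict], changed_files: Iterable[str]) -> Set[str]:
    """Union of labels from every group that owns at least one changed file."""
    files = list(changed_files)
    labels: Set[str] = set()
    for group in groups:
        paths = group.get("owners", {})  # iterating a dict yields its path keys
        if any(path_matches(f, p) for f in files for p in paths):
            labels.update(group.get("labels", []))
    return labels


if __name__ == "__main__":
    changed = ["flashinfer/attention/decode.py", "README.md"]
    print(sorted(labels_to_add(GROUPS, changed)))
```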

🔍 Related Issues

🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

✅ Pre-commit Checks

  • I have installed pre-commit by running pip install pre-commit (or used your preferred method).
  • I have installed the hooks with pre-commit install.
  • I have run the hooks manually with pre-commit run --all-files and fixed any reported issues.

If you are unsure about how to set up pre-commit, see the pre-commit documentation.

🧪 Tests

  • Tests have been added or updated as needed.
  • All tests are passing (unittest, etc.).

Reviewer Notes

Summary by CodeRabbit

  • Chores

    • Added automated PR auto-labeling that assigns labels based on changed paths and skips stale PRs.
    • Reorganized codeowner overrides into labeled groups for clearer ownership and improved error messaging during analysis.
  • New Features

    • Test selection now respects PR labels: tests are included or skipped based on label-driven categories, with informative skip messages.

Restructure codeowner_overrides.json from a flat dict to a grouped array
where each group has `labels` (for PR auto-labeling) and `owners` (path
to owner mappings). All existing path→owner mappings are preserved.

Update codeowner_analyzer.py to parse the new format via a standalone
`load_overrides()` function, and simplify CODEOWNERS generation to a
single alphabetically-sorted pass over the union of computed and override
paths. Override owners are prepended on exact matches for a wider owner
list; override-only paths (including directories) are now emitted too.

Add pr-auto-label.yml workflow that reads the grouped overrides, matches
changed files by path prefix, and additively applies labels. Uses
pull_request_target for safe write access on external PRs. Skips PRs
older than 21 days.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
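A minimal sketch of the single sorted pass described in this commit message follows. It assumes both inputs are already-normalized maps from path to owner list. The helper names are hypothetical, and the sketch leaves out the analyzer's path formatting (such as a trailing `/` for directories) and output sanitization. Sorting matters because GitHub applies the last matching CODEOWNERS pattern, and a parent path always sorts before the paths nested under it.

```python
from typing import Dict, List


def merge_owners(override: List[str], computed: List[str]) -> List[str]:
    """Override owners first, then computed ones, de-duplicated case-insensitively."""
    seen, merged = set(), []
    for name in override + computed:
        if name.lower() not in seen:
            seen.add(name.lower())
            merged.append(name)
    return merged


def codeowners_lines(
    computed: Dict[str, List[str]], overrides: Dict[str, List[str]]
) -> List[str]:
    """Emit one line per path in the sorted union of computed and override paths."""
    lines = []
    for path in sorted(set(computed) | set(overrides)):
        owners = merge_owners(overrides.get(path, []), computed.get(path, []))
        if owners:  # paths that end up with no owners are skipped
            lines.append(f"{path} {' '.join(owners)}")
    return lines


if __name__ == "__main__":
    computed = {"flashinfer/gemm": ["@dave", "@Carol"]}
    overrides = {"flashinfer/gemm": ["@carol"], "flashinfer": ["@alice"]}
    print("\n".join(codeowners_lines(computed, overrides)))
```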
@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant enhancement to the repository's automation by enabling automatic labeling of pull requests. This is achieved by restructuring the codeowner_overrides.json file to group code ownership paths with associated labels, and by updating the codeowner_analyzer.py script to process this new format. A new GitHub Actions workflow has been added to utilize this updated configuration for intelligent PR labeling, streamlining the review process and improving organization.

Highlights

  • codeowner_overrides.json structure: Transformed the codeowner_overrides.json file from a flat dictionary to a grouped array of objects, enabling the association of labels with specific code ownership groups for PR auto-labeling.
  • codeowner_analyzer.py updates: Simplified and updated codeowner_analyzer.py to correctly parse the new codeowner_overrides.json format, extract computed usernames, and merge owners with overrides more efficiently.
  • PR auto-labeling workflow: Introduced a new GitHub Actions workflow (pr-auto-label.yml) that leverages the restructured codeowner_overrides.json to automatically apply labels to pull requests based on changed files.


Changelog
  • scripts/codeowner_analyzer.py
    • Added load_overrides function to parse the new grouped JSON array format for codeowner overrides, including validation.
    • Refactored _merge_owners_with_overrides to simplify the merging logic for manual and computed owners, ensuring overrides are prepended and duplicates are handled.
    • Extracted username extraction logic into a new _extract_computed_usernames method for better modularity.
    • Modified generate_codeowners_file to iterate through all relevant paths (computed and override-only) and format them correctly for the CODEOWNERS file, ensuring proper precedence.
    • Updated the main function to use the new load_overrides function, simplifying the override file loading and validation process.
  • scripts/codeowner_overrides.json
    • Restructured the file from a flat dictionary mapping paths to owners into an array of objects, where each object defines a labels array and an owners dictionary, facilitating PR auto-labeling.
Ignored Files
  • Ignored by pattern: .github/workflows/** (1)
    • .github/workflows/pr-auto-label.yml
Activity
  • No specific activity (comments, reviews, progress updates) was provided in the context for this pull request.

@coderabbitai
Contributor

coderabbitai bot commented Mar 4, 2026

📝 Walkthrough

Adds a GitHub Actions workflow to auto-label PRs based on grouped codeowner overrides; restructures overrides JSON and the analyzer to use labeled groups; and makes test selection label-aware so test runs can be filtered by PR labels.

Changes

| Cohort / File(s) | Summary |
|---|---|
| PR Auto-Label Workflow (.github/workflows/pr-auto-label.yml) | New workflow triggered on pull_request_target that skips PRs older than 21 days, sparse-checkouts scripts/codeowner_overrides.json, fetches changed files, and applies labels from matching override groups via gh pr edit. |
| Codeowner analyzer & overrides (scripts/codeowner_analyzer.py, scripts/codeowner_overrides.json) | Reworks overrides from flat map to grouped entries with labels+owners; adds load_overrides() and helper methods, merges overrides with computed owners when emitting CODEOWNERS, reorders emission, and tightens validation/error messages. (Patch contains a duplicated helper definition.) |
| Test filtering & runner (scripts/test_utils.sh, scripts/task_run_unit_tests.sh) | Adds PR-label-aware test filtering: test_utils.sh parses --labels, initializes label filter from grouped overrides and exposes gh_label_filter; task_run_unit_tests.sh uses gh_label_filter when collecting tests and logs skipped files. |

Sequence Diagram(s)

sequenceDiagram
  participant PR as Pull Request
  participant GH as GitHub Actions (pr-auto-label)
  participant Repo as Repository
  participant Overrides as scripts/codeowner_overrides.json
  participant CLI as gh CLI

  PR->>GH: trigger on opened/synchronize
  GH->>GH: check PR age vs 21-day cutoff
  alt PR too old
    GH-->>PR: exit (skip)
  else PR recent
    GH->>Repo: sparse-checkout `scripts/codeowner_overrides.json`
    GH->>CLI: request changed files (GitHub API)
    CLI->>Overrides: load grouped overrides (labels + owners)
    CLI->>CLI: match changed file paths to override prefixes
    CLI->>GH: apply labels via `gh pr edit` for each matched group
    GH-->>PR: labels applied
  end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested labels

automated

Suggested reviewers

  • yzh119
  • sricketts
  • nvmbreughe
  • kahyunnam

Poem

🐰 I hopped through JSON, labels in hand,

Matched file prefixes across the land.
Tests now listen to tags you send,
Groups and overrides neatly blend.
A tiny rabbit cheers this branch amend.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately reflects the primary structural change to codeowner_overrides.json and its use in PR auto-labeling, matching the main objective of the PR. |
| Description check | ✅ Passed | The description includes all required sections from the template with meaningful content: a clear description of changes, related issues placeholder, completed pre-commit checks, and notes on testing status. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 85.71%, which is sufficient. The required threshold is 80.00%. |


Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request effectively restructures codeowner_overrides.json to support PR auto-labeling and refactors codeowner_analyzer.py accordingly. While the code is generally well-structured and cleaner, a security audit identified two critical injection vulnerabilities. The most significant is an injection vulnerability in the CODEOWNERS file generation, allowing arbitrary ownership rule injection via unsanitized paths and usernames with newlines. A second parameter injection vulnerability in the GitHub API lookup logic could enable identity spoofing. Additionally, the validation logic in load_overrides should be improved for robustness against malformed input. Proper sanitization, URL encoding, and enhanced validation are required to address these concerns.

Comment on lines +630 to +631
if final:
    f.write(f"{self._format_codeowners_path(path)} {' '.join(final)}\n")
Contributor

security-high high

The generate_codeowners_file function writes paths and usernames directly into the CODEOWNERS file without sanitizing them for newlines. Since these values can originate from untrusted sources (such as git commit history or the overrides.json file which may be part of a pull request), an attacker can inject arbitrary rules into the CODEOWNERS file. For example, a malicious username or path containing a newline followed by * @attacker would grant the attacker ownership over all files in the repository. This is a high-severity injection vulnerability.

Suggested change
-if final:
-    f.write(f"{self._format_codeowners_path(path)} {' '.join(final)}\n")
+sanitized_path = self._format_codeowners_path(path).replace("\n", "").replace("\r", "")
+sanitized_owners = [u.replace("\n", "").replace("\r", "") for u in final]
+if final:
+    f.write(f"{sanitized_path} {' '.join(sanitized_owners)}\n")

Collaborator Author

check latest

Comment on lines +34 to +35
if not isinstance(group, dict) or "owners" not in group:
    raise ValueError("Each group must be an object with an 'owners' key")
Contributor

medium

The validation logic can be made more robust. Currently, if group['owners'] is not a dictionary (e.g., null or an array in the JSON), the script will raise an AttributeError on .items(), which is not caught in main. It's better to validate the type of group['owners'] here to provide a clear error message and prevent the script from crashing.

        if not isinstance(group, dict) or not isinstance(group.get("owners"), dict):
            raise ValueError(
                "Each group must be an object with an 'owners' key containing a dictionary"
            )

Collaborator Author

check latest

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (3)
scripts/codeowner_analyzer.py (1)

552-567: Consider defensive handling for malformed author strings.

Line 563 assumes the author string is in "Name <email>" format. While this should always be true for data from git log, a malformed entry would raise IndexError.

♻️ Optional defensive approach
             gh = self.get_github_username(owner["author"])
-            name = f"@{gh}" if gh else owner["author"].split("<")[1].rstrip(">")
+            if gh:
+                name = f"@{gh}"
+            else:
+                # Fallback to email from author string "Name <email>"
+                parts = owner["author"].split("<")
+                name = parts[1].rstrip(">") if len(parts) > 1 else owner["author"]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/codeowner_analyzer.py` around lines 552 - 567, In
_extract_computed_usernames: guard against malformed owner["author"] values
before assuming a "<email>" part to avoid IndexError; update the logic in
_extract_computed_usernames to check whether owner["author"] contains "<" and
">" (or use a regex to extract the email portion) and fall back to a safe
display name (e.g., the whole author string or the result of
get_github_username) when the angle-bracket format is missing or split fails;
ensure the deduplication (seen_lower) still works by normalizing the chosen
fallback name.
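Here is a sketch of the regex-based fallback the prompt suggests. `display_name`, `unique_names`, and the `lookup` callable (standing in for `get_github_username`) are hypothetical names. The sketch is not the analyzer's actual `_extract_computed_usernames`.

```python
import re
from typing import Callable, Iterable, List, Optional

# Matches a trailing "<email>" as in git's "Name <email>" author strings.
_EMAIL_RE = re.compile(r"<([^<>]+)>\s*$")


def display_name(author: str, github_user: Optional[str] = None) -> str:
    """Prefer @handle, then the bracketed email, then the raw author string."""
    if github_user:
        return f"@{github_user}"
    match = _EMAIL_RE.search(author)
    return match.group(1) if match else author.strip()


def unique_names(
    authors: Iterable[str], lookup: Callable[[str], Optional[str]]
) -> List[str]:
    """Resolve each author and de-duplicate case-insensitively, keeping order."""
    seen_lower, names = set(), []
    for author in authors:
        name = display_name(author, lookup(author))
        if name.lower() not in seen_lower:
            seen_lower.add(name.lower())
            names.append(name)
    return names


if __name__ == "__main__":
    authors = ["Ada <ada@example.com>", "ADA <Ada@Example.com>", "no-email"]
    print(unique_names(authors, lambda a: None))
```

A malformed entry such as `no-email` now falls back to the raw string instead of raising IndexError, and the case-insensitive `seen_lower` set still collapses the two spellings of the same address.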
.github/workflows/pr-auto-label.yml (2)

21-24: Consider using a neutral exit instead of failing the workflow.

Using exit 1 marks the workflow run as failed in the PR checks, which can be noisy for old PRs. Consider using exit 0 or a GitHub Actions mechanism to skip subsequent steps gracefully.

♻️ Suggested approach using step outputs
       - name: Skip PRs older than 21 days
+        id: age-check
         env:
           PR_CREATED: ${{ github.event.pull_request.created_at }}
         run: |
           created=$(date -d "$PR_CREATED" +%s)
           cutoff=$(date -d '21 days ago' +%s)
           if [ "$created" -lt "$cutoff" ]; then
             echo "PR older than 21 days, skipping"
-            exit 1
+            echo "skip=true" >> "$GITHUB_OUTPUT"
+          else
+            echo "skip=false" >> "$GITHUB_OUTPUT"
           fi
 
       - name: Sparse checkout overrides file from base branch
+        if: steps.age-check.outputs.skip != 'true'
         uses: actions/checkout@v4

Then add the same if condition to the "Apply labels" step.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/pr-auto-label.yml around lines 21 - 24, The branch that
checks if [ "$created" -lt "$cutoff" ] should not fail the whole workflow;
replace the failing exit (exit 1) with a neutral skip mechanism such as exit 0
or, better, set a step output (e.g., SKIP_LABELS=true) when the condition
matches and keep the step printing "PR older than 21 days, skipping"; then
update the "Apply labels" step to guard on that output (skip when SKIP_LABELS is
true) so the run is not marked failed while still preventing label application.
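This Python sketch checks the cutoff comparison in isolation. It mirrors the `date -d` arithmetic above and prints the same `skip=` value that the suggested step writes to `$GITHUB_OUTPUT`. The function name is hypothetical. In the workflow, the timestamp comes from `github.event.pull_request.created_at`.

```python
from datetime import datetime, timedelta, timezone


def is_stale(created_at: str, now: datetime, max_age_days: int = 21) -> bool:
    """True if an ISO-8601 UTC timestamp is older than `max_age_days` before `now`."""
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on.
    created = datetime.fromisoformat(created_at.replace("Z", "+00:00"))
    return created < now - timedelta(days=max_age_days)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    created_at = "2026-03-04T00:00:00Z"  # illustrative value
    print("skip=true" if is_stale(created_at, now) else "skip=false")
```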

3-5: Concurrent label mutations on synchronize events.

Both this workflow and pr-label-cleanup.yml trigger on synchronize and call gh pr edit to modify labels. While they operate on different labels (this adds op:* labels; cleanup removes run-ci), concurrent API calls could theoretically conflict. In practice, GitHub's API handles this gracefully, but be aware of this if adding more label operations.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/pr-auto-label.yml around lines 3 - 5, This workflow and
pr-label-cleanup.yml both run on pull_request_target with types: [synchronize]
and both call gh pr edit to mutate labels (this one adds op:* labels; the other
removes run-ci), risking conflicting concurrent edits; to fix, stop overlapping
label writes by either removing "synchronize" from one workflow (so only one
runs on updates), consolidating label logic into a single workflow, or changing
the label mutation step to read the current labels and submit a single atomic
set (fetch labels then use gh pr edit --add/--remove or the REST API to set the
full label list) so edits are deterministic; update the workflow trigger or the
label-edit step accordingly to reference the pull_request_target trigger and the
gh pr edit label operations.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/pr-auto-label.yml:
- Around line 42-62: The loops over $labels and $prefixes use word-splitting
which breaks labels with spaces; change iteration to preserve items (e.g., read
labels and prefixes into arrays or pipe the jq arrays into while read -r loops)
and iterate using quoted expansions so each label/prefix is treated as a single
entry (avoid constructs like for label in $labels; use for label in
"${labels[@]}" or while read -r label; do ... done). Update the sections that
populate/iterate variables labels, prefixes and the inner loop over
changed_files (and the gh pr edit --add-label call) to use quoted variables
("$label", "$prefix", "$file") and array-aware iteration to prevent splitting on
spaces.


ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a0a3840a-9806-44a2-86e5-7412092c95d3

📥 Commits

Reviewing files that changed from the base of the PR and between cb593c8 and b3686a0.

📒 Files selected for processing (3)
  • .github/workflows/pr-auto-label.yml
  • scripts/codeowner_analyzer.py
  • scripts/codeowner_overrides.json

aleozlx and others added 2 commits March 3, 2026 19:10
Add --labels option to test_utils.sh parse_args() accepting
comma-separated PR labels (e.g., "op: attention,op: gemm").

Implement gh_label_filter() which reads codeowner_overrides.json to
determine which test prefixes are categorized under which labels.
Four-step logic: (1) no labels → run all, (2) label match → run,
(3) categorized but no match → skip, (4) uncategorized → always run.

Wire the filter into task_run_unit_tests.sh find_test_files() so
discovered test files are filtered before execution.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Sanitize paths and usernames against newline injection in CODEOWNERS
  output (high severity)
- Validate group['owners'] is a dict, not just present (medium)
- Use readarray + quoted array expansion in pr-auto-label.yml to handle
  labels with spaces (e.g., "op: attention")

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
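The four-step decision from the first of the two commits above can be sketched in Python as follows. Several details are assumptions: the names, the sample prefix-to-label map, and the directory-boundary match (used instead of the shell's `"$prefix"*` glob, which a later review comment flags). The skip message mirrors the one printed by `gh_label_filter`.

```python
from typing import Dict, List, Optional

# Hypothetical test-prefix -> labels map, as derived from the overrides file.
PREFIX_LABELS: Dict[str, List[str]] = {
    "tests/attention": ["op: attention"],
    "tests/gemm": ["op: gemm"],
}


def parse_labels(value: str) -> List[str]:
    """Split a --labels value such as 'op: attention,op: gemm' on commas."""
    return [part.strip() for part in value.split(",") if part.strip()]


def _under(path: str, prefix: str) -> bool:
    base = prefix.rstrip("/")
    return path == base or path.startswith(base + "/")


def label_filter(
    test_file: str, labels: List[str], prefix_labels: Dict[str, List[str]]
) -> Optional[str]:
    """Return None to run `test_file`, or a reason string to skip it."""
    if not labels:  # (1) no labels given: run everything
        return None
    categorized = False
    for prefix, owning in prefix_labels.items():
        if _under(test_file, prefix):
            categorized = True
            if set(owning) & set(labels):  # (2) a label matches: run
                return None
    if categorized:  # (3) categorized, but no label matches: skip
        return f"no matching label (have: {','.join(labels)})"
    return None  # (4) uncategorized: always run


if __name__ == "__main__":
    labels = parse_labels("op: attention")
    for f in ["tests/attention/test_decode.py", "tests/gemm/test_fp8.py", "tests/test_misc.py"]:
        print(f, label_filter(f, labels, PREFIX_LABELS) or "run")
```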
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/test_utils.sh`:
- Around line 120-121: The prefix check using [[ "$test_file" == "$prefix"* ]]
is unsafe because it matches sibling names; update the logic in the
functions/blocks that use variables test_file and prefix (the checks at the
shown occurrences) to only match when test_file is exactly equal to prefix or
when test_file starts with prefix plus a path separator (i.e., treat prefix as a
directory boundary), and apply the same change to the second occurrence around
lines 128-130 so sibling files like tests/comm_test.py do not match a prefix
intended for tests/comm.
- Around line 54-57: The --labels case in the argument-parsing switch assigns
LABELS and shifts without validation; update the --labels branch to first verify
that a next positional argument exists and does not start with a hyphen (so it
isn't another flag), and if validation fails print a useful error/usage message
and exit non-zero; only after successful validation perform the shift and set
LABELS="$1" (keep the switch label --labels and variable name LABELS so it's
easy to find).

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 62225627-8d28-43a4-aaeb-fd1166454f84

📥 Commits

Reviewing files that changed from the base of the PR and between b3686a0 and b22fa5f.

📒 Files selected for processing (2)
  • scripts/task_run_unit_tests.sh
  • scripts/test_utils.sh

Comment on lines +54 to +57
--labels)
    shift
    LABELS="$1"
    ;;
Contributor

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat scripts/test_utils.sh | head -100 | tail -50

Repository: flashinfer-ai/flashinfer

Length of output: 1716


🏁 Script executed:

#!/bin/bash
set -euo pipefail

python3 <<'PY'
from pathlib import Path

# Read the file
content = Path("scripts/test_utils.sh").read_text()
lines = content.splitlines()

# Find the --labels block
for i, line in enumerate(lines):
    if "--labels)" in line:
        start_line = i + 1  # 1-indexed
        # Find the ;; that closes this case
        for j in range(i, len(lines)):
            if ";;" in lines[j]:
                end_line = j + 1  # 1-indexed
                break
        
        print("Found --labels block at lines", start_line, "-", end_line)
        print("\nCode:")
        for n in range(start_line - 1, end_line):
            print(f"{n+1}: {lines[n]}")
        
        # Check for guards
        block_text = "\n".join(lines[start_line-1:end_line])
        has_guard = ("-lt 2" in block_text or 
                     "requires a" in block_text or
                     "shift 2" in block_text or
                     "[ $# -" in block_text or
                     "[[ " in block_text)
        
        print(f"\nGuard/validation present: {has_guard}")
        break
PY

Repository: flashinfer-ai/flashinfer

Length of output: 252


🏁 Script executed:

# Verify the loop structure to confirm double-shift pattern
sed -n '50,65p' scripts/test_utils.sh

Repository: flashinfer-ai/flashinfer

Length of output: 446


Validate --labels has a value before shifting.

Lines 55–57 shift and assign without checking if a value exists. When --labels is the last argument, the second shift at line 60 fails. When --labels precedes another flag (e.g., --labels --threads), that flag is incorrectly assigned to LABELS.

Suggested fix
             --labels)
+                if [ $# -lt 2 ] || [[ "$2" == --* ]]; then
+                    echo "ERROR: --labels requires a comma-separated value" >&2
+                    exit 2
+                fi
                 shift
                 LABELS="$1"
                 ;;

Comment on lines +120 to +121
if [[ "$test_file" == "$prefix"* ]]; then
    return 0
Contributor

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's find and examine the test_utils.sh file
find . -name "test_utils.sh" -type f

Repository: flashinfer-ai/flashinfer

Length of output: 91


🏁 Script executed:

# Let's examine the file size first, then read the relevant sections
wc -l ./scripts/test_utils.sh
echo "---"
# Read lines around 120-121 and 128-130
sed -n '115,135p' ./scripts/test_utils.sh

Repository: flashinfer-ai/flashinfer

Length of output: 795


🏁 Script executed:

# Let's run the validation script from the review comment to verify the issue
cd ./scripts && python - <<'PY'
import json, pathlib

overrides = json.loads(pathlib.Path("codeowner_overrides.json").read_text())
prefixes = sorted({
    p.rstrip("/")
    for group in overrides
    for p in group.get("owners", {}).keys()
    if p.startswith("tests/")
})

files = [str(p).replace("\\", "/") for p in pathlib.Path("../tests").rglob("test_*.py")]

ambiguous = False
for prefix in prefixes:
    loose = [f for f in files if f.startswith(prefix)]
    strict = [f for f in files if f == prefix or f.startswith(prefix + "/")]
    extras = sorted(set(loose) - set(strict))
    if extras:
        ambiguous = True
        print(f"[ambiguous] {prefix}")
        for f in extras[:10]:
            print("  ", f)

if not ambiguous:
    print("No ambiguous matches found with current test files.")
PY

Repository: flashinfer-ai/flashinfer

Length of output: 119


🏁 Script executed:

# Let's check the codeowner_overrides.json to see what prefixes exist
cd ./scripts && cat codeowner_overrides.json | python -m json.tool | head -100

Repository: flashinfer-ai/flashinfer

Length of output: 2176


Use boundary-safe prefix checks for test paths.

Lines 120 and 128 use loose prefix matching with "$prefix"*, which can misclassify sibling files (e.g., tests/comm_test.py would incorrectly match the prefix tests/comm intended only for the directory). The current test files don't exhibit this issue, but the logic is vulnerable.

💡 Suggested fix
-        if [[ "$test_file" == "$prefix"* ]]; then
+        if [[ "$test_file" == "$prefix" || "$test_file" == "$prefix/"* ]]; then
             return 0
         fi
@@
-        if [[ "$test_file" == "$prefix"* ]]; then
+        if [[ "$test_file" == "$prefix" || "$test_file" == "$prefix/"* ]]; then
             echo "no matching label (have: $LABELS)"
             return 1
         fi

Also applies to: 128-130
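Plain string predicates are enough to reproduce the difference between the two tests. The helper names and file names are illustrative only. `boundary_match` also strips a trailing `/` from the prefix, an assumption that matters if override keys are written as `tests/comm/`.

```python
def loose_match(path: str, prefix: str) -> bool:
    # Same semantics as [[ "$test_file" == "$prefix"* ]]
    return path.startswith(prefix)


def boundary_match(path: str, prefix: str) -> bool:
    # Same semantics as [[ "$f" == "$prefix" || "$f" == "$prefix/"* ]],
    # after dropping any trailing "/" from the prefix.
    base = prefix.rstrip("/")
    return path == base or path.startswith(base + "/")


if __name__ == "__main__":
    for path in ["tests/comm", "tests/comm/test_allreduce.py", "tests/comm_test.py"]:
        print(f"{path:32} loose={loose_match(path, 'tests/comm')} "
              f"boundary={boundary_match(path, 'tests/comm')}")
```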


Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/codeowner_analyzer.py`:
- Around line 33-42: When building the overrides dict in
scripts/codeowner_analyzer.py (inside the for group in raw loop that iterates
group["owners"]), detect if a normalized path key already exists in overrides
and do not silently overwrite it; either raise a ValueError identifying the
duplicate path (including the path string) or merge the owner lists with
de-duplication so owners from both groups are preserved. Update the logic around
overrides[path.lstrip("./").rstrip("/")] = users to check
overrides.get(path_key) first and handle duplicates accordingly, referencing the
overrides variable and the group["owners"] iteration to locate the change.
- Around line 595-598: _format_codeowners_path currently appends "/" based on
_is_file_path which misclassifies extensionless filenames (e.g., Dockerfile,
LICENSE) as directories; change the logic so that _format_codeowners_path (or
_is_file_path) treats a path as a file if the basename contains a dot OR the
basename matches a small whitelist of common extensionless filenames (e.g.,
"Dockerfile","LICENSE","README","Makefile") and also respect an explicit
trailing "/" (i.e., if path.endswith("/") treat as directory), then return path
or f"{path}/" accordingly; update the function that calls or implements
_is_file_path (_format_codeowners_path) to use this combined rule.
- Line 41: The path normalization currently uses path.lstrip("./").rstrip("/")
which strips any leading dots or slashes and corrupts dot-prefixed paths; change
the normalization to only remove a leading "./" when present (e.g. use
path.removeprefix("./").rstrip("/") or an equivalent startswith check) so that
entries like ".github/workflows" keep their leading dot; update the assignment
that sets overrides[...] = users accordingly (look for the overrides[...]
assignment using the path variable).

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3696939f-5eef-48f2-9828-e6d6745176a9

📥 Commits

Reviewing files that changed from the base of the PR and between b22fa5f and 1489253.

📒 Files selected for processing (2)
  • .github/workflows/pr-auto-label.yml
  • scripts/codeowner_analyzer.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • .github/workflows/pr-auto-label.yml

Comment on lines +33 to +42
for group in raw:
    if not isinstance(group, dict) or not isinstance(group.get("owners"), dict):
        raise ValueError("Each group must be an object with an 'owners' dict")
    for path, users in group["owners"].items():
        if not isinstance(users, list) or not all(
            isinstance(u, str) for u in users
        ):
            raise ValueError(f"Override for '{path}' must be a list of strings")
        overrides[path.lstrip("./").rstrip("/")] = users

Contributor


⚠️ Potential issue | 🟠 Major

Duplicate paths across groups are silently overwritten.

If the same path appears in multiple groups, the later group replaces earlier owners without warning. That can drop intended owners.

🔧 Proposed fix (this diff assumes the normalized_path variable introduced by the lstrip fix further down)
-            overrides[normalized_path] = users
+            if normalized_path in overrides and overrides[normalized_path] != users:
+                raise ValueError(
+                    f"Duplicate override path with conflicting owners: '{normalized_path}'"
+                )
+            overrides[normalized_path] = users
🧰 Tools
🪛 Ruff (0.15.2)

[warning] 35-35: Prefer TypeError exception for invalid type

(TRY004)


[warning] 35-35: Avoid specifying long messages outside the exception class

(TRY003)


[warning] 40-40: Avoid specifying long messages outside the exception class

(TRY003)
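Both Ruff hints can be addressed together. The sketch below assumes that switching the isinstance failures from ValueError to TypeError is acceptable (any caller that catches ValueError would need updating); the exception classes and validate_group are hypothetical names, not part of the analyzer.

```python
from typing import Any


class InvalidGroupError(TypeError):
    """A group entry is not an object with an 'owners' dict (TRY004: TypeError)."""

    def __init__(self) -> None:
        # The message lives inside the class, which is what TRY003 asks for.
        super().__init__("Each group must be an object with an 'owners' dict")


class InvalidOwnersError(TypeError):
    """An override's owners value is not a list of strings."""

    def __init__(self, path: str) -> None:
        super().__init__(f"Override for '{path}' must be a list of strings")
        self.path = path


def validate_group(group: Any) -> None:
    """Validate one override group, raising the typed errors above."""
    if not isinstance(group, dict) or not isinstance(group.get("owners"), dict):
        raise InvalidGroupError
    for path, users in group["owners"].items():
        if not isinstance(users, list) or not all(isinstance(u, str) for u in users):
            raise InvalidOwnersError(path)
```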

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/codeowner_analyzer.py` around lines 33 - 42, When building the
overrides dict in scripts/codeowner_analyzer.py (inside the for group in raw
loop that iterates group["owners"]), detect if a normalized path key already
exists in overrides and do not silently overwrite it; either raise a ValueError
identifying the duplicate path (including the path string) or merge the owner
lists with de-duplication so owners from both groups are preserved. Update the
logic around overrides[path.lstrip("./").rstrip("/")] = users to check
overrides.get(path_key) first and handle duplicates accordingly, referencing the
overrides variable and the group["owners"] iteration to locate the change.
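A minimal sketch of the conflict check described in this comment, assuming the grouped format from the quoted code (a list of groups, each with an "owners" mapping). flatten_overrides is a hypothetical name, type validation is omitted for brevity, and paths are normalized with str.removeprefix (Python 3.9+) instead of lstrip, for the reason given in the lstrip comment in this review. Identical owner lists for the same path are accepted; conflicting ones raise.

```python
import json


def flatten_overrides(raw: list[dict]) -> dict[str, list[str]]:
    """Flatten grouped overrides into {path: owners}, rejecting conflicts."""
    overrides: dict[str, list[str]] = {}
    for group in raw:
        for path, users in group["owners"].items():
            # Drop a single leading "./" (not lstrip) so ".github/..." survives.
            key = path.removeprefix("./").rstrip("/")
            existing = overrides.get(key)
            if existing is not None and existing != users:
                raise ValueError(
                    f"Duplicate override path with conflicting owners: '{key}'"
                )
            overrides[key] = users
    return overrides


if __name__ == "__main__":
    # Hypothetical config: the same directory appears in two groups.
    raw = json.loads(
        """[
          {"labels": ["comm"], "owners": {"flashinfer/comm/": ["alice"]}},
          {"labels": ["ci"], "owners": {"./flashinfer/comm": ["bob"]}}
        ]"""
    )
    try:
        flatten_overrides(raw)
    except ValueError as err:
        print(err)
```

To merge instead of failing, the raise could be replaced with overrides[key] = list(dict.fromkeys(existing + users)), which unions both owner lists while preserving their order.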

            isinstance(u, str) for u in users
        ):
            raise ValueError(f"Override for '{path}' must be a list of strings")
        overrides[path.lstrip("./").rstrip("/")] = users

Contributor


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# First, explore the file structure and read the relevant section
head -60 scripts/codeowner_analyzer.py | tail -30



🏁 Script executed:

#!/bin/bash
# Get full context around line 41
sed -n '30,55p' scripts/codeowner_analyzer.py



🏁 Script executed:

#!/bin/bash
# Check how overrides dict is used in the codebase
rg -n "overrides" scripts/codeowner_analyzer.py -A 2 -B 2



🏁 Script executed:

#!/bin/bash
# Verify the lstrip behavior with actual test cases
python - <<'PY'
# Demonstrate lstrip behavior with dot-prefixed paths
test_cases = [
    ".github/workflows",
    "./src/module", 
    "/docs",
    "/.hidden",
    ".hidden/file",
    "normal/path"
]

print("Testing lstrip('./'):")
for path in test_cases:
    result = path.lstrip("./")
    print(f"  {path!r:25} -> {result!r}")

print("\nTesting strip() then manual prefix removal:")
for path in test_cases:
    normalized = path.strip()
    if normalized.startswith("./"):
        normalized = normalized[2:]
    elif normalized.startswith("/"):
        normalized = normalized[1:]
    normalized = normalized.rstrip("/")
    print(f"  {path!r:25} -> {normalized!r}")
PY



lstrip("./") corrupts dot-prefixed override paths, breaking override matching.

Line 41 uses lstrip("./"), which strips any leading run of "." and "/" characters rather than only the literal "./" prefix. This turns valid paths like .github/workflows into github/workflows, so override lookups fail silently and the manual overrides for dot-prefixed paths (.github, .hidden, etc.) never reach the CODEOWNERS file.

🔧 Proposed fix
-            overrides[path.lstrip("./").rstrip("/")] = users
+            normalized_path = path.strip()
+            if normalized_path.startswith("./"):
+                normalized_path = normalized_path[2:]
+            elif normalized_path.startswith("/"):
+                normalized_path = normalized_path[1:]
+            normalized_path = normalized_path.rstrip("/")
+            if not normalized_path:
+                raise ValueError("Override path cannot be empty")
+            overrides[normalized_path] = users
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/codeowner_analyzer.py` at line 41, The path normalization currently
uses path.lstrip("./").rstrip("/") which strips any leading dots or slashes and
corrupts dot-prefixed paths; change the normalization to only remove a leading
"./" when present (e.g. use path.removeprefix("./").rstrip("/") or an equivalent
startswith check) so that entries like ".github/workflows" keep their leading
dot; update the assignment that sets overrides[...] = users accordingly (look
for the overrides[...] assignment using the path variable).

Comment on lines +595 to +598
    def _format_codeowners_path(self, path: str) -> str:
        """Format a path for CODEOWNERS (append / for directories)."""
        return path if self._is_file_path(path) else f"{path}/"

Contributor


⚠️ Potential issue | 🟠 Major

Extensionless files are misclassified as directories.

_is_file_path relies on a dot in the basename, so extensionless files such as Dockerfile or LICENSE get a trailing /, producing incorrect CODEOWNERS patterns.

🔧 Proposed fix
     def _format_codeowners_path(self, path: str) -> str:
-        """Format a path for CODEOWNERS (append / for directories)."""
-        return path if self._is_file_path(path) else f"{path}/"
+        """Format a path for CODEOWNERS (append / only for real directories)."""
+        normalized = path.rstrip("/")
+        abs_path = self.repo_path / normalized
+        if abs_path.is_dir():
+            return f"{normalized}/"
+        return normalized
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/codeowner_analyzer.py` around lines 595 - 598,
_format_codeowners_path currently appends "/" based on _is_file_path which
misclassifies extensionless filenames (e.g., Dockerfile, LICENSE) as
directories; change the logic so that _format_codeowners_path (or _is_file_path)
treats a path as a file if the basename contains a dot OR the basename matches a
small whitelist of common extensionless filenames (e.g.,
"Dockerfile","LICENSE","README","Makefile") and also respect an explicit
trailing "/" (i.e., if path.endswith("/") treat as directory), then return path
or f"{path}/" accordingly; update the function that calls or implements
_is_file_path (_format_codeowners_path) to use this combined rule.
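The proposed fix checks the filesystem, while the AI prompt describes a whitelist heuristic. A sketch combining both as a standalone function is below; it is hypothetical (the analyzer's real method lives on its class and uses self.repo_path), and the fallback only applies to paths missing from the checkout, where it is still a guess: an absent dotfile such as .gitignore would be treated as a directory.

```python
import tempfile
from pathlib import Path

# Assumed whitelist of common extensionless file names (from the review prompt).
EXTENSIONLESS_FILES = {"Dockerfile", "LICENSE", "README", "Makefile"}


def format_codeowners_path(repo_root: Path, path: str) -> str:
    """Return a CODEOWNERS pattern, appending '/' only for directories."""
    explicit_dir = path.endswith("/")
    normalized = path.rstrip("/")
    target = repo_root / normalized
    if target.exists():
        # Prefer the filesystem whenever the path exists in the checkout.
        return f"{normalized}/" if target.is_dir() else normalized
    if explicit_dir:
        return f"{normalized}/"
    # Fallback heuristic: an extension or a known extensionless name means file.
    # Path(".github").suffix is "", so dot-prefixed names count as directories.
    if Path(normalized).suffix or Path(normalized).name in EXTENSIONLESS_FILES:
        return normalized
    return f"{normalized}/"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "csrc").mkdir()
        (root / "VERSION").write_text("0.0.0\n")  # extensionless, not whitelisted
        for p in ["csrc", "VERSION", "docker/Dockerfile", "flashinfer/comm", "docs/"]:
            print(p, "->", format_codeowners_path(root, p))
```

Checking the filesystem first keeps real files and directories exact; the whitelist only matters for override paths that are not present locally.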

@yzh119
Collaborator

yzh119 commented Mar 4, 2026

@aleozlx seems recent PRs have marked all org members as reviewers, is it expected? Could this PR fix the issue?

@aleozlx
Collaborator Author

aleozlx commented Mar 4, 2026

@aleozlx seems recent PRs have marked all org members as reviewers, is it expected? Could this PR fix the issue?

Hi, I think I saw some, but not on every new PR. I clicked into a few recent ones and they seemed reasonable,
but I'd like to understand more. Are there specific PRs? I can take a look at what might have caused it.

@bkryu
Collaborator

bkryu commented Mar 4, 2026

@aleozlx seems recent PRs have marked all org members as reviewers, is it expected? Could this PR fix the issue?

Hi, I think I saw some, but not on every new PR. I clicked into a few recent ones and they seemed reasonable, but I'd like to understand more. Are there specific PRs? I can take a look at what might have caused it.

It appears that .github/workflows/pr-auto-label.yml is detected as a file owned by all org members. Maybe that's why?

[Screenshot 2026-03-04 at 3:33:10 PM]

@aleozlx
Collaborator Author

aleozlx commented Mar 4, 2026

@bkryu found out this line may have caused the issue
f521fe1#diff-3d36a1bf06148bc6ba1ce2ed3d19de32ea708d955fed212c0d27c536f0bd4da7R7

(right-hand side, line 7)

I can investigate what produced that line in codeowner_analyzer.py and figure out a fix if it's not expected, @yzh119

@aleozlx aleozlx marked this pull request as draft March 18, 2026 21:35
@aleozlx
Collaborator Author

aleozlx commented Mar 18, 2026

Will revisit this for targeted testing after I implement #2815; then we can do label-based targeted testing.


Labels: None yet

Projects: None yet

3 participants