feat: restructure codeowner_overrides.json for PR auto-labeling by aleozlx · Pull Request #2683 · flashinfer-ai/flashinfer

aleozlx · 2026-03-04T02:47:31Z

📌 Description

Restructure codeowner_overrides.json from a flat dict to a grouped array where each group has labels (for PR auto-labeling) and owners (path to owner mappings). All existing path→owner mappings are preserved.

Update & simplify codeowner_analyzer.py

Add pr-auto-label.yml workflow that reads the grouped overrides, matches changed files by path prefix, and additively applies labels. Uses pull_request_target for safe write access on external PRs. Skips PRs older than 21 days.

🔍 Related Issues

🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

✅ Pre-commit Checks

I have installed pre-commit by running pip install pre-commit (or used your preferred method).
I have installed the hooks with pre-commit install.
I have run the hooks manually with pre-commit run --all-files and fixed any reported issues.

If you are unsure about how to set up pre-commit, see the pre-commit documentation.

🧪 Tests

Tests have been added or updated as needed.
All tests are passing (unittest, etc.).

Reviewer Notes

Summary by CodeRabbit

Chores
- Added automated PR auto-labeling that assigns labels based on changed paths and skips stale PRs.
- Reorganized codeowner overrides into labeled groups for clearer ownership and improved error messaging during analysis.
New Features
- Test selection now respects PR labels—tests are included or skipped based on label-driven categories, with informative skip messages.

Restructure codeowner_overrides.json from a flat dict to a grouped array where each group has `labels` (for PR auto-labeling) and `owners` (path to owner mappings). All existing path→owner mappings are preserved. Update codeowner_analyzer.py to parse the new format via a standalone `load_overrides()` function, and simplify CODEOWNERS generation to a single alphabetically-sorted pass over the union of computed and override paths. Override owners are prepended on exact matches for a wider owner list; override-only paths (including directories) are now emitted too. Add pr-auto-label.yml workflow that reads the grouped overrides, matches changed files by path prefix, and additively applies labels. Uses pull_request_target for safe write access on external PRs. Skips PRs older than 21 days. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

gemini-code-assist · 2026-03-04T02:47:48Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant enhancement to the repository's automation by enabling automatic labeling of pull requests. This is achieved by restructuring the codeowner_overrides.json file to group code ownership paths with associated labels, and by updating the codeowner_analyzer.py script to process this new format. A new GitHub Actions workflow has been added to utilize this updated configuration for intelligent PR labeling, streamlining the review process and improving organization.

Highlights

codeowner_overrides.json structure: Transformed the codeowner_overrides.json file from a flat dictionary to a grouped array of objects, enabling the association of labels with specific code ownership groups for PR auto-labeling.
codeowner_analyzer.py updates: Simplified and updated codeowner_analyzer.py to correctly parse the new codeowner_overrides.json format, extract computed usernames, and merge owners with overrides more efficiently.
PR auto-labeling workflow: Introduced a new GitHub Actions workflow (pr-auto-label.yml) that leverages the restructured codeowner_overrides.json to automatically apply labels to pull requests based on changed files.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Changelog

scripts/codeowner_analyzer.py
- Added load_overrides function to parse the new grouped JSON array format for codeowner overrides, including validation.
- Refactored _merge_owners_with_overrides to simplify the merging logic for manual and computed owners, ensuring overrides are prepended and duplicates are handled.
- Extracted username extraction logic into a new _extract_computed_usernames method for better modularity.
- Modified generate_codeowners_file to iterate through all relevant paths (computed and override-only) and format them correctly for the CODEOWNERS file, ensuring proper precedence.
- Updated the main function to use the new load_overrides function, simplifying the override file loading and validation process.
scripts/codeowner_overrides.json
- Restructured the file from a flat dictionary mapping paths to owners into an array of objects, where each object defines a labels array and an owners dictionary, facilitating PR auto-labeling.

Ignored Files

Ignored by pattern: .github/workflows/** (1)
- .github/workflows/pr-auto-label.yml

Activity

No specific activity (comments, reviews, progress updates) was provided in the context for this pull request.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

coderabbitai · 2026-03-04T02:47:49Z

📝 Walkthrough

Walkthrough

Adds a GitHub Actions workflow to auto-label PRs based on grouped codeowner overrides; restructures overrides JSON and the analyzer to use labeled groups; and makes test selection label-aware so test runs can be filtered by PR labels.

Changes

Cohort / File(s)	Summary
PR Auto-Label Workflow `.github/workflows/pr-auto-label.yml`	New workflow triggered on `pull_request_target` that skips PRs older than 21 days, sparse-checkouts `scripts/codeowner_overrides.json`, fetches changed files, and applies labels from matching override groups via `gh pr edit`.
Codeowner analyzer & overrides `scripts/codeowner_analyzer.py`, `scripts/codeowner_overrides.json`	Reworks overrides from flat map to grouped entries with `labels`+`owners`; adds `load_overrides()` and helper methods, merges overrides with computed owners when emitting CODEOWNERS, reorders emission, and tightens validation/error messages. (Patch contains a duplicated helper definition.)
Test filtering & runner `scripts/test_utils.sh`, `scripts/task_run_unit_tests.sh`	Adds PR-label-aware test filtering: `test_utils.sh` parses `--labels`, initializes label filter from grouped overrides and exposes `gh_label_filter`; `task_run_unit_tests.sh` uses `gh_label_filter` when collecting tests and logs skipped files.

Sequence Diagram(s)

sequenceDiagram
  participant PR as Pull Request
  participant GH as GitHub Actions (pr-auto-label)
  participant Repo as Repository
  participant Overrides as scripts/codeowner_overrides.json
  participant CLI as gh CLI

  PR->>GH: trigger on opened/synchronize
  GH->>GH: check PR age vs 21-day cutoff
  alt PR too old
    GH-->>PR: exit (skip)
  else PR recent
    GH->>Repo: sparse-checkout `scripts/codeowner_overrides.json`
    GH->>CLI: request changed files (GitHub API)
    CLI->>Overrides: load grouped overrides (labels + owners)
    CLI->>CLI: match changed file paths to override prefixes
    CLI->>GH: apply labels via `gh pr edit` for each matched group
    GH-->>PR: labels applied
  end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

chore: Update CODEOWNERS #2152: updates CODEOWNERS generation and relates to analyzer/overrides changes.
chore: Update CODEOWNERS #2286: consumes/depends on scripts/codeowner_overrides.json format for CODEOWNERS output.
infra: add manual code owner override support in codeowner_analyzer.py #2418: modifies scripts/codeowner_analyzer.py override handling and merging logic.

Suggested labels

automated

Suggested reviewers

yzh119
sricketts
nvmbreughe
kahyunnam

Poem

🐰 I hopped through JSON, labels in hand,

Matched file prefixes across the land.
Tests now listen to tags you send,
Groups and overrides neatly blend.
A tiny rabbit cheers this branch amend.

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately reflects the primary structural change to codeowner_overrides.json and its use in PR auto-labeling, matching the main objective of the PR.
Description check	✅ Passed	The description includes all required sections from the template with meaningful content: a clear description of changes, related issues placeholder, completed pre-commit checks, and notes on testing status.
Docstring Coverage	✅ Passed	Docstring coverage is 85.71% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

gemini-code-assist

Code Review

This pull request effectively restructures codeowner_overrides.json to support PR auto-labeling and refactors codeowner_analyzer.py accordingly. While the code is generally well-structured and cleaner, a security audit identified two critical injection vulnerabilities. The most significant is an injection vulnerability in the CODEOWNERS file generation, allowing arbitrary ownership rule injection via unsanitized paths and usernames with newlines. A second parameter injection vulnerability in the GitHub API lookup logic could enable identity spoofing. Additionally, the validation logic in load_overrides should be improved for robustness against malformed input. Proper sanitization, URL encoding, and enhanced validation are required to address these concerns.

gemini-code-assist · 2026-03-04T02:50:00Z

scripts/codeowner_analyzer.py

+                if final:
+                    f.write(f"{self._format_codeowners_path(path)} {' '.join(final)}\n")


The generate_codeowners_file function writes paths and usernames directly into the CODEOWNERS file without sanitizing them for newlines. Since these values can originate from untrusted sources (such as git commit history or the overrides.json file which may be part of a pull request), an attacker can inject arbitrary rules into the CODEOWNERS file. For example, a malicious username or path containing a newline followed by * @attacker would grant the attacker ownership over all files in the repository. This is a high-severity injection vulnerability.

Suggested change

if final:

f.write(f"{self._format_codeowners_path(path)} {' '.join(final)}\n")

sanitized_path = self._format_codeowners_path(path).replace("\n", "").replace("\r", "")

sanitized_owners = [u.replace("\n", "").replace("\r", "") for u in final]

if final:

f.write(f"{sanitized_path} {' '.join(sanitized_owners)}\n")

check latest

gemini-code-assist · 2026-03-04T02:50:00Z

scripts/codeowner_analyzer.py

+        if not isinstance(group, dict) or "owners" not in group:
+            raise ValueError("Each group must be an object with an 'owners' key")


The validation logic can be made more robust. Currently, if group['owners'] is not a dictionary (e.g., null or an array in the JSON), the script will raise an AttributeError on .items(), which is not caught in main. It's better to validate the type of group['owners'] here to provide a clear error message and prevent the script from crashing.

if not isinstance(group, dict) or not isinstance(group.get("owners"), dict): raise ValueError( "Each group must be an object with an 'owners' key containing a dictionary" )

check latest

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (3)

scripts/codeowner_analyzer.py (1)

552-567: Consider defensive handling for malformed author strings.

Line 563 assumes the author string contains <email> format. While this should always be true for data from git log, a malformed entry would raise IndexError.

♻️ Optional defensive approach

             gh = self.get_github_username(owner["author"])
-            name = f"@{gh}" if gh else owner["author"].split("<")[1].rstrip(">")
+            if gh:
+                name = f"@{gh}"
+            else:
+                # Fallback to email from author string "Name <email>"
+                parts = owner["author"].split("<")
+                name = parts[1].rstrip(">") if len(parts) > 1 else owner["author"]

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@scripts/codeowner_analyzer.py` around lines 552 - 567, In
_extract_computed_usernames: guard against malformed owner["author"] values
before assuming a "<email>" part to avoid IndexError; update the logic in
_extract_computed_usernames to check whether owner["author"] contains "<" and
">" (or use a regex to extract the email portion) and fall back to a safe
display name (e.g., the whole author string or the result of
get_github_username) when the angle-bracket format is missing or split fails;
ensure the deduplication (seen_lower) still works by normalizing the chosen
fallback name.

.github/workflows/pr-auto-label.yml (2)

21-24: Consider using a neutral exit instead of failing the workflow.

Using exit 1 marks the workflow run as failed in the PR checks, which can be noisy for old PRs. Consider using exit 0 or a GitHub Actions mechanism to skip subsequent steps gracefully.

♻️ Suggested approach using step outputs

       - name: Skip PRs older than 21 days
+        id: age-check
         env:
           PR_CREATED: ${{ github.event.pull_request.created_at }}
         run: |
           created=$(date -d "$PR_CREATED" +%s)
           cutoff=$(date -d '21 days ago' +%s)
           if [ "$created" -lt "$cutoff" ]; then
             echo "PR older than 21 days, skipping"
-            exit 1
+            echo "skip=true" >> "$GITHUB_OUTPUT"
+          else
+            echo "skip=false" >> "$GITHUB_OUTPUT"
           fi
 
       - name: Sparse checkout overrides file from base branch
+        if: steps.age-check.outputs.skip != 'true'
         uses: actions/checkout@v4

Then add the same if condition to the "Apply labels" step.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In @.github/workflows/pr-auto-label.yml around lines 21 - 24, The branch that
checks if [ "$created" -lt "$cutoff" ] should not fail the whole workflow;
replace the failing exit (exit 1) with a neutral skip mechanism such as exit 0
or, better, set a step output (e.g., SKIP_LABELS=true) when the condition
matches and keep the step printing "PR older than 21 days, skipping"; then
update the "Apply labels" step to guard on that output (skip when SKIP_LABELS is
true) so the run is not marked failed while still preventing label application.

3-5: Concurrent label mutations on synchronize events.

Both this workflow and pr-label-cleanup.yml trigger on synchronize and call gh pr edit to modify labels. While they operate on different labels (this adds op:* labels; cleanup removes run-ci), concurrent API calls could theoretically conflict. In practice, GitHub's API handles this gracefully, but be aware of this if adding more label operations.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In @.github/workflows/pr-auto-label.yml around lines 3 - 5, This workflow and
pr-label-cleanup.yml both run on pull_request_target with types: [synchronize]
and both call gh pr edit to mutate labels (this one adds op:* labels; the other
removes run-ci), risking conflicting concurrent edits; to fix, stop overlapping
label writes by either removing "synchronize" from one workflow (so only one
runs on updates), consolidating label logic into a single workflow, or changing
the label mutation step to read the current labels and submit a single atomic
set (fetch labels then use gh pr edit --add/--remove or the REST API to set the
full label list) so edits are deterministic; update the workflow trigger or the
label-edit step accordingly to reference the pull_request_target trigger and the
gh pr edit label operations.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/pr-auto-label.yml:
- Around line 42-62: The loops over $labels and $prefixes use word-splitting
which breaks labels with spaces; change iteration to preserve items (e.g., read
labels and prefixes into arrays or pipe the jq arrays into while read -r loops)
and iterate using quoted expansions so each label/prefix is treated as a single
entry (avoid constructs like for label in $labels; use for label in
"${labels[@]}" or while read -r label; do ... done). Update the sections that
populate/iterate variables labels, prefixes and the inner loop over
changed_files (and the gh pr edit --add-label call) to use quoted variables
("$label", "$prefix", "$file") and array-aware iteration to prevent splitting on
spaces.

---

Nitpick comments:
In @.github/workflows/pr-auto-label.yml:
- Around line 21-24: The branch that checks if [ "$created" -lt "$cutoff" ]
should not fail the whole workflow; replace the failing exit (exit 1) with a
neutral skip mechanism such as exit 0 or, better, set a step output (e.g.,
SKIP_LABELS=true) when the condition matches and keep the step printing "PR
older than 21 days, skipping"; then update the "Apply labels" step to guard on
that output (skip when SKIP_LABELS is true) so the run is not marked failed
while still preventing label application.
- Around line 3-5: This workflow and pr-label-cleanup.yml both run on
pull_request_target with types: [synchronize] and both call gh pr edit to mutate
labels (this one adds op:* labels; the other removes run-ci), risking
conflicting concurrent edits; to fix, stop overlapping label writes by either
removing "synchronize" from one workflow (so only one runs on updates),
consolidating label logic into a single workflow, or changing the label mutation
step to read the current labels and submit a single atomic set (fetch labels
then use gh pr edit --add/--remove or the REST API to set the full label list)
so edits are deterministic; update the workflow trigger or the label-edit step
accordingly to reference the pull_request_target trigger and the gh pr edit
label operations.

In `@scripts/codeowner_analyzer.py`:
- Around line 552-567: In _extract_computed_usernames: guard against malformed
owner["author"] values before assuming a "<email>" part to avoid IndexError;
update the logic in _extract_computed_usernames to check whether owner["author"]
contains "<" and ">" (or use a regex to extract the email portion) and fall back
to a safe display name (e.g., the whole author string or the result of
get_github_username) when the angle-bracket format is missing or split fails;
ensure the deduplication (seen_lower) still works by normalizing the chosen
fallback name.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a0a3840a-9806-44a2-86e5-7412092c95d3

📥 Commits

Reviewing files that changed from the base of the PR and between cb593c8 and b3686a0.

📒 Files selected for processing (3)

.github/workflows/pr-auto-label.yml
scripts/codeowner_analyzer.py
scripts/codeowner_overrides.json

.github/workflows/pr-auto-label.yml

Add --labels option to test_utils.sh parse_args() accepting comma-separated PR labels (e.g., "op: attention,op: gemm"). Implement gh_label_filter() which reads codeowner_overrides.json to determine which test prefixes are categorized under which labels. Four-step logic: (1) no labels → run all, (2) label match → run, (3) categorized but no match → skip, (4) uncategorized → always run. Wire the filter into task_run_unit_tests.sh find_test_files() so discovered test files are filtered before execution. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Sanitize paths and usernames against newline injection in CODEOWNERS output (high severity) - Validate group['owners'] is a dict, not just present (medium) - Use readarray + quoted array expansion in pr-auto-label.yml to handle labels with spaces (e.g., "op: attention") Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/test_utils.sh`:
- Around line 120-121: The prefix check using [[ "$test_file" == "$prefix"* ]]
is unsafe because it matches sibling names; update the logic in the
functions/blocks that use variables test_file and prefix (the checks at the
shown occurrences) to only match when test_file is exactly equal to prefix or
when test_file starts with prefix plus a path separator (i.e., treat prefix as a
directory boundary), and apply the same change to the second occurrence around
lines 128-130 so sibling files like tests/comm_test.py do not match a prefix
intended for tests/comm.
- Around line 54-57: The --labels case in the argument-parsing switch assigns
LABELS and shifts without validation; update the --labels branch to first verify
that a next positional argument exists and does not start with a hyphen (so it
isn't another flag), and if validation fails print a useful error/usage message
and exit non-zero; only after successful validation perform the shift and set
LABELS="$1" (keep the switch label --labels and variable name LABELS so it's
easy to find).

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 62225627-8d28-43a4-aaeb-fd1166454f84

📥 Commits

Reviewing files that changed from the base of the PR and between b3686a0 and b22fa5f.

📒 Files selected for processing (2)

scripts/task_run_unit_tests.sh
scripts/test_utils.sh

coderabbitai · 2026-03-04T03:19:04Z

scripts/test_utils.sh

+            --labels)
+                shift
+                LABELS="$1"
+                ;;


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat scripts/test_utils.sh | head -100 | tail -50

Repository: flashinfer-ai/flashinfer

Length of output: 1716

🏁 Script executed:

#!/bin/bash set -euo pipefail python3 <<'PY' from pathlib import Path # Read the file content = Path("scripts/test_utils.sh").read_text() lines = content.splitlines() # Find the --labels block for i, line in enumerate(lines): if "--labels)" in line: start_line = i + 1 # 1-indexed # Find the ;; that closes this case for j in range(i, len(lines)): if ";;" in lines[j]: end_line = j + 1 # 1-indexed break print("Found --labels block at lines", start_line, "-", end_line) print("\nCode:") for n in range(start_line - 1, end_line): print(f"{n+1}: {lines[n]}") # Check for guards block_text = "\n".join(lines[start_line-1:end_line]) has_guard = ("-lt 2" in block_text or "requires a" in block_text or "shift 2" in block_text or "[ $# -" in block_text or "[[ " in block_text) print(f"\nGuard/validation present: {has_guard}") break PY

Repository: flashinfer-ai/flashinfer

Length of output: 252

🏁 Script executed:

# Verify the loop structure to confirm double-shift pattern sed -n '50,65p' scripts/test_utils.sh

Repository: flashinfer-ai/flashinfer

Length of output: 446

Validate --labels has a value before shifting.

Lines 55–57 shift and assign without checking if a value exists. When --labels is the last argument, the second shift at line 60 fails. When --labels precedes another flag (e.g., --labels --threads), that flag is incorrectly assigned to LABELS.

Suggested fix

--labels) + if [ $# -lt 2 ] || [[ "$2" == --* ]]; then + echo "ERROR: --labels requires a comma-separated value" >&2 + exit 2 + fi shift LABELS="$1" ;;

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

--labels)

shift

LABELS="$1"

;;

--labels)

if [ $# -lt 2 ] || [[ "$2" == --* ]]; then

echo "ERROR: --labels requires a comma-separated value" >&2

exit 2

fi

shift

LABELS="$1"

;;

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@scripts/test_utils.sh` around lines 54 - 57, The --labels case in the argument-parsing switch assigns LABELS and shifts without validation; update the --labels branch to first verify that a next positional argument exists and does not start with a hyphen (so it isn't another flag), and if validation fails print a useful error/usage message and exit non-zero; only after successful validation perform the shift and set LABELS="$1" (keep the switch label --labels and variable name LABELS so it's easy to find).

coderabbitai · 2026-03-04T03:19:04Z

scripts/test_utils.sh

+        if [[ "$test_file" == "$prefix"* ]]; then
+            return 0


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's find and examine the test_utils.sh file find . -name "test_utils.sh" -type f

Repository: flashinfer-ai/flashinfer

Length of output: 91

🏁 Script executed:

# Let's examine the file size first, then read the relevant sections wc -l ./scripts/test_utils.sh echo "---" # Read lines around 120-121 and 128-130 sed -n '115,135p' ./scripts/test_utils.sh

Repository: flashinfer-ai/flashinfer

Length of output: 795

🏁 Script executed:

# Let's run the validation script from the review comment to verify the issue cd ./scripts && python - <<'PY' import json, pathlib overrides = json.loads(pathlib.Path("codeowner_overrides.json").read_text()) prefixes = sorted({ p.rstrip("/") for group in overrides for p in group.get("owners", {}).keys() if p.startswith("tests/") }) files = [str(p).replace("\\", "/") for p in pathlib.Path("../tests").rglob("test_*.py")] ambiguous = False for prefix in prefixes: loose = [f for f in files if f.startswith(prefix)] strict = [f for f in files if f == prefix or f.startswith(prefix + "/")] extras = sorted(set(loose) - set(strict)) if extras: ambiguous = True print(f"[ambiguous] {prefix}") for f in extras[:10]: print(" ", f) if not ambiguous: print("No ambiguous matches found with current test files.") PY

Repository: flashinfer-ai/flashinfer

Length of output: 119

🏁 Script executed:

# Let's check the codeowner_overrides.json to see what prefixes exist cd ./scripts && cat codeowner_overrides.json | python -m json.tool | head -100

Repository: flashinfer-ai/flashinfer

Length of output: 2176

Use boundary-safe prefix checks for test paths.

Lines 120 and 128 use loose prefix matching with "$prefix"*, which can misclassify sibling files (e.g., tests/comm_test.py would incorrectly match the prefix tests/comm intended only for the directory). The current test files don't exhibit this issue, but the logic is vulnerable.

💡 Suggested fix

- if [[ "$test_file" == "$prefix"* ]]; then + if [[ "$test_file" == "$prefix" || "$test_file" == "$prefix/"* ]]; then return 0 fi @@ - if [[ "$test_file" == "$prefix"* ]]; then + if [[ "$test_file" == "$prefix" || "$test_file" == "$prefix/"* ]]; then echo "no matching label (have: $LABELS)" return 1 fi

Also applies to: 128-130

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@scripts/test_utils.sh` around lines 120 - 121, The prefix check using [[ "$test_file" == "$prefix"* ]] is unsafe because it matches sibling names; update the logic in the functions/blocks that use variables test_file and prefix (the checks at the shown occurrences) to only match when test_file is exactly equal to prefix or when test_file starts with prefix plus a path separator (i.e., treat prefix as a directory boundary), and apply the same change to the second occurrence around lines 128-130 so sibling files like tests/comm_test.py do not match a prefix intended for tests/comm.

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/codeowner_analyzer.py`:
- Around line 33-42: When building the overrides dict in
scripts/codeowner_analyzer.py (inside the for group in raw loop that iterates
group["owners"]), detect if a normalized path key already exists in overrides
and do not silently overwrite it; either raise a ValueError identifying the
duplicate path (including the path string) or merge the owner lists with
de-duplication so owners from both groups are preserved. Update the logic around
overrides[path.lstrip("./").rstrip("/")] = users to check
overrides.get(path_key) first and handle duplicates accordingly, referencing the
overrides variable and the group["owners"] iteration to locate the change.
- Around line 595-598: _format_codeowners_path currently appends "/" based on
_is_file_path which misclassifies extensionless filenames (e.g., Dockerfile,
LICENSE) as directories; change the logic so that _format_codeowners_path (or
_is_file_path) treats a path as a file if the basename contains a dot OR the
basename matches a small whitelist of common extensionless filenames (e.g.,
"Dockerfile","LICENSE","README","Makefile") and also respect an explicit
trailing "/" (i.e., if path.endswith("/") treat as directory), then return path
or f"{path}/" accordingly; update the function that calls or implements
_is_file_path (_format_codeowners_path) to use this combined rule.
- Line 41: The path normalization currently uses path.lstrip("./").rstrip("/")
which strips any leading dots or slashes and corrupts dot-prefixed paths; change
the normalization to only remove a leading "./" when present (e.g. use
path.removeprefix("./").rstrip("/") or an equivalent startswith check) so that
entries like ".github/workflows" keep their leading dot; update the assignment
that sets overrides[...] = users accordingly (look for the overrides[...]
assignment using the path variable).

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3696939f-5eef-48f2-9828-e6d6745176a9

📥 Commits

Reviewing files that changed from the base of the PR and between b22fa5f and 1489253.

📒 Files selected for processing (2)

.github/workflows/pr-auto-label.yml
scripts/codeowner_analyzer.py

🚧 Files skipped from review as they are similar to previous changes (1)

.github/workflows/pr-auto-label.yml

coderabbitai · 2026-03-04T03:24:42Z

scripts/codeowner_analyzer.py

+    for group in raw:
+        if not isinstance(group, dict) or not isinstance(group.get("owners"), dict):
+            raise ValueError("Each group must be an object with an 'owners' dict")
+        for path, users in group["owners"].items():
+            if not isinstance(users, list) or not all(
+                isinstance(u, str) for u in users
+            ):
+                raise ValueError(f"Override for '{path}' must be a list of strings")
+            overrides[path.lstrip("./").rstrip("/")] = users
+


⚠️ Potential issue | 🟠 Major

Duplicate paths across groups are silently overwritten.

If the same path appears in multiple groups, the later group replaces earlier owners without warning. That can drop intended owners.

🔧 Proposed fix

- overrides[normalized_path] = users + if normalized_path in overrides and overrides[normalized_path] != users: + raise ValueError( + f"Duplicate override path with conflicting owners: '{normalized_path}'" + ) + overrides[normalized_path] = users

🧰 Tools

🪛 Ruff (0.15.2)

[warning] 35-35: Prefer TypeError exception for invalid type

(TRY004)

[warning] 35-35: Avoid specifying long messages outside the exception class

(TRY003)

[warning] 40-40: Avoid specifying long messages outside the exception class

(TRY003)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@scripts/codeowner_analyzer.py` around lines 33 - 42, When building the overrides dict in scripts/codeowner_analyzer.py (inside the for group in raw loop that iterates group["owners"]), detect if a normalized path key already exists in overrides and do not silently overwrite it; either raise a ValueError identifying the duplicate path (including the path string) or merge the owner lists with de-duplication so owners from both groups are preserved. Update the logic around overrides[path.lstrip("./").rstrip("/")] = users to check overrides.get(path_key) first and handle duplicates accordingly, referencing the overrides variable and the group["owners"] iteration to locate the change.

coderabbitai · 2026-03-04T03:24:42Z

scripts/codeowner_analyzer.py

+                isinstance(u, str) for u in users
+            ):
+                raise ValueError(f"Override for '{path}' must be a list of strings")
+            overrides[path.lstrip("./").rstrip("/")] = users


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # First, explore the file structure and read the relevant section head -60 scripts/codeowner_analyzer.py | tail -30

Repository: flashinfer-ai/flashinfer

Length of output: 1082

🏁 Script executed:

#!/bin/bash # Get full context around line 41 sed -n '30,55p' scripts/codeowner_analyzer.py

Repository: flashinfer-ai/flashinfer

Length of output: 1022

🏁 Script executed:

#!/bin/bash # Check how overrides dict is used in the codebase rg -n "overrides" scripts/codeowner_analyzer.py -A 2 -B 2

Repository: flashinfer-ai/flashinfer

Length of output: 4981

🏁 Script executed:

#!/bin/bash # Verify the lstrip behavior with actual test cases python - <<'PY' # Demonstrate lstrip behavior with dot-prefixed paths test_cases = [ ".github/workflows", "./src/module", "/docs", "/.hidden", ".hidden/file", "normal/path" ] print("Testing lstrip('./'):") for path in test_cases: result = path.lstrip("./") print(f" {path!r:25} -> {result!r}") print("\nTesting strip() then manual prefix removal:") for path in test_cases: normalized = path.strip() if normalized.startswith("./"): normalized = normalized[2:] elif normalized.startswith("/"): normalized = normalized[1:] normalized = normalized.rstrip("/") print(f" {path!r:25} -> {result!r}") PY

Repository: flashinfer-ai/flashinfer

Length of output: 666

lstrip("./") corrupts dot-prefixed override paths, breaking override matching.

Line 41 uses lstrip("./") which removes any combination of "." and "/" characters from the left side, not just the "./" prefix. This transforms valid paths like .github/workflows into github/workflows, causing override lookups to fail silently. Dot-prefixed paths (.github, .hidden, etc.) will have their manual overrides ignored in the CODEOWNERS file.

🔧 Proposed fix

- overrides[path.lstrip("./").rstrip("/")] = users + normalized_path = path.strip() + if normalized_path.startswith("./"): + normalized_path = normalized_path[2:] + elif normalized_path.startswith("/"): + normalized_path = normalized_path[1:] + normalized_path = normalized_path.rstrip("/") + if not normalized_path: + raise ValueError("Override path cannot be empty") + overrides[normalized_path] = users

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@scripts/codeowner_analyzer.py` at line 41, The path normalization currently uses path.lstrip("./").rstrip("/") which strips any leading dots or slashes and corrupts dot-prefixed paths; change the normalization to only remove a leading "./" when present (e.g. use path.removeprefix("./").rstrip("/") or an equivalent startswith check) so that entries like ".github/workflows" keep their leading dot; update the assignment that sets overrides[...] = users accordingly (look for the overrides[...] assignment using the path variable).

coderabbitai · 2026-03-04T03:24:42Z

scripts/codeowner_analyzer.py

+    def _format_codeowners_path(self, path: str) -> str:
+        """Format a path for CODEOWNERS (append / for directories)."""
+        return path if self._is_file_path(path) else f"{path}/"
+


⚠️ Potential issue | 🟠 Major

Extensionless files are misclassified as directories.

_is_file_path relies on a dot in basename, so valid files like Dockerfile/LICENSE get a trailing /, generating wrong CODEOWNERS patterns.

🔧 Proposed fix

def _format_codeowners_path(self, path: str) -> str: - """Format a path for CODEOWNERS (append / for directories).""" - return path if self._is_file_path(path) else f"{path}/" + """Format a path for CODEOWNERS (append / only for real directories).""" + normalized = path.rstrip("/") + abs_path = self.repo_path / normalized + if abs_path.is_dir(): + return f"{normalized}/" + return normalized

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

def _format_codeowners_path(self, path: str) -> str:

"""Format a path for CODEOWNERS (append / for directories)."""

return path if self._is_file_path(path) else f"{path}/"

def _format_codeowners_path(self, path: str) -> str:

"""Format a path for CODEOWNERS (append / only for real directories)."""

normalized = path.rstrip("/")

abs_path = self.repo_path / normalized

if abs_path.is_dir():

return f"{normalized}/"

return normalized

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@scripts/codeowner_analyzer.py` around lines 595 - 598, _format_codeowners_path currently appends "/" based on _is_file_path which misclassifies extensionless filenames (e.g., Dockerfile, LICENSE) as directories; change the logic so that _format_codeowners_path (or _is_file_path) treats a path as a file if the basename contains a dot OR the basename matches a small whitelist of common extensionless filenames (e.g., "Dockerfile","LICENSE","README","Makefile") and also respect an explicit trailing "/" (i.e., if path.endswith("/") treat as directory), then return path or f"{path}/" accordingly; update the function that calls or implements _is_file_path (_format_codeowners_path) to use this combined rule.

yzh119 · 2026-03-04T07:39:19Z

@aleozlx seems recent PRs have marked all org members as reviewers, is it expected? Could this PR fix the issue?

aleozlx · 2026-03-04T23:28:59Z

@aleozlx seems recent PRs have marked all org members as reviewers, is it expected? Could this PR fix the issue?

hi i think i saw some but not every new PR. i clicked into a few recent ones and seemed reasonable
but i'd like to understand more. are there some specific PRs? i can take a look what might have caused it

bkryu · 2026-03-04T23:33:58Z

@aleozlx seems recent PRs have marked all org members as reviewers, is it expected? Could this PR fix the issue?

hi i think i saw some but not every new PR. i clicked into a few recent ones and seemed reasonable but i'd like to understand more. are there some specific PRs? i can take a look what might have caused it

It appears that ‎.github/workflows/pr-auto-label.yml‎ is detected as a file owned by all org members. Maybe that's why?

aleozlx · 2026-03-04T23:44:10Z

@bkryu found out this line may have caused the issue
f521fe1#diff-3d36a1bf06148bc6ba1ce2ed3d19de32ea708d955fed212c0d27c536f0bd4da7R7

(right hand side line 7)

i can investigate what caused that line in codeowner_analyzer.py figure out a fix if it's not expected, @yzh119

aleozlx · 2026-03-18T21:37:08Z

will revisit this for targeted testing after i impl the #2815. and we can do label based targeted testing

aleozlx requested review from Anerudhan, MasterJH5574, abcdabcd987, bkryu, fzyzcjy, joker-eph, jwu1980, kaixih, sricketts, tqchen, ttyio, weireweire, wenscarl, yongwww, youkaichao, yyihuang, yzh119 and zhyncs as code owners March 4, 2026 02:47

gemini-code-assist bot reviewed Mar 4, 2026

View reviewed changes

coderabbitai bot reviewed Mar 4, 2026

View reviewed changes

.github/workflows/pr-auto-label.yml Show resolved Hide resolved

aleozlx and others added 2 commits March 3, 2026 19:10

coderabbitai bot reviewed Mar 4, 2026

View reviewed changes

aleozlx marked this pull request as draft March 18, 2026 21:35

		if final:
		f.write(f"{self._format_codeowners_path(path)} {' '.join(final)}\n")

		if not isinstance(group, dict) or "owners" not in group:
		raise ValueError("Each group must be an object with an 'owners' key")

-    def _format_codeowners_path(self, path: str) -> str:
-        """Format a path for CODEOWNERS (append / for directories)."""
-        return path if self._is_file_path(path) else f"{path}/"
+    def _format_codeowners_path(self, path: str) -> str:
+        """Format a path for CODEOWNERS (append / only for real directories)."""
+        normalized = path.rstrip("/")
+        abs_path = self.repo_path / normalized
+        if abs_path.is_dir():
+            return f"{normalized}/"
+        return normalized

Conversation

aleozlx commented Mar 4, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

📌 Description

🔍 Related Issues

🚀 Pull Request Checklist

✅ Pre-commit Checks

🧪 Tests

Reviewer Notes

Summary by CodeRabbit

Uh oh!

gemini-code-assist bot commented Mar 4, 2026

Summary of Changes

Highlights

Footnotes

Uh oh!

coderabbitai bot commented Mar 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Poem

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Mar 4, 2026

Choose a reason for hiding this comment

Uh oh!

aleozlx Mar 4, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Mar 4, 2026

Choose a reason for hiding this comment

Uh oh!

aleozlx Mar 4, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 4, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 4, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 4, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 4, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 4, 2026

Choose a reason for hiding this comment

Uh oh!

yzh119 commented Mar 4, 2026

Uh oh!

aleozlx commented Mar 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

bkryu commented Mar 4, 2026

Uh oh!

aleozlx commented Mar 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

aleozlx commented Mar 18, 2026

Uh oh!

aleozlx commented Mar 4, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Mar 4, 2026 •

edited

Loading

aleozlx commented Mar 4, 2026 •

edited

Loading

aleozlx commented Mar 4, 2026 •

edited

Loading