feat: restructure codeowner_overrides.json for PR auto-labeling #2683
aleozlx wants to merge 3 commits into flashinfer-ai:main
Conversation
Restructure codeowner_overrides.json from a flat dict to a grouped array where each group has `labels` (for PR auto-labeling) and `owners` (path-to-owner mappings). All existing path→owner mappings are preserved.

Update codeowner_analyzer.py to parse the new format via a standalone `load_overrides()` function, and simplify CODEOWNERS generation to a single alphabetically sorted pass over the union of computed and override paths. Override owners are prepended on exact matches to widen the owner list; override-only paths (including directories) are now emitted too.

Add pr-auto-label.yml workflow that reads the grouped overrides, matches changed files by path prefix, and additively applies labels. Uses pull_request_target for safe write access on external PRs. Skips PRs older than 21 days.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
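To illustrate the restructuring described above, here is a minimal sketch of how the grouped format could be flattened back into the old path→owners dict. The `labels`/`owners` field names come from the commit message; the helper names (`flatten_overrides`, `load_overrides`) are hypothetical and not necessarily what codeowner_analyzer.py uses.

```python
import json
from pathlib import Path


def flatten_overrides(raw: list) -> dict:
    """Collapse the grouped overrides array into a {path: [owners]} mapping."""
    overrides = {}
    for group in raw:
        # Each group carries `labels` (for PR auto-labeling) and
        # `owners` (path -> list of owner handles).
        for path, users in group.get("owners", {}).items():
            overrides[path.rstrip("/")] = users
    return overrides


def load_overrides(path: str = "scripts/codeowner_overrides.json") -> dict:
    """Load and flatten the grouped overrides file."""
    return flatten_overrides(json.loads(Path(path).read_text()))
```

Keeping the flattening separate from file I/O makes the parsing logic easy to unit-test without touching the filesystem.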
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant enhancement to the repository's automation by enabling automatic labeling of pull requests. This is achieved by restructuring the codeowner_overrides.json file to group code ownership paths with associated labels, and by updating the codeowner_analyzer.py script to process this new format. A new GitHub Actions workflow has been added to utilize this updated configuration for intelligent PR labeling, streamlining the review process and improving organization.

Highlights
📝 Walkthrough

Adds a GitHub Actions workflow to auto-label PRs based on grouped codeowner overrides; restructures the overrides JSON and the analyzer to use labeled groups; and makes test selection label-aware so test runs can be filtered by PR labels.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant PR as Pull Request
    participant GH as GitHub Actions (pr-auto-label)
    participant Repo as Repository
    participant Overrides as scripts/codeowner_overrides.json
    participant CLI as gh CLI
    PR->>GH: trigger on opened/synchronize
    GH->>GH: check PR age vs 21-day cutoff
    alt PR too old
        GH-->>PR: exit (skip)
    else PR recent
        GH->>Repo: sparse-checkout scripts/codeowner_overrides.json
        GH->>CLI: request changed files (GitHub API)
        CLI->>Overrides: load grouped overrides (labels + owners)
        CLI->>CLI: match changed file paths to override prefixes
        CLI->>GH: apply labels via gh pr edit for each matched group
        GH-->>PR: labels applied
    end
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs
🚥 Pre-merge checks: ✅ 3 passed
Code Review
This pull request effectively restructures codeowner_overrides.json to support PR auto-labeling and refactors codeowner_analyzer.py accordingly. While the code is generally well-structured and cleaner, a security audit identified two critical injection vulnerabilities. The most significant is an injection vulnerability in the CODEOWNERS file generation, allowing arbitrary ownership rule injection via unsanitized paths and usernames with newlines. A second parameter injection vulnerability in the GitHub API lookup logic could enable identity spoofing. Additionally, the validation logic in load_overrides should be improved for robustness against malformed input. Proper sanitization, URL encoding, and enhanced validation are required to address these concerns.
scripts/codeowner_analyzer.py (Outdated)

```python
if final:
    f.write(f"{self._format_codeowners_path(path)} {' '.join(final)}\n")
```
The generate_codeowners_file function writes paths and usernames directly into the CODEOWNERS file without sanitizing them for newlines. Since these values can originate from untrusted sources (such as git commit history or the overrides.json file which may be part of a pull request), an attacker can inject arbitrary rules into the CODEOWNERS file. For example, a malicious username or path containing a newline followed by * @attacker would grant the attacker ownership over all files in the repository. This is a high-severity injection vulnerability.
Suggested change:

```python
sanitized_path = self._format_codeowners_path(path).replace("\n", "").replace("\r", "")
sanitized_owners = [u.replace("\n", "").replace("\r", "") for u in final]
if final:
    f.write(f"{sanitized_path} {' '.join(sanitized_owners)}\n")
```
scripts/codeowner_analyzer.py (Outdated)

```python
if not isinstance(group, dict) or "owners" not in group:
    raise ValueError("Each group must be an object with an 'owners' key")
```
The validation logic can be made more robust. Currently, if group['owners'] is not a dictionary (e.g., null or an array in the JSON), the script will raise an AttributeError on .items(), which is not caught in main. It's better to validate the type of group['owners'] here to provide a clear error message and prevent the script from crashing.
```python
if not isinstance(group, dict) or not isinstance(group.get("owners"), dict):
    raise ValueError(
        "Each group must be an object with an 'owners' key containing a dictionary"
    )
```
Actionable comments posted: 1
🧹 Nitpick comments (3)
scripts/codeowner_analyzer.py (1)
552-567: Consider defensive handling for malformed author strings. Line 563 assumes the author string contains a `<email>` part. While this should always be true for data from git log, a malformed entry would raise `IndexError`.

♻️ Optional defensive approach
```diff
 gh = self.get_github_username(owner["author"])
-name = f"@{gh}" if gh else owner["author"].split("<")[1].rstrip(">")
+if gh:
+    name = f"@{gh}"
+else:
+    # Fallback to email from author string "Name <email>"
+    parts = owner["author"].split("<")
+    name = parts[1].rstrip(">") if len(parts) > 1 else owner["author"]
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/codeowner_analyzer.py` around lines 552-567: in _extract_computed_usernames, guard against malformed owner["author"] values before assuming a "<email>" part, to avoid IndexError. Check whether owner["author"] contains "<" and ">" (or use a regex to extract the email portion) and fall back to a safe display name (e.g., the whole author string or the result of get_github_username) when the angle-bracket format is missing or the split fails; ensure the deduplication (seen_lower) still works by normalizing the chosen fallback name.

.github/workflows/pr-auto-label.yml (2)
21-24: Consider using a neutral exit instead of failing the workflow. Using `exit 1` marks the workflow run as failed in the PR checks, which can be noisy for old PRs. Consider using `exit 0` or a GitHub Actions mechanism to skip subsequent steps gracefully.

♻️ Suggested approach using step outputs
```diff
 - name: Skip PRs older than 21 days
+  id: age-check
   env:
     PR_CREATED: ${{ github.event.pull_request.created_at }}
   run: |
     created=$(date -d "$PR_CREATED" +%s)
     cutoff=$(date -d '21 days ago' +%s)
     if [ "$created" -lt "$cutoff" ]; then
       echo "PR older than 21 days, skipping"
-      exit 1
+      echo "skip=true" >> "$GITHUB_OUTPUT"
+    else
+      echo "skip=false" >> "$GITHUB_OUTPUT"
     fi
 - name: Sparse checkout overrides file from base branch
+  if: steps.age-check.outputs.skip != 'true'
   uses: actions/checkout@v4
```

Then add the same `if` condition to the "Apply labels" step.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/pr-auto-label.yml around lines 21 - 24, The branch that checks if [ "$created" -lt "$cutoff" ] should not fail the whole workflow; replace the failing exit (exit 1) with a neutral skip mechanism such as exit 0 or, better, set a step output (e.g., SKIP_LABELS=true) when the condition matches and keep the step printing "PR older than 21 days, skipping"; then update the "Apply labels" step to guard on that output (skip when SKIP_LABELS is true) so the run is not marked failed while still preventing label application.
3-5: Concurrent label mutations on `synchronize` events. Both this workflow and pr-label-cleanup.yml trigger on `synchronize` and call `gh pr edit` to modify labels. While they operate on different labels (this adds `op:*` labels; cleanup removes `run-ci`), concurrent API calls could theoretically conflict. In practice, GitHub's API handles this gracefully, but be aware of this if adding more label operations.
Verify each finding against the current code and only fix it if needed. In @.github/workflows/pr-auto-label.yml around lines 3 - 5, This workflow and pr-label-cleanup.yml both run on pull_request_target with types: [synchronize] and both call gh pr edit to mutate labels (this one adds op:* labels; the other removes run-ci), risking conflicting concurrent edits; to fix, stop overlapping label writes by either removing "synchronize" from one workflow (so only one runs on updates), consolidating label logic into a single workflow, or changing the label mutation step to read the current labels and submit a single atomic set (fetch labels then use gh pr edit --add/--remove or the REST API to set the full label list) so edits are deterministic; update the workflow trigger or the label-edit step accordingly to reference the pull_request_target trigger and the gh pr edit label operations.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/pr-auto-label.yml:
- Around line 42-62: The loops over $labels and $prefixes use word-splitting
which breaks labels with spaces; change iteration to preserve items (e.g., read
labels and prefixes into arrays or pipe the jq arrays into while read -r loops)
and iterate using quoted expansions so each label/prefix is treated as a single
entry (avoid constructs like for label in $labels; use for label in
"${labels[@]}" or while read -r label; do ... done). Update the sections that
populate/iterate variables labels, prefixes and the inner loop over
changed_files (and the gh pr edit --add-label call) to use quoted variables
("$label", "$prefix", "$file") and array-aware iteration to prevent splitting on
spaces.
---
Nitpick comments:
In @.github/workflows/pr-auto-label.yml:
- Around line 21-24: The branch that checks if [ "$created" -lt "$cutoff" ]
should not fail the whole workflow; replace the failing exit (exit 1) with a
neutral skip mechanism such as exit 0 or, better, set a step output (e.g.,
SKIP_LABELS=true) when the condition matches and keep the step printing "PR
older than 21 days, skipping"; then update the "Apply labels" step to guard on
that output (skip when SKIP_LABELS is true) so the run is not marked failed
while still preventing label application.
- Around line 3-5: This workflow and pr-label-cleanup.yml both run on
pull_request_target with types: [synchronize] and both call gh pr edit to mutate
labels (this one adds op:* labels; the other removes run-ci), risking
conflicting concurrent edits; to fix, stop overlapping label writes by either
removing "synchronize" from one workflow (so only one runs on updates),
consolidating label logic into a single workflow, or changing the label mutation
step to read the current labels and submit a single atomic set (fetch labels
then use gh pr edit --add/--remove or the REST API to set the full label list)
so edits are deterministic; update the workflow trigger or the label-edit step
accordingly to reference the pull_request_target trigger and the gh pr edit
label operations.
In `@scripts/codeowner_analyzer.py`:
- Around line 552-567: In _extract_computed_usernames: guard against malformed
owner["author"] values before assuming a "<email>" part to avoid IndexError;
update the logic in _extract_computed_usernames to check whether owner["author"]
contains "<" and ">" (or use a regex to extract the email portion) and fall back
to a safe display name (e.g., the whole author string or the result of
get_github_username) when the angle-bracket format is missing or split fails;
ensure the deduplication (seen_lower) still works by normalizing the chosen
fallback name.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: a0a3840a-9806-44a2-86e5-7412092c95d3
📒 Files selected for processing (3)
- .github/workflows/pr-auto-label.yml
- scripts/codeowner_analyzer.py
- scripts/codeowner_overrides.json
Add a --labels option to test_utils.sh parse_args() accepting comma-separated PR labels (e.g., "op: attention,op: gemm"). Implement gh_label_filter(), which reads codeowner_overrides.json to determine which test prefixes are categorized under which labels. Four-step logic: (1) no labels → run all; (2) label match → run; (3) categorized but no match → skip; (4) uncategorized → always run. Wire the filter into task_run_unit_tests.sh find_test_files() so discovered test files are filtered before execution.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
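The four-step logic above can be sketched in Python for clarity (the actual implementation is the shell function `gh_label_filter()`; the function and parameter names here are illustrative):

```python
def should_run(test_file: str, pr_labels: set, label_prefixes: dict) -> bool:
    """Decide whether a test file should run given the PR's labels.

    label_prefixes maps label -> list of path prefixes, derived from
    codeowner_overrides.json.
    """
    if not pr_labels:
        return True  # (1) no labels -> run everything
    categorized = False
    for label, prefixes in label_prefixes.items():
        for prefix in prefixes:
            # Boundary-safe prefix match: exact path or under the directory.
            if test_file == prefix or test_file.startswith(prefix + "/"):
                categorized = True
                if label in pr_labels:
                    return True  # (2) a PR label covers this file -> run
    # (3) categorized but no label matched -> skip
    # (4) not categorized under any label -> always run
    return not categorized
```

Step (4) is the safety net: tests that no override group claims are never silently skipped.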
- Sanitize paths and usernames against newline injection in CODEOWNERS output (high severity)
- Validate group['owners'] is a dict, not just present (medium)
- Use readarray + quoted array expansion in pr-auto-label.yml to handle labels with spaces (e.g., "op: attention")

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/test_utils.sh`:
- Around line 120-121: The prefix check using [[ "$test_file" == "$prefix"* ]]
is unsafe because it matches sibling names; update the logic in the
functions/blocks that use variables test_file and prefix (the checks at the
shown occurrences) to only match when test_file is exactly equal to prefix or
when test_file starts with prefix plus a path separator (i.e., treat prefix as a
directory boundary), and apply the same change to the second occurrence around
lines 128-130 so sibling files like tests/comm_test.py do not match a prefix
intended for tests/comm.
- Around line 54-57: The --labels case in the argument-parsing switch assigns
LABELS and shifts without validation; update the --labels branch to first verify
that a next positional argument exists and does not start with a hyphen (so it
isn't another flag), and if validation fails print a useful error/usage message
and exit non-zero; only after successful validation perform the shift and set
LABELS="$1" (keep the switch label --labels and variable name LABELS so it's
easy to find).
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 62225627-8d28-43a4-aaeb-fd1166454f84
📒 Files selected for processing (2)
- scripts/task_run_unit_tests.sh
- scripts/test_utils.sh
```bash
--labels)
  shift
  LABELS="$1"
  ;;
```
🧩 Analysis chain
🏁 Script executed:
```bash
cat scripts/test_utils.sh | head -100 | tail -50
```

Repository: flashinfer-ai/flashinfer
Length of output: 1716
🏁 Script executed:
```bash
#!/bin/bash
set -euo pipefail
python3 <<'PY'
from pathlib import Path

# Read the file
content = Path("scripts/test_utils.sh").read_text()
lines = content.splitlines()

# Find the --labels block
for i, line in enumerate(lines):
    if "--labels)" in line:
        start_line = i + 1  # 1-indexed
        # Find the ;; that closes this case
        for j in range(i, len(lines)):
            if ";;" in lines[j]:
                end_line = j + 1  # 1-indexed
                break
        print("Found --labels block at lines", start_line, "-", end_line)
        print("\nCode:")
        for n in range(start_line - 1, end_line):
            print(f"{n+1}: {lines[n]}")
        # Check for guards
        block_text = "\n".join(lines[start_line - 1:end_line])
        has_guard = ("-lt 2" in block_text or
                     "requires a" in block_text or
                     "shift 2" in block_text or
                     "[ $# -" in block_text or
                     "[[ " in block_text)
        print(f"\nGuard/validation present: {has_guard}")
        break
PY
```

Repository: flashinfer-ai/flashinfer
Length of output: 252
🏁 Script executed:
```bash
# Verify the loop structure to confirm double-shift pattern
sed -n '50,65p' scripts/test_utils.sh
```

Repository: flashinfer-ai/flashinfer
Length of output: 446
Validate --labels has a value before shifting.
Lines 55–57 shift and assign without checking if a value exists. When --labels is the last argument, the second shift at line 60 fails. When --labels precedes another flag (e.g., --labels --threads), that flag is incorrectly assigned to LABELS.
Suggested fix
```diff
 --labels)
+  if [ $# -lt 2 ] || [[ "$2" == --* ]]; then
+    echo "ERROR: --labels requires a comma-separated value" >&2
+    exit 2
+  fi
   shift
   LABELS="$1"
   ;;
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```bash
--labels)
  if [ $# -lt 2 ] || [[ "$2" == --* ]]; then
    echo "ERROR: --labels requires a comma-separated value" >&2
    exit 2
  fi
  shift
  LABELS="$1"
  ;;
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/test_utils.sh` around lines 54 - 57, The --labels case in the
argument-parsing switch assigns LABELS and shifts without validation; update the
--labels branch to first verify that a next positional argument exists and does
not start with a hyphen (so it isn't another flag), and if validation fails
print a useful error/usage message and exit non-zero; only after successful
validation perform the shift and set LABELS="$1" (keep the switch label --labels
and variable name LABELS so it's easy to find).
```bash
if [[ "$test_file" == "$prefix"* ]]; then
  return 0
```
🧩 Analysis chain
🏁 Script executed:
```bash
# First, let's find and examine the test_utils.sh file
find . -name "test_utils.sh" -type f
```

Repository: flashinfer-ai/flashinfer
Length of output: 91
🏁 Script executed:
```bash
# Let's examine the file size first, then read the relevant sections
wc -l ./scripts/test_utils.sh
echo "---"
# Read lines around 120-121 and 128-130
sed -n '115,135p' ./scripts/test_utils.sh
```

Repository: flashinfer-ai/flashinfer
Length of output: 795
🏁 Script executed:
```bash
# Let's run the validation script from the review comment to verify the issue
cd ./scripts && python - <<'PY'
import json, pathlib

overrides = json.loads(pathlib.Path("codeowner_overrides.json").read_text())
prefixes = sorted({
    p.rstrip("/")
    for group in overrides
    for p in group.get("owners", {}).keys()
    if p.startswith("tests/")
})
files = [str(p).replace("\\", "/") for p in pathlib.Path("../tests").rglob("test_*.py")]
ambiguous = False
for prefix in prefixes:
    loose = [f for f in files if f.startswith(prefix)]
    strict = [f for f in files if f == prefix or f.startswith(prefix + "/")]
    extras = sorted(set(loose) - set(strict))
    if extras:
        ambiguous = True
        print(f"[ambiguous] {prefix}")
        for f in extras[:10]:
            print("  ", f)
if not ambiguous:
    print("No ambiguous matches found with current test files.")
PY
```

Repository: flashinfer-ai/flashinfer
Length of output: 119
🏁 Script executed:
```bash
# Let's check the codeowner_overrides.json to see what prefixes exist
cd ./scripts && cat codeowner_overrides.json | python -m json.tool | head -100
```

Repository: flashinfer-ai/flashinfer
Length of output: 2176
Use boundary-safe prefix checks for test paths.
Lines 120 and 128 use loose prefix matching with "$prefix"*, which can misclassify sibling files (e.g., tests/comm_test.py would incorrectly match the prefix tests/comm intended only for the directory). The current test files don't exhibit this issue, but the logic is vulnerable.
💡 Suggested fix
```diff
-if [[ "$test_file" == "$prefix"* ]]; then
+if [[ "$test_file" == "$prefix" || "$test_file" == "$prefix/"* ]]; then
   return 0
 fi
@@
-if [[ "$test_file" == "$prefix"* ]]; then
+if [[ "$test_file" == "$prefix" || "$test_file" == "$prefix/"* ]]; then
   echo "no matching label (have: $LABELS)"
   return 1
 fi
```

Also applies to: 128-130
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/test_utils.sh` around lines 120 - 121, The prefix check using [[
"$test_file" == "$prefix"* ]] is unsafe because it matches sibling names; update
the logic in the functions/blocks that use variables test_file and prefix (the
checks at the shown occurrences) to only match when test_file is exactly equal
to prefix or when test_file starts with prefix plus a path separator (i.e.,
treat prefix as a directory boundary), and apply the same change to the second
occurrence around lines 128-130 so sibling files like tests/comm_test.py do not
match a prefix intended for tests/comm.
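The directory-boundary rule described in the prompt above can be expressed compactly in Python (the real check lives in bash; this helper is purely illustrative):

```python
def matches_prefix(test_file: str, prefix: str) -> bool:
    """True only when test_file is the prefix itself or lies under it
    as a directory, so sibling names like tests/comm_test.py never
    match a prefix intended for the tests/comm directory."""
    return test_file == prefix or test_file.startswith(prefix + "/")
```

The key idea is appending the path separator before calling `startswith`, which turns a loose string prefix into a directory-boundary check.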
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/codeowner_analyzer.py`:
- Around line 33-42: When building the overrides dict in
scripts/codeowner_analyzer.py (inside the for group in raw loop that iterates
group["owners"]), detect if a normalized path key already exists in overrides
and do not silently overwrite it; either raise a ValueError identifying the
duplicate path (including the path string) or merge the owner lists with
de-duplication so owners from both groups are preserved. Update the logic around
overrides[path.lstrip("./").rstrip("/")] = users to check
overrides.get(path_key) first and handle duplicates accordingly, referencing the
overrides variable and the group["owners"] iteration to locate the change.
- Around line 595-598: _format_codeowners_path currently appends "/" based on
_is_file_path which misclassifies extensionless filenames (e.g., Dockerfile,
LICENSE) as directories; change the logic so that _format_codeowners_path (or
_is_file_path) treats a path as a file if the basename contains a dot OR the
basename matches a small whitelist of common extensionless filenames (e.g.,
"Dockerfile","LICENSE","README","Makefile") and also respect an explicit
trailing "/" (i.e., if path.endswith("/") treat as directory), then return path
or f"{path}/" accordingly; update the function that calls or implements
_is_file_path (_format_codeowners_path) to use this combined rule.
- Line 41: The path normalization currently uses path.lstrip("./").rstrip("/")
which strips any leading dots or slashes and corrupts dot-prefixed paths; change
the normalization to only remove a leading "./" when present (e.g. use
path.removeprefix("./").rstrip("/") or an equivalent startswith check) so that
entries like ".github/workflows" keep their leading dot; update the assignment
that sets overrides[...] = users accordingly (look for the overrides[...]
assignment using the path variable).
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 3696939f-5eef-48f2-9828-e6d6745176a9
📒 Files selected for processing (2)
- .github/workflows/pr-auto-label.yml
- scripts/codeowner_analyzer.py
🚧 Files skipped from review as they are similar to previous changes (1)
- .github/workflows/pr-auto-label.yml
```python
for group in raw:
    if not isinstance(group, dict) or not isinstance(group.get("owners"), dict):
        raise ValueError("Each group must be an object with an 'owners' dict")
    for path, users in group["owners"].items():
        if not isinstance(users, list) or not all(
            isinstance(u, str) for u in users
        ):
            raise ValueError(f"Override for '{path}' must be a list of strings")
        overrides[path.lstrip("./").rstrip("/")] = users
```
Duplicate paths across groups are silently overwritten.
If the same path appears in multiple groups, the later group replaces earlier owners without warning. That can drop intended owners.
🔧 Proposed fix
```diff
-overrides[normalized_path] = users
+if normalized_path in overrides and overrides[normalized_path] != users:
+    raise ValueError(
+        f"Duplicate override path with conflicting owners: '{normalized_path}'"
+    )
+overrides[normalized_path] = users
```

🧰 Tools
🪛 Ruff (0.15.2)
[warning] 35-35: Prefer TypeError exception for invalid type
(TRY004)
[warning] 35-35: Avoid specifying long messages outside the exception class
(TRY003)
[warning] 40-40: Avoid specifying long messages outside the exception class
(TRY003)
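The review also mentions merging owner lists with de-duplication as an alternative to raising on duplicate paths. A hedged sketch of that alternative (the `overrides` dict and normalized path are as in the loop under review; the helper name is hypothetical):

```python
def merge_owners(overrides: dict, normalized_path: str, users: list) -> None:
    """Merge new owners into an existing override entry, preserving
    first-seen order and deduplicating case-insensitively (GitHub
    usernames are case-insensitive)."""
    existing = overrides.get(normalized_path, [])
    seen = {u.lower() for u in existing}
    merged = list(existing)
    for u in users:
        if u.lower() not in seen:
            seen.add(u.lower())
            merged.append(u)
    overrides[normalized_path] = merged
```

Merging keeps owners from every group instead of dropping earlier ones, at the cost of hiding configuration mistakes that a hard error would surface.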
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/codeowner_analyzer.py` around lines 33 - 42, When building the
overrides dict in scripts/codeowner_analyzer.py (inside the for group in raw
loop that iterates group["owners"]), detect if a normalized path key already
exists in overrides and do not silently overwrite it; either raise a ValueError
identifying the duplicate path (including the path string) or merge the owner
lists with de-duplication so owners from both groups are preserved. Update the
logic around overrides[path.lstrip("./").rstrip("/")] = users to check
overrides.get(path_key) first and handle duplicates accordingly, referencing the
overrides variable and the group["owners"] iteration to locate the change.
```python
            isinstance(u, str) for u in users
        ):
            raise ValueError(f"Override for '{path}' must be a list of strings")
        overrides[path.lstrip("./").rstrip("/")] = users
```
🧩 Analysis chain
🏁 Script executed:
```bash
#!/bin/bash
# First, explore the file structure and read the relevant section
head -60 scripts/codeowner_analyzer.py | tail -30
```

Repository: flashinfer-ai/flashinfer
Length of output: 1082
🏁 Script executed:
```bash
#!/bin/bash
# Get full context around line 41
sed -n '30,55p' scripts/codeowner_analyzer.py
```

Repository: flashinfer-ai/flashinfer
Length of output: 1022
🏁 Script executed:
```bash
#!/bin/bash
# Check how overrides dict is used in the codebase
rg -n "overrides" scripts/codeowner_analyzer.py -A 2 -B 2
```

Repository: flashinfer-ai/flashinfer
Length of output: 4981
🏁 Script executed:
```bash
#!/bin/bash
# Verify the lstrip behavior with actual test cases
python - <<'PY'
# Demonstrate lstrip behavior with dot-prefixed paths
test_cases = [
    ".github/workflows",
    "./src/module",
    "/docs",
    "/.hidden",
    ".hidden/file",
    "normal/path",
]
print("Testing lstrip('./'):")
for path in test_cases:
    result = path.lstrip("./")
    print(f"  {path!r:25} -> {result!r}")

print("\nTesting strip() then manual prefix removal:")
for path in test_cases:
    normalized = path.strip()
    if normalized.startswith("./"):
        normalized = normalized[2:]
    elif normalized.startswith("/"):
        normalized = normalized[1:]
    normalized = normalized.rstrip("/")
    print(f"  {path!r:25} -> {normalized!r}")
PY
```

Repository: flashinfer-ai/flashinfer
Length of output: 666
lstrip("./") corrupts dot-prefixed override paths, breaking override matching.
Line 41 uses lstrip("./") which removes any combination of "." and "/" characters from the left side, not just the "./" prefix. This transforms valid paths like .github/workflows into github/workflows, causing override lookups to fail silently. Dot-prefixed paths (.github, .hidden, etc.) will have their manual overrides ignored in the CODEOWNERS file.
🔧 Proposed fix
```diff
-overrides[path.lstrip("./").rstrip("/")] = users
+normalized_path = path.strip()
+if normalized_path.startswith("./"):
+    normalized_path = normalized_path[2:]
+elif normalized_path.startswith("/"):
+    normalized_path = normalized_path[1:]
+normalized_path = normalized_path.rstrip("/")
+if not normalized_path:
+    raise ValueError("Override path cannot be empty")
+overrides[normalized_path] = users
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/codeowner_analyzer.py` at line 41, The path normalization currently
uses path.lstrip("./").rstrip("/") which strips any leading dots or slashes and
corrupts dot-prefixed paths; change the normalization to only remove a leading
"./" when present (e.g. use path.removeprefix("./").rstrip("/") or an equivalent
startswith check) so that entries like ".github/workflows" keep their leading
dot; update the assignment that sets overrides[...] = users accordingly (look
for the overrides[...] assignment using the path variable).
```python
def _format_codeowners_path(self, path: str) -> str:
    """Format a path for CODEOWNERS (append / for directories)."""
    return path if self._is_file_path(path) else f"{path}/"
```
Extensionless files are misclassified as directories.
_is_file_path relies on a dot in basename, so valid files like Dockerfile/LICENSE get a trailing /, generating wrong CODEOWNERS patterns.
🔧 Proposed fix
```diff
 def _format_codeowners_path(self, path: str) -> str:
-    """Format a path for CODEOWNERS (append / for directories)."""
-    return path if self._is_file_path(path) else f"{path}/"
+    """Format a path for CODEOWNERS (append / only for real directories)."""
+    normalized = path.rstrip("/")
+    abs_path = self.repo_path / normalized
+    if abs_path.is_dir():
+        return f"{normalized}/"
+    return normalized
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
def _format_codeowners_path(self, path: str) -> str:
    """Format a path for CODEOWNERS (append / only for real directories)."""
    normalized = path.rstrip("/")
    abs_path = self.repo_path / normalized
    if abs_path.is_dir():
        return f"{normalized}/"
    return normalized
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/codeowner_analyzer.py` around lines 595 - 598,
_format_codeowners_path currently appends "/" based on _is_file_path which
misclassifies extensionless filenames (e.g., Dockerfile, LICENSE) as
directories; change the logic so that _format_codeowners_path (or _is_file_path)
treats a path as a file if the basename contains a dot OR the basename matches a
small whitelist of common extensionless filenames (e.g.,
"Dockerfile","LICENSE","README","Makefile") and also respect an explicit
trailing "/" (i.e., if path.endswith("/") treat as directory), then return path
or f"{path}/" accordingly; update the function that calls or implements
_is_file_path (_format_codeowners_path) to use this combined rule.
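The combined rule the prompt describes (dot in basename, a small whitelist of common extensionless filenames, or an explicit trailing slash) might be sketched as follows; the whitelist contents are an assumption taken from the prompt's examples, not from the actual codebase:

```python
# Assumed whitelist of common extensionless filenames, per the prompt's examples.
KNOWN_EXTENSIONLESS_FILES = {"Dockerfile", "LICENSE", "README", "Makefile"}


def format_codeowners_path(path: str) -> str:
    """Append / only when the path looks like a directory."""
    if path.endswith("/"):
        return path  # explicit trailing slash -> treat as directory
    basename = path.rsplit("/", 1)[-1]
    is_file = "." in basename or basename in KNOWN_EXTENSIONLESS_FILES
    return path if is_file else f"{path}/"
```

Unlike the filesystem-based `is_dir()` fix above, this heuristic needs no repository checkout, but it will still misclassify extensionless files absent from the whitelist.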
@aleozlx seems recent PRs have marked all org members as reviewers, is it expected? Could this PR fix the issue?
hi i think i saw some but not every new PR. i clicked into a few recent ones and seemed reasonable
It appears that
@bkryu found out this line may have caused the issue (right-hand side, line 7). i can investigate what caused that line in codeowner_analyzer.py and figure out a fix if it's not expected, @yzh119
will revisit this for targeted testing after i implement #2815, and then we can do label-based targeted testing

📌 Description
Restructure codeowner_overrides.json from a flat dict to a grouped array where each group has `labels` (for PR auto-labeling) and `owners` (path-to-owner mappings). All existing path→owner mappings are preserved.

Update & simplify codeowner_analyzer.py.

Add pr-auto-label.yml workflow that reads the grouped overrides, matches changed files by path prefix, and additively applies labels. Uses pull_request_target for safe write access on external PRs. Skips PRs older than 21 days.
🔍 Related Issues
🚀 Pull Request Checklist
Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.
✅ Pre-commit Checks

- [ ] I have installed `pre-commit` by running `pip install pre-commit` (or used your preferred method).
- [ ] I have installed the hooks with `pre-commit install`.
- [ ] I have run the hooks with `pre-commit run --all-files` and fixed any reported issues.

🧪 Tests

- [ ] Tests have been added or updated as needed (e.g., `unittest`, etc.).

Reviewer Notes
Summary by CodeRabbit
Chores
New Features