Skip to content
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
64 changes: 64 additions & 0 deletions .github/workflows/pr-auto-label.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,64 @@
name: PR Auto-Label

on:
pull_request_target:
types: [opened, synchronize]

permissions:
contents: read
pull-requests: write

jobs:
label:
runs-on: ubuntu-latest
steps:
- name: Skip PRs older than 21 days
env:
PR_CREATED: ${{ github.event.pull_request.created_at }}
run: |
created=$(date -d "$PR_CREATED" +%s)
cutoff=$(date -d '21 days ago' +%s)
if [ "$created" -lt "$cutoff" ]; then
echo "PR older than 21 days, skipping"
exit 1
fi

- name: Sparse checkout overrides file from base branch
uses: actions/checkout@v4
with:
sparse-checkout: scripts/codeowner_overrides.json
sparse-checkout-cone-mode: false

- name: Apply labels based on changed files
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
PR_NUMBER: ${{ github.event.pull_request.number }}
run: |
# Fetch changed files in the PR
changed_files=$(gh api "repos/${{ github.repository }}/pulls/${PR_NUMBER}/files" \
--paginate --jq '.[].filename')

# Read override groups and apply matching labels
jq -c '.[]' scripts/codeowner_overrides.json | while read -r group; do
readarray -t labels < <(echo "$group" | jq -r '.labels[]')
readarray -t prefixes < <(echo "$group" | jq -r '.owners | keys[]')

match=false
for prefix in "${prefixes[@]}"; do
for file in $changed_files; do
case "$file" in
"$prefix"*) match=true; break ;;
esac
done
if $match; then break; fi
done

if $match; then
for label in "${labels[@]}"; do
echo "Adding label: $label"
gh pr edit "$PR_NUMBER" \
--repo "${{ github.repository }}" \
--add-label "$label" || true
done
Comment thread
coderabbitai[bot] marked this conversation as resolved.
fi
done
208 changes: 86 additions & 122 deletions scripts/codeowner_analyzer.py
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,31 @@
import re


def load_overrides(filename: str) -> Dict[str, List[str]]:
"""Load codeowner overrides from a grouped JSON array file.

Returns a flat dict mapping path prefixes to lists of owners.
"""
with open(filename, "r") as f:
raw = json.load(f)

if not isinstance(raw, list):
raise ValueError("Overrides file must contain a JSON array")

overrides: Dict[str, List[str]] = {}
for group in raw:
if not isinstance(group, dict) or not isinstance(group.get("owners"), dict):
raise ValueError("Each group must be an object with an 'owners' dict")
for path, users in group["owners"].items():
if not isinstance(users, list) or not all(
isinstance(u, str) for u in users
):
raise ValueError(f"Override for '{path}' must be a list of strings")
overrides[path.lstrip("./").rstrip("/")] = users
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# First, explore the file structure and read the relevant section
head -60 scripts/codeowner_analyzer.py | tail -30

Repository: flashinfer-ai/flashinfer

Length of output: 1082


🏁 Script executed:

#!/bin/bash
# Get full context around line 41
sed -n '30,55p' scripts/codeowner_analyzer.py

Repository: flashinfer-ai/flashinfer

Length of output: 1022


🏁 Script executed:

#!/bin/bash
# Check how overrides dict is used in the codebase
rg -n "overrides" scripts/codeowner_analyzer.py -A 2 -B 2

Repository: flashinfer-ai/flashinfer

Length of output: 4981


🏁 Script executed:

#!/bin/bash
# Verify the lstrip behavior with actual test cases
python - <<'PY'
# Demonstrate lstrip behavior with dot-prefixed paths
test_cases = [
    ".github/workflows",
    "./src/module", 
    "/docs",
    "/.hidden",
    ".hidden/file",
    "normal/path"
]

print("Testing lstrip('./'):")
for path in test_cases:
    result = path.lstrip("./")
    print(f"  {path!r:25} -> {result!r}")

print("\nTesting strip() then manual prefix removal:")
for path in test_cases:
    normalized = path.strip()
    if normalized.startswith("./"):
        normalized = normalized[2:]
    elif normalized.startswith("/"):
        normalized = normalized[1:]
    normalized = normalized.rstrip("/")
    print(f"  {path!r:25} -> {result!r}")
PY

Repository: flashinfer-ai/flashinfer

Length of output: 666


lstrip("./") corrupts dot-prefixed override paths, breaking override matching.

Line 41 uses lstrip("./") which removes any combination of "." and "/" characters from the left side, not just the "./" prefix. This transforms valid paths like .github/workflows into github/workflows, causing override lookups to fail silently. Dot-prefixed paths (.github, .hidden, etc.) will have their manual overrides ignored in the CODEOWNERS file.

🔧 Proposed fix
-            overrides[path.lstrip("./").rstrip("/")] = users
+            normalized_path = path.strip()
+            if normalized_path.startswith("./"):
+                normalized_path = normalized_path[2:]
+            elif normalized_path.startswith("/"):
+                normalized_path = normalized_path[1:]
+            normalized_path = normalized_path.rstrip("/")
+            if not normalized_path:
+                raise ValueError("Override path cannot be empty")
+            overrides[normalized_path] = users
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/codeowner_analyzer.py` at line 41, The path normalization currently
uses path.lstrip("./").rstrip("/") which strips any leading dots or slashes and
corrupts dot-prefixed paths; change the normalization to only remove a leading
"./" when present (e.g. use path.removeprefix("./").rstrip("/") or an equivalent
startswith check) so that entries like ".github/workflows" keep their leading
dot; update the assignment that sets overrides[...] = users accordingly (look
for the overrides[...] assignment using the path variable).


Comment on lines +33 to +42
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Duplicate paths across groups are silently overwritten.

If the same path appears in multiple groups, the later group replaces earlier owners without warning. That can drop intended owners.

🔧 Proposed fix
-            overrides[normalized_path] = users
+            if normalized_path in overrides and overrides[normalized_path] != users:
+                raise ValueError(
+                    f"Duplicate override path with conflicting owners: '{normalized_path}'"
+                )
+            overrides[normalized_path] = users
🧰 Tools
🪛 Ruff (0.15.2)

[warning] 35-35: Prefer TypeError exception for invalid type

(TRY004)


[warning] 35-35: Avoid specifying long messages outside the exception class

(TRY003)


[warning] 40-40: Avoid specifying long messages outside the exception class

(TRY003)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/codeowner_analyzer.py` around lines 33 - 42, When building the
overrides dict in scripts/codeowner_analyzer.py (inside the for group in raw
loop that iterates group["owners"]), detect if a normalized path key already
exists in overrides and do not silently overwrite it; either raise a ValueError
identifying the duplicate path (including the path string) or merge the owner
lists with de-duplication so owners from both groups are preserved. Update the
logic around overrides[path.lstrip("./").rstrip("/")] = users to check
overrides.get(path_key) first and handle duplicates accordingly, referencing the
overrides variable and the group["owners"] iteration to locate the change.

return overrides


class CodeOwnersAnalyzer:
def __init__(
self,
Expand Down Expand Up @@ -524,62 +549,67 @@ def _normalize_usernames(self, users: List[str]) -> List[str]:
"""Normalize usernames to have @ prefix."""
return [f"@{u}" if not u.startswith("@") else u for u in users]

def _extract_computed_usernames(self, data: Dict[str, Any]) -> List[str]:
"""Extract deduplicated GitHub usernames from computed ownership data."""
usernames: List[str] = []
seen_lower: set = set()
if not data["owners"]:
return usernames
top_owners = [
o for o in data["owners"][: self.top_n_owners] if o["ownership_score"] > 0.1
]
for owner in top_owners:
gh = self.get_github_username(owner["author"])
name = f"@{gh}" if gh else owner["author"].split("<")[1].rstrip(">")
if name.lower() not in seen_lower:
usernames.append(name)
seen_lower.add(name.lower())
return usernames

def _merge_owners_with_overrides(
self, module: str, computed_usernames: List[str]
) -> List[str]:
"""Merge manual overrides with computed owners.

Override users are prepended to the list (indicating primary ownership),
and duplicates are removed. Override users are always included, even if
that exceeds top_n_owners. Only computed owners are subject to truncation.
"""Merge manual overrides with computed owners for an exact-match path.

Note: File-level overrides are handled separately in generate_codeowners_file(),
since there's no computed ownership for individual files.

Args:
module: The module path (e.g., "flashinfer/fused_moe")
computed_usernames: List of GitHub usernames from git history analysis
(with @ prefix, e.g., ["@alice", "@bob"])

Returns:
Merged list of usernames with overrides first (all overrides preserved)
Override users are prepended; computed users fill remaining slots up to
top_n_owners. The result is wider than either list alone.
"""
if not self.owner_overrides:
return computed_usernames[: self.top_n_owners]

# Skip file-level overrides (handled separately)
if self._is_file_path(module):
return computed_usernames[: self.top_n_owners]

# Get override users for this module (without @ prefix in the config)
override_users = self.owner_overrides.get(module, [])
if not override_users:
return computed_usernames[: self.top_n_owners]

# Normalize override users to have @ prefix
override_usernames = self._normalize_usernames(override_users)

# Build merged list: overrides first, then computed (excluding duplicates)
# Override users are always preserved; only computed users are truncated
# Use case-insensitive comparison since GitHub usernames are case-insensitive
merged = list(override_usernames)
merged = self._normalize_usernames(override_users)
merged_lower = {u.lower() for u in merged}
num_overrides = len(merged)
remaining_slots = max(0, self.top_n_owners - num_overrides)
remaining = max(0, self.top_n_owners - len(merged))

for username in computed_usernames:
if remaining <= 0:
break
if username.lower() not in merged_lower:
if remaining_slots > 0:
merged.append(username)
merged_lower.add(username.lower())
remaining_slots -= 1
merged.append(username)
merged_lower.add(username.lower())
remaining -= 1

return merged

def _format_codeowners_path(self, path: str) -> str:
"""Format a path for CODEOWNERS (append / for directories)."""
return path if self._is_file_path(path) else f"{path}/"

Comment on lines +595 to +598
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Extensionless files are misclassified as directories.

_is_file_path relies on a dot in basename, so valid files like Dockerfile/LICENSE get a trailing /, generating wrong CODEOWNERS patterns.

🔧 Proposed fix
     def _format_codeowners_path(self, path: str) -> str:
-        """Format a path for CODEOWNERS (append / for directories)."""
-        return path if self._is_file_path(path) else f"{path}/"
+        """Format a path for CODEOWNERS (append / only for real directories)."""
+        normalized = path.rstrip("/")
+        abs_path = self.repo_path / normalized
+        if abs_path.is_dir():
+            return f"{normalized}/"
+        return normalized
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def _format_codeowners_path(self, path: str) -> str:
"""Format a path for CODEOWNERS (append / for directories)."""
return path if self._is_file_path(path) else f"{path}/"
def _format_codeowners_path(self, path: str) -> str:
"""Format a path for CODEOWNERS (append / only for real directories)."""
normalized = path.rstrip("/")
abs_path = self.repo_path / normalized
if abs_path.is_dir():
return f"{normalized}/"
return normalized
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/codeowner_analyzer.py` around lines 595 - 598,
_format_codeowners_path currently appends "/" based on _is_file_path which
misclassifies extensionless filenames (e.g., Dockerfile, LICENSE) as
directories; change the logic so that _format_codeowners_path (or _is_file_path)
treats a path as a file if the basename contains a dot OR the basename matches a
small whitelist of common extensionless filenames (e.g.,
"Dockerfile","LICENSE","README","Makefile") and also respect an explicit
trailing "/" (i.e., if path.endswith("/") treat as directory), then return path
or f"{path}/" accordingly; update the function that calls or implements
_is_file_path (_format_codeowners_path) to use this combined rule.

def generate_codeowners_file(
self, results: Dict[str, Any], output_file: str = "CODEOWNERS"
) -> None:
"""Generate a CODEOWNERS file based on analysis."""
"""Generate a CODEOWNERS file from computed results and overrides.

All paths (computed and override-only) are emitted in alphabetical
order. For exact-match paths, override owners are prepended to the
computed list so the final owner set is wider. Alphabetical ordering
naturally puts parent dirs before children, which matters because
CODEOWNERS uses last-match-wins.
"""
override_paths = set(self.owner_overrides) if self.owner_overrides else set()
all_paths = sorted(set(results) | override_paths)

with open(output_file, "w") as f:
f.write("# Code Owners File\n")
f.write("# Generated automatically from git history analysis\n")
Expand All @@ -589,58 +619,23 @@ def generate_codeowners_file(
f.write("# Manual overrides applied from overrides file\n")
f.write("\n")

# Write directory entries (computed + merged overrides)
for module, data in results.items():
# Extract GitHub usernames from computed owners
# Use case-insensitive deduplication since the same user may appear
# multiple times with different email addresses
computed_usernames = []
seen_usernames_lower = set()
if data["owners"]:
# Take top N owners or those with ownership score > 0.1
top_owners = [
owner
for owner in data["owners"][: self.top_n_owners]
if owner["ownership_score"] > 0.1
]

for owner in top_owners:
github_username = self.get_github_username(owner["author"])
if github_username:
username = f"@{github_username}"
else:
# Fallback to email if no GitHub username found
username = owner["author"].split("<")[1].rstrip(">")

# Skip duplicates (case-insensitive)
if username.lower() not in seen_usernames_lower:
computed_usernames.append(username)
seen_usernames_lower.add(username.lower())

# Merge with overrides
final_usernames = self._merge_owners_with_overrides(
module, computed_usernames
)

if final_usernames:
owners_list = " ".join(final_usernames)
f.write(f"{module}/ {owners_list}\n")

# Write file-level overrides LAST (CODEOWNERS uses last-match-wins)
# This ensures file-specific overrides take precedence over directory patterns
if self.owner_overrides:
file_overrides = [
(path, users)
for path, users in self.owner_overrides.items()
if self._is_file_path(path)
]
if file_overrides:
f.write(
"\n# File-level overrides (must come last for precedence)\n"
for path in all_paths:
if path in results:
computed = self._extract_computed_usernames(results[path])
final = self._merge_owners_with_overrides(path, computed)
else:
# Override-only path (no computed match)
final = self._normalize_usernames(self.owner_overrides[path])

if final:
# Sanitize against newline injection
safe_path = (
self._format_codeowners_path(path)
.replace("\n", "")
.replace("\r", "")
)
for path, users in sorted(file_overrides):
usernames = self._normalize_usernames(users)
f.write(f"{path} {' '.join(usernames)}\n")
safe_owners = [u.replace("\n", "").replace("\r", "") for u in final]
f.write(f"{safe_path} {' '.join(safe_owners)}\n")

def print_detailed_report(self, results: Dict[str, Any]) -> None:
"""Print a detailed ownership report."""
Expand Down Expand Up @@ -777,46 +772,15 @@ def main() -> int:
owner_overrides = None
if args.overrides_file:
try:
with open(args.overrides_file, "r") as f:
owner_overrides = json.load(f)
if not isinstance(owner_overrides, dict):
print(
"Error: Overrides file must contain a JSON object (dict)",
file=sys.stderr,
)
return 1
# Validate structure: all values should be lists of strings
for path, users in owner_overrides.items():
if not isinstance(users, list):
print(
f"Error: Override for '{path}' must be a list of usernames",
file=sys.stderr,
)
return 1
for user in users:
if not isinstance(user, str):
print(
f"Error: Override users for '{path}' must be strings",
file=sys.stderr,
)
return 1

# Normalize paths: strip leading "./" and trailing "/"
owner_overrides = {
path.lstrip("./").rstrip("/"): users
for path, users in owner_overrides.items()
}
owner_overrides = load_overrides(args.overrides_file)
except FileNotFoundError:
print(
f"Error: Overrides file not found: {args.overrides_file}",
file=sys.stderr,
)
return 1
except json.JSONDecodeError as e:
print(f"Error: Invalid JSON in overrides file: {e}", file=sys.stderr)
return 1
except Exception as e:
print(f"Error reading overrides file: {e}", file=sys.stderr)
except (json.JSONDecodeError, ValueError) as e:
print(f"Error in overrides file: {e}", file=sys.stderr)
return 1

try:
Expand Down
68 changes: 44 additions & 24 deletions scripts/codeowner_overrides.json
Original file line number Diff line number Diff line change
@@ -1,24 +1,44 @@
{
"scripts/codeowner_overrides.json": ["@ci-users"],

"flashinfer/attention.py": ["@flashinfer-ai/attention-owners"],
"include/flashinfer/attention": ["@flashinfer-ai/attention-owners"],
"csrc/fmha_v2": ["@flashinfer-ai/attention-owners"],
"csrc/xqa": ["@flashinfer-ai/attention-owners"],
"tests/attention": ["@flashinfer-ai/attention-owners"],

"flashinfer/gemm": ["@flashinfer-ai/gemm-owners"],
"include/flashinfer/gemm": ["@flashinfer-ai/gemm-owners"],
"tests/gemm": ["@flashinfer-ai/gemm-owners"],

"flashinfer/fused_moe": ["@flashinfer-ai/moe-owners"],
"csrc/fused_moe": ["@flashinfer-ai/moe-owners"],
"tests/moe": ["@flashinfer-ai/moe-owners"],

"flashinfer/comm": ["@flashinfer-ai/comm-owners"],
"include/flashinfer/comm": ["@flashinfer-ai/comm-owners"],
"tests/comm": ["@flashinfer-ai/comm-owners"],

"flashinfer/norm.py": ["@flashinfer-ai/misc-op-owners"],
"tests/norm": ["@flashinfer-ai/misc-op-owners"]
}
[
{
"labels": ["op: attention"],
"owners": {
"flashinfer/attention.py": ["@flashinfer-ai/attention-owners"],
"include/flashinfer/attention": ["@flashinfer-ai/attention-owners"],
"csrc/fmha_v2": ["@flashinfer-ai/attention-owners"],
"csrc/xqa": ["@flashinfer-ai/attention-owners"],
"tests/attention": ["@flashinfer-ai/attention-owners"]
}
},
{
"labels": ["op: gemm"],
"owners": {
"flashinfer/gemm": ["@flashinfer-ai/gemm-owners"],
"include/flashinfer/gemm": ["@flashinfer-ai/gemm-owners"],
"tests/gemm": ["@flashinfer-ai/gemm-owners"]
}
},
{
"labels": ["op: moe"],
"owners": {
"flashinfer/fused_moe": ["@flashinfer-ai/moe-owners"],
"csrc/fused_moe": ["@flashinfer-ai/moe-owners"],
"tests/moe": ["@flashinfer-ai/moe-owners"]
}
},
{
"labels": ["op: comm"],
"owners": {
"flashinfer/comm": ["@flashinfer-ai/comm-owners"],
"include/flashinfer/comm": ["@flashinfer-ai/comm-owners"],
"tests/comm": ["@flashinfer-ai/comm-owners"]
}
},
{
"labels": ["op: misc"],
"owners": {
"scripts/codeowner_overrides.json": ["@ci-users"],
"flashinfer/norm.py": ["@flashinfer-ai/misc-op-owners"],
"tests/norm": ["@flashinfer-ai/misc-op-owners"]
}
}
]
Loading
Loading