ci: Create GitHub Action to Automate CODEOWNER update #1870
This reverts commit 3934511.
Testing if the token has permission to create PRs from pull_request trigger. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Use workflow_dispatch for testing instead to ensure secrets are accessible. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Summary of Changes: Hello @yzh119, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request enhances the …
Code Review
This pull request adds configurability for directory depth and the number of top owners in the codeowner_analyzer.py script. The changes are logical and correctly plumb the new parameters through the script. However, I've found a high-severity issue where negative values for the new command-line arguments can lead to incorrect and unexpected behavior. I've also noted that tests for this new functionality are missing. Please see my detailed comments for suggestions on how to address these points.
```python
parser.add_argument(
    "--depth",
    type=int,
    default=3,
    help="Maximum directory depth for module detection (default: 3)",
)
parser.add_argument(
    "--top-n",
    type=int,
    default=3,
    help="Number of top owners to include in CODEOWNERS file (default: 3)",
)
```
The new command-line arguments `--depth` and `--top-n` are defined to accept any integer. This allows negative values, which can lead to incorrect behavior:
- A negative `--depth` will cause `min(len(path_parts), self.max_depth)` to return a negative number, resulting in no modules being detected, without any warning.
- A negative `--top-n` will be interpreted as a slice from the end of the owners list due to Python's slicing behavior (e.g., `-1` selects all but the last owner). This is not the intended behavior for selecting the top N owners and will produce an incorrect `CODEOWNERS` file.
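To make the two failure modes concrete, this standalone snippet reproduces the Python behavior described above. The owner names, the path components, and the `range(1, depth + 1)` loop shape are illustrative assumptions; only the `min(...)` expression comes from the comment.

```python
# Hypothetical owners, already sorted by score (highest first).
owners = ["alice", "bob", "carol", "dave"]

# A negative --top-n used as a slice bound counts from the end of the list.
print(owners[:2])   # ['alice', 'bob']
print(owners[:-1])  # ['alice', 'bob', 'carol'] (all but the last owner)

# A negative --depth makes the depth bound negative, so a loop over depth
# levels (assumed here to be range(1, depth + 1)) produces nothing.
path_parts = ["flashinfer", "jit", "core"]  # hypothetical directory components
max_depth = -1
depth = min(len(path_parts), max_depth)
print(depth)                      # -1
print(list(range(1, depth + 1)))  # []
```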
To prevent this, validate that these arguments are non-negative integers. You can do this with a custom `type` function for argparse.
Here is an example of a validator function you could add before `main()`:
```python
import argparse


def non_negative_int(value):
    """Custom argparse type for non-negative integers."""
    try:
        ivalue = int(value)
    except ValueError as e:
        raise argparse.ArgumentTypeError(f"{value} is not an integer") from e
    if ivalue < 0:
        raise argparse.ArgumentTypeError(f"{value} is an invalid non-negative int value")
    return ivalue
```
You can then use it in `add_argument` like this:
```python
type=non_negative_int
```
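A self-contained sketch of how the validator plugs into both flags and what happens on bad input. `build_parser` and `parses` are hypothetical helpers (the real script builds its parser inside `main()`). argparse turns an `ArgumentTypeError` into a usage error and exits with status 2, so the sketch catches `SystemExit`.

```python
import argparse


def non_negative_int(value: str) -> int:
    """argparse type that accepts only integers >= 0."""
    try:
        ivalue = int(value)
    except ValueError as e:
        raise argparse.ArgumentTypeError(f"{value} is not an integer") from e
    if ivalue < 0:
        raise argparse.ArgumentTypeError(f"{value} must be >= 0")
    return ivalue


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical helper mirroring only the two flags discussed in this review.
    parser = argparse.ArgumentParser(prog="codeowner_analyzer.py")
    parser.add_argument("--depth", type=non_negative_int, default=3)
    parser.add_argument("--top-n", type=non_negative_int, default=3)
    return parser


def parses(argv: list[str]) -> bool:
    """Return True if argv is accepted, False if argparse rejects it."""
    try:
        build_parser().parse_args(argv)
    except SystemExit:  # argparse prints usage and exits on invalid values
        return False
    return True


args = build_parser().parse_args(["--depth", "0", "--top-n", "5"])
print(args.depth, args.top_n)     # 0 5
print(parses(["--top-n", "-1"]))  # False
print(parses(["--depth", "abc"]))  # False
```

Zero stays valid on purpose, matching the edge cases (`--depth=0`, `--top-n=0`) the review asks to test.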
While adding these new configuration options is a great improvement, the pull request is missing tests to verify the new functionality. Adding tests is crucial for ensuring the features work as expected and to prevent future regressions.
Please consider adding unit tests that cover:
- The `--depth` argument correctly limits the module hierarchy being generated.
- The `--top-n` argument correctly selects the specified number of top owners.
- Edge cases, such as `--depth=0` or `--top-n=0`.
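The analyzer's internals are not shown in this thread, so the sketch below tests a hypothetical pure helper, `module_prefixes`, standing in for depth-limited module discovery. If the script exposes (or is refactored to expose) an equivalent function, tests of this shape would cover the `--depth` cases listed above, including `--depth=0`; they run under pytest or as plain function calls.

```python
from pathlib import PurePosixPath


def module_prefixes(file_path: str, max_depth: int) -> list[str]:
    """Hypothetical helper: directory prefixes of file_path, up to max_depth levels."""
    dirs = PurePosixPath(file_path).parent.parts
    depth = min(len(dirs), max_depth)
    return ["/".join(dirs[:i]) for i in range(1, depth + 1)]


def test_depth_limits_hierarchy():
    assert module_prefixes("a/b/c/d.py", 2) == ["a", "a/b"]


def test_depth_larger_than_path():
    assert module_prefixes("a/b.py", 5) == ["a"]


def test_depth_zero_yields_no_modules():
    assert module_prefixes("a/b/c/d.py", 0) == []


def test_top_level_file_has_no_module():
    assert module_prefixes("setup.py", 3) == []
```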
LGTM
Should we add `tests/` to the CODEOWNERS anyway (I noticed it was intentionally excluded)? If a test fails, it may be good to find the person who is most knowledgeable about it. Sometimes it's easy to find the associated module, sometimes it isn't. Just a nitpick, since we could also use `git blame`.
Updated the script to use the GitHub CLI to extract usernames. Added …
Great suggestion! We can work on this in later PRs :) @nvmbreughe
nvmbreughe left a comment
LGTM. Left a minor suggestion.
Note: Other AI code review bot(s) detected. CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.
Walkthrough
Weekly-scheduled and on-demand workflow added to regenerate .github/CODEOWNERS via a refactored analyzer that replaces GitHub API token use with GitHub CLI-based username lookups, adds configurable max depth and top-owner limits, and uses an authorized users list.
Sequence Diagram(s)
```mermaid
sequenceDiagram
    participant Scheduler as GitHub Scheduler
    participant Runner as Actions Runner
    participant Analyzer as codeowner_analyzer.py
    participant GH as GitHub CLI (gh)
    participant Repo as Repository
    participant PR as PR Creator (peter-evans)
    Scheduler->>Runner: Trigger (weekly or manual)
    Runner->>Runner: Checkout repo, setup Python 3.11
    Runner->>Analyzer: Run (e.g., --depth 3 --top-n 5)
    Analyzer->>Analyzer: Verify `gh` exists (fail if missing)
    Analyzer->>Repo: Scan commits (min_commits, days_back)
    Analyzer->>Analyzer: Discover modules (limit by max_depth)
    loop per author email
        Analyzer->>GH: Query commits / author login via gh
        GH-->>Analyzer: username or no-result
    end
    Analyzer->>Analyzer: Score owners, pick top_n_owners (score>0.1)
    Analyzer->>Repo: Write .github/CODEOWNERS
    Runner->>Runner: Detect diff (new/modified)
    alt changes detected
        Runner->>PR: Create PR (peter-evans/create-pull-request)
        PR->>Repo: Push branch, open PR, set metadata, delete branch after merge
    else no changes
        Runner->>Runner: Skip PR creation
    end
```
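The scoring step in the diagram ("pick top_n_owners (score>0.1)") can be sketched as a small pure function. The 0.1 cutoff comes from the diagram; the function name, the `dict` input shape, and the tie-break on username are assumptions for illustration, not the analyzer's actual code.

```python
def select_top_owners(
    scores: dict[str, float], top_n: int, min_score: float = 0.1
) -> list[str]:
    """Hypothetical: return up to top_n owners whose score exceeds min_score."""
    if top_n < 0:
        raise ValueError("top_n must be non-negative")
    eligible = [(owner, s) for owner, s in scores.items() if s > min_score]
    # Highest score first; ties broken by name so the output is deterministic.
    eligible.sort(key=lambda item: (-item[1], item[0]))
    return [owner for owner, _ in eligible[:top_n]]


scores = {"alice": 0.9, "bob": 0.4, "carol": 0.4, "dave": 0.05}
print(select_top_owners(scores, 2))  # ['alice', 'bob']
print(select_top_owners(scores, 5))  # ['alice', 'bob', 'carol']
```

A deterministic tie-break keeps the generated CODEOWNERS lines stable between weekly runs, which avoids opening PRs for reorderings alone.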
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks and finishing touches: ✅ Passed checks (3 passed)
Actionable comments posted: 2
♻️ Duplicate comments (3)
scripts/codeowner_analyzer.py (2)
588-606: Add tests for new flags (`--depth`, `--top-n`) and edge cases
The new CLI options lack tests (including 0 values). Please add minimal unit tests for module depth limiting and owner count selection. Previously raised.
I can scaffold tests (pytest) that feed a small fixture repo and assert the output lines for given flags. Want me to open a PR?
633-644: Validate `--depth` and `--top-n` as non-negative integers
Negative values silently break behavior (an empty `range` for depth; Python slicing semantics for negative top-n). Add a validator and use it here. This was raised earlier and is still unresolved.
Apply:
```diff
-    parser.add_argument(
-        "--depth",
-        type=int,
-        default=3,
-        help="Maximum directory depth for module detection (default: 3)",
-    )
+    parser.add_argument(
+        "--depth",
+        type=non_negative_int,
+        default=3,
+        help="Maximum directory depth for module detection (non-negative; default: 3)",
+    )
@@
-    parser.add_argument(
-        "--top-n",
-        type=int,
-        default=3,
-        help="Number of top owners to include in CODEOWNERS file (default: 3)",
-    )
+    parser.add_argument(
+        "--top-n",
+        type=non_negative_int,
+        default=3,
+        help="Number of top owners to include in CODEOWNERS file (non-negative; default: 3)",
+    )
```
Add the validator near the imports:
```diff
+import argparse
@@
+def non_negative_int(value: str) -> int:
+    try:
+        ivalue = int(value)
+    except ValueError as e:
+        raise argparse.ArgumentTypeError(f"{value} is not an integer") from e
+    if ivalue < 0:
+        raise argparse.ArgumentTypeError(f"{value} must be >= 0")
+    return ivalue
```
.github/workflows/update-codeowners.yml (1)
7-9: Remove `pull_request` trigger to prevent unintended re-runs and resource waste
Your concern is valid. The workflow currently has `pull_request:` on line 9, which contradicts the git history (commits f038549 and 7547693 show prior attempts to remove it). Since this workflow creates pull requests via `peter-evans/create-pull-request@v7`, the trigger can cause unnecessary re-runs on unrelated PRs that modify `.github/CODEOWNERS`.
Recommended fix: Remove the `pull_request:` trigger entirely, as the scheduled and manual triggers suffice for the workflow's intent. If you must retain it for specific scenarios, scope it with a path filter:
```diff
 on:
   schedule:
     # Run weekly on Monday at 00:00 UTC
     - cron: '0 0 * * 1'
   workflow_dispatch:
     # Allow manual triggering
-  pull_request:
+  pull_request:
+    paths:
+      - '.github/CODEOWNERS'
```
🧹 Nitpick comments (8)
scripts/authorized_codeowner.txt (1)
1-21: Normalize and document the allowlist file for longevity
- Convert all handles to lowercase (your code lowercases, but the file looks mixed-case).
- Sort the list and ensure a trailing newline to reduce diff churn and ease reviews.
- Add a short header comment explaining the source of truth and review process.
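If the allowlist gains a comment header and mixed-case handles are tolerated, the loader has to skip `#` lines and blanks and lowercase each entry. Whether the current script already skips comments is not visible in this thread, so `parse_allowed_users` and `load_allowed_users` below are hypothetical sketches.

```python
from pathlib import Path


def parse_allowed_users(text: str) -> set[str]:
    """Hypothetical parser: one GitHub handle per line; '#' starts a comment."""
    users = set()
    for line in text.splitlines():
        handle = line.split("#", 1)[0].strip()
        if handle:
            # GitHub usernames are case-insensitive, so compare in lowercase.
            users.add(handle.lower())
    return users


def load_allowed_users(path: str) -> set[str]:
    """Read the allowlist file (e.g. scripts/authorized_codeowner.txt)."""
    return parse_allowed_users(Path(path).read_text(encoding="utf-8"))
```

GitHub usernames cannot contain `#`, so treating it as a comment marker cannot truncate a real handle.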
Apply minimal hygiene:
```diff
+# Authorized GitHub usernames for CODEOWNERS generation
+# One per line, lowercase. Keep sorted. Update via PR only.
 aleozlx
-Amir-19
+amir-19
-Anerudhan
+anerudhan
 azhurkevich
 bkryu
 cyx-6
 dierksen
-IwakuraRein
+iwakurarein
 joker-eph
 kahyunnam
 kaixih
 nv-yunzheq
 nvmbreughe
 paul841029
-Quackens
+quackens
 sergachev
 sunggg
 ttyio
 wenscarl
 yongwww
 yzh119
```
.github/workflows/update-codeowners.yml (2)
14-18: Add concurrency to avoid overlapping weekly runs
Prevents two scheduled jobs from racing if a previous one runs long.
```diff
 jobs:
   update-codeowners:
     runs-on: ubuntu-latest
     timeout-minutes: 30
+    concurrency:
+      group: update-codeowners-${{ github.ref }}
+      cancel-in-progress: false
```
25-29: De-duplicate configuration to prevent drift
`days-back`, `min-commits`, `depth`, and `top-n` appear both in the run step and in the PR body. Use job-level `env` and reference them in both places to keep them in sync.
```diff
   update-codeowners:
     runs-on: ubuntu-latest
     timeout-minutes: 30
+    env:
+      DAYS_BACK: 180
+      MIN_COMMITS: 1
+      DEPTH: 3
+      TOP_N: 5
@@
           python scripts/codeowner_analyzer.py \
             --output .github/CODEOWNERS \
-            --depth 3 \
-            --min-commits 1 \
-            --days-back 180 \
-            --top-n 5 \
+            --depth $DEPTH \
+            --min-commits $MIN_COMMITS \
+            --days-back $DAYS_BACK \
+            --top-n $TOP_N \
             --allowed-users-file scripts/authorized_codeowner.txt
@@
-            - Minimum commits threshold: 1
-            - Analysis period: 180 days
-            - Directory depth: 3 levels
-            - Top N owners per module: 5
+            - Minimum commits threshold: ${{ env.MIN_COMMITS }}
+            - Analysis period: ${{ env.DAYS_BACK }} days
+            - Directory depth: ${{ env.DEPTH }} levels
+            - Top N owners per module: ${{ env.TOP_N }}
```
Also applies to: 83-105
scripts/codeowner_analyzer.py (5)
274-276: Tests exclusion logic is confusing; simplify to match intent
The current check excludes any path containing “test” unless it starts with “./tests/”. `git ls-files` typically doesn’t prefix paths with “./”, so this effectively excludes `tests/` anyway. If the intent is to exclude tests for now, simplify:
```diff
-            # Skip test directories unless specifically analyzing tests
-            if "test" in file_path.lower() and not file_path.startswith("./tests/"):
-                continue
+            # Skip tests by default (can add a CLI flag later to include)
+            parts = Path(file_path).parts
+            if any(p.lower() in {"test", "tests"} for p in parts):
+                continue
```
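Running the current check and the suggested replacement on a few hypothetical paths shows that they are not equivalent: the current substring match also drops any path merely containing "test" (such as a `testing/` package), while the suggested version matches whole path components only. The sketch uses `PurePosixPath` instead of `Path` so the result does not depend on the host OS; that swap is an assumption, not part of the suggestion.

```python
from pathlib import PurePosixPath


def skipped_current(file_path: str) -> bool:
    # Check as quoted in the review comment.
    return "test" in file_path.lower() and not file_path.startswith("./tests/")


def skipped_suggested(file_path: str) -> bool:
    # Suggested replacement: skip only when a whole component is test/tests.
    parts = PurePosixPath(file_path).parts
    return any(p.lower() in {"test", "tests"} for p in parts)


# Hypothetical paths, in the prefix-less form `git ls-files` prints.
paths = ["tests/test_norm.py", "flashinfer/testing/utils.py", "csrc/norm.cu"]
for p in paths:
    print(p, skipped_current(p), skipped_suggested(p))
# tests/test_norm.py True True
# flashinfer/testing/utils.py True False
# csrc/norm.cu False False
```

If the substring behavior was intentional, the simplification changes which modules receive owners and is worth calling out in the PR.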
84-99: Hard dependency on gh CLI: consider graceful fallback or clearer error
Failing early is fine, but the multi-line ValueError trips TRY003 and makes local use noisy. Either degrade gracefully when `allowed_users` is None, or tighten the error.
```diff
         except (
             subprocess.CalledProcessError,
             FileNotFoundError,
             subprocess.TimeoutExpired,
         ) as e:
-            raise ValueError(
-                "GitHub CLI (gh) is not installed or not available in PATH.\n"
-                "Please install it from: https://cli.github.com/\n"
-                "Or use package manager: brew install gh / apt install gh / etc."
-            ) from e
+            raise ValueError("GitHub CLI (gh) not found; install it to resolve contributors to @usernames.") from e
```
Optionally, allow running without `gh` by skipping username lookups and relying on emails (valid in most cases). (docs.github.com)
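One way to implement the optional fallback is to probe for `gh` once with `shutil.which` and, if it is missing, skip username resolution so owners fall back to email. This is a sketch of that idea with hypothetical `gh_available` and `resolve_owner` helpers, not the script's current behavior; the `lookup` callable stands in for the real `gh api` query.

```python
import shutil
from typing import Callable, Optional


def gh_available(executable: str = "gh") -> bool:
    """True if the executable is found on PATH."""
    return shutil.which(executable) is not None


def resolve_owner(
    email: str,
    lookup: Callable[[str], Optional[str]],
    use_gh: bool,
) -> str:
    """Hypothetical: '@username' when gh is usable and finds one, else the email."""
    if use_gh:
        username = lookup(email)
        if username:
            return f"@{username}"
    return email


print(resolve_owner("dev@example.com", lambda e: "dev", use_gh=False))  # dev@example.com
print(resolve_owner("dev@example.com", lambda e: "dev", use_gh=True))   # @dev
print(resolve_owner("dev@example.com", lambda e: None, use_gh=True))    # dev@example.com
```

In the analyzer this would look like `use_gh = gh_available()` at startup, with a single warning instead of a hard failure.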
170-189: gh API call robustness and auth
- Ensure `gh` is authenticated via `GH_TOKEN` (the workflow fix addresses this). (docs.github.com)
- Consider adding `--hostname $GH_HOST` support for GHE if ever needed.
- The query already sets `per_page=1`; ordering parameters such as `order=desc` or `sort` are not guaranteed for this endpoint, so if you need the most recent match, make that explicit.
```python
        gh_command = [
            "gh",
            "api",
            f"repos/{repo_full}/commits?author={email}&per_page=1",
            "--jq",
            ".[0].author.login // empty",
        ]
```
Note: keep using argument lists to avoid shell injection; this is already safe.
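A sketch of the lookup with two small hardening steps that are assumptions, not part of the review's snippet: the email is percent-encoded before it goes into the query string (addresses can contain `+`, which query-string parsing may decode as a space), and the subprocess runner is injectable so the function can be unit-tested without network access. The `gh api ... --jq` invocation itself is the one from the review; `github_login_for_email` is a hypothetical name.

```python
import subprocess
from typing import Callable, Optional
from urllib.parse import quote


def github_login_for_email(
    repo_full: str,
    email: str,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> Optional[str]:
    """Return the login on the first commit the API returns for `email`, or None."""
    gh_command = [
        "gh",
        "api",
        # Percent-encode so characters like '+' and '@' survive the query string.
        f"repos/{repo_full}/commits?author={quote(email, safe='')}&per_page=1",
        "--jq",
        ".[0].author.login // empty",
    ]
    try:
        # Argument list, no shell: the email is never interpreted by a shell.
        result = run(gh_command, capture_output=True, text=True, check=True, timeout=30)
    except (subprocess.CalledProcessError, subprocess.TimeoutExpired, FileNotFoundError):
        return None
    login = result.stdout.strip()
    return login or None
```

Catching only these three exception types (rather than a blind `except Exception`) also addresses the BLE001 finding listed under Ruff.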
523-547: Owners fallback to email is acceptable, but note EMU caveat
Using email when a GitHub username can’t be resolved is allowed on GitHub.com but not for Enterprise Managed Users. Document this in a comment to avoid surprises if the repo ever moves orgs. (docs.github.com)
```diff
-        # Fallback to email if no GitHub username found
+        # Fallback to email if no GitHub username found.
+        # Note: Email owners are supported on GitHub.com, but not for EMU accounts.
```
251-261: Subprocess error handling: avoid noisy stdout, keep stderr structured
`run_git_command` prints errors directly; prefer raising with context or logging at debug level to keep action logs clean.
```diff
-        except subprocess.CalledProcessError as e:
-            print(f"Error running git command: {' '.join(command)}")
-            print(f"Error: {e.stderr}")
-            return ""
+        except subprocess.CalledProcessError:
+            return ""
```
Optionally add a `--debug` flag to toggle verbose errors.
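The suggested diff drops the messages entirely, while the comment recommends logging at debug level. A sketch of the logging variant, assuming the standard-library `logging` module and a command passed as an argument list (the real `run_git_command` signature is not shown in this thread):

```python
import logging
import subprocess
import sys

logger = logging.getLogger("codeowner_analyzer")


def run_git_command(command: list[str]) -> str:
    """Run a command and return stdout; on failure, log at debug level and return ''."""
    try:
        result = subprocess.run(command, capture_output=True, text=True, check=True)
    except subprocess.CalledProcessError as e:
        logger.debug("command failed: %s\n%s", " ".join(command), e.stderr)
        return ""
    return result.stdout


# Portable stand-ins for git commands, using the current Python interpreter.
print(run_git_command([sys.executable, "-c", "print('ok')"]).strip())  # ok
print(repr(run_git_command([sys.executable, "-c", "import sys; sys.exit(1)"])))  # ''
```

A hypothetical `--debug` flag would then only need `logging.basicConfig(level=logging.DEBUG if args.debug else logging.WARNING)` in `main()`.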
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- .github/workflows/update-codeowners.yml (1 hunks)
- scripts/authorized_codeowner.txt (1 hunks)
- scripts/codeowner_analyzer.py (10 hunks)
🧰 Additional context used
🪛 LanguageTool
scripts/authorized_codeowner.txt
[grammar] ~16-~16: Language error found
Context: ...vmbreughe paul841029 Quackens sergachev sunggg ttyio wenscarl yongwww yzh119
(QB_NEW_NL_OTHER_ERROR_IDS_REPLACEMENT_OTHER)
🪛 Ruff (0.14.0)
scripts/codeowner_analyzer.py
87-87: Starting a process with a partial executable path
(S607)
94-98: Avoid specifying long messages outside the exception class
(TRY003)
180-180: subprocess call: check for execution of untrusted input
(S603)
188-188: Consider moving this statement to an else block
(TRY300)
190-190: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Deploy Docs
```yaml
env:
  GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: |
  python scripts/codeowner_analyzer.py \
    --output .github/CODEOWNERS \
    --depth 3 \
    --min-commits 1 \
    --days-back 180 \
    --top-n 5 \
    --allowed-users-file scripts/authorized_codeowner.txt
```
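Authenticate gh with GH_TOKEN, not just GITHUB_TOKEN
The analyzer calls `gh api`. GitHub's Actions documentation sets `GH_TOKEN` for `gh` steps; `gh` does fall back to `GITHUB_TOKEN` when `GH_TOKEN` is unset, but setting `GH_TOKEN` explicitly makes the intent obvious. Set both for clarity. (docs.github.com)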
Apply:
```diff
       - name: Run CODEOWNERS analyzer
         env:
-          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} # optional, keeps other tooling happy
         run: |
           python scripts/codeowner_analyzer.py \
             --output .github/CODEOWNERS \
             --depth 3 \
             --min-commits 1 \
             --days-back 180 \
             --top-n 5 \
             --allowed-users-file scripts/authorized_codeowner.txt
```
🤖 Prompt for AI Agents
.github/workflows/update-codeowners.yml lines 31-41: the workflow only sets
GITHUB_TOKEN but the codeowner analyzer uses the GitHub CLI which expects
GH_TOKEN; update the env block to also export GH_TOKEN by assigning GH_TOKEN:
${{ secrets.GITHUB_TOKEN }} (so both GITHUB_TOKEN and GH_TOKEN are available to
the run) and ensure the GH_TOKEN line is added alongside the existing
GITHUB_TOKEN entry.
```yaml
commit-message: |
  chore: update CODEOWNERS based on git history

  Auto-generated CODEOWNERS update based on commit activity over the last 180 days.

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude <noreply@anthropic.com>
branch: auto-update-codeowners
```
Remove external vendor branding from commit body
Commit message body includes references to “Claude Code” and “Co-Authored-By: Claude.” Prefer neutral/first‑party wording to avoid brand/legal noise.
```diff
           commit-message: |
             chore: update CODEOWNERS based on git history
 
             Auto-generated CODEOWNERS update based on commit activity over the last 180 days.
-
-            🤖 Generated with [Claude Code](https://claude.com/claude-code)
-
-            Co-Authored-By: Claude <noreply@anthropic.com>
```
🤖 Prompt for AI Agents
.github/workflows/update-codeowners.yml lines 68-76: the autogenerated commit
body includes external vendor branding ("Claude Code" and "Co-Authored-By:
Claude"); remove or replace those references with neutral first‑party wording.
Update the commit-message block to omit the "🤖 Generated with [Claude Code]..."
line and remove the "Co-Authored-By" trailer, or replace with a neutral note
like "Auto-generated" and ensure the branch and message remain intact.
📌 Description
Duplicate of #1869 but created from flashinfer/workflow-update-codeowner to make sure we have permission.
This PR introduces a GitHub Action that automatically creates pull requests to update the CODEOWNERS file.
This PR also adds two options to `codeowner_analyzer.py`:
- `--depth`: Maximum directory depth for module detection (default: 3)
- `--top-n`: Number of top owners to include in the CODEOWNERS file (default: 3)
An example can be found at https://github.com/flashinfer-ai/flashinfer/pull/1871/files
🔍 Related Issues
🚀 Pull Request Checklist
Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.
✅ Pre-commit Checks
- … `pre-commit` by running `pip install pre-commit` (or used your preferred method).
- … `pre-commit install`.
- … `pre-commit run --all-files` and fixed any reported issues.
🧪 Tests
- … `unittest`, etc.).
Reviewer Notes
cc @sricketts @yongwww @nvmbreughe @bkryu @dierksen
Summary by CodeRabbit
New Features
Chores
Behavior