feat: separate human vs bot activity in health dashboard #3917
marcusquinn merged 3 commits into main
Conversation
Replace the unreliable 'productive hours' metric with a commit-type breakdown that uses the committer email field to distinguish PR merges (GitHub squash-merge, committer = noreply@github.com) from direct commits. New columns: Direct | PR Merges | Total | Active Days | Avg/Day.

Add a cross-repo-summary command that aggregates activity across all pulse-enabled repos without revealing repo names (privacy-safe). Health issues now show per-repo stats and cross-repo totals.
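The classification heuristic described above can be sketched in a few lines of Python (an illustration only, not the helper script's actual code; the function names here are hypothetical). GitHub's squash-merge button always commits with the committer address `noreply@github.com`, so the committer field alone separates PR merges from direct pushes:

```python
# Classify commits as PR merges vs direct commits using the committer
# email, mirroring the heuristic described in the PR description.
def is_pr_merge(committer_email):
    # GitHub squash-merges are committed by the web-flow noreply account.
    return committer_email == "noreply@github.com"

def classify(commits):
    """commits: iterable of (author_email, committer_email) tuples."""
    direct = merges = 0
    for _author, committer in commits:
        if is_pr_merge(committer):
            merges += 1
        else:
            direct += 1
    return {"direct": direct, "pr_merges": merges, "total": direct + merges}
```

Note this catches squash- and rebase-merges done through the GitHub UI; merges performed locally and pushed would still count as direct commits.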
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request significantly enhances the contributor activity tracking system by providing a more accurate and insightful view of development efforts. It moves beyond a simple commit count to categorize contributions by their origin (direct push vs. PR merge) and offers a consolidated, high-level summary of activity across an entire portfolio of repositories. This change improves the understanding of team dynamics and overall project health by focusing on meaningful engagement rather than just raw output.
Walkthrough

Enhanced contributor activity analysis with commit-type awareness (direct commits vs PR merges), bot-filtering logic, and cross-repository aggregation.
Sequence Diagram

```mermaid
sequenceDiagram
    participant PW as pulse-wrapper.sh
    participant CAH as contributor-activity-helper.sh
    participant RJ as repos.json
    participant R as Repository
    PW->>RJ: Check if repos exist
    alt Multiple repos detected
        PW->>PW: Initialize cross_repo_md
        PW->>CAH: cross-repo-summary [paths] --period 30
        CAH->>R: Iterate repositories
        R->>CAH: Extract commits (filter bots)
        CAH->>CAH: Classify commits (direct/PR merge)
        CAH->>CAH: Aggregate per-repo metrics
        CAH->>CAH: Compute totals across repos
        CAH-->>PW: Return aggregated JSON/markdown
        PW->>PW: Insert into pulse body
        PW-->>PW: Include "Cross-Repo Totals" section
    end
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
🔍 Code Quality Report

[MONITOR] Code Review Monitoring Report
[INFO] Latest Quality Status:
[INFO] Recent monitoring activity:
📈 Current Quality Metrics

Generated on: Sun Mar 8 19:06:54 UTC 2026
Generated by AI DevOps Framework Code Review Monitoring
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.agents/scripts/contributor-activity-helper.sh:
- Around line 339-345: The aggregation currently concatenates per-repo JSON
counts (repo_json) into all_json and then sums active_days, which double-counts
users active in multiple repos on the same calendar day; change
compute_activity(...) calls to return per-user per-day keys (e.g., list of ISO
dates or raw commit rows) instead of only counts, then update the aggregation
logic that builds all_json and computes Active Days/Avg/Day to deduplicate by
unique user+date before counting active_days; specifically modify the
compute_activity caller in this script (where repo_json is assigned) to request
the day-level output and change the downstream merging logic (all_json assembly
and the active_days/Avg/Day rollup routines) to union user-date entries rather
than summing per-repo counts so cross-repo overlaps are not double-counted.
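The double-counting described in the comment above is easy to reproduce with toy data: summing per-repo active-day counts inflates the total whenever the same user commits to two repos on the same calendar date, while a set union over user+date does not (field names here are illustrative, not the script's):

```python
# One user, two repos, overlapping on 2026-03-02.
repo_a_days = {"alice": ["2026-03-01", "2026-03-02"]}
repo_b_days = {"alice": ["2026-03-02"]}

# Naive per-repo sum: 2 + 1 = 3 "active days", inflated by the overlap.
naive = len(repo_a_days["alice"]) + len(repo_b_days["alice"])

# Set union of the day-level keys deduplicates: 2 unique active days.
union = set(repo_a_days["alice"]) | set(repo_b_days["alice"])
deduped = len(union)
```

This is why the fix requires `compute_activity` to emit day-level output rather than pre-summed counts: the union can only be taken if the individual dates survive to the aggregation step.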
In @.agents/scripts/pulse-wrapper.sh:
- Around line 1974-1987: The cross-repo markdown is being regenerated inside the
per-repo loop inside _update_health_issue_for_repo, causing redundant scans;
move the logic that builds cross_repo_md (the block using repos_json_path, jq,
cross_args and calling bash "$activity_helper" cross-repo-summary) out of
_update_health_issue_for_repo and build it once in update_health_issues() before
iterating repos, store it in a shared variable (e.g., shared_cross_repo_md) and
pass that value into each call to _update_health_issue_for_repo (or memoize
access) so the heavy git log walk runs only once per pulse.
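The hoisting pattern the comment asks for looks roughly like this (a sketch with simplified stand-in functions, not the wrapper's real code):

```shell
#!/usr/bin/env bash
# Build the expensive cross-repo summary once, then reuse it for every
# per-repo health issue instead of regenerating it inside the loop.
build_cross_repo_md() {
    # Stand-in for the heavy cross-repo git log walk.
    echo "## Cross-Repo Totals (computed once)"
}

update_health_issue_for_repo() {
    local repo="$1" cross_repo_md="$2"
    echo "repo=$repo"
    echo "$cross_repo_md"
}

update_health_issues() {
    local shared_cross_repo_md repo
    shared_cross_repo_md="$(build_cross_repo_md)"   # one scan per pulse
    for repo in repo-a repo-b; do
        # Each per-repo call receives the memoized markdown as an argument.
        update_health_issue_for_repo "$repo" "$shared_cross_repo_md"
    done
}

update_health_issues
```

Passing the value as an argument (rather than a global) keeps each per-repo call testable in isolation.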
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: d18a8996-b6b4-45e6-808a-da864899cbcc
📒 Files selected for processing (2)
.agents/scripts/contributor-activity-helper.sh
.agents/scripts/pulse-wrapper.sh
Code Review
This pull request enhances contributor activity metrics by replacing the less reliable "productive hours" metric with a breakdown of direct commits versus PR merges, and introduces a cross-repository activity summary for a holistic view. A critical security concern has been identified: a potential command injection vulnerability in .agents/scripts/contributor-activity-helper.sh due to unquoted variable usage in a shell command, which should be addressed by proper quoting. Additionally, a significant logical error in the calculation of cross-repository active days results in inflated metrics, which is critical to address for the reliability of this feature. There is also a minor maintainability concern with some duplicated Python helper functions.
```diff
 local git_data
 # shellcheck disable=SC2086
-git_data=$(git -C "$repo_path" log --all --format='%ae|%aI' $since_arg) || git_data=""
+git_data=$(git -C "$repo_path" log --all --format='%ae|%ce|%aI' $since_arg) || git_data=""
```
The variable $since_arg is used without double quotes in the git log command. While this is likely intended to allow word splitting for multiple arguments (e.g., --since='...'), it can lead to command injection if the input used to construct $since_arg is not strictly validated. In the cross_repo_summary function, the period argument is taken directly from the command line and passed to compute_activity, which then uses it to build $since_arg. An attacker could potentially inject arbitrary shell commands by providing a malicious --period value.
Suggested change:

```diff
-git_data=$(git -C "$repo_path" log --all --format='%ae|%ce|%aI' $since_arg) || git_data=""
+git_data=$(git -C "$repo_path" log --all --format='%ae|%ce|%aI' "$since_arg") || git_data=""
```
References
- To prevent command injection from shell metacharacters, wrap variables in single quotes. This is a POSIX-compliant method to prevent all shell expansion, including
$(...), backticks, and variable expansion. While double quotes are used in the suggestion, the core principle of preventing command injection through proper quoting is aligned with this rule.
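A complementary defense, assuming the only legitimate `--period` values are day counts, is to validate the input before it is ever interpolated into a shell-expanded git argument (an illustrative sketch, not the helper's actual code):

```shell
#!/usr/bin/env bash
# Validate --period as a positive integer before building the git
# --since argument, so shell metacharacters can never reach git.
build_since_arg() {
    local period="$1"
    if [[ ! "$period" =~ ^[0-9]+$ ]]; then
        echo "Error: --period must be a positive integer, got: $period" >&2
        return 1
    fi
    printf -- '--since=%s days ago' "$period"
}
```

With validation at the boundary, the downstream unquoted expansion becomes much harder to abuse, though quoting remains the safer default.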
```python
'active_days': active_days,
'productive_hours': round(total_hours, 1),
'avg_commits_per_day': round(avg_per_day, 1)
```
To enable correct aggregation of active days in the cross-repo-summary command, the JSON output from this function needs to include the raw list of active days for each contributor. This is a prerequisite for the fix in cross_repo_summary.
Suggested change:

```diff
 'active_days': active_days,
-'productive_hours': round(total_hours, 1),
+'active_days_list': list(data['days']),
 'avg_commits_per_day': round(avg_per_day, 1)
```
```python
# Aggregate per contributor across all repos
totals = {}
for repo in repos:
    for entry in repo.get('data', []):
        login = entry['login']
        if login not in totals:
            totals[login] = {
                'direct_commits': 0,
                'pr_merges': 0,
                'total_commits': 0,
                'active_days': 0,
                'repo_count': 0,
            }
        totals[login]['direct_commits'] += entry.get('direct_commits', 0)
        totals[login]['pr_merges'] += entry.get('pr_merges', 0)
        totals[login]['total_commits'] += entry.get('total_commits', 0)
        totals[login]['active_days'] += entry.get('active_days', 0)
        if entry.get('total_commits', 0) > 0:
            totals[login]['repo_count'] += 1

results = []
for login, data in sorted(totals.items(), key=lambda x: -x[1]['total_commits']):
    avg = data['total_commits'] / data['active_days'] if data['active_days'] > 0 else 0
    results.append({
        'login': login,
        'direct_commits': data['direct_commits'],
        'pr_merges': data['pr_merges'],
        'total_commits': data['total_commits'],
        'active_days': data['active_days'],
        'repos_active': data['repo_count'],
        'avg_commits_per_day': round(avg, 1),
    })
```
The current implementation incorrectly calculates active_days in the cross-repo summary by summing the counts from each repository. This leads to inflated values if a contributor is active in multiple repos on the same day. The correct approach is to count the number of unique active days across all repositories. This also makes the Avg/Day calculation inaccurate.
This can be fixed by using a set to collect unique day strings from the active_days_list (which needs to be added to the compute_activity output) and then calculating the total active days from the size of the set.
```python
# Aggregate per contributor across all repos
totals = {}
for repo in repos:
    for entry in repo.get('data', []):
        login = entry['login']
        if login not in totals:
            totals[login] = {
                'direct_commits': 0,
                'pr_merges': 0,
                'total_commits': 0,
                'active_days_set': set(),
                'repo_count': 0,
            }
        totals[login]['direct_commits'] += entry.get('direct_commits', 0)
        totals[login]['pr_merges'] += entry.get('pr_merges', 0)
        totals[login]['total_commits'] += entry.get('total_commits', 0)
        totals[login]['active_days_set'].update(entry.get('active_days_list', []))
        if entry.get('total_commits', 0) > 0:
            totals[login]['repo_count'] += 1

results = []
for login, data in sorted(totals.items(), key=lambda x: -x[1]['total_commits']):
    active_days = len(data['active_days_set'])
    avg = data['total_commits'] / active_days if active_days > 0 else 0
    results.append({
        'login': login,
        'direct_commits': data['direct_commits'],
        'pr_merges': data['pr_merges'],
        'total_commits': data['total_commits'],
        'active_days': active_days,
        'repos_active': data['repo_count'],
        'avg_commits_per_day': round(avg, 1),
    })
```

```diff
 def email_to_login(email):
     if email.endswith('@users.noreply.github.com'):
-        local = email.split('@')[0]
-        return local.split('+', 1)[1] if '+' in local else local
+        local_part = email.split('@')[0]
+        return local_part.split('+', 1)[1] if '+' in local_part else local_part
     if email in ('actions@github.com', 'action@github.com'):
         return 'github-actions'
     return email.split('@')[0]

 def is_pr_merge(committer_email):
     return committer_email == 'noreply@github.com'
```
The helper functions email_to_login and is_pr_merge are duplicated between the Python scripts in compute_activity and user_activity. This creates a maintainability issue, as any changes to this logic would need to be applied in multiple places. Consider refactoring this shared logic into a common location that can be sourced by both Python snippets to avoid duplication. For example, you could define the functions in a separate Python file and prepend its content to the python3 -c command string.
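One way to implement the suggested sharing is to keep the helper definitions in a single shell variable and prepend them to each `python3 -c` invocation. The sketch below is illustrative; `PYTHON_HELPERS` is the variable name used by the follow-up commit, but the helper bodies here are simplified:

```shell
#!/usr/bin/env bash
# Shared Python helpers defined once, injected into every embedded
# python3 -c block so the logic lives in a single place.
PYTHON_HELPERS='
def email_to_login(email):
    local_part = email.split("@")[0]
    return local_part.split("+", 1)[1] if "+" in local_part else local_part

def is_pr_merge(committer_email):
    return committer_email == "noreply@github.com"
'

# Example embedded block: classify one commit using the shared helpers.
classify_one() {
    python3 -c "${PYTHON_HELPERS}
import sys
email, committer = sys.argv[1], sys.argv[2]
kind = 'pr_merge' if is_pr_merge(committer) else 'direct'
print(f'{email_to_login(email)}:{kind}')
" "$1" "$2"
}
```

Any change to the helpers now propagates to every embedded Python block automatically, at the cost of the helpers living inside a shell string rather than a `.py` file.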
- JSON output now includes active_days_list for cross-repo set union
- Cross-repo active days uses set union instead of sum, fixing inflation where same-day commits in multiple repos were double-counted
- Cross-repo summary computed once in update_health_issues() and passed to each _update_health_issue_for_repo() call, avoiding N×N git walks
- Extract shared Python helpers (email_to_login, is_bot, is_pr_merge) into PYTHON_HELPERS shell variable, injected into all Python blocks
- Add comment explaining SC2086 since_arg safety (hardcoded case values)
🔍 Code Quality Report

[MONITOR] Code Review Monitoring Report
[INFO] Latest Quality Status:
[INFO] Recent monitoring activity:
📈 Current Quality Metrics

Generated on: Sun Mar 8 19:18:58 UTC 2026
Generated by AI DevOps Framework Code Review Monitoring
@coderabbitai review

Addressed both review findings:
Triggering a fresh review now to assess the updated implementation.

🧠 Learnings used
✅ Actions performed: Review triggered.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.agents/scripts/contributor-activity-helper.sh:
- Around line 148-151: Replace the literal equality check against
'github-actions' with the existing is_bot(...) helper: after computing
committer_login = email_to_login(committer_email) use is_bot(committer_login)
(or is_bot(committer_email) if that helper expects an email) to skip any bot
committers; update the identical check later in the script (the other occurrence
using committer_login) to the same is_bot(...) call so all bot variants like
'github-actions[bot]' or 'dependabot[bot]' are filtered consistently.
- Around line 339-345: The loop over repo_paths in cross-repo-summary silently
skips non-git paths (the if checking "$rp/.git" then continue), causing false
"no activity" exits; change this to fail fast: when a path lacks a .git entry,
write a clear error to stderr including the offending repo path (use the same
variable rp) and exit with a non-zero status instead of continuing. Update
handling around compute_activity and repo_json so the script does not mask input
errors (i.e., only allow repo_json="[]" for genuine compute_activity failures,
not for invalid paths), and ensure any callers of this script see the non-zero
exit code on invalid repo paths.
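The first finding above asks for a shared bot predicate rather than a literal string comparison. A minimal version that catches the `[bot]` suffix variants might look like this (illustrative; the script's real `is_bot` helper may cover more accounts):

```python
# Match common bot identities: GitHub App logins ending in "[bot]",
# the Actions service account, and well-known automation accounts.
def is_bot(login):
    login = login.lower()
    return (
        login.endswith("[bot]")
        or login in ("github-actions", "dependabot", "renovate")
    )
```

Centralizing the predicate means `github-actions[bot]`, `dependabot[bot]`, and future bot accounts are filtered identically everywhere the check appears.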
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 0b2b9191-9d6b-47bb-be7c-8008288932d5
📒 Files selected for processing (2)
.agents/scripts/contributor-activity-helper.sh
.agents/scripts/pulse-wrapper.sh
```bash
for rp in "${repo_paths[@]}"; do
    if [[ ! -d "$rp/.git" && ! -f "$rp/.git" ]]; then
        continue
    fi
    local repo_json
    repo_json=$(compute_activity "$rp" "$period" "json") || repo_json="[]"
    if [[ "$first" == "true" ]]; then
```
Fail fast on invalid repo paths in cross-repo-summary.
Right now non-git inputs are silently skipped, so this command can exit 0 and print "no activity" for bad paths. That makes operator mistakes look like empty data instead of an input error.
Suggested fix:

```diff
 for rp in "${repo_paths[@]}"; do
     if [[ ! -d "$rp/.git" && ! -f "$rp/.git" ]]; then
-        continue
+        echo "Error: $rp is not a git repository" >&2
+        return 1
     fi
     local repo_json
     repo_json=$(compute_activity "$rp" "$period" "json") || repo_json="[]"
```

As per coding guidelines, automation scripts should focus on clear logging and feedback and proper exit codes.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.agents/scripts/contributor-activity-helper.sh around lines 339 - 345, The
loop over repo_paths in cross-repo-summary silently skips non-git paths (the if
checking "$rp/.git" then continue), causing false "no activity" exits; change
this to fail fast: when a path lacks a .git entry, write a clear error to stderr
including the offending repo path (use the same variable rp) and exit with a
non-zero status instead of continuing. Update handling around compute_activity
and repo_json so the script does not mask input errors (i.e., only allow
repo_json="[]" for genuine compute_activity failures, not for invalid paths),
and ensure any callers of this script see the non-zero exit code on invalid repo
paths.
- Replace literal 'github-actions' check with is_bot(committer_login) in both compute_activity and user_activity Python blocks, catching dependabot[bot] and other bot committers consistently
- Add stderr warning for invalid repo paths in cross-repo-summary instead of silent skip (keeps resilient behaviour for unmounted drives)
@coderabbitai review

Round 2 fixes applied:
Triggering a fresh review now against the current state of the branch.

🧠 Learnings used
✅ Actions performed: Review triggered.
🔍 Code Quality Report

[MONITOR] Code Review Monitoring Report
[INFO] Latest Quality Status:
[INFO] Recent monitoring activity:
📈 Current Quality Metrics

Generated on: Sun Mar 8 19:27:19 UTC 2026
Generated by AI DevOps Framework Code Review Monitoring



Summary

- New `cross-repo-summary` command in `contributor-activity-helper.sh` for aggregating stats across multiple repos

Changes

`contributor-activity-helper.sh`
- Capture committer email (`%ce`) alongside author email (`%ae`)
- `is_pr_merge()` detects GitHub squash-merges (committer = `noreply@github.com`)
- Removed `productive_hours` and `daily_spans` tracking; replaced with `direct_commits` and `pr_merges` counters
- New `cross-repo-summary` command accepts multiple repo paths, outputs privacy-safe aggregated table
- `user` command updated with same direct/PR-merge breakdown
- Table columns: `Contributor | Direct | PR Merges | Total | Active Days | Avg/Day`, plus a `Repos` column showing how many repos each contributor is active in

`pulse-wrapper.sh`
- Calls `cross-repo-summary` with all pulse-enabled repo paths from `repos.json`

Testing
Verified against all 8 managed repos:
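For illustration, the aggregated rows can be rendered into the `Contributor | Direct | PR Merges | Total | Active Days | Avg/Day` table described above like this (a hypothetical rendering helper, assuming the JSON row shape shown in the review diffs):

```python
def render_table(rows):
    # Build a GitHub-flavored markdown table from aggregated result rows.
    lines = [
        "| Contributor | Direct | PR Merges | Total | Active Days | Avg/Day |",
        "| --- | --- | --- | --- | --- | --- |",
    ]
    for r in rows:
        lines.append(
            "| {login} | {direct_commits} | {pr_merges} | {total_commits} "
            "| {active_days} | {avg_commits_per_day} |".format(**r)
        )
    return "\n".join(lines)
```

Because only logins and counts appear in the output, no repo names leak into the table, which is what makes the cross-repo summary privacy-safe.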