Add weekly agent metrics aggregation #104

Closed
stranske wants to merge 11 commits into main from
codex/github-mention-chorecodex-bootstrap-pr-for-issue-#93-la00v3

Conversation

@stranske
Owner

Summary

  • add metrics aggregation script to compute keepalive completion rate, autofix success, and verifier pass data from NDJSON artifacts
  • introduce weekly scheduled workflow to download recent metrics artifacts, run aggregation, and post a summary to issue #93 (Add workflow execution metrics and observability dashboard)

Testing

  • python scripts/aggregate_agent_metrics.py --input tmp-metrics --output tmp-summary.md

Copilot AI review requested due to automatic review settings December 24, 2025 12:12

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Comment on lines +90 to +94
mkdir -p metrics-artifacts/extracted
shopt -s nullglob
for archive in metrics-artifacts/*.zip; do
  unzip -o "$archive" -d metrics-artifacts/extracted
done

[P1] Prevent overwriting metrics during artifact extraction

The weekly workflow unzips every matching artifact into a shared metrics-artifacts/extracted directory with unzip -o in a single loop. Each artifact produced by the keepalive/autofix/verifier jobs contains identically named NDJSON files (e.g., keepalive-metrics.ndjson), so extracting multiple archives from the last 35 days will overwrite earlier files rather than accumulate them. As a result, the aggregation step runs on only the last archive processed and the summary posted to issue #93 misses most of the period’s data.
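One way to avoid the collision is to give each archive its own extraction directory. The following is a minimal Python sketch of that idea, not the fix that was applied in this PR; the helper name `extract_archives` and the per-archive directory naming are assumptions made for illustration.

```python
import zipfile
from pathlib import Path


def extract_archives(archive_dir: Path, dest_root: Path) -> list:
    """Extract every .zip in archive_dir into its own subdirectory of dest_root.

    Hypothetical helper: each archive gets a directory named after the archive,
    so identically named NDJSON files from different archives never collide.
    """
    extracted = []
    # sorted() materializes the listing before any directories are created.
    for archive in sorted(archive_dir.glob("*.zip")):
        target = dest_root / archive.stem
        target.mkdir(parents=True, exist_ok=True)
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)
        extracted.append(target)
    return extracted
```

An aggregation step could then collect inputs with `dest_root.rglob("*.ndjson")` instead of reading a single flat directory.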

Contributor

Copilot AI left a comment

Pull request overview

This PR introduces automated weekly aggregation of agent workflow metrics to track keepalive completion rates, autofix success, and verifier performance.

  • Adds a Python script to parse NDJSON metrics artifacts and generate markdown summaries with key performance indicators
  • Implements a scheduled GitHub Actions workflow that runs weekly on Mondays to download artifacts, aggregate metrics, and post results to issue #93
  • Configures artifact retention and summary generation for weekly reporting

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| scripts/aggregate_agent_metrics.py | New Python script that loads NDJSON metrics from artifacts, computes completion rates and success percentages, and generates a formatted markdown summary |
| .github/workflows/agents-metrics-weekly.yml | New weekly scheduled workflow that downloads recent metrics artifacts, extracts them, runs the aggregation script, and publishes results to an issue and artifact |

Comment on lines +76 to +78
recorded_at = _parse_datetime(str(record.get("recorded_at", ""))) or cutoff
if recorded_at < cutoff:
    continue

Copilot AI Dec 24, 2025

When a record's recorded_at cannot be parsed, it defaults to cutoff. Because lines 77-78 only skip records where recorded_at < cutoff, a record that falls back to cutoff is not dropped; it is silently kept as if it had been recorded exactly at the cutoff. Consider adding explicit logging when timestamps fail to parse, and making the fallback explicit (for example datetime.now(timezone.utc)) so that the decision to include records with unparseable timestamps is visible in the code.
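A minimal sketch of the logging option, assuming ISO 8601 timestamps and a timezone-aware cutoff. The names `parse_recorded_at` and `filter_recent` are hypothetical and stand in for the script's own `_parse_datetime` and loop, whose bodies are not shown in this diff.

```python
import logging
from datetime import datetime, timezone
from typing import Optional

logger = logging.getLogger("aggregate_agent_metrics")


def parse_recorded_at(raw: object) -> Optional[datetime]:
    """Parse an ISO 8601 timestamp; return None when it is missing or invalid."""
    text = str(raw or "").strip()
    if not text:
        return None
    # Older Python versions reject a trailing "Z" in fromisoformat, so normalize it.
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    try:
        parsed = datetime.fromisoformat(text)
    except ValueError:
        return None
    # Treat naive timestamps as UTC so comparison with an aware cutoff works.
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)


def filter_recent(records, cutoff):
    """Keep records at or after cutoff; keep and log records with bad timestamps."""
    kept = []
    for record in records:
        recorded_at = parse_recorded_at(record.get("recorded_at"))
        if recorded_at is None:
            logger.warning(
                "Unparseable recorded_at %r; keeping record", record.get("recorded_at")
            )
            kept.append(record)
        elif recorded_at >= cutoff:
            kept.append(record)
    return kept
```

Whether such records should be kept or dropped is a policy choice; the point of the sketch is that the choice is made in one visible branch and leaves a log line behind.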

Comment on lines +51 to +70
while (true) {
  const { data } = await github.rest.actions.listArtifactsForRepo({
    owner,
    repo,
    per_page: perPage,
    page,
  });

  const artifacts = data.artifacts || [];
  for (const artifact of artifacts) {
    const created = new Date(artifact.created_at);
    if (artifact.expired) continue;
    if (created < cutoff) continue;
    if (!patterns.some((re) => re.test(artifact.name))) continue;
    matches.push(artifact);
  }

  if (artifacts.length < perPage) break;
  page += 1;
  if (page > 10) break;

Copilot AI Dec 24, 2025

The pagination logic has a hardcoded limit of 10 pages (line 70), which means at most 1000 artifacts will be checked (100 per page × 10 pages). If the repository has more artifacts, some recent ones might be missed. Consider either removing this limit or making it configurable, especially since artifacts are filtered by date afterward.
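The workflow itself is JavaScript, but the shape of a configurable page cap is language-independent. Below is a hedged Python sketch of that shape; `iter_pages`, `fetch_page`, and `max_pages` are hypothetical names, and `fetch_page` stands in for whatever call returns one page of artifacts.

```python
from typing import Callable, Iterator, List, Optional


def iter_pages(
    fetch_page: Callable[[int, int], List[dict]],
    per_page: int = 100,
    max_pages: Optional[int] = None,
) -> Iterator[dict]:
    """Yield items page by page until a short page or the optional page cap.

    max_pages=None means "no cap"; a positive integer bounds the number of
    requests, which is the configurable version of a hardcoded limit.
    """
    page = 1
    while max_pages is None or page <= max_pages:
        items = fetch_page(page, per_page)
        yield from items
        # A page shorter than per_page means there is nothing left to fetch.
        if len(items) < per_page:
            break
        page += 1
```

Keeping the cap as a parameter lets a workflow input or environment variable decide between bounded API usage and complete coverage.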

Comment on lines +114 to +115
pr = int(rec.get("pr_number") or 0)
pr_counts[pr] += 1

Copilot AI Dec 24, 2025

PR number 0 (from records missing pr_number) is treated as a valid PR and included in the completion rate calculation. This could affect accuracy. Consider filtering out records with pr_number of 0 or None, or handling them separately as invalid records.
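A minimal sketch of the filtering option, assuming records are plain dictionaries as in the quoted lines. The helpers `valid_pr_number` and `count_by_pr` are hypothetical and only illustrate counting invalid records separately instead of folding them into PR 0.

```python
from collections import Counter
from typing import Optional


def valid_pr_number(raw: object) -> Optional[int]:
    """Return a positive PR number, or None for missing or malformed values."""
    try:
        number = int(raw)
    except (TypeError, ValueError, OverflowError):
        return None
    return number if number > 0 else None


def count_by_pr(records):
    """Count records per valid PR and report how many records were rejected."""
    pr_counts = Counter()
    invalid = 0
    for rec in records:
        pr = valid_pr_number(rec.get("pr_number"))
        if pr is None:
            invalid += 1
            continue
        pr_counts[pr] += 1
    return pr_counts, invalid
```

Returning the invalid count alongside the per-PR counts lets the summary report data-quality problems rather than hiding them.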

const body = fs.existsSync(summaryPath)
  ? fs.readFileSync(summaryPath, 'utf-8')
  : 'No metrics available for this period.';
const issueNumber = 93;

Copilot AI Dec 24, 2025

The issue number is hardcoded as 93. If this issue doesn't exist or is closed, the workflow will fail silently. Consider making this configurable via an environment variable or workflow input, and add error handling to check if the issue exists and is open before attempting to post.

Comment on lines +117 to +119
stop = str(rec.get("stop_reason", "")).lower()
if stop:
    stop_reasons[stop] += 1

Copilot AI Dec 24, 2025

The if stop: guard on line 118 skips empty strings, but str(rec.get("stop_reason", "")) on line 117 converts an explicit None into the string "none" and lets whitespace-only values through, so placeholder entries can still appear in the "top stop reasons" output. Consider treating None as missing and stripping whitespace before updating the counter.
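A small sketch of one possible normalization, assuming stop_reason is either a string or absent. The function names are hypothetical, and the example value "max-iterations" in the usage note is made up for illustration.

```python
from collections import Counter


def normalize_stop_reason(raw: object) -> str:
    """Lowercase and trim a stop reason; map None to the empty string."""
    if raw is None:
        return ""
    return str(raw).strip().lower()


def count_stop_reasons(records) -> Counter:
    """Count non-empty, normalized stop reasons across records."""
    stop_reasons = Counter()
    for rec in records:
        stop = normalize_stop_reason(rec.get("stop_reason"))
        if stop:  # skips "", whitespace-only values, and None
            stop_reasons[stop] += 1
    return stop_reasons
```

With this normalization, "Max-Iterations" and " max-iterations " fall into the same bucket, and a null stop_reason no longer shows up as the literal string "none".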

Base automatically changed from codex/issue-93 to main December 24, 2025 20:02
stranske and others added 4 commits December 24, 2025 23:50
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@stranske
Owner Author

@codex fix comments

@chatgpt-codex-connector

Summary

  • Updated the weekly metrics workflow to avoid overwriting extracted NDJSON files, added configurable pagination and artifact/issue settings, and validated the target issue before posting summaries. (.github/workflows/agents-metrics-weekly.yml, L17-L147)
  • Hardened metrics aggregation by logging invalid timestamps, keeping records with unparsable dates, filtering out non-positive PR numbers, trimming empty stop reasons, and computing autofix success per PR. (scripts/aggregate_agent_metrics.py, L10-L190)

Testing

  • python -m compileall scripts/aggregate_agent_metrics.py

Owner Author

Closing this PR - after resolving conflicts with main, there's no unique content remaining:

  1. agents-keepalive-loop.yml - main's version already has the auto-reconcile step (from PR #102, "Add comprehensive error recovery and retry logic to agent workflows", and PR #126, "fix(security): reduce prompt injection guard false positives")
  2. aggregate_agent_metrics.py - main's version (using env vars) is more recent and tested
  3. agents-metrics-weekly.yml - main already has agents-weekly-metrics.yml (similar weekly metrics workflow)

The weekly metrics workflow concept from this PR was valuable, but main already has an implementation at .github/workflows/agents-weekly-metrics.yml.

Note: Main's agents-weekly-metrics.yml has a bug - it calls the script with CLI args (--artifacts-dir, --output) that don't exist. The script uses environment variables. Will fix that separately.
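For illustration, the mismatch looks roughly like this: a script that reads its configuration from the environment never looks at `--artifacts-dir` or `--output`. The sketch below is an assumption about the general shape, not main's actual code; only the variable names METRICS_DIR and OUTPUT_PATH come from the fix commit, and the default values are placeholders.

```python
import os
from pathlib import Path
from typing import Mapping, NamedTuple


class MetricsConfig(NamedTuple):
    metrics_dir: Path
    output_path: Path


def load_config(env: Mapping[str, str] = os.environ) -> MetricsConfig:
    """Read script settings from environment variables rather than CLI args."""
    # The defaults below are placeholders for this sketch, not the real script's.
    return MetricsConfig(
        metrics_dir=Path(env.get("METRICS_DIR", "metrics-artifacts")),
        output_path=Path(env.get("OUTPUT_PATH", "summary.md")),
    )
```

A workflow step calling such a script would set METRICS_DIR and OUTPUT_PATH in its env block instead of passing flags.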

@stranske stranske closed this Dec 25, 2025
stranske added a commit that referenced this pull request Dec 25, 2025
The aggregate_agent_metrics.py script uses environment variables
(METRICS_DIR, OUTPUT_PATH), not CLI arguments. The workflow was
calling it with --artifacts-dir and --output which don't exist.

Found while evaluating PR #104 (now closed as redundant).
@stranske stranske deleted the codex/github-mention-chorecodex-bootstrap-pr-for-issue-#93-la00v3 branch December 25, 2025 23:00