Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
mkdir -p metrics-artifacts/extracted
shopt -s nullglob
for archive in metrics-artifacts/*.zip; do
  unzip -o "$archive" -d metrics-artifacts/extracted
done
Prevent overwriting metrics during artifact extraction
The weekly workflow unzips every matching artifact into a shared metrics-artifacts/extracted directory with unzip -o in a single loop. Each artifact produced by the keepalive/autofix/verifier jobs contains identically named NDJSON files (e.g., keepalive-metrics.ndjson), so extracting multiple archives from the last 35 days will overwrite earlier files rather than accumulate them. As a result, the aggregation step runs on only the last archive processed and the summary posted to issue #93 misses most of the period’s data.
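One way to avoid the collision, sketched here in Python with the standard-library `zipfile` module (the `metrics-artifacts` paths are assumed from the workflow), is to give each archive its own extraction subdirectory so identically named NDJSON files accumulate instead of overwriting each other:

```python
import zipfile
from pathlib import Path

ARTIFACT_DIR = Path("metrics-artifacts")        # assumed from the workflow
EXTRACT_ROOT = ARTIFACT_DIR / "extracted"

def extract_artifacts(artifact_dir: Path = ARTIFACT_DIR,
                      extract_root: Path = EXTRACT_ROOT) -> list[Path]:
    """Unzip each artifact into its own subdirectory so identically named
    NDJSON files from different archives do not overwrite each other."""
    extracted = []
    for archive in sorted(artifact_dir.glob("*.zip")):
        target = extract_root / archive.stem    # one directory per archive
        target.mkdir(parents=True, exist_ok=True)
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)
        extracted.append(target)
    return extracted
```

The aggregation step would then glob recursively under `extracted/` rather than reading one flat directory.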
Pull request overview
This PR introduces automated weekly aggregation of agent workflow metrics to track keepalive completion rates, autofix success, and verifier performance.
- Adds a Python script to parse NDJSON metrics artifacts and generate markdown summaries with key performance indicators
- Implements a scheduled GitHub Actions workflow that runs weekly on Mondays to download artifacts, aggregate metrics, and post results to issue #93
- Configures artifact retention and summary generation for weekly reporting
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| scripts/aggregate_agent_metrics.py | New Python script that loads NDJSON metrics from artifacts, computes completion rates, success percentages, and generates a formatted markdown summary |
| .github/workflows/agents-metrics-weekly.yml | New weekly scheduled workflow that downloads recent metrics artifacts, extracts them, runs the aggregation script, and publishes results to an issue and artifact |
recorded_at = _parse_datetime(str(record.get("recorded_at", ""))) or cutoff
if recorded_at < cutoff:
    continue
When a record's recorded_at cannot be parsed, it defaults to cutoff, causing the record to be excluded (line 77-78 checks recorded_at < cutoff). This means records with invalid timestamps are silently dropped. Consider using a different default like datetime.now(timezone.utc) to include records with unparseable timestamps, or add explicit logging when timestamps fail to parse.
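A hedged sketch of that suggestion, using a hypothetical `parse_recorded_at` wrapper in place of the script's `_parse_datetime` helper: unparseable timestamps are logged and defaulted to "now", so they survive the cutoff comparison instead of being silently dropped.

```python
import logging
from datetime import datetime, timezone

logger = logging.getLogger(__name__)

def parse_recorded_at(raw: str) -> datetime:
    """Parse an ISO-8601 timestamp; on failure, log a warning and fall back
    to the current time so the record passes the cutoff check."""
    try:
        return datetime.fromisoformat(raw.replace("Z", "+00:00"))
    except ValueError:
        logger.warning("unparseable recorded_at %r; keeping record", raw)
        return datetime.now(timezone.utc)
```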
while (true) {
  const { data } = await github.rest.actions.listArtifactsForRepo({
    owner,
    repo,
    per_page: perPage,
    page,
  });

  const artifacts = data.artifacts || [];
  for (const artifact of artifacts) {
    const created = new Date(artifact.created_at);
    if (artifact.expired) continue;
    if (created < cutoff) continue;
    if (!patterns.some((re) => re.test(artifact.name))) continue;
    matches.push(artifact);
  }

  if (artifacts.length < perPage) break;
  page += 1;
  if (page > 10) break;
}
The pagination logic has a hardcoded limit of 10 pages (line 70), which means at most 1000 artifacts will be checked (100 per page × 10 pages). If the repository has more artifacts, some recent ones might be missed. Consider either removing this limit or making it configurable, especially since artifacts are filtered by date afterward.
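A sketch of a configurable page cap, written in Python for consistency with the aggregation script. The `fetch_page` callable is a hypothetical stand-in for `listArtifactsForRepo`; ISO-8601 timestamp strings compare correctly lexicographically, so the cutoff check works on raw `created_at` values.

```python
import os

def collect_recent_artifacts(fetch_page, cutoff, patterns, per_page=100,
                             max_pages=None):
    """Page through artifacts until a short page or a configurable cap.
    fetch_page(page, per_page) returns one page of artifact dicts."""
    if max_pages is None:
        # Configurable via the environment instead of a hardcoded 10.
        max_pages = int(os.environ.get("MAX_ARTIFACT_PAGES", "50"))
    matches, page = [], 1
    while page <= max_pages:
        artifacts = fetch_page(page, per_page)
        for a in artifacts:
            if a.get("expired"):
                continue
            if a["created_at"] < cutoff:       # ISO strings sort by date
                continue
            if not any(p.search(a["name"]) for p in patterns):
                continue
            matches.append(a)
        if len(artifacts) < per_page:
            break
        page += 1
    return matches
```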
pr = int(rec.get("pr_number") or 0)
pr_counts[pr] += 1
PR number 0 (from records missing pr_number) is treated as a valid PR and included in the completion rate calculation. This could affect accuracy. Consider filtering out records with pr_number of 0 or None, or handling them separately as invalid records.
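One possible shape for that filtering, as a sketch with a hypothetical `count_by_pr` helper: records without a usable `pr_number` are tallied separately rather than folded into a fake PR 0.

```python
from collections import Counter

def count_by_pr(records):
    """Tally records per PR; track rows without a usable pr_number
    separately so they cannot skew the completion-rate calculation."""
    pr_counts, invalid = Counter(), 0
    for rec in records:
        try:
            pr = int(rec.get("pr_number"))
        except (TypeError, ValueError):
            pr = 0
        if pr <= 0:
            invalid += 1
            continue
        pr_counts[pr] += 1
    return pr_counts, invalid
```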
const body = fs.existsSync(summaryPath)
  ? fs.readFileSync(summaryPath, 'utf-8')
  : 'No metrics available for this period.';
const issueNumber = 93;
The issue number is hardcoded as 93. If that issue doesn't exist, the API call will fail the workflow step; if it has been closed, the comment will still post but may go unnoticed. Consider making the number configurable via an environment variable or workflow input, and verify the issue exists and is open before attempting to post.
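A sketch of that configuration plus pre-flight check, with a hypothetical `get_issue` callable standing in for the GitHub issues API and a hypothetical `METRICS_ISSUE_NUMBER` variable name:

```python
import os

def resolve_metrics_issue(get_issue, default=93):
    """Resolve the target issue from METRICS_ISSUE_NUMBER (falling back to
    the current hardcoded 93) and confirm it exists and is open.
    get_issue(number) returns the issue dict, or None if it is missing."""
    number = int(os.environ.get("METRICS_ISSUE_NUMBER", str(default)))
    issue = get_issue(number)
    if issue is None:
        raise RuntimeError(f"metrics issue #{number} does not exist")
    if issue.get("state") != "open":
        raise RuntimeError(f"metrics issue #{number} is not open")
    return number
```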
stop = str(rec.get("stop_reason", "")).lower()
if stop:
    stop_reasons[stop] += 1
Records with a missing or empty stop_reason should be skipped before updating the counter, as the `if stop:` guard above does; otherwise empty strings would surface in the "top stop reasons" output. Note that whitespace-only values still pass the guard, so stripping the value before the check would be more robust.
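A sketch of a stricter normalization (hypothetical `tally_stop_reasons` helper) that also handles `None` and whitespace-only values:

```python
from collections import Counter

def tally_stop_reasons(records):
    """Count normalized stop reasons, skipping records where the field is
    missing, None, or blank so empty strings never reach the summary."""
    reasons = Counter()
    for rec in records:
        # `or ""` avoids str(None) == "None"; strip() drops whitespace-only.
        stop = str(rec.get("stop_reason") or "").strip().lower()
        if stop:
            reasons[stop] += 1
    return reasons
```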
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@codex fix comments |
Closing this PR - after resolving conflicts with main, there's no unique content remaining:
The weekly metrics workflow concept from this PR was valuable, but main already has an implementation at …. Note: Main's …
The aggregate_agent_metrics.py script uses environment variables (METRICS_DIR, OUTPUT_PATH), not CLI arguments. The workflow was calling it with --artifacts-dir and --output which don't exist. Found while evaluating PR #104 (now closed as redundant).
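One low-friction fix for that mismatch, sketched here as a hypothetical `parse_config` helper, is to accept the flags the workflow tried to pass while falling back to the documented environment variables:

```python
import argparse
import os

def parse_config(argv=None):
    """Accept both the documented env vars (METRICS_DIR, OUTPUT_PATH) and
    the CLI flags the workflow passed, so either invocation style works."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--artifacts-dir",
                        default=os.environ.get("METRICS_DIR",
                                               "metrics-artifacts"))
    parser.add_argument("--output",
                        default=os.environ.get("OUTPUT_PATH", "summary.md"))
    return parser.parse_args(argv)
```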
Summary
Testing
python scripts/aggregate_agent_metrics.py --input tmp-metrics --output tmp-summary.md