Add weekly agent metrics aggregation by stranske · Pull Request #104 · stranske/Workflows

stranske · 2025-12-24T12:12:31Z

Summary

add metrics aggregation script to compute keepalive completion rate, autofix success, and verifier pass data from NDJSON artifacts
introduce weekly scheduled workflow to download recent metrics artifacts, run aggregation, and post a summary to issue Add workflow execution metrics and observability dashboard #93

Testing

python scripts/aggregate_agent_metrics.py --input tmp-metrics --output tmp-summary.md

Codex Task

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2025-12-24T12:15:16Z

.github/workflows/agents-metrics-weekly.yml

+          mkdir -p metrics-artifacts/extracted
+          shopt -s nullglob
+          for archive in metrics-artifacts/*.zip; do
+            unzip -o "$archive" -d metrics-artifacts/extracted
+          done


Prevent overwriting metrics during artifact extraction

The weekly workflow unzips every matching artifact into a shared metrics-artifacts/extracted directory with unzip -o in a single loop. Each artifact produced by the keepalive/autofix/verifier jobs contains identically named NDJSON files (e.g., keepalive-metrics.ndjson), so extracting multiple archives from the last 35 days will overwrite earlier files rather than accumulate them. As a result, the aggregation step runs on only the last archive processed and the summary posted to issue #93 misses most of the period’s data.

Useful? React with 👍 / 👎.

Copilot

Pull request overview

This PR introduces automated weekly aggregation of agent workflow metrics to track keepalive completion rates, autofix success, and verifier performance.

Adds a Python script to parse NDJSON metrics artifacts and generate markdown summaries with key performance indicators
Implements a scheduled GitHub Actions workflow that runs weekly on Mondays to download artifacts, aggregate metrics, and post results to issue #93
Configures artifact retention and summary generation for weekly reporting

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 9 comments.

File	Description
scripts/aggregate_agent_metrics.py	New Python script that loads NDJSON metrics from artifacts, computes completion rates, success percentages, and generates a formatted markdown summary
.github/workflows/agents-metrics-weekly.yml	New weekly scheduled workflow that downloads recent metrics artifacts, extracts them, runs the aggregation script, and publishes results to an issue and artifact

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2025-12-24T12:15:53Z

scripts/aggregate_agent_metrics.py

+        recorded_at = _parse_datetime(str(record.get("recorded_at", ""))) or cutoff
+        if recorded_at < cutoff:
+            continue


When a record's recorded_at cannot be parsed, it defaults to cutoff, causing the record to be excluded (line 77-78 checks recorded_at < cutoff). This means records with invalid timestamps are silently dropped. Consider using a different default like datetime.now(timezone.utc) to include records with unparseable timestamps, or add explicit logging when timestamps fail to parse.

Copilot · 2025-12-24T12:15:54Z

.github/workflows/agents-metrics-weekly.yml

+            while (true) {
+              const { data } = await github.rest.actions.listArtifactsForRepo({
+                owner,
+                repo,
+                per_page: perPage,
+                page,
+              });
+
+              const artifacts = data.artifacts || [];
+              for (const artifact of artifacts) {
+                const created = new Date(artifact.created_at);
+                if (artifact.expired) continue;
+                if (created < cutoff) continue;
+                if (!patterns.some((re) => re.test(artifact.name))) continue;
+                matches.push(artifact);
+              }
+
+              if (artifacts.length < perPage) break;
+              page += 1;
+              if (page > 10) break;


The pagination logic has a hardcoded limit of 10 pages (line 70), which means at most 1000 artifacts will be checked (100 per page × 10 pages). If the repository has more artifacts, some recent ones might be missed. Consider either removing this limit or making it configurable, especially since artifacts are filtered by date afterward.

scripts/aggregate_agent_metrics.py

Copilot · 2025-12-24T12:15:54Z

scripts/aggregate_agent_metrics.py

+        pr = int(rec.get("pr_number") or 0)
+        pr_counts[pr] += 1


PR number 0 (from records missing pr_number) is treated as a valid PR and included in the completion rate calculation. This could affect accuracy. Consider filtering out records with pr_number of 0 or None, or handling them separately as invalid records.

scripts/aggregate_agent_metrics.py

Copilot · 2025-12-24T12:15:55Z

.github/workflows/agents-metrics-weekly.yml

+            const body = fs.existsSync(summaryPath)
+              ? fs.readFileSync(summaryPath, 'utf-8')
+              : 'No metrics available for this period.';
+            const issueNumber = 93;


The issue number is hardcoded as 93. If this issue doesn't exist or is closed, the workflow will fail silently. Consider making this configurable via an environment variable or workflow input, and add error handling to check if the issue exists and is open before attempting to post.

.github/workflows/agents-metrics-weekly.yml

Copilot · 2025-12-24T12:15:56Z

scripts/aggregate_agent_metrics.py

+        stop = str(rec.get("stop_reason", "")).lower()
+        if stop:
+            stop_reasons[stop] += 1


The stop reason can be an empty string (line 117-118), and when it's empty, it's still added to the counter (line 119). This means empty strings will appear in the "top stop reasons" output. Consider skipping the counter update when stop is an empty string or falsy value.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

stranske · 2025-12-25T05:52:12Z

@codex fix comments

chatgpt-codex-connector · 2025-12-25T05:54:00Z

Summary

Updated the weekly metrics workflow to avoid overwriting extracted NDJSON files, added configurable pagination and artifact/issue settings, and validated the target issue before posting summaries. .github/workflows/agents-metrics-weekly.ymlL17-L147
Hardened metrics aggregation by logging invalid timestamps, keeping records with unparsable dates, filtering out non-positive PR numbers, trimming empty stop reasons, and computing autofix success per PR. scripts/aggregate_agent_metrics.pyL10-L190

Testing

✅ python -m compileall scripts/aggregate_agent_metrics.py

View task →

stranske · 2025-12-25T06:44:42Z

Closing this PR - after resolving conflicts with main, there's no unique content remaining:

agents-keepalive-loop.yml - main's version already has the auto-reconcile step (from PR Add comprehensive error recovery and retry logic to agent workflows #102/fix(security): reduce prompt injection guard false positives #126)
aggregate_agent_metrics.py - main's version (using env vars) is more recent and tested
agents-metrics-weekly.yml - main already has agents-weekly-metrics.yml (similar weekly metrics workflow)

The weekly metrics workflow concept from this PR was valuable, but main already has an implementation at .github/workflows/agents-weekly-metrics.yml.

Note: Main's agents-weekly-metrics.yml has a bug - it calls the script with CLI args (--artifacts-dir, --output) that don't exist. The script uses environment variables. Will fix that separately.

The aggregate_agent_metrics.py script uses environment variables (METRICS_DIR, OUTPUT_PATH), not CLI arguments. The workflow was calling it with --artifacts-dir and --output which don't exist. Found while evaluating PR #104 (now closed as redundant).

github-actions bot and others added 7 commits December 24, 2025 10:12

chore(codex): bootstrap PR for issue #93

b1bbc7a

Merge branch 'main' into codex/issue-93

6cf70d4

Merge branch 'main' into codex/issue-93

1831c52

Add metrics reporting to autofix loop (#97)

f632427

Add keepalive metrics summary and artifact (#98)

e45db7b

Add verifier workflow metrics emission

9a538a9

Add weekly agent metrics aggregation

0040471

Copilot AI review requested due to automatic review settings December 24, 2025 12:12

stranske added the codex label Dec 24, 2025 — with ChatGPT Codex Connector

Copilot started reviewing on behalf of stranske December 24, 2025 12:13 View session

chatgpt-codex-connector bot reviewed Dec 24, 2025

View reviewed changes

Copilot AI reviewed Dec 24, 2025

View reviewed changes

Base automatically changed from codex/issue-93 to main December 24, 2025 20:02

stranske and others added 4 commits December 24, 2025 23:50

Update .github/workflows/agents-metrics-weekly.yml

d96ca89

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Update scripts/aggregate_agent_metrics.py

30099c7

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Update scripts/aggregate_agent_metrics.py

8077985

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Update scripts/aggregate_agent_metrics.py

3dfafea

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

stranske closed this Dec 25, 2025

stranske mentioned this pull request Dec 25, 2025

fix: agents-weekly-metrics uses env vars not CLI args #134

Merged

3 tasks

stranske deleted the codex/github-mention-chorecodex-bootstrap-pr-for-issue-#93-la00v3 branch December 25, 2025 23:00

Conversation

stranske commented Dec 24, 2025

Summary

Testing

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot Dec 24, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Dec 24, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 24, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Copilot AI Dec 24, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Copilot AI Dec 24, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Copilot AI Dec 24, 2025

Choose a reason for hiding this comment

Uh oh!

stranske commented Dec 25, 2025

Uh oh!

chatgpt-codex-connector bot commented Dec 25, 2025

Uh oh!

stranske commented Dec 25, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants