Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3fd4541701
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
Pull request overview
Adds a new lint script to detect stale GitHub Actions runner labels in workflow YAMLs, with an allow-list that’s periodically re-verified and a distinct warning mode when the allow-list itself is stale.
Changes:
- Introduces
tools/lint/runner-version-freshness.shto scan.github/workflows/*for staleruns-on/ matrixoslabels. - Encodes an allow-list +
LAST_VERIFIEDtimestamp and emits a warning exit code when the allow-list is older than 30 days. - Emits actionable output including line hits, canonical labels, and the authoritative GitHub docs URL.
…Otto-213 durable lesson Otto-214 implementation of the tooling-level enforcement I proposed Otto-213. Memory-alone was not sufficient to stop the "write a stale version number" recurrence pattern; this script adds a CI-fail gate. Behavior: - Walks .github/workflows/*.yml files - Extracts runs-on: + os: matrix lines - Fails (exit 2) if any line references a STALE runner version (ubuntu-22.04, macos-14, macos-15, windows-2022, ubuntu-20.04, macos-13, macos-15-intel, etc.) - Warns (exit 3) if the allow-list itself is stale (>30 days since LAST_VERIFIED) - Prints the canonical list of ALLOWED labels on failure + the authoritative GitHub docs URL for re-verification Allow-list verified 2026-04-24 via https://docs.github.com/en/actions/how-tos/write-workflows/choose-where-workflows-run/choose-the-runner-for-a-job#standard-github-hosted-runners-for-public-repositories exact quote "Use of the standard GitHub-hosted runners is free and unlimited on public repositories." First-run detects 13 stale-label hits across codeql.yml, gate.yml, github-settings-drift.yml (plus stale comment- block references in gate.yml from the pre-correction history). These will be cleaned up by PR #359 for gate.yml; codeql.yml + github-settings-drift.yml need separate follow-up PRs. Does NOT wire into gate.yml automatically — separate step to add the lint check after the baseline is green. Premature enforcement would block every current PR. Sequencing: (1) this PR ships the tool; (2) follow-up PRs clean up existing stale refs (gate.yml already covered by #359; others queued); (3) once baseline is clean, add to gate.yml lint job. Composes with: - Otto-213 version-numbers-require-websearch memory - Otto-212 use-latest-tags + security-hygiene directive - Otto-210/211 macOS-is-free + M1-not-Intel corrections - FACTORY-HYGIENE row #43 safe-pattern compliance - Analogous pattern to audit-cross-platform-parity.sh (detect-only-first, enforce-when-baseline-green) Test plan: - Runs clean when no stale labels present - Exits 2 with clear message when stale labels present - Warns when allow-list >30 days old - Shellcheck clean (SC2001 note acknowledged; the non-bash-4 sed-style substitution is intentional for macOS default-bash-3.x compatibility per FACTORY- HYGIENE row #51 cross-platform parity) - Portable: no mapfile (bash 4+ only); uses while-read loop pattern that works in bash 3.x Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…e + comment-strip + rolling-alias forbidden + warn-only exit
Six Codex findings on tools/lint/runner-version-freshness.sh:
P0 (line 133) — regex-metachar escape:
`stale_pattern` was built from raw label strings; `.` in
ubuntu-22.04 was a regex wildcard, producing false matches/
misses. Added `escape_for_regex` helper that escapes . + *
? ( ) [ ] { } | \ / before alternation.
P0 (line 149) — BSD-grep portability:
`\b` word-boundary doesn't work in BSD grep (macOS default;
treated as backspace per POSIX ERE). Replaced with explicit
non-word boundaries: `([^A-Za-z0-9_]|^)` start +
`([^A-Za-z0-9_]|$)` end, expressed without backrefs so it
works in both GNU and BSD grep.
P1 (line 149-1) — exclude comments:
Stale-label-in-comment was triggering false positives. Added
a comment-stripping pre-filter (`grep -vE '^[[:space:]]*#'`)
so YAML comments are excluded from the scan.
P1 (line 149-2) — explicit-file-not-found masking:
`grep ... 2>/dev/null || true` silently swallowed missing-
file errors and reported 'ok' for nothing-actually-linted.
Added an explicit `[ ! -r "$file" ]` precheck that fails
loud (exit 2) rather than passing silent.
P1 (line 73) — rolling-aliases forbidden by convention:
ALLOWED_LABELS included ubuntu-latest / windows-latest /
macos-latest, contradicting the repo convention of pinned
major-OS-version labels. Removed from ALLOWED_LABELS, added
a separate ROLLING_ALIASES forbidden list, added a
distinct error-class scan ('ROLLING-ALIAS RUNNER LABEL') so
contributors get a different error message than for
stale-version pins. Same fail=1 flag, different operator
message.
P2 (line 179) — warn-only exit on stale freshness:
Header documents this as warning-only; code exited 3 (which
some CI configurations treat as failure). Updated to exit 0
on stale-freshness-only path; warning is still printed to
stderr. Stale-version-detection still exit 2 (a real failure).
Smoke-test note: the new script now flags ubuntu-22.04 in
gate.yml as stale (real finding) — exit 2 with the expected
output. gate.yml's own runner-pin upgrade is out of scope
for this PR; will land separately.
3fd4541 to
55eeb4c
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 55eeb4ce6b
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
…ping Two more substantive Codex findings: P1 (line 183) — quoted matrix entries missed: The matrix-entry prefilter was `^[[:space:]]*-[[:space:]]+` which only matched bare `- <label>`. Common YAML syntax `- "ubuntu-22.04"` or `- 'macos-15'` was being missed. Updated prefilter to `^[[:space:]]*-[[:space:]]+(['\"]?)` which optionally consumes a leading single or double quote. Smoke-tested with mixed quoting + matrix block: catches both forms now. P2 (line 179) — trailing inline comments not stripped: `runs-on: ubuntu-24.04 # was ubuntu-22.04` was falsely flagging `ubuntu-22.04` in the trailing comment. Added a second sed pass: `sed -E 's/[[:space:]]+#.*$//'` strips everything after the first ` #` (YAML-spec comment-start sentinel with required leading space). Conservative: doesn't handle `#` inside quoted strings (rare in workflow YAML). Smoke-tested: trailing comments correctly stripped.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 391b045ab4
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
…+ allow-list scope)
P0 (line 187) — set -e + grep -v abort:
`grep -vE ... | sed ...` aborts under set -euo pipefail
when grep -v outputs nothing (every line is a comment).
Added `|| true` to neutralise the exit 1.
P1 (line 195) — validate against allow-list, not stale-subset:
Old code only flagged labels in STALE_LABELS. A label like
`ubuntu-30.04` invented after this script's last refresh
would silently pass. Added a third scan: extract `runs-on:
<value>` and flag anything not in (ALLOWED ∪ ROLLING) and
not in expression form `${{ ... }}`. Distinct error class
'NOT-ON-ALLOW-LIST RUNNER LABEL'.
P1 (line 174) — env-error vs stale-label exit code:
Unreadable files set fail=1 → exit 2 (stale labels). That
mixed environment errors with content findings. Split into
`env_error` (exit 1) vs `fail` (exit 2). Header exit-code
contract updated.
P1 (line 142) — convention: cd to REPO_ROOT:
Other tools/lint/*.sh scripts establish REPO_ROOT and cd
there. Aligned: `git rev-parse --show-toplevel` + cd to
REPO_ROOT before file discovery.
P1 (line 9) — code comments explain code, not history:
Removed Otto-213 / Otto-214 lineage tokens from the header.
Replaced 'Otto-213 durable compounding-failure mitigation'
with the structural rationale (training-data version
numbers decay; structural lint enforces).
P1 (line 237) — exit-code contract reconciliation:
Earlier reviewer wanted warn-only path to exit 0 (so
freshness warning doesn't fail CI). This reviewer pointed
out the header still claimed exit 3 for warn. Updated
header explicitly: 'exit 0 even with freshness warning;
warning is on stderr, non-fatal'. Removed exit 3 from
exit-code contract; now only 0/1/2.
P2 (line 188) — guard comment-stripping pipeline:
Same as P0 fix above (`|| true` neutralises grep -v's
no-match exit 1).
P2 (line 116) — UTC vs local-time TZ skew:
`now_epoch` was UTC but BSD `date -j -f` defaults to local
time. Forced `TZ=UTC` on both branches so age_days computes
in a single timezone (no ±1 day skew across DST/timezones).
P0 (Copilot L265) / P1 (Codex L185) — `warn` unbound under `set -u`:
Initialize `warn=0` alongside `fail=0` and `env_error=0`. Without this,
`_verify_age_ok` returning success leaves `warn` unset; the final
`[ "$warn" = "1" ]` check then aborts with "unbound variable" — turning
passing-lint into env-error. Real regression.
P1 (Copilot L56) — `cd "$REPO_ROOT"` before consuming `$@`:
Normalize CLI args to absolute paths BEFORE the chdir into REPO_ROOT, so
paths given relative to the caller's cwd survive. Without this,
`script.sh ./foo.yml` from outside REPO_ROOT errors after the cd.
P1 (Copilot L213) — `|| true` masking real `sed` failures:
Group only the `grep` with `|| true` (`{ grep -vE ... || true; } | sed`)
so a real sed failure (missing tool / unsupported -E) still surfaces;
before, pipefail propagated through `cmd1 | sed ... || true`, hiding
legitimate environment errors.
P1 (Codex L244 / Copilot L245) — escape ALLOWED_LABELS for ERE:
Labels like `ubuntu-24.04` contain `.` (ERE wildcard); without escaping,
typos like `ubuntu-24x04` would slip through as "allow-listed". Reuses
`escape_for_regex` for ALLOWED + ROLLING lists.
P1 (Copilot L252) — extend NOT-ON-ALLOW-LIST scan to matrix entries:
Previously skipped `runs-on: ${{ matrix.os }}` workflows entirely. Now
validates matrix list entries against the allow-list using a
line-number-aware exclude prefix `(^|^[0-9]+:)` (grep -n prepends
`<linenum>:` which would otherwise break the `^` anchor in the exclude
filter). Smoke-tested: matrix entries `macos-26` / `ubuntu-24.04` /
`ubuntu-24.04-arm` correctly excluded as allow-listed; existing stale
`ubuntu-22.04` findings unchanged.
Verified: clean lint pass on `.github/workflows/codeql.yml`, stale-label
detection unchanged on `.github/workflows/gate.yml`, relative-path arg
from a subdirectory now resolves correctly.
…indings (#432) * drain(#360 follow-up: 8 Codex P0/P1/P2 findings on shell portability + allow-list scope) P0 (line 187) — set -e + grep -v abort: `grep -vE ... | sed ...` aborts under set -euo pipefail when grep -v outputs nothing (every line is a comment). Added `|| true` to neutralise the exit 1. P1 (line 195) — validate against allow-list, not stale-subset: Old code only flagged labels in STALE_LABELS. A label like `ubuntu-30.04` invented after this script's last refresh would silently pass. Added a third scan: extract `runs-on: <value>` and flag anything not in (ALLOWED ∪ ROLLING) and not in expression form `${{ ... }}`. Distinct error class 'NOT-ON-ALLOW-LIST RUNNER LABEL'. P1 (line 174) — env-error vs stale-label exit code: Unreadable files set fail=1 → exit 2 (stale labels). That mixed environment errors with content findings. Split into `env_error` (exit 1) vs `fail` (exit 2). Header exit-code contract updated. P1 (line 142) — convention: cd to REPO_ROOT: Other tools/lint/*.sh scripts establish REPO_ROOT and cd there. Aligned: `git rev-parse --show-toplevel` + cd to REPO_ROOT before file discovery. P1 (line 9) — code comments explain code, not history: Removed Otto-213 / Otto-214 lineage tokens from the header. Replaced 'Otto-213 durable compounding-failure mitigation' with the structural rationale (training-data version numbers decay; structural lint enforces). P1 (line 237) — exit-code contract reconciliation: Earlier reviewer wanted warn-only path to exit 0 (so freshness warning doesn't fail CI). This reviewer pointed out the header still claimed exit 3 for warn. Updated header explicitly: 'exit 0 even with freshness warning; warning is on stderr, non-fatal'. Removed exit 3 from exit-code contract; now only 0/1/2. P2 (line 188) — guard comment-stripping pipeline: Same as P0 fix above (`|| true` neutralises grep -v's no-match exit 1). P2 (line 116) — UTC vs local-time TZ skew: `now_epoch` was UTC but BSD `date -j -f` defaults to local time. Forced `TZ=UTC` on both branches so age_days computes in a single timezone (no ±1 day skew across DST/timezones). * drain(#360 follow-up): fix Codex P0/P1/P2 + Copilot findings P0 (Copilot L265) / P1 (Codex L185) — `warn` unbound under `set -u`: Initialize `warn=0` alongside `fail=0` and `env_error=0`. Without this, `_verify_age_ok` returning success leaves `warn` unset; the final `[ "$warn" = "1" ]` check then aborts with "unbound variable" — turning passing-lint into env-error. Real regression. P1 (Copilot L56) — `cd "$REPO_ROOT"` before consuming `$@`: Normalize CLI args to absolute paths BEFORE the chdir into REPO_ROOT, so paths given relative to the caller's cwd survive. Without this, `script.sh ./foo.yml` from outside REPO_ROOT errors after the cd. P1 (Copilot L213) — `|| true` masking real `sed` failures: Group only the `grep` with `|| true` (`{ grep -vE ... || true; } | sed`) so a real sed failure (missing tool / unsupported -E) still surfaces; before, pipefail propagated through `cmd1 | sed ... || true`, hiding legitimate environment errors. P1 (Codex L244 / Copilot L245) — escape ALLOWED_LABELS for ERE: Labels like `ubuntu-24.04` contain `.` (ERE wildcard); without escaping, typos like `ubuntu-24x04` would slip through as "allow-listed". Reuses `escape_for_regex` for ALLOWED + ROLLING lists. P1 (Copilot L252) — extend NOT-ON-ALLOW-LIST scan to matrix entries: Previously skipped `runs-on: ${{ matrix.os }}` workflows entirely. Now validates matrix list entries against the allow-list using a line-number-aware exclude prefix `(^|^[0-9]+:)` (grep -n prepends `<linenum>:` which would otherwise break the `^` anchor in the exclude filter). Smoke-tested: matrix entries `macos-26` / `ubuntu-24.04` / `ubuntu-24.04-arm` correctly excluded as allow-listed; existing stale `ubuntu-22.04` findings unchanged. Verified: clean lint pass on `.github/workflows/codeql.yml`, stale-label detection unchanged on `.github/workflows/gate.yml`, relative-path arg from a subdirectory now resolves correctly.
Summary
Otto-214 shipping the tooling-level enforcement I proposed Otto-213. Memory-alone was not sufficient to stop the "write a stale version number" recurrence pattern (I did it 3× this session alone); this lint adds a CI-fail gate so the failure stops compounding.
What the lint does
.github/workflows/*.ymlruns-on:+os:matrix linesubuntu-22.04,macos-14,macos-15,windows-2022, etc.LAST_VERIFIED)Allow-list verified 2026-04-24 via the standard-runners docs.
First-run output on current main
13 stale-label hits across 3 workflows:
gate.ymlhits are cleaned up by PR #359 (already in queue).codeql.yml+github-settings-drift.ymlneed separate follow-up PRs.Sequencing — detect-only first, enforce when baseline green
Same pattern as
audit-cross-platform-parity.sh(FACTORY-HYGIENE row #51): ship the detector, clean up the existing baseline, THEN wire enforcement intogate.ymllint job. Premature enforcement blocks every current PR. Concrete plan:gate.ymlstale refs (already in queue)codeql.yml+github-settings-drift.ymlstale refsrunner-version-freshness.shintogate.ymllint (...)job chain after baseline is cleanPortability
mapfile)date+ BSDdatefallback for theLAST_VERIFIEDage checkComposes with
version-numbers-always-websearchmemory (the durable lesson this tool structurally enforces)Test plan
🤖 Generated with Claude Code