Conversation
…ational-enforcement candidate #1) Implements the first of the four operational-enforcement candidates named in memory/feedback_recurrence_after_correction_needs_operational_enforcement_otto_2026_05_02.md (merged via PR #1206 earlier this session). The Tick-80 memo's empirical finding: substrate-knowledge alone is insufficient for failure modes the LLM training prior strongly favors — the no-op-cadence pattern recurred at Tick-71-79 even after the Tick-61 corrective memo named it. The architectural answer is operational enforcement: mechanical checks at decision-time, not just substrate-read at wake-time. This script is one such mechanical check. What it does: - Reads last N (default 7) tick-history shards from current UTC date under docs/hygiene-history/ticks/YYYY/MM/DD/ - Counts shards matching minimal-observation pattern (heuristic: short body OR observation-class language regex) - If MIN_OBS_COUNT >= THRESHOLD (default 5), prints a WARNING with composing-substrate references and party-class operation alternatives What it does NOT do: - Does NOT block the tick (informational only; exit 0) - Does NOT auto-correct (the agent's judgment to act on the warning) - Does NOT examine prior days (current-day window only) Configurable via NO_OP_CHECK_WINDOW and NO_OP_CHECK_THRESHOLD env vars. Self-tested before commit: - Default threshold (5) on recent 7 shards: 2 matches, no warning fires — correct (Tick-80-84 have been substantive) - Lowered threshold (1): warning fires correctly with full body Composes with: - tools/hygiene/check-role-ref-on-current-state-surfaces.sh (B-0162 sibling pattern; mechanical check at commit-time vs. tick-time) - tools/hygiene/check-tick-history-shard-schema.sh (sibling pattern) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2954bd0aac
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
Pull request overview
This PR adds a new hygiene script intended to warn at autonomous-loop tick start when recent tick-history shards suggest a recurring “minimal observation / no-op cadence” pattern. It fits the repo’s broader move toward mechanical, decision-time guardrails for known recurring failure modes in the autonomous loop.
Changes:
- Adds
tools/hygiene/check-no-op-cadence-pattern.shas an informational pre-tick warning check. - Scans recent same-day tick-history shard files and counts shards matching a minimal-observation heuristic.
- Prints a warning with suggested alternative work patterns when the configured threshold is reached.
… shellcheck SC2010 Two findings on PR #1207 line 62 had the same root cause and one fix: 1. **Codex Connector P2**: `RECENT_SHARDS=$(ls "$SHARD_DIR" | grep -E ...)` could exit 1 under `set -euo pipefail` when the directory exists but `grep` finds no matching schema-conforming filenames, killing the script before the `[[ -z "$RECENT_SHARDS" ]]` fallback runs. This defeats the script's "informational only / does NOT block tick" promise — a tick-start invocation hitting a fresh shard directory would unexpectedly fail. 2. **Shellcheck SC2010**: `ls | grep` is the wrong shape; use a glob or for-loop with conditions to allow non-alphanumeric filenames. Fix: replace the pipeline with a `shopt -s nullglob` + glob loop that filters via bash regex. Bash 3.2 compatible per Otto-235 4-shell target. Same schema acceptance: `HHMMZ.md`, `HHMMZ-<hex>.md`, `HHMMSSZ-<hex>.md` per docs/hygiene-history/ticks/README.md. Verified: - Shellcheck: clean (no output) - Default threshold (5) on recent 7 shards: 2 matches, no warning - Lowered threshold (1): warning fires correctly - Empty directory case: "nothing to check" + exit 0 (the bug Codex caught — verified working) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
…ation Addresses three Copilot findings on PR #1207: 1. **Midnight UTC reset blind window**: previous version only looked at today's directory, so a no-op streak spanning midnight would be invisible during the first ticks of the new day — exactly when the check should still warn. Fix: collect from both today AND yesterday directories. Yesterday computed via BSD `date -v-1d` OR GNU `date -d "yesterday"` per Otto-235 4-shell target. 2. **Mixed-format sort drift**: raw lexicographic sort misorders `1550Z.md` vs `1550Z-01.md` (and HHMMZ vs HHMMSSZ-<hex>) per docs/hygiene-history/ticks/README.md mixed-format-sort caveat. Fix: parse the timestamp prefix into YYYYMMDDHHMMSS sortkey (HHMM padded with `00` for seconds) and sort by that, not by raw filename. The parsed-timestamp approach also lets today + yesterday combine correctly under one sort. 3. **Env var validation**: `NO_OP_CHECK_WINDOW=foo` previously made `tail -n foo` fail under `set -e`, defeating the "informational only / does NOT block tick" promise. Fix: validate both env vars match `^[0-9]+$` and are >= 1; on invalid input, warn and fall back to defaults. Verified: - Shellcheck: clean (no output) - Default threshold (5) on recent 7 shards: 2 matches, no warning - Lowered threshold (1): warning fires correctly - Bad env var (NO_OP_CHECK_WINDOW=foo): warns + uses default + exit 0 - Empty directory case: "nothing to check" + exit 0 (Codex fix intact) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…nt candidate #1 opened (#1208) * hygiene(tick-history): 2026-05-02T17:16Z Tick-85 — PR #1207 operational-enforcement candidate #1 opened No-op-cadence pre-tick mechanical check script lands. Self-tested both paths before commit. Closes architectural loop named in Tick-80 second-order self-grading memo (#1206). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(tick-history): add memory/ prefix to file ref for grep-ability and click-through Per Copilot review feedback on PR #1208 — `feedback_recurrence_*.md` without the `memory/` prefix isn't a valid repo-relative path; click- through fails and grep-ability suffers. Other shards use the prefixed form consistently. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 83fcf52157
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
…kage Codex Connector P2 finding on PR #1207: `THRESHOLD` regex `^[0-9]+$` accepts zero-padded values like `08`, but bash arithmetic context then parses `08` as octal and fails with "value too great for base", short-circuiting the validation and producing nondeterministic behavior (skip warning path + emit shell error) instead of either accepting the value or falling back to default. `NO_OP_CHECK_THRESHOLD=08` is a common zero-padded env style; the script's "informational only / does NOT block tick" promise breaks under that input. Fix: use `10#$VAR` arithmetic-base coercion in the validation checks AND normalize to base-10 immediately after validation (`VAR=$((10#$VAR))`) so all downstream usage (arithmetic comparisons + `tail -n`) sees unambiguous decimal. Verified: - Shellcheck: clean - Default: 2/7, no warning fires - `NO_OP_CHECK_THRESHOLD=08`: correctly interpreted as 8 (was octal-error before) - `NO_OP_CHECK_WINDOW=08`: correctly reads 8 shards - `NO_OP_CHECK_THRESHOLD=foo`: regex rejects → default 5 (regression check) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
Three Copilot findings on PR #1207: 1. **Whole-file size measured instead of body column** (line 129). Shard schema is six pipe-separated columns: `| timestamp | model | cron-id | <body> | <PR ref> | <observation> |` Previous code did `wc -c < $shard_path` on the whole file, so a terse-body row with a long observation column was treated as "not short" and missed the no-op signal. Fix: extract column 5 (the body) via `awk -F'|' 'NR==1 {print $5}'` and measure its length. Threshold tightened from 800 → 600 chars to match body-only measurement. 2. **Header docstring AND-semantic vs implementation OR-semantic** (line 28). Header said "short body + observation-class language" but code was `||`. OR is the correct semantic (any signal of minimal-observation counts). Fixed docstring to spell out the OR-semantic explicitly. 3. **Same-minute disambiguators sort wrong** (line 103). Previous sort key was `YYYYMMDDHHMM00` for both `HHMMZ.md` and `HHMMZ-XX.md` (identical primary key). Lex-sort on the path tiebreaker put `1550Z-01.md` BEFORE `1550Z.md` (because `-` < `.` in ASCII), so the base shard came after its own disambiguators. Fix: emit the disambiguator as a SECOND sort field; `sort -k1,1 -k2,2` now orders base-before-disambiguators (empty disambiguator sorts first because end-of-string < any char). While implementing #3, discovered a 4th hidden bug: bash's IFS whitespace-collapsing rule silently merged the empty disambiguator field with the surrounding tabs in the `read` loop, so `shard_path` came back empty and the script reported 0 shards. Switched the field separator from tab to `|` (non-whitespace, no collapsing). Verified: - Shellcheck: clean - Default threshold (5): 2/7 matches, no warning - THRESHOLD=1: warning fires - THRESHOLD=08: correctly interpreted as 8 (regression check on earlier base-10 fix) - Same-minute fixture: `0900Z.md`, `0900Z-01.md`, `0900Z-02.md`, `0901Z.md` now order correctly under `sort -t'|' -k1,1 -k2,2` Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
…re-merge across 3 PRs (#1210) * hygiene(tick-history): 2026-05-02T17:28Z Tick-87 — 7 review-bugs caught + fixed pre-merge across 3 PRs Opened immune-system memory file via PR #1209. External graders caught 7 real bugs across PRs #1207/#1208/#1209: empty-dir exit-1, zero-padded octal-parse, midnight-UTC blind window, mixed-format sort drift, env-var validation, MEMORY.md pairing, phantom xref. All addressed pre-merge. The immune system the human maintainer just named IS the worked example. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * fix(tick-shard): consistent count + remove unescaped pipe inside backticks Three Copilot findings on PR #1210: 1. **P1 count inconsistency**: bolded summary said "5 review-bugs" but the rest of the row + PR title + commit message say 7. Fixed to 7 throughout. 2. **P0 GFM-table pipe corruption**: `ls | grep` inside backticks still splits the column in GFM table rendering (escape-with- backslash inside code spans is inconsistently handled across renderers). Cleanest fix: rephrase to avoid the pipe entirely — "the `ls`-piped-to-`grep` pipeline" reads naturally and produces no extra column dividers. 3. **P0 schema-violation**: same root cause as #2 — the unescaped pipe was producing 9 awk-fields (8 pipes), failing the 6-column schema requirement. Now 8 fields (7 pipes) = 6 columns. Verified with `tools/hygiene/check-tick-history-shard-schema.sh`. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Summary
Implements operational-enforcement candidate #1 from
memory/feedback_recurrence_after_correction_needs_operational_enforcement_otto_2026_05_02.md(merged via #1206).The Tick-80 memo's empirical finding: substrate-knowledge alone is insufficient for failure modes the LLM training prior strongly favors. The no-op-cadence pattern recurred at Tick-71-79 even after the Tick-61 corrective named it. The architectural answer is mechanical checks at decision-time, not just substrate-read at wake-time.
This script is one such mechanical check.
What it does
docs/hygiene-history/ticks/YYYY/MM/DD/MIN_OBS_COUNT >= THRESHOLD(default 5), prints aWARNINGwith composing-substrate references and party-class operation alternativesNO_OP_CHECK_WINDOWandNO_OP_CHECK_THRESHOLDenv varsIntended invocation
At every autonomous-loop tick start:
The agent reads the warning alongside the substrate. Closer-to-decision-time than substrate-read at wake.
Self-test before commit
Composes with
tools/hygiene/check-role-ref-on-current-state-surfaces.sh(B-0162 sibling — mechanical check at commit-time vs. tick-time)tools/hygiene/check-tick-history-shard-schema.sh(sibling)feedback_recurrence_after_correction_needs_operational_enforcement_otto_2026_05_02.md,feedback_training_distribution_mismatch_firing_in_real_time_during_aaron_paused_phase_otto_2026_05_02.md,feedback_party_during_human_sleep_*.mdTest plan
chmod +xand self-test default threshold/loopautonomous-loop tick start (out of scope here)What's next (out of scope)
This lands operational-enforcement candidate #1. Remaining candidates from the Tick-80 memo:
🤖 Generated with Claude Code