
tools(hygiene): pre-tick mechanical no-op-cadence check (Tick-80 op-enforcement #1) #1207

Merged
AceHack merged 5 commits into main from mechanize-no-op-cadence-pretick-check
May 2, 2026

Conversation


@AceHack AceHack commented May 2, 2026

Summary

Implements operational-enforcement candidate #1 from memory/feedback_recurrence_after_correction_needs_operational_enforcement_otto_2026_05_02.md (merged via #1206).

The Tick-80 memo's empirical finding: substrate-knowledge alone is insufficient for failure modes the LLM training prior strongly favors. The no-op-cadence pattern recurred at Tick-71-79 even after the Tick-61 corrective named it. The architectural answer is mechanical checks at decision-time, not just substrate-read at wake-time.

This script is one such mechanical check.

What it does

  • Reads last N (default 7) tick-history shards from current UTC date under docs/hygiene-history/ticks/YYYY/MM/DD/
  • Counts shards matching minimal-observation pattern (heuristic: short body OR observation-keyword regex)
  • If MIN_OBS_COUNT >= THRESHOLD (default 5), prints a WARNING with composing-substrate references and party-class operation alternatives
  • Configurable via NO_OP_CHECK_WINDOW and NO_OP_CHECK_THRESHOLD env vars
  • Informational only (exit 0); does NOT block tick
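
The counting step described above can be sketched roughly as follows. This is an illustration, not the script's actual code: the function names and `OBS_REGEX` keywords are hypothetical, and `SHORT_MAX=800` is borrowed from the pre-tightening size threshold mentioned later in this PR.

```shell
#!/usr/bin/env bash
# Sketch only. OBS_REGEX and the function names are hypothetical stand-ins.
SHORT_MAX=800
OBS_REGEX='no change|nothing to do|observed only|idle'

# is_minimal_observation FILE -> returns 0 when EITHER signal matches (OR semantics)
is_minimal_observation() {
  local file=$1 size
  size=$(( $(wc -c < "$file") ))        # arithmetic strips BSD wc padding
  if [ "$size" -lt "$SHORT_MAX" ]; then return 0; fi
  grep -Eiq "$OBS_REGEX" "$file"
}

# count_minimal FILE... -> prints how many of the given files match
count_minimal() {
  local count=0 f
  for f in "$@"; do
    if is_minimal_observation "$f"; then count=$((count + 1)); fi
  done
  printf '%s\n' "$count"
}
```

A caller would compare the printed count against the threshold and print the warning, always exiting 0 so the tick is never blocked.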

Intended invocation

At every autonomous-loop tick start:

bash tools/hygiene/check-no-op-cadence-pattern.sh

The agent reads the warning alongside the substrate, which puts the reminder closer to decision time than a substrate read at wake time.

Self-test before commit

  • Default threshold (5) on recent 7 shards: 2 matches, no warning fires — correct (Tick-80-84 have been substantive)
  • Lowered threshold (1): warning fires correctly with full body content

Composes with

  • tools/hygiene/check-role-ref-on-current-state-surfaces.sh (B-0162 sibling — mechanical check at commit-time vs. tick-time)
  • tools/hygiene/check-tick-history-shard-schema.sh (sibling)
  • Memos: feedback_recurrence_after_correction_needs_operational_enforcement_otto_2026_05_02.md, feedback_training_distribution_mismatch_firing_in_real_time_during_aaron_paused_phase_otto_2026_05_02.md, feedback_party_during_human_sleep_*.md

Test plan

  • chmod +x and self-test default threshold
  • Self-test with lowered threshold to verify warning path
  • CI green
  • Optional follow-up: wire into /loop autonomous-loop tick start (out of scope here)

What's next (out of scope)

This lands operational-enforcement candidate #1. Remaining candidates from the Tick-80 memo:

🤖 Generated with Claude Code

…ational-enforcement candidate #1)

Implements the first of the four operational-enforcement candidates
named in memory/feedback_recurrence_after_correction_needs_operational_enforcement_otto_2026_05_02.md
(merged via PR #1206 earlier this session).

The Tick-80 memo's empirical finding: substrate-knowledge alone is
insufficient for failure modes the LLM training prior strongly favors —
the no-op-cadence pattern recurred at Tick-71-79 even after the
Tick-61 corrective memo named it. The architectural answer is
operational enforcement: mechanical checks at decision-time, not just
substrate-read at wake-time.

This script is one such mechanical check.

What it does:
- Reads last N (default 7) tick-history shards from current UTC date
  under docs/hygiene-history/ticks/YYYY/MM/DD/
- Counts shards matching minimal-observation pattern (heuristic: short
  body OR observation-class language regex)
- If MIN_OBS_COUNT >= THRESHOLD (default 5), prints a WARNING with
  composing-substrate references and party-class operation alternatives

What it does NOT do:
- Does NOT block the tick (informational only; exit 0)
- Does NOT auto-correct (acting on the warning is left to the agent's judgment)
- Does NOT examine prior days (current-day window only)

Configurable via NO_OP_CHECK_WINDOW and NO_OP_CHECK_THRESHOLD env vars.

Self-tested before commit:
- Default threshold (5) on recent 7 shards: 2 matches, no warning
  fires — correct (Tick-80-84 have been substantive)
- Lowered threshold (1): warning fires correctly with full body

Composes with:
- tools/hygiene/check-role-ref-on-current-state-surfaces.sh (B-0162
  sibling pattern; mechanical check at commit-time vs. tick-time)
- tools/hygiene/check-tick-history-shard-schema.sh (sibling pattern)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2954bd0aac


Comment thread tools/hygiene/check-no-op-cadence-pattern.sh Outdated

Copilot AI left a comment


Pull request overview

This PR adds a new hygiene script intended to warn at autonomous-loop tick start when recent tick-history shards suggest a recurring “minimal observation / no-op cadence” pattern. It fits the repo’s broader move toward mechanical, decision-time guardrails for known recurring failure modes in the autonomous loop.

Changes:

  • Adds tools/hygiene/check-no-op-cadence-pattern.sh as an informational pre-tick warning check.
  • Scans recent same-day tick-history shard files and counts shards matching a minimal-observation heuristic.
  • Prints a warning with suggested alternative work patterns when the configured threshold is reached.

Comment thread tools/hygiene/check-no-op-cadence-pattern.sh Outdated
Comment thread tools/hygiene/check-no-op-cadence-pattern.sh Outdated
Comment thread tools/hygiene/check-no-op-cadence-pattern.sh Outdated
… shellcheck SC2010

Two findings on PR #1207 line 62 had the same root cause and one fix:

1. **Codex Connector P2**: `RECENT_SHARDS=$(ls "$SHARD_DIR" | grep -E ...)`
   could exit 1 under `set -euo pipefail` when the directory exists
   but `grep` finds no matching schema-conforming filenames, killing
   the script before the `[[ -z "$RECENT_SHARDS" ]]` fallback runs.
   This defeats the script's "informational only / does NOT block tick"
   promise — a tick-start invocation hitting a fresh shard directory
   would unexpectedly fail.

2. **Shellcheck SC2010**: `ls | grep` is the wrong shape; use a glob
   or for-loop with conditions to allow non-alphanumeric filenames.

Fix: replace the pipeline with a `shopt -s nullglob` + glob loop
that filters via bash regex. Bash 3.2 compatible per Otto-235 4-shell
target. Same schema acceptance: `HHMMZ.md`, `HHMMZ-<hex>.md`,
`HHMMSSZ-<hex>.md` per docs/hygiene-history/ticks/README.md.
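
A minimal sketch of the glob-loop shape this fix describes, assuming a hypothetical helper name `list_shards` and a regex written from the three filename forms listed above (the real script's regex may differ):

```shell
#!/usr/bin/env bash
set -euo pipefail

# list_shards DIR -> prints schema-conforming shard filenames, one per line.
# Prints nothing and still returns 0 when nothing matches, so the caller's
# "nothing to check" fallback can run under `set -e`.
list_shards() {
  local dir=$1 path name
  # HHMMZ.md, HHMMZ-<hex>.md, or HHMMSSZ-<hex>.md (assumed reading of the schema)
  local re='^([0-9]{4}Z(-[0-9a-f]+)?|[0-9]{6}Z-[0-9a-f]+)\.md$'
  shopt -s nullglob                # an unmatched glob expands to nothing
  for path in "$dir"/*.md; do
    name=${path##*/}
    if [[ $name =~ $re ]]; then
      printf '%s\n' "$name"
    fi
  done
  shopt -u nullglob
}
```

Unlike the `ls`-piped-to-`grep` pipeline, no command in this loop returns non-zero merely because zero files matched.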

Verified:
- Shellcheck: clean (no output)
- Default threshold (5) on recent 7 shards: 2 matches, no warning
- Lowered threshold (1): warning fires correctly
- Empty directory case: "nothing to check" + exit 0 (the bug Codex
  caught — verified working)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@chatgpt-codex-connector

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

…ation

Addresses three Copilot findings on PR #1207:

1. **Midnight UTC reset blind window**: previous version only looked
   at today's directory, so a no-op streak spanning midnight would be
   invisible during the first ticks of the new day — exactly when the
   check should still warn. Fix: collect from both today AND yesterday
   directories. Yesterday computed via BSD `date -v-1d` OR GNU
   `date -d "yesterday"` per Otto-235 4-shell target.

2. **Mixed-format sort drift**: raw lexicographic sort misorders
   `1550Z.md` vs `1550Z-01.md` (and HHMMZ vs HHMMSSZ-<hex>) per
   docs/hygiene-history/ticks/README.md mixed-format-sort caveat. Fix:
   parse the timestamp prefix into YYYYMMDDHHMMSS sortkey (HHMM padded
   with `00` for seconds) and sort by that, not by raw filename. The
   parsed-timestamp approach also lets today + yesterday combine
   correctly under one sort.

3. **Env var validation**: `NO_OP_CHECK_WINDOW=foo` previously made
   `tail -n foo` fail under `set -e`, defeating the "informational
   only / does NOT block tick" promise. Fix: validate both env vars
   match `^[0-9]+$` and are >= 1; on invalid input, warn and fall
   back to defaults.
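
Findings 1 and 2 can be illustrated with two small helpers. Both names (`yesterday_utc`, `sort_key`) are hypothetical, and the order in which the BSD and GNU `date` forms are tried is an assumption; only the two `date` invocations and the `00` seconds padding come from the text above.

```shell
#!/usr/bin/env bash

# yesterday_utc -> prints yesterday's UTC date as YYYY/MM/DD.
# Tries the BSD form first, then falls back to the GNU form.
yesterday_utc() {
  date -u -v-1d +%Y/%m/%d 2>/dev/null || date -u -d yesterday +%Y/%m/%d
}

# sort_key YYYYMMDD FILENAME -> prints a YYYYMMDDHHMMSS key.
# HHMM-only filenames are padded with 00 for the seconds.
sort_key() {
  local day=$1 name=$2 stamp
  stamp=${name%%Z*}                # digits before the first Z
  if [ "${#stamp}" -eq 4 ]; then stamp="${stamp}00"; fi
  printf '%s%s\n' "$day" "$stamp"
}

sort_key 20260502 1550Z.md         # prints 20260502155000
sort_key 20260501 235912Z-ab.md    # prints 20260501235912
```

Because the date is part of the key, shards from yesterday's and today's directories interleave correctly under a single sort.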

Verified:
- Shellcheck: clean (no output)
- Default threshold (5) on recent 7 shards: 2 matches, no warning
- Lowered threshold (1): warning fires correctly
- Bad env var (NO_OP_CHECK_WINDOW=foo): warns + uses default + exit 0
- Empty directory case: "nothing to check" + exit 0 (Codex fix intact)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 2, 2026 17:24
AceHack added a commit that referenced this pull request May 2, 2026
…nt candidate #1 opened (#1208)

* hygiene(tick-history): 2026-05-02T17:16Z Tick-85 — PR #1207 operational-enforcement candidate #1 opened

No-op-cadence pre-tick mechanical check script lands. Self-tested
both paths before commit. Closes architectural loop named in
Tick-80 second-order self-grading memo (#1206).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(tick-history): add memory/ prefix to file ref for grep-ability and click-through

Per Copilot review feedback on PR #1208 — `feedback_recurrence_*.md`
without the `memory/` prefix isn't a valid repo-relative path; click-
through fails and grep-ability suffers. Other shards use the prefixed
form consistently.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 83fcf52157


Comment thread tools/hygiene/check-no-op-cadence-pattern.sh Outdated
…kage

Codex Connector P2 finding on PR #1207: `THRESHOLD` regex
`^[0-9]+$` accepts zero-padded values like `08`, but bash arithmetic
context then parses `08` as octal and fails with "value too great
for base", short-circuiting the validation and producing
nondeterministic behavior (skip warning path + emit shell error)
instead of either accepting the value or falling back to default.

`NO_OP_CHECK_THRESHOLD=08` is a common zero-padded env style; the
script's "informational only / does NOT block tick" promise breaks
under that input.

Fix: use `10#$VAR` arithmetic-base coercion in the validation
checks AND normalize to base-10 immediately after validation
(`VAR=$((10#$VAR))`) so all downstream usage (arithmetic
comparisons + `tail -n`) sees unambiguous decimal.
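
A sketch of that validate-then-normalize step, assuming a hypothetical helper `normalize_positive_int`; the regex, the `>= 1` check, the `10#` coercion, and the defaults 7 and 5 come from this PR, while the warning wording is invented:

```shell
#!/usr/bin/env bash

# normalize_positive_int VALUE DEFAULT -> prints VALUE as plain base-10,
# or DEFAULT (plus a warning on stderr) when VALUE is not an integer >= 1.
normalize_positive_int() {
  local value=$1 default=$2
  # 10# forces decimal, so "08" is read as 8 instead of invalid octal.
  if [[ $value =~ ^[0-9]+$ ]] && (( 10#$value >= 1 )); then
    printf '%s\n' "$((10#$value))"
  else
    printf 'warning: invalid value "%s", using default %s\n' "$value" "$default" >&2
    printf '%s\n' "$default"
  fi
}

WINDOW=$(normalize_positive_int "${NO_OP_CHECK_WINDOW:-7}" 7)
THRESHOLD=$(normalize_positive_int "${NO_OP_CHECK_THRESHOLD:-5}" 5)
```

Normalizing once, right after validation, means later arithmetic and `tail -n "$WINDOW"` never see a leading zero.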

Verified:
- Shellcheck: clean
- Default: 2/7, no warning fires
- `NO_OP_CHECK_THRESHOLD=08`: correctly interpreted as 8 (was
  octal-error before)
- `NO_OP_CHECK_WINDOW=08`: correctly reads 8 shards
- `NO_OP_CHECK_THRESHOLD=foo`: regex rejects → default 5 (regression
  check)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>


Copilot AI left a comment


Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 3 comments.

Comment thread tools/hygiene/check-no-op-cadence-pattern.sh
Comment thread tools/hygiene/check-no-op-cadence-pattern.sh Outdated
Comment thread tools/hygiene/check-no-op-cadence-pattern.sh
Three Copilot findings on PR #1207:

1. **Whole-file size measured instead of body column** (line 129).
   Shard schema is six pipe-separated columns:
   `| timestamp | model | cron-id | <body> | <PR ref> | <observation> |`
   Previous code did `wc -c < $shard_path` on the whole file, so a
   terse-body row with a long observation column was treated as
   "not short" and missed the no-op signal. Fix: extract awk field 5
   (the body, i.e. the fourth schema column; field 1 is the empty
   string before the leading pipe) via `awk -F'|' 'NR==1 {print $5}'`
   and measure its length. Threshold tightened from 800 → 600 chars
   to match body-only measurement.

2. **Header docstring AND-semantic vs implementation OR-semantic**
   (line 28). Header said "short body + observation-class language"
   but code was `||`. OR is the correct semantic (any signal of
   minimal-observation counts). Fixed docstring to spell out the
   OR-semantic explicitly.

3. **Same-minute disambiguators sort wrong** (line 103). Previous
   sort key was `YYYYMMDDHHMM00` for both `HHMMZ.md` and `HHMMZ-XX.md`
   (identical primary key). Lex-sort on the path tiebreaker put
   `1550Z-01.md` BEFORE `1550Z.md` (because `-` < `.` in ASCII), so
   the base shard came after its own disambiguators. Fix: emit the
   disambiguator as a SECOND sort field; `sort -k1,1 -k2,2` now
   orders base-before-disambiguators (empty disambiguator sorts
   first because end-of-string < any char).
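
For finding 1, the body-only measurement might look like the sketch below. The helper name `body_length` is hypothetical, and trimming the padding spaces around the cell is an added assumption; the `-F'|'`, `NR==1`, and `$5` parts follow the command quoted above.

```shell
#!/usr/bin/env bash

# body_length FILE -> prints the length of the body cell in the first row.
# With a leading pipe, awk field 1 is empty, so the body is field 5.
body_length() {
  awk -F'|' 'NR==1 {
    body = $5
    gsub(/^[ \t]+|[ \t]+$/, "", body)   # trim cell padding (assumption)
    print length(body)
  }' "$1"
}
```

A row with a terse body and a long observation column now measures as short, because the observation text is never counted.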

While implementing #3, discovered a 4th hidden bug: bash's IFS
whitespace-collapsing rule silently merged the empty disambiguator
field with the surrounding tabs in the `read` loop, so `shard_path`
came back empty and the script reported 0 shards. Switched the
field separator from tab to `|` (non-whitespace, no collapsing).
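
The collapsing behavior is easy to reproduce in isolation. In this sketch the helper `split_fields` and the sample record are illustrative only; the point is that tab counts as IFS whitespace, so two adjacent tabs act as one delimiter, whereas each `|` delimits a field on its own.

```shell
#!/usr/bin/env bash

# split_fields SEP LINE -> reads three SEP-separated fields from LINE
# and prints them as <key><disambiguator><path>.
split_fields() {
  local sep=$1 line=$2 key disambig path
  IFS=$sep read -r key disambig path <<< "$line"
  printf '<%s><%s><%s>\n' "$key" "$disambig" "$path"
}

tab=$(printf '\t')

# Tab separator: the empty middle field vanishes and the path shifts left.
split_fields "$tab" "20260502155000${tab}${tab}ticks/1550Z.md"
# prints <20260502155000><ticks/1550Z.md><>

# Pipe separator: the empty middle field is preserved.
split_fields '|' '20260502155000||ticks/1550Z.md'
# prints <20260502155000><><ticks/1550Z.md>
```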

Verified:
- Shellcheck: clean
- Default threshold (5): 2/7 matches, no warning
- THRESHOLD=1: warning fires
- THRESHOLD=08: correctly interpreted as 8 (regression check on
  earlier base-10 fix)
- Same-minute fixture: `0900Z.md`, `0900Z-01.md`, `0900Z-02.md`,
  `0901Z.md` now order correctly under `sort -t'|' -k1,1 -k2,2`

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

@AceHack AceHack merged commit 475da25 into main May 2, 2026
25 checks passed
@AceHack AceHack deleted the mechanize-no-op-cadence-pretick-check branch May 2, 2026 17:43
AceHack added a commit that referenced this pull request May 2, 2026
…re-merge across 3 PRs (#1210)

* hygiene(tick-history): 2026-05-02T17:28Z Tick-87 — 7 review-bugs caught + fixed pre-merge across 3 PRs

Opened immune-system memory file via PR #1209. External graders
caught 7 real bugs across PRs #1207/#1208/#1209: empty-dir exit-1,
zero-padded octal-parse, midnight-UTC blind window, mixed-format
sort drift, env-var validation, MEMORY.md pairing, phantom xref.
All addressed pre-merge. The immune system the human maintainer
just named IS the worked example.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(tick-shard): consistent count + remove unescaped pipe inside backticks

Three Copilot findings on PR #1210:

1. **P1 count inconsistency**: bolded summary said "5 review-bugs"
   but the rest of the row + PR title + commit message say 7. Fixed
   to 7 throughout.

2. **P0 GFM-table pipe corruption**: `ls | grep` inside backticks
   still splits the column in GFM table rendering (escape-with-
   backslash inside code spans is inconsistently handled across
   renderers). Cleanest fix: rephrase to avoid the pipe entirely —
   "the `ls`-piped-to-`grep` pipeline" reads naturally and produces
   no extra column dividers.

3. **P0 schema-violation**: same root cause as #2 — the unescaped
   pipe was producing 9 awk-fields (8 pipes), failing the 6-column
   schema requirement. Now 8 fields (7 pipes) = 6 columns. Verified
   with `tools/hygiene/check-tick-history-shard-schema.sh`.
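
The field arithmetic in finding 3 can be checked by hand with awk. This is only an illustration of the counting rule (columns = awk fields minus the two empty edge fields); it is not the logic of `check-tick-history-shard-schema.sh`, and `column_count` is a hypothetical name.

```shell
#!/usr/bin/env bash

# column_count ROW -> prints the number of table columns in a row that
# starts and ends with a pipe (awk fields minus the two empty edge fields).
column_count() {
  printf '%s\n' "$1" | awk -F'|' '{ print NF - 2 }'
}

column_count '| t | m | c | body | pr | obs |'                 # prints 6
column_count '| t | m | c | the `ls | grep` fix | pr | obs |'  # prints 7
```

The second row shows how a single unescaped pipe inside a code span adds a seventh column.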

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

2 participants