…catch col1 schema violations at write-time, not 2 days later via PR review

Per the structural-fix-beats-process-discipline pattern (Otto-341) + the rediscoverable-from-main invariant landed in PR #969: the check that catches col1 drift at write-time is the mechanism that preserves schema uniformity on main.

What this checks:
1. Filename matches HHMMZ.md or HHMMSSZ-<hash>.md per the schema in docs/hygiene-history/ticks/README.md.
2. First non-empty line is a 6-column markdown table row starting with `| YYYY-MM-DDTHH:MM(:SS)?Z |`: exactly the ISO timestamp, no parenthetical, no extra prose. (Both with-seconds and no-seconds forms are valid ISO-8601; the schema doesn't pick a side.)
3. The col1 timestamp's date + HH:MM matches the filename's path date and HHMM.

What this does NOT check:
- Body content (cols 4-6): intentionally free-form prose.
- The prefab pattern (col1 timestamp ≫ commit-author time): requires git-log access not available pre-push for the current commit; deferred per the memory file linked in the script header.

Current-state on main: ~5 historical shards from April 28 violate this check. Those 5 are also implicated in the prefab-shard finding (memory/feedback_tick_history_prefab*), so fixing col1 mechanically would launder the body-level prefab claim. The check therefore lands in DORMANT mode, not yet wired into CI/pre-push. A future cleanup PR resolves the prefab-vs-schema decision before the check goes binding.

Composes with B-0114 sub-item 1 (pre-push lint hook): when that lands, this check joins the pre-push run.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1d1f22628c
# HHMMSSZ-<hash> forms).
if [[ "$base" =~ ^([0-9]{4})Z(-[0-9a-f]+)?$ ]]; then
  hhmm="${BASH_REMATCH[1]}"
elif [[ "$base" =~ ^([0-9]{4})([0-9]{2})Z(-[0-9a-f]+)?$ ]]; then
Require hash suffix on HHMMSS shard filenames
The HHMMSS filename branch makes the -<hash> part optional, so files like 232045Z.md are treated as valid even though the documented schema allows only HHMMZ.md or HHMMSSZ-<short-content-hash>.md. In high-concurrency use, this accepts second-resolution names without the collision discriminator the schema relies on, which weakens the uniqueness guarantee this check is supposed to enforce.
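A minimal sketch of the tightened match, assuming the minute-resolution branch stays as it is in the reviewed script and only the second-resolution branch changes. The helper name `shard_hhmm` is hypothetical and not part of the PR; it takes the basename without `.md`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): print the HHMM prefix of a shard
# basename, or return 1 when the name violates the documented schema.
shard_hhmm() {
  local base="$1"
  if [[ "$base" =~ ^([0-9]{4})Z(-[0-9a-f]+)?$ ]]; then
    # HHMMZ form: suffix stays optional, as in the reviewed script.
    printf '%s\n' "${BASH_REMATCH[1]}"
  elif [[ "$base" =~ ^([0-9]{4})[0-9]{2}Z-[0-9a-f]+$ ]]; then
    # HHMMSSZ form: the -<hash> suffix is now mandatory.
    printf '%s\n' "${BASH_REMATCH[1]}"
  else
    return 1
  fi
}
```

With this shape, `232045Z` falls through to the violation branch while `232045Z-ab12cd` still yields `2320`.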
# form (`...T23:04Z`) are valid ISO-8601 UTC; the schema
# in docs/hygiene-history/ticks/README.md does not pick a side.
# Capture the timestamp and verify it matches the path.
if [[ "$first_line" =~ ^\|\ ([0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}(:[0-9]{2})?Z)\ \|\  ]]; then
Enforce full six-column shard row schema
This regex only validates that the row starts with a timestamp in col1, but it does not verify that the rest of the required table structure exists. As a result, malformed rows such as | 2026-04-30T23:20Z | only-one-column | pass validation, so column-loss/schema-drift bugs can slip through despite this script being the write-time schema guard.
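One way to close this gap is a pipe-count guard that runs before the col1 match. The sketch below is an assumption about shape, not the script's actual code: `has_six_columns` is a hypothetical name, and it counts every `|` in the row, so a pipe embedded in free-form body text would also be counted. Seven pipes is therefore a lower bound, not an exact column count.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): succeed only when the row has at
# least 7 pipe characters (6 columns) and ends with a closing pipe.
has_six_columns() {
  local line="$1"
  local pipes="${line//[^|]/}"        # strip every character that is not a pipe
  local tail_re='\|[[:space:]]*$'     # row must end with "|" plus optional whitespace
  (( ${#pipes} >= 7 )) && [[ "$line" =~ $tail_re ]]
}
```

The malformed row quoted in the comment above has only three pipes, so it would be rejected before the timestamp is inspected.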
Pull request overview
Adds a new hygiene tool to validate tick-history shard files under docs/hygiene-history/ticks/ against the schema documented in docs/hygiene-history/ticks/README.md, aiming to catch col1 timestamp drift at write-time instead of during later PR review.
Changes:
- Introduces a bash script that scans shard files and validates filename format + col1 timestamp shape.
- Checks that col1’s timestamp date/time (to minute) matches the shard’s directory path and filename time prefix.
- Reports violations with non-zero exit for use as a future lint/pre-push/CI gate (currently intended to remain unwired/dormant per PR description).
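The path-versus-timestamp comparison in the second bullet can be sketched as follows. This assumes shards live under a `YYYY/MM/DD/` directory layout; the function name `ts_matches_path` and the regex are illustrative and not taken from the script.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): compare a col1 timestamp with the
# date and HHMM encoded in the shard path, to minute resolution.
ts_matches_path() {
  local ts="$1" path_rel="$2"
  # Assumed layout: .../YYYY/MM/DD/HHMM<anything>.md
  local path_re='([0-9]{4})/([0-9]{2})/([0-9]{2})/([0-9]{4})[^/]*\.md$'
  [[ "$path_rel" =~ $path_re ]] || return 1
  local hhmm="${BASH_REMATCH[4]}"
  local want="${BASH_REMATCH[1]}-${BASH_REMATCH[2]}-${BASH_REMATCH[3]}T${hhmm:0:2}:${hhmm:2:2}"
  # The first 16 characters of the timestamp are YYYY-MM-DDTHH:MM, so the
  # with-seconds and no-seconds forms compare identically.
  [[ "${ts:0:16}" == "$want" ]]
}
```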
# Pull the HHMM from the filename (handle both HHMMZ and
# HHMMSSZ-<hash> forms).
if [[ "$base" =~ ^([0-9]{4})Z(-[0-9a-f]+)?$ ]]; then
  hhmm="${BASH_REMATCH[1]}"
elif [[ "$base" =~ ^([0-9]{4})([0-9]{2})Z(-[0-9a-f]+)?$ ]]; then
  hhmm="${BASH_REMATCH[1]}"
else
  echo "VIOLATION: $path_rel — filename does not match HHMMZ.md or HHMMSSZ-<hash>.md schema" >&2
P0: The HHMMSSZ naming is documented as HHMMSSZ-<short-content-hash>.md, but the filename regex makes the -<hash> optional, so HHMMSSZ.md would incorrectly pass. Tighten the HHMMSSZ branch to require a suffix (while still allowing HHMMZ-01/-02 minute-collision suffixes).
- # Pull the HHMM from the filename (handle both HHMMZ and
- # HHMMSSZ-<hash> forms).
- if [[ "$base" =~ ^([0-9]{4})Z(-[0-9a-f]+)?$ ]]; then
-   hhmm="${BASH_REMATCH[1]}"
- elif [[ "$base" =~ ^([0-9]{4})([0-9]{2})Z(-[0-9a-f]+)?$ ]]; then
-   hhmm="${BASH_REMATCH[1]}"
- else
-   echo "VIOLATION: $path_rel — filename does not match HHMMZ.md or HHMMSSZ-<hash>.md schema" >&2
+ # Pull the HHMM from the filename (handle HHMMZ, HHMMZ-01,
+ # and HHMMSSZ-<hash> forms).
+ if [[ "$base" =~ ^([0-9]{4})Z(-[0-9]{2})?$ ]]; then
+   hhmm="${BASH_REMATCH[1]}"
+ elif [[ "$base" =~ ^([0-9]{4})([0-9]{2})Z-[0-9a-f]+$ ]]; then
+   hhmm="${BASH_REMATCH[1]}"
+ else
+   echo "VIOLATION: $path_rel — filename does not match HHMMZ.md, HHMMZ-NN.md, or HHMMSSZ-<hash>.md schema" >&2
done < <(find "$SHARD_DIR" -type f -name '*.md' \
  ! -name 'README.md' \
  -print0)
P1: Comment says the shard scan “skip README.md and any schema/* docs”, but the find only excludes README.md. Either implement the additional exclusion(s) or update the comment so the scan behavior is accurate.
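If the first option is chosen, the exclusion could look roughly like this. It is a sketch under the assumption that schema docs sit in a directory literally named `schema/` somewhere below the shard root, which is taken from the comment text and not verified against the repository; `list_shards` is a hypothetical wrapper.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not from the PR): emit NUL-delimited shard paths,
# skipping README.md files and anything under a schema/ directory.
list_shards() {
  local shard_dir="$1"
  find "$shard_dir" -type f -name '*.md' \
    ! -name 'README.md' \
    ! -path '*/schema/*' \
    -print0
}
```

The output keeps the `-print0` contract, so an existing `while IFS= read -r -d ''` loop can consume it unchanged.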
# Schema rule: first cell must be `| YYYY-MM-DDTHH:MM(:SS)?Z |`
# with no extra content before the next column boundary. Both
# the with-seconds form (`...T23:04:00Z`) and the no-seconds
# form (`...T23:04Z`) are valid ISO-8601 UTC; the schema
# in docs/hygiene-history/ticks/README.md does not pick a side.
P0: The shard schema requires a 6-column row (and comments say this tool checks a “6-column markdown table row”), but the implementation only validates the col1 timestamp prefix and never enforces column count / trailing final |. As a result, malformed shard rows can pass this check while still violating the declared schema.
    continue
  fi
else
  echo "VIOLATION: $path_rel — col1 must be exactly '| YYYY-MM-DDTHH:MM:SSZ | ...' (no parenthetical, no extra prose)" >&2
P2: The violation message hard-codes the with-seconds example (...THH:MM:SSZ) even though the matcher accepts both ...THH:MMZ and ...THH:MM:SSZ. Updating the message to reflect the accepted forms will reduce confusion when this trips.
- echo "VIOLATION: $path_rel — col1 must be exactly '| YYYY-MM-DDTHH:MM:SSZ | ...' (no parenthetical, no extra prose)" >&2
+ echo "VIOLATION: $path_rel — col1 must be exactly '| YYYY-MM-DDTHH:MMZ | ...' or '| YYYY-MM-DDTHH:MM:SSZ | ...' (no parenthetical, no extra prose)" >&2
# form (`...T23:04Z`) are valid ISO-8601 UTC; the schema
# in docs/hygiene-history/ticks/README.md does not pick a side.
# Capture the timestamp and verify it matches the path.
if [[ "$first_line" =~ ^\|\ ([0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}(:[0-9]{2})?Z)\ \|\  ]]; then
P0: The col1 regex is too strict about spacing after the timestamp. It currently requires two spaces after the | that closes col1 (...Z) \| ), but existing shard rows are formatted like | <ts> | <col2> | ... (single space). This will flag valid shards as violations. Relax the pattern to allow exactly one space (or any [[:space:]]+) after the col1 boundary, while still forbidding parentheticals/extra prose inside col1.
- if [[ "$first_line" =~ ^\|\ ([0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}(:[0-9]{2})?Z)\ \|\  ]]; then
+ if [[ "$first_line" =~ ^\|\ ([0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}(:[0-9]{2})?Z)\ \|[[:space:]]+ ]]; then
…hema.sh (pre-push compat)

Default mode (full-tree) is unchanged: manual runs and full-tree audits work as before.

New --files PATH... mode restricts the check to the listed shard files. This is the shape that pre-push hooks and per-PR CI jobs want, so they can run only on changed shards instead of failing on the 5 known-stale shards documented in the script header. Non-shard paths in the argument list are silently skipped, so callers can pass a broader file list (e.g. all changed files from `git diff --name-only`).

Refactor: extracted the per-shard validator into scan_one() function; replaced `continue` with `return 1` for early-exit semantics that work both inside the find loop and inside the --files iteration.

Composes with B-0114 sub-item 1 (pre-push lint hook): the --files mode is what that hook will invoke. Stacks on #975 (the original DORMANT-mode landing).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
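The silent-skip behaviour for non-shard paths might be sketched like this. The function name `filter_shard_paths` is hypothetical and the real script may structure its `--files` loop differently; the sketch relies on the fact that in a bash `case` pattern `*` also matches `/`, so one pattern covers the nested date directories.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): print only the arguments that look
# like shard files; everything else is dropped without any output.
filter_shard_paths() {
  local f
  for f in "$@"; do
    case "$f" in
      */README.md) ;;                                        # schema doc, not a shard
      docs/hygiene-history/ticks/*.md) printf '%s\n' "$f" ;;  # "*" spans the YYYY/MM/DD levels
      *) ;;                                                  # non-shard path: silently skipped
    esac
  done
}
```

A pre-push hook could collect changed paths from `git diff --name-only`, pass them through a filter of this kind, and hand each surviving path to `scan_one`.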
… pre-push compat (stacked on #975) (#977)

* tooling(hygiene): add --files argument to check-tick-history-shard-schema.sh (pre-push compat)

  Default mode (full-tree) is unchanged: manual runs and full-tree audits work as before.

  New --files PATH... mode restricts the check to the listed shard files. This is the shape that pre-push hooks and per-PR CI jobs want, so they can run only on changed shards instead of failing on the 5 known-stale shards documented in the script header. Non-shard paths in the argument list are silently skipped, so callers can pass a broader file list (e.g. all changed files from `git diff --name-only`).

  Refactor: extracted the per-shard validator into scan_one() function; replaced `continue` with `return 1` for early-exit semantics that work both inside the find loop and inside the --files iteration.

  Composes with B-0114 sub-item 1 (pre-push lint hook): the --files mode is what that hook will invoke. Stacks on #975 (the original DORMANT-mode landing).

  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(check-tick-history-shard-schema): three thread fixes from PR #977 review

  Three substantive findings from the chatgpt-codex-connector + copilot-pull-request-reviewer review pass on PR #977:

  1. P2 (Codex): six-column row enforcement. The col1 regex only validated col1 + the col1-end pipe; rows with too few columns (e.g. `| <ts> | a |`) passed. Added an explicit pipe-count check via awk that requires ≥7 pipes (= 6 columns + trailing pipe). It runs before the col1 regex; the col1 check only fires if the column count is right.
  2. P2 (Codex): HHMMSSZ filenames must require the hash suffix. The `(-[0-9a-f]+)?$` made the suffix optional, so `111122Z.md` would pass even though the schema documents `HHMMSSZ-<hash>.md` with the hash REQUIRED for collision-avoidance under high-concurrency writes. Made the hash non-optional in the HHMMSS branch.
  3. P2 (Copilot): the header contract said non-shard paths are "silently skipped", but the code emitted `skipped (not a file): ...` to stderr. Removed the stderr message to match the contract.

  Two findings left as form-2 closures (false positives):

  - P0 (Copilot): case glob doesn't span YYYY/MM/DD depth. False: in a bash case pattern, `*` matches `/`. Verified by running --files with a real 4-level path; matched correctly. Added a comment explaining the bash-case-glob semantics so future readers don't re-flag.
  - P0 (Copilot): col1 regex requires "two spaces after |". False: the unescaped whitespace before `]]` is a word delimiter, not part of the pattern. Verified the regex matches single-space shards (the actual on-main format).

  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
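The disagreement over trailing spaces comes from an escaped space sitting next to the space that separates the pattern from `]]`. One way to make the intent unambiguous, offered here as a sketch and not as what the script does, is to keep the pattern in a quoted variable so the single trailing space is visible inside the quotes. The name `col1_timestamp` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): print the col1 timestamp of a shard
# row, or return 1 when col1 is not a bare ISO-8601 UTC timestamp.
col1_timestamp() {
  # Pattern: "| ", the timestamp, " |", then exactly one space.
  # Inside single quotes every space is literal and nothing needs a backslash
  # except the pipe, which would otherwise mean alternation in an ERE.
  local re='^\| ([0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}(:[0-9]{2})?Z) \| '
  [[ "$1" =~ $re ]] || return 1
  printf '%s\n' "${BASH_REMATCH[1]}"
}
```

Rows formatted with a single space after each pipe match, while a parenthetical inside col1 does not.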
Summary
New hygiene tool that validates per-tick shard files against the schema in docs/hygiene-history/ticks/README.md. Catches the col1 parenthetical violation at write-time instead of 2 days later via PR review.
What it checks
- Filename: HHMMZ.md or HHMMSSZ-<hash>.md
- Col1: | YYYY-MM-DDTHH:MM(:SS)?Z | ... (no parenthetical, no extra prose, both ISO forms valid)
What it does NOT check
- The prefab pattern; see memory/feedback_tick_history_prefabricated_shards_codex_finding_audit_trail_integrity_2026_04_30.md
Why DORMANT (not yet wired into CI)
On main as of this PR, ~5 historical shards from April 28 violate the check. Those 5 are also implicated in the prefab finding (#973). Mechanically fixing col1 would launder the body-level prefab claim. The check lands in DORMANT mode (not wired into CI/pre-push) until the prefab-vs-schema decision lands.
Composes with
Test plan
🤖 Generated with Claude Code