tooling(hygiene): check-tick-history-shard-schema.sh — catch col1 drift at write-time #975

Merged
AceHack merged 1 commit into main from tooling/check-tick-history-shard-schema-prevent-col1-drift-2026-04-30
Apr 30, 2026

@AceHack AceHack commented Apr 30, 2026

Summary

New hygiene tool that validates per-tick shard files against the schema in docs/hygiene-history/ticks/README.md. Catches the col1 parenthetical violation at write-time instead of 2 days later via PR review.

What it checks

  1. Filename matches HHMMZ.md or HHMMSSZ-<hash>.md
  2. First non-empty line is | YYYY-MM-DDTHH:MM(:SS)?Z | ... (no parenthetical, no extra prose, both ISO forms valid)
  3. Col1 timestamp's date + HH:MM matches the file path's date and HHMM
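
The third check can be sketched roughly as follows. This is an illustration, not the script's own code: `path_matches_col1` is a hypothetical helper name, and the `YYYY/MM/DD/` directory layout is an assumption inferred from the review discussion of this tool.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed when the col1 timestamp agrees with the
# date directories and the HHMM prefix of the shard filename.
# Assumes a .../YYYY/MM/DD/<HHMM...>.md layout (an assumption).
path_matches_col1() {
  local path="$1" ts="$2"
  local re='([0-9]{4})/([0-9]{2})/([0-9]{2})/([0-9]{4})[^/]*\.md$'
  [[ "$path" =~ $re ]] || return 1
  local want_date="${BASH_REMATCH[1]}-${BASH_REMATCH[2]}-${BASH_REMATCH[3]}"
  local want_hhmm="${BASH_REMATCH[4]}"
  # ts looks like 2026-04-30T23:04Z or 2026-04-30T23:04:00Z.
  local got_date="${ts%%T*}"
  local got_time="${ts#*T}"
  local got_hhmm="${got_time:0:2}${got_time:3:2}"
  [[ "$got_date" == "$want_date" && "$got_hhmm" == "$want_hhmm" ]]
}
```

Seconds are ignored on both sides, so either ISO form in col1 compares equal to the minute-level path.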

What it does NOT check

  • Body content (cols 4-6) — free-form prose by design
  • The prefab pattern (col1 timestamp ≫ commit time) — needs git-log access not available pre-push; deferred per memory/feedback_tick_history_prefabricated_shards_codex_finding_audit_trail_integrity_2026_04_30.md

Why DORMANT (not yet wired into CI)

On main as of this PR, ~5 historical shards from April 28 violate the check. Those 5 are also implicated in the prefab finding (#973). Mechanically fixing col1 would launder the body-level prefab claim. The check lands in DORMANT mode — not wired into CI/pre-push — until the prefab-vs-schema decision lands.

Composes with

  • B-0114 sub-item 1 (pre-push lint hook): when that lands, this check joins the pre-push run

Test plan

  • shellcheck passes
  • Manual run on current main detects 5 expected violations (April 28 shards) and 0 false positives
  • Both ISO timestamp forms (with/without seconds) accepted
  • Filename / col1 timestamp match check works (verified against current passing shards)

🤖 Generated with Claude Code

…catch col1 schema violations at write-time, not 2 days later via PR review

Per the structural-fix-beats-process-discipline pattern
(Otto-341) + the rediscoverable-from-main invariant landed
in PR #969: the check that catches col1 drift at write-time
is the mechanism that preserves schema uniformity on main.

What this checks:

1. Filename matches HHMMZ.md or HHMMSSZ-<hash>.md per the
   schema in docs/hygiene-history/ticks/README.md.
2. First non-empty line is a 6-column markdown table row
   starting with `| YYYY-MM-DDTHH:MM(:SS)?Z |` — exactly
   the ISO timestamp, no parenthetical, no extra prose.
   (Both with-seconds and no-seconds forms are valid
   ISO-8601; the schema doesn't pick a side.)
3. The col1 timestamp's date + HH:MM matches the filename's
   path date and HHMM.

What this does NOT check:

- Body content (cols 4-6) — intentionally free-form prose
- The prefab pattern (col1 timestamp ≫ commit-author time)
  — requires git-log access not available pre-push for the
  current commit; deferred per the memory file linked in
  the script header

Current-state on main: ~5 historical shards from April 28
violate this check. Those 5 are also implicated in the
prefab-shard finding (memory/feedback_tick_history_prefab*),
so fixing col1 mechanically would launder the body-level
prefab claim. The check therefore lands in DORMANT mode —
not yet wired into CI/pre-push. A future cleanup PR resolves
the prefab-vs-schema decision before the check goes binding.

Composes with B-0114 sub-item 1 (pre-push lint hook) — when
that lands, this check joins the pre-push run.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 30, 2026 23:30
@AceHack AceHack enabled auto-merge (squash) April 30, 2026 23:30
@AceHack AceHack merged commit 42dc154 into main Apr 30, 2026
26 checks passed
@AceHack AceHack deleted the tooling/check-tick-history-shard-schema-prevent-col1-drift-2026-04-30 branch April 30, 2026 23:33

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1d1f22628c

# HHMMSSZ-<hash> forms).
if [[ "$base" =~ ^([0-9]{4})Z(-[0-9a-f]+)?$ ]]; then
hhmm="${BASH_REMATCH[1]}"
elif [[ "$base" =~ ^([0-9]{4})([0-9]{2})Z(-[0-9a-f]+)?$ ]]; then

P2: Require hash suffix on HHMMSS shard filenames

The HHMMSS filename branch makes the -<hash> part optional, so files like 232045Z.md are treated as valid even though the documented schema allows only HHMMZ.md or HHMMSSZ-<short-content-hash>.md. In high-concurrency use, this accepts second-resolution names without the collision discriminator the schema relies on, which weakens the uniqueness guarantee this check is supposed to enforce.
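
A minimal sketch of a stricter filename match, assuming the documented schema allows only the two forms named above. `filename_hhmm` is a hypothetical helper, not a function from the script, and it deliberately leaves out any minute-collision suffix on the `HHMMZ` form.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the HHMM prefix of a shard basename
# (given without the .md extension), or fail if the name is off-schema.
filename_hhmm() {
  local base="$1"
  local minute_re='^([0-9]{4})Z$'
  local second_re='^([0-9]{4})[0-9]{2}Z-[0-9a-f]+$'   # hash suffix is required
  if [[ "$base" =~ $minute_re ]] || [[ "$base" =~ $second_re ]]; then
    printf '%s\n' "${BASH_REMATCH[1]}"
  else
    return 1
  fi
}
```

With this shape, `232045Z` is rejected while `232045Z-1a2b3c` and `2320Z` both yield `2320`.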

# form (`...T23:04Z`) are valid ISO-8601 UTC; the schema
# in docs/hygiene-history/ticks/README.md does not pick a side.
# Capture the timestamp and verify it matches the path.
if [[ "$first_line" =~ ^\|\ ([0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}(:[0-9]{2})?Z)\ \|\  ]]; then

P2: Enforce full six-column shard row schema

This regex only validates that the row starts with a timestamp in col1, but it does not verify that the rest of the required table structure exists. As a result, malformed rows such as | 2026-04-30T23:20Z | only-one-column | pass validation, so column-loss/schema-drift bugs can slip through despite this script being the write-time schema guard.

Copilot AI left a comment

Pull request overview

Adds a new hygiene tool to validate tick-history shard files under docs/hygiene-history/ticks/ against the schema documented in docs/hygiene-history/ticks/README.md, aiming to catch col1 timestamp drift at write-time instead of during later PR review.

Changes:

  • Introduces a bash script that scans shard files and validates filename format + col1 timestamp shape.
  • Checks that col1’s timestamp date/time (to minute) matches the shard’s directory path and filename time prefix.
  • Reports violations with non-zero exit for use as a future lint/pre-push/CI gate (currently intended to remain unwired/dormant per PR description).

Comment on lines +103 to +110
# Pull the HHMM from the filename (handle both HHMMZ and
# HHMMSSZ-<hash> forms).
if [[ "$base" =~ ^([0-9]{4})Z(-[0-9a-f]+)?$ ]]; then
hhmm="${BASH_REMATCH[1]}"
elif [[ "$base" =~ ^([0-9]{4})([0-9]{2})Z(-[0-9a-f]+)?$ ]]; then
hhmm="${BASH_REMATCH[1]}"
else
echo "VIOLATION: $path_rel — filename does not match HHMMZ.md or HHMMSSZ-<hash>.md schema" >&2

Copilot AI Apr 30, 2026

P0: The HHMMSSZ naming is documented as HHMMSSZ-<short-content-hash>.md, but the filename regex makes the -<hash> optional, so HHMMSSZ.md would incorrectly pass. Tighten the HHMMSSZ branch to require a suffix (while still allowing HHMMZ-01/-02 minute-collision suffixes).

Suggested change
- # Pull the HHMM from the filename (handle both HHMMZ and
- # HHMMSSZ-<hash> forms).
- if [[ "$base" =~ ^([0-9]{4})Z(-[0-9a-f]+)?$ ]]; then
-   hhmm="${BASH_REMATCH[1]}"
- elif [[ "$base" =~ ^([0-9]{4})([0-9]{2})Z(-[0-9a-f]+)?$ ]]; then
-   hhmm="${BASH_REMATCH[1]}"
- else
-   echo "VIOLATION: $path_rel — filename does not match HHMMZ.md or HHMMSSZ-<hash>.md schema" >&2
+ # Pull the HHMM from the filename (handle HHMMZ, HHMMZ-01,
+ # and HHMMSSZ-<hash> forms).
+ if [[ "$base" =~ ^([0-9]{4})Z(-[0-9]{2})?$ ]]; then
+   hhmm="${BASH_REMATCH[1]}"
+ elif [[ "$base" =~ ^([0-9]{4})([0-9]{2})Z-[0-9a-f]+$ ]]; then
+   hhmm="${BASH_REMATCH[1]}"
+ else
+   echo "VIOLATION: $path_rel — filename does not match HHMMZ.md, HHMMZ-NN.md, or HHMMSSZ-<hash>.md schema" >&2

Comment on lines +164 to +166
done < <(find "$SHARD_DIR" -type f -name '*.md' \
! -name 'README.md' \
-print0)

Copilot AI Apr 30, 2026

P1: Comment says the shard scan “skip README.md and any schema/* docs”, but the find only excludes README.md. Either implement the additional exclusion(s) or update the comment so the scan behavior is accurate.
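
If the exclusion were implemented rather than the comment changed, one possible shape is sketched below. `list_shards` is a hypothetical wrapper, and treating "schema/* docs" as any path containing a `schema/` directory is an assumption about what the comment meant.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list candidate shard files under a root, skipping
# README.md and anything inside a schema/ directory. Output is
# NUL-delimited so unusual filenames survive the read loop.
list_shards() {
  find "$1" -type f -name '*.md' \
    ! -name 'README.md' \
    ! -path '*/schema/*' \
    -print0
}
```

A caller would consume it with `while IFS= read -r -d '' path; do ...; done < <(list_shards "$SHARD_DIR")`, the same loop shape the excerpt above already uses.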

Comment on lines +132 to +136
# Schema rule: first cell must be `| YYYY-MM-DDTHH:MM(:SS)?Z |`
# with no extra content before the next column boundary. Both
# the with-seconds form (`...T23:04:00Z`) and the no-seconds
# form (`...T23:04Z`) are valid ISO-8601 UTC; the schema
# in docs/hygiene-history/ticks/README.md does not pick a side.

Copilot AI Apr 30, 2026

P0: The shard schema requires a 6-column row (and comments say this tool checks a “6-column markdown table row”), but the implementation only validates the col1 timestamp prefix and never enforces column count / trailing final |. As a result, malformed shard rows can pass this check while still violating the declared schema.

continue
fi
else
echo "VIOLATION: $path_rel — col1 must be exactly '| YYYY-MM-DDTHH:MM:SSZ | ...' (no parenthetical, no extra prose)" >&2

Copilot AI Apr 30, 2026

P2: The violation message hard-codes the with-seconds example (...THH:MM:SSZ) even though the matcher accepts both ...THH:MMZ and ...THH:MM:SSZ. Updating the message to reflect the accepted forms will reduce confusion when this trips.

Suggested change
- echo "VIOLATION: $path_rel — col1 must be exactly '| YYYY-MM-DDTHH:MM:SSZ | ...' (no parenthetical, no extra prose)" >&2
+ echo "VIOLATION: $path_rel — col1 must be exactly '| YYYY-MM-DDTHH:MMZ | ...' or '| YYYY-MM-DDTHH:MM:SSZ | ...' (no parenthetical, no extra prose)" >&2

# form (`...T23:04Z`) are valid ISO-8601 UTC; the schema
# in docs/hygiene-history/ticks/README.md does not pick a side.
# Capture the timestamp and verify it matches the path.
if [[ "$first_line" =~ ^\|\ ([0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}(:[0-9]{2})?Z)\ \|\  ]]; then

Copilot AI Apr 30, 2026

P0: The col1 regex is too strict about spacing after the timestamp. It currently requires two spaces after the | that closes col1 (...Z) \| ), but existing shard rows are formatted like | <ts> | <col2> | ... (single space). This will flag valid shards as violations. Relax the pattern to allow exactly one space (or any [[:space:]]+) after the col1 boundary, while still forbidding parentheticals/extra prose inside col1.

Suggested change
- if [[ "$first_line" =~ ^\|\ ([0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}(:[0-9]{2})?Z)\ \|\  ]]; then
+ if [[ "$first_line" =~ ^\|\ ([0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}(:[0-9]{2})?Z)\ \|[[:space:]]+ ]]; then

AceHack added a commit that referenced this pull request Apr 30, 2026
…hema.sh — pre-push compat

Default mode (full-tree) is unchanged — manual runs and full-
tree audits work as before.

New --files PATH... mode restricts the check to the listed
shard files. Shape that pre-push hooks and per-PR CI jobs
want, so they can run only on changed shards instead of
failing on the 5 known-stale shards documented in the script
header. Non-shard paths in the argument list are silently
skipped, so callers can pass a broader file list (e.g. all
changed files from `git diff --name-only`).

Refactor: extracted the per-shard validator into scan_one()
function; replaced `continue` with `return 1` for early-exit
semantics that work both inside the find loop and inside the
--files iteration.
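
The refactor described above might look roughly like this. It is a reduced sketch under stated assumptions, not the merged code: only `scan_one` is a name from the commit message, `check_files` is hypothetical, and the validator body is cut down to the col1 check alone.

```bash
#!/usr/bin/env bash
# Sketch of a per-shard validator: returns 0 for a clean shard, 1 otherwise.
scan_one() {
  local path="$1" first_line
  # First non-empty line of the file; an empty file counts as a failure.
  first_line="$(grep -m1 -v '^[[:space:]]*$' "$path")" || return 1
  local re='^\| [0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}(:[0-9]{2})?Z \| '
  if [[ ! "$first_line" =~ $re ]]; then
    echo "VIOLATION: $path" >&2
    return 1   # an inline loop body would have used `continue` here
  fi
  return 0
}

# Hypothetical driver for an explicit file list (the --files shape).
check_files() {
  local violations=0 f
  for f in "$@"; do
    [[ -f "$f" ]] || continue   # non-files are skipped without output
    scan_one "$f" || violations=$((violations + 1))
  done
  (( violations == 0 ))
}
```

Because `scan_one` reports through its return status, the same function can be called from a `find`-driven loop or from a plain argument loop without change.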

Composes with B-0114 sub-item 1 (pre-push lint hook) — the
--files mode is what that hook will invoke.

Stacks on #975 (the original DORMANT-mode landing).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
AceHack added a commit that referenced this pull request Apr 30, 2026
… pre-push compat (stacked on #975) (#977)

* tooling(hygiene): add --files argument to check-tick-history-shard-schema.sh — pre-push compat

* fix(check-tick-history-shard-schema): three thread fixes from PR #977 review

Three substantive findings from the chatgpt-codex-connector +
copilot-pull-request-reviewer review pass on PR #977:

1. P2 (Codex) — six-column row enforcement. The col1 regex
   only validated col1 + the col1-end pipe; rows with too few
   columns (e.g. `| <ts> | a |`) passed. Added explicit pipe-
   count check via awk that requires ≥7 pipes (= 6 columns +
   trailing pipe). Runs before the col1 regex; col1 check
   only fires if column count is right.

2. P2 (Codex) — HHMMSSZ filenames must require the hash
   suffix. The `(-[0-9a-f]+)?$` made the suffix optional,
   so `111122Z.md` would pass even though the schema documents
   `HHMMSSZ-<hash>.md` with hash REQUIRED for collision-
   avoidance under high-concurrency writes. Made the hash
   non-optional in the HHMMSS branch.

3. P2 (Copilot) — header contract said non-shard paths "silently
   skipped" but the code emitted `skipped (not a file): ...`
   to stderr. Removed the stderr message to match the contract.
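
The pipe-count check from item 1 can be illustrated as below. This is a minimal sketch assuming the count is taken over the whole first row; `has_six_columns` is a hypothetical name, and a literal escaped pipe inside a cell would also be counted, which the sketch does not handle.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed when a row has at least 7 pipe characters,
# that is, 6 columns plus the trailing pipe.
has_six_columns() {
  local row="$1" pipes
  # With '|' as the field separator, NF - 1 is the number of pipes.
  pipes="$(printf '%s\n' "$row" | awk -F'|' '{ print NF - 1 }')"
  (( pipes >= 7 ))
}
```

Running this before the col1 regex means a short row such as `| <ts> | a |` is rejected for its column count before its timestamp is examined.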

Two findings left as form-2 closures (false positives):

- P0 (Copilot) — case glob doesn't span YYYY/MM/DD depth.
  False: bash case patterns `*` matches `/`. Verified by
  running --files with a real 4-level path; matched
  correctly. Added a comment explaining the bash-case-glob
  semantics so future readers don't re-flag.

- P0 (Copilot) — col1 regex requires "two spaces after |".
  False: trailing whitespace before `]]` is regex delimiter,
  not part of the pattern. Verified the regex matches single-
  space shards (the actual on-main format).
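
Both closures can be reproduced in isolation. The sketch below uses hypothetical helper names (`is_shard_path`, `col1_ok`) and an assumed shard root of `docs/hygiene-history/ticks/`; the regex is the one quoted in the review excerpts.

```bash
#!/usr/bin/env bash
# 1. In a `case` pattern, `*` also matches `/`, so a single glob spans
#    the nested date directories.
is_shard_path() {
  case "$1" in
    docs/hygiene-history/ticks/*.md) return 0 ;;
    *) return 1 ;;
  esac
}

# 2. The pattern ends with `\ ` (one escaped space). The unescaped space
#    that follows only separates the pattern from `]]`.
col1_ok() {
  [[ "$1" =~ ^\|\ ([0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}(:[0-9]{2})?Z)\ \|\  ]]
}
```

A row with a single space after the col1 pipe satisfies `col1_ok`, and a path several directories below the shard root satisfies `is_shard_path`.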

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>