tooling(hygiene): check-tick-history-shard-schema.sh — catch col1 drift at write-time #975
```diff
@@ -0,0 +1,173 @@
+#!/usr/bin/env bash
+#
+# tools/hygiene/check-tick-history-shard-schema.sh — validates
+# that per-tick shard files under docs/hygiene-history/ticks/
+# match the schema declared in docs/hygiene-history/ticks/README.md.
+#
+# Why this exists (2026-04-30):
+#   The col1 schema (first cell = exactly an ISO-8601 UTC timestamp)
+#   was repeatedly violated across 15+ shards on main (April 28-30)
+#   plus 6+ open PRs. Each violation was caught at PR-review time
+#   by Copilot, weeks-to-days after the shard was written. The
+#   structural fix is a hygiene check that catches it at write
+#   time instead of relying on each tick author's vigilance.
+#
+#   This is the "structural fix beats process discipline" pattern
+#   per Otto-341 + the rediscoverable-from-main invariant on
+#   docs/AUTONOMOUS-LOOP.md (added in PR #969): the invariant
+#   requires schema uniformity on main; this check is the
+#   mechanism that preserves it.
+#
+# What this checks:
+#   1. Shard file exists at the canonical path
+#      docs/hygiene-history/ticks/YYYY/MM/DD/<HHMMZ>.md
+#      (or the extended HHMMSSZ-<hash>.md form per the schema's
+#      high-concurrency option).
+#   2. First non-empty line is a 6-column markdown table row
+#      starting with `| YYYY-MM-DDTHH:MM:SSZ |` — exactly the ISO
+#      timestamp, no parenthetical, no extra prose, no leading
+#      whitespace beyond the standard `| `.
+#   3. The timestamp inside col1 matches the filename's `HHMMZ`
+#      — i.e. a shard at `2026/04/30/2304Z.md` must carry a col1
+#      timestamp of `2026-04-30T23:04:??Z` (any second).
+#
+# What this does NOT do:
+#   - Does NOT validate body content (cols 4-6). The body is
+#     intentionally free-form prose.
+#   - Does NOT enforce that col2 = `<model id>` or col3 =
+#     `<cron sentinel>` strictly. The schema's lower columns
+#     have drifted in practice (col3 commonly carries a commit
+#     SHA instead of the cron sentinel); enforcing that would
+#     be its own clean-up effort.
+#   - Does NOT detect the prefab pattern (col1 timestamp
+#     significantly ahead of commit-author time). That requires
+#     git-log access which isn't available pre-push for the
+#     current commit. See
+#     `memory/feedback_tick_history_prefabricated_shards_codex_finding_audit_trail_integrity_2026_04_30.md`
+#     for the deferred check.
+#
+# Exit codes:
+#   0 — all shards valid
+#   1 — one or more violations found (details on stderr)
+#   2 — invocation error (script bug or missing inputs)
+#
+# Composes with:
+#   - tools/hygiene/check-tick-history-order.sh (the legacy
+#     monolithic-table order check; this is the per-shard
+#     analogue for the post-shard-transport surface)
+#   - docs/hygiene-history/ticks/README.md (the schema this
+#     check enforces)
+#   - .github/workflows/gate.yml (where this should be wired
+#     as a lint job; not yet done — that's a follow-up)
+#
+# Current-state note (2026-04-30):
+#   On the main branch as of this script's introduction, ~5
+#   historical shards from April 28 violate this check (col1
+#   contains a parenthetical after the timestamp). Those 5
+#   shards are also implicated in the prefab-shard finding
+#   filed at
+#   `memory/feedback_tick_history_prefabricated_shards_codex_finding_audit_trail_integrity_2026_04_30.md`
+#   — fixing col1 mechanically would launder the body-level
+#   prefab claim. The check is therefore landed in DORMANT
+#   mode (not yet wired into CI); a future cleanup PR resolves
+#   the prefab-vs-schema decision before the check goes
+#   binding.
+
+set -euo pipefail
+
+ROOT="${REPO_ROOT:-$(git rev-parse --show-toplevel 2>/dev/null || echo .)}"
+SHARD_DIR="$ROOT/docs/hygiene-history/ticks"
+
+if [ ! -d "$SHARD_DIR" ]; then
+  echo "error: $SHARD_DIR does not exist" >&2
+  exit 2
+fi
+
+violations=0
+total=0
+
+# Find every shard file (skip README.md and any schema/* docs).
+while IFS= read -r -d '' shard; do
+  total=$((total + 1))
+  base="$(basename "$shard" .md)"
+  path_rel="${shard#"$ROOT/"}"
+
+  # Extract YYYY/MM/DD from path components.
+  parts="${path_rel#docs/hygiene-history/ticks/}"
+  yyyy="${parts%%/*}"
+  rest_a="${parts#*/}"
+  mm="${rest_a%%/*}"
+  rest_b="${rest_a#*/}"
+  dd="${rest_b%%/*}"
+
+  # Pull the HHMM from the filename (handle both HHMMZ and
+  # HHMMSSZ-<hash> forms).
+  if [[ "$base" =~ ^([0-9]{4})Z(-[0-9a-f]+)?$ ]]; then
+    hhmm="${BASH_REMATCH[1]}"
+  elif [[ "$base" =~ ^([0-9]{4})([0-9]{2})Z(-[0-9a-f]+)?$ ]]; then
+    hhmm="${BASH_REMATCH[1]}"
+  else
+    echo "VIOLATION: $path_rel — filename does not match HHMMZ.md or HHMMSSZ-<hash>.md schema" >&2
```
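The excerpt stops before the step that compares the col1 timestamp with the shard path (check 3 in the header comment). The following is a minimal sketch of that comparison, assuming the path has already been validated as `YYYY/MM/DD/<name>.md`. The names `expected_col1_prefix` and `timestamp_matches_path` are hypothetical and do not come from the PR.

```bash
#!/usr/bin/env bash
# Hypothetical helpers, not taken from the PR.

# Prints the minute-resolution prefix that col1 should start with.
# $1: shard path relative to the ticks directory, e.g. 2026/04/30/2304Z.md
expected_col1_prefix() {
  local rel="$1"
  local yyyy mm dd base
  # Split the relative path on '/' into date components and filename.
  IFS=/ read -r yyyy mm dd base <<<"$rel"
  base="${base%.md}"
  # The first four characters are HHMM for both the HHMMZ and the
  # HHMMSSZ-<hash> filename forms.
  local hhmm="${base:0:4}"
  printf '%s-%s-%sT%s:%s\n' "$yyyy" "$mm" "$dd" "${hhmm:0:2}" "${hhmm:2:2}"
}

# Returns 0 when the timestamp begins with the prefix derived from the path,
# so any seconds value (or none) is accepted.
timestamp_matches_path() {
  local ts="$1" rel="$2"
  local prefix
  prefix="$(expected_col1_prefix "$rel")"
  [[ "$ts" == "$prefix"* ]]
}
```

With these helpers, `timestamp_matches_path 2026-04-30T23:04:59Z 2026/04/30/2304Z.md` succeeds, while a timestamp one minute later fails.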
Comment on lines +103 to +110

```diff
-  # Pull the HHMM from the filename (handle both HHMMZ and
-  # HHMMSSZ-<hash> forms).
-  if [[ "$base" =~ ^([0-9]{4})Z(-[0-9a-f]+)?$ ]]; then
-    hhmm="${BASH_REMATCH[1]}"
-  elif [[ "$base" =~ ^([0-9]{4})([0-9]{2})Z(-[0-9a-f]+)?$ ]]; then
-    hhmm="${BASH_REMATCH[1]}"
-  else
-    echo "VIOLATION: $path_rel — filename does not match HHMMZ.md or HHMMSSZ-<hash>.md schema" >&2
+  # Pull the HHMM from the filename (handle HHMMZ, HHMMZ-01,
+  # and HHMMSSZ-<hash> forms).
+  if [[ "$base" =~ ^([0-9]{4})Z(-[0-9]{2})?$ ]]; then
+    hhmm="${BASH_REMATCH[1]}"
+  elif [[ "$base" =~ ^([0-9]{4})([0-9]{2})Z-[0-9a-f]+$ ]]; then
+    hhmm="${BASH_REMATCH[1]}"
+  else
+    echo "VIOLATION: $path_rel — filename does not match HHMMZ.md, HHMMZ-NN.md, or HHMMSSZ-<hash>.md schema" >&2
```
Copilot
AI
Apr 30, 2026
P0: The shard schema requires a 6-column row (and comments say this tool checks a “6-column markdown table row”), but the implementation only validates the col1 timestamp prefix and never enforces column count / trailing final |. As a result, malformed shard rows can pass this check while still violating the declared schema.
Enforce full six-column shard row schema
This regex only validates that the row starts with a timestamp in col1, but it does not verify that the rest of the required table structure exists. As a result, malformed rows such as | 2026-04-30T23:20Z | only-one-column | pass validation, so column-loss/schema-drift bugs can slip through despite this script being the write-time schema guard.
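One way the row-shape check could be sketched is shown below. It assumes that cells never contain a literal `|`, which the free-form body columns may not guarantee, so treat it as an illustration and not a drop-in fix. The function name `is_six_column_row` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of the PR.
# Assumption: no cell contains an unescaped '|' character.

# Returns 0 when the line looks like a six-cell markdown table row.
is_six_column_row() {
  local line="$1"
  # The row must start with a pipe and end with one
  # (trailing whitespace is tolerated).
  [[ "$line" =~ ^\|.*\|[[:space:]]*$ ]] || return 1
  # Strip every character that is not a pipe; six cells are
  # delimited by exactly seven pipes.
  local pipes="${line//[^|]/}"
  [ "${#pipes}" -eq 7 ]
}
```

Under that assumption, the malformed row quoted in the comment (`| 2026-04-30T23:20Z | only-one-column |`) has only three pipes and is rejected.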
Copilot
AI
Apr 30, 2026
P0: The col1 regex is too strict about spacing after the timestamp. It currently requires two spaces after the | that closes col1 (...Z) \| ), but existing shard rows are formatted like | <ts> | <col2> | ... (single space). This will flag valid shards as violations. Relax the pattern to allow exactly one space (or any [[:space:]]+) after the col1 boundary, while still forbidding parentheticals/extra prose inside col1.
```diff
-  if [[ "$first_line" =~ ^\|\ ([0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}(:[0-9]{2})?Z)\ \|\ ]]; then
+  if [[ "$first_line" =~ ^\|\ ([0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}(:[0-9]{2})?Z)\ \|[[:space:]]+ ]]; then
```
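To see how the relaxed pattern behaves on sample rows, here is a small sketch. It keeps the regex in a variable, a common bash idiom that avoids backslash-escaping spaces inside `[[ ... ]]`. The variable `col1_re` and the function `col1_timestamp` are hypothetical names, and the sample rows are made up for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper illustrating the relaxed col1 pattern from the
# suggestion above; not code from the PR.
col1_re='^\| ([0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}(:[0-9]{2})?Z) \|[[:space:]]+'

# Prints the col1 timestamp and returns 0 when the row matches the
# pattern; prints nothing and returns 1 otherwise.
col1_timestamp() {
  local first_line="$1"
  # The regex is expanded unquoted so bash treats it as a pattern,
  # not as a literal string.
  [[ "$first_line" =~ $col1_re ]] || return 1
  printf '%s\n' "${BASH_REMATCH[1]}"
}
```

A row whose first cell carries a parenthetical, such as `| 2026-04-30T23:04:17Z (late) | ...`, does not match because the pattern requires ` |` immediately after the `Z`.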
Copilot
AI
Apr 30, 2026
P2: The violation message hard-codes the with-seconds example (...THH:MM:SSZ) even though the matcher accepts both ...THH:MMZ and ...THH:MM:SSZ. Updating the message to reflect the accepted forms will reduce confusion when this trips.
```diff
-  echo "VIOLATION: $path_rel — col1 must be exactly '| YYYY-MM-DDTHH:MM:SSZ | ...' (no parenthetical, no extra prose)" >&2
+  echo "VIOLATION: $path_rel — col1 must be exactly '| YYYY-MM-DDTHH:MMZ | ...' or '| YYYY-MM-DDTHH:MM:SSZ | ...' (no parenthetical, no extra prose)" >&2
```
Copilot
AI
Apr 30, 2026
P1: Comment says the shard scan “skip README.md and any schema/* docs”, but the find only excludes README.md. Either implement the additional exclusion(s) or update the comment so the scan behavior is accurate.
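If the exclusion were implemented instead of rewording the comment, the scan might look like the sketch below. The `find` invocation from the PR is not visible in this excerpt, so this is not a patch against it. The function name `list_shards` is hypothetical, and the `schema/` directory is an assumption taken from the script comment.

```bash
#!/usr/bin/env bash
# Hypothetical sketch, not code from the PR.
# Assumption: schema documents live in a schema/ directory directly
# under the shard directory.

# Prints NUL-delimited paths of every *.md shard under $1, skipping
# README.md files and everything beneath $1/schema.
list_shards() {
  local shard_dir="$1"
  find "$shard_dir" \
    -path "$shard_dir/schema" -prune -o \
    -type f -name '*.md' ! -name 'README.md' -print0
}
```

The output can feed the same NUL-delimited loop the script already uses, for example `while IFS= read -r -d '' shard; do ...; done < <(list_shards "$SHARD_DIR")`.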
The `HHMMSS` filename branch makes the `-<hash>` part optional, so files like `232045Z.md` are treated as valid even though the documented schema allows only `HHMMZ.md` or `HHMMSSZ-<short-content-hash>.md`. In high-concurrency use, this accepts second-resolution names without the collision discriminator the schema relies on, which weakens the uniqueness guarantee this check is supposed to enforce.
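A sketch of a matcher that requires the hash on the second-resolution form follows. The patterns mirror the ones in the suggested change on lines +103 to +110, including its `HHMMZ-NN` form, which comes from that suggestion and not from a schema visible here. The function name `shard_hhmm` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of the PR.

# Prints the HHMM portion of a shard basename (without .md) and
# returns 0; returns 1 when the name fits none of the accepted forms.
shard_hhmm() {
  local base="$1"
  if [[ "$base" =~ ^([0-9]{4})Z(-[0-9]{2})?$ ]]; then
    # HHMMZ or HHMMZ-NN
    printf '%s\n' "${BASH_REMATCH[1]}"
  elif [[ "$base" =~ ^([0-9]{4})[0-9]{2}Z-[0-9a-f]+$ ]]; then
    # HHMMSSZ-<hash>: the hash suffix is mandatory here.
    printf '%s\n' "${BASH_REMATCH[1]}"
  else
    return 1
  fi
}
```

With this shape, `shard_hhmm 232045Z` fails because the name lacks the `-<hash>` discriminator, while `shard_hhmm 232045Z-a1b2c3` prints `2320`.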