Skip to content

ci(backlog): index-integrity workflow — Phase 1c per BACKLOG ADR#492

Merged
AceHack merged 3 commits intomainfrom
ci/backlog-index-integrity-phase-1c
Apr 25, 2026
Merged

ci(backlog): index-integrity workflow — Phase 1c per BACKLOG ADR#492
AceHack merged 3 commits intomainfrom
ci/backlog-index-integrity-phase-1c

Conversation

@AceHack
Copy link
Copy Markdown
Member

@AceHack AceHack commented Apr 25, 2026

Summary

Phase 1c per the BACKLOG-per-row-file ADR (PR #474, just merged). The ADR named tools/backlog/lint-index.sh as the proposed shape; this PR lands the equivalent function as a CI workflow since there's no pre-commit-hook framework wired up in this repo — CI is the enforcement surface.

Pre-Phase-2 mode (current state)

docs/BACKLOG.md is still the monolithic authoritative file (~12,800 lines) and is hand-edited. The workflow detects this via the absence of the AUTO-GENERATED header line emitted by generate-index.sh, and only verifies per-row files are well-formed (parseable). Guards B-0001 + B-0002 without false-positives on the legacy file.

Phase 2+ mode (after bulk migration)

generate-index.sh --check runs, which exits non-zero on drift. Workflow becomes load-bearing at that point.

Test plan

  • Mirrors memory-index-integrity.yml safe patterns (SHA-pinned action, minimum permissions, concurrency group, no user-input)
  • Verified locally: head check correctly identifies pre-Phase-2 mode
  • generate-index.sh --stdout produces parseable output for current per-row files

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings April 25, 2026 14:49
@AceHack AceHack enabled auto-merge (squash) April 25, 2026 14:49
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 060d8d8206

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread .github/workflows/backlog-index-integrity.yml
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a dedicated CI workflow to enforce integrity between per-row backlog files (docs/backlog/**) and the generated backlog index (docs/BACKLOG.md), with a “Phase 1c / pre-Phase-2” fallback mode.

Changes:

  • Introduces .github/workflows/backlog-index-integrity.yml to run on PRs/pushes touching backlog tooling/content.
  • Implements a sentinel check to skip drift enforcement until docs/BACKLOG.md is migrated to the generated index format.
  • Runs tools/backlog/generate-index.sh --check once Phase 2+ mode is detected.

Comment thread .github/workflows/backlog-index-integrity.yml
Comment thread .github/workflows/backlog-index-integrity.yml Outdated
AceHack added a commit that referenced this pull request Apr 25, 2026
Three PR #492 review threads addressed:

1. **chatgpt-codex P1 + copilot P1** — sentinel hides
   missing docs/BACKLOG.md as pre-Phase-2 mode and exits 0,
   which would let an accidental delete of the authoritative
   backlog ship green. Added explicit existence preconditions
   that fail fast on missing docs/BACKLOG.md OR missing/
   non-executable tools/backlog/generate-index.sh.

2. **copilot P2** — pre-Phase-2 parseability proxy
   (`generate-index.sh --stdout > /dev/null`) is too weak.
   The generator is forgiving: a row with bad frontmatter
   produces an empty index line, not an error. Replaced with
   explicit awk-extraction of id/status/title for every
   B-*.md file and an empty-value check; any row missing all
   three required fields fails the workflow with a clear
   per-file message. Belt-and-suspenders: still runs
   generator end-to-end for structural issues the field-only
   check might miss.

Verified locally with bash:
- 2 per-row files enumerated (B-0001 + B-0002)
- All 3 required fields extracted cleanly from each
- bad_count=0
- generator runs cleanly

Net: workflow is now defensive-by-default. Missing files
surface as errors immediately; malformed per-row files
surface with file-by-file diagnostics.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c88c63a65a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread .github/workflows/backlog-index-integrity.yml Outdated
AceHack added 3 commits April 25, 2026 11:09
The BACKLOG-per-row-file ADR (PR #474, just merged) committed
to a Phase 1c lint that enforces docs/BACKLOG.md ↔ docs/backlog/
per-row parity. The ADR named tools/backlog/lint-index.sh as the
proposed shape, but there is no pre-commit-hook framework in
this repo — the CI surface is the equivalent enforcement point,
so this lands as a workflow rather than a separate wrapper
script.

Mirrors the structure of memory-index-integrity.yml: SHA-pinned
checkout, explicit minimum permissions, concurrency group, no
user-authored input touched.

**Pre-Phase-2 mode** (current state): docs/BACKLOG.md is still
the monolithic authoritative file (~12,800 lines) and is
hand-edited. The workflow detects this via the absence of the
"AUTO-GENERATED" header line emitted by generate-index.sh, and
in that mode only verifies the per-row files themselves are
well-formed (parseable by the generator). This guards the
existing per-row files (B-0001 example, B-0002 Otto-287 Noether)
without false-positive-flagging the legacy monolithic file.

**Phase 2+ mode** (after bulk migration): the workflow runs
generate-index.sh --check, which exits non-zero on any drift
between per-row files and the generated index. Becomes
load-bearing at that point.

Verified locally:
- head -5 docs/BACKLOG.md does NOT match AUTO-GENERATED →
  pre-Phase-2 path taken
- generate-index.sh --stdout produces parseable output
Three PR #492 review threads addressed:

1. **chatgpt-codex P1 + copilot P1** — sentinel hides
   missing docs/BACKLOG.md as pre-Phase-2 mode and exits 0,
   which would let an accidental delete of the authoritative
   backlog ship green. Added explicit existence preconditions
   that fail fast on missing docs/BACKLOG.md OR missing/
   non-executable tools/backlog/generate-index.sh.

2. **copilot P2** — pre-Phase-2 parseability proxy
   (`generate-index.sh --stdout > /dev/null`) is too weak.
   The generator is forgiving: a row with bad frontmatter
   produces an empty index line, not an error. Replaced with
   explicit awk-extraction of id/status/title for every
   B-*.md file and an empty-value check; any row missing all
   three required fields fails the workflow with a clear
   per-file message. Belt-and-suspenders: still runs
   generator end-to-end for structural issues the field-only
   check might miss.

Verified locally with bash:
- 2 per-row files enumerated (B-0001 + B-0002)
- All 3 required fields extracted cleanly from each
- bad_count=0
- generator runs cleanly

Net: workflow is now defensive-by-default. Missing files
surface as errors immediately; malformed per-row files
surface with file-by-file diagnostics.
Resolves chatgpt-codex P2 review on PR #492. The earlier
extraction (`awk '/^id:/'`, etc.) matched anywhere in the file,
so a malformed frontmatter could falsely pass if `id:` /
`status:` / `title:` happened to appear later in the body
(e.g., in a code block, an example, or a discussion of the
schema itself).

Now uses an explicit `extract_frontmatter_field` shell function
that mirrors `tools/backlog/generate-index.sh`'s `extract_field`
state machine: state=0 before the opening `---`, state=1 inside
the frontmatter, exit on the closing `---`. Field match only
fires when state==1, so body-text matches cannot mask missing
frontmatter.

Verified locally with bash:
- Both real per-row files (B-0001, B-0002) parse cleanly.
- A simulated row with `status: open` only in the body (NOT in
  frontmatter) correctly returns empty for status, triggering
  the bad_count failure path.

Same gate intent (id+status+title required), now with the hole
closed.
Copilot AI review requested due to automatic review settings April 25, 2026 15:09
@AceHack AceHack force-pushed the ci/backlog-index-integrity-phase-1c branch from c88c63a to d3fb04c Compare April 25, 2026 15:09
@AceHack AceHack merged commit 8ec4a54 into main Apr 25, 2026
16 checks passed
@AceHack AceHack deleted the ci/backlog-index-integrity-phase-1c branch April 25, 2026 15:11
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 3 comments.

Comment on lines +118 to +123
awk -v field="$field" '
BEGIN { state = 0 }
/^---$/ {
if (state == 0) { state = 1; next }
if (state == 1) { exit }
}
Copy link

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extract_frontmatter_field does not actually guarantee the match is confined to frontmatter if the closing --- delimiter is missing: state stays 1 to EOF, so id:/status:/title: in the body can be mistakenly accepted. If the goal is to reject malformed frontmatter, consider explicitly requiring a closing delimiter (and failing non-zero if it’s absent) before attempting to extract fields.

Copilot uses AI. Check for mistakes.
Comment on lines +136 to +140
id=$(extract_frontmatter_field "$row" "id")
status=$(extract_frontmatter_field "$row" "status")
title=$(extract_frontmatter_field "$row" "title")
if [ -z "$id" ] || [ -z "$status" ] || [ -z "$title" ]; then
echo " bad: $row (missing id/status/title in frontmatter)" >&2
Copy link

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The “non-empty values” validation can be bypassed by values that are only quotes or whitespace (e.g., title: "" or title: ): the awk extraction returns the raw remainder of the line and the -z check treats it as non-empty. To enforce the stated invariant, trim surrounding quotes and trailing whitespace (ideally matching tools/backlog/generate-index.sh’s extract_field behavior) before the emptiness check.

Copilot uses AI. Check for mistakes.
# case, only verify the per-row files themselves are
# well-formed (parseable frontmatter), and skip the
# index-equivalence check.
if ! head -5 docs/BACKLOG.md | grep -q "AUTO-GENERATED by tools/backlog/generate-index.sh"; then
Copy link

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sentinel detection uses grep -q with a regex pattern. Because generate-index.sh contains . (regex wildcard), this can match unintended strings and incorrectly treat a non-generated BACKLOG.md as generated (or vice versa). Use fixed-string matching (e.g., grep -Fq -- 'AUTO-GENERATED by tools/backlog/generate-index.sh') to make the phase detection unambiguous.

Suggested change
if ! head -5 docs/BACKLOG.md | grep -q "AUTO-GENERATED by tools/backlog/generate-index.sh"; then
if ! head -5 docs/BACKLOG.md | grep -Fq -- "AUTO-GENERATED by tools/backlog/generate-index.sh"; then

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d3fb04cbc4

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

# case, only verify the per-row files themselves are
# well-formed (parseable frontmatter), and skip the
# index-equivalence check.
if ! head -5 docs/BACKLOG.md | grep -q "AUTO-GENERATED by tools/backlog/generate-index.sh"; then
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Fail instead of downgrading when index marker disappears

The phase switch if ! head -5 ... grep "AUTO-GENERATED ..." treats a missing marker as pre-Phase-2 and exits success after the parseability path, so once Phase 2+ is active a PR can remove or move that header in docs/BACKLOG.md and bypass generate-index.sh --check entirely. In that scenario manual edits to the generated index can merge without drift detection, which defeats the workflow’s stated load-bearing role for Phase 2+.

Useful? React with 👍 / 👎.

echo " bad: $row (missing id/status/title in frontmatter)" >&2
bad_count=$((bad_count + 1))
fi
done < <(find docs/backlog -type f -name 'B-*.md' -print0)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Fail when per-row backlog tree cannot be scanned

The row loop reads from process substitution done < <(find docs/backlog ...), but under bash set -e a failing find here does not terminate the script; if docs/backlog is deleted/renamed, the loop runs zero iterations and the job still exits 0 after --stdout. That means this integrity check can report green even when the per-row source tree is missing, so add an explicit existence check (or otherwise enforce find success) before trusting row_count=0.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants