Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
164 changes: 164 additions & 0 deletions .github/workflows/backlog-index-integrity.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,164 @@
name: backlog-index-integrity

# Enforces that `docs/BACKLOG.md` (the generated index) stays in
# sync with the per-row files under `docs/backlog/P[0-3]/B-<NNNN>-*.md`.
# When a row file is added/modified/removed, the index must be
# regenerated via `tools/backlog/generate-index.sh` in the same
# commit (or PR) so a fresh reader sees the same row set in both
# places.
#
# Phase 1c per the BACKLOG-per-row-file ADR
# (docs/DECISIONS/2026-04-22-backlog-per-row-file-restructure.md
# §"Existing substrate Phase 1a prior work" → Phase 1c OWED).
# This workflow IS the lint-index gate; it wraps
# `generate-index.sh --check` rather than introducing a separate
# `tools/backlog/lint-index.sh` wrapper, since there is no
# pre-commit-hook framework currently wired up in this repo —
# the CI surface is the equivalent enforcement point.
#
# Note: until Phase 2 bulk-migration runs, `docs/BACKLOG.md` is
# still the monolithic authoritative file. Phase 2 will overwrite
# it with the generated index. This workflow becomes load-bearing
# at that point. Until then it primarily guards the per-row
# files themselves are well-formed (B-0001 example, B-0002
# Otto-287 Noether) and skips the equivalence check.
#
# Safe-pattern compliance (mirrors memory-index-integrity.yml):
# - SHA-pinned action versions (actions/checkout@de0fac2...)
# - Explicit `permissions:` minimum
# - Only first-party trusted context (github.sha) — no
# user-authored text is referenced.
# - Concurrency group + cancel-in-progress: false.
# - runs-on: ubuntu-24.04 pinned.

on:
pull_request:
paths:
- "docs/backlog/**"
- "docs/BACKLOG.md"
- "tools/backlog/**"
push:
branches: [main]
paths:
- "docs/backlog/**"
- "docs/BACKLOG.md"
- "tools/backlog/**"
workflow_dispatch: {}

permissions:
contents: read

concurrency:
group: backlog-index-integrity-${{ github.ref }}
cancel-in-progress: false

jobs:
check:
name: check docs/BACKLOG.md generated-index drift
runs-on: ubuntu-24.04
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 1

- name: verify per-row + index parity
shell: bash
run: |
set -euo pipefail

# Existence preconditions — fail fast (chatgpt-codex P1 +
# copilot P1 review on PR #492). A missing docs/BACKLOG.md
# or missing generator must NOT silently fall into
# pre-Phase-2 mode and exit 0; that would let an accidental
# delete of the authoritative backlog ship green.
if [ ! -f docs/BACKLOG.md ]; then
echo "ERROR: docs/BACKLOG.md is missing." >&2
echo "This file is the authoritative backlog (pre-Phase-2)" >&2
echo "or the generated index (Phase 2+). Either way it" >&2
echo "must exist. Restore it or revert the deletion." >&2
exit 1
fi
if [ ! -x tools/backlog/generate-index.sh ]; then
echo "ERROR: tools/backlog/generate-index.sh missing or" >&2
echo "non-executable. Phase 1a tooling is required." >&2
exit 1
fi

# Pre-Phase-2 sentinel: if docs/BACKLOG.md still looks
# monolithic (lacks the "AUTO-GENERATED" header line
# emitted by generate-index.sh), the drift check would
# fire false-positives on every per-row edit since the
# legacy file isn't yet the generated output. In that
# case, only verify the per-row files themselves are
# well-formed (parseable frontmatter), and skip the
# index-equivalence check.
if ! head -5 docs/BACKLOG.md | grep -q "AUTO-GENERATED by tools/backlog/generate-index.sh"; then
Comment thread
AceHack marked this conversation as resolved.
Copy link

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sentinel detection uses grep -q with a regex pattern. Because generate-index.sh contains . (regex wildcard), this can match unintended strings and incorrectly treat a non-generated BACKLOG.md as generated (or vice versa). Use fixed-string matching (e.g., grep -Fq -- 'AUTO-GENERATED by tools/backlog/generate-index.sh') to make the phase detection unambiguous.

Suggested change
if ! head -5 docs/BACKLOG.md | grep -q "AUTO-GENERATED by tools/backlog/generate-index.sh"; then
if ! head -5 docs/BACKLOG.md | grep -Fq -- "AUTO-GENERATED by tools/backlog/generate-index.sh"; then

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Fail instead of downgrading when index marker disappears

The phase switch if ! head -5 ... grep "AUTO-GENERATED ..." treats a missing marker as pre-Phase-2 and exits success after the parseability path, so once Phase 2+ is active a PR can remove or move that header in docs/BACKLOG.md and bypass generate-index.sh --check entirely. In that scenario manual edits to the generated index can merge without drift detection, which defeats the workflow’s stated load-bearing role for Phase 2+.

Useful? React with 👍 / 👎.

echo "Phase pre-2 mode: docs/BACKLOG.md is the monolithic" >&2
echo "authoritative file; skipping generated-index drift" >&2
echo "check. Verifying per-row files instead." >&2

# Per-row file parseability check (copilot P2 review on
# PR #492). The generator is forgiving — a row file with
# bad frontmatter would silently produce empty index lines,
# not error out. So we explicitly require id+status+title
# extraction to succeed for every row file, with non-empty
# values, before declaring "parseable".
#
# Frontmatter-scoped extraction (chatgpt-codex P2 follow-up
# review): the awk match is restricted to lines BETWEEN the
# opening and closing `---` markers, mirroring
# tools/backlog/generate-index.sh's extract_field state
# machine. Without this scope, a row with malformed
# frontmatter could falsely pass if the body happened to
# contain `id:` / `status:` / `title:` text (e.g., in a
# code block or example). The frontmatter-scoped match
# closes that hole.
extract_frontmatter_field() {
local file="$1" field="$2"
awk -v field="$field" '
BEGIN { state = 0 }
/^---$/ {
if (state == 0) { state = 1; next }
if (state == 1) { exit }
}
Comment on lines +118 to +123
Copy link

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extract_frontmatter_field does not actually guarantee the match is confined to frontmatter if the closing --- delimiter is missing: state stays 1 to EOF, so id:/status:/title: in the body can be mistakenly accepted. If the goal is to reject malformed frontmatter, consider explicitly requiring a closing delimiter (and failing non-zero if it’s absent) before attempting to extract fields.

Copilot uses AI. Check for mistakes.
state == 1 && $0 ~ "^"field":[[:space:]]+" {
sub("^"field":[[:space:]]+", "")
print
exit
}
' "$file"
}

row_count=0
bad_count=0
while IFS= read -r -d '' row; do
row_count=$((row_count + 1))
id=$(extract_frontmatter_field "$row" "id")
status=$(extract_frontmatter_field "$row" "status")
title=$(extract_frontmatter_field "$row" "title")
if [ -z "$id" ] || [ -z "$status" ] || [ -z "$title" ]; then
echo " bad: $row (missing id/status/title in frontmatter)" >&2
Comment on lines +136 to +140
Copy link

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The “non-empty values” validation can be bypassed by values that are only quotes or whitespace (e.g., title: "" or title: ): the awk extraction returns the raw remainder of the line and the -z check treats it as non-empty. To enforce the stated invariant, trim surrounding quotes and trailing whitespace (ideally matching tools/backlog/generate-index.sh’s extract_field behavior) before the emptiness check.

Copilot uses AI. Check for mistakes.
bad_count=$((bad_count + 1))
fi
done < <(find docs/backlog -type f -name 'B-*.md' -print0)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Fail when per-row backlog tree cannot be scanned

The row loop reads from process substitution done < <(find docs/backlog ...), but under bash set -e a failing find here does not terminate the script; if docs/backlog is deleted/renamed, the loop runs zero iterations and the job still exits 0 after --stdout. That means this integrity check can report green even when the per-row source tree is missing, so add an explicit existence check (or otherwise enforce find success) before trusting row_count=0.

Useful? React with 👍 / 👎.


echo " per-row files: $row_count total, $bad_count malformed" >&2

if [ "$bad_count" -gt 0 ]; then
echo "ERROR: $bad_count per-row file(s) have malformed" >&2
echo "frontmatter (missing id/status/title). Each row" >&2
echo "must have all three fields per Otto-181 schema." >&2
exit 1
fi

# Belt-and-suspenders: also exercise the generator end-to-end
# to catch any structural issues the field-only check missed.
./tools/backlog/generate-index.sh --stdout > /dev/null

echo "per-row files parseable + generator runs cleanly" >&2
exit 0
Comment thread
AceHack marked this conversation as resolved.
fi

# Phase 2+ mode: docs/BACKLOG.md is the generated index;
# any drift between it and the per-row tree is a violation.
./tools/backlog/generate-index.sh --check
Loading