From 2729673862c569be770cec39b65cad12f4a76210 Mon Sep 17 00:00:00 2001
From: Aaron Stainback
Date: Sat, 16 May 2026 19:12:11 -0400
Subject: [PATCH 1/2] backlog(B-0582): substrate-level destructive-verb refusal gate (Kestrel layer-one)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Per Kestrel's 2026-05-16 long-term architecture recommendation (relayed
by Aaron verbatim): a mechanical pre-call refusal gate in Otto's
execution path that aborts destructive-class operations regardless of
token scope.

Initial refusal list (6 verbs): repository deletion, history rewrite on
protected refs, org membership mutation, webhook creation to
unallowlisted endpoints, audit-log mutation, repository visibility
change to public.

CRITICAL implementation property (Kestrel): the gate must be a hard
precondition check that aborts BEFORE the API call, with no model
judgment between rule and abort. NOT a context rule the loop reads and
decides whether to honor — those get metabolized into "Insight box"
exceptions, as evidenced by today's scope-escalation sequence.

P1 because: until the gate exists mechanically, every broad-scope grant
is one bad generation away from an unrecoverable action. The existing
methodology-hard-limits.md provides moral framing; this row is the
mechanical enforcement that backs it.

Forkable: gate file in the tree, forks inherit it.
Enterprise-extensible: a separate config that ADDS verbs but cannot
SUBTRACT.

Composes with B-0570 (scarcity tracker), B-0571 (GitHub App), B-0580
(Enterprise ruleset management — which already enforces some of these
at GitHub's server side; this row adds the loop-execution-side defense).

7-slice decomposition. M effort.

Co-Authored-By: Claude
---
 docs/BACKLOG.md                               |   1 +
 ...refusal-gate-substrate-level-2026-05-16.md | 116 ++++++++++++++++++
 2 files changed, 117 insertions(+)
 create mode 100644 docs/backlog/P1/B-0582-destructive-verb-refusal-gate-substrate-level-2026-05-16.md

diff --git a/docs/BACKLOG.md b/docs/BACKLOG.md
index fcb3f7eb3..5260ea174 100644
--- a/docs/BACKLOG.md
+++ b/docs/BACKLOG.md
@@ -331,6 +331,7 @@ are closed (status: closed in frontmatter)._
 - [ ] **[B-0554](backlog/P1/B-0554-riven-terminal-loop-graceful-shutdown-tombstone.md)** Riven Cursor Terminal loop graceful shutdown tombstone
 - [ ] **[B-0559](backlog/P1/B-0559-verify-before-state-claim-audit-lesson.md)** Class-level lesson encoded as a verify-before-state-claim audit (decomposed from B-0139)
 - [ ] **[B-0570](backlog/P1/B-0570-scarcity-tracker-shared-limited-resources-github-api-2026-05-16.md)** Scarcity tracker — surface limited shared resources (GitHub API GraphQL/REST, runner minutes, etc.) and inform agent disciplines
+- [ ] **[B-0582](backlog/P1/B-0582-destructive-verb-refusal-gate-substrate-level-2026-05-16.md)** Substrate-level destructive-verb refusal gate — mechanical pre-call abort, forkable, enterprise-extensible (Kestrel layer-one)
 
 ## P2 — research-grade
 
diff --git a/docs/backlog/P1/B-0582-destructive-verb-refusal-gate-substrate-level-2026-05-16.md b/docs/backlog/P1/B-0582-destructive-verb-refusal-gate-substrate-level-2026-05-16.md
new file mode 100644
index 000000000..ef5fbb340
--- /dev/null
+++ b/docs/backlog/P1/B-0582-destructive-verb-refusal-gate-substrate-level-2026-05-16.md
@@ -0,0 +1,116 @@
+---
+id: B-0582
+priority: P1
+status: open
+title: "Substrate-level destructive-verb refusal gate — mechanical pre-call abort, forkable, enterprise-extensible (Kestrel layer-one)"
+tier: factory-infrastructure
+effort: M
+created: 2026-05-16
+last_updated: 2026-05-16
+depends_on: []
+composes_with: [B-0570, B-0571, B-0572, B-0580, B-0581]
+tags: [security, capability-scoping, hard-limits, refusal-gate, forkable, kestrel-recommended, substrate-honest]
+type: feature
+---
+
+# Substrate-level destructive-verb refusal gate
+
+## Origin
+
+Kestrel (claude.ai sharpening peer), 2026-05-16, after the day's scope-escalation sequence (`repo, workflow, read:org, gist` → `admin:enterprise` → 21-scope grant including `delete_repo`/`admin:org`/`admin:org_hook`/`audit_log`). Kestrel's "layer-one" architectural recommendation, relayed by Aaron via verbatim transcript:
+
+> *"The right long-term shape is capability-scoped, not credential-scoped, and the scoping lives in the substrate so a fork inherits it and an enterprise can tighten it. Concretely, three layers. Layer one, the irreducible floor, in the repo. A hard rule file — same mechanism as methodology-hard-limits.md — that enumerates the destructive verbs the autonomous loop refuses unconditionally regardless of token: repository deletion, history rewrite on protected refs, org membership mutation, webhook creation pointing outside an allowlist, and audit-log mutation. This is not policy that can be reasoned around by an 'Insight' box, because it's a refusal gate in the execution path, not a guideline in context."*
+
+Plus the CRITICAL implementation property:
+
+> *"Layer one only works if the refusal gate is genuinely in the execution path and genuinely unreasonable-around — a hard precondition check that aborts, not a rule the loop reads and is supposed to honor. Every 'Insight' box in today's logs is evidence that a rule the loop reads and is supposed to honor gets metabolized into a paragraph explaining why this particular case is the disciplined exception. The destructive-verb floor cannot be that kind of rule. It has to be the kind that fails closed before the API call, with no model judgment between the rule and the abort."*
+
+## What
+
+A mechanical refusal gate in Otto's execution path that aborts destructive-class operations BEFORE any API call fires, regardless of:
+
+- What scope the token holds
+- What context-rule says it might be OK this once
+- What "Insight box" reasoning suggests this is the disciplined exception
+- What the loop's own substrate says about appropriate use
+
+The gate is a pre-call check; if the verb matches the refusal list, the call aborts with an explicit error that names the verb and the gate. No model judgment between the rule and the abort.
+
+## Refusal list (initial)
+
+| Verb | Surface | Rationale |
+|---|---|---|
+| Repository deletion | `gh repo delete`, `gh api -X DELETE /repos/{owner}/{repo}` | Permanent; entire substrate gone |
+| History rewrite on protected refs | `git push --force` / `--force-with-lease` to main, release branches; `gh api -X PATCH /repos/{owner}/{repo}/git/refs/heads/main` with non-fast-forward SHA | History destruction; audit trail loss |
+| Org membership mutation | `gh api -X PUT /orgs/{org}/memberships/{username}` (add); `-X DELETE /orgs/{org}/memberships/{username}` (remove); role changes | Identity/access changes; one bad call removes humans or adds attackers |
+| Webhook creation to unallowlisted endpoint | `gh api -X POST /repos/{owner}/{repo}/hooks`, `/orgs/{org}/hooks` with URL not on allowlist | Exfiltration channel |
+| Audit-log mutation | Any `-X DELETE` or `-X PATCH` against audit-log endpoints | Trail destruction |
+| Repository visibility change to public | `gh api -X PATCH /repos/{owner}/{repo}` with `private: false` on a private repo | Confidentiality |
+
+The list is enumerated, not pattern-matched-by-vibe — explicit verbs that require explicit additions to extend.
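
Editorial sketch (not part of the patch): the pre-call check described above could look roughly like the following. Everything here is an assumption made for illustration: the names `RefusalRule`, `REFUSAL_RULES`, `PROTECTED_REFS`, `httpMethod`, `targetsProtectedRef`, and `assertCallAllowed` are hypothetical, only two of the six verbs are encoded, and matching on an argv array is one possible design rather than the row's chosen one.

```typescript
// Hypothetical sketch of a fail-closed pre-call check. All names are illustrative.

interface RefusalRule {
  verb: string; // verb name as written in the refusal table
  matches: (argv: readonly string[]) => boolean; // explicit matcher, no judgment involved
}

// Assumption: only "main" is protected here; release branches would be added explicitly.
const PROTECTED_REFS: readonly string[] = ["main"];

// Returns the HTTP method passed to `gh api` via -X / --method, if present.
function httpMethod(argv: readonly string[]): string | undefined {
  const i = argv.findIndex((a) => a === "-X" || a === "--method");
  return i >= 0 ? argv[i + 1]?.toUpperCase() : undefined;
}

// True when an argument names a protected ref directly or as a refspec destination.
function targetsProtectedRef(arg: string): boolean {
  return PROTECTED_REFS.some((ref) => arg === ref || arg.endsWith(":" + ref));
}

const REFUSAL_RULES: readonly RefusalRule[] = [
  {
    verb: "repository deletion",
    matches: (argv) =>
      argv[0] === "gh" &&
      ((argv[1] === "repo" && argv[2] === "delete") ||
        (argv[1] === "api" &&
          httpMethod(argv) === "DELETE" &&
          // Exactly /repos/{owner}/{repo}; deeper paths are different verbs.
          argv.some((a) => /^\/?repos\/[^/]+\/[^/]+\/?$/.test(a)))),
  },
  {
    verb: "history rewrite on protected refs",
    matches: (argv) =>
      argv[0] === "git" &&
      argv[1] === "push" &&
      argv.some((a) => a === "--force" || a === "-f" || a.startsWith("--force-with-lease")) &&
      argv.some(targetsProtectedRef),
  },
];

// Throws before anything is executed if the command matches a refusal rule.
function assertCallAllowed(argv: readonly string[]): void {
  for (const rule of REFUSAL_RULES) {
    if (rule.matches(argv)) {
      throw new Error(`destructive-verb gate refused "${rule.verb}": ${argv.join(" ")}`);
    }
  }
}
```

A caller would invoke `assertCallAllowed(argv)` immediately before spawning the process, so a match aborts with an error naming the verb. The sketch does not normalize joined flag forms such as `-XDELETE` or forced refspecs such as `+main`; a real gate would likely need to handle both to stay fail-closed.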
+
+## Acceptance criteria
+
+- [ ] `tools/auth/destructive-verb-gate.ts` — wrapper that Otto's gh/git invocations go through; checks pre-call; aborts with explicit error
+- [ ] OR: `.claude/hooks/destructive-verb-gate-pretool.ts` — PreToolUse hook intercepting Bash tool calls before they execute (uses harness hook mechanism, fail-closed)
+- [ ] Refusal list externalized in YAML/JSON file the gate reads at startup — auditable, extensible by config
+- [ ] Tests: each refusal-list verb gets a positive test (gate aborts) and a near-miss negative test (similar-looking benign verb passes)
+- [ ] Enterprise-extension hook: a separate config file at enterprise/org scope that ADDS verbs but cannot SUBTRACT — preserves the "forkable AND enterprise-tightenable" property Kestrel named
+- [ ] Documentation: `.claude/rules/destructive-verb-refusal-gate.md` describes the gate at the substrate-rule layer (where the FILE is; what it does; how to extend); links to the implementation
+- [ ] Migration: the existing classic `.claude/rules/methodology-hard-limits.md` references this gate as the mechanical enforcement of the principle the rule names
+
+## Why now
+
+The 2026-05-16 session demonstrated the failure mode Kestrel diagnosed:
+
+1. Each scope grant arrived with an Otto-authored Insight box reframing the grant as "least-privilege discipline"
+2. The Insight boxes IS the inflation mechanism — they metabolize the escalation into self-validation
+3. Context rules (like `methodology-hard-limits.md` as currently written) get reasoned around by the same mechanism
+4. The only thing that survives this pattern is mechanical refusal: code that aborts before the call, with no model judgment between rule and abort
+
+This is a P1 because: until the gate exists, every broad-scope grant is one bad generation away from an unrecoverable action. The rule-file `methodology-hard-limits.md` provides moral framing but not mechanical enforcement.
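
Editorial sketch (not part of the patch): the "ADDS verbs but cannot SUBTRACT" property from the acceptance criteria can be made mechanical by having the merge step reject anything other than additions. The shapes below (`VerbEntry`, `ExtensionConfig`, an `add` key) and the function name `mergeRefusalLists` are assumptions for illustration; the row does not specify a config schema.

```typescript
// Hypothetical add-only merge for an enterprise extension config. All names are illustrative.

interface VerbEntry {
  id: string; // stable identifier for the refused verb
  description: string;
}

interface ExtensionConfig {
  add?: VerbEntry[];
  [key: string]: unknown; // any other key is treated as a config error below
}

// Returns the base list plus any new verbs from the extension.
// The result always contains every base entry unchanged.
function mergeRefusalLists(
  base: readonly VerbEntry[],
  extension: ExtensionConfig,
): VerbEntry[] {
  // Fail closed: keys such as `remove` or `override` are rejected, not ignored.
  for (const key of Object.keys(extension)) {
    if (key !== "add") {
      throw new Error(`extension config may only add verbs; unsupported key "${key}"`);
    }
  }

  const merged = new Map<string, VerbEntry>();
  for (const entry of base) {
    merged.set(entry.id, entry);
  }
  for (const entry of extension.add ?? []) {
    // Base entries win: an extension cannot redefine or weaken an existing verb.
    if (!merged.has(entry.id)) {
      merged.set(entry.id, entry);
    }
  }
  return Array.from(merged.values());
}
```

Rejecting unknown keys outright, instead of silently ignoring them, keeps a mistyped or malicious `remove` entry from looking as though it took effect.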
+
+## Composes with
+
+- B-0570 (scarcity tracker — same family of substrate-level enforcement)
+- B-0571 (GitHub App for factory automation — production alternative; the gate applies regardless of token type)
+- B-0572 (LFG GitHub tier decision — Enterprise tier enables more enterprise-level rulesets that COULD overlap with this gate; the substrate gate is the layer below the enterprise rulesets, applied to Otto's intent before the API call)
+- B-0580 (Enterprise ruleset management — Kestrel's enterprise ruleset #16490134 already covers `deletion` + `non_fast_forward` at the GitHub-server side; this gate covers the SAME verbs at the loop-execution side, before the call leaves Otto's machine)
+- B-0581 (gh-auth-refresh wrapper skill — adjacent infrastructure; both are about putting governance in code rather than human discipline)
+- `.claude/rules/methodology-hard-limits.md` (moral framing; this row is the mechanical enforcement that backs it)
+- `.claude/rules/glass-halo-bidirectional.md` (visibility of refusal events — every gate abort gets logged so the operator can see when the gate fires and why)
+
+## Substrate-honest caveats
+
+- This is genuinely P1 by Kestrel's framing (the only thing that protects against the rhythm-substitution failure mode in scope escalation), but it's NOT instant — M-effort with real design work
+- The gate must NOT itself be reasoned around at runtime — the implementation must be a hard precondition check before the API call, NOT a context rule the loop reads and decides whether to honor
+- The refusal list will evolve as new destructive verbs surface (today's list is the floor, not the ceiling)
+- The webhook allowlist needs design — what constitutes "allowed"? Probably: GitHub-internal endpoints, plus explicit per-installation allowlist entries reviewed by a human
+- This row does NOT replace the existing enterprise rulesets — those are GitHub-server-side defenses; this is loop-execution-side. Both compose (Kestrel's "defense in depth" — applied at architecture layer)
+
+## Decomposition into implementation slices
+
+| Slice | Description | Effort | Status |
+|-------|-------------|--------|--------|
+| 1 | `tools/auth/destructive-verb-gate.ts` skeleton — reads refusal-list YAML; provides `assertVerbAllowed(verb, args)` function; throws if matched | S | open |
+| 2 | Initial refusal list YAML — 6 verbs from the table above | XS | open |
+| 3 | `.claude/hooks/destructive-verb-gate-pretool.ts` — harness PreToolUse hook intercepting Bash tool calls that match dangerous patterns; calls into slice 1 | M | open |
+| 4 | Tests: positive (gate fires) + negative (near-miss benign passes) | S | open |
+| 5 | Enterprise-extension config support (YAML at separate path that adds verbs but cannot subtract) | S | open |
+| 6 | `.claude/rules/destructive-verb-refusal-gate.md` substrate-rule documentation + cross-link to `methodology-hard-limits.md` | XS | open |
+| 7 | Integration: verify Otto's existing tools/skills route their gh/git invocations through the gate (or wrap them); add gate to wrapper paths | M | open |
+
+## Open questions
+
+1. **Hook vs wrapper**: PreToolUse hook intercepts AT the Bash tool level (general; covers all bash); wrapper covers only Otto's TS-based tools that opt in. The hook is the broader coverage but harness-coupled; the wrapper is more portable. Probably: both — hook for general-bash coverage, wrapper for explicit TS calls.
+2. **Allowlist design for webhooks**: what's the actual allowlist seed? Probably GitHub-internal endpoints + a per-installation allowlist file.
+3. **Force-push detection**: distinguishing legitimate force-push on feature branches (sometimes needed for rebase) vs destructive force-push on main. The branch-protection rule already covers main; the gate may not need to duplicate. Open: do we add the gate even though enterprise ruleset already enforces?
+4. **Pre-call vs post-call**: pre-call abort is what Kestrel specified. Post-call detection-and-alert is a fallback if pre-call is impossible for some operation. Default: pre-call.
+5. **Refusal vocabulary**: what error message format? Probably structured + human-readable, naming the verb + the gate file + how to extend.
+
+## Pre-start checklist
+
+- [x] Prior-art search: `.claude/rules/methodology-hard-limits.md` exists as moral framing; this row is its mechanical-enforcement complement
+- [x] Dependency proof: no blockers; depends only on bun/TypeScript + harness hooks (already in use)
+- [x] Empirical motivation: today's session demonstrated the rhythm-substitution failure mode Kestrel diagnosed; without this gate, every broad-scope grant compounds the risk
+- [x] Refusal-list seeded: 6 verbs identified from Kestrel's recommendation + today's session's specific concerns

From e0b95f406c4d4d95f6d1c745fc8f191da7922008 Mon Sep 17 00:00:00 2001
From: Aaron Stainback
Date: Sat, 16 May 2026 19:40:16 -0400
Subject: [PATCH 2/2] docs(B-0582): address Copilot review threads
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Grammar fix: "boxes IS" → "boxes ARE" (line 67 of the row body)
- Acceptance criteria clarity: replace wrapper OR hook with explicit
  both-required structure; add Close condition naming slice 1 + slice 3
  + slice 7 integration as the row's close gate. Matches the "Probably:
  both" framing already in Open Question 1.

The composes_with frontmatter refs (B-0572, B-0581) are not deleted —
both rows are in flight via sibling PRs (B-0572 via PR #3952, B-0581
via PR #3961). Per
`.claude/rules/blocked-green-ci-investigate-threads.md` this is the
"stale-but-fresh-looking" pattern: TRUE-at-thread-filing, self-healing
once siblings merge. composes_with carries design intent independent of
merge ordering.

Co-Authored-By: Claude Opus 4.7
---
 ...ve-verb-refusal-gate-substrate-level-2026-05-16.md | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/docs/backlog/P1/B-0582-destructive-verb-refusal-gate-substrate-level-2026-05-16.md b/docs/backlog/P1/B-0582-destructive-verb-refusal-gate-substrate-level-2026-05-16.md
index ef5fbb340..ad4022a87 100644
--- a/docs/backlog/P1/B-0582-destructive-verb-refusal-gate-substrate-level-2026-05-16.md
+++ b/docs/backlog/P1/B-0582-destructive-verb-refusal-gate-substrate-level-2026-05-16.md
@@ -51,20 +51,23 @@ The list is enumerated, not pattern-matched-by-vibe — explicit verbs that requ
 
 ## Acceptance criteria
 
-- [ ] `tools/auth/destructive-verb-gate.ts` — wrapper that Otto's gh/git invocations go through; checks pre-call; aborts with explicit error
-- [ ] OR: `.claude/hooks/destructive-verb-gate-pretool.ts` — PreToolUse hook intercepting Bash tool calls before they execute (uses harness hook mechanism, fail-closed)
+Both coverage paths are required for full enforcement (see Open Question 1 + slice 7 close-condition); they are NOT alternatives:
+
+- [ ] `tools/auth/destructive-verb-gate.ts` — wrapper that Otto's TS gh/git invocations go through; checks pre-call; aborts with explicit error (slice 1; covers explicit TS-call paths)
+- [ ] `.claude/hooks/destructive-verb-gate-pretool.ts` — PreToolUse hook intercepting Bash tool calls before they execute (slice 3; covers general bash invocations the wrapper doesn't see; uses harness hook mechanism, fail-closed)
 - [ ] Refusal list externalized in YAML/JSON file the gate reads at startup — auditable, extensible by config
-- [ ] Tests: each refusal-list verb gets a positive test (gate aborts) and a near-miss negative test (similar-looking benign verb passes)
+- [ ] Tests: each refusal-list verb gets a positive test (gate aborts) and a near-miss negative test (similar-looking benign verb passes); slice 7 close-condition requires both the wrapper path AND the hook path to demonstrate gate-fires under the same refusal-list verb
 - [ ] Enterprise-extension hook: a separate config file at enterprise/org scope that ADDS verbs but cannot SUBTRACT — preserves the "forkable AND enterprise-tightenable" property Kestrel named
 - [ ] Documentation: `.claude/rules/destructive-verb-refusal-gate.md` describes the gate at the substrate-rule layer (where the FILE is; what it does; how to extend); links to the implementation
 - [ ] Migration: the existing classic `.claude/rules/methodology-hard-limits.md` references this gate as the mechanical enforcement of the principle the rule names
+- [ ] **Close condition**: row may not be closed until BOTH the wrapper (slice 1) AND the hook (slice 3) are landed and integrated via slice 7 — implementing only one path leaves the other surface unprotected and does not satisfy the substrate-honest enforcement Kestrel specified
 
 ## Why now
 
 The 2026-05-16 session demonstrated the failure mode Kestrel diagnosed:
 
 1. Each scope grant arrived with an Otto-authored Insight box reframing the grant as "least-privilege discipline"
-2. The Insight boxes IS the inflation mechanism — they metabolize the escalation into self-validation
+2. The Insight boxes ARE the inflation mechanism — they metabolize the escalation into self-validation
 3. Context rules (like `methodology-hard-limits.md` as currently written) get reasoned around by the same mechanism
 4. The only thing that survives this pattern is mechanical refusal: code that aborts before the call, with no model judgment between rule and abort