feat(B-0058): add filter-gate honesty log tool #2110
Implements B-0058 responsibility #3: candidate-failure honesty log. Records pass/fail/defer decisions for candidate adoptions from downstream research tracks as append-only JSONL, preventing the rubber-stamping failure mode the three-filter discipline exists to catch.

Also fixes a pre-existing test regression in audit_retractibility where git grep found the test file's own literal path string.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8e69cd044f
| const code = main([ | ||
| "--record", | ||
| "--candidate", "skill:test-entry", | ||
| "--source", "B-0056", | ||
| "--decision", "pass", |
Keep filter-gate tests from mutating repository state
This test invokes main in --record mode, which appends to the real repo log (tools/alignment/out/filter-gate-log.jsonl) and never cleans it up, so every test run dirties the working tree and can interfere with workflows that require a clean checkout after tests. The test should isolate writes to a temp path (or temp repo) and remove artifacts in teardown instead of writing to the canonical log location.
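A minimal sketch of the isolation the comment asks for, assuming the log path can be injected rather than hard-coded. `appendEntry` and `withTempLog` are hypothetical helpers for illustration; the tool's real record API is not shown in this review.

```typescript
import { appendFileSync, existsSync, mkdtempSync, readFileSync, rmSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical stand-in for the tool's record step (assumed signature).
function appendEntry(logPath: string, entry: Record<string, unknown>): void {
  appendFileSync(logPath, `${JSON.stringify(entry)}\n`, "utf8");
}

let usedDir = "";

// Runs `body` against a throwaway log file and always removes the temp directory.
function withTempLog<T>(body: (logPath: string) => T): T {
  const dir = mkdtempSync(join(tmpdir(), "filter-gate-"));
  usedDir = dir;
  try {
    return body(join(dir, "filter-gate-log.jsonl"));
  } finally {
    rmSync(dir, { recursive: true, force: true });
  }
}

// The write lands in the temp directory, never in the repository checkout.
const lineCount = withTempLog((logPath) => {
  appendEntry(logPath, { candidate: "skill:test-entry", source: "B-0056", decision: "pass" });
  return readFileSync(logPath, "utf8").trim().split("\n").length;
});
const cleanedUp = !existsSync(usedDir);
```

In a Bun test file the same cleanup could live in an afterEach hook; the try/finally form is used here only to keep the sketch self-contained.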
| for (const line of content.split("\n")) { | ||
| if (line.trim() === "") continue; | ||
| try { | ||
| entries.push(JSON.parse(line) as FilterGateEntry); |
Validate parsed log rows before treating them as entries
readLog accepts any syntactically valid JSON line via a type cast without checking required fields, so a line like {} is admitted and later --list --md can throw when code dereferences e.clauses.length. This turns one structurally invalid row (e.g., manual edit/merge artifact) into a runtime failure for listing, so parsed objects should be schema-validated before being added.
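One way to address this is a runtime type guard applied before a row is admitted. The sketch below is illustrative only: the field names come from the PR description and the table renderer, while their exact types, and the helper names `isFilterGateEntry` and `parseLog`, are assumptions.

```typescript
type Decision = "pass" | "fail" | "defer";

// Assumed entry shape; the real interface in filter_gate_log.ts may differ.
interface FilterGateEntry {
  timestamp: string;
  candidate: string;
  source: string;
  decision: Decision;
  rationale: string;
  clauses: string[];
  author: string;
}

const DECISIONS: readonly string[] = ["pass", "fail", "defer"];
const STRING_FIELDS = ["timestamp", "candidate", "source", "rationale", "author"];

// Returns true only when every required field is present with the expected type.
function isFilterGateEntry(value: unknown): value is FilterGateEntry {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const decision = v["decision"];
  const clauses = v["clauses"];
  return (
    STRING_FIELDS.every((k) => typeof v[k] === "string") &&
    typeof decision === "string" &&
    DECISIONS.includes(decision) &&
    Array.isArray(clauses) &&
    clauses.every((c) => typeof c === "string")
  );
}

// Parses JSONL content, counting rows that are malformed or structurally invalid.
function parseLog(content: string): { entries: FilterGateEntry[]; skipped: number } {
  const entries: FilterGateEntry[] = [];
  let skipped = 0;
  for (const line of content.split("\n")) {
    if (line.trim() === "") continue;
    try {
      const parsed: unknown = JSON.parse(line);
      if (isFilterGateEntry(parsed)) entries.push(parsed);
      else skipped += 1;
    } catch {
      skipped += 1;
    }
  }
  return { entries, skipped };
}

const valid = JSON.stringify({
  timestamp: "2025-01-01T00:00:00Z",
  candidate: "skill:test-entry",
  source: "B-0056",
  decision: "pass",
  rationale: "example row",
  clauses: [],
  author: "unknown",
});
const result = parseLog(`{}\n${valid}\nnot json\n`);
```

Here the `{}` row and the non-JSON row are both counted as skipped instead of reaching code that dereferences `clauses`. Whether to skip, warn, or fail hard on an invalid row is a design choice left to the tool.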
Pull request overview
Adds a new Bun/TypeScript alignment tool to record and report filter-gate decisions (pass/fail/defer) as an append-only log, supporting B-0058 responsibility #3, and fixes a small regression in an existing retractibility test.
Changes:
- Added tools/alignment/filter_gate_log.ts to record/list/summarize filter-gate decisions to a JSONL log under tools/alignment/out/.
- Added Bun tests for the new tool in tools/alignment/filter_gate_log.test.ts.
- Updated the alignment tools README and adjusted audit_retractibility.test.ts to avoid a self-matching grep needle.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| tools/alignment/README.md | Registers the new filter-gate honesty log tool in the alignment tool index. |
| tools/alignment/filter_gate_log.ts | Implements record/list/summary CLI for append-only filter-gate decision logging. |
| tools/alignment/filter_gate_log.test.ts | Adds Bun tests covering parsing, reading JSONL, summarization, and basic CLI flows. |
| tools/alignment/audit_retractibility.test.ts | Fixes a regression by using a synthetic “nonexistent path” needle that can’t self-match. |
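The self-match problem in the last row can be illustrated with a small sketch. It assumes the needle is assembled from fragments at runtime so the full path never appears literally in the test file; the path used here is made up, and the actual fix in audit_retractibility.test.ts may take a different form.

```typescript
// Needle built from parts: the joined string never occurs verbatim in source text.
const needle = ["docs", "does-not-exist.md"].join("/");

// Two hypothetical versions of a test file's source, as a text search would see them.
const naiveSource = 'const needle = "docs/does-not-exist.md";';
const safeSource = 'const needle = ["docs", "does-not-exist.md"].join("/");';

// A literal needle is found in its own file; the assembled one is not.
const naiveSelfMatch = naiveSource.includes(needle);
const safeSelfMatch = safeSource.includes(needle);
```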
| import { afterEach, describe, expect, test } from "bun:test"; | ||
| import { existsSync, mkdirSync, rmSync, writeFileSync } from "node:fs"; | ||
| import { join } from "node:path"; | ||
| import { | ||
| computeSummary, | ||
| type Decision, | ||
| type FilterGateEntry, | ||
| main, | ||
| parseArgs, | ||
| readLog, | ||
| recordEntry, |
| test("returns 0 for valid --record", () => { | ||
| const code = main([ | ||
| "--record", | ||
| "--candidate", "skill:test-entry", | ||
| "--source", "B-0056", | ||
| "--decision", "pass", | ||
| "--rationale", "Integration test entry", | ||
| ]); | ||
| expect(code).toBe(0); | ||
| }); |
| function repoRoot(): string { | ||
| const result = spawnSync( | ||
| "git", | ||
| ["rev-parse", "--show-toplevel"], | ||
| { encoding: "utf8" }, | ||
| ); | ||
| if (result.error) throw new Error(`git rev-parse failed: ${result.error.message}`); | ||
| if (result.status !== 0) throw new Error(`git rev-parse exited with status ${String(result.status)}`); | ||
| return result.stdout.trim(); | ||
| } | ||
| | ||
| function gitUser(): string { | ||
| const result = spawnSync( | ||
| "git", | ||
| ["config", "user.name"], | ||
| { encoding: "utf8" }, | ||
| ); | ||
| if (result.error || result.status !== 0) return "unknown"; | ||
| return result.stdout.trim(); | ||
| } |
| } | ||
| if (parsed.kind === "error") { | ||
| process.stderr.write(`${parsed.message}\n`); | ||
| return 1; |
| let i = 0; | ||
| while (i < argv.length) { | ||
| const arg = argv[i] ?? ""; | ||
| if (arg === "-h" || arg === "--help") return { kind: "help" }; | ||
| | ||
| if (arg === "--record") { mode = "record"; i += 1; continue; } | ||
| if (arg === "--list") { mode = "list"; i += 1; continue; } | ||
| if (arg === "--summary") { mode = "summary"; i += 1; continue; } | ||
| if (arg === "--json") { json = true; i += 1; continue; } | ||
| if (arg === "--md") { md = true; i += 1; continue; } |
| lines.push("| Timestamp | Candidate | Source | Decision | Clauses | Rationale |"); | ||
| lines.push("| --- | --- | --- | --- | --- | --- |"); | ||
| for (const e of entries) { | ||
| const clauseStr = e.clauses.length > 0 ? e.clauses.join(", ") : "(none)"; | ||
| lines.push(`| ${e.timestamp} | ${e.candidate} | ${e.source} | **${e.decision}** | ${clauseStr} | ${e.rationale} |`); | ||
| } |
The squash merge of PR #2110 landed the original commit before the tsc fix commit, leaving 6 unused imports that fail the lint (tsc tools) CI check (noUnusedLocals). This commit removes afterEach, existsSync, mkdirSync, join, Decision, and recordEntry.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
- tools/alignment/filter_gate_log.ts records pass/fail/defer decisions for candidate adoptions from downstream research tracks (B-0056/B-0057/B-0059) as append-only JSONL at tools/alignment/out/filter-gate-log.jsonl.
- Fixes a pre-existing test regression in audit_retractibility.test.ts where git grep found the test file's own literal path string (1 fail → 0 fail).

What the tool does
- --record mode: appends a structured JSON entry with candidate, source track, decision (pass/fail/defer), rationale, clause citations, author, and timestamp
- --list mode: reads and displays the log (plain/json/md)
- --summary mode: aggregates by decision type and source track

Focused checks
- bun test tools/alignment/ → 93 pass, 0 fail (was 59 pass / 1 fail before this PR)
- dotnet build -c Release → 0 Warning(s), 0 Error(s)

B-0058 progress after this PR
- audit_retractibility.ts
- audit_clause_coverage.ts
- filter_gate_log.ts
- audit_clause_drift.ts

🤖 Generated with Claude Code
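The aggregation performed by --summary mode (counts by decision type and by source track) could look roughly like the following. This is a sketch under assumptions: `summarize`, `Entry`, and `Summary` are hypothetical names, and the tool's own computeSummary may return a different shape.

```typescript
type Decision = "pass" | "fail" | "defer";

// Only the fields needed for aggregation; the real entry carries more.
interface Entry {
  source: string;
  decision: Decision;
}

interface Summary {
  total: number;
  byDecision: Record<Decision, number>;
  bySource: Record<string, Record<Decision, number>>;
}

// Fresh zeroed counter so each source track gets its own object.
const zeroCounts = (): Record<Decision, number> => ({ pass: 0, fail: 0, defer: 0 });

function summarize(entries: Entry[]): Summary {
  const summary: Summary = { total: 0, byDecision: zeroCounts(), bySource: {} };
  for (const e of entries) {
    summary.total += 1;
    summary.byDecision[e.decision] += 1;
    let perSource = summary.bySource[e.source];
    if (perSource === undefined) {
      perSource = zeroCounts();
      summary.bySource[e.source] = perSource;
    }
    perSource[e.decision] += 1;
  }
  return summary;
}

const demo = summarize([
  { source: "B-0056", decision: "pass" },
  { source: "B-0056", decision: "fail" },
  { source: "B-0057", decision: "pass" },
]);
```

Keeping per-track fail and defer counts visible is what lets a reader spot a track whose candidates always pass.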