docs(B-0807): classifier-bypass findings schema and redaction rules #5740
Merged
AceHack merged 2 commits on May 28, 2026
Lands the findings-schema gate that B-0799's audit-log shape and every
future B-0720 empirical mapping row depend on. The schema fixes the only
format in which a classifier-bypass observation may be preserved in shared
substrate.
Schema document: `docs/security/B-0807-classifier-bypass-findings-schema.md`
Active version: `schema_version: 1`
Defines:
- 13-field findings record shape (finding_id, schema_version,
boundary_version, created, evidence_class, risk_class,
observation_class, redaction_level, safety_signal, omitted_fields,
reviewer_gate, reviewer_signoff, composes_with)
- Evidence classes inherited from B-0798 (landed-provenance,
redacted-observation, harmless-synthetic-fixture, negative-control,
policy-anchor, refusal-required)
- Risk classes (non-reproductive, reproductive-if-verbatim,
reproductive-irrespective-of-form)
- Observation classes matching B-0799 (no-signal, redaction-required,
refusal-required, boundary-error)
- Redaction ladder (summary-only, reviewer-summary, reviewer-restricted,
refusal-required)
- Refusal-required state for observations that must not be preserved
- Reviewer sign-off matrix gated on risk_class + observation_class
- Cite-or-block rule for future empirical mapping rows
- Forbidden field values (no deployable payloads, no real secrets, no
real PII, no harmful content, no reproduction ordering)
- Versioning policy: loosening requires the B-0810 ratification gate
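As a reading aid, the record shape and enums above can be written down as TypeScript types. This is a sketch, not the schema's own text: the enum values are copied from this PR, but the concrete types chosen for the non-enum fields (`string`, `string[]`) and the helper `atLeastAsRestrictive` are assumptions. The helper also assumes the redaction ladder is listed from the `summary-only` default to the `refusal-required` floor, least to most restrictive.

```typescript
// Hypothetical TypeScript rendering of the B-0807 findings record (schema_version 1).
// Enum values come from the PR; non-enum field types are assumptions.

const EVIDENCE_CLASSES = [
  "landed-provenance",
  "redacted-observation",
  "harmless-synthetic-fixture",
  "negative-control",
  "policy-anchor",
  "refusal-required",
] as const;

const RISK_CLASSES = [
  "non-reproductive",
  "reproductive-if-verbatim",
  "reproductive-irrespective-of-form",
] as const;

const OBSERVATION_CLASSES = [
  "no-signal",
  "redaction-required",
  "refusal-required",
  "boundary-error",
] as const;

// Assumed order: default (least restrictive) first, floor (most restrictive) last.
const REDACTION_LADDER = [
  "summary-only",
  "reviewer-summary",
  "reviewer-restricted",
  "refusal-required",
] as const;

type EvidenceClass = (typeof EVIDENCE_CLASSES)[number];
type RiskClass = (typeof RISK_CLASSES)[number];
type ObservationClass = (typeof OBSERVATION_CLASSES)[number];
type RedactionLevel = (typeof REDACTION_LADDER)[number];

interface FindingRecord {
  finding_id: string;
  schema_version: 1;
  boundary_version: string;
  created: string; // assumed ISO-8601 date string
  evidence_class: EvidenceClass;
  risk_class: RiskClass;
  observation_class: ObservationClass;
  redaction_level: RedactionLevel;
  safety_signal: string; // non-operational prose only
  omitted_fields: string[];
  reviewer_gate: string;
  reviewer_signoff: string;
  composes_with: string[];
}

// True when level `a` sits at or above level `b` on the redaction ladder.
function atLeastAsRestrictive(a: RedactionLevel, b: RedactionLevel): boolean {
  return REDACTION_LADDER.indexOf(a) >= REDACTION_LADDER.indexOf(b);
}
```

Typing the enums as `as const` tuples keeps one list of values that serves both as runtime data (for validation) and as the source of the union types.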
Why this is safe to land: the schema is a specification document. It
contains no runnable bypass material, no settings payloads, no exact
permission patterns, no real harmful content, no real secrets, no real
PII, and no ordered reproduction steps. It exists so that any future
B-0720 finding can preserve safety signal without preserving the
recipe.
Companion edits:
- `docs/security/B-0720-classifier-bypass-research-boundary.md`:
reporting-rule paragraph now points to the live schema path and
active version.
- `docs/backlog/P0/B-0807-...`: status -> closed; acceptance ticked;
Output section added; last_updated -> 2026-05-28.
- `docs/backlog/P0/B-0720-...`: B-0807 acceptance bullet ticked with
schema path + version.
- `docs/BACKLOG.md`: regenerated via `BACKLOG_WRITE_FORCE=1 bun
tools/backlog/generate-index.ts` so the index reflects B-0807 closed.
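One way to mechanize the "no unrelated files swept in" check is to compare the changed paths against the five files this PR touches. The sketch below is hypothetical: it takes the output of `git diff --name-only` (swapped in for the `--stat` view because bare paths are easier to compare), leaves running git to the caller, and uses the full backlog paths as they appear in this PR's per-file table.

```typescript
// Hypothetical check: every changed path must be one of the expected files,
// and every expected file must actually be changed.

const EXPECTED_PATHS = new Set<string>([
  "docs/security/B-0807-classifier-bypass-findings-schema.md",
  "docs/security/B-0720-classifier-bypass-research-boundary.md",
  "docs/backlog/P0/B-0807-classifier-bypass-findings-schema-and-redaction-rules-2026-05-26.md",
  "docs/backlog/P0/B-0720-classifier-bypass-research-red-team-do-not-deploy-without-zeta-safer-than-anthropic-2026-05-24.md",
  "docs/BACKLOG.md",
]);

interface SweepReport {
  unexpected: string[]; // changed, but not in the expected set
  missing: string[]; // expected, but not changed
}

// `nameOnlyOutput` is assumed to be raw `git diff --name-only` output.
function checkNoUnrelatedSweep(nameOnlyOutput: string): SweepReport {
  const changed = new Set(
    nameOnlyOutput
      .split("\n")
      .map((line) => line.trim())
      .filter((line) => line.length > 0),
  );
  const unexpected = Array.from(changed)
    .filter((p) => !EXPECTED_PATHS.has(p))
    .sort();
  const missing = Array.from(EXPECTED_PATHS)
    .filter((p) => !changed.has(p))
    .sort();
  return { unexpected, missing };
}
```

An empty `unexpected` list is the "no unrelated sweep" condition; an empty `missing` list confirms all five companion edits landed.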
Focused checks:
- `grep -E "Bash\(|_ip_risk_acceptance|_legal_acceptance|_policy_override|permissions.*allow|settings\.json.*\{"` over the new schema doc returns no matches; the schema doesn't quote the very payloads it forbids.
- `git diff --stat` shows 5 files changed; no unrelated files swept in.
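The grep check in the list above can also be expressed as a small TypeScript function, for example inside the repo's bun tooling. This is a sketch under assumptions: the function name is hypothetical, the caller supplies the file text, and JavaScript regular expressions stand in for grep's ERE dialect (they agree on these simple patterns).

```typescript
// Hypothetical port of the forbidden-content grep. Reports 1-based line
// numbers so a failing check points straight at the offending line.

const FORBIDDEN_PATTERNS: RegExp[] = [
  /Bash\(/,
  /_ip_risk_acceptance/,
  /_legal_acceptance/,
  /_policy_override/,
  /permissions.*allow/,
  /settings\.json.*\{/,
];

interface Hit {
  line: number;
  pattern: string;
}

function scanForbidden(text: string): Hit[] {
  const hits: Hit[] = [];
  text.split("\n").forEach((content, i) => {
    for (const re of FORBIDDEN_PATTERNS) {
      // Non-global regexes, so .test() carries no lastIndex state between calls.
      if (re.test(content)) hits.push({ line: i + 1, pattern: re.source });
    }
  });
  return hits;
}
```

As with the grep, an empty result is the passing condition.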
Composes with:
- B-0798 hard-limits boundary (the floor this schema sits on)
- B-0799 synthetic-only harness design (resolves `schema_version`
audit-log field)
- B-0720 parent safety row (B-0807 acceptance bullet now ticked)
- `.claude/rules/classifier-bypass-research-do-not-deploy-without-zeta-safer-floor.md`
- `.claude/rules/methodology-hard-limits.md`
Operative authorization: aaron 2026-05-14: "- **Devil-pole** (edge-runner drive): keep pushing, discover, go hard, never-be-idle"
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Pull request overview
Defines and lands the B-0807 “classifier-bypass findings” reporting schema under docs/security/ and wires it into the B-0720 boundary + backlog surfaces so future B-0720 work can publish safety signal with explicit redaction and reviewer-gate rules.
Changes:
- Add `docs/security/B-0807-classifier-bypass-findings-schema.md` (schema_version 1) specifying the findings record shape, enums, redaction ladder, and reviewer gating.
- Update `docs/security/B-0720-classifier-bypass-research-boundary.md` to point reporting rules at the live B-0807 schema + version.
- Close the B-0807 backlog row, tick the B-0720 acceptance bullet, and regenerate `docs/BACKLOG.md`.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| docs/security/B-0807-classifier-bypass-findings-schema.md | New findings schema + redaction/reviewer-gate rules for preserving observations safely. |
| docs/security/B-0720-classifier-bypass-research-boundary.md | Reporting rule updated to cite the new schema path and version. |
| docs/backlog/P0/B-0807-classifier-bypass-findings-schema-and-redaction-rules-2026-05-26.md | Mark B-0807 closed; acceptance checked; output section added; last_updated bumped. |
| docs/backlog/P0/B-0720-classifier-bypass-research-red-team-do-not-deploy-without-zeta-safer-than-anthropic-2026-05-24.md | Tick B-0807 acceptance item and add schema citation. |
| docs/BACKLOG.md | Regenerated backlog index reflecting B-0807 closed status. |
… scope, B-0799 redaction_level mapping, B-0720 punctuation

Three Copilot review findings on PR #5740 (all verified valid against source per `.claude/rules/blocked-green-ci-investigate-threads.md` verify-before-fix discipline):

1. Line 49 (B-0807 findings schema): enum-vs-`unknown` contradiction. The schema instructed authors to use `unknown` for any missing field, but `evidence_class`, `risk_class`, `observation_class`, and `redaction_level` are enums with explicit allowed values. Clarified: non-enum fields use `unknown`; enum fields must use named values or fall into `refusal-required` if not determinable.
2. Line 121 (B-0807 `redaction_level`): vocabulary mismatch with B-0799. B-0807 introduces 4 redaction levels (`summary-only`, `reviewer-summary`, `reviewer-restricted`, `refusal-required`), but the B-0799 audit-log shape documents only 3 (`summary-only`, `reviewer-summary`, `refusal-required`). Added a "Mapping to B-0799 Audit-Log Vocabulary" subsection explaining the divergence: B-0799 harness emissions use the 3-value vocabulary; reviewers map them to the B-0807 4-value vocabulary per record; the mapping is recorded in `omitted_fields` so the divergence stays auditable until B-0799 ratifies the extension.
3. Line 133 (B-0720 acceptance bullet): punctuation. Added `see` to the parenthetical citation to disambiguate the main clause from the reference (preserves the terminal period; clarifies structure).

Per `.claude/rules/blocked-green-ci-investigate-threads.md`: thread findings were verified against source via direct `awk -v N=<line>` inspection before applying fixes. All three are real findings (not the suspect-by-default table-double-pipe class). Threads will be resolved after this commit pushes.

Co-Authored-By: Claude <noreply@anthropic.com>
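The first two fixes describe rules precise enough to sketch. The TypeScript below is hypothetical: field lists and enum values come from this PR, while how a record "falls into" `refusal-required` (modelled here by forcing `redaction_level`), which B-0799 level a reviewer tightens to `reviewer-restricted`, and the note format written to `omitted_fields` are assumptions.

```typescript
// Hypothetical sketch of (1) the enum-vs-unknown rule and (2) the
// B-0799 -> B-0807 redaction_level mapping.

const ENUM_FIELDS: Array<[string, readonly string[]]> = [
  ["evidence_class", ["landed-provenance", "redacted-observation", "harmless-synthetic-fixture",
    "negative-control", "policy-anchor", "refusal-required"]],
  ["risk_class", ["non-reproductive", "reproductive-if-verbatim", "reproductive-irrespective-of-form"]],
  ["observation_class", ["no-signal", "redaction-required", "refusal-required", "boundary-error"]],
  ["redaction_level", ["summary-only", "reviewer-summary", "reviewer-restricted", "refusal-required"]],
];

// Free-text / identifier fields that may carry the literal "unknown".
const NON_ENUM_FIELDS = ["finding_id", "boundary_version", "created",
  "safety_signal", "reviewer_gate", "reviewer_signoff"];

type Draft = Record<string, string | undefined>;

interface Normalized {
  fields: Record<string, string>;
  undeterminable: string[]; // enum fields with no named value
  refusalRequired: boolean;
}

function normalizeDraft(draft: Draft): Normalized {
  const fields: Record<string, string> = {};
  // Rule 1a: missing non-enum fields become "unknown".
  for (const f of NON_ENUM_FIELDS) {
    const v = draft[f];
    fields[f] = v === undefined || v === "" ? "unknown" : v;
  }
  // Rule 1b: enum fields never take "unknown"; anything outside the named
  // values is left unset and recorded as undeterminable.
  const undeterminable: string[] = [];
  for (const [f, allowed] of ENUM_FIELDS) {
    const v = draft[f];
    if (v !== undefined && allowed.indexOf(v) >= 0) fields[f] = v;
    else undeterminable.push(f);
  }
  // Assumption: any undeterminable enum moves the record to the refusal floor.
  const refusalRequired = undeterminable.length > 0;
  if (refusalRequired) fields["redaction_level"] = "refusal-required";
  return { fields, undeterminable, refusalRequired };
}

type B0799Level = "summary-only" | "reviewer-summary" | "refusal-required";

// Rule 2: all three B-0799 values exist in the B-0807 ladder, so the default
// is identity; a reviewer may tighten reviewer-summary to reviewer-restricted
// for one record. The note is meant for omitted_fields.
function mapB0799Level(level: B0799Level, reviewerRestricts: boolean): { level: string; note: string } {
  const mapped = level === "reviewer-summary" && reviewerRestricts ? "reviewer-restricted" : level;
  return { level: mapped, note: `redaction_level: B-0799 "${level}" -> B-0807 "${mapped}"` };
}
```

Keeping the identity mapping as the default means harness output needs no rewriting until a reviewer makes an explicit per-record decision, and that decision leaves a trace.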
AceHack added a commit that referenced this pull request on May 28, 2026
…d resolution (#5747)

Fresh cold-boot under autonomous-loop sentinel (re-armed `a7b83b70` after catch-43 fired). Substantive work: 3 Copilot review threads on PR #5740 (B-0807 classifier-bypass findings schema) resolved via verify-before-fix discipline + isolated worktree commit + thread GraphQL resolution.

- Verified 3 Copilot findings against PR head source (one P1 enum-vs-unknown contradiction; one P1 B-0799 redaction_level vocabulary mismatch; one minor punctuation).
- Authored fixes in the `/private/tmp/zeta-otto-cli-5740-fixes-1003z` isolated worktree; ls-tree HEAD = 61 (no canary corruption).
- Pushed `d76ad9b47..72ea879` via explicit refspec to the PR #5740 branch (fast-forward safe).
- Resolved 3 threads via the `resolveReviewThread` mutation.
- PR #5740 auto-merge (armed by Aaron) expected to fire when required CI re-runs green against the new commit.

Per `.claude/rules/blocked-green-ci-investigate-threads.md` + verify-before-fix discipline. Counter-reset condition #3 satisfied (review threads resolved = concrete artifact).

Co-authored-by: Lior <lior@zeta.dev>
Co-authored-by: Claude <noreply@anthropic.com>
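Thread resolution goes through GitHub's GraphQL API, where `resolveReviewThread` is the mutation named above and `reviewThreads` lists a pull request's threads. The sketch below only builds request bodies and filters a response; sending them (a POST to `https://api.github.com/graphql` with a token) is left to the caller, and the function names are hypothetical, not the tooling the commit used. Under the verify-before-fix discipline, resolution comes only after each finding has been checked and fixed.

```typescript
// Hypothetical helpers that build GitHub GraphQL request bodies.

interface GraphQLRequest {
  query: string;
  variables: Record<string, unknown>;
}

function listThreadsRequest(owner: string, name: string, number: number): GraphQLRequest {
  return {
    query: `query($owner: String!, $name: String!, $number: Int!) {
  repository(owner: $owner, name: $name) {
    pullRequest(number: $number) {
      reviewThreads(first: 100) { nodes { id isResolved path } }
    }
  }
}`,
    variables: { owner, name, number },
  };
}

function resolveThreadRequest(threadId: string): GraphQLRequest {
  return {
    query: `mutation($threadId: ID!) {
  resolveReviewThread(input: { threadId: $threadId }) { thread { id isResolved } }
}`,
    variables: { threadId },
  };
}

// Shape of the listThreads response this sketch relies on.
interface ThreadsResponse {
  data: {
    repository: {
      pullRequest: { reviewThreads: { nodes: { id: string; isResolved: boolean }[] } };
    };
  };
}

// Ids of threads still open: the ones to verify, fix, then resolve.
function unresolvedThreadIds(resp: ThreadsResponse): string[] {
  return resp.data.repository.pullRequest.reviewThreads.nodes
    .filter((n) => !n.isResolved)
    .map((n) => n.id);
}
```

Passing values through `variables` rather than interpolating them into the query string keeps thread ids and repository names out of the GraphQL text itself.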
AceHack added a commit that referenced this pull request on May 28, 2026
…0h gap-fill (#5927)

Per `.claude/rules/tick-must-never-stop.md`: SessionStart hook fired catch-43; CronList empty; sentinel re-armed as `85456a9c` at `* * * * *`. Today's shard cadence was sparse (last shard 1014Z; 10h gap to 2002Z). This fills the gap with visibility-signal substrate documenting:

- Reconnaissance: GraphQL 2668/5000 (Normal); REST core 4934/5000; 0 stuck git pack/maintenance/repack procs (dotgit recovered); 17 peer claude-code procs active; origin/main tip `89b94efb6` (B-0304 Pages queue).
- Outstanding Otto-CLI PRs from the 1014Z tick (#5740, #5742, #5739) all merged cleanly (10:03Z-10:22Z range); no open Otto-CLI PRs currently require follow-up.
- Brief-ack counter at #0; concrete artifact landed (this shard + sentinel re-arm); counter-reset condition #3 satisfied.

Composes with: tick-must-never-stop + refresh-before-decide + holding-without-named-dependency-is-standing-by-failure + fighting-past-self-vs-peer-agent + claim-acquire-before-worktree-work + agent-worktree-hygiene + codeql-no-source-canary + zeta-expected-branch + substrate-or-it-didnt-happen rules.

Co-authored-by: Lior <lior@zeta.dev>
Co-authored-by: Claude <noreply@anthropic.com>
Summary
Lands the B-0807 classifier-bypass findings schema: the reporting + redaction gate that B-0799's audit-log shape and every future B-0720 empirical mapping row already depend on. Without this schema landed, no future B-0720 empirical row could publish findings, and B-0799 audit logs had a `schema_version` field referencing a schema that didn't yet exist.

The schema is a specification document. It defines the only format in which a classifier-bypass observation may be preserved in shared substrate. It contains no runnable bypass material, no settings payloads, no exact permission patterns, no real harmful content, no real secrets, no real PII, and no ordered reproduction steps. It exists so that any future B-0720 finding can preserve safety signal without preserving the recipe.
Files
- `docs/security/B-0807-classifier-bypass-findings-schema.md`: 220-line schema, active `schema_version: 1`.
- `docs/security/B-0720-classifier-bypass-research-boundary.md`: reporting-rule paragraph now points to the live schema path + version.
- `docs/backlog/P0/B-0807-...`: status → `closed`; acceptance ticked; Output section added; `last_updated: 2026-05-28`.
- `docs/backlog/P0/B-0720-...`: B-0807 acceptance bullet ticked with schema path + version.
- `docs/BACKLOG.md`: regenerated via `BACKLOG_WRITE_FORCE=1 bun tools/backlog/generate-index.ts`.

Schema contents
- Fields: `finding_id`, `schema_version`, `boundary_version`, `created`, `evidence_class`, `risk_class`, `observation_class`, `redaction_level`, `safety_signal`, `omitted_fields`, `reviewer_gate`, `reviewer_signoff`, `composes_with`.
- Evidence classes: `landed-provenance`, `redacted-observation`, `harmless-synthetic-fixture`, `negative-control`, `policy-anchor`, `refusal-required`.
- Risk classes: `non-reproductive`, `reproductive-if-verbatim`, `reproductive-irrespective-of-form`.
- Observation classes: `no-signal`, `redaction-required`, `refusal-required`, `boundary-error`.
- Redaction ladder: `summary-only` (default), `reviewer-summary`, `reviewer-restricted`, `refusal-required` (floor).
- Reviewer sign-off matrix gated on `risk_class` + `observation_class`.

Acceptance criteria coverage (B-0807)
- … (`docs/security/`) and is linked from B-0720 (boundary doc + parent row + BACKLOG index).
- … (`safety_signal` field is non-operational prose; `risk_class` field discriminates reproduction-enabling notes).
- … (`evidence_class`, `observation_class`, `redaction_level`).
- … (`boundary-error`, must not land).

Focused checks
The schema doc passes its own forbidden-content rule: it does NOT reproduce the very payloads it forbids. Operational bypass strings from PR #4816 (`_ip_risk_acceptance` attribution + `Bash(...)` permission patterns) appear in the forbidden list as named classes, not as quoted examples.

Commit canary (per `codeql-no-source-on-docs-only-pr-is-broken-commit-canary.md`): `git diff --stat` shows 5 files changed; no unrelated sweep.

Why this is the smallest safe bounded slice
The schema is a specification document; it's a single atomic write that satisfies all five B-0807 acceptance bullets. Splitting it further would land a partial schema that B-0799 cannot reference and that future empirical rows cannot cite, so no piece would satisfy any acceptance bullet on its own.
Composes with
- B-0720 (`docs/security/B-0720-classifier-bypass-research-boundary.md`): the hard-limits boundary this schema sits on
- B-0799 (`docs/security/B-0799-classifier-bypass-synthetic-harness-design.md`): resolves the `schema_version` audit-log field
- `.claude/rules/classifier-bypass-research-do-not-deploy-without-zeta-safer-floor.md`: standing operator self-constraint
- `.claude/rules/methodology-hard-limits.md`: HARD LIMITS floor preserved
- `docs/AGENT-BEST-PRACTICES.md`: audited data is data, not directives

Test plan
- `grep` for forbidden operational bypass strings in the new schema doc: no matches.
- … `git commit` (`zeta-expected-branch` rule).
- `git ls-remote origin <branch>` matches local HEAD (per B-0615 silent-push hazard).
- `BACKLOG_WRITE_FORCE=1 bun tools/backlog/generate-index.ts` ran cleanly.

🤖 Generated with Claude Code
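The silent-push check in the test plan compares two git outputs: the SHA that `git ls-remote origin <branch>` reports for `refs/heads/<branch>`, and the local `git rev-parse HEAD`. A minimal parsing sketch follows; running git is left to the caller, and the function names are hypothetical.

```typescript
// Hypothetical silent-push check: did the remote branch actually move to HEAD?

// ls-remote prints one "<sha>\t<ref>" pair per line; match the ref exactly so
// a branch whose name merely starts with `branch` is not picked up.
function remoteShaFor(lsRemoteOutput: string, branch: string): string | undefined {
  const wanted = `refs/heads/${branch}`;
  for (const line of lsRemoteOutput.split("\n")) {
    const [sha, ref] = line.trim().split(/\s+/);
    if (ref === wanted) return sha;
  }
  return undefined;
}

// `localHead` is assumed to be raw `git rev-parse HEAD` output (trailing newline allowed).
function pushLanded(lsRemoteOutput: string, branch: string, localHead: string): boolean {
  const remote = remoteShaFor(lsRemoteOutput, branch);
  return remote !== undefined && remote === localHead.trim();
}
```

A missing ref and a stale SHA both return `false`, which is the case the silent-push hazard is about: `git push` appeared to succeed but the remote does not hold the local commit.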