Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
@@ -0,0 +1,116 @@
# Guess #001 — B-0173 hook-authoring-for-skill-creation-contracts

## Target

`docs/backlog/P1/B-0173-hook-authoring-for-skill-creation-contracts-aaron-2026-05-03.md`

The architectural choice: Aaron filed B-0173 for a "hook-authoring"
backlog row in the context of skill-creation contracts. The question
this guess answers: **what is Aaron's architectural intent for using
hooks (specifically) — vs alternatives like CI workflows, manual
review, or in-skill-body validation?**

## Read state at guess time (2026-05-03 ~02:42Z)

Otto has already read (in this session or via prior ticks):

- The B-0173 ROW NAME ONLY (from `ls docs/backlog/P1/`)
- The skill-design-rules memo
(`feedback_skills_as_carved_sentences_knowledge_in_docs_datavault_2_0_pattern_aaron_2026_05_03.md`)
— Aaron's skill-design rule 2: no dynamic commands in skills, use TS
files under tools/ and reference by path; rule 3: package skill
domains as plugins, use harness hooks for pre/post-condition
enforcement (contract-based development)
- The decision-graph emergent property memo
(`feedback_decision_graph_emergent_from_archaeologies_and_flywheel_aaron_2026_05_03.md`)
— names B-0173 as "edge-property checks at commit / merge time" + as
"edge-implementation discipline"
- The substrate-claim-checker (B-0170, shipped) — which is the kind of
TS tool Aaron's rule 2 wants

## Research deliberately AVOIDED

Otto has NOT read:

- The B-0173 row body text (only the title from `ls`)
- Any commits referencing B-0173
- Any persona notebooks discussing hook-authoring
- Any prior conversation about Claude Code hooks specifically

## Otto's in-the-moment guess

### Architectural intent (high confidence)

Aaron's primary architectural intent for hooks-as-the-mechanism:
**hooks are local-fast-feedback enforcement at the commit boundary,
which is where skill-creation iteration happens.**

Specifically:

1. **Local enforcement vs CI workflows** — CI workflows fire after PR
push; hooks fire at `git commit` time. For skill-creation iteration,
the agent authoring the skill needs immediate feedback (was the
skill body well-formed? did it break the carved-sentence
discipline?). CI delay = bad iteration UX
2. **Harness-native enforcement** — Claude Code has hook machinery
(pre-commit hooks, settings hooks, etc.). Aaron's skill-design rule 3
names "use harness hooks for pre/post-condition enforcement
(contract-based development)" — hooks compose with the harness's
native event model
3. **Separation of concerns** — the skill body says WHAT the skill
does; the hook enforces IS THIS SKILL VALID. Two different
responsibilities; two different surfaces. The skill body shouldn't
carry its own validator (that violates rule 2's no-dynamic-commands
direction)

### Substrate-content intent (medium confidence)

The hook will likely be:

- A pre-commit hook that fires when `.claude/skills/*/SKILL.md` files
are staged
- TS-implemented (per rule 2: no dynamic commands; TS files under
tools/) — concretely something like `tools/git/hooks/skill-creation-contract-check.ts`
(NOTE: I'm guessing the path here; the decision-graph memo cited
`tools/git/hooks/` so I'm using that)
- Composes with substrate-claim-checker (B-0170) — runs the same
count-drift / existence-drift / etc. checks on the skill body that
the standalone CLI tool would
- May also enforce: carved-sentence presence, frontmatter completeness,
hub-satellite-pattern markers, no-cargo-cult-references, etc.

### Specific implementation intent (lower confidence)

The hook will probably:

- Use Claude Code's `.claude/settings.json` `hooks` field to register
itself
- Take input via stdin per Claude Code's hook protocol (the hook
payload includes file paths + event type)
- Output structured result (block-or-allow + reason) per Claude Code's
hook contract
- Provide override path for emergency commits (no false-positive lock)

## Confidence levels

| Layer | Confidence | Reasoning |
|---|---|---|
| Architectural — "hooks because local-fast-feedback" | **High** | Composes naturally with Aaron's skill-design rule 3 (already first-party-confirmed) + the iteration-UX argument is self-evident |
| Substrate-content — "TS file under tools/git/hooks/" | **Medium** | Decision-graph memo cited the path; rule 2 forces TS; but the specific filename + scope is inferred not confirmed |
| Specific — "Claude Code settings.json + stdin protocol" | **Low** | Standard Claude Code hook shape but I haven't confirmed the specific event names or payload schema for this hook type |

## Ground truth (TO BE FILLED IN AFTER VERIFICATION)

(Empty at write time. Populated when Otto reads B-0173 row body OR
asks Aaron OR runs decision-archaeology on the row.)

## Calibration delta (TO BE FILLED IN AFTER VERIFICATION)

(Empty at write time. Populated alongside ground truth section.)

---

**Guess timestamp:** 2026-05-03 ~02:42Z
**Author:** Otto autonomous (architect hat)
**Protocol:** in-the-moment guess per
`memory/feedback_guess_then_verify_architectural_intent_calibration_protocol_aaron_2026_05_03.md`
Comment on lines +115 to +116
70 changes: 70 additions & 0 deletions memory/architectural-intent-guesses/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,70 @@
# Architectural-intent guesses — Otto's in-the-moment calibration data

This directory implements the **guess-then-verify architectural-intent
calibration protocol** (Aaron 2026-05-03; canonical memo:
`memory/feedback_guess_then_verify_architectural_intent_calibration_protocol_aaron_2026_05_03.md`).
Comment on lines +4 to +5

## Purpose

Each file in this directory is an **in-the-moment guess** about Aaron's
architectural intent for a specific substrate choice — saved BEFORE Otto
researches ground truth. Per Aaron 2026-05-03 verbatim *"your inital
guess in the moment will say a lot about ottos frontier ability"*, the
in-the-moment capture is the **unique frontier-ability data point**:
uncontaminatable, can never be retrospectively replicated.
Comment on lines +7 to +14

## File schema

Each guess file uses the format `<date>-<topic-slug>.md` with this body
structure:

1. **Target** — the architectural choice being inferred (substrate path
+ line numbers if applicable; PR # if known)
2. **Read state at guess time** — what Otto had ALREADY READ before
making the guess (commit SHAs, prior memos, prior research). This
bounds the inference fairly
3. **Research deliberately AVOIDED** — what Otto did NOT consult before
guessing (the substrate that would reveal ground truth)
4. **Otto's in-the-moment guess** — the inference, with explicit
reasoning chain
5. **Confidence level** — Otto's self-reported confidence (low / medium
/ high) per layer (architectural / substrate-content / specific
implementation)
6. **Ground truth (filled in LATER)** — empty at write time; populated
when ground truth surfaces via docs archaeology / decision-archaeology
/ asking Aaron
7. **Calibration delta (filled in LATER)** — match / partial-match / off
/ unrecoverable, with analysis

## Write-time discipline

- Save the guess BEFORE consulting any of: the target file's history,
ADRs touching the topic, persona notebooks, conversation archives,
docs/research/ artifacts that reveal the intent
- The guess file's commit ITSELF is the timestamp marker
- After saving, ground-truth research is permitted; updates happen in a
SUBSEQUENT commit / PR with a **GROUND-TRUTH-RECOVERY** prefix

## Cross-model retroactive replay

Per Aaron 2026-05-03 *"you can also test othr models after the fact and
just hid the conclusions from them"*, this directory's guesses can be
retroactively replayed against other models:

1. Pick an architectural choice with both Otto's in-the-moment guess +
recovered ground truth
2. Hide the ground-truth substrate from the test model
3. Give the test model the architectural choice with the same read-state
constraints Otto had
4. Compare the test model's guess to the ground truth + Otto's guess
5. Track per-model accuracy + per-architectural-layer accuracy

The retroactive mode is reproducible across new models as they arrive.
Otto's in-the-moment mode is one-shot per architectural choice.

## Discipline reminder

If Otto reads the target's ground truth FIRST and then writes the
"guess" file, the file is **research-then-write** disguised as inference,
not authentic in-the-moment calibration data. The discipline is real
and load-bearing.
Loading