Merged
25 commits
e2ad349
fix(setup): align tool order with analyze --ai-configs sequence
laurentftech May 9, 2026
c876e0a
fix(gate): exclude phantom decisions from no_decisions_recorded bypass
laurentftech May 9, 2026
3aad0c3
fix(decisions): replaceDecisions fixes consolidation silent drop bug
laurentftech May 9, 2026
62a3523
feat(decisions): sentinel-based --no-verify bypass detector
laurentftech May 9, 2026
d9e768a
feat(decisions): gate blocks on approved-not-synced decisions
laurentftech May 9, 2026
b713256
feat(decisions): always write ADR for every synced decision
laurentftech May 9, 2026
8e02c58
test(analysis): cover inventory handlers, getMinimalContext, getClust…
laurentftech May 9, 2026
e1db956
fix(decisions): address review findings from PR #78
laurentftech May 9, 2026
72789e1
fix(decisions): address review findings — audit trail, state guards, …
laurentftech May 10, 2026
06edd25
fix(decisions): use IDs as traceability anchor in consolidation
laurentftech May 10, 2026
ae65cb3
docs(agents): restore gate-blocked reason handler docs
laurentftech May 10, 2026
07c9996
feat(decisions): purge inactive decisions from store after sync
laurentftech May 10, 2026
aad9169
feat(decisions): extend RAG and orient with decision context
laurentftech May 10, 2026
0104502
docs: update decisions gate, RAG, and orient documentation
laurentftech May 10, 2026
d3c4273
fix(decisions): use earliest superseded recordedAt for merged decisions
laurentftech May 10, 2026
75c13c6
feat(decisions): add decision scope tiers to gate ADR creation
laurentftech May 10, 2026
336bfd7
docs(readme): fix orphaned sentence, update test count, add decision …
laurentftech May 10, 2026
a424029
fix(watcher): replace glob ignored with function to prevent EMFILE on…
laurentftech May 10, 2026
80f34be
fix(deps): replace better-sqlite3 with node:sqlite built-in
laurentftech May 11, 2026
91b21c6
docs: bump Node.js requirement to 22.5+ (node:sqlite built-in)
laurentftech May 11, 2026
f0264a5
docs: document node:sqlite experimental warning in Known Limitations
laurentftech May 11, 2026
24704fb
ci: bump Node.js to 22 (required for node:sqlite built-in)
laurentftech May 11, 2026
80d347c
ci: use Node 24 LTS — no node:sqlite experimental warning
laurentftech May 11, 2026
ad23576
feat(mcp): add --minimal profile and tool annotations for token effic…
laurentftech May 11, 2026
dd29679
feat(orient): add suggestedTools field for portable tool discovery
laurentftech May 11, 2026
6 changes: 3 additions & 3 deletions .github/workflows/ci.yml
@@ -18,7 +18,7 @@ jobs:

       - uses: actions/setup-node@v4
         with:
-          node-version: '20'
+          node-version: '24'
           cache: 'npm'

       - run: npm ci
@@ -40,7 +40,7 @@ jobs:

       - uses: actions/setup-node@v4
         with:
-          node-version: '20'
+          node-version: '24'
           cache: 'npm'

       - run: npm ci
@@ -56,7 +56,7 @@ jobs:

       - uses: actions/setup-node@v4
         with:
-          node-version: '20'
+          node-version: '24'
           cache: 'npm'

       - run: npm ci
4 changes: 2 additions & 2 deletions .github/workflows/release.yml
@@ -14,7 +14,7 @@ jobs:

       - uses: actions/setup-node@v4
         with:
-          node-version: '20'
+          node-version: '24'
           cache: 'npm'

       - run: npm ci
@@ -43,7 +43,7 @@ jobs:

       - uses: actions/setup-node@v4
         with:
-          node-version: '20'
+          node-version: '24'
           cache: 'npm'
           registry-url: 'https://registry.npmjs.org'

13 changes: 6 additions & 7 deletions AGENTS.md
@@ -340,36 +340,35 @@ record_decision({
})
```

Decisions are consolidated and cross-checked against the git diff before each commit.
Decisions are consolidated in the background immediately after `record_decision` is called — the pre-commit gate reads the already-consolidated store and adds no LLM latency.

**Performance note**: if you skip `record_decision`, the gate detects unrecorded source changes at commit time and triggers a slow LLM extraction on the *next* commit (~10-30s). Calling `record_decision` proactively keeps every commit instant.

## When git commit is blocked by the decisions gate

If `git commit` fails and the output is JSON with `"gated": true`, do NOT approve decisions yourself. Always present them to the user first.

If `git commit` fails and the output is JSON with `"gated": true`, do NOT retry silently.
Check the `reason` field and act accordingly:

**`reason: "verified"` — decisions await review:**
Present each decision to the user:
> "The commit is blocked — I found N architectural decision(s) to validate:
> 1. **[id]** Title — rationale
> Do you approve? (yes/no)"

Do you approve? (yes/no)"
For each approval call `approve_decision`, for rejections call `reject_decision`.
Then run `spec-gen decisions --sync` and retry `git commit`.

**`reason: "approved_not_synced"` — decisions approved but not written to specs:**
Run `spec-gen decisions --sync` then retry `git commit`. Do not skip this step.

**`reason: "drafts_pending_consolidation"` — drafts were recorded but not yet consolidated:**
Present to the user:
> "N decision draft(s) were recorded but never consolidated. Run consolidation now? (~10-30s)"

If yes: run `spec-gen decisions --consolidate --gate` and handle the result.
If no: retry with `git commit --no-verify` to skip the gate.

**`reason: "no_decisions_recorded"` — source files staged but nothing recorded:**
Present to the user:
> "Source files are staged but no architectural decisions were recorded. Run fallback extraction to check for undocumented decisions? (~10-30s)"

If yes: run `spec-gen decisions --consolidate --gate` and handle the result.
If no: retry with `git commit --no-verify` to skip the gate.
<!-- end-spec-gen-decisions-instructions -->
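
The reason handling described above can be sketched as a small dispatcher. This is a minimal illustration, not spec-gen's own code: the `GateOutput` shape assumes only the `gated` and `reason` fields named in the text, and `NextStep` and `nextStepFor` are hypothetical names introduced here.

```typescript
// Hypothetical helper: maps a gate `reason` to the follow-up described in the text.
type GateReason =
  | "verified"
  | "approved_not_synced"
  | "drafts_pending_consolidation"
  | "no_decisions_recorded";

// Assumed shape: only `gated` and `reason` are taken from the documentation.
interface GateOutput {
  gated: boolean;
  reason?: GateReason;
}

interface NextStep {
  askUser: boolean;       // true when the user must be consulted before acting
  command: string | null; // CLI command to run afterwards, if any
}

function nextStepFor(output: GateOutput): NextStep {
  const { gated, reason } = output;
  if (!gated || reason === undefined) {
    return { askUser: false, command: null };
  }
  switch (reason) {
    case "verified":
      // Present decisions, call approve_decision / reject_decision, then sync.
      return { askUser: true, command: "spec-gen decisions --sync" };
    case "approved_not_synced":
      // Already approved: only the sync step is missing.
      return { askUser: false, command: "spec-gen decisions --sync" };
    case "drafts_pending_consolidation":
    case "no_decisions_recorded":
      // Both cases offer the user a consolidation run (~10-30s).
      return { askUser: true, command: "spec-gen decisions --consolidate --gate" };
  }
}
```

In every branch the caller retries `git commit` after the command completes; only `approved_not_synced` proceeds without asking the user.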
5 changes: 4 additions & 1 deletion CLAUDE.md
@@ -48,10 +48,13 @@ Check the `reason` field and act accordingly:
Present each decision to the user:
> "The commit is blocked — I found N architectural decision(s) to validate:
> 1. **[id]** Title — rationale
> Do you approve? (yes/no)"
Do you approve? (yes/no)"
For each approval call `approve_decision`, for rejections call `reject_decision`.
Then run `spec-gen decisions --sync` and retry `git commit`.

**`reason: "approved_not_synced"` — decisions approved but not written to specs:**
Run `spec-gen decisions --sync` then retry `git commit`. Do not skip this step.

**`reason: "drafts_pending_consolidation"` — drafts were recorded but not yet consolidated:**
Present to the user:
> "N decision draft(s) were recorded but never consolidated. Run consolidation now? (~10-30s)"
27 changes: 17 additions & 10 deletions README.md
@@ -1,8 +1,8 @@
# spec-gen

**Persistent architectural memory for AI coding agents.**
**Persistent architectural memory and structural cognition for AI coding agents.**

spec-gen turns any codebase into a navigable knowledge graph backed by [OpenSpec](https://github.com/Fission-AI/OpenSpec) living specifications. It extracts and maintains specs, detects spec/code drift, gates architectural decisions, and exposes everything through graph-native MCP tools — so agents start every session already knowing the codebase instead of re-discovering it.
spec-gen turns any evolving codebase into a navigable knowledge graph backed by [OpenSpec](https://github.com/Fission-AI/OpenSpec) living specifications. It maintains persistent architectural context across agent sessions: graph structure, specs, decisions, drift state, and semantic retrieval — so agents start each task already oriented instead of re-discovering the system from file reads.

---

@@ -15,7 +15,9 @@ AI agents are powerful but amnesiac. On every new task:
- They have no link between specs and code — drift is invisible
- File-by-file navigation often burns **15,000–50,000 tokens** per orientation pass, before a single line of useful code is written

spec-gen closes this loop. Run a full analysis once, then keep the graph incrementally updated during development. Wire two files into your agent's context — every subsequent session starts informed.
spec-gen closes this loop. Run a full analysis once, then keep the graph incrementally updated as the codebase evolves. Even greenfield projects become cognitively "brownfield" after only a few agent sessions — architectural context fragments, decisions disappear, and agents repeatedly reconstruct the same understanding from scratch.

spec-gen persists that context continuously: structure, specs, decisions, drift state, and graph relationships remain queryable across sessions.

---

@@ -29,7 +31,7 @@ Three layers, each usable independently:
| **2. Spec Layer** | LLM-generated living specs, ADRs, drift detection, decision gates | For generation |
| **3. Agent Runtime** | 45 MCP tools — `orient()`, semantic search, graph expansion | No |

You can use layer 1 alone to give agents structural context. Add layer 2 for spec coverage. Layer 3 is always-on once `spec-gen mcp` is running.
You can use layer 1 alone to give agents structural context. Add layer 2 for semantic intent and architectural governance through OpenSpec-compatible living specifications. Layer 3 keeps that context continuously accessible through graph-native MCP tools once `spec-gen mcp` is running.

---

@@ -43,6 +45,7 @@ You can use layer 1 alone to give agents structural context. Add layer 2 for spe
| Offline structural analysis | ❌ | ❌ | ✓ |
| Token-efficient orient() | ❌ | ❌ | ✓ ~1–3k vs 15–50k tokens |
| Living spec generation | ❌ | ❌ | ✓ |
| Persistent cross-session architectural memory | ❌ | Partial | ✓ |

Traditional coding agents reconstruct architecture from repeated file reads every session. spec-gen persists it as a queryable graph.

@@ -62,7 +65,7 @@ spec-gen mcp # start MCP server

Then ask your agent: **`orient("add a new payment method")`**

That single call returns the relevant functions, their call neighbours, matching spec sections, and insertion-point candidates — in one round-trip instead of a dozen file reads, costing ~1,000 tokens instead of ~30,000.
That single call returns the relevant functions, their call neighbours, matching spec sections, and insertion-point candidates — preserving architectural continuity across sessions instead of forcing the agent to repeatedly reconstruct context from raw file reads. In practice, this often reduces orientation cost from ~30,000 exploratory tokens to ~1,000 targeted tokens.

**Full pipeline** (specs + decisions — optional and additive):

@@ -142,7 +145,7 @@ One graph query replaces most exploratory file reads. The agent knows exactly wh

**Analyze** (no API key)

Scans your codebase with pure static analysis. Builds a full call graph persisted to SQLite, runs label-propagation community detection to cluster tightly coupled functions, computes McCabe cyclomatic complexity for every function, and extracts DB schemas, HTTP routes, UI components, middleware chains, and environment variables. Outputs `.spec-gen/analysis/CODEBASE.md` — a ~600-token structural digest that compresses the equivalent of tens of thousands of exploratory tokens into a small, queryable summary.
Continuously maintains a structural representation of your codebase using pure static analysis. Builds a full call graph persisted to SQLite, runs label-propagation community detection to cluster tightly coupled functions, computes McCabe cyclomatic complexity for every function, and extracts DB schemas, HTTP routes, UI components, middleware chains, and environment variables. Outputs `.spec-gen/analysis/CODEBASE.md` — a ~600-token structural digest that compresses the equivalent of tens of thousands of exploratory tokens into a small, queryable summary.

With `--watch-auto`, the call graph updates incrementally on every file save: changed file and its direct callers are re-parsed and the graph is atomically swapped. Orient and BFS queries remain live between full analyze runs.

@@ -156,18 +159,21 @@ Compares git changes against spec mappings in milliseconds. Detects: Gap (code c

**MCP** (no API key)

45 graph-native tools exposed over stdio. `orient()` is the main entry point — one call replaces 10+ file reads. `detect_changes` risk-scores changed functions using call graph centrality × change type multiplier. See [docs/mcp-tools.md](docs/mcp-tools.md).
45 graph-native tools exposed over stdio. Together they act as a persistent architectural runtime for coding agents: orientation, graph traversal, semantic retrieval, drift awareness, decision context, and structural risk analysis.
`orient()` is the main entry point — one call replaces 10+ file reads. `detect_changes` risk-scores changed functions using call graph centrality × change type multiplier. See [docs/mcp-tools.md](docs/mcp-tools.md).

`orient()` runs in **~430µs p50** against a 15k-node codebase (TypeScript compiler, ~79k edges). Full benchmark results: [scripts/BENCHMARKS.md](scripts/BENCHMARKS.md).
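
To illustrate the "call graph centrality × change type multiplier" scoring mentioned for `detect_changes`, here is a minimal sketch. It is not the tool's implementation: the `ChangedFunction` shape, the centrality scale, and the multiplier values are all assumptions made for the example.

```typescript
// Hypothetical record for one changed function.
interface ChangedFunction {
  name: string;
  centrality: number;       // call graph centrality (scale is an assumption)
  changeMultiplier: number; // weight for the kind of change (value is an assumption)
}

// Risk is the product of the two factors, as described in the text.
function riskScore(fn: ChangedFunction): number {
  return fn.centrality * fn.changeMultiplier;
}

// Returns a new array sorted from highest to lowest risk.
function rankByRisk(fns: ChangedFunction[]): ChangedFunction[] {
  return [...fns].sort((a, b) => riskScore(b) - riskScore(a));
}
```

With this shape, a small edit to a highly central function can outrank a larger edit to a leaf function, which is the point of weighting by centrality.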

**Decisions** (API key for consolidation)

Agents call `record_decision` before writing code. Consolidation runs immediately in the background. At commit time, a pre-commit hook gates the commit until all verified decisions are reviewed and written back as requirements in `spec.md` files.
Agents call `record_decision` before writing code. Consolidation runs immediately in the background. At commit time, a pre-commit hook gates the commit until all verified decisions are reviewed and written back as requirements in `spec.md` files. Decisions are classified by scope (`local / component / cross-domain / system`); only `cross-domain` and `system` decisions produce ADR files, keeping the decision log signal-dense.
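
The scope rule above can be expressed as a one-line predicate. This is a sketch under the assumption that scope is a plain string field; `DecisionScope`, `ADR_SCOPES`, and `producesAdr` are hypothetical names, not identifiers from the codebase.

```typescript
// The four scope tiers named in the text.
type DecisionScope = "local" | "component" | "cross-domain" | "system";

// Scopes that produce an ADR file, per the description above.
const ADR_SCOPES: ReadonlySet<DecisionScope> = new Set<DecisionScope>([
  "cross-domain",
  "system",
]);

// True when a decision of this scope should produce an ADR file.
function producesAdr(scope: DecisionScope): boolean {
  return ADR_SCOPES.has(scope);
}
```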

---

## Architecture

OpenSpec provides semantic intent and workflow structure. spec-gen maintains the evolving implementation as a continuously queryable architectural graph for agents.

```
Codebase
@@ -221,12 +227,13 @@ The graph and the OpenSpec spec layer are co-equal: the graph makes orientation
- **LLM spec quality varies**: generated specs reflect the model's understanding. Review sections covering complex business logic before treating them as authoritative.
- **Embedding is optional**: without an embedding endpoint, `orient` and `search_code` fall back to BM25 keyword search (still useful, less accurate for semantic queries).
- **Large monorepos**: `spec-gen analyze` on large codebases may take several minutes. Graph storage itself has no practical limit — the pipeline (AST parsing, symbol extraction) is the bottleneck.
- **`node:sqlite` experimental warning on Node 22**: Node.js 22 prints `ExperimentalWarning: SQLite is an experimental feature` to stderr. The warning is gone on Node 24+. Suppress on Node 22 with `NODE_NO_WARNINGS=1 spec-gen analyze`.
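
A version guard for the `node:sqlite` requirement might look like the sketch below. It is illustrative only: the function names are hypothetical, and the warning check flags Node 22 alone because that is the only version the text says prints the warning.

```typescript
// Parses "v22.5.1" or "22.5.1" into [major, minor].
function parseNodeVersion(version: string): [number, number] {
  const match = /^v?(\d+)\.(\d+)/.exec(version);
  if (!match) {
    throw new Error(`Unrecognized Node.js version: ${version}`);
  }
  return [Number(match[1]), Number(match[2])];
}

// node:sqlite is built in from Node.js 22.5 onward.
function supportsNodeSqlite(version: string): boolean {
  const [major, minor] = parseNodeVersion(version);
  return major > 22 || (major === 22 && minor >= 5);
}

// Per the limitation above: Node 22 prints the ExperimentalWarning.
function printsSqliteWarning(version: string): boolean {
  const [major] = parseNodeVersion(version);
  return major === 22 && supportsNodeSqlite(version);
}
```

At startup, a CLI could pass `process.versions.node` to `supportsNodeSqlite` and exit with a clear message rather than failing on the import.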

---

## Requirements

- Node.js 20+
- Node.js 22.5+
- API key for `generate`, `verify`, and `drift --use-llm`:
```bash
export ANTHROPIC_API_KEY=sk-ant-... # default provider
@@ -245,7 +252,7 @@ The graph and the OpenSpec spec layer are co-equal: the graph makes orientation
```bash
npm install
npm run build
npm test # 2580+ unit tests
npm test # 2660+ unit tests
npm run typecheck
```

30 changes: 30 additions & 0 deletions docs/agent-setup.md
@@ -98,6 +98,36 @@ Wire the generated digest into your agent's context:
`search_code` · `suggest_insertion_points` · `get_spec <domain>` · `search_specs` · `analyze_impact` · `get_function_body` · `get_function_skeleton`
```

**Claude Code — MCP config (token-efficient two-server setup)**

MCP clients load all tool schemas at session start. With 45 tools, this costs ~8–77k tokens before any work begins. Claude Code supports `alwaysLoad: false` (deferred, default) — tools load only when the agent searches for them via Tool Search.

The recommended setup uses two server entries: one always-visible core server and one deferred full server:

```json
{
  "mcpServers": {
    "spec-gen-core": {
      "type": "stdio",
      "command": "spec-gen",
      "args": ["mcp", "--minimal"],
      "alwaysLoad": true
    },
    "spec-gen": {
      "type": "stdio",
      "command": "spec-gen",
      "args": ["mcp"],
      "alwaysLoad": false
    }
  }
}
```

- **`spec-gen-core`** exposes 5 tools always visible in context (~500 tokens): `orient`, `search_code`, `record_decision`, `detect_changes`, `check_spec_drift`. These are the tools most likely to be called at session start.
- **`spec-gen`** exposes all 45 tools deferred — loaded on demand when the agent uses Tool Search (e.g. "find tool for BFS graph traversal").

If you only need one server entry, use `alwaysLoad: false` (the default) with the standard `spec-gen mcp` command — all tools are deferred and searchable via Tool Search.

**Cline / Roo Code / Kilocode** — create `.clinerules/spec-gen.md`:

```markdown
11 changes: 11 additions & 0 deletions docs/ci-cd.md
@@ -25,6 +25,17 @@ spec-gen setup --tools claude # Install (also installs Claude Code skill
spec-gen decisions --uninstall-hook # Remove decisions hook only
```

When the gate blocks, the JSON output includes a `reason` field:

| Reason | Meaning | Action |
|--------|---------|--------|
| `verified` | Decisions consolidated and verified — await human review | Present to user, call `approve_decision` / `reject_decision`, then `--sync` |
| `approved_not_synced` | Decisions approved but not written to specs yet | Run `spec-gen decisions --sync`, retry commit |
| `drafts_pending_consolidation` | Drafts recorded but consolidation never ran | Run `spec-gen decisions --consolidate --gate` |
| `no_decisions_recorded` | Source files staged but no decisions recorded | Run `spec-gen decisions --consolidate --gate` for fallback extraction |

The gate uses a sentinel file (`.git/SPEC_GEN_GATE_RAN`) written by the pre-commit hook and checked by the post-commit hook. If a commit bypasses the gate via `--no-verify`, the post-commit hook detects the missing sentinel and logs a warning.
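
The sentinel handshake can be sketched as follows. This is an assumption-laden illustration rather than the hook's real code: the function names are hypothetical, and deleting the sentinel after each check is an assumption made here so that a later `--no-verify` commit is still detected.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// File name taken from the text; it lives inside the .git directory.
const SENTINEL = "SPEC_GEN_GATE_RAN";

// Pre-commit side: record that the gate ran for this commit.
function markGateRan(gitDir: string): void {
  fs.writeFileSync(path.join(gitDir, SENTINEL), new Date().toISOString());
}

// Post-commit side: returns true if the gate ran, false if it was bypassed.
// Assumption: the sentinel is removed once checked.
function consumeGateSentinel(gitDir: string): boolean {
  const file = path.join(gitDir, SENTINEL);
  if (!fs.existsSync(file)) {
    return false; // no sentinel: the pre-commit hook did not run
  }
  fs.rmSync(file);
  return true;
}
```

A post-commit hook built on this would log a warning whenever `consumeGateSentinel` returns `false`.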

**How they relate**: they address different failure modes and do not substitute for each other.

The decisions gate asks: *"has this architectural choice been reviewed by a human?"* It operates on decisions recorded during development — it has no knowledge of which spec files cover which source files.
7 changes: 5 additions & 2 deletions docs/cli-reference.md
@@ -140,12 +140,15 @@ spec-gen setup # workflow skills
spec-gen decisions [options]
--list # List decisions, optionally filtered by --status
--status <status> # Filter by status: draft, consolidated, verified, approved, synced, phantom
--approve <id> # Approve a decision by ID
# Note: synced/rejected/phantom are purged from store after --sync
--approve <id> # Approve a decision by ID (blocked if already synced)
--reject <id> # Reject a decision by ID
--reason <text> # Rejection reason (used with --reject)
--sync # Write approved decisions into specs and ADRs
--sync # Write approved decisions into specs and ADRs, then purge inactive entries
--dry-run # Preview sync without writing files
--gate # Run commit gate check (reads pending.json, no LLM — used by pre-commit hook)
# Gate reason codes: verified | approved_not_synced |
# drafts_pending_consolidation | no_decisions_recorded
--consolidate # Manually trigger LLM consolidation + diff verification of drafts
--json # Machine-readable output
--uninstall-hook # Remove decisions pre-commit hook (install via: spec-gen setup --tools claude)