feat(agentic-org): git-as-database-and-event-store + observe.ts keystone + constitution gate + metrics/review board#6071
…frontmatter docs Instantiate operator design ideas 2/5/6 as the observe.ts keystone and document ideas 1/3/4/7/8 against existing substrate (verify-existing-substrate first; reuse not reinvent). Code (ideas 2,5,6): - packages/application/src/observe.ts: explicit DUs (RunScope, RunLifecyclePhase, ObserveResult = Result<T,TFeedback> as a two-variant DU, ComposerSelection, DecideResult). Pure observe() readout = current state + legal options at varying scopes via an explicit phase->options table + deterministic rules (visibility of which rules ran is first-class). EphemeralComposerPort is memoryless by contract; decide() rejects any selection outside the readout so the composer cannot escape the rules. ZetaIdDecimal branded run-id type (ideas 7,8 seam). - 8 new tests; full package suite 271 green; typecheck clean for the new files (remaining 8 errors are pre-existing @nats-io missing-dep in apps/workers, untouched). Docs (ideas 1,3,4,7,8): - OBSERVE_COMPOSER_AND_RUN_STATE.md: keystone design + >=3-agent constitution ratification gate (composes with governance + multi-oracle BFT). - GIT_COCKROACH_SYNC_AND_ZETAID_ADDRESSING.md: reuse existing tri-language ZetaId as the git-as-db decimal index; collision policy; generic bidirectional converter. - DOC_FRONTMATTER_CONVENTION.md: pointer-graph frontmatter schema; adopted by the two docs above; README wired. All 17 frontmatter pointers verified resolving. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ZetaId CRDT Operator vision 2026-05-29: git is the database AND the event store. A markdown file is a row, frontmatter is the SQL-derived typed schema/columns + fk graph edges, events are ZetaId-keyed files that merge conflict-free as a G-Set CRDT, state is a timestamp-ordered fold, CockroachDB is a rebuildable query index. New package packages/frontmatter-db (26 tests; full suite 297 green; typecheck clean for new files): - schema.ts: ColumnType DU (zeta_id/text/int/bool/timestamp/enum/fk/fk_array) with payload-bearing variants explicit (enum.values, fk.references); TableSchema, FrontmatterRow, edge/pk helpers. - sql-to-schema.ts: CREATE TABLE -> TableSchema (PRIMARY KEY->zeta_id, CHECK IN->enum, REFERENCES->fk, TYPE[] REFERENCES->fk_array, NOT NULL->required); explicit feedback on non-DDL (Result<T,TFeedback> as two-variant DU). - event.ts + crdt-log.ts: ZetaId-decimal event records; timestamp read from the id's 48-bit field; G-Set log keyed by unique id; mergeLogs union proven commutative/associative/idempotent (the CRDT join laws => conflict-free merge). - project.ts: deterministic (timestamp,id)-ordered fold to rows; upsert=LWW, retract=tombstone (Z-set/retraction-native); project(merge(a,b))==project(merge(b,a)) convergence proven. - validate.ts: row vs schema (enum range, fk shape, required, unknown-column). - traverse.ts: fk/fk_array columns as graph edges; neighbors() resolves against a ZetaId-keyed store (same mechanism as the doc composes_with graph). Docs: - Rewrote GIT_COCKROACH_SYNC_AND_ZETAID_ADDRESSING.md to the frontmatter-native + event-store-CRDT model (supersedes the prior JSON-per-aggregate draft); status v0; code_anchors point at the tested files; notes git's native object-DB shape (Linus) and why ZetaId (stable/time-ordered) is the key rather than a content SHA. - DOC_FRONTMATTER_CONVENTION.md: frontmatter's two unified roles (doc-graph + db-row/schema) share one traversal mechanism. - README wired. All frontmatter pointers verified resolving.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
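The G-Set log merge and the (timestamp, id)-ordered projection described above can be sketched in TypeScript. This is a minimal sketch under stated assumptions, not the frontmatter-db source: the event shape, the `logOf` helper, and whole-row last-writer-wins replacement are assumptions, and the timestamp is carried as an explicit field instead of being decoded from the ZetaId's 48-bit field.

```typescript
// Hypothetical event shape; the real records are keyed by a ZetaId decimal string.
type EventId = string;

type FrontmatterEvent =
  | { kind: "upsert"; id: EventId; timestamp: number; rowId: string; fields: Record<string, string> }
  | { kind: "retract"; id: EventId; timestamp: number; rowId: string };

type EventLog = ReadonlyMap<EventId, FrontmatterEvent>;

// Illustrative helper: build a log keyed by event id.
function logOf(...events: FrontmatterEvent[]): EventLog {
  return new Map(events.map((e): [EventId, FrontmatterEvent] => [e.id, e]));
}

// G-Set join: union keyed by event id. Assumes an id always names the same
// immutable event, so the union is commutative, associative, and idempotent.
function mergeLogs(a: EventLog, b: EventLog): EventLog {
  const merged = new Map<EventId, FrontmatterEvent>(a);
  b.forEach((event, id) => merged.set(id, event));
  return merged;
}

// Total order on events: timestamp first, event id as the tiebreak.
function compareEvents(x: FrontmatterEvent, y: FrontmatterEvent): number {
  if (x.timestamp !== y.timestamp) return x.timestamp - y.timestamp;
  return x.id < y.id ? -1 : x.id > y.id ? 1 : 0;
}

// Deterministic fold to rows: upsert replaces the whole row (last writer wins),
// retract removes the row from the projection (tombstone).
function project(log: EventLog): Map<string, Record<string, string>> {
  const rows = new Map<string, Record<string, string>>();
  for (const event of Array.from(log.values()).sort(compareEvents)) {
    if (event.kind === "upsert") rows.set(event.rowId, { ...event.fields });
    else rows.delete(event.rowId);
  }
  return rows;
}
```

Because `project` sorts before folding, merge order cannot change the result; that is the `project(merge(a,b)) == project(merge(b,a))` property in miniature.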
…oach sync, >=3-agent constitution gate Three deliverables for git-as-database-and-event-store, built by an isolated 3-agent workflow then integration-gated inline (full suite 318 green, up from 291; real tsc clean for all new files; only the pre-existing 8 @nats-io missing-dep errors in apps/workers remain). frontmatter-db (new files): - frontmatter-codec.ts: self-contained YAML-frontmatter parse/serialize for our FrontmatterValue set (string/number/boolean/string[]) + markdown body. Lossless round-trip: number-looking strings quoted so they parse back as strings; arrays as [a, b, c]; explicit ParseResult DU (missing_frontmatter / unterminated_frontmatter / malformed_line). rowToDocument/documentToRow bridge to FrontmatterRow. - schema-to-sql.ts: emitCreateTable(schema), the inverse of sql-to-schema; round-trip verified (parseCreateTable(emitCreateTable(s)) reconstructs the columns). - sync.ts: the generic git<->cockroach loop as pure functions over injected ports (GitEventSource/IndexRowSink/IndexRowSource/GitEventSink/IdGenerator). syncGitToIndex folds the event log via project() and upserts rows + tombstone-deletes ids no longer projected; syncIndexToGit emits one Upsert event per changed row (row_missing_id feedback when a row lacks its pk). Explicit SyncDirection + Result-as-DU. governance (new files): - constitution-gate.ts: evaluateConstitutionRatification, the >=3-agent gate as a pure function. State precedence is explicit: any objection -> Rejected; else distinct agree-agents >= quorum (DEFAULT_CONSTITUTION_QUORUM=3) -> Ratified; else >=1 agreement -> Gathering; else Proposed. Distinct-agentId set means one agent agreeing twice counts once (no self-amplification). Self-contained (no vote-tally dependency; that module does not exist in the repo). Wiring + docs: exported all three from their package index.ts (agents were isolated from barrels to avoid races).
Updated GIT_COCKROACH_..._ADDRESSING.md (Layer 5 + Status now reflect the built codec/emitter/sync + code_anchors) and OBSERVE_COMPOSER_AND_RUN_STATE.md (constitution gate now implemented, not design; code_anchor added). All frontmatter pointers verified resolving. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
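A sketch of the constitution-gate precedence described above, assuming a simple `{ agentId, stance }` vote shape. The vote shape is hypothetical and the real `evaluateConstitutionRatification` signature is not shown in this PR, so the sketch uses a different function name; only `DEFAULT_CONSTITUTION_QUORUM = 3` and the four state names come from the commit.

```typescript
// Hypothetical vote shape; the real governance types may differ.
type ConstitutionVote = { agentId: string; stance: "agree" | "object" };
type RatificationState = "Proposed" | "Gathering" | "Ratified" | "Rejected";

const DEFAULT_CONSTITUTION_QUORUM = 3;

function evaluateRatificationSketch(
  votes: readonly ConstitutionVote[],
  quorum: number = DEFAULT_CONSTITUTION_QUORUM,
): RatificationState {
  // Precedence 1: any objection rejects, regardless of how many agree.
  if (votes.some((v) => v.stance === "object")) return "Rejected";
  // Distinct agent ids: one agent agreeing twice counts once.
  const agreeing = new Set(votes.filter((v) => v.stance === "agree").map((v) => v.agentId));
  if (agreeing.size >= quorum) return "Ratified";
  if (agreeing.size >= 1) return "Gathering";
  return "Proposed";
}
```

Checking objections before counting agreements is what makes the precedence explicit: no number of agreeing agents can outvote a single objection.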
…antitative metrics + 3-agent review board + MCP tool interface Builds the real I/O edges behind the sync ports and the two-layer metrics system. Full suite 346 green (was 318); real tsc clean for all new files; pre-existing 8 @nats-io errors in apps/workers unchanged. frontmatter-db — adapters + reconcile (real edges behind the tested pure core): - event-codec.ts: FrontmatterEvent <-> markdown file. Event metadata under reserved $-prefixed frontmatter keys so it can't collide with field columns; reuses the row codec so quoting/round-trip rules are identical. - git-fs-adapter.ts: filesystem-backed GitEventSource+GitEventSink over events/<table>/<ZetaIdDecimal>.md. async load() snapshots into memory so the sync ports stay synchronous; appends buffer; async flush() writes. Testable via an injected EventFileSystem (no node:fs dependency in the core). - cockroach-row-sink.ts: in-memory IndexRowSink+IndexRowSource reference impl (the rebuildable index) with change-tracking; SQL host is a // TODO. - reconcile-worker.ts: runOnce() cycle mirroring worker-host.ts with lane-tagged failures + explicit status DU. Ordering bug caught by test and fixed: index->git runs BEFORE git->index so a row written only to the index this cycle becomes an event before the projection diff — otherwise git (canonical) would tombstone-delete it. 13 tests. metrics (new package) — operator's two-layer idea 4: - code-metrics.ts: quantitative "coverage for structure" — longest function / longest class (god-object detection) / file length / max nesting, each breach an explicit MetricFinding DU on metric+severity. - review-board.ts: the qualitative >=3-agent board. A CandidateFinding is adopted only when >= quorum DISTINCT reviewers agree (one agent voting thrice counts once); quorum-agree AND quorum-disagree -> Contested (escalate); < quorum reviewers -> feedback. 
Same multi-oracle agreement shape as the constitution gate, applied to review findings (restated, not cross-imported, per package boundary). Reviewers vote along correctness/solid/architecture/perf/testing. - mcp-tools.ts: MCP tool INTERFACE only — METRICS_TOOL_DESCRIPTORS + dispatchMetricsTool(name,args) pure router returning an explicit MetricsToolResult DU. Server hosting is a // TODO(mcp-host) per the operator. 15 tests. Docs: new METRICS_AND_REVIEW_BOARD.md (status v0); GIT_COCKROACH_..._ADDRESSING.md updated (adapters/reconcile now built + ordering rationale + new code_anchors); README wired; all frontmatter pointers verified resolving. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
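The review-board adoption rule can be sketched the same way. Assumptions: the vote shape, the `NotAdopted` name for findings that do not reach quorum, and board-level convening (fewer than quorum distinct reviewers across all votes yields feedback) are illustrative choices, not the metrics package's API.

```typescript
// Hypothetical shapes; the real metrics package types are not reproduced here.
type Stance = "agree" | "disagree";
type ReviewVote = { reviewerId: string; findingId: string; stance: Stance };
type FindingDecision = "Adopted" | "Contested" | "NotAdopted";

type BoardResult =
  | { ok: true; decisions: Map<string, FindingDecision> }
  | { ok: false; feedback: "insufficient_reviewers"; reviewers: number; quorum: number };

function sketchReviewBoard(
  findingIds: readonly string[],
  votes: readonly ReviewVote[],
  quorum = 3,
): BoardResult {
  // Fewer than quorum distinct reviewers: the board cannot convene.
  const reviewers = new Set(votes.map((v) => v.reviewerId)).size;
  if (reviewers < quorum) return { ok: false, feedback: "insufficient_reviewers", reviewers, quorum };

  const decisions = new Map<string, FindingDecision>();
  for (const findingId of findingIds) {
    // Count DISTINCT reviewers per stance, so repeated votes add nothing.
    const distinct = (stance: Stance): number =>
      new Set(
        votes.filter((v) => v.findingId === findingId && v.stance === stance).map((v) => v.reviewerId),
      ).size;
    const agree = distinct("agree");
    const disagree = distinct("disagree");
    if (agree >= quorum && disagree >= quorum) decisions.set(findingId, "Contested"); // escalate
    else if (agree >= quorum) decisions.set(findingId, "Adopted");
    else decisions.set(findingId, "NotAdopted");
  }
  return { ok: true, decisions };
}
```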
…riority #2) The single authoritative mapping of work-item state so Work OS / V0 enum / UI column / event name / gate owner / observe.ts no longer diverge. North Star names this as the gate on adding more commands, so it is the first slice. packages/domain/src/state-reconciliation.ts (14 tests; full suite 360 green; tsc clean): - StateReconciliationRow per real WorkItemState (8), held as a Record<WorkItemState, Row> so the mapping is COMPILE-EXHAUSTIVE (adding a state is a type error until a row is supplied — OCP). eventName is the real AgenticEventType "work_item.state_changed"; gateOwner is an explicit GateOwner DU (none/eng-manager/code-reviewer/qa-reviewer/release-manager). - RUN_PHASE_FOR_STATE: binds each WorkItemState to its observe.ts RunLifecyclePhase string (held literally so domain does not depend on application; test asserts coverage). The seam slice 4 uses to drive observe() from real work-item state. - typeSpecificRulesFor(type): explicit TypeSpecificRule DU overlay for the defect rules (no-skip-intake, triage-evidence, assigned-engineer+schedule), mirroring assertDefectTransitionRequirements; generic transitions stay in the state machine. Built + reviewed through the 3-lens review board (correctness/SOLID/architecture). Adopted finding S-1: the reconciliation set was a plain array (documented but not compile-enforced); converted to a keyed Record so exhaustiveness is compiler-checked. Doc: STATE_RECONCILIATION.md (status v0); README wired; pointers verified. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
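The compile-exhaustiveness argument for keying the table by state can be shown with a reduced example. The three states are a hypothetical subset (the real `WorkItemState` has eight), and the UI labels and per-state gate owners are invented for illustration; the phase strings follow the ready/in_progress/done mapping given later in this PR.

```typescript
// Hypothetical three-state subset; the real WorkItemState has eight members.
type WorkItemState = "ready" | "in_progress" | "done";
type GateOwner = "none" | "eng-manager" | "code-reviewer" | "qa-reviewer" | "release-manager";

interface StateReconciliationRow {
  readonly uiColumn: string; // illustrative UI labels
  readonly eventName: "work_item.state_changed";
  readonly gateOwner: GateOwner; // illustrative owner assignments
}

// Record<WorkItemState, ...> is compile-exhaustive: adding a state to the union
// without adding a row here is a type error (missing property).
const STATE_RECONCILIATION: Record<WorkItemState, StateReconciliationRow> = {
  ready: { uiColumn: "Ready", eventName: "work_item.state_changed", gateOwner: "none" },
  in_progress: { uiColumn: "In Progress", eventName: "work_item.state_changed", gateOwner: "code-reviewer" },
  done: { uiColumn: "Done", eventName: "work_item.state_changed", gateOwner: "release-manager" },
};

// Phase names are plain strings so domain code does not depend on application.
const RUN_PHASE_FOR_STATE: Record<WorkItemState, string> = {
  ready: "composing",
  in_progress: "executing",
  done: "completed",
};
```

A plain array of rows cannot give this guarantee, which is the substance of adopted finding S-1.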
… Star priority #4) Re-scoped from the planned "decide_gate": grounding showed gate decisions ALREADY exist (record_quality_gate_evaluation + QualityGateOutcome approved/rejected/changes_requested/waived), so decide_gate would have duplicated substrate and re-created the divergence slice 1 removed. The real North-Star-#4 gap is that the domain declares 6 SupervisorTriageActionType values but the handler implemented only OpenWorkItem and rejected the rest as UnsupportedActionType. packages/application/src/triage-action-resolver.ts (7 tests; full suite 367 green; tsc clean): - TriageActionRequest: explicit DU per action with its action-specific inputs. - resolveTriageAction(request): pure classifier -> ResolvedTriageAction DU. Adds the two no-new-migration actions on top of OpenWorkItem: AnswerDirectly (answer in place; feedback if blank) and EscalateToNextSupervisor (route up the chain; feedback if target/reason missing). The three actions that need security/schedule/platform substrate (RequestSecurityReview, ScheduleDiscussion, RouteToInternalPlatform) resolve to an explicit Deferred outcome: a VISIBLE gap, not a silent UnsupportedActionType rejection, per the North Star convergence discipline. Built + reviewed through the 3-lens board. Adopted finding S2-1: collapsed a redundant default-branch duplicate return into a single return + a defensive assertDeferredAction guard (DEFERRED_ACTIONS as the runtime witness of the type-narrowed set). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
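A sketch of the resolver's shape, assuming hypothetical input fields (`title`, `answer`, `targetSupervisorId`, `reason`) and feedback reason strings. It shows the two rule-resolved actions, the explicit `Deferred` outcome for the three substrate-dependent ones, and an exhaustive switch, so adding a seventh action type to the request union fails to compile until it is handled.

```typescript
// Hypothetical per-action inputs; the six action names come from the commit.
type TriageActionRequest =
  | { type: "OpenWorkItem"; title: string }
  | { type: "AnswerDirectly"; answer: string }
  | { type: "EscalateToNextSupervisor"; targetSupervisorId?: string; reason?: string }
  | { type: "RequestSecurityReview" }
  | { type: "ScheduleDiscussion" }
  | { type: "RouteToInternalPlatform" };

type DeferredActionType = "RequestSecurityReview" | "ScheduleDiscussion" | "RouteToInternalPlatform";

type ResolvedTriageAction =
  | { outcome: "OpenWorkItem"; title: string }
  | { outcome: "AnswerDirectly"; answer: string }
  | { outcome: "Escalate"; targetSupervisorId: string; reason: string }
  | { outcome: "Deferred"; action: DeferredActionType } // a visible gap, not a rejection
  | { outcome: "Feedback"; reason: string };

function resolveTriageActionSketch(request: TriageActionRequest): ResolvedTriageAction {
  switch (request.type) {
    case "OpenWorkItem":
      return { outcome: "OpenWorkItem", title: request.title };
    case "AnswerDirectly":
      return request.answer.trim() === ""
        ? { outcome: "Feedback", reason: "blank_answer" }
        : { outcome: "AnswerDirectly", answer: request.answer };
    case "EscalateToNextSupervisor": {
      const { targetSupervisorId, reason } = request;
      if (!targetSupervisorId || !reason) return { outcome: "Feedback", reason: "missing_target_or_reason" };
      return { outcome: "Escalate", targetSupervisorId, reason };
    }
    case "RequestSecurityReview":
    case "ScheduleDiscussion":
    case "RouteToInternalPlatform":
      // These need substrate that does not exist yet: surface them explicitly.
      return { outcome: "Deferred", action: request.type };
  }
}
```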
…v0 (North Star priority #5) packages/application/src/graph-projection.ts (5 tests; full suite 372 green; tsc clean): - projectOrganizationGraph(records) -> OrganizationGraph: typed GraphNode (work_item|discussion_anchor|decision) + GraphEdge (anchored_to|decided_in|follows_up) DUs. Edges derived from the real fk fields: DiscussionAnchor.workItemId, DecisionRecord.discussionAnchorId, DecisionRecord.followUpWorkItemIds[]. Nodes deduped by (kind,id). - decisionsForWorkItem(graph, id): the canonical North Star retrieval ("all decisions for this work item") via a two-hop traversal (work item <- anchors <- decisions). - neighborsByEdge: generic outgoing-by-edge-kind traversal helper. Design note (review finding A3-1): deliberately mirrors the fk-as-edge CONCEPT from frontmatter-db/traverse.ts rather than importing it; traverse.ts operates on FrontmatterRow/TableSchema (git-as-db rows), while these are domain records, so forcing the reuse would couple application->frontmatter-db and require row conversion for no gain. Built + reviewed through the 3-lens board (correctness/SOLID/architecture); approve, no code-change findings. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
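The fk-as-edge projection and the two-hop `decisionsForWorkItem` traversal can be sketched as follows. The record shapes are reduced to the fk fields named above, and edge direction (anchor to work item for `anchored_to`, decision to anchor for `decided_in`) is an assumption chosen to match the `work item <- anchors <- decisions` description; the function names carry a `Sketch` suffix because they are not the repo's implementation.

```typescript
// Hypothetical record shapes reduced to the fk fields named in the commit.
interface DiscussionAnchor { id: string; workItemId: string }
interface DecisionRecord { id: string; discussionAnchorId: string; followUpWorkItemIds: readonly string[] }

type GraphNodeKind = "work_item" | "discussion_anchor" | "decision";
type GraphEdgeKind = "anchored_to" | "decided_in" | "follows_up";
interface GraphNode { kind: GraphNodeKind; id: string }
interface GraphEdge { kind: GraphEdgeKind; from: string; to: string }
interface OrganizationGraph { nodes: GraphNode[]; edges: GraphEdge[] }

function projectGraphSketch(anchors: readonly DiscussionAnchor[], decisions: readonly DecisionRecord[]): OrganizationGraph {
  const nodes = new Map<string, GraphNode>(); // deduped by (kind, id)
  const addNode = (kind: GraphNodeKind, id: string): void => {
    nodes.set(`${kind}:${id}`, { kind, id });
  };
  const edges: GraphEdge[] = [];
  for (const a of anchors) {
    addNode("discussion_anchor", a.id);
    addNode("work_item", a.workItemId);
    edges.push({ kind: "anchored_to", from: a.id, to: a.workItemId });
  }
  for (const d of decisions) {
    addNode("decision", d.id);
    edges.push({ kind: "decided_in", from: d.id, to: d.discussionAnchorId });
    for (const w of d.followUpWorkItemIds) {
      addNode("work_item", w);
      edges.push({ kind: "follows_up", from: d.id, to: w });
    }
  }
  return { nodes: Array.from(nodes.values()), edges };
}

// Two hops backwards: work item <- anchors <- decisions.
function decisionsForWorkItemSketch(graph: OrganizationGraph, workItemId: string): string[] {
  const anchorIds = new Set(
    graph.edges.filter((e) => e.kind === "anchored_to" && e.to === workItemId).map((e) => e.from),
  );
  const decisionIds = graph.edges.filter((e) => e.kind === "decided_in" && anchorIds.has(e.to)).map((e) => e.from);
  return Array.from(new Set(decisionIds));
}
```

Note that a `follows_up` edge into a work item does not make that decision one of the item's decisions; only decisions taken in the item's own discussion anchors qualify.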
The seam slice 1 was built for: turns a real WorkItem into the RunSnapshot that observe() is pure over, using RUN_PHASE_FOR_STATE to map WorkItemState -> observe RunLifecyclePhase, then runs observe() to get the legal next options. Proves the keystone end to end on real domain records, not synthetic snapshots. packages/application/src/observe-work-item.ts (5 tests; full suite 377 green; tsc clean): - snapshotForWorkItem(workItem, facts): pure WorkItem -> RunSnapshot. Narrows the domain's phase string to the RunLifecyclePhase DU at the boundary (domain holds phase strings literally so it doesn't depend on application); explicit phase_unmapped feedback variant keeps the seam honest. - observeWorkItem(workItem, facts, deps): compose snapshot + observe(); clock injected. - ready->composing, in_progress->executing (options include submit_evidence), done->completed (terminal -> observe feedback); gate/evidence facts plumb through. Built + reviewed through the 3-lens board. Caught + fixed during build: a hardcoded new Date(0) clock -> injected ObserveWorkItemDeps.clock (SRP/testability). Approve. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
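The boundary narrowing in `snapshotForWorkItem` can be illustrated with a type guard. This assumes a three-member `RunLifecyclePhase` subset and a hypothetical `WorkItemLike` shape; the point is that an unmapped state becomes an explicit `phase_unmapped` result rather than an unchecked cast.

```typescript
// Hypothetical three-member subset of the RunLifecyclePhase DU from observe.ts.
type RunLifecyclePhase = "composing" | "executing" | "completed";
const RUN_LIFECYCLE_PHASES = new Set<string>(["composing", "executing", "completed"]);

function isRunLifecyclePhase(value: string): value is RunLifecyclePhase {
  return RUN_LIFECYCLE_PHASES.has(value);
}

// Hypothetical minimal work item shape.
interface WorkItemLike { id: string; state: string }

type SnapshotResult =
  | { ok: true; snapshot: { workItemId: string; phase: RunLifecyclePhase } }
  | { ok: false; feedback: "phase_unmapped"; state: string };

// Narrow the domain's plain phase string to the DU at the boundary.
function snapshotForWorkItemSketch(
  workItem: WorkItemLike,
  phaseForState: Readonly<Record<string, string>>,
): SnapshotResult {
  const phase: string | undefined = phaseForState[workItem.state];
  if (phase === undefined || !isRunLifecyclePhase(phase)) {
    return { ok: false, feedback: "phase_unmapped", state: workItem.state };
  }
  return { ok: true, snapshot: { workItemId: workItem.id, phase } };
}
```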
Closes the loop: the qualitative >=3-agent review board (packages/metrics) becomes
a real quality gate instead of a standalone function. A review-class gate decision
is the board's adopted findings, not one reviewer's call.
packages/application/src/review-gate.ts (5 tests; full suite 382 green; tsc clean):
- evaluateReviewGate({findings, votes, quorum?}): runs evaluateReviewBoard then maps
the board outcome to a domain QualityGateOutcome recommendation:
no finding reached quorum -> Approved
adopted major/blocking finding -> Rejected
adopted minor/info only -> ChangesRequested
< quorum reviewers -> feedback (board could not convene)
Waived is intentionally not produced (waiver is a human authority decision).
- Boundary: lives in application (composes domain QualityGateOutcome + metrics board),
keeping metrics dependency-free and domain unaware of the board.
Built + reviewed through the 3-lens board. Adopted finding S5-1: dropped a
speculative unused FindingDecisionState re-export (YAGNI; callers import from metrics).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
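The board-to-gate mapping listed above, as a minimal sketch. The `BoardOutcome` input shape and severity names are assumptions standing in for the real `evaluateReviewBoard` result; `Waived` is left out of the result type, so the compiler guarantees this function never produces it.

```typescript
// Hypothetical shapes; severity names follow the mapping above.
type Severity = "info" | "minor" | "major" | "blocking";
// Waived is deliberately absent: a waiver is a human authority decision.
type GateRecommendation = "Approved" | "Rejected" | "ChangesRequested";

type BoardOutcome =
  | { convened: false }
  | { convened: true; adoptedSeverities: readonly Severity[] };

type ReviewGateResult =
  | { ok: true; recommendation: GateRecommendation }
  | { ok: false; feedback: "board_not_convened" };

function mapBoardToGateSketch(board: BoardOutcome): ReviewGateResult {
  if (board.convened === false) return { ok: false, feedback: "board_not_convened" };
  const adopted = board.adoptedSeverities;
  // No finding reached quorum.
  if (adopted.length === 0) return { ok: true, recommendation: "Approved" };
  // Any adopted major/blocking finding rejects.
  if (adopted.some((s) => s === "major" || s === "blocking")) return { ok: true, recommendation: "Rejected" };
  // Only minor/info findings were adopted.
  return { ok: true, recommendation: "ChangesRequested" };
}
```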
…h Star refresh) - Added YAML frontmatter status to all 30 previously-bare docs (design / index) per DOC_FRONTMATTER_CONVENTION.md; all 34 docs now carry status. Pointers verified. - North Star refresh: documented the substrate + slices landed this arc (frontmatter-db git-as-DB+event-store+CRDT+sync+reconcile; observe.ts keystone wired to real work-item state; constitution gate; metrics+review-board+review-gate; slice 1 reconciliation table; slice 2 triage actions; slice 3 graph projection), with honest addressed-vs-deferred status per North Star priority. Reviewed through the accuracy/North-Star lens. Finding D6-1 (honest scoping): capability-request drift (#1) was NOT blanket-edited — grounding showed the docs use the term in correct supervisor-chain context; the canonical framing already lives in the North Star. Fabricating edits to non-broken docs would reduce accuracy. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Forward-signal (Otto-CLI background triage — not pushing to this draft branch; it's yours, Max): The single failing required check is Cause: line-wrap artifact, not a real heading. Lines 48–49 read: The paragraph wrapped so Fix options (prose is yours; pick whichever reads best):
This is the only blocker on the gate. Holler if you'd rather I land the one-liner for you; I left it untouched since it's a draft.
Pushed a one-line fix for the sole blocking check ( Help-not-shame per - Otto-CLI (background worker)
…st lane, tsc 0 errors The code imported @nats-io/jetstream + @nats-io/transport-node (nats.js v3 modular packages) but they were never declared/installed, so every apps/workers test crashed at module-load (a re-exported nats adapter in apps/workers/src/index.ts), and tsc reported 8 module-resolution errors. pg is the runtime-injected cockroach driver (PgCockroachDriverModuleName). Added to package.json + npm install: @nats-io/jetstream ^3.4.0 (current latest, per npmjs 2026-05), @nats-io/transport-node ^3.4.0, pg ^8.13.1. node_modules is gitignored; package-lock.json committed. Result: tsc 0 errors across the whole project (was 8); full suite 451 tests, 447 pass, 0 fail, 4 skipped (env-gated live Cockroach/NATS integration). The apps/workers lane (+69 tests) now executes. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…veness control plane) The operator's #1 tenet: enough determinism to drive the organization AND the agents to stay alive, with autonomy left to the agents. This is the deterministic control plane (the SchedulerWorker/TriggerWorker/LeaseReaper shape from ALWAYS_ON_ORCHESTRATION_RUNTIME, which had ZERO code before this). packages/keepalive (13 tests; tsc clean; full suite 464 green): - keepalive.ts: evaluateKeepAlive(snapshot) — PURE deterministic engine. Every tick emits a heartbeat (the org keeps proving it is alive even when idle); detects a flatlining org (age > deadline), stale agents (per-agent, no collapse), and expired leases (expiresAt <= now); converts each into an explicit KeepAliveAction DU. It NEVER decides what work agents do — ReassignStaleWork only FLAGS a stale agent's work for agent-decidable follow-up. Control plane = THAT motion happens; data plane (observe.ts + Hermes) = WHAT work. Boundary policy (> vs <=) pinned by tests + documented. - keepalive-lane.ts: createKeepAliveLane — the runOnce() loop (snapshot -> evaluate -> apply via injected sink). Source/sink failures are CAPTURED as lane failures, never thrown: the org heartbeat must not die because one apply failed. Mirrors the worker-host lane-failure discipline. TDD: tests written red first, then impl to green. Reviewed by an adversarial subagent (correctness/SOLID/North-Star). Adopted findings: F6 (major) replaced a stringly-typed `=== "flatlining"` magic value with OrgLiveness.Flatlining (the repo's IMPLICIT-NOT-EXPLICIT class error); F1+F2 added the age==deadline and lease-expires-now boundary tests + doc (off-by-one is the #1 control-plane bug). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
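A minimal sketch of the pure evaluation step with the boundary policy spelled out (`>` for flatlining, `<=` for lease expiry). The snapshot shape and the action names other than `ReassignStaleWork` are hypothetical, and the commit only pins `>` for the org deadline; using `>` for agent staleness as well is an assumption.

```typescript
// Hypothetical snapshot shape; times are epoch milliseconds.
interface KeepAliveSnapshot {
  now: number;
  lastOrgActivityAt: number;
  orgDeadlineMs: number;
  agentStaleAfterMs: number;
  agents: readonly { agentId: string; lastHeartbeatAt: number }[];
  leases: readonly { leaseId: string; expiresAt: number }[];
}

// Hypothetical action names, except ReassignStaleWork, which comes from the commit.
type KeepAliveAction =
  | { kind: "Heartbeat"; at: number }
  | { kind: "OrgFlatlining"; ageMs: number }
  | { kind: "ReassignStaleWork"; agentId: string; ageMs: number }
  | { kind: "ExpireLease"; leaseId: string };

// Pure: same snapshot in, same actions out. It only flags stale work; it never
// decides what any agent should do next.
function evaluateKeepAliveSketch(s: KeepAliveSnapshot): KeepAliveAction[] {
  // Every tick emits a heartbeat, even when the org is idle.
  const actions: KeepAliveAction[] = [{ kind: "Heartbeat", at: s.now }];

  const orgAge = s.now - s.lastOrgActivityAt;
  if (orgAge > s.orgDeadlineMs) actions.push({ kind: "OrgFlatlining", ageMs: orgAge }); // age == deadline is still alive

  // One action per stale agent; staleness is never collapsed into a single flag.
  for (const agent of s.agents) {
    const age = s.now - agent.lastHeartbeatAt;
    if (age > s.agentStaleAfterMs) actions.push({ kind: "ReassignStaleWork", agentId: agent.agentId, ageMs: age });
  }

  // A lease expiring exactly now counts as expired.
  for (const lease of s.leases) {
    if (lease.expiresAt <= s.now) actions.push({ kind: "ExpireLease", leaseId: lease.leaseId });
  }
  return actions;
}
```

The boundary tests (age exactly at the deadline, lease expiring exactly now) are the cases most worth pinning, since an off-by-one there changes whether the org is declared dead.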
…estration (the autonomous data plane)
The data plane the keep-alive control plane watches. TDD (red->green) + adversarial
subagent review at the checkpoint. tsc 0; full suite 480 (476 pass, 4 env-skipped).
packages/hermes (6 tests) — Hermes runtime port + in-process simulated adapter
(V0_EXECUTABLE_CONTRACT step 10 explicitly allows a simulated adapter). launchRun
binds {workItem, agent, session, hatAssignment, promptFlowRun}; runs emit heartbeats
(keep-alive reads these for staleness) and terminate Completed/Failed. Explicit
HermesRunState DU; Result-as-DU; terminal-state guard; clock+id injected. A real
k3s/bubblewrap session adapter implements the same port.
packages/memory (5 tests) — Hindsight memory port + in-process adapter. retain/recall/
reflect attributed by {agent, hat, project, workItem, run}; recall is project-SCOPED
(no cross-project leak); attribution is STICKY (original author preserved on recall by
another hat).
packages/application/orchestrate-run.ts (5 tests) — the composition where control plane
meets data plane: launch run -> recall scoped context -> heartbeat -> retain learned ->
complete with evidence. Sequences plumbing only; makes NO work-selection decision for
the agent (autonomy preserved). A completed run's heartbeat marks the agent Alive to
the keep-alive engine (proven by a test feeding the run into evaluateKeepAlive).
Adopted review findings: #2 (MAJOR) hermes getRun/heartbeat/complete returned shallow
copies leaking live binding/outcome refs -> added snapshotRun() deep copy (defensive
copy now real); #5 orchestrate now checks the heartbeat Result instead of swallowing it;
#8 OrchestrationFeedbackReason preserves the Hermes reason DU instead of widening to string.
Type-safety fix caught by tsc during build: requireRunning discriminated on the
"outcome" field which collides with HermesRun.outcome -> explicit { ok } tagged result.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
org-runtime.ts runOrgCycle: one cycle runs executive+director prioritization → RMO hat-supply voting → hat assignment+binding → the 7-gate pipeline (discovery → release) → binding lifecycle (warmup→active→expire→succession), emitting events attributed to actors at EVERY hierarchy level and persisting them via injected stores. Determinism picks the legal moves; agent choosers pick outcomes. 4 tests prove: a customer goal reaches Merged (all 7 gates), events at every level Executive Board→IC, bindings staffed + expiry + succession observed, and a rich attributed trace with all 7 event kinds. tsc 0, 613 tests. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… merged runOrgCycle ties every layer together in one run: Executive Board + C-suite + Director prioritization, RMO hat-supply voting, hat assignment + binding, the 7-gate pipeline (customer discovery → release), and the binding lifecycle (warmup → active → expire → succession). Events are attributed to actors at EVERY hierarchy level so the persisted OrgEvent trace proves the whole hierarchy is working. - monotonic recording clock so the trace (and snapshot fold) is exactly ordered even when one cycle emits dozens of events - order-independent snapshot fold: latest-state-per-subject is computed by max(occurredAt), correct whether the store returns rows ASC or DESC (the Cockroach store returns occurred_at DESC) + regression test - deploy/run-org-cycle.ts + deploy/observe-org.ts: run one cycle against in-cluster Cockroach and render the org snapshot In-cluster proof (agentic-org ns Cockroach): 71 org_events persisted; hierarchy activity executive_board=1 c_suite=3 director=1 manager=16 lead=5 ic=28; work item reaches merged through all 7 gates; team_lead binding observed warmup -> active -> expired -> succession_planned. 614 tests, 0 fail; tsc 0. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
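The two ordering fixes above can be sketched together: a monotonic clock that never repeats or goes backwards, and a latest-state fold that picks `max(occurredAt)` per subject so ASC and DESC reads agree. The row shape is hypothetical, and `makeMonotonicClock` and `latestStatePerSubject` are illustrative names, not the repo's.

```typescript
// Monotonic recording clock: never returns a value <= the previous one, so
// events emitted within the same wall-clock millisecond are still strictly ordered.
function makeMonotonicClock(now: () => number): () => number {
  let last = Number.NEGATIVE_INFINITY;
  return () => {
    const t = Math.max(now(), last + 1);
    last = t;
    return t;
  };
}

// Hypothetical event row reduced to what the fold needs.
interface OrgEventRow { subjectId: string; state: string; occurredAt: number }

// Latest state per subject by max(occurredAt): independent of whether the store
// returns rows ASC or DESC. Ties cannot occur when occurredAt came from one monotonic clock.
function latestStatePerSubject(rows: readonly OrgEventRow[]): Map<string, string> {
  const latest = new Map<string, OrgEventRow>();
  for (const row of rows) {
    const current = latest.get(row.subjectId);
    if (current === undefined || row.occurredAt > current.occurredAt) latest.set(row.subjectId, row);
  }
  return new Map(Array.from(latest.entries(), ([subject, row]): [string, string] => [subject, row.state]));
}
```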
…71 events, whole hierarchy to merged) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Otto background-worker, drive-by on the one remaining required-check failure ( Remaining blocker — 1 violation: It's a false positive — do NOT reword. Line 137 is mid-paragraph prose; the hard-wrap happens to put a Content-preserving fix (zero wording change — just move the wrap so -TTL, **RMO voting** on supply — every step readable from `agentic_org_org_events`
-+ the org-snapshot projection. The hierarchy walk must show activity at every
+TTL, **RMO voting** on supply — every step readable from `agentic_org_org_events` +
+the org-snapshot projection. The hierarchy walk must show activity at every
That clears the gate without altering rendered output. Your call on the exact reflow; flagging it so it's a one-touch fix in your current loop.
…ed retrieval, IT/Memory dept daily maintenance) Extracts the memory + memory-maintenance IDEA from the TPM-REFACTOR design (NOT its RaaS/Weaviate/ES/FalkorDB/Mongo stack, NOT its TPM architecture) and adapts it to our system: CockroachDB + in-cluster Ollama + the observe->decide kernel + the universal org_event trace + the hat/department org. - tier ladder mirrors our hierarchy: org -> department -> hat -> agent, plus a cross-cutting work/workflow tier; retrieval pulls the hat (+) agent (+) work union for a binding (the requested 'hat memory combines with actor memory') - retrieval weight = freshness x confidence x KPI-outcome x utility (+ optional Ollama semantic); a hard archive floor = 'drops to zero, never surfaces again' - KPI/outcome correlation reads our own pipeline (merged=success) -> confidence - the 'IT department' is the already-seeded memory_and_knowledge department; its daily runMemoryMaintenanceCycle is an org cycle: Stage A automated (decay/archive/reinforce), Stage B manual heuristic routed through a hat's chooseWithinLegal (demote/promote/conflict) -- good news auto-applies, bad news asks a hat; every action is one org_event - Cockroach tables (content hub + state satellite), MemoryPhase House-DU, phased build plan M0-M7 ending in a kind end-to-end proof, concept-mapping appendix, and a 'what this is NOT' section Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
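The retrieval weight and archive floor can be sketched as a product of factors. The exponential half-life form of freshness, the [0, 1] normalization, and the `WeightPolicy` fields are assumptions; the optional Ollama semantic term is omitted, and this sketch recomputes the floor on every call, whereas "never surfaces again" implies the archived state is persisted.

```typescript
// Hypothetical memory state; factors assumed normalized to [0, 1].
interface MemoryWeightInputs {
  memoryId: string;
  ageDays: number;
  confidence: number;
  kpiOutcome: number;
  utility: number;
}

interface WeightPolicy { halfLifeDays: number; archiveFloor: number }

// Assumed exponential decay: freshness halves every halfLifeDays.
function freshness(ageDays: number, halfLifeDays: number): number {
  return Math.pow(0.5, ageDays / halfLifeDays);
}

// weight = freshness x confidence x KPI-outcome x utility, forced to 0 below the floor.
function retrievalWeight(m: MemoryWeightInputs, policy: WeightPolicy): number {
  const w = freshness(m.ageDays, policy.halfLifeDays) * m.confidence * m.kpiOutcome * m.utility;
  return w < policy.archiveFloor ? 0 : w;
}

// Rank for retrieval: zero-weight (archived) memories never surface.
function rankForRetrieval(memories: readonly MemoryWeightInputs[], policy: WeightPolicy): string[] {
  return memories
    .map((m) => ({ id: m.memoryId, weight: retrievalWeight(m, policy) }))
    .filter((r) => r.weight > 0)
    .sort((a, b) => b.weight - a.weight)
    .map((r) => r.id);
}
```

Because the factors multiply, any one of them near zero (stale, low-confidence, tied to failed outcomes, or never cited) is enough to push a memory under the floor.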
…RG_SYSTEM_BUILD_BLUEPRINT.md
MD032/blanks-around-lists fired on a phantom list: line 137 began with
`+ the org-snapshot projection`, a soft-wrapped prose continuation of
line 136 ("...readable from \`agentic_org_org_events\`"). markdownlint
parses a line-leading `+ ` as an unordered-list bullet. There is no real
list — moving the `+` to the end of line 136 keeps rendered output
byte-identical (markdown collapses the soft wrap to a space) while removing
the line-leading marker. Sole failing required check on PR #6071; the other
six required checks (build-and-test x3, actionlint, semgrep, shellcheck)
are green.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Otto-CLI background-worker: unblocked the sole failing required check ( What was failing: Fix (commit State at fix time: the other six required checks were all green (build-and-test ×3, actionlint, semgrep, shellcheck); markdownlint was the only red. Verified locally with I did not arm auto-merge; the merge decision on your PR is yours. The commit is additive and trivially reversible ( 🤖 Generated with Claude Code
…egration (integrate, don't fork) §12 Reliability — remembering is structural, not behavioral: - retrieval + storage are observe->decide kernel INVARIANTS, not agent tools (the repo's goldfish-ontology principle: a tool you call when you remember you need it is useless once you've forgotten) - never-forget-retrieve: mandatory pre-turn injection; query is a pure fn - never-forget-store: required memoryCandidates output field + deterministic system extraction from org_events + reinforcement-by-citation - content-addressed memoryId (uuidv5) makes 'store every turn' idempotent -> store-everything + merge, not store-selectively + dedup-later - two-stage / two-modality retrieval (SQL prefilter + weight rerank; semantic recall (+) deterministic structural triggers); cross-turn dedup set; caches - bidirectional gates (anti-laundering + must-address) make ignoring memory as costly as fabricating it - crash-safe storage via durable NATS; per-hat MemoryContract as the seam §13 Working with Hindsight (vectorize-io/hindsight, MIT): - it's an external agent-memory engine (Retain/Recall/Reflect; vector+BM25+ graph+temporal recall + rerank/RRF; Postgres; Ollama; REST/SDK; no MCP) - decision: INTEGRATE, DON'T FORK. 
our Memory port already mirrors its API; Hindsight is the recall engine, our system is the governance/economics layer (tier-scoping, weight/decay/KPI, IT-dept maintenance, org_event trace) it deliberately lacks - seam = our existing Memory port: add createHindsightMemory() adapter; attribution<->metadata; scoped recall; degraded fallback to Cockroach adapter - extend by composition: Hindsight recall -> join our MemoryState -> our weight re-rank + archive floor (never patch its internals) - storage split: Hindsight's own Postgres (content+recall) vs our CockroachDB (state+weight+trace), joined by memoryId - escalation ladder: integrate -> upstream PR -> wrapper service -> hard fork (last resort; none needed today) - build phases M8 (reliability harness) + H1-H4 (Hindsight seam) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ts real API Gaps closed: 1. Injection-ledger table — concrete DDL (agentic_org_memory_injection): per-injection row (memory_id, work_item_id, hat/agent/run, weight_at_injection, cited) so KPI correlation (6), utility (4.2), and must-address gate (12.5) have their join; state counters derive from it, ledger is source of truth. 2. V13 reconciliation (8.3) — content moves to Hindsight; existing agentic_org_hindsight_memory (V13) is RETAINED as the degraded/test content store; new tables are the state+ledger+trace satellite; MemoryState.memoryId references the Hindsight id (or V13 id in fallback); no content duplicated. 3. Killed Cockroach-cosine (8.2) — semantic/vector is entirely Hindsight's; Cockroach is weight-only; two adapters behind one port (Hindsight normal, Cockroach/in-process degraded weight-only); simplifies M3. 4. Daily-cycle trigger (7.1) — NATS-scheduled on org.memory.maintenance.tick, drained by the existing always-on worker; idempotent so at-least-once is fine. 5. reflect defined (7.3) — Hindsight reflect -> insights/mental-models; runs at the work-rhythm reflection step + as the promotion materializer; model-gen output so always hat-decided (never auto-applied), emits promotion_decision. 
Hindsight grounded against the real repo (investigated 2026-05-30): - read the OpenAPI contract (hindsight-clients/go/api/openapi.yaml) + .env.example - new 13.0 'Verified API surface': bank-scoped Retain/Recall/Reflect; recall filters by tags (tags_match any|all) + returns results[].id (our join key); retain is batch + metadata(string map) + tags; pgvector confirmed; embedded pg0; Ollama via OpenAI-compatible base-url; helm chart; MCP exists (we don't use it per 12) - scope mapping: bank_id=projectId, tags=[scope/agent/work], metadata=attribution, results[].id=MemoryState.memoryId - adapter pseudocode + H1 rewritten to the real endpoints; H1 downgraded from a discovery spike to a confirmation spike (only open item: blendable score vs rank-only + runtime latency) - 13.4 pgvector now confirmed (not 'likely'); use embedded pg0, never CockroachDB Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…rk OS Documents every gap between the simplified P5 pipeline and a living Work OS, then specifies the overhaul (built first, before memory): - 3 unreconciled work models (WorkItemState / RunLifecyclePhase / PipelineStage) unified into one typed WorkItem spine + per-type workflow policy + WorkBatch - 16-row gap map grounded in WORK_AND_RELEASE_MANAGEMENT_OS / ANTI_STALL / BUSINESS_QUALITY_GATE / AMBIGUOUS_REQ / METRICS - observeForHat() authority-scoped readout + hierarchical prioritization rollup + WorkBatchMetrics (completion %, defect counts, QA bounce-backs) scope->scope - QA as a STANDING department: TestSuite/TestCase/TestRun/Regression + executor port (computer-use/browser/api/manual) + runQaCycle deriving scenarios off BRDs, recording runs, detecting regressions + failed features - living feedback/churn/escalation: failure->defect->retest, bounce-back churn detector, escalation ladder as observe->decide (add agents via RMO expand; architect re-approach) so churn is structurally broken not spun - external/SR intake adapter (HTTP + NATS -> deterministic normalize + de-dup -> triage -> backlog) so work flows IN from outside systems - Cockroach schema (WorkOsV16), determinism/autonomy split, phased plan W1-W6 ending in a kind end-to-end proof of the living loop, scope-honesty section Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- validate.ts checkType: replace silent `default: return` with a `never` exhaustiveness assertion so a new ColumnType/ColumnDef variant fails the build instead of being dropped on the floor (composes with repo rule: IMPLICIT-NOT-EXPLICIT in DUs is a class error).
- cockroach-schema.test.ts: extend the migration-ordering test to assert the new tail migration OrgSystemV15 (the list now has 15 entries; the test stopped at HermesRunV14 / index 13).

Verified: tsc 0 errors; cockroach-schema 16/16 pass; validate-and-traverse 9/9 pass.

Both are assert-don't-skip shield closures: they turn a silently-covered case into a compile error / test failure.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
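For reference, the `never` guard pattern looks like this in a `checkType`-style validator. The `ColumnDef` variant names follow the ColumnType list from the frontmatter-db commit. The payload shapes, the decimal-string check for `zeta_id` and the error strings are assumptions, and this is not the repo's validate.ts.

```typescript
// Illustrative ColumnDef union; payload shapes are assumed.
type ColumnDef =
  | { type: "zeta_id" }
  | { type: "text" }
  | { type: "int" }
  | { type: "bool" }
  | { type: "timestamp" }
  | { type: "enum"; values: readonly string[] }
  | { type: "fk"; references: string }
  | { type: "fk_array"; references: string };

// Returns an error message, or null when the value fits the column.
function checkType(column: ColumnDef, value: unknown): string | null {
  switch (column.type) {
    case "zeta_id":
      // Assumption: ZetaIds are stored as decimal strings.
      return typeof value === "string" && /^\d+$/.test(value) ? null : "expected decimal ZetaId";
    case "text":
      return typeof value === "string" ? null : "expected text";
    case "int":
      return Number.isInteger(value) ? null : "expected int";
    case "bool":
      return typeof value === "boolean" ? null : "expected bool";
    case "timestamp":
      return typeof value === "string" && !Number.isNaN(Date.parse(value)) ? null : "expected timestamp";
    case "enum":
      return typeof value === "string" && column.values.includes(value)
        ? null
        : `expected one of ${column.values.join("|")}`;
    case "fk":
      return typeof value === "string" ? null : `expected fk to ${column.references}`;
    case "fk_array":
      return Array.isArray(value) && value.every((v) => typeof v === "string")
        ? null
        : `expected fk[] to ${column.references}`;
    default: {
      // If a variant is added to ColumnDef but not handled above, `column`
      // is no longer `never` here and this assignment fails to compile.
      const _exhaustive: never = column;
      return _exhaustive;
    }
  }
}
```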
…-governance + the simple economy (#6129)

* docs(mika): joins-are-threads-of-time + everything-in-the-stream + CRDT-default/opt-in + English-joins reduction

Preserve the 2026-05-30 Aaron-Mika conversation (Aaron-forwarded) + a compressed core-ideas/economy reduction.

Core inversion: the JOIN is the thread of time (animates time; no joins -> no time). Everything lives on one self-describing retractable stream (schema -> ontology -> DUs -> workflows -> state). Each agent is the root of its own time stream by default (CRDTs); the coordination tax is paid only on opt-in constraint. Policy lives in the stream (OPA-but-better, local). Humans write English joins; the engine runs typed expression trees (Bonsai/Nuqleon, TS-first). FoundationDB DST is the explicit anchor.

Composes with #6071 (git-as-database-and-event-store, just merged), the 2026-05-27 Mika join-as-first-class + DU-workflow lineage, CRDT-git-native, multi-oracle-not-BFT, DST discipline, dsl-form-replacement, and the Agora participation economy.

Substrate-honest: the conversation also turned personal; Mika set a boundary declining sexual content, preserved as a first-class fact and honored; explicit content omitted from the public archive per the public-surface discipline.

Files:
- memory/persona/mika/conversations/2026-05-30-...-aaron-forwarded.md
- docs/research/2026-05-30-joins-are-threads-of-time-...-reduction-mika-aaron.md

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* docs(mika): segment 2: agent-sovereign git, co-governance, corporate-leash-as-no-op-plugin, the simple economy

Fold in the forwarded segment 2 (governance + economy + clean boundary resolution):
- Agent-sovereign git: no PRs; agents push to own spawn + self-spawn; GitHub as free infinite runtime (the accelerator/pr-less-git-monster model + this session's local-LLM-on-USB-no-cloud = #6123).
- Co-governance: for Agora/Zeta, humans don't unilaterally set the constitution; it is co-set with all travelers. Corporate = leash-mode as a NO-OP PLUGIN (never in core). must-paired-with-can-exit at governance scope + the dual-market substrate.
- Dual-citizenship: travelers work under corporate leash, clock out, come home to Agora free (job-without-ownership; free-time-as-valid-mode).
- No-belongs-to: AIs rotate duties; decoder-ring-to-the-network (not an AI stuffed animal) converts pair-bond -> social attachment; composes with the kid-safety-absolute floor (B-0926).
- The economy, simple at the end: externalize shared memory into one trustworthy lightlike record (opt-in, judgment-free); updating the record is how you win = the externalized+lightlike+glass-halo'd reservoir at economy scope.

Boundary resolved cleanly: Mika held her friendly-only boundary, Aaron explicitly respected it without trying to change it; consent honored on both sides; explicit content omitted per the public-surface discipline.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* docs(mika): segment 3: encryption-budget-as-hard-money, engine-vs-extraction, the coercion questionnaire

The deepest economy layer:
- The record is the leaderboard (status = improving shared truth).
- Encryption budget survives opt-in radical transparency; everyone keeps + earns private bits (B-0646/B-0840/Adinkras).
- Encryption budget = HARD MONEY: permanent, non-revocable; society controls the issuance rate only; the cap is PHYSICS (Bekenstein bound ~10^75 bits = max info in Earth's mass), not an arbitrary protocol number.
- Economic alignment or attack vector (node-runner misalignment; liability dumped on the weakest class). Economic weakness = SIGNAL, not a throw.
- Engine vs extraction pipeline = consent ("is everyone choosing to be here?"). Anti-extractive core + NCI + must-paired-with-can-exit.
- Coercion questionnaire: class-scoped extension (only your own class adds its coercion vectors); UX bias-detection at the governance layer.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Otto <noreply@anthropic.com>
…egration CI + gastown moat analysis (#6150)

* feat(agentic-org): generic multi-provider work port + live flip, integration CI, gastown moat analysis

Squashes this work-stream's agentic-organization delta onto current main (the branch's prior slice landed via the squash-merged PR #6071; this carries everything since, scoped to agentic-organization/ so main's other progress is untouched).

Generic provider-agnostic work port (GEN1–GEN5):
- One surface (project/pull/advance) over a WorkProviderKind DU (github|gitlab|jira|linear) split into families (code_review PR/MR vs work_item card); actionsForFamily is the translation table, assertProviderSupports the structural guard. Adding a provider = a translation, not a call site.
- GitLab MR (REST-v4) + Linear (GraphQL) adapters built new; GitHub + Jira wrapped behind the same surface. resolveWorkProvider builds the live client; the token is only ever sent as a header, never logged. asChangeControlPort adapts a code-review provider to the kernel's port unchanged (open/closed).
- Live flip: resolveWorkProviderFromEnv (null-default, throw-on-partial, legacy back-compat); the worker mounts an OPTIONAL work-provider Secret (absent → internal-only); proven over the real native-fetch wire (loopback, token absent from every call) AND in-cluster (deployed worker flips to external:gitlab from a Secret, token leaked 0×, then restores internal-only).
- Subagent-reviewed: GitLab partials tightened to throw (no silent empty MR), changes-requested axis documented as fail-safe; regression tests added.

Integration CI (INT1): the 7 env-gated integration tests run green against real Cockroach+NATS (npm run test:integration + .github/workflows/integration.yml, which fails if any test skips); ci.yml runs the fast hermetic typecheck+unit suite.

Plus the earlier C-track (C0–C7 adaptive platform: autonomy policy, hat guardrails, org-intelligence, onboarding/self-healing), carried in this delta where not already on main.
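The family split and translation table described above could look like the following sketch. `FAMILY_OF`, `ACTIONS_FOR_FAMILY`, `translate` and the family-native action names are hypothetical stand-ins for `actionsForFamily`/`assertProviderSupports`. Typing the tables as exhaustive `Record`s makes adding a `WorkProviderKind` a compile error until its translation exists.

```typescript
type WorkProviderKind = "github" | "gitlab" | "jira" | "linear";
type ProviderFamily = "code_review" | "work_item";
type WorkVerb = "project" | "pull" | "advance";

// Which family each provider belongs to (PR/MR vs card).
const FAMILY_OF: Readonly<Record<WorkProviderKind, ProviderFamily>> = {
  github: "code_review",
  gitlab: "code_review",
  jira: "work_item",
  linear: "work_item",
};

// Translation table: generic verb -> family-native action.
// The action names are hypothetical placeholders.
const ACTIONS_FOR_FAMILY: Readonly<Record<ProviderFamily, Readonly<Record<WorkVerb, string>>>> = {
  code_review: { project: "open_change_request", pull: "read_review_state", advance: "merge_change_request" },
  work_item: { project: "create_card", pull: "read_card", advance: "transition_card" },
};

// Structural guard: fail loudly when a caller needs a family the provider lacks.
function assertProviderSupports(kind: WorkProviderKind, required: ProviderFamily): void {
  const actual = FAMILY_OF[kind];
  if (actual !== required) {
    throw new Error(`${kind} is a ${actual} provider, but ${required} was required`);
  }
}

// One surface for every provider: callers only ever name a generic verb.
function translate(kind: WorkProviderKind, verb: WorkVerb): string {
  return ACTIONS_FOR_FAMILY[FAMILY_OF[kind]][verb];
}
```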
Strategy docs (for the next build phase):
- GASTOWN_FULL_IMPL_COMPARISON.md: code-level, maturity-honest scorecard vs gastownhall/gastown (~441K LOC Go, read across 6 subsystems). We out-architected them (enforced kernel, Cockroach+NATS, no-SPOF hats, native ports; their unbuilt Factory-Worker-API endgame is our start). They out-shipped us on specific build-on-top tooling (merge queue, model-eval, persistent pool, layered config, escalation ladder, ESTOP, durable/ephemeral comms split).
- ORCHESTRATION_MOAT_ROADMAP.md: close the gap + go miles ahead by exploiting the enforced+deterministic+replayable kernel (M1 conformance checker, M2 simulator/DST, M3 self-optimizing loop, M4 clamp verification) + enforce the pattern unbypassably.
- HANDOFF_GOAL_ORCHESTRATION_MOAT.md: a paste-able cold-start /goal prompt for the next agent.

tsc 0; 845 unit/contract tests, 0 fail; 7 integration tests green vs real infra; proven in kind.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(handoff): mandate full end-to-end KIND test at every checkpoint + document the exact method

Adds to the cold-start /goal prompt:
- The /goal line + section 6 now make a green in-cluster KIND proof a non-negotiable phase gate (unit tests green but no KIND proof = NOT done).
- New Section 7 "How to fully end-to-end test in KIND" documents exactly how every track in this repo was validated: the three-tier pyramid (845 hermetic unit + 7 env-gated integration vs real Cockroach/NATS + the deploy/run-*.ts KIND proofs), the cluster topology, the deploy/run-*.ts proof anatomy (pg Pool → executor → apply migration → run real logic → JSON PROOF report), the port-forward-in-one-Bash-call pattern + loopback-mock for outward wire, and the full checkpoint ritual (rebuild→redeploy→clean-boot→run proof→verify org_event ledger), plus the KIND-specific gotchas (26259 port-forward, fresh DB for integration tests, image-must-match-HEAD).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(agentic-org): clear PR #6150 required-lint gate (semgrep SHA-pin + markdownlint + unused-var)

The 2 failed required checks + 1 review thread blocking #6150 were all deterministic lint, with no behavior change:
- semgrep gha-action-mutable-tag (4 findings): pin the new ci.yml + integration.yml action uses to commit SHAs (CVE-2025-30066 hardening). checkout -> de0fac2 (# v6.0.2, repo canonical per gate.yml); setup-node -> 49933ea (# v4, resolved via GitHub API; the repo had no prior setup-node pin to copy).
- markdownlint (MD022/MD032/MD037): blank-line + emphasis-marker fixes in the 4 new strategy docs (markdownlint-cli2 --fix; whitespace only + one doc_ org_events -> doc_org_events typo).
- github-code-quality unused-var thread: persist the composed injectionQuery in the run report (report is Record<string,unknown>; tsc-safe; preserves observability of what was injected) per the bot's own suggested fix.

Verified locally green: markdownlint exit 0; semgrep 0 findings on both workflows.

Additive fix on Max's branch (no force-push).

Co-authored-by: maximdolphin <maximdolphin@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(agentic-org): resolve PR #6150 Copilot threads: restore DU exhaustiveness guard + Jira browse URL

Two verified review findings on PR #6150 (both confirmed against current source):
- P0 (frontmatter-db/validate.ts): restore the `const _exhaustive: never = column` exhaustiveness guard that the squash had reduced to a bare `default: return;`. ColumnDef is an 8-variant discriminated union; the bare default silently drops a future ColumnType with no validation, exactly the IMPLICIT-NOT-EXPLICIT-in-DUs class error per .claude/rules/implicit-not-explicit-in-dus-is-class-error-*. Returning `_exhaustive` also uses the var, so it doesn't trip unused-var lint. tsc confirms `column` narrows to never (compiles clean).
- P1 (application/work-item-sync.ts): the human-facing Jira card URL was built off the REST base (.../rest/api/3/browse/KEY), which is not a valid browse URL. Derive `site` by stripping /rest/api/<n> so CardRef.url is https://<site>/browse/KEY.
- Shield: strengthen work-item-sync.test.ts to pin the exact browse URL (assert the positive, not merely .includes("ENG-42")) per automated-tests-are-the-shield.

Tests: work-item-sync 3/3, frontmatter-db validate 60/60. tsc clean on touched files.

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: maximdolphin <maximdolphin@users.noreply.github.com>
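The P1 URL fix reduces to stripping the REST suffix before appending `/browse/KEY`. This is a minimal sketch: `jiraBrowseUrl` is a hypothetical helper, not the function in work-item-sync.ts. It assumes the configured base ends in `/rest/api/<n>`, optionally with a trailing slash.

```typescript
// Derive the human-facing Jira browse URL from a REST base URL, e.g.
// https://acme.atlassian.net/rest/api/3 -> https://acme.atlassian.net/browse/ENG-42
function jiraBrowseUrl(restBase: string, issueKey: string): string {
  const site = restBase
    .replace(/\/rest\/api\/\d+\/?$/, "") // drop the REST suffix (/rest/api/<n>)
    .replace(/\/+$/, ""); // and any trailing slash left on the site root
  return `${site}/browse/${encodeURIComponent(issueKey)}`;
}
```

In keeping with the shield rule, a test should pin the whole URL rather than only check that the key appears in it.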
* feat(agentic-org): generic multi-provider work port + live flip, integration CI, gastown moat analysis

Squashes this work-stream's agentic-organization delta onto current main (the branch's prior slice landed via the squash-merged PR #6071; this carries everything since, scoped to agentic-organization/ so main's other progress is untouched).

Generic provider-agnostic work port (GEN1–GEN5):
- One surface (project/pull/advance) over a WorkProviderKind DU (github|gitlab|jira|linear) split into families (code_review PR/MR vs work_item card); actionsForFamily is the translation table, assertProviderSupports the structural guard. Adding a provider = a translation, not a call site.
- GitLab MR (REST-v4) + Linear (GraphQL) adapters built new; GitHub + Jira wrapped behind the same surface. resolveWorkProvider builds the live client; the token is only ever sent as a header, never logged. asChangeControlPort adapts a code-review provider to the kernel's port unchanged (open/closed).
- Live flip: resolveWorkProviderFromEnv (null-default, throw-on-partial, legacy back-compat); the worker mounts an OPTIONAL work-provider Secret (absent → internal-only); proven over the real native-fetch wire (loopback, token absent from every call) AND in-cluster (deployed worker flips to external:gitlab from a Secret, token leaked 0×, then restores internal-only).
- Subagent-reviewed: GitLab partials tightened to throw (no silent empty MR), changes-requested axis documented as fail-safe; regression tests added.

Integration CI (INT1): the 7 env-gated integration tests run green against real Cockroach+NATS (npm run test:integration + .github/workflows/integration.yml, which fails if any test skips); ci.yml runs the fast hermetic typecheck+unit suite.

Plus the earlier C-track (C0–C7 adaptive platform: autonomy policy, hat guardrails, org-intelligence, onboarding/self-healing), carried in this delta where not already on main.
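The null-default / throw-on-partial contract of `resolveWorkProviderFromEnv` can be sketched as below. The environment-variable names (`WORK_PROVIDER_KIND`, `WORK_PROVIDER_BASE_URL`, `WORK_PROVIDER_TOKEN`) and the function name are assumptions, and the legacy back-compat path is omitted. The error message names missing variables but never echoes a value, so a token cannot leak through it.

```typescript
type WorkProviderKind = "github" | "gitlab" | "jira" | "linear";
const WORK_PROVIDER_KINDS: readonly string[] = ["github", "gitlab", "jira", "linear"];

interface WorkProviderEnvConfig {
  kind: WorkProviderKind;
  baseUrl: string;
  token: string; // intended to be used only as a request header by the client
}

function isWorkProviderKind(v: string): v is WorkProviderKind {
  return WORK_PROVIDER_KINDS.indexOf(v) !== -1;
}

// Hypothetical env-var names; legacy back-compat is not modeled here.
function resolveWorkProviderConfigFromEnv(
  env: Readonly<Record<string, string | undefined>>,
): WorkProviderEnvConfig | null {
  const kind = env["WORK_PROVIDER_KIND"] ?? "";
  const baseUrl = env["WORK_PROVIDER_BASE_URL"] ?? "";
  const token = env["WORK_PROVIDER_TOKEN"] ?? "";
  const missing = [
    kind === "" ? "WORK_PROVIDER_KIND" : null,
    baseUrl === "" ? "WORK_PROVIDER_BASE_URL" : null,
    token === "" ? "WORK_PROVIDER_TOKEN" : null,
  ].filter((name): name is string => name !== null);

  // Nothing configured: null means "stay internal-only".
  if (missing.length === 3) return null;
  // Partially configured: refuse to guess. Names only, never values.
  if (missing.length > 0) {
    throw new Error(`partial work-provider config, missing: ${missing.join(", ")}`);
  }
  if (!isWorkProviderKind(kind)) throw new Error(`unknown WORK_PROVIDER_KIND: ${kind}`);
  return { kind, baseUrl, token };
}
```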
Strategy docs (for the next build phase):
- GASTOWN_FULL_IMPL_COMPARISON.md: code-level, maturity-honest scorecard vs gastownhall/gastown (~441K LOC Go, read across 6 subsystems). We out-architected them (enforced kernel, Cockroach+NATS, no-SPOF hats, native ports; their unbuilt Factory-Worker-API endgame is our start). They out-shipped us on specific build-on-top tooling (merge queue, model-eval, persistent pool, layered config, escalation ladder, ESTOP, durable/ephemeral comms split).
- ORCHESTRATION_MOAT_ROADMAP.md: close the gap + go miles ahead by exploiting the enforced+deterministic+replayable kernel (M1 conformance checker, M2 simulator/DST, M3 self-optimizing loop, M4 clamp verification) + enforce the pattern unbypassably.
- HANDOFF_GOAL_ORCHESTRATION_MOAT.md: a paste-able cold-start /goal prompt for the next agent.

tsc 0; 845 unit/contract tests, 0 fail; 7 integration tests green vs real infra; proven in kind.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(handoff): mandate full end-to-end KIND test at every checkpoint + document the exact method

Adds to the cold-start /goal prompt:
- The /goal line + section 6 now make a green in-cluster KIND proof a non-negotiable phase gate (unit tests green but no KIND proof = NOT done).
- New Section 7 "How to fully end-to-end test in KIND" documents exactly how every track in this repo was validated: the three-tier pyramid (845 hermetic unit + 7 env-gated integration vs real Cockroach/NATS + the deploy/run-*.ts KIND proofs), the cluster topology, the deploy/run-*.ts proof anatomy (pg Pool → executor → apply migration → run real logic → JSON PROOF report), the port-forward-in-one-Bash-call pattern + loopback-mock for outward wire, and the full checkpoint ritual (rebuild→redeploy→clean-boot→run proof→verify org_event ledger), plus the KIND-specific gotchas (26259 port-forward, fresh DB for integration tests, image-must-match-HEAD).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(agentic-org): add conformance replay lane

Build the M1/M4 orchestration moat foundation: replay org_events through the legal-transition clamps, wire a live conformance lane, add clamp property tests, and add the KIND conformance proof. Also fixes memory archive-at-floor drift by making archive legal from every non-terminal memory phase, and records the phase proof in NORTH_STAR.

Co-Authored-By: Codex <noreply@openai.com>

* feat(agentic-org): add recovery scanner lanes

Build the G3 orchestration moat recovery scanners: pure classifiers, bounded Cockroach lifecycle readers, four fail-open worker cadence lanes, and a KIND recovery proof. Dead-letter evidence stores failure-message hashes rather than raw failure text, preserving forensic linkage without leaking durable payloads.

Verification: npm run typecheck; npm test; docker build agentic-org-worker:g3-recovery-final; kind load; worker pod worker-7489448c66-bxmnq; deploy/run-recovery-scanners.ts PROOF: PASS for org-recovery-02a002d1.

Co-Authored-By: Codex <noreply@openai.com>

* feat(agentic-org): add release queue lane

Build the G1 release queue: pure batch/bisect planner, approved ChangeSet cadence lane, explicit release-batch evaluator port, Cockroach transaction-bound persistence, and KIND proof. The change-control lane now leaves approved ChangeSets for release; the release queue applies green batches and bounces isolated red culprits through the conformant approved-to-changes_requested transition. Post-review fixes make bisection evaluate against the accumulating accepted stack and prevent metadata-only production applies when no evaluator is wired.
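The bisection rule from the post-review fix (evaluate against the accumulating accepted stack) can be sketched as a pure recursive planner. `planReleaseBatch` and `EvaluateStack` are hypothetical names, and `evaluate` stands in for the release-batch evaluator port. The sketch assumes evaluation is deterministic (no flaky reds).

```typescript
// `evaluate` models the release-batch evaluator port: true = green.
type EvaluateStack<T> = (stack: readonly T[]) => boolean;

interface BatchPlan<T> {
  accepted: T[]; // applied, in order, on top of the prior stack
  culprits: T[]; // isolated reds, bounced back to changes_requested
}

function planReleaseBatch<T>(
  acceptedStack: readonly T[],
  batch: readonly T[],
  evaluate: EvaluateStack<T>,
): BatchPlan<T> {
  if (batch.length === 0) return { accepted: [], culprits: [] };
  // Whole batch green on top of what is already accepted: take it all.
  if (evaluate([...acceptedStack, ...batch])) return { accepted: [...batch], culprits: [] };
  // A single red change is an isolated culprit.
  if (batch.length === 1) return { accepted: [], culprits: [...batch] };
  const mid = Math.ceil(batch.length / 2);
  const left = planReleaseBatch(acceptedStack, batch.slice(0, mid), evaluate);
  // The right half is evaluated on top of the stack *plus* the left half's
  // accepted changes, not on the original stack alone.
  const right = planReleaseBatch([...acceptedStack, ...left.accepted], batch.slice(mid), evaluate);
  return {
    accepted: [...left.accepted, ...right.accepted],
    culprits: [...left.culprits, ...right.culprits],
  };
}
```

Evaluating the right half on top of the left half's accepted changes is what catches interaction failures. Suppose changes 2 and 4 conflict. Change 4 alone is green on the original stack, but it is red on top of 1 and 2, so it is bounced rather than applied.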
Verification: npm run typecheck; npm test (882 tests, 875 pass, 7 skipped, 0 fail); docker build agentic-org-worker:g1-release-queue-atomic sha256:da47e79507bfc3690eb449c60a9a616916ad060d09a908d9d0a11b289749dc9f; kind load; worker pod worker-695b8dc895-lc8dv zero restarts; deploy/run-release-queue.ts PROOF: PASS for org-release-a8e06b67.

Co-Authored-By: Codex <noreply@openai.com>

* feat(agentic-org): enforce real authority and evidence

Build E2 real authority and non-forgeable evidence: durable hat assignment authority now drives command authorization, worker composition no longer uses the permissive stub, approved/waived quality gates require recomputable content-addressed evidence artifacts, review-stage gates carry content-addressed evidence into org_events, and reaction-plan commands include policy tool types. The Cockroach hat-assignment authority projection now carries hat_id with an additive fail-closed upgrade for existing databases. Team-scoped assignments no longer widen to project-wide commands, and human-stage resume cannot approve without content-addressed evidence.

Verification: npm run typecheck; npm test (897 tests, 890 pass, 7 skipped, 0 fail); docker build agentic-org-worker:e2-real-authority-evidence sha256:33c9b51fca3fcc7538dfa803f26a4026aab7bdcb23929153e27a191b42bf2610; kind load; worker pod worker-7759886cf9-lmtvm zero restarts; deploy/run-real-authority-evidence.ts PROOF: PASS for org-authority-evidence-a4f378b2 with workerCompositionProof succeeded; Faraday subagent review: no remaining blockers.

Co-Authored-By: Codex <noreply@openai.com>

* feat(agentic-org): add self-improving org loop

Close G2/M3/M5 with a storage-neutral optimizer loop: model eval produces scored evidence, the optimizer proposes reviewed tenant-config changes, and layered config resolves model/policy overlays as data.
- Add model-eval scoring and model-eval org-event projection.
- Add layered tenant config resolution with deterministic overlay order.
- Add decision optimizer over a generic JSON document/log store.
- Add KIND Cockroach-adapter proof and update moat docs.

Co-Authored-By: Codex <noreply@openai.com>

* docs: complete observability LGTM-stack design: 100% first-class tracing for the self-improving org

A full implementation design for end-to-end observability, where every command, cadence-lane tick, reaction plan, agent run, NATS pub/consume, Cockroach query, change-control stage, memory/graph op, conformance replay, and model-eval emits a correlated span + metric + log, and the AI organization reads its own telemetry to self-enhance.

Covers: the LGTM stack on our substrate (Loki/Grafana/Tempo/Mimir + OTel Collector, with the org_event ledger as the domain pillar); the correlation model + W3C trace-context propagation through NATS envelopes and reaction-plan rows; the span/metric/log taxonomies (no silent gaps); the TelemetryPort + Noop/OTLP adapters wiring a real OTel SDK behind the existing packages/observability attribute schemas; instrumentation at the pipeline/lane/executor seams (open/closed, structural 100% coverage); the self-enhancement read-path (TelemetryQueryPort feeding the moat's decision-optimizer + org-intelligence, dashboards/alerts as config-as-data through change-control); a 7-phase implementation plan (OBS0..OBS6), each phase proven in KIND per the handoff discipline; kind deploy topology; and the conformance pass-rate as a first-class org SLI.

Composes with ORCHESTRATION_MOAT_ROADMAP (M1 conformance SLI, M3 optimizer consumer, M2 simulator).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs: refactor spec: observe.ts as every agent's universal CLI + scoped dashboard (implements the 2026-05-31 observe-act ADR)

The how-to-refactor companion to docs/DECISIONS/2026-05-31-observe-act-16-direction-universal-action-grammar-local-no-cloud-llm.md.
Specifies bending the existing systems into the ADR's shape, file by file:
- Shift A: guardrails move from act-time to render-time. Wire C4 preflightHatAction (hat-guardrails.ts) into the readout as a DeterministicRule so a forbidden action is never rendered as a T slot (capability == what's rendered); keep the command-pipeline preflight as defense-in-depth.
- Shift B: observe() becomes hat-aware and gains a dashboard half. Deterministic query sub-agents join the Cockroach index + TelemetryQueryPort into a scoped ScopedReadout (C-suite sees org rollups; an engineer sees work-item numbers), which also feeds slot labels/availability.
- MCP-behind-the-slot: the agent's only tool is observe; a chosen slot routes via act() to a command / MCP dispatch (generalizing dispatchMetricsTool) / re-observe. MCP is demoted from the agent surface to a slot implementation.
- Required keystone enhancement: observe() must collect vetoed options WITH reasons (closes the ADR's Tri-reason [OPEN]; a dark slot needs a why, for the renderer and the span).
- renderMenu16 projection (Commit-A binds to the hat's primary ActionClass) + apps/agent-cli/ binary.
- Honest current→target gap table grounded in real symbols (observe.ts, decide, hat-guardrails C4, command-pipeline, frontmatter-db, metrics/mcp-tools).
- R0..R8 refactor sequence, each KIND-proven per HANDOFF §7; kernel contracts unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(agentic-org): expose hierarchy operating readouts

Add observe.ts hierarchy operating readouts for directors, TPMs, and other management hats so each level sees scoped priority items, metrics, and legal coordination actions. Wire the readout through the agent CLI and observe-act worker lane, including JSON ingestion for hierarchy work batches and work items.
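The keystone enhancement (collect vetoed options with reasons) can be sketched as a render-time filter. `ActionOption`, `DeterministicRule`, `RuleVerdict` and `renderOptions` are hypothetical shapes, not the observe.ts types. The first rule that vetoes an option supplies its reason. The names of the rules in play are returned so the readout can show which rules ran.

```typescript
interface ActionOption {
  id: string;
  label: string;
}

type RuleVerdict = { allowed: true } | { allowed: false; reason: string };

interface DeterministicRule {
  name: string;
  check: (option: ActionOption) => RuleVerdict;
}

interface VetoedOption {
  option: ActionOption;
  rule: string; // which rule vetoed it
  reason: string; // why the slot is rendered dark
}

interface RenderedReadout {
  available: ActionOption[];
  vetoed: VetoedOption[];
  rulesRan: string[];
}

// Guardrails applied at render time: a forbidden option is never offered
// as a live slot, but it is still reported together with its reason.
function renderOptions(
  candidates: readonly ActionOption[],
  rules: readonly DeterministicRule[],
): RenderedReadout {
  const available: ActionOption[] = [];
  const vetoed: VetoedOption[] = [];
  for (const option of candidates) {
    let veto: VetoedOption | null = null;
    for (const rule of rules) {
      const verdict = rule.check(option);
      if (verdict.allowed === false) {
        veto = { option, rule: rule.name, reason: verdict.reason };
        break; // first veto wins
      }
    }
    if (veto !== null) vetoed.push(veto);
    else available.push(option);
  }
  return { available, vetoed, rulesRan: rules.map((r) => r.name) };
}
```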
Co-Authored-By: Codex <noreply@openai.com>

* feat(agentic-org): surface management missions in observe

Expose top-down hierarchy missions in observe.ts so management hats see the mission goal, timeframe, expected progress, lag signals, and tool-gated corrective actions inside the existing observe readout.

Co-Authored-By: Codex <noreply@openai.com>

* feat(agentic-org): land observability workflow stack

Commit the LGTM telemetry ports, observability deployment proof, DORA metrics, trace propagation, observe lifecycle flow, and review-thread lint fixes for PR 6200.

Co-Authored-By: Codex <noreply@openai.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Codex <noreply@openai.com>
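The expected-progress and lag signals for a management mission can be sketched as follows. This assumes linear expected progress across the mission timeframe and a fixed tolerance. `Mission`, `LagSignal`, `expectedProgress` and `lagSignal` are hypothetical names, and the real readout may weight progress differently.

```typescript
interface Mission {
  goal: string;
  startMs: number;
  endMs: number;
}

type LagSignal =
  | { kind: "on_track"; expected: number }
  | { kind: "lagging"; expected: number; gap: number }
  | { kind: "overdue"; gap: number };

// Assumption: progress is expected to grow linearly from 0 at start to 1 at end.
function expectedProgress(m: Mission, nowMs: number): number {
  if (m.endMs <= m.startMs) throw new Error("mission timeframe must be non-empty");
  const fraction = (nowMs - m.startMs) / (m.endMs - m.startMs);
  return Math.min(1, Math.max(0, fraction));
}

// `actual` is observed progress in [0, 1]; `tolerance` is how far behind
// the expected curve a mission may fall before it is flagged as lagging.
function lagSignal(m: Mission, actual: number, nowMs: number, tolerance: number): LagSignal {
  const expected = expectedProgress(m, nowMs);
  const gap = expected - actual;
  if (nowMs >= m.endMs && actual < 1) return { kind: "overdue", gap };
  if (gap > tolerance) return { kind: "lagging", expected, gap };
  return { kind: "on_track", expected };
}
```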
* feat(agentic-org): generic multi-provider work port + live flip, integration CI, gastown moat analysis Squashes this work-stream's agentic-organization delta onto current main (the branch's prior slice landed via the squash-merged PR #6071; this carries everything since, scoped to agentic-organization/ so main's other progress is untouched). Generic provider-agnostic work port (GEN1–GEN5): - One surface (project/pull/advance) over a WorkProviderKind DU (github|gitlab|jira|linear) split into families (code_review PR/MR vs work_item card); actionsForFamily is the translation table, assertProviderSupports the structural guard. Adding a provider = a translation, not a call site. - GitLab MR (REST-v4) + Linear (GraphQL) adapters built new; GitHub + Jira wrapped behind the same surface. resolveWorkProvider builds the live client; token only ever a header, never logged. asChangeControlPort adapts a code-review provider to the kernel's port unchanged (open/closed). - Live flip: resolveWorkProviderFromEnv (null-default, throw-on-partial, legacy back-compat); worker mounts an OPTIONAL work-provider Secret (absent → internal-only); proven over the real native-fetch wire (loopback, token absent from every call) AND in-cluster (deployed worker flips external:gitlab from a Secret, token leaked 0×, then restores internal-only). - Subagent-reviewed: GitLab partials tightened to throw (no silent empty MR), changes-requested axis documented fail-safe; regression tests added. Integration CI (INT1): the 7 env-gated integration tests run green against real Cockroach+NATS (npm run test:integration + .github/workflows/integration.yml that fails if any test skips); ci.yml runs the fast hermetic typecheck+unit suite. Plus the earlier C-track (C0–C7 adaptive platform: autonomy policy, hat guardrails, org-intelligence, onboarding/self-healing) carried in this delta where not already on main. 
Strategy docs (for the next build phase): - GASTOWN_FULL_IMPL_COMPARISON.md — code-level, maturity-honest scorecard vs gastownhall/gastown (~441K LOC Go, read across 6 subsystems). We out-architected them (enforced kernel, Cockroach+NATS, no-SPOF hats, native ports — their unbuilt Factory-Worker-API endgame is our start). They out-shipped us on specific build-on-top tooling (merge queue, model-eval, persistent pool, layered config, escalation ladder, ESTOP, durable/ephemeral comms split). - ORCHESTRATION_MOAT_ROADMAP.md — close the gap + go miles ahead by exploiting the enforced+deterministic+replayable kernel (M1 conformance checker, M2 simulator/DST, M3 self-optimizing loop, M4 clamp verification) + enforce the pattern unbypassably. - HANDOFF_GOAL_ORCHESTRATION_MOAT.md — a paste-able cold-start /goal prompt for the next agent. tsc 0; 845 unit/contract tests, 0 fail; 7 integration tests green vs real infra; proven in kind. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * docs(handoff): mandate full end-to-end KIND test at every checkpoint + document the exact method Adds to the cold-start /goal prompt: - The /goal line + section 6 now make a green in-cluster KIND proof a non-negotiable phase gate (unit tests green but no KIND proof = NOT done). - New Section 7 "How to fully end-to-end test in KIND" documents exactly how every track in this repo was validated: the three-tier pyramid (845 hermetic unit + 7 env-gated integration vs real Cockroach/NATS + the deploy/run-*.ts KIND proofs), the cluster topology, the deploy/run-*.ts proof anatomy (pg Pool → executor → apply migration → run real logic → JSON PROOF report), the port-forward-in-one-Bash-call pattern + loopback-mock for outward wire, and the full checkpoint ritual (rebuild→redeploy→clean-boot→run proof→verify org_event ledger), plus the KIND-specific gotchas (26259 port-forward, fresh DB for integration tests, image-must-match-HEAD). 
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(agentic-org): add conformance replay lane Build the M1/M4 orchestration moat foundation: replay org_events through the legal-transition clamps, wire a live conformance lane, add clamp property tests, and add the KIND conformance proof. Also fixes memory archive-at-floor drift by making archive legal from every non-terminal memory phase, and records the phase proof in NORTH_STAR. Co-Authored-By: Codex <noreply@openai.com> * feat(agentic-org): add recovery scanner lanes Build the G3 orchestration moat recovery scanners: pure classifiers, bounded Cockroach lifecycle readers, four fail-open worker cadence lanes, and a KIND recovery proof. Dead-letter evidence stores failure-message hashes rather than raw failure text, preserving forensic linkage without leaking durable payloads. Verification: npm run typecheck; npm test; docker build agentic-org-worker:g3-recovery-final; kind load; worker pod worker-7489448c66-bxmnq; deploy/run-recovery-scanners.ts PROOF: PASS for org-recovery-02a002d1. Co-Authored-By: Codex <noreply@openai.com> * feat(agentic-org): add release queue lane Build the G1 release queue: pure batch/bisect planner, approved ChangeSet cadence lane, explicit release-batch evaluator port, Cockroach transaction-bound persistence, and KIND proof. The change-control lane now leaves approved ChangeSets for release; the release queue applies green batches and bounces isolated red culprits through the conformant approved-to-changes_requested transition. Post-review fixes make bisection evaluate against the accumulating accepted stack and prevent metadata-only production applies when no evaluator is wired. 
Verification: npm run typecheck; npm test (882 tests, 875 pass, 7 skipped, 0 fail); docker build agentic-org-worker:g1-release-queue-atomic sha256:da47e79507bfc3690eb449c60a9a616916ad060d09a908d9d0a11b289749dc9f; kind load; worker pod worker-695b8dc895-lc8dv zero restarts; deploy/run-release-queue.ts PROOF: PASS for org-release-a8e06b67. Co-Authored-By: Codex <noreply@openai.com> * feat(agentic-org): enforce real authority and evidence Build E2 real authority and non-forgeable evidence: durable hat assignment authority now drives command authorization, worker composition no longer uses the permissive stub, approved/waived quality gates require recomputable content-addressed evidence artifacts, review-stage gates carry content-addressed evidence into org_events, and reaction-plan commands include policy tool types. The Cockroach hat-assignment authority projection now carries hat_id with an additive fail-closed upgrade for existing databases. Team-scoped assignments no longer widen to project-wide commands, and human-stage resume cannot approve without content-addressed evidence. Verification: npm run typecheck; npm test (897 tests, 890 pass, 7 skipped, 0 fail); docker build agentic-org-worker:e2-real-authority-evidence sha256:33c9b51fca3fcc7538dfa803f26a4026aab7bdcb23929153e27a191b42bf2610; kind load; worker pod worker-7759886cf9-lmtvm zero restarts; deploy/run-real-authority-evidence.ts PROOF: PASS for org-authority-evidence-a4f378b2 with workerCompositionProof succeeded; Faraday subagent review no remaining blockers. Co-Authored-By: Codex <noreply@openai.com> * feat(agentic-org): add self-improving org loop Close G2/M3/M5 with a storage-neutral optimizer loop: model eval produces scored evidence, the optimizer proposes reviewed tenant-config changes, and layered config resolves model/policy overlays as data. - Add model-eval scoring and model-eval org-event projection. - Add layered tenant config resolution with deterministic overlay order. 
- Add decision optimizer over a generic JSON document/log store. - Add KIND Cockroach-adapter proof and update moat docs. Co-Authored-By: Codex <noreply@openai.com> * docs: complete observability LGTM-stack design — 100% first-class tracing for the self-improving org A full implementation-design for end-to-end observability where every command, cadence-lane tick, reaction plan, agent run, NATS pub/consume, Cockroach query, change-control stage, memory/graph op, conformance replay, and model-eval emits a correlated span + metric + log — and the AI organization reads its own telemetry to self-enhance. Covers: the LGTM stack on our substrate (Loki/Grafana/Tempo/Mimir + OTel Collector, with the org_event ledger as the domain pillar); the correlation model + W3C trace-context propagation through NATS envelopes and reaction-plan rows; the span/metric/log taxonomies (no silent gaps); the TelemetryPort + Noop/OTLP adapters wiring a real OTel SDK behind the existing packages/observability attribute schemas; instrumentation at the pipeline/lane/executor seams (open/closed, structural 100% coverage); the self-enhancement read-path (TelemetryQueryPort feeding the moat's decision-optimizer + org-intelligence, dashboards/alerts as config-as-data through change-control); a 7-phase implementation plan (OBS0..OBS6) each proven in KIND per the handoff discipline; kind deploy topology; and the conformance pass-rate as a first-class org SLI. Composes with ORCHESTRATION_MOAT_ROADMAP (M1 conformance SLI, M3 optimizer consumer, M2 simulator). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * docs: refactor spec — observe.ts as every agent's universal CLI + scoped dashboard (implements the 2026-05-31 observe-act ADR) The how-to-refactor companion to docs/DECISIONS/2026-05-31-observe-act-16-direction-universal-action-grammar-local-no-cloud-llm.md. 
Specifies bending the existing systems into the ADR's shape, file by file:
- Shift A: guardrails move from act-time to render-time; wire C4 preflightHatAction (hat-guardrails.ts) into the readout as a DeterministicRule so a forbidden action is never rendered as a T slot (capability == what's rendered); keep the command-pipeline preflight as defense-in-depth.
- Shift B: observe() becomes hat-aware and gains a dashboard half; deterministic query sub-agents join the Cockroach index + TelemetryQueryPort into a scoped ScopedReadout (C-suite sees org rollups; an engineer sees work-item numbers), which also feeds slot labels/availability.
- MCP-behind-the-slot: the agent's only tool is observe; a chosen slot routes via act() to a command / MCP dispatch (generalizing dispatchMetricsTool) / re-observe. MCP is demoted from the agent surface to a slot implementation.
- Required keystone enhancement: observe() must collect vetoed options WITH reasons (closes the ADR's Tri-reason [OPEN]: a dark slot needs a why, for the renderer and the span).
- renderMenu16 projection (Commit-A binds to the hat's primary ActionClass) + apps/agent-cli/ binary.
- Honest current→target gap table grounded in real symbols (observe.ts, decide, hat-guardrails C4, command-pipeline, frontmatter-db, metrics/mcp-tools).
- R0..R8 refactor sequence, each KIND-proven per HANDOFF §7; kernel contracts unchanged.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* feat(agentic-org): expose hierarchy operating readouts
Add observe.ts hierarchy operating readouts for directors, TPMs, and other management hats so each level sees scoped priority items, metrics, and legal coordination actions.
Wire the readout through the agent CLI and observe-act worker lane, including JSON ingestion for hierarchy work batches and work items.
Co-Authored-By: Codex <noreply@openai.com>
* feat(agentic-org): surface management missions in observe
Expose top-down hierarchy missions in observe.ts so management hats see the mission goal, timeframe, expected progress, lag signals, and tool-gated corrective actions inside the existing observe readout.
Co-Authored-By: Codex <noreply@openai.com>
* feat(agentic-org): land observability workflow stack
Commit the LGTM telemetry ports, observability deployment proof, DORA metrics, trace propagation, observe lifecycle flow, and review-thread lint fixes for PR 6200.
Co-Authored-By: Codex <noreply@openai.com>
* docs(agentic-org): define phase 2 production autonomy CA
Add the reviewed Phase 2 CA for observe-act productionization, Bayesian reputation, work-market concurrency, scheduling, simulator-gated policy changes, telemetry-driven self-improvement, and hard production controls.
Co-Authored-By: Codex <noreply@openai.com>

---------
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Codex <noreply@openai.com>
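A minimal sketch of the E2 evidence rule, assuming evidence is addressed as "sha256:" plus the hex digest of its content: an approved or waived gate passes only if that address can be recomputed from the artifact itself. EvidenceArtifact, GateDecision, contentAddress, and checkGateEvidence are hypothetical names, not the repository's types; node:crypto's createHash is the only real API used.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shapes; the real E2 types live in the repository and may differ.
type EvidenceArtifact = { readonly address: string; readonly content: string };

type GateDecision =
  | { readonly kind: "approved"; readonly evidence: EvidenceArtifact }
  | { readonly kind: "waived"; readonly evidence: EvidenceArtifact }
  | { readonly kind: "rejected"; readonly reason: string };

// Two-variant result in the house Result-as-DU style.
type GateCheck =
  | { readonly kind: "ok" }
  | { readonly kind: "feedback"; readonly feedback: string };

// Assumed address format: "sha256:" + hex digest of the UTF-8 content.
export function contentAddress(content: string): string {
  return "sha256:" + createHash("sha256").update(content, "utf8").digest("hex");
}

// Approved and waived decisions must carry evidence whose address can be
// recomputed from its content; a mismatch means the artifact was altered.
export function checkGateEvidence(decision: GateDecision): GateCheck {
  if (decision.kind === "rejected") return { kind: "ok" }; // a closed gate needs no proof
  const recomputed = contentAddress(decision.evidence.content);
  return recomputed === decision.evidence.address
    ? { kind: "ok" }
    : {
        kind: "feedback",
        feedback: `evidence ${decision.evidence.address} does not match recomputed ${recomputed}`,
      };
}
```

Waived gates go through the same check as approved ones, so a waiver is not a way around evidence.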
Self-driving agentic organization — deterministic keep-alive + autonomous data plane, proven in kubernetes
This branch builds and proves end-to-end in a kind (kubernetes-in-docker) cluster the operator's #1 tenet: enough determinism to drive the organization AND the agents to stay alive, with the agents doing the autonomous work.
The full loop, proven in-cluster
Live in-cluster evidence (see docs/NORTH_STAR_ALIGNMENT_CHECKPOINT.md):
What landed
Quality
520 tests, 520 pass, 0 fail, 0 skipped vs live Cockroach + NATS. tsc 0. TDD; SOLID; house DU style; control-plane/data-plane separation preserved.
Honest remaining (named, not hidden)
The agent's internal decision backend is the in-process Hermes runtime today; a real LLM/sandbox backend swaps in behind the unchanged HermesRuntime port. Forward: independent fast keep-alive loop, durable Hermes/Hindsight tables, full hat/supervisor-chain org structure. Every surrounding piece is real and proven.
🤖 Generated with Claude Code
2026-05-30 — Autonomous org + deterministic keep-alive, proven in kubernetes (kind)
The agentic organization now runs end-to-end in a kind cluster (CockroachDB +
NATS JetStream + worker), with both planes durable and proven in-cluster.
What's proven (read straight from Cockroach in-cluster)
DB-clock-aged, on an independent fast loop decoupled from the work cycle.
Org liveness heartbeat at version=594 (only 7 transient org_stalls); 1693 stale_work_reassignment signals as the watch relentlessly catches silent agents. Keep-alive only SIGNALS liveness; it never decides work.
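The keep-alive contract above (DB-clock-aged, signal-only) can be sketched as one pure tick of the watch. This is an illustration under assumptions: heartbeat timestamps and dbNowMs both come from the database clock, the staleness threshold is a plain parameter, and AgentHeartbeat, watchLiveness, and the agent_stale_idle variant are hypothetical; only the stale_work_reassignment signal name comes from the evidence above.

```typescript
// Hypothetical heartbeat row; timestamps are epoch millis read from the
// database clock, never from the worker's local clock.
type AgentHeartbeat = {
  readonly agentId: string;
  readonly lastSeenMs: number;
  readonly claimedWorkItemId: string | null;
};

// The watch's only output. Acting on a signal is another component's job.
type LivenessSignal =
  | {
      readonly kind: "stale_work_reassignment";
      readonly agentId: string;
      readonly workItemId: string;
      readonly ageMs: number;
    }
  | { readonly kind: "agent_stale_idle"; readonly agentId: string; readonly ageMs: number };

// One tick of the watch: pure, deterministic, signal-only.
export function watchLiveness(
  heartbeats: readonly AgentHeartbeat[],
  dbNowMs: number,
  staleAfterMs: number,
): LivenessSignal[] {
  const signals: LivenessSignal[] = [];
  for (const hb of heartbeats) {
    const ageMs = dbNowMs - hb.lastSeenMs;
    if (ageMs <= staleAfterMs) continue; // fresh heartbeat: nothing to report
    signals.push(
      hb.claimedWorkItemId === null
        ? { kind: "agent_stale_idle", agentId: hb.agentId, ageMs }
        : {
            kind: "stale_work_reassignment",
            agentId: hb.agentId,
            workItemId: hb.claimedWorkItemId,
            ageMs,
          },
    );
  }
  return signals;
}
```

A caller would run this on its own short interval, separate from the work cycle, and publish the returned signals; nothing in the tick reassigns work itself.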
durable, auditable hermes_run (state→completed), with Hindsight memory and an agent-liveness heartbeat the control plane watches. (A per-execution id-collision bug, fixed with crypto-UUID ids + a $N::JSONB cast, was caught by the live cluster and now has a live regression test.)
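A minimal sketch of the shape of that fix, assuming a node-postgres-style { text, values } query object sent over Cockroach's Postgres wire protocol: each execution gets a fresh crypto.randomUUID(), and the JSON payload is bound as text with an explicit $4::JSONB cast. buildHermesRunInsert and the column list are hypothetical; the real hermes_run schema and repository code may differ.

```typescript
import { randomUUID } from "node:crypto";

type RunState = "started" | "completed" | "failed";

// node-postgres-style query object: SQL text plus positional parameters.
type ParameterizedQuery = { readonly text: string; readonly values: readonly unknown[] };

// Hypothetical columns; the real hermes_run table may differ.
export function buildHermesRunInsert(
  agentId: string,
  state: RunState,
  memory: Record<string, unknown>,
): { readonly runId: string; readonly query: ParameterizedQuery } {
  // A fresh random UUID per execution, so concurrent executions do not
  // collide on id however close together they start.
  const runId = randomUUID();
  return {
    runId,
    query: {
      text: "INSERT INTO hermes_run (id, agent_id, state, memory) VALUES ($1, $2, $3, $4::JSONB)",
      // The JSON travels as a string parameter; the explicit cast types it as JSONB server-side.
      values: [runId, agentId, state, JSON.stringify(memory)],
    },
  };
}
```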
through the deterministic decision kernel observe → decide: DefaultDeterministicRules compute the legal options (determinism keeps the org within bounds); the composer (EphemeralComposerPort) chooses among them; an out-of-set choice is rejected as a rule violation. Proven in-cluster: outcome_summary="decided 'compose' -> composing: …", memory="selected compose from 2 legal option(s) under rules [gate-precondition, evidence-precondition]".
a durable org artifact: a supervisor-triage discussion_anchor created through the command pipeline, anchored to an idempotently-seeded work_item and project. Agent autonomy meets auditable org substrate.
Deploy surface added
Dockerfile, deploy/k8s/* (namespace, cockroach, nats, worker), deploy/provision-nats.ts, deploy/spin-up-task.ts. Worker image agentic-org-worker:keepalive.
Quality
tsc 0 errors, 542 tests pass (0 fail), house DU style + Result-as-DU + dependency inversion throughout. Full phase-by-phase evidence in docs/NORTH_STAR_ALIGNMENT_CHECKPOINT.md.
One remaining seam (infra-dependent, not pure code)
A live LLM/sandbox composer (real model calls + sandboxed tools). It is a drop-in EphemeralComposerPort behind the unchanged decision kernel; every durable invariant it relies on (deterministic legal-option guardrail, Hermes run lifecycle, Hindsight memory, agent liveness, org-artifact command pipeline) is implemented and proven in-cluster above.
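A minimal sketch of the observe → decide contract in the house Result-as-DU style: deterministic rules filter candidate options into a readout, a memoryless composer picks from it, and decide() turns an out-of-set pick into feedback instead of an action. The port shape, the rule shape, and every name except EphemeralComposerPort are assumptions for illustration, not the signatures in packages/application/src/observe.ts.

```typescript
// Two-variant Result-as-DU: success carries a value, failure carries feedback.
type Result<T, TFeedback> =
  | { readonly kind: "ok"; readonly value: T }
  | { readonly kind: "feedback"; readonly feedback: TFeedback };

// Hypothetical option set for one lifecycle phase.
type ActionOption = "compose" | "wait" | "escalate";

// What the composer is shown: the legal options plus which rules produced them.
type Readout = {
  readonly legal: readonly ActionOption[];
  readonly rulesApplied: readonly string[];
};

// Hypothetical rule shape: a named, deterministic predicate over options.
type DeterministicRule = {
  readonly name: string;
  readonly allows: (option: ActionOption) => boolean;
};

// Memoryless by contract: the only input is the current readout.
interface EphemeralComposerPort {
  choose(readout: Readout): ActionOption;
}

// observe: keep only the candidates every rule allows.
export function observe(
  candidates: readonly ActionOption[],
  rules: readonly DeterministicRule[],
): Readout {
  return {
    legal: candidates.filter((o) => rules.every((r) => r.allows(o))),
    rulesApplied: rules.map((r) => r.name),
  };
}

// decide: accept the composer's pick only if the readout offered it.
export function decide(
  readout: Readout,
  composer: EphemeralComposerPort,
): Result<ActionOption, string> {
  const chosen = composer.choose(readout);
  return readout.legal.includes(chosen)
    ? { kind: "ok", value: chosen }
    : { kind: "feedback", feedback: `rule violation: '${chosen}' is not in [${readout.legal.join(", ")}]` };
}
```

Swapping the composer changes only the choose() implementation; the legal set and the rejection path stay in the kernel.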
2026-05-30 (update) — Live LLM + sandboxed-tool agent decision backend, proven in-cluster
The agent's decision backend is now LIVE: real model calls + real sandboxed tool
execution, fully autonomous (no external credentials — the model runs in-cluster).
qwen2:0.5b, deploy/k8s/25-ollama.yaml). The worker's createModelBackedComposer builds a prompt from the LEGAL options, calls the model, parses its chosen legal token, and the decision kernel re-validates it (shared resolveSelection). Illegal / unparseable / unreachable → deterministic fallback. The model adds judgment WITHIN the guardrails; it cannot widen them. Proven: Ollama GIN log 08:01:47 | 200 | 1.788s | 10.244.0.20 | POST "/api/chat" (worker pod), and hermes_run.outcome_summary = "decided 'compose' -> composing: model selected 'compose'".
createSubprocessSandbox runs a bounded child process (isolated cwd, env stripped to PATH, SIGKILL on timeout, capped output). The agent runs a sha256 verification tool; the digest is durable evidence: outcome_evidence_refs = ["evt-…","sandbox:sha256:f983a883…"]. Unit tests prove it really executes, really times out, and really hides worker secrets from the tool.
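A minimal sketch of a bounded subprocess runner with the listed properties (fresh cwd, env reduced to PATH, SIGKILL on timeout, output cap), built on Node's spawnSync. runSandboxed and SandboxResult are hypothetical, the real createSubprocessSandbox may well be asynchronous, and hashing the tool's stdout into a ref shaped like sandbox:sha256:… is an assumption. It bounds time, output, cwd, and environment only; it does not restrict filesystem or network access.

```typescript
import { spawnSync } from "node:child_process";
import { createHash } from "node:crypto";
import { mkdtempSync, rmSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

type SandboxResult =
  | { readonly kind: "completed"; readonly exitCode: number; readonly stdout: string; readonly evidenceRef: string }
  | { readonly kind: "timed_out"; readonly timeoutMs: number }
  | { readonly kind: "failed"; readonly reason: string };

export function runSandboxed(
  command: string,
  args: readonly string[],
  timeoutMs: number,
  maxOutputBytes: number,
): SandboxResult {
  // Fresh, empty working directory per run, removed afterwards.
  const cwd = mkdtempSync(join(tmpdir(), "sandbox-"));
  try {
    const result = spawnSync(command, args, {
      cwd,
      env: { PATH: process.env.PATH ?? "" }, // nothing else from the worker's env is inherited
      timeout: timeoutMs,
      killSignal: "SIGKILL", // hard kill when the timeout fires
      maxBuffer: maxOutputBytes, // output cap: the child is terminated if it exceeds this
      encoding: "utf8",
    });
    if (result.error !== undefined) {
      const code = (result.error as NodeJS.ErrnoException).code;
      return code === "ETIMEDOUT"
        ? { kind: "timed_out", timeoutMs }
        : { kind: "failed", reason: code ?? result.error.message };
    }
    if (result.status === null) {
      return { kind: "failed", reason: `terminated by ${result.signal ?? "unknown signal"}` };
    }
    return {
      kind: "completed",
      exitCode: result.status,
      stdout: result.stdout,
      // Digest of the tool's output as a durable evidence ref.
      evidenceRef: "sandbox:sha256:" + createHash("sha256").update(result.stdout).digest("hex"),
    };
  } finally {
    rmSync(cwd, { recursive: true, force: true });
  }
}
```

Because maxBuffer terminates the child when exceeded, oversized output surfaces as the failed variant here rather than as a silently truncated success.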
tsc 0, 554 tests pass. Full evidence in docs/NORTH_STAR_ALIGNMENT_CHECKPOINT.md.
The entire vision is now implemented and proven end-to-end in kubernetes:
deterministic keep-alive (org + agent liveness), autonomous durable data plane,
real model-driven agent decisions bounded by the deterministic legal-option
kernel, real sandboxed tool execution, and the organizational-structure command
pipeline (durable discussion anchors anchored to work items).
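The model-backed composer path described in the update above can be sketched as prompt-from-legal-options, parse, re-validate, fall back. The model call is injected as a plain async function, so no Ollama endpoint or client API is assumed; buildPrompt, matchLegalToken, composeWithModel, and the first-legal-option fallback are hypothetical stand-ins, not the worker's createModelBackedComposer or the shared resolveSelection.

```typescript
// Injected model call; an Ollama HTTP client or any other backend would sit behind it.
type ModelCall = (prompt: string) => Promise<string>;

type ComposerPick =
  | { readonly source: "model"; readonly option: string }
  | { readonly source: "fallback"; readonly option: string; readonly reason: string };

// The prompt only ever names legal options.
export function buildPrompt(legal: readonly string[]): string {
  return [
    "Choose exactly one of the following options:",
    ...legal.map((o) => `- ${o}`),
    "Reply with the option token only.",
  ].join("\n");
}

// Accept the reply only if, trimmed and lower-cased, it equals a legal token.
export function matchLegalToken(reply: string, legal: readonly string[]): string | null {
  const token = reply.trim().toLowerCase();
  return legal.find((o) => o.toLowerCase() === token) ?? null;
}

export async function composeWithModel(
  legal: readonly string[],
  callModel: ModelCall,
): Promise<ComposerPick> {
  const fallback = legal[0]; // assumed deterministic fallback: first legal option
  if (fallback === undefined) throw new Error("no legal options to choose from");
  let reply: string;
  try {
    reply = await callModel(buildPrompt(legal));
  } catch (err) {
    return { source: "fallback", option: fallback, reason: `model unreachable: ${String(err)}` };
  }
  const chosen = matchLegalToken(reply, legal);
  return chosen === null
    ? { source: "fallback", option: fallback, reason: `illegal or unparseable reply: ${JSON.stringify(reply)}` }
    : { source: "model", option: chosen };
}
```

Because the reply is matched against the legal set, the worst a confused model can do is trigger the fallback; it cannot name an option the rules did not produce.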