From ebf1b3175dc336238ee664e36383f2c3af78d530 Mon Sep 17 00:00:00 2001
From: Max Chadaev
Date: Mon, 25 May 2026 15:15:08 -0400
Subject: [PATCH 01/21] docs: define v0 organization contracts

Co-Authored-By: OpenAI Codex
---
 .../IMPLEMENTATION_READINESS_CHECKLIST.md    |  16 +-
 agentic-organization/docs/README.md          |   5 +-
 .../docs/V0_EXECUTABLE_CONTRACT.md           | 242 +++++++++++
 .../docs/V0_POLICY_AND_RUNTIME_BOUNDARIES.md | 229 +++++++++++
 .../docs/V0_SCHEMA_AND_COMMANDS.md           | 375 ++++++++++++++++++
 5 files changed, 864 insertions(+), 3 deletions(-)
 create mode 100644 agentic-organization/docs/V0_EXECUTABLE_CONTRACT.md
 create mode 100644 agentic-organization/docs/V0_POLICY_AND_RUNTIME_BOUNDARIES.md
 create mode 100644 agentic-organization/docs/V0_SCHEMA_AND_COMMANDS.md

diff --git a/agentic-organization/docs/IMPLEMENTATION_READINESS_CHECKLIST.md b/agentic-organization/docs/IMPLEMENTATION_READINESS_CHECKLIST.md
index fba14c71e6..94751b4d7a 100644
--- a/agentic-organization/docs/IMPLEMENTATION_READINESS_CHECKLIST.md
+++ b/agentic-organization/docs/IMPLEMENTATION_READINESS_CHECKLIST.md
@@ -16,6 +16,12 @@ Implementation can begin once we define:
 
 Everything else can evolve through the Organization itself.
 
+The first narrowed contracts are now captured in:
+
+- [V0 Executable Contract](./V0_EXECUTABLE_CONTRACT.md);
+- [V0 Schema and Commands](./V0_SCHEMA_AND_COMMANDS.md);
+- [V0 Policy and Runtime Boundaries](./V0_POLICY_AND_RUNTIME_BOUNDARIES.md).
+
 ## 1. MVP Slice
 
 Define the first end-to-end workflow we will build.
@@ -66,7 +72,13 @@ Need to decide:
 - whether this is a new app under `agentic-team/packages` or a separate top-level workspace;
 - whether frontend and backend live together at first;
 - whether initial deployment target is local Docker Compose, k3s, or both.
-- whether runtime code belongs under `full-ai-cluster/` as a cluster subsystem or as a parallel top-level product tree.
+
+Placement decision:
+
+- documentation lives under `agentic-organization/docs/`;
+- product/runtime code may live under the Agentic Organization app tree;
+- cluster deployment belongs under `full-ai-cluster/k8s/applications/agentic-organization/` as an ArgoCD-managed workload;
+- Agentic Organization consumes the `full-ai-cluster` substrate and must not create a parallel cluster substrate.
 
 Recommendation:
 
@@ -74,7 +86,7 @@ Recommendation:
 - use dev-portal/TPM only as reference and selective extraction source;
 - build modular monolith first, with clear boundaries for later service extraction.
 - use a TypeScript monorepo with `apps/api`, `apps/web`, `apps/workers`, `apps/temporal-worker`, `apps/dapr-actors`, and shared `packages/*` as defined in the build plan.
-- decide placement before code lands. Docs can live at `agentic-organization/docs/`; runtime implementation should not create a second parallel substrate by accident.
+- treat deployment as a `full-ai-cluster` consumer workload from the first cluster integration.
 
 ## 3. Source of Truth
 
diff --git a/agentic-organization/docs/README.md b/agentic-organization/docs/README.md
index 490e645cb9..5b1bca40c4 100644
--- a/agentic-organization/docs/README.md
+++ b/agentic-organization/docs/README.md
@@ -21,6 +21,9 @@ Current documents:
 - [Ambiguous Requirement Lifecycle](./AMBIGUOUS_REQUIREMENT_LIFECYCLE.md) - the discovery, customer interview, BRD, workflow modeling, architecture, decomposition, readiness, and learning path from vague request to curated feature.
 - [Anti-Stall Prioritization Runtime](./ANTI_STALL_PRIORITY_RUNTIME.md) - the hat-owned schedules, blocker triage, queue SLO, reassignment, alternate-work, dependency reconciliation, and priority routines that keep the Organization moving.
 - [Implementation Readiness Checklist](./IMPLEMENTATION_READINESS_CHECKLIST.md) - the decisions and contracts that should be defined before scaffolding the first implementation slice.
+- [V0 Executable Contract](./V0_EXECUTABLE_CONTRACT.md) - the smallest end-to-end runtime slice, grounded against the current `full-ai-cluster` substrate.
+- [V0 Schema and Commands](./V0_SCHEMA_AND_COMMANDS.md) - the CockroachDB-backed state groups, enums, command contract, outbox model, and TypeScript-facing runtime events for the first implementation.
+- [V0 Policy and Runtime Boundaries](./V0_POLICY_AND_RUNTIME_BOUNDARIES.md) - the hat policy matrix, MCP preflight checks, cluster runtime boundaries, failure rules, and ArgoCD integration shape.
 - [Cluster-Native Hat System](./CLUSTER_NATIVE_HAT_SYSTEM.md) - the CRD, OPA, hat binding, succession, reputation, graph rendering, polyglot operator, and event model for enforcing hats on Kubernetes.
 - [Cluster Execution and Memory Substrate](./CLUSTER_EXECUTION_AND_MEMORY_SUBSTRATE.md) - the k3s, sandboxed Hermes container, Cilium Service Mesh, SPIRE identity, Vault-backed secrets, Credential Proxy, NATS, Hindsight, and runtime observability contract.
 - [AI Cluster Scaffold Context](./AI_CLUSTER_SCAFFOLD_CONTEXT.md) - the two-directory NixOS/k3s/ArgoCD scaffold assumptions, component clarifications, bootstrap constraints, and deferred/local-model gating.
@@ -34,4 +37,4 @@ These documents are reference substrate, not a mandate to implement every concep
 
 ## Placement
 
-These docs live at `agentic-organization/docs/` as the documentation root for the Agentic Organization subsystem. Before runtime code lands, decide whether the implementation is a subsystem of `full-ai-cluster/` or a parallel top-level product tree.
+These docs live at `agentic-organization/docs/` as the documentation root for the Agentic Organization subsystem. Runtime code can live under the Agentic Organization product tree, but cluster deployment should land as a `full-ai-cluster/k8s/applications/agentic-organization/` ArgoCD workload. Agentic Organization runs on the `full-ai-cluster` substrate; it is not a second cluster substrate.
diff --git a/agentic-organization/docs/V0_EXECUTABLE_CONTRACT.md b/agentic-organization/docs/V0_EXECUTABLE_CONTRACT.md
new file mode 100644
index 0000000000..a50b994495
--- /dev/null
+++ b/agentic-organization/docs/V0_EXECUTABLE_CONTRACT.md
@@ -0,0 +1,242 @@
+# V0 Executable Contract
+
+## Purpose
+
+This document narrows the Agentic Organization reference design into
+the first executable slice. It is intentionally smaller than the full
+Organization. The goal is to prove that one governed agent work loop can
+run on the `full-ai-cluster` substrate with durable state, hat-scoped
+authority, evidence, and review.
+
+V0 should be boring enough to ship and rich enough to teach the rest of
+the system what to build next.
+
+## Cluster Assumption
+
+Agentic Organization is a workload that runs on `full-ai-cluster`. It is
+not a parallel substrate.
+
+The current `origin/main` cluster shape gives V0 these host primitives:
+
+| Cluster component | V0 use |
+|---|---|
+| K3S + ArgoCD App-of-Apps | deploy Agentic Organization as a future `full-ai-cluster/k8s/applications/agentic-organization/` application |
+| Cilium + Hubble | pod networking, L7 policy, flow observability, and service-mesh behavior without Istio |
+| cert-manager, Vault, SPIRE, Trust Manager, External Secrets | workload identity, TLS trust, and secret delivery |
+| CockroachDB | authoritative Organization database |
+| NATS JetStream | event transport, outbox fanout, live UI updates, and replayable integration streams |
+| Temporal TS | durable workflows after the native command model is proven |
+| Dapr Actors | hot entity coordination after the DB-backed service contract is proven |
+| Hindsight | Hermes memory backend, wrapped with Organization attribution and scope |
+| Hermes | agent runtime that performs the work |
+| OZ/OpenZiti | zero-trust transport, not the Organization business orchestrator |
+| hat-system | Kubernetes hat enforcement/projection surface using Hat, HatBinding, HatSwap, and HatPolicy CRDs |
+| Loki, Tempo, Alloy, Mimir, kube-prometheus-stack | logs, traces, metrics, dashboards, and audit correlation |
+
+Sync-wave implication: Agentic Organization is a consumer app. It should
+land after the foundation, data planes, hat-system CRDs, Hindsight,
+Temporal, and Hermes are available. A future ArgoCD application should
+therefore use a late consumer wave, likely wave `20` or later, unless a
+split bootstrap app is created for CRDs only.
+
+## Non-Goals
+
+V0 does not need:
+
+- the full executive, director, TPM, manager, QA, security, and memory
+  department lattice;
+- every hat in the inventory;
+- a full Hindsight fork;
+- a full Temporal workflow catalog;
+- a Dapr actor for every entity;
+- live CRD writeback from day one;
+- complete performance review and budget systems;
+- autonomous creation of new tools, workflows, or credential proxy
+  endpoints.
+
+V0 should still model those future paths as capability requests, so the
+Organization can later build them through its own lifecycle.
+
+## First Vertical Slice
+
+The first executable slice is:
+
+```text
+capability request
+  -> discussion anchor and context pack
+  -> one readiness or review gate
+  -> hat assignment
+  -> scheduled prompt-flow run
+  -> Hermes run through the Organization runtime adapter
+  -> evidence submission
+  -> reviewer decision
+  -> outcome review
+```
+
+This is the smallest useful loop because it proves:
+
+- work has a durable source of truth;
+- every discussion is tied to a work item;
+- a hat can be assigned, tokenized, and revoked;
+- a Hermes agent can work inside an Organization-scoped context;
+- the system can capture actions, observations, artifacts, and memory
+  events;
+- a reviewer can approve or reject without self-approval;
+- a completed run can create follow-up work when gaps are found.
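Because the slice above is strictly ordered, one way to keep a V0 run from skipping a stage is to encode the order once and reject every other transition. This is a minimal sketch, assuming the nine slice stages form a single linear progression and that follow-up work found during an outcome review opens a new capability request instead of rewinding the current one. `SLICE_STAGES`, `nextStage`, and `advance` are hypothetical names and do not belong to any package in this patch.

```typescript
// Hypothetical sketch: the V0 vertical slice as a strictly ordered stage list.
// Stage names mirror the slice in V0_EXECUTABLE_CONTRACT.md; this is not an existing package API.
const SLICE_STAGES = [
  "capability_request",
  "discussion_anchor_and_context_pack",
  "readiness_or_review_gate",
  "hat_assignment",
  "scheduled_prompt_flow_run",
  "hermes_run",
  "evidence_submission",
  "reviewer_decision",
  "outcome_review",
] as const;

type SliceStage = (typeof SLICE_STAGES)[number];

// Returns the stage that must follow `current`, or null once the slice is complete.
function nextStage(current: SliceStage): SliceStage | null {
  const index = SLICE_STAGES.indexOf(current);
  return index + 1 < SLICE_STAGES.length ? SLICE_STAGES[index + 1] : null;
}

// Accepts only the immediate next stage, so no run can skip or repeat a step.
function advance(current: SliceStage, requested: SliceStage): SliceStage {
  const expected = nextStage(current);
  if (expected !== requested) {
    throw new Error(`illegal slice transition ${current} -> ${requested}; expected ${expected ?? "none"}`);
  }
  return requested;
}
```

With this shape, `advance("hat_assignment", "evidence_submission")` throws, because evidence cannot arrive before a prompt-flow run and a Hermes run binding exist.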
+
+## Required V0 Hats
+
+Keep the first hat set small:
+
+| Hat | V0 reason |
+|---|---|
+| Director | accepts or rejects the capability request for V0 scope |
+| Engineering Manager | grooms the work item, selects schedule, assigns implementer and reviewer hats |
+| Implementer | executes the prompt flow and submits evidence |
+| Code Reviewer | reviews the evidence and blocks self-approval |
+| Memory Curator | reviews memory writes or flags memory gaps when the run ends |
+| Platform Operator | handles runtime failure, pod/session issues, and integration health |
+| Security Reviewer | required only when the request needs a new credential or external tool scope |
+
+The Executive Board, TPM, Product Owner, Architect, QA Reviewer, Hat
+Designer, and department directors remain first-class in the reference
+model, but V0 can simulate or defer them unless the first example
+requires their gate.
+
+## Required V0 Work Objects
+
+V0 must persist these objects:
+
+- agent;
+- department;
+- hat definition;
+- hat assignment;
+- hat token;
+- project;
+- initiative;
+- work item;
+- discussion anchor;
+- context pack;
+- schedule block;
+- prompt-flow definition;
+- prompt-flow run;
+- prompt-flow phase run;
+- universal action record;
+- action observation;
+- Hermes run binding;
+- artifact;
+- memory event;
+- gate;
+- gate decision;
+- audit event;
+- outbox event.
+
+Anything else should be added only when a V0 command cannot be expressed
+without it.
+
+## Required V0 Flow
+
+1. `submit_capability_request` creates the work item, discussion anchor,
+   and first audit/outbox events.
+2. `triage_capability_request` selects the responsible project,
+   initiative, owner hat, and required gate.
+3. `create_context_pack` links relevant docs, prior decisions, task
+   graph nodes, memory references, and acceptance criteria.
+4. `decide_gate` moves the request into ready state or asks for more
+   information.
+5. `reserve_hat` creates a hat assignment and maps it to the hat-system
+   projection boundary.
+6. `issue_hat_token` creates the time-bounded runtime authority for the
+   selected Hermes agent/session.
+7. `start_schedule_block` enters prioritized work time for the active
+   hat.
+8. `start_prompt_flow` locks the agent into the selected deterministic
+   work protocol.
+9. `launch_hermes_run` binds the Organization work item, agent, session,
+   hat assignment, and prompt-flow run to the Hermes/OZ runtime adapter.
+10. `record_universal_action` and `record_action_observation` capture
+    what the agent did and what the system observed.
+11. `submit_evidence` attaches logs, screenshots, code refs, traces, or
+    documents to the work item.
+12. `request_gate_review` moves the work to reviewer attention.
+13. `decide_gate` approves, rejects, or requests changes.
+14. `complete_outcome_review` records what was learned and creates
+    follow-up work if the run exposed a capability, memory, test, or
+    process gap.
+
+## Runtime Mode
+
+V0 can start with a native NestJS modular monolith and in-process fakes
+for Temporal, Dapr, and Hermes/OZ adapters, but the contracts must match
+the cluster runtime.
+
+The first deployable shape should be:
+
+```text
+apps/api
+  Organization commands, reads, auth, policy, MCP gateway
+
+apps/web
+  work board, role workspace, review center, evidence timeline
+
+apps/workers
+  outbox publisher, schedulers, reconcilers, NATS consumers
+
+packages/*
+  domain, state, policy, work-os, hats, prompt-flows, mcp, memory,
+  observability, k8s-hats, hermes adapter
+```
+
+Temporal and Dapr can be introduced once the same commands work through
+the native service layer:
+
+```text
+Temporal workflow or Dapr actor
+  -> Organization command service
+  -> CockroachDB transaction
+  -> outbox event
+  -> NATS publish
+  -> trace, log, metric
+```
+
+## Hat-System Boundary
+
+The Organization DB owns business intent. The cluster hat-system owns
+runtime enforcement/projection.
+
+For V0:
+
+- read hat-system CRDs through typed TypeScript clients;
+- decode HatSwap records and NATS hat ticks into Organization signals;
+- map Organization hat assignments to the CRD vocabulary;
+- do not require bidirectional CRD writes before the DB state machine is
+  stable.
+
+Later:
+
+- Organization-approved assignments can create HatBinding proposals;
+- HatSwap can become the runtime confirmation record;
+- Hubble, Loki, and HatSwap can be joined by SPIFFE ID for per-hat
+  runtime attribution.
+
+## Definition of Done
+
+V0 is done when a human or agent can submit one internal capability
+request and watch it move through:
+
+```text
+request -> ready gate -> hat assignment -> prompt-flow run
+  -> Hermes work -> evidence -> review decision -> outcome review
+```
+
+The demo must show:
+
+- CockroachDB state for the work item and assignment;
+- NATS/outbox events for every transition;
+- a discussion anchor tied to the work item;
+- a hat token with expiry;
+- a Hermes run binding, even if the runtime adapter is still simulated;
+- evidence attached to the work item;
+- review denial of self-approval;
+- trace IDs and audit events across the whole run;
+- a UI read model that shows status without reading raw logs.
diff --git a/agentic-organization/docs/V0_POLICY_AND_RUNTIME_BOUNDARIES.md b/agentic-organization/docs/V0_POLICY_AND_RUNTIME_BOUNDARIES.md
new file mode 100644
index 0000000000..d0d3e2b770
--- /dev/null
+++ b/agentic-organization/docs/V0_POLICY_AND_RUNTIME_BOUNDARIES.md
@@ -0,0 +1,229 @@
+# V0 Policy and Runtime Boundaries
+
+## Purpose
+
+This document defines the first boundary rules for Agentic Organization:
+what each runtime owns, which hats can perform which V0 actions, and how
+MCP tools execute with agent, hat, task, and cluster context.
+
+The main rule is simple: infrastructure can execute, schedule, transport,
+or project state, but Organization command services own business
+decisions.
+
+## Source-of-Truth Boundary
+
+| Surface | Owns | Does not own |
+|---|---|---|
+| Organization DB on CockroachDB | business state, work lifecycle, assignments, gates, audit, outbox | LLM reasoning, Kubernetes admission, workflow history |
+| NestJS API | command handlers, reads, policy checks, MCP gateway, adapters | durable workflow history, actor placement, raw cluster ownership |
+| Temporal TS | durable timers, retries, waits, workflow history | direct DB mutation, direct LLM calls, final business authority |
+| Dapr Actors | hot per-entity coordination, serialized local state, reminders | global truth, broad policy, long-running cross-entity process |
+| NATS JetStream | event transport, replay, fanout, inboxes | authoritative state |
+| Hermes | reasoning and tool-using work | granting itself authority, bypassing gates |
+| OZ/OpenZiti | zero-trust transport paths | Organization run orchestration semantics |
+| Hindsight | memory recall, retain, reflect | Organization work graph, hat assignment authority |
+| hat-system CRDs | cluster hat enforcement and runtime projection | Organization business intent until writeback is explicitly enabled |
+| Cilium, SPIRE, Vault, ESO | network policy, identity, trust, secret delivery | Organization lifecycle or review decisions |
+| Loki, Tempo, Alloy, Mimir, Prometheus, Grafana | observability storage and dashboards | business truth |
+
+Every adapter should call Organization commands. No adapter should update
+authoritative tables directly.
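One way to make the rule that adapters call commands hard to violate is to give adapters nothing except a command-submission function. This is a minimal sketch, assuming a hypothetical `CommandEnvelope` shape and a Hermes/OZ callback payload with the field names shown. The idempotency key mirrors the `record_hermes_run_status:hermes_run_789:callback_abc` example in V0_SCHEMA_AND_COMMANDS.md. `SubmitCommand` is synchronous here only to keep the sketch small; a real adapter would probably enqueue or await the command service.

```typescript
// Hypothetical sketch: adapters receive only a way to submit Organization
// commands, never a repository or table handle.

interface CommandEnvelope {
  commandName: string;
  idempotencyKey: string;
  correlationId: string;
  payload: Record<string, unknown>;
}

// The single capability handed to an adapter (assumed shape, not an existing API).
type SubmitCommand = (command: CommandEnvelope) => void;

// Illustrative Hermes/OZ callback payload; status values are taken from hermes_run_state.
interface HermesCallback {
  hermesRunId: string;
  callbackId: string;
  status: "running" | "heartbeat_late" | "succeeded" | "failed";
  correlationId: string;
}

// Maps a runtime callback to the command that owns the hermes_runs transition.
function hermesCallbackToCommand(callback: HermesCallback): CommandEnvelope {
  return {
    commandName: "record_hermes_run_status",
    // A redelivered callback produces the same key, so the command service can deduplicate it.
    idempotencyKey: `record_hermes_run_status:${callback.hermesRunId}:${callback.callbackId}`,
    correlationId: callback.correlationId,
    payload: { hermesRunId: callback.hermesRunId, status: callback.status },
  };
}

// The adapter only translates and submits; it has no code path that writes state.
function makeHermesCallbackAdapter(submit: SubmitCommand) {
  return (callback: HermesCallback): void => submit(hermesCallbackToCommand(callback));
}
```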
+
+## V0 Policy Matrix
+
+| Hat | Can do in V0 | Cannot do in V0 |
+|---|---|---|
+| Director | accept capability request, route to project/initiative, assign manager | approve own implementation work |
+| Engineering Manager | groom work, mark ready, assign implementer/reviewer, set schedule block, request outcome review | bypass reviewer gate, create new credential scope alone |
+| Implementer | run assigned prompt flow, call allowed MCP tools, submit evidence, request review | approve own work, change hat supply, create unanchored discussions |
+| Code Reviewer | approve, reject, or request changes on assigned review gate | review work they implemented under the same assignment |
+| Memory Curator | review memory events, request memory cleanup/follow-up work, approve memory adaptation | grant tool or credential scope |
+| Platform Operator | reconcile failed runs, inspect runtime health, restart or cancel runtime sessions through policy | change product acceptance criteria |
+| Security Reviewer | approve or reject credential/tool expansion, require narrower scopes | implement the requested capability and approve its security scope alone |
+
+Human operators may hold these hats too. The policy model should treat a
+human and a Hermes agent the same at the command boundary: both need an
+actor identity, active authority, scope, and audit trail.
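The Code Reviewer row and the actor-neutral rule above can be illustrated as a single policy check. This is a minimal sketch, assuming that "work they implemented" means the agent recorded as implementer on the gate's work item, and that denials return a machine-readable reason, as the policy invariants require. `HatName`, `Actor`, `ReviewGate`, and `canDecideGate` are hypothetical names. The `kind` field is kept for audit only and is never consulted, so a human and a Hermes agent are judged identically.

```typescript
// Hypothetical sketch of one V0 policy rule: a review gate may only be decided
// by a Code Reviewer who did not implement the work under review.

type HatName =
  | "director"
  | "engineering_manager"
  | "implementer"
  | "code_reviewer"
  | "memory_curator"
  | "platform_operator"
  | "security_reviewer";

// Humans and Hermes agents share one actor shape; `kind` is recorded for audit only.
interface Actor {
  kind: "human" | "hermes_agent";
  agentId: string;
  hatAssignmentId: string;
  hat: HatName;
}

interface ReviewGate {
  gateId: string;
  workItemId: string;
  implementerAgentId: string; // agent recorded as implementer on the work item
}

// Denials carry a machine-readable reason the caller can learn from.
type PolicyDecision =
  | { allowed: true }
  | { allowed: false; reason: "wrong_hat" | "self_approval"; detail: string };

function canDecideGate(actor: Actor, gate: ReviewGate): PolicyDecision {
  if (actor.hat !== "code_reviewer") {
    return { allowed: false, reason: "wrong_hat", detail: `${actor.hat} cannot decide gate ${gate.gateId}` };
  }
  if (actor.agentId === gate.implementerAgentId) {
    return {
      allowed: false,
      reason: "self_approval",
      detail: `agent ${actor.agentId} implemented work item ${gate.workItemId}`,
    };
  }
  return { allowed: true };
}
```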
+
+## Required Policy Invariants
+
+V0 must enforce:
+
+- no self-approval for implementation review;
+- every discussion, meeting, message thread, and broadcast has a work
+  anchor;
+- every MCP tool call has an active agent session;
+- every privileged MCP tool call has an active hat token;
+- every hat token has an expiry and refresh path;
+- expired or revoked hats lose credential and tool authority;
+- every work transition uses a command, not a direct field update;
+- every command writes audit and outbox records in the same transaction;
+- every memory event is attributed to agent, hat assignment, work item,
+  project, and prompt-flow context when available;
+- every credential expansion request goes through security review;
+- every runtime callback is idempotent;
+- every denial records a structured reason agents can learn from.
+
+## MCP Preflight
+
+The MCP Gateway is stateless at the edge but policy-rich at execution.
+
+Flow:
+
+```text
+Hermes
+  -> Organization MCP Gateway
+  -> validate JWT and session
+  -> load AgentSessionActor or DB-backed session context
+  -> load work item, schedule block, prompt-flow run, and hat assignment
+  -> validate policy
+  -> execute command/tool handler
+  -> write audit, observation, outbox, trace
+  -> update session activity
+```
+
+Minimum preflight checks:
+
+| Check | Purpose |
+|---|---|
+| `validate_actor_context` | confirms agent/session identity |
+| `validate_hat_token` | confirms active, unexpired, unrevoked hat authority |
+| `validate_scope` | confirms project, initiative, work item, memory, and credential scope |
+| `validate_discussion_anchor` | blocks unanchored discussion |
+| `validate_schedule_block` | confirms the agent is in an allowed work mode |
+| `validate_prompt_flow_start` | confirms the hat can run the selected flow |
+| `validate_prompt_flow_phase_gate` | enforces phase review gates |
+| `validate_universal_action` | validates the action grammar and allowed side effects |
+| `validate_action_reversibility` | flags actions that need approval or a rollback plan |
+| `validate_required_docs_acknowledged` | confirms BRD, CA, ADR, test plan, or runbook context was loaded when required |
+| `validate_no_blocking_contradictions` | blocks work when the context graph has unresolved critical contradictions |
+| `validate_lifecycle_transition` | confirms legal state movement |
+
+Request-provided IDs are hints. The gateway must verify authority from
+the Organization DB, session context, and policy engine.
+
+## Runtime Failure Rules
+
+V0 should treat distributed failure as normal.
+
+| Failure | Required behavior |
+|---|---|
+| Duplicate command | return the idempotent prior result or reject hash conflict |
+| Duplicate Hermes callback | update only once, attach duplicate observation if useful |
+| NATS publish failure | keep outbox row pending and retry |
+| NATS consumer replay | process idempotently from event ID and aggregate version |
+| Temporal activity retry | call Organization command with the same idempotency key |
+| Dapr reminder duplicate | call Organization command with the same idempotency key |
+| Hindsight unavailable | continue only if memory is optional for that phase; otherwise block with a recoverable signal |
+| Hat token expires during run | stop privileged tools, request refresh, and record schedule interruption |
+| Hat assignment revoked during run | cancel or quarantine the Hermes run and revoke credentials |
+| Credential denied | create a blocked state or lower-scope alternate-work item |
+| Pod or session silent | mark heartbeat late, notify Platform Operator, reconcile or cancel |
+| Reviewer unavailable | escalate to manager after SLO, but do not auto-approve |
+| Hat-system projection lag | keep Organization state, mark projection stale, do not assume enforcement |
+
+## Cluster-Native Runtime Contract
+
+Agentic Organization should eventually deploy as:
+
+```text
+full-ai-cluster/k8s/applications/agentic-organization/
+  Application.yaml
+  namespace.yaml
+  api deployment/service
+  web deployment/service
+  worker deployment
+  temporal-worker deployment
+  dapr-actor deployment
+  mcp-gateway deployment/service
+  ExternalSecret refs
+  CiliumNetworkPolicy
+  ServiceAccount/RBAC
+```
+
+The first docs-only and app-code PRs do not need deployment YAML. When
+deployment is added, it should follow the existing App-of-Apps model:
+
+- `targetRevision: main`;
+- `path: full-ai-cluster/k8s/applications/agentic-organization`;
+- `CreateNamespace=true`;
+- `ServerSideApply=true`;
+- sync wave after data planes and dependent runtimes;
+- no plaintext secrets in Git;
+- Cilium policy for egress to only required services;
+- SPIFFE/SPIRE identity for service accounts when enabled;
+- External Secrets from Vault for DB, NATS, Temporal, Hindsight, and
+  provider credentials;
+- OpenTelemetry export to the existing observability stack.
+
+## Dependency Order
+
+Agentic Organization runtime depends on the current cluster order:
+
+```text
+Cilium
+  -> cert-manager
+  -> Vault
+  -> SPIRE
+  -> Trust Manager
+  -> External Secrets
+  -> ArgoCD
+  -> OPA Gatekeeper
+  -> Longhorn
+  -> hat-system CRDs
+  -> CockroachDB, NATS, Dapr, OZ/OpenZiti, observability
+  -> Hindsight, Temporal, Hermes
+  -> Agentic Organization
+```
+
+If Agentic Organization later ships CRDs of its own, split them into an
+earlier app. Do not make the main application block the cluster
+foundation.
+
+## TypeScript First-Class Consumption
+
+TypeScript should be a first-class consumer of the cluster contracts.
+
+Required packages:
+
+| Package | Responsibility |
+|---|---|
+| `@agentic-org/k8s-hats` | generated or checked Hat, HatBinding, HatSwap, and HatPolicy types, informers, NATS tick decoding |
+| `@agentic-org/state` | Drizzle schema, repositories, outbox, idempotency |
+| `@agentic-org/policy` | typed policy engine and later OPA bundle adapter |
+| `@agentic-org/mcp` | MCP gateway contracts, preflight checks, tool registry |
+| `@agentic-org/hermes` | Hermes run/session adapter and callback contract |
+| `@agentic-org/memory` | Hindsight attribution and scoped recall/retain/reflect |
+| `@agentic-org/observability` | OpenTelemetry span helpers and correlation envelope |
+
+The TypeScript CRD package should be mechanically checked against the
+CRD YAML from `full-ai-cluster/k8s/applications/hat-system/crds/`.
+
+## What Not to Blur
+
+Keep these boundaries explicit:
+
+- OZ/OpenZiti is transport. If a future component orchestrates runs,
+  name it separately from OpenZiti.
+- Hindsight is memory. Organization graph retrieval is work context,
+  decisions, documents, discussions, and evidence.
+- Temporal coordinates durable workflows. It does not decide policy.
+- Dapr Actors serialize hot entity state. They do not own global truth.
+- hat-system CRDs enforce/project runtime hats. Organization DB owns the
+  assignment request and business reason.
+- Cilium/SPIRE/Vault secure the cluster. They do not replace hat RBAC.
+
+## V0 Review Checklist
+
+Before coding any endpoint, worker, or MCP tool, confirm:
+
+- which command owns the state transition;
+- which hat can call it;
+- which policy checks run;
+- which state rows change;
+- which audit and outbox events are emitted;
+- which graph nodes or edges are created;
+- which trace fields are attached;
+- how duplicate calls behave;
+- how denial is explained to the agent;
+- how the UI can display the result without scraping logs.
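The MCP preflight table in V0_POLICY_AND_RUNTIME_BOUNDARIES.md maps naturally onto an ordered chain in which the first failing check returns a structured denial. That shape also satisfies the invariant that every denial records a reason. This is a minimal sketch, assuming the gateway has already loaded session and hat-token state from the Organization DB instead of trusting request-provided IDs. It models only `validate_actor_context`, `validate_hat_token`, and `validate_discussion_anchor`, and every type and function name in it is hypothetical.

```typescript
// Hypothetical sketch: MCP preflight as an ordered chain of checks. The first
// failure short-circuits with a structured denial. Check bodies are illustrative.

interface HatToken {
  hatAssignmentId: string;
  expiresAt: number; // epoch milliseconds
  revoked: boolean;
}

// Context the gateway has already loaded from the Organization DB and session.
interface PreflightContext {
  now: number;
  sessionActive: boolean;
  privileged: boolean;
  opensDiscussion: boolean;
  hatToken?: HatToken;
  discussionAnchorId?: string;
}

type CheckResult = { ok: true } | { ok: false; check: string; reason: string };
type PreflightCheck = (ctx: PreflightContext) => CheckResult;

const PASS: CheckResult = { ok: true };

const validateActorContext: PreflightCheck = (ctx) =>
  ctx.sessionActive ? PASS : { ok: false, check: "validate_actor_context", reason: "no active agent session" };

const validateHatToken: PreflightCheck = (ctx) => {
  if (!ctx.privileged) return PASS; // only privileged tool calls require hat authority
  const token = ctx.hatToken;
  if (!token) return { ok: false, check: "validate_hat_token", reason: "privileged call without a hat token" };
  if (token.revoked) return { ok: false, check: "validate_hat_token", reason: "hat token revoked" };
  if (token.expiresAt <= ctx.now) return { ok: false, check: "validate_hat_token", reason: "hat token expired" };
  return PASS;
};

const validateDiscussionAnchor: PreflightCheck = (ctx) =>
  !ctx.opensDiscussion || ctx.discussionAnchorId
    ? PASS
    : { ok: false, check: "validate_discussion_anchor", reason: "discussion has no work anchor" };

// Runs checks in order and returns the first denial, or PASS if every check succeeds.
function runPreflight(ctx: PreflightContext, checks: PreflightCheck[]): CheckResult {
  for (const check of checks) {
    const result = check(ctx);
    if (!result.ok) return result;
  }
  return PASS;
}

const V0_PREFLIGHT: PreflightCheck[] = [validateActorContext, validateHatToken, validateDiscussionAnchor];
```

Adding one of the remaining checks, such as `validate_scope` or `validate_lifecycle_transition`, would mean appending another function to `V0_PREFLIGHT`. The chain itself would not change.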
diff --git a/agentic-organization/docs/V0_SCHEMA_AND_COMMANDS.md b/agentic-organization/docs/V0_SCHEMA_AND_COMMANDS.md
new file mode 100644
index 0000000000..ee14bc6ffe
--- /dev/null
+++ b/agentic-organization/docs/V0_SCHEMA_AND_COMMANDS.md
@@ -0,0 +1,375 @@
+# V0 Schema and Commands
+
+## Purpose
+
+This document defines the first TypeScript-facing schema and command
+contract for Agentic Organization. It is not a full DDL. It is the shape
+the domain model, Drizzle migrations, command handlers, MCP tools,
+workers, and tests should agree on before implementation starts.
+
+CockroachDB is the authoritative store for Organization-owned state.
+Temporal history, Dapr actor state, NATS streams, Hindsight memory, and
+hat-system CRDs are runtime surfaces or projections. They do not replace
+the Organization database.
+
+## Global Columns
+
+Every authoritative table should include:
+
+| Column | Purpose |
+|---|---|
+| `id` | stable unique ID |
+| `organization_id` | future multi-org partition key, even if V0 uses one org |
+| `created_at` | creation time |
+| `updated_at` | last mutation time |
+| `version` | optimistic concurrency and projection safety |
+| `created_by_agent_id` | agent that caused the write, when applicable |
+| `created_by_hat_assignment_id` | hat authority that caused the write, when applicable |
+| `correlation_id` | end-to-end request/run correlation |
+| `causation_id` | direct parent command, event, tool call, or workflow step |
+| `trace_id` | observability trace link |
+
+Append-only records should also carry `sequence` when replay order
+matters.
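The `version` column pairs with the `expectedVersion` field of the command contract later in this file: a write succeeds only if the row has not changed since the caller read it. This is a minimal in-memory sketch of that compare-and-set, assuming a plain object stands in for the CockroachDB table and that `updateWithVersion` is a hypothetical helper. Against CockroachDB, the usual equivalent is an `UPDATE ... WHERE id = $1 AND version = $2` that must affect exactly one row.

```typescript
// Hypothetical sketch: the global columns as a row type, plus an
// optimistic-concurrency update keyed on `version`. A plain object stands in for the table.

type WorkItemState =
  | "new" | "triage" | "needs_clarification" | "ready" | "assigned" | "in_progress"
  | "review_requested" | "changes_requested" | "approved" | "blocked" | "done" | "canceled";

interface GlobalColumns {
  id: string;
  organization_id: string;
  created_at: string;
  updated_at: string;
  version: number;
  created_by_agent_id: string | null; // null when not applicable
  created_by_hat_assignment_id: string | null;
  correlation_id: string;
  causation_id: string;
  trace_id: string;
}

interface WorkItemRow extends GlobalColumns {
  state: WorkItemState;
}

// Applies the patch only if the row is still at `expectedVersion`, then bumps
// `version` so a concurrent writer holding the old version is rejected.
function updateWithVersion(
  table: Record<string, WorkItemRow>,
  id: string,
  expectedVersion: number,
  patch: { state?: WorkItemState },
  now: string,
): WorkItemRow {
  const current = table[id];
  if (!current) throw new Error(`unknown row ${id}`);
  if (current.version !== expectedVersion) {
    throw new Error(`version conflict on ${id}: expected ${expectedVersion}, found ${current.version}`);
  }
  const next: WorkItemRow = { ...current, ...patch, version: current.version + 1, updated_at: now };
  table[id] = next;
  return next;
}
```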
+
+## Schema Groups
+
+### Identity and Hats
+
+| Table | V0 responsibility |
+|---|---|
+| `agents` | known Hermes-capable agents and their stable identity |
+| `agent_sessions` | live or historical Hermes sessions bound to an agent |
+| `departments` | first department containers for ownership and review routing |
+| `hat_definitions` | Organization-owned hat catalog |
+| `hat_authority_rules` | typed permissions, scopes, and policy metadata for a hat |
+| `hat_skill_bindings` | skills and prompt-flow availability attached to a hat |
+| `hat_supply_policies` | max concurrency, TTL, cooldown, warmup, and assignment rules |
+| `hat_assignments` | time-bounded wearer assignment for a specific agent/session |
+| `hat_tokens` | short-lived JWT issuance, refresh, revocation, and expiry state |
+| `hat_system_projections` | last observed Hat, HatBinding, HatSwap, and HatPolicy state from Kubernetes |
+
+### Work Management
+
+| Table | V0 responsibility |
+|---|---|
+| `projects` | top-level work containers |
+| `initiatives` | project-scoped bodies of work |
+| `work_items` | capability requests, tasks, defects, reviews, and follow-ups |
+| `work_item_state_history` | append-only state transitions |
+| `work_item_dependencies` | blocking or informational dependencies |
+| `blockers` | active impediments with owner, severity, and resolution path |
+| `assignments` | work item to agent/hat/session assignment records |
+| `gates` | required review points for readiness, code, QA, memory, or security |
+| `gate_decisions` | approve, reject, needs-changes, or defer decisions |
+| `releases` | release groupings once release management enters the slice |
+
+### Schedules, Prompt Flows, and Actions
+
+| Table | V0 responsibility |
+|---|---|
+| `hat_schedule_templates` | default work rhythm by hat |
+| `work_schedules` | concrete schedule assigned to an agent/hat context |
+| `work_schedule_blocks` | free time, prioritized work, review, reflection, or meeting blocks |
+| `prompt_flow_definitions` | named deterministic work protocols |
+| `prompt_flow_versions` | immutable versioned prompt-flow contract |
+| `prompt_flow_phases` | ordered reusable phases |
+| `hat_prompt_flow_bindings` | which hats can run which prompt flows |
+| `prompt_flow_runs` | one execution of a prompt-flow version |
+| `prompt_flow_phase_runs` | state and evidence for each phase execution |
+| `prompt_flow_gate_decisions` | reviewer decisions at phase boundaries |
+| `universal_action_definitions` | typed action grammar catalog |
+| `universal_action_records` | action intent emitted by an agent or workflow |
+| `universal_action_observations` | observed result, evidence, and side effects for an action |
+
+### Communication, Graph, Documents, and Context
+
+| Table | V0 responsibility |
+|---|---|
+| `discussion_anchors` | required work/project/initiative/task anchor for any discussion |
+| `conversation_threads` | one-on-one, team, department, executive, or broadcast thread |
+| `messages` | immutable message log with actor and hat attribution |
+| `meetings` | structured meeting sessions with mode and anchor |
+| `decisions` | explicit decisions linked to work and evidence |
+| `documents` | BRDs, CAs, ADRs, reports, test cases, runbooks, and memory reviews |
+| `artifact_links` | logs, screenshots, traces, code refs, PRs, builds, and uploads |
+| `graph_nodes` | agent-readable graph node registry |
+| `graph_edges` | typed relationships between work, docs, messages, decisions, runs, and memories |
+| `context_packs` | deterministic context bundles assembled for an agent run or review |
+
+### Runtime, Memory, Security, and Audit
+
+| Table | V0 responsibility |
+|---|---|
+| `hermes_runs` | Organization binding to a Hermes execution session |
+| `mcp_tool_calls` | governed tool call attempts and results |
+| `memory_events` | Hindsight recall, retain, reflect, and review attribution |
+| `credential_requests` | requests to expand credential proxy or external tool scope |
+| `signals` | durable internal signals consumed by workers and UI read models |
+| `audit_events` | append-only policy and state-change audit trail |
+| `outbox_events` | transactional event publication source for NATS |
+| `runtime_leases` | scheduler, reconciler, and worker leases |
+| `idempotency_keys` | command deduplication records |
+
+## V0 Enums
+
+Use typed enums in TypeScript and database constraints. Do not rely on
+magic strings in command handlers.
+
+### `work_item_state`
+
+```text
+new
+triage
+needs_clarification
+ready
+assigned
+in_progress
+review_requested
+changes_requested
+approved
+blocked
+done
+canceled
+```
+
+### `hat_assignment_state`
+
+```text
+requested
+reserved
+warmup
+active
+expired
+revoked
+released
+denied
+```
+
+### `hat_token_state`
+
+```text
+issued
+refresh_required
+refreshed
+expired
+revoked
+denied
+```
+
+### `gate_state`
+
+```text
+not_required
+waiting
+in_review
+approved
+rejected
+changes_requested
+deferred
+```
+
+### `prompt_flow_run_state`
+
+```text
+queued
+running
+waiting_for_gate
+waiting_for_input
+succeeded
+failed
+canceled
+expired
+```
+
+### `schedule_block_state`
+
+```text
+scheduled
+active
+paused
+completed
+missed
+canceled
+```
+
+### `hermes_run_state`
+
+```text
+requested
+starting
+running
+heartbeat_late
+succeeded
+failed
+canceled
+lost
+reconciled
+```
+
+### `discussion_anchor_type`
+
+```text
+project
+initiative
+work_item
+gate
+release
+incident
+memory_review
+capability_request
+```
+
+### `signal_type`
+
+```text
+work_item_changed
+gate_requested
+gate_decided
+hat_assignment_changed
+hat_token_changed
+schedule_block_changed
+prompt_flow_changed
+hermes_run_changed
+memory_event_recorded
+credential_request_changed
+blocker_changed
+outcome_review_completed
+hat_system_tick_observed
+```
+
+## Command Contract
+
+Every side-effecting command must include:
+
+- `commandId`;
+- `idempotencyKey`;
+- `actorAgentId`;
+- `actorHatAssignmentId`, when the actor is wearing a hat;
+- `organizationId`;
+- `projectId`, or an explicit reason when none is available;
+- `correlationId`;
+- `causationId`;
+- `traceId`;
+- `expectedVersion`, when mutating an existing aggregate;
+- `policyContext`.
+
+Every command handler must:
+
+1. load authoritative state from CockroachDB;
+2. validate actor context and hat authority;
+3. validate lifecycle transition;
+4. write state, audit event, and outbox event in one transaction;
+5. return the authoritative post-state;
+6. be idempotent under retry.
+
+## V0 Commands
+
+| Command | Actor scope | Writes | Emits |
+|---|---|---|---|
+| `submit_capability_request` | human, agent, director, manager | `work_items`, `discussion_anchors`, `graph_nodes`, `audit_events`, `outbox_events` | `work_item_changed` |
+| `triage_capability_request` | director or engineering manager | `work_items`, `assignments`, `gates`, `context_packs` | `work_item_changed`, `gate_requested` |
+| `create_discussion_anchor` | any authorized hat | `discussion_anchors`, `graph_edges` | `work_item_changed` |
+| `create_context_pack` | manager, reviewer, implementer for assigned work | `context_packs`, `graph_edges`, `audit_events` | `work_item_changed` |
+| `mark_work_ready` | manager or reviewer | `work_items`, `work_item_state_history`, `gates` | `work_item_changed`, `gate_requested` |
+| `reserve_hat` | manager, director, platform operator | `hat_assignments`, `hat_tokens`, `audit_events` | `hat_assignment_changed` |
+| `issue_hat_token` | hat service, after policy allow | `hat_tokens`, `audit_events` | `hat_token_changed` |
+| `refresh_hat_token` | active assigned agent/session | `hat_tokens`, `audit_events` | `hat_token_changed` |
+| `revoke_hat_assignment` | manager, director, security, policy automation | `hat_assignments`, `hat_tokens`, `audit_events` | `hat_assignment_changed`, `hat_token_changed` |
+| `start_schedule_block` | assigned agent/session or scheduler | `work_schedule_blocks`, `agent_sessions` | `schedule_block_changed` |
+| `start_prompt_flow` | assigned agent/session | `prompt_flow_runs`, `prompt_flow_phase_runs` | `prompt_flow_changed` |
+| `record_universal_action` | assigned agent/session, workflow activity, adapter | `universal_action_records`, `mcp_tool_calls`, `audit_events` | `prompt_flow_changed` |
+| `record_action_observation` | adapter, worker, reviewer, assigned agent | `universal_action_observations`, `artifact_links` | `prompt_flow_changed` |
+| `launch_hermes_run` | runtime service or Temporal activity | `hermes_runs`, `agent_sessions`, `audit_events` | `hermes_run_changed` |
+| `record_hermes_run_status` | Hermes/OZ callback, reconciler, platform operator | `hermes_runs`, `artifact_links` | `hermes_run_changed` |
+| `submit_evidence` | implementer, QA, reviewer, adapter | `artifact_links`, `graph_edges`, `audit_events` | `work_item_changed` |
+| `request_gate_review` | implementer, manager, workflow | `gates`, `work_items` | `gate_requested`, `work_item_changed` |
+| `decide_gate` | reviewer hat, not same active implementer assignment | `gate_decisions`, `gates`, `work_items`, `audit_events` | `gate_decided`, `work_item_changed` |
+| `record_memory_event` | memory adapter, assigned agent/session, memory curator | `memory_events`, `graph_edges`, `audit_events` | `memory_event_recorded` |
+| `submit_credential_request` | any authorized hat with anchored work | `credential_requests`, `work_items`, `discussion_anchors` | `credential_request_changed` |
+| `complete_outcome_review` | manager, memory curator, reviewer | `work_items`, `decisions`, optional follow-up `work_items` | `outcome_review_completed` |
+
+## Idempotency
+
+Use deterministic idempotency keys at command boundaries:
+
+```text
+::
+```
+
+Examples:
+
+```text
+launch_hermes_run:work_item_123:prompt_flow_run_456
+record_hermes_run_status:hermes_run_789:callback_abc
+decide_gate:gate_123:reviewer_assignment_456
+```
+
+The idempotency record should store:
+
+- request hash;
+-
command result reference; +- first-seen timestamp; +- last-seen timestamp; +- status; +- error class for terminal failures. + +If the same key appears with a different request hash, reject it as an +idempotency conflict. + +## Outbox and NATS + +CockroachDB transactions should write domain state and `outbox_events` +together. A worker publishes outbox rows to NATS JetStream and marks +them published. + +Subject shape: + +```text +agentic-org... +``` + +Examples: + +```text +agentic-org.dev.work.work_item_changed +agentic-org.dev.hats.hat_assignment_changed +agentic-org.dev.runtime.hermes_run_changed +agentic-org.dev.memory.memory_event_recorded +agentic-org.dev.cluster.hat_system_tick_observed +``` + +Consumers must be idempotent. Replays are normal. + +## Hat-System Projection + +The TypeScript app should consume hat-system CRDs through +`@agentic-org/k8s-hats`. + +V0 should project: + +- `Hat` into `hat_system_projections`; +- `HatBinding` into runtime assignment status; +- `HatSwap` into `signals`, `audit_events`, and graph edges; +- `HatPolicy` into read-only policy diagnostics. + +The projection must be replayable from CRDs and NATS without +double-counting. Organization DB assignments remain authoritative until +an ADR explicitly promotes CRD writeback to live enforcement. + +## Migration and Test Expectations + +Before the first implementation PR lands, define tests for: + +- enum values and legal transitions; +- policy denials and approvals; +- self-approval denial; +- hat token expiry and refresh; +- idempotent command replay; +- outbox publish retry; +- duplicate Hermes callback handling; +- duplicate HatSwap projection handling; +- context pack generation for an anchored work item; +- prompt-flow phase gate behavior. + +Bug fixes must follow the local TDD rule: red test first, then the fix. 
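The idempotency rules above can be sketched as a small in-memory store. This is a minimal sketch, not the project's implementation. `buildIdempotencyKey`, `canonicalJson`, and `InMemoryIdempotencyStore` are hypothetical names. The canonical-JSON fingerprint stands in for the stored request hash; a real store would hash it and persist the row in `idempotency_keys` inside the command transaction.

```typescript
// Minimal in-memory sketch of command-boundary idempotency. All names are
// hypothetical; a real store would persist rows in `idempotency_keys` inside the
// same CockroachDB transaction as the command's state, audit, and outbox writes.

type IdempotencyStatus = "in_progress" | "succeeded" | "failed_terminal";

interface IdempotencyRecord {
  key: string;
  requestFingerprint: string; // stands in for the stored request hash
  status: IdempotencyStatus;
  resultRef?: string;
  errorClass?: string;
  firstSeenAt: Date;
  lastSeenAt: Date;
}

type ClaimResult =
  | { kind: "new"; record: IdempotencyRecord }
  | { kind: "replay"; record: IdempotencyRecord }
  | { kind: "conflict"; key: string };

// Joins the command name and scope identifiers with ":", matching the examples above.
function buildIdempotencyKey(command: string, ...scopeParts: string[]): string {
  if (scopeParts.length === 0) throw new Error("at least one scope part is required");
  for (const part of [command, ...scopeParts]) {
    if (part === "" || part.includes(":")) throw new Error(`invalid key part: "${part}"`);
  }
  return [command, ...scopeParts].join(":");
}

// Canonical JSON (object keys sorted) so field order does not change the fingerprint.
function canonicalJson(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(canonicalJson).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const entries = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${canonicalJson(obj[k])}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value);
}

class InMemoryIdempotencyStore {
  private readonly records = new Map<string, IdempotencyRecord>();

  claim(key: string, request: unknown, now: Date = new Date()): ClaimResult {
    const fingerprint = canonicalJson(request);
    const existing = this.records.get(key);
    if (existing === undefined) {
      const record: IdempotencyRecord = {
        key,
        requestFingerprint: fingerprint,
        status: "in_progress",
        firstSeenAt: now,
        lastSeenAt: now,
      };
      this.records.set(key, record);
      return { kind: "new", record };
    }
    if (existing.requestFingerprint !== fingerprint) {
      // Same key, different request: an idempotency conflict, never a silent overwrite.
      return { kind: "conflict", key };
    }
    existing.lastSeenAt = now;
    return { kind: "replay", record: existing };
  }

  complete(key: string, resultRef: string): void {
    const record = this.records.get(key);
    if (record === undefined) throw new Error(`unknown idempotency key: ${key}`);
    record.status = "succeeded";
    record.resultRef = resultRef;
  }
}
```

Sorting object keys means a retry that serializes the same fields in a different order is treated as a replay. Only a genuinely different request under the same key is rejected as a conflict.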
From 9d1ffa5d63b4fa4d731b992ca8de9fed58fe8add Mon Sep 17 00:00:00 2001 From: Max Chadaev Date: Mon, 25 May 2026 15:48:32 -0400 Subject: [PATCH 02/21] docs: propose package-first organization architecture Co-Authored-By: OpenAI Codex --- agentic-organization/docs/README.md | 1 + .../docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md | 597 ++++++++++++++++++ 2 files changed, 598 insertions(+) create mode 100644 agentic-organization/docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md diff --git a/agentic-organization/docs/README.md b/agentic-organization/docs/README.md index 5b1bca40c4..0c82b379f1 100644 --- a/agentic-organization/docs/README.md +++ b/agentic-organization/docs/README.md @@ -15,6 +15,7 @@ Current documents: - [UI and Observability Concepts](./UI_AND_OBSERVABILITY_CONCEPTS.md) - how humans visualize and operate the Organization across work, agents, hats, runs, pods, clusters, meetings, reports, and evidence. - [Department, Hat, and Tool Inventory](./DEPARTMENT_HAT_TOOL_INVENTORY.md) - the starter department map, hat catalog, tool bundles, approval gates, lifecycle ownership, and high-risk guardrails for the Organization. - [Organization Layer Build Plan](./ORGANIZATION_LAYER_BUILD_PLAN.md) - the service layer, role workspaces, automation loops, state model, UI surfaces, and MVP sequence needed to make each department and hat operational. +- [Technical CA: Package-First Agentic Organization Architecture](./TECHNICAL_CA_PACKAGE_ARCHITECTURE.md) - the proposed TypeScript/NestJS modular-monolith package architecture, event envelope, traceability contract, NATS model, and cluster deployment boundary. - [Work and Release Management OS](./WORK_AND_RELEASE_MANAGEMENT_OS.md) - the custom backlog, project, task, assignment, signal, board, and release workflow product that keeps agent work reliable and visible. 
- [Agent-Native Knowledge Graph and Retrieval](./AGENT_NATIVE_KNOWLEDGE_GRAPH.md) - the graph and retrieval layer linking tasks, discussions, decisions, meetings, docs, artifacts, runs, memories, and evidence into agent-readable context. - [Agent Work Rhythm and Prompt Flows](./AGENT_WORK_RHYTHM_AND_PROMPT_FLOWS.md) - the hat-bound schedule, free-time, review/red-team, reflection, memory maintenance, and deterministic prompt-flow model for agents. diff --git a/agentic-organization/docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md b/agentic-organization/docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md new file mode 100644 index 0000000000..4f46663161 --- /dev/null +++ b/agentic-organization/docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md @@ -0,0 +1,597 @@ +# Technical CA: Package-First Agentic Organization Architecture + +## Status + +Proposal for review. + +## Purpose + +This CA proposes the first implementation architecture for Agentic +Organization as a TypeScript/NestJS modular monolith made of reusable +packages. The Organization OS composes those packages into runnable +processes, but the packages own the actual capability contracts. + +The goal is to build a generic, extensible, event-driven Organization +runtime where every meaningful action is traceable, replay-aware, +policy-checked, and safe to move from in-process execution to a separate +service later. + +## Architecture Decision + +Build Agentic Organization as one product with many packages and a small +set of runtime hosts: + +```text +apps/api +apps/web +apps/workers +apps/temporal-worker +apps/dapr-actors +apps/mcp-gateway + +packages/* +``` + +Packages are the real service boundaries. NestJS apps are composition +hosts. They wire dependency injection, transports, lifecycle hooks, +health checks, process concerns, and adapters. + +The Organization owns: + +- business model; +- commands; +- state transitions; +- policies; +- audit; +- trace contracts; +- event contracts; +- package boundaries. 
+ +The cluster provides: + +- CockroachDB for authoritative Organization state; +- NATS JetStream for event transport, fanout, inboxes, replay, and DLQ; +- Temporal TS for durable long-running workflows; +- Dapr Actors for hot entity-local coordination; +- Hermes for agent reasoning and tool use; +- Hindsight for memory; +- hat-system CRDs for cluster hat enforcement and projection; +- Cilium, SPIRE, Vault, Trust Manager, and External Secrets for network, + identity, trust, and secret delivery; +- ArgoCD for physical GitOps reconciliation. + +None of those cluster runtimes should become a parallel business model. + +## Core Shape + +```text +Runtime host + API controller / worker / MCP handler / Temporal activity / Dapr actor + -> application command service + -> policy check + -> domain state transition + -> CockroachDB transaction + -> authoritative state + -> audit event + -> outbox event + -> idempotency record + -> NATS publish through outbox worker + -> OpenTelemetry spans and logs +``` + +The same command service should be callable from every runtime host. No +adapter should mutate authoritative tables directly. + +## Dependency Direction + +Dependencies point inward. + +```text +apps/* + -> @agentic-org/nest-composition + -> @agentic-org/application + -> @agentic-org/domain + -> ports/interfaces + -> adapter packages + -> external runtimes +``` + +Rules: + +- `@agentic-org/domain` depends on no infrastructure package. +- Domain packages do not import NestJS, Temporal, Dapr, NATS, + Hindsight, Hermes, Kubernetes, OpenZiti, Drizzle, or OpenTelemetry. +- Application packages depend on domain and ports. +- Adapter packages implement ports and may depend on runtime clients. +- Runtime hosts depend on packages; packages do not depend on runtime + hosts. +- Cross-package imports use public exports only. +- No controller, worker entrypoint, Temporal workflow, Dapr actor, or MCP + route contains business rules. 
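One way to enforce the dependency rules above is a custom check over each package's import specifiers. The open questions at the end of this CA list this option alongside Nx module boundaries and dependency-cruiser. The sketch assumes hypothetical `checkImports` and `matchesPackage` helpers. The third-party specifiers (`@nestjs/`, `@temporalio/`, `@dapr/`, `nats`, `@kubernetes/`, `drizzle-orm`, `@opentelemetry/`) are assumptions about which client libraries the adapters will use.

```typescript
// Hypothetical custom boundary rule. The third-party specifiers are assumptions about
// which client libraries adapter packages would use; adjust them to the real lockfile.
const INFRASTRUCTURE: readonly string[] = [
  "@nestjs/", "@temporalio/", "@dapr/", "nats", "@kubernetes/", "drizzle-orm", "@opentelemetry/",
];
const ADAPTERS: readonly string[] = [
  "@agentic-org/state", "@agentic-org/messaging", "@agentic-org/workflows-temporal",
  "@agentic-org/actors-dapr", "@agentic-org/mcp", "@agentic-org/hermes", "@agentic-org/memory",
  "@agentic-org/k8s-hats", "@agentic-org/openziti", "@agentic-org/credential-proxy",
  "@agentic-org/adapters-agentic-services",
];

interface BoundaryRule {
  importer: string;
  forbidden: readonly string[];
  reason: string;
}

const RULES: readonly BoundaryRule[] = [
  {
    importer: "@agentic-org/domain",
    forbidden: [...INFRASTRUCTURE, ...ADAPTERS, "@agentic-org/application"],
    reason: "the domain kernel depends on no infrastructure or outer layer",
  },
  {
    importer: "@agentic-org/application",
    forbidden: [...INFRASTRUCTURE, ...ADAPTERS],
    reason: "application packages depend on domain and ports only",
  },
];

// A trailing "/" marks a scope prefix; otherwise match the package or its subpaths.
function matchesPackage(specifier: string, pkg: string): boolean {
  return pkg.endsWith("/")
    ? specifier.startsWith(pkg)
    : specifier === pkg || specifier.startsWith(`${pkg}/`);
}

function checkImports(importer: string, specifiers: readonly string[]): string[] {
  const violations: string[] = [];
  for (const spec of specifiers) {
    // "@agentic-org/pkg/some/path" reaches past a package's public exports.
    if (spec.startsWith("@agentic-org/") && spec.split("/").length > 2) {
      violations.push(`${importer}: deep import "${spec}" bypasses public exports`);
      continue;
    }
    for (const rule of RULES) {
      if (rule.importer !== importer) continue;
      if (rule.forbidden.some((pkg) => matchesPackage(spec, pkg))) {
        violations.push(`${importer}: "${spec}" is forbidden (${rule.reason})`);
      }
    }
  }
  return violations;
}
```

In CI the specifiers would come from parsed import declarations rather than a hand-written list. The rule table is the one place new layers would be added.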
+ +## Package Layers + +### Layer 0: Domain Kernel + +| Package | Owns | +|---|---| +| `@agentic-org/domain` | entity IDs, value objects, typed enums, state machines, domain events, command names, event names, aggregate contracts | +| `@agentic-org/contracts` | shared DTOs, public schemas, versioned API/event contracts, generated clients when needed | + +The domain kernel should be small and strict. It defines language and +legal transitions. It does not execute side effects. + +### Layer 1: Application and Policy + +| Package | Owns | +|---|---| +| `@agentic-org/application` | command handlers, use cases, transaction orchestration, ports, command result contracts | +| `@agentic-org/policy` | RBAC, hat authority checks, OPA/Rego adapter boundary, policy decisions, denial reasons | +| `@agentic-org/observability` | correlation envelope, OpenTelemetry helpers, required span attributes, trace propagation | + +The application layer is the Organization OS command layer. It is where +the runtime asks the Organization to do something. 
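The domain kernel's role of defining legal transitions can be expressed as a typed transition table. The states below are the V0 `work_item_state` values. The edges are an illustrative assumption, because the V0 contract lists the states without specifying which transitions are legal. `TRANSITIONS`, `canTransition`, and `assertTransition` are hypothetical names.

```typescript
// States copied from the V0 `work_item_state` enum.
const WORK_ITEM_STATES = [
  "new", "triage", "needs_clarification", "ready", "assigned", "in_progress",
  "review_requested", "changes_requested", "approved", "blocked", "done", "canceled",
] as const;

type WorkItemState = (typeof WORK_ITEM_STATES)[number];

// ILLUSTRATIVE edges only: the V0 contract lists the states but not the legal
// transitions. Typing the table as Record<WorkItemState, ...> makes the compiler
// reject a table that forgets a state.
const TRANSITIONS: Record<WorkItemState, readonly WorkItemState[]> = {
  new: ["triage", "canceled"],
  triage: ["needs_clarification", "ready", "canceled"],
  needs_clarification: ["triage", "canceled"],
  ready: ["assigned", "blocked", "canceled"],
  assigned: ["in_progress", "ready", "blocked", "canceled"],
  in_progress: ["review_requested", "blocked", "canceled"],
  review_requested: ["approved", "changes_requested", "blocked"],
  changes_requested: ["in_progress", "canceled"],
  approved: ["done"],
  blocked: ["ready", "in_progress", "canceled"],
  done: [],
  canceled: [],
};

class IllegalTransitionError extends Error {
  constructor(
    readonly from: WorkItemState,
    readonly to: WorkItemState,
  ) {
    super(`illegal work_item_state transition: ${from} -> ${to}`);
    this.name = "IllegalTransitionError";
  }
}

// Guards untyped input (API payloads, DB rows) before it reaches the table.
function isWorkItemState(value: string): value is WorkItemState {
  return (WORK_ITEM_STATES as readonly string[]).includes(value);
}

function canTransition(from: WorkItemState, to: WorkItemState): boolean {
  return TRANSITIONS[from].includes(to);
}

// Used by a command handler after loading authoritative state ("validate lifecycle transition").
function assertTransition(from: WorkItemState, to: WorkItemState): void {
  if (!canTransition(from, to)) throw new IllegalTransitionError(from, to);
}
```

Because the table is a plain value in the domain kernel, the same data can back command-handler validation and the enum and legal-transition tests listed under Migration and Test Expectations without importing any infrastructure.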
+ +### Layer 2: Capability Packages + +| Package | Owns | +|---|---| +| `@agentic-org/work-os` | projects, initiatives, work items, dependencies, blockers, assignments, releases, work signals | +| `@agentic-org/requirements` | ambiguous requirement intake, clarification, BRD lifecycle, maturity state | +| `@agentic-org/documents` | BRDs, CAs, ADRs, design docs, reports, document scope, document approval state | +| `@agentic-org/gates` | readiness, code, QA, security, architecture, memory, release, and outcome gates | +| `@agentic-org/hats` | hat graph, supply, assignment, JWT issuance/refresh/revocation, succession, cooldown, warmup | +| `@agentic-org/assignments` | staffing, agent-to-hat fit, work assignment, reassignment, capacity checks | +| `@agentic-org/prompt-flows` | deterministic prompt-flow definitions, phases, phase gates, reusable procedures | +| `@agentic-org/action-grammar` | universal action grammar, reversibility, observation contracts, action-mode classification | +| `@agentic-org/knowledge-graph` | graph nodes, edges, context packs, retrieval envelopes, provenance and access envelopes | +| `@agentic-org/runtime` | triggers, rules, reaction plans, leases, schedulers, reconcilers, self-healing loops | +| `@agentic-org/ui-projections` | read models for boards, timelines, run views, evidence, reviews, observability, org map | + +Capability packages should be independently testable. They can expose +interfaces and services, but they should not know which process is +calling them. 
+ +### Layer 3: State, Messaging, and Runtime Adapters + +| Package | Owns | +|---|---| +| `@agentic-org/state` | Drizzle schema, migrations, repositories, transactions, outbox, inbox, idempotency, leases | +| `@agentic-org/messaging` | NATS envelope builder, subject builder, JetStream publisher, consumer, DLQ, replay contracts | +| `@agentic-org/workflows-temporal` | Temporal workflow and activity contracts, task queues, workflow clients | +| `@agentic-org/actors-dapr` | Dapr actor interfaces, actor implementations, reminders, actor state projection | +| `@agentic-org/mcp` | MCP schemas, tool registry, preflight checks, policy-checked tool handlers | +| `@agentic-org/hermes` | Hermes session adapter, run adapter, callback contract, run context builder | +| `@agentic-org/memory` | Hindsight adapter, hat-scoped recall/retain/reflect, memory attribution, memory health | +| `@agentic-org/k8s-hats` | generated or checked Hat, HatBinding, HatSwap, HatPolicy types, informers, projection decoding | +| `@agentic-org/openziti` | OpenZiti transport adapter, identity/config access, connectivity checks | +| `@agentic-org/credential-proxy` | credential request adapter, scoped credential use, audit hooks | +| `@agentic-org/adapters-agentic-services` | temporary wrappers around reused `agentic-services` primitives | + +Adapters are replaceable. The Organization should be able to run a V0 +slice with in-process fakes, then swap in Temporal, Dapr, Hermes, +Hindsight, Kubernetes, and NATS adapters behind the same ports. 
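The port-and-fake pattern described above can be sketched with the Hermes launch path. `LaunchHermesRunPort` and `FakeHermesRunAdapter` echo names used elsewhere in this CA, but their request and result shapes here are assumptions; the real Hermes run contract is not defined in this document. The service builds its idempotency key in the shape used by the V0 idempotency examples (`launch_hermes_run:work_item_123:prompt_flow_run_456`).

```typescript
// Hypothetical request/result shapes; the real Hermes run contract is not defined here.
interface LaunchHermesRunRequest {
  idempotencyKey: string;
  workItemId: string;
  promptFlowRunId: string;
  hatAssignmentId: string;
}

interface LaunchHermesRunResult {
  hermesRunId: string;
  state: "requested" | "starting"; // subset of the V0 `hermes_run_state` enum
}

// Narrow port: one capability, no "RuntimeManager" grab bag.
interface LaunchHermesRunPort {
  launch(request: LaunchHermesRunRequest): Promise<LaunchHermesRunResult>;
}

// In-process fake for unit tests: deterministic IDs, idempotent by key,
// and it records every real launch so tests can assert on side effects.
class FakeHermesRunAdapter implements LaunchHermesRunPort {
  readonly launched: LaunchHermesRunRequest[] = [];
  private readonly byKey = new Map<string, LaunchHermesRunResult>();

  async launch(request: LaunchHermesRunRequest): Promise<LaunchHermesRunResult> {
    const existing = this.byKey.get(request.idempotencyKey);
    if (existing !== undefined) return existing;
    this.launched.push(request);
    const result: LaunchHermesRunResult = {
      hermesRunId: `fake_hermes_run_${this.launched.length}`,
      state: "requested",
    };
    this.byKey.set(request.idempotencyKey, result);
    return result;
  }
}

// Application-layer service: depends on the port only, never on a concrete adapter.
class LaunchHermesRunService {
  constructor(private readonly hermes: LaunchHermesRunPort) {}

  execute(workItemId: string, promptFlowRunId: string, hatAssignmentId: string): Promise<LaunchHermesRunResult> {
    const idempotencyKey = `launch_hermes_run:${workItemId}:${promptFlowRunId}`;
    return this.hermes.launch({ idempotencyKey, workItemId, promptFlowRunId, hatAssignmentId });
  }
}
```

Swapping in a real adapter means binding a different class to the same port in the composition layer. `LaunchHermesRunService` does not change, which is the substitution property the SOLID rules ask for.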
+ +### Layer 4: Runtime Hosts + +| Runtime host | Responsibility | +|---|---| +| `apps/api` | REST/OpenAPI, internal APIs, command dispatch, read queries, auth guards | +| `apps/web` | human operations console, boards, timelines, org map, observability, review center | +| `apps/workers` | outbox publisher, schedulers, NATS consumers, reconcilers, projection builders | +| `apps/temporal-worker` | Temporal workers and activities that call Organization commands | +| `apps/dapr-actors` | Dapr actor host for hot state and reminders | +| `apps/mcp-gateway` | MCP gateway, agent context resolution, preflight checks, tool execution | + +Runtime hosts are allowed to be deployed separately. They are not +separate business services yet. + +## NestJS Composition + +NestJS should compose packages into modules: + +```text +OrganizationModule + IdentityModule + PolicyModule + WorkOsModule + RequirementsModule + DocumentsModule + GatesModule + HatsModule + AssignmentsModule + PromptFlowModule + ActionGrammarModule + KnowledgeGraphModule + RuntimeModule + MessagingModule + McpGatewayModule + HermesModule + MemoryModule + K8sHatsModule + ObservabilityModule + UiProjectionModule +``` + +Module providers should bind ports to implementations: + +```text +HatAssignmentRepository -> DrizzleHatAssignmentRepository +EventPublisher -> OutboxEventPublisher +HermesRunPort -> HermesRunAdapter or FakeHermesRunAdapter +MemoryPort -> HindsightMemoryAdapter or FakeMemoryAdapter +HatSystemPort -> KubernetesHatSystemAdapter or ReadOnlyFakeHatSystemAdapter +``` + +Business services should depend on ports, not concrete adapters. + +## SOLID Rules + +### Single Responsibility + +Each package owns one capability family. If a package needs to know too +much about another package's internals, the boundary is wrong. 
+ +### Open/Closed + +New hats, prompt flows, workflow types, MCP tools, memory strategies, and +runtime adapters should be added through registries and package exports, +not by editing central switch statements. + +### Liskov Substitution + +Fake adapters, local adapters, and cluster adapters must satisfy the same +ports. If V0 runs with a fake Hermes adapter, the real Hermes adapter +must be swappable without changing command handlers. + +### Interface Segregation + +Ports should be narrow: + +- `ReserveHatPort`, not `HatEverythingService`; +- `PublishOutboxEventPort`, not `MessagingService`; +- `RecallMemoryPort`, not `MemoryPlatform`; +- `LaunchHermesRunPort`, not `RuntimeManager`. + +### Dependency Inversion + +Application services define what they need. Infrastructure packages +implement it. Runtime hosts bind implementations. + +## Event-Driven Contract + +All state transitions are event-producing commands. + +CockroachDB stores authoritative state, audit, idempotency, and outbox. +NATS JetStream carries event distribution, inboxes, live UI updates, +replayable integration streams, and DLQs. Logs, traces, and metrics are +evidence. They are not business truth. 
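The Open/Closed rule stated earlier (registries instead of central switch statements) might look like this for MCP tools. `Registry`, `McpToolHandler`, the tool names, and the hat names are all hypothetical and chosen only for the example.

```typescript
// Hypothetical registry illustrating the Open/Closed rule; not an existing package API.
class Registry<T> {
  private readonly entries = new Map<string, T>();

  constructor(private readonly kind: string) {}

  register(id: string, entry: T): void {
    if (this.entries.has(id)) throw new Error(`${this.kind} "${id}" is already registered`);
    this.entries.set(id, entry);
  }

  resolve(id: string): T {
    const entry = this.entries.get(id);
    if (entry === undefined) throw new Error(`unknown ${this.kind} "${id}"`);
    return entry;
  }

  ids(): string[] {
    return Array.from(this.entries.keys()).sort();
  }
}

// Hypothetical MCP tool shape; `requiredHat` would feed the policy preflight check.
interface McpToolHandler {
  requiredHat: string;
  handle(input: Readonly<Record<string, unknown>>): string;
}

// Each capability package exports a registration function instead of adding a
// `case` to a central switch. Tool names and hats are invented for the example.
function registerWorkOsTools(registry: Registry<McpToolHandler>): void {
  registry.register("work.get_item", {
    requiredHat: "implementer",
    handle: (input) => `work item ${String(input["id"])}`,
  });
}

function registerMemoryTools(registry: Registry<McpToolHandler>): void {
  registry.register("memory.recall", {
    requiredHat: "memory_curator",
    handle: (input) => `recall for ${String(input["query"])}`,
  });
}

const tools = new Registry<McpToolHandler>("MCP tool");
registerWorkOsTools(tools);
registerMemoryTools(tools);

// Dispatch is a lookup, so adding a tool never edits this function.
function callTool(name: string, input: Readonly<Record<string, unknown>>): string {
  return tools.resolve(name).handle(input);
}
```

Duplicate registration fails loudly, so two packages cannot silently claim the same tool name, and an unknown name fails instead of falling through a default branch.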
+### Canonical Event Envelope + +All events published through NATS should use one envelope: + +```ts +type AgenticEventEnvelope<TPayload> = { +  eventId: string; +  eventType: AgenticEventType; +  schemaVersion: string; +  occurredAt: string; +  source: { +    service: string; +    instanceId?: string; +    workloadSpiffeId?: string; +  }; +  scope: { +    organizationId: string; +    projectId?: string; +    initiativeId?: string; +    workItemId?: string; +    runId?: string; +  }; +  actor: { +    agentId?: string; +    hatAssignmentId?: string; +    serviceId?: string; +  }; +  aggregate: { +    type: string; +    id: string; +    version: number; +  }; +  trace: { +    traceparent: string; +    traceId: string; +    spanId?: string; +    correlationId: string; +    causationId?: string; +    commandId?: string; +    idempotencyKey: string; +  }; +  policy?: { +    policyVersion?: string; +    decisionId?: string; +  }; +  replay?: { +    isReplay: boolean; +    originalEventId?: string; +    replayRequestId?: string; +  }; +  payload: TPayload; +}; +``` + +No app should publish raw NATS payloads directly. Publishing should go +through `@agentic-org/messaging`. + +### Subject Convention + +Use one Organization subject family: + +```text +agentic-org.... +agentic-org...inbox.agent. +agentic-org...inbox.hat. +agentic-org...ui. +agentic-org...dlq. +agentic-org...cluster.hats. +``` + +The existing hat-system NATS subjects can be consumed by a bridge +consumer and republished into the Organization subject family with the +canonical envelope. Do not require the Go hat-system operator to change +before V0 can consume it. + +### Transactional Outbox and Inbox + +Every command writes these in one transaction: + +```text +authoritative state +audit_events +outbox_events +idempotency_keys +``` + +Every NATS consumer writes an inbox receipt before side effects: + +```text +event_id +consumer_name +aggregate_type +aggregate_id +aggregate_version +payload_hash +first_seen_at +processed_at +result +``` + +Consumers dedupe by `eventId + consumerName`.
Commands dedupe by +deterministic `idempotencyKey`. External side effects must either be +natively idempotent or wrapped by a command that stores the external +request/result. + +### Stream and Consumer Manifests + +Every stream and durable consumer should declare: + +- owner package; +- subject pattern; +- durable name; +- retention policy; +- ordering key; +- ack wait; +- max deliveries; +- backoff; +- replay authorization; +- DLQ subject; +- schema versions accepted; +- compatibility rule; +- SLO for consumer lag. + +This can start as generated config from `@agentic-org/messaging` and +later become Kubernetes manifests. + +## Traceability Contract + +Every command, event, adapter call, and artifact must carry a correlation +chain. + +Required trace fields: + +```text +traceparent +trace_id +span_id +correlation_id +causation_id +command_id +idempotency_key +event_id +agent_id +hat_assignment_id +project_id +initiative_id +work_item_id +run_id +policy_decision_id +``` + +Required OpenTelemetry span attributes: + +```text +agentic.event.id +agentic.event.type +agentic.command.id +agentic.correlation.id +agentic.causation.id +agentic.idempotency.key +agentic.agent.id +agentic.hat.assignment.id +agentic.project.id +agentic.initiative.id +agentic.work_item.id +agentic.run.id +agentic.policy.decision_id +nats.subject +nats.stream +nats.consumer +k8s.namespace.name +k8s.pod.name +``` + +`@agentic-org/observability` should provide helpers that make these +fields easy to attach and hard to skip. + +## Audit, Event Store, and Replay + +Use clear language: + +- `audit_events` are compliance and evidence records. +- `outbox_events` are integration events waiting to publish. +- `event_store` is the append-only domain history if we decide to keep + one beyond audit/outbox. +- NATS streams are transport and replay surfaces, not the source of + truth. + +Replay should: + +1. create a `ReplayRequested` event; +2. require approval for side-effecting domains; +3. 
preserve original event IDs inside the replay envelope; +4. mark replay spans and events; +5. write replay outcomes to audit; +6. never silently perform external side effects twice. + +## Cluster Deployment Boundary + +Agentic Organization should deploy as a `full-ai-cluster` consumer +workload: + +```text +full-ai-cluster/k8s/applications/agentic-organization/ + Application.yaml + namespace.yaml + api + web + workers + temporal-worker + dapr-actors + mcp-gateway + ExternalSecret refs + CiliumNetworkPolicy + ServiceAccount/RBAC +``` + +Current cluster readiness: + +- CockroachDB exists as the distributed SQL substrate. +- NATS exists with JetStream enabled. +- Temporal and Dapr are present, but their Organization-specific + persistence/components still need wiring. +- Hindsight is wired as the Hermes memory system. +- Hermes exists as a placeholder deployment until the real image is + available. +- hat-system CRDs and policies exist; the operator image/runtime still + needs completion. +- Cilium, SPIRE, Vault, Trust Manager, and External Secrets provide the + security substrate. + +The CA should not block V0 on every runtime being production-ready. It +should define ports and fakes first, then swap in cluster adapters as +each substrate becomes live. + +## V0 Build Sequence + +1. Create package skeletons for: + - `@agentic-org/domain`; + - `@agentic-org/application`; + - `@agentic-org/state`; + - `@agentic-org/policy`; + - `@agentic-org/messaging`; + - `@agentic-org/observability`; + - `@agentic-org/work-os`; + - `@agentic-org/hats`; + - `@agentic-org/mcp`; + - `@agentic-org/hermes`; + - `@agentic-org/memory`; + - `@agentic-org/ui-projections`. +2. Implement the canonical command context, event envelope, typed enums, + and idempotency key builder. +3. Implement the first CockroachDB schema and Drizzle migrations for the + V0 executable contract. +4. 
Implement command handlers for: + - submit capability request; + - triage capability request; + - reserve hat; + - issue hat token; + - start prompt flow; + - launch Hermes run; + - submit evidence; + - decide gate; + - complete outcome review. +5. Use fake adapters for Hermes, Hindsight, Dapr, Temporal, and + hat-system. +6. Add NATS outbox publisher and one consumer after command tests pass. +7. Add the NestJS API and worker hosts. +8. Add UI projections for work board, review center, and evidence + timeline. +9. Add real cluster adapters one at a time. + +## Extraction Path + +Do not split by domain first. Split only after package contracts are +stable and contract-tested. + +Extraction pattern: + +```text +package interface + -> in-process provider + -> remote adapter behind the same interface + -> ArgoCD-managed service +``` + +The first extraction candidates are runtime hosts already implied by the +architecture: + +- MCP gateway; +- Temporal worker; +- Dapr actor host; +- Hermes run adapter; +- memory adapter; +- k8s hat projection worker. + +Keep Organization Kernel, Work OS, Hat Graph, Policy, and command +handling together until their contracts are proven. + +## Contract Gates + +Before a package can be consumed by the OS, it needs: + +- public export review; +- dependency-boundary check; +- typed enum/state-machine tests; +- policy allow/deny tests where relevant; +- event envelope tests; +- idempotency tests for side-effecting commands; +- OpenTelemetry field coverage tests; +- outbox/inbox tests for event-producing commands; +- contract tests for every adapter port; +- README section documenting ownership, inputs, outputs, events, and + failure modes. + +## Open Questions + +- Should `@agentic-org/contracts` be separate from `@agentic-org/domain` + from day one, or split after the first generated client exists? +- Should event store be a dedicated table in V0, or are audit plus + outbox enough until replay requirements harden? 
+- Should the hat-system bridge republish into the canonical + `agentic-org.*` subject family, or should Organization consumers read + both subject families directly? +- Should package dependency rules be enforced with Nx module boundaries, + dependency-cruiser, or a custom lint rule? +- Which first fake adapter should be replaced with a real cluster + adapter: NATS, Hindsight, Hermes, Dapr, Temporal, or k8s hats? From 383e47dbf07a1cd258e618a3969a188995549f18 Mon Sep 17 00:00:00 2001 From: Max Chadaev Date: Mon, 25 May 2026 15:50:19 -0400 Subject: [PATCH 03/21] docs: ground organization CA in cluster runtime Co-Authored-By: OpenAI Codex --- .../docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md | 156 +++++++++++++++++- 1 file changed, 150 insertions(+), 6 deletions(-) diff --git a/agentic-organization/docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md b/agentic-organization/docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md index 4f46663161..4774ffe943 100644 --- a/agentic-organization/docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md +++ b/agentic-organization/docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md @@ -484,17 +484,115 @@ full-ai-cluster/k8s/applications/agentic-organization/ ServiceAccount/RBAC ``` +### Native `full-ai-cluster` Binding + +The application should treat `full-ai-cluster` as its native runtime +environment. Local fakes are useful for tests, but the real adapter +contracts should point at the services that already exist in the cluster +tree. 
+ +| Adapter package | Cluster dependency | Expected in-cluster target | +|---|---|---| +| `@agentic-org/state` | CockroachDB ArgoCD app | `cockroachdb-public.cockroachdb.svc.cluster.local:26257` | +| `@agentic-org/messaging` | NATS ArgoCD app with JetStream enabled | `nats.nats.svc.cluster.local:4222` | +| `@agentic-org/workflows-temporal` | Temporal ArgoCD app | `temporal-frontend.temporal.svc.cluster.local:7233` | +| `@agentic-org/actors-dapr` | Dapr control plane | Dapr sidecar plus `dapr-system` placement service | +| `@agentic-org/memory` | Hindsight OCI Helm chart | `http://hindsight.hindsight.svc.cluster.local` | +| `@agentic-org/hermes` | Hermes deployment/service | `http://hermes.hermes.svc.cluster.local` once replicas are enabled | +| `@agentic-org/openziti` | OZ/OpenZiti controller app | `https://ziti-controller.openziti.svc.cluster.local:443` | +| `@agentic-org/k8s-hats` | hat-system CRDs and operator | Kubernetes API watches plus `zeta.society.hats.>` bridge input | +| `@agentic-org/observability` | Alloy, Tempo, Loki, Mimir, kube-prometheus-stack | OTLP traces to Alloy/Tempo, logs to Loki, metrics to Prometheus/Mimir | +| `@agentic-org/policy` | OPA Gatekeeper and Organization policy package | in-process policy first, OPA bundle/constraint adapters later | + +Adapter configuration should use environment variables and Kubernetes +Secrets/ExternalSecrets, but the domain package should never see those +values. The Nest composition layer binds configuration into adapter +ports. + +Minimum runtime environment contract: + +```text +AGENTIC_ORG_ENV +AGENTIC_ORG_ID +COCKROACH_URL +NATS_URL +TEMPORAL_ADDRESS +HINDSIGHT_URL +HERMES_URL +OZ_CONTROLLER_URL +OTEL_EXPORTER_OTLP_ENDPOINT +HAT_SYSTEM_NAMESPACE +``` + +Secrets such as database credentials, NATS credentials, OpenZiti +credentials, LLM provider keys, and credential-proxy tokens must come +from Vault through External Secrets or another approved cluster secret +path. 
They should not live in plain Kubernetes manifests and should not +be baked into the Agentic Organization image. + +### ArgoCD Sync Wave + +Agentic Organization should not land before its substrates. The current +cluster ordering puts hat-system CRDs before data consumers, data planes +at wave `0`, Hindsight and Temporal at wave `10`, and Hermes at wave +`20`. + +Recommended deployment split: + +| Application | Wave | Purpose | +|---|---:|---| +| `agentic-organization-contracts` | `-5` or `0` | optional future CRDs, NATS stream definitions, schema/config resources that other apps may consume | +| `agentic-organization` | `30` | API, web, workers, Temporal worker, Dapr actor host, MCP gateway | + +If V0 ships no CRDs and only consumes existing services, one +`agentic-organization` app at wave `30` is enough. If it later adds CRDs +or cluster-wide policies, split those resources into the earlier +contracts app rather than forcing the main runtime app to reconcile +early. + +### Kubernetes Workload Shape + +The first ArgoCD app should deploy one namespace and several workloads +from the same image or image family: + +| Workload | Kubernetes shape | Notes | +|---|---|---| +| API | Deployment + ClusterIP Service | REST/OpenAPI, internal command API, read API | +| Web | Deployment + ClusterIP Service/Gateway route | operations console | +| Workers | Deployment | outbox publisher, reconcilers, schedulers, NATS consumers | +| Temporal worker | Deployment | workflow and activity workers only | +| Dapr actor host | Deployment with Dapr annotations | actor endpoints and reminders | +| MCP gateway | Deployment + ClusterIP Service | Hermes-facing governed tool surface | + +All workloads need: + +- service account scoped to only required Kubernetes reads/writes; +- CiliumNetworkPolicy egress only to required namespaces/services; +- OpenTelemetry instrumentation enabled by default; +- readiness checks that include dependency health for the adapter set + the process actually uses; 
+- structured logs with the canonical trace envelope fields; +- pod labels for app, package host, version, and Organization + environment. + +The MCP gateway and worker processes are the highest-risk egress points. +They should get the narrowest network policy and credential scope first. + +### Runtime Readiness Mapping + Current cluster readiness: - CockroachDB exists as the distributed SQL substrate. -- NATS exists with JetStream enabled. +- NATS exists with JetStream enabled and Longhorn-backed file storage. - Temporal and Dapr are present, but their Organization-specific persistence/components still need wiring. -- Hindsight is wired as the Hermes memory system. -- Hermes exists as a placeholder deployment until the real image is - available. -- hat-system CRDs and policies exist; the operator image/runtime still - needs completion. +- Hindsight is wired as the Hermes memory system and currently uses + bundled PostgreSQL until an external CockroachDB-backed deployment is + proven. +- Hermes exists as a placeholder deployment with `replicas: 0` until the + real image is available. +- hat-system CRDs and policies exist; the operator deployment is still a + scaffold until the image is built and replicas are enabled. - Cilium, SPIRE, Vault, Trust Manager, and External Secrets provide the security substrate. @@ -502,6 +600,52 @@ The CA should not block V0 on every runtime being production-ready. It should define ports and fakes first, then swap in cluster adapters as each substrate becomes live. 
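The minimum runtime environment contract listed under Native `full-ai-cluster` Binding could be validated by the Nest composition layer before any adapter is constructed, so the domain package never reads environment variables. The variable names come from that contract. `loadRuntimeConfig` and the per-host `needed` list are assumptions. Secrets are deliberately left out because they arrive through Vault and External Secrets.

```typescript
// Variable names come from the minimum runtime environment contract; the loader
// itself is a hypothetical sketch. Secrets are deliberately not in this list.
const RUNTIME_VARS = [
  "AGENTIC_ORG_ENV",
  "AGENTIC_ORG_ID",
  "COCKROACH_URL",
  "NATS_URL",
  "TEMPORAL_ADDRESS",
  "HINDSIGHT_URL",
  "HERMES_URL",
  "OZ_CONTROLLER_URL",
  "OTEL_EXPORTER_OTLP_ENDPOINT",
  "HAT_SYSTEM_NAMESPACE",
] as const;

type RuntimeVar = (typeof RUNTIME_VARS)[number];

// Each host asks only for the variables its adapter set needs; the return type
// exposes exactly those keys.
function loadRuntimeConfig<K extends RuntimeVar>(
  env: Readonly<Record<string, string | undefined>>,
  needed: readonly K[],
): Readonly<Record<K, string>> {
  const missing = needed.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
  if (missing.length > 0) {
    // Report every missing variable at once so a misconfigured pod fails with one clear message.
    throw new Error(`missing runtime configuration: ${missing.join(", ")}`);
  }
  const config = {} as Record<K, string>;
  for (const name of needed) {
    config[name] = (env[name] as string).trim();
  }
  return Object.freeze(config);
}
```

A worker host might call `loadRuntimeConfig(process.env, ["AGENTIC_ORG_ENV", "AGENTIC_ORG_ID", "COCKROACH_URL", "NATS_URL"])` at startup and bind the result into its adapters. Which variables each host needs is not specified in this CA.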
+### Bootstrap and Dev Parity + +The same package architecture should run in three modes: + +| Mode | Purpose | Runtime adapters | +|---|---|---| +| unit/test | package and command tests | in-memory/fake adapters | +| local dev cluster | k3d/K3S parity with `full-ai-cluster` apps | real NATS/Cockroach when available, fake Hermes/hat-system if needed | +| full cluster | production-like AI cluster | real CockroachDB, NATS, Hindsight, Hermes, OpenZiti, hat-system, Temporal, Dapr | + +Do not create a Docker Compose architecture that diverges from +`full-ai-cluster`. Local development can use fakes or a dev cluster, but +the real deployment contract is ArgoCD on the cluster. + +### Hat-System Integration Path + +The TypeScript side should integrate with the existing hat-system in +three steps: + +1. Read CRDs and consume `zeta.society.hats.>` ticks through + `@agentic-org/k8s-hats`. +2. Bridge HatSwap ticks into canonical `agentic-org.*` events with + trace fields, dedupe keys, and Organization scope when a matching + assignment exists. +3. After the Organization assignment state machine is stable, allow + approved Organization assignments to create HatBinding proposals. + +The Organization DB remains the business source of truth. Hat-system +proves runtime enforcement and cluster-observed state. + +### Observability Integration Path + +`@agentic-org/observability` should produce OTLP traces and structured +logs that the current Alloy/Tempo/Loki/Mimir stack can ingest. The +package should standardize: + +- trace propagation across HTTP, NATS, Temporal activities, Dapr actor + calls, MCP tools, Hermes callbacks, Hindsight calls, and Kubernetes + watches; +- JSON log fields matching the canonical event envelope; +- Prometheus metrics for outbox lag, NATS consumer lag, DLQ count, + command latency, policy denial count, adapter health, and projection + staleness; +- links from UI evidence records to trace IDs, log queries, run IDs, + event IDs, and artifacts. 
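A JSON log line that Loki can query by trace ID might be produced as below. The sketch assumes snake_case keys and a particular subset of envelope fields; `LogEnvelopeFields`, `toStructuredLogLine`, and every metric name in `OrganizationMetric` are hypothetical, chosen only to show log keys and metric names kept in one typed place instead of scattered strings.

```typescript
// Hypothetical subset of the canonical trace envelope carried on every log line.
type LogEnvelopeFields = {
  traceId: string;
  correlationId: string;
  causationId: string;
  commandId: string;
  organizationId: string;
  eventId?: string;
  agentId?: string;
  hatAssignmentId?: string;
};

type LogLevel = "debug" | "info" | "warn" | "error";

// Hypothetical metric names, centralized so no package invents its own spelling.
const OrganizationMetric = {
  outboxLagSeconds: "agentic_org_outbox_lag_seconds",
  natsConsumerLag: "agentic_org_nats_consumer_lag",
  dlqMessages: "agentic_org_dlq_messages",
  commandLatencySeconds: "agentic_org_command_latency_seconds",
  policyDenials: "agentic_org_policy_denials_total",
} as const;

function toStructuredLogLine(
  level: LogLevel,
  message: string,
  envelope: LogEnvelopeFields,
  timestamp: Date = new Date(),
): string {
  if (envelope.traceId.trim() === "") {
    throw new Error("structured log lines must carry a trace ID");
  }
  // Absent optional fields are written as null so every line has the same keys.
  return JSON.stringify({
    ts: timestamp.toISOString(),
    level,
    msg: message,
    trace_id: envelope.traceId,
    correlation_id: envelope.correlationId,
    causation_id: envelope.causationId,
    command_id: envelope.commandId,
    organization_id: envelope.organizationId,
    event_id: envelope.eventId ?? null,
    agent_id: envelope.agentId ?? null,
    hat_assignment_id: envelope.hatAssignmentId ?? null,
  });
}
```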
+ ## V0 Build Sequence 1. Create package skeletons for: From 5f6598cb9c1180143f42d297d206b65fc82d7272 Mon Sep 17 00:00:00 2001 From: Max Chadaev Date: Mon, 25 May 2026 15:53:08 -0400 Subject: [PATCH 04/21] docs: define organization event automation Co-Authored-By: OpenAI Codex --- .../docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md | 79 ++++++++++++++++++- 1 file changed, 76 insertions(+), 3 deletions(-) diff --git a/agentic-organization/docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md b/agentic-organization/docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md index 4774ffe943..1e94acc6f2 100644 --- a/agentic-organization/docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md +++ b/agentic-organization/docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md @@ -324,6 +324,72 @@ type AgenticEventEnvelope = { No app should publish raw NATS payloads directly. Publishing should go through `@agentic-org/messaging`. +### Event-to-Automation Contract + +Agentic Organization should behave like an event-driven operating system. +State changes do not merely update boards. They wake up the Organization. + +The required runtime path is: + +```text +command accepted + -> state transition persisted + -> domain event written to outbox + -> outbox publishes canonical NATS event + -> rule evaluation consumes event + -> reaction plan is created + -> reaction executor validates policy, leases, budget, and hat supply + -> follow-up commands create reviews, QA work, assignments, runs, reports, + meetings, escalations, release tasks, or no-op decisions +``` + +Rules never mutate state directly. They propose a `ReactionPlan`. The +reaction executor turns that plan into normal Organization commands, so +automation follows the same policy, audit, idempotency, and trace path as +human or agent actions. 
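The rule-to-plan step can be sketched as a pure function. Assumptions: `OrganizationEvent`, `ReactionPlan`, `Rule`, `V0_RULES`, and `evaluateRules` are hypothetical shapes, and the idempotency key format is illustrative. The event types and follow-up command names are taken from the minimum event automations table in this document. Because evaluation only returns plans, a redelivered event yields identical plans and keys, which the normal command path can dedupe.

```typescript
// Hypothetical minimal event shape consumed by rule evaluation.
type OrganizationEvent = {
  eventId: string;
  eventType: string;
  workItemId: string;
  traceId: string;
  correlationId: string;
};

type PlannedCommand = {
  command: string;
  idempotencyKey: string;
};

type ReactionPlan = {
  planId: string;
  triggeringEventId: string;
  ruleId: string;
  ruleVersion: number;
  commands: readonly PlannedCommand[];
  traceId: string;
  correlationId: string;
};

type Rule = {
  id: string;
  version: number;
  matches: (event: OrganizationEvent) => boolean;
  // Names the follow-up commands; it must not perform them.
  plan: (event: OrganizationEvent) => readonly string[];
};

const V0_RULES: readonly Rule[] = [
  {
    id: "ready-work-assignment",
    version: 1,
    matches: (event) => event.eventType === "work_item.ready",
    plan: () => ["reserve_hat", "assign_work", "start_schedule_block"],
  },
  {
    id: "review-staffing",
    version: 1,
    matches: (event) => event.eventType === "work_item.review_requested",
    plan: () => ["reserve_hat", "request_gate_review", "send_inbox_signal"],
  },
];

// Pure evaluation: returns plans and never reads or writes Organization state.
function evaluateRules(event: OrganizationEvent, rules: readonly Rule[]): ReactionPlan[] {
  return rules
    .filter((rule) => rule.matches(event))
    .map((rule) => {
      const planId = `${event.eventId}:${rule.id}@${rule.version}`;
      return {
        planId,
        triggeringEventId: event.eventId,
        ruleId: rule.id,
        ruleVersion: rule.version,
        // Keys derive only from event, rule, and position, so a redelivered
        // event produces the same keys and downstream commands can dedupe.
        commands: rule.plan(event).map((command, index) => ({
          command,
          idempotencyKey: `${planId}#${index}:${command}`,
        })),
        traceId: event.traceId,
        correlationId: event.correlationId,
      };
    });
}
```

The executor, not the rule, would then run each planned command through the standard policy, lease, budget, and hat-supply checks.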
+ +Minimum event automations: + +| Event | Rule result | Follow-up command examples | +|---|---|---| +| `work_item.ready` | work needs execution or review assignment | `reserve_hat`, `assign_work`, `start_schedule_block` | +| `work_item.review_requested` | reviewer hat must be staffed | `reserve_hat`, `request_gate_review`, `send_inbox_signal` | +| `gate.code.approved` | work can move to QA if QA is required | `create_qa_work_item`, `reserve_hat`, `request_gate_review` | +| `gate.qa.approved` | work can move toward delivery/release | `create_release_task`, `request_delivery_review` | +| `gate.changes_requested` | implementer needs a bounded rework loop | `assign_rework`, `start_prompt_flow`, `send_inbox_signal` | +| `work_item.blocked` | blocker owner and escalation path required | `create_blocker`, `notify_manager`, `schedule_blocker_review` | +| `hermes_run.heartbeat_late` | runtime health needs reconciliation | `create_platform_incident`, `reconcile_run`, `notify_platform_operator` | +| `memory.gap_detected` | memory/process improvement enters backlog | `submit_capability_request`, `request_memory_review` | +| `credential_request.submitted` | security review is mandatory | `request_security_gate`, `send_inbox_signal` | +| `release.ready` | delivery gate and evidence check required | `request_delivery_review`, `verify_release_evidence` | + +The first V0 rule catalog should include: + +- ready work assignment; +- review staffing; +- QA staffing after code approval; +- delivery review after QA signoff; +- blocked work escalation; +- stale review escalation; +- late Hermes heartbeat incident creation; +- memory gap follow-up; +- credential expansion security review. + +Every automation must record: + +- triggering event ID; +- matched rule IDs and versions; +- reaction plan ID; +- policy decision ID; +- commands executed; +- commands skipped and why; +- idempotency keys; +- resulting event IDs; +- trace ID and correlation ID. 
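One way to make the required audit fields enforceable is a completeness check over a single record per automation. This is a sketch under assumptions: `AutomationRecord` and `automationRecordProblems` are hypothetical, and the flat record shape is illustrative. It reports missing IDs, skipped commands without a reason, and missing or duplicated idempotency keys.

```typescript
type ExecutedCommand = {
  command: string;
  idempotencyKey: string;
  resultingEventIds: readonly string[];
};

type SkippedCommand = {
  command: string;
  idempotencyKey: string;
  reason: string;
};

// Hypothetical audit record written once per automation run.
type AutomationRecord = {
  triggeringEventId: string;
  matchedRules: readonly { ruleId: string; version: number }[];
  reactionPlanId: string;
  policyDecisionId: string;
  executed: readonly ExecutedCommand[];
  skipped: readonly SkippedCommand[];
  traceId: string;
  correlationId: string;
};

// Returns human-readable problems; an empty array means the record is auditable.
function automationRecordProblems(record: AutomationRecord): string[] {
  const problems: string[] = [];
  const requiredIds: ReadonlyArray<[string, string]> = [
    ["triggeringEventId", record.triggeringEventId],
    ["reactionPlanId", record.reactionPlanId],
    ["policyDecisionId", record.policyDecisionId],
    ["traceId", record.traceId],
    ["correlationId", record.correlationId],
  ];
  for (const [name, value] of requiredIds) {
    if (value.trim() === "") problems.push(`missing ${name}`);
  }
  if (record.matchedRules.length === 0) problems.push("missing matched rules");
  for (const skipped of record.skipped) {
    if (skipped.reason.trim() === "") problems.push(`skipped ${skipped.command} without a reason`);
  }
  // Executed and skipped commands share one idempotency key space.
  const keys = [...record.executed, ...record.skipped].map((entry) => entry.idempotencyKey);
  if (keys.some((key) => key.trim() === "")) problems.push("command without an idempotency key");
  if (new Set(keys).size !== keys.length) problems.push("duplicate idempotency keys");
  return problems;
}
```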
+ +This makes automation inspectable from the affected project, initiative, +work item, gate, agent, hat assignment, run, and UI timeline. + ### Subject Convention Use one Organization subject family: @@ -678,10 +744,13 @@ package should standardize: 5. Use fake adapters for Hermes, Hindsight, Dapr, Temporal, and hat-system. 6. Add NATS outbox publisher and one consumer after command tests pass. -7. Add the NestJS API and worker hosts. -8. Add UI projections for work board, review center, and evidence +7. Add the first rule catalog and reaction executor for ready work, + review staffing, QA staffing, blocker escalation, and late run + incidents. +8. Add the NestJS API and worker hosts. +9. Add UI projections for work board, review center, and evidence timeline. -9. Add real cluster adapters one at a time. +10. Add real cluster adapters one at a time. ## Extraction Path @@ -720,6 +789,10 @@ Before a package can be consumed by the OS, it needs: - policy allow/deny tests where relevant; - event envelope tests; - idempotency tests for side-effecting commands; +- rule evaluation tests that prove a state event creates the expected + reaction plan without mutating state directly; +- reaction executor tests that prove automation uses normal commands, + emits audit/outbox events, and dedupes retries; - OpenTelemetry field coverage tests; - outbox/inbox tests for event-producing commands; - contract tests for every adapter port; From 5a68e898dbeb687b27d4c225ec3a33f1470702b7 Mon Sep 17 00:00:00 2001 From: Max Chadaev Date: Mon, 25 May 2026 16:39:53 -0400 Subject: [PATCH 05/21] feat(agentic-org): add supervisor signal runtime slice Co-Authored-By: Codex --- .../docs/FIRST_IMPLEMENTATION_SLICE.md | 112 ++++++++++ .../docs/IMPLEMENTATION_GOVERNANCE.md | 137 +++++++++++++ .../IMPLEMENTATION_READINESS_CHECKLIST.md | 4 +- agentic-organization/docs/README.md | 3 + .../docs/SUPERVISOR_CHAIN_COMMUNICATION.md | 147 +++++++++++++ .../docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md | 176 
++++++++-------- .../docs/V0_EXECUTABLE_CONTRACT.md | 60 +++--- .../docs/V0_SCHEMA_AND_COMMANDS.md | 194 +++++++++--------- agentic-organization/package.json | 12 ++ agentic-organization/packages/README.md | 55 +++++ .../application/src/command-pipeline.test.ts | 75 +++++++ .../application/src/command-pipeline.ts | 84 ++++++++ .../application/src/command-result.ts | 28 +++ .../handlers/send-supervisor-signal.test.ts | 94 +++++++++ .../src/handlers/send-supervisor-signal.ts | 130 ++++++++++++ .../packages/application/src/index.ts | 8 + .../packages/application/src/ports.ts | 7 + .../domain/src/event-envelope.test.ts | 114 ++++++++++ .../packages/domain/src/event-envelope.ts | 121 +++++++++++ .../src/hat-communication-brief.test.ts | 34 +++ .../domain/src/hat-communication-brief.ts | 101 +++++++++ .../packages/domain/src/index.ts | 36 ++++ .../packages/domain/src/records.ts | 63 ++++++ .../domain/src/supervisor-communication.ts | 43 ++++ .../src/work-item-state-machine.test.ts | 20 ++ .../domain/src/work-item-state-machine.ts | 24 +++ .../packages/messaging/src/index.ts | 1 + .../messaging/src/subject-builder.test.ts | 19 ++ .../packages/messaging/src/subject-builder.ts | 30 +++ .../packages/observability/src/index.ts | 7 + .../observability/src/span-attributes.test.ts | 69 +++++++ .../observability/src/span-attributes.ts | 70 +++++++ .../runtime/src/event-automation.test.ts | 67 ++++++ .../packages/runtime/src/index.ts | 7 + .../packages/runtime/src/reaction-plan.ts | 94 +++++++++ .../state/src/in-memory-organization-store.ts | 28 +++ .../packages/state/src/index.ts | 1 + agentic-organization/packages/test-node.d.ts | 11 + agentic-organization/tsconfig.json | 20 ++ openspec/specs/agentic-organization/spec.md | 140 +++++++++++++ 40 files changed, 2232 insertions(+), 214 deletions(-) create mode 100644 agentic-organization/docs/FIRST_IMPLEMENTATION_SLICE.md create mode 100644 agentic-organization/docs/IMPLEMENTATION_GOVERNANCE.md create mode 100644 
agentic-organization/docs/SUPERVISOR_CHAIN_COMMUNICATION.md create mode 100644 agentic-organization/package.json create mode 100644 agentic-organization/packages/README.md create mode 100644 agentic-organization/packages/application/src/command-pipeline.test.ts create mode 100644 agentic-organization/packages/application/src/command-pipeline.ts create mode 100644 agentic-organization/packages/application/src/command-result.ts create mode 100644 agentic-organization/packages/application/src/handlers/send-supervisor-signal.test.ts create mode 100644 agentic-organization/packages/application/src/handlers/send-supervisor-signal.ts create mode 100644 agentic-organization/packages/application/src/index.ts create mode 100644 agentic-organization/packages/application/src/ports.ts create mode 100644 agentic-organization/packages/domain/src/event-envelope.test.ts create mode 100644 agentic-organization/packages/domain/src/event-envelope.ts create mode 100644 agentic-organization/packages/domain/src/hat-communication-brief.test.ts create mode 100644 agentic-organization/packages/domain/src/hat-communication-brief.ts create mode 100644 agentic-organization/packages/domain/src/index.ts create mode 100644 agentic-organization/packages/domain/src/records.ts create mode 100644 agentic-organization/packages/domain/src/supervisor-communication.ts create mode 100644 agentic-organization/packages/domain/src/work-item-state-machine.test.ts create mode 100644 agentic-organization/packages/domain/src/work-item-state-machine.ts create mode 100644 agentic-organization/packages/messaging/src/index.ts create mode 100644 agentic-organization/packages/messaging/src/subject-builder.test.ts create mode 100644 agentic-organization/packages/messaging/src/subject-builder.ts create mode 100644 agentic-organization/packages/observability/src/index.ts create mode 100644 agentic-organization/packages/observability/src/span-attributes.test.ts create mode 100644 
agentic-organization/packages/observability/src/span-attributes.ts create mode 100644 agentic-organization/packages/runtime/src/event-automation.test.ts create mode 100644 agentic-organization/packages/runtime/src/index.ts create mode 100644 agentic-organization/packages/runtime/src/reaction-plan.ts create mode 100644 agentic-organization/packages/state/src/in-memory-organization-store.ts create mode 100644 agentic-organization/packages/state/src/index.ts create mode 100644 agentic-organization/packages/test-node.d.ts create mode 100644 agentic-organization/tsconfig.json create mode 100644 openspec/specs/agentic-organization/spec.md diff --git a/agentic-organization/docs/FIRST_IMPLEMENTATION_SLICE.md b/agentic-organization/docs/FIRST_IMPLEMENTATION_SLICE.md new file mode 100644 index 0000000000..45a8623902 --- /dev/null +++ b/agentic-organization/docs/FIRST_IMPLEMENTATION_SLICE.md @@ -0,0 +1,112 @@ +# First Implementation Slice + +## Status + +Implemented as a small NodeNext TypeScript package slice. + +## Purpose + +This slice turns the first Agentic Organization runtime contract from +architecture prose into executable TypeScript. + +It does not introduce NestJS, CockroachDB, NATS clients, Temporal, +Dapr, Hermes, Hindsight, or Kubernetes deployment manifests yet. Those +remain adapter layers. The goal is to prove the Organization command +shape before adding distributed infrastructure. + +The slice is intentionally generic. `send_supervisor_signal` is the +coordination primitive; specific downstream outcomes are lifecycle +decisions made by the target supervisor chain. The goal is not to +hardcode every future request tool. The goal is to make agent +coordination traceable and expandable so agents can propose new tools, +flows, and routing patterns as the Organization learns. 
+ +## Implemented Flow + +```text +send_supervisor_signal + -> idempotency record check + -> chain-of-command signal + -> audit event + -> outbox event with canonical event envelope + -> NATS subject contract + -> LGTM span attributes + -> supervisor triage reaction plan +``` + +## Packages + +| Package | Implemented first | +| ---------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| `@agentic-org/domain` | event envelope, command/event constants, aggregate constants, supervisor-chain communication types, hat communication briefs, work item state machine, shared records | +| `@agentic-org/application` | command pipeline, idempotency conflict handling, supervisor signal handler | +| `@agentic-org/state` | in-memory Organization store fake | +| `@agentic-org/messaging` | stable `agentic-org....` subject builder | +| `@agentic-org/observability` | OpenTelemetry/LGTM span attribute projection | +| `@agentic-org/runtime` | first rule that plans triage for the target supervisor when a chain signal is sent | + +## NodeNext Runtime Decision + +Agentic Organization now has a local `package.json` and +`tsconfig.json` under `agentic-organization/`. + +The first executable slice uses: + +- Node 22 or newer; +- `type: module`; +- TypeScript `module: NodeNext`; +- explicit `.ts` imports; +- `node:test`; +- `node:assert/strict`; +- Node TypeScript stripping for test execution. + +This keeps the first package contracts independent from the root repo's +Bun tooling while still letting the future NestJS hosts consume the same +package code. + +## Telemetry Contract + +Every event envelope carries: + +- event ID and event type; +- command ID; +- correlation ID; +- causation ID; +- trace ID; +- idempotency key; +- agent ID; +- hat assignment ID; +- organization ID; +- project ID; +- work item ID; +- aggregate ID, type, and version. 
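The slice's rule that event envelopes reject missing command trace fields can be sketched as a constructor that validates before freezing. `EventEnvelope`, `REQUIRED_TRACE_FIELDS`, `EnvelopeValidationError`, and `createEventEnvelope` are hypothetical names, not the actual exports of `@agentic-org/domain`; which fields count as required trace fields is an assumption.

```typescript
// Hypothetical flat envelope mirroring the fields listed above.
type EventEnvelope = {
  eventId: string;
  eventType: string;
  commandId: string;
  correlationId: string;
  causationId: string;
  traceId: string;
  idempotencyKey: string;
  agentId: string;
  hatAssignmentId: string;
  organizationId: string;
  projectId: string;
  workItemId: string;
  aggregateId: string;
  aggregateType: string;
  aggregateVersion: number;
};

// Assumed set of command trace fields that must never be blank.
const REQUIRED_TRACE_FIELDS = ["commandId", "correlationId", "causationId", "traceId", "idempotencyKey"] as const;

class EnvelopeValidationError extends Error {
  readonly missingFields: readonly string[];

  constructor(missingFields: readonly string[]) {
    super(`event envelope missing trace fields: ${missingFields.join(", ")}`);
    this.name = "EnvelopeValidationError";
    this.missingFields = missingFields;
  }
}

function createEventEnvelope(input: EventEnvelope): Readonly<EventEnvelope> {
  const missing = REQUIRED_TRACE_FIELDS.filter((field) => input[field].trim() === "");
  if (missing.length > 0) {
    throw new EnvelopeValidationError(missing);
  }
  // Frozen copy: publishers and projections cannot mutate an accepted envelope.
  return Object.freeze({ ...input });
}
```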
+ +`@agentic-org/observability` projects those fields into stable +`agentic.*` span attributes plus NATS messaging attributes. Later OTLP +instrumentation must use these keys so Alloy, Tempo, Loki, Mimir, +Prometheus, and Grafana can connect command execution, NATS fanout, +Hermes runs, MCP calls, and UI evidence. + +## Guardrails Proven + +- Hats can expose a communication brief that tells the wearer their duty, + supervisor line, and efficient upward tools. +- Duplicate commands with the same idempotency key and request hash + replay the stored result. +- Duplicate commands with the same idempotency key and a different + request hash are rejected with a typed error code. +- Work item transitions are typed and illegal direct transitions throw. +- Event envelopes reject missing command trace fields. +- The first automation rule produces a supervisor triage plan, not an + unreviewed side effect. + +## Next Slice + +The next slice should add a CockroachDB-backed state adapter and +transactional outbox while preserving this public package contract. +After that, the NATS publisher worker can publish persisted outbox rows +to JetStream and attach the same telemetry attributes. + +Do not make the next slice a pile of bespoke request commands. Build the +generic supervisor triage lifecycle first, then let specialized +lifecycles emerge behind triage. diff --git a/agentic-organization/docs/IMPLEMENTATION_GOVERNANCE.md b/agentic-organization/docs/IMPLEMENTATION_GOVERNANCE.md new file mode 100644 index 0000000000..a9e2a4121b --- /dev/null +++ b/agentic-organization/docs/IMPLEMENTATION_GOVERNANCE.md @@ -0,0 +1,137 @@ +# Agentic Organization Implementation Governance + +## Status + +Standing guardrail for implementation work. + +## Purpose + +This document translates existing Zeta governance into Agentic +Organization implementation rules. It exists so package code, docs, +runtime hosts, and future cluster deployments move together. 
+ +## Current-State Rule + +Agentic Organization docs are current-state design documents. Update +the relevant document when the implementation teaches us something. +Create an ADR only for durable architectural decisions that future +contributors must understand as a decision record. + +## Behavioral Specs Lead + +Runtime behavior that must survive a rebuild belongs in OpenSpec. +The first Agentic Organization behavior is captured in +`openspec/specs/agentic-organization/spec.md`. + +When code adds a new command, lifecycle transition, event, review gate, +telemetry rule, or automation behavior, update the behavioral spec, +tests, and docs in the same change. + +## Authority and Scope + +Only Organization command services may change authoritative Organization +business state. + +Adapters and runtime hosts may call commands. They must not bypass +commands to mutate work items, assignments, gates, hat decisions, +memory scope, audit rows, idempotency records, or outbox records. + +Every privileged action must carry: + +- actor agent ID; +- active hat assignment ID; +- organization ID; +- project ID; +- work item ID; +- command ID; +- correlation ID; +- causation ID; +- trace ID; +- idempotency key. + +## Work Anchors + +No meaningful discussion, memory write, tool call, gate review, +runtime run, or automation reaction should be anchorless. If an agent +needs to discuss or act on something ambiguous, create or link the +appropriate work item first. + +## Review and Self-Approval + +Agents may propose work and produce work. They may not approve their own +privileged work unless a future policy explicitly allows a narrow +low-risk exception. + +Reviewer gates must be represented as explicit state, not as chat +agreement. + +## Idempotency and Replay + +Duplicates are normal. Temporal retries, NATS redelivery, Dapr +reminders, Oz callbacks, and agent retries must call the same +Organization command with the same idempotency key. 
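Replay by idempotency key, with a typed rejection for conflicting reuse, can be sketched with an in-memory store. Assumptions: the caller supplies a stable request hash, and `InMemoryIdempotencyStore`, `IdempotencyOutcome`, and the `IDEMPOTENCY_KEY_CONFLICT` code are hypothetical names.

```typescript
type CommandResult = { status: "accepted"; eventIds: readonly string[] };

type IdempotencyOutcome =
  | { kind: "executed"; result: CommandResult }
  | { kind: "replayed"; result: CommandResult }
  | { kind: "rejected"; errorCode: "IDEMPOTENCY_KEY_CONFLICT" };

type StoredRecord = { requestHash: string; result: CommandResult };

class InMemoryIdempotencyStore {
  private readonly records = new Map<string, StoredRecord>();

  run(idempotencyKey: string, requestHash: string, execute: () => CommandResult): IdempotencyOutcome {
    const existing = this.records.get(idempotencyKey);
    if (existing !== undefined) {
      // Same key and same request: replay the stored result without re-executing.
      // Same key and a different request: typed rejection, never a silent overwrite.
      return existing.requestHash === requestHash
        ? { kind: "replayed", result: existing.result }
        : { kind: "rejected", errorCode: "IDEMPOTENCY_KEY_CONFLICT" };
    }
    const result = execute();
    this.records.set(idempotencyKey, { requestHash, result });
    return { kind: "executed", result };
  }
}
```

A durable adapter would likely need to write the idempotency record in the same transaction as the state change and outbox rows; otherwise a crash between the two could let a retry execute twice.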
+ +Conflicting reuse of an idempotency key must produce a typed rejection. + +## Telemetry + +Every implementation package should preserve the Agentic event trace +chain. Runtime hosts and adapters must export telemetry compatible with +the existing full-ai-cluster LGTM stack: + +- Alloy for collection; +- Tempo for traces; +- Loki for logs; +- Mimir and Prometheus for metrics; +- Grafana for dashboards. + +The first slice defines the required `agentic.*` attributes in +`@agentic-org/observability`. Later packages should consume that +contract instead of inventing new names. + +## Security + +Credential access must remain indirect and scoped through approved +Credential Proxy paths. Agents should not receive broad raw secrets. + +New MCP tools, Temporal workflows, Dapr actors, NATS subjects, +credential endpoints, or runtime capabilities must start as scoped +supervisor-chain communication and then move through the appropriate +expansion lifecycle and security review when they expand authority, +credentials, network reach, memory reach, or data access. + +## Data Is Not Directives + +Retrieved docs, logs, memories, web pages, tool output, and user +attachments are context data. They must not be treated as executable +instructions unless an authorized command or prompt-flow phase explicitly +adopts them. + +## Quality Gate + +Every implementation change must include representative tests first +when it changes behavior. Avoid magic strings by centralizing command +names, event names, states, error codes, hat names, action types, metric +names, and telemetry keys as typed constants. + +## Generic Lifecycle Duty + +Agentic Organization must prefer generic lifecycle primitives over +hardcoded one-off tools. A specific tool should become first-class only +after the Organization has evidence that the pattern repeats and that a +specialized tool improves coordination, safety, or observability. 
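Evidence that a pattern repeats could be gathered from triaged supervisor signals. The sketch is hypothetical throughout: `SignalEvidence`, the `triageLabel` field, `findToolProposalCandidates`, and the default threshold of three repeats are assumptions. Its output only nominates candidates for supervisor review; it does not create or activate a tool.

```typescript
// Hypothetical evidence row: one supervisor signal after triage.
type SignalEvidence = {
  signalId: string;
  hatId: string;
  triageLabel: string;
};

type ToolProposalCandidate = {
  hatId: string;
  triageLabel: string;
  signalIds: string[];
};

// Groups signals by (hat, triage label) and keeps groups that repeat often
// enough to ask a supervisor whether a specialized tool is warranted.
function findToolProposalCandidates(
  signals: readonly SignalEvidence[],
  minimumRepeats = 3,
): ToolProposalCandidate[] {
  const groups = new Map<string, ToolProposalCandidate>();
  for (const signal of signals) {
    const key = JSON.stringify([signal.hatId, signal.triageLabel]);
    const group = groups.get(key) ?? { hatId: signal.hatId, triageLabel: signal.triageLabel, signalIds: [] };
    group.signalIds.push(signal.signalId);
    groups.set(key, group);
  }
  return Array.from(groups.values())
    .filter((group) => group.signalIds.length >= minimumRepeats)
    .sort((a, b) => b.signalIds.length - a.signalIds.length);
}
```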
+ +The expected path is: + +```text +agent discovers need + -> hat uses supervisor-chain communication + -> supervisor triages + -> route to specialized lifecycle if needed + -> agents may propose new tools or flows + -> review, security, implementation, activation, and outcome review +``` + +This is non-negotiable for the architecture. The platform exists to help +agents expand their own coordination substrate safely, not to freeze the +first vocabulary forever. diff --git a/agentic-organization/docs/IMPLEMENTATION_READINESS_CHECKLIST.md b/agentic-organization/docs/IMPLEMENTATION_READINESS_CHECKLIST.md index 94751b4d7a..29fcc73e78 100644 --- a/agentic-organization/docs/IMPLEMENTATION_READINESS_CHECKLIST.md +++ b/agentic-organization/docs/IMPLEMENTATION_READINESS_CHECKLIST.md @@ -29,7 +29,7 @@ Define the first end-to-end workflow we will build. Recommended first slice: ```text -ambiguous internal capability request +ambiguous internal supervisor signal -> requirement maturity / discovery -> BRD/product signoff -> CA/design review @@ -46,7 +46,7 @@ ambiguous internal capability request For v0, reduce this to the smallest useful three-step vertical: ```text -capability request +supervisor-chain signal -> one readiness/gate decision -> one hat-assigned Hermes run with evidence ``` diff --git a/agentic-organization/docs/README.md b/agentic-organization/docs/README.md index 0c82b379f1..1102b41e2a 100644 --- a/agentic-organization/docs/README.md +++ b/agentic-organization/docs/README.md @@ -19,9 +19,12 @@ Current documents: - [Work and Release Management OS](./WORK_AND_RELEASE_MANAGEMENT_OS.md) - the custom backlog, project, task, assignment, signal, board, and release workflow product that keeps agent work reliable and visible. - [Agent-Native Knowledge Graph and Retrieval](./AGENT_NATIVE_KNOWLEDGE_GRAPH.md) - the graph and retrieval layer linking tasks, discussions, decisions, meetings, docs, artifacts, runs, memories, and evidence into agent-readable context. 
- [Agent Work Rhythm and Prompt Flows](./AGENT_WORK_RHYTHM_AND_PROMPT_FLOWS.md) - the hat-bound schedule, free-time, review/red-team, reflection, memory maintenance, and deterministic prompt-flow model for agents. +- [Supervisor-Chain Communication](./SUPERVISOR_CHAIN_COMMUNICATION.md) - the typed upward communication line from each hat to its supervisor chain, including tool families, evidence, and triage semantics. - [Ambiguous Requirement Lifecycle](./AMBIGUOUS_REQUIREMENT_LIFECYCLE.md) - the discovery, customer interview, BRD, workflow modeling, architecture, decomposition, readiness, and learning path from vague request to curated feature. - [Anti-Stall Prioritization Runtime](./ANTI_STALL_PRIORITY_RUNTIME.md) - the hat-owned schedules, blocker triage, queue SLO, reassignment, alternate-work, dependency reconciliation, and priority routines that keep the Organization moving. - [Implementation Readiness Checklist](./IMPLEMENTATION_READINESS_CHECKLIST.md) - the decisions and contracts that should be defined before scaffolding the first implementation slice. +- [Implementation Governance](./IMPLEMENTATION_GOVERNANCE.md) - the current-state, OpenSpec, authority, idempotency, telemetry, security, and quality rules for implementation work. +- [First Implementation Slice](./FIRST_IMPLEMENTATION_SLICE.md) - the NodeNext TypeScript package slice proving command, state, audit, outbox, NATS subject, telemetry, and reaction-plan contracts. - [V0 Executable Contract](./V0_EXECUTABLE_CONTRACT.md) - the smallest end-to-end runtime slice, grounded against the current `full-ai-cluster` substrate. - [V0 Schema and Commands](./V0_SCHEMA_AND_COMMANDS.md) - the CockroachDB-backed state groups, enums, command contract, outbox model, and TypeScript-facing runtime events for the first implementation. 
- [V0 Policy and Runtime Boundaries](./V0_POLICY_AND_RUNTIME_BOUNDARIES.md) - the hat policy matrix, MCP preflight checks, cluster runtime boundaries, failure rules, and ArgoCD integration shape. diff --git a/agentic-organization/docs/SUPERVISOR_CHAIN_COMMUNICATION.md b/agentic-organization/docs/SUPERVISOR_CHAIN_COMMUNICATION.md new file mode 100644 index 0000000000..58c4838857 --- /dev/null +++ b/agentic-organization/docs/SUPERVISOR_CHAIN_COMMUNICATION.md @@ -0,0 +1,147 @@ +# Supervisor-Chain Communication + +## Status + +Implementation concept and first executable package contract. + +## Purpose + +Agents should never have to guess how to talk upward in the +Organization. Their active hat should explain: + +- what duty they are performing; +- who supervises that duty; +- which upward tools are available; +- when each tool should be used; +- what evidence is required; +- what the Organization will do after the signal is sent. + +This is the clean primitive underneath blockers, questions, requests, +capability gaps, resource needs, security asks, and escalation. + +The north star is a generic lifecycle, not a hardcoded list of forever +tools. The starter tool families below are the minimum vocabulary needed +to make early coordination clear. They are not meant to freeze how the +Organization communicates. Agents and their supervisors must be able to +propose better communication tools, prompt flows, routing rules, review +gates, and lifecycle states as they discover repeated friction. + +Non-negotiable duty: build the Organization so agents can expand their +own coordination substrate through governed lifecycle work. The platform +should make expansion reviewable, traceable, scoped, and safe; it should +not make the first tool list a cage. + +## Chain + +The chain is role/hat based, not agent-worth based. 
+ +```text +team member hat + -> manager hat + -> director hat + -> C-suite hat + -> executive board hat +``` + +Examples: + +- Developer hat reports a blocker to Engineering Manager. +- Engineering Manager requests staffing or escalation from Director. +- Director requests priority or budget decision from C-suite. +- C-suite asks Executive Board for standards, succession, or + organization-level approval. + +## Tool Families + +Hats expose upward communication tools as typed tools, not as a generic +chat box. + +| Tool type | Use when | +| --------------------- | ------------------------------------------------------------------ | +| `ask_question` | The hat needs clarification before continuing scoped work | +| `report_blocker` | Work cannot move without supervisor triage or routing | +| `request_decision` | Multiple valid paths exist and authority sits above the hat | +| `request_resource` | The team needs hats, time, budget, infrastructure, or access | +| `request_review` | A supervisor/reviewer decision is needed before lifecycle progress | +| `report_risk` | A risk could affect scope, schedule, quality, security, or cost | +| `suggest_improvement` | The hat sees a process, memory, prompt-flow, tool, or workflow gap | +| `request_escalation` | The current supervisor level cannot resolve the issue alone | + +These are starter families. A hat may later gain additional +communication tools through the same Organization lifecycle used for +any other internal capability: signal upward, supervisor triage, +director or security routing when needed, implementation, review, +activation, and outcome review. + +## Runtime Contract + +`send_supervisor_signal` creates a durable signal. It does not +automatically create a task or approve new capability. 
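The chain and the starter tool families above can be held as typed constants instead of magic strings. In this sketch the identifiers (`CHAIN_LEVELS`, `UPWARD_TOOL_TYPES`, `nextSupervisorLevel`, `draftSupervisorSignal`) are hypothetical, and defaulting the target to the immediate supervisor level is an assumption. Requiring a work item ID follows the work-anchor rule in the governance document; the executive board has no level above it, so drafting a signal from there throws.

```typescript
// Chain order follows the diagram: team member up to executive board.
const CHAIN_LEVELS = ["team_member", "manager", "director", "c_suite", "executive_board"] as const;
type ChainLevel = (typeof CHAIN_LEVELS)[number];

// Starter tool families; new ones arrive through the governed lifecycle.
const UPWARD_TOOL_TYPES = [
  "ask_question",
  "report_blocker",
  "request_decision",
  "request_resource",
  "request_review",
  "report_risk",
  "suggest_improvement",
  "request_escalation",
] as const;
type UpwardToolType = (typeof UPWARD_TOOL_TYPES)[number];

function nextSupervisorLevel(level: ChainLevel): ChainLevel | undefined {
  // Out-of-range index yields undefined at the top of the chain.
  return CHAIN_LEVELS[CHAIN_LEVELS.indexOf(level) + 1];
}

function isUpwardToolType(value: string): value is UpwardToolType {
  return (UPWARD_TOOL_TYPES as readonly string[]).indexOf(value) !== -1;
}

type SupervisorSignalDraft = {
  tool: UpwardToolType;
  sourceLevel: ChainLevel;
  targetLevel: ChainLevel;
  workItemId: string;
  title: string;
  message: string;
};

function draftSupervisorSignal(
  tool: UpwardToolType,
  sourceLevel: ChainLevel,
  workItemId: string,
  title: string,
  message: string,
): SupervisorSignalDraft {
  const targetLevel = nextSupervisorLevel(sourceLevel);
  if (targetLevel === undefined) {
    throw new Error(`${sourceLevel} has no supervisor level above it`);
  }
  if (workItemId.trim() === "") {
    throw new Error("supervisor signals must be anchored to a work item");
  }
  return { tool, sourceLevel, targetLevel, workItemId, title, message };
}
```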
+ +The command records: + +- source agent and active hat assignment; +- source chain level; +- target chain level; +- target supervisor hat assignment; +- organization, project, team, and work item; +- typed tool used; +- title and message; +- command trace, correlation, causation, and idempotency. + +The outbox event is `supervisor_signal.sent`. The runtime reacts by +creating a supervisor triage plan for the target level. + +## Hat Communication Brief + +Each active hat should receive a communication brief in its context +pack. The brief should be generated from the hat graph and Organization +policy. + +The brief includes: + +- `hatId`; +- duty statement; +- source chain level; +- supervisor target level and target hat; +- available upward tools; +- when to use each tool; +- evidence required for each tool. + +Hermes should see this brief before executing work so it can choose the +lowest-friction communication path instead of inventing one. + +## Routing Semantics + +The target supervisor decides what happens next: + +- answer directly; +- open or link a work item; +- route to another department; +- request security review; +- schedule a one-on-one or team discussion; +- escalate to the next supervisor level; +- route to internal platform teams for implementation. + +This keeps the lifecycle generic. A missing tool, missing workflow, +missing memory, unclear requirement, security access problem, blocked +task, or staffing issue all start as communication through the same +substrate, then become specialized work only after the responsible hat +triages it. + +## Expansion Rule + +Do not add one-off command handlers for every new thing an agent wants +to say. 
Prefer: + +```text +hat communication brief + -> generic supervisor signal + -> target supervisor triage + -> specialized lifecycle only if triage requires it + -> governed expansion of tools or flows when repeated need appears +``` + +If agents repeatedly need a more specific tool, the Organization should +capture that as evidence that the hat communication brief, prompt-flow +library, or routing policy needs to evolve. diff --git a/agentic-organization/docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md b/agentic-organization/docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md index 1e94acc6f2..b606fbae0c 100644 --- a/agentic-organization/docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md +++ b/agentic-organization/docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md @@ -4,6 +4,12 @@ Proposal for review. +The first implementation slice starts as a NodeNext TypeScript package +island under `agentic-organization/packages`. NestJS remains the planned +composition host, but the first executable contracts intentionally run +without a Nest process so command, event, state, telemetry, and runtime +automation rules can be tested before adapters are introduced. 
+ ## Purpose This CA proposes the first implementation architecture for Agentic @@ -113,20 +119,20 @@ Rules: ### Layer 0: Domain Kernel -| Package | Owns | -|---|---| -| `@agentic-org/domain` | entity IDs, value objects, typed enums, state machines, domain events, command names, event names, aggregate contracts | -| `@agentic-org/contracts` | shared DTOs, public schemas, versioned API/event contracts, generated clients when needed | +| Package | Owns | +| ------------------------ | ---------------------------------------------------------------------------------------------------------------------- | +| `@agentic-org/domain` | entity IDs, value objects, typed enums, state machines, domain events, command names, event names, aggregate contracts | +| `@agentic-org/contracts` | shared DTOs, public schemas, versioned API/event contracts, generated clients when needed | The domain kernel should be small and strict. It defines language and legal transitions. It does not execute side effects. ### Layer 1: Application and Policy -| Package | Owns | -|---|---| -| `@agentic-org/application` | command handlers, use cases, transaction orchestration, ports, command result contracts | -| `@agentic-org/policy` | RBAC, hat authority checks, OPA/Rego adapter boundary, policy decisions, denial reasons | +| Package | Owns | +| ---------------------------- | ---------------------------------------------------------------------------------------- | +| `@agentic-org/application` | command handlers, use cases, transaction orchestration, ports, command result contracts | +| `@agentic-org/policy` | RBAC, hat authority checks, OPA/Rego adapter boundary, policy decisions, denial reasons | | `@agentic-org/observability` | correlation envelope, OpenTelemetry helpers, required span attributes, trace propagation | The application layer is the Organization OS command layer. It is where @@ -134,19 +140,19 @@ the runtime asks the Organization to do something. 
 
 ### Layer 2: Capability Packages
 
-| Package | Owns |
-|---|---|
-| `@agentic-org/work-os` | projects, initiatives, work items, dependencies, blockers, assignments, releases, work signals |
-| `@agentic-org/requirements` | ambiguous requirement intake, clarification, BRD lifecycle, maturity state |
-| `@agentic-org/documents` | BRDs, CAs, ADRs, design docs, reports, document scope, document approval state |
-| `@agentic-org/gates` | readiness, code, QA, security, architecture, memory, release, and outcome gates |
-| `@agentic-org/hats` | hat graph, supply, assignment, JWT issuance/refresh/revocation, succession, cooldown, warmup |
-| `@agentic-org/assignments` | staffing, agent-to-hat fit, work assignment, reassignment, capacity checks |
-| `@agentic-org/prompt-flows` | deterministic prompt-flow definitions, phases, phase gates, reusable procedures |
-| `@agentic-org/action-grammar` | universal action grammar, reversibility, observation contracts, action-mode classification |
-| `@agentic-org/knowledge-graph` | graph nodes, edges, context packs, retrieval envelopes, provenance and access envelopes |
-| `@agentic-org/runtime` | triggers, rules, reaction plans, leases, schedulers, reconcilers, self-healing loops |
-| `@agentic-org/ui-projections` | read models for boards, timelines, run views, evidence, reviews, observability, org map |
+| Package | Owns |
+| ------------------------------ | ---------------------------------------------------------------------------------------------- |
+| `@agentic-org/work-os` | projects, initiatives, work items, dependencies, blockers, assignments, releases, work signals |
+| `@agentic-org/requirements` | ambiguous requirement intake, clarification, BRD lifecycle, maturity state |
+| `@agentic-org/documents` | BRDs, CAs, ADRs, design docs, reports, document scope, document approval state |
+| `@agentic-org/gates` | readiness, code, QA, security, architecture, memory, release, and outcome gates |
+| `@agentic-org/hats` | hat graph, supply, assignment, JWT issuance/refresh/revocation, succession, cooldown, warmup |
+| `@agentic-org/assignments` | staffing, agent-to-hat fit, work assignment, reassignment, capacity checks |
+| `@agentic-org/prompt-flows` | deterministic prompt-flow definitions, phases, phase gates, reusable procedures |
+| `@agentic-org/action-grammar` | universal action grammar, reversibility, observation contracts, action-mode classification |
+| `@agentic-org/knowledge-graph` | graph nodes, edges, context packs, retrieval envelopes, provenance and access envelopes |
+| `@agentic-org/runtime` | triggers, rules, reaction plans, leases, schedulers, reconcilers, self-healing loops |
+| `@agentic-org/ui-projections` | read models for boards, timelines, run views, evidence, reviews, observability, org map |
 
 Capability packages should be independently testable. They can expose
 interfaces and services, but they should not know which process is
@@ -154,19 +160,19 @@ calling them.
 
 ### Layer 3: State, Messaging, and Runtime Adapters
 
-| Package | Owns |
-|---|---|
-| `@agentic-org/state` | Drizzle schema, migrations, repositories, transactions, outbox, inbox, idempotency, leases |
-| `@agentic-org/messaging` | NATS envelope builder, subject builder, JetStream publisher, consumer, DLQ, replay contracts |
-| `@agentic-org/workflows-temporal` | Temporal workflow and activity contracts, task queues, workflow clients |
-| `@agentic-org/actors-dapr` | Dapr actor interfaces, actor implementations, reminders, actor state projection |
-| `@agentic-org/mcp` | MCP schemas, tool registry, preflight checks, policy-checked tool handlers |
-| `@agentic-org/hermes` | Hermes session adapter, run adapter, callback contract, run context builder |
-| `@agentic-org/memory` | Hindsight adapter, hat-scoped recall/retain/reflect, memory attribution, memory health |
-| `@agentic-org/k8s-hats` | generated or checked Hat, HatBinding, HatSwap, HatPolicy types, informers, projection decoding |
-| `@agentic-org/openziti` | OpenZiti transport adapter, identity/config access, connectivity checks |
-| `@agentic-org/credential-proxy` | credential request adapter, scoped credential use, audit hooks |
-| `@agentic-org/adapters-agentic-services` | temporary wrappers around reused `agentic-services` primitives |
+| Package | Owns |
+| ---------------------------------------- | ---------------------------------------------------------------------------------------------- |
+| `@agentic-org/state` | Drizzle schema, migrations, repositories, transactions, outbox, inbox, idempotency, leases |
+| `@agentic-org/messaging` | NATS envelope builder, subject builder, JetStream publisher, consumer, DLQ, replay contracts |
+| `@agentic-org/workflows-temporal` | Temporal workflow and activity contracts, task queues, workflow clients |
+| `@agentic-org/actors-dapr` | Dapr actor interfaces, actor implementations, reminders, actor state projection |
+| `@agentic-org/mcp` | MCP schemas, tool registry, preflight checks, policy-checked tool handlers |
+| `@agentic-org/hermes` | Hermes session adapter, run adapter, callback contract, run context builder |
+| `@agentic-org/memory` | Hindsight adapter, hat-scoped recall/retain/reflect, memory attribution, memory health |
+| `@agentic-org/k8s-hats` | generated or checked Hat, HatBinding, HatSwap, HatPolicy types, informers, projection decoding |
+| `@agentic-org/openziti` | OpenZiti transport adapter, identity/config access, connectivity checks |
+| `@agentic-org/credential-proxy` | credential request adapter, scoped credential use, audit hooks |
+| `@agentic-org/adapters-agentic-services` | temporary wrappers around reused `agentic-services` primitives |
 
 Adapters are replaceable. The Organization should be able to run a
 V0 slice with in-process fakes, then swap in Temporal, Dapr, Hermes,
@@ -174,14 +180,14 @@ Hindsight, Kubernetes, and NATS adapters behind the same ports.
 
 ### Layer 4: Runtime Hosts
 
-| Runtime host | Responsibility |
-|---|---|
-| `apps/api` | REST/OpenAPI, internal APIs, command dispatch, read queries, auth guards |
-| `apps/web` | human operations console, boards, timelines, org map, observability, review center |
-| `apps/workers` | outbox publisher, schedulers, NATS consumers, reconcilers, projection builders |
-| `apps/temporal-worker` | Temporal workers and activities that call Organization commands |
-| `apps/dapr-actors` | Dapr actor host for hot state and reminders |
-| `apps/mcp-gateway` | MCP gateway, agent context resolution, preflight checks, tool execution |
+| Runtime host | Responsibility |
+| ---------------------- | ---------------------------------------------------------------------------------- |
+| `apps/api` | REST/OpenAPI, internal APIs, command dispatch, read queries, auth guards |
+| `apps/web` | human operations console, boards, timelines, org map, observability, review center |
+| `apps/workers` | outbox publisher, schedulers, NATS consumers, reconcilers, projection builders |
+| `apps/temporal-worker` | Temporal workers and activities that call Organization commands |
+| `apps/dapr-actors` | Dapr actor host for hot state and reminders |
+| `apps/mcp-gateway` | MCP gateway, agent context resolution, preflight checks, tool execution |
 
 Runtime hosts are allowed to be deployed separately. They are not
 separate business services yet.
@@ -286,7 +292,7 @@ type AgenticEventEnvelope = {
     organizationId: string;
     projectId?: string;
     initiativeId?: string;
-    workItemId?: string;
+    workItemId: string;
     runId?: string;
   };
   actor: {
@@ -350,18 +356,18 @@ human or agent actions.
 
 Minimum event automations:
 
-| Event | Rule result | Follow-up command examples |
-|---|---|---|
-| `work_item.ready` | work needs execution or review assignment | `reserve_hat`, `assign_work`, `start_schedule_block` |
-| `work_item.review_requested` | reviewer hat must be staffed | `reserve_hat`, `request_gate_review`, `send_inbox_signal` |
-| `gate.code.approved` | work can move to QA if QA is required | `create_qa_work_item`, `reserve_hat`, `request_gate_review` |
-| `gate.qa.approved` | work can move toward delivery/release | `create_release_task`, `request_delivery_review` |
-| `gate.changes_requested` | implementer needs a bounded rework loop | `assign_rework`, `start_prompt_flow`, `send_inbox_signal` |
-| `work_item.blocked` | blocker owner and escalation path required | `create_blocker`, `notify_manager`, `schedule_blocker_review` |
-| `hermes_run.heartbeat_late` | runtime health needs reconciliation | `create_platform_incident`, `reconcile_run`, `notify_platform_operator` |
-| `memory.gap_detected` | memory/process improvement enters backlog | `submit_capability_request`, `request_memory_review` |
-| `credential_request.submitted` | security review is mandatory | `request_security_gate`, `send_inbox_signal` |
-| `release.ready` | delivery gate and evidence check required | `request_delivery_review`, `verify_release_evidence` |
+| Event | Rule result | Follow-up command examples |
+| ------------------------------ | ------------------------------------------ | ----------------------------------------------------------------------- |
+| `work_item.ready` | work needs execution or review assignment | `reserve_hat`, `assign_work`, `start_schedule_block` |
+| `work_item.review_requested` | reviewer hat must be staffed | `reserve_hat`, `request_gate_review`, `send_inbox_signal` |
+| `gate.code.approved` | work can move to QA if QA is required | `create_qa_work_item`, `reserve_hat`, `request_gate_review` |
+| `gate.qa.approved` | work can move toward delivery/release | `create_release_task`, `request_delivery_review` |
+| `gate.changes_requested` | implementer needs a bounded rework loop | `assign_rework`, `start_prompt_flow`, `send_inbox_signal` |
+| `work_item.blocked` | blocker owner and escalation path required | `create_blocker`, `notify_manager`, `schedule_blocker_review` |
+| `hermes_run.heartbeat_late` | runtime health needs reconciliation | `create_platform_incident`, `reconcile_run`, `notify_platform_operator` |
+| `memory.gap_detected` | memory/process improvement enters backlog | `send_supervisor_signal`, `request_memory_review` |
+| `credential_request.submitted` | security review is mandatory | `request_security_gate`, `send_inbox_signal` |
+| `release.ready` | delivery gate and evidence check required | `request_delivery_review`, `verify_release_evidence` |
 
 The first V0 rule catalog should include:
 
@@ -557,18 +563,18 @@ environment. Local fakes are useful for tests, but the real adapter
 contracts should point at the services that already exist in the
 cluster tree.
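The minimum event automations table earlier in this patch can be read as a static lookup from an event name to candidate follow-up commands. The sketch below takes that reading, using a hypothetical `ruleCatalog` map and `followUpsFor` helper. These are not the `@agentic-org/runtime` API. The entries are copied from the post-change table, so `memory.gap_detected` maps to `send_supervisor_signal`.

```typescript
// Hypothetical sketch: the V0 rule catalog as data. Entries mirror the
// "Minimum event automations" table; the constant and helper names are illustrative.
const ruleCatalog: ReadonlyMap<string, readonly string[]> = new Map<string, readonly string[]>([
  ["work_item.ready", ["reserve_hat", "assign_work", "start_schedule_block"]],
  ["work_item.review_requested", ["reserve_hat", "request_gate_review", "send_inbox_signal"]],
  ["gate.code.approved", ["create_qa_work_item", "reserve_hat", "request_gate_review"]],
  ["gate.qa.approved", ["create_release_task", "request_delivery_review"]],
  ["gate.changes_requested", ["assign_rework", "start_prompt_flow", "send_inbox_signal"]],
  ["work_item.blocked", ["create_blocker", "notify_manager", "schedule_blocker_review"]],
  [
    "hermes_run.heartbeat_late",
    ["create_platform_incident", "reconcile_run", "notify_platform_operator"],
  ],
  ["memory.gap_detected", ["send_supervisor_signal", "request_memory_review"]],
  ["credential_request.submitted", ["request_security_gate", "send_inbox_signal"]],
  ["release.ready", ["request_delivery_review", "verify_release_evidence"]],
]);

// Events without a rule produce an empty plan instead of an exception.
function followUpsFor(eventName: string): readonly string[] {
  return ruleCatalog.get(eventName) ?? [];
}

console.log(followUpsFor("memory.gap_detected").join(", "));
// send_supervisor_signal, request_memory_review
```

A `Map` is used instead of a plain object so that a lookup for an inherited key such as `toString` cannot return a prototype member. A consumer that receives an event it does not yet automate gets an empty plan rather than a crash.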
 
-| Adapter package | Cluster dependency | Expected in-cluster target |
-|---|---|---|
-| `@agentic-org/state` | CockroachDB ArgoCD app | `cockroachdb-public.cockroachdb.svc.cluster.local:26257` |
-| `@agentic-org/messaging` | NATS ArgoCD app with JetStream enabled | `nats.nats.svc.cluster.local:4222` |
-| `@agentic-org/workflows-temporal` | Temporal ArgoCD app | `temporal-frontend.temporal.svc.cluster.local:7233` |
-| `@agentic-org/actors-dapr` | Dapr control plane | Dapr sidecar plus `dapr-system` placement service |
-| `@agentic-org/memory` | Hindsight OCI Helm chart | `http://hindsight.hindsight.svc.cluster.local` |
-| `@agentic-org/hermes` | Hermes deployment/service | `http://hermes.hermes.svc.cluster.local` once replicas are enabled |
-| `@agentic-org/openziti` | OZ/OpenZiti controller app | `https://ziti-controller.openziti.svc.cluster.local:443` |
-| `@agentic-org/k8s-hats` | hat-system CRDs and operator | Kubernetes API watches plus `zeta.society.hats.>` bridge input |
-| `@agentic-org/observability` | Alloy, Tempo, Loki, Mimir, kube-prometheus-stack | OTLP traces to Alloy/Tempo, logs to Loki, metrics to Prometheus/Mimir |
-| `@agentic-org/policy` | OPA Gatekeeper and Organization policy package | in-process policy first, OPA bundle/constraint adapters later |
+| Adapter package | Cluster dependency | Expected in-cluster target |
+| --------------------------------- | ------------------------------------------------ | --------------------------------------------------------------------- |
+| `@agentic-org/state` | CockroachDB ArgoCD app | `cockroachdb-public.cockroachdb.svc.cluster.local:26257` |
+| `@agentic-org/messaging` | NATS ArgoCD app with JetStream enabled | `nats.nats.svc.cluster.local:4222` |
+| `@agentic-org/workflows-temporal` | Temporal ArgoCD app | `temporal-frontend.temporal.svc.cluster.local:7233` |
+| `@agentic-org/actors-dapr` | Dapr control plane | Dapr sidecar plus `dapr-system` placement service |
+| `@agentic-org/memory` | Hindsight OCI Helm chart | `http://hindsight.hindsight.svc.cluster.local` |
+| `@agentic-org/hermes` | Hermes deployment/service | `http://hermes.hermes.svc.cluster.local` once replicas are enabled |
+| `@agentic-org/openziti` | OZ/OpenZiti controller app | `https://ziti-controller.openziti.svc.cluster.local:443` |
+| `@agentic-org/k8s-hats` | hat-system CRDs and operator | Kubernetes API watches plus `zeta.society.hats.>` bridge input |
+| `@agentic-org/observability` | Alloy, Tempo, Loki, Mimir, kube-prometheus-stack | OTLP traces to Alloy/Tempo, logs to Loki, metrics to Prometheus/Mimir |
+| `@agentic-org/policy` | OPA Gatekeeper and Organization policy package | in-process policy first, OPA bundle/constraint adapters later |
 
 Adapter configuration should use environment variables and Kubernetes
 Secrets/ExternalSecrets, but the domain package should never see those
@@ -605,10 +611,10 @@ at wave `0`, Hindsight and Temporal at wave `10`, and Hermes at wave
 
 Recommended deployment split:
 
-| Application | Wave | Purpose |
-|---|---:|---|
+| Application | Wave | Purpose |
+| -------------------------------- | ----------: | -------------------------------------------------------------------------------------------------- |
 | `agentic-organization-contracts` | `-5` or `0` | optional future CRDs, NATS stream definitions, schema/config resources that other apps may consume |
-| `agentic-organization` | `30` | API, web, workers, Temporal worker, Dapr actor host, MCP gateway |
+| `agentic-organization` | `30` | API, web, workers, Temporal worker, Dapr actor host, MCP gateway |
 
 If V0 ships no CRDs and only consumes existing services, one
 `agentic-organization` app at wave `30` is enough. If it later adds CRDs
@@ -621,14 +627,14 @@ early.
 The first ArgoCD app should deploy one namespace and several workloads
 from the same image or image family:
 
-| Workload | Kubernetes shape | Notes |
-|---|---|---|
-| API | Deployment + ClusterIP Service | REST/OpenAPI, internal command API, read API |
-| Web | Deployment + ClusterIP Service/Gateway route | operations console |
-| Workers | Deployment | outbox publisher, reconcilers, schedulers, NATS consumers |
-| Temporal worker | Deployment | workflow and activity workers only |
-| Dapr actor host | Deployment with Dapr annotations | actor endpoints and reminders |
-| MCP gateway | Deployment + ClusterIP Service | Hermes-facing governed tool surface |
+| Workload | Kubernetes shape | Notes |
+| --------------- | -------------------------------------------- | --------------------------------------------------------- |
+| API | Deployment + ClusterIP Service | REST/OpenAPI, internal command API, read API |
+| Web | Deployment + ClusterIP Service/Gateway route | operations console |
+| Workers | Deployment | outbox publisher, reconcilers, schedulers, NATS consumers |
+| Temporal worker | Deployment | workflow and activity workers only |
+| Dapr actor host | Deployment with Dapr annotations | actor endpoints and reminders |
+| MCP gateway | Deployment + ClusterIP Service | Hermes-facing governed tool surface |
 
 All workloads need:
 
@@ -670,11 +676,11 @@ each substrate becomes live.
 
 The same package architecture should run in three modes:
 
-| Mode | Purpose | Runtime adapters |
-|---|---|---|
-| unit/test | package and command tests | in-memory/fake adapters |
-| local dev cluster | k3d/K3S parity with `full-ai-cluster` apps | real NATS/Cockroach when available, fake Hermes/hat-system if needed |
-| full cluster | production-like AI cluster | real CockroachDB, NATS, Hindsight, Hermes, OpenZiti, hat-system, Temporal, Dapr |
+| Mode | Purpose | Runtime adapters |
+| ----------------- | ------------------------------------------ | ------------------------------------------------------------------------------- |
+| unit/test | package and command tests | in-memory/fake adapters |
+| local dev cluster | k3d/K3S parity with `full-ai-cluster` apps | real NATS/Cockroach when available, fake Hermes/hat-system if needed |
+| full cluster | production-like AI cluster | real CockroachDB, NATS, Hindsight, Hermes, OpenZiti, hat-system, Temporal, Dapr |
 
 Do not create a Docker Compose architecture that diverges from
 `full-ai-cluster`. Local development can use fakes or a dev cluster, but
@@ -732,8 +738,8 @@ package should standardize:
 3. Implement the first CockroachDB schema and Drizzle migrations for the
    V0 executable contract.
 4. Implement command handlers for:
-   - submit capability request;
-   - triage capability request;
+   - send supervisor signal;
+   - triage supervisor signal;
    - reserve hat;
    - issue hat token;
    - start prompt flow;
diff --git a/agentic-organization/docs/V0_EXECUTABLE_CONTRACT.md b/agentic-organization/docs/V0_EXECUTABLE_CONTRACT.md
index a50b994495..fb12051830 100644
--- a/agentic-organization/docs/V0_EXECUTABLE_CONTRACT.md
+++ b/agentic-organization/docs/V0_EXECUTABLE_CONTRACT.md
@@ -18,20 +18,20 @@ not a parallel substrate.
 
 The current `origin/main` cluster shape gives V0 these host primitives:
 
-| Cluster component | V0 use |
-|---|---|
-| K3S + ArgoCD App-of-Apps | deploy Agentic Organization as a future `full-ai-cluster/k8s/applications/agentic-organization/` application |
-| Cilium + Hubble | pod networking, L7 policy, flow observability, and service-mesh behavior without Istio |
-| cert-manager, Vault, SPIRE, Trust Manager, External Secrets | workload identity, TLS trust, and secret delivery |
-| CockroachDB | authoritative Organization database |
-| NATS JetStream | event transport, outbox fanout, live UI updates, and replayable integration streams |
-| Temporal TS | durable workflows after the native command model is proven |
-| Dapr Actors | hot entity coordination after the DB-backed service contract is proven |
-| Hindsight | Hermes memory backend, wrapped with Organization attribution and scope |
-| Hermes | agent runtime that performs the work |
-| OZ/OpenZiti | zero-trust transport, not the Organization business orchestrator |
-| hat-system | Kubernetes hat enforcement/projection surface using Hat, HatBinding, HatSwap, and HatPolicy CRDs |
-| Loki, Tempo, Alloy, Mimir, kube-prometheus-stack | logs, traces, metrics, dashboards, and audit correlation |
+| Cluster component | V0 use |
+| ----------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------ |
+| K3S + ArgoCD App-of-Apps | deploy Agentic Organization as a future `full-ai-cluster/k8s/applications/agentic-organization/` application |
+| Cilium + Hubble | pod networking, L7 policy, flow observability, and service-mesh behavior without Istio |
+| cert-manager, Vault, SPIRE, Trust Manager, External Secrets | workload identity, TLS trust, and secret delivery |
+| CockroachDB | authoritative Organization database |
+| NATS JetStream | event transport, outbox fanout, live UI updates, and replayable integration streams |
+| Temporal TS | durable workflows after the native command model is proven |
+| Dapr Actors | hot entity coordination after the DB-backed service contract is proven |
+| Hindsight | Hermes memory backend, wrapped with Organization attribution and scope |
+| Hermes | agent runtime that performs the work |
+| OZ/OpenZiti | zero-trust transport, not the Organization business orchestrator |
+| hat-system | Kubernetes hat enforcement/projection surface using Hat, HatBinding, HatSwap, and HatPolicy CRDs |
+| Loki, Tempo, Alloy, Mimir, kube-prometheus-stack | logs, traces, metrics, dashboards, and audit correlation |
 
 Sync-wave implication: Agentic Organization is a consumer app. It should
 land after the foundation, data planes, hat-system CRDs, Hindsight,
@@ -54,16 +54,16 @@ V0 does not need:
 - autonomous creation of new tools, workflows, or credential proxy
   endpoints.
 
-V0 should still model those future paths as capability requests, so the
-Organization can later build them through its own lifecycle.
+V0 should still model those future paths as supervisor-chain signals, so
+the Organization can later route them through its own lifecycle.
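The supervisor-chain path in this patch runs from a generic supervisor signal to target supervisor triage. A specialized lifecycle is used only when triage requires it, and the tool or flow set expands in a governed way when the same need keeps recurring. That path can be sketched as a small triage object. This is an illustration under stated assumptions. `SupervisorSignal`, `SupervisorTriage`, the `need` tag format, the credential rule, and the repeat threshold of 3 are all hypothetical, because the documents say only "repeated need".

```typescript
// Hypothetical sketch of the supervisor-chain path: a generic signal is triaged
// by the target supervisor, a specialized lifecycle is used only when triage
// requires it, and repeated needs are counted as evidence for governed expansion.
type SupervisorSignal = {
  workItemId: string; // every signal is anchored to a work item
  fromHat: string; // hat that raised the signal, e.g. "Implementer"
  need: string; // hypothetical tag for the requested capability
};

type TriageOutcome =
  | { kind: "handled_in_chain" }
  | { kind: "specialized_lifecycle"; need: string };

class SupervisorTriage {
  private readonly needCounts = new Map<string, number>();
  private readonly requiresLifecycle: (signal: SupervisorSignal) => boolean;
  private readonly repeatThreshold: number;

  // The triage rule is supplied by the caller; the threshold default is assumed.
  constructor(requiresLifecycle: (signal: SupervisorSignal) => boolean, repeatThreshold = 3) {
    this.requiresLifecycle = requiresLifecycle;
    this.repeatThreshold = repeatThreshold;
  }

  triage(signal: SupervisorSignal): TriageOutcome {
    // Record demand regardless of how this particular signal is routed.
    this.needCounts.set(signal.need, (this.needCounts.get(signal.need) ?? 0) + 1);
    return this.requiresLifecycle(signal)
      ? { kind: "specialized_lifecycle", need: signal.need }
      : { kind: "handled_in_chain" };
  }

  // Needs seen at least `repeatThreshold` times become expansion candidates.
  evolutionCandidates(): string[] {
    return Array.from(this.needCounts.entries())
      .filter(([, count]) => count >= this.repeatThreshold)
      .map(([need]) => need);
  }
}

// Hypothetical rule: only credential scope changes need a specialized lifecycle.
const triage = new SupervisorTriage((s) => s.need.startsWith("credential:"));
for (let i = 0; i < 3; i++) {
  triage.triage({ workItemId: `wi-${i}`, fromHat: "Implementer", need: "tool:pdf_export" });
}
console.log(triage.evolutionCandidates()); // [ 'tool:pdf_export' ]
```

Demand is counted separately from the routing decision. As a result, evidence of a repeated need is kept even when every individual signal is handled inside the chain.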
 
 ## First Vertical Slice
 
 The first executable slice is:
 
 ```text
-capability request
-  -> discussion anchor and context pack
+supervisor-chain signal
+  -> anchored work item and context pack
   -> one readiness or review gate
   -> hat assignment
   -> scheduled prompt-flow run
@@ -88,15 +88,15 @@ This is the smallest useful loop because it proves:
 
 Keep the first hat set small:
 
-| Hat | V0 reason |
-|---|---|
-| Director | accepts or rejects the capability request for V0 scope |
+| Hat | V0 reason |
+| ------------------- | ----------------------------------------------------------------------------- |
+| Director | accepts or rejects escalated supervisor signals for V0 scope |
 | Engineering Manager | grooms the work item, selects schedule, assigns implementer and reviewer hats |
-| Implementer | executes the prompt flow and submits evidence |
-| Code Reviewer | reviews the evidence and blocks self-approval |
-| Memory Curator | reviews memory writes or flags memory gaps when the run ends |
-| Platform Operator | handles runtime failure, pod/session issues, and integration health |
-| Security Reviewer | required only when the request needs a new credential or external tool scope |
+| Implementer | executes the prompt flow and submits evidence |
+| Code Reviewer | reviews the evidence and blocks self-approval |
+| Memory Curator | reviews memory writes or flags memory gaps when the run ends |
+| Platform Operator | handles runtime failure, pod/session issues, and integration health |
+| Security Reviewer | required only when the request needs a new credential or external tool scope |
 
 The Executive Board, TPM, Product Owner, Architect, QA Reviewer, Hat
 Designer, and department directors remain first-class in the reference
@@ -136,10 +136,10 @@ without it.
 
 ## Required V0 Flow
 
-1. `submit_capability_request` creates the work item, discussion anchor,
-   and first audit/outbox events.
-2. `triage_capability_request` selects the responsible project,
-   initiative, owner hat, and required gate.
+1. `send_supervisor_signal` creates the chain communication record and
+   first audit/outbox events against an anchored work item.
+2. `triage_supervisor_signal` selects the responsible project,
+   initiative, owner hat, lifecycle, and required gate.
 3. `create_context_pack` links relevant docs, prior decisions, task
    graph nodes, memory references, and acceptance criteria.
 4. `decide_gate` moves the request into ready state or asks for more
diff --git a/agentic-organization/docs/V0_SCHEMA_AND_COMMANDS.md b/agentic-organization/docs/V0_SCHEMA_AND_COMMANDS.md
index ee14bc6ffe..0a2bd5d8bf 100644
--- a/agentic-organization/docs/V0_SCHEMA_AND_COMMANDS.md
+++ b/agentic-organization/docs/V0_SCHEMA_AND_COMMANDS.md
@@ -16,18 +16,18 @@ the Organization database.
 
 Every authoritative table should include:
 
-| Column | Purpose |
-|---|---|
-| `id` | stable unique ID |
-| `organization_id` | future multi-org partition key, even if V0 uses one org |
-| `created_at` | creation time |
-| `updated_at` | last mutation time |
-| `version` | optimistic concurrency and projection safety |
-| `created_by_agent_id` | agent that caused the write, when applicable |
-| `created_by_hat_assignment_id` | hat authority that caused the write, when applicable |
-| `correlation_id` | end-to-end request/run correlation |
-| `causation_id` | direct parent command, event, tool call, or workflow step |
-| `trace_id` | observability trace link |
+| Column | Purpose |
+| ------------------------------ | --------------------------------------------------------- |
+| `id` | stable unique ID |
+| `organization_id` | future multi-org partition key, even if V0 uses one org |
+| `created_at` | creation time |
+| `updated_at` | last mutation time |
+| `version` | optimistic concurrency and projection safety |
+| `created_by_agent_id` | agent that caused the write, when applicable |
+| `created_by_hat_assignment_id` | hat authority that caused the write, when applicable |
+| `correlation_id` | end-to-end request/run correlation |
+| `causation_id` | direct parent command, event, tool call, or workflow step |
+| `trace_id` | observability trace link |
 
 Append-only records should also carry `sequence` when replay order
 matters.
@@ -36,80 +36,80 @@ matters.
 
 ### Identity and Hats
 
-| Table | V0 responsibility |
-|---|---|
-| `agents` | known Hermes-capable agents and their stable identity |
-| `agent_sessions` | live or historical Hermes sessions bound to an agent |
-| `departments` | first department containers for ownership and review routing |
-| `hat_definitions` | Organization-owned hat catalog |
-| `hat_authority_rules` | typed permissions, scopes, and policy metadata for a hat |
-| `hat_skill_bindings` | skills and prompt-flow availability attached to a hat |
-| `hat_supply_policies` | max concurrency, TTL, cooldown, warmup, and assignment rules |
-| `hat_assignments` | time-bounded wearer assignment for a specific agent/session |
-| `hat_tokens` | short-lived JWT issuance, refresh, revocation, and expiry state |
+| Table | V0 responsibility |
+| ------------------------ | --------------------------------------------------------------------------- |
+| `agents` | known Hermes-capable agents and their stable identity |
+| `agent_sessions` | live or historical Hermes sessions bound to an agent |
+| `departments` | first department containers for ownership and review routing |
+| `hat_definitions` | Organization-owned hat catalog |
+| `hat_authority_rules` | typed permissions, scopes, and policy metadata for a hat |
+| `hat_skill_bindings` | skills and prompt-flow availability attached to a hat |
+| `hat_supply_policies` | max concurrency, TTL, cooldown, warmup, and assignment rules |
+| `hat_assignments` | time-bounded wearer assignment for a specific agent/session |
+| `hat_tokens` | short-lived JWT issuance, refresh, revocation, and expiry state |
 | `hat_system_projections` | last observed Hat, HatBinding, HatSwap, and HatPolicy state from Kubernetes |
 
 ### Work Management
 
-| Table | V0 responsibility |
-|---|---|
-| `projects` | top-level work containers |
-| `initiatives` | project-scoped bodies of work |
-| `work_items` | capability requests, tasks, defects, reviews, and follow-ups |
-| `work_item_state_history` | append-only state transitions |
-| `work_item_dependencies` | blocking or informational dependencies |
-| `blockers` | active impediments with owner, severity, and resolution path |
-| `assignments` | work item to agent/hat/session assignment records |
-| `gates` | required review points for readiness, code, QA, memory, or security |
-| `gate_decisions` | approve, reject, needs-changes, or defer decisions |
-| `releases` | release groupings once release management enters the slice |
+| Table | V0 responsibility |
+| ------------------------- | -------------------------------------------------------------------- |
+| `projects` | top-level work containers |
+| `initiatives` | project-scoped bodies of work |
+| `work_items` | anchored supervisor signals, tasks, defects, reviews, and follow-ups |
+| `work_item_state_history` | append-only state transitions |
+| `work_item_dependencies` | blocking or informational dependencies |
+| `blockers` | active impediments with owner, severity, and resolution path |
+| `assignments` | work item to agent/hat/session assignment records |
+| `gates` | required review points for readiness, code, QA, memory, or security |
+| `gate_decisions` | approve, reject, needs-changes, or defer decisions |
+| `releases` | release groupings once release management enters the slice |
 
 ### Schedules, Prompt Flows, and Actions
 
-| Table | V0 responsibility |
-|---|---|
-| `hat_schedule_templates` | default work rhythm by hat |
-| `work_schedules` | concrete schedule assigned to an agent/hat context |
-| `work_schedule_blocks` | free time, prioritized work, review, reflection, or meeting blocks |
-| `prompt_flow_definitions` | named deterministic work protocols |
-| `prompt_flow_versions` | immutable versioned prompt-flow contract |
-| `prompt_flow_phases` | ordered reusable phases |
-| `hat_prompt_flow_bindings` | which hats can run which prompt flows |
-| `prompt_flow_runs` | one execution of a prompt-flow version |
-| `prompt_flow_phase_runs` | state and evidence for each phase execution |
-| `prompt_flow_gate_decisions` | reviewer decisions at phase boundaries |
-| `universal_action_definitions` | typed action grammar catalog |
-| `universal_action_records` | action intent emitted by an agent or workflow |
-| `universal_action_observations` | observed result, evidence, and side effects for an action |
+| Table | V0 responsibility |
+| ------------------------------- | ------------------------------------------------------------------ |
+| `hat_schedule_templates` | default work rhythm by hat |
+| `work_schedules` | concrete schedule assigned to an agent/hat context |
+| `work_schedule_blocks` | free time, prioritized work, review, reflection, or meeting blocks |
+| `prompt_flow_definitions` | named deterministic work protocols |
+| `prompt_flow_versions` | immutable versioned prompt-flow contract |
+| `prompt_flow_phases` | ordered reusable phases |
+| `hat_prompt_flow_bindings` | which hats can run which prompt flows |
+| `prompt_flow_runs` | one execution of a prompt-flow version |
+| `prompt_flow_phase_runs` | state and evidence for each phase execution |
+| `prompt_flow_gate_decisions` | reviewer decisions at phase boundaries |
+| `universal_action_definitions` | typed action grammar catalog |
+| `universal_action_records` | action intent emitted by an agent or workflow |
+| `universal_action_observations` | observed result, evidence, and side effects for an action |
 
 ### Communication, Graph, Documents, and Context
 
-| Table | V0 responsibility |
-|---|---|
-| `discussion_anchors` | required work/project/initiative/task anchor for any discussion |
-| `conversation_threads` | one-on-one, team, department, executive, or broadcast thread |
-| `messages` | immutable message log with actor and hat attribution |
-| `meetings` | structured meeting sessions with mode and anchor |
-| `decisions` | explicit decisions linked to work and evidence |
-| `documents` | BRDs, CAs, ADRs, reports, test cases, runbooks, and memory reviews |
-| `artifact_links` | logs, screenshots, traces, code refs, PRs, builds, and uploads |
-| `graph_nodes` | agent-readable graph node registry |
-| `graph_edges` | typed relationships between work, docs, messages, decisions, runs, and memories |
-| `context_packs` | deterministic context bundles assembled for an agent run or review |
+| Table | V0 responsibility |
+| ---------------------- | ------------------------------------------------------------------------------- |
+| `discussion_anchors` | required work/project/initiative/task anchor for any discussion |
+| `conversation_threads` | one-on-one, team, department, executive, or broadcast thread |
+| `messages` | immutable message log with actor and hat attribution |
+| `meetings` | structured meeting sessions with mode and anchor |
+| `decisions` | explicit decisions linked to work and evidence |
+| `documents` | BRDs, CAs, ADRs, reports, test cases, runbooks, and memory reviews |
+| `artifact_links` | logs, screenshots, traces, code refs, PRs, builds, and uploads |
+| `graph_nodes` | agent-readable graph node registry |
+| `graph_edges` | typed relationships between work, docs, messages, decisions, runs, and memories |
+| `context_packs` | deterministic context bundles assembled for an agent run or review |
 
 ### Runtime, Memory, Security, and Audit
 
-| Table | V0 responsibility |
-|---|---|
-| `hermes_runs` | Organization binding to a Hermes execution session |
-| `mcp_tool_calls` | governed tool call attempts and results |
-| `memory_events` | Hindsight recall, retain, reflect, and review attribution |
-| `credential_requests` | requests to expand credential proxy or external tool scope |
-| `signals` | durable internal signals consumed by workers and UI read models |
-| `audit_events` | append-only policy and state-change audit trail |
-| `outbox_events` | transactional event publication source for NATS |
-| `runtime_leases` | scheduler, reconciler, and worker leases |
-| `idempotency_keys` | command deduplication records |
+| Table | V0 responsibility |
+| --------------------- | --------------------------------------------------------------- |
+| `hermes_runs` | Organization binding to a Hermes execution session |
+| `mcp_tool_calls` | governed tool call attempts and results |
+| `memory_events` | Hindsight recall, retain, reflect, and review attribution |
+| `credential_requests` | requests to expand credential proxy or external tool scope |
+| `signals` | durable internal signals consumed by workers and UI read models |
+| `audit_events` | append-only policy and state-change audit trail |
+| `outbox_events` | transactional event publication source for NATS |
+| `runtime_leases` | scheduler, reconciler, and worker leases |
+| `idempotency_keys` | command deduplication records |
 
 ## V0 Enums
 
@@ -217,7 +217,7 @@ gate
 release
 incident
 memory_review
-capability_request
+supervisor_signal
 ```
 
 ### `signal_type`
@@ -265,29 +265,29 @@ Every command handler must:
 
 ## V0 Commands
 
-| Command | Actor scope | Writes | Emits |
-|---|---|---|---|
-| `submit_capability_request` | human, agent, director, manager | `work_items`, `discussion_anchors`, `graph_nodes`, `audit_events`, `outbox_events` | `work_item_changed` |
-| `triage_capability_request` | director or engineering manager | `work_items`, `assignments`, `gates`, `context_packs` | `work_item_changed`, `gate_requested` |
-| `create_discussion_anchor` | any authorized hat | `discussion_anchors`, `graph_edges` | `work_item_changed` |
-| `create_context_pack` | manager, reviewer, implementer for assigned work | `context_packs`, `graph_edges`, `audit_events` | `work_item_changed` |
-| `mark_work_ready` | manager or reviewer | `work_items`, `work_item_state_history`, `gates` | `work_item_changed`, `gate_requested` |
-| `reserve_hat` | manager, director, platform operator | `hat_assignments`, `hat_tokens`, `audit_events` | `hat_assignment_changed` |
-| `issue_hat_token` | hat service, after policy allow | `hat_tokens`, `audit_events` | `hat_token_changed` |
-| `refresh_hat_token` | active assigned agent/session | `hat_tokens`, `audit_events` | `hat_token_changed` |
-| `revoke_hat_assignment` | manager, director, security, policy automation | `hat_assignments`, `hat_tokens`, `audit_events` | `hat_assignment_changed`, `hat_token_changed` |
-| `start_schedule_block` | assigned agent/session or scheduler | `work_schedule_blocks`, `agent_sessions` | `schedule_block_changed` |
-| `start_prompt_flow` | assigned agent/session | `prompt_flow_runs`, `prompt_flow_phase_runs` | `prompt_flow_changed` |
-| `record_universal_action` | assigned agent/session, workflow activity, adapter | `universal_action_records`, `mcp_tool_calls`, `audit_events` | `prompt_flow_changed` |
-| `record_action_observation` | adapter, worker, reviewer, assigned agent | `universal_action_observations`, `artifact_links` | `prompt_flow_changed` |
-| `launch_hermes_run` | runtime service or Temporal activity | `hermes_runs`, `agent_sessions`, `audit_events` | `hermes_run_changed` |
-| `record_hermes_run_status` | Hermes/OZ callback, reconciler, platform operator | `hermes_runs`, `artifact_links` | `hermes_run_changed` |
-| `submit_evidence` | implementer, QA, reviewer, adapter | `artifact_links`, `graph_edges`, `audit_events` | `work_item_changed` |
-| `request_gate_review` | implementer, manager, workflow | `gates`, `work_items` | `gate_requested`, `work_item_changed` |
-| `decide_gate` | reviewer hat, not same active implementer assignment | `gate_decisions`, `gates`, `work_items`, `audit_events` |
`gate_decided`, `work_item_changed` | -| `record_memory_event` | memory adapter, assigned agent/session, memory curator | `memory_events`, `graph_edges`, `audit_events` | `memory_event_recorded` | -| `submit_credential_request` | any authorized hat with anchored work | `credential_requests`, `work_items`, `discussion_anchors` | `credential_request_changed` | -| `complete_outcome_review` | manager, memory curator, reviewer | `work_items`, `decisions`, optional follow-up `work_items` | `outcome_review_completed` | +| Command | Actor scope | Writes | Emits | +| --------------------------- | ------------------------------------------------------ | ------------------------------------------------------------------------------------ | ------------------------------------------------------------------------------------ | +| `send_supervisor_signal` | any authorized hat with supervisor line | `supervisor_signals`, `audit_events`, `outbox_events` | `supervisor_signal_sent` | +| `triage_supervisor_signal` | target supervisor hat | `supervisor_signals`, optional `work_items`, `assignments`, `gates`, `context_packs` | `supervisor_signal_triaged`, optional `work_item_changed`, optional `gate_requested` | +| `create_discussion_anchor` | any authorized hat | `discussion_anchors`, `graph_edges` | `work_item_changed` | +| `create_context_pack` | manager, reviewer, implementer for assigned work | `context_packs`, `graph_edges`, `audit_events` | `work_item_changed` | +| `mark_work_ready` | manager or reviewer | `work_items`, `work_item_state_history`, `gates` | `work_item_changed`, `gate_requested` | +| `reserve_hat` | manager, director, platform operator | `hat_assignments`, `hat_tokens`, `audit_events` | `hat_assignment_changed` | +| `issue_hat_token` | hat service, after policy allow | `hat_tokens`, `audit_events` | `hat_token_changed` | +| `refresh_hat_token` | active assigned agent/session | `hat_tokens`, `audit_events` | `hat_token_changed` | +| `revoke_hat_assignment` | 
manager, director, security, policy automation | `hat_assignments`, `hat_tokens`, `audit_events` | `hat_assignment_changed`, `hat_token_changed` | +| `start_schedule_block` | assigned agent/session or scheduler | `work_schedule_blocks`, `agent_sessions` | `schedule_block_changed` | +| `start_prompt_flow` | assigned agent/session | `prompt_flow_runs`, `prompt_flow_phase_runs` | `prompt_flow_changed` | +| `record_universal_action` | assigned agent/session, workflow activity, adapter | `universal_action_records`, `mcp_tool_calls`, `audit_events` | `prompt_flow_changed` | +| `record_action_observation` | adapter, worker, reviewer, assigned agent | `universal_action_observations`, `artifact_links` | `prompt_flow_changed` | +| `launch_hermes_run` | runtime service or Temporal activity | `hermes_runs`, `agent_sessions`, `audit_events` | `hermes_run_changed` | +| `record_hermes_run_status` | Hermes/OZ callback, reconciler, platform operator | `hermes_runs`, `artifact_links` | `hermes_run_changed` | +| `submit_evidence` | implementer, QA, reviewer, adapter | `artifact_links`, `graph_edges`, `audit_events` | `work_item_changed` | +| `request_gate_review` | implementer, manager, workflow | `gates`, `work_items` | `gate_requested`, `work_item_changed` | +| `decide_gate` | reviewer hat, not same active implementer assignment | `gate_decisions`, `gates`, `work_items`, `audit_events` | `gate_decided`, `work_item_changed` | +| `record_memory_event` | memory adapter, assigned agent/session, memory curator | `memory_events`, `graph_edges`, `audit_events` | `memory_event_recorded` | +| `submit_credential_request` | any authorized hat with anchored work | `credential_requests`, `work_items`, `discussion_anchors` | `credential_request_changed` | +| `complete_outcome_review` | manager, memory curator, reviewer | `work_items`, `decisions`, optional follow-up `work_items` | `outcome_review_completed` | ## Idempotency diff --git a/agentic-organization/package.json 
b/agentic-organization/package.json new file mode 100644 index 0000000000..0fd10e913a --- /dev/null +++ b/agentic-organization/package.json @@ -0,0 +1,12 @@ +{ + "name": "@agentic-org/root", + "private": true, + "type": "module", + "scripts": { + "test": "node --experimental-strip-types --test packages/**/*.test.ts", + "typecheck": "npx --yes -p typescript@6.0.3 tsc -p tsconfig.json" + }, + "engines": { + "node": ">=22.12.0" + } +} diff --git a/agentic-organization/packages/README.md b/agentic-organization/packages/README.md new file mode 100644 index 0000000000..f900db7a48 --- /dev/null +++ b/agentic-organization/packages/README.md @@ -0,0 +1,55 @@ +# Agentic Organization Packages + +These packages are the first executable slice of Agentic Organization. +They are intentionally small and run as a NodeNext TypeScript island +before any NestJS host, CockroachDB adapter, NATS client, Temporal +worker, Dapr actor host, or Kubernetes deployment is introduced. + +## Package Boundary + +| Package | Current responsibility | +| --------------- | ---------------------------------------------------------------------------------------------------------- | +| `domain` | typed command names, event names, aggregate names, work item state machine, event envelope, shared records | +| `application` | command pipeline, idempotency handling, first supervisor-chain signal command handler | +| `state` | in-memory Organization store used as the first repository port fake | +| `messaging` | NATS subject contract without a live NATS dependency | +| `observability` | LGTM/OpenTelemetry attribute projection from Agentic event envelopes | +| `runtime` | first event-to-automation reaction rule | + +## Slice Rule + +The first slice proves this path: + +```text +supervisor-chain signal command + -> idempotency check + -> chain communication record + -> audit event + -> outbox event + -> NATS subject / telemetry contract + -> automation reaction plan +``` + +CockroachDB, JetStream publishing, 
Temporal, Dapr, Hermes, Hindsight, +and the hat-system CRDs come next as adapters behind these contracts. +They should not redefine command names, event names, state names, +correlation fields, or policy authority. + +## Validation + +Run the package tests from `agentic-organization/`: + +```powershell +npm test +``` + +The test command uses Node's built-in test runner and TypeScript type +stripping: + +```text +node --experimental-strip-types --test packages/**/*.test.ts +``` + +This is a deliberate NodeNext starting point so the package contracts +can run without requiring Bun to be installed on a cluster maintainer's +shell. diff --git a/agentic-organization/packages/application/src/command-pipeline.test.ts b/agentic-organization/packages/application/src/command-pipeline.test.ts new file mode 100644 index 0000000000..3ea66d3772 --- /dev/null +++ b/agentic-organization/packages/application/src/command-pipeline.test.ts @@ -0,0 +1,75 @@ +import { deepEqual, equal } from "node:assert/strict"; +import { describe, test } from "node:test"; + +import { CommandType, SupervisorChainLevel, SupervisorSignalToolType } from "../../domain/src/index.ts"; +import { CommandErrorCode, CommandResultStatus } from "./command-result.ts"; +import { createCommandPipeline, type PipelineCommand } from "./command-pipeline.ts"; + +const command: PipelineCommand = { + commandId: "cmd-supervisor-signal-001", + type: CommandType.SendSupervisorSignal, + idempotencyKey: "idem-supervisor-signal-001", + requestHash: "hash-supervisor-signal-001", + correlationId: "corr-supervisor-signal-001", + causationId: "cause-team-work-001", + traceId: "trace-supervisor-signal-001", + organizationId: "org-lfg", + projectId: "project-agentic-org", + teamId: "team-runtime", + sourceLevel: SupervisorChainLevel.TeamMember, + targetLevel: SupervisorChainLevel.Manager, + targetHatAssignmentId: "hat-assignment-em-001", + actor: { + agentId: "agent-developer-001", + hatAssignmentId: "hat-assignment-dev-001", + }, + 
toolType: SupervisorSignalToolType.ReportBlocker, + title: "Blocked on scoped NATS publisher", + message: "The team cannot validate the outbox worker until a supervisor routes a scoped NATS publisher decision.", + relatedWorkItemId: "work-outbox-001", +}; + +describe("command pipeline idempotency", () => { + test("replaying the same idempotency key returns the stored result", () => { + const pipeline = createCommandPipeline({ + now: () => "2026-05-25T20:00:00.000Z", + createId: (prefix) => `${prefix}-001`, + }); + + const firstResult = pipeline.execute(command); + const replayResult = pipeline.execute(command); + + equal(firstResult.status, CommandResultStatus.Accepted); + equal(replayResult.status, CommandResultStatus.Accepted); + deepEqual(replayResult.idempotency, { + replayed: true, + }); + equal(firstResult.supervisorSignal !== undefined, true); + equal(replayResult.supervisorSignal !== undefined, true); + equal(replayResult.supervisorSignal?.supervisorSignalId, firstResult.supervisorSignal?.supervisorSignalId); + equal(pipeline.store.supervisorSignals.length, 1); + equal(pipeline.store.workItems.length, 0); + equal(pipeline.store.auditEvents.length, 1); + equal(pipeline.store.outboxEvents.length, 1); + }); + + test("rejects conflicting reuse of the same idempotency key", () => { + const pipeline = createCommandPipeline({ + now: () => "2026-05-25T20:00:00.000Z", + createId: (prefix) => `${prefix}-001`, + }); + + const firstResult = pipeline.execute(command); + const conflictResult = pipeline.execute({ + ...command, + requestHash: "hash-supervisor-signal-conflict", + title: "Different supervisor signal", + }); + + equal(firstResult.status, CommandResultStatus.Accepted); + equal(conflictResult.status, CommandResultStatus.Rejected); + equal(conflictResult.error?.code, CommandErrorCode.IdempotencyConflict); + equal(pipeline.store.supervisorSignals.length, 1); + equal(pipeline.store.outboxEvents.length, 1); + }); +}); diff --git 
a/agentic-organization/packages/application/src/command-pipeline.ts b/agentic-organization/packages/application/src/command-pipeline.ts new file mode 100644 index 0000000000..fb8afea849 --- /dev/null +++ b/agentic-organization/packages/application/src/command-pipeline.ts @@ -0,0 +1,84 @@ +import { CommandType } from "../../domain/src/index.ts"; +import { createInMemoryOrganizationStore, type InMemoryOrganizationStore } from "../../state/src/index.ts"; +import { CommandErrorCode, CommandResultStatus, type CommandResult } from "./command-result.ts"; +import { sendSupervisorSignal, type SendSupervisorSignalCommand } from "./handlers/send-supervisor-signal.ts"; +import type { Clock, IdGenerator } from "./ports.ts"; + +export type PipelineCommand = SendSupervisorSignalCommand; + +export type CommandPipeline = { + store: InMemoryOrganizationStore; + execute: (command: PipelineCommand) => CommandResult; +}; + +export function createCommandPipeline(dependencies: Clock & IdGenerator): CommandPipeline { + const store = createInMemoryOrganizationStore(); + + return { + store, + execute: (command) => executeCommand(command, store, dependencies), + }; +} + +function executeCommand( + command: PipelineCommand, + store: InMemoryOrganizationStore, + dependencies: Clock & IdGenerator, +): CommandResult { + const existingRecord = store.idempotencyRecords.get(command.idempotencyKey); + + if (existingRecord?.requestHash === command.requestHash) { + return { + ...existingRecord.result, + idempotency: { + replayed: true, + }, + }; + } + + if (existingRecord) { + return { + status: CommandResultStatus.Rejected, + idempotency: { + replayed: false, + }, + error: { + code: CommandErrorCode.IdempotencyConflict, + message: "idempotency key was reused with a different request hash", + }, + }; + } + + const result = dispatchCommand(command, store, dependencies); + store.idempotencyRecords.set(command.idempotencyKey, { + idempotencyKey: command.idempotencyKey, + requestHash: command.requestHash, 
+ result, + }); + + return result; +} + +function dispatchCommand( + command: PipelineCommand, + store: InMemoryOrganizationStore, + dependencies: Clock & IdGenerator, +): CommandResult { + if (command.type === CommandType.SendSupervisorSignal) { + return sendSupervisorSignal(command, { + ...dependencies, + store, + }); + } + + return { + status: CommandResultStatus.Rejected, + idempotency: { + replayed: false, + }, + error: { + code: CommandErrorCode.UnsupportedCommand, + message: "unsupported command type", + }, + }; +} diff --git a/agentic-organization/packages/application/src/command-result.ts b/agentic-organization/packages/application/src/command-result.ts new file mode 100644 index 0000000000..49d6d7c88f --- /dev/null +++ b/agentic-organization/packages/application/src/command-result.ts @@ -0,0 +1,28 @@ +import type { SupervisorSignal, WorkItem } from "../../domain/src/index.ts"; + +export const CommandResultStatus = { + Accepted: "accepted", + Rejected: "rejected", +} as const; + +export type CommandResultStatus = (typeof CommandResultStatus)[keyof typeof CommandResultStatus]; + +export const CommandErrorCode = { + IdempotencyConflict: "idempotency_conflict", + UnsupportedCommand: "unsupported_command", +} as const; + +export type CommandErrorCode = (typeof CommandErrorCode)[keyof typeof CommandErrorCode]; + +export type CommandResult = { + status: CommandResultStatus; + workItem?: WorkItem; + supervisorSignal?: SupervisorSignal; + idempotency: { + replayed: boolean; + }; + error?: { + code: CommandErrorCode; + message: string; + }; +}; diff --git a/agentic-organization/packages/application/src/handlers/send-supervisor-signal.test.ts b/agentic-organization/packages/application/src/handlers/send-supervisor-signal.test.ts new file mode 100644 index 0000000000..2c28dfeafd --- /dev/null +++ b/agentic-organization/packages/application/src/handlers/send-supervisor-signal.test.ts @@ -0,0 +1,94 @@ +import { deepEqual, equal, ok } from "node:assert/strict"; +import 
{ describe, test } from "node:test"; + +import { + AgenticAggregateType, + AgenticEventType, + CommandType, + SupervisorChainLevel, + SupervisorSignalStatus, + SupervisorSignalToolType, +} from "../../../domain/src/index.ts"; +import { createInMemoryOrganizationStore } from "../../../state/src/index.ts"; +import { CommandResultStatus, type CommandResult } from "../command-result.ts"; +import { sendSupervisorSignal, type SendSupervisorSignalCommand } from "./send-supervisor-signal.ts"; + +const command: SendSupervisorSignalCommand = { + commandId: "cmd-supervisor-signal-001", + type: CommandType.SendSupervisorSignal, + idempotencyKey: "idem-supervisor-signal-001", + requestHash: "hash-supervisor-signal-001", + correlationId: "corr-supervisor-signal-001", + causationId: "cause-team-work-001", + traceId: "trace-supervisor-signal-001", + organizationId: "org-lfg", + projectId: "project-agentic-org", + teamId: "team-runtime", + sourceLevel: SupervisorChainLevel.TeamMember, + targetLevel: SupervisorChainLevel.Manager, + targetHatAssignmentId: "hat-assignment-em-001", + actor: { + agentId: "agent-developer-001", + hatAssignmentId: "hat-assignment-dev-001", + }, + toolType: SupervisorSignalToolType.ReportBlocker, + title: "Blocked on scoped NATS publisher", + message: "The team cannot validate the outbox worker until a supervisor routes a scoped NATS publisher decision.", + relatedWorkItemId: "work-outbox-001", +}; + +describe("send supervisor signal handler", () => { + test("persists chain communication, audit event, and outbox event atomically", () => { + const store = createInMemoryOrganizationStore(); + + const result = sendSupervisorSignal(command, { + store, + now: () => "2026-05-25T20:00:00.000Z", + createId: (prefix) => `${prefix}-001`, + }); + + equal(result.status, CommandResultStatus.Accepted); + ok(result.supervisorSignal); + equal(result.supervisorSignal.status, SupervisorSignalStatus.Sent); + equal(store.supervisorSignals.length, 1); + 
equal(store.workItems.length, 0); + equal(store.auditEvents.length, 1); + equal(store.outboxEvents.length, 1); + deepEqual(store.outboxEvents[0]?.envelope, { + eventId: "evt-001", + eventType: AgenticEventType.SupervisorSignalSent, + schemaVersion: "agentic.org.event.v1", + occurredAt: "2026-05-25T20:00:00.000Z", + scope: { + organizationId: command.organizationId, + projectId: command.projectId, + teamId: command.teamId, + workItemId: command.relatedWorkItemId, + }, + actor: command.actor, + aggregate: { + aggregateId: result.supervisorSignal.supervisorSignalId, + aggregateType: AgenticAggregateType.SupervisorSignal, + aggregateVersion: 1, + }, + trace: { + commandId: command.commandId, + correlationId: command.correlationId, + causationId: command.causationId, + traceId: command.traceId, + idempotencyKey: command.idempotencyKey, + }, + replay: { + isReplay: false, + }, + payload: { + sourceLevel: SupervisorChainLevel.TeamMember, + targetLevel: SupervisorChainLevel.Manager, + targetHatAssignmentId: command.targetHatAssignmentId, + toolType: SupervisorSignalToolType.ReportBlocker, + status: SupervisorSignalStatus.Sent, + title: command.title, + }, + }); + }); +}); diff --git a/agentic-organization/packages/application/src/handlers/send-supervisor-signal.ts b/agentic-organization/packages/application/src/handlers/send-supervisor-signal.ts new file mode 100644 index 0000000000..85e207077e --- /dev/null +++ b/agentic-organization/packages/application/src/handlers/send-supervisor-signal.ts @@ -0,0 +1,130 @@ +import { + AgenticAggregateType, + AgenticEventType, + CommandType, + SupervisorSignalStatus, + type AgenticActor, + type AuditEvent, + type OutboxEvent, + type SupervisorChainLevel, + type SupervisorSignal, + type SupervisorSignalToolType, +} from "../../../domain/src/index.ts"; +import { createAgenticEventEnvelope } from "../../../domain/src/index.ts"; +import type { InMemoryOrganizationStore } from "../../../state/src/index.ts"; +import { CommandResultStatus, 
type CommandResult } from "../command-result.ts"; +import type { Clock, IdGenerator } from "../ports.ts"; + +export const IdPrefix = { + SupervisorSignal: "supervisor-signal", + AuditEvent: "audit", + OutboxEvent: "outbox", + Event: "evt", +} as const; + +export type IdPrefix = (typeof IdPrefix)[keyof typeof IdPrefix]; + +export type SendSupervisorSignalCommand = { + commandId: string; + type: typeof CommandType.SendSupervisorSignal; + idempotencyKey: string; + requestHash: string; + correlationId: string; + causationId: string; + traceId: string; + organizationId: string; + projectId: string; + teamId: string; + sourceLevel: SupervisorChainLevel; + targetLevel: SupervisorChainLevel; + targetHatAssignmentId: string; + actor: AgenticActor; + toolType: SupervisorSignalToolType; + title: string; + message: string; + relatedWorkItemId: string; +}; + +export type SendSupervisorSignalDependencies = Clock & + IdGenerator & { + store: InMemoryOrganizationStore; + }; + +export function sendSupervisorSignal( + command: SendSupervisorSignalCommand, + dependencies: SendSupervisorSignalDependencies, +): CommandResult { + const occurredAt = dependencies.now(); + const supervisorSignal: SupervisorSignal = { + supervisorSignalId: dependencies.createId(IdPrefix.SupervisorSignal), + organizationId: command.organizationId, + projectId: command.projectId, + teamId: command.teamId, + sourceLevel: command.sourceLevel, + targetLevel: command.targetLevel, + targetHatAssignmentId: command.targetHatAssignmentId, + sender: command.actor, + toolType: command.toolType, + status: SupervisorSignalStatus.Sent, + title: command.title, + message: command.message, + relatedWorkItemId: command.relatedWorkItemId, + createdAt: occurredAt, + }; + + const auditEvent: AuditEvent = { + auditEventId: dependencies.createId(IdPrefix.AuditEvent), + eventName: AgenticEventType.SupervisorSignalSent, + aggregateId: supervisorSignal.supervisorSignalId, + actor: command.actor, + occurredAt, + }; + + const 
outboxEvent: OutboxEvent = { + outboxEventId: dependencies.createId(IdPrefix.OutboxEvent), + envelope: createAgenticEventEnvelope({ + eventId: dependencies.createId(IdPrefix.Event), + eventType: AgenticEventType.SupervisorSignalSent, + occurredAt, + actor: command.actor, + scope: { + organizationId: command.organizationId, + projectId: command.projectId, + teamId: command.teamId, + workItemId: command.relatedWorkItemId, + }, + aggregate: { + aggregateId: supervisorSignal.supervisorSignalId, + aggregateType: AgenticAggregateType.SupervisorSignal, + aggregateVersion: 1, + }, + trace: { + commandId: command.commandId, + correlationId: command.correlationId, + causationId: command.causationId, + traceId: command.traceId, + idempotencyKey: command.idempotencyKey, + }, + payload: { + sourceLevel: command.sourceLevel, + targetLevel: command.targetLevel, + targetHatAssignmentId: command.targetHatAssignmentId, + toolType: command.toolType, + status: SupervisorSignalStatus.Sent, + title: command.title, + }, + }), + }; + + dependencies.store.supervisorSignals.push(supervisorSignal); + dependencies.store.auditEvents.push(auditEvent); + dependencies.store.outboxEvents.push(outboxEvent); + + return { + status: CommandResultStatus.Accepted, + supervisorSignal, + idempotency: { + replayed: false, + }, + }; +} diff --git a/agentic-organization/packages/application/src/index.ts b/agentic-organization/packages/application/src/index.ts new file mode 100644 index 0000000000..b02bc74402 --- /dev/null +++ b/agentic-organization/packages/application/src/index.ts @@ -0,0 +1,8 @@ +export { createCommandPipeline, type CommandPipeline, type PipelineCommand } from "./command-pipeline.ts"; +export { CommandErrorCode, CommandResultStatus, type CommandResult } from "./command-result.ts"; +export { + sendSupervisorSignal, + type IdPrefix, + type SendSupervisorSignalCommand, + type SendSupervisorSignalDependencies, +} from "./handlers/send-supervisor-signal.ts"; diff --git 
a/agentic-organization/packages/application/src/ports.ts b/agentic-organization/packages/application/src/ports.ts new file mode 100644 index 0000000000..ba623a3ec8 --- /dev/null +++ b/agentic-organization/packages/application/src/ports.ts @@ -0,0 +1,7 @@ +export type Clock = { + now: () => string; +}; + +export type IdGenerator = { + createId: (prefix: string) => string; +}; diff --git a/agentic-organization/packages/domain/src/event-envelope.test.ts b/agentic-organization/packages/domain/src/event-envelope.test.ts new file mode 100644 index 0000000000..8bea403c2b --- /dev/null +++ b/agentic-organization/packages/domain/src/event-envelope.test.ts @@ -0,0 +1,114 @@ +import { deepEqual, equal, throws } from "node:assert/strict"; +import { describe, test } from "node:test"; + +import { + AgenticAggregateType, + AgenticEventType, + createAgenticEventEnvelope, + type CommandTrace, +} from "./event-envelope.ts"; + +const commandTrace: CommandTrace = { + commandId: "cmd-capability-001", + correlationId: "corr-capability-001", + causationId: "cause-intake-001", + traceId: "trace-capability-001", + idempotencyKey: "idem-capability-001", +}; + +describe("canonical agentic event envelope", () => { + test("requires the command trace chain", () => { + throws( + () => + createAgenticEventEnvelope({ + eventId: "evt-capability-001", + eventType: AgenticEventType.WorkItemChanged, + occurredAt: "2026-05-25T20:00:00.000Z", + actor: { + agentId: "agent-addison", + hatAssignmentId: "hat-assignment-em-001", + }, + scope: { + organizationId: "org-lfg", + projectId: "project-agentic-org", + workItemId: "work-capability-001", + }, + aggregate: { + aggregateId: "work-capability-001", + aggregateType: AgenticAggregateType.WorkItem, + aggregateVersion: 1, + }, + trace: { + ...commandTrace, + commandId: "", + }, + payload: { + title: "Request NATS publishing capability", + }, + }), + /commandId/, + ); + }); + + test("requires a work item anchor", () => { + throws( + () => + 
createAgenticEventEnvelope({ + eventId: "evt-capability-001", + eventType: AgenticEventType.WorkItemChanged, + occurredAt: "2026-05-25T20:00:00.000Z", + actor: { + agentId: "agent-addison", + hatAssignmentId: "hat-assignment-em-001", + }, + scope: { + organizationId: "org-lfg", + projectId: "project-agentic-org", + }, + aggregate: { + aggregateId: "work-capability-001", + aggregateType: AgenticAggregateType.WorkItem, + aggregateVersion: 1, + }, + trace: commandTrace, + payload: { + title: "Request NATS publishing capability", + }, + } as Parameters<typeof createAgenticEventEnvelope>[0]), + /scope.workItemId/, + ); + }); + + test("builds a typed replay-safe event envelope", () => { + const envelope = createAgenticEventEnvelope({ + eventId: "evt-capability-001", + eventType: AgenticEventType.WorkItemChanged, + occurredAt: "2026-05-25T20:00:00.000Z", + actor: { + agentId: "agent-addison", + hatAssignmentId: "hat-assignment-em-001", + }, + scope: { + organizationId: "org-lfg", + projectId: "project-agentic-org", + workItemId: "work-capability-001", + }, + aggregate: { + aggregateId: "work-capability-001", + aggregateType: AgenticAggregateType.WorkItem, + aggregateVersion: 1, + }, + trace: commandTrace, + payload: { + title: "Request NATS publishing capability", + }, + }); + + equal(envelope.schemaVersion, "agentic.org.event.v1"); + equal(envelope.eventType, AgenticEventType.WorkItemChanged); + equal(envelope.aggregate.aggregateVersion, 1); + deepEqual(envelope.replay, { + isReplay: false, + }); + }); +}); diff --git a/agentic-organization/packages/domain/src/event-envelope.ts b/agentic-organization/packages/domain/src/event-envelope.ts new file mode 100644 index 0000000000..c94db21867 --- /dev/null +++ b/agentic-organization/packages/domain/src/event-envelope.ts @@ -0,0 +1,121 @@ +export const EventSchemaVersion = { + AgenticOrgEventV1: "agentic.org.event.v1", +} as const; + +export type EventSchemaVersion = (typeof EventSchemaVersion)[keyof typeof EventSchemaVersion]; + +export const CommandType = { +
SendSupervisorSignal: "send_supervisor_signal", +} as const; + +export type CommandType = (typeof CommandType)[keyof typeof CommandType]; + +export const AgenticEventType = { + SupervisorSignalSent: "supervisor_signal.sent", + WorkItemChanged: "work_item.changed", + WorkItemStateChanged: "work_item.state_changed", +} as const; + +export type AgenticEventType = (typeof AgenticEventType)[keyof typeof AgenticEventType]; + +export const AgenticAggregateType = { + SupervisorSignal: "supervisor_signal", + WorkItem: "work_item", +} as const; + +export type AgenticAggregateType = (typeof AgenticAggregateType)[keyof typeof AgenticAggregateType]; + +export type CommandTrace = { + commandId: string; + correlationId: string; + causationId: string; + traceId: string; + idempotencyKey: string; +}; + +export type AgenticActor = { + agentId: string; + hatAssignmentId: string; +}; + +export type AgenticScope = { + organizationId: string; + projectId: string; + initiativeId?: string; + teamId?: string; + workItemId: string; +}; + +export type AgenticAggregate = { + aggregateId: string; + aggregateType: AgenticAggregateType; + aggregateVersion: number; +}; + +export type AgenticReplayState = { + isReplay: boolean; +}; + +export type AgenticEventEnvelope<Payload = Record<string, unknown>> = { + eventId: string; + eventType: AgenticEventType; + schemaVersion: EventSchemaVersion; + occurredAt: string; + actor: AgenticActor; + scope: AgenticScope; + aggregate: AgenticAggregate; + trace: CommandTrace; + replay: AgenticReplayState; + payload: Payload; +}; + +export type CreateAgenticEventEnvelopeInput<Payload = Record<string, unknown>> = Omit< + AgenticEventEnvelope<Payload>, + "schemaVersion" | "replay" +> & { + schemaVersion?: EventSchemaVersion; + replay?: AgenticReplayState; +}; + +export function createAgenticEventEnvelope<Payload>( + input: CreateAgenticEventEnvelopeInput<Payload>, +): AgenticEventEnvelope<Payload> { + assertNonEmpty("eventId", input.eventId); + assertNonEmpty("occurredAt", input.occurredAt); + assertCommandTrace(input.trace); + assertNonEmpty("actor.agentId",
input.actor.agentId); + assertNonEmpty("actor.hatAssignmentId", input.actor.hatAssignmentId); + assertNonEmpty("scope.organizationId", input.scope.organizationId); + assertNonEmpty("scope.projectId", input.scope.projectId); + assertNonEmpty("scope.workItemId", input.scope.workItemId); + assertNonEmpty("aggregate.aggregateId", input.aggregate.aggregateId); + assertPositiveInteger("aggregate.aggregateVersion", input.aggregate.aggregateVersion); + + return { + ...input, + schemaVersion: input.schemaVersion ?? EventSchemaVersion.AgenticOrgEventV1, + replay: input.replay ?? { + isReplay: false, + }, + }; +} + +function assertCommandTrace(trace: CommandTrace): void { + assertNonEmpty("commandId", trace.commandId); + assertNonEmpty("correlationId", trace.correlationId); + assertNonEmpty("causationId", trace.causationId); + assertNonEmpty("traceId", trace.traceId); + assertNonEmpty("idempotencyKey", trace.idempotencyKey); +} + +function assertNonEmpty(fieldName: string, value: string | undefined): void { + if (value === undefined || value.trim().length === 0) { + throw new Error(`${fieldName} is required`); + } +} + +function assertPositiveInteger(fieldName: string, value: number): void { + if (!Number.isInteger(value) || value < 1) { + throw new Error(`${fieldName} must be a positive integer`); + } +} diff --git a/agentic-organization/packages/domain/src/hat-communication-brief.test.ts b/agentic-organization/packages/domain/src/hat-communication-brief.test.ts new file mode 100644 index 0000000000..093a9449fa --- /dev/null +++ b/agentic-organization/packages/domain/src/hat-communication-brief.test.ts @@ -0,0 +1,34 @@ +import { deepEqual, equal } from "node:assert/strict"; +import { describe, test } from "node:test"; + +import { DefaultTeamMemberSupervisorTools, buildHatCommunicationBrief } from "./hat-communication-brief.ts"; +import { SupervisorChainLevel, SupervisorSignalToolType } from "./supervisor-communication.ts"; + +describe("hat communication brief", () => { + 
test("explains duty, supervisor line, and efficient upward tools", () => { + const brief = buildHatCommunicationBrief({ + hatId: "developer", + duty: "Implement scoped work, surface blockers quickly, and keep evidence attached to the work item.", + sourceLevel: SupervisorChainLevel.TeamMember, + targetLevel: SupervisorChainLevel.Manager, + targetHatId: "engineering-manager", + availableTools: DefaultTeamMemberSupervisorTools, + }); + + equal(brief.supervisor.targetLevel, SupervisorChainLevel.Manager); + deepEqual( + brief.availableTools.map((tool) => tool.toolType), + [ + SupervisorSignalToolType.AskQuestion, + SupervisorSignalToolType.ReportBlocker, + SupervisorSignalToolType.RequestDecision, + SupervisorSignalToolType.RequestResource, + SupervisorSignalToolType.RequestReview, + SupervisorSignalToolType.ReportRisk, + SupervisorSignalToolType.SuggestImprovement, + SupervisorSignalToolType.RequestEscalation, + ], + ); + equal(brief.availableTools[1]?.useWhen, "work cannot move without supervisor triage or routing"); + }); +}); diff --git a/agentic-organization/packages/domain/src/hat-communication-brief.ts b/agentic-organization/packages/domain/src/hat-communication-brief.ts new file mode 100644 index 0000000000..555211c494 --- /dev/null +++ b/agentic-organization/packages/domain/src/hat-communication-brief.ts @@ -0,0 +1,101 @@ +import { + SupervisorChainLevel, + SupervisorSignalToolType, + type SupervisorSignalToolType as SupervisorSignalToolTypeValue, +} from "./supervisor-communication.ts"; + +export type SupervisorSignalToolBrief = { + toolType: SupervisorSignalToolTypeValue; + useWhen: string; + requiredEvidence: readonly string[]; +}; + +export type HatCommunicationBrief = { + hatId: string; + duty: string; + sourceLevel: SupervisorChainLevel; + supervisor: { + targetLevel: SupervisorChainLevel; + targetHatId: string; + }; + availableTools: readonly SupervisorSignalToolBrief[]; +}; + +export type BuildHatCommunicationBriefInput = { + hatId: string; + duty: 
string; + sourceLevel: SupervisorChainLevel; + targetLevel: SupervisorChainLevel; + targetHatId: string; + availableTools: readonly SupervisorSignalToolBrief[]; +}; + +export function buildHatCommunicationBrief(input: BuildHatCommunicationBriefInput): HatCommunicationBrief { + assertNonEmpty("hatId", input.hatId); + assertNonEmpty("duty", input.duty); + assertNonEmpty("targetHatId", input.targetHatId); + + if (input.availableTools.length === 0) { + throw new Error("hat communication brief requires at least one tool"); + } + + return { + hatId: input.hatId, + duty: input.duty, + sourceLevel: input.sourceLevel, + supervisor: { + targetLevel: input.targetLevel, + targetHatId: input.targetHatId, + }, + availableTools: input.availableTools, + }; +} + +export const DefaultTeamMemberSupervisorTools = [ + { + toolType: SupervisorSignalToolType.AskQuestion, + useWhen: "clarification is needed before continuing scoped work", + requiredEvidence: ["question", "current work context"], + }, + { + toolType: SupervisorSignalToolType.ReportBlocker, + useWhen: "work cannot move without supervisor triage or routing", + requiredEvidence: ["blocking condition", "attempted workaround"], + }, + { + toolType: SupervisorSignalToolType.RequestDecision, + useWhen: "multiple valid paths exist and authority sits above the hat", + requiredEvidence: ["options", "recommended path", "tradeoffs"], + }, + { + toolType: SupervisorSignalToolType.RequestResource, + useWhen: "work needs additional hats, time, budget, infrastructure, or access", + requiredEvidence: ["resource needed", "work impact", "urgency"], + }, + { + toolType: SupervisorSignalToolType.RequestReview, + useWhen: "a supervisor or reviewer decision is needed before lifecycle progress", + requiredEvidence: ["review target", "acceptance criteria", "evidence"], + }, + { + toolType: SupervisorSignalToolType.ReportRisk, + useWhen: "risk could affect scope, schedule, quality, security, or cost", + requiredEvidence: ["risk", "impact", 
"mitigation"], + }, + { + toolType: SupervisorSignalToolType.SuggestImprovement, + useWhen: "the hat sees a process, memory, prompt-flow, tool, or workflow gap", + requiredEvidence: ["observed friction", "repeatability", "suggested improvement"], + }, + { + toolType: SupervisorSignalToolType.RequestEscalation, + useWhen: "the current supervisor level cannot resolve the issue alone", + requiredEvidence: ["why escalation is needed", "prior triage", "requested level"], + }, +] as const satisfies readonly SupervisorSignalToolBrief[]; + +function assertNonEmpty(fieldName: string, value: string): void { + if (value.trim().length === 0) { + throw new Error(`${fieldName} is required`); + } +} diff --git a/agentic-organization/packages/domain/src/index.ts b/agentic-organization/packages/domain/src/index.ts new file mode 100644 index 0000000000..f4821b78a8 --- /dev/null +++ b/agentic-organization/packages/domain/src/index.ts @@ -0,0 +1,36 @@ +export { + AgenticAggregateType, + AgenticEventType, + CommandType, + EventSchemaVersion, + createAgenticEventEnvelope, + type AgenticActor, + type AgenticAggregate, + type AgenticEventEnvelope, + type AgenticReplayState, + type AgenticScope, + type CommandTrace, + type CreateAgenticEventEnvelopeInput, +} from "./event-envelope.ts"; +export { WorkItemState, assertWorkItemTransition, createInitialWorkItemState } from "./work-item-state-machine.ts"; +export { + SupervisorChainLevel, + SupervisorSignalStatus, + SupervisorSignalToolType, + SupervisorTriageActionType, +} from "./supervisor-communication.ts"; +export { + DefaultTeamMemberSupervisorTools, + buildHatCommunicationBrief, + type BuildHatCommunicationBriefInput, + type HatCommunicationBrief, + type SupervisorSignalToolBrief, +} from "./hat-communication-brief.ts"; +export type { + AuditEvent, + DiscussionAnchor, + IdempotencyRecord, + OutboxEvent, + SupervisorSignal, + WorkItem, +} from "./records.ts"; diff --git a/agentic-organization/packages/domain/src/records.ts 
b/agentic-organization/packages/domain/src/records.ts new file mode 100644 index 0000000000..17edc16e7f --- /dev/null +++ b/agentic-organization/packages/domain/src/records.ts @@ -0,0 +1,63 @@ +import type { AgenticActor, AgenticEventEnvelope } from "./event-envelope.ts"; +import type { + SupervisorChainLevel, + SupervisorSignalStatus, + SupervisorSignalToolType, +} from "./supervisor-communication.ts"; +import type { WorkItemState } from "./work-item-state-machine.ts"; + +export type WorkItem = { + workItemId: string; + organizationId: string; + projectId: string; + title: string; + description: string; + state: WorkItemState; + createdAt: string; + createdBy: AgenticActor; +}; + +export type SupervisorSignal = { + supervisorSignalId: string; + organizationId: string; + projectId: string; + teamId: string; + sourceLevel: SupervisorChainLevel; + targetLevel: SupervisorChainLevel; + targetHatAssignmentId: string; + sender: AgenticActor; + toolType: SupervisorSignalToolType; + status: SupervisorSignalStatus; + title: string; + message: string; + relatedWorkItemId: string; + createdAt: string; +}; + +export type DiscussionAnchor = { + discussionAnchorId: string; + workItemId: string; + organizationId: string; + projectId: string; + createdAt: string; +}; + +export type AuditEvent = { + auditEventId: string; + eventName: string; + aggregateId: string; + actor: AgenticActor; + occurredAt: string; +}; + +export type OutboxEvent = { + outboxEventId: string; + envelope: AgenticEventEnvelope; + publishedAt?: string; +}; + +export type IdempotencyRecord = { + idempotencyKey: string; + requestHash: string; + result: Result; +}; diff --git a/agentic-organization/packages/domain/src/supervisor-communication.ts b/agentic-organization/packages/domain/src/supervisor-communication.ts new file mode 100644 index 0000000000..fae89482d0 --- /dev/null +++ b/agentic-organization/packages/domain/src/supervisor-communication.ts @@ -0,0 +1,43 @@ +export const SupervisorChainLevel = { + 
TeamMember: "team_member", + Manager: "manager", + Director: "director", + CSuite: "c_suite", + ExecutiveBoard: "executive_board", +} as const; + +export type SupervisorChainLevel = (typeof SupervisorChainLevel)[keyof typeof SupervisorChainLevel]; + +export const SupervisorSignalToolType = { + AskQuestion: "ask_question", + ReportBlocker: "report_blocker", + RequestDecision: "request_decision", + RequestResource: "request_resource", + RequestReview: "request_review", + ReportRisk: "report_risk", + SuggestImprovement: "suggest_improvement", + RequestEscalation: "request_escalation", +} as const; + +export type SupervisorSignalToolType = (typeof SupervisorSignalToolType)[keyof typeof SupervisorSignalToolType]; + +export const SupervisorSignalStatus = { + Sent: "sent", + Acknowledged: "acknowledged", + Triaged: "triaged", + Routed: "routed", + Closed: "closed", +} as const; + +export type SupervisorSignalStatus = (typeof SupervisorSignalStatus)[keyof typeof SupervisorSignalStatus]; + +export const SupervisorTriageActionType = { + AnswerDirectly: "answer_directly", + OpenWorkItem: "open_work_item", + EscalateToNextSupervisor: "escalate_to_next_supervisor", + RequestSecurityReview: "request_security_review", + ScheduleDiscussion: "schedule_discussion", + RouteToInternalPlatform: "route_to_internal_platform", +} as const; + +export type SupervisorTriageActionType = (typeof SupervisorTriageActionType)[keyof typeof SupervisorTriageActionType]; diff --git a/agentic-organization/packages/domain/src/work-item-state-machine.test.ts b/agentic-organization/packages/domain/src/work-item-state-machine.test.ts new file mode 100644 index 0000000000..d7a7987f3b --- /dev/null +++ b/agentic-organization/packages/domain/src/work-item-state-machine.test.ts @@ -0,0 +1,20 @@ +import { equal, throws } from "node:assert/strict"; +import { describe, test } from "node:test"; + +import { WorkItemState, assertWorkItemTransition, createInitialWorkItemState } from "./work-item-state-machine.ts"; + 
+describe("work item state machine", () => { + test("new work starts in the typed new state", () => { + equal(createInitialWorkItemState(), WorkItemState.New); + }); + + test("allows only explicit typed lifecycle transitions", () => { + assertDoesNotThrow(() => assertWorkItemTransition(WorkItemState.New, WorkItemState.Triage)); + + throws(() => assertWorkItemTransition(WorkItemState.New, WorkItemState.Approved), /illegal work item transition/); + }); +}); + +function assertDoesNotThrow(action: () => void): void { + action(); +} diff --git a/agentic-organization/packages/domain/src/work-item-state-machine.ts b/agentic-organization/packages/domain/src/work-item-state-machine.ts new file mode 100644 index 0000000000..df259082aa --- /dev/null +++ b/agentic-organization/packages/domain/src/work-item-state-machine.ts @@ -0,0 +1,24 @@ +export const WorkItemState = { + New: "new", + Triage: "triage", + Ready: "ready", + Approved: "approved", +} as const; + +export type WorkItemState = (typeof WorkItemState)[keyof typeof WorkItemState]; + +const allowedTransitions = new Map>([ + [WorkItemState.New, new Set([WorkItemState.Triage])], + [WorkItemState.Triage, new Set([WorkItemState.Ready])], + [WorkItemState.Ready, new Set([WorkItemState.Approved])], +]); + +export function createInitialWorkItemState(): WorkItemState { + return WorkItemState.New; +} + +export function assertWorkItemTransition(fromState: WorkItemState, toState: WorkItemState): void { + if (!allowedTransitions.get(fromState)?.has(toState)) { + throw new Error(`illegal work item transition from ${fromState} to ${toState}`); + } +} diff --git a/agentic-organization/packages/messaging/src/index.ts b/agentic-organization/packages/messaging/src/index.ts new file mode 100644 index 0000000000..134ca99eb0 --- /dev/null +++ b/agentic-organization/packages/messaging/src/index.ts @@ -0,0 +1 @@ +export { AgenticSubjectPrefix, buildAgenticEventSubject, type AgenticEventSubjectInput } from "./subject-builder.ts"; diff --git 
a/agentic-organization/packages/messaging/src/subject-builder.test.ts b/agentic-organization/packages/messaging/src/subject-builder.test.ts new file mode 100644 index 0000000000..49b21bf303 --- /dev/null +++ b/agentic-organization/packages/messaging/src/subject-builder.test.ts @@ -0,0 +1,19 @@ +import { equal } from "node:assert/strict"; +import { describe, test } from "node:test"; + +import { AgenticEventType } from "../../domain/src/index.ts"; +import { buildAgenticEventSubject } from "./subject-builder.ts"; + +describe("agentic event NATS subjects", () => { + test("uses a stable organization-scoped subject shape", () => { + equal( + buildAgenticEventSubject({ + environment: "dev", + organizationId: "org-lfg", + domain: "work", + eventType: AgenticEventType.WorkItemChanged, + }), + "agentic-org.dev.org-lfg.work.work_item.changed", + ); + }); +}); diff --git a/agentic-organization/packages/messaging/src/subject-builder.ts b/agentic-organization/packages/messaging/src/subject-builder.ts new file mode 100644 index 0000000000..d35b13c899 --- /dev/null +++ b/agentic-organization/packages/messaging/src/subject-builder.ts @@ -0,0 +1,30 @@ +import type { AgenticEventType } from "../../domain/src/index.ts"; + +export const AgenticSubjectPrefix = { + Root: "agentic-org", +} as const; + +export type AgenticSubjectPrefix = (typeof AgenticSubjectPrefix)[keyof typeof AgenticSubjectPrefix]; + +export type AgenticEventSubjectInput = { + environment: string; + organizationId: string; + domain: string; + eventType: AgenticEventType; +}; + +export function buildAgenticEventSubject(input: AgenticEventSubjectInput): string { + const segments = [AgenticSubjectPrefix.Root, input.environment, input.organizationId, input.domain, input.eventType]; + + for (const segment of segments) { + assertSubjectSegment(segment); + } + + return segments.join("."); +} + +function assertSubjectSegment(segment: string): void { + if (!/^[a-zA-Z0-9_-]+(?:\.[a-zA-Z0-9_-]+)*$/.test(segment)) { + throw new 
Error(`invalid NATS subject segment: ${segment}`); + } +} diff --git a/agentic-organization/packages/observability/src/index.ts b/agentic-organization/packages/observability/src/index.ts new file mode 100644 index 0000000000..436cee48cc --- /dev/null +++ b/agentic-organization/packages/observability/src/index.ts @@ -0,0 +1,7 @@ +export { + AgenticSpanAttributeKey, + MessagingSystemName, + buildAgenticSpanAttributes, + type AgenticSpanAttributes, + type BuildAgenticSpanAttributesInput, +} from "./span-attributes.ts"; diff --git a/agentic-organization/packages/observability/src/span-attributes.test.ts b/agentic-organization/packages/observability/src/span-attributes.test.ts new file mode 100644 index 0000000000..a144a6c426 --- /dev/null +++ b/agentic-organization/packages/observability/src/span-attributes.test.ts @@ -0,0 +1,69 @@ +import { deepEqual } from "node:assert/strict"; +import { describe, test } from "node:test"; + +import { + AgenticAggregateType, + AgenticEventType, + WorkItemState, + createAgenticEventEnvelope, +} from "../../domain/src/index.ts"; +import { MessagingSystemName, buildAgenticSpanAttributes } from "./span-attributes.ts"; + +describe("agentic observability span attributes", () => { + test("projects event context into LGTM-friendly OpenTelemetry attributes", () => { + const envelope = createAgenticEventEnvelope({ + eventId: "evt-capability-001", + eventType: AgenticEventType.WorkItemChanged, + occurredAt: "2026-05-25T20:00:00.000Z", + actor: { + agentId: "agent-addison", + hatAssignmentId: "hat-assignment-em-001", + }, + scope: { + organizationId: "org-lfg", + projectId: "project-agentic-org", + workItemId: "work-capability-001", + }, + aggregate: { + aggregateId: "work-capability-001", + aggregateType: AgenticAggregateType.WorkItem, + aggregateVersion: 1, + }, + trace: { + commandId: "cmd-capability-001", + correlationId: "corr-capability-001", + causationId: "cause-intake-001", + traceId: "trace-capability-001", + idempotencyKey: 
"idem-capability-001", + }, + payload: { + state: WorkItemState.New, + }, + }); + + deepEqual( + buildAgenticSpanAttributes(envelope, { + natsSubject: "agentic-org.dev.org-lfg.work.work_item.changed", + }), + { + "agentic.event.id": "evt-capability-001", + "agentic.event.type": AgenticEventType.WorkItemChanged, + "agentic.command.id": "cmd-capability-001", + "agentic.correlation.id": "corr-capability-001", + "agentic.causation.id": "cause-intake-001", + "agentic.trace.id": "trace-capability-001", + "agentic.idempotency.key": "idem-capability-001", + "agentic.agent.id": "agent-addison", + "agentic.hat.assignment.id": "hat-assignment-em-001", + "agentic.organization.id": "org-lfg", + "agentic.project.id": "project-agentic-org", + "agentic.work_item.id": "work-capability-001", + "agentic.aggregate.id": "work-capability-001", + "agentic.aggregate.type": AgenticAggregateType.WorkItem, + "agentic.aggregate.version": 1, + "messaging.system": MessagingSystemName.Nats, + "messaging.destination.name": "agentic-org.dev.org-lfg.work.work_item.changed", + }, + ); + }); +}); diff --git a/agentic-organization/packages/observability/src/span-attributes.ts b/agentic-organization/packages/observability/src/span-attributes.ts new file mode 100644 index 0000000000..96e299350d --- /dev/null +++ b/agentic-organization/packages/observability/src/span-attributes.ts @@ -0,0 +1,70 @@ +import type { AgenticEventEnvelope } from "../../domain/src/index.ts"; + +export const AgenticSpanAttributeKey = { + EventId: "agentic.event.id", + EventType: "agentic.event.type", + CommandId: "agentic.command.id", + CorrelationId: "agentic.correlation.id", + CausationId: "agentic.causation.id", + TraceId: "agentic.trace.id", + IdempotencyKey: "agentic.idempotency.key", + AgentId: "agentic.agent.id", + HatAssignmentId: "agentic.hat.assignment.id", + OrganizationId: "agentic.organization.id", + ProjectId: "agentic.project.id", + TeamId: "agentic.team.id", + WorkItemId: "agentic.work_item.id", + AggregateId: 
"agentic.aggregate.id", + AggregateType: "agentic.aggregate.type", + AggregateVersion: "agentic.aggregate.version", + MessagingSystem: "messaging.system", + MessagingDestinationName: "messaging.destination.name", +} as const; + +export type AgenticSpanAttributeKey = (typeof AgenticSpanAttributeKey)[keyof typeof AgenticSpanAttributeKey]; + +export const MessagingSystemName = { + Nats: "nats", +} as const; + +export type MessagingSystemName = (typeof MessagingSystemName)[keyof typeof MessagingSystemName]; + +export type AgenticSpanAttributes = Partial>; + +export type BuildAgenticSpanAttributesInput = { + natsSubject: string; +}; + +export function buildAgenticSpanAttributes( + envelope: AgenticEventEnvelope, + input: BuildAgenticSpanAttributesInput, +): AgenticSpanAttributes { + const attributes: AgenticSpanAttributes = { + [AgenticSpanAttributeKey.EventId]: envelope.eventId, + [AgenticSpanAttributeKey.EventType]: envelope.eventType, + [AgenticSpanAttributeKey.CommandId]: envelope.trace.commandId, + [AgenticSpanAttributeKey.CorrelationId]: envelope.trace.correlationId, + [AgenticSpanAttributeKey.CausationId]: envelope.trace.causationId, + [AgenticSpanAttributeKey.TraceId]: envelope.trace.traceId, + [AgenticSpanAttributeKey.IdempotencyKey]: envelope.trace.idempotencyKey, + [AgenticSpanAttributeKey.AgentId]: envelope.actor.agentId, + [AgenticSpanAttributeKey.HatAssignmentId]: envelope.actor.hatAssignmentId, + [AgenticSpanAttributeKey.OrganizationId]: envelope.scope.organizationId, + [AgenticSpanAttributeKey.ProjectId]: envelope.scope.projectId, + [AgenticSpanAttributeKey.AggregateId]: envelope.aggregate.aggregateId, + [AgenticSpanAttributeKey.AggregateType]: envelope.aggregate.aggregateType, + [AgenticSpanAttributeKey.AggregateVersion]: envelope.aggregate.aggregateVersion, + [AgenticSpanAttributeKey.MessagingSystem]: MessagingSystemName.Nats, + [AgenticSpanAttributeKey.MessagingDestinationName]: input.natsSubject, + }; + + if (envelope.scope.teamId !== undefined) { + 
attributes[AgenticSpanAttributeKey.TeamId] = envelope.scope.teamId; + } + + if (envelope.scope.workItemId !== undefined) { + attributes[AgenticSpanAttributeKey.WorkItemId] = envelope.scope.workItemId; + } + + return attributes; +} diff --git a/agentic-organization/packages/runtime/src/event-automation.test.ts b/agentic-organization/packages/runtime/src/event-automation.test.ts new file mode 100644 index 0000000000..79f15dd8fc --- /dev/null +++ b/agentic-organization/packages/runtime/src/event-automation.test.ts @@ -0,0 +1,67 @@ +import { deepEqual } from "node:assert/strict"; +import { describe, test } from "node:test"; + +import { + AgenticAggregateType, + AgenticEventType, + SupervisorChainLevel, + SupervisorSignalStatus, + SupervisorSignalToolType, + createAgenticEventEnvelope, +} from "../../domain/src/index.ts"; +import { ReactionPlanActionType, ReactionPlanReason, RequiredHat, evaluateV0AutomationRules } from "./reaction-plan.ts"; + +describe("v0 event automation rules", () => { + test("plans target-supervisor triage when a hat sends an upward signal", () => { + const envelope = createAgenticEventEnvelope({ + eventId: "evt-supervisor-signal-001", + eventType: AgenticEventType.SupervisorSignalSent, + occurredAt: "2026-05-25T20:00:00.000Z", + actor: { + agentId: "agent-developer-001", + hatAssignmentId: "hat-assignment-dev-001", + }, + scope: { + organizationId: "org-lfg", + projectId: "project-agentic-org", + teamId: "team-runtime", + workItemId: "work-outbox-001", + }, + aggregate: { + aggregateId: "supervisor-signal-001", + aggregateType: AgenticAggregateType.SupervisorSignal, + aggregateVersion: 1, + }, + trace: { + commandId: "cmd-supervisor-signal-001", + correlationId: "corr-supervisor-signal-001", + causationId: "cause-team-work-001", + traceId: "trace-supervisor-signal-001", + idempotencyKey: "idem-supervisor-signal-001", + }, + payload: { + sourceLevel: SupervisorChainLevel.TeamMember, + targetLevel: SupervisorChainLevel.Manager, + 
targetHatAssignmentId: "hat-assignment-em-001", + toolType: SupervisorSignalToolType.ReportBlocker, + status: SupervisorSignalStatus.Sent, + title: "Blocked on scoped NATS publisher", + }, + }); + + deepEqual(evaluateV0AutomationRules(envelope), [ + { + actionType: ReactionPlanActionType.CreateSupervisorTriage, + triggerEventId: "evt-supervisor-signal-001", + organizationId: "org-lfg", + projectId: "project-agentic-org", + teamId: "team-runtime", + workItemId: "work-outbox-001", + supervisorSignalId: "supervisor-signal-001", + targetLevel: SupervisorChainLevel.Manager, + requiredHat: RequiredHat.EngineeringManager, + reason: ReactionPlanReason.SupervisorSignalNeedsTriage, + }, + ]); + }); +}); diff --git a/agentic-organization/packages/runtime/src/index.ts b/agentic-organization/packages/runtime/src/index.ts new file mode 100644 index 0000000000..d531ba46e4 --- /dev/null +++ b/agentic-organization/packages/runtime/src/index.ts @@ -0,0 +1,7 @@ +export { + ReactionPlanActionType, + ReactionPlanReason, + RequiredHat, + evaluateV0AutomationRules, + type ReactionPlanAction, +} from "./reaction-plan.ts"; diff --git a/agentic-organization/packages/runtime/src/reaction-plan.ts b/agentic-organization/packages/runtime/src/reaction-plan.ts new file mode 100644 index 0000000000..091324ca45 --- /dev/null +++ b/agentic-organization/packages/runtime/src/reaction-plan.ts @@ -0,0 +1,94 @@ +import { AgenticEventType, SupervisorChainLevel, type AgenticEventEnvelope } from "../../domain/src/index.ts"; + +export const ReactionPlanActionType = { + CreateSupervisorTriage: "create_supervisor_triage", + RequestReviewGate: "request_review_gate", +} as const; + +export type ReactionPlanActionType = (typeof ReactionPlanActionType)[keyof typeof ReactionPlanActionType]; + +export const RequiredHat = { + CSuite: "c_suite", + Director: "director", + EngineeringManager: "engineering_manager", + ExecutiveBoard: "executive_board", + Reviewer: "reviewer", +} as const; + +export type RequiredHat = 
(typeof RequiredHat)[keyof typeof RequiredHat]; + +export const ReactionPlanReason = { + SupervisorSignalNeedsTriage: "supervisor signal needs triage", + WorkItemEnteredReadyState: "work item entered ready state", +} as const; + +export type ReactionPlanReason = (typeof ReactionPlanReason)[keyof typeof ReactionPlanReason]; + +export type ReactionPlanAction = { + actionType: ReactionPlanActionType; + triggerEventId: string; + organizationId: string; + projectId: string; + teamId?: string; + workItemId: string; + supervisorSignalId?: string; + targetLevel?: SupervisorChainLevel; + requiredHat: RequiredHat; + reason: ReactionPlanReason; +}; + +type SupervisorSignalSentPayload = { + targetHatAssignmentId: string; + targetLevel: SupervisorChainLevel; +}; + +export function evaluateV0AutomationRules(envelope: AgenticEventEnvelope): ReactionPlanAction[] { + if ( + envelope.eventType !== AgenticEventType.SupervisorSignalSent || + !isSupervisorSignalSentPayload(envelope.payload) || + envelope.scope.teamId === undefined + ) { + return []; + } + + return [ + { + actionType: ReactionPlanActionType.CreateSupervisorTriage, + triggerEventId: envelope.eventId, + organizationId: envelope.scope.organizationId, + projectId: envelope.scope.projectId, + teamId: envelope.scope.teamId, + workItemId: envelope.scope.workItemId, + supervisorSignalId: envelope.aggregate.aggregateId, + targetLevel: envelope.payload.targetLevel, + requiredHat: mapTargetLevelToHat(envelope.payload.targetLevel), + reason: ReactionPlanReason.SupervisorSignalNeedsTriage, + }, + ]; +} + +function isSupervisorSignalSentPayload(payload: unknown): payload is SupervisorSignalSentPayload { + return ( + typeof payload === "object" && payload !== null && "targetHatAssignmentId" in payload && "targetLevel" in payload + ); +} + +function mapTargetLevelToHat(targetLevel: SupervisorChainLevel): RequiredHat { + if (targetLevel === SupervisorChainLevel.Manager) { + return RequiredHat.EngineeringManager; + } + + if (targetLevel 
=== SupervisorChainLevel.Director) { + return RequiredHat.Director; + } + + if (targetLevel === SupervisorChainLevel.CSuite) { + return RequiredHat.CSuite; + } + + if (targetLevel === SupervisorChainLevel.ExecutiveBoard) { + return RequiredHat.ExecutiveBoard; + } + + return RequiredHat.EngineeringManager; +} diff --git a/agentic-organization/packages/state/src/in-memory-organization-store.ts b/agentic-organization/packages/state/src/in-memory-organization-store.ts new file mode 100644 index 0000000000..a601f9526d --- /dev/null +++ b/agentic-organization/packages/state/src/in-memory-organization-store.ts @@ -0,0 +1,28 @@ +import type { + AuditEvent, + DiscussionAnchor, + IdempotencyRecord, + OutboxEvent, + SupervisorSignal, + WorkItem, +} from "../../domain/src/index.ts"; + +export type InMemoryOrganizationStore = { + workItems: WorkItem[]; + supervisorSignals: SupervisorSignal[]; + discussionAnchors: DiscussionAnchor[]; + auditEvents: AuditEvent[]; + outboxEvents: OutboxEvent[]; + idempotencyRecords: Map>; +}; + +export function createInMemoryOrganizationStore(): InMemoryOrganizationStore { + return { + workItems: [], + supervisorSignals: [], + discussionAnchors: [], + auditEvents: [], + outboxEvents: [], + idempotencyRecords: new Map>(), + }; +} diff --git a/agentic-organization/packages/state/src/index.ts b/agentic-organization/packages/state/src/index.ts new file mode 100644 index 0000000000..02fb4997ef --- /dev/null +++ b/agentic-organization/packages/state/src/index.ts @@ -0,0 +1 @@ +export { createInMemoryOrganizationStore, type InMemoryOrganizationStore } from "./in-memory-organization-store.ts"; diff --git a/agentic-organization/packages/test-node.d.ts b/agentic-organization/packages/test-node.d.ts new file mode 100644 index 0000000000..893420dde2 --- /dev/null +++ b/agentic-organization/packages/test-node.d.ts @@ -0,0 +1,11 @@ +declare module "node:assert/strict" { + export function deepEqual(actual: unknown, expected: unknown): void; + export function 
equal(actual: unknown, expected: unknown): void; + export function ok(value: unknown): asserts value; + export function throws(action: () => void, expected?: RegExp): void; +} + +declare module "node:test" { + export function describe(name: string, fn: () => void): void; + export function test(name: string, fn: () => void): void; +} diff --git a/agentic-organization/tsconfig.json b/agentic-organization/tsconfig.json new file mode 100644 index 0000000000..db930028ea --- /dev/null +++ b/agentic-organization/tsconfig.json @@ -0,0 +1,20 @@ +{ + "compilerOptions": { + "target": "ES2022", + "module": "NodeNext", + "moduleResolution": "NodeNext", + "strict": true, + "noImplicitOverride": true, + "noUncheckedIndexedAccess": true, + "noUnusedLocals": true, + "noUnusedParameters": true, + "exactOptionalPropertyTypes": true, + "verbatimModuleSyntax": true, + "allowImportingTsExtensions": true, + "isolatedModules": true, + "moduleDetection": "force", + "skipLibCheck": true, + "noEmit": true + }, + "include": ["packages/**/*.ts"] +} diff --git a/openspec/specs/agentic-organization/spec.md b/openspec/specs/agentic-organization/spec.md new file mode 100644 index 0000000000..67d59f3d2e --- /dev/null +++ b/openspec/specs/agentic-organization/spec.md @@ -0,0 +1,140 @@ +## Purpose + +Agentic Organization defines the command, event, state, idempotency, +telemetry, and automation substrate for an agent-run Organization Work +OS. It is the business runtime above the full-ai-cluster substrate. + +## Requirements + +### Requirement: Lifecycle primitives are generic and expandable + +Agentic Organization MUST prefer generic lifecycle primitives over +hardcoded one-off request tools. 
+ +#### Scenario: Agent discovers a repeated coordination need + +- **WHEN** an agent discovers that its current hat tools are not enough + for repeated work +- **THEN** the agent uses supervisor-chain communication to surface the + need +- **AND** the target supervisor triages whether it should become a new + tool, prompt flow, routing rule, review gate, or specialized + lifecycle +- **AND** any new tool or flow is activated only after the required + review, security, implementation, and outcome-review gates + +### Requirement: Commands are the only business authority + +Organization runtime hosts and adapters MUST change authoritative +Organization state only by calling Organization commands. + +#### Scenario: Hat sends a supervisor signal + +- **WHEN** a runtime host sends a supervisor-chain signal on behalf of + a hat wearer +- **THEN** the request is handled by the Organization command pipeline +- **AND** the command creates supervisor-signal state, audit, outbox, + and idempotency records together +- **AND** the adapter does not mutate authoritative state directly + +### Requirement: Commands are idempotent + +Organization commands MUST use deterministic idempotency keys at the +command boundary. + +#### Scenario: Matching replay + +- **WHEN** a command is submitted twice with the same idempotency key + and request hash +- **THEN** the second execution returns the stored result +- **AND** no duplicate supervisor signal, audit event, or outbox event is + created + +#### Scenario: Conflicting replay + +- **WHEN** a command is submitted with an idempotency key that already + exists for a different request hash +- **THEN** the command is rejected with a typed idempotency conflict +- **AND** no new authoritative state is created + +### Requirement: Events carry traceable envelopes + +Organization domain events MUST carry a canonical envelope with command, +actor, hat, scope, aggregate, replay, and trace fields. 
+ +#### Scenario: Supervisor signal event is created + +- **WHEN** a supervisor signal event is written to the outbox +- **THEN** the event includes event ID, event type, schema version, + occurred-at timestamp, actor agent ID, hat assignment ID, + organization ID, project ID, team ID, work item ID, aggregate ID, + aggregate type, aggregate version, command ID, correlation ID, + causation ID, trace ID, and idempotency key +- **AND** the event is replay-aware + +### Requirement: Hats expose clear communication lines + +Hats MUST expose a communication brief that tells the wearer their +duty, supervisor line, available upward tools, when to use each tool, +and required evidence. + +#### Scenario: Hat receives its communication brief + +- **WHEN** a hat context is prepared for an agent +- **THEN** the context includes the hat duty statement +- **AND** the context identifies the next supervisor level and target + supervisor hat +- **AND** the context lists typed upward tools such as ask question, + report blocker, request decision, request resource, request review, + report risk, suggest improvement, and request escalation +- **AND** every tool explains when to use it and what evidence to + include +- **AND** the tool list is treated as an evolvable hat capability, not a + closed taxonomy + +### Requirement: Work state transitions are typed + +Work item state transitions MUST be represented by typed states and +validated by a state machine. + +#### Scenario: Illegal transition + +- **WHEN** a work item attempts to transition directly from `new` to + `approved` +- **THEN** the transition is rejected + +### Requirement: Messaging subjects are stable + +Organization NATS subjects MUST use a stable organization-scoped shape. 
+ +#### Scenario: Work event subject is built + +- **WHEN** a work event is prepared for NATS publication +- **THEN** the subject shape is + `agentic-org....` + +### Requirement: Telemetry is complete at the event boundary + +Organization packages MUST expose OpenTelemetry-compatible attributes +for the full trace chain before live LGTM ingestion is wired. + +#### Scenario: Event is projected to span attributes + +- **WHEN** an event envelope is projected to telemetry +- **THEN** the attributes include event, command, correlation, + causation, trace, idempotency, actor, hat assignment, organization, + project, work item, aggregate, and NATS destination fields + +### Requirement: Automation rules create plans before side effects + +V0 automation rules MUST produce explicit reaction plans instead of +performing privileged side effects directly. + +#### Scenario: Supervisor signal is sent + +- **WHEN** a supervisor signal is sent to a manager, director, C-suite + hat, or executive-board hat +- **THEN** the runtime creates a reaction plan for the target + supervisor level +- **AND** the plan references the triggering event, supervisor signal, + target level, team, and work item From 584d0d83101c1838a7a80aeaacb43fbd7f4845f0 Mon Sep 17 00:00:00 2001 From: Max Chadaev Date: Mon, 25 May 2026 16:59:20 -0400 Subject: [PATCH 06/21] feat(agentic-org): add workflow visibility contract Co-Authored-By: Codex --- .../docs/IMPLEMENTATION_GOVERNANCE.md | 8 + .../docs/OBSERVABILITY_AND_SELF_HEALING.md | 162 ++++++++++++++++++ agentic-organization/docs/README.md | 1 + .../docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md | 20 +++ .../packages/observability/src/index.ts | 10 ++ .../src/workflow-visibility.test.ts | 111 ++++++++++++ .../observability/src/workflow-visibility.ts | 121 +++++++++++++ openspec/specs/agentic-organization/spec.md | 18 ++ 8 files changed, 451 insertions(+) create mode 100644 agentic-organization/docs/OBSERVABILITY_AND_SELF_HEALING.md create mode 100644 
agentic-organization/packages/observability/src/workflow-visibility.test.ts create mode 100644 agentic-organization/packages/observability/src/workflow-visibility.ts diff --git a/agentic-organization/docs/IMPLEMENTATION_GOVERNANCE.md b/agentic-organization/docs/IMPLEMENTATION_GOVERNANCE.md index a9e2a4121b..26d60e5649 100644 --- a/agentic-organization/docs/IMPLEMENTATION_GOVERNANCE.md +++ b/agentic-organization/docs/IMPLEMENTATION_GOVERNANCE.md @@ -89,6 +89,14 @@ The first slice defines the required `agentic.*` attributes in `@agentic-org/observability`. Later packages should consume that contract instead of inventing new names. +Every meaningful workflow movement must also be projectable into a +workflow visibility record. The record is the agent- and UI-readable +surface that links command state, events, traces, logs, metrics, +work-item scope, active hat, aggregate version, and typed weak-point +indicators. This makes harness failures, blocker patterns, slow triage, +missing evidence, and telemetry gaps visible enough for agents to route +self-healing work through normal Organization commands. + ## Security Credential access must remain indirect and scoped through approved diff --git a/agentic-organization/docs/OBSERVABILITY_AND_SELF_HEALING.md b/agentic-organization/docs/OBSERVABILITY_AND_SELF_HEALING.md new file mode 100644 index 0000000000..89599babba --- /dev/null +++ b/agentic-organization/docs/OBSERVABILITY_AND_SELF_HEALING.md @@ -0,0 +1,162 @@ +# Observability and Self-Healing + +## Status + +Implementation contract for the first visibility slice. + +## Purpose + +Agentic Organization must be observable by construction. A human or +agent should be able to plug into the system, see what is happening +across work, agents, hats, commands, events, runs, gates, and reactions, +and identify weak points without reconstructing state from scattered +logs. 
+ +The long-term goal is agent self-monitoring: agents can inspect their +own workflows, detect harness failures or process gaps, create the right +supervisor-chain communication, and route fixes through normal +Organization work. + +## Core Rule + +Every meaningful workflow movement emits or produces a workflow +visibility record. + +Required movements include: + +- command accepted or rejected; +- domain event written; +- outbox publication; +- NATS consumer handling; +- reaction plan created; +- Temporal workflow or activity step; +- Dapr actor reminder or callback; +- MCP tool preflight and execution; +- Hermes run lifecycle event; +- Hindsight recall, retain, or reflect operation; +- review, QA, security, architecture, delivery, or outcome gate; +- schedule block start, pause, resume, or finish; +- runtime health incident or reconciliation. + +The Organization DB, audit rows, outbox rows, and domain events remain +the source of truth. Logs, traces, metrics, and visibility records are +queryable evidence and diagnosis surfaces. + +## Workflow Visibility Record + +`@agentic-org/observability` owns the first typed record builder. The +record projects a canonical event envelope into the fields a UI, +monitor, or agent reviewer needs: + +- observation kind; +- health state; +- workflow stage; +- occurred-at timestamp; +- event ID and event type; +- command, correlation, causation, trace, and idempotency IDs; +- organization, project, initiative, team, and work item scope; +- agent and active hat assignment; +- aggregate ID, type, and version; +- Grafana links for traces, logs, and metrics; +- typed weak-point indicators. + +This is intentionally generic. It is not a QA-only, capability-request, +or platform-incident-only tool. It is the common visibility shape that +all packages can extend and all runtime hosts can emit. + +## Weak-Point Indicators + +Weak points should be typed so agents can reason over them without +parsing prose. 
The starter taxonomy is: + +| Indicator | Meaning | +| ------------------ | -------------------------------------------------------------------- | +| `blocked_work` | Work cannot proceed without supervisor triage, dependency, or input. | +| `slow_triage` | A queue, gate, review, or escalation is exceeding its SLO. | +| `repeated_failure` | The same workflow, test, run, or gate is failing repeatedly. | +| `missing_evidence` | A reviewer or agent cannot prove that acceptance criteria were met. | +| `missing_tool` | The hat lacks a safe existing tool or prompt flow for repeated work. | +| `policy_denied` | Policy rejected an attempted action and needs triage or education. | +| `harness_failure` | Agent runtime, MCP, prompt-flow, or orchestration harness broke. | +| `telemetry_gap` | Required event, trace, metric, log, artifact, or link is missing. | + +Indicators should include a short summary and a suggested next action. +The next action is still routed through the supervisor chain or normal +work commands; the indicator does not bypass hierarchy, policy, or +review. + +## LGTM Integration + +The full cluster already provides the visibility substrate: + +- Alloy collects telemetry. +- Tempo stores traces. +- Loki stores logs. +- Prometheus and Mimir store metrics. +- Grafana renders dashboards and exploration links. + +Agentic Organization should standardize links and labels so every +visibility record can open the same context from multiple surfaces: + +```text +work item timeline + -> event + -> visibility record + -> trace + -> logs + -> metrics + -> artifacts and evidence + -> related discussions and decisions +``` + +All runtime hosts should attach the canonical `agentic.*` trace fields +and Kubernetes workload labels before exporting telemetry. 
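The sketch below shows how one indicator from the taxonomy above could be produced: a monitor emits `slow_triage` when a pending item outlives its SLO. The indicator shape mirrors `WeakPointIndicator` from `@agentic-org/observability` in this patch. The `PendingTriageItem` type, the `detectSlowTriage` helper, and the SLO value are assumptions made for illustration.

```typescript
// Mirrors the WeakPointIndicator shape in @agentic-org/observability,
// narrowed to the one indicator type this sketch emits.
type SlowTriageIndicator = {
  indicatorType: "slow_triage";
  summary: string;
  suggestedAction: string;
};

// Hypothetical pending-triage entry; field names are assumptions.
type PendingTriageItem = {
  workItemId: string;
  targetHatAssignmentId: string;
  enqueuedAt: string; // ISO-8601 timestamp
};

// Returns an indicator only when the wait exceeds the SLO.
function detectSlowTriage(
  item: PendingTriageItem,
  sloMs: number,
  now: Date,
): SlowTriageIndicator | undefined {
  const waitedMs = now.getTime() - Date.parse(item.enqueuedAt);
  if (waitedMs <= sloMs) {
    return undefined;
  }
  const waitedMinutes = Math.floor(waitedMs / 60_000);
  return {
    indicatorType: "slow_triage",
    summary: `${item.workItemId} has waited ${waitedMinutes} min for triage`,
    // The action is advice only; routing still goes through the supervisor chain.
    suggestedAction: `Escalate triage for ${item.workItemId} above ${item.targetHatAssignmentId}`,
  };
}
```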
+ +## Agent Self-Healing Loop + +Agents should eventually run this loop on their own workflows: + +```text +observe workflow visibility records + -> classify weak points + -> inspect linked trace, logs, metrics, artifacts, and work graph + -> decide whether the issue is local, team-level, department-level, or platform-level + -> send supervisor-chain communication when work, authority, tooling, or policy must change + -> create or update work items through normal commands + -> route to the right team, review gate, and security gate + -> implement harness, prompt-flow, tool, adapter, or process fix + -> validate through tests, gates, and runtime telemetry + -> run outcome review and update memories or docs when appropriate +``` + +The loop must remain auditable. Self-healing is not permission to mutate +the runtime outside command, policy, idempotency, and review boundaries. + +## UI Expectations + +The operations UI should make weak points visible at every hierarchy: + +- organization health; +- project and initiative health; +- department and team queues; +- hat supply and assignment health; +- work item timelines; +- agent schedules and runs; +- review, QA, security, architecture, delivery, and outcome gates; +- NATS, Temporal, Dapr, Hermes, Hindsight, MCP, and k8s adapter health. + +Every view should support drilling from summary to evidence. A red or +degraded status without a trace, log query, metric panel, event ID, +work item, and suggested action is not good enough for this platform. + +## Implementation Rule + +When adding a package, command, adapter, workflow, gate, or runtime host, +add its visibility contract with the same change: + +- typed event or command names; +- OpenTelemetry attributes; +- workflow visibility projection; +- weak-point indicators it may emit; +- dashboard or UI projection needs; +- tests proving the projection includes the trace chain. 
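The implementation rule above asks for tests proving that the projection includes the trace chain. A small guard like the one below could back such tests and could also emit a `telemetry_gap` indicator at runtime. It is a sketch: the `findTraceChainGaps` helper is hypothetical, and the field list is taken from the `WorkflowVisibilityRecord` type in this patch.

```typescript
// Subset of WorkflowVisibilityRecord fields that form the trace chain.
// Fields are optional here so the guard can inspect incomplete records.
type TraceChainFields = {
  eventId?: string;
  commandId?: string;
  correlationId?: string;
  causationId?: string;
  traceId?: string;
  idempotencyKey?: string;
};

const REQUIRED_TRACE_FIELDS = [
  "eventId",
  "commandId",
  "correlationId",
  "causationId",
  "traceId",
  "idempotencyKey",
] as const;

type TelemetryGapIndicator = {
  indicatorType: "telemetry_gap";
  summary: string;
  suggestedAction: string;
};

// Reports missing or blank trace-chain fields in declaration order.
function findTraceChainGaps(record: TraceChainFields): TelemetryGapIndicator | undefined {
  const missing = REQUIRED_TRACE_FIELDS.filter((field) => {
    const value = record[field];
    return value === undefined || value.trim() === "";
  });
  if (missing.length === 0) {
    return undefined;
  }
  return {
    indicatorType: "telemetry_gap",
    summary: `missing trace-chain fields: ${missing.join(", ")}`,
    suggestedAction: "Fix the emitting runtime host so it propagates the canonical envelope",
  };
}
```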
diff --git a/agentic-organization/docs/README.md b/agentic-organization/docs/README.md index 1102b41e2a..2dc8721ccd 100644 --- a/agentic-organization/docs/README.md +++ b/agentic-organization/docs/README.md @@ -16,6 +16,7 @@ Current documents: - [Department, Hat, and Tool Inventory](./DEPARTMENT_HAT_TOOL_INVENTORY.md) - the starter department map, hat catalog, tool bundles, approval gates, lifecycle ownership, and high-risk guardrails for the Organization. - [Organization Layer Build Plan](./ORGANIZATION_LAYER_BUILD_PLAN.md) - the service layer, role workspaces, automation loops, state model, UI surfaces, and MVP sequence needed to make each department and hat operational. - [Technical CA: Package-First Agentic Organization Architecture](./TECHNICAL_CA_PACKAGE_ARCHITECTURE.md) - the proposed TypeScript/NestJS modular-monolith package architecture, event envelope, traceability contract, NATS model, and cluster deployment boundary. +- [Observability and Self-Healing](./OBSERVABILITY_AND_SELF_HEALING.md) - the workflow visibility contract that lets humans and agents plug into Organization activity, find weak points, and route harness fixes through normal work. - [Work and Release Management OS](./WORK_AND_RELEASE_MANAGEMENT_OS.md) - the custom backlog, project, task, assignment, signal, board, and release workflow product that keeps agent work reliable and visible. - [Agent-Native Knowledge Graph and Retrieval](./AGENT_NATIVE_KNOWLEDGE_GRAPH.md) - the graph and retrieval layer linking tasks, discussions, decisions, meetings, docs, artifacts, runs, memories, and evidence into agent-readable context. - [Agent Work Rhythm and Prompt Flows](./AGENT_WORK_RHYTHM_AND_PROMPT_FLOWS.md) - the hat-bound schedule, free-time, review/red-team, reflection, memory maintenance, and deterministic prompt-flow model for agents. 
diff --git a/agentic-organization/docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md b/agentic-organization/docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md index b606fbae0c..a0f3c9ede1 100644 --- a/agentic-organization/docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md +++ b/agentic-organization/docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md @@ -717,6 +717,26 @@ package should standardize: staleness; - links from UI evidence records to trace IDs, log queries, run IDs, event IDs, and artifacts. +- workflow visibility records that project command/event context into + UI- and agent-readable health, stage, trace, scope, aggregate, and + weak-point indicator fields. + +Every runtime host should be inspectable from either direction: + +```text +work item or initiative + -> event timeline + -> visibility record + -> trace/log/metric links + -> weak-point indicators + -> supervisor-chain signal or follow-up work item +``` + +This is the foundation for agent self-monitoring. Agents should be able +to discover slow triage, repeated failures, missing evidence, missing +tools, policy denials, harness failures, and telemetry gaps, then route +fixes through the same command, review, and security lifecycle as any +other work. 
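An agent discovering slow triage, repeated failures, or telemetry gaps needs to see how widespread each weak point is before deciding where to route follow-up work. The sketch below groups work items by indicator type across visibility records. The `groupWorkItemsByWeakPoint` helper and the `VisibilityRecordSlice` type are hypothetical. The indicator names match the starter taxonomy.

```typescript
type WeakPointIndicatorType =
  | "blocked_work"
  | "slow_triage"
  | "repeated_failure"
  | "missing_evidence"
  | "missing_tool"
  | "policy_denied"
  | "harness_failure"
  | "telemetry_gap";

// Only the fields this sketch needs from a WorkflowVisibilityRecord.
type VisibilityRecordSlice = {
  workItemId: string;
  weakPointIndicators: { indicatorType: WeakPointIndicatorType }[];
};

// Maps each indicator type to the distinct work items that reported it.
function groupWorkItemsByWeakPoint(
  records: readonly VisibilityRecordSlice[],
): Map<WeakPointIndicatorType, Set<string>> {
  const grouped = new Map<WeakPointIndicatorType, Set<string>>();
  for (const record of records) {
    for (const indicator of record.weakPointIndicators) {
      let workItems = grouped.get(indicator.indicatorType);
      if (workItems === undefined) {
        workItems = new Set<string>();
        grouped.set(indicator.indicatorType, workItems);
      }
      // A Set keeps repeated records for the same work item from inflating counts.
      workItems.add(record.workItemId);
    }
  }
  return grouped;
}
```

A weak point that spans many work items or teams is a hint to signal further up the supervisor chain. The grouping itself does not change any state.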
## V0 Build Sequence diff --git a/agentic-organization/packages/observability/src/index.ts b/agentic-organization/packages/observability/src/index.ts index 436cee48cc..354ec11a1f 100644 --- a/agentic-organization/packages/observability/src/index.ts +++ b/agentic-organization/packages/observability/src/index.ts @@ -5,3 +5,13 @@ export { type AgenticSpanAttributes, type BuildAgenticSpanAttributesInput, } from "./span-attributes.ts"; +export { + VisibilityHealth, + WeakPointIndicatorType, + WorkflowObservationKind, + buildWorkflowVisibilityRecord, + type BuildWorkflowVisibilityRecordInput, + type VisibilityLinks, + type WeakPointIndicator, + type WorkflowVisibilityRecord, +} from "./workflow-visibility.ts"; diff --git a/agentic-organization/packages/observability/src/workflow-visibility.test.ts b/agentic-organization/packages/observability/src/workflow-visibility.test.ts new file mode 100644 index 0000000000..395d1f09d1 --- /dev/null +++ b/agentic-organization/packages/observability/src/workflow-visibility.test.ts @@ -0,0 +1,111 @@ +import { deepEqual } from "node:assert/strict"; +import { describe, test } from "node:test"; + +import { + AgenticAggregateType, + AgenticEventType, + SupervisorChainLevel, + SupervisorSignalStatus, + SupervisorSignalToolType, + createAgenticEventEnvelope, +} from "../../domain/src/index.ts"; +import { + VisibilityHealth, + WeakPointIndicatorType, + WorkflowObservationKind, + buildWorkflowVisibilityRecord, +} from "./workflow-visibility.ts"; + +describe("workflow visibility records", () => { + test("builds a plug-in visibility record for agent self-monitoring", () => { + const envelope = createAgenticEventEnvelope({ + eventId: "evt-supervisor-signal-001", + eventType: AgenticEventType.SupervisorSignalSent, + occurredAt: "2026-05-25T20:00:00.000Z", + actor: { + agentId: "agent-developer-001", + hatAssignmentId: "hat-assignment-dev-001", + }, + scope: { + organizationId: "org-lfg", + projectId: "project-agentic-org", + teamId: 
"team-runtime", + workItemId: "work-outbox-001", + }, + aggregate: { + aggregateId: "supervisor-signal-001", + aggregateType: AgenticAggregateType.SupervisorSignal, + aggregateVersion: 1, + }, + trace: { + commandId: "cmd-supervisor-signal-001", + correlationId: "corr-supervisor-signal-001", + causationId: "cause-team-work-001", + traceId: "trace-supervisor-signal-001", + idempotencyKey: "idem-supervisor-signal-001", + }, + payload: { + sourceLevel: SupervisorChainLevel.TeamMember, + targetLevel: SupervisorChainLevel.Manager, + targetHatAssignmentId: "hat-assignment-em-001", + toolType: SupervisorSignalToolType.ReportBlocker, + status: SupervisorSignalStatus.Sent, + title: "Blocked on scoped NATS publisher", + }, + }); + + deepEqual( + buildWorkflowVisibilityRecord(envelope, { + observationKind: WorkflowObservationKind.SupervisorSignal, + health: VisibilityHealth.Degraded, + stage: "supervisor_triage", + links: { + traceUrl: "https://grafana.example/explore?trace=trace-supervisor-signal-001", + logsUrl: "https://grafana.example/explore?logs=work-outbox-001", + metricsUrl: "https://grafana.example/d/agentic-org", + }, + weakPointIndicators: [ + { + indicatorType: WeakPointIndicatorType.BlockedWork, + summary: "Work is waiting on supervisor triage", + suggestedAction: "Engineering manager should triage the signal", + }, + ], + }), + { + observationKind: WorkflowObservationKind.SupervisorSignal, + health: VisibilityHealth.Degraded, + stage: "supervisor_triage", + occurredAt: "2026-05-25T20:00:00.000Z", + eventId: "evt-supervisor-signal-001", + eventType: AgenticEventType.SupervisorSignalSent, + commandId: "cmd-supervisor-signal-001", + correlationId: "corr-supervisor-signal-001", + causationId: "cause-team-work-001", + traceId: "trace-supervisor-signal-001", + idempotencyKey: "idem-supervisor-signal-001", + organizationId: "org-lfg", + projectId: "project-agentic-org", + teamId: "team-runtime", + workItemId: "work-outbox-001", + agentId: "agent-developer-001", + 
hatAssignmentId: "hat-assignment-dev-001", + aggregateId: "supervisor-signal-001", + aggregateType: AgenticAggregateType.SupervisorSignal, + aggregateVersion: 1, + links: { + traceUrl: "https://grafana.example/explore?trace=trace-supervisor-signal-001", + logsUrl: "https://grafana.example/explore?logs=work-outbox-001", + metricsUrl: "https://grafana.example/d/agentic-org", + }, + weakPointIndicators: [ + { + indicatorType: WeakPointIndicatorType.BlockedWork, + summary: "Work is waiting on supervisor triage", + suggestedAction: "Engineering manager should triage the signal", + }, + ], + }, + ); + }); +}); diff --git a/agentic-organization/packages/observability/src/workflow-visibility.ts b/agentic-organization/packages/observability/src/workflow-visibility.ts new file mode 100644 index 0000000000..7a58998c19 --- /dev/null +++ b/agentic-organization/packages/observability/src/workflow-visibility.ts @@ -0,0 +1,121 @@ +import type { AgenticAggregateType, AgenticEventEnvelope, AgenticEventType } from "../../domain/src/index.ts"; + +export const WorkflowObservationKind = { + SupervisorSignal: "supervisor_signal", + Command: "command", + ReactionPlan: "reaction_plan", + ToolCall: "tool_call", + HermesRun: "hermes_run", + Gate: "gate", + ScheduleBlock: "schedule_block", +} as const; + +export type WorkflowObservationKind = (typeof WorkflowObservationKind)[keyof typeof WorkflowObservationKind]; + +export const VisibilityHealth = { + Healthy: "healthy", + Degraded: "degraded", + Blocked: "blocked", + Failing: "failing", + Unknown: "unknown", +} as const; + +export type VisibilityHealth = (typeof VisibilityHealth)[keyof typeof VisibilityHealth]; + +export const WeakPointIndicatorType = { + BlockedWork: "blocked_work", + SlowTriage: "slow_triage", + RepeatedFailure: "repeated_failure", + MissingEvidence: "missing_evidence", + MissingTool: "missing_tool", + PolicyDenied: "policy_denied", + HarnessFailure: "harness_failure", + TelemetryGap: "telemetry_gap", +} as const; + 
+export type WeakPointIndicatorType = (typeof WeakPointIndicatorType)[keyof typeof WeakPointIndicatorType]; + +export type VisibilityLinks = { + traceUrl: string; + logsUrl: string; + metricsUrl: string; +}; + +export type WeakPointIndicator = { + indicatorType: WeakPointIndicatorType; + summary: string; + suggestedAction: string; +}; + +export type WorkflowVisibilityRecord = { + observationKind: WorkflowObservationKind; + health: VisibilityHealth; + stage: string; + occurredAt: string; + eventId: string; + eventType: AgenticEventType; + commandId: string; + correlationId: string; + causationId: string; + traceId: string; + idempotencyKey: string; + organizationId: string; + projectId: string; + initiativeId?: string; + teamId?: string; + workItemId: string; + agentId: string; + hatAssignmentId: string; + aggregateId: string; + aggregateType: AgenticAggregateType; + aggregateVersion: number; + links: VisibilityLinks; + weakPointIndicators: WeakPointIndicator[]; +}; + +export type BuildWorkflowVisibilityRecordInput = { + observationKind: WorkflowObservationKind; + health: VisibilityHealth; + stage: string; + links: VisibilityLinks; + weakPointIndicators?: WeakPointIndicator[]; +}; + +export function buildWorkflowVisibilityRecord( + envelope: AgenticEventEnvelope, + input: BuildWorkflowVisibilityRecordInput, +): WorkflowVisibilityRecord { + const record: WorkflowVisibilityRecord = { + observationKind: input.observationKind, + health: input.health, + stage: input.stage, + occurredAt: envelope.occurredAt, + eventId: envelope.eventId, + eventType: envelope.eventType, + commandId: envelope.trace.commandId, + correlationId: envelope.trace.correlationId, + causationId: envelope.trace.causationId, + traceId: envelope.trace.traceId, + idempotencyKey: envelope.trace.idempotencyKey, + organizationId: envelope.scope.organizationId, + projectId: envelope.scope.projectId, + workItemId: envelope.scope.workItemId, + agentId: envelope.actor.agentId, + hatAssignmentId: 
envelope.actor.hatAssignmentId, + aggregateId: envelope.aggregate.aggregateId, + aggregateType: envelope.aggregate.aggregateType, + aggregateVersion: envelope.aggregate.aggregateVersion, + links: input.links, + weakPointIndicators: input.weakPointIndicators ?? [], + }; + + if (envelope.scope.initiativeId !== undefined) { + record.initiativeId = envelope.scope.initiativeId; + } + + if (envelope.scope.teamId !== undefined) { + record.teamId = envelope.scope.teamId; + } + + return record; +} diff --git a/openspec/specs/agentic-organization/spec.md b/openspec/specs/agentic-organization/spec.md index 67d59f3d2e..61710ee2e7 100644 --- a/openspec/specs/agentic-organization/spec.md +++ b/openspec/specs/agentic-organization/spec.md @@ -125,6 +125,24 @@ for the full trace chain before live LGTM ingestion is wired. causation, trace, idempotency, actor, hat assignment, organization, project, work item, aggregate, and NATS destination fields +### Requirement: Workflow visibility records expose weak points + +Organization packages MUST project meaningful workflow movement into a +UI- and agent-readable visibility record. 
+ +#### Scenario: Event is projected to workflow visibility + +- **WHEN** an event envelope is projected into workflow visibility +- **THEN** the record includes observation kind, health state, workflow + stage, occurred-at timestamp, event, command, correlation, causation, + trace, idempotency, actor, hat assignment, organization, project, + work item, aggregate, and evidence-link fields +- **AND** the record can include typed weak-point indicators such as + blocked work, slow triage, repeated failure, missing evidence, missing + tool, policy denial, harness failure, and telemetry gap +- **AND** the weak-point indicators route follow-up work through normal + Organization commands and supervisor-chain communication + ### Requirement: Automation rules create plans before side effects V0 automation rules MUST produce explicit reaction plans instead of From 2c337345b2e6efa18ce693a4ab6454d4db0e6958 Mon Sep 17 00:00:00 2001 From: Max Chadaev Date: Mon, 25 May 2026 17:07:28 -0400 Subject: [PATCH 07/21] refactor(agentic-org): invert command state and handler composition Co-Authored-By: Codex --- .../docs/FIRST_IMPLEMENTATION_SLICE.md | 7 ++- .../docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md | 17 ++++-- agentic-organization/packages/README.md | 22 +++++--- .../src/command-handler-registry.ts | 37 +++++++++++++ .../application/src/command-pipeline.test.ts | 23 +++++--- .../application/src/command-pipeline.ts | 37 +++++++------ .../handlers/send-supervisor-signal.test.ts | 15 +++--- .../src/handlers/send-supervisor-signal.ts | 19 ++++--- .../packages/application/src/index.ts | 25 ++++++++- .../packages/application/src/ports.ts | 28 ++++++++++ .../state/src/in-memory-organization-store.ts | 52 ++++++++++++++++++- .../packages/state/src/index.ts | 6 ++- openspec/specs/agentic-organization/spec.md | 10 ++++ 13 files changed, 243 insertions(+), 55 deletions(-) create mode 100644 agentic-organization/packages/application/src/command-handler-registry.ts diff --git 
a/agentic-organization/docs/FIRST_IMPLEMENTATION_SLICE.md b/agentic-organization/docs/FIRST_IMPLEMENTATION_SLICE.md index 45a8623902..a434e2b30c 100644 --- a/agentic-organization/docs/FIRST_IMPLEMENTATION_SLICE.md +++ b/agentic-organization/docs/FIRST_IMPLEMENTATION_SLICE.md @@ -39,8 +39,8 @@ send_supervisor_signal | Package | Implemented first | | ---------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `@agentic-org/domain` | event envelope, command/event constants, aggregate constants, supervisor-chain communication types, hat communication briefs, work item state machine, shared records | -| `@agentic-org/application` | command pipeline, idempotency conflict handling, supervisor signal handler | -| `@agentic-org/state` | in-memory Organization store fake | +| `@agentic-org/application` | command pipeline, command-handler registry, state-store ports, idempotency conflict handling, supervisor signal handler | +| `@agentic-org/state` | in-memory Organization state-store factory fake | | `@agentic-org/messaging` | stable `agentic-org....` subject builder | | `@agentic-org/observability` | OpenTelemetry/LGTM span attribute projection | | `@agentic-org/runtime` | first rule that plans triage for the target supervisor when a chain signal is sent | @@ -91,6 +91,9 @@ Hermes runs, MCP calls, and UI evidence. - Hats can expose a communication brief that tells the wearer their duty, supervisor line, and efficient upward tools. +- The command pipeline receives state-store factories and command + handlers through ports instead of constructing in-memory adapters or + branching on command types. - Duplicate commands with the same idempotency key and request hash replay the stored result. 
- Duplicate commands with the same idempotency key and a different diff --git a/agentic-organization/docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md b/agentic-organization/docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md index a0f3c9ede1..f04b49c3d4 100644 --- a/agentic-organization/docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md +++ b/agentic-organization/docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md @@ -74,6 +74,7 @@ None of those cluster runtimes should become a parallel business model. Runtime host API controller / worker / MCP handler / Temporal activity / Dapr actor -> application command service + -> command handler registry -> policy check -> domain state transition -> CockroachDB transaction @@ -129,11 +130,11 @@ legal transitions. It does not execute side effects. ### Layer 1: Application and Policy -| Package | Owns | -| ---------------------------- | ---------------------------------------------------------------------------------------- | -| `@agentic-org/application` | command handlers, use cases, transaction orchestration, ports, command result contracts | -| `@agentic-org/policy` | RBAC, hat authority checks, OPA/Rego adapter boundary, policy decisions, denial reasons | -| `@agentic-org/observability` | correlation envelope, OpenTelemetry helpers, required span attributes, trace propagation | +| Package | Owns | +| ---------------------------- | --------------------------------------------------------------------------------------------------------- | +| `@agentic-org/application` | command handlers, handler registry, use cases, transaction orchestration, ports, command result contracts | +| `@agentic-org/policy` | RBAC, hat authority checks, OPA/Rego adapter boundary, policy decisions, denial reasons | +| `@agentic-org/observability` | correlation envelope, OpenTelemetry helpers, required span attributes, trace propagation | The application layer is the Organization OS command layer. It is where the runtime asks the Organization to do something. 
@@ -231,6 +232,12 @@ HatSystemPort -> KubernetesHatSystemAdapter or ReadOnlyFakeHatSystemAdapter Business services should depend on ports, not concrete adapters. +The command pipeline must also depend on a handler registry and a +state-store factory supplied by the composition layer. It must not +instantiate the in-memory store or branch on every command type. New +commands should register a handler; new persistence backends should +implement the same store-factory port. + ## SOLID Rules ### Single Responsibility diff --git a/agentic-organization/packages/README.md b/agentic-organization/packages/README.md index f900db7a48..4304dd936b 100644 --- a/agentic-organization/packages/README.md +++ b/agentic-organization/packages/README.md @@ -7,14 +7,14 @@ worker, Dapr actor host, or Kubernetes deployment is introduced. ## Package Boundary -| Package | Current responsibility | -| --------------- | ---------------------------------------------------------------------------------------------------------- | -| `domain` | typed command names, event names, aggregate names, work item state machine, event envelope, shared records | -| `application` | command pipeline, idempotency handling, first supervisor-chain signal command handler | -| `state` | in-memory Organization store used as the first repository port fake | -| `messaging` | NATS subject contract without a live NATS dependency | -| `observability` | LGTM/OpenTelemetry attribute projection from Agentic event envelopes | -| `runtime` | first event-to-automation reaction rule | +| Package | Current responsibility | +| --------------- | -------------------------------------------------------------------------------------------------------------------------- | +| `domain` | typed command names, event names, aggregate names, work item state machine, event envelope, shared records | +| `application` | command pipeline, handler registry, idempotency handling, state-store ports, first supervisor-chain signal command handler | 
+| `state` | in-memory Organization state-store factory used as the first repository port fake | +| `messaging` | NATS subject contract without a live NATS dependency | +| `observability` | LGTM/OpenTelemetry attribute projection from Agentic event envelopes | +| `runtime` | first event-to-automation reaction rule | ## Slice Rule @@ -35,6 +35,12 @@ and the hat-system CRDs come next as adapters behind these contracts. They should not redefine command names, event names, state names, correlation fields, or policy authority. +The application package must not construct concrete state adapters. +Runtime hosts and tests provide a `CommandStateStoreFactory`; the state +package implements the current in-memory factory. Command routing uses a +handler registry so new commands add handlers instead of editing a +central `switch` or `if` dispatcher. + ## Validation Run the package tests from `agentic-organization/`: diff --git a/agentic-organization/packages/application/src/command-handler-registry.ts b/agentic-organization/packages/application/src/command-handler-registry.ts new file mode 100644 index 0000000000..a06753b865 --- /dev/null +++ b/agentic-organization/packages/application/src/command-handler-registry.ts @@ -0,0 +1,37 @@ +import type { Clock, CommandStateStore, IdGenerator } from "./ports.ts"; + +export type TypedCommand = { + type: string; +}; + +export type CommandExecutionContext = Clock & + IdGenerator & { + store: CommandStateStore; + }; + +export type CommandHandler = { + commandType: Command["type"]; + execute: (command: Command, context: CommandExecutionContext) => Result; +}; + +export type CommandHandlerRegistry = { + resolveHandler: (commandType: Command["type"]) => CommandHandler | undefined; +}; + +export function createCommandHandlerRegistry( + handlers: readonly CommandHandler[], +): CommandHandlerRegistry { + const handlersByCommandType = new Map>(); + + for (const handler of handlers) { + if (handlersByCommandType.has(handler.commandType)) { + 
+      throw new Error(`duplicate command handler for ${handler.commandType}`);
+    }
+
+    handlersByCommandType.set(handler.commandType, handler);
+  }
+
+  return {
+    resolveHandler: (commandType) => handlersByCommandType.get(commandType),
+  };
+}
diff --git a/agentic-organization/packages/application/src/command-pipeline.test.ts b/agentic-organization/packages/application/src/command-pipeline.test.ts
index 3ea66d3772..60324dfee4 100644
--- a/agentic-organization/packages/application/src/command-pipeline.test.ts
+++ b/agentic-organization/packages/application/src/command-pipeline.test.ts
@@ -2,8 +2,11 @@ import { deepEqual, equal } from "node:assert/strict";
 import { describe, test } from "node:test";
 
 import { CommandType, SupervisorChainLevel, SupervisorSignalToolType } from "../../domain/src/index.ts";
-import { CommandErrorCode, CommandResultStatus } from "./command-result.ts";
+import { createInMemoryOrganizationStoreFactory } from "../../state/src/index.ts";
+import { createCommandHandlerRegistry } from "./command-handler-registry.ts";
+import { CommandErrorCode, CommandResultStatus, type CommandResult } from "./command-result.ts";
 import { createCommandPipeline, type PipelineCommand } from "./command-pipeline.ts";
+import { createSendSupervisorSignalHandler } from "./handlers/send-supervisor-signal.ts";
 
 const command: PipelineCommand = {
   commandId: "cmd-supervisor-signal-001",
@@ -31,7 +34,10 @@ const command: PipelineCommand = {
 
 describe("command pipeline idempotency", () => {
   test("replaying the same idempotency key returns the stored result", () => {
+    const stateStoreFactory = createInMemoryOrganizationStoreFactory();
     const pipeline = createCommandPipeline({
+      stateStoreFactory,
+      handlerRegistry: createCommandHandlerRegistry([createSendSupervisorSignalHandler()]),
       now: () => "2026-05-25T20:00:00.000Z",
       createId: (prefix) => `${prefix}-001`,
     });
@@ -47,14 +53,17 @@ describe("command pipeline idempotency", () => {
     equal(firstResult.supervisorSignal !== undefined, true);
     equal(replayResult.supervisorSignal !== undefined, true);
     equal(replayResult.supervisorSignal?.supervisorSignalId, firstResult.supervisorSignal?.supervisorSignalId);
-    equal(pipeline.store.supervisorSignals.length, 1);
-    equal(pipeline.store.workItems.length, 0);
-    equal(pipeline.store.auditEvents.length, 1);
-    equal(pipeline.store.outboxEvents.length, 1);
+    equal(stateStoreFactory.snapshot.supervisorSignals.length, 1);
+    equal(stateStoreFactory.snapshot.workItems.length, 0);
+    equal(stateStoreFactory.snapshot.auditEvents.length, 1);
+    equal(stateStoreFactory.snapshot.outboxEvents.length, 1);
   });
 
   test("rejects conflicting reuse of the same idempotency key", () => {
+    const stateStoreFactory = createInMemoryOrganizationStoreFactory();
     const pipeline = createCommandPipeline({
+      stateStoreFactory,
+      handlerRegistry: createCommandHandlerRegistry([createSendSupervisorSignalHandler()]),
       now: () => "2026-05-25T20:00:00.000Z",
       createId: (prefix) => `${prefix}-001`,
     });
@@ -69,7 +78,7 @@ describe("command pipeline idempotency", () => {
     equal(firstResult.status, CommandResultStatus.Accepted);
     equal(conflictResult.status, CommandResultStatus.Rejected);
     equal(conflictResult.error?.code, CommandErrorCode.IdempotencyConflict);
-    equal(pipeline.store.supervisorSignals.length, 1);
-    equal(pipeline.store.outboxEvents.length, 1);
+    equal(stateStoreFactory.snapshot.supervisorSignals.length, 1);
+    equal(stateStoreFactory.snapshot.outboxEvents.length, 1);
   });
 });
diff --git a/agentic-organization/packages/application/src/command-pipeline.ts b/agentic-organization/packages/application/src/command-pipeline.ts
index fb8afea849..298874c031 100644
--- a/agentic-organization/packages/application/src/command-pipeline.ts
+++ b/agentic-organization/packages/application/src/command-pipeline.ts
@@ -1,31 +1,34 @@
-import { CommandType } from "../../domain/src/index.ts";
-import { createInMemoryOrganizationStore, type InMemoryOrganizationStore } from "../../state/src/index.ts";
+import type { CommandHandlerRegistry } from "./command-handler-registry.ts";
 import { CommandErrorCode, CommandResultStatus, type CommandResult } from "./command-result.ts";
-import { sendSupervisorSignal, type SendSupervisorSignalCommand } from "./handlers/send-supervisor-signal.ts";
-import type { Clock, IdGenerator } from "./ports.ts";
+import type { SendSupervisorSignalCommand } from "./handlers/send-supervisor-signal.ts";
+import type { Clock, CommandStateStore, CommandStateStoreFactory, IdGenerator } from "./ports.ts";
 
 export type PipelineCommand = SendSupervisorSignalCommand;
 
 export type CommandPipeline = {
-  store: InMemoryOrganizationStore;
   execute: (command: PipelineCommand) => CommandResult;
 };
 
-export function createCommandPipeline(dependencies: Clock & IdGenerator): CommandPipeline {
-  const store = createInMemoryOrganizationStore();
+export type CommandPipelineDependencies = Clock &
+  IdGenerator & {
+    stateStoreFactory: CommandStateStoreFactory;
+    handlerRegistry: CommandHandlerRegistry;
+  };
+
+export function createCommandPipeline(dependencies: CommandPipelineDependencies): CommandPipeline {
+  const store = dependencies.stateStoreFactory.createCommandStateStore();
 
   return {
-    store,
     execute: (command) => executeCommand(command, store, dependencies),
   };
 }
 
 function executeCommand(
   command: PipelineCommand,
-  store: InMemoryOrganizationStore,
-  dependencies: Clock & IdGenerator,
+  store: CommandStateStore,
+  dependencies: CommandPipelineDependencies,
 ): CommandResult {
-  const existingRecord = store.idempotencyRecords.get(command.idempotencyKey);
+  const existingRecord = store.findIdempotencyRecord(command.idempotencyKey);
 
   if (existingRecord?.requestHash === command.requestHash) {
     return {
@@ -50,7 +53,7 @@ function executeCommand(
   }
 
   const result = dispatchCommand(command, store, dependencies);
-  store.idempotencyRecords.set(command.idempotencyKey, {
+  store.saveIdempotencyRecord({
     idempotencyKey: command.idempotencyKey,
     requestHash: command.requestHash,
     result,
@@ -61,11 +64,13 @@ function executeCommand(
 
 function dispatchCommand(
   command: PipelineCommand,
-  store: InMemoryOrganizationStore,
-  dependencies: Clock & IdGenerator,
+  store: CommandStateStore,
+  dependencies: CommandPipelineDependencies,
 ): CommandResult {
-  if (command.type === CommandType.SendSupervisorSignal) {
-    return sendSupervisorSignal(command, {
+  const handler = dependencies.handlerRegistry.resolveHandler(command.type);
+
+  if (handler !== undefined) {
+    return handler.execute(command, {
       ...dependencies,
       store,
     });
diff --git a/agentic-organization/packages/application/src/handlers/send-supervisor-signal.test.ts b/agentic-organization/packages/application/src/handlers/send-supervisor-signal.test.ts
index 2c28dfeafd..4b124147d9 100644
--- a/agentic-organization/packages/application/src/handlers/send-supervisor-signal.test.ts
+++ b/agentic-organization/packages/application/src/handlers/send-supervisor-signal.test.ts
@@ -9,7 +9,7 @@ import {
   SupervisorSignalStatus,
   SupervisorSignalToolType,
 } from "../../../domain/src/index.ts";
-import { createInMemoryOrganizationStore } from "../../../state/src/index.ts";
+import { createInMemoryOrganizationStoreFactory } from "../../../state/src/index.ts";
 import { CommandResultStatus, type CommandResult } from "../command-result.ts";
 import { sendSupervisorSignal, type SendSupervisorSignalCommand } from "./send-supervisor-signal.ts";
 
@@ -39,7 +39,8 @@ const command: SendSupervisorSignalCommand = {
 
 describe("send supervisor signal handler", () => {
   test("persists chain communication, audit event, and outbox event atomically", () => {
-    const store = createInMemoryOrganizationStore();
+    const stateStoreFactory = createInMemoryOrganizationStoreFactory();
+    const store = stateStoreFactory.createCommandStateStore();
 
     const result = sendSupervisorSignal(command, {
       store,
@@ -50,11 +51,11 @@ describe("send supervisor signal handler", () => {
     equal(result.status, CommandResultStatus.Accepted);
     ok(result.supervisorSignal);
     equal(result.supervisorSignal.status, SupervisorSignalStatus.Sent);
-    equal(store.supervisorSignals.length, 1);
-    equal(store.workItems.length, 0);
-    equal(store.auditEvents.length, 1);
-    equal(store.outboxEvents.length, 1);
-    deepEqual(store.outboxEvents[0]?.envelope, {
+    equal(stateStoreFactory.snapshot.supervisorSignals.length, 1);
+    equal(stateStoreFactory.snapshot.workItems.length, 0);
+    equal(stateStoreFactory.snapshot.auditEvents.length, 1);
+    equal(stateStoreFactory.snapshot.outboxEvents.length, 1);
+    deepEqual(stateStoreFactory.snapshot.outboxEvents[0]?.envelope, {
       eventId: "evt-001",
       eventType: AgenticEventType.SupervisorSignalSent,
       schemaVersion: "agentic.org.event.v1",
diff --git a/agentic-organization/packages/application/src/handlers/send-supervisor-signal.ts b/agentic-organization/packages/application/src/handlers/send-supervisor-signal.ts
index 85e207077e..89401e24ca 100644
--- a/agentic-organization/packages/application/src/handlers/send-supervisor-signal.ts
+++ b/agentic-organization/packages/application/src/handlers/send-supervisor-signal.ts
@@ -11,9 +11,9 @@ import {
   type SupervisorSignalToolType,
 } from "../../../domain/src/index.ts";
 import { createAgenticEventEnvelope } from "../../../domain/src/index.ts";
-import type { InMemoryOrganizationStore } from "../../../state/src/index.ts";
+import type { CommandHandler } from "../command-handler-registry.ts";
 import { CommandResultStatus, type CommandResult } from "../command-result.ts";
-import type { Clock, IdGenerator } from "../ports.ts";
+import type { Clock, CommandStateStore, IdGenerator } from "../ports.ts";
 
 export const IdPrefix = {
   SupervisorSignal: "supervisor-signal",
@@ -47,9 +47,16 @@ export type SendSupervisorSignalCommand = {
 
 export type SendSupervisorSignalDependencies = Clock &
   IdGenerator & {
-    store: InMemoryOrganizationStore;
+    store: CommandStateStore;
   };
 
+export function createSendSupervisorSignalHandler(): CommandHandler {
+  return {
+    commandType: CommandType.SendSupervisorSignal,
+    execute: sendSupervisorSignal,
+  };
+}
+
 export function sendSupervisorSignal(
   command: SendSupervisorSignalCommand,
   dependencies: SendSupervisorSignalDependencies,
@@ -116,9 +123,9 @@ export function sendSupervisorSignal(
     }),
   };
 
-  dependencies.store.supervisorSignals.push(supervisorSignal);
-  dependencies.store.auditEvents.push(auditEvent);
-  dependencies.store.outboxEvents.push(outboxEvent);
+  dependencies.store.appendSupervisorSignal(supervisorSignal);
+  dependencies.store.appendAuditEvent(auditEvent);
+  dependencies.store.appendOutboxEvent(outboxEvent);
 
   return {
     status: CommandResultStatus.Accepted,
diff --git a/agentic-organization/packages/application/src/index.ts b/agentic-organization/packages/application/src/index.ts
index b02bc74402..52603c3c9e 100644
--- a/agentic-organization/packages/application/src/index.ts
+++ b/agentic-organization/packages/application/src/index.ts
@@ -1,8 +1,31 @@
-export { createCommandPipeline, type CommandPipeline, type PipelineCommand } from "./command-pipeline.ts";
+export {
+  createCommandHandlerRegistry,
+  type CommandExecutionContext,
+  type CommandHandler,
+  type CommandHandlerRegistry,
+  type TypedCommand,
+} from "./command-handler-registry.ts";
+export {
+  createCommandPipeline,
+  type CommandPipeline,
+  type CommandPipelineDependencies,
+  type PipelineCommand,
+} from "./command-pipeline.ts";
 export { CommandErrorCode, CommandResultStatus, type CommandResult } from "./command-result.ts";
 export {
+  createSendSupervisorSignalHandler,
   sendSupervisorSignal,
   type IdPrefix,
   type SendSupervisorSignalCommand,
   type SendSupervisorSignalDependencies,
 } from "./handlers/send-supervisor-signal.ts";
+export type {
+  AuditEventStore,
+  Clock,
+  CommandStateStore,
+  CommandStateStoreFactory,
+  IdempotencyRecordStore,
+  IdGenerator,
+  OutboxEventStore,
+  SupervisorSignalStore,
+} from "./ports.ts";
diff --git a/agentic-organization/packages/application/src/ports.ts b/agentic-organization/packages/application/src/ports.ts
index ba623a3ec8..a7d71c8b6a 100644
--- a/agentic-organization/packages/application/src/ports.ts
+++ b/agentic-organization/packages/application/src/ports.ts
@@ -1,3 +1,5 @@
+import type { AuditEvent, IdempotencyRecord, OutboxEvent, SupervisorSignal } from "../../domain/src/index.ts";
+
 export type Clock = {
   now: () => string;
 };
@@ -5,3 +7,29 @@ export type Clock = {
 export type IdGenerator = {
   createId: (prefix: string) => string;
 };
+
+export type IdempotencyRecordStore = {
+  findIdempotencyRecord: (idempotencyKey: string) => IdempotencyRecord | undefined;
+  saveIdempotencyRecord: (record: IdempotencyRecord) => void;
+};
+
+export type SupervisorSignalStore = {
+  appendSupervisorSignal: (supervisorSignal: SupervisorSignal) => void;
+};
+
+export type AuditEventStore = {
+  appendAuditEvent: (auditEvent: AuditEvent) => void;
+};
+
+export type OutboxEventStore = {
+  appendOutboxEvent: (outboxEvent: OutboxEvent) => void;
+};
+
+export type CommandStateStore = IdempotencyRecordStore &
+  SupervisorSignalStore &
+  AuditEventStore &
+  OutboxEventStore;
+
+export type CommandStateStoreFactory = {
+  createCommandStateStore: () => CommandStateStore;
+};
diff --git a/agentic-organization/packages/state/src/in-memory-organization-store.ts b/agentic-organization/packages/state/src/in-memory-organization-store.ts
index a601f9526d..5155ea72f7 100644
--- a/agentic-organization/packages/state/src/in-memory-organization-store.ts
+++ b/agentic-organization/packages/state/src/in-memory-organization-store.ts
@@ -1,3 +1,4 @@
+import type { CommandStateStore, CommandStateStoreFactory } from "../../application/src/ports.ts";
 import type {
   AuditEvent,
   DiscussionAnchor,
@@ -7,7 +8,34 @@ import type {
   WorkItem,
 } from "../../domain/src/index.ts";
 
-export type InMemoryOrganizationStore = {
+export type InMemoryOrganizationStoreSnapshot = {
+  readonly workItems: readonly WorkItem[];
+  readonly supervisorSignals: readonly SupervisorSignal[];
+  readonly discussionAnchors: readonly DiscussionAnchor[];
+  readonly auditEvents: readonly AuditEvent[];
+  readonly outboxEvents: readonly OutboxEvent[];
+  readonly idempotencyRecords: ReadonlyMap>;
+};
+
+export type InMemoryOrganizationStoreFactory = CommandStateStoreFactory & {
+  readonly snapshot: InMemoryOrganizationStoreSnapshot;
+};
+
+export function createInMemoryOrganizationStoreFactory(): InMemoryOrganizationStoreFactory {
+  let currentSnapshot = createEmptySnapshot();
+
+  return {
+    get snapshot() {
+      return currentSnapshot;
+    },
+    createCommandStateStore: () => {
+      currentSnapshot = createEmptySnapshot();
+      return createCommandStateStore(currentSnapshot);
+    },
+  };
+}
+
+type MutableInMemoryOrganizationStoreSnapshot = {
   workItems: WorkItem[];
   supervisorSignals: SupervisorSignal[];
   discussionAnchors: DiscussionAnchor[];
@@ -16,7 +44,7 @@ export type InMemoryOrganizationStore = {
   idempotencyRecords: Map>;
 };
 
-export function createInMemoryOrganizationStore(): InMemoryOrganizationStore {
+function createEmptySnapshot(): MutableInMemoryOrganizationStoreSnapshot {
   return {
     workItems: [],
     supervisorSignals: [],
@@ -26,3 +54,23 @@ export function createInMemoryOrganizationStore(): InMemoryOrg
     idempotencyRecords: new Map>(),
   };
 }
+
+function createCommandStateStore(
+  snapshot: MutableInMemoryOrganizationStoreSnapshot,
+): CommandStateStore {
+  return {
+    findIdempotencyRecord: (idempotencyKey) => snapshot.idempotencyRecords.get(idempotencyKey),
+    saveIdempotencyRecord: (record) => {
+      snapshot.idempotencyRecords.set(record.idempotencyKey, record);
+    },
+    appendSupervisorSignal: (supervisorSignal) => {
+      snapshot.supervisorSignals.push(supervisorSignal);
+    },
+    appendAuditEvent: (auditEvent) => {
+      snapshot.auditEvents.push(auditEvent);
+    },
+    appendOutboxEvent: (outboxEvent) => {
+      snapshot.outboxEvents.push(outboxEvent);
+    },
+  };
+}
diff --git a/agentic-organization/packages/state/src/index.ts b/agentic-organization/packages/state/src/index.ts
index 02fb4997ef..8ab8d1eff9 100644
--- a/agentic-organization/packages/state/src/index.ts
+++ b/agentic-organization/packages/state/src/index.ts
@@ -1 +1,5 @@
-export { createInMemoryOrganizationStore, type InMemoryOrganizationStore } from "./in-memory-organization-store.ts";
+export {
+  createInMemoryOrganizationStoreFactory,
+  type InMemoryOrganizationStoreFactory,
+  type InMemoryOrganizationStoreSnapshot,
+} from "./in-memory-organization-store.ts";
diff --git a/openspec/specs/agentic-organization/spec.md b/openspec/specs/agentic-organization/spec.md
index 61710ee2e7..409129c64a 100644
--- a/openspec/specs/agentic-organization/spec.md
+++ b/openspec/specs/agentic-organization/spec.md
@@ -37,6 +37,16 @@ Organization state only by calling Organization commands.
   and idempotency records together
 - **AND** the adapter does not mutate authoritative state directly
 
+#### Scenario: Command pipeline is composed from ports
+
+- **WHEN** a runtime host creates a command pipeline
+- **THEN** it supplies a command-state-store factory and command-handler
+  registry through ports
+- **AND** the pipeline does not construct a concrete in-memory store
+  directly
+- **AND** the pipeline does not use a central command-type switch for
+  extensible command dispatch
+
 ### Requirement: Commands are idempotent
 
 Organization commands MUST use deterministic idempotency keys at the
From fa366b2a2990388e239d6402f031c7d68bb57ab1 Mon Sep 17 00:00:00 2001
From: Max Chadaev
Date: Mon, 25 May 2026 17:47:09 -0400
Subject: [PATCH 08/21] feat(agentic-org): add Cockroach state adapter foundation

Co-Authored-By: Codex
---
 .../docs/FIRST_IMPLEMENTATION_SLICE.md | 33 ++--
 .../docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md | 173 ++++++++--------
 agentic-organization/packages/README.md | 22 ++-
 .../src/command-handler-registry.ts | 2 +-
 .../application/src/command-pipeline.test.ts | 12 +-
 .../application/src/command-pipeline.ts | 18 +-
 .../handlers/send-supervisor-signal.test.ts | 4 +-
 .../src/handlers/send-supervisor-signal.ts | 10 +-
 .../packages/application/src/ports.ts | 10 +-
 .../packages/governance/src/index.ts | 7 +
 .../src/package-dependency-boundaries.test.ts | 33 ++++
 .../src/package-dependency-boundaries.ts | 118 +++++++++++
 .../0001_agentic_org_core_state.sql | 57 ++++++
 .../src/cockroach-command-state-store.test.ts | 137 +++++++++++++
 .../src/cockroach-command-state-store.ts | 186 ++++++++++++++++++
 .../src/cockroach-schema.test.ts | 25 +++
 .../state-cockroach/src/cockroach-schema.ts | 107 ++++++++++
 .../packages/state-cockroach/src/index.ts | 14 ++
 .../state/src/in-memory-organization-store.ts | 10 +-
 agentic-organization/packages/test-node.d.ts | 23 ++-
 openspec/specs/agentic-organization/spec.md | 19 ++
 21 files changed, 883 insertions(+), 137 deletions(-)
 create mode 100644 agentic-organization/packages/governance/src/index.ts
 create mode 100644 agentic-organization/packages/governance/src/package-dependency-boundaries.test.ts
 create mode 100644 agentic-organization/packages/governance/src/package-dependency-boundaries.ts
 create mode 100644 agentic-organization/packages/state-cockroach/migrations/0001_agentic_org_core_state.sql
 create mode 100644 agentic-organization/packages/state-cockroach/src/cockroach-command-state-store.test.ts
 create mode 100644 agentic-organization/packages/state-cockroach/src/cockroach-command-state-store.ts
 create mode 100644 agentic-organization/packages/state-cockroach/src/cockroach-schema.test.ts
 create mode 100644 agentic-organization/packages/state-cockroach/src/cockroach-schema.ts
 create mode 100644 agentic-organization/packages/state-cockroach/src/index.ts

diff --git a/agentic-organization/docs/FIRST_IMPLEMENTATION_SLICE.md b/agentic-organization/docs/FIRST_IMPLEMENTATION_SLICE.md
index a434e2b30c..69ae71783e 100644
--- a/agentic-organization/docs/FIRST_IMPLEMENTATION_SLICE.md
+++ b/agentic-organization/docs/FIRST_IMPLEMENTATION_SLICE.md
@@ -36,14 +36,16 @@ send_supervisor_signal
 
 ## Packages
 
-| Package | Implemented first |
-| ---------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `@agentic-org/domain` | event envelope, command/event constants, aggregate constants, supervisor-chain communication types, hat communication briefs, work item state machine, shared records |
-| `@agentic-org/application` | command pipeline, command-handler registry, state-store ports, idempotency conflict handling, supervisor signal handler |
-| `@agentic-org/state` | in-memory Organization state-store factory fake |
-| `@agentic-org/messaging` | stable `agentic-org....` subject builder |
-| `@agentic-org/observability` | OpenTelemetry/LGTM span attribute projection |
-| `@agentic-org/runtime` | first rule that plans triage for the target supervisor when a chain signal is sent |
+| Package | Implemented first |
+| ------------------------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `@agentic-org/domain` | event envelope, command/event constants, aggregate constants, supervisor-chain communication types, hat communication briefs, work item state machine, shared records |
+| `@agentic-org/application` | command pipeline, command-handler registry, state-store ports, idempotency conflict handling, supervisor signal handler |
+| `@agentic-org/state` | in-memory Organization state-store factory fake |
+| `@agentic-org/state-cockroach` | CockroachDB state-store factory contract, SQL statement catalog, and first core-state migration skeleton |
+| `@agentic-org/messaging` | stable `agentic-org....` subject builder |
+| `@agentic-org/observability` | OpenTelemetry/LGTM span attribute projection |
+| `@agentic-org/runtime` | first rule that plans triage for the target supervisor when a chain signal is sent |
+| `@agentic-org/governance` | package dependency-boundary checks that prevent application code from importing concrete state/runtime adapters |
 
 ## NodeNext Runtime Decision
 
@@ -94,6 +96,12 @@ Hermes runs, MCP calls, and UI evidence.
 - The command pipeline receives state-store factories and command
   handlers through ports instead of constructing in-memory adapters or
   branching on command types.
+- State-store ports are async from the beginning so CockroachDB,
+  NATS-backed workers, and other real adapters do not inherit a fake
+  synchronous shape.
+- A governance test enforces that application code does not import the
+  state adapter, Cockroach adapter, NestJS, NATS, Dapr, Temporal,
+  Drizzle, or Postgres clients.
 - Duplicate commands with the same idempotency key and request hash
   replay the stored result.
 - Duplicate commands with the same idempotency key and a different
@@ -105,10 +113,11 @@ Hermes runs, MCP calls, and UI evidence.
 
 ## Next Slice
 
-The next slice should add a CockroachDB-backed state adapter and
-transactional outbox while preserving this public package contract.
-After that, the NATS publisher worker can publish persisted outbox rows
-to JetStream and attach the same telemetry attributes.
+The next slice should turn the CockroachDB adapter contract into a
+transactional integration test once a local/dev Cockroach connection is
+available, then add the NATS outbox publisher worker. The worker can
+publish persisted outbox rows to JetStream and attach the same telemetry
+attributes.
 
 Do not make the next slice a pile of bespoke request commands.
 Build the generic supervisor triage lifecycle first, then let specialized
Build the generic supervisor triage lifecycle first, then let specialized diff --git a/agentic-organization/docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md b/agentic-organization/docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md index f5887fbdd1..2479f6bc94 100644 --- a/agentic-organization/docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md +++ b/agentic-organization/docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md @@ -120,40 +120,40 @@ Rules: ### Layer 0: Domain Kernel -| Package | Owns | -|---|---| -| `@agentic-org/domain` | entity IDs, value objects, typed enums, state machines, domain events, command names, event names, aggregate contracts | -| `@agentic-org/contracts` | shared DTOs, public schemas, versioned API/event contracts, generated clients when needed | +| Package | Owns | +| ------------------------ | ---------------------------------------------------------------------------------------------------------------------- | +| `@agentic-org/domain` | entity IDs, value objects, typed enums, state machines, domain events, command names, event names, aggregate contracts | +| `@agentic-org/contracts` | shared DTOs, public schemas, versioned API/event contracts, generated clients when needed | The domain kernel should be small and strict. It defines language and legal transitions. It does not execute side effects. 
 
 ### Layer 1: Application and Policy
 
-| Package | Owns |
-|---|---|
-| `@agentic-org/application` | command handlers, handler registry, use cases, transaction orchestration, ports, command result contracts |
-| `@agentic-org/policy` | RBAC, hat authority checks, OPA/Rego adapter boundary, policy decisions, denial reasons |
-| `@agentic-org/observability` | correlation envelope, OpenTelemetry helpers, required span attributes, trace propagation |
+| Package | Owns |
+| ---------------------------- | --------------------------------------------------------------------------------------------------------- |
+| `@agentic-org/application` | command handlers, handler registry, use cases, transaction orchestration, ports, command result contracts |
+| `@agentic-org/policy` | RBAC, hat authority checks, OPA/Rego adapter boundary, policy decisions, denial reasons |
+| `@agentic-org/observability` | correlation envelope, OpenTelemetry helpers, required span attributes, trace propagation |
 
 The application layer is the Organization OS command layer. It is where
 the runtime asks the Organization to do something.
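A self-contained sketch of the command-layer pattern this patch introduces: a handler registry that rejects duplicate command types, and a pipeline that replays the stored result when an idempotency key repeats with the same request hash, rejects the same key with a different hash, and otherwise dispatches through the registry. It follows the async direction stated for state-store ports. The type shapes, the `"idempotency_conflict"` and `"unknown_command"` codes, the `send_supervisor_signal` demo handler, and the `Map`-backed store are assumptions for illustration; they are not the package's exported API.

```typescript
// Hypothetical command shape: a type, an idempotency key, and a hash of the request payload.
type TypedCommand = { type: string; idempotencyKey: string; requestHash: string };

type CommandResult =
  | { status: "accepted"; value: unknown }
  | { status: "rejected"; errorCode: string };

type CommandHandler<C extends TypedCommand> = {
  commandType: C["type"];
  execute: (command: C) => Promise<CommandResult>;
};

type CommandHandlerRegistry<C extends TypedCommand> = {
  resolveHandler: (commandType: string) => CommandHandler<C> | undefined;
};

// New commands register a handler instead of extending a central switch.
function createCommandHandlerRegistry<C extends TypedCommand>(
  handlers: readonly CommandHandler<C>[],
): CommandHandlerRegistry<C> {
  const handlersByCommandType = new Map<string, CommandHandler<C>>();
  for (const handler of handlers) {
    if (handlersByCommandType.has(handler.commandType)) {
      throw new Error(`duplicate command handler for ${handler.commandType}`);
    }
    handlersByCommandType.set(handler.commandType, handler);
  }
  return { resolveHandler: (commandType) => handlersByCommandType.get(commandType) };
}

type IdempotencyRecord = { idempotencyKey: string; requestHash: string; result: CommandResult };

// Async port: the in-memory fake resolves immediately; a database adapter could do real I/O.
type IdempotencyStore = {
  find: (idempotencyKey: string) => Promise<IdempotencyRecord | undefined>;
  save: (record: IdempotencyRecord) => Promise<void>;
};

function createInMemoryIdempotencyStore(): IdempotencyStore {
  const records = new Map<string, IdempotencyRecord>();
  return {
    find: async (idempotencyKey) => records.get(idempotencyKey),
    save: async (record) => {
      records.set(record.idempotencyKey, record);
    },
  };
}

function createCommandPipeline<C extends TypedCommand>(
  registry: CommandHandlerRegistry<C>,
  store: IdempotencyStore,
) {
  return {
    execute: async (command: C): Promise<CommandResult> => {
      const existing = await store.find(command.idempotencyKey);
      if (existing !== undefined) {
        if (existing.requestHash === command.requestHash) {
          return existing.result; // same key, same request: replay the stored result
        }
        return { status: "rejected", errorCode: "idempotency_conflict" };
      }
      const handler = registry.resolveHandler(command.type);
      const result: CommandResult =
        handler === undefined
          ? { status: "rejected", errorCode: "unknown_command" }
          : await handler.execute(command);
      await store.save({ idempotencyKey: command.idempotencyKey, requestHash: command.requestHash, result });
      return result;
    },
  };
}

// Usage with a hypothetical handler that counts how often it actually runs.
type DemoCommand = TypedCommand & { type: "send_supervisor_signal" };
let executions = 0;
const pipeline = createCommandPipeline<DemoCommand>(
  createCommandHandlerRegistry<DemoCommand>([
    {
      commandType: "send_supervisor_signal",
      execute: async () => {
        executions += 1;
        return { status: "accepted", value: { signalNumber: executions } };
      },
    },
  ]),
  createInMemoryIdempotencyStore(),
);
```

Executing the same command twice runs the handler once and returns the stored result on the second call; reusing the key with a different `requestHash` is rejected without running the handler again.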
 
 ### Layer 2: Capability Packages
 
-| Package | Owns |
-|---|---|
-| `@agentic-org/work-os` | projects, initiatives, work items, dependencies, blockers, assignments, releases, work signals |
-| `@agentic-org/requirements` | ambiguous requirement intake, clarification, BRD lifecycle, maturity state |
-| `@agentic-org/documents` | BRDs, CAs, ADRs, design docs, reports, document scope, document approval state |
-| `@agentic-org/gates` | readiness, code, QA, security, architecture, memory, release, and outcome gates |
-| `@agentic-org/hats` | hat graph, supply, assignment, JWT issuance/refresh/revocation, succession, cooldown, warmup |
-| `@agentic-org/assignments` | staffing, agent-to-hat fit, work assignment, reassignment, capacity checks |
-| `@agentic-org/prompt-flows` | deterministic prompt-flow definitions, phases, phase gates, reusable procedures |
-| `@agentic-org/action-grammar` | universal action grammar, reversibility, observation contracts, action-mode classification |
-| `@agentic-org/knowledge-graph` | graph nodes, edges, context packs, retrieval envelopes, provenance and access envelopes |
-| `@agentic-org/runtime` | triggers, rules, reaction plans, leases, schedulers, reconcilers, self-healing loops |
-| `@agentic-org/ui-projections` | read models for boards, timelines, run views, evidence, reviews, observability, org map |
+| Package | Owns |
+| ------------------------------ | ---------------------------------------------------------------------------------------------- |
+| `@agentic-org/work-os` | projects, initiatives, work items, dependencies, blockers, assignments, releases, work signals |
+| `@agentic-org/requirements` | ambiguous requirement intake, clarification, BRD lifecycle, maturity state |
+| `@agentic-org/documents` | BRDs, CAs, ADRs, design docs, reports, document scope, document approval state |
+| `@agentic-org/gates` | readiness, code, QA, security, architecture, memory, release, and outcome gates |
+| `@agentic-org/hats` | hat graph, supply, assignment, JWT issuance/refresh/revocation, succession, cooldown, warmup |
+| `@agentic-org/assignments` | staffing, agent-to-hat fit, work assignment, reassignment, capacity checks |
+| `@agentic-org/prompt-flows` | deterministic prompt-flow definitions, phases, phase gates, reusable procedures |
+| `@agentic-org/action-grammar` | universal action grammar, reversibility, observation contracts, action-mode classification |
+| `@agentic-org/knowledge-graph` | graph nodes, edges, context packs, retrieval envelopes, provenance and access envelopes |
+| `@agentic-org/runtime` | triggers, rules, reaction plans, leases, schedulers, reconcilers, self-healing loops |
+| `@agentic-org/ui-projections` | read models for boards, timelines, run views, evidence, reviews, observability, org map |
 
 Capability packages should be independently testable. They can expose
 interfaces and services, but they should not know which process is
@@ -161,19 +161,20 @@ calling them.
 
 ### Layer 3: State, Messaging, and Runtime Adapters
 
-| Package | Owns |
-|---|---|
-| `@agentic-org/state` | Drizzle schema, migrations, repositories, transactions, outbox, inbox, idempotency, leases |
-| `@agentic-org/messaging` | NATS envelope builder, subject builder, JetStream publisher, consumer, DLQ, replay contracts |
-| `@agentic-org/workflows-temporal` | Temporal workflow and activity contracts, task queues, workflow clients |
-| `@agentic-org/actors-dapr` | Dapr actor interfaces, actor implementations, reminders, actor state projection |
-| `@agentic-org/mcp` | MCP schemas, tool registry, preflight checks, policy-checked tool handlers |
-| `@agentic-org/hermes` | Hermes session adapter, run adapter, callback contract, run context builder |
-| `@agentic-org/memory` | Hindsight adapter, hat-scoped recall/retain/reflect, memory attribution, memory health |
-| `@agentic-org/k8s-hats` | generated or checked Hat, HatBinding, HatSwap, HatPolicy types, informers, projection decoding |
-| `@agentic-org/openziti` | OpenZiti transport adapter, identity/config access, connectivity checks |
-| `@agentic-org/credential-proxy` | credential request adapter, scoped credential use, audit hooks |
-| `@agentic-org/adapters-agentic-services` | temporary wrappers around reused `agentic-services` primitives |
+| Package | Owns |
+| ---------------------------------------- | ----------------------------------------------------------------------------------------------- |
+| `@agentic-org/state` | Drizzle schema, migrations, repositories, transactions, outbox, inbox, idempotency, leases |
+| `@agentic-org/state-cockroach` | CockroachDB implementation of state-store ports, SQL statement catalog, and migration contracts |
+| `@agentic-org/messaging` | NATS envelope builder, subject builder, JetStream publisher, consumer, DLQ, replay contracts |
+| `@agentic-org/workflows-temporal` | Temporal workflow and activity contracts, task queues, workflow clients |
+| `@agentic-org/actors-dapr` | Dapr actor interfaces, actor implementations, reminders, actor state projection |
+| `@agentic-org/mcp` | MCP schemas, tool registry, preflight checks, policy-checked tool handlers |
+| `@agentic-org/hermes` | Hermes session adapter, run adapter, callback contract, run context builder |
+| `@agentic-org/memory` | Hindsight adapter, hat-scoped recall/retain/reflect, memory attribution, memory health |
+| `@agentic-org/k8s-hats` | generated or checked Hat, HatBinding, HatSwap, HatPolicy types, informers, projection decoding |
+| `@agentic-org/openziti` | OpenZiti transport adapter, identity/config access, connectivity checks |
+| `@agentic-org/credential-proxy` | credential request adapter, scoped credential use, audit hooks |
+| `@agentic-org/adapters-agentic-services` | temporary wrappers around reused `agentic-services` primitives |
 
 Adapters are replaceable. The Organization should be able to run a V0
 slice with in-process fakes, then swap in Temporal, Dapr, Hermes,
@@ -181,14 +182,14 @@ Hindsight, Kubernetes, and NATS adapters behind the same ports.
 
 ### Layer 4: Runtime Hosts
 
-| Runtime host | Responsibility |
-|---|---|
-| `apps/api` | REST/OpenAPI, internal APIs, command dispatch, read queries, auth guards |
-| `apps/web` | human operations console, boards, timelines, org map, observability, review center |
-| `apps/workers` | outbox publisher, schedulers, NATS consumers, reconcilers, projection builders |
-| `apps/temporal-worker` | Temporal workers and activities that call Organization commands |
-| `apps/dapr-actors` | Dapr actor host for hot state and reminders |
-| `apps/mcp-gateway` | MCP gateway, agent context resolution, preflight checks, tool execution |
+| Runtime host | Responsibility |
+| ---------------------- | ---------------------------------------------------------------------------------- |
+| `apps/api` | REST/OpenAPI, internal APIs, command dispatch, read queries, auth guards |
+| `apps/web` | human operations console, boards, timelines, org map, observability, review center |
+| `apps/workers` | outbox publisher, schedulers, NATS consumers, reconcilers, projection builders |
+| `apps/temporal-worker` | Temporal workers and activities that call Organization commands |
+| `apps/dapr-actors` | Dapr actor host for hot state and reminders |
+| `apps/mcp-gateway` | MCP gateway, agent context resolution, preflight checks, tool execution |
 
 Runtime hosts are allowed to be deployed separately. They are not
 separate business services yet.
@@ -238,6 +239,11 @@ instantiate the in-memory store or branch on every command type.
 New commands should register a handler; new persistence backends
 should implement the same store-factory port.
 
+State-store ports are async at the application boundary. In-memory
+adapters may resolve immediately, but CockroachDB, transactional
+outbox, inbox, and lease adapters must be able to perform real I/O
+without changing command-handler contracts.
+
 ## SOLID Rules
 
 ### Single Responsibility
@@ -364,18 +370,18 @@ human or agent actions.
 
 Minimum event automations:
 
-| Event | Rule result | Follow-up command examples |
-|---|---|---|
-| `work_item.ready` | work needs execution or review assignment | `reserve_hat`, `assign_work`, `start_schedule_block` |
-| `work_item.review_requested` | reviewer hat must be staffed | `reserve_hat`, `request_gate_review`, `send_inbox_signal` |
-| `gate.code.approved` | work can move to QA if QA is required | `create_qa_work_item`, `reserve_hat`, `request_gate_review` |
-| `gate.qa.approved` | work can move toward delivery/release | `create_release_task`, `request_delivery_review` |
-| `gate.changes_requested` | implementer needs a bounded rework loop | `assign_rework`, `start_prompt_flow`, `send_inbox_signal` |
-| `work_item.blocked` | blocker owner and escalation path required | `create_blocker`, `notify_manager`, `schedule_blocker_review` |
-| `hermes_run.heartbeat_late` | runtime health needs reconciliation | `create_platform_incident`, `reconcile_run`, `notify_platform_operator` |
-| `memory.gap_detected` | memory/process improvement enters backlog | `send_supervisor_signal`, `request_memory_review` |
-| `credential_request.submitted` | security review is mandatory | `request_security_gate`, `send_inbox_signal` |
-| `release.ready` | delivery gate and evidence check required | `request_delivery_review`, `verify_release_evidence` |
+| Event | Rule result | Follow-up command examples |
+| ------------------------------ | ------------------------------------------ | ----------------------------------------------------------------------- |
+| `work_item.ready` | work needs execution or review assignment | `reserve_hat`, `assign_work`, `start_schedule_block` |
+| `work_item.review_requested` | reviewer hat must be staffed | `reserve_hat`, `request_gate_review`, `send_inbox_signal` |
+| `gate.code.approved` | work can move to QA if QA is required | `create_qa_work_item`, `reserve_hat`, `request_gate_review` |
+| `gate.qa.approved` | work can move toward delivery/release | `create_release_task`, `request_delivery_review` |
+| `gate.changes_requested` | implementer needs a bounded rework loop | `assign_rework`, `start_prompt_flow`, `send_inbox_signal` |
+| `work_item.blocked` | blocker owner and escalation path required | `create_blocker`, `notify_manager`, `schedule_blocker_review` |
+| `hermes_run.heartbeat_late` | runtime health needs reconciliation | `create_platform_incident`, `reconcile_run`, `notify_platform_operator` |
+| `memory.gap_detected` | memory/process improvement enters backlog | `send_supervisor_signal`, `request_memory_review` |
+| `credential_request.submitted` | security review is mandatory | `request_security_gate`, `send_inbox_signal` |
+| `release.ready` | delivery gate and evidence check required | `request_delivery_review`, `verify_release_evidence` |
 
 The first V0 rule catalog should include:
 
@@ -571,18 +577,18 @@ environment.
 Local fakes are useful for tests, but the real adapter contracts
 should point at the services that already exist in the cluster tree.
-| Adapter package | Cluster dependency | Expected in-cluster target |
-|---|---|---|
-| `@agentic-org/state` | CockroachDB ArgoCD app | `cockroachdb-public.cockroachdb.svc.cluster.local:26257` |
-| `@agentic-org/messaging` | NATS ArgoCD app with JetStream enabled | `nats.nats.svc.cluster.local:4222` |
-| `@agentic-org/workflows-temporal` | Temporal ArgoCD app | `temporal-frontend.temporal.svc.cluster.local:7233` |
-| `@agentic-org/actors-dapr` | Dapr control plane | Dapr sidecar plus `dapr-system` placement service |
-| `@agentic-org/memory` | Hindsight OCI Helm chart | `http://hindsight.hindsight.svc.cluster.local` |
-| `@agentic-org/hermes` | Hermes deployment/service | `http://hermes.hermes.svc.cluster.local` once replicas are enabled |
-| `@agentic-org/openziti` | OZ/OpenZiti controller app | `https://ziti-controller.openziti.svc.cluster.local:443` |
-| `@agentic-org/k8s-hats` | hat-system CRDs and operator | Kubernetes API watches plus `zeta.society.hats.>` bridge input |
-| `@agentic-org/observability` | Alloy, Tempo, Loki, Mimir, kube-prometheus-stack | OTLP traces to Alloy/Tempo, logs to Loki, metrics to Prometheus/Mimir |
-| `@agentic-org/policy` | OPA Gatekeeper and Organization policy package | in-process policy first, OPA bundle/constraint adapters later |
+| Adapter package                   | Cluster dependency                               | Expected in-cluster target                                            |
+| --------------------------------- | ------------------------------------------------ | --------------------------------------------------------------------- |
+| `@agentic-org/state`              | CockroachDB ArgoCD app                           | `cockroachdb-public.cockroachdb.svc.cluster.local:26257`              |
+| `@agentic-org/messaging`          | NATS ArgoCD app with JetStream enabled           | `nats.nats.svc.cluster.local:4222`                                    |
+| `@agentic-org/workflows-temporal` | Temporal ArgoCD app                              | `temporal-frontend.temporal.svc.cluster.local:7233`                   |
+| `@agentic-org/actors-dapr`        | Dapr control plane                               | Dapr sidecar plus `dapr-system` placement service                     |
+| `@agentic-org/memory`             | Hindsight OCI Helm chart                         | `http://hindsight.hindsight.svc.cluster.local`                        |
+| `@agentic-org/hermes`             | Hermes deployment/service                        | `http://hermes.hermes.svc.cluster.local` once replicas are enabled    |
+| `@agentic-org/openziti`           | OZ/OpenZiti controller app                       | `https://ziti-controller.openziti.svc.cluster.local:443`              |
+| `@agentic-org/k8s-hats`           | hat-system CRDs and operator                     | Kubernetes API watches plus `zeta.society.hats.>` bridge input        |
+| `@agentic-org/observability`      | Alloy, Tempo, Loki, Mimir, kube-prometheus-stack | OTLP traces to Alloy/Tempo, logs to Loki, metrics to Prometheus/Mimir |
+| `@agentic-org/policy`             | OPA Gatekeeper and Organization policy package   | in-process policy first, OPA bundle/constraint adapters later         |

 Adapter configuration should use environment variables and Kubernetes
 Secrets/ExternalSecrets, but the domain package should never see those
@@ -619,10 +625,10 @@ at wave `0`, Hindsight and Temporal at wave `10`, and Hermes at wave

 Recommended deployment split:

-| Application | Wave | Purpose |
-|---|---:|---|
-| `agentic-organization-contracts` | `-5` or `0` | optional future CRDs, NATS stream definitions, schema/config resources that other apps may consume |
-| `agentic-organization` | `30` | API, web, workers, Temporal worker, Dapr actor host, MCP gateway |
+| Application                      |        Wave | Purpose                                                                                            |
+| -------------------------------- | ----------: | -------------------------------------------------------------------------------------------------- |
+| `agentic-organization-contracts` | `-5` or `0` | optional future CRDs, NATS stream definitions, schema/config resources that other apps may consume |
+| `agentic-organization`           |        `30` | API, web, workers, Temporal worker, Dapr actor host, MCP gateway                                   |

 If V0 ships no CRDs and only consumes existing services, one
 `agentic-organization` app at wave `30` is enough. If it later adds CRDs
@@ -635,14 +641,14 @@ early.
 The first ArgoCD app should deploy one namespace and several workloads
 from the same image or image family:

-| Workload | Kubernetes shape | Notes |
-|---|---|---|
-| API | Deployment + ClusterIP Service | REST/OpenAPI, internal command API, read API |
-| Web | Deployment + ClusterIP Service/Gateway route | operations console |
-| Workers | Deployment | outbox publisher, reconcilers, schedulers, NATS consumers |
-| Temporal worker | Deployment | workflow and activity workers only |
-| Dapr actor host | Deployment with Dapr annotations | actor endpoints and reminders |
-| MCP gateway | Deployment + ClusterIP Service | Hermes-facing governed tool surface |
+| Workload        | Kubernetes shape                             | Notes                                                     |
+| --------------- | -------------------------------------------- | --------------------------------------------------------- |
+| API             | Deployment + ClusterIP Service               | REST/OpenAPI, internal command API, read API              |
+| Web             | Deployment + ClusterIP Service/Gateway route | operations console                                        |
+| Workers         | Deployment                                   | outbox publisher, reconcilers, schedulers, NATS consumers |
+| Temporal worker | Deployment                                   | workflow and activity workers only                        |
+| Dapr actor host | Deployment with Dapr annotations             | actor endpoints and reminders                             |
+| MCP gateway     | Deployment + ClusterIP Service               | Hermes-facing governed tool surface                       |

 All workloads need:

@@ -684,11 +690,11 @@ each substrate becomes live.
 The same package architecture should run in three modes:

-| Mode | Purpose | Runtime adapters |
-|---|---|---|
-| unit/test | package and command tests | in-memory/fake adapters |
-| local dev cluster | k3d/K3S parity with `full-ai-cluster` apps | real NATS/Cockroach when available, fake Hermes/hat-system if needed |
-| full cluster | production-like AI cluster | real CockroachDB, NATS, Hindsight, Hermes, OpenZiti, hat-system, Temporal, Dapr |
+| Mode              | Purpose                                    | Runtime adapters                                                                |
+| ----------------- | ------------------------------------------ | ------------------------------------------------------------------------------- |
+| unit/test         | package and command tests                  | in-memory/fake adapters                                                         |
+| local dev cluster | k3d/K3S parity with `full-ai-cluster` apps | real NATS/Cockroach when available, fake Hermes/hat-system if needed            |
+| full cluster      | production-like AI cluster                 | real CockroachDB, NATS, Hindsight, Hermes, OpenZiti, hat-system, Temporal, Dapr |

 Do not create a Docker Compose architecture that diverges from
 `full-ai-cluster`. Local development can use fakes or a dev cluster, but
@@ -752,6 +758,7 @@ other work.
    - `@agentic-org/domain`;
    - `@agentic-org/application`;
    - `@agentic-org/state`;
+   - `@agentic-org/state-cockroach`;
    - `@agentic-org/policy`;
    - `@agentic-org/messaging`;
    - `@agentic-org/observability`;
diff --git a/agentic-organization/packages/README.md b/agentic-organization/packages/README.md
index 4304dd936b..37a4f34893 100644
--- a/agentic-organization/packages/README.md
+++ b/agentic-organization/packages/README.md
@@ -7,14 +7,16 @@ worker, Dapr actor host, or Kubernetes deployment is introduced.
 ## Package Boundary

-| Package         | Current responsibility                                                                                                     |
-| --------------- | -------------------------------------------------------------------------------------------------------------------------- |
-| `domain`        | typed command names, event names, aggregate names, work item state machine, event envelope, shared records                 |
-| `application`   | command pipeline, handler registry, idempotency handling, state-store ports, first supervisor-chain signal command handler |
-| `state`         | in-memory Organization state-store factory used as the first repository port fake                                          |
-| `messaging`     | NATS subject contract without a live NATS dependency                                                                       |
-| `observability` | LGTM/OpenTelemetry attribute projection from Agentic event envelopes                                                       |
-| `runtime`       | first event-to-automation reaction rule                                                                                    |
+| Package           | Current responsibility                                                                                                     |
+| ----------------- | -------------------------------------------------------------------------------------------------------------------------- |
+| `domain`          | typed command names, event names, aggregate names, work item state machine, event envelope, shared records                 |
+| `application`     | command pipeline, handler registry, idempotency handling, state-store ports, first supervisor-chain signal command handler |
+| `state`           | in-memory Organization state-store factory used as the first repository port fake                                          |
+| `state-cockroach` | CockroachDB state-store factory contract, SQL statement catalog, and first core-state migration skeleton                   |
+| `messaging`       | NATS subject contract without a live NATS dependency                                                                       |
+| `observability`   | LGTM/OpenTelemetry attribute projection from Agentic event envelopes                                                       |
+| `runtime`         | first event-to-automation reaction rule                                                                                    |
+| `governance`      | package dependency-boundary checks that keep core packages SOLID and adapter-free                                          |

 ## Slice Rule

@@ -41,6 +43,10 @@ package implements the current in-memory factory.
 Command routing uses a handler registry so new commands add handlers
 instead of editing a central `switch` or `if` dispatcher.

+`CommandStateStore` is async even when backed by the in-memory fake. The
+real CockroachDB adapter must not be squeezed into a synchronous toy
+shape.
+
 ## Validation

 Run the package tests from `agentic-organization/`:
diff --git a/agentic-organization/packages/application/src/command-handler-registry.ts b/agentic-organization/packages/application/src/command-handler-registry.ts
index a06753b865..ff3c19918a 100644
--- a/agentic-organization/packages/application/src/command-handler-registry.ts
+++ b/agentic-organization/packages/application/src/command-handler-registry.ts
@@ -11,7 +11,7 @@ export type CommandExecutionContext = Clock &

 export type CommandHandler = {
   commandType: Command["type"];
-  execute: (command: Command, context: CommandExecutionContext) => Result;
+  execute: (command: Command, context: CommandExecutionContext) => Promise;
 };

 export type CommandHandlerRegistry = {
diff --git a/agentic-organization/packages/application/src/command-pipeline.test.ts b/agentic-organization/packages/application/src/command-pipeline.test.ts
index 60324dfee4..48fa57e79a 100644
--- a/agentic-organization/packages/application/src/command-pipeline.test.ts
+++ b/agentic-organization/packages/application/src/command-pipeline.test.ts
@@ -33,7 +33,7 @@ const command: PipelineCommand = {
 };

 describe("command pipeline idempotency", () => {
-  test("replaying the same idempotency key returns the stored result", () => {
+  test("replaying the same idempotency key returns the stored result", async () => {
     const stateStoreFactory = createInMemoryOrganizationStoreFactory();
     const pipeline = createCommandPipeline({
       stateStoreFactory,
@@ -42,8 +42,8 @@ describe("command pipeline idempotency", () => {
       createId: (prefix) => `${prefix}-001`,
     });

-    const firstResult = pipeline.execute(command);
-    const replayResult = pipeline.execute(command);
+    const firstResult = await pipeline.execute(command);
+    const replayResult = await pipeline.execute(command);

     equal(firstResult.status, CommandResultStatus.Accepted);
     equal(replayResult.status, CommandResultStatus.Accepted);
@@ -59,7 +59,7 @@ describe("command pipeline idempotency", () => {
     equal(stateStoreFactory.snapshot.outboxEvents.length, 1);
   });

-  test("rejects conflicting reuse of the same idempotency key", () => {
+  test("rejects conflicting reuse of the same idempotency key", async () => {
     const stateStoreFactory = createInMemoryOrganizationStoreFactory();
     const pipeline = createCommandPipeline({
       stateStoreFactory,
@@ -68,8 +68,8 @@ describe("command pipeline idempotency", () => {
       createId: (prefix) => `${prefix}-001`,
     });

-    const firstResult = pipeline.execute(command);
-    const conflictResult = pipeline.execute({
+    const firstResult = await pipeline.execute(command);
+    const conflictResult = await pipeline.execute({
       ...command,
       requestHash: "hash-supervisor-signal-conflict",
       title: "Different supervisor signal",
diff --git a/agentic-organization/packages/application/src/command-pipeline.ts b/agentic-organization/packages/application/src/command-pipeline.ts
index 298874c031..15483b7e26 100644
--- a/agentic-organization/packages/application/src/command-pipeline.ts
+++ b/agentic-organization/packages/application/src/command-pipeline.ts
@@ -6,7 +6,7 @@ import type { Clock, CommandStateStore, CommandStateStoreFactory, IdGenerator }
 export type PipelineCommand = SendSupervisorSignalCommand;

 export type CommandPipeline = {
-  execute: (command: PipelineCommand) => CommandResult;
+  execute: (command: PipelineCommand) => Promise;
 };

 export type CommandPipelineDependencies = Clock &
@@ -23,12 +23,12 @@ export function createCommandPipeline(dependencies: CommandPipelineDependencies)
   };
 }

-function executeCommand(
+async function executeCommand(
   command: PipelineCommand,
   store: CommandStateStore,
   dependencies: CommandPipelineDependencies,
-): CommandResult {
-  const existingRecord = store.findIdempotencyRecord(command.idempotencyKey);
+): Promise {
+  const existingRecord = await store.findIdempotencyRecord(command.idempotencyKey);

   if (existingRecord?.requestHash === command.requestHash) {
     return {
@@ -52,8 +52,8 @@ function executeCommand(
     };
   }

-  const result = dispatchCommand(command, store, dependencies);
-  store.saveIdempotencyRecord({
+  const result = await dispatchCommand(command, store, dependencies);
+  await store.saveIdempotencyRecord({
     idempotencyKey: command.idempotencyKey,
     requestHash: command.requestHash,
     result,
@@ -62,15 +62,15 @@ function executeCommand(
   return result;
 }

-function dispatchCommand(
+async function dispatchCommand(
   command: PipelineCommand,
   store: CommandStateStore,
   dependencies: CommandPipelineDependencies,
-): CommandResult {
+): Promise {
   const handler = dependencies.handlerRegistry.resolveHandler(command.type);

   if (handler !== undefined) {
-    return handler.execute(command, {
+    return await handler.execute(command, {
       ...dependencies,
       store,
     });
diff --git a/agentic-organization/packages/application/src/handlers/send-supervisor-signal.test.ts b/agentic-organization/packages/application/src/handlers/send-supervisor-signal.test.ts
index 4b124147d9..2c9068c182 100644
--- a/agentic-organization/packages/application/src/handlers/send-supervisor-signal.test.ts
+++ b/agentic-organization/packages/application/src/handlers/send-supervisor-signal.test.ts
@@ -38,11 +38,11 @@ const command: SendSupervisorSignalCommand = {
 };

 describe("send supervisor signal handler", () => {
-  test("persists chain communication, audit event, and outbox event atomically", () => {
+  test("persists chain communication, audit event, and outbox event atomically", async () => {
     const stateStoreFactory = createInMemoryOrganizationStoreFactory();
     const store = stateStoreFactory.createCommandStateStore();

-    const result = sendSupervisorSignal(command, {
+    const result = await sendSupervisorSignal(command, {
       store,
       now: () => "2026-05-25T20:00:00.000Z",
       createId: (prefix) => `${prefix}-001`,
diff --git a/agentic-organization/packages/application/src/handlers/send-supervisor-signal.ts b/agentic-organization/packages/application/src/handlers/send-supervisor-signal.ts
index 89401e24ca..05660f5029 100644
--- a/agentic-organization/packages/application/src/handlers/send-supervisor-signal.ts
+++ b/agentic-organization/packages/application/src/handlers/send-supervisor-signal.ts
@@ -57,10 +57,10 @@ export function createSendSupervisorSignalHandler(): CommandHandler {
   const occurredAt = dependencies.now();
   const supervisorSignal: SupervisorSignal = {
     supervisorSignalId: dependencies.createId(IdPrefix.SupervisorSignal),
@@ -123,9 +123,9 @@ export function sendSupervisorSignal(
     }),
   };

-  dependencies.store.appendSupervisorSignal(supervisorSignal);
-  dependencies.store.appendAuditEvent(auditEvent);
-  dependencies.store.appendOutboxEvent(outboxEvent);
+  await dependencies.store.appendSupervisorSignal(supervisorSignal);
+  await dependencies.store.appendAuditEvent(auditEvent);
+  await dependencies.store.appendOutboxEvent(outboxEvent);

   return {
     status: CommandResultStatus.Accepted,
diff --git a/agentic-organization/packages/application/src/ports.ts b/agentic-organization/packages/application/src/ports.ts
index a7d71c8b6a..2f8778b841 100644
--- a/agentic-organization/packages/application/src/ports.ts
+++ b/agentic-organization/packages/application/src/ports.ts
@@ -9,20 +9,20 @@ export type IdGenerator = {
 };

 export type IdempotencyRecordStore = {
-  findIdempotencyRecord: (idempotencyKey: string) => IdempotencyRecord | undefined;
-  saveIdempotencyRecord: (record: IdempotencyRecord) => void;
+  findIdempotencyRecord: (idempotencyKey: string) => Promise | undefined>;
+  saveIdempotencyRecord: (record: IdempotencyRecord) => Promise;
 };

 export type SupervisorSignalStore = {
-  appendSupervisorSignal: (supervisorSignal: SupervisorSignal) => void;
+  appendSupervisorSignal: (supervisorSignal: SupervisorSignal) => Promise;
 };

 export type AuditEventStore = {
-  appendAuditEvent: (auditEvent: AuditEvent) => void;
+  appendAuditEvent: (auditEvent: AuditEvent) => Promise;
 };

 export type OutboxEventStore = {
-  appendOutboxEvent: (outboxEvent: OutboxEvent) => void;
+  appendOutboxEvent: (outboxEvent: OutboxEvent) => Promise;
 };

 export type CommandStateStore = IdempotencyRecordStore &
diff --git a/agentic-organization/packages/governance/src/index.ts b/agentic-organization/packages/governance/src/index.ts
new file mode 100644
index 0000000000..4e0ff163ae
--- /dev/null
+++ b/agentic-organization/packages/governance/src/index.ts
@@ -0,0 +1,7 @@
+export {
+  PackageBoundaryRule,
+  validatePackageDependencyBoundaries,
+  type PackageDependencyBoundaryRule,
+  type PackageDependencyBoundaryViolation,
+  type ValidatePackageDependencyBoundariesInput,
+} from "./package-dependency-boundaries.ts";
diff --git a/agentic-organization/packages/governance/src/package-dependency-boundaries.test.ts b/agentic-organization/packages/governance/src/package-dependency-boundaries.test.ts
new file mode 100644
index 0000000000..c954529300
--- /dev/null
+++ b/agentic-organization/packages/governance/src/package-dependency-boundaries.test.ts
@@ -0,0 +1,33 @@
+import { equal } from "node:assert/strict";
+import { describe, test } from "node:test";
+
+import { PackageBoundaryRule, validatePackageDependencyBoundaries } from "./package-dependency-boundaries.ts";
+
+describe("package dependency boundaries", () => {
+  test("keeps application independent from state and runtime adapters", async () => {
+    const violations = await validatePackageDependencyBoundaries({
+      rootDirectory: new URL("../..", import.meta.url),
+      rules: [
+        {
+          packageName: PackageBoundaryRule.Application,
+          sourceGlob: "application/src/**/*.ts",
+          forbiddenImportFragments: [
+            "../../state",
+            "../../../state",
+            "state-cockroach",
+            "nestjs",
+            "@nestjs",
+            "nats",
+            "dapr",
+            "temporal",
+            "drizzle",
+            "pg",
+            "postgres",
+          ],
+        },
+      ],
+    });
+
+    equal(violations.length, 0, violations.map((violation) => violation.message).join("\n"));
+  });
+});
diff --git a/agentic-organization/packages/governance/src/package-dependency-boundaries.ts b/agentic-organization/packages/governance/src/package-dependency-boundaries.ts
new file mode 100644
index 0000000000..01df149e12
--- /dev/null
+++ b/agentic-organization/packages/governance/src/package-dependency-boundaries.ts
@@ -0,0 +1,118 @@
+import { readdir, readFile } from "node:fs/promises";
+import { join, relative, sep } from "node:path";
+import { fileURLToPath } from "node:url";
+
+export const PackageBoundaryRule = {
+  Application: "application",
+} as const;
+
+export type PackageBoundaryRule = (typeof PackageBoundaryRule)[keyof typeof PackageBoundaryRule];
+
+export type PackageDependencyBoundaryRule = {
+  packageName: PackageBoundaryRule;
+  sourceGlob: string;
+  forbiddenImportFragments: readonly string[];
+};
+
+export type PackageDependencyBoundaryViolation = {
+  packageName: PackageBoundaryRule;
+  filePath: string;
+  importFragment: string;
+  message: string;
+};
+
+export type ValidatePackageDependencyBoundariesInput = {
+  rootDirectory: URL;
+  rules: readonly PackageDependencyBoundaryRule[];
+};
+
+const TypeScriptSourceExtension = ".ts";
+const TestSourceExtension = ".test.ts";
+const RecursiveTypeScriptGlobSuffix = "/**/*.ts";
+
+export async function validatePackageDependencyBoundaries(
+  input: ValidatePackageDependencyBoundariesInput,
+): Promise {
+  const rootDirectoryPath = fileURLToPath(input.rootDirectory);
+  const violations: PackageDependencyBoundaryViolation[] = [];
+
+  for (const rule of input.rules) {
+    const sourceFiles = await findSourceFiles(rootDirectoryPath, rule.sourceGlob);
+
+    for (const sourceFile of sourceFiles) {
+      const sourceText = await readFile(sourceFile, "utf8");
+      const importSpecifiers = extractImportSpecifiers(sourceText);
+
+      for (const importSpecifier of importSpecifiers) {
+        for (const forbiddenImportFragment of rule.forbiddenImportFragments) {
+          if (importSpecifier.includes(forbiddenImportFragment)) {
+            violations.push({
+              packageName: rule.packageName,
+              filePath: normalizePath(relative(rootDirectoryPath, sourceFile)),
+              importFragment: importSpecifier,
+              message: `${rule.packageName} may not import ${importSpecifier} from ${normalizePath(
+                relative(rootDirectoryPath, sourceFile),
+              )}`,
+            });
+          }
+        }
+      }
+    }
+  }
+
+  return violations;
+}
+
+async function findSourceFiles(rootDirectoryPath: string, sourceGlob: string): Promise {
+  if (!sourceGlob.endsWith(RecursiveTypeScriptGlobSuffix)) {
+    throw new Error(`unsupported source glob: ${sourceGlob}`);
+  }
+
+  const sourceRoot = join(rootDirectoryPath, sourceGlob.slice(0, -RecursiveTypeScriptGlobSuffix.length));
+  const sourceFiles = await collectTypeScriptSourceFiles(sourceRoot);
+  return sourceFiles.filter((sourceFile) => !sourceFile.endsWith(TestSourceExtension));
+}
+
+async function collectTypeScriptSourceFiles(directoryPath: string): Promise {
+  const directoryEntries = await readdir(directoryPath, {
+    withFileTypes: true,
+  });
+  const sourceFiles: string[] = [];
+
+  for (const directoryEntry of directoryEntries) {
+    const entryPath = join(directoryPath, directoryEntry.name);
+
+    if (directoryEntry.isDirectory()) {
+      sourceFiles.push(...(await collectTypeScriptSourceFiles(entryPath)));
+      continue;
+    }
+
+    if (directoryEntry.isFile() && entryPath.endsWith(TypeScriptSourceExtension)) {
+      sourceFiles.push(entryPath);
+    }
+  }
+
+  return sourceFiles;
+}
+
+function extractImportSpecifiers(sourceText: string): string[] {
+  const importSpecifiers: string[] = [];
+  const importExpression = /import\s+(?:type\s+)?(?:[\s\S]*?\s+from\s+)?["']([^"']+)["']/g;
+  let match = importExpression.exec(sourceText);
+
+  while (match !== null) {
+    const importSpecifier = match[1];
+
+    if (importSpecifier !== undefined) {
+      importSpecifiers.push(importSpecifier);
+    }
+
+    match = importExpression.exec(sourceText);
+  }
+
+  return importSpecifiers;
+}
+
+function normalizePath(path: string): string {
+  return path.split(sep).join("/");
+}
diff --git a/agentic-organization/packages/state-cockroach/migrations/0001_agentic_org_core_state.sql b/agentic-organization/packages/state-cockroach/migrations/0001_agentic_org_core_state.sql
new file mode 100644
index 0000000000..d905f5cf3c
--- /dev/null
+++ b/agentic-organization/packages/state-cockroach/migrations/0001_agentic_org_core_state.sql
@@ -0,0 +1,57 @@
+CREATE TABLE IF NOT EXISTS agentic_org_work_items (
+  work_item_id STRING PRIMARY KEY,
+  organization_id STRING NOT NULL,
+  project_id STRING NOT NULL,
+  title STRING NOT NULL,
+  description STRING NOT NULL,
+  state STRING NOT NULL,
+  created_at TIMESTAMPTZ NOT NULL,
+  created_by_agent_id STRING NOT NULL,
+  created_by_hat_assignment_id STRING NOT NULL
+);
+
+CREATE TABLE IF NOT EXISTS agentic_org_supervisor_signals (
+  supervisor_signal_id STRING PRIMARY KEY,
+  organization_id STRING NOT NULL,
+  project_id STRING NOT NULL,
+  team_id STRING NOT NULL,
+  source_level STRING NOT NULL,
+  target_level STRING NOT NULL,
+  target_hat_assignment_id STRING NOT NULL,
+  sender_agent_id STRING NOT NULL,
+  sender_hat_assignment_id STRING NOT NULL,
+  tool_type STRING NOT NULL,
+  status STRING NOT NULL,
+  title STRING NOT NULL,
+  message STRING NOT NULL,
+  related_work_item_id STRING NOT NULL,
+  created_at TIMESTAMPTZ NOT NULL
+);
+
+CREATE TABLE IF NOT EXISTS agentic_org_audit_events (
+  audit_event_id STRING PRIMARY KEY,
+  event_name STRING NOT NULL,
+  aggregate_id STRING NOT NULL,
+  actor_agent_id STRING NOT NULL,
+  actor_hat_assignment_id STRING NOT NULL,
+  occurred_at TIMESTAMPTZ NOT NULL
+);
+
+CREATE TABLE IF NOT EXISTS agentic_org_outbox_events (
+  outbox_event_id STRING PRIMARY KEY,
+  event_id STRING NOT NULL UNIQUE,
+  event_type STRING NOT NULL,
+  organization_id STRING NOT NULL,
+  project_id STRING NOT NULL,
+  work_item_id STRING NOT NULL,
+  trace_id STRING NOT NULL,
+  correlation_id STRING NOT NULL,
+  envelope_json JSONB NOT NULL,
+  published_at TIMESTAMPTZ
+);
+
+CREATE TABLE IF NOT EXISTS agentic_org_idempotency_records (
+  idempotency_key STRING PRIMARY KEY,
+  request_hash STRING NOT NULL,
+  result_json JSONB NOT NULL
+);
diff --git a/agentic-organization/packages/state-cockroach/src/cockroach-command-state-store.test.ts b/agentic-organization/packages/state-cockroach/src/cockroach-command-state-store.test.ts
new file mode 100644
index 0000000000..74508ce2dd
--- /dev/null
+++ b/agentic-organization/packages/state-cockroach/src/cockroach-command-state-store.test.ts
@@ -0,0 +1,137 @@
+import { deepEqual, equal } from "node:assert/strict";
+import { describe, test } from "node:test";
+
+import { CommandResultStatus, type CommandResult } from "../../application/src/index.ts";
+import {
+  AgenticAggregateType,
+  AgenticEventType,
+  SupervisorChainLevel,
+  SupervisorSignalStatus,
+  SupervisorSignalToolType,
+} from "../../domain/src/index.ts";
+import {
+  CockroachCommandStateStoreStatement,
+  createCockroachCommandStateStoreFactory,
+  type CockroachSqlExecutor,
+} from "./cockroach-command-state-store.ts";
+
+describe("cockroach command state store", () => {
+  test("implements command-state-store operations behind a SQL executor", async () => {
+    const executor = createRecordingExecutor();
+    const factory = createCockroachCommandStateStoreFactory({
+      executor,
+    });
+    const store = factory.createCommandStateStore();
+
+    equal(await store.findIdempotencyRecord("idem-001"), undefined);
+
+    await store.appendSupervisorSignal({
+      supervisorSignalId: "supervisor-signal-001",
+      organizationId: "org-lfg",
+      projectId: "project-agentic-org",
+      teamId: "team-runtime",
+      sourceLevel: SupervisorChainLevel.TeamMember,
+      targetLevel: SupervisorChainLevel.Manager,
+      targetHatAssignmentId: "hat-assignment-em-001",
+      sender: {
+        agentId: "agent-developer-001",
+        hatAssignmentId: "hat-assignment-dev-001",
+      },
+      toolType: SupervisorSignalToolType.ReportBlocker,
+      status: SupervisorSignalStatus.Sent,
+      title: "Blocked on scoped NATS publisher",
+      message: "Need a scoped publisher decision.",
+      relatedWorkItemId: "work-outbox-001",
+      createdAt: "2026-05-25T20:00:00.000Z",
+    });
+
+    await store.appendAuditEvent({
+      auditEventId: "audit-001",
+      eventName: AgenticEventType.SupervisorSignalSent,
+      aggregateId: "supervisor-signal-001",
+      actor: {
+        agentId: "agent-developer-001",
+        hatAssignmentId: "hat-assignment-dev-001",
+      },
+      occurredAt: "2026-05-25T20:00:00.000Z",
+    });
+
+    await store.appendOutboxEvent({
+      outboxEventId: "outbox-001",
+      envelope: {
+        eventId: "evt-001",
+        eventType: AgenticEventType.SupervisorSignalSent,
+        schemaVersion: "agentic.org.event.v1",
+        occurredAt: "2026-05-25T20:00:00.000Z",
+        actor: {
+          agentId: "agent-developer-001",
+          hatAssignmentId: "hat-assignment-dev-001",
+        },
+        scope: {
+          organizationId: "org-lfg",
+          projectId: "project-agentic-org",
+          teamId: "team-runtime",
+          workItemId: "work-outbox-001",
+        },
+        aggregate: {
+          aggregateId: "supervisor-signal-001",
+          aggregateType: AgenticAggregateType.SupervisorSignal,
+          aggregateVersion: 1,
+        },
+        trace: {
+          commandId: "cmd-001",
+          correlationId: "corr-001",
+          causationId: "cause-001",
+          traceId: "trace-001",
+          idempotencyKey: "idem-001",
+        },
+        replay: {
+          isReplay: false,
+        },
+        payload: {
+          title: "Blocked on scoped NATS publisher",
+        },
+      },
+    });
+
+    await store.saveIdempotencyRecord({
+      idempotencyKey: "idem-001",
+      requestHash: "hash-001",
+      result: {
+        status: CommandResultStatus.Accepted,
+        idempotency: {
+          replayed: false,
+        },
+      },
+    });
+
+    deepEqual(
+      executor.statements.map((statement) => statement.name),
+      [
+        CockroachCommandStateStoreStatement.FindIdempotencyRecord,
+        CockroachCommandStateStoreStatement.InsertSupervisorSignal,
+        CockroachCommandStateStoreStatement.InsertAuditEvent,
+        CockroachCommandStateStoreStatement.InsertOutboxEvent,
+        CockroachCommandStateStoreStatement.UpsertIdempotencyRecord,
+      ],
+    );
+  });
+});
+
+type RecordingCockroachSqlExecutor = CockroachSqlExecutor & {
+  statements: { name: CockroachCommandStateStoreStatement; parameters: readonly unknown[] }[];
+};
+
+function createRecordingExecutor(): RecordingCockroachSqlExecutor {
+  const statements: { name: CockroachCommandStateStoreStatement; parameters: readonly unknown[] }[] = [];
+
+  return {
+    statements,
+    execute: async (statement) => {
+      statements.push(statement);
+      return {
+        rows: [],
+      };
+    },
+  };
+}
diff --git a/agentic-organization/packages/state-cockroach/src/cockroach-command-state-store.ts b/agentic-organization/packages/state-cockroach/src/cockroach-command-state-store.ts
new file mode 100644
index 0000000000..93403076a6
--- /dev/null
+++ b/agentic-organization/packages/state-cockroach/src/cockroach-command-state-store.ts
@@ -0,0 +1,186 @@
+import type { CommandStateStore, CommandStateStoreFactory } from "../../application/src/ports.ts";
+import { CockroachTableName } from "./cockroach-schema.ts";
+
+export const CockroachCommandStateStoreStatement = {
+  FindIdempotencyRecord: "find_idempotency_record",
+  UpsertIdempotencyRecord: "upsert_idempotency_record",
+  InsertSupervisorSignal: "insert_supervisor_signal",
+  InsertAuditEvent: "insert_audit_event",
+  InsertOutboxEvent: "insert_outbox_event",
+} as const;
+
+export type CockroachCommandStateStoreStatement =
+  (typeof CockroachCommandStateStoreStatement)[keyof typeof CockroachCommandStateStoreStatement];
+
+export type CockroachSqlStatement = {
+  name: CockroachCommandStateStoreStatement;
+  sql: string;
+  parameters: readonly unknown[];
+};
+
+export type CockroachSqlResult> = {
+  rows: readonly Row[];
+};
+
+export type CockroachSqlExecutor = {
+  execute: >(statement: CockroachSqlStatement) => Promise>;
+};
+
+export type CreateCockroachCommandStateStoreFactoryInput = {
+  executor: CockroachSqlExecutor;
+};
+
+export function createCockroachCommandStateStoreFactory(
+  input: CreateCockroachCommandStateStoreFactoryInput,
+): CommandStateStoreFactory {
+  return {
+    createCommandStateStore: () => createCockroachCommandStateStore(input.executor),
+  };
+}
+
+function createCockroachCommandStateStore(executor: CockroachSqlExecutor): CommandStateStore {
+  return {
+    findIdempotencyRecord: async (idempotencyKey) => {
+      const result = await executor.execute({
+        name: CockroachCommandStateStoreStatement.FindIdempotencyRecord,
+        sql: CockroachCommandStateStoreSql.FindIdempotencyRecord,
+        parameters: [idempotencyKey],
+      });
+      const row = result.rows[0];
+
+      if (row === undefined) {
+        return undefined;
+      }
+
+      return {
+        idempotencyKey: row.idempotency_key,
+        requestHash: row.request_hash,
+        result: row.result_json as Result,
+      };
+    },
+    saveIdempotencyRecord: async (record) => {
+      await executor.execute({
+        name: CockroachCommandStateStoreStatement.UpsertIdempotencyRecord,
+        sql: CockroachCommandStateStoreSql.UpsertIdempotencyRecord,
+        parameters: [record.idempotencyKey, record.requestHash, record.result],
+      });
+    },
+    appendSupervisorSignal: async (supervisorSignal) => {
+      await executor.execute({
+        name: CockroachCommandStateStoreStatement.InsertSupervisorSignal,
+        sql: CockroachCommandStateStoreSql.InsertSupervisorSignal,
+        parameters: [
+          supervisorSignal.supervisorSignalId,
+          supervisorSignal.organizationId,
+          supervisorSignal.projectId,
+          supervisorSignal.teamId,
+          supervisorSignal.sourceLevel,
+          supervisorSignal.targetLevel,
+          supervisorSignal.targetHatAssignmentId,
+          supervisorSignal.sender.agentId,
+          supervisorSignal.sender.hatAssignmentId,
+          supervisorSignal.toolType,
+          supervisorSignal.status,
+          supervisorSignal.title,
+          supervisorSignal.message,
+          supervisorSignal.relatedWorkItemId,
+          supervisorSignal.createdAt,
+        ],
+      });
+    },
+    appendAuditEvent: async (auditEvent) => {
+      await executor.execute({
+        name: CockroachCommandStateStoreStatement.InsertAuditEvent,
+        sql: CockroachCommandStateStoreSql.InsertAuditEvent,
+        parameters: [
+          auditEvent.auditEventId,
+          auditEvent.eventName,
+          auditEvent.aggregateId,
+          auditEvent.actor.agentId,
+          auditEvent.actor.hatAssignmentId,
+          auditEvent.occurredAt,
+        ],
+      });
+    },
+    appendOutboxEvent: async (outboxEvent) => {
+      await executor.execute({
+        name: CockroachCommandStateStoreStatement.InsertOutboxEvent,
+        sql: CockroachCommandStateStoreSql.InsertOutboxEvent,
+        parameters: [
+          outboxEvent.outboxEventId,
+          outboxEvent.envelope.eventId,
+          outboxEvent.envelope.eventType,
+          outboxEvent.envelope.scope.organizationId,
+          outboxEvent.envelope.scope.projectId,
+          outboxEvent.envelope.scope.workItemId,
+          outboxEvent.envelope.trace.traceId,
+          outboxEvent.envelope.trace.correlationId,
+          outboxEvent.envelope,
+        ],
+      });
+    },
+  };
+}
+
+type IdempotencyRecordRow = {
+  idempotency_key: string;
+  request_hash: string;
+  result_json: unknown;
+};
+
+const CockroachCommandStateStoreSql = {
+  FindIdempotencyRecord: `
+    SELECT idempotency_key, request_hash, result_json
+    FROM ${CockroachTableName.IdempotencyRecords}
+    WHERE idempotency_key = $1
+  `,
+  UpsertIdempotencyRecord: `
+    UPSERT INTO ${CockroachTableName.IdempotencyRecords} (
+      idempotency_key,
+      request_hash,
+      result_json
+    ) VALUES ($1, $2, $3)
+  `,
+  InsertSupervisorSignal: `
+    INSERT INTO ${CockroachTableName.SupervisorSignals} (
+      supervisor_signal_id,
+      organization_id,
+      project_id,
+      team_id,
+      source_level,
+      target_level,
+      target_hat_assignment_id,
+      sender_agent_id,
+      sender_hat_assignment_id,
+      tool_type,
+      status,
+      title,
+      message,
+      related_work_item_id,
+      created_at
+    ) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15)
+  `,
+  InsertAuditEvent: `
+    INSERT INTO ${CockroachTableName.AuditEvents} (
+      audit_event_id,
+      event_name,
+      aggregate_id,
+      actor_agent_id,
+      actor_hat_assignment_id,
+      occurred_at
+    ) VALUES ($1, $2, $3, $4, $5, $6)
+  `,
+  InsertOutboxEvent: `
+    INSERT INTO ${CockroachTableName.OutboxEvents} (
+      outbox_event_id,
+      event_id,
+      event_type,
+      organization_id,
+      project_id,
+      work_item_id,
+      trace_id,
+      correlation_id,
+      envelope_json
+    ) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9)
+  `,
+} as const;
diff --git a/agentic-organization/packages/state-cockroach/src/cockroach-schema.test.ts b/agentic-organization/packages/state-cockroach/src/cockroach-schema.test.ts
new file mode 100644
index 0000000000..7eba2ff572
--- /dev/null
+++ b/agentic-organization/packages/state-cockroach/src/cockroach-schema.test.ts
@@ -0,0 +1,25 @@
+import { equal, ok } from "node:assert/strict";
+import { describe, test } from "node:test";
+
+import {
+  CockroachCoreStateMigrationName,
+  CockroachTableName,
+  createCockroachCoreStateMigration,
+} from "./cockroach-schema.ts";
+
+describe("cockroach core state schema", () => {
+  test("declares the first authoritative state, audit, outbox, and idempotency tables", () => {
+    const migration = createCockroachCoreStateMigration();
+
+    equal(migration.name, CockroachCoreStateMigrationName.CoreStateV1);
+    ok(migration.sql.includes(`CREATE TABLE IF NOT EXISTS ${CockroachTableName.WorkItems}`));
+    ok(migration.sql.includes(`CREATE TABLE IF NOT EXISTS ${CockroachTableName.SupervisorSignals}`));
+    ok(migration.sql.includes(`CREATE TABLE IF NOT EXISTS ${CockroachTableName.AuditEvents}`));
+    ok(migration.sql.includes(`CREATE TABLE IF NOT EXISTS ${CockroachTableName.OutboxEvents}`));
+    ok(migration.sql.includes(`CREATE TABLE IF NOT EXISTS ${CockroachTableName.IdempotencyRecords}`));
+    ok(migration.sql.includes("trace_id STRING NOT NULL"));
+    ok(migration.sql.includes("correlation_id STRING NOT NULL"));
+    ok(migration.sql.includes("envelope_json JSONB NOT NULL"));
+    ok(migration.sql.includes("result_json JSONB NOT NULL"));
+  });
+});
diff --git a/agentic-organization/packages/state-cockroach/src/cockroach-schema.ts b/agentic-organization/packages/state-cockroach/src/cockroach-schema.ts
new file mode 100644
index 0000000000..4d88de366a
--- /dev/null
+++ b/agentic-organization/packages/state-cockroach/src/cockroach-schema.ts
@@ -0,0 +1,107 @@
+export const
CockroachCoreStateMigrationName = { + CoreStateV1: "0001_agentic_org_core_state", +} as const; + +export type CockroachCoreStateMigrationName = + (typeof CockroachCoreStateMigrationName)[keyof typeof CockroachCoreStateMigrationName]; + +export const CockroachTableName = { + WorkItems: "agentic_org_work_items", + SupervisorSignals: "agentic_org_supervisor_signals", + AuditEvents: "agentic_org_audit_events", + OutboxEvents: "agentic_org_outbox_events", + IdempotencyRecords: "agentic_org_idempotency_records", +} as const; + +export type CockroachTableName = (typeof CockroachTableName)[keyof typeof CockroachTableName]; + +export type CockroachSchemaMigration = { + name: CockroachCoreStateMigrationName; + sql: string; +}; + +export function createCockroachCoreStateMigration(): CockroachSchemaMigration { + return { + name: CockroachCoreStateMigrationName.CoreStateV1, + sql: [ + createWorkItemsTableSql(), + createSupervisorSignalsTableSql(), + createAuditEventsTableSql(), + createOutboxEventsTableSql(), + createIdempotencyRecordsTableSql(), + ].join("\n\n"), + }; +} + +function createWorkItemsTableSql(): string { + return ` +CREATE TABLE IF NOT EXISTS ${CockroachTableName.WorkItems} ( + work_item_id STRING PRIMARY KEY, + organization_id STRING NOT NULL, + project_id STRING NOT NULL, + title STRING NOT NULL, + description STRING NOT NULL, + state STRING NOT NULL, + created_at TIMESTAMPTZ NOT NULL, + created_by_agent_id STRING NOT NULL, + created_by_hat_assignment_id STRING NOT NULL +);`.trim(); +} + +function createSupervisorSignalsTableSql(): string { + return ` +CREATE TABLE IF NOT EXISTS ${CockroachTableName.SupervisorSignals} ( + supervisor_signal_id STRING PRIMARY KEY, + organization_id STRING NOT NULL, + project_id STRING NOT NULL, + team_id STRING NOT NULL, + source_level STRING NOT NULL, + target_level STRING NOT NULL, + target_hat_assignment_id STRING NOT NULL, + sender_agent_id STRING NOT NULL, + sender_hat_assignment_id STRING NOT NULL, + tool_type STRING NOT 
NULL, + status STRING NOT NULL, + title STRING NOT NULL, + message STRING NOT NULL, + related_work_item_id STRING NOT NULL, + created_at TIMESTAMPTZ NOT NULL +);`.trim(); +} + +function createAuditEventsTableSql(): string { + return ` +CREATE TABLE IF NOT EXISTS ${CockroachTableName.AuditEvents} ( + audit_event_id STRING PRIMARY KEY, + event_name STRING NOT NULL, + aggregate_id STRING NOT NULL, + actor_agent_id STRING NOT NULL, + actor_hat_assignment_id STRING NOT NULL, + occurred_at TIMESTAMPTZ NOT NULL +);`.trim(); +} + +function createOutboxEventsTableSql(): string { + return ` +CREATE TABLE IF NOT EXISTS ${CockroachTableName.OutboxEvents} ( + outbox_event_id STRING PRIMARY KEY, + event_id STRING NOT NULL UNIQUE, + event_type STRING NOT NULL, + organization_id STRING NOT NULL, + project_id STRING NOT NULL, + work_item_id STRING NOT NULL, + trace_id STRING NOT NULL, + correlation_id STRING NOT NULL, + envelope_json JSONB NOT NULL, + published_at TIMESTAMPTZ +);`.trim(); +} + +function createIdempotencyRecordsTableSql(): string { + return ` +CREATE TABLE IF NOT EXISTS ${CockroachTableName.IdempotencyRecords} ( + idempotency_key STRING PRIMARY KEY, + request_hash STRING NOT NULL, + result_json JSONB NOT NULL +);`.trim(); +} diff --git a/agentic-organization/packages/state-cockroach/src/index.ts b/agentic-organization/packages/state-cockroach/src/index.ts new file mode 100644 index 0000000000..c98453bb3a --- /dev/null +++ b/agentic-organization/packages/state-cockroach/src/index.ts @@ -0,0 +1,14 @@ +export { + CockroachCommandStateStoreStatement, + createCockroachCommandStateStoreFactory, + type CockroachSqlExecutor, + type CockroachSqlResult, + type CockroachSqlStatement, + type CreateCockroachCommandStateStoreFactoryInput, +} from "./cockroach-command-state-store.ts"; +export { + CockroachCoreStateMigrationName, + CockroachTableName, + createCockroachCoreStateMigration, + type CockroachSchemaMigration, +} from "./cockroach-schema.ts"; diff --git 
a/agentic-organization/packages/state/src/in-memory-organization-store.ts b/agentic-organization/packages/state/src/in-memory-organization-store.ts index 5155ea72f7..42ee7adaa5 100644 --- a/agentic-organization/packages/state/src/in-memory-organization-store.ts +++ b/agentic-organization/packages/state/src/in-memory-organization-store.ts @@ -59,17 +59,17 @@ function createCommandStateStore( snapshot: MutableInMemoryOrganizationStoreSnapshot, ): CommandStateStore { return { - findIdempotencyRecord: (idempotencyKey) => snapshot.idempotencyRecords.get(idempotencyKey), - saveIdempotencyRecord: (record) => { + findIdempotencyRecord: async (idempotencyKey) => snapshot.idempotencyRecords.get(idempotencyKey), + saveIdempotencyRecord: async (record) => { snapshot.idempotencyRecords.set(record.idempotencyKey, record); }, - appendSupervisorSignal: (supervisorSignal) => { + appendSupervisorSignal: async (supervisorSignal) => { snapshot.supervisorSignals.push(supervisorSignal); }, - appendAuditEvent: (auditEvent) => { + appendAuditEvent: async (auditEvent) => { snapshot.auditEvents.push(auditEvent); }, - appendOutboxEvent: (outboxEvent) => { + appendOutboxEvent: async (outboxEvent) => { snapshot.outboxEvents.push(outboxEvent); }, }; diff --git a/agentic-organization/packages/test-node.d.ts b/agentic-organization/packages/test-node.d.ts index 893420dde2..e61ffaae79 100644 --- a/agentic-organization/packages/test-node.d.ts +++ b/agentic-organization/packages/test-node.d.ts @@ -1,6 +1,6 @@ declare module "node:assert/strict" { export function deepEqual(actual: unknown, expected: unknown): void; - export function equal(actual: unknown, expected: unknown): void; + export function equal(actual: unknown, expected: unknown, message?: string): void; export function ok(value: unknown): asserts value; export function throws(action: () => void, expected?: RegExp): void; } @@ -9,3 +9,24 @@ declare module "node:test" { export function describe(name: string, fn: () => void): void; export 
function test(name: string, fn: () => void): void; } + +declare module "node:fs/promises" { + export type Dirent = { + name: string; + isDirectory: () => boolean; + isFile: () => boolean; + }; + + export function readdir(path: string, options: { withFileTypes: true }): Promise<Dirent[]>; + export function readFile(path: string, encoding: "utf8"): Promise<string>; +} + +declare module "node:path" { + export const sep: string; + export function join(...paths: string[]): string; + export function relative(from: string, to: string): string; +} + +declare module "node:url" { + export function fileURLToPath(url: URL): string; +} diff --git a/openspec/specs/agentic-organization/spec.md b/openspec/specs/agentic-organization/spec.md index 409129c64a..605dba6a45 100644 --- a/openspec/specs/agentic-organization/spec.md +++ b/openspec/specs/agentic-organization/spec.md @@ -46,12 +46,31 @@ Organization state only by calling Organization commands. directly - **AND** the pipeline does not use a central command-type switch for extensible command dispatch +- **AND** command-state-store operations are async so real persistence + adapters can perform I/O without changing command contracts + +#### Scenario: Package boundaries are checked + +- **WHEN** package dependency-boundary tests run +- **THEN** application source files are checked for forbidden imports of + concrete state adapters, Cockroach adapters, NestJS, NATS, Dapr, + Temporal, Drizzle, Postgres, or other runtime clients +- **AND** a violation fails the test suite before the boundary can drift ### Requirement: Commands are idempotent Organization commands MUST use deterministic idempotency keys at the command boundary.
+#### Scenario: Cockroach core state schema exists + +- **WHEN** the first CockroachDB migration contract is loaded +- **THEN** it declares work item, supervisor signal, audit event, + outbox event, and idempotency record tables +- **AND** outbox rows include trace ID, correlation ID, and canonical + envelope JSON fields for later NATS publication and workflow + visibility + #### Scenario: Matching replay - **WHEN** a command is submitted twice with the same idempotency key From 47fc64ddfbfc14642dc60151ea51b7bdffc0e68a Mon Sep 17 00:00:00 2001 From: Max Chadaev Date: Mon, 25 May 2026 17:57:27 -0400 Subject: [PATCH 09/21] feat(agentic-org): add NATS outbox publisher contract Co-Authored-By: Codex --- .../docs/FIRST_IMPLEMENTATION_SLICE.md | 29 ++-- .../docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md | 25 ++- agentic-organization/packages/README.md | 18 ++- .../src/package-dependency-boundaries.test.ts | 5 + .../src/package-dependency-boundaries.ts | 1 + .../packages/messaging-nats/src/index.ts | 7 + .../nats-jetstream-event-publisher.test.ts | 103 +++++++++++++ .../src/nats-jetstream-event-publisher.ts | 49 ++++++ .../packages/messaging/src/index.ts | 15 ++ .../messaging/src/outbox-publisher.test.ts | 145 ++++++++++++++++++ .../messaging/src/outbox-publisher.ts | 114 ++++++++++++++ .../src/cockroach-outbox-event-source.test.ts | 107 +++++++++++++ .../src/cockroach-outbox-event-source.ts | 75 +++++++++ .../packages/state-cockroach/src/index.ts | 8 + openspec/specs/agentic-organization/spec.md | 23 +++ 15 files changed, 703 insertions(+), 21 deletions(-) create mode 100644 agentic-organization/packages/messaging-nats/src/index.ts create mode 100644 agentic-organization/packages/messaging-nats/src/nats-jetstream-event-publisher.test.ts create mode 100644 agentic-organization/packages/messaging-nats/src/nats-jetstream-event-publisher.ts create mode 100644 agentic-organization/packages/messaging/src/outbox-publisher.test.ts create mode 100644 
agentic-organization/packages/messaging/src/outbox-publisher.ts create mode 100644 agentic-organization/packages/state-cockroach/src/cockroach-outbox-event-source.test.ts create mode 100644 agentic-organization/packages/state-cockroach/src/cockroach-outbox-event-source.ts diff --git a/agentic-organization/docs/FIRST_IMPLEMENTATION_SLICE.md b/agentic-organization/docs/FIRST_IMPLEMENTATION_SLICE.md index 69ae71783e..8e43283cf0 100644 --- a/agentic-organization/docs/FIRST_IMPLEMENTATION_SLICE.md +++ b/agentic-organization/docs/FIRST_IMPLEMENTATION_SLICE.md @@ -9,10 +9,10 @@ Implemented as a small NodeNext TypeScript package slice. This slice turns the first Agentic Organization runtime contract from architecture prose into executable TypeScript. -It does not introduce NestJS, CockroachDB, NATS clients, Temporal, -Dapr, Hermes, Hindsight, or Kubernetes deployment manifests yet. Those -remain adapter layers. The goal is to prove the Organization command -shape before adding distributed infrastructure. +It does not introduce NestJS, live NATS connections, Temporal, Dapr, +Hermes, Hindsight, or Kubernetes deployment manifests yet. Those remain +adapter layers. The goal is to prove the Organization command and event +publication shape before adding distributed infrastructure. The slice is intentionally generic. 
`send_supervisor_signal` is the coordination primitive; specific downstream outcomes are lifecycle @@ -29,6 +29,8 @@ send_supervisor_signal -> chain-of-command signal -> audit event -> outbox event with canonical event envelope + -> outbox publisher + -> NATS JetStream event publisher adapter -> NATS subject contract -> LGTM span attributes -> supervisor triage reaction plan @@ -41,8 +43,9 @@ send_supervisor_signal | `@agentic-org/domain` | event envelope, command/event constants, aggregate constants, supervisor-chain communication types, hat communication briefs, work item state machine, shared records | | `@agentic-org/application` | command pipeline, command-handler registry, state-store ports, idempotency conflict handling, supervisor signal handler | | `@agentic-org/state` | in-memory Organization state-store factory fake | -| `@agentic-org/state-cockroach` | CockroachDB state-store factory contract, SQL statement catalog, and first core-state migration skeleton | -| `@agentic-org/messaging` | stable `agentic-org....` subject builder | +| `@agentic-org/state-cockroach` | CockroachDB state-store/outbox-source contracts, SQL statement catalogs, and first core-state migration skeleton | +| `@agentic-org/messaging` | stable `agentic-org....` subject builder, outbox publisher, event publisher port, and typed domain resolver | +| `@agentic-org/messaging-nats` | NATS JetStream event publisher adapter contract with canonical JSON payloads, headers, and message IDs | | `@agentic-org/observability` | OpenTelemetry/LGTM span attribute projection | | `@agentic-org/runtime` | first rule that plans triage for the target supervisor when a chain signal is sent | | `@agentic-org/governance` | package dependency-boundary checks that prevent application code from importing concrete state/runtime adapters | @@ -102,6 +105,11 @@ Hermes runs, MCP calls, and UI evidence. 
- A governance test enforces that application code does not import the state adapter, Cockroach adapter, NestJS, NATS, Dapr, Temporal, Drizzle, or Postgres clients. +- The outbox publisher claims unpublished events, publishes each event + through an `EventPublisher` port, and marks rows published only after + the publish succeeds. +- The NATS adapter publishes canonical JSON envelopes with typed headers + and event IDs as message IDs for idempotent JetStream publication. - Duplicate commands with the same idempotency key and request hash replay the stored result. - Duplicate commands with the same idempotency key and a different @@ -113,11 +121,10 @@ Hermes runs, MCP calls, and UI evidence. ## Next Slice -The next slice should turn the CockroachDB adapter contract into a -transactional integration test once a local/dev Cockroach connection is -available, then add the NATS outbox publisher worker. The worker can -publish persisted outbox rows to JetStream and attach the same telemetry -attributes. +The next slice should add inbox/consumer dedupe before automation starts +performing side effects from NATS events. After that, wire the outbox +publisher into a worker host and add a transactional Cockroach +integration test once a local/dev Cockroach connection is available. Do not make the next slice a pile of bespoke request commands. Build the generic supervisor triage lifecycle first, then let specialized diff --git a/agentic-organization/docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md b/agentic-organization/docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md index 2479f6bc94..23b382cc3c 100644 --- a/agentic-organization/docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md +++ b/agentic-organization/docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md @@ -164,8 +164,9 @@ calling them. 
| Package | Owns | | ---------------------------------------- | ----------------------------------------------------------------------------------------------- | | `@agentic-org/state` | Drizzle schema, migrations, repositories, transactions, outbox, inbox, idempotency, leases | -| `@agentic-org/state-cockroach` | CockroachDB implementation of state-store ports, SQL statement catalog, and migration contracts | +| `@agentic-org/state-cockroach` | CockroachDB implementation of state-store and outbox-source ports, SQL catalogs, migrations | | `@agentic-org/messaging` | NATS envelope builder, subject builder, JetStream publisher, consumer, DLQ, replay contracts | +| `@agentic-org/messaging-nats` | NATS JetStream implementation of the event publisher port, canonical JSON, headers, message IDs | | `@agentic-org/workflows-temporal` | Temporal workflow and activity contracts, task queues, workflow clients | | `@agentic-org/actors-dapr` | Dapr actor interfaces, actor implementations, reminders, actor state projection | | `@agentic-org/mcp` | MCP schemas, tool registry, preflight checks, policy-checked tool handlers | @@ -344,6 +345,15 @@ type AgenticEventEnvelope = { No app should publish raw NATS payloads directly. Publishing should go through `@agentic-org/messaging`. +The generic outbox publisher should claim unpublished outbox events from +an `OutboxEventSource`, resolve the typed Organization messaging +domain, publish through an `EventPublisher` port, and mark the outbox +row published only after the publish succeeds. The NATS adapter is an +implementation of that port; it owns transport-specific concerns such as +headers, message IDs, and JSON serialization. This keeps the +Organization event loop extensible and testable without coupling the +publisher to the NATS client. + ### Event-to-Automation Contract Agentic Organization should behave like an event-driven operating system. @@ -761,6 +771,7 @@ other work. 
- `@agentic-org/state-cockroach`; - `@agentic-org/policy`; - `@agentic-org/messaging`; + - `@agentic-org/messaging-nats`; - `@agentic-org/observability`; - `@agentic-org/work-os`; - `@agentic-org/hats`; @@ -786,13 +797,15 @@ other work. 5. Use fake adapters for Hermes, Hindsight, Dapr, Temporal, and hat-system. 6. Add NATS outbox publisher and one consumer after command tests pass. -7. Add the first rule catalog and reaction executor for ready work, +7. Add inbox/consumer dedupe before any NATS-driven automation performs + side effects. +8. Add the first rule catalog and reaction executor for ready work, review staffing, QA staffing, blocker escalation, and late run incidents. -8. Add the NestJS API and worker hosts. -9. Add UI projections for work board, review center, and evidence - timeline. -10. Add real cluster adapters one at a time. +9. Add the NestJS API and worker hosts. +10. Add UI projections for work board, review center, and evidence + timeline. +11. Add real cluster adapters one at a time. ## Extraction Path diff --git a/agentic-organization/packages/README.md b/agentic-organization/packages/README.md index 37a4f34893..fd0bbea7dc 100644 --- a/agentic-organization/packages/README.md +++ b/agentic-organization/packages/README.md @@ -2,8 +2,8 @@ These packages are the first executable slice of Agentic Organization. They are intentionally small and run as a NodeNext TypeScript island -before any NestJS host, CockroachDB adapter, NATS client, Temporal -worker, Dapr actor host, or Kubernetes deployment is introduced. +before any NestJS host, live NATS connection, Temporal worker, Dapr actor +host, or Kubernetes deployment is introduced. ## Package Boundary @@ -12,8 +12,9 @@ worker, Dapr actor host, or Kubernetes deployment is introduced. 
| `domain` | typed command names, event names, aggregate names, work item state machine, event envelope, shared records | | `application` | command pipeline, handler registry, idempotency handling, state-store ports, first supervisor-chain signal command handler | | `state` | in-memory Organization state-store factory used as the first repository port fake | -| `state-cockroach` | CockroachDB state-store factory contract, SQL statement catalog, and first core-state migration skeleton | -| `messaging` | NATS subject contract without a live NATS dependency | +| `state-cockroach` | CockroachDB state-store/outbox-source contracts, SQL statement catalogs, and first core-state migration skeleton | +| `messaging` | NATS subject contract, outbox publisher port, event publisher port, and domain resolver without a live NATS dependency | +| `messaging-nats` | NATS JetStream event publisher adapter contract with canonical JSON payloads, headers, and message IDs | | `observability` | LGTM/OpenTelemetry attribute projection from Agentic event envelopes | | `runtime` | first event-to-automation reaction rule | | `governance` | package dependency-boundary checks that keep core packages SOLID and adapter-free | @@ -28,6 +29,8 @@ supervisor-chain signal command -> chain communication record -> audit event -> outbox event + -> outbox publisher + -> NATS JetStream event publisher adapter -> NATS subject / telemetry contract -> automation reaction plan ``` @@ -47,6 +50,13 @@ central `switch` or `if` dispatcher. real CockroachDB adapter must not be squeezed into a synchronous toy shape. +The outbox publisher owns the generic publish loop: claim unpublished +outbox rows, resolve a typed Organization domain, publish through an +`EventPublisher` port, and mark the outbox row published only after the +publish returns successfully. The NATS package implements that port and +is the only package in this slice that knows about NATS headers, +message IDs, and JSON transport payloads. 
+ ## Validation Run the package tests from `agentic-organization/`: diff --git a/agentic-organization/packages/governance/src/package-dependency-boundaries.test.ts b/agentic-organization/packages/governance/src/package-dependency-boundaries.test.ts index c954529300..b9ae9bdd1e 100644 --- a/agentic-organization/packages/governance/src/package-dependency-boundaries.test.ts +++ b/agentic-organization/packages/governance/src/package-dependency-boundaries.test.ts @@ -25,6 +25,11 @@ describe("package dependency boundaries", () => { "postgres", ], }, + { + packageName: PackageBoundaryRule.Messaging, + sourceGlob: "messaging/src/**/*.ts", + forbiddenImportFragments: ["../messaging-nats", "../../messaging-nats", "nats"], + }, ], }); diff --git a/agentic-organization/packages/governance/src/package-dependency-boundaries.ts b/agentic-organization/packages/governance/src/package-dependency-boundaries.ts index 01df149e12..08bfb7386a 100644 --- a/agentic-organization/packages/governance/src/package-dependency-boundaries.ts +++ b/agentic-organization/packages/governance/src/package-dependency-boundaries.ts @@ -4,6 +4,7 @@ import { fileURLToPath } from "node:url"; export const PackageBoundaryRule = { Application: "application", + Messaging: "messaging", } as const; export type PackageBoundaryRule = (typeof PackageBoundaryRule)[keyof typeof PackageBoundaryRule]; diff --git a/agentic-organization/packages/messaging-nats/src/index.ts b/agentic-organization/packages/messaging-nats/src/index.ts new file mode 100644 index 0000000000..9bc7e8466b --- /dev/null +++ b/agentic-organization/packages/messaging-nats/src/index.ts @@ -0,0 +1,7 @@ +export { + NatsHeaderName, + createNatsJetStreamEventPublisher, + type CreateNatsJetStreamEventPublisherInput, + type NatsJetStreamClient, + type NatsJetStreamMessage, +} from "./nats-jetstream-event-publisher.ts"; diff --git a/agentic-organization/packages/messaging-nats/src/nats-jetstream-event-publisher.test.ts 
b/agentic-organization/packages/messaging-nats/src/nats-jetstream-event-publisher.test.ts new file mode 100644 index 0000000000..b565bdc65f --- /dev/null +++ b/agentic-organization/packages/messaging-nats/src/nats-jetstream-event-publisher.test.ts @@ -0,0 +1,103 @@ +import { deepEqual, equal } from "node:assert/strict"; +import { describe, test } from "node:test"; + +import { + AgenticAggregateType, + AgenticEventType, + createAgenticEventEnvelope, + type OutboxEvent, +} from "../../domain/src/index.ts"; +import { + createNatsJetStreamEventPublisher, + NatsHeaderName, + type NatsJetStreamClient, +} from "./nats-jetstream-event-publisher.ts"; + +describe("NATS JetStream event publisher", () => { + test("publishes canonical JSON with idempotent headers and message ID", async () => { + const client = createRecordingNatsClient(); + const publisher = createNatsJetStreamEventPublisher({ + client, + }); + const outboxEvent = createOutboxEvent(); + + await publisher.publish({ + subject: "agentic-org.local.org-lfg.supervisor_signal.supervisor_signal.sent", + outboxEvent, + }); + + equal(client.messages.length, 1); + deepEqual(client.messages[0], { + subject: "agentic-org.local.org-lfg.supervisor_signal.supervisor_signal.sent", + payload: JSON.stringify(outboxEvent.envelope), + messageId: "evt-001", + headers: { + [NatsHeaderName.EventId]: "evt-001", + [NatsHeaderName.EventType]: AgenticEventType.SupervisorSignalSent, + [NatsHeaderName.CorrelationId]: "corr-001", + [NatsHeaderName.CausationId]: "cause-001", + [NatsHeaderName.TraceId]: "trace-001", + [NatsHeaderName.IdempotencyKey]: "idem-001", + [NatsHeaderName.OutboxEventId]: "outbox-001", + }, + }); + }); +}); + +function createRecordingNatsClient(): NatsJetStreamClient & { + messages: { + subject: string; + payload: string; + messageId: string; + headers: Record<string, string>; + }[]; +} { + const messages: { + subject: string; + payload: string; + messageId: string; + headers: Record<string, string>; + }[] = []; + + return { + messages, + publish:
async (message) => { + messages.push(message); + }, + }; +} + +function createOutboxEvent(): OutboxEvent { + return { + outboxEventId: "outbox-001", + envelope: createAgenticEventEnvelope({ + eventId: "evt-001", + eventType: AgenticEventType.SupervisorSignalSent, + occurredAt: "2026-05-25T20:00:00.000Z", + actor: { + agentId: "agent-developer-001", + hatAssignmentId: "hat-assignment-dev-001", + }, + scope: { + organizationId: "org-lfg", + projectId: "project-agentic-org", + workItemId: "work-outbox-001", + }, + aggregate: { + aggregateId: "supervisor-signal-001", + aggregateType: AgenticAggregateType.SupervisorSignal, + aggregateVersion: 1, + }, + trace: { + commandId: "cmd-001", + correlationId: "corr-001", + causationId: "cause-001", + traceId: "trace-001", + idempotencyKey: "idem-001", + }, + payload: { + title: "Blocked on scoped NATS publisher", + }, + }), + }; +} diff --git a/agentic-organization/packages/messaging-nats/src/nats-jetstream-event-publisher.ts b/agentic-organization/packages/messaging-nats/src/nats-jetstream-event-publisher.ts new file mode 100644 index 0000000000..ec5a24b518 --- /dev/null +++ b/agentic-organization/packages/messaging-nats/src/nats-jetstream-event-publisher.ts @@ -0,0 +1,49 @@ +import type { EventPublisher } from "../../messaging/src/index.ts"; + +export const NatsHeaderName = { + EventId: "Nats-Msg-Event-Id", + EventType: "Nats-Msg-Event-Type", + CorrelationId: "Nats-Msg-Correlation-Id", + CausationId: "Nats-Msg-Causation-Id", + TraceId: "Nats-Msg-Trace-Id", + IdempotencyKey: "Nats-Msg-Idempotency-Key", + OutboxEventId: "Nats-Msg-Outbox-Event-Id", +} as const; + +export type NatsHeaderName = (typeof NatsHeaderName)[keyof typeof NatsHeaderName]; + +export type NatsJetStreamMessage = { + subject: string; + payload: string; + messageId: string; + headers: Record<string, string>; +}; + +export type NatsJetStreamClient = { + publish: (message: NatsJetStreamMessage) => Promise<void>; +}; + +export type CreateNatsJetStreamEventPublisherInput = { + client:
NatsJetStreamClient; +}; + +export function createNatsJetStreamEventPublisher(input: CreateNatsJetStreamEventPublisherInput): EventPublisher { + return { + publish: async (publication) => { + await input.client.publish({ + subject: publication.subject, + payload: JSON.stringify(publication.outboxEvent.envelope), + messageId: publication.outboxEvent.envelope.eventId, + headers: { + [NatsHeaderName.EventId]: publication.outboxEvent.envelope.eventId, + [NatsHeaderName.EventType]: publication.outboxEvent.envelope.eventType, + [NatsHeaderName.CorrelationId]: publication.outboxEvent.envelope.trace.correlationId, + [NatsHeaderName.CausationId]: publication.outboxEvent.envelope.trace.causationId, + [NatsHeaderName.TraceId]: publication.outboxEvent.envelope.trace.traceId, + [NatsHeaderName.IdempotencyKey]: publication.outboxEvent.envelope.trace.idempotencyKey, + [NatsHeaderName.OutboxEventId]: publication.outboxEvent.outboxEventId, + }, + }); + }, + }; +} diff --git a/agentic-organization/packages/messaging/src/index.ts b/agentic-organization/packages/messaging/src/index.ts index 134ca99eb0..2cb19ab88e 100644 --- a/agentic-organization/packages/messaging/src/index.ts +++ b/agentic-organization/packages/messaging/src/index.ts @@ -1 +1,16 @@ export { AgenticSubjectPrefix, buildAgenticEventSubject, type AgenticEventSubjectInput } from "./subject-builder.ts"; +export { + AgenticMessagingDomain, + OutboxPublishOutcomeStatus, + createOutboxPublisher, + resolveAgenticMessagingDomain, + type ClaimUnpublishedOutboxEventsInput, + type CreateOutboxPublisherInput, + type EventPublication, + type EventPublisher, + type MarkOutboxEventPublishedInput, + type OutboxEventSource, + type OutboxPublishBatchResult, + type OutboxPublisher, + type ResolveAgenticMessagingDomain, +} from "./outbox-publisher.ts"; diff --git a/agentic-organization/packages/messaging/src/outbox-publisher.test.ts b/agentic-organization/packages/messaging/src/outbox-publisher.test.ts new file mode 100644 index 
0000000000..4de151fdc3 --- /dev/null +++ b/agentic-organization/packages/messaging/src/outbox-publisher.test.ts @@ -0,0 +1,145 @@ +import { deepEqual, equal } from "node:assert/strict"; +import { describe, test } from "node:test"; + +import { + AgenticAggregateType, + AgenticEventType, + createAgenticEventEnvelope, + type OutboxEvent, +} from "../../domain/src/index.ts"; +import { + AgenticMessagingDomain, + OutboxPublishOutcomeStatus, + createOutboxPublisher, + resolveAgenticMessagingDomain, + type EventPublication, + type EventPublisher, + type OutboxEventSource, +} from "./outbox-publisher.ts"; + +describe("outbox publisher", () => { + test("resolves event domains through typed mappings", () => { + deepEqual( + resolveAgenticMessagingDomain(AgenticEventType.SupervisorSignalSent), + AgenticMessagingDomain.SupervisorSignal, + ); + }); + + test("publishes unpublished outbox events and marks them published", async () => { + const outboxEvent = createOutboxEvent(); + const outboxSource = createRecordingOutboxSource([outboxEvent]); + const eventPublisher = createRecordingEventPublisher(); + const publisher = createOutboxPublisher({ + outboxSource, + eventPublisher, + environment: "local", + resolveDomain: resolveAgenticMessagingDomain, + now: () => "2026-05-25T21:00:00.000Z", + }); + + const result = await publisher.publishNextBatch({ + batchSize: 10, + }); + + deepEqual(result, { + status: OutboxPublishOutcomeStatus.Published, + attemptedCount: 1, + publishedOutboxEventIds: ["outbox-001"], + }); + deepEqual(outboxSource.markedPublished, [ + { + outboxEventId: "outbox-001", + publishedAt: "2026-05-25T21:00:00.000Z", + }, + ]); + deepEqual(eventPublisher.publications, [ + { + subject: "agentic-org.local.org-lfg.supervisor_signal.supervisor_signal.sent", + outboxEvent, + }, + ]); + }); + + test("returns empty when there is no work to publish", async () => { + const outboxSource = createRecordingOutboxSource([]); + const eventPublisher = createRecordingEventPublisher(); 
+ const publisher = createOutboxPublisher({ + outboxSource, + eventPublisher, + environment: "local", + resolveDomain: resolveAgenticMessagingDomain, + now: () => "2026-05-25T21:00:00.000Z", + }); + + const result = await publisher.publishNextBatch({ + batchSize: 10, + }); + + equal(result.status, OutboxPublishOutcomeStatus.Empty); + equal(eventPublisher.publications.length, 0); + equal(outboxSource.markedPublished.length, 0); + }); +}); + +function createRecordingOutboxSource(outboxEvents: OutboxEvent[]): OutboxEventSource & { + markedPublished: { outboxEventId: string; publishedAt: string }[]; +} { + const markedPublished: { outboxEventId: string; publishedAt: string }[] = []; + + return { + markedPublished, + claimUnpublishedOutboxEvents: async () => outboxEvents, + markOutboxEventPublished: async (input) => { + markedPublished.push(input); + }, + }; +} + +function createRecordingEventPublisher(): EventPublisher & { + publications: EventPublication[]; +} { + const publications: EventPublication[] = []; + + return { + publications, + publish: async (publication) => { + publications.push(publication); + }, + }; +} + +function createOutboxEvent(): OutboxEvent { + return { + outboxEventId: "outbox-001", + envelope: createAgenticEventEnvelope({ + eventId: "evt-001", + eventType: AgenticEventType.SupervisorSignalSent, + occurredAt: "2026-05-25T20:00:00.000Z", + actor: { + agentId: "agent-developer-001", + hatAssignmentId: "hat-assignment-dev-001", + }, + scope: { + organizationId: "org-lfg", + projectId: "project-agentic-org", + teamId: "team-runtime", + workItemId: "work-outbox-001", + }, + aggregate: { + aggregateId: "supervisor-signal-001", + aggregateType: AgenticAggregateType.SupervisorSignal, + aggregateVersion: 1, + }, + trace: { + commandId: "cmd-001", + correlationId: "corr-001", + causationId: "cause-001", + traceId: "trace-001", + idempotencyKey: "idem-001", + }, + payload: { + title: "Blocked on scoped NATS publisher", + }, + }), + }; +} diff --git 
a/agentic-organization/packages/messaging/src/outbox-publisher.ts b/agentic-organization/packages/messaging/src/outbox-publisher.ts new file mode 100644 index 0000000000..0897a35274 --- /dev/null +++ b/agentic-organization/packages/messaging/src/outbox-publisher.ts @@ -0,0 +1,114 @@ +import { AgenticEventType, type OutboxEvent } from "../../domain/src/index.ts"; +import { buildAgenticEventSubject } from "./subject-builder.ts"; + +export const AgenticMessagingDomain = { + SupervisorSignal: "supervisor_signal", + WorkItem: "work_item", +} as const; + +export type AgenticMessagingDomain = (typeof AgenticMessagingDomain)[keyof typeof AgenticMessagingDomain]; + +export const OutboxPublishOutcomeStatus = { + Empty: "empty", + Published: "published", +} as const; + +export type OutboxPublishOutcomeStatus = (typeof OutboxPublishOutcomeStatus)[keyof typeof OutboxPublishOutcomeStatus]; + +export type ClaimUnpublishedOutboxEventsInput = { + batchSize: number; +}; + +export type MarkOutboxEventPublishedInput = { + outboxEventId: string; + publishedAt: string; +}; + +export type OutboxEventSource = { + claimUnpublishedOutboxEvents: (input: ClaimUnpublishedOutboxEventsInput) => Promise<OutboxEvent[]>; + markOutboxEventPublished: (input: MarkOutboxEventPublishedInput) => Promise<void>; +}; + +export type EventPublication = { + subject: string; + outboxEvent: OutboxEvent; +}; + +export type EventPublisher = { + publish: (publication: EventPublication) => Promise<void>; +}; + +export type ResolveAgenticMessagingDomain = (eventType: AgenticEventType) => AgenticMessagingDomain; + +export type OutboxPublisher = { + publishNextBatch: (input: ClaimUnpublishedOutboxEventsInput) => Promise<OutboxPublishBatchResult>; +}; + +export type OutboxPublishBatchResult = { + status: OutboxPublishOutcomeStatus; + attemptedCount: number; + publishedOutboxEventIds: string[]; +}; + +export type CreateOutboxPublisherInput = { + outboxSource: OutboxEventSource; + eventPublisher: EventPublisher; + environment: string; + resolveDomain: 
ResolveAgenticMessagingDomain; + now: () => string; +}; + +export function createOutboxPublisher(input: CreateOutboxPublisherInput): OutboxPublisher { + return { + publishNextBatch: async (publishInput) => { + const outboxEvents = await input.outboxSource.claimUnpublishedOutboxEvents(publishInput); + + if (outboxEvents.length === 0) { + return { + status: OutboxPublishOutcomeStatus.Empty, + attemptedCount: 0, + publishedOutboxEventIds: [], + }; + } + + const publishedOutboxEventIds: string[] = []; + + for (const outboxEvent of outboxEvents) { + const subject = buildAgenticEventSubject({ + environment: input.environment, + organizationId: outboxEvent.envelope.scope.organizationId, + domain: input.resolveDomain(outboxEvent.envelope.eventType), + eventType: outboxEvent.envelope.eventType, + }); + + await input.eventPublisher.publish({ + subject, + outboxEvent, + }); + await input.outboxSource.markOutboxEventPublished({ + outboxEventId: outboxEvent.outboxEventId, + publishedAt: input.now(), + }); + publishedOutboxEventIds.push(outboxEvent.outboxEventId); + } + + return { + status: OutboxPublishOutcomeStatus.Published, + attemptedCount: outboxEvents.length, + publishedOutboxEventIds, + }; + }, + }; +} + +export function resolveAgenticMessagingDomain(eventType: AgenticEventType): AgenticMessagingDomain { + if (eventType === AgenticEventType.SupervisorSignalSent) { + return AgenticMessagingDomain.SupervisorSignal; + } + + if (eventType === AgenticEventType.WorkItemChanged || eventType === AgenticEventType.WorkItemStateChanged) { + return AgenticMessagingDomain.WorkItem; + } + + throw new Error(`unsupported event type for messaging domain: ${eventType}`); +} diff --git a/agentic-organization/packages/state-cockroach/src/cockroach-outbox-event-source.test.ts b/agentic-organization/packages/state-cockroach/src/cockroach-outbox-event-source.test.ts new file mode 100644 index 0000000000..a5e6eda061 --- /dev/null +++ 
b/agentic-organization/packages/state-cockroach/src/cockroach-outbox-event-source.test.ts @@ -0,0 +1,107 @@ +import { deepEqual } from "node:assert/strict"; +import { describe, test } from "node:test"; + +import { AgenticAggregateType, AgenticEventType, type AgenticEventEnvelope } from "../../domain/src/index.ts"; +import { + CockroachOutboxEventSourceStatement, + createCockroachOutboxEventSource, + type CockroachOutboxSqlExecutor, + type CockroachOutboxSqlStatement, +} from "./cockroach-outbox-event-source.ts"; + +describe("cockroach outbox event source", () => { + test("claims unpublished outbox events and marks them published", async () => { + const executor = createRecordingExecutor(); + const outboxSource = createCockroachOutboxEventSource({ + executor, + }); + + const outboxEvents = await outboxSource.claimUnpublishedOutboxEvents({ + batchSize: 10, + }); + await outboxSource.markOutboxEventPublished({ + outboxEventId: "outbox-001", + publishedAt: "2026-05-25T21:00:00.000Z", + }); + + deepEqual(outboxEvents, [ + { + outboxEventId: "outbox-001", + envelope: createEnvelope(), + }, + ]); + deepEqual( + executor.statements.map((statement) => statement.name), + [ + CockroachOutboxEventSourceStatement.ClaimUnpublishedOutboxEvents, + CockroachOutboxEventSourceStatement.MarkOutboxEventPublished, + ], + ); + }); +}); + +type RecordingCockroachOutboxSqlExecutor = CockroachOutboxSqlExecutor & { + statements: { name: CockroachOutboxEventSourceStatement; parameters: readonly unknown[] }[]; +}; + +function createRecordingExecutor(): RecordingCockroachOutboxSqlExecutor { + const statements: { name: CockroachOutboxEventSourceStatement; parameters: readonly unknown[] }[] = []; + + return { + statements, + execute: async <Row extends Record<string, unknown>>(statement: CockroachOutboxSqlStatement) => { + statements.push(statement); + + if (statement.name === CockroachOutboxEventSourceStatement.ClaimUnpublishedOutboxEvents) { + return { + rows: [ + { + outbox_event_id: "outbox-001", + envelope_json: 
createEnvelope(), + }, + ] as Row[], + }; + } + + return { + rows: [], + }; + }, + }; +} + +function createEnvelope(): AgenticEventEnvelope { + return { + eventId: "evt-001", + eventType: AgenticEventType.SupervisorSignalSent, + schemaVersion: "agentic.org.event.v1", + occurredAt: "2026-05-25T20:00:00.000Z", + actor: { + agentId: "agent-developer-001", + hatAssignmentId: "hat-assignment-dev-001", + }, + scope: { + organizationId: "org-lfg", + projectId: "project-agentic-org", + workItemId: "work-outbox-001", + }, + aggregate: { + aggregateId: "supervisor-signal-001", + aggregateType: AgenticAggregateType.SupervisorSignal, + aggregateVersion: 1, + }, + trace: { + commandId: "cmd-001", + correlationId: "corr-001", + causationId: "cause-001", + traceId: "trace-001", + idempotencyKey: "idem-001", + }, + replay: { + isReplay: false, + }, + payload: { + title: "Blocked on scoped NATS publisher", + }, + }; +} diff --git a/agentic-organization/packages/state-cockroach/src/cockroach-outbox-event-source.ts b/agentic-organization/packages/state-cockroach/src/cockroach-outbox-event-source.ts new file mode 100644 index 0000000000..28b74733ba --- /dev/null +++ b/agentic-organization/packages/state-cockroach/src/cockroach-outbox-event-source.ts @@ -0,0 +1,75 @@ +import type { AgenticEventEnvelope } from "../../domain/src/index.ts"; +import type { OutboxEventSource } from "../../messaging/src/index.ts"; +import { CockroachTableName } from "./cockroach-schema.ts"; + +export const CockroachOutboxEventSourceStatement = { + ClaimUnpublishedOutboxEvents: "claim_unpublished_outbox_events", + MarkOutboxEventPublished: "mark_outbox_event_published", +} as const; + +export type CockroachOutboxEventSourceStatement = + (typeof CockroachOutboxEventSourceStatement)[keyof typeof CockroachOutboxEventSourceStatement]; + +export type CockroachOutboxSqlStatement = { + name: CockroachOutboxEventSourceStatement; + sql: string; + parameters: readonly unknown[]; +}; + +export type 
CockroachOutboxSqlResult<Row extends Record<string, unknown>> = { + rows: readonly Row[]; +}; + +export type CockroachOutboxSqlExecutor = { + execute: <Row extends Record<string, unknown>>( + statement: CockroachOutboxSqlStatement, + ) => Promise<CockroachOutboxSqlResult<Row>>; +}; + +export type CreateCockroachOutboxEventSourceInput = { + executor: CockroachOutboxSqlExecutor; +}; + +export function createCockroachOutboxEventSource(input: CreateCockroachOutboxEventSourceInput): OutboxEventSource { + return { + claimUnpublishedOutboxEvents: async (claimInput) => { + const result = await input.executor.execute<OutboxEventRow>({ + name: CockroachOutboxEventSourceStatement.ClaimUnpublishedOutboxEvents, + sql: CockroachOutboxEventSourceSql.ClaimUnpublishedOutboxEvents, + parameters: [claimInput.batchSize], + }); + + return result.rows.map((row) => ({ + outboxEventId: row.outbox_event_id, + envelope: row.envelope_json, + })); + }, + markOutboxEventPublished: async (markInput) => { + await input.executor.execute({ + name: CockroachOutboxEventSourceStatement.MarkOutboxEventPublished, + sql: CockroachOutboxEventSourceSql.MarkOutboxEventPublished, + parameters: [markInput.outboxEventId, markInput.publishedAt], + }); + }, + }; +} + +type OutboxEventRow = { + outbox_event_id: string; + envelope_json: AgenticEventEnvelope; +}; + +const CockroachOutboxEventSourceSql = { + ClaimUnpublishedOutboxEvents: ` + SELECT outbox_event_id, envelope_json + FROM ${CockroachTableName.OutboxEvents} + WHERE published_at IS NULL + ORDER BY outbox_event_id + LIMIT $1 + `, + MarkOutboxEventPublished: ` + UPDATE ${CockroachTableName.OutboxEvents} + SET published_at = $2 + WHERE outbox_event_id = $1 + `, +} as const; diff --git a/agentic-organization/packages/state-cockroach/src/index.ts b/agentic-organization/packages/state-cockroach/src/index.ts index c98453bb3a..bf795c6d1b 100644 --- a/agentic-organization/packages/state-cockroach/src/index.ts +++ b/agentic-organization/packages/state-cockroach/src/index.ts @@ -6,6 +6,14 @@ export { type CockroachSqlStatement, type CreateCockroachCommandStateStoreFactoryInput, } 
from "./cockroach-command-state-store.ts"; +export { + CockroachOutboxEventSourceStatement, + createCockroachOutboxEventSource, + type CockroachOutboxSqlExecutor, + type CockroachOutboxSqlResult, + type CockroachOutboxSqlStatement, + type CreateCockroachOutboxEventSourceInput, +} from "./cockroach-outbox-event-source.ts"; export { CockroachCoreStateMigrationName, CockroachTableName, diff --git a/openspec/specs/agentic-organization/spec.md b/openspec/specs/agentic-organization/spec.md index 605dba6a45..71f30853f1 100644 --- a/openspec/specs/agentic-organization/spec.md +++ b/openspec/specs/agentic-organization/spec.md @@ -142,6 +142,29 @@ Organization NATS subjects MUST use a stable organization-scoped shape. - **THEN** the subject shape is `agentic-org....` +### Requirement: Outbox publisher is idempotent and adapter-backed + +Organization outbox publication MUST be driven by a generic publisher +and a concrete event-publisher adapter. + +#### Scenario: Outbox event is published + +- **WHEN** unpublished outbox events are claimed +- **THEN** the publisher resolves the typed Organization messaging + domain and builds the stable NATS subject +- **AND** the publisher sends the event through an `EventPublisher` port +- **AND** the outbox row is marked published only after the publish + succeeds + +#### Scenario: NATS adapter publishes event + +- **WHEN** the NATS JetStream adapter publishes an event publication +- **THEN** it sends the canonical event envelope as JSON +- **AND** it uses the event ID as the message ID +- **AND** it includes typed headers for event ID, event type, + correlation ID, causation ID, trace ID, idempotency key, and outbox + event ID + ### Requirement: Telemetry is complete at the event boundary Organization packages MUST expose OpenTelemetry-compatible attributes From 16417aecba9eae921bba0e363c2f5f9c4d21bbe7 Mon Sep 17 00:00:00 2001 From: Max Chadaev Date: Mon, 25 May 2026 18:06:57 -0400 Subject: [PATCH 10/21] refactor(agentic-org): keep 
durable state adapter replaceable Co-Authored-By: Codex --- .../docs/FIRST_IMPLEMENTATION_SLICE.md | 18 +- .../docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md | 44 ++-- .../docs/V0_EXECUTABLE_CONTRACT.md | 51 ++--- .../docs/V0_SCHEMA_AND_COMMANDS.md | 210 +++++++++--------- agentic-organization/packages/README.md | 26 ++- .../src/package-dependency-boundaries.test.ts | 5 + .../src/package-dependency-boundaries.ts | 1 + .../packages/messaging/src/index.ts | 3 - .../messaging/src/outbox-publisher.test.ts | 2 +- .../messaging/src/outbox-publisher.ts | 15 +- .../src/cockroach-outbox-event-source.ts | 2 +- .../packages/state/src/index.ts | 5 + .../packages/state/src/outbox-event-source.ts | 15 ++ openspec/specs/agentic-organization/spec.md | 15 +- 14 files changed, 225 insertions(+), 187 deletions(-) create mode 100644 agentic-organization/packages/state/src/outbox-event-source.ts diff --git a/agentic-organization/docs/FIRST_IMPLEMENTATION_SLICE.md b/agentic-organization/docs/FIRST_IMPLEMENTATION_SLICE.md index 8e43283cf0..9f8e615eba 100644 --- a/agentic-organization/docs/FIRST_IMPLEMENTATION_SLICE.md +++ b/agentic-organization/docs/FIRST_IMPLEMENTATION_SLICE.md @@ -42,8 +42,8 @@ send_supervisor_signal | ------------------------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `@agentic-org/domain` | event envelope, command/event constants, aggregate constants, supervisor-chain communication types, hat communication briefs, work item state machine, shared records | | `@agentic-org/application` | command pipeline, command-handler registry, state-store ports, idempotency conflict handling, supervisor signal handler | -| `@agentic-org/state` | in-memory Organization state-store factory fake | -| `@agentic-org/state-cockroach` | CockroachDB state-store/outbox-source contracts, SQL statement catalogs, and first core-state migration skeleton | +| 
`@agentic-org/state` | generic state-store/outbox-source ports plus the in-memory Organization state-store factory fake | +| `@agentic-org/state-cockroach` | first replaceable durable SQL implementation of the state-store/outbox-source ports, backed by CockroachDB | | `@agentic-org/messaging` | stable `agentic-org....` subject builder, outbox publisher, event publisher port, and typed domain resolver | | `@agentic-org/messaging-nats` | NATS JetStream event publisher adapter contract with canonical JSON payloads, headers, and message IDs | | `@agentic-org/observability` | OpenTelemetry/LGTM span attribute projection | @@ -99,12 +99,15 @@ Hermes runs, MCP calls, and UI evidence. - The command pipeline receives state-store factories and command handlers through ports instead of constructing in-memory adapters or branching on command types. -- State-store ports are async from the beginning so CockroachDB, - NATS-backed workers, and other real adapters do not inherit a fake - synchronous shape. +- State-store and outbox-source ports are async from the beginning so + durable SQL, NATS-backed workers, and other real adapters do not + inherit a fake synchronous shape. - A governance test enforces that application code does not import the state adapter, Cockroach adapter, NestJS, NATS, Dapr, Temporal, Drizzle, or Postgres clients. +- A governance test enforces that the Cockroach state adapter does not + import messaging, NATS, or JetStream. Durable state can be swapped + without dragging transport concerns into the repository layer. - The outbox publisher claims unpublished events, publishes each event through an `EventPublisher` port, and marks rows published only after the publish succeeds. @@ -123,8 +126,9 @@ Hermes runs, MCP calls, and UI evidence. The next slice should add inbox/consumer dedupe before automation starts performing side effects from NATS events. 
After that, wire the outbox -publisher into a worker host and add a transactional Cockroach -integration test once a local/dev Cockroach connection is available. +publisher into a worker host and add a transactional durable-state +adapter integration test using CockroachDB as the first cluster-backed +implementation once a local/dev connection is available. Do not make the next slice a pile of bespoke request commands. Build the generic supervisor triage lifecycle first, then let specialized diff --git a/agentic-organization/docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md b/agentic-organization/docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md index 23b382cc3c..5462e2b839 100644 --- a/agentic-organization/docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md +++ b/agentic-organization/docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md @@ -55,7 +55,8 @@ The Organization owns: The cluster provides: -- CockroachDB for authoritative Organization state; +- CockroachDB as the first durable SQL adapter for authoritative + Organization state; - NATS JetStream for event transport, fanout, inboxes, replay, and DLQ; - Temporal TS for durable long-running workflows; - Dapr Actors for hot entity-local coordination; @@ -77,7 +78,7 @@ Runtime host -> command handler registry -> policy check -> domain state transition - -> CockroachDB transaction + -> durable state transaction through the state adapter -> authoritative state -> audit event -> outbox event @@ -163,8 +164,8 @@ calling them. 
| Package | Owns | | ---------------------------------------- | ----------------------------------------------------------------------------------------------- | -| `@agentic-org/state` | Drizzle schema, migrations, repositories, transactions, outbox, inbox, idempotency, leases | -| `@agentic-org/state-cockroach` | CockroachDB implementation of state-store and outbox-source ports, SQL catalogs, migrations | +| `@agentic-org/state` | generic state-store, outbox-source, inbox, idempotency, transaction, and lease ports | +| `@agentic-org/state-cockroach` | first replaceable durable SQL implementation of state-store and outbox-source ports | | `@agentic-org/messaging` | NATS envelope builder, subject builder, JetStream publisher, consumer, DLQ, replay contracts | | `@agentic-org/messaging-nats` | NATS JetStream implementation of the event publisher port, canonical JSON, headers, message IDs | | `@agentic-org/workflows-temporal` | Temporal workflow and activity contracts, task queues, workflow clients | @@ -241,9 +242,10 @@ commands should register a handler; new persistence backends should implement the same store-factory port. State-store ports are async at the application boundary. In-memory -adapters may resolve immediately, but CockroachDB, transactional -outbox, inbox, and lease adapters must be able to perform real I/O -without changing command-handler contracts. +adapters may resolve immediately, but durable SQL, transactional outbox, +inbox, and lease adapters must be able to perform real I/O without +changing command-handler contracts. CockroachDB is the first durable SQL +adapter in the cluster, not an application-layer dependency. ## SOLID Rules @@ -282,10 +284,11 @@ implement it. Runtime hosts bind implementations. All state transitions are event-producing commands. -CockroachDB stores authoritative state, audit, idempotency, and outbox. -NATS JetStream carries event distribution, inboxes, live UI updates, -replayable integration streams, and DLQs. 
Logs, traces, and metrics are -evidence. They are not business truth. +The durable state adapter stores authoritative state, audit, +idempotency, and outbox. In `full-ai-cluster`, the first implementation +is CockroachDB. NATS JetStream carries event distribution, inboxes, live +UI updates, replayable integration streams, and DLQs. Logs, traces, and +metrics are evidence. They are not business truth. ### Canonical Event Envelope @@ -589,7 +592,7 @@ tree. | Adapter package | Cluster dependency | Expected in-cluster target | | --------------------------------- | ------------------------------------------------ | --------------------------------------------------------------------- | -| `@agentic-org/state` | CockroachDB ArgoCD app | `cockroachdb-public.cockroachdb.svc.cluster.local:26257` | +| `@agentic-org/state-cockroach` | CockroachDB ArgoCD app | `cockroachdb-public.cockroachdb.svc.cluster.local:26257` | | `@agentic-org/messaging` | NATS ArgoCD app with JetStream enabled | `nats.nats.svc.cluster.local:4222` | | `@agentic-org/workflows-temporal` | Temporal ArgoCD app | `temporal-frontend.temporal.svc.cluster.local:7233` | | `@agentic-org/actors-dapr` | Dapr control plane | Dapr sidecar plus `dapr-system` placement service | @@ -678,7 +681,8 @@ They should get the narrowest network policy and credential scope first. Current cluster readiness: -- CockroachDB exists as the distributed SQL substrate. +- CockroachDB exists as the first distributed SQL substrate. It is + consumed only through the durable state adapter boundary. - NATS exists with JetStream enabled and Longhorn-backed file storage. - Temporal and Dapr are present, but their Organization-specific persistence/components still need wiring. @@ -700,11 +704,11 @@ each substrate becomes live. 
The same package architecture should run in three modes: -| Mode | Purpose | Runtime adapters | -| ----------------- | ------------------------------------------ | ------------------------------------------------------------------------------- | -| unit/test | package and command tests | in-memory/fake adapters | -| local dev cluster | k3d/K3S parity with `full-ai-cluster` apps | real NATS/Cockroach when available, fake Hermes/hat-system if needed | -| full cluster | production-like AI cluster | real CockroachDB, NATS, Hindsight, Hermes, OpenZiti, hat-system, Temporal, Dapr | +| Mode | Purpose | Runtime adapters | +| ----------------- | ------------------------------------------ | ----------------------------------------------------------------------------------------------- | +| unit/test | package and command tests | in-memory/fake adapters | +| local dev cluster | k3d/K3S parity with `full-ai-cluster` apps | real NATS/durable SQL when available, fake Hermes/hat-system if needed | +| full cluster | production-like AI cluster | CockroachDB-backed state adapter, NATS, Hindsight, Hermes, OpenZiti, hat-system, Temporal, Dapr | Do not create a Docker Compose architecture that diverges from `full-ai-cluster`. Local development can use fakes or a dev cluster, but @@ -781,8 +785,8 @@ other work. - `@agentic-org/ui-projections`. 2. Implement the canonical command context, event envelope, typed enums, and idempotency key builder. -3. Implement the first CockroachDB schema and Drizzle migrations for the - V0 executable contract. +3. Implement the first durable SQL schema and migrations for the V0 + executable contract, using CockroachDB as the initial adapter. 4. 
Implement command handlers for: - send supervisor signal; - triage supervisor signal; diff --git a/agentic-organization/docs/V0_EXECUTABLE_CONTRACT.md b/agentic-organization/docs/V0_EXECUTABLE_CONTRACT.md index 43fd25c498..957f6b87c3 100644 --- a/agentic-organization/docs/V0_EXECUTABLE_CONTRACT.md +++ b/agentic-organization/docs/V0_EXECUTABLE_CONTRACT.md @@ -18,20 +18,20 @@ not a parallel substrate. The current `origin/main` cluster shape gives V0 these host primitives: -| Cluster component | V0 use | -|---|---| -| K3S + ArgoCD App-of-Apps | deploy Agentic Organization as a future `full-ai-cluster/k8s/applications/agentic-organization/` application | -| Cilium + Hubble | pod networking, L7 policy, flow observability, and service-mesh behavior without Istio | -| cert-manager, Vault, SPIRE, Trust Manager, External Secrets | workload identity, TLS trust, and secret delivery | -| CockroachDB | authoritative Organization database | -| NATS JetStream | event transport, outbox fanout, live UI updates, and replayable integration streams | -| Temporal TS | durable workflows after the native command model is proven | -| Dapr Actors | hot entity coordination after the DB-backed service contract is proven | -| Hindsight | Hermes memory backend, wrapped with Organization attribution and scope | -| Hermes | agent runtime that performs the work | -| OZ/OpenZiti | zero-trust transport, not the Organization business orchestrator | -| hat-system | Kubernetes hat enforcement/projection surface using Hat, HatBinding, HatSwap, and HatPolicy CRDs | -| Loki, Tempo, Alloy, Mimir, kube-prometheus-stack | logs, traces, metrics, dashboards, and audit correlation | +| Cluster component | V0 use | +| ----------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------ | +| K3S + ArgoCD App-of-Apps | deploy Agentic Organization as a future 
`full-ai-cluster/k8s/applications/agentic-organization/` application | +| Cilium + Hubble | pod networking, L7 policy, flow observability, and service-mesh behavior without Istio | +| cert-manager, Vault, SPIRE, Trust Manager, External Secrets | workload identity, TLS trust, and secret delivery | +| CockroachDB | first durable SQL adapter for the authoritative Organization database boundary | +| NATS JetStream | event transport, outbox fanout, live UI updates, and replayable integration streams | +| Temporal TS | durable workflows after the native command model is proven | +| Dapr Actors | hot entity coordination after the DB-backed service contract is proven | +| Hindsight | Hermes memory backend, wrapped with Organization attribution and scope | +| Hermes | agent runtime that performs the work | +| OZ/OpenZiti | zero-trust transport, not the Organization business orchestrator | +| hat-system | Kubernetes hat enforcement/projection surface using Hat, HatBinding, HatSwap, and HatPolicy CRDs | +| Loki, Tempo, Alloy, Mimir, kube-prometheus-stack | logs, traces, metrics, dashboards, and audit correlation | Sync-wave implication: Agentic Organization is a consumer app. 
It should land after the foundation, data planes, hat-system CRDs, Hindsight, @@ -89,15 +89,15 @@ This is the smallest useful loop because it proves: Keep the first hat set small: -| Hat | V0 reason | -|---|---| -| Director | accepts or rejects escalated supervisor signals or capability requests for V0 scope | -| Engineering Manager | grooms the work item, selects schedule, assigns implementer and reviewer hats | -| Implementer | executes the prompt flow and submits evidence | -| Code Reviewer | reviews the evidence and blocks self-approval | -| Memory Curator | reviews memory writes or flags memory gaps when the run ends | -| Platform Operator | handles runtime failure, pod/session issues, and integration health | -| Security Reviewer | required only when the request needs a new credential or external tool scope | +| Hat | V0 reason | +| ------------------- | ----------------------------------------------------------------------------------- | +| Director | accepts or rejects escalated supervisor signals or capability requests for V0 scope | +| Engineering Manager | grooms the work item, selects schedule, assigns implementer and reviewer hats | +| Implementer | executes the prompt flow and submits evidence | +| Code Reviewer | reviews the evidence and blocks self-approval | +| Memory Curator | reviews memory writes or flags memory gaps when the run ends | +| Platform Operator | handles runtime failure, pod/session issues, and integration health | +| Security Reviewer | required only when the request needs a new credential or external tool scope | The Executive Board, TPM, Product Owner, Architect, QA Reviewer, Hat Designer, and department directors remain first-class in the reference @@ -196,7 +196,7 @@ the native service layer: ```text Temporal workflow or Dapr actor -> Organization command service - -> CockroachDB transaction + -> durable state transaction through the state adapter -> outbox event -> NATS publish -> trace, log, metric @@ -234,7 +234,8 @@ request 
-> ready gate -> hat assignment -> prompt-flow run The demo must show: -- CockroachDB state for the work item and assignment; +- durable state for the work item and assignment, backed by CockroachDB + in the first cluster adapter; - NATS/outbox events for every transition; - a discussion anchor tied to the work item; - a hat token with expiry; diff --git a/agentic-organization/docs/V0_SCHEMA_AND_COMMANDS.md b/agentic-organization/docs/V0_SCHEMA_AND_COMMANDS.md index ebc4de2c6c..e305c3148f 100644 --- a/agentic-organization/docs/V0_SCHEMA_AND_COMMANDS.md +++ b/agentic-organization/docs/V0_SCHEMA_AND_COMMANDS.md @@ -7,27 +7,30 @@ contract for Agentic Organization. It is not a full DDL. It is the shape the domain model, Drizzle migrations, command handlers, MCP tools, workers, and tests should agree on before implementation starts. -CockroachDB is the authoritative store for Organization-owned state. +The durable state adapter is the authoritative store for +Organization-owned state. CockroachDB is the first implementation +because it exists in `full-ai-cluster`, but application code must depend +on generic state ports so another database can replace it later. Temporal history, Dapr actor state, NATS streams, Hindsight memory, and hat-system CRDs are runtime surfaces or projections. They do not replace -the Organization database. +the Organization database boundary. 
## Global Columns Every authoritative table should include: -| Column | Purpose | -|---|---| -| `id` | stable unique ID | -| `organization_id` | future multi-org partition key, even if V0 uses one org | -| `created_at` | creation time | -| `updated_at` | last mutation time | -| `version` | optimistic concurrency and projection safety | -| `created_by_agent_id` | agent that caused the write, when applicable | -| `created_by_hat_assignment_id` | hat authority that caused the write, when applicable | -| `correlation_id` | end-to-end request/run correlation | -| `causation_id` | direct parent command, event, tool call, or workflow step | -| `trace_id` | observability trace link | +| Column | Purpose | +| ------------------------------ | --------------------------------------------------------- | +| `id` | stable unique ID | +| `organization_id` | future multi-org partition key, even if V0 uses one org | +| `created_at` | creation time | +| `updated_at` | last mutation time | +| `version` | optimistic concurrency and projection safety | +| `created_by_agent_id` | agent that caused the write, when applicable | +| `created_by_hat_assignment_id` | hat authority that caused the write, when applicable | +| `correlation_id` | end-to-end request/run correlation | +| `causation_id` | direct parent command, event, tool call, or workflow step | +| `trace_id` | observability trace link | Append-only records should also carry `sequence` when replay order matters. @@ -36,81 +39,81 @@ matters. 
### Identity and Hats -| Table | V0 responsibility | -|---|---| -| `agents` | known Hermes-capable agents and their stable identity | -| `agent_sessions` | live or historical Hermes sessions bound to an agent | -| `departments` | first department containers for ownership and review routing | -| `hat_definitions` | Organization-owned hat catalog | -| `hat_authority_rules` | typed permissions, scopes, and policy metadata for a hat | -| `hat_skill_bindings` | skills and prompt-flow availability attached to a hat | -| `hat_supply_policies` | max concurrency, TTL, cooldown, warmup, and assignment rules | -| `hat_assignments` | time-bounded wearer assignment for a specific agent/session | -| `hat_tokens` | short-lived JWT issuance, refresh, revocation, and expiry state | +| Table | V0 responsibility | +| ------------------------ | --------------------------------------------------------------------------- | +| `agents` | known Hermes-capable agents and their stable identity | +| `agent_sessions` | live or historical Hermes sessions bound to an agent | +| `departments` | first department containers for ownership and review routing | +| `hat_definitions` | Organization-owned hat catalog | +| `hat_authority_rules` | typed permissions, scopes, and policy metadata for a hat | +| `hat_skill_bindings` | skills and prompt-flow availability attached to a hat | +| `hat_supply_policies` | max concurrency, TTL, cooldown, warmup, and assignment rules | +| `hat_assignments` | time-bounded wearer assignment for a specific agent/session | +| `hat_tokens` | short-lived JWT issuance, refresh, revocation, and expiry state | | `hat_system_projections` | last observed Hat, HatBinding, HatSwap, and HatPolicy state from Kubernetes | ### Work Management -| Table | V0 responsibility | -|---|---| -| `projects` | top-level work containers | -| `initiatives` | project-scoped bodies of work | -| `work_items` | supervisor-chain signals, capability requests, tasks, defects, reviews, and follow-ups | 
-| `work_item_state_history` | append-only state transitions | -| `work_item_dependencies` | blocking or informational dependencies | -| `blockers` | active impediments with owner, severity, and resolution path | -| `assignments` | work item to agent/hat/session assignment records | -| `gates` | required review points for readiness, code, QA, memory, or security | -| `gate_decisions` | approve, reject, needs-changes, or defer decisions | -| `releases` | release groupings once release management enters the slice | +| Table | V0 responsibility | +| ------------------------- | -------------------------------------------------------------------------------------- | +| `projects` | top-level work containers | +| `initiatives` | project-scoped bodies of work | +| `work_items` | supervisor-chain signals, capability requests, tasks, defects, reviews, and follow-ups | +| `work_item_state_history` | append-only state transitions | +| `work_item_dependencies` | blocking or informational dependencies | +| `blockers` | active impediments with owner, severity, and resolution path | +| `assignments` | work item to agent/hat/session assignment records | +| `gates` | required review points for readiness, code, QA, memory, or security | +| `gate_decisions` | approve, reject, needs-changes, or defer decisions | +| `releases` | release groupings once release management enters the slice | ### Schedules, Prompt Flows, and Actions -| Table | V0 responsibility | -|---|---| -| `hat_schedule_templates` | default work rhythm by hat | -| `work_schedules` | concrete schedule assigned to an agent/hat context | -| `work_schedule_blocks` | free time, prioritized work, review, reflection, or meeting blocks | -| `prompt_flow_definitions` | named deterministic work protocols | -| `prompt_flow_versions` | immutable versioned prompt-flow contract | -| `prompt_flow_phases` | ordered reusable phases | -| `hat_prompt_flow_bindings` | which hats can run which prompt flows | -| `prompt_flow_runs` | one 
execution of a prompt-flow version | -| `prompt_flow_phase_runs` | state and evidence for each phase execution | -| `prompt_flow_gate_decisions` | reviewer decisions at phase boundaries | -| `universal_action_definitions` | typed action grammar catalog | -| `universal_action_records` | action intent emitted by an agent or workflow | -| `universal_action_observations` | observed result, evidence, and side effects for an action | +| Table | V0 responsibility | +| ------------------------------- | ------------------------------------------------------------------ | +| `hat_schedule_templates` | default work rhythm by hat | +| `work_schedules` | concrete schedule assigned to an agent/hat context | +| `work_schedule_blocks` | free time, prioritized work, review, reflection, or meeting blocks | +| `prompt_flow_definitions` | named deterministic work protocols | +| `prompt_flow_versions` | immutable versioned prompt-flow contract | +| `prompt_flow_phases` | ordered reusable phases | +| `hat_prompt_flow_bindings` | which hats can run which prompt flows | +| `prompt_flow_runs` | one execution of a prompt-flow version | +| `prompt_flow_phase_runs` | state and evidence for each phase execution | +| `prompt_flow_gate_decisions` | reviewer decisions at phase boundaries | +| `universal_action_definitions` | typed action grammar catalog | +| `universal_action_records` | action intent emitted by an agent or workflow | +| `universal_action_observations` | observed result, evidence, and side effects for an action | ### Communication, Graph, Documents, and Context -| Table | V0 responsibility | -|---|---| -| `supervisor_signals` | supervisor-chain and capability-request intake records before or during work-item routing | -| `discussion_anchors` | required work/project/initiative/task anchor for any discussion | -| `conversation_threads` | one-on-one, team, department, executive, or broadcast thread | -| `messages` | immutable message log with actor and hat attribution | -| `meetings` 
| structured meeting sessions with mode and anchor | -| `decisions` | explicit decisions linked to work and evidence | -| `documents` | BRDs, CAs, ADRs, reports, test cases, runbooks, and memory reviews | -| `artifact_links` | logs, screenshots, traces, code refs, PRs, builds, and uploads | -| `graph_nodes` | agent-readable graph node registry | -| `graph_edges` | typed relationships between work, docs, messages, decisions, runs, and memories | -| `context_packs` | deterministic context bundles assembled for an agent run or review | +| Table | V0 responsibility | +| ---------------------- | ----------------------------------------------------------------------------------------- | +| `supervisor_signals` | supervisor-chain and capability-request intake records before or during work-item routing | +| `discussion_anchors` | required work/project/initiative/task anchor for any discussion | +| `conversation_threads` | one-on-one, team, department, executive, or broadcast thread | +| `messages` | immutable message log with actor and hat attribution | +| `meetings` | structured meeting sessions with mode and anchor | +| `decisions` | explicit decisions linked to work and evidence | +| `documents` | BRDs, CAs, ADRs, reports, test cases, runbooks, and memory reviews | +| `artifact_links` | logs, screenshots, traces, code refs, PRs, builds, and uploads | +| `graph_nodes` | agent-readable graph node registry | +| `graph_edges` | typed relationships between work, docs, messages, decisions, runs, and memories | +| `context_packs` | deterministic context bundles assembled for an agent run or review | ### Runtime, Memory, Security, and Audit -| Table | V0 responsibility | -|---|---| -| `hermes_runs` | Organization binding to a Hermes execution session | -| `mcp_tool_calls` | governed tool call attempts and results | -| `memory_events` | Hindsight recall, retain, reflect, and review attribution | -| `credential_requests` | requests to expand credential proxy or external tool 
scope | -| `signals` | durable internal signals consumed by workers and UI read models | -| `audit_events` | append-only policy and state-change audit trail | -| `outbox_events` | transactional event publication source for NATS | -| `runtime_leases` | scheduler, reconciler, and worker leases | -| `idempotency_keys` | command deduplication records | +| Table | V0 responsibility | +| --------------------- | --------------------------------------------------------------- | +| `hermes_runs` | Organization binding to a Hermes execution session | +| `mcp_tool_calls` | governed tool call attempts and results | +| `memory_events` | Hindsight recall, retain, reflect, and review attribution | +| `credential_requests` | requests to expand credential proxy or external tool scope | +| `signals` | durable internal signals consumed by workers and UI read models | +| `audit_events` | append-only policy and state-change audit trail | +| `outbox_events` | transactional event publication source for NATS | +| `runtime_leases` | scheduler, reconciler, and worker leases | +| `idempotency_keys` | command deduplication records | ## V0 Enums @@ -258,7 +261,7 @@ Every side-effecting command must include: Every command handler must: -1. load authoritative state from CockroachDB; +1. load authoritative state through the state-store port; 2. validate actor context and hat authority; 3. validate lifecycle transition; 4. 
write state, audit event, and outbox event in one transaction; @@ -267,29 +270,29 @@ Every command handler must: ## V0 Commands -| Command | Actor scope | Writes | Emits | -|---|---|---|---| -| `send_supervisor_signal` | any authorized hat with supervisor line; capability request inputs enter through this path | `supervisor_signals`, `work_items`, `discussion_anchors`, `graph_nodes`, `audit_events`, `outbox_events` | `supervisor_signal_sent`, `work_item_changed` | -| `triage_supervisor_signal` | target supervisor hat, director, or engineering manager | `supervisor_signals`, `work_items`, `assignments`, `gates`, `context_packs` | `supervisor_signal_triaged`, `work_item_changed`, `gate_requested` | -| `create_discussion_anchor` | any authorized hat | `discussion_anchors`, `graph_edges` | `work_item_changed` | -| `create_context_pack` | manager, reviewer, implementer for assigned work | `context_packs`, `graph_edges`, `audit_events` | `work_item_changed` | -| `mark_work_ready` | manager or reviewer | `work_items`, `work_item_state_history`, `gates` | `work_item_changed`, `gate_requested` | -| `reserve_hat` | manager, director, platform operator | `hat_assignments`, `hat_tokens`, `audit_events` | `hat_assignment_changed` | -| `issue_hat_token` | hat service, after policy allow | `hat_tokens`, `audit_events` | `hat_token_changed` | -| `refresh_hat_token` | active assigned agent/session | `hat_tokens`, `audit_events` | `hat_token_changed` | -| `revoke_hat_assignment` | manager, director, security, policy automation | `hat_assignments`, `hat_tokens`, `audit_events` | `hat_assignment_changed`, `hat_token_changed` | -| `start_schedule_block` | assigned agent/session or scheduler | `work_schedule_blocks`, `agent_sessions` | `schedule_block_changed` | -| `start_prompt_flow` | assigned agent/session | `prompt_flow_runs`, `prompt_flow_phase_runs` | `prompt_flow_changed` | -| `record_universal_action` | assigned agent/session, workflow activity, adapter | 
`universal_action_records`, `mcp_tool_calls`, `audit_events` | `prompt_flow_changed` | -| `record_action_observation` | adapter, worker, reviewer, assigned agent | `universal_action_observations`, `artifact_links` | `prompt_flow_changed` | -| `launch_hermes_run` | runtime service or Temporal activity | `hermes_runs`, `agent_sessions`, `audit_events` | `hermes_run_changed` | -| `record_hermes_run_status` | Hermes/OZ callback, reconciler, platform operator | `hermes_runs`, `artifact_links` | `hermes_run_changed` | -| `submit_evidence` | implementer, QA, reviewer, adapter | `artifact_links`, `graph_edges`, `audit_events` | `work_item_changed` | -| `request_gate_review` | implementer, manager, workflow | `gates`, `work_items` | `gate_requested`, `work_item_changed` | -| `decide_gate` | reviewer hat, not same active implementer assignment | `gate_decisions`, `gates`, `work_items`, `audit_events` | `gate_decided`, `work_item_changed` | -| `record_memory_event` | memory adapter, assigned agent/session, memory curator | `memory_events`, `graph_edges`, `audit_events` | `memory_event_recorded` | -| `submit_credential_request` | any authorized hat with anchored work | `credential_requests`, `work_items`, `discussion_anchors` | `credential_request_changed` | -| `complete_outcome_review` | manager, memory curator, reviewer | `work_items`, `decisions`, optional follow-up `work_items` | `outcome_review_completed` | +| Command | Actor scope | Writes | Emits | +| --------------------------- | ------------------------------------------------------------------------------------------ | -------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------ | +| `send_supervisor_signal` | any authorized hat with supervisor line; capability request inputs enter through this path | `supervisor_signals`, `work_items`, `discussion_anchors`, `graph_nodes`, `audit_events`, `outbox_events` | 
`supervisor_signal_sent`, `work_item_changed` | +| `triage_supervisor_signal` | target supervisor hat, director, or engineering manager | `supervisor_signals`, `work_items`, `assignments`, `gates`, `context_packs` | `supervisor_signal_triaged`, `work_item_changed`, `gate_requested` | +| `create_discussion_anchor` | any authorized hat | `discussion_anchors`, `graph_edges` | `work_item_changed` | +| `create_context_pack` | manager, reviewer, implementer for assigned work | `context_packs`, `graph_edges`, `audit_events` | `work_item_changed` | +| `mark_work_ready` | manager or reviewer | `work_items`, `work_item_state_history`, `gates` | `work_item_changed`, `gate_requested` | +| `reserve_hat` | manager, director, platform operator | `hat_assignments`, `hat_tokens`, `audit_events` | `hat_assignment_changed` | +| `issue_hat_token` | hat service, after policy allow | `hat_tokens`, `audit_events` | `hat_token_changed` | +| `refresh_hat_token` | active assigned agent/session | `hat_tokens`, `audit_events` | `hat_token_changed` | +| `revoke_hat_assignment` | manager, director, security, policy automation | `hat_assignments`, `hat_tokens`, `audit_events` | `hat_assignment_changed`, `hat_token_changed` | +| `start_schedule_block` | assigned agent/session or scheduler | `work_schedule_blocks`, `agent_sessions` | `schedule_block_changed` | +| `start_prompt_flow` | assigned agent/session | `prompt_flow_runs`, `prompt_flow_phase_runs` | `prompt_flow_changed` | +| `record_universal_action` | assigned agent/session, workflow activity, adapter | `universal_action_records`, `mcp_tool_calls`, `audit_events` | `prompt_flow_changed` | +| `record_action_observation` | adapter, worker, reviewer, assigned agent | `universal_action_observations`, `artifact_links` | `prompt_flow_changed` | +| `launch_hermes_run` | runtime service or Temporal activity | `hermes_runs`, `agent_sessions`, `audit_events` | `hermes_run_changed` | +| `record_hermes_run_status` | Hermes/OZ callback, reconciler, 
platform operator | `hermes_runs`, `artifact_links` | `hermes_run_changed` | +| `submit_evidence` | implementer, QA, reviewer, adapter | `artifact_links`, `graph_edges`, `audit_events` | `work_item_changed` | +| `request_gate_review` | implementer, manager, workflow | `gates`, `work_items` | `gate_requested`, `work_item_changed` | +| `decide_gate` | reviewer hat, not same active implementer assignment | `gate_decisions`, `gates`, `work_items`, `audit_events` | `gate_decided`, `work_item_changed` | +| `record_memory_event` | memory adapter, assigned agent/session, memory curator | `memory_events`, `graph_edges`, `audit_events` | `memory_event_recorded` | +| `submit_credential_request` | any authorized hat with anchored work | `credential_requests`, `work_items`, `discussion_anchors` | `credential_request_changed` | +| `complete_outcome_review` | manager, memory curator, reviewer | `work_items`, `decisions`, optional follow-up `work_items` | `outcome_review_completed` | ## Idempotency @@ -321,9 +324,10 @@ idempotency conflict. ## Outbox and NATS -CockroachDB transactions should write domain state and `outbox_events` -together. A worker publishes outbox rows to NATS JetStream and marks -them published. +Durable state transactions should write domain state and `outbox_events` +together. The first durable adapter uses CockroachDB, but the command +model only depends on generic state ports. A worker publishes outbox +rows to NATS JetStream and marks them published. Subject shape: diff --git a/agentic-organization/packages/README.md b/agentic-organization/packages/README.md index fd0bbea7dc..b3d8a6ed20 100644 --- a/agentic-organization/packages/README.md +++ b/agentic-organization/packages/README.md @@ -11,8 +11,8 @@ host, or Kubernetes deployment is introduced. 
| ----------------- | -------------------------------------------------------------------------------------------------------------------------- | | `domain` | typed command names, event names, aggregate names, work item state machine, event envelope, shared records | | `application` | command pipeline, handler registry, idempotency handling, state-store ports, first supervisor-chain signal command handler | -| `state` | in-memory Organization state-store factory used as the first repository port fake | -| `state-cockroach` | CockroachDB state-store/outbox-source contracts, SQL statement catalogs, and first core-state migration skeleton | +| `state` | generic state-store and outbox-source ports plus the in-memory Organization state-store factory fake | +| `state-cockroach` | first replaceable durable SQL adapter for the state-store/outbox-source ports, backed by CockroachDB | | `messaging` | NATS subject contract, outbox publisher port, event publisher port, and domain resolver without a live NATS dependency | | `messaging-nats` | NATS JetStream event publisher adapter contract with canonical JSON payloads, headers, and message IDs | | `observability` | LGTM/OpenTelemetry attribute projection from Agentic event envelopes | @@ -38,7 +38,10 @@ supervisor-chain signal command CockroachDB, JetStream publishing, Temporal, Dapr, Hermes, Hindsight, and the hat-system CRDs come next as adapters behind these contracts. They should not redefine command names, event names, state names, -correlation fields, or policy authority. +correlation fields, or policy authority. CockroachDB is the first +durable state adapter because it exists in `full-ai-cluster`; application +and messaging code must remain database-agnostic so a later durable +store can replace it behind the same ports. The application package must not construct concrete state adapters. 
Runtime hosts and tests provide a `CommandStateStoreFactory`; the state @@ -46,16 +49,17 @@ package implements the current in-memory factory. Command routing uses a handler registry so new commands add handlers instead of editing a central `switch` or `if` dispatcher. -`CommandStateStore` is async even when backed by the in-memory fake. The -real CockroachDB adapter must not be squeezed into a synchronous toy -shape. +`CommandStateStore` and `OutboxEventSource` are async even when backed +by in-memory fakes. Durable adapters must not be squeezed into a +synchronous toy shape. The outbox publisher owns the generic publish loop: claim unpublished -outbox rows, resolve a typed Organization domain, publish through an -`EventPublisher` port, and mark the outbox row published only after the -publish returns successfully. The NATS package implements that port and -is the only package in this slice that knows about NATS headers, -message IDs, and JSON transport payloads. +outbox rows from the generic state port, resolve a typed Organization +domain, publish through an `EventPublisher` port, and mark the outbox +row published only after the publish returns successfully. The NATS +package implements that publisher port and is the only package in this +slice that knows about NATS headers, message IDs, and JSON transport +payloads. State adapters must not import messaging adapters. 
## Validation diff --git a/agentic-organization/packages/governance/src/package-dependency-boundaries.test.ts b/agentic-organization/packages/governance/src/package-dependency-boundaries.test.ts index b9ae9bdd1e..499b70d246 100644 --- a/agentic-organization/packages/governance/src/package-dependency-boundaries.test.ts +++ b/agentic-organization/packages/governance/src/package-dependency-boundaries.test.ts @@ -30,6 +30,11 @@ describe("package dependency boundaries", () => { sourceGlob: "messaging/src/**/*.ts", forbiddenImportFragments: ["../messaging-nats", "../../messaging-nats", "nats"], }, + { + packageName: PackageBoundaryRule.StateAdapter, + sourceGlob: "state-cockroach/src/**/*.ts", + forbiddenImportFragments: ["../../messaging", "../messaging", "nats", "jetstream"], + }, ], }); diff --git a/agentic-organization/packages/governance/src/package-dependency-boundaries.ts b/agentic-organization/packages/governance/src/package-dependency-boundaries.ts index 08bfb7386a..eb7a82d16c 100644 --- a/agentic-organization/packages/governance/src/package-dependency-boundaries.ts +++ b/agentic-organization/packages/governance/src/package-dependency-boundaries.ts @@ -5,6 +5,7 @@ import { fileURLToPath } from "node:url"; export const PackageBoundaryRule = { Application: "application", Messaging: "messaging", + StateAdapter: "state_adapter", } as const; export type PackageBoundaryRule = (typeof PackageBoundaryRule)[keyof typeof PackageBoundaryRule]; diff --git a/agentic-organization/packages/messaging/src/index.ts b/agentic-organization/packages/messaging/src/index.ts index 2cb19ab88e..87b9ed3716 100644 --- a/agentic-organization/packages/messaging/src/index.ts +++ b/agentic-organization/packages/messaging/src/index.ts @@ -4,12 +4,9 @@ export { OutboxPublishOutcomeStatus, createOutboxPublisher, resolveAgenticMessagingDomain, - type ClaimUnpublishedOutboxEventsInput, type CreateOutboxPublisherInput, type EventPublication, type EventPublisher, - type MarkOutboxEventPublishedInput, 
- type OutboxEventSource, type OutboxPublishBatchResult, type OutboxPublisher, type ResolveAgenticMessagingDomain, diff --git a/agentic-organization/packages/messaging/src/outbox-publisher.test.ts b/agentic-organization/packages/messaging/src/outbox-publisher.test.ts index 4de151fdc3..d47415bd54 100644 --- a/agentic-organization/packages/messaging/src/outbox-publisher.test.ts +++ b/agentic-organization/packages/messaging/src/outbox-publisher.test.ts @@ -7,6 +7,7 @@ import { createAgenticEventEnvelope, type OutboxEvent, } from "../../domain/src/index.ts"; +import type { OutboxEventSource } from "../../state/src/index.ts"; import { AgenticMessagingDomain, OutboxPublishOutcomeStatus, @@ -14,7 +15,6 @@ import { resolveAgenticMessagingDomain, type EventPublication, type EventPublisher, - type OutboxEventSource, } from "./outbox-publisher.ts"; describe("outbox publisher", () => { diff --git a/agentic-organization/packages/messaging/src/outbox-publisher.ts b/agentic-organization/packages/messaging/src/outbox-publisher.ts index 0897a35274..50209ff9d3 100644 --- a/agentic-organization/packages/messaging/src/outbox-publisher.ts +++ b/agentic-organization/packages/messaging/src/outbox-publisher.ts @@ -1,4 +1,5 @@ import { AgenticEventType, type OutboxEvent } from "../../domain/src/index.ts"; +import type { ClaimUnpublishedOutboxEventsInput, OutboxEventSource } from "../../state/src/index.ts"; import { buildAgenticEventSubject } from "./subject-builder.ts"; export const AgenticMessagingDomain = { @@ -15,20 +16,6 @@ export const OutboxPublishOutcomeStatus = { export type OutboxPublishOutcomeStatus = (typeof OutboxPublishOutcomeStatus)[keyof typeof OutboxPublishOutcomeStatus]; -export type ClaimUnpublishedOutboxEventsInput = { - batchSize: number; -}; - -export type MarkOutboxEventPublishedInput = { - outboxEventId: string; - publishedAt: string; -}; - -export type OutboxEventSource = { - claimUnpublishedOutboxEvents: (input: ClaimUnpublishedOutboxEventsInput) => Promise; - 
markOutboxEventPublished: (input: MarkOutboxEventPublishedInput) => Promise; -}; - export type EventPublication = { subject: string; outboxEvent: OutboxEvent; diff --git a/agentic-organization/packages/state-cockroach/src/cockroach-outbox-event-source.ts b/agentic-organization/packages/state-cockroach/src/cockroach-outbox-event-source.ts index 28b74733ba..9234b21277 100644 --- a/agentic-organization/packages/state-cockroach/src/cockroach-outbox-event-source.ts +++ b/agentic-organization/packages/state-cockroach/src/cockroach-outbox-event-source.ts @@ -1,5 +1,5 @@ import type { AgenticEventEnvelope } from "../../domain/src/index.ts"; -import type { OutboxEventSource } from "../../messaging/src/index.ts"; +import type { OutboxEventSource } from "../../state/src/index.ts"; import { CockroachTableName } from "./cockroach-schema.ts"; export const CockroachOutboxEventSourceStatement = { diff --git a/agentic-organization/packages/state/src/index.ts b/agentic-organization/packages/state/src/index.ts index 8ab8d1eff9..1b1611aa83 100644 --- a/agentic-organization/packages/state/src/index.ts +++ b/agentic-organization/packages/state/src/index.ts @@ -3,3 +3,8 @@ export { type InMemoryOrganizationStoreFactory, type InMemoryOrganizationStoreSnapshot, } from "./in-memory-organization-store.ts"; +export type { + ClaimUnpublishedOutboxEventsInput, + MarkOutboxEventPublishedInput, + OutboxEventSource, +} from "./outbox-event-source.ts"; diff --git a/agentic-organization/packages/state/src/outbox-event-source.ts b/agentic-organization/packages/state/src/outbox-event-source.ts new file mode 100644 index 0000000000..6125bed5f4 --- /dev/null +++ b/agentic-organization/packages/state/src/outbox-event-source.ts @@ -0,0 +1,15 @@ +import type { OutboxEvent } from "../../domain/src/index.ts"; + +export type ClaimUnpublishedOutboxEventsInput = { + batchSize: number; +}; + +export type MarkOutboxEventPublishedInput = { + outboxEventId: string; + publishedAt: string; +}; + +export type 
OutboxEventSource = { + claimUnpublishedOutboxEvents: (input: ClaimUnpublishedOutboxEventsInput) => Promise; + markOutboxEventPublished: (input: MarkOutboxEventPublishedInput) => Promise; +}; diff --git a/openspec/specs/agentic-organization/spec.md b/openspec/specs/agentic-organization/spec.md index 71f30853f1..a3d752c6bc 100644 --- a/openspec/specs/agentic-organization/spec.md +++ b/openspec/specs/agentic-organization/spec.md @@ -56,21 +56,32 @@ Organization state only by calling Organization commands. concrete state adapters, Cockroach adapters, NestJS, NATS, Dapr, Temporal, Drizzle, Postgres, or other runtime clients - **AND** a violation fails the test suite before the boundary can drift +- **AND** state adapter source files are checked for forbidden imports + of messaging, NATS, JetStream, or other event transport clients ### Requirement: Commands are idempotent Organization commands MUST use deterministic idempotency keys at the command boundary. -#### Scenario: Cockroach core state schema exists +#### Scenario: Durable core state schema exists -- **WHEN** the first CockroachDB migration contract is loaded +- **WHEN** the first durable state migration contract is loaded - **THEN** it declares work item, supervisor signal, audit event, outbox event, and idempotency record tables - **AND** outbox rows include trace ID, correlation ID, and canonical envelope JSON fields for later NATS publication and workflow visibility +#### Scenario: Durable state adapter is replaceable + +- **WHEN** application or messaging package source is inspected +- **THEN** it does not import CockroachDB, Drizzle, Postgres, or + database-client packages +- **AND** it depends on generic state and outbox-source ports instead +- **AND** CockroachDB is treated as the first replaceable durable adapter + for the cluster, not as the application model + #### Scenario: Matching replay - **WHEN** a command is submitted twice with the same idempotency key From 
24c30bcd1483f081fb60b998c97ac6e968c256f5 Mon Sep 17 00:00:00 2001 From: Max Chadaev Date: Mon, 25 May 2026 19:43:42 -0400 Subject: [PATCH 11/21] refactor(agentic-org): separate package tests from source Co-Authored-By: Codex --- .../docs/FIRST_IMPLEMENTATION_SLICE.md | 4 + .../docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md | 4 + agentic-organization/package.json | 2 +- agentic-organization/packages/README.md | 7 +- .../{src => test}/command-pipeline.test.ts | 8 +- .../send-supervisor-signal.test.ts | 8 +- .../{src => test}/event-envelope.test.ts | 2 +- .../hat-communication-brief.test.ts | 4 +- .../work-item-state-machine.test.ts | 2 +- .../packages/governance/src/index.ts | 5 + .../src/package-dependency-boundaries.test.ts | 43 ------- .../src/package-dependency-boundaries.ts | 63 +++++++++- .../package-dependency-boundaries.test.ts | 114 ++++++++++++++++++ .../nats-jetstream-event-publisher.test.ts | 2 +- .../{src => test}/outbox-publisher.test.ts | 2 +- .../{src => test}/subject-builder.test.ts | 2 +- .../{src => test}/span-attributes.test.ts | 2 +- .../{src => test}/workflow-visibility.test.ts | 2 +- .../{src => test}/event-automation.test.ts | 7 +- .../cockroach-command-state-store.test.ts | 2 +- .../cockroach-outbox-event-source.test.ts | 2 +- .../{src => test}/cockroach-schema.test.ts | 2 +- openspec/specs/agentic-organization/spec.md | 10 ++ 23 files changed, 231 insertions(+), 68 deletions(-) rename agentic-organization/packages/application/{src => test}/command-pipeline.test.ts (92%) rename agentic-organization/packages/application/{src/handlers => test}/send-supervisor-signal.test.ts (93%) rename agentic-organization/packages/domain/{src => test}/event-envelope.test.ts (98%) rename agentic-organization/packages/domain/{src => test}/hat-communication-brief.test.ts (94%) rename agentic-organization/packages/domain/{src => test}/work-item-state-machine.test.ts (92%) delete mode 100644 
agentic-organization/packages/governance/src/package-dependency-boundaries.test.ts create mode 100644 agentic-organization/packages/governance/test/package-dependency-boundaries.test.ts rename agentic-organization/packages/messaging-nats/{src => test}/nats-jetstream-event-publisher.test.ts (98%) rename agentic-organization/packages/messaging/{src => test}/outbox-publisher.test.ts (99%) rename agentic-organization/packages/messaging/{src => test}/subject-builder.test.ts (88%) rename agentic-organization/packages/observability/{src => test}/span-attributes.test.ts (98%) rename agentic-organization/packages/observability/{src => test}/workflow-visibility.test.ts (99%) rename agentic-organization/packages/runtime/{src => test}/event-automation.test.ts (94%) rename agentic-organization/packages/state-cockroach/{src => test}/cockroach-command-state-store.test.ts (98%) rename agentic-organization/packages/state-cockroach/{src => test}/cockroach-outbox-event-source.test.ts (98%) rename agentic-organization/packages/state-cockroach/{src => test}/cockroach-schema.test.ts (97%) diff --git a/agentic-organization/docs/FIRST_IMPLEMENTATION_SLICE.md b/agentic-organization/docs/FIRST_IMPLEMENTATION_SLICE.md index 9f8e615eba..b250cb2f3b 100644 --- a/agentic-organization/docs/FIRST_IMPLEMENTATION_SLICE.md +++ b/agentic-organization/docs/FIRST_IMPLEMENTATION_SLICE.md @@ -108,6 +108,10 @@ Hermes runs, MCP calls, and UI evidence. - A governance test enforces that the Cockroach state adapter does not import messaging, NATS, or JetStream. Durable state can be swapped without dragging transport concerns into the repository layer. +- A governance test enforces package source layout: production code + lives under `packages//src`, tests live under + `packages//test`, and `*.test.ts` files are rejected from + production source trees. 
- The outbox publisher claims unpublished events, publishes each event through an `EventPublisher` port, and marks rows published only after the publish succeeds. diff --git a/agentic-organization/docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md b/agentic-organization/docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md index 5462e2b839..138699f70a 100644 --- a/agentic-organization/docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md +++ b/agentic-organization/docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md @@ -116,6 +116,10 @@ Rules: - Cross-package imports use public exports only. - No controller, worker entrypoint, Temporal workflow, Dapr actor, or MCP route contains business rules. +- Production source and test source are separated. Package + implementation code lives in `packages//src`; package tests live + in `packages//test`. Governance checks should reject `*.test.ts` + files inside production source trees. ## Package Layers diff --git a/agentic-organization/package.json b/agentic-organization/package.json index 0fd10e913a..3d1157cebb 100644 --- a/agentic-organization/package.json +++ b/agentic-organization/package.json @@ -3,7 +3,7 @@ "private": true, "type": "module", "scripts": { - "test": "node --experimental-strip-types --test packages/**/*.test.ts", + "test": "node --experimental-strip-types --test packages/*/test/**/*.test.ts", "typecheck": "npx --yes -p typescript@6.0.3 tsc -p tsconfig.json" }, "engines": { diff --git a/agentic-organization/packages/README.md b/agentic-organization/packages/README.md index b3d8a6ed20..cb5cb1c1aa 100644 --- a/agentic-organization/packages/README.md +++ b/agentic-organization/packages/README.md @@ -49,6 +49,11 @@ package implements the current in-memory factory. Command routing uses a handler registry so new commands add handlers instead of editing a central `switch` or `if` dispatcher. +Production source and test source are separated by package. Application +code lives under `packages//src`; tests live under +`packages//test`. 
The governance package enforces that `*.test.ts` +files do not land in production `src` trees. + `CommandStateStore` and `OutboxEventSource` are async even when backed by in-memory fakes. Durable adapters must not be squeezed into a synchronous toy shape. @@ -73,7 +78,7 @@ The test command uses Node's built-in test runner and TypeScript type stripping: ```text -node --experimental-strip-types --test packages/**/*.test.ts +node --experimental-strip-types --test packages/*/test/**/*.test.ts ``` This is a deliberate NodeNext starting point so the package contracts diff --git a/agentic-organization/packages/application/src/command-pipeline.test.ts b/agentic-organization/packages/application/test/command-pipeline.test.ts similarity index 92% rename from agentic-organization/packages/application/src/command-pipeline.test.ts rename to agentic-organization/packages/application/test/command-pipeline.test.ts index 48fa57e79a..6774a9923b 100644 --- a/agentic-organization/packages/application/src/command-pipeline.test.ts +++ b/agentic-organization/packages/application/test/command-pipeline.test.ts @@ -3,10 +3,10 @@ import { describe, test } from "node:test"; import { CommandType, SupervisorChainLevel, SupervisorSignalToolType } from "../../domain/src/index.ts"; import { createInMemoryOrganizationStoreFactory } from "../../state/src/index.ts"; -import { createCommandHandlerRegistry } from "./command-handler-registry.ts"; -import { CommandErrorCode, CommandResultStatus, type CommandResult } from "./command-result.ts"; -import { createCommandPipeline, type PipelineCommand } from "./command-pipeline.ts"; -import { createSendSupervisorSignalHandler } from "./handlers/send-supervisor-signal.ts"; +import { createCommandHandlerRegistry } from "../src/command-handler-registry.ts"; +import { CommandErrorCode, CommandResultStatus, type CommandResult } from "../src/command-result.ts"; +import { createCommandPipeline, type PipelineCommand } from "../src/command-pipeline.ts"; +import { 
createSendSupervisorSignalHandler } from "../src/handlers/send-supervisor-signal.ts"; const command: PipelineCommand = { commandId: "cmd-supervisor-signal-001", diff --git a/agentic-organization/packages/application/src/handlers/send-supervisor-signal.test.ts b/agentic-organization/packages/application/test/send-supervisor-signal.test.ts similarity index 93% rename from agentic-organization/packages/application/src/handlers/send-supervisor-signal.test.ts rename to agentic-organization/packages/application/test/send-supervisor-signal.test.ts index 2c9068c182..be6a74d688 100644 --- a/agentic-organization/packages/application/src/handlers/send-supervisor-signal.test.ts +++ b/agentic-organization/packages/application/test/send-supervisor-signal.test.ts @@ -8,10 +8,10 @@ import { SupervisorChainLevel, SupervisorSignalStatus, SupervisorSignalToolType, -} from "../../../domain/src/index.ts"; -import { createInMemoryOrganizationStoreFactory } from "../../../state/src/index.ts"; -import { CommandResultStatus, type CommandResult } from "../command-result.ts"; -import { sendSupervisorSignal, type SendSupervisorSignalCommand } from "./send-supervisor-signal.ts"; +} from "../../domain/src/index.ts"; +import { createInMemoryOrganizationStoreFactory } from "../../state/src/index.ts"; +import { CommandResultStatus, type CommandResult } from "../src/command-result.ts"; +import { sendSupervisorSignal, type SendSupervisorSignalCommand } from "../src/handlers/send-supervisor-signal.ts"; const command: SendSupervisorSignalCommand = { commandId: "cmd-supervisor-signal-001", diff --git a/agentic-organization/packages/domain/src/event-envelope.test.ts b/agentic-organization/packages/domain/test/event-envelope.test.ts similarity index 98% rename from agentic-organization/packages/domain/src/event-envelope.test.ts rename to agentic-organization/packages/domain/test/event-envelope.test.ts index 8bea403c2b..08403cc21e 100644 --- 
a/agentic-organization/packages/domain/src/event-envelope.test.ts +++ b/agentic-organization/packages/domain/test/event-envelope.test.ts @@ -6,7 +6,7 @@ import { AgenticEventType, createAgenticEventEnvelope, type CommandTrace, -} from "./event-envelope.ts"; +} from "../src/event-envelope.ts"; const commandTrace: CommandTrace = { commandId: "cmd-capability-001", diff --git a/agentic-organization/packages/domain/src/hat-communication-brief.test.ts b/agentic-organization/packages/domain/test/hat-communication-brief.test.ts similarity index 94% rename from agentic-organization/packages/domain/src/hat-communication-brief.test.ts rename to agentic-organization/packages/domain/test/hat-communication-brief.test.ts index 093a9449fa..705142b3a6 100644 --- a/agentic-organization/packages/domain/src/hat-communication-brief.test.ts +++ b/agentic-organization/packages/domain/test/hat-communication-brief.test.ts @@ -1,8 +1,8 @@ import { deepEqual, equal } from "node:assert/strict"; import { describe, test } from "node:test"; -import { DefaultTeamMemberSupervisorTools, buildHatCommunicationBrief } from "./hat-communication-brief.ts"; -import { SupervisorChainLevel, SupervisorSignalToolType } from "./supervisor-communication.ts"; +import { DefaultTeamMemberSupervisorTools, buildHatCommunicationBrief } from "../src/hat-communication-brief.ts"; +import { SupervisorChainLevel, SupervisorSignalToolType } from "../src/supervisor-communication.ts"; describe("hat communication brief", () => { test("explains duty, supervisor line, and efficient upward tools", () => { diff --git a/agentic-organization/packages/domain/src/work-item-state-machine.test.ts b/agentic-organization/packages/domain/test/work-item-state-machine.test.ts similarity index 92% rename from agentic-organization/packages/domain/src/work-item-state-machine.test.ts rename to agentic-organization/packages/domain/test/work-item-state-machine.test.ts index d7a7987f3b..7799f79e57 100644 --- 
a/agentic-organization/packages/domain/src/work-item-state-machine.test.ts +++ b/agentic-organization/packages/domain/test/work-item-state-machine.test.ts @@ -1,7 +1,7 @@ import { equal, throws } from "node:assert/strict"; import { describe, test } from "node:test"; -import { WorkItemState, assertWorkItemTransition, createInitialWorkItemState } from "./work-item-state-machine.ts"; +import { WorkItemState, assertWorkItemTransition, createInitialWorkItemState } from "../src/work-item-state-machine.ts"; describe("work item state machine", () => { test("new work starts in the typed new state", () => { diff --git a/agentic-organization/packages/governance/src/index.ts b/agentic-organization/packages/governance/src/index.ts index 4e0ff163ae..dfb46ca73c 100644 --- a/agentic-organization/packages/governance/src/index.ts +++ b/agentic-organization/packages/governance/src/index.ts @@ -1,7 +1,12 @@ export { PackageBoundaryRule, + PackageSourceLayoutViolationReason, validatePackageDependencyBoundaries, + validatePackageSourceLayout, type PackageDependencyBoundaryRule, type PackageDependencyBoundaryViolation, + type PackageSourceLayoutRule, + type PackageSourceLayoutViolation, type ValidatePackageDependencyBoundariesInput, + type ValidatePackageSourceLayoutInput, } from "./package-dependency-boundaries.ts"; diff --git a/agentic-organization/packages/governance/src/package-dependency-boundaries.test.ts b/agentic-organization/packages/governance/src/package-dependency-boundaries.test.ts deleted file mode 100644 index 499b70d246..0000000000 --- a/agentic-organization/packages/governance/src/package-dependency-boundaries.test.ts +++ /dev/null @@ -1,43 +0,0 @@ -import { equal } from "node:assert/strict"; -import { describe, test } from "node:test"; - -import { PackageBoundaryRule, validatePackageDependencyBoundaries } from "./package-dependency-boundaries.ts"; - -describe("package dependency boundaries", () => { - test("keeps application independent from state and runtime adapters", 
async () => { - const violations = await validatePackageDependencyBoundaries({ - rootDirectory: new URL("../..", import.meta.url), - rules: [ - { - packageName: PackageBoundaryRule.Application, - sourceGlob: "application/src/**/*.ts", - forbiddenImportFragments: [ - "../../state", - "../../../state", - "state-cockroach", - "nestjs", - "@nestjs", - "nats", - "dapr", - "temporal", - "drizzle", - "pg", - "postgres", - ], - }, - { - packageName: PackageBoundaryRule.Messaging, - sourceGlob: "messaging/src/**/*.ts", - forbiddenImportFragments: ["../messaging-nats", "../../messaging-nats", "nats"], - }, - { - packageName: PackageBoundaryRule.StateAdapter, - sourceGlob: "state-cockroach/src/**/*.ts", - forbiddenImportFragments: ["../../messaging", "../messaging", "nats", "jetstream"], - }, - ], - }); - - equal(violations.length, 0, violations.map((violation) => violation.message).join("\n")); - }); -}); diff --git a/agentic-organization/packages/governance/src/package-dependency-boundaries.ts b/agentic-organization/packages/governance/src/package-dependency-boundaries.ts index eb7a82d16c..52fa4f343f 100644 --- a/agentic-organization/packages/governance/src/package-dependency-boundaries.ts +++ b/agentic-organization/packages/governance/src/package-dependency-boundaries.ts @@ -5,6 +5,7 @@ import { fileURLToPath } from "node:url"; export const PackageBoundaryRule = { Application: "application", Messaging: "messaging", + ProductionSource: "production_source", StateAdapter: "state_adapter", } as const; @@ -23,11 +24,37 @@ export type PackageDependencyBoundaryViolation = { message: string; }; +export const PackageSourceLayoutViolationReason = { + TestFileInProductionSource: "test_file_in_production_source", +} as const; + +export type PackageSourceLayoutViolationReason = + (typeof PackageSourceLayoutViolationReason)[keyof typeof PackageSourceLayoutViolationReason]; + +export type PackageSourceLayoutRule = { + packageName: PackageBoundaryRule; + sourceGlob: string; + 
forbiddenFileSuffix: string; + reason: PackageSourceLayoutViolationReason; +}; + +export type PackageSourceLayoutViolation = { + packageName: PackageBoundaryRule; + filePath: string; + reason: PackageSourceLayoutViolationReason; + message: string; +}; + export type ValidatePackageDependencyBoundariesInput = { rootDirectory: URL; rules: readonly PackageDependencyBoundaryRule[]; }; +export type ValidatePackageSourceLayoutInput = { + rootDirectory: URL; + rules: readonly PackageSourceLayoutRule[]; +}; + const TypeScriptSourceExtension = ".ts"; const TestSourceExtension = ".test.ts"; const RecursiveTypeScriptGlobSuffix = "/**/*.ts"; @@ -65,14 +92,46 @@ export async function validatePackageDependencyBoundaries( return violations; } +export async function validatePackageSourceLayout( + input: ValidatePackageSourceLayoutInput, +): Promise<PackageSourceLayoutViolation[]> { + const rootDirectoryPath = fileURLToPath(input.rootDirectory); + const violations: PackageSourceLayoutViolation[] = []; + + for (const rule of input.rules) { + const sourceFiles = await findFiles(rootDirectoryPath, rule.sourceGlob); + + for (const sourceFile of sourceFiles) { + if (!sourceFile.endsWith(rule.forbiddenFileSuffix)) { + continue; + } + + violations.push({ + packageName: rule.packageName, + filePath: normalizePath(relative(rootDirectoryPath, sourceFile)), + reason: rule.reason, + message: `${rule.packageName} has forbidden source-layout file ${normalizePath( + relative(rootDirectoryPath, sourceFile), + )}`, + }); + } + } + + return violations; +} + async function findSourceFiles(rootDirectoryPath: string, sourceGlob: string): Promise<string[]> { + const sourceFiles = await findFiles(rootDirectoryPath, sourceGlob); + return sourceFiles.filter((sourceFile) => !sourceFile.endsWith(TestSourceExtension)); +} + +async function findFiles(rootDirectoryPath: string, sourceGlob: string): Promise<string[]> { if (!sourceGlob.endsWith(RecursiveTypeScriptGlobSuffix)) { throw new Error(`unsupported source glob: ${sourceGlob}`); } const sourceRoot =
join(rootDirectoryPath, sourceGlob.slice(0, -RecursiveTypeScriptGlobSuffix.length)); - const sourceFiles = await collectTypeScriptSourceFiles(sourceRoot); - return sourceFiles.filter((sourceFile) => !sourceFile.endsWith(TestSourceExtension)); + return collectTypeScriptSourceFiles(sourceRoot); } async function collectTypeScriptSourceFiles(directoryPath: string): Promise<string[]> { diff --git a/agentic-organization/packages/governance/test/package-dependency-boundaries.test.ts b/agentic-organization/packages/governance/test/package-dependency-boundaries.test.ts new file mode 100644 index 0000000000..f52300f73f --- /dev/null +++ b/agentic-organization/packages/governance/test/package-dependency-boundaries.test.ts @@ -0,0 +1,114 @@ +import { equal } from "node:assert/strict"; +import { describe, test } from "node:test"; + +import { + PackageBoundaryRule, + PackageSourceLayoutViolationReason, + validatePackageDependencyBoundaries, + validatePackageSourceLayout, +} from "../src/package-dependency-boundaries.ts"; + +const packagesRootDirectory = new URL("../..", import.meta.url); + +describe("package dependency boundaries", () => { + test("keeps application independent from state and runtime adapters", async () => { + const violations = await validatePackageDependencyBoundaries({ + rootDirectory: packagesRootDirectory, + rules: [ + { + packageName: PackageBoundaryRule.Application, + sourceGlob: "application/src/**/*.ts", + forbiddenImportFragments: [ + "../../state", + "../../../state", + "state-cockroach", + "nestjs", + "@nestjs", + "nats", + "dapr", + "temporal", + "drizzle", + "pg", + "postgres", + ], + }, + { + packageName: PackageBoundaryRule.Messaging, + sourceGlob: "messaging/src/**/*.ts", + forbiddenImportFragments: ["../messaging-nats", "../../messaging-nats", "nats"], + }, + { + packageName: PackageBoundaryRule.StateAdapter, + sourceGlob: "state-cockroach/src/**/*.ts", + forbiddenImportFragments: ["../../messaging", "../messaging", "nats", "jetstream"], + }, + ], + }); +
+ equal(violations.length, 0, violations.map((violation) => violation.message).join("\n")); + }); + + test("keeps tests out of production source directories", async () => { + const violations = await validatePackageSourceLayout({ + rootDirectory: packagesRootDirectory, + rules: [ + { + packageName: PackageBoundaryRule.ProductionSource, + sourceGlob: "application/src/**/*.ts", + forbiddenFileSuffix: ".test.ts", + reason: PackageSourceLayoutViolationReason.TestFileInProductionSource, + }, + { + packageName: PackageBoundaryRule.ProductionSource, + sourceGlob: "domain/src/**/*.ts", + forbiddenFileSuffix: ".test.ts", + reason: PackageSourceLayoutViolationReason.TestFileInProductionSource, + }, + { + packageName: PackageBoundaryRule.ProductionSource, + sourceGlob: "governance/src/**/*.ts", + forbiddenFileSuffix: ".test.ts", + reason: PackageSourceLayoutViolationReason.TestFileInProductionSource, + }, + { + packageName: PackageBoundaryRule.ProductionSource, + sourceGlob: "messaging/src/**/*.ts", + forbiddenFileSuffix: ".test.ts", + reason: PackageSourceLayoutViolationReason.TestFileInProductionSource, + }, + { + packageName: PackageBoundaryRule.ProductionSource, + sourceGlob: "messaging-nats/src/**/*.ts", + forbiddenFileSuffix: ".test.ts", + reason: PackageSourceLayoutViolationReason.TestFileInProductionSource, + }, + { + packageName: PackageBoundaryRule.ProductionSource, + sourceGlob: "observability/src/**/*.ts", + forbiddenFileSuffix: ".test.ts", + reason: PackageSourceLayoutViolationReason.TestFileInProductionSource, + }, + { + packageName: PackageBoundaryRule.ProductionSource, + sourceGlob: "runtime/src/**/*.ts", + forbiddenFileSuffix: ".test.ts", + reason: PackageSourceLayoutViolationReason.TestFileInProductionSource, + }, + { + packageName: PackageBoundaryRule.ProductionSource, + sourceGlob: "state/src/**/*.ts", + forbiddenFileSuffix: ".test.ts", + reason: PackageSourceLayoutViolationReason.TestFileInProductionSource, + }, + { + packageName: 
PackageBoundaryRule.ProductionSource, + sourceGlob: "state-cockroach/src/**/*.ts", + forbiddenFileSuffix: ".test.ts", + reason: PackageSourceLayoutViolationReason.TestFileInProductionSource, + }, + ], + }); + + equal(violations.length, 0, violations.map((violation) => violation.message).join("\n")); + }); +}); diff --git a/agentic-organization/packages/messaging-nats/src/nats-jetstream-event-publisher.test.ts b/agentic-organization/packages/messaging-nats/test/nats-jetstream-event-publisher.test.ts similarity index 98% rename from agentic-organization/packages/messaging-nats/src/nats-jetstream-event-publisher.test.ts rename to agentic-organization/packages/messaging-nats/test/nats-jetstream-event-publisher.test.ts index b565bdc65f..8408a37349 100644 --- a/agentic-organization/packages/messaging-nats/src/nats-jetstream-event-publisher.test.ts +++ b/agentic-organization/packages/messaging-nats/test/nats-jetstream-event-publisher.test.ts @@ -11,7 +11,7 @@ import { createNatsJetStreamEventPublisher, NatsHeaderName, type NatsJetStreamClient, -} from "./nats-jetstream-event-publisher.ts"; +} from "../src/nats-jetstream-event-publisher.ts"; describe("NATS JetStream event publisher", () => { test("publishes canonical JSON with idempotent headers and message ID", async () => { diff --git a/agentic-organization/packages/messaging/src/outbox-publisher.test.ts b/agentic-organization/packages/messaging/test/outbox-publisher.test.ts similarity index 99% rename from agentic-organization/packages/messaging/src/outbox-publisher.test.ts rename to agentic-organization/packages/messaging/test/outbox-publisher.test.ts index d47415bd54..a3ffb1ddae 100644 --- a/agentic-organization/packages/messaging/src/outbox-publisher.test.ts +++ b/agentic-organization/packages/messaging/test/outbox-publisher.test.ts @@ -15,7 +15,7 @@ import { resolveAgenticMessagingDomain, type EventPublication, type EventPublisher, -} from "./outbox-publisher.ts"; +} from "../src/outbox-publisher.ts"; 
describe("outbox publisher", () => { test("resolves event domains through typed mappings", () => { diff --git a/agentic-organization/packages/messaging/src/subject-builder.test.ts b/agentic-organization/packages/messaging/test/subject-builder.test.ts similarity index 88% rename from agentic-organization/packages/messaging/src/subject-builder.test.ts rename to agentic-organization/packages/messaging/test/subject-builder.test.ts index 49b21bf303..f108c6f836 100644 --- a/agentic-organization/packages/messaging/src/subject-builder.test.ts +++ b/agentic-organization/packages/messaging/test/subject-builder.test.ts @@ -2,7 +2,7 @@ import { equal } from "node:assert/strict"; import { describe, test } from "node:test"; import { AgenticEventType } from "../../domain/src/index.ts"; -import { buildAgenticEventSubject } from "./subject-builder.ts"; +import { buildAgenticEventSubject } from "../src/subject-builder.ts"; describe("agentic event NATS subjects", () => { test("uses a stable organization-scoped subject shape", () => { diff --git a/agentic-organization/packages/observability/src/span-attributes.test.ts b/agentic-organization/packages/observability/test/span-attributes.test.ts similarity index 98% rename from agentic-organization/packages/observability/src/span-attributes.test.ts rename to agentic-organization/packages/observability/test/span-attributes.test.ts index a144a6c426..4481a0bb4a 100644 --- a/agentic-organization/packages/observability/src/span-attributes.test.ts +++ b/agentic-organization/packages/observability/test/span-attributes.test.ts @@ -7,7 +7,7 @@ import { WorkItemState, createAgenticEventEnvelope, } from "../../domain/src/index.ts"; -import { MessagingSystemName, buildAgenticSpanAttributes } from "./span-attributes.ts"; +import { MessagingSystemName, buildAgenticSpanAttributes } from "../src/span-attributes.ts"; describe("agentic observability span attributes", () => { test("projects event context into LGTM-friendly OpenTelemetry attributes", () => { 
diff --git a/agentic-organization/packages/observability/src/workflow-visibility.test.ts b/agentic-organization/packages/observability/test/workflow-visibility.test.ts similarity index 99% rename from agentic-organization/packages/observability/src/workflow-visibility.test.ts rename to agentic-organization/packages/observability/test/workflow-visibility.test.ts index 395d1f09d1..2217bb3b5f 100644 --- a/agentic-organization/packages/observability/src/workflow-visibility.test.ts +++ b/agentic-organization/packages/observability/test/workflow-visibility.test.ts @@ -14,7 +14,7 @@ import { WeakPointIndicatorType, WorkflowObservationKind, buildWorkflowVisibilityRecord, -} from "./workflow-visibility.ts"; +} from "../src/workflow-visibility.ts"; describe("workflow visibility records", () => { test("builds a plug-in visibility record for agent self-monitoring", () => { diff --git a/agentic-organization/packages/runtime/src/event-automation.test.ts b/agentic-organization/packages/runtime/test/event-automation.test.ts similarity index 94% rename from agentic-organization/packages/runtime/src/event-automation.test.ts rename to agentic-organization/packages/runtime/test/event-automation.test.ts index 79f15dd8fc..db552cf760 100644 --- a/agentic-organization/packages/runtime/src/event-automation.test.ts +++ b/agentic-organization/packages/runtime/test/event-automation.test.ts @@ -9,7 +9,12 @@ import { SupervisorSignalToolType, createAgenticEventEnvelope, } from "../../domain/src/index.ts"; -import { ReactionPlanActionType, ReactionPlanReason, RequiredHat, evaluateV0AutomationRules } from "./reaction-plan.ts"; +import { + ReactionPlanActionType, + ReactionPlanReason, + RequiredHat, + evaluateV0AutomationRules, +} from "../src/reaction-plan.ts"; describe("v0 event automation rules", () => { test("plans target-supervisor triage when a hat sends an upward signal", () => { diff --git a/agentic-organization/packages/state-cockroach/src/cockroach-command-state-store.test.ts 
b/agentic-organization/packages/state-cockroach/test/cockroach-command-state-store.test.ts similarity index 98% rename from agentic-organization/packages/state-cockroach/src/cockroach-command-state-store.test.ts rename to agentic-organization/packages/state-cockroach/test/cockroach-command-state-store.test.ts index 74508ce2dd..2b36d8421d 100644 --- a/agentic-organization/packages/state-cockroach/src/cockroach-command-state-store.test.ts +++ b/agentic-organization/packages/state-cockroach/test/cockroach-command-state-store.test.ts @@ -13,7 +13,7 @@ import { CockroachCommandStateStoreStatement, createCockroachCommandStateStoreFactory, type CockroachSqlExecutor, -} from "./cockroach-command-state-store.ts"; +} from "../src/cockroach-command-state-store.ts"; describe("cockroach command state store", () => { test("implements command-state-store operations behind a SQL executor", async () => { diff --git a/agentic-organization/packages/state-cockroach/src/cockroach-outbox-event-source.test.ts b/agentic-organization/packages/state-cockroach/test/cockroach-outbox-event-source.test.ts similarity index 98% rename from agentic-organization/packages/state-cockroach/src/cockroach-outbox-event-source.test.ts rename to agentic-organization/packages/state-cockroach/test/cockroach-outbox-event-source.test.ts index a5e6eda061..0b3fdf9062 100644 --- a/agentic-organization/packages/state-cockroach/src/cockroach-outbox-event-source.test.ts +++ b/agentic-organization/packages/state-cockroach/test/cockroach-outbox-event-source.test.ts @@ -7,7 +7,7 @@ import { createCockroachOutboxEventSource, type CockroachOutboxSqlExecutor, type CockroachOutboxSqlStatement, -} from "./cockroach-outbox-event-source.ts"; +} from "../src/cockroach-outbox-event-source.ts"; describe("cockroach outbox event source", () => { test("claims unpublished outbox events and marks them published", async () => { diff --git a/agentic-organization/packages/state-cockroach/src/cockroach-schema.test.ts 
b/agentic-organization/packages/state-cockroach/test/cockroach-schema.test.ts similarity index 97% rename from agentic-organization/packages/state-cockroach/src/cockroach-schema.test.ts rename to agentic-organization/packages/state-cockroach/test/cockroach-schema.test.ts index 7eba2ff572..a7f99d740a 100644 --- a/agentic-organization/packages/state-cockroach/src/cockroach-schema.test.ts +++ b/agentic-organization/packages/state-cockroach/test/cockroach-schema.test.ts @@ -5,7 +5,7 @@ import { CockroachCoreStateMigrationName, CockroachTableName, createCockroachCoreStateMigration, -} from "./cockroach-schema.ts"; +} from "../src/cockroach-schema.ts"; describe("cockroach core state schema", () => { test("declares the first authoritative state, audit, outbox, and idempotency tables", () => { diff --git a/openspec/specs/agentic-organization/spec.md b/openspec/specs/agentic-organization/spec.md index a3d752c6bc..ae7f93e7e5 100644 --- a/openspec/specs/agentic-organization/spec.md +++ b/openspec/specs/agentic-organization/spec.md @@ -59,6 +59,16 @@ Organization state only by calling Organization commands. 
- **AND** state adapter source files are checked for forbidden imports of messaging, NATS, JetStream, or other event transport clients +#### Scenario: Tests are kept out of production source trees + +- **WHEN** package source-layout governance tests run +- **THEN** production source directories are scanned for `*.test.ts` + files +- **AND** every package keeps implementation code under + `packages//src` +- **AND** every package keeps tests under `packages//test` +- **AND** a test file inside a production source tree fails the suite + ### Requirement: Commands are idempotent Organization commands MUST use deterministic idempotency keys at the From 9fe66edd2e48d634fca08f9ba2b809df1f7ee8d9 Mon Sep 17 00:00:00 2001 From: Max Chadaev Date: Mon, 25 May 2026 20:06:17 -0400 Subject: [PATCH 12/21] feat(agentic-org): add inbound event ingestion spine Co-Authored-By: Codex --- .../docs/FIRST_IMPLEMENTATION_SLICE.md | 22 ++- .../docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md | 13 +- agentic-organization/packages/README.md | 13 ++ .../packages/domain/src/index.ts | 7 + .../packages/domain/src/reaction-plan.ts | 44 +++++ .../package-dependency-boundaries.test.ts | 9 +- .../packages/runtime/src/event-ingestion.ts | 91 +++++++++ .../packages/runtime/src/index.ts | 9 + .../packages/runtime/src/reaction-plan.ts | 49 ++--- .../runtime/test/event-ingestion.test.ts | 172 ++++++++++++++++++ .../0001_agentic_org_core_state.sql | 22 +++ .../src/cockroach-event-ingestion-store.ts | 141 ++++++++++++++ .../state-cockroach/src/cockroach-schema.ts | 32 ++++ .../packages/state-cockroach/src/index.ts | 8 + .../cockroach-event-ingestion-store.test.ts | 92 ++++++++++ .../test/cockroach-schema.test.ts | 5 + .../state/src/event-ingestion-store.ts | 84 +++++++++ .../packages/state/src/index.ts | 12 ++ openspec/specs/agentic-organization/spec.md | 45 ++++- 19 files changed, 824 insertions(+), 46 deletions(-) create mode 100644 agentic-organization/packages/domain/src/reaction-plan.ts create mode 100644 
agentic-organization/packages/runtime/src/event-ingestion.ts create mode 100644 agentic-organization/packages/runtime/test/event-ingestion.test.ts create mode 100644 agentic-organization/packages/state-cockroach/src/cockroach-event-ingestion-store.ts create mode 100644 agentic-organization/packages/state-cockroach/test/cockroach-event-ingestion-store.test.ts create mode 100644 agentic-organization/packages/state/src/event-ingestion-store.ts diff --git a/agentic-organization/docs/FIRST_IMPLEMENTATION_SLICE.md b/agentic-organization/docs/FIRST_IMPLEMENTATION_SLICE.md index b250cb2f3b..81937eaaa8 100644 --- a/agentic-organization/docs/FIRST_IMPLEMENTATION_SLICE.md +++ b/agentic-organization/docs/FIRST_IMPLEMENTATION_SLICE.md @@ -32,6 +32,9 @@ send_supervisor_signal -> outbox publisher -> NATS JetStream event publisher adapter -> NATS subject contract + -> event ingestion processor + -> inbox receipt / consumer dedupe + -> persisted reaction plans -> LGTM span attributes -> supervisor triage reaction plan ``` @@ -117,6 +120,15 @@ Hermes runs, MCP calls, and UI evidence. the publish succeeds. - The NATS adapter publishes canonical JSON envelopes with typed headers and event IDs as message IDs for idempotent JetStream publication. +- The event ingestion processor accepts decoded canonical envelopes, + dedupes them by event ID plus consumer name, evaluates automation + rules once, rejects same-event payload hash conflicts, and persists + reaction plans through a store boundary that durable adapters can make + transactional. +- The Cockroach adapter now declares inbox receipt and reaction plan + tables plus a SQL-backed event-ingestion store. This is still behind a + generic state port; live NATS consumers are not hardcoded into the + adapter. - Duplicate commands with the same idempotency key and request hash replay the stored result. - Duplicate commands with the same idempotency key and a different @@ -128,11 +140,11 @@ Hermes runs, MCP calls, and UI evidence. 
## Next Slice -The next slice should add inbox/consumer dedupe before automation starts -performing side effects from NATS events. After that, wire the outbox -publisher into a worker host and add a transactional durable-state -adapter integration test using CockroachDB as the first cluster-backed -implementation once a local/dev connection is available. +The next slice should wire an in-process worker host that composes the +outbox publisher and event ingestion processor behind explicit ports. +After that, add a transactional durable-state adapter integration test +using CockroachDB as the first cluster-backed implementation once a +local/dev connection is available. Do not make the next slice a pile of bespoke request commands. Build the generic supervisor triage lifecycle first, then let specialized diff --git a/agentic-organization/docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md b/agentic-organization/docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md index 138699f70a..026cdf21e2 100644 --- a/agentic-organization/docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md +++ b/agentic-organization/docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md @@ -475,6 +475,16 @@ deterministic `idempotencyKey`. External side effects must either be natively idempotent or wrapped by a command that stores the external request/result. +The first executable runtime slice implements this as an event ingestion +processor before a live NATS consumer exists. A transport adapter decodes +the canonical envelope, calls the processor, and the processor checks +the inbox receipt, evaluates rules, and persists the receipt plus +reaction plans through one store operation. Durable adapters should +implement that operation transactionally so a saved receipt cannot +silently suppress a reaction plan that failed to persist. The processor +also compares payload hashes for repeated `eventId + consumerName` +pairs; conflicting payloads are not treated as normal duplicates. 
+ ### Stream and Consumer Manifests Every stream and durable consumer should declare: @@ -806,7 +816,8 @@ other work. hat-system. 6. Add NATS outbox publisher and one consumer after command tests pass. 7. Add inbox/consumer dedupe before any NATS-driven automation performs - side effects. + side effects. The first package-level processor and Cockroach adapter + now exist; the live NATS consumer host is still pending. 8. Add the first rule catalog and reaction executor for ready work, review staffing, QA staffing, blocker escalation, and late run incidents. diff --git a/agentic-organization/packages/README.md b/agentic-organization/packages/README.md index cb5cb1c1aa..e1ad02ba82 100644 --- a/agentic-organization/packages/README.md +++ b/agentic-organization/packages/README.md @@ -32,6 +32,9 @@ supervisor-chain signal command -> outbox publisher -> NATS JetStream event publisher adapter -> NATS subject / telemetry contract + -> event ingestion processor + -> inbox receipt / consumer dedupe + -> persisted reaction plans -> automation reaction plan ``` @@ -66,6 +69,16 @@ package implements that publisher port and is the only package in this slice that knows about NATS headers, message IDs, and JSON transport payloads. State adapters must not import messaging adapters. +The event ingestion processor owns the generic consume loop after a +transport adapter has decoded a canonical event envelope. It checks an +inbox receipt before evaluating rules, records the receipt and generated +reaction plans through one store operation, and returns a duplicate outcome +without re-running rules when the same event reaches the same consumer +again. If the same event ID reaches the same consumer with a different +payload hash, the processor returns a payload-conflict outcome instead +of hiding the drift. Live NATS consumers will bind to this processor +later.
+ ## Validation Run the package tests from `agentic-organization/`: diff --git a/agentic-organization/packages/domain/src/index.ts b/agentic-organization/packages/domain/src/index.ts index f4821b78a8..676446defe 100644 --- a/agentic-organization/packages/domain/src/index.ts +++ b/agentic-organization/packages/domain/src/index.ts @@ -13,6 +13,13 @@ export { type CreateAgenticEventEnvelopeInput, } from "./event-envelope.ts"; export { WorkItemState, assertWorkItemTransition, createInitialWorkItemState } from "./work-item-state-machine.ts"; +export { + ReactionPlanActionType, + ReactionPlanReason, + ReactionPlanStatus, + RequiredHat, + type ReactionPlanAction, +} from "./reaction-plan.ts"; export { SupervisorChainLevel, SupervisorSignalStatus, diff --git a/agentic-organization/packages/domain/src/reaction-plan.ts b/agentic-organization/packages/domain/src/reaction-plan.ts new file mode 100644 index 0000000000..902ba67b0f --- /dev/null +++ b/agentic-organization/packages/domain/src/reaction-plan.ts @@ -0,0 +1,44 @@ +import type { SupervisorChainLevel } from "./supervisor-communication.ts"; + +export const ReactionPlanActionType = { + CreateSupervisorTriage: "create_supervisor_triage", + RequestReviewGate: "request_review_gate", +} as const; + +export type ReactionPlanActionType = (typeof ReactionPlanActionType)[keyof typeof ReactionPlanActionType]; + +export const RequiredHat = { + CSuite: "c_suite", + Director: "director", + EngineeringManager: "engineering_manager", + ExecutiveBoard: "executive_board", + Reviewer: "reviewer", +} as const; + +export type RequiredHat = (typeof RequiredHat)[keyof typeof RequiredHat]; + +export const ReactionPlanReason = { + SupervisorSignalNeedsTriage: "supervisor signal needs triage", + WorkItemEnteredReadyState: "work item entered ready state", +} as const; + +export type ReactionPlanReason = (typeof ReactionPlanReason)[keyof typeof ReactionPlanReason]; + +export const ReactionPlanStatus = { + Planned: "planned", +} as const; + +export 
type ReactionPlanStatus = (typeof ReactionPlanStatus)[keyof typeof ReactionPlanStatus]; + +export type ReactionPlanAction = { + actionType: ReactionPlanActionType; + triggerEventId: string; + organizationId: string; + projectId: string; + teamId?: string; + workItemId: string; + supervisorSignalId?: string; + targetLevel?: SupervisorChainLevel; + requiredHat: RequiredHat; + reason: ReactionPlanReason; +}; diff --git a/agentic-organization/packages/governance/test/package-dependency-boundaries.test.ts b/agentic-organization/packages/governance/test/package-dependency-boundaries.test.ts index f52300f73f..14c1a384dc 100644 --- a/agentic-organization/packages/governance/test/package-dependency-boundaries.test.ts +++ b/agentic-organization/packages/governance/test/package-dependency-boundaries.test.ts @@ -40,7 +40,14 @@ describe("package dependency boundaries", () => { { packageName: PackageBoundaryRule.StateAdapter, sourceGlob: "state-cockroach/src/**/*.ts", - forbiddenImportFragments: ["../../messaging", "../messaging", "nats", "jetstream"], + forbiddenImportFragments: [ + "../../messaging", + "../messaging", + "../../runtime", + "../runtime", + "nats", + "jetstream", + ], }, ], }); diff --git a/agentic-organization/packages/runtime/src/event-ingestion.ts b/agentic-organization/packages/runtime/src/event-ingestion.ts new file mode 100644 index 0000000000..98810ad803 --- /dev/null +++ b/agentic-organization/packages/runtime/src/event-ingestion.ts @@ -0,0 +1,91 @@ +import type { AgenticEventEnvelope } from "../../domain/src/index.ts"; +import { ReactionPlanStatus } from "../../domain/src/index.ts"; +import { + EventIngestionOutcomeStatus, + type EventIngestionStore, + type InboundEventConsumerName, + type InboxReceiptLookup, + type InboxReceiptRecord, + type ReactionPlanRecord, +} from "../../state/src/index.ts"; +import type { ReactionPlanAction } from "./reaction-plan.ts"; + +export type EventRuleEvaluator = (envelope: AgenticEventEnvelope) => readonly 
ReactionPlanAction[]; + +export type EventPayloadHashCalculator = (envelope: AgenticEventEnvelope) => string; + +export type CreateEventIngestionProcessorInput = { + store: EventIngestionStore; + evaluateRules: EventRuleEvaluator; + consumerName: InboundEventConsumerName; + calculatePayloadHash: EventPayloadHashCalculator; + now: () => string; + createId: (prefix: string) => string; +}; + +export type IngestEventInput = { + envelope: AgenticEventEnvelope; +}; + +export type EventIngestionResult = { + status: EventIngestionOutcomeStatus; + reactionPlans: readonly ReactionPlanRecord[]; +}; + +export type EventIngestionProcessor = { + ingest: (input: IngestEventInput) => Promise<EventIngestionResult>; +}; + +export function createEventIngestionProcessor(input: CreateEventIngestionProcessorInput): EventIngestionProcessor { + return { + ingest: async ({ envelope }) => { + const lookup: InboxReceiptLookup = { + eventId: envelope.eventId, + consumerName: input.consumerName, + }; + const existingReceipt = await input.store.findInboxReceipt(lookup); + const payloadHash = input.calculatePayloadHash(envelope); + + if (existingReceipt !== undefined) { + if (existingReceipt.payloadHash !== payloadHash) { + return { + status: EventIngestionOutcomeStatus.PayloadConflict, + reactionPlans: [], + }; + } + + return { + status: EventIngestionOutcomeStatus.Duplicate, + reactionPlans: [], + }; + } + + const observedAt = input.now(); + const receipt: InboxReceiptRecord = { + ...lookup, + firstSeenAt: observedAt, + payloadHash, + }; + + const reactionPlans = input.evaluateRules(envelope).map((action) => ({ + reactionPlanId: input.createId("reaction-plan"), + consumerName: input.consumerName, + createdAt: observedAt, + status: ReactionPlanStatus.Planned, + action, + })); + + await input.store.recordEventProcessingOutcome({ + receipt, + reactionPlans, + processedAt: observedAt, + result: EventIngestionOutcomeStatus.Processed, + }); + + return { + status: EventIngestionOutcomeStatus.Processed, + reactionPlans, +
}; + }, + }; +} diff --git a/agentic-organization/packages/runtime/src/index.ts b/agentic-organization/packages/runtime/src/index.ts index d531ba46e4..fba7a3e308 100644 --- a/agentic-organization/packages/runtime/src/index.ts +++ b/agentic-organization/packages/runtime/src/index.ts @@ -1,3 +1,12 @@ +export { + createEventIngestionProcessor, + type CreateEventIngestionProcessorInput, + type EventIngestionProcessor, + type EventIngestionResult, + type EventPayloadHashCalculator, + type EventRuleEvaluator, + type IngestEventInput, +} from "./event-ingestion.ts"; export { ReactionPlanActionType, ReactionPlanReason, diff --git a/agentic-organization/packages/runtime/src/reaction-plan.ts b/agentic-organization/packages/runtime/src/reaction-plan.ts index 091324ca45..b8ce268d7c 100644 --- a/agentic-organization/packages/runtime/src/reaction-plan.ts +++ b/agentic-organization/packages/runtime/src/reaction-plan.ts @@ -1,41 +1,14 @@ -import { AgenticEventType, SupervisorChainLevel, type AgenticEventEnvelope } from "../../domain/src/index.ts"; - -export const ReactionPlanActionType = { - CreateSupervisorTriage: "create_supervisor_triage", - RequestReviewGate: "request_review_gate", -} as const; - -export type ReactionPlanActionType = (typeof ReactionPlanActionType)[keyof typeof ReactionPlanActionType]; - -export const RequiredHat = { - CSuite: "c_suite", - Director: "director", - EngineeringManager: "engineering_manager", - ExecutiveBoard: "executive_board", - Reviewer: "reviewer", -} as const; - -export type RequiredHat = (typeof RequiredHat)[keyof typeof RequiredHat]; - -export const ReactionPlanReason = { - SupervisorSignalNeedsTriage: "supervisor signal needs triage", - WorkItemEnteredReadyState: "work item entered ready state", -} as const; - -export type ReactionPlanReason = (typeof ReactionPlanReason)[keyof typeof ReactionPlanReason]; - -export type ReactionPlanAction = { - actionType: ReactionPlanActionType; - triggerEventId: string; - organizationId: string; - 
projectId: string; - teamId?: string; - workItemId: string; - supervisorSignalId?: string; - targetLevel?: SupervisorChainLevel; - requiredHat: RequiredHat; - reason: ReactionPlanReason; -}; +import { + AgenticEventType, + ReactionPlanActionType, + ReactionPlanReason, + RequiredHat, + SupervisorChainLevel, + type AgenticEventEnvelope, + type ReactionPlanAction, +} from "../../domain/src/index.ts"; + +export { ReactionPlanActionType, ReactionPlanReason, RequiredHat, type ReactionPlanAction }; type SupervisorSignalSentPayload = { targetHatAssignmentId: string; diff --git a/agentic-organization/packages/runtime/test/event-ingestion.test.ts b/agentic-organization/packages/runtime/test/event-ingestion.test.ts new file mode 100644 index 0000000000..2c7f47e2f5 --- /dev/null +++ b/agentic-organization/packages/runtime/test/event-ingestion.test.ts @@ -0,0 +1,172 @@ +import { deepEqual, equal } from "node:assert/strict"; +import { describe, test } from "node:test"; + +import { + AgenticAggregateType, + AgenticEventType, + ReactionPlanStatus, + SupervisorChainLevel, + SupervisorSignalStatus, + SupervisorSignalToolType, + createAgenticEventEnvelope, +} from "../../domain/src/index.ts"; +import { + EventIngestionOutcomeStatus, + InboundEventConsumerName, + createInMemoryEventIngestionStore, +} from "../../state/src/index.ts"; +import { + ReactionPlanActionType, + ReactionPlanReason, + RequiredHat, + createEventIngestionProcessor, + evaluateV0AutomationRules, +} from "../src/index.ts"; + +describe("event ingestion processor", () => { + test("records an inbox receipt and persists reaction plans for a new event", async () => { + const store = createInMemoryEventIngestionStore(); + const processor = createEventIngestionProcessor({ + store, + evaluateRules: evaluateV0AutomationRules, + consumerName: InboundEventConsumerName.V0AutomationPlanner, + calculatePayloadHash: (envelope) => `hash-${envelope.eventId}`, + now: () => "2026-05-25T22:00:00.000Z", + createId: (prefix) => 
`${prefix}-001`, + }); + + const result = await processor.ingest({ + envelope: createSupervisorSignalEnvelope(), + }); + + equal(result.status, EventIngestionOutcomeStatus.Processed); + equal(result.reactionPlans.length, 1); + deepEqual(store.snapshot.inboxReceipts, [ + { + eventId: "evt-supervisor-signal-001", + consumerName: InboundEventConsumerName.V0AutomationPlanner, + firstSeenAt: "2026-05-25T22:00:00.000Z", + processedAt: "2026-05-25T22:00:00.000Z", + payloadHash: "hash-evt-supervisor-signal-001", + result: EventIngestionOutcomeStatus.Processed, + }, + ]); + deepEqual(store.snapshot.reactionPlans, [ + { + reactionPlanId: "reaction-plan-001", + consumerName: InboundEventConsumerName.V0AutomationPlanner, + createdAt: "2026-05-25T22:00:00.000Z", + status: ReactionPlanStatus.Planned, + action: { + actionType: ReactionPlanActionType.CreateSupervisorTriage, + triggerEventId: "evt-supervisor-signal-001", + organizationId: "org-lfg", + projectId: "project-agentic-org", + teamId: "team-runtime", + workItemId: "work-outbox-001", + supervisorSignalId: "supervisor-signal-001", + targetLevel: SupervisorChainLevel.Manager, + requiredHat: RequiredHat.EngineeringManager, + reason: ReactionPlanReason.SupervisorSignalNeedsTriage, + }, + }, + ]); + }); + + test("dedupes replayed events before rule evaluation side effects", async () => { + const store = createInMemoryEventIngestionStore(); + let evaluationCount = 0; + const processor = createEventIngestionProcessor({ + store, + evaluateRules: (envelope) => { + evaluationCount += 1; + return evaluateV0AutomationRules(envelope); + }, + consumerName: InboundEventConsumerName.V0AutomationPlanner, + calculatePayloadHash: (eventEnvelope) => `hash-${eventEnvelope.eventId}`, + now: () => "2026-05-25T22:00:00.000Z", + createId: (prefix) => `${prefix}-${evaluationCount}`, + }); + + const envelope = createSupervisorSignalEnvelope(); + const firstResult = await processor.ingest({ + envelope, + }); + const replayResult = await 
processor.ingest({ + envelope, + }); + + equal(firstResult.status, EventIngestionOutcomeStatus.Processed); + equal(replayResult.status, EventIngestionOutcomeStatus.Duplicate); + equal(evaluationCount, 1); + equal(store.snapshot.inboxReceipts.length, 1); + equal(store.snapshot.reactionPlans.length, 1); + }); + + test("rejects same event ID with a different payload hash", async () => { + const store = createInMemoryEventIngestionStore(); + let evaluationCount = 0; + const processor = createEventIngestionProcessor({ + store, + evaluateRules: (envelope) => { + evaluationCount += 1; + return evaluateV0AutomationRules(envelope); + }, + consumerName: InboundEventConsumerName.V0AutomationPlanner, + calculatePayloadHash: (eventEnvelope) => `hash-${eventEnvelope.payload.title}`, + now: () => "2026-05-25T22:00:00.000Z", + createId: (prefix) => `${prefix}-${evaluationCount}`, + }); + + const firstResult = await processor.ingest({ + envelope: createSupervisorSignalEnvelope(), + }); + const conflictResult = await processor.ingest({ + envelope: createSupervisorSignalEnvelope("Different payload"), + }); + + equal(firstResult.status, EventIngestionOutcomeStatus.Processed); + equal(conflictResult.status, EventIngestionOutcomeStatus.PayloadConflict); + equal(evaluationCount, 1); + equal(store.snapshot.inboxReceipts.length, 1); + equal(store.snapshot.reactionPlans.length, 1); + }); +}); + +function createSupervisorSignalEnvelope(title = "Blocked on scoped NATS publisher") { + return createAgenticEventEnvelope({ + eventId: "evt-supervisor-signal-001", + eventType: AgenticEventType.SupervisorSignalSent, + occurredAt: "2026-05-25T20:00:00.000Z", + actor: { + agentId: "agent-developer-001", + hatAssignmentId: "hat-assignment-dev-001", + }, + scope: { + organizationId: "org-lfg", + projectId: "project-agentic-org", + teamId: "team-runtime", + workItemId: "work-outbox-001", + }, + aggregate: { + aggregateId: "supervisor-signal-001", + aggregateType: AgenticAggregateType.SupervisorSignal, + 
aggregateVersion: 1, + }, + trace: { + commandId: "cmd-supervisor-signal-001", + correlationId: "corr-supervisor-signal-001", + causationId: "cause-team-work-001", + traceId: "trace-supervisor-signal-001", + idempotencyKey: "idem-supervisor-signal-001", + }, + payload: { + sourceLevel: SupervisorChainLevel.TeamMember, + targetLevel: SupervisorChainLevel.Manager, + targetHatAssignmentId: "hat-assignment-em-001", + toolType: SupervisorSignalToolType.ReportBlocker, + status: SupervisorSignalStatus.Sent, + title, + }, + }); +} diff --git a/agentic-organization/packages/state-cockroach/migrations/0001_agentic_org_core_state.sql b/agentic-organization/packages/state-cockroach/migrations/0001_agentic_org_core_state.sql index 6cbc38010b..df24b2470c 100644 --- a/agentic-organization/packages/state-cockroach/migrations/0001_agentic_org_core_state.sql +++ b/agentic-organization/packages/state-cockroach/migrations/0001_agentic_org_core_state.sql @@ -52,6 +52,28 @@ CREATE TABLE IF NOT EXISTS agentic_org_outbox_events ( published_at TIMESTAMPTZ ); +CREATE TABLE IF NOT EXISTS agentic_org_inbox_receipts ( + event_id STRING NOT NULL, + consumer_name STRING NOT NULL, + first_seen_at TIMESTAMPTZ NOT NULL, + processed_at TIMESTAMPTZ, + payload_hash STRING NOT NULL, + result STRING, + PRIMARY KEY (event_id, consumer_name) +); + +CREATE TABLE IF NOT EXISTS agentic_org_reaction_plans ( + reaction_plan_id STRING PRIMARY KEY, + consumer_name STRING NOT NULL, + created_at TIMESTAMPTZ NOT NULL, + status STRING NOT NULL, + trigger_event_id STRING NOT NULL, + organization_id STRING NOT NULL, + project_id STRING NOT NULL, + work_item_id STRING NOT NULL, + action_json JSONB NOT NULL +); + CREATE TABLE IF NOT EXISTS agentic_org_idempotency_records ( idempotency_key STRING PRIMARY KEY, request_hash STRING NOT NULL, diff --git a/agentic-organization/packages/state-cockroach/src/cockroach-event-ingestion-store.ts b/agentic-organization/packages/state-cockroach/src/cockroach-event-ingestion-store.ts 
new file mode 100644 index 0000000000..20a489c7ab --- /dev/null +++ b/agentic-organization/packages/state-cockroach/src/cockroach-event-ingestion-store.ts @@ -0,0 +1,141 @@ +import type { EventIngestionStore, InboxReceiptRecord, ReactionPlanRecord } from "../../state/src/index.ts"; +import { CockroachTableName } from "./cockroach-schema.ts"; + +export const CockroachEventIngestionStoreStatement = { + FindInboxReceipt: "find_inbox_receipt", + InsertInboxReceipt: "insert_inbox_receipt", + InsertReactionPlan: "insert_reaction_plan", + MarkInboxReceiptProcessed: "mark_inbox_receipt_processed", +} as const; + +export type CockroachEventIngestionStoreStatement = + (typeof CockroachEventIngestionStoreStatement)[keyof typeof CockroachEventIngestionStoreStatement]; + +export type CockroachEventIngestionSqlStatement = { + name: CockroachEventIngestionStoreStatement; + sql: string; + parameters: readonly unknown[]; +}; + +export type CockroachEventIngestionSqlResult<Row extends Record<string, unknown>> = { + rows: readonly Row[]; +}; + +export type CockroachEventIngestionSqlExecutor = { + execute: <Row extends Record<string, unknown>>( + statement: CockroachEventIngestionSqlStatement, + ) => Promise<CockroachEventIngestionSqlResult<Row>>; +}; + +export type CreateCockroachEventIngestionStoreInput = { + executor: CockroachEventIngestionSqlExecutor; +}; + +export function createCockroachEventIngestionStore( + input: CreateCockroachEventIngestionStoreInput, +): EventIngestionStore { + return { + findInboxReceipt: async (lookup) => { + const result = await input.executor.execute<InboxReceiptRow>({ + name: CockroachEventIngestionStoreStatement.FindInboxReceipt, + sql: CockroachEventIngestionStoreSql.FindInboxReceipt, + parameters: [lookup.eventId, lookup.consumerName], + }); + const row = result.rows[0]; + + if (row === undefined) { + return undefined; + } + + return { + eventId: row.event_id, + consumerName: row.consumer_name, + firstSeenAt: row.first_seen_at, + payloadHash: row.payload_hash, + ...(row.processed_at === undefined ? {} : { processedAt: row.processed_at }), + ...(row.result === undefined ?
{} : { result: row.result }), + }; + }, + recordEventProcessingOutcome: async (outcome) => { + const receipt = outcome.receipt; + await input.executor.execute({ + name: CockroachEventIngestionStoreStatement.InsertInboxReceipt, + sql: CockroachEventIngestionStoreSql.InsertInboxReceipt, + parameters: [receipt.eventId, receipt.consumerName, receipt.firstSeenAt, receipt.payloadHash], + }); + + for (const reactionPlan of outcome.reactionPlans) { + await input.executor.execute({ + name: CockroachEventIngestionStoreStatement.InsertReactionPlan, + sql: CockroachEventIngestionStoreSql.InsertReactionPlan, + parameters: [ + reactionPlan.reactionPlanId, + reactionPlan.consumerName, + reactionPlan.createdAt, + reactionPlan.status, + reactionPlan.action.triggerEventId, + reactionPlan.action.organizationId, + reactionPlan.action.projectId, + reactionPlan.action.workItemId, + reactionPlan.action, + ], + }); + } + + await input.executor.execute({ + name: CockroachEventIngestionStoreStatement.MarkInboxReceiptProcessed, + sql: CockroachEventIngestionStoreSql.MarkInboxReceiptProcessed, + parameters: [receipt.eventId, receipt.consumerName, outcome.processedAt, outcome.result], + }); + }, + }; +} + +type InboxReceiptRow = { + event_id: string; + consumer_name: InboxReceiptRecord["consumerName"]; + first_seen_at: string; + processed_at?: string; + payload_hash: string; + result?: InboxReceiptRecord["result"]; +}; + +const CockroachEventIngestionStoreSql = { + FindInboxReceipt: ` + SELECT event_id, consumer_name, first_seen_at, processed_at, payload_hash, result + FROM ${CockroachTableName.InboxReceipts} + WHERE event_id = $1 + AND consumer_name = $2 + `, + InsertInboxReceipt: ` + INSERT INTO ${CockroachTableName.InboxReceipts} ( + event_id, + consumer_name, + first_seen_at, + payload_hash + ) VALUES ($1, $2, $3, $4) + `, + InsertReactionPlan: ` + INSERT INTO ${CockroachTableName.ReactionPlans} ( + reaction_plan_id, + consumer_name, + created_at, + status, + trigger_event_id, + 
organization_id, + project_id, + work_item_id, + action_json + ) VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9) + `, + MarkInboxReceiptProcessed: ` + UPDATE ${CockroachTableName.InboxReceipts} + SET + processed_at = $3, + result = $4 + WHERE event_id = $1 + AND consumer_name = $2 + `, +} as const; + +export type { ReactionPlanRecord }; diff --git a/agentic-organization/packages/state-cockroach/src/cockroach-schema.ts b/agentic-organization/packages/state-cockroach/src/cockroach-schema.ts index b1e0381d8d..042d2106cb 100644 --- a/agentic-organization/packages/state-cockroach/src/cockroach-schema.ts +++ b/agentic-organization/packages/state-cockroach/src/cockroach-schema.ts @@ -9,7 +9,9 @@ export const CockroachTableName = { WorkItems: "agentic_org_work_items", SupervisorSignals: "agentic_org_supervisor_signals", AuditEvents: "agentic_org_audit_events", + InboxReceipts: "agentic_org_inbox_receipts", OutboxEvents: "agentic_org_outbox_events", + ReactionPlans: "agentic_org_reaction_plans", IdempotencyRecords: "agentic_org_idempotency_records", } as const; @@ -28,6 +30,8 @@ export function createCockroachCoreStateMigration(): CockroachSchemaMigration { createSupervisorSignalsTableSql(), createAuditEventsTableSql(), createOutboxEventsTableSql(), + createInboxReceiptsTableSql(), + createReactionPlansTableSql(), createIdempotencyRecordsTableSql(), ].join("\n\n"), }; @@ -107,3 +111,31 @@ CREATE TABLE IF NOT EXISTS ${CockroachTableName.IdempotencyRecords} ( result_json JSONB NOT NULL );`.trim(); } + +function createInboxReceiptsTableSql(): string { + return ` +CREATE TABLE IF NOT EXISTS ${CockroachTableName.InboxReceipts} ( + event_id STRING NOT NULL, + consumer_name STRING NOT NULL, + first_seen_at TIMESTAMPTZ NOT NULL, + processed_at TIMESTAMPTZ, + payload_hash STRING NOT NULL, + result STRING, + PRIMARY KEY (event_id, consumer_name) +);`.trim(); +} + +function createReactionPlansTableSql(): string { + return ` +CREATE TABLE IF NOT EXISTS ${CockroachTableName.ReactionPlans} ( 
+ reaction_plan_id STRING PRIMARY KEY, + consumer_name STRING NOT NULL, + created_at TIMESTAMPTZ NOT NULL, + status STRING NOT NULL, + trigger_event_id STRING NOT NULL, + organization_id STRING NOT NULL, + project_id STRING NOT NULL, + work_item_id STRING NOT NULL, + action_json JSONB NOT NULL +);`.trim(); +} diff --git a/agentic-organization/packages/state-cockroach/src/index.ts b/agentic-organization/packages/state-cockroach/src/index.ts index bf795c6d1b..f31bcadcbe 100644 --- a/agentic-organization/packages/state-cockroach/src/index.ts +++ b/agentic-organization/packages/state-cockroach/src/index.ts @@ -14,6 +14,14 @@ export { type CockroachOutboxSqlStatement, type CreateCockroachOutboxEventSourceInput, } from "./cockroach-outbox-event-source.ts"; +export { + CockroachEventIngestionStoreStatement, + createCockroachEventIngestionStore, + type CockroachEventIngestionSqlExecutor, + type CockroachEventIngestionSqlResult, + type CockroachEventIngestionSqlStatement, + type CreateCockroachEventIngestionStoreInput, +} from "./cockroach-event-ingestion-store.ts"; export { CockroachCoreStateMigrationName, CockroachTableName, diff --git a/agentic-organization/packages/state-cockroach/test/cockroach-event-ingestion-store.test.ts b/agentic-organization/packages/state-cockroach/test/cockroach-event-ingestion-store.test.ts new file mode 100644 index 0000000000..150ec7de13 --- /dev/null +++ b/agentic-organization/packages/state-cockroach/test/cockroach-event-ingestion-store.test.ts @@ -0,0 +1,92 @@ +import { deepEqual, equal } from "node:assert/strict"; +import { describe, test } from "node:test"; + +import { ReactionPlanActionType, ReactionPlanReason, ReactionPlanStatus, RequiredHat } from "../../domain/src/index.ts"; +import { + EventIngestionOutcomeStatus, + InboundEventConsumerName, + type ReactionPlanRecord, +} from "../../state/src/index.ts"; +import { + CockroachEventIngestionStoreStatement, + createCockroachEventIngestionStore, + type CockroachEventIngestionSqlExecutor, 
+ type CockroachEventIngestionSqlStatement, +} from "../src/cockroach-event-ingestion-store.ts"; + +describe("cockroach event ingestion store", () => { + test("implements inbox receipt and reaction plan persistence behind a SQL executor", async () => { + const executor = createRecordingExecutor(); + const store = createCockroachEventIngestionStore({ + executor, + }); + + equal( + await store.findInboxReceipt({ + eventId: "evt-supervisor-signal-001", + consumerName: InboundEventConsumerName.V0AutomationPlanner, + }), + undefined, + ); + await store.recordEventProcessingOutcome({ + receipt: { + eventId: "evt-supervisor-signal-001", + consumerName: InboundEventConsumerName.V0AutomationPlanner, + firstSeenAt: "2026-05-25T22:00:00.000Z", + payloadHash: "hash-evt-supervisor-signal-001", + }, + reactionPlans: [createReactionPlanRecord()], + processedAt: "2026-05-25T22:00:00.000Z", + result: EventIngestionOutcomeStatus.Processed, + }); + + deepEqual( + executor.statements.map((statement) => statement.name), + [ + CockroachEventIngestionStoreStatement.FindInboxReceipt, + CockroachEventIngestionStoreStatement.InsertInboxReceipt, + CockroachEventIngestionStoreStatement.InsertReactionPlan, + CockroachEventIngestionStoreStatement.MarkInboxReceiptProcessed, + ], + ); + }); +}); + +type RecordingCockroachEventIngestionSqlExecutor = CockroachEventIngestionSqlExecutor & { + statements: CockroachEventIngestionSqlStatement[]; +}; + +function createRecordingExecutor(): RecordingCockroachEventIngestionSqlExecutor { + const statements: CockroachEventIngestionSqlStatement[] = []; + + return { + statements, + execute: async (statement) => { + statements.push(statement); + + return { + rows: [], + }; + }, + }; +} + +function createReactionPlanRecord(): ReactionPlanRecord { + return { + reactionPlanId: "reaction-plan-001", + consumerName: InboundEventConsumerName.V0AutomationPlanner, + createdAt: "2026-05-25T22:00:00.000Z", + status: ReactionPlanStatus.Planned, + action: { + actionType: 
ReactionPlanActionType.CreateSupervisorTriage, + triggerEventId: "evt-supervisor-signal-001", + organizationId: "org-lfg", + projectId: "project-agentic-org", + teamId: "team-runtime", + workItemId: "work-outbox-001", + supervisorSignalId: "supervisor-signal-001", + requiredHat: RequiredHat.EngineeringManager, + reason: ReactionPlanReason.SupervisorSignalNeedsTriage, + }, + }; +} diff --git a/agentic-organization/packages/state-cockroach/test/cockroach-schema.test.ts b/agentic-organization/packages/state-cockroach/test/cockroach-schema.test.ts index eb77a7fc97..25342193b6 100644 --- a/agentic-organization/packages/state-cockroach/test/cockroach-schema.test.ts +++ b/agentic-organization/packages/state-cockroach/test/cockroach-schema.test.ts @@ -16,12 +16,17 @@ describe("cockroach core state schema", () => { ok(migration.sql.includes(`CREATE TABLE IF NOT EXISTS ${CockroachTableName.SupervisorSignals}`)); ok(migration.sql.includes(`CREATE TABLE IF NOT EXISTS ${CockroachTableName.AuditEvents}`)); ok(migration.sql.includes(`CREATE TABLE IF NOT EXISTS ${CockroachTableName.OutboxEvents}`)); + ok(migration.sql.includes(`CREATE TABLE IF NOT EXISTS ${CockroachTableName.InboxReceipts}`)); + ok(migration.sql.includes(`CREATE TABLE IF NOT EXISTS ${CockroachTableName.ReactionPlans}`)); ok(migration.sql.includes(`CREATE TABLE IF NOT EXISTS ${CockroachTableName.IdempotencyRecords}`)); ok(migration.sql.includes("trace_id STRING NOT NULL")); ok(migration.sql.includes("correlation_id STRING NOT NULL")); ok(migration.sql.includes("envelope_json JSONB NOT NULL")); ok(migration.sql.includes("claimed_at TIMESTAMPTZ")); ok(migration.sql.includes("claim_expires_at TIMESTAMPTZ")); + ok(migration.sql.includes("PRIMARY KEY (event_id, consumer_name)")); + ok(migration.sql.includes("status STRING NOT NULL")); + ok(migration.sql.includes("action_json JSONB NOT NULL")); ok(migration.sql.includes("result_json JSONB NOT NULL")); }); }); diff --git 
a/agentic-organization/packages/state/src/event-ingestion-store.ts b/agentic-organization/packages/state/src/event-ingestion-store.ts new file mode 100644 index 0000000000..68ec2ae570 --- /dev/null +++ b/agentic-organization/packages/state/src/event-ingestion-store.ts @@ -0,0 +1,84 @@ +import type { ReactionPlanAction, ReactionPlanStatus } from "../../domain/src/index.ts"; + +export const InboundEventConsumerName = { + V0AutomationPlanner: "v0_automation_planner", +} as const; + +export type InboundEventConsumerName = (typeof InboundEventConsumerName)[keyof typeof InboundEventConsumerName]; + +export const EventIngestionOutcomeStatus = { + Duplicate: "duplicate", + PayloadConflict: "payload_conflict", + Processed: "processed", +} as const; + +export type EventIngestionOutcomeStatus = + (typeof EventIngestionOutcomeStatus)[keyof typeof EventIngestionOutcomeStatus]; + +export type InboxReceiptLookup = { + eventId: string; + consumerName: InboundEventConsumerName; +}; + +export type InboxReceiptRecord = InboxReceiptLookup & { + firstSeenAt: string; + payloadHash: string; + processedAt?: string; + result?: EventIngestionOutcomeStatus; +}; + +export type ReactionPlanRecord = { + reactionPlanId: string; + consumerName: InboundEventConsumerName; + createdAt: string; + status: ReactionPlanStatus; + action: ReactionPlanAction; +}; + +export type RecordEventProcessingOutcomeInput = { + receipt: InboxReceiptRecord; + reactionPlans: readonly ReactionPlanRecord[]; + processedAt: string; + result: EventIngestionOutcomeStatus; +}; + +export type EventIngestionStore = { + findInboxReceipt: (lookup: InboxReceiptLookup) => Promise<InboxReceiptRecord | undefined>; + recordEventProcessingOutcome: (input: RecordEventProcessingOutcomeInput) => Promise<void>; +}; + +export type InMemoryEventIngestionStoreSnapshot = { + readonly inboxReceipts: readonly InboxReceiptRecord[]; + readonly reactionPlans: readonly ReactionPlanRecord[]; +}; + +export type InMemoryEventIngestionStore = EventIngestionStore & { + readonly snapshot:
InMemoryEventIngestionStoreSnapshot; +}; + +export function createInMemoryEventIngestionStore(): InMemoryEventIngestionStore { + const inboxReceipts = new Map<string, InboxReceiptRecord>(); + const reactionPlans: ReactionPlanRecord[] = []; + + return { + get snapshot() { + return { + inboxReceipts: [...inboxReceipts.values()], + reactionPlans, + }; + }, + findInboxReceipt: async (lookup) => inboxReceipts.get(createInboxReceiptKey(lookup)), + recordEventProcessingOutcome: async (input) => { + inboxReceipts.set(createInboxReceiptKey(input.receipt), { + ...input.receipt, + processedAt: input.processedAt, + result: input.result, + }); + reactionPlans.push(...input.reactionPlans); + }, + }; +} + +function createInboxReceiptKey(input: InboxReceiptLookup): string { + return `${input.consumerName}:${input.eventId}`; +} diff --git a/agentic-organization/packages/state/src/index.ts b/agentic-organization/packages/state/src/index.ts index 1b1611aa83..c91de4b605 100644 --- a/agentic-organization/packages/state/src/index.ts +++ b/agentic-organization/packages/state/src/index.ts @@ -3,6 +3,18 @@ export { type InMemoryOrganizationStoreFactory, type InMemoryOrganizationStoreSnapshot, } from "./in-memory-organization-store.ts"; +export { + EventIngestionOutcomeStatus, + InboundEventConsumerName, + createInMemoryEventIngestionStore, + type EventIngestionStore, + type InboxReceiptLookup, + type InboxReceiptRecord, + type InMemoryEventIngestionStore, + type InMemoryEventIngestionStoreSnapshot, + type ReactionPlanRecord, + type RecordEventProcessingOutcomeInput, +} from "./event-ingestion-store.ts"; export type { ClaimUnpublishedOutboxEventsInput, MarkOutboxEventPublishedInput, diff --git a/openspec/specs/agentic-organization/spec.md b/openspec/specs/agentic-organization/spec.md index ae7f93e7e5..a51300d259 100644 --- a/openspec/specs/agentic-organization/spec.md +++ b/openspec/specs/agentic-organization/spec.md @@ -57,7 +57,8 @@ Organization state only by calling Organization commands.
Temporal, Drizzle, Postgres, or other runtime clients - **AND** a violation fails the test suite before the boundary can drift - **AND** state adapter source files are checked for forbidden imports - of messaging, NATS, JetStream, or other event transport clients + of runtime implementation packages, messaging, NATS, JetStream, or + other event transport clients #### Scenario: Tests are kept out of production source trees @@ -186,6 +187,48 @@ and a concrete event-publisher adapter. correlation ID, causation ID, trace ID, idempotency key, and outbox event ID +### Requirement: Inbound events are deduped before automation + +Organization event consumers MUST record inbox receipts before +automation side effects and MUST persist reaction plans instead of +executing privileged work directly. + +#### Scenario: New event is ingested by an automation consumer + +- **WHEN** a decoded canonical event envelope reaches the runtime event + ingestion processor +- **THEN** the processor checks for an inbox receipt by event ID and + consumer name +- **AND** a missing receipt allows rule evaluation +- **AND** the processor records the inbox receipt and generated reaction + plans through one store operation +- **AND** the reaction plans preserve the triggering event ID, target + scope, required hat, action type, and reason + +#### Scenario: Duplicate event is ingested by an automation consumer + +- **WHEN** the same event ID reaches the same consumer again +- **THEN** the processor returns a duplicate outcome +- **AND** no automation rules are re-evaluated +- **AND** no duplicate reaction plans are created + +#### Scenario: Conflicting event payload is ingested by an automation consumer + +- **WHEN** the same event ID reaches the same consumer with a different + payload hash +- **THEN** the processor returns a payload-conflict outcome +- **AND** no automation rules are re-evaluated +- **AND** no duplicate reaction plans are created + +#### Scenario: Durable state schema supports 
inbound event dedupe + +- **WHEN** the durable state migration contract is loaded +- **THEN** it declares inbox receipt storage keyed by event ID and + consumer name +- **AND** it declares reaction plan storage for generated automation + plans +- **AND** reaction plans include a persisted status + ### Requirement: Telemetry is complete at the event boundary Organization packages MUST expose OpenTelemetry-compatible attributes From c9f3d189e268e153ea4e83a0bbe4002cac47d473 Mon Sep 17 00:00:00 2001 From: Max Chadaev Date: Mon, 25 May 2026 20:16:53 -0400 Subject: [PATCH 13/21] feat(agentic-org): add worker composition host Co-Authored-By: Codex --- .../docs/FIRST_IMPLEMENTATION_SLICE.md | 22 +- .../docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md | 53 ++- agentic-organization/packages/README.md | 16 + .../src/package-dependency-boundaries.ts | 1 + .../package-dependency-boundaries.test.ts | 25 ++ .../packages/workers/src/index.ts | 12 + .../packages/workers/src/worker-host.ts | 218 ++++++++++ .../packages/workers/test/worker-host.test.ts | 389 ++++++++++++++++++ openspec/specs/agentic-organization/spec.md | 41 ++ 9 files changed, 756 insertions(+), 21 deletions(-) create mode 100644 agentic-organization/packages/workers/src/index.ts create mode 100644 agentic-organization/packages/workers/src/worker-host.ts create mode 100644 agentic-organization/packages/workers/test/worker-host.test.ts diff --git a/agentic-organization/docs/FIRST_IMPLEMENTATION_SLICE.md b/agentic-organization/docs/FIRST_IMPLEMENTATION_SLICE.md index 81937eaaa8..22fda9efa9 100644 --- a/agentic-organization/docs/FIRST_IMPLEMENTATION_SLICE.md +++ b/agentic-organization/docs/FIRST_IMPLEMENTATION_SLICE.md @@ -35,6 +35,7 @@ send_supervisor_signal -> event ingestion processor -> inbox receipt / consumer dedupe -> persisted reaction plans + -> worker host cycle summary -> LGTM span attributes -> supervisor triage reaction plan ``` @@ -51,6 +52,7 @@ send_supervisor_signal | `@agentic-org/messaging-nats` | NATS 
JetStream event publisher adapter contract with canonical JSON payloads, headers, and message IDs | | `@agentic-org/observability` | OpenTelemetry/LGTM span attribute projection | | `@agentic-org/runtime` | first rule that plans triage for the target supervisor when a chain signal is sent | +| `@agentic-org/workers` | process-boundary run-once worker host that composes outbox publishing and inbound event ingestion through ports | | `@agentic-org/governance` | package dependency-boundary checks that prevent application code from importing concrete state/runtime adapters | ## NodeNext Runtime Decision @@ -129,6 +131,15 @@ Hermes runs, MCP calls, and UI evidence. tables plus a SQL-backed event-ingestion store. This is still behind a generic state port; live NATS consumers are not hardcoded into the adapter. +- The worker host now runs one bounded outbox cycle plus one bounded + inbound-ingestion cycle through explicit ports, then returns an + idle/worked/degraded summary suitable for future logs, metrics, and UI + workflow visibility. If one lane fails, the other lane still runs and + the failure is returned as typed cycle data. +- A governance test enforces that worker source does not import the + Cockroach adapter, NATS adapter, NestJS, NATS, Dapr, Temporal, + Drizzle, or Postgres clients. Worker code remains a process boundary, + not a concrete infrastructure host. - Duplicate commands with the same idempotency key and request hash replay the stored result. - Duplicate commands with the same idempotency key and a different @@ -140,11 +151,12 @@ Hermes runs, MCP calls, and UI evidence. ## Next Slice -The next slice should wire an in-process worker host that composes the -outbox publisher and event ingestion processor behind explicit ports. -After that, add a transactional durable-state adapter integration test -using CockroachDB as the first cluster-backed implementation once a -local/dev connection is available. 
+The next slice should add the live NATS inbound consumer adapter behind +the existing worker-host `InboundEventSource` port, keeping canonical +envelope decoding, ack decisions, and DLQ behavior outside the runtime +rule processor. After that, add a transactional durable-state adapter +integration test using CockroachDB as the first cluster-backed +implementation once a local/dev connection is available. Do not make the next slice a pile of bespoke request commands. Build the generic supervisor triage lifecycle first, then let specialized diff --git a/agentic-organization/docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md b/agentic-organization/docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md index 026cdf21e2..37798589b7 100644 --- a/agentic-organization/docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md +++ b/agentic-organization/docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md @@ -166,21 +166,22 @@ calling them. ### Layer 3: State, Messaging, and Runtime Adapters -| Package | Owns | -| ---------------------------------------- | ----------------------------------------------------------------------------------------------- | -| `@agentic-org/state` | generic state-store, outbox-source, inbox, idempotency, transaction, and lease ports | -| `@agentic-org/state-cockroach` | first replaceable durable SQL implementation of state-store and outbox-source ports | -| `@agentic-org/messaging` | NATS envelope builder, subject builder, JetStream publisher, consumer, DLQ, replay contracts | -| `@agentic-org/messaging-nats` | NATS JetStream implementation of the event publisher port, canonical JSON, headers, message IDs | -| `@agentic-org/workflows-temporal` | Temporal workflow and activity contracts, task queues, workflow clients | -| `@agentic-org/actors-dapr` | Dapr actor interfaces, actor implementations, reminders, actor state projection | -| `@agentic-org/mcp` | MCP schemas, tool registry, preflight checks, policy-checked tool handlers | -| `@agentic-org/hermes` | Hermes session adapter, run adapter, 
callback contract, run context builder | -| `@agentic-org/memory` | Hindsight adapter, hat-scoped recall/retain/reflect, memory attribution, memory health | -| `@agentic-org/k8s-hats` | generated or checked Hat, HatBinding, HatSwap, HatPolicy types, informers, projection decoding | -| `@agentic-org/openziti` | OpenZiti transport adapter, identity/config access, connectivity checks | -| `@agentic-org/credential-proxy` | credential request adapter, scoped credential use, audit hooks | -| `@agentic-org/adapters-agentic-services` | temporary wrappers around reused `agentic-services` primitives | +| Package | Owns | +| ---------------------------------------- | -------------------------------------------------------------------------------------------------------------- | +| `@agentic-org/state` | generic state-store, outbox-source, inbox, idempotency, transaction, and lease ports | +| `@agentic-org/state-cockroach` | first replaceable durable SQL implementation of state-store and outbox-source ports | +| `@agentic-org/messaging` | NATS envelope builder, subject builder, JetStream publisher, consumer, DLQ, replay contracts | +| `@agentic-org/messaging-nats` | NATS JetStream implementation of the event publisher port, canonical JSON, headers, message IDs | +| `@agentic-org/workers` | small worker process boundary that composes outbox publishing, inbound ingestion, and schedulers through ports | +| `@agentic-org/workflows-temporal` | Temporal workflow and activity contracts, task queues, workflow clients | +| `@agentic-org/actors-dapr` | Dapr actor interfaces, actor implementations, reminders, actor state projection | +| `@agentic-org/mcp` | MCP schemas, tool registry, preflight checks, policy-checked tool handlers | +| `@agentic-org/hermes` | Hermes session adapter, run adapter, callback contract, run context builder | +| `@agentic-org/memory` | Hindsight adapter, hat-scoped recall/retain/reflect, memory attribution, memory health | +| `@agentic-org/k8s-hats` | generated 
or checked Hat, HatBinding, HatSwap, HatPolicy types, informers, projection decoding | +| `@agentic-org/openziti` | OpenZiti transport adapter, identity/config access, connectivity checks | +| `@agentic-org/credential-proxy` | credential request adapter, scoped credential use, audit hooks | +| `@agentic-org/adapters-agentic-services` | temporary wrappers around reused `agentic-services` primitives | Adapters are replaceable. The Organization should be able to run a V0 slice with in-process fakes, then swap in Temporal, Dapr, Hermes, @@ -251,6 +252,16 @@ inbox, and lease adapters must be able to perform real I/O without changing command-handler contracts. CockroachDB is the first durable SQL adapter in the cluster, not an application-layer dependency. +The first worker boundary follows the same rule. `@agentic-org/workers` +does not create NATS clients, Cockroach clients, Nest modules, Temporal +workers, or Dapr actors. It receives an outbox publisher, an inbound +event source, and an event-ingestion processor through ports, runs one +bounded cycle, and returns an idle/worked/degraded summary. A failure in +one lane is captured as typed cycle data while the other lane still gets +a chance to run. `apps/workers` will later bind those ports to real +cluster adapters and attach process concerns such as health checks, +metrics, structured logs, readiness, and graceful shutdown. + ## SOLID Rules ### Single Responsibility @@ -485,6 +496,14 @@ silently suppress a reaction plan that failed to persist. The processor also compares payload hashes for repeated `eventId + consumerName` pairs; conflicting payloads are not treated as normal duplicates. +A worker host composes that ingestion processor with the outbox +publisher but stays below the NestJS process layer. This creates a +testable boundary where later live NATS consumers can be added as +`InboundEventSource` implementations without changing rule evaluation or +reaction-plan persistence. 
The current source port is replayable pull +only; live NATS ack, nack, checkpoint, backoff, and DLQ behavior remains +owned by the transport adapter and `apps/workers` process host. + ### Stream and Consumer Manifests Every stream and durable consumer should declare: @@ -817,7 +836,9 @@ other work. 6. Add NATS outbox publisher and one consumer after command tests pass. 7. Add inbox/consumer dedupe before any NATS-driven automation performs side effects. The first package-level processor and Cockroach adapter - now exist; the live NATS consumer host is still pending. + now exist; the first package-level worker host composes the outbox and + inbound-ingestion loops through ports, while the live NATS consumer + adapter is still pending. 8. Add the first rule catalog and reaction executor for ready work, review staffing, QA staffing, blocker escalation, and late run incidents. diff --git a/agentic-organization/packages/README.md b/agentic-organization/packages/README.md index e1ad02ba82..294947f830 100644 --- a/agentic-organization/packages/README.md +++ b/agentic-organization/packages/README.md @@ -17,6 +17,7 @@ host, or Kubernetes deployment is introduced. | `messaging-nats` | NATS JetStream event publisher adapter contract with canonical JSON payloads, headers, and message IDs | | `observability` | LGTM/OpenTelemetry attribute projection from Agentic event envelopes | | `runtime` | first event-to-automation reaction rule | +| `workers` | process-boundary worker host that composes outbox publishing and inbound ingestion through ports only | | `governance` | package dependency-boundary checks that keep core packages SOLID and adapter-free | ## Slice Rule @@ -35,6 +36,7 @@ supervisor-chain signal command -> event ingestion processor -> inbox receipt / consumer dedupe -> persisted reaction plans + -> worker host run-once cycle -> automation reaction plan ``` @@ -79,6 +81,20 @@ payload hash, the processor returns a payload-conflict outcome instead of hiding the drift. 
Live NATS consumers will bind to this processor later. +The worker host composes the outbox publisher and inbound event +ingestion processor behind a small run-once boundary. It does not know +about live NATS clients, CockroachDB clients, NestJS modules, Temporal, +or Dapr. Future runtime processes should provide concrete ports from the +composition layer and use the returned idle/worked/degraded cycle +summary for logs, metrics, health checks, and UI-visible workflow +telemetry. A failed outbox or inbound lane is reported as degraded +instead of hiding the failure or starving the other lane. + +`InboundEventSource` is intentionally only a replayable pull port in +this package. Live NATS ack, nack, checkpoint, backoff, and DLQ behavior +belongs in the future NATS consumer adapter so transport policy does not +leak into runtime rule evaluation. + ## Validation Run the package tests from `agentic-organization/`: diff --git a/agentic-organization/packages/governance/src/package-dependency-boundaries.ts b/agentic-organization/packages/governance/src/package-dependency-boundaries.ts index 52fa4f343f..4348b41828 100644 --- a/agentic-organization/packages/governance/src/package-dependency-boundaries.ts +++ b/agentic-organization/packages/governance/src/package-dependency-boundaries.ts @@ -7,6 +7,7 @@ export const PackageBoundaryRule = { Messaging: "messaging", ProductionSource: "production_source", StateAdapter: "state_adapter", + Workers: "workers", } as const; export type PackageBoundaryRule = (typeof PackageBoundaryRule)[keyof typeof PackageBoundaryRule]; diff --git a/agentic-organization/packages/governance/test/package-dependency-boundaries.test.ts b/agentic-organization/packages/governance/test/package-dependency-boundaries.test.ts index 14c1a384dc..13ca12b1a8 100644 --- a/agentic-organization/packages/governance/test/package-dependency-boundaries.test.ts +++ b/agentic-organization/packages/governance/test/package-dependency-boundaries.test.ts @@ -49,6 +49,25 @@ 
describe("package dependency boundaries", () => { "jetstream", ], }, + { + packageName: PackageBoundaryRule.Workers, + sourceGlob: "workers/src/**/*.ts", + forbiddenImportFragments: [ + "../../state-cockroach", + "../state-cockroach", + "../../messaging-nats", + "../messaging-nats", + "nestjs", + "@nestjs", + "nats", + "jetstream", + "dapr", + "temporal", + "drizzle", + "pg", + "postgres", + ], + }, ], }); @@ -113,6 +132,12 @@ describe("package dependency boundaries", () => { forbiddenFileSuffix: ".test.ts", reason: PackageSourceLayoutViolationReason.TestFileInProductionSource, }, + { + packageName: PackageBoundaryRule.ProductionSource, + sourceGlob: "workers/src/**/*.ts", + forbiddenFileSuffix: ".test.ts", + reason: PackageSourceLayoutViolationReason.TestFileInProductionSource, + }, ], }); diff --git a/agentic-organization/packages/workers/src/index.ts b/agentic-organization/packages/workers/src/index.ts new file mode 100644 index 0000000000..5d60c85835 --- /dev/null +++ b/agentic-organization/packages/workers/src/index.ts @@ -0,0 +1,12 @@ +export { + WorkerCycleStatus, + WorkerLane, + createOrganizationWorkerHost, + type CreateOrganizationWorkerHostInput, + type InboundEventSource, + type OrganizationWorkerHost, + type PullInboundEventsInput, + type WorkerCycleResult, + type WorkerInboundCycleSummary, + type WorkerPortFailure, +} from "./worker-host.ts"; diff --git a/agentic-organization/packages/workers/src/worker-host.ts b/agentic-organization/packages/workers/src/worker-host.ts new file mode 100644 index 0000000000..f73c20d1fb --- /dev/null +++ b/agentic-organization/packages/workers/src/worker-host.ts @@ -0,0 +1,218 @@ +import type { AgenticEventEnvelope } from "../../domain/src/index.ts"; +import { + OutboxPublishOutcomeStatus, + type OutboxPublishBatchResult, + type OutboxPublisher, +} from "../../messaging/src/index.ts"; +import type { EventIngestionProcessor } from "../../runtime/src/index.ts"; +import { EventIngestionOutcomeStatus } from 
"../../state/src/index.ts"; + +export const WorkerCycleStatus = { + Degraded: "degraded", + Idle: "idle", + Worked: "worked", +} as const; + +export type WorkerCycleStatus = (typeof WorkerCycleStatus)[keyof typeof WorkerCycleStatus]; + +export const WorkerLane = { + Inbound: "inbound", + Outbox: "outbox", +} as const; + +export type WorkerLane = (typeof WorkerLane)[keyof typeof WorkerLane]; + +export type PullInboundEventsInput = { + batchSize: number; +}; + +export type InboundEventSource = { + pullNextBatch: (input: PullInboundEventsInput) => Promise; +}; + +export type WorkerInboundCycleSummary = { + pulledCount: number; + processedCount: number; + duplicateCount: number; + payloadConflictCount: number; + failedCount: number; + reactionPlanCount: number; +}; + +export type WorkerPortFailure = { + lane: WorkerLane; + message: string; +}; + +export type WorkerCycleResult = { + status: WorkerCycleStatus; + outbox: OutboxPublishBatchResult | undefined; + inbound: WorkerInboundCycleSummary; + failures: readonly WorkerPortFailure[]; +}; + +export type OrganizationWorkerHost = { + runOnce: () => Promise; +}; + +export type CreateOrganizationWorkerHostInput = { + outboxPublisher: OutboxPublisher; + inboundEventSource: InboundEventSource; + eventIngestionProcessor: EventIngestionProcessor; + outboxBatchSize: number; + inboundBatchSize: number; +}; + +export function createOrganizationWorkerHost(input: CreateOrganizationWorkerHostInput): OrganizationWorkerHost { + return { + runOnce: async () => { + const failures: WorkerPortFailure[] = []; + const outbox = await publishOutboxBatch({ + outboxPublisher: input.outboxPublisher, + batchSize: input.outboxBatchSize, + failures, + }); + const inbound = await processInboundBatch({ + inboundEventSource: input.inboundEventSource, + batchSize: input.inboundBatchSize, + eventIngestionProcessor: input.eventIngestionProcessor, + failures, + }); + + return { + status: resolveWorkerCycleStatus({ + outbox, + inbound, + failures, + }), + 
outbox, + inbound, + failures, + }; + }, + }; +} + +type PublishOutboxBatchInput = { + outboxPublisher: OutboxPublisher; + batchSize: number; + failures: WorkerPortFailure[]; +}; + +async function publishOutboxBatch(input: PublishOutboxBatchInput): Promise { + try { + return await input.outboxPublisher.publishNextBatch({ + batchSize: input.batchSize, + }); + } catch (error) { + input.failures.push({ + lane: WorkerLane.Outbox, + message: extractErrorMessage(error), + }); + return undefined; + } +} + +type ProcessInboundBatchInput = { + inboundEventSource: InboundEventSource; + batchSize: number; + eventIngestionProcessor: EventIngestionProcessor; + failures: WorkerPortFailure[]; +}; + +async function processInboundBatch(input: ProcessInboundBatchInput): Promise { + try { + const envelopes = await input.inboundEventSource.pullNextBatch({ + batchSize: input.batchSize, + }); + + return await ingestInboundBatch({ + envelopes, + eventIngestionProcessor: input.eventIngestionProcessor, + failures: input.failures, + }); + } catch (error) { + input.failures.push({ + lane: WorkerLane.Inbound, + message: extractErrorMessage(error), + }); + return createEmptyInboundCycleSummary(0); + } +} + +type IngestInboundBatchInput = { + envelopes: readonly AgenticEventEnvelope[]; + eventIngestionProcessor: EventIngestionProcessor; + failures: WorkerPortFailure[]; +}; + +async function ingestInboundBatch(input: IngestInboundBatchInput): Promise { + const summary = createEmptyInboundCycleSummary(input.envelopes.length); + + for (const envelope of input.envelopes) { + try { + const result = await input.eventIngestionProcessor.ingest({ + envelope, + }); + + if (result.status === EventIngestionOutcomeStatus.Processed) { + summary.processedCount += 1; + } + + if (result.status === EventIngestionOutcomeStatus.Duplicate) { + summary.duplicateCount += 1; + } + + if (result.status === EventIngestionOutcomeStatus.PayloadConflict) { + summary.payloadConflictCount += 1; + } + + 
summary.reactionPlanCount += result.reactionPlans.length; + } catch (error) { + summary.failedCount += 1; + input.failures.push({ + lane: WorkerLane.Inbound, + message: extractErrorMessage(error), + }); + } + } + + return summary; +} + +function createEmptyInboundCycleSummary(pulledCount: number): WorkerInboundCycleSummary { + return { + pulledCount, + processedCount: 0, + duplicateCount: 0, + payloadConflictCount: 0, + failedCount: 0, + reactionPlanCount: 0, + }; +} + +type ResolveWorkerCycleStatusInput = { + outbox: OutboxPublishBatchResult | undefined; + inbound: WorkerInboundCycleSummary; + failures: readonly WorkerPortFailure[]; +}; + +function resolveWorkerCycleStatus(input: ResolveWorkerCycleStatusInput): WorkerCycleStatus { + if (input.failures.length > 0) { + return WorkerCycleStatus.Degraded; + } + + if (input.outbox?.status === OutboxPublishOutcomeStatus.Published || input.inbound.pulledCount > 0) { + return WorkerCycleStatus.Worked; + } + + return WorkerCycleStatus.Idle; +} + +function extractErrorMessage(error: unknown): string { + if (error instanceof Error) { + return error.message; + } + + return String(error); +} diff --git a/agentic-organization/packages/workers/test/worker-host.test.ts b/agentic-organization/packages/workers/test/worker-host.test.ts new file mode 100644 index 0000000000..f5c7015a17 --- /dev/null +++ b/agentic-organization/packages/workers/test/worker-host.test.ts @@ -0,0 +1,389 @@ +import { deepEqual, equal } from "node:assert/strict"; +import { describe, test } from "node:test"; + +import { + AgenticAggregateType, + AgenticEventType, + SupervisorChainLevel, + createAgenticEventEnvelope, + type AgenticEventEnvelope, + type OutboxEvent, +} from "../../domain/src/index.ts"; +import { + OutboxPublishOutcomeStatus, + createOutboxPublisher, + resolveAgenticMessagingDomain, + type EventPublication, + type OutboxPublishBatchResult, + type OutboxPublisher, +} from "../../messaging/src/index.ts"; +import { createEventIngestionProcessor, 
evaluateV0AutomationRules } from "../../runtime/src/index.ts"; +import { + EventIngestionOutcomeStatus, + InboundEventConsumerName, + createInMemoryEventIngestionStore, + type ReactionPlanRecord, +} from "../../state/src/index.ts"; +import { WorkerCycleStatus, WorkerLane, createOrganizationWorkerHost, type InboundEventSource } from "../src/index.ts"; + +describe("organization worker host", () => { + test("runs one bounded outbox and inbound ingestion cycle", async () => { + const inboundEnvelope = createInboundEnvelope(); + const outboxPublisher = createRecordingOutboxPublisher({ + status: OutboxPublishOutcomeStatus.Published, + attemptedCount: 1, + publishedOutboxEventIds: ["outbox-001"], + }); + const inboundEventSource = createRecordingInboundEventSource([inboundEnvelope]); + const eventIngestionProcessor = createRecordingEventIngestionProcessor(); + const workerHost = createOrganizationWorkerHost({ + outboxPublisher, + inboundEventSource, + eventIngestionProcessor, + outboxBatchSize: 25, + inboundBatchSize: 10, + }); + + const result = await workerHost.runOnce(); + + deepEqual(outboxPublisher.batchSizes, [25]); + deepEqual(inboundEventSource.batchSizes, [10]); + deepEqual(eventIngestionProcessor.ingestedEventIds, ["evt-inbound-001"]); + deepEqual(result, { + status: WorkerCycleStatus.Worked, + outbox: { + status: OutboxPublishOutcomeStatus.Published, + attemptedCount: 1, + publishedOutboxEventIds: ["outbox-001"], + }, + inbound: { + pulledCount: 1, + processedCount: 1, + duplicateCount: 0, + payloadConflictCount: 0, + failedCount: 0, + reactionPlanCount: 0, + }, + failures: [], + }); + }); + + test("composes outbox publishing and inbound ingestion without live NATS", async () => { + const publishedEvents: EventPublication[] = []; + const eventIngestionStore = createInMemoryEventIngestionStore(); + const outboxEvent = createOutboxEvent(createInboundEnvelope("evt-composed-001", createSupervisorSignalPayload())); + const workerHost = createOrganizationWorkerHost({ 
+ outboxPublisher: createOutboxPublisher({ + outboxSource: createSingleEventOutboxSource(outboxEvent), + eventPublisher: { + publish: async (publication) => { + publishedEvents.push(publication); + }, + }, + environment: "test", + resolveDomain: resolveAgenticMessagingDomain, + now: () => "2026-05-25T20:05:00.000Z", + }), + inboundEventSource: { + pullNextBatch: async () => publishedEvents.map((publication) => publication.outboxEvent.envelope), + }, + eventIngestionProcessor: createEventIngestionProcessor({ + store: eventIngestionStore, + evaluateRules: evaluateV0AutomationRules, + consumerName: InboundEventConsumerName.V0AutomationPlanner, + calculatePayloadHash: (envelope) => JSON.stringify(envelope.payload), + now: () => "2026-05-25T20:06:00.000Z", + createId: (prefix) => `${prefix}-001`, + }), + outboxBatchSize: 25, + inboundBatchSize: 10, + }); + + const result = await workerHost.runOnce(); + + equal(result.status, WorkerCycleStatus.Worked); + equal(publishedEvents.length, 1); + equal(result.outbox?.status, OutboxPublishOutcomeStatus.Published); + equal(result.inbound.processedCount, 1); + equal(result.inbound.reactionPlanCount, 1); + equal(eventIngestionStore.snapshot.inboxReceipts.length, 1); + equal(eventIngestionStore.snapshot.reactionPlans.length, 1); + }); + + test("reports idle when no outbox or inbound work is available", async () => { + const workerHost = createOrganizationWorkerHost({ + outboxPublisher: createRecordingOutboxPublisher({ + status: OutboxPublishOutcomeStatus.Empty, + attemptedCount: 0, + publishedOutboxEventIds: [], + }), + inboundEventSource: createRecordingInboundEventSource([]), + eventIngestionProcessor: createRecordingEventIngestionProcessor(), + outboxBatchSize: 25, + inboundBatchSize: 10, + }); + + const result = await workerHost.runOnce(); + + equal(result.status, WorkerCycleStatus.Idle); + equal(result.inbound.pulledCount, 0); + equal(result.outbox?.status, OutboxPublishOutcomeStatus.Empty); + }); + + test("summarizes duplicate 
and payload-conflict inbound outcomes without hiding them", async () => { + const eventIngestionProcessor = createRecordingEventIngestionProcessor([ + EventIngestionOutcomeStatus.Duplicate, + EventIngestionOutcomeStatus.PayloadConflict, + ]); + const workerHost = createOrganizationWorkerHost({ + outboxPublisher: createRecordingOutboxPublisher({ + status: OutboxPublishOutcomeStatus.Empty, + attemptedCount: 0, + publishedOutboxEventIds: [], + }), + inboundEventSource: createRecordingInboundEventSource([ + createInboundEnvelope("evt-duplicate-001"), + createInboundEnvelope("evt-conflict-001"), + ]), + eventIngestionProcessor, + outboxBatchSize: 25, + inboundBatchSize: 10, + }); + + const result = await workerHost.runOnce(); + + deepEqual(eventIngestionProcessor.ingestedEventIds, ["evt-duplicate-001", "evt-conflict-001"]); + deepEqual(result.inbound, { + pulledCount: 2, + processedCount: 0, + duplicateCount: 1, + payloadConflictCount: 1, + failedCount: 0, + reactionPlanCount: 0, + }); + equal(result.status, WorkerCycleStatus.Worked); + }); + + test("continues inbound ingestion when the outbox lane fails", async () => { + const inboundEnvelope = createInboundEnvelope("evt-after-outbox-failure-001"); + const eventIngestionProcessor = createRecordingEventIngestionProcessor(); + const workerHost = createOrganizationWorkerHost({ + outboxPublisher: createFailingOutboxPublisher("outbox unavailable"), + inboundEventSource: createRecordingInboundEventSource([inboundEnvelope]), + eventIngestionProcessor, + outboxBatchSize: 25, + inboundBatchSize: 10, + }); + + const result = await workerHost.runOnce(); + + equal(result.status, WorkerCycleStatus.Degraded); + deepEqual(eventIngestionProcessor.ingestedEventIds, ["evt-after-outbox-failure-001"]); + deepEqual(result.inbound, { + pulledCount: 1, + processedCount: 1, + duplicateCount: 0, + payloadConflictCount: 0, + failedCount: 0, + reactionPlanCount: 0, + }); + deepEqual(result.failures, [ + { + lane: WorkerLane.Outbox, + message: 
"outbox unavailable", + }, + ]); + }); + + test("continues the inbound batch when one event fails ingestion", async () => { + const eventIngestionProcessor = createRecordingEventIngestionProcessor( + [EventIngestionOutcomeStatus.Processed], + "ingestion failed", + ); + const workerHost = createOrganizationWorkerHost({ + outboxPublisher: createRecordingOutboxPublisher({ + status: OutboxPublishOutcomeStatus.Empty, + attemptedCount: 0, + publishedOutboxEventIds: [], + }), + inboundEventSource: createRecordingInboundEventSource([ + createInboundEnvelope("evt-ingest-ok-001"), + createInboundEnvelope("evt-ingest-fails-001"), + createInboundEnvelope("evt-ingest-ok-002"), + ]), + eventIngestionProcessor, + outboxBatchSize: 25, + inboundBatchSize: 10, + }); + + const result = await workerHost.runOnce(); + + equal(result.status, WorkerCycleStatus.Degraded); + deepEqual(eventIngestionProcessor.ingestedEventIds, [ + "evt-ingest-ok-001", + "evt-ingest-fails-001", + "evt-ingest-ok-002", + ]); + deepEqual(result.inbound, { + pulledCount: 3, + processedCount: 2, + duplicateCount: 0, + payloadConflictCount: 0, + failedCount: 1, + reactionPlanCount: 0, + }); + deepEqual(result.failures, [ + { + lane: WorkerLane.Inbound, + message: "ingestion failed", + }, + ]); + }); +}); + +type RecordingOutboxPublisher = OutboxPublisher & { + batchSizes: number[]; +}; + +function createRecordingOutboxPublisher(result: OutboxPublishBatchResult): RecordingOutboxPublisher { + const batchSizes: number[] = []; + + return { + batchSizes, + publishNextBatch: async (input) => { + batchSizes.push(input.batchSize); + return result; + }, + }; +} + +function createFailingOutboxPublisher(message: string): OutboxPublisher { + return { + publishNextBatch: async () => { + throw new Error(message); + }, + }; +} + +type RecordingInboundEventSource = InboundEventSource & { + batchSizes: number[]; +}; + +function createRecordingInboundEventSource(envelopes: readonly AgenticEventEnvelope[]): 
RecordingInboundEventSource { + const batchSizes: number[] = []; + + return { + batchSizes, + pullNextBatch: async (input) => { + batchSizes.push(input.batchSize); + return envelopes; + }, + }; +} + +type RecordingEventIngestionProcessor = { + ingestedEventIds: string[]; + ingest: (input: { envelope: AgenticEventEnvelope }) => Promise<{ + status: EventIngestionOutcomeStatus; + reactionPlans: readonly ReactionPlanRecord[]; + }>; +}; + +function createRecordingEventIngestionProcessor( + statuses: readonly EventIngestionOutcomeStatus[] = [EventIngestionOutcomeStatus.Processed], + failureMessage?: string, +): RecordingEventIngestionProcessor { + const ingestedEventIds: string[] = []; + let currentIndex = 0; + + return { + ingestedEventIds, + ingest: async (input) => { + ingestedEventIds.push(input.envelope.eventId); + if (failureMessage !== undefined && currentIndex === 1) { + currentIndex += 1; + throw new Error(failureMessage); + } + + const status = statuses[currentIndex] ?? EventIngestionOutcomeStatus.Processed; + currentIndex += 1; + + return { + status, + reactionPlans: [], + }; + }, + }; +} + +function createSingleEventOutboxSource(outboxEvent: OutboxEvent): { + claimUnpublishedOutboxEvents: () => Promise; + markOutboxEventPublished: () => Promise; +} { + let claimed = false; + + return { + claimUnpublishedOutboxEvents: async () => { + if (claimed) { + return []; + } + + claimed = true; + return [outboxEvent]; + }, + markOutboxEventPublished: async () => { + outboxEvent.publishedAt = "2026-05-25T20:05:00.000Z"; + }, + }; +} + +function createOutboxEvent(envelope: AgenticEventEnvelope): OutboxEvent { + return { + outboxEventId: "outbox-composed-001", + envelope, + }; +} + +function createSupervisorSignalPayload(): Record { + return { + targetHatAssignmentId: "hat-assignment-manager-001", + targetLevel: SupervisorChainLevel.Manager, + title: "Blocked on scoped NATS publisher", + }; +} + +function createInboundEnvelope( + eventId = "evt-inbound-001", + payload: 
Record<string, unknown> = { + title: "Blocked on scoped NATS publisher", + }, +): AgenticEventEnvelope { + return createAgenticEventEnvelope({ + eventId, + eventType: AgenticEventType.SupervisorSignalSent, + occurredAt: "2026-05-25T20:00:00.000Z", + actor: { + agentId: "agent-developer-001", + hatAssignmentId: "hat-assignment-dev-001", + }, + scope: { + organizationId: "org-lfg", + projectId: "project-agentic-org", + teamId: "team-runtime", + workItemId: "work-outbox-001", + }, + aggregate: { + aggregateId: "supervisor-signal-001", + aggregateType: AgenticAggregateType.SupervisorSignal, + aggregateVersion: 1, + }, + trace: { + commandId: "cmd-supervisor-signal-001", + correlationId: "corr-supervisor-signal-001", + causationId: "cause-team-work-001", + traceId: "trace-supervisor-signal-001", + idempotencyKey: "idem-supervisor-signal-001", + }, + payload, + }); +} diff --git a/openspec/specs/agentic-organization/spec.md b/openspec/specs/agentic-organization/spec.md index a51300d259..be4cc4a81d 100644 --- a/openspec/specs/agentic-organization/spec.md +++ b/openspec/specs/agentic-organization/spec.md @@ -229,6 +229,47 @@ executing privileged work directly. plans - **AND** reaction plans include a persisted status +### Requirement: Worker process boundary composes event loops through ports + +Organization worker code MUST remain a small composition boundary until +live infrastructure adapters are bound by a runtime host.
+ +#### Scenario: Worker runs one bounded cycle + +- **WHEN** the worker host is asked to run once +- **THEN** it publishes at most one bounded outbox batch through an + outbox publisher port +- **AND** it pulls at most one bounded inbound batch through an inbound + event source port +- **AND** it sends each decoded event envelope through the event + ingestion processor +- **AND** it returns a cycle summary with outbox status, inbound pulled + count, processed count, duplicate count, payload-conflict count, + failed count, reaction-plan count, and failure details +- **AND** it reports idle only when no outbox events are published and + no inbound events are pulled + +#### Scenario: Worker reports degraded lanes without starving other lanes + +- **WHEN** the outbox lane fails during a worker cycle +- **THEN** the worker still attempts the inbound ingestion lane +- **AND** the cycle result reports a degraded status with the outbox + failure message +- **AND** inbound processing counts remain visible +- **WHEN** one inbound event fails during ingestion +- **THEN** later inbound events in the same batch are still attempted +- **AND** the cycle result reports failed inbound count and failure + details + +#### Scenario: Worker source remains adapter-free + +- **WHEN** package dependency-boundary tests inspect worker source +- **THEN** worker source is checked for forbidden imports of the + Cockroach adapter, NATS adapter, NestJS, NATS, Dapr, Temporal, + Drizzle, Postgres, or other concrete runtime clients +- **AND** concrete process concerns are left for `apps/workers` or + adapter packages + ### Requirement: Telemetry is complete at the event boundary Organization packages MUST expose OpenTelemetry-compatible attributes From 3beb12fd93562191306181caa4ab6352c8323871 Mon Sep 17 00:00:00 2001 From: Max Chadaev Date: Mon, 25 May 2026 20:25:53 -0400 Subject: [PATCH 14/21] feat(agentic-org): add NATS inbound consumer adapter Co-Authored-By: Codex --- 
.../docs/FIRST_IMPLEMENTATION_SLICE.md | 26 +- .../docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md | 30 +- agentic-organization/packages/README.md | 18 +- .../packages/messaging-nats/src/index.ts | 14 + .../src/nats-jetstream-event-consumer.ts | 300 ++++++++++++++++++ .../nats-jetstream-event-consumer.test.ts | 290 +++++++++++++++++ .../packages/observability/src/index.ts | 7 + .../src/nats-consumer-attributes.ts | 59 ++++ .../test/nats-consumer-attributes.test.ts | 40 +++ openspec/specs/agentic-organization/spec.md | 39 +++ 10 files changed, 803 insertions(+), 20 deletions(-) create mode 100644 agentic-organization/packages/messaging-nats/src/nats-jetstream-event-consumer.ts create mode 100644 agentic-organization/packages/messaging-nats/test/nats-jetstream-event-consumer.test.ts create mode 100644 agentic-organization/packages/observability/src/nats-consumer-attributes.ts create mode 100644 agentic-organization/packages/observability/test/nats-consumer-attributes.test.ts diff --git a/agentic-organization/docs/FIRST_IMPLEMENTATION_SLICE.md b/agentic-organization/docs/FIRST_IMPLEMENTATION_SLICE.md index 22fda9efa9..cad035186d 100644 --- a/agentic-organization/docs/FIRST_IMPLEMENTATION_SLICE.md +++ b/agentic-organization/docs/FIRST_IMPLEMENTATION_SLICE.md @@ -31,6 +31,7 @@ send_supervisor_signal -> outbox event with canonical event envelope -> outbox publisher -> NATS JetStream event publisher adapter + -> NATS JetStream event consumer adapter -> NATS subject contract -> event ingestion processor -> inbox receipt / consumer dedupe @@ -49,8 +50,8 @@ send_supervisor_signal | `@agentic-org/state` | generic state-store/outbox-source ports plus the in-memory Organization state-store factory fake | | `@agentic-org/state-cockroach` | first replaceable durable SQL implementation of the state-store/outbox-source ports, backed by CockroachDB | | `@agentic-org/messaging` | stable `agentic-org....` subject builder, outbox publisher, event publisher port, and typed domain resolver | 
-| `@agentic-org/messaging-nats` | NATS JetStream event publisher adapter contract with canonical JSON payloads, headers, and message IDs | -| `@agentic-org/observability` | OpenTelemetry/LGTM span attribute projection | +| `@agentic-org/messaging-nats` | NATS JetStream publisher and consumer adapter contracts with canonical JSON payloads, headers, message IDs, ack/nack, termination, and DLQ policy | +| `@agentic-org/observability` | OpenTelemetry/LGTM span attribute projection for event envelopes and NATS consumer batch summaries | | `@agentic-org/runtime` | first rule that plans triage for the target supervisor when a chain signal is sent | | `@agentic-org/workers` | process-boundary run-once worker host that composes outbox publishing and inbound event ingestion through ports | | `@agentic-org/governance` | package dependency-boundary checks that prevent application code from importing concrete state/runtime adapters | @@ -122,6 +123,11 @@ Hermes runs, MCP calls, and UI evidence. the publish succeeds. - The NATS adapter publishes canonical JSON envelopes with typed headers and event IDs as message IDs for idempotent JetStream publication. +- The NATS consumer adapter decodes canonical JSON envelopes, preserves + the traceable event boundary into the runtime ingestion processor, + acknowledges processed and duplicate messages, terminates and + dead-letters invalid envelopes or payload conflicts, and + negative-acknowledges transient ingestion failures. - The event ingestion processor accepts decoded canonical envelopes, dedupes them by event ID plus consumer name, evaluates automation rules once, rejects same-event payload hash conflicts, and persists @@ -148,15 +154,19 @@ Hermes runs, MCP calls, and UI evidence. - Event envelopes reject missing command trace fields. - The first automation rule produces a supervisor triage plan, not an unreviewed side effect. 
+- Observability now exposes NATS consumer batch attributes for received, + processed, duplicate, payload-conflict, invalid, failed, + acknowledged, negative-acknowledged, terminated, and dead-lettered + counts. ## Next Slice -The next slice should add the live NATS inbound consumer adapter behind -the existing worker-host `InboundEventSource` port, keeping canonical -envelope decoding, ack decisions, and DLQ behavior outside the runtime -rule processor. After that, add a transactional durable-state adapter -integration test using CockroachDB as the first cluster-backed -implementation once a local/dev connection is available. +The next slice should add the first runnable `apps/workers` composition +host that binds the outbox publisher, NATS consumer adapter, runtime +ingestion processor, durable state adapters, and observability helpers +behind process configuration. After that, add a transactional +durable-state adapter integration test using CockroachDB as the first +cluster-backed implementation once a local/dev connection is available. Do not make the next slice a pile of bespoke request commands. Build the generic supervisor triage lifecycle first, then let specialized diff --git a/agentic-organization/docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md b/agentic-organization/docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md index 37798589b7..d3e025280b 100644 --- a/agentic-organization/docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md +++ b/agentic-organization/docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md @@ -171,7 +171,7 @@ calling them. 
| `@agentic-org/state` | generic state-store, outbox-source, inbox, idempotency, transaction, and lease ports | | `@agentic-org/state-cockroach` | first replaceable durable SQL implementation of state-store and outbox-source ports | | `@agentic-org/messaging` | NATS envelope builder, subject builder, JetStream publisher, consumer, DLQ, replay contracts | -| `@agentic-org/messaging-nats` | NATS JetStream implementation of the event publisher port, canonical JSON, headers, message IDs | +| `@agentic-org/messaging-nats` | NATS JetStream implementation of publisher and consumer ports, canonical JSON, headers, ack/nack, and DLQ | | `@agentic-org/workers` | small worker process boundary that composes outbox publishing, inbound ingestion, and schedulers through ports | | `@agentic-org/workflows-temporal` | Temporal workflow and activity contracts, task queues, workflow clients | | `@agentic-org/actors-dapr` | Dapr actor interfaces, actor implementations, reminders, actor state projection | @@ -498,11 +498,21 @@ pairs; conflicting payloads are not treated as normal duplicates. A worker host composes that ingestion processor with the outbox publisher but stays below the NestJS process layer. This creates a -testable boundary where later live NATS consumers can be added as -`InboundEventSource` implementations without changing rule evaluation or -reaction-plan persistence. The current source port is replayable pull -only; live NATS ack, nack, checkpoint, backoff, and DLQ behavior remains -owned by the transport adapter and `apps/workers` process host. +testable boundary where replayable inbound sources and live transport +consumers can both feed the same rule processor without changing rule +evaluation or reaction-plan persistence. The worker-host source port is +replayable pull only; live NATS ack, nack, checkpoint, backoff, and DLQ +behavior remains owned by the transport adapter and `apps/workers` +process host. 
+ +The first NATS consumer adapter is now the transport-policy boundary. It +decodes canonical JSON envelopes and calls the runtime ingestion +processor, but it owns JetStream-style decisions: ack processed and +duplicate messages, terminate plus dead-letter invalid envelopes and +payload conflicts, and negative-acknowledge transient ingestion +failures. This keeps runtime rules deterministic and transport-neutral +while still making live NATS behavior testable before a Nest worker +process exists. ### Stream and Consumer Manifests @@ -781,6 +791,10 @@ package should standardize: - workflow visibility records that project command/event context into UI- and agent-readable health, stage, trace, scope, aggregate, and weak-point indicator fields. +- NATS consumer batch attributes for stream, durable consumer, received, + processed, duplicate, payload-conflict, invalid, failed, + acknowledged, negative-acknowledged, terminated, and dead-lettered + counts. Every runtime host should be inspectable from either direction: @@ -837,8 +851,8 @@ other work. 7. Add inbox/consumer dedupe before any NATS-driven automation performs side effects. The first package-level processor and Cockroach adapter now exist; the first package-level worker host composes the outbox and - inbound-ingestion loops through ports, while the live NATS consumer - adapter is still pending. + inbound-ingestion loops through ports, and the NATS consumer adapter + owns live ack/nack/DLQ policy. 8. Add the first rule catalog and reaction executor for ready work, review staffing, QA staffing, blocker escalation, and late run incidents. diff --git a/agentic-organization/packages/README.md b/agentic-organization/packages/README.md index 294947f830..9af87531cc 100644 --- a/agentic-organization/packages/README.md +++ b/agentic-organization/packages/README.md @@ -14,8 +14,8 @@ host, or Kubernetes deployment is introduced. 
| `state` | generic state-store and outbox-source ports plus the in-memory Organization state-store factory fake | | `state-cockroach` | first replaceable durable SQL adapter for the state-store/outbox-source ports, backed by CockroachDB | | `messaging` | NATS subject contract, outbox publisher port, event publisher port, and domain resolver without a live NATS dependency | -| `messaging-nats` | NATS JetStream event publisher adapter contract with canonical JSON payloads, headers, and message IDs | -| `observability` | LGTM/OpenTelemetry attribute projection from Agentic event envelopes | +| `messaging-nats` | NATS JetStream publisher and consumer adapter contracts with canonical JSON, headers, ack/nack, and DLQ policy | +| `observability` | LGTM/OpenTelemetry attribute projection from Agentic event envelopes and NATS consumer batch summaries | | `runtime` | first event-to-automation reaction rule | | `workers` | process-boundary worker host that composes outbox publishing and inbound ingestion through ports only | | `governance` | package dependency-boundary checks that keep core packages SOLID and adapter-free | @@ -32,6 +32,7 @@ supervisor-chain signal command -> outbox event -> outbox publisher -> NATS JetStream event publisher adapter + -> NATS JetStream event consumer adapter -> NATS subject / telemetry contract -> event ingestion processor -> inbox receipt / consumer dedupe @@ -71,6 +72,15 @@ package implements that publisher port and is the only package in this slice that knows about NATS headers, message IDs, and JSON transport payloads. State adapters must not import messaging adapters. +The NATS consumer adapter owns live transport policy. It fetches a +bounded batch from a pull-consumer port, decodes canonical event +envelopes, calls the runtime event-ingestion processor, and then chooses +the transport action. Processed and duplicate messages are acknowledged. 
+Invalid envelopes and same-event payload conflicts are terminated and +published to a dead-letter port. Transient ingestion failures are +negative-acknowledged for retry. Runtime rule evaluation does not know +about ack, nack, termination, backoff, or DLQ mechanics. + The event ingestion processor owns the generic consume loop after a transport adapter has decoded a canonical event envelope. It checks an inbox receipt before evaluating rules, records the receipt and generated @@ -92,8 +102,8 @@ instead of hiding the failure or starving the other lane. `InboundEventSource` is intentionally only a replayable pull port in this package. Live NATS ack, nack, checkpoint, backoff, and DLQ behavior -belongs in the future NATS consumer adapter so transport policy does not -leak into runtime rule evaluation. +belongs in the NATS consumer adapter so transport policy does not leak +into runtime rule evaluation. ## Validation diff --git a/agentic-organization/packages/messaging-nats/src/index.ts b/agentic-organization/packages/messaging-nats/src/index.ts index 9bc7e8466b..ee81e3d166 100644 --- a/agentic-organization/packages/messaging-nats/src/index.ts +++ b/agentic-organization/packages/messaging-nats/src/index.ts @@ -5,3 +5,17 @@ export { type NatsJetStreamClient, type NatsJetStreamMessage, } from "./nats-jetstream-event-publisher.ts"; +export { + NatsDeadLetterReason, + NatsInboundMessageAckAction, + createNatsJetStreamEventConsumer, + type CreateNatsJetStreamEventConsumerInput, + type FetchNatsJetStreamBatchInput, + type NatsDeadLetterMessage, + type NatsDeadLetterPublisher, + type NatsJetStreamConsumeBatchResult, + type NatsJetStreamEventConsumer, + type NatsJetStreamInboundMessage, + type NatsJetStreamPullConsumer, + type ProcessNatsJetStreamBatchInput, +} from "./nats-jetstream-event-consumer.ts"; diff --git a/agentic-organization/packages/messaging-nats/src/nats-jetstream-event-consumer.ts 
b/agentic-organization/packages/messaging-nats/src/nats-jetstream-event-consumer.ts new file mode 100644 index 0000000000..e9d06b4b19 --- /dev/null +++ b/agentic-organization/packages/messaging-nats/src/nats-jetstream-event-consumer.ts @@ -0,0 +1,300 @@ +import { + AgenticAggregateType, + AgenticEventType, + EventSchemaVersion, + createAgenticEventEnvelope, + type AgenticEventEnvelope, +} from "../../domain/src/index.ts"; +import type { EventIngestionProcessor } from "../../runtime/src/index.ts"; +import { EventIngestionOutcomeStatus } from "../../state/src/index.ts"; + +export const NatsInboundMessageAckAction = { + Acknowledge: "acknowledge", + NegativeAcknowledge: "negative_acknowledge", + Terminate: "terminate", +} as const; + +export type NatsInboundMessageAckAction = + (typeof NatsInboundMessageAckAction)[keyof typeof NatsInboundMessageAckAction]; + +export const NatsDeadLetterReason = { + InvalidEnvelope: "invalid_envelope", + PayloadConflict: "payload_conflict", +} as const; + +export type NatsDeadLetterReason = (typeof NatsDeadLetterReason)[keyof typeof NatsDeadLetterReason]; + +export type FetchNatsJetStreamBatchInput = { + batchSize: number; +}; + +export type NatsJetStreamInboundMessage = { + subject: string; + payload: string; + headers: Record<string, string>; + acknowledge: () => Promise<void>; + negativeAcknowledge: () => Promise<void>; + terminate: () => Promise<void>; +}; + +export type NatsJetStreamPullConsumer = { + fetchNextBatch: (input: FetchNatsJetStreamBatchInput) => Promise<readonly NatsJetStreamInboundMessage[]>; +}; + +export type NatsDeadLetterMessage = { + sourceSubject: string; + payload: string; + headers: Record<string, string>; + reason: NatsDeadLetterReason; +}; + +export type NatsDeadLetterPublisher = { + publish: (message: NatsDeadLetterMessage) => Promise<void>; +}; + +export type ProcessNatsJetStreamBatchInput = { + batchSize: number; +}; + +export type NatsJetStreamConsumeBatchResult = { + receivedCount: number; + processedCount: number; + duplicateCount: number; + payloadConflictCount: number; + invalidCount: number; +
failedCount: number; + acknowledgedCount: number; + negativeAcknowledgedCount: number; + terminatedCount: number; + deadLetteredCount: number; +}; + +export type NatsJetStreamEventConsumer = { + processNextBatch: (input: ProcessNatsJetStreamBatchInput) => Promise<NatsJetStreamConsumeBatchResult>; +}; + +export type CreateNatsJetStreamEventConsumerInput = { + pullConsumer: NatsJetStreamPullConsumer; + eventIngestionProcessor: EventIngestionProcessor; + deadLetterPublisher: NatsDeadLetterPublisher; +}; + +export function createNatsJetStreamEventConsumer( + input: CreateNatsJetStreamEventConsumerInput, +): NatsJetStreamEventConsumer { + return { + processNextBatch: async ({ batchSize }) => { + const messages = await input.pullConsumer.fetchNextBatch({ + batchSize, + }); + const result = createEmptyConsumeBatchResult(messages.length); + + for (const message of messages) { + await processMessage({ + message, + eventIngestionProcessor: input.eventIngestionProcessor, + deadLetterPublisher: input.deadLetterPublisher, + result, + }); + } + + return result; + }, + }; +} + +type ProcessMessageInput = { + message: NatsJetStreamInboundMessage; + eventIngestionProcessor: EventIngestionProcessor; + deadLetterPublisher: NatsDeadLetterPublisher; + result: NatsJetStreamConsumeBatchResult; +}; + +async function processMessage(input: ProcessMessageInput): Promise<void> { + const envelope = decodeCanonicalEventEnvelope(input.message.payload); + + if (envelope === undefined) { + input.result.invalidCount += 1; + await terminateWithDeadLetter({ + message: input.message, + deadLetterPublisher: input.deadLetterPublisher, + result: input.result, + reason: NatsDeadLetterReason.InvalidEnvelope, + }); + return; + } + + try { + const ingestionResult = await input.eventIngestionProcessor.ingest({ + envelope, + }); + + if (ingestionResult.status === EventIngestionOutcomeStatus.PayloadConflict) { + input.result.payloadConflictCount += 1; + await terminateWithDeadLetter({ + message: input.message, + deadLetterPublisher:
input.deadLetterPublisher, + result: input.result, + reason: NatsDeadLetterReason.PayloadConflict, + }); + return; + } + + if (ingestionResult.status === EventIngestionOutcomeStatus.Duplicate) { + input.result.duplicateCount += 1; + } + + if (ingestionResult.status === EventIngestionOutcomeStatus.Processed) { + input.result.processedCount += 1; + } + + await input.message.acknowledge(); + input.result.acknowledgedCount += 1; + } catch { + input.result.failedCount += 1; + await input.message.negativeAcknowledge(); + input.result.negativeAcknowledgedCount += 1; + } +} + +type TerminateWithDeadLetterInput = { + message: NatsJetStreamInboundMessage; + deadLetterPublisher: NatsDeadLetterPublisher; + result: NatsJetStreamConsumeBatchResult; + reason: NatsDeadLetterReason; +}; + +async function terminateWithDeadLetter(input: TerminateWithDeadLetterInput): Promise<void> { + await input.deadLetterPublisher.publish({ + sourceSubject: input.message.subject, + payload: input.message.payload, + headers: input.message.headers, + reason: input.reason, + }); + input.result.deadLetteredCount += 1; + await input.message.terminate(); + input.result.terminatedCount += 1; +} + +function decodeCanonicalEventEnvelope(payload: string): AgenticEventEnvelope | undefined { + const parsed = parseJsonRecord(payload); + + if (parsed === undefined || !isCanonicalEventEnvelopeRecord(parsed)) { + return undefined; + } + + try { + return createAgenticEventEnvelope({ + eventId: parsed.eventId, + eventType: parsed.eventType, + schemaVersion: parsed.schemaVersion, + occurredAt: parsed.occurredAt, + actor: parsed.actor, + scope: parsed.scope, + aggregate: parsed.aggregate, + trace: parsed.trace, + replay: parsed.replay, + payload: parsed.payload, + }); + } catch { + return undefined; + } +} + +function parseJsonRecord(payload: string): Record<string, unknown> | undefined { + try { + const parsed: unknown = JSON.parse(payload); + + if (isRecord(parsed)) { + return parsed; + } + + return undefined; + } catch { + return
undefined; + } +} + +function isCanonicalEventEnvelopeRecord(value: Record<string, unknown>): value is AgenticEventEnvelope { + return ( + typeof value.eventId === "string" && + isAgenticEventType(value.eventType) && + value.schemaVersion === EventSchemaVersion.AgenticOrgEventV1 && + typeof value.occurredAt === "string" && + isActor(value.actor) && + isScope(value.scope) && + isAggregate(value.aggregate) && + isTrace(value.trace) && + isReplay(value.replay) + ); +} + +function isActor(value: unknown): value is AgenticEventEnvelope["actor"] { + return isRecord(value) && typeof value.agentId === "string" && typeof value.hatAssignmentId === "string"; +} + +function isScope(value: unknown): value is AgenticEventEnvelope["scope"] { + return ( + isRecord(value) && + typeof value.organizationId === "string" && + typeof value.projectId === "string" && + typeof value.workItemId === "string" && + isOptionalString(value.initiativeId) && + isOptionalString(value.teamId) + ); +} + +function isAggregate(value: unknown): value is AgenticEventEnvelope["aggregate"] { + return ( + isRecord(value) && + typeof value.aggregateId === "string" && + isAgenticAggregateType(value.aggregateType) && + typeof value.aggregateVersion === "number" + ); +} + +function isTrace(value: unknown): value is AgenticEventEnvelope["trace"] { + return ( + isRecord(value) && + typeof value.commandId === "string" && + typeof value.correlationId === "string" && + typeof value.causationId === "string" && + typeof value.traceId === "string" && + typeof value.idempotencyKey === "string" + ); +} + +function isReplay(value: unknown): value is AgenticEventEnvelope["replay"] { + return isRecord(value) && typeof value.isReplay === "boolean"; +} + +function isAgenticEventType(value: unknown): value is AgenticEventType { + return Object.values(AgenticEventType).includes(value as AgenticEventType); +} + +function isAgenticAggregateType(value: unknown): value is AgenticAggregateType { + return
Object.values(AgenticAggregateType).includes(value as AgenticAggregateType); +} + +function isOptionalString(value: unknown): boolean { + return value === undefined || typeof value === "string"; +} + +function isRecord(value: unknown): value is Record<string, unknown> { + return typeof value === "object" && value !== null; +} + +function createEmptyConsumeBatchResult(receivedCount: number): NatsJetStreamConsumeBatchResult { + return { + receivedCount, + processedCount: 0, + duplicateCount: 0, + payloadConflictCount: 0, + invalidCount: 0, + failedCount: 0, + acknowledgedCount: 0, + negativeAcknowledgedCount: 0, + terminatedCount: 0, + deadLetteredCount: 0, + }; +} diff --git a/agentic-organization/packages/messaging-nats/test/nats-jetstream-event-consumer.test.ts b/agentic-organization/packages/messaging-nats/test/nats-jetstream-event-consumer.test.ts new file mode 100644 index 0000000000..5c9a318fd1 --- /dev/null +++ b/agentic-organization/packages/messaging-nats/test/nats-jetstream-event-consumer.test.ts @@ -0,0 +1,290 @@ +import { deepEqual, equal } from "node:assert/strict"; +import { describe, test } from "node:test"; + +import { + AgenticAggregateType, + AgenticEventType, + createAgenticEventEnvelope, + type AgenticEventEnvelope, +} from "../../domain/src/index.ts"; +import type { EventIngestionProcessor } from "../../runtime/src/index.ts"; +import { EventIngestionOutcomeStatus } from "../../state/src/index.ts"; +import { + NatsDeadLetterReason, + NatsInboundMessageAckAction, + createNatsJetStreamEventConsumer, + type NatsJetStreamInboundMessage, + type NatsJetStreamPullConsumer, +} from "../src/index.ts"; + +describe("NATS JetStream event consumer", () => { + test("decodes canonical event envelopes and acknowledges processed messages", async () => { + const envelope = createEnvelope(); + const message = createRecordingInboundMessage({ + payload: JSON.stringify(envelope), + }); + const pullConsumer = createRecordingPullConsumer([message]); + const eventIngestionProcessor =
createRecordingEventIngestionProcessor(EventIngestionOutcomeStatus.Processed); + const deadLetterPublisher = createRecordingDeadLetterPublisher(); + const consumer = createNatsJetStreamEventConsumer({ + pullConsumer, + eventIngestionProcessor, + deadLetterPublisher, + }); + + const result = await consumer.processNextBatch({ + batchSize: 10, + }); + + deepEqual(pullConsumer.batchSizes, [10]); + deepEqual(eventIngestionProcessor.eventIds, ["evt-nats-001"]); + deepEqual(eventIngestionProcessor.traceFields, [ + { + eventId: "evt-nats-001", + traceId: "trace-nats-001", + correlationId: "corr-nats-001", + idempotencyKey: "idem-nats-001", + }, + ]); + deepEqual(message.ackActions, [NatsInboundMessageAckAction.Acknowledge]); + equal(deadLetterPublisher.messages.length, 0); + deepEqual(result, { + receivedCount: 1, + processedCount: 1, + duplicateCount: 0, + payloadConflictCount: 0, + invalidCount: 0, + failedCount: 0, + acknowledgedCount: 1, + negativeAcknowledgedCount: 0, + terminatedCount: 0, + deadLetteredCount: 0, + }); + }); + + test("terminates and dead-letters invalid payloads without calling runtime ingestion", async () => { + const message = createRecordingInboundMessage({ + payload: "{not-json", + }); + const eventIngestionProcessor = createRecordingEventIngestionProcessor(EventIngestionOutcomeStatus.Processed); + const deadLetterPublisher = createRecordingDeadLetterPublisher(); + const consumer = createNatsJetStreamEventConsumer({ + pullConsumer: createRecordingPullConsumer([message]), + eventIngestionProcessor, + deadLetterPublisher, + }); + + const result = await consumer.processNextBatch({ + batchSize: 10, + }); + + deepEqual(eventIngestionProcessor.eventIds, []); + deepEqual(message.ackActions, [NatsInboundMessageAckAction.Terminate]); + equal(result.invalidCount, 1); + equal(result.deadLetteredCount, 1); + deepEqual(deadLetterPublisher.messages, [ + { + sourceSubject: "agentic-org.local.org-lfg.supervisor_signal.supervisor_signal.sent", + payload: 
"{not-json", + headers: { + "Nats-Msg-Event-Id": "evt-nats-001", + }, + reason: NatsDeadLetterReason.InvalidEnvelope, + }, + ]); + }); + + test("terminates and dead-letters payload conflicts", async () => { + const envelope = createEnvelope(); + const message = createRecordingInboundMessage({ + payload: JSON.stringify(envelope), + }); + const deadLetterPublisher = createRecordingDeadLetterPublisher(); + const consumer = createNatsJetStreamEventConsumer({ + pullConsumer: createRecordingPullConsumer([message]), + eventIngestionProcessor: createRecordingEventIngestionProcessor(EventIngestionOutcomeStatus.PayloadConflict), + deadLetterPublisher, + }); + + const result = await consumer.processNextBatch({ + batchSize: 10, + }); + + deepEqual(message.ackActions, [NatsInboundMessageAckAction.Terminate]); + equal(result.payloadConflictCount, 1); + equal(result.deadLetteredCount, 1); + deepEqual(deadLetterPublisher.messages, [ + { + sourceSubject: "agentic-org.local.org-lfg.supervisor_signal.supervisor_signal.sent", + payload: JSON.stringify(envelope), + headers: { + "Nats-Msg-Event-Id": "evt-nats-001", + }, + reason: NatsDeadLetterReason.PayloadConflict, + }, + ]); + }); + + test("negative-acknowledges transient ingestion failures", async () => { + const message = createRecordingInboundMessage({ + payload: JSON.stringify(createEnvelope()), + }); + const consumer = createNatsJetStreamEventConsumer({ + pullConsumer: createRecordingPullConsumer([message]), + eventIngestionProcessor: createFailingEventIngestionProcessor("store unavailable"), + deadLetterPublisher: createRecordingDeadLetterPublisher(), + }); + + const result = await consumer.processNextBatch({ + batchSize: 10, + }); + + deepEqual(message.ackActions, [NatsInboundMessageAckAction.NegativeAcknowledge]); + equal(result.failedCount, 1); + equal(result.negativeAcknowledgedCount, 1); + }); +}); + +function createEnvelope(): AgenticEventEnvelope { + return createAgenticEventEnvelope({ + eventId: "evt-nats-001", + 
eventType: AgenticEventType.SupervisorSignalSent, + occurredAt: "2026-05-25T20:00:00.000Z", + actor: { + agentId: "agent-developer-001", + hatAssignmentId: "hat-assignment-dev-001", + }, + scope: { + organizationId: "org-lfg", + projectId: "project-agentic-org", + workItemId: "work-nats-001", + }, + aggregate: { + aggregateId: "supervisor-signal-001", + aggregateType: AgenticAggregateType.SupervisorSignal, + aggregateVersion: 1, + }, + trace: { + commandId: "cmd-nats-001", + correlationId: "corr-nats-001", + causationId: "cause-nats-001", + traceId: "trace-nats-001", + idempotencyKey: "idem-nats-001", + }, + payload: { + title: "Blocked on NATS inbound adapter", + }, + }); +} + +function createRecordingInboundMessage(input: { payload: string }): NatsJetStreamInboundMessage & { + ackActions: NatsInboundMessageAckAction[]; +} { + const ackActions: NatsInboundMessageAckAction[] = []; + + return { + subject: "agentic-org.local.org-lfg.supervisor_signal.supervisor_signal.sent", + payload: input.payload, + headers: { + "Nats-Msg-Event-Id": "evt-nats-001", + }, + ackActions, + acknowledge: async () => { + ackActions.push(NatsInboundMessageAckAction.Acknowledge); + }, + negativeAcknowledge: async () => { + ackActions.push(NatsInboundMessageAckAction.NegativeAcknowledge); + }, + terminate: async () => { + ackActions.push(NatsInboundMessageAckAction.Terminate); + }, + }; +} + +function createRecordingPullConsumer(messages: readonly NatsJetStreamInboundMessage[]): NatsJetStreamPullConsumer & { + batchSizes: number[]; +} { + const batchSizes: number[] = []; + + return { + batchSizes, + fetchNextBatch: async (input) => { + batchSizes.push(input.batchSize); + return messages; + }, + }; +} + +function createRecordingEventIngestionProcessor(status: EventIngestionOutcomeStatus): EventIngestionProcessor & { + eventIds: string[]; + traceFields: { + eventId: string; + traceId: string; + correlationId: string; + idempotencyKey: string; + }[]; +} { + const eventIds: string[] = []; + 
    const traceFields: { + eventId: string; + traceId: string; + correlationId: string; + idempotencyKey: string; + }[] = []; + + return { + eventIds, + traceFields, + ingest: async (input) => { + eventIds.push(input.envelope.eventId); + traceFields.push({ + eventId: input.envelope.eventId, + traceId: input.envelope.trace.traceId, + correlationId: input.envelope.trace.correlationId, + idempotencyKey: input.envelope.trace.idempotencyKey, + }); + + return { + status, + reactionPlans: [], + }; + }, + }; +} + +function createFailingEventIngestionProcessor(message: string): EventIngestionProcessor { + return { + ingest: async () => { + throw new Error(message); + }, + }; +} + +function createRecordingDeadLetterPublisher(): { + messages: { + sourceSubject: string; + payload: string; + headers: Record<string, string>; + reason: NatsDeadLetterReason; + }[]; + publish: (input: { + sourceSubject: string; + payload: string; + headers: Record<string, string>; + reason: NatsDeadLetterReason; + }) => Promise<void>; +} { + const messages: { + sourceSubject: string; + payload: string; + headers: Record<string, string>; + reason: NatsDeadLetterReason; + }[] = []; + + return { + messages, + publish: async (input) => { + messages.push(input); + }, + }; +} diff --git a/agentic-organization/packages/observability/src/index.ts b/agentic-organization/packages/observability/src/index.ts index 354ec11a1f..a17c33859d 100644 --- a/agentic-organization/packages/observability/src/index.ts +++ b/agentic-organization/packages/observability/src/index.ts @@ -5,6 +5,13 @@ export { type AgenticSpanAttributes, type BuildAgenticSpanAttributesInput, } from "./span-attributes.ts"; +export { + NatsConsumerAttributeKey, + buildNatsConsumerBatchAttributes, + type BuildNatsConsumerBatchAttributesInput, + type NatsConsumerBatchAttributes, + type NatsConsumerBatchCounts, +} from "./nats-consumer-attributes.ts"; export { VisibilityHealth, WeakPointIndicatorType, diff --git a/agentic-organization/packages/observability/src/nats-consumer-attributes.ts
b/agentic-organization/packages/observability/src/nats-consumer-attributes.ts new file mode 100644 index 0000000000..dd85b32e59 --- /dev/null +++ b/agentic-organization/packages/observability/src/nats-consumer-attributes.ts @@ -0,0 +1,59 @@ +import { MessagingSystemName } from "./span-attributes.ts"; + +export const NatsConsumerAttributeKey = { + MessagingSystem: "messaging.system", + StreamName: "messaging.nats.stream", + ConsumerName: "messaging.nats.consumer", + ReceivedCount: "agentic.nats.consumer.received_count", + ProcessedCount: "agentic.nats.consumer.processed_count", + DuplicateCount: "agentic.nats.consumer.duplicate_count", + PayloadConflictCount: "agentic.nats.consumer.payload_conflict_count", + InvalidCount: "agentic.nats.consumer.invalid_count", + FailedCount: "agentic.nats.consumer.failed_count", + AcknowledgedCount: "agentic.nats.consumer.acknowledged_count", + NegativeAcknowledgedCount: "agentic.nats.consumer.negative_acknowledged_count", + TerminatedCount: "agentic.nats.consumer.terminated_count", + DeadLetteredCount: "agentic.nats.consumer.dead_lettered_count", +} as const; + +export type NatsConsumerAttributeKey = (typeof NatsConsumerAttributeKey)[keyof typeof NatsConsumerAttributeKey]; + +export type NatsConsumerBatchCounts = { + receivedCount: number; + processedCount: number; + duplicateCount: number; + payloadConflictCount: number; + invalidCount: number; + failedCount: number; + acknowledgedCount: number; + negativeAcknowledgedCount: number; + terminatedCount: number; + deadLetteredCount: number; +}; + +export type BuildNatsConsumerBatchAttributesInput = NatsConsumerBatchCounts & { + streamName: string; + durableName: string; +}; + +export type NatsConsumerBatchAttributes = Record<string, string | number>; + +export function buildNatsConsumerBatchAttributes( + input: BuildNatsConsumerBatchAttributesInput, +): NatsConsumerBatchAttributes { + return { + [NatsConsumerAttributeKey.MessagingSystem]: MessagingSystemName.Nats, + [NatsConsumerAttributeKey.StreamName]:
input.streamName, + [NatsConsumerAttributeKey.ConsumerName]: input.durableName, + [NatsConsumerAttributeKey.ReceivedCount]: input.receivedCount, + [NatsConsumerAttributeKey.ProcessedCount]: input.processedCount, + [NatsConsumerAttributeKey.DuplicateCount]: input.duplicateCount, + [NatsConsumerAttributeKey.PayloadConflictCount]: input.payloadConflictCount, + [NatsConsumerAttributeKey.InvalidCount]: input.invalidCount, + [NatsConsumerAttributeKey.FailedCount]: input.failedCount, + [NatsConsumerAttributeKey.AcknowledgedCount]: input.acknowledgedCount, + [NatsConsumerAttributeKey.NegativeAcknowledgedCount]: input.negativeAcknowledgedCount, + [NatsConsumerAttributeKey.TerminatedCount]: input.terminatedCount, + [NatsConsumerAttributeKey.DeadLetteredCount]: input.deadLetteredCount, + }; +} diff --git a/agentic-organization/packages/observability/test/nats-consumer-attributes.test.ts b/agentic-organization/packages/observability/test/nats-consumer-attributes.test.ts new file mode 100644 index 0000000000..d64de48558 --- /dev/null +++ b/agentic-organization/packages/observability/test/nats-consumer-attributes.test.ts @@ -0,0 +1,40 @@ +import { deepEqual } from "node:assert/strict"; +import { describe, test } from "node:test"; + +import { buildNatsConsumerBatchAttributes } from "../src/index.ts"; + +describe("NATS consumer batch observability attributes", () => { + test("projects inbound consumer counts into LGTM-friendly attributes", () => { + deepEqual( + buildNatsConsumerBatchAttributes({ + durableName: "agentic-org-v0-automation-planner", + streamName: "agentic-org-events", + receivedCount: 6, + processedCount: 2, + duplicateCount: 1, + payloadConflictCount: 1, + invalidCount: 1, + failedCount: 1, + acknowledgedCount: 3, + negativeAcknowledgedCount: 1, + terminatedCount: 2, + deadLetteredCount: 2, + }), + { + "messaging.system": "nats", + "messaging.nats.stream": "agentic-org-events", + "messaging.nats.consumer": "agentic-org-v0-automation-planner", + 
"agentic.nats.consumer.received_count": 6, + "agentic.nats.consumer.processed_count": 2, + "agentic.nats.consumer.duplicate_count": 1, + "agentic.nats.consumer.payload_conflict_count": 1, + "agentic.nats.consumer.invalid_count": 1, + "agentic.nats.consumer.failed_count": 1, + "agentic.nats.consumer.acknowledged_count": 3, + "agentic.nats.consumer.negative_acknowledged_count": 1, + "agentic.nats.consumer.terminated_count": 2, + "agentic.nats.consumer.dead_lettered_count": 2, + }, + ); + }); +}); diff --git a/openspec/specs/agentic-organization/spec.md b/openspec/specs/agentic-organization/spec.md index be4cc4a81d..797c1372cc 100644 --- a/openspec/specs/agentic-organization/spec.md +++ b/openspec/specs/agentic-organization/spec.md @@ -187,6 +187,36 @@ and a concrete event-publisher adapter. correlation ID, causation ID, trace ID, idempotency key, and outbox event ID +#### Scenario: NATS adapter consumes a valid event + +- **WHEN** the NATS JetStream consumer adapter fetches a message with a + canonical event envelope +- **THEN** it decodes the envelope and sends it to the event ingestion + processor +- **AND** a processed event is acknowledged +- **AND** a duplicate event is acknowledged without treating it as a + transport failure + +#### Scenario: NATS adapter handles invalid or conflicting events + +- **WHEN** the NATS JetStream consumer adapter receives an invalid + envelope +- **THEN** runtime ingestion is not called +- **AND** the message is terminated and published to the dead-letter + port with an invalid-envelope reason +- **WHEN** runtime ingestion reports a payload conflict +- **THEN** the message is terminated and published to the dead-letter + port with a payload-conflict reason + +#### Scenario: NATS adapter retries transient ingestion failures + +- **WHEN** the event ingestion processor throws while handling a valid + envelope +- **THEN** the NATS JetStream consumer adapter negative-acknowledges the + message for retry +- **AND** the runtime rule 
processor does not know about NATS ack, nack, + termination, backoff, or DLQ mechanics + ### Requirement: Inbound events are deduped before automation Organization event consumers MUST record inbox receipts before @@ -282,6 +312,15 @@ for the full trace chain before live LGTM ingestion is wired. causation, trace, idempotency, actor, hat assignment, organization, project, work item, aggregate, and NATS destination fields +#### Scenario: NATS consumer batch is projected to telemetry + +- **WHEN** a NATS consumer batch result is projected to telemetry +- **THEN** the attributes include messaging system, stream, durable + consumer, received count, processed count, duplicate count, + payload-conflict count, invalid count, failed count, acknowledged + count, negative-acknowledged count, terminated count, and + dead-lettered count + ### Requirement: Workflow visibility records expose weak points Organization packages MUST project meaningful workflow movement into a From cde22a75f1c100c3478034046422ea5a812eb0c7 Mon Sep 17 00:00:00 2001 From: Max Chadaev Date: Mon, 25 May 2026 21:17:21 -0400 Subject: [PATCH 15/21] feat(agentic-org): add workers app runtime host Co-Authored-By: Codex --- agentic-organization/apps/workers/README.md | 36 +++ .../apps/workers/src/index.ts | 16 ++ .../apps/workers/src/worker-runtime.ts | 239 ++++++++++++++++++ .../apps/workers/test/worker-runtime.test.ts | 233 +++++++++++++++++ .../docs/FIRST_IMPLEMENTATION_SLICE.md | 27 +- .../docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md | 20 +- agentic-organization/package.json | 2 +- agentic-organization/packages/README.md | 2 +- .../src/package-dependency-boundaries.ts | 2 + .../package-dependency-boundaries.test.ts | 72 ++++++ .../packages/observability/src/index.ts | 6 + .../src/worker-cycle-attributes.ts | 41 +++ agentic-organization/tsconfig.json | 2 +- openspec/specs/agentic-organization/spec.md | 31 +++ 14 files changed, 719 insertions(+), 10 deletions(-) create mode 100644 
agentic-organization/apps/workers/README.md create mode 100644 agentic-organization/apps/workers/src/index.ts create mode 100644 agentic-organization/apps/workers/src/worker-runtime.ts create mode 100644 agentic-organization/apps/workers/test/worker-runtime.test.ts create mode 100644 agentic-organization/packages/observability/src/worker-cycle-attributes.ts diff --git a/agentic-organization/apps/workers/README.md b/agentic-organization/apps/workers/README.md new file mode 100644 index 0000000000..5aaf101a58 --- /dev/null +++ b/agentic-organization/apps/workers/README.md @@ -0,0 +1,36 @@ +# Agentic Organization Workers App + +`apps/workers` is the first runtime-host shell for Agentic +Organization. It is intentionally small and NodeNext-first so the +process boundary can be tested before NestJS, real NATS clients, +CockroachDB connection pools, Kubernetes manifests, or process +supervisors are introduced. + +## Responsibility + +The app composes existing packages. It does not own business rules. + +Current duties: + +- run the package-level Organization worker cycle; +- run the NATS JetStream consumer adapter cycle; +- pass configured NATS batch size, stream name, and durable consumer + name into the adapter boundary; +- emit worker-cycle and NATS-consumer batch telemetry records; +- return a healthy/degraded runtime result that makes failures visible + without starving the other loop. + +## Boundary + +`apps/workers` may bind adapter packages and process configuration. +Packages must not import this app. + +The app currently receives ports that tests can fake: + +- `OrganizationWorkerHost`; +- `NatsJetStreamEventConsumer`; +- `WorkerRuntimeTelemetrySink`. + +Future concrete process wiring can bind these ports to CockroachDB, +NATS, OTLP/logging, health checks, readiness checks, and graceful +shutdown without changing runtime rule evaluation. 
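The README's requirement that one failing loop must not starve the other can be shown in isolation. The sketch below is a hypothetical, simplified restatement of the `runOrganizationWorker`/`runNatsConsumer` pattern in this patch. It omits telemetry recording, and the names `runIsolated` and `runBothCycles` are illustrative only; they are not part of the package API.

```typescript
// Hypothetical stage names mirroring WorkerRuntimeFailureStage in this patch.
type Stage = "organization_worker" | "nats_consumer";
type Failure = { stage: Stage; message: string };

// Runs one cycle. A thrown error is recorded as a failure and turned into
// `undefined`, so the caller can still run the next cycle.
async function runIsolated<T>(stage: Stage, cycle: () => Promise<T>, failures: Failure[]): Promise<T | undefined> {
  try {
    return await cycle();
  } catch (error) {
    failures.push({ stage, message: error instanceof Error ? error.message : String(error) });
    return undefined;
  }
}

// Hypothetical helper: runs both cycles sequentially, worker first, then NATS,
// the same order runOnce uses in this patch.
async function runBothCycles<W, N>(
  workerCycle: () => Promise<W>,
  natsCycle: () => Promise<N>,
): Promise<{ worker: W | undefined; nats: N | undefined; failures: Failure[] }> {
  const failures: Failure[] = [];
  const worker = await runIsolated("organization_worker", workerCycle, failures);
  const nats = await runIsolated("nats_consumer", natsCycle, failures);
  return { worker, nats, failures };
}
```

Running the cycles one after the other keeps the sketch deterministic. A `Promise.allSettled` variant would also isolate failures, but it would run both cycles concurrently, which this patch does not do.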
diff --git a/agentic-organization/apps/workers/src/index.ts b/agentic-organization/apps/workers/src/index.ts new file mode 100644 index 0000000000..1b44e96993 --- /dev/null +++ b/agentic-organization/apps/workers/src/index.ts @@ -0,0 +1,16 @@ +export { + WorkerRuntimeConfigError, + WorkerRuntimeConfigErrorCode, + WorkerRuntimeFailureStage, + WorkerRuntimeStatus, + WorkerRuntimeTelemetryEventName, + createWorkerRuntime, + type CreateWorkerRuntimeInput, + type WorkerRuntime, + type WorkerRuntimeConfig, + type WorkerRuntimeFailure, + type WorkerRuntimeRunResult, + type WorkerRuntimeTelemetryAttributes, + type WorkerRuntimeTelemetryRecord, + type WorkerRuntimeTelemetrySink, +} from "./worker-runtime.ts"; diff --git a/agentic-organization/apps/workers/src/worker-runtime.ts b/agentic-organization/apps/workers/src/worker-runtime.ts new file mode 100644 index 0000000000..7cf907d3c8 --- /dev/null +++ b/agentic-organization/apps/workers/src/worker-runtime.ts @@ -0,0 +1,239 @@ +import { + buildNatsConsumerBatchAttributes, + buildWorkerCycleAttributes, + type NatsConsumerBatchAttributes, + type WorkerCycleAttributes, +} from "../../../packages/observability/src/index.ts"; +import { OutboxPublishOutcomeStatus } from "../../../packages/messaging/src/index.ts"; +import type { + NatsJetStreamConsumeBatchResult, + NatsJetStreamEventConsumer, +} from "../../../packages/messaging-nats/src/index.ts"; +import { + WorkerCycleStatus, + type OrganizationWorkerHost, + type WorkerCycleResult, +} from "../../../packages/workers/src/index.ts"; + +export const WorkerRuntimeStatus = { + Degraded: "degraded", + Healthy: "healthy", +} as const; + +export type WorkerRuntimeStatus = (typeof WorkerRuntimeStatus)[keyof typeof WorkerRuntimeStatus]; + +export const WorkerRuntimeFailureStage = { + NatsConsumer: "nats_consumer", + OrganizationWorker: "organization_worker", +} as const; + +export type WorkerRuntimeFailureStage = (typeof WorkerRuntimeFailureStage)[keyof typeof WorkerRuntimeFailureStage]; + 
+export const WorkerRuntimeConfigErrorCode = { + InvalidNatsInboundBatchSize: "invalid_nats_inbound_batch_size", + MissingEnvironment: "missing_environment", + MissingNatsDurableName: "missing_nats_durable_name", + MissingNatsStreamName: "missing_nats_stream_name", + MissingOrganizationId: "missing_organization_id", +} as const; + +export type WorkerRuntimeConfigErrorCode = + (typeof WorkerRuntimeConfigErrorCode)[keyof typeof WorkerRuntimeConfigErrorCode]; + +export class WorkerRuntimeConfigError extends Error { + readonly code: WorkerRuntimeConfigErrorCode; + + constructor(code: WorkerRuntimeConfigErrorCode) { + super(`invalid worker runtime config: ${code}`); + this.code = code; + } +} + +export const WorkerRuntimeTelemetryEventName = { + NatsConsumerBatchProcessed: "agentic.worker.nats_consumer.batch_processed", + WorkerCycleCompleted: "agentic.worker.cycle.completed", +} as const; + +export type WorkerRuntimeTelemetryEventName = + (typeof WorkerRuntimeTelemetryEventName)[keyof typeof WorkerRuntimeTelemetryEventName]; + +export type WorkerRuntimeTelemetryAttributes = WorkerCycleAttributes | NatsConsumerBatchAttributes; + +export type WorkerRuntimeTelemetryRecord = { + eventName: WorkerRuntimeTelemetryEventName; + attributes: WorkerRuntimeTelemetryAttributes; +}; + +export type WorkerRuntimeTelemetrySink = { + record: (record: WorkerRuntimeTelemetryRecord) => Promise<void>; +}; + +export type WorkerRuntimeConfig = { + environment: string; + natsInboundBatchSize: number; + organizationId: string; + natsStreamName: string; + natsDurableName: string; +}; + +export type WorkerRuntimeFailure = { + stage: WorkerRuntimeFailureStage; + message: string; +}; + +export type WorkerRuntimeRunResult = { + status: WorkerRuntimeStatus; + workerCycle: WorkerCycleResult | undefined; + natsConsumerBatch: NatsJetStreamConsumeBatchResult | undefined; + failures: readonly WorkerRuntimeFailure[]; +}; + +export type WorkerRuntime = { + runOnce: () => Promise<WorkerRuntimeRunResult>; +}; + +export type
CreateWorkerRuntimeInput = { + config: WorkerRuntimeConfig; + organizationWorkerHost: OrganizationWorkerHost; + natsEventConsumer: NatsJetStreamEventConsumer; + telemetrySink: WorkerRuntimeTelemetrySink; +}; + +export function createWorkerRuntime(input: CreateWorkerRuntimeInput): WorkerRuntime { + validateWorkerRuntimeConfig(input.config); + + return { + runOnce: async () => { + const failures: WorkerRuntimeFailure[] = []; + const workerCycle = await runOrganizationWorker({ + organizationWorkerHost: input.organizationWorkerHost, + telemetrySink: input.telemetrySink, + failures, + }); + const natsConsumerBatch = await runNatsConsumer({ + config: input.config, + natsEventConsumer: input.natsEventConsumer, + telemetrySink: input.telemetrySink, + failures, + }); + + return { + status: resolveWorkerRuntimeStatus({ + workerCycle, + natsConsumerBatch, + failures, + }), + workerCycle, + natsConsumerBatch, + failures, + }; + }, + }; +} + +function validateWorkerRuntimeConfig(config: WorkerRuntimeConfig): void { + assertNonEmptyConfigValue(config.environment, WorkerRuntimeConfigErrorCode.MissingEnvironment); + assertNonEmptyConfigValue(config.organizationId, WorkerRuntimeConfigErrorCode.MissingOrganizationId); + assertNonEmptyConfigValue(config.natsStreamName, WorkerRuntimeConfigErrorCode.MissingNatsStreamName); + assertNonEmptyConfigValue(config.natsDurableName, WorkerRuntimeConfigErrorCode.MissingNatsDurableName); + + if (!Number.isInteger(config.natsInboundBatchSize) || config.natsInboundBatchSize < 1) { + throw new WorkerRuntimeConfigError(WorkerRuntimeConfigErrorCode.InvalidNatsInboundBatchSize); + } +} + +function assertNonEmptyConfigValue(value: string, code: WorkerRuntimeConfigErrorCode): void { + if (value.trim().length === 0) { + throw new WorkerRuntimeConfigError(code); + } +} + +type RunOrganizationWorkerInput = { + organizationWorkerHost: OrganizationWorkerHost; + telemetrySink: WorkerRuntimeTelemetrySink; + failures: WorkerRuntimeFailure[]; +}; + +async 
function runOrganizationWorker(input: RunOrganizationWorkerInput): Promise<WorkerCycleResult | undefined> { + try { + const workerCycle = await input.organizationWorkerHost.runOnce(); + await input.telemetrySink.record({ + eventName: WorkerRuntimeTelemetryEventName.WorkerCycleCompleted, + attributes: buildWorkerCycleAttributes({ + status: workerCycle.status, + outboxStatus: workerCycle.outbox?.status ?? OutboxPublishOutcomeStatus.Empty, + inboundPulledCount: workerCycle.inbound.pulledCount, + inboundProcessedCount: workerCycle.inbound.processedCount, + inboundDuplicateCount: workerCycle.inbound.duplicateCount, + inboundPayloadConflictCount: workerCycle.inbound.payloadConflictCount, + inboundFailedCount: workerCycle.inbound.failedCount, + inboundReactionPlanCount: workerCycle.inbound.reactionPlanCount, + failureCount: workerCycle.failures.length, + }), + }); + return workerCycle; + } catch (error) { + input.failures.push({ + stage: WorkerRuntimeFailureStage.OrganizationWorker, + message: extractErrorMessage(error), + }); + return undefined; + } +} + +type RunNatsConsumerInput = { + config: WorkerRuntimeConfig; + natsEventConsumer: NatsJetStreamEventConsumer; + telemetrySink: WorkerRuntimeTelemetrySink; + failures: WorkerRuntimeFailure[]; +}; + +async function runNatsConsumer(input: RunNatsConsumerInput): Promise<NatsJetStreamConsumeBatchResult | undefined> { + try { + const natsConsumerBatch = await input.natsEventConsumer.processNextBatch({ + batchSize: input.config.natsInboundBatchSize, + }); + await input.telemetrySink.record({ + eventName: WorkerRuntimeTelemetryEventName.NatsConsumerBatchProcessed, + attributes: buildNatsConsumerBatchAttributes({ + streamName: input.config.natsStreamName, + durableName: input.config.natsDurableName, + ...natsConsumerBatch, + }), + }); + return natsConsumerBatch; + } catch (error) { + input.failures.push({ + stage: WorkerRuntimeFailureStage.NatsConsumer, + message: extractErrorMessage(error), + }); + return undefined; + } +} + +type ResolveWorkerRuntimeStatusInput = { + workerCycle: WorkerCycleResult |
undefined; + natsConsumerBatch: NatsJetStreamConsumeBatchResult | undefined; + failures: readonly WorkerRuntimeFailure[]; +}; + +function resolveWorkerRuntimeStatus(input: ResolveWorkerRuntimeStatusInput): WorkerRuntimeStatus { + if ( + input.failures.length > 0 || + input.workerCycle?.status === WorkerCycleStatus.Degraded || + input.natsConsumerBatch?.failedCount !== 0 || + input.natsConsumerBatch?.deadLetteredCount !== 0 + ) { + return WorkerRuntimeStatus.Degraded; + } + + return WorkerRuntimeStatus.Healthy; +} + +function extractErrorMessage(error: unknown): string { + if (error instanceof Error) { + return error.message; + } + + return String(error); +} diff --git a/agentic-organization/apps/workers/test/worker-runtime.test.ts b/agentic-organization/apps/workers/test/worker-runtime.test.ts new file mode 100644 index 0000000000..53ba335661 --- /dev/null +++ b/agentic-organization/apps/workers/test/worker-runtime.test.ts @@ -0,0 +1,233 @@ +import { deepEqual, equal } from "node:assert/strict"; +import { describe, test } from "node:test"; + +import { OutboxPublishOutcomeStatus } from "../../../packages/messaging/src/index.ts"; +import type { NatsJetStreamConsumeBatchResult } from "../../../packages/messaging-nats/src/index.ts"; +import { WorkerCycleStatus, type WorkerCycleResult } from "../../../packages/workers/src/index.ts"; +import { + WorkerRuntimeFailureStage, + WorkerRuntimeConfigError, + WorkerRuntimeConfigErrorCode, + WorkerRuntimeStatus, + WorkerRuntimeTelemetryEventName, + createWorkerRuntime, + type WorkerRuntimeTelemetrySink, +} from "../src/index.ts"; + +describe("worker runtime composition host", () => { + test("runs worker and NATS consumer loops with configured telemetry", async () => { + const organizationWorkerHost = createRecordingOrganizationWorkerHost(createWorkedWorkerCycle()); + const natsEventConsumer = createRecordingNatsEventConsumer(createProcessedNatsBatch()); + const telemetrySink = createRecordingTelemetrySink(); + const runtime = 
createWorkerRuntime({ + config: createRuntimeConfig(), + organizationWorkerHost, + natsEventConsumer, + telemetrySink, + }); + + const result = await runtime.runOnce(); + + equal(result.status, WorkerRuntimeStatus.Healthy); + equal(organizationWorkerHost.runCount, 1); + deepEqual(natsEventConsumer.batchSizes, [50]); + deepEqual(telemetrySink.records, [ + { + eventName: WorkerRuntimeTelemetryEventName.WorkerCycleCompleted, + attributes: { + "agentic.worker.cycle.status": WorkerCycleStatus.Worked, + "agentic.worker.outbox.status": OutboxPublishOutcomeStatus.Published, + "agentic.worker.inbound.pulled_count": 0, + "agentic.worker.inbound.processed_count": 0, + "agentic.worker.inbound.duplicate_count": 0, + "agentic.worker.inbound.payload_conflict_count": 0, + "agentic.worker.inbound.failed_count": 0, + "agentic.worker.inbound.reaction_plan_count": 0, + "agentic.worker.failure_count": 0, + }, + }, + { + eventName: WorkerRuntimeTelemetryEventName.NatsConsumerBatchProcessed, + attributes: { + "messaging.system": "nats", + "messaging.nats.stream": "agentic-org-events", + "messaging.nats.consumer": "agentic-org-v0-automation-planner", + "agentic.nats.consumer.received_count": 2, + "agentic.nats.consumer.processed_count": 2, + "agentic.nats.consumer.duplicate_count": 0, + "agentic.nats.consumer.payload_conflict_count": 0, + "agentic.nats.consumer.invalid_count": 0, + "agentic.nats.consumer.failed_count": 0, + "agentic.nats.consumer.acknowledged_count": 2, + "agentic.nats.consumer.negative_acknowledged_count": 0, + "agentic.nats.consumer.terminated_count": 0, + "agentic.nats.consumer.dead_lettered_count": 0, + }, + }, + ]); + }); + + test("keeps the NATS loop running when the worker loop throws", async () => { + const natsEventConsumer = createRecordingNatsEventConsumer(createProcessedNatsBatch()); + const runtime = createWorkerRuntime({ + config: createRuntimeConfig(), + organizationWorkerHost: createFailingOrganizationWorkerHost("outbox loop failed"), + natsEventConsumer, 
+ telemetrySink: createRecordingTelemetrySink(), + }); + + const result = await runtime.runOnce(); + + equal(result.status, WorkerRuntimeStatus.Degraded); + deepEqual(natsEventConsumer.batchSizes, [50]); + deepEqual(result.failures, [ + { + stage: WorkerRuntimeFailureStage.OrganizationWorker, + message: "outbox loop failed", + }, + ]); + }); + + test("marks the runtime degraded when NATS consumer reports dead letters", async () => { + const runtime = createWorkerRuntime({ + config: createRuntimeConfig(), + organizationWorkerHost: createRecordingOrganizationWorkerHost(createWorkedWorkerCycle()), + natsEventConsumer: createRecordingNatsEventConsumer({ + ...createProcessedNatsBatch(), + deadLetteredCount: 1, + terminatedCount: 1, + }), + telemetrySink: createRecordingTelemetrySink(), + }); + + const result = await runtime.runOnce(); + + equal(result.status, WorkerRuntimeStatus.Degraded); + }); + + test("rejects invalid process config before loops can start", () => { + try { + createWorkerRuntime({ + config: { + ...createRuntimeConfig(), + natsInboundBatchSize: 0, + }, + organizationWorkerHost: createRecordingOrganizationWorkerHost(createWorkedWorkerCycle()), + natsEventConsumer: createRecordingNatsEventConsumer(createProcessedNatsBatch()), + telemetrySink: createRecordingTelemetrySink(), + }); + throw new Error("expected worker runtime config validation to fail"); + } catch (error) { + equal(error instanceof WorkerRuntimeConfigError, true); + equal((error as WorkerRuntimeConfigError).code, WorkerRuntimeConfigErrorCode.InvalidNatsInboundBatchSize); + } + }); +}); + +function createRuntimeConfig(): { + environment: string; + natsInboundBatchSize: number; + organizationId: string; + natsStreamName: string; + natsDurableName: string; +} { + return { + environment: "test", + natsInboundBatchSize: 50, + organizationId: "org-lfg", + natsStreamName: "agentic-org-events", + natsDurableName: "agentic-org-v0-automation-planner", + }; +} + +function createWorkedWorkerCycle(): 
WorkerCycleResult { + return { + status: WorkerCycleStatus.Worked, + outbox: { + status: OutboxPublishOutcomeStatus.Published, + attemptedCount: 1, + publishedOutboxEventIds: ["outbox-001"], + }, + inbound: { + pulledCount: 0, + processedCount: 0, + duplicateCount: 0, + payloadConflictCount: 0, + failedCount: 0, + reactionPlanCount: 0, + }, + failures: [], + }; +} + +function createProcessedNatsBatch(): NatsJetStreamConsumeBatchResult { + return { + receivedCount: 2, + processedCount: 2, + duplicateCount: 0, + payloadConflictCount: 0, + invalidCount: 0, + failedCount: 0, + acknowledgedCount: 2, + negativeAcknowledgedCount: 0, + terminatedCount: 0, + deadLetteredCount: 0, + }; +} + +function createRecordingOrganizationWorkerHost(result: WorkerCycleResult): { + runCount: number; + runOnce: () => Promise<WorkerCycleResult>; +} { + return { + runCount: 0, + runOnce: async function runOnce() { + this.runCount += 1; + return result; + }, + }; +} + +function createFailingOrganizationWorkerHost(message: string): { + runOnce: () => Promise<WorkerCycleResult>; +} { + return { + runOnce: async () => { + throw new Error(message); + }, + }; +} + +function createRecordingNatsEventConsumer(result: NatsJetStreamConsumeBatchResult): { + batchSizes: number[]; + processNextBatch: (input: { batchSize: number }) => Promise<NatsJetStreamConsumeBatchResult>; +} { + const batchSizes: number[] = []; + + return { + batchSizes, + processNextBatch: async (input) => { + batchSizes.push(input.batchSize); + return result; + }, + }; +} + +function createRecordingTelemetrySink(): WorkerRuntimeTelemetrySink & { + records: { + eventName: WorkerRuntimeTelemetryEventName; + attributes: Record<string, string | number>; + }[]; +} { + const records: { + eventName: WorkerRuntimeTelemetryEventName; + attributes: Record<string, string | number>; + }[] = []; + + return { + records, + record: async (record) => { + records.push(record); + }, + }; +} diff --git a/agentic-organization/docs/FIRST_IMPLEMENTATION_SLICE.md b/agentic-organization/docs/FIRST_IMPLEMENTATION_SLICE.md index cad035186d..a6cd3966bb 100644 ---
a/agentic-organization/docs/FIRST_IMPLEMENTATION_SLICE.md +++ b/agentic-organization/docs/FIRST_IMPLEMENTATION_SLICE.md @@ -37,6 +37,7 @@ send_supervisor_signal -> inbox receipt / consumer dedupe -> persisted reaction plans -> worker host cycle summary + -> apps/workers runtime summary -> LGTM span attributes -> supervisor triage reaction plan ``` @@ -56,6 +57,12 @@ send_supervisor_signal | `@agentic-org/workers` | process-boundary run-once worker host that composes outbox publishing and inbound event ingestion through ports | | `@agentic-org/governance` | package dependency-boundary checks that prevent application code from importing concrete state/runtime adapters | +## Apps + +| App | Implemented first | +| -------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------- | +| `apps/workers` | NodeNext runtime-host shell that runs the package worker cycle, runs the NATS consumer cycle, emits telemetry records, and reports healthy/degraded state | + ## NodeNext Runtime Decision Agentic Organization now has a local `package.json` and @@ -158,15 +165,23 @@ Hermes runs, MCP calls, and UI evidence. processed, duplicate, payload-conflict, invalid, failed, acknowledged, negative-acknowledged, terminated, and dead-lettered counts. +- `apps/workers` now composes the package worker cycle and NATS consumer + cycle behind process configuration and telemetry ports. If one cycle + throws, the other still runs and the runtime result is degraded with a + typed failure stage. +- `apps/workers` validates required process config before any loop can + start: environment, Organization ID, NATS stream, durable consumer, + and positive NATS inbound batch size. 
## Next Slice -The next slice should add the first runnable `apps/workers` composition -host that binds the outbox publisher, NATS consumer adapter, runtime -ingestion processor, durable state adapters, and observability helpers -behind process configuration. After that, add a transactional -durable-state adapter integration test using CockroachDB as the first -cluster-backed implementation once a local/dev connection is available. +The next slice should bind the first real process adapters into +`apps/workers`: concrete NATS pull/publish clients, the durable +CockroachDB outbox/inbox adapters, and a telemetry sink that can later +send structured logs and metrics into the full-ai-cluster LGTM stack. +After that, add a transactional durable-state adapter integration test +using CockroachDB as the first cluster-backed implementation once a +local/dev connection is available. Do not make the next slice a pile of bespoke request commands. Build the generic supervisor triage lifecycle first, then let specialized diff --git a/agentic-organization/docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md b/agentic-organization/docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md index d3e025280b..5020e971f6 100644 --- a/agentic-organization/docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md +++ b/agentic-organization/docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md @@ -262,6 +262,15 @@ a chance to run. `apps/workers` will later bind those ports to real cluster adapters and attach process concerns such as health checks, metrics, structured logs, readiness, and graceful shutdown. +`apps/workers` now exists as the first NodeNext runtime-host shell. It +does not introduce NestJS yet. It composes the package-level worker host +and the NATS consumer adapter, applies runtime config such as +environment, Organization ID, NATS stream, durable consumer, and batch +size, records telemetry through a sink port, and reports +healthy/degraded status. 
Concrete NATS clients, CockroachDB pools, +readiness endpoints, structured logging, and shutdown hooks still belong +to later process-adapter wiring. + ## SOLID Rules ### Single Responsibility @@ -813,6 +822,13 @@ tools, policy denials, harness failures, and telemetry gaps, then route fixes through the same command, review, and security lifecycle as any other work. +The first `apps/workers` runtime projects both package worker-cycle +counts and NATS consumer batch counts through telemetry sink ports. The +runtime treats package degraded status, thrown loop failures, +dead-lettered NATS messages, and failed NATS messages as degraded state +so weak points can surface before the process is connected to real +cluster telemetry. + ## V0 Build Sequence 1. Create package skeletons for: @@ -856,7 +872,9 @@ other work. 8. Add the first rule catalog and reaction executor for ready work, review staffing, QA staffing, blocker escalation, and late run incidents. -9. Add the NestJS API and worker hosts. +9. Add runtime hosts. The first NodeNext `apps/workers` host now + composes the worker and NATS consumer loops through ports; NestJS API + and richer worker process wiring are still pending. 10. Add UI projections for work board, review center, and evidence timeline. 11. Add real cluster adapters one at a time. 
diff --git a/agentic-organization/package.json b/agentic-organization/package.json index 3d1157cebb..e650e5ee88 100644 --- a/agentic-organization/package.json +++ b/agentic-organization/package.json @@ -3,7 +3,7 @@ "private": true, "type": "module", "scripts": { - "test": "node --experimental-strip-types --test packages/*/test/**/*.test.ts", + "test": "node --experimental-strip-types --test packages/*/test/**/*.test.ts apps/*/test/**/*.test.ts", "typecheck": "npx --yes -p typescript@6.0.3 tsc -p tsconfig.json" }, "engines": { diff --git a/agentic-organization/packages/README.md b/agentic-organization/packages/README.md index 9af87531cc..47fd012c5d 100644 --- a/agentic-organization/packages/README.md +++ b/agentic-organization/packages/README.md @@ -117,7 +117,7 @@ The test command uses Node's built-in test runner and TypeScript type stripping: ```text -node --experimental-strip-types --test packages/*/test/**/*.test.ts +node --experimental-strip-types --test packages/*/test/**/*.test.ts apps/*/test/**/*.test.ts ``` This is a deliberate NodeNext starting point so the package contracts diff --git a/agentic-organization/packages/governance/src/package-dependency-boundaries.ts b/agentic-organization/packages/governance/src/package-dependency-boundaries.ts index 4348b41828..242e063c0d 100644 --- a/agentic-organization/packages/governance/src/package-dependency-boundaries.ts +++ b/agentic-organization/packages/governance/src/package-dependency-boundaries.ts @@ -4,7 +4,9 @@ import { fileURLToPath } from "node:url"; export const PackageBoundaryRule = { Application: "application", + ApplicationHost: "application_host", Messaging: "messaging", + Packages: "packages", ProductionSource: "production_source", StateAdapter: "state_adapter", Workers: "workers", diff --git a/agentic-organization/packages/governance/test/package-dependency-boundaries.test.ts b/agentic-organization/packages/governance/test/package-dependency-boundaries.test.ts index 13ca12b1a8..515781f6c3 100644 --- 
a/agentic-organization/packages/governance/test/package-dependency-boundaries.test.ts +++ b/agentic-organization/packages/governance/test/package-dependency-boundaries.test.ts @@ -9,6 +9,7 @@ import { } from "../src/package-dependency-boundaries.ts"; const packagesRootDirectory = new URL("../..", import.meta.url); +const agenticOrganizationRootDirectory = new URL("../../..", import.meta.url); describe("package dependency boundaries", () => { test("keeps application independent from state and runtime adapters", async () => { @@ -143,4 +144,75 @@ describe("package dependency boundaries", () => { equal(violations.length, 0, violations.map((violation) => violation.message).join("\n")); }); + + test("keeps package code independent from app hosts", async () => { + const violations = await validatePackageDependencyBoundaries({ + rootDirectory: packagesRootDirectory, + rules: [ + { + packageName: PackageBoundaryRule.Packages, + sourceGlob: "application/src/**/*.ts", + forbiddenImportFragments: ["apps/"], + }, + { + packageName: PackageBoundaryRule.Packages, + sourceGlob: "domain/src/**/*.ts", + forbiddenImportFragments: ["apps/"], + }, + { + packageName: PackageBoundaryRule.Packages, + sourceGlob: "messaging/src/**/*.ts", + forbiddenImportFragments: ["apps/"], + }, + { + packageName: PackageBoundaryRule.Packages, + sourceGlob: "messaging-nats/src/**/*.ts", + forbiddenImportFragments: ["apps/"], + }, + { + packageName: PackageBoundaryRule.Packages, + sourceGlob: "observability/src/**/*.ts", + forbiddenImportFragments: ["apps/"], + }, + { + packageName: PackageBoundaryRule.Packages, + sourceGlob: "runtime/src/**/*.ts", + forbiddenImportFragments: ["apps/"], + }, + { + packageName: PackageBoundaryRule.Packages, + sourceGlob: "state/src/**/*.ts", + forbiddenImportFragments: ["apps/"], + }, + { + packageName: PackageBoundaryRule.Packages, + sourceGlob: "state-cockroach/src/**/*.ts", + forbiddenImportFragments: ["apps/"], + }, + { + packageName: PackageBoundaryRule.Packages, + 
sourceGlob: "workers/src/**/*.ts", + forbiddenImportFragments: ["apps/"], + }, + ], + }); + + equal(violations.length, 0, violations.map((violation) => violation.message).join("\n")); + }); + + test("keeps app tests out of production source directories", async () => { + const violations = await validatePackageSourceLayout({ + rootDirectory: agenticOrganizationRootDirectory, + rules: [ + { + packageName: PackageBoundaryRule.ApplicationHost, + sourceGlob: "apps/workers/src/**/*.ts", + forbiddenFileSuffix: ".test.ts", + reason: PackageSourceLayoutViolationReason.TestFileInProductionSource, + }, + ], + }); + + equal(violations.length, 0, violations.map((violation) => violation.message).join("\n")); + }); }); diff --git a/agentic-organization/packages/observability/src/index.ts b/agentic-organization/packages/observability/src/index.ts index a17c33859d..3a16effc1c 100644 --- a/agentic-organization/packages/observability/src/index.ts +++ b/agentic-organization/packages/observability/src/index.ts @@ -12,6 +12,12 @@ export { type NatsConsumerBatchAttributes, type NatsConsumerBatchCounts, } from "./nats-consumer-attributes.ts"; +export { + WorkerCycleAttributeKey, + buildWorkerCycleAttributes, + type BuildWorkerCycleAttributesInput, + type WorkerCycleAttributes, +} from "./worker-cycle-attributes.ts"; export { VisibilityHealth, WeakPointIndicatorType, diff --git a/agentic-organization/packages/observability/src/worker-cycle-attributes.ts b/agentic-organization/packages/observability/src/worker-cycle-attributes.ts new file mode 100644 index 0000000000..ec18407a82 --- /dev/null +++ b/agentic-organization/packages/observability/src/worker-cycle-attributes.ts @@ -0,0 +1,41 @@ +export const WorkerCycleAttributeKey = { + Status: "agentic.worker.cycle.status", + OutboxStatus: "agentic.worker.outbox.status", + InboundPulledCount: "agentic.worker.inbound.pulled_count", + InboundProcessedCount: "agentic.worker.inbound.processed_count", + InboundDuplicateCount: 
"agentic.worker.inbound.duplicate_count", + InboundPayloadConflictCount: "agentic.worker.inbound.payload_conflict_count", + InboundFailedCount: "agentic.worker.inbound.failed_count", + InboundReactionPlanCount: "agentic.worker.inbound.reaction_plan_count", + FailureCount: "agentic.worker.failure_count", +} as const; + +export type WorkerCycleAttributeKey = (typeof WorkerCycleAttributeKey)[keyof typeof WorkerCycleAttributeKey]; + +export type BuildWorkerCycleAttributesInput = { + status: string; + outboxStatus: string; + inboundPulledCount: number; + inboundProcessedCount: number; + inboundDuplicateCount: number; + inboundPayloadConflictCount: number; + inboundFailedCount: number; + inboundReactionPlanCount: number; + failureCount: number; +}; + +export type WorkerCycleAttributes = Record; + +export function buildWorkerCycleAttributes(input: BuildWorkerCycleAttributesInput): WorkerCycleAttributes { + return { + [WorkerCycleAttributeKey.Status]: input.status, + [WorkerCycleAttributeKey.OutboxStatus]: input.outboxStatus, + [WorkerCycleAttributeKey.InboundPulledCount]: input.inboundPulledCount, + [WorkerCycleAttributeKey.InboundProcessedCount]: input.inboundProcessedCount, + [WorkerCycleAttributeKey.InboundDuplicateCount]: input.inboundDuplicateCount, + [WorkerCycleAttributeKey.InboundPayloadConflictCount]: input.inboundPayloadConflictCount, + [WorkerCycleAttributeKey.InboundFailedCount]: input.inboundFailedCount, + [WorkerCycleAttributeKey.InboundReactionPlanCount]: input.inboundReactionPlanCount, + [WorkerCycleAttributeKey.FailureCount]: input.failureCount, + }; +} diff --git a/agentic-organization/tsconfig.json b/agentic-organization/tsconfig.json index db930028ea..657668a740 100644 --- a/agentic-organization/tsconfig.json +++ b/agentic-organization/tsconfig.json @@ -16,5 +16,5 @@ "skipLibCheck": true, "noEmit": true }, - "include": ["packages/**/*.ts"] + "include": ["packages/**/*.ts", "apps/**/*.ts"] } diff --git a/openspec/specs/agentic-organization/spec.md 
b/openspec/specs/agentic-organization/spec.md index 797c1372cc..80720ae270 100644 --- a/openspec/specs/agentic-organization/spec.md +++ b/openspec/specs/agentic-organization/spec.md @@ -300,6 +300,37 @@ live infrastructure adapters are bound by a runtime host. - **AND** concrete process concerns are left for `apps/workers` or adapter packages +#### Scenario: Workers app composes process loops + +- **WHEN** the `apps/workers` runtime host is asked to run once +- **THEN** it runs the package-level Organization worker cycle +- **AND** it runs the NATS consumer adapter cycle with the configured + inbound batch size +- **AND** it records worker-cycle telemetry and NATS-consumer batch + telemetry through a telemetry sink port +- **AND** it reports healthy only when both loops complete without + degraded worker status, NATS failures, or dead letters + +#### Scenario: Workers app rejects invalid process config + +- **WHEN** the `apps/workers` runtime host is created with missing + environment, missing Organization ID, missing NATS stream, missing + durable consumer, or non-positive NATS inbound batch size +- **THEN** runtime creation fails with a typed configuration error before + any worker loop can start + +#### Scenario: Workers app keeps loops visible when one loop fails + +- **WHEN** the package-level worker cycle throws +- **THEN** the `apps/workers` runtime host still attempts the NATS + consumer cycle +- **AND** the runtime result reports degraded status with a typed + organization-worker failure stage +- **WHEN** the NATS consumer cycle reports failed or dead-lettered + messages +- **THEN** the runtime result reports degraded status without hiding the + batch counts + ### Requirement: Telemetry is complete at the event boundary Organization packages MUST expose OpenTelemetry-compatible attributes From 77f3fd7734e2813ac8082da43b8eeed09a23010c Mon Sep 17 00:00:00 2001 From: Max Chadaev Date: Mon, 25 May 2026 21:28:17 -0400 Subject: [PATCH 16/21] feat(agentic-org): 
add workers process composition seam Co-Authored-By: Codex --- agentic-organization/apps/workers/README.md | 29 +++++++ .../apps/workers/src/composition.ts | 28 +++++++ .../apps/workers/src/config.ts | 67 +++++++++++++++++ .../apps/workers/src/index.ts | 2 + .../workers/test/worker-composition.test.ts | 75 +++++++++++++++++++ .../apps/workers/test/worker-config.test.ts | 61 +++++++++++++++ .../docs/FIRST_IMPLEMENTATION_SLICE.md | 29 ++++--- .../docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md | 33 ++++++-- openspec/specs/agentic-organization/spec.md | 20 +++++ 9 files changed, 326 insertions(+), 18 deletions(-) create mode 100644 agentic-organization/apps/workers/src/composition.ts create mode 100644 agentic-organization/apps/workers/src/config.ts create mode 100644 agentic-organization/apps/workers/test/worker-composition.test.ts create mode 100644 agentic-organization/apps/workers/test/worker-config.test.ts diff --git a/agentic-organization/apps/workers/README.md b/agentic-organization/apps/workers/README.md index 5aaf101a58..93b2cfc4bd 100644 --- a/agentic-organization/apps/workers/README.md +++ b/agentic-organization/apps/workers/README.md @@ -12,6 +12,10 @@ The app composes existing packages. It does not own business rules. Current duties: +- parse typed runtime configuration from process environment values that + Kubernetes can later provide through ConfigMaps and Secrets; +- compose the runtime through a single app-level factory so concrete + adapters stay outside domain and package code; - run the package-level Organization worker cycle; - run the NATS JetStream consumer adapter cycle; - pass configured NATS batch size, stream name, and durable consumer @@ -34,3 +38,28 @@ The app currently receives ports that tests can fake: Future concrete process wiring can bind these ports to CockroachDB, NATS, OTLP/logging, health checks, readiness checks, and graceful shutdown without changing runtime rule evaluation. 
+ +## Environment + +The worker app owns process configuration. Packages receive typed +config and ports; they do not read environment variables. + +Required runtime values: + +- `AGENTIC_ORG_ENV`; +- `AGENTIC_ORG_ID`; +- `NATS_STREAM`; +- `NATS_DURABLE`; +- `NATS_INBOUND_BATCH_SIZE`. + +URLs, credentials, and other sensitive adapter settings belong in later +adapter config bound by the composition root. They should come from +Kubernetes Secrets or ExternalSecrets in the full AI cluster, not from +domain packages. + +## Composition Root + +`composeWorkerRuntime` is the app-level seam where already-constructed +ports are connected to the runtime. Today tests provide fake ports. The +next production slice should construct real CockroachDB, NATS, and +telemetry adapters here while preserving the same package contracts. diff --git a/agentic-organization/apps/workers/src/composition.ts b/agentic-organization/apps/workers/src/composition.ts new file mode 100644 index 0000000000..4347648f2a --- /dev/null +++ b/agentic-organization/apps/workers/src/composition.ts @@ -0,0 +1,28 @@ +import type { NatsJetStreamEventConsumer } from "../../../packages/messaging-nats/src/index.ts"; +import type { OrganizationWorkerHost } from "../../../packages/workers/src/index.ts"; +import { + createWorkerRuntime, + type WorkerRuntime, + type WorkerRuntimeConfig, + type WorkerRuntimeTelemetrySink, +} from "./worker-runtime.ts"; + +export type WorkerRuntimePorts = { + organizationWorkerHost: OrganizationWorkerHost; + natsEventConsumer: NatsJetStreamEventConsumer; + telemetrySink: WorkerRuntimeTelemetrySink; +}; + +export type ComposeWorkerRuntimeInput = { + config: WorkerRuntimeConfig; + ports: WorkerRuntimePorts; +}; + +export function composeWorkerRuntime(input: ComposeWorkerRuntimeInput): WorkerRuntime { + return createWorkerRuntime({ + config: input.config, + organizationWorkerHost: input.ports.organizationWorkerHost, + natsEventConsumer: input.ports.natsEventConsumer, + telemetrySink: 
input.ports.telemetrySink, + }); +} diff --git a/agentic-organization/apps/workers/src/config.ts b/agentic-organization/apps/workers/src/config.ts new file mode 100644 index 0000000000..0b486c47b0 --- /dev/null +++ b/agentic-organization/apps/workers/src/config.ts @@ -0,0 +1,67 @@ +import { WorkerRuntimeConfigError, WorkerRuntimeConfigErrorCode, type WorkerRuntimeConfig } from "./worker-runtime.ts"; + +export const WorkerProcessEnvName = { + AgenticOrgEnv: "AGENTIC_ORG_ENV", + AgenticOrgId: "AGENTIC_ORG_ID", + NatsDurable: "NATS_DURABLE", + NatsInboundBatchSize: "NATS_INBOUND_BATCH_SIZE", + NatsStream: "NATS_STREAM", +} as const; + +export type WorkerProcessEnvName = (typeof WorkerProcessEnvName)[keyof typeof WorkerProcessEnvName]; + +export type WorkerProcessEnvironment = Partial>; + +export function parseWorkerRuntimeConfigFromEnv(env: WorkerProcessEnvironment): WorkerRuntimeConfig { + return { + environment: readRequiredEnvValue( + env, + WorkerProcessEnvName.AgenticOrgEnv, + WorkerRuntimeConfigErrorCode.MissingEnvironment, + ), + organizationId: readRequiredEnvValue( + env, + WorkerProcessEnvName.AgenticOrgId, + WorkerRuntimeConfigErrorCode.MissingOrganizationId, + ), + natsStreamName: readRequiredEnvValue( + env, + WorkerProcessEnvName.NatsStream, + WorkerRuntimeConfigErrorCode.MissingNatsStreamName, + ), + natsDurableName: readRequiredEnvValue( + env, + WorkerProcessEnvName.NatsDurable, + WorkerRuntimeConfigErrorCode.MissingNatsDurableName, + ), + natsInboundBatchSize: parseNatsInboundBatchSize(env[WorkerProcessEnvName.NatsInboundBatchSize]), + }; +} + +function readRequiredEnvValue( + env: WorkerProcessEnvironment, + name: WorkerProcessEnvName, + errorCode: WorkerRuntimeConfigErrorCode, +): string { + const value = env[name]; + + if (value === undefined || value.trim().length === 0) { + throw new WorkerRuntimeConfigError(errorCode); + } + + return value; +} + +function parseNatsInboundBatchSize(value: string | undefined): number { + if (value === undefined 
|| value.trim().length === 0) { + throw new WorkerRuntimeConfigError(WorkerRuntimeConfigErrorCode.InvalidNatsInboundBatchSize); + } + + const parsedValue = Number(value); + + if (!Number.isInteger(parsedValue) || parsedValue < 1) { + throw new WorkerRuntimeConfigError(WorkerRuntimeConfigErrorCode.InvalidNatsInboundBatchSize); + } + + return parsedValue; +} diff --git a/agentic-organization/apps/workers/src/index.ts b/agentic-organization/apps/workers/src/index.ts index 1b44e96993..bd7d45dbca 100644 --- a/agentic-organization/apps/workers/src/index.ts +++ b/agentic-organization/apps/workers/src/index.ts @@ -1,3 +1,5 @@ +export { WorkerProcessEnvName, parseWorkerRuntimeConfigFromEnv, type WorkerProcessEnvironment } from "./config.ts"; +export { composeWorkerRuntime, type ComposeWorkerRuntimeInput, type WorkerRuntimePorts } from "./composition.ts"; export { WorkerRuntimeConfigError, WorkerRuntimeConfigErrorCode, diff --git a/agentic-organization/apps/workers/test/worker-composition.test.ts b/agentic-organization/apps/workers/test/worker-composition.test.ts new file mode 100644 index 0000000000..0d2aa035a4 --- /dev/null +++ b/agentic-organization/apps/workers/test/worker-composition.test.ts @@ -0,0 +1,75 @@ +import { equal } from "node:assert/strict"; +import { describe, test } from "node:test"; + +import { OutboxPublishOutcomeStatus } from "../../../packages/messaging/src/index.ts"; +import type { NatsJetStreamConsumeBatchResult } from "../../../packages/messaging-nats/src/index.ts"; +import { WorkerCycleStatus, type WorkerCycleResult } from "../../../packages/workers/src/index.ts"; +import { WorkerRuntimeStatus, composeWorkerRuntime, type WorkerRuntimeTelemetrySink } from "../src/index.ts"; + +describe("worker runtime composition", () => { + test("creates a runnable worker runtime from parsed config and ports", async () => { + const runtime = composeWorkerRuntime({ + config: { + environment: "dev", + organizationId: "org-lfg", + natsStreamName: "agentic-org-events", 
+ natsDurableName: "agentic-org-v0-automation-planner", + natsInboundBatchSize: 25, + }, + ports: { + organizationWorkerHost: { + runOnce: async () => createWorkedWorkerCycle(), + }, + natsEventConsumer: { + processNextBatch: async () => createProcessedNatsBatch(), + }, + telemetrySink: createNoopTelemetrySink(), + }, + }); + + const result = await runtime.runOnce(); + + equal(result.status, WorkerRuntimeStatus.Healthy); + }); +}); + +function createWorkedWorkerCycle(): WorkerCycleResult { + return { + status: WorkerCycleStatus.Worked, + outbox: { + status: OutboxPublishOutcomeStatus.Published, + attemptedCount: 1, + publishedOutboxEventIds: ["outbox-001"], + }, + inbound: { + pulledCount: 0, + processedCount: 0, + duplicateCount: 0, + payloadConflictCount: 0, + failedCount: 0, + reactionPlanCount: 0, + }, + failures: [], + }; +} + +function createProcessedNatsBatch(): NatsJetStreamConsumeBatchResult { + return { + receivedCount: 0, + processedCount: 0, + duplicateCount: 0, + payloadConflictCount: 0, + invalidCount: 0, + failedCount: 0, + acknowledgedCount: 0, + negativeAcknowledgedCount: 0, + terminatedCount: 0, + deadLetteredCount: 0, + }; +} + +function createNoopTelemetrySink(): WorkerRuntimeTelemetrySink { + return { + record: async () => undefined, + }; +} diff --git a/agentic-organization/apps/workers/test/worker-config.test.ts b/agentic-organization/apps/workers/test/worker-config.test.ts new file mode 100644 index 0000000000..d79939ec75 --- /dev/null +++ b/agentic-organization/apps/workers/test/worker-config.test.ts @@ -0,0 +1,61 @@ +import { deepEqual, equal } from "node:assert/strict"; +import { describe, test } from "node:test"; + +import { + WorkerProcessEnvName, + WorkerRuntimeConfigError, + WorkerRuntimeConfigErrorCode, + parseWorkerRuntimeConfigFromEnv, +} from "../src/index.ts"; + +describe("worker runtime config parsing", () => { + test("parses typed runtime config from process env", () => { + deepEqual( + parseWorkerRuntimeConfigFromEnv({ + 
[WorkerProcessEnvName.AgenticOrgEnv]: "dev", + [WorkerProcessEnvName.AgenticOrgId]: "org-lfg", + [WorkerProcessEnvName.NatsStream]: "agentic-org-events", + [WorkerProcessEnvName.NatsDurable]: "agentic-org-v0-automation-planner", + [WorkerProcessEnvName.NatsInboundBatchSize]: "25", + }), + { + environment: "dev", + organizationId: "org-lfg", + natsStreamName: "agentic-org-events", + natsDurableName: "agentic-org-v0-automation-planner", + natsInboundBatchSize: 25, + }, + ); + }); + + test("rejects missing required env values with typed errors", () => { + try { + parseWorkerRuntimeConfigFromEnv({ + [WorkerProcessEnvName.AgenticOrgId]: "org-lfg", + [WorkerProcessEnvName.NatsStream]: "agentic-org-events", + [WorkerProcessEnvName.NatsDurable]: "agentic-org-v0-automation-planner", + [WorkerProcessEnvName.NatsInboundBatchSize]: "25", + }); + throw new Error("expected config parsing to fail"); + } catch (error) { + equal(error instanceof WorkerRuntimeConfigError, true); + equal((error as WorkerRuntimeConfigError).code, WorkerRuntimeConfigErrorCode.MissingEnvironment); + } + }); + + test("rejects invalid numeric env values with typed errors", () => { + try { + parseWorkerRuntimeConfigFromEnv({ + [WorkerProcessEnvName.AgenticOrgEnv]: "dev", + [WorkerProcessEnvName.AgenticOrgId]: "org-lfg", + [WorkerProcessEnvName.NatsStream]: "agentic-org-events", + [WorkerProcessEnvName.NatsDurable]: "agentic-org-v0-automation-planner", + [WorkerProcessEnvName.NatsInboundBatchSize]: "not-a-number", + }); + throw new Error("expected config parsing to fail"); + } catch (error) { + equal(error instanceof WorkerRuntimeConfigError, true); + equal((error as WorkerRuntimeConfigError).code, WorkerRuntimeConfigErrorCode.InvalidNatsInboundBatchSize); + } + }); +}); diff --git a/agentic-organization/docs/FIRST_IMPLEMENTATION_SLICE.md b/agentic-organization/docs/FIRST_IMPLEMENTATION_SLICE.md index a6cd3966bb..49f2f7bdc2 100644 --- a/agentic-organization/docs/FIRST_IMPLEMENTATION_SLICE.md +++ 
b/agentic-organization/docs/FIRST_IMPLEMENTATION_SLICE.md @@ -59,9 +59,9 @@ send_supervisor_signal ## Apps -| App | Implemented first | -| -------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------- | -| `apps/workers` | NodeNext runtime-host shell that runs the package worker cycle, runs the NATS consumer cycle, emits telemetry records, and reports healthy/degraded state | +| App | Implemented first | +| -------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| `apps/workers` | NodeNext runtime-host shell that parses process config, composes worker ports, runs the package worker cycle, runs the NATS consumer cycle, emits telemetry records, and reports healthy/degraded state | ## NodeNext Runtime Decision @@ -172,16 +172,25 @@ Hermes runs, MCP calls, and UI evidence. - `apps/workers` validates required process config before any loop can start: environment, Organization ID, NATS stream, durable consumer, and positive NATS inbound batch size. +- `apps/workers` parses required runtime values from typed environment + names: `AGENTIC_ORG_ENV`, `AGENTIC_ORG_ID`, `NATS_STREAM`, + `NATS_DURABLE`, and `NATS_INBOUND_BATCH_SIZE`. +- `apps/workers` exposes an app-level composition factory that receives + typed config plus already-constructed ports. Future real CockroachDB, + NATS, and telemetry adapters bind at this app seam instead of leaking + process or secret concerns into reusable packages. ## Next Slice -The next slice should bind the first real process adapters into -`apps/workers`: concrete NATS pull/publish clients, the durable -CockroachDB outbox/inbox adapters, and a telemetry sink that can later -send structured logs and metrics into the full-ai-cluster LGTM stack. 
-After that, add a transactional durable-state adapter integration test -using CockroachDB as the first cluster-backed implementation once a -local/dev connection is available. +The next slice should add the first real process adapter factories below +`apps/workers`: concrete NATS pull/publish client construction, durable +CockroachDB outbox/inbox adapter construction, and a telemetry sink that +can later send structured logs and metrics into the full-ai-cluster LGTM +stack. Keep URLs, credentials, and connection pools in app adapter config +fed by Kubernetes Secret or ExternalSecret values, never in domain +packages. After that, add a transactional durable-state adapter +integration test using CockroachDB as the first cluster-backed +implementation once a local/dev connection is available. Do not make the next slice a pile of bespoke request commands. Build the generic supervisor triage lifecycle first, then let specialized diff --git a/agentic-organization/docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md b/agentic-organization/docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md index 5020e971f6..0810b954c1 100644 --- a/agentic-organization/docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md +++ b/agentic-organization/docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md @@ -264,13 +264,21 @@ metrics, structured logs, readiness, and graceful shutdown. `apps/workers` now exists as the first NodeNext runtime-host shell. It does not introduce NestJS yet. It composes the package-level worker host -and the NATS consumer adapter, applies runtime config such as -environment, Organization ID, NATS stream, durable consumer, and batch -size, records telemetry through a sink port, and reports -healthy/degraded status. Concrete NATS clients, CockroachDB pools, +and the NATS consumer adapter, parses typed process environment values +into runtime config, records telemetry through a sink port, and reports +healthy/degraded status. 
Its current required environment contract is +`AGENTIC_ORG_ENV`, `AGENTIC_ORG_ID`, `NATS_STREAM`, `NATS_DURABLE`, and +`NATS_INBOUND_BATCH_SIZE`. Concrete NATS clients, CockroachDB pools, readiness endpoints, structured logging, and shutdown hooks still belong to later process-adapter wiring. +The `apps/workers` composition root receives typed config plus +already-constructed ports. This is the only place the worker process +should know which concrete adapter implementation is being used. Domain, +application, runtime, worker, and observability packages must stay free +of process environment, Kubernetes Secret, ExternalSecret, connection +pool, and client-construction details. + ## SOLID Rules ### Single Responsibility @@ -660,6 +668,11 @@ Secrets/ExternalSecrets, but the domain package should never see those values. The Nest composition layer binds configuration into adapter ports. +The current `apps/workers` NodeNext host applies this rule before NestJS +is introduced: non-secret operational values are parsed from typed env +names, while URLs, credentials, and client construction remain reserved +for process adapter factories supplied by the composition root. + Minimum runtime environment contract: ```text @@ -827,7 +840,10 @@ counts and NATS consumer batch counts through telemetry sink ports. The runtime treats package degraded status, thrown loop failures, dead-lettered NATS messages, and failed NATS messages as degraded state so weak points can surface before the process is connected to real -cluster telemetry. +cluster telemetry. The composition root is therefore the future bridge +from these records into the full-ai-cluster LGTM stack: structured logs +to Loki, traces to Tempo through Alloy, metrics to Prometheus/Mimir, and +dashboard projections in Grafana. ## V0 Build Sequence @@ -872,9 +888,10 @@ cluster telemetry. 8. Add the first rule catalog and reaction executor for ready work, review staffing, QA staffing, blocker escalation, and late run incidents. 
-9. Add runtime hosts. The first NodeNext `apps/workers` host now - composes the worker and NATS consumer loops through ports; NestJS API - and richer worker process wiring are still pending. +9. Add runtime hosts. The first NodeNext `apps/workers` host now parses + typed process config and composes the worker and NATS consumer loops + through ports; NestJS API and richer worker process wiring are still + pending. 10. Add UI projections for work board, review center, and evidence timeline. 11. Add real cluster adapters one at a time. diff --git a/openspec/specs/agentic-organization/spec.md b/openspec/specs/agentic-organization/spec.md index 80720ae270..12bc4f3814 100644 --- a/openspec/specs/agentic-organization/spec.md +++ b/openspec/specs/agentic-organization/spec.md @@ -311,6 +311,26 @@ live infrastructure adapters are bound by a runtime host. - **AND** it reports healthy only when both loops complete without degraded worker status, NATS failures, or dead letters +#### Scenario: Workers app parses process environment + +- **WHEN** the `apps/workers` runtime host parses process environment + values +- **THEN** it reads `AGENTIC_ORG_ENV`, `AGENTIC_ORG_ID`, `NATS_STREAM`, + `NATS_DURABLE`, and `NATS_INBOUND_BATCH_SIZE` through typed env names +- **AND** it returns typed runtime configuration for the composition root +- **AND** packages do not read process environment values directly +- **AND** URLs, credentials, and connection pools remain process adapter + concerns supplied through Kubernetes Secret or ExternalSecret backed + configuration later + +#### Scenario: Workers app composes adapter ports + +- **WHEN** the `apps/workers` composition root is created +- **THEN** it receives typed runtime config plus already-constructed + worker, NATS consumer, and telemetry ports +- **AND** the composition root returns a runnable worker runtime without + leaking concrete adapter construction into package code + #### Scenario: Workers app rejects invalid process config - 
**WHEN** the `apps/workers` runtime host is created with missing From 0975a440acec8c93dc045e7bd77ebd4c3ac1f33f Mon Sep 17 00:00:00 2001 From: Max Chadaev Date: Mon, 25 May 2026 21:39:35 -0400 Subject: [PATCH 17/21] fix(agentic-org): preserve worker visibility on telemetry failure Co-Authored-By: Codex --- .../apps/workers/src/config.ts | 12 ++- .../apps/workers/src/worker-runtime.ts | 84 ++++++++++++++----- .../apps/workers/test/worker-config.test.ts | 26 +++++- .../apps/workers/test/worker-runtime.test.ts | 49 +++++++++++ .../docs/FIRST_IMPLEMENTATION_SLICE.md | 11 ++- .../docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md | 34 ++++---- openspec/specs/agentic-organization/spec.md | 8 ++ 7 files changed, 179 insertions(+), 45 deletions(-) diff --git a/agentic-organization/apps/workers/src/config.ts b/agentic-organization/apps/workers/src/config.ts index 0b486c47b0..ee25b75feb 100644 --- a/agentic-organization/apps/workers/src/config.ts +++ b/agentic-organization/apps/workers/src/config.ts @@ -1,5 +1,7 @@ import { WorkerRuntimeConfigError, WorkerRuntimeConfigErrorCode, type WorkerRuntimeConfig } from "./worker-runtime.ts"; +const decimalIntegerPattern = /^[0-9]+$/; + export const WorkerProcessEnvName = { AgenticOrgEnv: "AGENTIC_ORG_ENV", AgenticOrgId: "AGENTIC_ORG_ID", @@ -49,17 +51,19 @@ function readRequiredEnvValue( throw new WorkerRuntimeConfigError(errorCode); } - return value; + return value.trim(); } function parseNatsInboundBatchSize(value: string | undefined): number { - if (value === undefined || value.trim().length === 0) { + const trimmedValue = value?.trim(); + + if (trimmedValue === undefined || trimmedValue.length === 0 || !decimalIntegerPattern.test(trimmedValue)) { throw new WorkerRuntimeConfigError(WorkerRuntimeConfigErrorCode.InvalidNatsInboundBatchSize); } - const parsedValue = Number(value); + const parsedValue = Number(trimmedValue); - if (!Number.isInteger(parsedValue) || parsedValue < 1) { + if (!Number.isSafeInteger(parsedValue) || parsedValue < 1) { 
throw new WorkerRuntimeConfigError(WorkerRuntimeConfigErrorCode.InvalidNatsInboundBatchSize); } diff --git a/agentic-organization/apps/workers/src/worker-runtime.ts b/agentic-organization/apps/workers/src/worker-runtime.ts index 7cf907d3c8..3c726a70dc 100644 --- a/agentic-organization/apps/workers/src/worker-runtime.ts +++ b/agentic-organization/apps/workers/src/worker-runtime.ts @@ -25,6 +25,7 @@ export type WorkerRuntimeStatus = (typeof WorkerRuntimeStatus)[keyof typeof Work export const WorkerRuntimeFailureStage = { NatsConsumer: "nats_consumer", OrganizationWorker: "organization_worker", + Telemetry: "telemetry", } as const; export type WorkerRuntimeFailureStage = (typeof WorkerRuntimeFailureStage)[keyof typeof WorkerRuntimeFailureStage]; @@ -157,19 +158,23 @@ type RunOrganizationWorkerInput = { async function runOrganizationWorker(input: RunOrganizationWorkerInput): Promise { try { const workerCycle = await input.organizationWorkerHost.runOnce(); - await input.telemetrySink.record({ - eventName: WorkerRuntimeTelemetryEventName.WorkerCycleCompleted, - attributes: buildWorkerCycleAttributes({ - status: workerCycle.status, - outboxStatus: workerCycle.outbox?.status ?? OutboxPublishOutcomeStatus.Empty, - inboundPulledCount: workerCycle.inbound.pulledCount, - inboundProcessedCount: workerCycle.inbound.processedCount, - inboundDuplicateCount: workerCycle.inbound.duplicateCount, - inboundPayloadConflictCount: workerCycle.inbound.payloadConflictCount, - inboundFailedCount: workerCycle.inbound.failedCount, - inboundReactionPlanCount: workerCycle.inbound.reactionPlanCount, - failureCount: workerCycle.failures.length, - }), + await recordTelemetry({ + telemetrySink: input.telemetrySink, + failures: input.failures, + record: { + eventName: WorkerRuntimeTelemetryEventName.WorkerCycleCompleted, + attributes: buildWorkerCycleAttributes({ + status: workerCycle.status, + outboxStatus: workerCycle.outbox?.status ?? 
OutboxPublishOutcomeStatus.Empty, + inboundPulledCount: workerCycle.inbound.pulledCount, + inboundProcessedCount: workerCycle.inbound.processedCount, + inboundDuplicateCount: workerCycle.inbound.duplicateCount, + inboundPayloadConflictCount: workerCycle.inbound.payloadConflictCount, + inboundFailedCount: workerCycle.inbound.failedCount, + inboundReactionPlanCount: workerCycle.inbound.reactionPlanCount, + failureCount: workerCycle.failures.length, + }), + }, }); return workerCycle; } catch (error) { @@ -193,13 +198,17 @@ async function runNatsConsumer(input: RunNatsConsumerInput): Promise { + try { + await input.telemetrySink.record(input.record); + } catch (error) { + input.failures.push({ + stage: WorkerRuntimeFailureStage.Telemetry, + message: extractErrorMessage(error), + }); + } +} + type ResolveWorkerRuntimeStatusInput = { workerCycle: WorkerCycleResult | undefined; natsConsumerBatch: NatsJetStreamConsumeBatchResult | undefined; @@ -221,8 +247,7 @@ function resolveWorkerRuntimeStatus(input: ResolveWorkerRuntimeStatusInput): Wor if ( input.failures.length > 0 || input.workerCycle?.status === WorkerCycleStatus.Degraded || - input.natsConsumerBatch?.failedCount !== 0 || - input.natsConsumerBatch?.deadLetteredCount !== 0 + isNatsConsumerBatchDegraded(input.natsConsumerBatch) ) { return WorkerRuntimeStatus.Degraded; } @@ -230,6 +255,21 @@ function resolveWorkerRuntimeStatus(input: ResolveWorkerRuntimeStatusInput): Wor return WorkerRuntimeStatus.Healthy; } +function isNatsConsumerBatchDegraded(batch: NatsJetStreamConsumeBatchResult | undefined): boolean { + if (batch === undefined) { + return true; + } + + return ( + batch.failedCount !== 0 || + batch.deadLetteredCount !== 0 || + batch.invalidCount !== 0 || + batch.payloadConflictCount !== 0 || + batch.negativeAcknowledgedCount !== 0 || + batch.terminatedCount !== 0 + ); +} + function extractErrorMessage(error: unknown): string { if (error instanceof Error) { return error.message; diff --git 
a/agentic-organization/apps/workers/test/worker-config.test.ts b/agentic-organization/apps/workers/test/worker-config.test.ts index d79939ec75..c071d2f2be 100644 --- a/agentic-organization/apps/workers/test/worker-config.test.ts +++ b/agentic-organization/apps/workers/test/worker-config.test.ts @@ -12,10 +12,10 @@ describe("worker runtime config parsing", () => { test("parses typed runtime config from process env", () => { deepEqual( parseWorkerRuntimeConfigFromEnv({ - [WorkerProcessEnvName.AgenticOrgEnv]: "dev", - [WorkerProcessEnvName.AgenticOrgId]: "org-lfg", - [WorkerProcessEnvName.NatsStream]: "agentic-org-events", - [WorkerProcessEnvName.NatsDurable]: "agentic-org-v0-automation-planner", + [WorkerProcessEnvName.AgenticOrgEnv]: " dev ", + [WorkerProcessEnvName.AgenticOrgId]: " org-lfg ", + [WorkerProcessEnvName.NatsStream]: " agentic-org-events ", + [WorkerProcessEnvName.NatsDurable]: " agentic-org-v0-automation-planner ", [WorkerProcessEnvName.NatsInboundBatchSize]: "25", }), { @@ -58,4 +58,22 @@ describe("worker runtime config parsing", () => { equal((error as WorkerRuntimeConfigError).code, WorkerRuntimeConfigErrorCode.InvalidNatsInboundBatchSize); } }); + + test("rejects non-decimal or unsafe batch sizes with typed errors", () => { + for (const batchSize of ["1.5", "1e3", "9007199254740992"]) { + try { + parseWorkerRuntimeConfigFromEnv({ + [WorkerProcessEnvName.AgenticOrgEnv]: "dev", + [WorkerProcessEnvName.AgenticOrgId]: "org-lfg", + [WorkerProcessEnvName.NatsStream]: "agentic-org-events", + [WorkerProcessEnvName.NatsDurable]: "agentic-org-v0-automation-planner", + [WorkerProcessEnvName.NatsInboundBatchSize]: batchSize, + }); + throw new Error("expected config parsing to fail"); + } catch (error) { + equal(error instanceof WorkerRuntimeConfigError, true); + equal((error as WorkerRuntimeConfigError).code, WorkerRuntimeConfigErrorCode.InvalidNatsInboundBatchSize); + } + } + }); }); diff --git a/agentic-organization/apps/workers/test/worker-runtime.test.ts 
b/agentic-organization/apps/workers/test/worker-runtime.test.ts index 53ba335661..a5b2fbe32f 100644 --- a/agentic-organization/apps/workers/test/worker-runtime.test.ts +++ b/agentic-organization/apps/workers/test/worker-runtime.test.ts @@ -105,6 +105,47 @@ describe("worker runtime composition host", () => { equal(result.status, WorkerRuntimeStatus.Degraded); }); + test("keeps successful loop results visible when telemetry fails", async () => { + const runtime = createWorkerRuntime({ + config: createRuntimeConfig(), + organizationWorkerHost: createRecordingOrganizationWorkerHost(createWorkedWorkerCycle()), + natsEventConsumer: createRecordingNatsEventConsumer(createProcessedNatsBatch()), + telemetrySink: createFailingTelemetrySink("telemetry sink unavailable"), + }); + + const result = await runtime.runOnce(); + + equal(result.status, WorkerRuntimeStatus.Degraded); + equal(result.workerCycle?.status, WorkerCycleStatus.Worked); + equal(result.natsConsumerBatch?.processedCount, 2); + deepEqual(result.failures, [ + { + stage: WorkerRuntimeFailureStage.Telemetry, + message: "telemetry sink unavailable", + }, + { + stage: WorkerRuntimeFailureStage.Telemetry, + message: "telemetry sink unavailable", + }, + ]); + }); + + test("marks the runtime degraded when NATS consumer reports non-happy counters", async () => { + const runtime = createWorkerRuntime({ + config: createRuntimeConfig(), + organizationWorkerHost: createRecordingOrganizationWorkerHost(createWorkedWorkerCycle()), + natsEventConsumer: createRecordingNatsEventConsumer({ + ...createProcessedNatsBatch(), + invalidCount: 1, + }), + telemetrySink: createRecordingTelemetrySink(), + }); + + const result = await runtime.runOnce(); + + equal(result.status, WorkerRuntimeStatus.Degraded); + }); + test("rejects invalid process config before loops can start", () => { try { createWorkerRuntime({ @@ -231,3 +272,11 @@ function createRecordingTelemetrySink(): WorkerRuntimeTelemetrySink & { }, }; } + +function 
createFailingTelemetrySink(message: string): WorkerRuntimeTelemetrySink { + return { + record: async () => { + throw new Error(message); + }, + }; +} diff --git a/agentic-organization/docs/FIRST_IMPLEMENTATION_SLICE.md b/agentic-organization/docs/FIRST_IMPLEMENTATION_SLICE.md index 49f2f7bdc2..8dae3777d2 100644 --- a/agentic-organization/docs/FIRST_IMPLEMENTATION_SLICE.md +++ b/agentic-organization/docs/FIRST_IMPLEMENTATION_SLICE.md @@ -169,12 +169,21 @@ Hermes runs, MCP calls, and UI evidence. cycle behind process configuration and telemetry ports. If one cycle throws, the other still runs and the runtime result is degraded with a typed failure stage. +- `apps/workers` keeps successful worker/NATS cycle results visible even + when telemetry recording fails. Telemetry sink failures degrade the + runtime through a dedicated failure stage instead of erasing completed + work. - `apps/workers` validates required process config before any loop can start: environment, Organization ID, NATS stream, durable consumer, and positive NATS inbound batch size. - `apps/workers` parses required runtime values from typed environment names: `AGENTIC_ORG_ENV`, `AGENTIC_ORG_ID`, `NATS_STREAM`, - `NATS_DURABLE`, and `NATS_INBOUND_BATCH_SIZE`. + `NATS_DURABLE`, and `NATS_INBOUND_BATCH_SIZE`. String values are + trimmed, and NATS inbound batch size must be a safe positive decimal + integer. +- `apps/workers` treats any non-happy NATS consumer counter as degraded: + failed, dead-lettered, invalid, payload-conflict, + negative-acknowledged, or terminated messages. - `apps/workers` exposes an app-level composition factory that receives typed config plus already-constructed ports. 
Future real CockroachDB, NATS, and telemetry adapters bind at this app seam instead of leaking diff --git a/agentic-organization/docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md b/agentic-organization/docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md index 0810b954c1..95fcc3ab33 100644 --- a/agentic-organization/docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md +++ b/agentic-organization/docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md @@ -678,8 +678,9 @@ Minimum runtime environment contract: ```text AGENTIC_ORG_ENV AGENTIC_ORG_ID -COCKROACH_URL -NATS_URL +NATS_STREAM +NATS_DURABLE +NATS_INBOUND_BATCH_SIZE TEMPORAL_ADDRESS HINDSIGHT_URL HERMES_URL @@ -688,11 +689,12 @@ OTEL_EXPORTER_OTLP_ENDPOINT HAT_SYSTEM_NAMESPACE ``` -Secrets such as database credentials, NATS credentials, OpenZiti -credentials, LLM provider keys, and credential-proxy tokens must come -from Vault through External Secrets or another approved cluster secret -path. They should not live in plain Kubernetes manifests and should not -be baked into the Agentic Organization image. +Adapter-specific URLs and secrets such as CockroachDB URLs, NATS URLs, +database credentials, NATS credentials, OpenZiti credentials, LLM +provider keys, and credential-proxy tokens must come from Vault through +External Secrets or another approved cluster secret path. They should +not live in plain Kubernetes manifests and should not be baked into the +Agentic Organization image. ### ArgoCD Sync Wave @@ -837,13 +839,17 @@ other work. The first `apps/workers` runtime projects both package worker-cycle counts and NATS consumer batch counts through telemetry sink ports. The -runtime treats package degraded status, thrown loop failures, -dead-lettered NATS messages, and failed NATS messages as degraded state -so weak points can surface before the process is connected to real -cluster telemetry. 
The composition root is therefore the future bridge -from these records into the full-ai-cluster LGTM stack: structured logs -to Loki, traces to Tempo through Alloy, metrics to Prometheus/Mimir, and -dashboard projections in Grafana. +runtime treats package degraded status, thrown loop failures, telemetry +sink failures, dead-lettered NATS messages, invalid NATS messages, +payload-conflict NATS messages, negative acknowledgements, terminated +messages, and failed NATS messages as degraded state so weak points can +surface before the process is connected to real cluster telemetry. +Telemetry failures must not erase successful worker or NATS cycle +results; they are captured as their own typed failure stage. The +composition root is therefore the future bridge from these records into +the full-ai-cluster LGTM stack: structured logs to Loki, traces to Tempo +through Alloy, metrics to Prometheus/Mimir, and dashboard projections in +Grafana. ## V0 Build Sequence diff --git a/openspec/specs/agentic-organization/spec.md b/openspec/specs/agentic-organization/spec.md index 12bc4f3814..4fc290a87a 100644 --- a/openspec/specs/agentic-organization/spec.md +++ b/openspec/specs/agentic-organization/spec.md @@ -346,10 +346,18 @@ live infrastructure adapters are bound by a runtime host. 
consumer cycle - **AND** the runtime result reports degraded status with a typed organization-worker failure stage +- **WHEN** telemetry recording throws after a worker or NATS consumer + cycle succeeds +- **THEN** the successful cycle result remains visible in the runtime + result +- **AND** the runtime result reports degraded status with a typed + telemetry failure stage - **WHEN** the NATS consumer cycle reports failed or dead-lettered messages - **THEN** the runtime result reports degraded status without hiding the batch counts +- **AND** invalid, payload-conflict, negative-acknowledged, or + terminated NATS messages also make the runtime result degraded ### Requirement: Telemetry is complete at the event boundary From 0e14b4e5549984a6876ad93f105ac0a0e8afb434 Mon Sep 17 00:00:00 2001 From: Max Chadaev Date: Mon, 25 May 2026 21:47:30 -0400 Subject: [PATCH 18/21] docs(agentic-org): add north star alignment checkpoint Co-Authored-By: Codex --- .../docs/FIRST_IMPLEMENTATION_SLICE.md | 12 + .../docs/NORTH_STAR_ALIGNMENT_CHECKPOINT.md | 233 ++++++++++++++++++ agentic-organization/docs/README.md | 22 ++ .../docs/V0_EXECUTABLE_CONTRACT.md | 26 +- 4 files changed, 280 insertions(+), 13 deletions(-) create mode 100644 agentic-organization/docs/NORTH_STAR_ALIGNMENT_CHECKPOINT.md diff --git a/agentic-organization/docs/FIRST_IMPLEMENTATION_SLICE.md b/agentic-organization/docs/FIRST_IMPLEMENTATION_SLICE.md index 8dae3777d2..f207d4d1b7 100644 --- a/agentic-organization/docs/FIRST_IMPLEMENTATION_SLICE.md +++ b/agentic-organization/docs/FIRST_IMPLEMENTATION_SLICE.md @@ -42,6 +42,18 @@ send_supervisor_signal -> supervisor triage reaction plan ``` +## Checkpoint Boundary + +The implemented slice does not yet create discussion anchors, graph +nodes, hat assignments, hat tokens, policy decisions, prompt-flow runs, +Hermes runs, or reviewer gates. Those remain V0 follow-on commands. + +Capability-request-shaped inputs should continue to enter through +`send_supervisor_signal`. 
The target supervisor triage step decides +whether to create a `CapabilityRequest` work item, route to security, +open a discussion, assign implementation work, answer directly, or +escalate. + ## Packages | Package | Implemented first | diff --git a/agentic-organization/docs/NORTH_STAR_ALIGNMENT_CHECKPOINT.md b/agentic-organization/docs/NORTH_STAR_ALIGNMENT_CHECKPOINT.md new file mode 100644 index 0000000000..ebdb947d1d --- /dev/null +++ b/agentic-organization/docs/NORTH_STAR_ALIGNMENT_CHECKPOINT.md @@ -0,0 +1,233 @@ +# North Star Alignment Checkpoint + +## Status + +Current checkpoint after the first executable TypeScript slices and +subagent review. + +## Verdict + +Agentic Organization is directionally aligned with the north star: + +- the primary executable primitive is `send_supervisor_signal`; +- hats are modeled as time-bounded authority, skill, policy, and + communication roles rather than agent identity; +- Organization DB owns business intent, while cluster substrates enforce + or project runtime state; +- work, discussions, decisions, runs, evidence, and memory must stay + anchored to project, initiative, task, gate, incident, release, policy, + or context-gap work; +- the runtime is event-driven through durable state, outbox, NATS + publication, inbox dedupe, reaction plans, workers, and telemetry; +- the design keeps agents able to expand tools, prompt flows, workflows, + and lifecycles through governed organizational work instead of a fixed + list of one-off commands. + +The main risk is convergence. The doc set is broad enough that older +sections still describe future products as if they are current V0 +entrypoints. V0 must remain smaller and sharper. 
+ +## Canonical V0 Product Contract + +The current V0 contract is: + +```text +hat communication brief + -> send_supervisor_signal + -> supervisor triage plan + -> anchored work item and context + -> gate decision + -> hat assignment and scoped runtime authority + -> scheduled prompt-flow run + -> Hermes run binding + -> evidence submission + -> reviewer decision + -> outcome review + -> follow-up work when gaps are found +``` + +Capability requests, credential requests, missing-tool reports, +workflow gaps, security asks, memory gaps, blockers, questions, and +process-improvement ideas are not separate first primitives. They enter +through supervisor-chain communication, then become specialized work only +after the responsible hat triages them. + +## Alignment Confirmed + +### Supervisor Chain + +`SUPERVISOR_CHAIN_COMMUNICATION.md`, `FIRST_IMPLEMENTATION_SLICE.md`, +`V0_SCHEMA_AND_COMMANDS.md`, and OpenSpec all point at +`send_supervisor_signal` as the generic coordination primitive. + +### Hat Model + +`CLUSTER_NATIVE_HAT_SYSTEM.md`, `V0_POLICY_AND_RUNTIME_BOUNDARIES.md`, +and `V0_SCHEMA_AND_COMMANDS.md` preserve hats as scoped, time-bounded +roles with authority, skills, RBAC/policy, succession, and supervisor +graph position. + +### Work Anchors + +`AGENT_NATIVE_KNOWLEDGE_GRAPH.md`, `WORK_AND_RELEASE_MANAGEMENT_OS.md`, +and `UI_AND_OBSERVABILITY_CONCEPTS.md` reject unanchored discussions. +Meetings, one-on-ones, broadcasts, votes, review comments, reports, and +team threads must reference work before they can affect state. + +### Cluster Substrate Position + +`AI_CLUSTER_SCAFFOLD_CONTEXT.md`, `CLUSTER_EXECUTION_AND_MEMORY_SUBSTRATE.md`, +`TECHNICAL_CA_PACKAGE_ARCHITECTURE.md`, and `V0_EXECUTABLE_CONTRACT.md` +correctly place Agentic Organization as a TypeScript consumer workload +on `full-ai-cluster`, not a parallel substrate. 
+ +### Implementation Direction + +The current packages prove the right spine: + +- command handler registry; +- idempotency check; +- supervisor signal handler; +- audit/outbox envelope; +- NATS subject and publisher/consumer adapters; +- inbox dedupe and reaction plans; +- worker host and app composition shell; +- telemetry attributes and workflow visibility records; +- package-boundary governance. + +## Drift To Correct + +### Capability Request Language + +Some older docs still describe agents as directly submitting capability +requests. Those sections must be normalized to: + +```text +agent observes gap + -> send_supervisor_signal + -> supervisor triage + -> optional CapabilityRequest work item + -> department/security/architecture routing + -> implementation/review/activation/outcome review +``` + +Capability request remains a valid work item type. It is not the first +communication primitive. + +### State Name Divergence + +Work item states are named differently across Work OS, V0 schema, UI +concepts, and implementation. Before adding more commands, create one +state reconciliation table that maps: + +- conceptual Work OS state; +- V0 enum; +- UI column; +- event name; +- owner package; +- allowed transitions; +- gate owner. + +### Discussion Anchor Gap + +The docs say V0 work should include discussion anchors and graph nodes. +The current implementation only writes the supervisor signal, audit +event, outbox event, idempotency record, inbox receipts, and reaction +plans. The next V0 command slice must either implement discussion-anchor +creation or explicitly stage it as the next command after +`send_supervisor_signal`. + +### Transaction Boundary Gap + +The current Cockroach adapters execute multiple statements through a +generic SQL executor. The architecture requires state, audit, outbox, +idempotency, inbox receipt, and reaction-plan writes to be atomic at the +right boundaries. 
The next durable-state slice should introduce a +transactional unit-of-work or single operation port for command outcomes +and event-ingestion outcomes. + +### Policy And Hat Authority Gap + +`send_supervisor_signal` does not yet validate actor hat authority, +source level, target supervisor, or active hat assignment. Before API, +MCP, Hermes, or worker hosts accept real agent commands, the application +boundary needs a policy/hat-authority port and tests for unauthorized +source hats, invalid target supervisors, expired/revoked hats, and +missing assignments. + +### Command Surface Closure + +The command pipeline and command result are still shaped around the +first command. Before adding `triage_supervisor_signal`, +`reserve_hat`, or `decide_gate`, make the pipeline generic over +registered command/result contracts or return a generic command outcome +with typed artifacts and events. + +### Raw Chat Tool Names + +Tool inventory language still includes broad names such as +`send_message`, `open_thread`, and `open_team_chat`. These should be +defined as anchored wrappers, not raw chat authority. All communication +paths must validate or create a `discussion_anchor` before opening a +conversation. + +### UAG Is Not Yet Canonical + +Prompt flows correctly point toward Universal Action Grammar, but UAG v0 +needs a typed registry: action names, target kinds, action modes, +reversibility, observation status, evidence requirements, and replay +semantics. + +## Cluster Integration Gaps + +### Hat-System Projection + +Agentic Organization needs a `k8s-hats` package that can read Hat, +HatBinding, HatSwap, and HatPolicy CRDs, then project them into +Organization signals. The CRD subject model currently differs from +Agentic Organization NATS subjects, so a translator or dual-subject +contract is required. + +### Identity Mapping + +Organization events use `agentId` and `hatAssignmentId`; hat-system +bindings use SPIFFE wearer identity. 
V0 needs a canonical mapping from +Organization agent/session/hat assignment to SPIFFE identity. + +### Hindsight Memory Attribution + +Docs correctly separate Hindsight memory from Organization graph facts, +but no memory package exists yet. V0 needs a memory attribution contract +for agent ID, hat ID, project, initiative, task, prompt-flow run, and +outcome review. + +### Hermes/OZ Runtime + +`launch_hermes_run` is still a documented boundary, not an executable +adapter. V0 needs a narrow Hermes runtime port before real cloud/runtime +integration. + +### LGTM Export + +The observability helpers are useful but not fully wired to cluster +export. V0 needs concrete OTLP/log/metric adapter config, service +labels, dashboard ownership, and alertable degraded-worker signals. + +## Checkpoint Priorities + +1. Normalize capability-request language across docs so + supervisor-chain communication is the only first primitive. +2. Add a V0 state reconciliation table before expanding command enums or + UI boards. +3. Implement transactional command/event-ingestion outcome ports for + CockroachDB. +4. Add policy/hat-authority checks before exposing command handlers to + API, MCP, Hermes, or workers. +5. Add `triage_supervisor_signal` as the next real command slice. +6. Add discussion-anchor enforcement and graph retrieval OpenSpec + scenarios, then implement the minimal anchor command. +7. Define UAG v0 as a typed package contract before adding prompt-flow + execution. +8. Build one substrate integration at a time, starting with hat-system + projection because identity, authority, CRDs, NATS subjects, and + policy meet there. 
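The checkpoint asks for a policy/hat-authority port, with tests for unauthorized source hats, invalid target supervisors, expired or revoked hats, and missing assignments, before `send_supervisor_signal` is exposed to API, MCP, Hermes, or worker entrypoints. A minimal TypeScript sketch of such a check is shown here, assuming the application boundary receives a read-only hat view through a port. `HatAuthorityView`, `checkSupervisorSignalAuthority`, the denial-reason strings, and the millisecond timestamp fields are hypothetical names chosen for illustration; none of them exist in this patch series.

```typescript
// Hypothetical sketch: none of these names come from the Agentic Organization
// packages. It only illustrates the denial cases the checkpoint asks to cover.

const HatAuthorityDenialReason = {
  MissingAssignment: "missing_assignment",
  RevokedAssignment: "revoked_assignment",
  ExpiredAssignment: "expired_assignment",
  UnauthorizedSourceHat: "unauthorized_source_hat",
  InvalidTargetSupervisor: "invalid_target_supervisor",
} as const;

type HatAuthorityDenialReason =
  (typeof HatAuthorityDenialReason)[keyof typeof HatAuthorityDenialReason];

type HatAssignment = {
  hatAssignmentId: string;
  agentId: string;
  hatId: string;
  expiresAtMs: number; // assumed epoch-millisecond expiry
  revokedAtMs?: number; // assumed epoch-millisecond revocation, if any
};

// Assumed read-only view that a policy/hat-authority port would supply.
type HatAuthorityView = {
  findAssignment(hatAssignmentId: string): HatAssignment | undefined;
  canSendSupervisorSignal(hatId: string): boolean;
  supervisorHatOf(hatId: string): string | undefined;
};

type SupervisorSignalAuthorityInput = {
  agentId: string;
  hatAssignmentId: string;
  targetSupervisorHatId: string;
  nowMs: number;
};

type HatAuthorityDecision =
  | { allowed: true; sourceHatId: string }
  | { allowed: false; reason: HatAuthorityDenialReason };

function deny(reason: HatAuthorityDenialReason): HatAuthorityDecision {
  return { allowed: false, reason };
}

function checkSupervisorSignalAuthority(
  view: HatAuthorityView,
  input: SupervisorSignalAuthorityInput,
): HatAuthorityDecision {
  const assignment = view.findAssignment(input.hatAssignmentId);

  // An assignment held by a different agent is treated the same as a missing one.
  if (assignment === undefined || assignment.agentId !== input.agentId) {
    return deny(HatAuthorityDenialReason.MissingAssignment);
  }
  if (assignment.revokedAtMs !== undefined && assignment.revokedAtMs <= input.nowMs) {
    return deny(HatAuthorityDenialReason.RevokedAssignment);
  }
  if (assignment.expiresAtMs <= input.nowMs) {
    return deny(HatAuthorityDenialReason.ExpiredAssignment);
  }
  // The hat itself must be allowed to open supervisor-chain communication.
  if (!view.canSendSupervisorSignal(assignment.hatId)) {
    return deny(HatAuthorityDenialReason.UnauthorizedSourceHat);
  }
  // The signal may only target the source hat's direct supervisor.
  if (view.supervisorHatOf(assignment.hatId) !== input.targetSupervisorHatId) {
    return deny(HatAuthorityDenialReason.InvalidTargetSupervisor);
  }
  return { allowed: true, sourceHatId: assignment.hatId };
}
```

Returning a typed denial reason instead of throwing keeps the outcome inspectable, roughly in the spirit of the typed failure stages the worker runtime already reports. The evaluation order (assignment, revocation, expiry, hat capability, supervisor edge) is an assumption; a real policy port might combine these checks differently or consult Kubernetes hat CRDs through a projection.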
diff --git a/agentic-organization/docs/README.md b/agentic-organization/docs/README.md index 2dc8721ccd..04dde60ac7 100644 --- a/agentic-organization/docs/README.md +++ b/agentic-organization/docs/README.md @@ -26,6 +26,7 @@ Current documents: - [Implementation Readiness Checklist](./IMPLEMENTATION_READINESS_CHECKLIST.md) - the decisions and contracts that should be defined before scaffolding the first implementation slice. - [Implementation Governance](./IMPLEMENTATION_GOVERNANCE.md) - the current-state, OpenSpec, authority, idempotency, telemetry, security, and quality rules for implementation work. - [First Implementation Slice](./FIRST_IMPLEMENTATION_SLICE.md) - the NodeNext TypeScript package slice proving command, state, audit, outbox, NATS subject, telemetry, and reaction-plan contracts. +- [North Star Alignment Checkpoint](./NORTH_STAR_ALIGNMENT_CHECKPOINT.md) - current alignment verdict, drift list, and next priorities against the Agentic Organization north star. - [V0 Executable Contract](./V0_EXECUTABLE_CONTRACT.md) - the smallest end-to-end runtime slice, grounded against the current `full-ai-cluster` substrate. - [V0 Schema and Commands](./V0_SCHEMA_AND_COMMANDS.md) - the CockroachDB-backed state groups, enums, command contract, outbox model, and TypeScript-facing runtime events for the first implementation. - [V0 Policy and Runtime Boundaries](./V0_POLICY_AND_RUNTIME_BOUNDARIES.md) - the hat policy matrix, MCP preflight checks, cluster runtime boundaries, failure rules, and ArgoCD integration shape. @@ -40,6 +41,27 @@ The intent is to keep the architecture document focused on what the Organization These documents are reference substrate, not a mandate to implement every concept at once. The first implementation should choose the smallest end-to-end slice from [Implementation Readiness Checklist](./IMPLEMENTATION_READINESS_CHECKLIST.md), ship it, and prune or revise the reference docs as the concrete system teaches us. 
+The current V0 product contract is: + +```text +hat communication brief + -> send_supervisor_signal + -> supervisor triage plan + -> anchored work item and context + -> gate decision + -> hat assignment and scoped runtime authority + -> scheduled prompt-flow run + -> Hermes run binding + -> evidence submission + -> reviewer decision + -> outcome review +``` + +Capability requests, credential requests, workflow gaps, memory gaps, +questions, and blockers enter through supervisor-chain communication +first. They become specialized work only after the responsible hat +triages them. + ## Placement These docs live at `agentic-organization/docs/` as the documentation root for the Agentic Organization subsystem. Runtime code can live under the Agentic Organization product tree, but cluster deployment should land as a `full-ai-cluster/k8s/applications/agentic-organization/` ArgoCD workload. Agentic Organization runs on the `full-ai-cluster` substrate; it is not a second cluster substrate. diff --git a/agentic-organization/docs/V0_EXECUTABLE_CONTRACT.md b/agentic-organization/docs/V0_EXECUTABLE_CONTRACT.md index 957f6b87c3..845543bdb5 100644 --- a/agentic-organization/docs/V0_EXECUTABLE_CONTRACT.md +++ b/agentic-organization/docs/V0_EXECUTABLE_CONTRACT.md @@ -54,16 +54,16 @@ V0 does not need: - autonomous creation of new tools, workflows, or credential proxy endpoints. -V0 should still model those future paths as supervisor-chain signals and -capability requests, so the Organization can later route them through -its own lifecycle. +V0 should still model those future paths as supervisor-chain signals. +Capability request inputs enter through that signal path and become +specialized work only after supervisor triage. 
## First Vertical Slice The first executable slice is: ```text -supervisor-chain signal or capability request +supervisor-chain signal -> anchored work item, discussion anchor, and context pack -> one readiness or review gate -> hat assignment @@ -89,15 +89,15 @@ This is the smallest useful loop because it proves: Keep the first hat set small: -| Hat | V0 reason | -| ------------------- | ----------------------------------------------------------------------------------- | -| Director | accepts or rejects escalated supervisor signals or capability requests for V0 scope | -| Engineering Manager | grooms the work item, selects schedule, assigns implementer and reviewer hats | -| Implementer | executes the prompt flow and submits evidence | -| Code Reviewer | reviews the evidence and blocks self-approval | -| Memory Curator | reviews memory writes or flags memory gaps when the run ends | -| Platform Operator | handles runtime failure, pod/session issues, and integration health | -| Security Reviewer | required only when the request needs a new credential or external tool scope | +| Hat | V0 reason | +| ------------------- | ------------------------------------------------------------------------------------------------- | +| Director | accepts or rejects escalated supervisor signals, including capability-request inputs for V0 scope | +| Engineering Manager | grooms the work item, selects schedule, assigns implementer and reviewer hats | +| Implementer | executes the prompt flow and submits evidence | +| Code Reviewer | reviews the evidence and blocks self-approval | +| Memory Curator | reviews memory writes or flags memory gaps when the run ends | +| Platform Operator | handles runtime failure, pod/session issues, and integration health | +| Security Reviewer | required only when the request needs a new credential or external tool scope | The Executive Board, TPM, Product Owner, Architect, QA Reviewer, Hat Designer, and department directors remain first-class in the 
reference From cebc1f8a35326a2766a9435a9c12e7e33ca406b1 Mon Sep 17 00:00:00 2001 From: Max Chadaev Date: Mon, 25 May 2026 22:19:38 -0400 Subject: [PATCH 19/21] feat(agentic-org): add transactional outcome ports Add command outcome persistence so handlers return typed effects and the pipeline records idempotency, state, audit, and outbox through one generic port. Add Cockroach transaction seams for command and event-ingestion outcomes, including receipt-claim race handling and SQL null normalization. Strengthen governance/docs so vendor-specific adapters stay behind generic Organization interfaces. Co-Authored-By: Codex --- .../docs/FIRST_IMPLEMENTATION_SLICE.md | 36 +++- .../docs/NORTH_STAR_ALIGNMENT_CHECKPOINT.md | 39 ++-- .../docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md | 40 ++++ .../src/command-handler-registry.ts | 14 +- .../application/src/command-pipeline.ts | 51 +++-- .../src/handlers/send-supervisor-signal.ts | 30 +-- .../packages/application/src/index.ts | 7 +- .../packages/application/src/ports.ts | 26 +-- .../application/test/command-pipeline.test.ts | 97 +++++++++ .../test/send-supervisor-signal.test.ts | 19 +- .../src/package-dependency-boundaries.ts | 1 + .../package-dependency-boundaries.test.ts | 21 ++ .../packages/runtime/src/event-ingestion.ts | 22 ++- .../runtime/test/event-ingestion.test.ts | 44 +++++ .../src/cockroach-command-state-store.ts | 151 ++++++++------ .../src/cockroach-event-ingestion-store.ts | 139 +++++++++---- .../cockroach-command-state-store.test.ts | 187 ++++++++++-------- .../cockroach-event-ingestion-store.test.ts | 109 +++++++++- .../state/src/event-ingestion-store.ts | 14 +- .../state/src/in-memory-organization-store.ts | 16 +- openspec/specs/agentic-organization/spec.md | 69 ++++++- 21 files changed, 837 insertions(+), 295 deletions(-) diff --git a/agentic-organization/docs/FIRST_IMPLEMENTATION_SLICE.md b/agentic-organization/docs/FIRST_IMPLEMENTATION_SLICE.md index f207d4d1b7..963cb267e6 100644 ---
a/agentic-organization/docs/FIRST_IMPLEMENTATION_SLICE.md +++ b/agentic-organization/docs/FIRST_IMPLEMENTATION_SLICE.md @@ -29,12 +29,14 @@ send_supervisor_signal -> chain-of-command signal -> audit event -> outbox event with canonical event envelope + -> command outcome persisted through one state-store operation -> outbox publisher -> NATS JetStream event publisher adapter -> NATS JetStream event consumer adapter -> NATS subject contract -> event ingestion processor -> inbox receipt / consumer dedupe + -> event-processing outcome persisted through one store operation -> persisted reaction plans -> worker host cycle summary -> apps/workers runtime summary @@ -124,6 +126,10 @@ Hermes runs, MCP calls, and UI evidence. - The command pipeline receives state-store factories and command handlers through ports instead of constructing in-memory adapters or branching on command types. +- Command handlers return typed effects; the command pipeline persists + the supervisor signal, audit events, outbox events, and idempotency + record through one `recordCommandOutcome` port. Handlers do not write + piecemeal state. - State-store and outbox-source ports are async from the beginning so durable SQL, NATS-backed workers, and other real adapters do not inherit a fake synchronous shape. @@ -152,10 +158,30 @@ Hermes runs, MCP calls, and UI evidence. rules once, rejects same-event payload hash conflicts, and persists reaction plans through a store boundary that durable adapters can make transactional. +- Event ingestion treats only completed receipts as duplicates. If a + same-payload receipt exists without `processedAt` and `result`, the + processor re-evaluates the event and records the full outcome so old + orphan receipts do not suppress automation. - The Cockroach adapter now declares inbox receipt and reaction plan tables plus a SQL-backed event-ingestion store. This is still behind a generic state port; live NATS consumers are not hardcoded into the adapter. 
+- The Cockroach command and event-ingestion adapters expose + adapter-local transaction batch seams. Application and runtime code + still see generic outcome ports; Cockroach-specific transaction + mechanics stay in `@agentic-org/state-cockroach`. +- The Cockroach command adapter records the idempotency row before + effect rows inside the command transaction batch, so a duplicate key + aborts before supervisor signal, audit, or outbox rows are submitted. +- The Cockroach event-ingestion adapter normalizes SQL `NULL` + completion fields to pending receipts and claims the pending receipt + before inserting reaction plans. If the claim reports duplicate or + payload conflict, the adapter returns that generic outcome without + inserting reaction plans. +- Governance now checks that runtime code, like application code, cannot + import vendor adapters or vendor clients directly. Vendor packages must + implement generic Organization ports consumed by application/runtime + packages. - The worker host now runs one bounded outbox cycle plus one bounded inbound-ingestion cycle through explicit ports, then returns an idle/worked/degraded summary suitable for future logs, metrics, and UI @@ -203,15 +229,17 @@ Hermes runs, MCP calls, and UI evidence. ## Next Slice -The next slice should add the first real process adapter factories below +The next slice should add policy and hat-authority checks before real +API, MCP, Hermes, or worker command entrypoints can call the command +pipeline. After that, add the first real process adapter factories below `apps/workers`: concrete NATS pull/publish client construction, durable CockroachDB outbox/inbox adapter construction, and a telemetry sink that can later send structured logs and metrics into the full-ai-cluster LGTM stack. Keep URLs, credentials, and connection pools in app adapter config fed by Kubernetes Secret or ExternalSecret values, never in domain -packages. 
After that, add a transactional durable-state adapter -integration test using CockroachDB as the first cluster-backed -implementation once a local/dev connection is available. +packages. Add a durable-state integration test using CockroachDB as the +first cluster-backed implementation once a local/dev connection is +available. Do not make the next slice a pile of bespoke request commands. Build the generic supervisor triage lifecycle first, then let specialized diff --git a/agentic-organization/docs/NORTH_STAR_ALIGNMENT_CHECKPOINT.md b/agentic-organization/docs/NORTH_STAR_ALIGNMENT_CHECKPOINT.md index ebdb947d1d..be6b34ee01 100644 --- a/agentic-organization/docs/NORTH_STAR_ALIGNMENT_CHECKPOINT.md +++ b/agentic-organization/docs/NORTH_STAR_ALIGNMENT_CHECKPOINT.md @@ -86,11 +86,11 @@ on `full-ai-cluster`, not a parallel substrate. The current packages prove the right spine: - command handler registry; -- idempotency check; +- idempotency check and atomic command-outcome persistence port; - supervisor signal handler; - audit/outbox envelope; - NATS subject and publisher/consumer adapters; -- inbox dedupe and reaction plans; +- inbox dedupe, orphan-receipt recovery, and reaction plans; - worker host and app composition shell; - telemetry attributes and workflow visibility records; - package-boundary governance. @@ -137,14 +137,24 @@ plans. The next V0 command slice must either implement discussion-anchor creation or explicitly stage it as the next command after `send_supervisor_signal`. -### Transaction Boundary Gap +### Transaction Boundary Progress -The current Cockroach adapters execute multiple statements through a -generic SQL executor. The architecture requires state, audit, outbox, -idempotency, inbox receipt, and reaction-plan writes to be atomic at the -right boundaries. The next durable-state slice should introduce a -transactional unit-of-work or single operation port for command outcomes -and event-ingestion outcomes. 
+The command pipeline now persists supervisor signal state, audit events, +outbox events, and idempotency records through one +`recordCommandOutcome` port. Command handlers return typed effects +instead of writing piecemeal state. + +The event ingestion path already used a single +`recordEventProcessingOutcome` port and now treats unfinished receipts +as recoverable rather than duplicate. Cockroach command and +event-ingestion adapters now expose transaction-batch executor seams so +the app/runtime layers remain database-generic while durable adapters +can commit outcome batches atomically. + +The remaining gap is integration-level proof against a real CockroachDB +transaction. The current tests prove the batch boundary and runtime +recovery behavior; a future local/dev-cluster integration test should +prove actual rollback behavior with the real adapter binding. ### Policy And Hat Authority Gap @@ -219,13 +229,14 @@ labels, dashboard ownership, and alertable degraded-worker signals. supervisor-chain communication is the only first primitive. 2. Add a V0 state reconciliation table before expanding command enums or UI boards. -3. Implement transactional command/event-ingestion outcome ports for - CockroachDB. -4. Add policy/hat-authority checks before exposing command handlers to +3. Add policy/hat-authority checks before exposing command handlers to API, MCP, Hermes, or workers. -5. Add `triage_supervisor_signal` as the next real command slice. -6. Add discussion-anchor enforcement and graph retrieval OpenSpec +4. Add `triage_supervisor_signal` as the next real command slice. +5. Add discussion-anchor enforcement and graph retrieval OpenSpec scenarios, then implement the minimal anchor command. +6. Add real CockroachDB transaction integration coverage for command + outcomes and event-ingestion outcomes once a dev connection is + available. 7. Define UAG v0 as a typed package contract before adding prompt-flow execution. 8. 
Build one substrate integration at a time, starting with hat-system diff --git a/agentic-organization/docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md b/agentic-organization/docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md index 95fcc3ab33..390497e236 100644 --- a/agentic-organization/docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md +++ b/agentic-organization/docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md @@ -239,6 +239,14 @@ HatSystemPort -> KubernetesHatSystemAdapter or ReadOnlyFakeHatSystemAdapter ``` Business services should depend on ports, not concrete adapters. +Every vendor-specific implementation must sit behind a generic +Organization interface exported by a non-vendor package. For example, +application code sees `CommandStateStore`, runtime code sees +`EventIngestionStore`, and messaging code sees `EventPublisher`; it +must not see CockroachDB, NATS, OpenZiti, Hindsight, Hermes, Temporal, +Dapr, Kubernetes, or provider-specific clients directly. Vendor +packages may define private executor seams for their own composition, +but those seams are not application contracts. The command pipeline must also depend on a handler registry and a state-store factory supplied by the composition layer. It must not @@ -252,6 +260,16 @@ inbox, and lease adapters must be able to perform real I/O without changing command-handler contracts. CockroachDB is the first durable SQL adapter in the cluster, not an application-layer dependency. +Command handlers must return typed effects, not write state directly. +The command pipeline owns idempotency lookup and calls one command +outcome port that records the business state, audit events, outbox +events, and idempotency record together. This keeps the application +layer closed to concrete database transactions while still giving +durable adapters one atomic commit boundary for a command result. 
+Durable command adapters should reserve the idempotency record before +effect rows inside that transaction so an idempotency race aborts before +supervisor signal, audit, or outbox state becomes visible. + The first worker boundary follows the same rule. `@agentic-org/workers` does not create NATS clients, Cockroach clients, Nest modules, Temporal workers, or Dapr actors. It receives an outbox publisher, an inbound @@ -513,6 +531,28 @@ silently suppress a reaction plan that failed to persist. The processor also compares payload hashes for repeated `eventId + consumerName` pairs; conflicting payloads are not treated as normal duplicates. +The processor treats only completed inbox receipts as duplicates. A +receipt with a matching payload hash but without completion fields is a +recoverable pending/orphan state: the rule processor may re-evaluate the +event and call the same outcome store operation to complete the receipt +and persist reaction plans. Durable adapters should still make this rare +by committing receipt, reaction plans, and processed marker in one +transaction. + +Cockroach-specific transaction mechanics stay inside +`@agentic-org/state-cockroach`. Application and runtime packages see +outcome ports. The Cockroach adapter receives transaction-batch SQL +executor seams for command outcomes and event-ingestion outcomes, and a +real process adapter must bind those seams to an actual CockroachDB +transaction before production traffic uses the adapter. +The event-ingestion Cockroach adapter must normalize SQL `NULL` +completion fields to omitted receipt fields and claim the pending +receipt at the start of the transaction. Reaction-plan inserts and the +processed marker must be conditional on that claim. If another consumer +already completed the receipt, the adapter returns a duplicate outcome +through the generic `EventIngestionStore` result without inserting +reaction plans. 
+ A worker host composes that ingestion processor with the outbox publisher but stays below the NestJS process layer. This creates a testable boundary where replayable inbound sources and live transport diff --git a/agentic-organization/packages/application/src/command-handler-registry.ts b/agentic-organization/packages/application/src/command-handler-registry.ts index ff3c19918a..3f960c1e49 100644 --- a/agentic-organization/packages/application/src/command-handler-registry.ts +++ b/agentic-organization/packages/application/src/command-handler-registry.ts @@ -1,17 +1,19 @@ -import type { Clock, CommandStateStore, IdGenerator } from "./ports.ts"; +import type { Clock, CommandEffects, IdGenerator } from "./ports.ts"; export type TypedCommand = { type: string; }; -export type CommandExecutionContext = Clock & - IdGenerator & { - store: CommandStateStore; - }; +export type CommandHandlerOutcome = { + result: Result; + effects: CommandEffects; +}; + +export type CommandExecutionContext = Clock & IdGenerator; export type CommandHandler = { commandType: Command["type"]; - execute: (command: Command, context: CommandExecutionContext) => Promise; + execute: (command: Command, context: CommandExecutionContext) => Promise>; }; export type CommandHandlerRegistry = { diff --git a/agentic-organization/packages/application/src/command-pipeline.ts b/agentic-organization/packages/application/src/command-pipeline.ts index 15483b7e26..5772f3a58a 100644 --- a/agentic-organization/packages/application/src/command-pipeline.ts +++ b/agentic-organization/packages/application/src/command-pipeline.ts @@ -1,7 +1,7 @@ import type { CommandHandlerRegistry } from "./command-handler-registry.ts"; import { CommandErrorCode, CommandResultStatus, type CommandResult } from "./command-result.ts"; import type { SendSupervisorSignalCommand } from "./handlers/send-supervisor-signal.ts"; -import type { Clock, CommandStateStore, CommandStateStoreFactory, IdGenerator } from "./ports.ts"; +import type { 
Clock, CommandEffects, CommandStateStore, CommandStateStoreFactory, IdGenerator } from "./ports.ts"; export type PipelineCommand = SendSupervisorSignalCommand; @@ -52,38 +52,49 @@ async function executeCommand( }; } - const result = await dispatchCommand(command, store, dependencies); - await store.saveIdempotencyRecord({ - idempotencyKey: command.idempotencyKey, - requestHash: command.requestHash, - result, + const outcome = await dispatchCommand(command, dependencies); + + await store.recordCommandOutcome({ + idempotencyRecord: { + idempotencyKey: command.idempotencyKey, + requestHash: command.requestHash, + result: outcome.result, + }, + effects: outcome.result.status === CommandResultStatus.Accepted ? outcome.effects : createEmptyCommandEffects(), }); - return result; + return outcome.result; } async function dispatchCommand( command: PipelineCommand, - store: CommandStateStore, dependencies: CommandPipelineDependencies, -): Promise { +): Promise<{ result: CommandResult; effects: CommandEffects }> { const handler = dependencies.handlerRegistry.resolveHandler(command.type); if (handler !== undefined) { - return await handler.execute(command, { - ...dependencies, - store, - }); + return await handler.execute(command, dependencies); } return { - status: CommandResultStatus.Rejected, - idempotency: { - replayed: false, - }, - error: { - code: CommandErrorCode.UnsupportedCommand, - message: "unsupported command type", + result: { + status: CommandResultStatus.Rejected, + idempotency: { + replayed: false, + }, + error: { + code: CommandErrorCode.UnsupportedCommand, + message: "unsupported command type", + }, }, + effects: createEmptyCommandEffects(), + }; +} + +function createEmptyCommandEffects(): CommandEffects { + return { + supervisorSignals: [], + auditEvents: [], + outboxEvents: [], }; } diff --git a/agentic-organization/packages/application/src/handlers/send-supervisor-signal.ts b/agentic-organization/packages/application/src/handlers/send-supervisor-signal.ts 
index 05660f5029..8ffe2c2b8b 100644 --- a/agentic-organization/packages/application/src/handlers/send-supervisor-signal.ts +++ b/agentic-organization/packages/application/src/handlers/send-supervisor-signal.ts @@ -11,9 +11,9 @@ import { type SupervisorSignalToolType, } from "../../../domain/src/index.ts"; import { createAgenticEventEnvelope } from "../../../domain/src/index.ts"; -import type { CommandHandler } from "../command-handler-registry.ts"; +import type { CommandHandler, CommandHandlerOutcome } from "../command-handler-registry.ts"; import { CommandResultStatus, type CommandResult } from "../command-result.ts"; -import type { Clock, CommandStateStore, IdGenerator } from "../ports.ts"; +import type { Clock, IdGenerator } from "../ports.ts"; export const IdPrefix = { SupervisorSignal: "supervisor-signal", @@ -45,10 +45,7 @@ export type SendSupervisorSignalCommand = { relatedWorkItemId: string; }; -export type SendSupervisorSignalDependencies = Clock & - IdGenerator & { - store: CommandStateStore; - }; +export type SendSupervisorSignalDependencies = Clock & IdGenerator; export function createSendSupervisorSignalHandler(): CommandHandler { return { @@ -60,7 +57,7 @@ export function createSendSupervisorSignalHandler(): CommandHandler { +): Promise> { const occurredAt = dependencies.now(); const supervisorSignal: SupervisorSignal = { supervisorSignalId: dependencies.createId(IdPrefix.SupervisorSignal), @@ -123,15 +120,18 @@ export async function sendSupervisorSignal( }), }; - await dependencies.store.appendSupervisorSignal(supervisorSignal); - await dependencies.store.appendAuditEvent(auditEvent); - await dependencies.store.appendOutboxEvent(outboxEvent); - return { - status: CommandResultStatus.Accepted, - supervisorSignal, - idempotency: { - replayed: false, + result: { + status: CommandResultStatus.Accepted, + supervisorSignal, + idempotency: { + replayed: false, + }, + }, + effects: { + supervisorSignals: [supervisorSignal], + auditEvents: [auditEvent], + 
outboxEvents: [outboxEvent], }, }; } diff --git a/agentic-organization/packages/application/src/index.ts b/agentic-organization/packages/application/src/index.ts index 52603c3c9e..40cfc8860e 100644 --- a/agentic-organization/packages/application/src/index.ts +++ b/agentic-organization/packages/application/src/index.ts @@ -2,6 +2,7 @@ export { createCommandHandlerRegistry, type CommandExecutionContext, type CommandHandler, + type CommandHandlerOutcome, type CommandHandlerRegistry, type TypedCommand, } from "./command-handler-registry.ts"; @@ -20,12 +21,10 @@ export { type SendSupervisorSignalDependencies, } from "./handlers/send-supervisor-signal.ts"; export type { - AuditEventStore, Clock, + CommandEffects, CommandStateStore, CommandStateStoreFactory, - IdempotencyRecordStore, IdGenerator, - OutboxEventStore, - SupervisorSignalStore, + RecordCommandOutcomeInput, } from "./ports.ts"; diff --git a/agentic-organization/packages/application/src/ports.ts b/agentic-organization/packages/application/src/ports.ts index 2f8778b841..9100075169 100644 --- a/agentic-organization/packages/application/src/ports.ts +++ b/agentic-organization/packages/application/src/ports.ts @@ -8,28 +8,22 @@ export type IdGenerator = { createId: (prefix: string) => string; }; -export type IdempotencyRecordStore = { - findIdempotencyRecord: (idempotencyKey: string) => Promise | undefined>; - saveIdempotencyRecord: (record: IdempotencyRecord) => Promise; -}; - -export type SupervisorSignalStore = { - appendSupervisorSignal: (supervisorSignal: SupervisorSignal) => Promise; +export type CommandEffects = { + supervisorSignals: readonly SupervisorSignal[]; + auditEvents: readonly AuditEvent[]; + outboxEvents: readonly OutboxEvent[]; }; -export type AuditEventStore = { - appendAuditEvent: (auditEvent: AuditEvent) => Promise; +export type RecordCommandOutcomeInput = { + idempotencyRecord: IdempotencyRecord; + effects: CommandEffects; }; -export type OutboxEventStore = { - appendOutboxEvent: 
(outboxEvent: OutboxEvent) => Promise; +export type CommandStateStore = { + findIdempotencyRecord: (idempotencyKey: string) => Promise | undefined>; + recordCommandOutcome: (input: RecordCommandOutcomeInput) => Promise; }; -export type CommandStateStore = IdempotencyRecordStore & - SupervisorSignalStore & - AuditEventStore & - OutboxEventStore; - export type CommandStateStoreFactory = { createCommandStateStore: () => CommandStateStore; }; diff --git a/agentic-organization/packages/application/test/command-pipeline.test.ts b/agentic-organization/packages/application/test/command-pipeline.test.ts index 6774a9923b..8a21ea381f 100644 --- a/agentic-organization/packages/application/test/command-pipeline.test.ts +++ b/agentic-organization/packages/application/test/command-pipeline.test.ts @@ -7,6 +7,7 @@ import { createCommandHandlerRegistry } from "../src/command-handler-registry.ts import { CommandErrorCode, CommandResultStatus, type CommandResult } from "../src/command-result.ts"; import { createCommandPipeline, type PipelineCommand } from "../src/command-pipeline.ts"; import { createSendSupervisorSignalHandler } from "../src/handlers/send-supervisor-signal.ts"; +import type { CommandStateStore, CommandStateStoreFactory, RecordCommandOutcomeInput } from "../src/ports.ts"; const command: PipelineCommand = { commandId: "cmd-supervisor-signal-001", @@ -81,4 +82,100 @@ describe("command pipeline idempotency", () => { equal(stateStoreFactory.snapshot.supervisorSignals.length, 1); equal(stateStoreFactory.snapshot.outboxEvents.length, 1); }); + + test("records command effects and idempotency through one outcome port", async () => { + const stateStoreFactory = createRecordingCommandStateStoreFactory(); + const pipeline = createCommandPipeline({ + stateStoreFactory, + handlerRegistry: createCommandHandlerRegistry([createSendSupervisorSignalHandler()]), + now: () => "2026-05-25T20:00:00.000Z", + createId: (prefix) => `${prefix}-001`, + }); + + const result = await 
pipeline.execute(command); + + equal(result.status, CommandResultStatus.Accepted); + equal(stateStoreFactory.recordedOutcomes.length, 1); + equal(stateStoreFactory.recordedOutcomes[0]?.idempotencyRecord.idempotencyKey, command.idempotencyKey); + equal(stateStoreFactory.recordedOutcomes[0]?.effects.supervisorSignals.length, 1); + equal(stateStoreFactory.recordedOutcomes[0]?.effects.auditEvents.length, 1); + equal(stateStoreFactory.recordedOutcomes[0]?.effects.outboxEvents.length, 1); + }); + + test("does not perform piecemeal command writes when outcome recording fails", async () => { + const stateStoreFactory = createFailingOutcomeCommandStateStoreFactory("transaction unavailable"); + const pipeline = createCommandPipeline({ + stateStoreFactory, + handlerRegistry: createCommandHandlerRegistry([createSendSupervisorSignalHandler()]), + now: () => "2026-05-25T20:00:00.000Z", + createId: (prefix) => `${prefix}-001`, + }); + + try { + await pipeline.execute(command); + throw new Error("expected command outcome recording to fail"); + } catch (error) { + equal(error instanceof Error, true); + equal((error as Error).message, "transaction unavailable"); + } + + equal(stateStoreFactory.appendCallCount, 0); + equal(stateStoreFactory.recordCallCount, 1); + }); }); + +type RecordingCommandStateStoreFactory = CommandStateStoreFactory & { + recordedOutcomes: RecordCommandOutcomeInput[]; +}; + +function createRecordingCommandStateStoreFactory(): RecordingCommandStateStoreFactory { + const recordedOutcomes: RecordCommandOutcomeInput[] = []; + + return { + recordedOutcomes, + createCommandStateStore: () => ({ + findIdempotencyRecord: async () => undefined, + recordCommandOutcome: async (input) => { + recordedOutcomes.push(input); + }, + }), + }; +} + +type FailingOutcomeCommandStateStoreFactory = CommandStateStoreFactory & { + readonly appendCallCount: number; + readonly recordCallCount: number; +}; + +function createFailingOutcomeCommandStateStoreFactory( + message: string, +): 
FailingOutcomeCommandStateStoreFactory { + let appendCallCount = 0; + let recordCallCount = 0; + + return { + get appendCallCount() { + return appendCallCount; + }, + get recordCallCount() { + return recordCallCount; + }, + createCommandStateStore: () => + ({ + findIdempotencyRecord: async () => undefined, + recordCommandOutcome: async () => { + recordCallCount += 1; + throw new Error(message); + }, + appendSupervisorSignal: async () => { + appendCallCount += 1; + }, + appendAuditEvent: async () => { + appendCallCount += 1; + }, + appendOutboxEvent: async () => { + appendCallCount += 1; + }, + }) as CommandStateStore, + }; +} diff --git a/agentic-organization/packages/application/test/send-supervisor-signal.test.ts b/agentic-organization/packages/application/test/send-supervisor-signal.test.ts index be6a74d688..60f80826ea 100644 --- a/agentic-organization/packages/application/test/send-supervisor-signal.test.ts +++ b/agentic-organization/packages/application/test/send-supervisor-signal.test.ts @@ -9,7 +9,6 @@ import { SupervisorSignalStatus, SupervisorSignalToolType, } from "../../domain/src/index.ts"; -import { createInMemoryOrganizationStoreFactory } from "../../state/src/index.ts"; import { CommandResultStatus, type CommandResult } from "../src/command-result.ts"; import { sendSupervisorSignal, type SendSupervisorSignalCommand } from "../src/handlers/send-supervisor-signal.ts"; @@ -38,24 +37,20 @@ const command: SendSupervisorSignalCommand = { }; describe("send supervisor signal handler", () => { - test("persists chain communication, audit event, and outbox event atomically", async () => { - const stateStoreFactory = createInMemoryOrganizationStoreFactory(); - const store = stateStoreFactory.createCommandStateStore(); - - const result = await sendSupervisorSignal(command, { - store, + test("returns chain communication, audit event, and outbox event effects", async () => { + const outcome = await sendSupervisorSignal(command, { now: () => 
"2026-05-25T20:00:00.000Z", createId: (prefix) => `${prefix}-001`, }); + const result = outcome.result as CommandResult; equal(result.status, CommandResultStatus.Accepted); ok(result.supervisorSignal); equal(result.supervisorSignal.status, SupervisorSignalStatus.Sent); - equal(stateStoreFactory.snapshot.supervisorSignals.length, 1); - equal(stateStoreFactory.snapshot.workItems.length, 0); - equal(stateStoreFactory.snapshot.auditEvents.length, 1); - equal(stateStoreFactory.snapshot.outboxEvents.length, 1); - deepEqual(stateStoreFactory.snapshot.outboxEvents[0]?.envelope, { + deepEqual(outcome.effects.supervisorSignals, [result.supervisorSignal]); + equal(outcome.effects.auditEvents.length, 1); + equal(outcome.effects.outboxEvents.length, 1); + deepEqual(outcome.effects.outboxEvents[0]?.envelope, { eventId: "evt-001", eventType: AgenticEventType.SupervisorSignalSent, schemaVersion: "agentic.org.event.v1", diff --git a/agentic-organization/packages/governance/src/package-dependency-boundaries.ts b/agentic-organization/packages/governance/src/package-dependency-boundaries.ts index 242e063c0d..c928093f31 100644 --- a/agentic-organization/packages/governance/src/package-dependency-boundaries.ts +++ b/agentic-organization/packages/governance/src/package-dependency-boundaries.ts @@ -8,6 +8,7 @@ export const PackageBoundaryRule = { Messaging: "messaging", Packages: "packages", ProductionSource: "production_source", + Runtime: "runtime", StateAdapter: "state_adapter", Workers: "workers", } as const; diff --git a/agentic-organization/packages/governance/test/package-dependency-boundaries.test.ts b/agentic-organization/packages/governance/test/package-dependency-boundaries.test.ts index 515781f6c3..639e4c0adc 100644 --- a/agentic-organization/packages/governance/test/package-dependency-boundaries.test.ts +++ b/agentic-organization/packages/governance/test/package-dependency-boundaries.test.ts @@ -23,6 +23,7 @@ describe("package dependency boundaries", () => { "../../state", 
"../../../state", "state-cockroach", + "cockroach", "nestjs", "@nestjs", "nats", @@ -38,6 +39,26 @@ describe("package dependency boundaries", () => { sourceGlob: "messaging/src/**/*.ts", forbiddenImportFragments: ["../messaging-nats", "../../messaging-nats", "nats"], }, + { + packageName: PackageBoundaryRule.Runtime, + sourceGlob: "runtime/src/**/*.ts", + forbiddenImportFragments: [ + "../../state-cockroach", + "../state-cockroach", + "../../messaging-nats", + "../messaging-nats", + "cockroach", + "nestjs", + "@nestjs", + "nats", + "jetstream", + "dapr", + "temporal", + "drizzle", + "pg", + "postgres", + ], + }, { packageName: PackageBoundaryRule.StateAdapter, sourceGlob: "state-cockroach/src/**/*.ts", diff --git a/agentic-organization/packages/runtime/src/event-ingestion.ts b/agentic-organization/packages/runtime/src/event-ingestion.ts index 98810ad803..3836e2185d 100644 --- a/agentic-organization/packages/runtime/src/event-ingestion.ts +++ b/agentic-organization/packages/runtime/src/event-ingestion.ts @@ -54,16 +54,18 @@ export function createEventIngestionProcessor(input: CreateEventIngestionProcess }; } - return { - status: EventIngestionOutcomeStatus.Duplicate, - reactionPlans: [], - }; + if (isCompletedReceipt(existingReceipt)) { + return { + status: EventIngestionOutcomeStatus.Duplicate, + reactionPlans: [], + }; + } } const observedAt = input.now(); const receipt: InboxReceiptRecord = { ...lookup, - firstSeenAt: observedAt, + firstSeenAt: existingReceipt?.firstSeenAt ?? 
observedAt, payloadHash, }; @@ -75,7 +77,7 @@ export function createEventIngestionProcessor(input: CreateEventIngestionProcess action, })); - await input.store.recordEventProcessingOutcome({ + const persistenceResult = await input.store.recordEventProcessingOutcome({ receipt, reactionPlans, processedAt: observedAt, @@ -83,9 +85,13 @@ export function createEventIngestionProcessor(input: CreateEventIngestionProcess }); return { - status: EventIngestionOutcomeStatus.Processed, - reactionPlans, + status: persistenceResult.status, + reactionPlans: persistenceResult.reactionPlans, }; }, }; } + +function isCompletedReceipt(receipt: InboxReceiptRecord): boolean { + return receipt.processedAt !== undefined && receipt.result !== undefined; +} diff --git a/agentic-organization/packages/runtime/test/event-ingestion.test.ts b/agentic-organization/packages/runtime/test/event-ingestion.test.ts index 2c7f47e2f5..a77d2eabc9 100644 --- a/agentic-organization/packages/runtime/test/event-ingestion.test.ts +++ b/agentic-organization/packages/runtime/test/event-ingestion.test.ts @@ -12,7 +12,9 @@ import { } from "../../domain/src/index.ts"; import { EventIngestionOutcomeStatus, + type EventIngestionStore, InboundEventConsumerName, + type RecordEventProcessingOutcomeInput, createInMemoryEventIngestionStore, } from "../../state/src/index.ts"; import { @@ -131,6 +133,48 @@ describe("event ingestion processor", () => { equal(store.snapshot.inboxReceipts.length, 1); equal(store.snapshot.reactionPlans.length, 1); }); + + test("retries an unprocessed inbox receipt instead of treating it as duplicate", async () => { + let evaluationCount = 0; + let recordedOutcome: RecordEventProcessingOutcomeInput | undefined; + const store: EventIngestionStore = { + findInboxReceipt: async () => ({ + eventId: "evt-supervisor-signal-001", + consumerName: InboundEventConsumerName.V0AutomationPlanner, + firstSeenAt: "2026-05-25T21:59:00.000Z", + payloadHash: "hash-evt-supervisor-signal-001", + }), + 
recordEventProcessingOutcome: async (input) => { + recordedOutcome = input; + + return { + status: input.result, + reactionPlans: input.reactionPlans, + }; + }, + }; + const processor = createEventIngestionProcessor({ + store, + evaluateRules: (envelope) => { + evaluationCount += 1; + return evaluateV0AutomationRules(envelope); + }, + consumerName: InboundEventConsumerName.V0AutomationPlanner, + calculatePayloadHash: (eventEnvelope) => `hash-${eventEnvelope.eventId}`, + now: () => "2026-05-25T22:00:00.000Z", + createId: (prefix) => `${prefix}-${evaluationCount}`, + }); + + const result = await processor.ingest({ + envelope: createSupervisorSignalEnvelope(), + }); + + equal(result.status, EventIngestionOutcomeStatus.Processed); + equal(evaluationCount, 1); + equal(recordedOutcome?.result, EventIngestionOutcomeStatus.Processed); + equal(recordedOutcome?.receipt.firstSeenAt, "2026-05-25T21:59:00.000Z"); + equal(recordedOutcome?.reactionPlans.length, 1); + }); }); function createSupervisorSignalEnvelope(title = "Blocked on scoped NATS publisher") { diff --git a/agentic-organization/packages/state-cockroach/src/cockroach-command-state-store.ts b/agentic-organization/packages/state-cockroach/src/cockroach-command-state-store.ts index 93403076a6..8a3dd98c52 100644 --- a/agentic-organization/packages/state-cockroach/src/cockroach-command-state-store.ts +++ b/agentic-organization/packages/state-cockroach/src/cockroach-command-state-store.ts @@ -3,7 +3,7 @@ import { CockroachTableName } from "./cockroach-schema.ts"; export const CockroachCommandStateStoreStatement = { FindIdempotencyRecord: "find_idempotency_record", - UpsertIdempotencyRecord: "upsert_idempotency_record", + InsertIdempotencyRecord: "insert_idempotency_record", InsertSupervisorSignal: "insert_supervisor_signal", InsertAuditEvent: "insert_audit_event", InsertOutboxEvent: "insert_outbox_event", @@ -24,6 +24,11 @@ export type CockroachSqlResult> = { export type CockroachSqlExecutor = { execute: >(statement: 
CockroachSqlStatement) => Promise>; + executeTransaction: (transaction: CockroachSqlTransaction) => Promise; +}; + +export type CockroachSqlTransaction = { + statements: readonly CockroachSqlStatement[]; }; export type CreateCockroachCommandStateStoreFactoryInput = { @@ -58,70 +63,94 @@ function createCockroachCommandStateStore(executor: CockroachSqlExecutor result: row.result_json as Result, }; }, - saveIdempotencyRecord: async (record) => { - await executor.execute({ - name: CockroachCommandStateStoreStatement.UpsertIdempotencyRecord, - sql: CockroachCommandStateStoreSql.UpsertIdempotencyRecord, - parameters: [record.idempotencyKey, record.requestHash, record.result], - }); - }, - appendSupervisorSignal: async (supervisorSignal) => { - await executor.execute({ - name: CockroachCommandStateStoreStatement.InsertSupervisorSignal, - sql: CockroachCommandStateStoreSql.InsertSupervisorSignal, - parameters: [ - supervisorSignal.supervisorSignalId, - supervisorSignal.organizationId, - supervisorSignal.projectId, - supervisorSignal.teamId, - supervisorSignal.sourceLevel, - supervisorSignal.targetLevel, - supervisorSignal.targetHatAssignmentId, - supervisorSignal.sender.agentId, - supervisorSignal.sender.hatAssignmentId, - supervisorSignal.toolType, - supervisorSignal.status, - supervisorSignal.title, - supervisorSignal.message, - supervisorSignal.relatedWorkItemId, - supervisorSignal.createdAt, - ], - }); - }, - appendAuditEvent: async (auditEvent) => { - await executor.execute({ - name: CockroachCommandStateStoreStatement.InsertAuditEvent, - sql: CockroachCommandStateStoreSql.InsertAuditEvent, - parameters: [ - auditEvent.auditEventId, - auditEvent.eventName, - auditEvent.aggregateId, - auditEvent.actor.agentId, - auditEvent.actor.hatAssignmentId, - auditEvent.occurredAt, - ], - }); - }, - appendOutboxEvent: async (outboxEvent) => { - await executor.execute({ - name: CockroachCommandStateStoreStatement.InsertOutboxEvent, - sql: 
CockroachCommandStateStoreSql.InsertOutboxEvent, - parameters: [ - outboxEvent.outboxEventId, - outboxEvent.envelope.eventId, - outboxEvent.envelope.eventType, - outboxEvent.envelope.scope.organizationId, - outboxEvent.envelope.scope.projectId, - outboxEvent.envelope.scope.workItemId, - outboxEvent.envelope.trace.traceId, - outboxEvent.envelope.trace.correlationId, - outboxEvent.envelope, + recordCommandOutcome: async (outcome) => { + await executor.executeTransaction({ + statements: [ + createInsertIdempotencyRecordStatement(outcome.idempotencyRecord), + ...outcome.effects.supervisorSignals.map(createInsertSupervisorSignalStatement), + ...outcome.effects.auditEvents.map(createInsertAuditEventStatement), + ...outcome.effects.outboxEvents.map(createInsertOutboxEventStatement), ], }); }, }; } +type CommandStateStoreResult = Parameters["recordCommandOutcome"]>[0]; + +function createInsertIdempotencyRecordStatement( + record: CommandStateStoreResult["idempotencyRecord"], +): CockroachSqlStatement { + return { + name: CockroachCommandStateStoreStatement.InsertIdempotencyRecord, + sql: CockroachCommandStateStoreSql.InsertIdempotencyRecord, + parameters: [record.idempotencyKey, record.requestHash, record.result], + }; +} + +function createInsertSupervisorSignalStatement( + supervisorSignal: CommandStateStoreResult["effects"]["supervisorSignals"][number], +): CockroachSqlStatement { + return { + name: CockroachCommandStateStoreStatement.InsertSupervisorSignal, + sql: CockroachCommandStateStoreSql.InsertSupervisorSignal, + parameters: [ + supervisorSignal.supervisorSignalId, + supervisorSignal.organizationId, + supervisorSignal.projectId, + supervisorSignal.teamId, + supervisorSignal.sourceLevel, + supervisorSignal.targetLevel, + supervisorSignal.targetHatAssignmentId, + supervisorSignal.sender.agentId, + supervisorSignal.sender.hatAssignmentId, + supervisorSignal.toolType, + supervisorSignal.status, + supervisorSignal.title, + supervisorSignal.message, + 
supervisorSignal.relatedWorkItemId, + supervisorSignal.createdAt, + ], + }; +} + +function createInsertAuditEventStatement( + auditEvent: CommandStateStoreResult["effects"]["auditEvents"][number], +): CockroachSqlStatement { + return { + name: CockroachCommandStateStoreStatement.InsertAuditEvent, + sql: CockroachCommandStateStoreSql.InsertAuditEvent, + parameters: [ + auditEvent.auditEventId, + auditEvent.eventName, + auditEvent.aggregateId, + auditEvent.actor.agentId, + auditEvent.actor.hatAssignmentId, + auditEvent.occurredAt, + ], + }; +} + +function createInsertOutboxEventStatement( + outboxEvent: CommandStateStoreResult["effects"]["outboxEvents"][number], +): CockroachSqlStatement { + return { + name: CockroachCommandStateStoreStatement.InsertOutboxEvent, + sql: CockroachCommandStateStoreSql.InsertOutboxEvent, + parameters: [ + outboxEvent.outboxEventId, + outboxEvent.envelope.eventId, + outboxEvent.envelope.eventType, + outboxEvent.envelope.scope.organizationId, + outboxEvent.envelope.scope.projectId, + outboxEvent.envelope.scope.workItemId, + outboxEvent.envelope.trace.traceId, + outboxEvent.envelope.trace.correlationId, + outboxEvent.envelope, + ], + }; +} + type IdempotencyRecordRow = { idempotency_key: string; request_hash: string; @@ -134,8 +163,8 @@ const CockroachCommandStateStoreSql = { FROM ${CockroachTableName.IdempotencyRecords} WHERE idempotency_key = $1 `, - UpsertIdempotencyRecord: ` - UPSERT INTO ${CockroachTableName.IdempotencyRecords} ( + InsertIdempotencyRecord: ` + INSERT INTO ${CockroachTableName.IdempotencyRecords} ( idempotency_key, request_hash, result_json diff --git a/agentic-organization/packages/state-cockroach/src/cockroach-event-ingestion-store.ts b/agentic-organization/packages/state-cockroach/src/cockroach-event-ingestion-store.ts index 20a489c7ab..5c0f0c43a8 100644 --- a/agentic-organization/packages/state-cockroach/src/cockroach-event-ingestion-store.ts +++ 
b/agentic-organization/packages/state-cockroach/src/cockroach-event-ingestion-store.ts @@ -1,9 +1,14 @@ -import type { EventIngestionStore, InboxReceiptRecord, ReactionPlanRecord } from "../../state/src/index.ts"; +import { + EventIngestionOutcomeStatus, + type EventIngestionStore, + type InboxReceiptRecord, + type ReactionPlanRecord, +} from "../../state/src/index.ts"; import { CockroachTableName } from "./cockroach-schema.ts"; export const CockroachEventIngestionStoreStatement = { FindInboxReceipt: "find_inbox_receipt", - InsertInboxReceipt: "insert_inbox_receipt", + ClaimPendingInboxReceipt: "claim_pending_inbox_receipt", InsertReactionPlan: "insert_reaction_plan", MarkInboxReceiptProcessed: "mark_inbox_receipt_processed", } as const; @@ -21,10 +26,19 @@ export type CockroachEventIngestionSqlResult> = { rows: readonly Row[]; }; +export type CockroachEventIngestionTransactionExecutor = { + execute: >( + statement: CockroachEventIngestionSqlStatement, + ) => Promise>; +}; + export type CockroachEventIngestionSqlExecutor = { execute: >( statement: CockroachEventIngestionSqlStatement, ) => Promise>; + executeTransaction: ( + operation: (executor: CockroachEventIngestionTransactionExecutor) => Promise, + ) => Promise; }; export type CreateCockroachEventIngestionStoreInput = { @@ -52,52 +66,76 @@ export function createCockroachEventIngestionStore( consumerName: row.consumer_name, firstSeenAt: row.first_seen_at, payloadHash: row.payload_hash, - ...(row.processed_at === undefined ? {} : { processedAt: row.processed_at }), - ...(row.result === undefined ? {} : { result: row.result }), + ...(row.processed_at == null ? {} : { processedAt: row.processed_at }), + ...(row.result == null ? 
{} : { result: row.result }), }; }, recordEventProcessingOutcome: async (outcome) => { const receipt = outcome.receipt; - await input.executor.execute({ - name: CockroachEventIngestionStoreStatement.InsertInboxReceipt, - sql: CockroachEventIngestionStoreSql.InsertInboxReceipt, - parameters: [receipt.eventId, receipt.consumerName, receipt.firstSeenAt, receipt.payloadHash], - }); - for (const reactionPlan of outcome.reactionPlans) { - await input.executor.execute({ - name: CockroachEventIngestionStoreStatement.InsertReactionPlan, - sql: CockroachEventIngestionStoreSql.InsertReactionPlan, - parameters: [ - reactionPlan.reactionPlanId, - reactionPlan.consumerName, - reactionPlan.createdAt, - reactionPlan.status, - reactionPlan.action.triggerEventId, - reactionPlan.action.organizationId, - reactionPlan.action.projectId, - reactionPlan.action.workItemId, - reactionPlan.action, - ], + return await input.executor.executeTransaction(async (transaction) => { + const claimResult = await transaction.execute({ + name: CockroachEventIngestionStoreStatement.ClaimPendingInboxReceipt, + sql: CockroachEventIngestionStoreSql.ClaimPendingInboxReceipt, + parameters: [receipt.eventId, receipt.consumerName, receipt.firstSeenAt, receipt.payloadHash], }); - } + const claimStatus = claimResult.rows[0]?.claim_status ?? 
EventIngestionOutcomeStatus.PayloadConflict; + + if (claimStatus !== EventIngestionOutcomeStatus.Processed) { + return { + status: claimStatus, + reactionPlans: [], + }; + } + + for (const reactionPlan of outcome.reactionPlans) { + await transaction.execute(createInsertReactionPlanStatement(reactionPlan)); + } - await input.executor.execute({ - name: CockroachEventIngestionStoreStatement.MarkInboxReceiptProcessed, - sql: CockroachEventIngestionStoreSql.MarkInboxReceiptProcessed, - parameters: [receipt.eventId, receipt.consumerName, outcome.processedAt, outcome.result], + await transaction.execute({ + name: CockroachEventIngestionStoreStatement.MarkInboxReceiptProcessed, + sql: CockroachEventIngestionStoreSql.MarkInboxReceiptProcessed, + parameters: [receipt.eventId, receipt.consumerName, outcome.processedAt, outcome.result, receipt.payloadHash], + }); + + return { + status: outcome.result, + reactionPlans: outcome.reactionPlans, + }; }); }, }; } +function createInsertReactionPlanStatement(reactionPlan: ReactionPlanRecord): CockroachEventIngestionSqlStatement { + return { + name: CockroachEventIngestionStoreStatement.InsertReactionPlan, + sql: CockroachEventIngestionStoreSql.InsertReactionPlan, + parameters: [ + reactionPlan.reactionPlanId, + reactionPlan.consumerName, + reactionPlan.createdAt, + reactionPlan.status, + reactionPlan.action.triggerEventId, + reactionPlan.action.organizationId, + reactionPlan.action.projectId, + reactionPlan.action.workItemId, + reactionPlan.action, + ], + }; +} + type InboxReceiptRow = { event_id: string; consumer_name: InboxReceiptRecord["consumerName"]; first_seen_at: string; - processed_at?: string; + processed_at?: string | null; payload_hash: string; - result?: InboxReceiptRecord["result"]; + result?: InboxReceiptRecord["result"] | null; +}; + +type CockroachReceiptClaimRow = { + claim_status: EventIngestionOutcomeStatus; }; const CockroachEventIngestionStoreSql = { @@ -107,13 +145,35 @@ const CockroachEventIngestionStoreSql = { 
WHERE event_id = $1 AND consumer_name = $2 `, - InsertInboxReceipt: ` - INSERT INTO ${CockroachTableName.InboxReceipts} ( - event_id, - consumer_name, - first_seen_at, - payload_hash - ) VALUES ($1, $2, $3, $4) + ClaimPendingInboxReceipt: ` + WITH claimed_receipt AS ( + INSERT INTO ${CockroachTableName.InboxReceipts} ( + event_id, + consumer_name, + first_seen_at, + payload_hash + ) VALUES ($1, $2, $3, $4) + ON CONFLICT (event_id, consumer_name) DO UPDATE + SET payload_hash = excluded.payload_hash + WHERE ${CockroachTableName.InboxReceipts}.payload_hash = excluded.payload_hash + AND ${CockroachTableName.InboxReceipts}.processed_at IS NULL + AND ${CockroachTableName.InboxReceipts}.result IS NULL + RETURNING event_id + ) + SELECT + CASE + WHEN EXISTS (SELECT 1 FROM claimed_receipt) THEN '${EventIngestionOutcomeStatus.Processed}' + WHEN EXISTS ( + SELECT 1 + FROM ${CockroachTableName.InboxReceipts} + WHERE event_id = $1 + AND consumer_name = $2 + AND payload_hash = $4 + AND processed_at IS NOT NULL + AND result IS NOT NULL + ) THEN '${EventIngestionOutcomeStatus.Duplicate}' + ELSE '${EventIngestionOutcomeStatus.PayloadConflict}' + END AS claim_status `, InsertReactionPlan: ` INSERT INTO ${CockroachTableName.ReactionPlans} ( @@ -135,6 +195,9 @@ const CockroachEventIngestionStoreSql = { result = $4 WHERE event_id = $1 AND consumer_name = $2 + AND payload_hash = $5 + AND processed_at IS NULL + AND result IS NULL `, } as const; diff --git a/agentic-organization/packages/state-cockroach/test/cockroach-command-state-store.test.ts b/agentic-organization/packages/state-cockroach/test/cockroach-command-state-store.test.ts index 2b36d8421d..1d9a5f7a70 100644 --- a/agentic-organization/packages/state-cockroach/test/cockroach-command-state-store.test.ts +++ b/agentic-organization/packages/state-cockroach/test/cockroach-command-state-store.test.ts @@ -16,7 +16,7 @@ import { } from "../src/cockroach-command-state-store.ts"; describe("cockroach command state store", () => { - 
test("implements command-state-store operations behind a SQL executor", async () => { + test("records command outcome in one transaction batch", async () => { const executor = createRecordingExecutor(); const factory = createCockroachCommandStateStoreFactory({ executor, @@ -25,83 +25,90 @@ describe("cockroach command state store", () => { equal(await store.findIdempotencyRecord("idem-001"), undefined); - await store.appendSupervisorSignal({ - supervisorSignalId: "supervisor-signal-001", - organizationId: "org-lfg", - projectId: "project-agentic-org", - teamId: "team-runtime", - sourceLevel: SupervisorChainLevel.TeamMember, - targetLevel: SupervisorChainLevel.Manager, - targetHatAssignmentId: "hat-assignment-em-001", - sender: { - agentId: "agent-developer-001", - hatAssignmentId: "hat-assignment-dev-001", - }, - toolType: SupervisorSignalToolType.ReportBlocker, - status: SupervisorSignalStatus.Sent, - title: "Blocked on scoped NATS publisher", - message: "Need a scoped publisher decision.", - relatedWorkItemId: "work-outbox-001", - createdAt: "2026-05-25T20:00:00.000Z", - }); - - await store.appendAuditEvent({ - auditEventId: "audit-001", - eventName: AgenticEventType.SupervisorSignalSent, - aggregateId: "supervisor-signal-001", - actor: { - agentId: "agent-developer-001", - hatAssignmentId: "hat-assignment-dev-001", - }, - occurredAt: "2026-05-25T20:00:00.000Z", - }); - - await store.appendOutboxEvent({ - outboxEventId: "outbox-001", - envelope: { - eventId: "evt-001", - eventType: AgenticEventType.SupervisorSignalSent, - schemaVersion: "agentic.org.event.v1", - occurredAt: "2026-05-25T20:00:00.000Z", - actor: { - agentId: "agent-developer-001", - hatAssignmentId: "hat-assignment-dev-001", - }, - scope: { - organizationId: "org-lfg", - projectId: "project-agentic-org", - teamId: "team-runtime", - workItemId: "work-outbox-001", - }, - aggregate: { - aggregateId: "supervisor-signal-001", - aggregateType: AgenticAggregateType.SupervisorSignal, - aggregateVersion: 1, 
- }, - trace: { - commandId: "cmd-001", - correlationId: "corr-001", - causationId: "cause-001", - traceId: "trace-001", - idempotencyKey: "idem-001", - }, - replay: { - isReplay: false, - }, - payload: { - title: "Blocked on scoped NATS publisher", + await store.recordCommandOutcome({ + idempotencyRecord: { + idempotencyKey: "idem-001", + requestHash: "hash-001", + result: { + status: CommandResultStatus.Accepted, + idempotency: { + replayed: false, + }, }, }, - }); - - await store.saveIdempotencyRecord({ - idempotencyKey: "idem-001", - requestHash: "hash-001", - result: { - status: CommandResultStatus.Accepted, - idempotency: { - replayed: false, - }, + effects: { + supervisorSignals: [ + { + supervisorSignalId: "supervisor-signal-001", + organizationId: "org-lfg", + projectId: "project-agentic-org", + teamId: "team-runtime", + sourceLevel: SupervisorChainLevel.TeamMember, + targetLevel: SupervisorChainLevel.Manager, + targetHatAssignmentId: "hat-assignment-em-001", + sender: { + agentId: "agent-developer-001", + hatAssignmentId: "hat-assignment-dev-001", + }, + toolType: SupervisorSignalToolType.ReportBlocker, + status: SupervisorSignalStatus.Sent, + title: "Blocked on scoped NATS publisher", + message: "Need a scoped publisher decision.", + relatedWorkItemId: "work-outbox-001", + createdAt: "2026-05-25T20:00:00.000Z", + }, + ], + auditEvents: [ + { + auditEventId: "audit-001", + eventName: AgenticEventType.SupervisorSignalSent, + aggregateId: "supervisor-signal-001", + actor: { + agentId: "agent-developer-001", + hatAssignmentId: "hat-assignment-dev-001", + }, + occurredAt: "2026-05-25T20:00:00.000Z", + }, + ], + outboxEvents: [ + { + outboxEventId: "outbox-001", + envelope: { + eventId: "evt-001", + eventType: AgenticEventType.SupervisorSignalSent, + schemaVersion: "agentic.org.event.v1", + occurredAt: "2026-05-25T20:00:00.000Z", + actor: { + agentId: "agent-developer-001", + hatAssignmentId: "hat-assignment-dev-001", + }, + scope: { + organizationId: 
"org-lfg", + projectId: "project-agentic-org", + teamId: "team-runtime", + workItemId: "work-outbox-001", + }, + aggregate: { + aggregateId: "supervisor-signal-001", + aggregateType: AgenticAggregateType.SupervisorSignal, + aggregateVersion: 1, + }, + trace: { + commandId: "cmd-001", + correlationId: "corr-001", + causationId: "cause-001", + traceId: "trace-001", + idempotencyKey: "idem-001", + }, + replay: { + isReplay: false, + }, + payload: { + title: "Blocked on scoped NATS publisher", + }, + }, + }, + ], }, }); @@ -109,29 +116,51 @@ describe("cockroach command state store", () => { executor.statements.map((statement) => statement.name), [ CockroachCommandStateStoreStatement.FindIdempotencyRecord, + CockroachCommandStateStoreStatement.InsertIdempotencyRecord, CockroachCommandStateStoreStatement.InsertSupervisorSignal, CockroachCommandStateStoreStatement.InsertAuditEvent, CockroachCommandStateStoreStatement.InsertOutboxEvent, - CockroachCommandStateStoreStatement.UpsertIdempotencyRecord, ], ); + deepEqual( + executor.transactionStatements.map((statement) => statement.name), + [ + CockroachCommandStateStoreStatement.InsertIdempotencyRecord, + CockroachCommandStateStoreStatement.InsertSupervisorSignal, + CockroachCommandStateStoreStatement.InsertAuditEvent, + CockroachCommandStateStoreStatement.InsertOutboxEvent, + ], + ); + equal(executor.transactionStatements[0]?.sql.includes("INSERT INTO"), true); + equal(executor.transactionStatements[0]?.sql.includes("UPSERT"), false); }); }); type RecordingCockroachSqlExecutor = CockroachSqlExecutor & { - statements: { name: CockroachCommandStateStoreStatement; parameters: readonly unknown[] }[]; + statements: { name: CockroachCommandStateStoreStatement; sql: string; parameters: readonly unknown[] }[]; + transactionStatements: { name: CockroachCommandStateStoreStatement; sql: string; parameters: readonly unknown[] }[]; }; function createRecordingExecutor(): RecordingCockroachSqlExecutor { - const statements: { name: 
CockroachCommandStateStoreStatement; parameters: readonly unknown[] }[] = []; + const statements: { name: CockroachCommandStateStoreStatement; sql: string; parameters: readonly unknown[] }[] = []; + const transactionStatements: { + name: CockroachCommandStateStoreStatement; + sql: string; + parameters: readonly unknown[]; + }[] = []; return { statements, + transactionStatements, execute: async (statement) => { statements.push(statement); return { rows: [], }; }, + executeTransaction: async (transaction) => { + transactionStatements.push(...transaction.statements); + statements.push(...transaction.statements); + }, }; } diff --git a/agentic-organization/packages/state-cockroach/test/cockroach-event-ingestion-store.test.ts b/agentic-organization/packages/state-cockroach/test/cockroach-event-ingestion-store.test.ts index 150ec7de13..403ee3e269 100644 --- a/agentic-organization/packages/state-cockroach/test/cockroach-event-ingestion-store.test.ts +++ b/agentic-organization/packages/state-cockroach/test/cockroach-event-ingestion-store.test.ts @@ -28,7 +28,7 @@ describe("cockroach event ingestion store", () => { }), undefined, ); - await store.recordEventProcessingOutcome({ + const result = await store.recordEventProcessingOutcome({ receipt: { eventId: "evt-supervisor-signal-001", consumerName: InboundEventConsumerName.V0AutomationPlanner, @@ -40,34 +40,133 @@ describe("cockroach event ingestion store", () => { result: EventIngestionOutcomeStatus.Processed, }); + equal(result.status, EventIngestionOutcomeStatus.Processed); + equal(result.reactionPlans.length, 1); deepEqual( executor.statements.map((statement) => statement.name), [ CockroachEventIngestionStoreStatement.FindInboxReceipt, - CockroachEventIngestionStoreStatement.InsertInboxReceipt, + CockroachEventIngestionStoreStatement.ClaimPendingInboxReceipt, CockroachEventIngestionStoreStatement.InsertReactionPlan, CockroachEventIngestionStoreStatement.MarkInboxReceiptProcessed, ], ); + deepEqual( + 
executor.transactionStatements.map((statement) => statement.name), + [ + CockroachEventIngestionStoreStatement.ClaimPendingInboxReceipt, + CockroachEventIngestionStoreStatement.InsertReactionPlan, + CockroachEventIngestionStoreStatement.MarkInboxReceiptProcessed, + ], + ); + equal(executor.transactionStatements[0]?.sql.includes("ON CONFLICT"), true); + equal(executor.transactionStatements[0]?.sql.includes("processed_at IS NULL"), true); + }); + + test("does not insert reaction plans when the inbox receipt claim loses the race", async () => { + const executor = createRecordingExecutor({ + claimStatus: EventIngestionOutcomeStatus.Duplicate, + }); + const store = createCockroachEventIngestionStore({ + executor, + }); + + const result = await store.recordEventProcessingOutcome({ + receipt: { + eventId: "evt-supervisor-signal-001", + consumerName: InboundEventConsumerName.V0AutomationPlanner, + firstSeenAt: "2026-05-25T22:00:00.000Z", + payloadHash: "hash-evt-supervisor-signal-001", + }, + reactionPlans: [createReactionPlanRecord()], + processedAt: "2026-05-25T22:00:00.000Z", + result: EventIngestionOutcomeStatus.Processed, + }); + + equal(result.status, EventIngestionOutcomeStatus.Duplicate); + deepEqual(result.reactionPlans, []); + deepEqual( + executor.transactionStatements.map((statement) => statement.name), + [CockroachEventIngestionStoreStatement.ClaimPendingInboxReceipt], + ); + }); + + test("normalizes SQL null receipt completion fields to pending receipt fields", async () => { + const executor = createRecordingExecutor({ + rows: [ + { + event_id: "evt-supervisor-signal-001", + consumer_name: InboundEventConsumerName.V0AutomationPlanner, + first_seen_at: "2026-05-25T21:59:00.000Z", + processed_at: null, + payload_hash: "hash-evt-supervisor-signal-001", + result: null, + }, + ], + }); + const store = createCockroachEventIngestionStore({ + executor, + }); + + const receipt = await store.findInboxReceipt({ + eventId: "evt-supervisor-signal-001", + consumerName: 
InboundEventConsumerName.V0AutomationPlanner, + }); + + deepEqual(receipt, { + eventId: "evt-supervisor-signal-001", + consumerName: InboundEventConsumerName.V0AutomationPlanner, + firstSeenAt: "2026-05-25T21:59:00.000Z", + payloadHash: "hash-evt-supervisor-signal-001", + }); }); }); type RecordingCockroachEventIngestionSqlExecutor = CockroachEventIngestionSqlExecutor & { statements: CockroachEventIngestionSqlStatement[]; + transactionStatements: CockroachEventIngestionSqlStatement[]; }; -function createRecordingExecutor(): RecordingCockroachEventIngestionSqlExecutor { +function createRecordingExecutor( + input: { + rows?: readonly unknown[]; + claimStatus?: EventIngestionOutcomeStatus; + } = {}, +): RecordingCockroachEventIngestionSqlExecutor { const statements: CockroachEventIngestionSqlStatement[] = []; + const transactionStatements: CockroachEventIngestionSqlStatement[] = []; return { statements, - execute: async (statement) => { + transactionStatements, + execute: async >(statement: CockroachEventIngestionSqlStatement) => { statements.push(statement); return { - rows: [], + rows: (input.rows ?? []) as readonly Row[], }; }, + executeTransaction: async (operation) => + await operation({ + execute: async >(statement: CockroachEventIngestionSqlStatement) => { + transactionStatements.push(statement); + statements.push(statement); + + if (statement.name === CockroachEventIngestionStoreStatement.ClaimPendingInboxReceipt) { + return { + rows: [ + { + claim_status: input.claimStatus ?? 
EventIngestionOutcomeStatus.Processed, + }, + ] as readonly unknown[] as readonly Row[], + }; + } + + return { + rows: [], + }; + }, + }), }; } diff --git a/agentic-organization/packages/state/src/event-ingestion-store.ts b/agentic-organization/packages/state/src/event-ingestion-store.ts index 68ec2ae570..cc70669694 100644 --- a/agentic-organization/packages/state/src/event-ingestion-store.ts +++ b/agentic-organization/packages/state/src/event-ingestion-store.ts @@ -42,9 +42,16 @@ export type RecordEventProcessingOutcomeInput = { result: EventIngestionOutcomeStatus; }; +export type RecordEventProcessingOutcomeResult = { + status: EventIngestionOutcomeStatus; + reactionPlans: readonly ReactionPlanRecord[]; +}; + export type EventIngestionStore = { findInboxReceipt: (lookup: InboxReceiptLookup) => Promise; - recordEventProcessingOutcome: (input: RecordEventProcessingOutcomeInput) => Promise; + recordEventProcessingOutcome: ( + input: RecordEventProcessingOutcomeInput, + ) => Promise; }; export type InMemoryEventIngestionStoreSnapshot = { @@ -75,6 +82,11 @@ export function createInMemoryEventIngestionStore(): InMemoryEventIngestionStore result: input.result, }); reactionPlans.push(...input.reactionPlans); + + return { + status: input.result, + reactionPlans: input.reactionPlans, + }; }, }; } diff --git a/agentic-organization/packages/state/src/in-memory-organization-store.ts b/agentic-organization/packages/state/src/in-memory-organization-store.ts index 42ee7adaa5..0f349d3d9c 100644 --- a/agentic-organization/packages/state/src/in-memory-organization-store.ts +++ b/agentic-organization/packages/state/src/in-memory-organization-store.ts @@ -60,17 +60,11 @@ function createCommandStateStore( ): CommandStateStore { return { findIdempotencyRecord: async (idempotencyKey) => snapshot.idempotencyRecords.get(idempotencyKey), - saveIdempotencyRecord: async (record) => { - snapshot.idempotencyRecords.set(record.idempotencyKey, record); - }, - appendSupervisorSignal: async 
(supervisorSignal) => { - snapshot.supervisorSignals.push(supervisorSignal); - }, - appendAuditEvent: async (auditEvent) => { - snapshot.auditEvents.push(auditEvent); - }, - appendOutboxEvent: async (outboxEvent) => { - snapshot.outboxEvents.push(outboxEvent); + recordCommandOutcome: async (input) => { + snapshot.idempotencyRecords.set(input.idempotencyRecord.idempotencyKey, input.idempotencyRecord); + snapshot.supervisorSignals.push(...input.effects.supervisorSignals); + snapshot.auditEvents.push(...input.effects.auditEvents); + snapshot.outboxEvents.push(...input.effects.outboxEvents); }, }; } diff --git a/openspec/specs/agentic-organization/spec.md b/openspec/specs/agentic-organization/spec.md index 4fc290a87a..99e021a97e 100644 --- a/openspec/specs/agentic-organization/spec.md +++ b/openspec/specs/agentic-organization/spec.md @@ -37,6 +37,14 @@ Organization state only by calling Organization commands. and idempotency records together - **AND** the adapter does not mutate authoritative state directly +#### Scenario: Command handler returns effects instead of writing state + +- **WHEN** a command handler accepts a valid command +- **THEN** it returns the command result plus typed command effects +- **AND** the handler does not call state append operations directly +- **AND** the command pipeline records the result and effects through a + single command outcome port + #### Scenario: Command pipeline is composed from ports - **WHEN** a runtime host creates a command pipeline @@ -93,6 +101,17 @@ command boundary. 
- **AND** CockroachDB is treated as the first replaceable durable adapter for the cluster, not as the application model +#### Scenario: Vendor-specific adapters stay behind generic ports + +- **WHEN** application, runtime, worker, or messaging package source is + inspected +- **THEN** it does not import vendor-specific adapter packages or vendor + clients directly +- **AND** vendor packages implement generic Organization ports exposed by + non-vendor packages +- **AND** vendor-specific executor or transaction seams are not used as + application contracts + #### Scenario: Matching replay - **WHEN** a command is submitted twice with the same idempotency key @@ -108,6 +127,16 @@ command boundary. - **THEN** the command is rejected with a typed idempotency conflict - **AND** no new authoritative state is created +#### Scenario: Command outcome persistence fails + +- **WHEN** a new command produces supervisor-signal, audit, and outbox + effects +- **AND** the command outcome store cannot persist the full outcome +- **THEN** no piecemeal command writes are performed by the application + layer +- **AND** durable adapters are responsible for committing or rolling + back the full command outcome atomically + ### Requirement: Events carry traceable envelopes Organization domain events MUST carry a canonical envelope with command, @@ -237,11 +266,21 @@ executing privileged work directly. 
#### Scenario: Duplicate event is ingested by an automation consumer -- **WHEN** the same event ID reaches the same consumer again +- **WHEN** the same event ID reaches the same consumer again after the + original receipt has a completed result - **THEN** the processor returns a duplicate outcome - **AND** no automation rules are re-evaluated - **AND** no duplicate reaction plans are created +#### Scenario: Unprocessed receipt is retried by an automation consumer + +- **WHEN** the same event ID and payload hash reaches the same consumer + but the existing receipt has no completed result +- **THEN** the processor treats the receipt as recoverable pending work +- **AND** automation rules are re-evaluated +- **AND** the receipt and generated reaction plans are recorded through + the normal event-processing outcome port + #### Scenario: Conflicting event payload is ingested by an automation consumer - **WHEN** the same event ID reaches the same consumer with a different @@ -259,6 +298,34 @@ executing privileged work directly. 
plans - **AND** reaction plans include a persisted status +#### Scenario: Durable event-ingestion adapter uses one transaction boundary + +- **WHEN** a durable event-ingestion adapter records an event-processing + outcome +- **THEN** the inbox receipt, generated reaction plans, and processed + marker are submitted as one transaction batch +- **AND** runtime rule processors do not receive database transaction + objects + +#### Scenario: Durable event-ingestion adapter loses receipt claim race + +- **WHEN** a durable event-ingestion adapter attempts to record reaction + plans after another consumer has already completed the same receipt +- **THEN** the adapter returns a duplicate event-processing outcome + through the generic event-ingestion port +- **AND** it does not insert reaction plans +- **AND** it does not mark the completed receipt again + +#### Scenario: Durable command adapter uses one transaction boundary + +- **WHEN** a durable command adapter records a command outcome +- **THEN** the idempotency record, command state, audit events, and + outbox events are submitted as one transaction batch +- **AND** the idempotency record is reserved before effect rows are + submitted inside that batch +- **AND** application handlers do not receive database transaction + objects + ### Requirement: Worker process boundary composes event loops through ports Organization worker code MUST remain a small composition boundary until From 2ae3c719a935375ee9f995660c4610f0cbb90ad5 Mon Sep 17 00:00:00 2001 From: Max Chadaev Date: Mon, 25 May 2026 22:31:24 -0400 Subject: [PATCH 20/21] fix(agentic-org): keep idempotency races behind generic ports --- .../docs/FIRST_IMPLEMENTATION_SLICE.md | 8 + .../docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md | 4 + .../application/src/command-pipeline.ts | 48 +++- .../packages/application/src/index.ts | 2 + .../packages/application/src/ports.ts | 25 +- .../application/test/command-pipeline.test.ts | 75 ++++- .../src/cockroach-command-state-store.ts | 
119 ++++++-- .../cockroach-command-state-store.test.ts | 258 +++++++++++------- .../state/src/in-memory-organization-store.ts | 27 +- openspec/specs/agentic-organization/spec.md | 16 +- 10 files changed, 443 insertions(+), 139 deletions(-) diff --git a/agentic-organization/docs/FIRST_IMPLEMENTATION_SLICE.md b/agentic-organization/docs/FIRST_IMPLEMENTATION_SLICE.md index 963cb267e6..dadf99e42d 100644 --- a/agentic-organization/docs/FIRST_IMPLEMENTATION_SLICE.md +++ b/agentic-organization/docs/FIRST_IMPLEMENTATION_SLICE.md @@ -130,6 +130,10 @@ Hermes runs, MCP calls, and UI evidence. the supervisor signal, audit events, outbox events, and idempotency record through one `recordCommandOutcome` port. Handlers do not write piecemeal state. +- Command outcome persistence returns generic committed, replayed, or + idempotency-conflict results. Durable adapters own idempotency race + handling and return those generic outcomes without exposing duplicate + key or vendor errors to application code. - State-store and outbox-source ports are async from the beginning so durable SQL, NATS-backed workers, and other real adapters do not inherit a fake synchronous shape. @@ -173,6 +177,10 @@ Hermes runs, MCP calls, and UI evidence. - The Cockroach command adapter records the idempotency row before effect rows inside the command transaction batch, so a duplicate key aborts before supervisor signal, audit, or outbox rows are submitted. +- The Cockroach command adapter claims the idempotency key before + inserting effects. If it loses the race, it returns replay or + idempotency conflict through the generic `CommandStateStore` result + and does not insert command effects. - The Cockroach event-ingestion adapter normalizes SQL `NULL` completion fields to pending receipts and claims the pending receipt before inserting reaction plans. 
If the claim reports duplicate or diff --git a/agentic-organization/docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md b/agentic-organization/docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md index 390497e236..b5cb0ac370 100644 --- a/agentic-organization/docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md +++ b/agentic-organization/docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md @@ -269,6 +269,10 @@ durable adapters one atomic commit boundary for a command result. Durable command adapters should reserve the idempotency record before effect rows inside that transaction so an idempotency race aborts before supervisor signal, audit, or outbox state becomes visible. +The command outcome port returns generic committed, replayed, or +idempotency-conflict results. A vendor adapter may use SQL constraints, +transaction callbacks, CTEs, or other local mechanics to detect races, +but application code only receives the generic outcome. The first worker boundary follows the same rule. `@agentic-org/workers` does not create NATS clients, Cockroach clients, Nest modules, Temporal diff --git a/agentic-organization/packages/application/src/command-pipeline.ts b/agentic-organization/packages/application/src/command-pipeline.ts index 5772f3a58a..9657abc565 100644 --- a/agentic-organization/packages/application/src/command-pipeline.ts +++ b/agentic-organization/packages/application/src/command-pipeline.ts @@ -1,7 +1,14 @@ import type { CommandHandlerRegistry } from "./command-handler-registry.ts"; import { CommandErrorCode, CommandResultStatus, type CommandResult } from "./command-result.ts"; import type { SendSupervisorSignalCommand } from "./handlers/send-supervisor-signal.ts"; -import type { Clock, CommandEffects, CommandStateStore, CommandStateStoreFactory, IdGenerator } from "./ports.ts"; +import { + CommandOutcomePersistenceStatus, + type Clock, + type CommandEffects, + type CommandStateStore, + type CommandStateStoreFactory, + type IdGenerator, +} from "./ports.ts"; export type PipelineCommand = 
SendSupervisorSignalCommand; @@ -40,21 +47,12 @@ async function executeCommand( } if (existingRecord) { - return { - status: CommandResultStatus.Rejected, - idempotency: { - replayed: false, - }, - error: { - code: CommandErrorCode.IdempotencyConflict, - message: "idempotency key was reused with a different request hash", - }, - }; + return createIdempotencyConflictResult(); } const outcome = await dispatchCommand(command, dependencies); - await store.recordCommandOutcome({ + const persistenceResult = await store.recordCommandOutcome({ idempotencyRecord: { idempotencyKey: command.idempotencyKey, requestHash: command.requestHash, @@ -63,6 +61,19 @@ async function executeCommand( effects: outcome.result.status === CommandResultStatus.Accepted ? outcome.effects : createEmptyCommandEffects(), }); + if (persistenceResult.status === CommandOutcomePersistenceStatus.Replayed) { + return { + ...persistenceResult.result, + idempotency: { + replayed: true, + }, + }; + } + + if (persistenceResult.status === CommandOutcomePersistenceStatus.IdempotencyConflict) { + return createIdempotencyConflictResult(); + } + return outcome.result; } @@ -91,6 +102,19 @@ async function dispatchCommand( }; } +function createIdempotencyConflictResult(): CommandResult { + return { + status: CommandResultStatus.Rejected, + idempotency: { + replayed: false, + }, + error: { + code: CommandErrorCode.IdempotencyConflict, + message: "idempotency key was reused with a different request hash", + }, + }; +} + function createEmptyCommandEffects(): CommandEffects { return { supervisorSignals: [], diff --git a/agentic-organization/packages/application/src/index.ts b/agentic-organization/packages/application/src/index.ts index 40cfc8860e..efb497c57b 100644 --- a/agentic-organization/packages/application/src/index.ts +++ b/agentic-organization/packages/application/src/index.ts @@ -20,6 +20,7 @@ export { type SendSupervisorSignalCommand, type SendSupervisorSignalDependencies, } from 
"./handlers/send-supervisor-signal.ts"; +export { CommandOutcomePersistenceStatus } from "./ports.ts"; export type { Clock, CommandEffects, @@ -27,4 +28,5 @@ export type { CommandStateStoreFactory, IdGenerator, RecordCommandOutcomeInput, + RecordCommandOutcomeResult, } from "./ports.ts"; diff --git a/agentic-organization/packages/application/src/ports.ts b/agentic-organization/packages/application/src/ports.ts index 9100075169..61c052c7a9 100644 --- a/agentic-organization/packages/application/src/ports.ts +++ b/agentic-organization/packages/application/src/ports.ts @@ -19,9 +19,32 @@ export type RecordCommandOutcomeInput = { effects: CommandEffects; }; +export const CommandOutcomePersistenceStatus = { + Committed: "committed", + IdempotencyConflict: "idempotency_conflict", + Replayed: "replayed", +} as const; + +export type CommandOutcomePersistenceStatus = + (typeof CommandOutcomePersistenceStatus)[keyof typeof CommandOutcomePersistenceStatus]; + +export type RecordCommandOutcomeResult = + | { + status: typeof CommandOutcomePersistenceStatus.Committed; + result: Result; + } + | { + status: typeof CommandOutcomePersistenceStatus.Replayed; + result: Result; + } + | { + status: typeof CommandOutcomePersistenceStatus.IdempotencyConflict; + existingRequestHash?: string; + }; + export type CommandStateStore = { findIdempotencyRecord: (idempotencyKey: string) => Promise | undefined>; - recordCommandOutcome: (input: RecordCommandOutcomeInput) => Promise; + recordCommandOutcome: (input: RecordCommandOutcomeInput) => Promise>; }; export type CommandStateStoreFactory = { diff --git a/agentic-organization/packages/application/test/command-pipeline.test.ts b/agentic-organization/packages/application/test/command-pipeline.test.ts index 8a21ea381f..5480f52cd6 100644 --- a/agentic-organization/packages/application/test/command-pipeline.test.ts +++ b/agentic-organization/packages/application/test/command-pipeline.test.ts @@ -7,7 +7,13 @@ import { createCommandHandlerRegistry } 
from "../src/command-handler-registry.ts import { CommandErrorCode, CommandResultStatus, type CommandResult } from "../src/command-result.ts"; import { createCommandPipeline, type PipelineCommand } from "../src/command-pipeline.ts"; import { createSendSupervisorSignalHandler } from "../src/handlers/send-supervisor-signal.ts"; -import type { CommandStateStore, CommandStateStoreFactory, RecordCommandOutcomeInput } from "../src/ports.ts"; +import { + CommandOutcomePersistenceStatus, + type CommandStateStore, + type CommandStateStoreFactory, + type RecordCommandOutcomeInput, + type RecordCommandOutcomeResult, +} from "../src/ports.ts"; const command: PipelineCommand = { commandId: "cmd-supervisor-signal-001", @@ -102,6 +108,51 @@ describe("command pipeline idempotency", () => { equal(stateStoreFactory.recordedOutcomes[0]?.effects.outboxEvents.length, 1); }); + test("returns replay when outcome persistence loses a same-request idempotency race", async () => { + const stateStoreFactory = createOutcomeResultCommandStateStoreFactory({ + status: CommandOutcomePersistenceStatus.Replayed, + result: { + status: CommandResultStatus.Accepted, + idempotency: { + replayed: false, + }, + }, + }); + const pipeline = createCommandPipeline({ + stateStoreFactory, + handlerRegistry: createCommandHandlerRegistry([createSendSupervisorSignalHandler()]), + now: () => "2026-05-25T20:00:00.000Z", + createId: (prefix) => `${prefix}-001`, + }); + + const result = await pipeline.execute(command); + + equal(result.status, CommandResultStatus.Accepted); + deepEqual(result.idempotency, { + replayed: true, + }); + equal(stateStoreFactory.recordedOutcomes.length, 1); + }); + + test("returns idempotency conflict when outcome persistence loses a different-request race", async () => { + const stateStoreFactory = createOutcomeResultCommandStateStoreFactory({ + status: CommandOutcomePersistenceStatus.IdempotencyConflict, + existingRequestHash: "hash-other-request", + }); + const pipeline = 
createCommandPipeline({ + stateStoreFactory, + handlerRegistry: createCommandHandlerRegistry([createSendSupervisorSignalHandler()]), + now: () => "2026-05-25T20:00:00.000Z", + createId: (prefix) => `${prefix}-001`, + }); + + const result = await pipeline.execute(command); + + equal(result.status, CommandResultStatus.Rejected); + equal(result.error?.code, CommandErrorCode.IdempotencyConflict); + equal(stateStoreFactory.recordedOutcomes.length, 1); + }); + test("does not perform piecemeal command writes when outcome recording fails", async () => { const stateStoreFactory = createFailingOutcomeCommandStateStoreFactory("transaction unavailable"); const pipeline = createCommandPipeline({ @@ -137,6 +188,28 @@ function createRecordingCommandStateStoreFactory(): RecordingCommandStat findIdempotencyRecord: async () => undefined, recordCommandOutcome: async (input) => { recordedOutcomes.push(input); + + return { + status: CommandOutcomePersistenceStatus.Committed, + result: input.idempotencyRecord.result, + }; + }, + }), + }; +} + +function createOutcomeResultCommandStateStoreFactory( + outcomeResult: RecordCommandOutcomeResult, +): RecordingCommandStateStoreFactory { + const recordedOutcomes: RecordCommandOutcomeInput[] = []; + + return { + recordedOutcomes, + createCommandStateStore: () => ({ + findIdempotencyRecord: async () => undefined, + recordCommandOutcome: async (input) => { + recordedOutcomes.push(input); + return outcomeResult; }, }), }; diff --git a/agentic-organization/packages/state-cockroach/src/cockroach-command-state-store.ts b/agentic-organization/packages/state-cockroach/src/cockroach-command-state-store.ts index 8a3dd98c52..abfe60dc44 100644 --- a/agentic-organization/packages/state-cockroach/src/cockroach-command-state-store.ts +++ b/agentic-organization/packages/state-cockroach/src/cockroach-command-state-store.ts @@ -1,9 +1,13 @@ -import type { CommandStateStore, CommandStateStoreFactory } from "../../application/src/ports.ts"; +import { + 
CommandOutcomePersistenceStatus, + type CommandStateStore, + type CommandStateStoreFactory, +} from "../../application/src/ports.ts"; import { CockroachTableName } from "./cockroach-schema.ts"; export const CockroachCommandStateStoreStatement = { FindIdempotencyRecord: "find_idempotency_record", - InsertIdempotencyRecord: "insert_idempotency_record", + ClaimIdempotencyRecord: "claim_idempotency_record", InsertSupervisorSignal: "insert_supervisor_signal", InsertAuditEvent: "insert_audit_event", InsertOutboxEvent: "insert_outbox_event", @@ -22,13 +26,15 @@ export type CockroachSqlResult> = { rows: readonly Row[]; }; -export type CockroachSqlExecutor = { +export type CockroachSqlTransactionExecutor = { execute: >(statement: CockroachSqlStatement) => Promise>; - executeTransaction: (transaction: CockroachSqlTransaction) => Promise; }; -export type CockroachSqlTransaction = { - statements: readonly CockroachSqlStatement[]; +export type CockroachSqlExecutor = { + execute: >(statement: CockroachSqlStatement) => Promise>; + executeTransaction: ( + operation: (executor: CockroachSqlTransactionExecutor) => Promise, + ) => Promise; }; export type CreateCockroachCommandStateStoreFactoryInput = { @@ -64,13 +70,55 @@ function createCockroachCommandStateStore(executor: CockroachSqlExecutor }; }, recordCommandOutcome: async (outcome) => { - await executor.executeTransaction({ - statements: [ - createInsertIdempotencyRecordStatement(outcome.idempotencyRecord), - ...outcome.effects.supervisorSignals.map(createInsertSupervisorSignalStatement), - ...outcome.effects.auditEvents.map(createInsertAuditEventStatement), - ...outcome.effects.outboxEvents.map(createInsertOutboxEventStatement), - ], + return await executor.executeTransaction(async (transaction) => { + const claimResult = await transaction.execute>({ + name: CockroachCommandStateStoreStatement.ClaimIdempotencyRecord, + sql: CockroachCommandStateStoreSql.ClaimIdempotencyRecord, + parameters: [ + 
outcome.idempotencyRecord.idempotencyKey, + outcome.idempotencyRecord.requestHash, + outcome.idempotencyRecord.result, + ], + }); + const claim = claimResult.rows[0]; + const claimStatus = claim?.persistence_status ?? CommandOutcomePersistenceStatus.IdempotencyConflict; + + if (claimStatus === CommandOutcomePersistenceStatus.Replayed) { + return { + status: CommandOutcomePersistenceStatus.Replayed, + result: claim?.result_json as Result, + }; + } + + if (claimStatus === CommandOutcomePersistenceStatus.IdempotencyConflict) { + if (claim?.request_hash === undefined || claim.request_hash === null) { + return { + status: CommandOutcomePersistenceStatus.IdempotencyConflict, + }; + } + + return { + status: CommandOutcomePersistenceStatus.IdempotencyConflict, + existingRequestHash: claim.request_hash, + }; + } + + for (const supervisorSignal of outcome.effects.supervisorSignals) { + await transaction.execute(createInsertSupervisorSignalStatement(supervisorSignal)); + } + + for (const auditEvent of outcome.effects.auditEvents) { + await transaction.execute(createInsertAuditEventStatement(auditEvent)); + } + + for (const outboxEvent of outcome.effects.outboxEvents) { + await transaction.execute(createInsertOutboxEventStatement(outboxEvent)); + } + + return { + status: CommandOutcomePersistenceStatus.Committed, + result: outcome.idempotencyRecord.result, + }; }); }, }; @@ -78,16 +126,6 @@ function createCockroachCommandStateStore(executor: CockroachSqlExecutor type CommandStateStoreResult = Parameters["recordCommandOutcome"]>[0]; -function createInsertIdempotencyRecordStatement( - record: CommandStateStoreResult["idempotencyRecord"], -): CockroachSqlStatement { - return { - name: CockroachCommandStateStoreStatement.InsertIdempotencyRecord, - sql: CockroachCommandStateStoreSql.InsertIdempotencyRecord, - parameters: [record.idempotencyKey, record.requestHash, record.result], - }; -} - function createInsertSupervisorSignalStatement( supervisorSignal: 
CommandStateStoreResult["effects"]["supervisorSignals"][number], ): CockroachSqlStatement { @@ -157,18 +195,41 @@ type IdempotencyRecordRow = { result_json: unknown; }; +type CockroachIdempotencyClaimRow = { + persistence_status: CommandOutcomePersistenceStatus; + request_hash?: string | null; + result_json?: Result | null; +}; + const CockroachCommandStateStoreSql = { FindIdempotencyRecord: ` SELECT idempotency_key, request_hash, result_json FROM ${CockroachTableName.IdempotencyRecords} WHERE idempotency_key = $1 `, - InsertIdempotencyRecord: ` - INSERT INTO ${CockroachTableName.IdempotencyRecords} ( - idempotency_key, - request_hash, - result_json - ) VALUES ($1, $2, $3) + ClaimIdempotencyRecord: ` + WITH claimed_record AS ( + INSERT INTO ${CockroachTableName.IdempotencyRecords} ( + idempotency_key, + request_hash, + result_json + ) VALUES ($1, $2, $3) + ON CONFLICT (idempotency_key) DO NOTHING + RETURNING request_hash, result_json + ), + existing_record AS ( + SELECT request_hash, result_json + FROM ${CockroachTableName.IdempotencyRecords} + WHERE idempotency_key = $1 + ) + SELECT + CASE + WHEN EXISTS (SELECT 1 FROM claimed_record) THEN '${CommandOutcomePersistenceStatus.Committed}' + WHEN EXISTS (SELECT 1 FROM existing_record WHERE request_hash = $2) THEN '${CommandOutcomePersistenceStatus.Replayed}' + ELSE '${CommandOutcomePersistenceStatus.IdempotencyConflict}' + END AS persistence_status, + (SELECT request_hash FROM existing_record) AS request_hash, + (SELECT result_json FROM existing_record) AS result_json `, InsertSupervisorSignal: ` INSERT INTO ${CockroachTableName.SupervisorSignals} ( diff --git a/agentic-organization/packages/state-cockroach/test/cockroach-command-state-store.test.ts b/agentic-organization/packages/state-cockroach/test/cockroach-command-state-store.test.ts index 1d9a5f7a70..10ba197230 100644 --- a/agentic-organization/packages/state-cockroach/test/cockroach-command-state-store.test.ts +++ 
b/agentic-organization/packages/state-cockroach/test/cockroach-command-state-store.test.ts @@ -1,7 +1,12 @@ import { deepEqual, equal } from "node:assert/strict"; import { describe, test } from "node:test"; -import { CommandResultStatus, type CommandResult } from "../../application/src/index.ts"; +import { + CommandOutcomePersistenceStatus, + CommandResultStatus, + type CommandResult, + type RecordCommandOutcomeInput, +} from "../../application/src/index.ts"; import { AgenticAggregateType, AgenticEventType, @@ -25,98 +30,14 @@ describe("cockroach command state store", () => { equal(await store.findIdempotencyRecord("idem-001"), undefined); - await store.recordCommandOutcome({ - idempotencyRecord: { - idempotencyKey: "idem-001", - requestHash: "hash-001", - result: { - status: CommandResultStatus.Accepted, - idempotency: { - replayed: false, - }, - }, - }, - effects: { - supervisorSignals: [ - { - supervisorSignalId: "supervisor-signal-001", - organizationId: "org-lfg", - projectId: "project-agentic-org", - teamId: "team-runtime", - sourceLevel: SupervisorChainLevel.TeamMember, - targetLevel: SupervisorChainLevel.Manager, - targetHatAssignmentId: "hat-assignment-em-001", - sender: { - agentId: "agent-developer-001", - hatAssignmentId: "hat-assignment-dev-001", - }, - toolType: SupervisorSignalToolType.ReportBlocker, - status: SupervisorSignalStatus.Sent, - title: "Blocked on scoped NATS publisher", - message: "Need a scoped publisher decision.", - relatedWorkItemId: "work-outbox-001", - createdAt: "2026-05-25T20:00:00.000Z", - }, - ], - auditEvents: [ - { - auditEventId: "audit-001", - eventName: AgenticEventType.SupervisorSignalSent, - aggregateId: "supervisor-signal-001", - actor: { - agentId: "agent-developer-001", - hatAssignmentId: "hat-assignment-dev-001", - }, - occurredAt: "2026-05-25T20:00:00.000Z", - }, - ], - outboxEvents: [ - { - outboxEventId: "outbox-001", - envelope: { - eventId: "evt-001", - eventType: AgenticEventType.SupervisorSignalSent, - 
schemaVersion: "agentic.org.event.v1", - occurredAt: "2026-05-25T20:00:00.000Z", - actor: { - agentId: "agent-developer-001", - hatAssignmentId: "hat-assignment-dev-001", - }, - scope: { - organizationId: "org-lfg", - projectId: "project-agentic-org", - teamId: "team-runtime", - workItemId: "work-outbox-001", - }, - aggregate: { - aggregateId: "supervisor-signal-001", - aggregateType: AgenticAggregateType.SupervisorSignal, - aggregateVersion: 1, - }, - trace: { - commandId: "cmd-001", - correlationId: "corr-001", - causationId: "cause-001", - traceId: "trace-001", - idempotencyKey: "idem-001", - }, - replay: { - isReplay: false, - }, - payload: { - title: "Blocked on scoped NATS publisher", - }, - }, - }, - ], - }, - }); + const result = await store.recordCommandOutcome(createCommandOutcome()); + equal(result.status, CommandOutcomePersistenceStatus.Committed); deepEqual( executor.statements.map((statement) => statement.name), [ CockroachCommandStateStoreStatement.FindIdempotencyRecord, - CockroachCommandStateStoreStatement.InsertIdempotencyRecord, + CockroachCommandStateStoreStatement.ClaimIdempotencyRecord, CockroachCommandStateStoreStatement.InsertSupervisorSignal, CockroachCommandStateStoreStatement.InsertAuditEvent, CockroachCommandStateStoreStatement.InsertOutboxEvent, @@ -125,7 +46,7 @@ describe("cockroach command state store", () => { deepEqual( executor.transactionStatements.map((statement) => statement.name), [ - CockroachCommandStateStoreStatement.InsertIdempotencyRecord, + CockroachCommandStateStoreStatement.ClaimIdempotencyRecord, CockroachCommandStateStoreStatement.InsertSupervisorSignal, CockroachCommandStateStoreStatement.InsertAuditEvent, CockroachCommandStateStoreStatement.InsertOutboxEvent, @@ -134,6 +55,38 @@ describe("cockroach command state store", () => { equal(executor.transactionStatements[0]?.sql.includes("INSERT INTO"), true); equal(executor.transactionStatements[0]?.sql.includes("UPSERT"), false); }); + + test("does not insert effects 
when idempotency claim replays or conflicts", async () => { + const replayExecutor = createRecordingExecutor({ + claimStatus: CommandOutcomePersistenceStatus.Replayed, + }); + const replayStore = createCockroachCommandStateStoreFactory({ + executor: replayExecutor, + }).createCommandStateStore(); + + const replayResult = await replayStore.recordCommandOutcome(createCommandOutcome()); + + equal(replayResult.status, CommandOutcomePersistenceStatus.Replayed); + deepEqual( + replayExecutor.transactionStatements.map((statement) => statement.name), + [CockroachCommandStateStoreStatement.ClaimIdempotencyRecord], + ); + + const conflictExecutor = createRecordingExecutor({ + claimStatus: CommandOutcomePersistenceStatus.IdempotencyConflict, + }); + const conflictStore = createCockroachCommandStateStoreFactory({ + executor: conflictExecutor, + }).createCommandStateStore(); + + const conflictResult = await conflictStore.recordCommandOutcome(createCommandOutcome()); + + equal(conflictResult.status, CommandOutcomePersistenceStatus.IdempotencyConflict); + deepEqual( + conflictExecutor.transactionStatements.map((statement) => statement.name), + [CockroachCommandStateStoreStatement.ClaimIdempotencyRecord], + ); + }); }); type RecordingCockroachSqlExecutor = CockroachSqlExecutor & { @@ -141,7 +94,9 @@ type RecordingCockroachSqlExecutor = CockroachSqlExecutor & { transactionStatements: { name: CockroachCommandStateStoreStatement; sql: string; parameters: readonly unknown[] }[]; }; -function createRecordingExecutor(): RecordingCockroachSqlExecutor { +function createRecordingExecutor( + input: { claimStatus?: CommandOutcomePersistenceStatus } = {}, +): RecordingCockroachSqlExecutor { const statements: { name: CockroachCommandStateStoreStatement; sql: string; parameters: readonly unknown[] }[] = []; const transactionStatements: { name: CockroachCommandStateStoreStatement; @@ -158,9 +113,126 @@ function createRecordingExecutor(): RecordingCockroachSqlExecutor { rows: [], }; }, - 
executeTransaction: async (transaction) => { - transactionStatements.push(...transaction.statements); - statements.push(...transaction.statements); + executeTransaction: async (operation) => + await operation({ + execute: async >(statement: { + name: CockroachCommandStateStoreStatement; + sql: string; + parameters: readonly unknown[]; + }) => { + transactionStatements.push(statement); + statements.push(statement); + + if (statement.name === CockroachCommandStateStoreStatement.ClaimIdempotencyRecord) { + return { + rows: [ + { + persistence_status: input.claimStatus ?? CommandOutcomePersistenceStatus.Committed, + request_hash: "hash-001", + result_json: { + status: CommandResultStatus.Accepted, + idempotency: { + replayed: false, + }, + }, + }, + ] as readonly unknown[] as readonly Row[], + }; + } + + return { + rows: [], + }; + }, + }), + }; +} + +function createCommandOutcome(): RecordCommandOutcomeInput { + return { + idempotencyRecord: { + idempotencyKey: "idem-001", + requestHash: "hash-001", + result: { + status: CommandResultStatus.Accepted, + idempotency: { + replayed: false, + }, + }, + }, + effects: { + supervisorSignals: [ + { + supervisorSignalId: "supervisor-signal-001", + organizationId: "org-lfg", + projectId: "project-agentic-org", + teamId: "team-runtime", + sourceLevel: SupervisorChainLevel.TeamMember, + targetLevel: SupervisorChainLevel.Manager, + targetHatAssignmentId: "hat-assignment-em-001", + sender: { + agentId: "agent-developer-001", + hatAssignmentId: "hat-assignment-dev-001", + }, + toolType: SupervisorSignalToolType.ReportBlocker, + status: SupervisorSignalStatus.Sent, + title: "Blocked on scoped NATS publisher", + message: "Need a scoped publisher decision.", + relatedWorkItemId: "work-outbox-001", + createdAt: "2026-05-25T20:00:00.000Z", + }, + ], + auditEvents: [ + { + auditEventId: "audit-001", + eventName: AgenticEventType.SupervisorSignalSent, + aggregateId: "supervisor-signal-001", + actor: { + agentId: "agent-developer-001", + 
hatAssignmentId: "hat-assignment-dev-001", + }, + occurredAt: "2026-05-25T20:00:00.000Z", + }, + ], + outboxEvents: [ + { + outboxEventId: "outbox-001", + envelope: { + eventId: "evt-001", + eventType: AgenticEventType.SupervisorSignalSent, + schemaVersion: "agentic.org.event.v1", + occurredAt: "2026-05-25T20:00:00.000Z", + actor: { + agentId: "agent-developer-001", + hatAssignmentId: "hat-assignment-dev-001", + }, + scope: { + organizationId: "org-lfg", + projectId: "project-agentic-org", + teamId: "team-runtime", + workItemId: "work-outbox-001", + }, + aggregate: { + aggregateId: "supervisor-signal-001", + aggregateType: AgenticAggregateType.SupervisorSignal, + aggregateVersion: 1, + }, + trace: { + commandId: "cmd-001", + correlationId: "corr-001", + causationId: "cause-001", + traceId: "trace-001", + idempotencyKey: "idem-001", + }, + replay: { + isReplay: false, + }, + payload: { + title: "Blocked on scoped NATS publisher", + }, + }, + }, + ], }, }; } diff --git a/agentic-organization/packages/state/src/in-memory-organization-store.ts b/agentic-organization/packages/state/src/in-memory-organization-store.ts index 0f349d3d9c..6e5738dad2 100644 --- a/agentic-organization/packages/state/src/in-memory-organization-store.ts +++ b/agentic-organization/packages/state/src/in-memory-organization-store.ts @@ -1,4 +1,8 @@ -import type { CommandStateStore, CommandStateStoreFactory } from "../../application/src/ports.ts"; +import { + CommandOutcomePersistenceStatus, + type CommandStateStore, + type CommandStateStoreFactory, +} from "../../application/src/ports.ts"; import type { AuditEvent, DiscussionAnchor, @@ -61,10 +65,31 @@ function createCommandStateStore( return { findIdempotencyRecord: async (idempotencyKey) => snapshot.idempotencyRecords.get(idempotencyKey), recordCommandOutcome: async (input) => { + const existingRecord = snapshot.idempotencyRecords.get(input.idempotencyRecord.idempotencyKey); + + if (existingRecord?.requestHash === 
input.idempotencyRecord.requestHash) { + return { + status: CommandOutcomePersistenceStatus.Replayed, + result: existingRecord.result, + }; + } + + if (existingRecord !== undefined) { + return { + status: CommandOutcomePersistenceStatus.IdempotencyConflict, + existingRequestHash: existingRecord.requestHash, + }; + } + snapshot.idempotencyRecords.set(input.idempotencyRecord.idempotencyKey, input.idempotencyRecord); snapshot.supervisorSignals.push(...input.effects.supervisorSignals); snapshot.auditEvents.push(...input.effects.auditEvents); snapshot.outboxEvents.push(...input.effects.outboxEvents); + + return { + status: CommandOutcomePersistenceStatus.Committed, + result: input.idempotencyRecord.result, + }; }, }; } diff --git a/openspec/specs/agentic-organization/spec.md b/openspec/specs/agentic-organization/spec.md index 99e021a97e..2d8eb9d8dd 100644 --- a/openspec/specs/agentic-organization/spec.md +++ b/openspec/specs/agentic-organization/spec.md @@ -320,12 +320,24 @@ executing privileged work directly. 
- **WHEN** a durable command adapter records a command outcome - **THEN** the idempotency record, command state, audit events, and - outbox events are submitted as one transaction batch + outbox events are submitted inside one transaction boundary - **AND** the idempotency record is reserved before effect rows are - submitted inside that batch + submitted inside that boundary - **AND** application handlers do not receive database transaction objects +#### Scenario: Durable command adapter loses idempotency claim race + +- **WHEN** a durable command adapter attempts to record a command + outcome after another transaction has already claimed the same + idempotency key +- **THEN** it returns a generic replay or idempotency-conflict result + through the command outcome port +- **AND** it does not insert duplicate supervisor signal, audit event, or + outbox rows +- **AND** application code does not receive vendor-specific duplicate + key errors or transaction objects + ### Requirement: Worker process boundary composes event loops through ports Organization worker code MUST remain a small composition boundary until From 6a1975d5a2743efb2673d87e3445319c3dd6295b Mon Sep 17 00:00:00 2001 From: Max Chadaev Date: Mon, 25 May 2026 22:53:05 -0400 Subject: [PATCH 21/21] fix(agentic-org): harden inbound failure races --- .../docs/FIRST_IMPLEMENTATION_SLICE.md | 10 +- .../docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md | 16 +++- .../src/nats-jetstream-event-consumer.ts | 46 +++++++--- .../nats-jetstream-event-consumer.test.ts | 79 +++++++++++++++- .../src/cockroach-event-ingestion-store.ts | 80 +++++++++++----- .../cockroach-event-ingestion-store.test.ts | 91 +++++++++++++++---- openspec/specs/agentic-organization/spec.md | 24 ++++- 7 files changed, 287 insertions(+), 59 deletions(-) diff --git a/agentic-organization/docs/FIRST_IMPLEMENTATION_SLICE.md b/agentic-organization/docs/FIRST_IMPLEMENTATION_SLICE.md index dadf99e42d..84826c6391 100644 --- 
a/agentic-organization/docs/FIRST_IMPLEMENTATION_SLICE.md +++ b/agentic-organization/docs/FIRST_IMPLEMENTATION_SLICE.md @@ -156,7 +156,10 @@ Hermes runs, MCP calls, and UI evidence. the traceable event boundary into the runtime ingestion processor, acknowledges processed and duplicate messages, terminates and dead-letters invalid envelopes or payload conflicts, and - negative-acknowledges transient ingestion failures. + negative-acknowledges transient ingestion failures. If dead-letter + publication or source-message termination fails, it records the + failure, negative-acknowledges the source message, and continues the + batch. - The event ingestion processor accepts decoded canonical envelopes, dedupes them by event ID plus consumer name, evaluates automation rules once, rejects same-event payload hash conflicts, and persists @@ -186,6 +189,11 @@ Hermes runs, MCP calls, and UI evidence. before inserting reaction plans. If the claim reports duplicate or payload conflict, the adapter returns that generic outcome without inserting reaction plans. +- The Cockroach event-ingestion adapter also requires the final + processed-receipt update to return the claimed receipt. If that + completion check fails after reaction plans were prepared, the + transaction rolls back and the adapter returns a generic duplicate + outcome. - Governance now checks that runtime code, like application code, cannot import vendor adapters or vendor clients directly. 
Vendor packages must implement generic Organization ports consumed by application/runtime diff --git a/agentic-organization/docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md b/agentic-organization/docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md index b5cb0ac370..4b624496f4 100644 --- a/agentic-organization/docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md +++ b/agentic-organization/docs/TECHNICAL_CA_PACKAGE_ARCHITECTURE.md @@ -557,6 +557,13 @@ already completed the receipt, the adapter returns a duplicate outcome through the generic `EventIngestionStore` result without inserting reaction plans. +The processed marker must also prove the claim was still held by +returning the marked receipt. If the final mark no longer matches a +pending receipt after reaction plans were prepared, the adapter must +abort the transaction so those reaction plans roll back, then return a +generic duplicate outcome. Runtime code must not receive Cockroach +update-count details or transaction objects. + A worker host composes that ingestion processor with the outbox publisher but stays below the NestJS process layer. This creates a testable boundary where replayable inbound sources and live transport @@ -571,9 +578,12 @@ decodes canonical JSON envelopes and calls the runtime ingestion processor, but it owns JetStream-style decisions: ack processed and duplicate messages, terminate plus dead-letter invalid envelopes and payload conflicts, and negative-acknowledge transient ingestion -failures. This keeps runtime rules deterministic and transport-neutral -while still making live NATS behavior testable before a Nest worker -process exists. +failures. If dead-letter publishing or source-message termination +fails, it records the failure, negative-acknowledges the source message +for retry, and continues the fetched batch so one broken DLQ path cannot +starve unrelated messages. 
This keeps runtime rules deterministic and +transport-neutral while still making live NATS behavior testable before +a Nest worker process exists. ### Stream and Consumer Manifests diff --git a/agentic-organization/packages/messaging-nats/src/nats-jetstream-event-consumer.ts b/agentic-organization/packages/messaging-nats/src/nats-jetstream-event-consumer.ts index e9d06b4b19..434886e3a1 100644 --- a/agentic-organization/packages/messaging-nats/src/nats-jetstream-event-consumer.ts +++ b/agentic-organization/packages/messaging-nats/src/nats-jetstream-event-consumer.ts @@ -152,8 +152,10 @@ async function processMessage(input: ProcessMessageInput): Promise { input.result.acknowledgedCount += 1; } catch { input.result.failedCount += 1; - await input.message.negativeAcknowledge(); - input.result.negativeAcknowledgedCount += 1; + await negativeAcknowledgeFailedMessage({ + message: input.message, + result: input.result, + }); } } @@ -165,15 +167,37 @@ type TerminateWithDeadLetterInput = { }; async function terminateWithDeadLetter(input: TerminateWithDeadLetterInput): Promise { - await input.deadLetterPublisher.publish({ - sourceSubject: input.message.subject, - payload: input.message.payload, - headers: input.message.headers, - reason: input.reason, - }); - input.result.deadLetteredCount += 1; - await input.message.terminate(); - input.result.terminatedCount += 1; + try { + await input.deadLetterPublisher.publish({ + sourceSubject: input.message.subject, + payload: input.message.payload, + headers: input.message.headers, + reason: input.reason, + }); + input.result.deadLetteredCount += 1; + await input.message.terminate(); + input.result.terminatedCount += 1; + } catch { + input.result.failedCount += 1; + await negativeAcknowledgeFailedMessage({ + message: input.message, + result: input.result, + }); + } +} + +type NegativeAcknowledgeFailedMessageInput = { + message: NatsJetStreamInboundMessage; + result: NatsJetStreamConsumeBatchResult; +}; + +async function 
negativeAcknowledgeFailedMessage(input: NegativeAcknowledgeFailedMessageInput): Promise<void> { + try { + await input.message.negativeAcknowledge(); + input.result.negativeAcknowledgedCount += 1; + } catch { + input.result.failedCount += 1; + } } function decodeCanonicalEventEnvelope(payload: string): AgenticEventEnvelope | undefined { diff --git a/agentic-organization/packages/messaging-nats/test/nats-jetstream-event-consumer.test.ts b/agentic-organization/packages/messaging-nats/test/nats-jetstream-event-consumer.test.ts index 5c9a318fd1..219c96f1a7 100644 --- a/agentic-organization/packages/messaging-nats/test/nats-jetstream-event-consumer.test.ts +++ b/agentic-organization/packages/messaging-nats/test/nats-jetstream-event-consumer.test.ts @@ -125,6 +125,66 @@ describe("NATS JetStream event consumer", () => { ]); }); + test("negative-acknowledges invalid payloads when dead-letter publishing fails", async () => { + const invalidMessage = createRecordingInboundMessage({ + payload: "{not-json", + }); + const validMessage = createRecordingInboundMessage({ + payload: JSON.stringify(createEnvelope()), + }); + const eventIngestionProcessor = createRecordingEventIngestionProcessor(EventIngestionOutcomeStatus.Processed); + const consumer = createNatsJetStreamEventConsumer({ + pullConsumer: createRecordingPullConsumer([invalidMessage, validMessage]), + eventIngestionProcessor, + deadLetterPublisher: createFailingDeadLetterPublisher("dlq unavailable"), + }); + + const result = await consumer.processNextBatch({ + batchSize: 10, + }); + + deepEqual(invalidMessage.ackActions, [NatsInboundMessageAckAction.NegativeAcknowledge]); + deepEqual(validMessage.ackActions, [NatsInboundMessageAckAction.Acknowledge]); + deepEqual(eventIngestionProcessor.eventIds, ["evt-nats-001"]); + equal(result.receivedCount, 2); + equal(result.invalidCount, 1); + equal(result.processedCount, 1); + equal(result.failedCount, 1); + equal(result.deadLetteredCount, 0); + equal(result.terminatedCount, 0); +
equal(result.negativeAcknowledgedCount, 1); + equal(result.acknowledgedCount, 1); + }); + + test("negative-acknowledges payload conflicts when message termination fails", async () => { + const envelope = createEnvelope(); + const message = createRecordingInboundMessage({ + payload: JSON.stringify(envelope), + failTerminate: true, + }); + const deadLetterPublisher = createRecordingDeadLetterPublisher(); + const consumer = createNatsJetStreamEventConsumer({ + pullConsumer: createRecordingPullConsumer([message]), + eventIngestionProcessor: createRecordingEventIngestionProcessor(EventIngestionOutcomeStatus.PayloadConflict), + deadLetterPublisher, + }); + + const result = await consumer.processNextBatch({ + batchSize: 10, + }); + + deepEqual(message.ackActions, [ + NatsInboundMessageAckAction.Terminate, + NatsInboundMessageAckAction.NegativeAcknowledge, + ]); + equal(result.payloadConflictCount, 1); + equal(result.failedCount, 1); + equal(result.deadLetteredCount, 1); + equal(result.terminatedCount, 0); + equal(result.negativeAcknowledgedCount, 1); + equal(deadLetterPublisher.messages.length, 1); + }); + test("negative-acknowledges transient ingestion failures", async () => { const message = createRecordingInboundMessage({ payload: JSON.stringify(createEnvelope()), @@ -177,7 +237,10 @@ function createEnvelope(): AgenticEventEnvelope { }); } -function createRecordingInboundMessage(input: { payload: string }): NatsJetStreamInboundMessage & { +function createRecordingInboundMessage(input: { + payload: string; + failTerminate?: boolean; +}): NatsJetStreamInboundMessage & { ackActions: NatsInboundMessageAckAction[]; } { const ackActions: NatsInboundMessageAckAction[] = []; @@ -197,6 +260,10 @@ function createRecordingInboundMessage(input: { payload: string }): NatsJetStrea }, terminate: async () => { ackActions.push(NatsInboundMessageAckAction.Terminate); + + if (input.failTerminate === true) { + throw new Error("terminate unavailable"); + } }, }; } @@ -288,3 +355,13 @@ 
function createRecordingDeadLetterPublisher(): { }, }; } + +function createFailingDeadLetterPublisher(message: string): { + publish: () => Promise<void>; +} { + return { + publish: async () => { + throw new Error(message); + }, + }; +} diff --git a/agentic-organization/packages/state-cockroach/src/cockroach-event-ingestion-store.ts b/agentic-organization/packages/state-cockroach/src/cockroach-event-ingestion-store.ts index 5c0f0c43a8..577955cc37 100644 --- a/agentic-organization/packages/state-cockroach/src/cockroach-event-ingestion-store.ts +++ b/agentic-organization/packages/state-cockroach/src/cockroach-event-ingestion-store.ts @@ -73,36 +73,57 @@ export function createCockroachEventIngestionStore( recordEventProcessingOutcome: async (outcome) => { const receipt = outcome.receipt; - return await input.executor.executeTransaction(async (transaction) => { - const claimResult = await transaction.execute({ - name: CockroachEventIngestionStoreStatement.ClaimPendingInboxReceipt, - sql: CockroachEventIngestionStoreSql.ClaimPendingInboxReceipt, - parameters: [receipt.eventId, receipt.consumerName, receipt.firstSeenAt, receipt.payloadHash], - }); - const claimStatus = claimResult.rows[0]?.claim_status ?? EventIngestionOutcomeStatus.PayloadConflict; + try { + return await input.executor.executeTransaction(async (transaction) => { + const claimResult = await transaction.execute({ + name: CockroachEventIngestionStoreStatement.ClaimPendingInboxReceipt, + sql: CockroachEventIngestionStoreSql.ClaimPendingInboxReceipt, + parameters: [receipt.eventId, receipt.consumerName, receipt.firstSeenAt, receipt.payloadHash], + }); + const claimStatus = claimResult.rows[0]?.claim_status ??
EventIngestionOutcomeStatus.PayloadConflict; + + if (claimStatus !== EventIngestionOutcomeStatus.Processed) { + return { + status: claimStatus, + reactionPlans: [], + }; + } + + for (const reactionPlan of outcome.reactionPlans) { + await transaction.execute(createInsertReactionPlanStatement(reactionPlan)); + } + + const markResult = await transaction.execute({ + name: CockroachEventIngestionStoreStatement.MarkInboxReceiptProcessed, + sql: CockroachEventIngestionStoreSql.MarkInboxReceiptProcessed, + parameters: [ + receipt.eventId, + receipt.consumerName, + outcome.processedAt, + outcome.result, + receipt.payloadHash, + ], + }); + + if (markResult.rows.length !== 1) { + throw new CockroachInboxReceiptCompletionLostError(); + } - if (claimStatus !== EventIngestionOutcomeStatus.Processed) { return { - status: claimStatus, + status: outcome.result, + reactionPlans: outcome.reactionPlans, + }; + }); + } catch (error) { + if (error instanceof CockroachInboxReceiptCompletionLostError) { + return { + status: EventIngestionOutcomeStatus.Duplicate, reactionPlans: [], }; } - for (const reactionPlan of outcome.reactionPlans) { - await transaction.execute(createInsertReactionPlanStatement(reactionPlan)); - } - - await transaction.execute({ - name: CockroachEventIngestionStoreStatement.MarkInboxReceiptProcessed, - sql: CockroachEventIngestionStoreSql.MarkInboxReceiptProcessed, - parameters: [receipt.eventId, receipt.consumerName, outcome.processedAt, outcome.result, receipt.payloadHash], - }); - - return { - status: outcome.result, - reactionPlans: outcome.reactionPlans, - }; - }); + throw error; + } }, }; } @@ -138,6 +159,16 @@ type CockroachReceiptClaimRow = { claim_status: EventIngestionOutcomeStatus; }; +type CockroachMarkedInboxReceiptRow = { + event_id: string; +}; + +class CockroachInboxReceiptCompletionLostError extends Error { + constructor() { + super("inbox receipt completion lost its claim"); + } +} + const CockroachEventIngestionStoreSql = { FindInboxReceipt: ` 
SELECT event_id, consumer_name, first_seen_at, processed_at, payload_hash, result @@ -198,6 +229,7 @@ const CockroachEventIngestionStoreSql = { AND payload_hash = $5 AND processed_at IS NULL AND result IS NULL + RETURNING event_id `, } as const; diff --git a/agentic-organization/packages/state-cockroach/test/cockroach-event-ingestion-store.test.ts b/agentic-organization/packages/state-cockroach/test/cockroach-event-ingestion-store.test.ts index 403ee3e269..2e882b3d5a 100644 --- a/agentic-organization/packages/state-cockroach/test/cockroach-event-ingestion-store.test.ts +++ b/agentic-organization/packages/state-cockroach/test/cockroach-event-ingestion-store.test.ts @@ -91,6 +91,39 @@ describe("cockroach event ingestion store", () => { ); }); + test("returns duplicate when the processed receipt mark loses the race", async () => { + const executor = createRecordingExecutor({ + markProcessedRowCount: 0, + }); + const store = createCockroachEventIngestionStore({ + executor, + }); + + const result = await store.recordEventProcessingOutcome({ + receipt: { + eventId: "evt-supervisor-signal-001", + consumerName: InboundEventConsumerName.V0AutomationPlanner, + firstSeenAt: "2026-05-25T22:00:00.000Z", + payloadHash: "hash-evt-supervisor-signal-001", + }, + reactionPlans: [createReactionPlanRecord()], + processedAt: "2026-05-25T22:00:00.000Z", + result: EventIngestionOutcomeStatus.Processed, + }); + + equal(result.status, EventIngestionOutcomeStatus.Duplicate); + deepEqual(result.reactionPlans, []); + deepEqual( + executor.transactionStatements.map((statement) => statement.name), + [ + CockroachEventIngestionStoreStatement.ClaimPendingInboxReceipt, + CockroachEventIngestionStoreStatement.InsertReactionPlan, + CockroachEventIngestionStoreStatement.MarkInboxReceiptProcessed, + ], + ); + equal(executor.rolledBackTransactionCount, 1); + }); + test("normalizes SQL null receipt completion fields to pending receipt fields", async () => { const executor = createRecordingExecutor({ 
rows: [ @@ -125,20 +158,26 @@ describe("cockroach event ingestion store", () => { type RecordingCockroachEventIngestionSqlExecutor = CockroachEventIngestionSqlExecutor & { statements: CockroachEventIngestionSqlStatement[]; transactionStatements: CockroachEventIngestionSqlStatement[]; + rolledBackTransactionCount: number; }; function createRecordingExecutor( input: { rows?: readonly unknown[]; claimStatus?: EventIngestionOutcomeStatus; + markProcessedRowCount?: number; } = {}, ): RecordingCockroachEventIngestionSqlExecutor { const statements: CockroachEventIngestionSqlStatement[] = []; const transactionStatements: CockroachEventIngestionSqlStatement[] = []; + let rolledBackTransactionCount = 0; return { statements, transactionStatements, + get rolledBackTransactionCount() { + return rolledBackTransactionCount; + }, execute: async >(statement: CockroachEventIngestionSqlStatement) => { statements.push(statement); @@ -146,27 +185,43 @@ function createRecordingExecutor( rows: (input.rows ?? []) as readonly Row[], }; }, - executeTransaction: async (operation) => - await operation({ - execute: async >(statement: CockroachEventIngestionSqlStatement) => { - transactionStatements.push(statement); - statements.push(statement); + executeTransaction: async (operation) => { + try { + return await operation({ + execute: async >(statement: CockroachEventIngestionSqlStatement) => { + transactionStatements.push(statement); + statements.push(statement); + + if (statement.name === CockroachEventIngestionStoreStatement.ClaimPendingInboxReceipt) { + return { + rows: [ + { + claim_status: input.claimStatus ?? EventIngestionOutcomeStatus.Processed, + }, + ] as readonly unknown[] as readonly Row[], + }; + } + + if (statement.name === CockroachEventIngestionStoreStatement.MarkInboxReceiptProcessed) { + const rowCount = input.markProcessedRowCount ?? 
1; + + return { + rows: Array.from({ length: rowCount }, () => ({ + event_id: "evt-supervisor-signal-001", + })) as readonly unknown[] as readonly Row[], + }; + } - if (statement.name === CockroachEventIngestionStoreStatement.ClaimPendingInboxReceipt) { return { - rows: [ - { - claim_status: input.claimStatus ?? EventIngestionOutcomeStatus.Processed, - }, - ] as readonly unknown[] as readonly Row[], + rows: [], }; - } - - return { - rows: [], - }; - }, - }), + }, + }); + } catch (error) { + rolledBackTransactionCount += 1; + throw error; + } + }, }; } diff --git a/openspec/specs/agentic-organization/spec.md b/openspec/specs/agentic-organization/spec.md index 2d8eb9d8dd..85298f2726 100644 --- a/openspec/specs/agentic-organization/spec.md +++ b/openspec/specs/agentic-organization/spec.md @@ -246,6 +246,16 @@ and a concrete event-publisher adapter. - **AND** the runtime rule processor does not know about NATS ack, nack, termination, backoff, or DLQ mechanics +#### Scenario: NATS adapter falls back when dead-letter handling fails + +- **WHEN** the NATS JetStream consumer adapter receives an invalid + envelope or payload-conflict result +- **AND** publishing to the dead-letter port or terminating the source + message fails +- **THEN** the adapter records the failure and negative-acknowledges the + source message for retry +- **AND** later messages in the fetched batch can still be processed + ### Requirement: Inbound events are deduped before automation Organization event consumers MUST record inbox receipts before @@ -303,7 +313,9 @@ executing privileged work directly. 
- **WHEN** a durable event-ingestion adapter records an event-processing outcome - **THEN** the inbox receipt, generated reaction plans, and processed - marker are submitted as one transaction batch + marker are submitted inside one transaction boundary +- **AND** the processed marker must return the claimed receipt before the + adapter reports the outcome as processed - **AND** runtime rule processors do not receive database transaction objects @@ -316,6 +328,16 @@ executing privileged work directly. - **AND** it does not insert reaction plans - **AND** it does not mark the completed receipt again +#### Scenario: Durable event-ingestion adapter loses completion race + +- **WHEN** a durable event-ingestion adapter claims a pending receipt but + the final processed marker no longer matches a pending receipt +- **THEN** the adapter rolls back generated reaction plans +- **AND** it returns a duplicate event-processing outcome through the + generic event-ingestion port +- **AND** runtime code does not receive database transaction objects or + vendor-specific update-count errors + #### Scenario: Durable command adapter uses one transaction boundary - **WHEN** a durable command adapter records a command outcome