From ca7e39abcecda3ab95c48d8a4e8299e07cffe75d Mon Sep 17 00:00:00 2001 From: Max Chadaev Date: Mon, 25 May 2026 13:05:49 -0400 Subject: [PATCH 1/2] docs: add agentic organization architecture Place the Agentic Organization design docs under docs/agentic-organization and index them from the docs audience navigation. Document the TypeScript app shape as shared npm capability packages composed by NestJS orchestrator apps. Co-Authored-By: Codex --- docs/README.md | 1 + .../AI_CLUSTER_SCAFFOLD_CONTEXT.md | 247 ++ .../ALWAYS_ON_ORCHESTRATION_RUNTIME.md | 660 +++++ .../AMBIGUOUS_REQUIREMENT_LIFECYCLE.md | 527 ++++ .../ANTI_STALL_PRIORITY_RUNTIME.md | 373 +++ .../CLUSTER_EXECUTION_AND_MEMORY_SUBSTRATE.md | 359 +++ .../CLUSTER_NATIVE_HAT_SYSTEM.md | 416 +++ .../DEPARTMENT_HAT_TOOL_INVENTORY.md | 404 +++ .../FOUNDATIONAL_CONTEXT_AND_LANGUAGE.md | 97 + .../IMPLEMENTATION_CONCEPTS.md | 2232 +++++++++++++++++ .../IMPLEMENTATION_READINESS_CHECKLIST.md | 493 ++++ .../ORGANIZATION_LAYER_BUILD_PLAN.md | 677 +++++ .../ORGANIZATION_RUNTIME_ARCHITECTURE.md | 2231 ++++++++++++++++ docs/agentic-organization/README.md | 23 + .../RUNTIME_TECH_AND_PACKAGE_STRATEGY.md | 723 ++++++ .../UI_AND_OBSERVABILITY_CONCEPTS.md | 755 ++++++ .../WORK_AND_RELEASE_MANAGEMENT_OS.md | 467 ++++ 17 files changed, 10685 insertions(+) create mode 100644 docs/agentic-organization/AI_CLUSTER_SCAFFOLD_CONTEXT.md create mode 100644 docs/agentic-organization/ALWAYS_ON_ORCHESTRATION_RUNTIME.md create mode 100644 docs/agentic-organization/AMBIGUOUS_REQUIREMENT_LIFECYCLE.md create mode 100644 docs/agentic-organization/ANTI_STALL_PRIORITY_RUNTIME.md create mode 100644 docs/agentic-organization/CLUSTER_EXECUTION_AND_MEMORY_SUBSTRATE.md create mode 100644 docs/agentic-organization/CLUSTER_NATIVE_HAT_SYSTEM.md create mode 100644 docs/agentic-organization/DEPARTMENT_HAT_TOOL_INVENTORY.md create mode 100644 docs/agentic-organization/FOUNDATIONAL_CONTEXT_AND_LANGUAGE.md create mode 100644 
docs/agentic-organization/IMPLEMENTATION_CONCEPTS.md create mode 100644 docs/agentic-organization/IMPLEMENTATION_READINESS_CHECKLIST.md create mode 100644 docs/agentic-organization/ORGANIZATION_LAYER_BUILD_PLAN.md create mode 100644 docs/agentic-organization/ORGANIZATION_RUNTIME_ARCHITECTURE.md create mode 100644 docs/agentic-organization/README.md create mode 100644 docs/agentic-organization/RUNTIME_TECH_AND_PACKAGE_STRATEGY.md create mode 100644 docs/agentic-organization/UI_AND_OBSERVABILITY_CONCEPTS.md create mode 100644 docs/agentic-organization/WORK_AND_RELEASE_MANAGEMENT_OS.md diff --git a/docs/README.md b/docs/README.md index 06b152888c..166a15d3b7 100644 --- a/docs/README.md +++ b/docs/README.md @@ -18,6 +18,7 @@ If you are not sure which audience you are, read | **Factory adopter** (starting a new project on the factory kit) | [`../AGENTS.md`](../AGENTS.md) + [`../CONTRIBUTING.md`](../CONTRIBUTING.md) -> [2. Factory adopters](#2-factory-adopters) | | **AI agent** (fresh wake, need rules + skills + personas) | [`../CLAUDE.md`](../CLAUDE.md) -> [`../AGENTS.md`](../AGENTS.md) -> [3. AI agents](#3-ai-agents) | | **Zeta contributor** (shipping DBSP algebra, proofs, F# code) | [`ARCHITECTURE.md`](ARCHITECTURE.md) -> [4. Zeta contributors](#4-zeta-contributors) | +| **Agentic Organization builder** (designing the AI cluster organization runtime) | [`agentic-organization/README.md`](agentic-organization/README.md) | | **Zeta consumer** (installing the NuGet libraries in my app) | [`../README.md`](../README.md) -> [5. Zeta consumers](#5-zeta-consumers) | | **Observer / reviewer** (not contributing; evaluating the project) | [`FACTORY-RESUME.md`](FACTORY-RESUME.md) -> [6. Observers / reviewers](#6-observers--reviewers) | | **Research-paper reader** (peer review, citation, verification) | [`research/`](research/) -> [7. 
Research-paper readers](#7-research-paper-readers) | diff --git a/docs/agentic-organization/AI_CLUSTER_SCAFFOLD_CONTEXT.md b/docs/agentic-organization/AI_CLUSTER_SCAFFOLD_CONTEXT.md new file mode 100644 index 0000000000..4250d8adad --- /dev/null +++ b/docs/agentic-organization/AI_CLUSTER_SCAFFOLD_CONTEXT.md @@ -0,0 +1,247 @@ +# AI Cluster Scaffold Context + +This document captures repository and bootstrap context from the `ai-cluster-bootstrap` work so Organization implementation aligns with the actual cluster direction. It is not a deployment-manifest spec. + +The broader vocabulary and original cluster mental model are captured in [Foundational Context and Language](./FOUNDATIONAL_CONTEXT_AND_LANGUAGE.md). + +Current GitHub project: + +- `https://github.com/Lucent-Financial-Group/Zeta` + +## Repository Shape + +The cluster scaffold is split into two top-level directories: + +| Directory | Purpose | +|---|---| +| `usb-nixos-installer/` | USB-only installer bootstrap, intentionally minimal | +| `full-ai-cluster/` | End-to-end cluster scaffold, including installer copy, NixOS host configs, k3s bootstrap, and ArgoCD applications | + +The full cluster layer includes: + +- Nix flake for installer and per-host configs; +- NixOS modules for common host setup, k3s server/agent, Docker, local storage, GPU support, GPU passthrough, and GPU device plugins; +- host configs for control plane and GPU worker; +- k3s bootstrap manifests applied on first boot; +- ArgoCD App-of-Apps for cluster applications. 
+ +Important concrete directories: + +| Path | Meaning | +|---|---| +| `full-ai-cluster/usb-nixos-installer/` | Byte-identical copy of the standalone USB installer | +| `full-ai-cluster/flake.nix` | Cluster flake for installer, hosts, and maintainer linux-builder support | +| `full-ai-cluster/nixos/modules/` | Host-level NixOS modules | +| `full-ai-cluster/nixos/hosts/control-plane/` | Control-plane host config | +| `full-ai-cluster/nixos/hosts/worker-gpu/` | GPU worker host config | +| `full-ai-cluster/k8s/bootstrap/` | k3s first-boot manifests, applied in dependency order | +| `full-ai-cluster/k8s/applications/` | ArgoCD-recognized platform applications | + +## OS and Cluster Responsibilities + +| Layer | Owns | +|---|---| +| NixOS host layer | k3s, Docker/rootless Docker posture, local storage, GPU drivers/toolkit/passthrough/device plugin support | +| k3s bootstrap layer | first-boot installation of Cilium, security substrate, ArgoCD, and root app | +| ArgoCD layer | ongoing reconciliation of platform applications | +| Organization layer | work, hats, tasks, assignments, approvals, signals, runs, memory attribution, and evidence | + +The Organization should assume the cluster exists as the execution substrate. It should not duplicate host bootstrap logic. + +## Two Reconcilers + +The scaffold deliberately has two reconciliation domains: + +| Reconciler | Scope | Update path | +|---|---|---| +| Nix / NixOS | Host OS, bootloader, kernel modules, k3s service, Docker, host storage, GPU drivers, passthrough, base packages | `nixos-install --flake` for first install, `nixos-rebuild switch --flake` for updates | +| ArgoCD | Cluster workloads and platform applications under `k8s/applications/` | Git commit and push, then ArgoCD reconciliation | + +The Organization runtime should be a consumer of this substrate. It should create Organization records, request workload launches, watch health, and surface drift, but it should not become the host bootstrap system. 
+ +## Cilium Bootstrap Constraint + +Cilium must exist before ArgoCD can schedule reliably when k3s disables its default networking. + +The k3s server configuration disables default networking pieces so Cilium owns networking end to end: + +- no flannel; +- no kube-proxy; +- no k3s network policy; +- Cilium owns CNI, KPR, and policy. + +That means Cilium cannot be installed only later by ArgoCD. The first-boot k3s Helm Controller needs to install Cilium before ArgoCD. + +Current theoretical first-boot ordering: + +1. Cilium; +2. cert-manager; +3. Vault; +4. SPIRE; +5. Trust Manager; +6. External Secrets Operator; +7. ArgoCD; +8. Root App-of-Apps. + +The Organization runtime should treat this as a dependency reality: agent workloads should not launch until the cluster security/network substrate is healthy. + +The concrete bootstrap directory currently represents this order with: + +- `cilium-namespace.yaml`; +- `cilium-install.yaml`; +- `cert-manager-install.yaml`; +- `vault-install.yaml`; +- `spire-install.yaml`; +- `trust-manager-install.yaml`; +- `external-secrets-install.yaml`; +- `argocd-namespace.yaml`; +- `argocd-install.yaml`; +- `root-application.yaml`. 
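
The dependency rule above (agent workloads wait until the network and security substrate is healthy) could be sketched as a small admission gate. This is only an illustration: the component order is taken from the first-boot list, while `HealthSnapshot`, `LaunchGate`, and `evaluateLaunchGate` are hypothetical names, and how health is actually probed is left open.

```typescript
// Hypothetical sketch: only the component order comes from the bootstrap list above.
const BOOTSTRAP_ORDER = [
  "cilium",
  "cert-manager",
  "vault",
  "spire",
  "trust-manager",
  "external-secrets",
  "argocd",
  "root-app-of-apps",
] as const;

type BootstrapComponent = (typeof BOOTSTRAP_ORDER)[number];

// Assumed shape: some watcher reports true/false per component.
type HealthSnapshot = Partial<Record<BootstrapComponent, boolean>>;

interface LaunchGate {
  allowed: boolean;
  // First component, in bootstrap order, that is not reported healthy.
  blockedBy?: BootstrapComponent;
}

// Agent workloads are admitted only when every substrate component is healthy.
// A component missing from the snapshot is treated as unhealthy.
function evaluateLaunchGate(health: HealthSnapshot): LaunchGate {
  for (const component of BOOTSTRAP_ORDER) {
    if (health[component] !== true) {
      return { allowed: false, blockedBy: component };
    }
  }
  return { allowed: true };
}
```

Reporting the first unhealthy component in bootstrap order keeps the escalation pointed at the root of the dependency chain rather than at a downstream symptom.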
+ +## Confirmed Component Direction + +| Component | Direction | +|---|---| +| Cilium | CNI, KPR, L7 policy, Hubble, Gateway API, ingress, BPF masquerade, encryption | +| cert-manager | TLS issuance | +| Vault | Secrets backend | +| SPIRE | Workload identity | +| Trust Manager | CA bundle distribution | +| External Secrets Operator | Vault-to-Kubernetes Secret sync | +| ArgoCD | App-of-Apps reconciliation | +| Open Policy Agent / Gatekeeper | Cluster policy constraints | +| Sealed Secrets | Encrypted low-churn Git-stored secrets | +| Longhorn | Persistent storage | +| CockroachDB | Distributed SQL source of truth for Organization-owned critical state | +| Temporal TS | Durable workflow/process rail | +| Dapr Actors | Entity-local actor/concurrency rail | +| Orleans | Present as a custom silo application; not the preferred TypeScript-first Organization primitive unless a .NET grain use case is explicit | +| Argo Workflows / Rollouts | DAG jobs and progressive delivery | +| Hindsight | Hermes persistent memory | +| OpenZiti / Ziti | Secure transport/connectivity layer | +| Hermes | Custom cloud-oriented agent runtime | +| NATS | Event/status/inbox transport | +| Redis | Cache or short-lived coordination support, not Organization truth | +| Weaviate | Vector database option; Hindsight remains Hermes memory integration | +| Loki / Tempo / Alloy / Mimir / Prometheus / Grafana | Observability stack | +| GitLab | Default-on forge/service platform | +| Forgejo | Manual-sync alternative | +| Ollama / vLLM / local coder models | Deferred/manual local-model phase | +| Oz / Warp | Agent/session orchestration layer; distinct from OpenZiti transport | +| Warp as separate app | Removed as a standalone component if Oz owns this orchestration role | + +The pasted scaffold status still mentions Istio in one historical component-status line. Treat that as stale. 
The active direction is: Istio is removed, and Cilium Service Mesh owns L7 policy, mTLS-capable service mesh behavior, Gateway API, ingress, traffic handling, and observability without per-pod sidecars. + +## Oz, Warp, and OpenZiti Clarification + +Oz should be treated as the Warp-style orchestration layer for Hermes agent/session runs. + +OpenZiti should be treated as the zero-trust transport/connectivity layer. + +Implications for the Organization: + +- use `OpenZiti` terminology where precision matters; +- use `Oz` for the agent run orchestrator; +- use `Warp` only as the orchestration concept Oz provides, not as a separate active app unless that decision changes; +- avoid conflating OpenZiti transport with Organization workflow orchestration; +- Credential Proxy and Cilium/SPIRE still enforce Organization authority even when transport is OpenZiti-backed. + +If the current cluster scaffold uses an `oz/` application directory for OpenZiti, that should be treated as a naming conflict. Prefer renaming the transport application to `openziti/` or documenting it as OpenZiti transport, while reserving Oz for orchestration. + +## Hermes Clarification + +Hermes is custom and cloud-oriented for the current phase. + +Implications: + +- do not assume local Ollama/vLLM endpoints are available in v0; +- cloud provider keys are expected to be supplied through secure build/runtime secret handling; +- local model environment variables can remain future-facing but should not drive MVP design; +- Hermes should integrate with Hindsight memory, Oz orchestration, and OpenZiti transport according to the active cluster configuration. + +## Hindsight Clarification + +Hindsight is the persistent memory system for Hermes. + +The active direction is a standalone Helm chart via ArgoCD, later clarified as the real `vectorize-io/hindsight` OCI chart. 
+ +Implications: + +- design memory as durable and precious; +- do not prune memory by default; +- expose memory health and recall latency to Organization observability; +- enforce hat-scoped retain/recall/reflect through Organization policy or an adapter. + +## Secrets Model + +The scaffold intentionally keeps multiple secret mechanisms because the secrets have different lifetimes and access patterns: + +| Mechanism | Use | +|---|---| +| Sealed Secrets | Encrypted Git-stored secrets for low-churn configuration | +| Vault | Runtime secrets backend, rotation, and audit | +| External Secrets Operator | Vault-to-Kubernetes Secret synchronization | +| SOPS | File-level encryption, including Hermes image-time secrets where required by the current spec | + +Organization implication: agents should never receive broad raw secrets. They should request actions through the MCP Gateway and Credential Proxy, which resolve approved scopes against Vault/External Secrets-backed references. + +## Deferred Local Models + +Local model serving is deferred. + +Deferred components include: + +- Ollama; +- vLLM; +- DeepSeek Coder local serving; +- Qwen Coder local serving. + +The Organization should be LLM-provider-neutral, but v0 should not depend on local GPU model availability. GPU infrastructure can exist for future phases. + +## Cluster Update Model + +OS changes belong under `full-ai-cluster/nixos/` and land through NixOS rebuilds. + +Cluster workload changes belong under `full-ai-cluster/k8s/applications/` and land through ArgoCD. + +Organization product changes should be packaged as one or more ArgoCD-managed applications once the app exists. The Organization should expose its own higher-level change lifecycle for agents, but the physical cluster reconciliation contract remains GitOps through ArgoCD. + +## Git Forge Gating + +GitLab is the default-on forge in the cluster scaffold. + +Forgejo is a manual-sync alternative. 
+ +Implications: + +- Credential Proxy should support a forge abstraction; +- first implementation can target GitLab; +- Forgejo support should be a capability expansion item unless required earlier. + +## Security Posture Lessons + +The scaffold review history includes important security lessons the Organization should inherit: + +- do not hardcode admin passwords; +- use existing secrets or sealed/external secret patterns; +- avoid plaintext API keys in Git; +- do not add users to the Docker group by default; +- quote and constrain destructive filesystem paths; +- avoid opening etcd ports broadly; +- gate deferred heavy workloads so they do not auto-start; +- avoid auto-reconciling mutually exclusive heavy services; +- make bootstrap ordering explicit when components depend on each other. + +These lessons should become Organization platform review criteria for internal infrastructure work. + +## Organization Implementation Implications + +Before implementation, define: + +- whether the Oz/Warp run adapter and OpenZiti transport adapter are separate interfaces or composed behind one higher-level launch path; +- how Hermes session containers receive OpenZiti transport configuration; +- how Cilium and SPIRE identity is represented in `AgentSessionActor`; +- how Vault/External Secrets references are represented in Credential Proxy requests; +- which components are required for the first local/cloud MVP; +- which components are deferred and must not auto-start; +- how the UI distinguishes active, deferred, manual-sync, and unavailable cluster capabilities. 
diff --git a/docs/agentic-organization/ALWAYS_ON_ORCHESTRATION_RUNTIME.md b/docs/agentic-organization/ALWAYS_ON_ORCHESTRATION_RUNTIME.md new file mode 100644 index 0000000000..49153313b0 --- /dev/null +++ b/docs/agentic-organization/ALWAYS_ON_ORCHESTRATION_RUNTIME.md @@ -0,0 +1,660 @@ +# Always-On Orchestration Runtime + +## Purpose + +The Organization needs an always-on runtime that continuously reacts to organizational state. + +This runtime is the layer between: + +- persisted Organization state; +- Oz/Warp run orchestration; +- OpenZiti transport state where private connectivity is required; +- NATS/JetStream events; +- k3s pods; +- Hermes agents; +- schedules, timers, policies, and rules. + +The goal is not to hard-code corporate behavior. The goal is to provide deterministic primitives so Hermes agents can run the Organization while the platform enforces safety, leases, state transitions, budgets, and observability. + +## Core Loop + +```text +state changes + -> domain event persisted + -> event published + -> rules evaluated + -> reaction plan created + -> leases/budget/hat supply checked + -> actions executed + -> run requests / messages / tasks / reports / escalations created + -> outcomes observed + -> reconciliation verifies reality matches Organization state +``` + +The Organization database remains authoritative. NATS, the run orchestrator, k3s, Hindsight, and telemetry systems are synchronized through workers and reconcilers. + +## Always-On Workers + +The control plane needs persistent background workers independent of Hermes agent sessions. + +Initial workers: + +- `SchedulerWorker`: claims due scheduled jobs and durable timers. +- `RuleEvaluationWorker`: evaluates rules after domain events and state changes. +- `ReactionExecutorWorker`: executes approved reaction plans. +- `OutboxPublisherWorker`: publishes persisted outbox events to NATS. +- `NatsConsumerWorker`: consumes durable NATS streams and invokes domain handlers. 
+- `OzReconcilerWorker`: reconciles Oz run state with Organization run bindings. +- `PodSessionWatchdogWorker`: watches k3s pods and Hermes session health. +- `LeaseReaperWorker`: expires stale runtime leases and fencing tokens. +- `DeadLetterWorker`: classifies, quarantines, replays, or escalates dead-letter messages. +- `TriggerWorker`: evaluates event, threshold, external, and state-timeout triggers. +- `AnomalyClassifierWorker`: converts telemetry anomalies into reports or self-healing attempts. +- `BudgetAndCapacityWorker`: enforces burn-rate, queue admission, hat supply, and scale-down policies. +- `ObservabilityCoverageWorker`: detects missing logs/traces/metrics/health coverage. + +These are boring system processes. Hermes agents may reason about their outputs, but the workers keep the runtime awake. + +## Durable Triggers + +Triggers are first-class runtime objects. + +Trigger types: + +- event trigger: reacts to domain events such as `TaskMarkedReady`; +- state trigger: reacts when an entity enters or leaves a state; +- state-timeout trigger: reacts when an entity remains in a state too long; +- scheduled trigger: reacts on cadence; +- threshold trigger: reacts to metrics, counts, budgets, or quality signals; +- external trigger: reacts to webhook or polled external system changes. + +Core entities: + +```text +durable_triggers +trigger_executions +trigger_checkpoints +``` + +Each trigger should define: + +- scope: organization, department, project, initiative, team, repo, task, or hat; +- owner hat or department; +- predicate; +- action type; +- policy requirements; +- concurrency policy; +- dedupe key; +- idempotency key; +- retry policy; +- cooldown; +- budget policy; +- enabled/paused state; +- version; +- last evaluation; +- next evaluation when time-based. + +## Organizational Rules and Reactions + +Rules decide how the Organization reacts to state. 
+ +Core concepts: + +```text +OrganizationalRule + event/state predicate + scope + priority + owner department + required policy version + conditions + action list + +ReactionPlan + deterministic output of rule evaluation + state transition / assignment / spawn request / escalation / report / backlog item / no-op + +RuleEvaluationEvent + matched rules + skipped rules + conflict resolution + policy version + final action +``` + +Rules should not execute side effects directly. They create a reaction plan. The reaction executor claims the plan, validates leases/policy/budget, then executes actions. + +## Rule Scope + +Rule scopes: + +- organization rules; +- department rules; +- project rules; +- initiative rules; +- team rules; +- repo rules; +- hat rules; +- task/work item rules. + +Rules cascade from broad to narrow. Narrower rules can specialize behavior, but cannot bypass hard policy. + +Example: + +```text +TaskMarkedReady + -> project rule checks required docs + -> engineering department rule checks TDD requirement + -> team rule checks staffing + -> hat supply rule reserves implementer + -> agent launch policy creates Oz run request +``` + +## Conflict Policy + +Rules need deterministic conflict resolution. + +Default precedence: + +```text +Security hard block + > Compliance hard block + > Human emergency override within allowed scope + > Executive Board decision + > QA release block + > Architecture gate + > Budget/capacity hard limit + > Department director policy + > TPM initiative priority + > Team/manager preference +``` + +Any conflict resolution must produce an audit event explaining: + +- conflicting rules; +- precedence applied; +- final decision; +- policy version; +- approving hat or human override when applicable. + +## Agent Launch Policy + +Autonomous agent launches must be governed. 
+ +`AgentLaunchPolicy` should define: + +- triggering event or state; +- allowed launcher hats; +- required target hats; +- required project/initiative/task context; +- Oz run template; +- Hermes profile; +- memory scope; +- documentation context; +- credential scopes; +- budget cap; +- max concurrency; +- idempotency key; +- cancellation conditions; +- completion signal; +- escalation path. + +Example: + +```text +TaskMarkedReady + -> if required docs are approved + -> if hat supply can reserve implementer + -> if budget is available + -> create Hermes implementer run + -> assign task and hat + -> require red-test artifact before implementation evidence +``` + +## Scheduler Semantics + +Scheduled jobs need explicit execution behavior. + +`scheduled_jobs` should include: + +- owner hat assignment; +- department/project/initiative/team scope; +- cadence; +- timezone; +- jitter; +- last run time; +- next run time; +- locked until; +- max runtime; +- misfire policy; +- concurrency policy; +- catch-up policy; +- run policy; +- budget policy; +- expected artifact outputs; +- escalation target; +- schedule version. + +Misfire policies: + +- skip missed run; +- run once immediately; +- catch up all missed runs up to a limit; +- escalate if too stale. + +Concurrency policies: + +- forbid overlap; +- allow overlap with cap; +- replace running job; +- queue behind running job. + +Scheduled jobs should create work, reports, meetings, reviews, or Oz run requests. They should not bypass work management. + +## Durable Timers + +Timers enforce lifecycle expectations. + +Examples: + +- review pending too long; +- vote pending too long; +- Oz callback missing; +- Hermes session silent too long; +- QA suite exceeded max runtime; +- hat token renewal overdue; +- task blocked too long; +- initiative has no TPM; +- project has stale docs; +- department queue exceeds SLA. + +Timers are durable triggers with entity scope and state predicate. 
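
The misfire policies listed under Scheduler Semantics could be resolved by a pure function like the one below. This is a minimal sketch under stated assumptions: the type and function names are hypothetical, and the behavior of "escalate if too stale" when the job is not yet stale (run once) is an assumption the document does not specify.

```typescript
// Hypothetical sketch: the policy kinds mirror the misfire policy list; names are illustrative.
type MisfirePolicy =
  | { kind: "skip" }
  | { kind: "run-once" }
  | { kind: "catch-up"; maxRuns: number }
  | { kind: "escalate-if-stale"; maxStalenessMs: number };

interface MisfireDecision {
  runsToExecute: number;
  escalate: boolean;
}

// Decide what to do when the scheduler finds `missedRuns` missed firings,
// the oldest of which was due `oldestMissedAgeMs` milliseconds ago.
function resolveMisfire(
  policy: MisfirePolicy,
  missedRuns: number,
  oldestMissedAgeMs: number,
): MisfireDecision {
  if (missedRuns <= 0) {
    return { runsToExecute: 0, escalate: false };
  }
  switch (policy.kind) {
    case "skip":
      return { runsToExecute: 0, escalate: false };
    case "run-once":
      return { runsToExecute: 1, escalate: false };
    case "catch-up":
      // Catch up all missed runs, but never more than the configured limit.
      return {
        runsToExecute: Math.min(missedRuns, Math.max(0, policy.maxRuns)),
        escalate: false,
      };
    case "escalate-if-stale": {
      // Assumption: a job that is not yet too stale runs once; a stale one only escalates.
      const stale = oldestMissedAgeMs > policy.maxStalenessMs;
      return { runsToExecute: stale ? 0 : 1, escalate: stale };
    }
  }
}
```

Keeping the decision free of side effects means the scheduler worker can log it, claim a lease, and only then create the resulting work or run requests.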
+ +## Runtime Leases and Leadership + +Workers must prevent duplicate execution. + +`runtime_leases` should include: + +- lease ID; +- resource type; +- resource ID; +- owner worker ID; +- owner pod ID; +- fencing token; +- acquired at; +- expires at; +- renewal count; +- policy; +- last heartbeat; +- release reason. + +Rules: + +- every scheduled job claim requires a lease; +- every reaction execution requires a lease; +- every external watcher checkpoint update requires a lease; +- every self-healing remediation requires a lease; +- stale leases are expired by the lease reaper; +- fencing tokens must be checked before writes that could be duplicated. + +## Queue and NATS Consumer Contracts + +NATS consumers need explicit contracts. + +Each durable consumer should define: + +- stream name; +- subject pattern; +- durable consumer name; +- owner service/worker; +- ack wait; +- max deliveries; +- backoff policy; +- replay policy; +- ordering expectation; +- idempotency key; +- dead-letter target; +- poison message classification; +- schema version handling; +- replay authorization requirements. + +At-least-once delivery is expected. Domain handlers must be idempotent. + +## Dead-Letter Workflow + +Dead-letter messages are work items for the Operations organization. + +Core entities: + +```text +dead_letter_messages +dead_letter_investigations +replay_requests +quarantine_decisions +discard_decisions +``` + +Lifecycle: + +```text +message dead-lettered + -> DLQ Steward hat assigned + -> classify poison / transient / schema / policy / duplicate + -> link trace and original entity + -> decide replay, quarantine, discard, or backlog item + -> require approval for replay if side effects are possible + -> record outcome +``` + +Discarding a message requires evidence and approval. + +## Watchers + +Watchers connect external reality to Organization state. 
+ +Initial watchers: + +- Oz run watcher; +- k3s pod/session watcher; +- NATS stream health watcher; +- credential proxy denial watcher; +- Hindsight memory health watcher; +- telemetry ingestion watcher; +- Git provider watcher; +- CI/pipeline watcher; +- documentation repository watcher; +- project artifact watcher. + +Watcher requirements: + +- checkpoint storage; +- webhook plus polling fallback where possible; +- dedupe keys; +- lag metrics; +- stale-checkpoint alerts; +- owner hat/department; +- failure reports. + +## Reconciliation Loops + +Reconciliation treats the Organization DB as truth and external systems as observed state. + +Reconcilers should detect: + +- pending Oz runs not launched; +- Oz runs with no Organization binding; +- orphaned k3s pods; +- Hermes sessions silent past heartbeat threshold; +- stale hat assignments on dead sessions; +- unprocessed outbox events; +- NATS messages delivered but not reflected in state; +- missing artifacts; +- tasks stuck in impossible states; +- schedules not firing; +- watchers not advancing checkpoints. + +Reconcilers should repair when safe, otherwise create reports and escalations. + +## SLOs and Error Budgets + +Always-on systems need service objectives. + +Initial SLO categories: + +- Organization API availability and latency; +- MCP tool latency and success rate; +- Oz run launch success and callback freshness; +- Hermes session heartbeat freshness; +- NATS publish/consume lag; +- outbox drain time; +- scheduler lag; +- trigger evaluation lag; +- credential proxy availability and denial correctness; +- Hindsight adapter availability and recall latency; +- trace/log/metric ingestion freshness; +- self-healing success and escalation latency. + +Error budgets should influence priority. If a project or component burns its error budget, the Organization should reduce risky new work, prioritize reliability fixes, or require executive approval to continue. 
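
One way to make "error budgets should influence priority" concrete is to compute the remaining budget for an SLO window and map it to an admission mode. The sketch below is illustrative only: `SloWindow`, `remainingErrorBudget`, and `admissionFor` are hypothetical names, and the 25% threshold is an assumed value, not one taken from this document.

```typescript
// Hypothetical sketch of an error-budget check; thresholds are assumptions.
interface SloWindow {
  target: number; // SLO target as a fraction, e.g. 0.99
  totalEvents: number; // events observed in the window
  failedEvents: number; // events that violated the objective
}

// Fraction of the error budget still unspent: 1 means untouched,
// 0 means exactly spent, negative means overspent.
function remainingErrorBudget(window: SloWindow): number {
  const allowedFailures = (1 - window.target) * window.totalEvents;
  if (allowedFailures <= 0) {
    // No budget at all: any failure overspends it.
    return window.failedEvents === 0 ? 1 : -1;
  }
  return 1 - window.failedEvents / allowedFailures;
}

type AdmissionMode = "normal" | "reliability-first" | "executive-approval-required";

// Assumed mapping: exhausted budget requires executive approval to continue,
// a nearly exhausted budget shifts priority toward reliability fixes.
function admissionFor(remaining: number): AdmissionMode {
  if (remaining <= 0) return "executive-approval-required";
  if (remaining < 0.25) return "reliability-first";
  return "normal";
}
```

For example, a 99% target over 10,000 events allows roughly 100 failures, so 50 failures leaves about half the budget and work proceeds normally, while 150 failures overspends it and would require approval to continue risky work.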
+ +## Health Contracts + +Every platform component needs: + +- liveness check; +- readiness check; +- dependency check; +- freshness window; +- degraded mode; +- owner hat; +- escalation path; +- SLO target; +- dashboard link; +- runbook skill. + +Health reports should be queryable by project, department, component, cluster, and owner. + +## Self-Healing Policy + +Remediations must be classified. + +Classes: + +- auto-safe: can run without approval; +- approval-required: needs owning manager, DevOps, Security, or Executive approval; +- forbidden: must only create a report/escalation; +- human-only: requires explicit human action. + +Each remediation defines: + +- preconditions; +- blast-radius limit; +- max retry count; +- cooldown; +- rollback plan; +- verification check; +- failure escalation; +- evidence requirements. + +## Runbooks as Skills + +Recurring operational procedures should become versioned skills. + +Runbook skill frontmatter should include: + +```yaml +id: restart-stuck-hermes-session +name: Restart Stuck Hermes Session +type: operational-runbook +owners: + - devops-manager +allowedHats: + - platform-operator + - sre + - incident-commander +triggers: + - hermes.session.silent +preconditions: + - no-active-tool-call +approval: + class: auto-safe +evidence: + - trace + - run-log + - session-heartbeat +rollback: + required: true +status: active +version: 1 +``` + +Runbooks should be linked to incidents, anomalies, self-healing attempts, and postmortems. + +## Runtime Capability Expansion + +The always-on runtime should detect capability gaps and route them into governed expansion work. + +Sources: + +- repeated failed or denied tool calls; +- repeated credential proxy denials; +- repeated manual workarounds; +- repeated QA bounce-backs; +- recurring incidents; +- team review findings; +- director review findings; +- observability coverage gaps; +- missing workflow/automation for a repeatable process. 
+ +Flow: + +```text +capability_gap_detected + -> create CapabilityRequest + -> route to owning Engineering Manager or department manager + -> Department Director prioritizes + -> Security reviews if credentials, tools, data, or automation risk exists + -> Architecture reviews if workflow, actor, runtime, or integration impact exists + -> implementation work is created + -> capability registry is updated after approval + -> rules/triggers may begin using the new capability +``` + +Expansion targets: + +- MCP tools; +- credential proxy endpoints; +- Temporal workflows; +- Dapr actors; +- durable triggers; +- scheduled jobs; +- project skills; +- runbook skills; +- hat capabilities; +- observability probes and dashboards. + +New capabilities should never become active directly from an agent request. They must pass through review gates, tests, observability requirements, registry activation, and policy-scoped availability. + +## Operations Hats + +Additional always-on hats: + +- Platform Operator; +- SRE; +- Incident Commander; +- DLQ Steward; +- Observability Curator; +- Cost Controller; +- Scheduler Steward; +- Trigger Steward; +- Runbook Maintainer. + +These hats do not replace engineering/product departments. They keep the Organization runtime healthy. + +## Incident Response + +Incidents need lifecycle state. + +Severity levels: + +- `sev0`: Organization cannot operate or unsafe actions are occurring. +- `sev1`: major project/runtime capability degraded. +- `sev2`: limited degraded capability with workarounds. +- `sev3`: low-impact issue or recurring annoyance. 
+ +Incident lifecycle: + +```text +detected + -> classified + -> commander_assigned + -> mitigation_in_progress + -> mitigated + -> resolved + -> postmortem_required + -> actions_prioritized + -> closed +``` + +Incident Commander responsibilities: + +- assign responder hats; +- set communication cadence; +- freeze risky actions if needed; +- approve rollback when authorized; +- coordinate self-healing and manual remediation; +- require postmortem and follow-up backlog items. + +## Capacity and Budget Enforcement + +Hat supply and budget should control admission. + +Enforcement loops: + +- reserve hats before launching runs; +- release hats when runs finish or expire; +- queue work when hat supply is exhausted; +- preempt lower-priority work only by explicit policy; +- alert on burn-rate thresholds; +- pause noncritical scheduled jobs when budget is exhausted; +- scale down idle sessions; +- deprovision expired hats; +- escalate starvation or chronic undercapacity to directors/executives. + +## Human Override + +Overrides must be scoped and temporary. + +Required fields: + +- actor; +- scope; +- reason; +- policy bypassed; +- expiration; +- max blast radius; +- rollback plan; +- approval evidence; +- review requirement. + +Dangerous overrides should require two-person approval or Executive Board approval. + +## UI Requirements + +The UI needs views for: + +- rules and reaction history; +- scheduler queue and lag; +- trigger catalog and executions; +- runtime leases; +- worker heartbeats; +- outbox backlog; +- NATS consumers and DLQ; +- watcher checkpoints; +- reconciliation findings; +- SLO/error budgets; +- incident command; +- remediation approvals; +- runbook skill usage; +- budget burn rate and admission control. + +Every automation should be inspectable from the affected project, initiative, task, agent, run, or department. 
+ +## MVP Slice + +First always-on slice: + +```text +TaskMarkedReady event + -> durable trigger fires + -> organizational rule evaluates + -> reaction plan reserves implementer hat + -> Organization creates Oz run request + -> scheduler/worker launches run + -> run binding and trace are recorded + -> watchdog monitors heartbeat + -> timeout trigger escalates if silent + -> completion event releases hat and updates task +``` + +This proves the actual operating system: event, rule, trigger, lease, budget, run, watcher, reconciliation, and UI evidence. diff --git a/docs/agentic-organization/AMBIGUOUS_REQUIREMENT_LIFECYCLE.md b/docs/agentic-organization/AMBIGUOUS_REQUIREMENT_LIFECYCLE.md new file mode 100644 index 0000000000..f67ff71f83 --- /dev/null +++ b/docs/agentic-organization/AMBIGUOUS_REQUIREMENT_LIFECYCLE.md @@ -0,0 +1,527 @@ +# Ambiguous Requirement to Curated Feature Lifecycle + +The Organization must be able to receive a vague requirement and turn it into a well-considered feature. This lifecycle is the path from unclear intent to delivered, verified, and learned-from work. + +The central idea: ambiguity is not a blocker to agentic work. It is a work type. The Work OS should recognize ambiguity, assign the right business/product hats, interview the customer or internal stakeholder, build artifacts, create evidence, and only then hand off to architecture and engineering. + +## Lifecycle Goals + +- Capture the original ambiguous requirement without losing nuance. +- Decide whether the work is customer-facing, internal platform, operational, security, documentation, or capability expansion. +- Ask the right clarifying questions before implementation. +- Let Product and Business hats interview the customer or internal stakeholder. +- Convert conversation into structured requirements, workflows, acceptance criteria, non-goals, risks, and success metrics. 
+- Produce BRD and supporting artifacts that Architecture, Engineering, QA, Delivery, and Review hats can enforce. +- Ensure implementation follows the clarified intent instead of the first interpretation. +- Preserve the evidence chain from original request to shipped feature. +- Feed learnings back into memory, project skills, templates, test cases, and future discovery. + +## Requirement Maturity States + +The Work OS should track requirement maturity separately from implementation status. + +```text +raw_intake + -> classified + -> ambiguity_scored + -> discovery_required + -> interview_planned + -> interview_in_progress + -> source_evidence_captured + -> requirements_drafted + -> workflow_modeled + -> acceptance_criteria_drafted + -> brd_review + -> product_signoff + -> architecture_ready + -> implementation_ready +``` + +The requirement maturity state should gate the normal work item state. A customer-facing or ambiguous feature should not move to `ready` unless it has reached `implementation_ready` or has an explicit approved no-discovery/no-BRD decision. + +## Phase 1: Intake and Ambiguity Detection + +When a goal, report, or service request enters the Work OS, the first job is to preserve the request and classify it. + +Required outputs: + +- original request; +- requester identity or source; +- customer or internal stakeholder; +- affected project or product area if known; +- raw desired outcome; +- perceived urgency; +- initial ambiguity score; +- likely departments required; +- risk flags. 
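
The required intake outputs listed above could be captured in a single record, in the same spirit as the `DiscoveryPlan` type used later in this document. This is a minimal sketch under stated assumptions: the field names, the three-level urgency scale, and the `missingIntakeFields` helper are hypothetical and not part of the design.

```typescript
// Hypothetical sketch. Field names mirror the "Required outputs" list;
// the urgency scale and score range are assumptions.
type RequirementIntake = {
  originalRequest: string;
  requesterOrSource: string;
  stakeholder: string;
  affectedArea?: string; // optional: only recorded "if known"
  rawDesiredOutcome: string;
  perceivedUrgency: "low" | "medium" | "high";
  initialAmbiguityScore: number; // assumed to be normalized to 0..1
  likelyDepartments: string[];
  riskFlags: string[];
};

const REQUIRED_INTAKE_FIELDS: (keyof RequirementIntake)[] = [
  "originalRequest",
  "requesterOrSource",
  "stakeholder",
  "rawDesiredOutcome",
  "perceivedUrgency",
  "initialAmbiguityScore",
  "likelyDepartments",
  "riskFlags",
];

// Returns the required fields that are absent or blank, so intake can
// stay in `raw_intake` until the request is fully preserved.
function missingIntakeFields(intake: Partial<RequirementIntake>): string[] {
  return REQUIRED_INTAKE_FIELDS.filter((field) => {
    const value = intake[field];
    return value === undefined || (typeof value === "string" && value.trim() === "");
  });
}
```

An empty result from `missingIntakeFields` would be one reasonable precondition for emitting `RequirementClassified`.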
+ +Signals: + +- `RequirementReceived` +- `RequirementClassified` +- `AmbiguityDetected` +- `DiscoveryRequired` + +Automation: + +```text +new requirement + -> classify domain + -> score ambiguity + -> detect customer-facing or internal-customer impact + -> attach existing project docs and memories + -> route to Product/Business Discovery queue +``` + +Ambiguity scoring should consider: + +- unclear user or customer; +- unclear expected behavior; +- missing current-state evidence; +- unknown success metric; +- undefined acceptance criteria; +- multiple plausible interpretations; +- cross-project or cross-repo scope; +- security, credential, data, or workflow impact; +- missing business owner; +- high cost or high delivery risk. + +## Phase 2: Discovery Planning + +The Product Owner, Requirement Clarifier, Customer Interviewer, and Business Analyst hats should plan discovery before talking to the customer. + +Required outputs: + +- discovery plan; +- interview participants; +- question set; +- known assumptions; +- required evidence; +- existing documentation to review; +- expected artifact set; +- decision on whether research, prototype, or architecture spike is needed. + +Discovery plan structure: + +```ts +type DiscoveryPlan = { + requirementId: string; + ownerHatAssignmentId: string; + participants: string[]; + questions: string[]; + assumptionsToValidate: string[]; + evidenceToCollect: string[]; + existingDocsToReview: string[]; + requiredArtifacts: string[]; + targetMaturityState: "requirements_drafted" | "workflow_modeled" | "brd_review"; +}; +``` + +Signals: + +- `DiscoveryPlanCreated` +- `InterviewRequested` +- `EvidenceRequested` + +## Phase 3: Customer or Stakeholder Interview + +The Customer Interviewer hat should run a structured interview. This can be a human conversation, an async questionnaire, or a scoped chat. The key is that answers become evidence, not unstructured chat residue. 
+ +Interview modes: + +- one-on-one customer interview; +- internal stakeholder interview; +- team discovery meeting; +- async clarification thread; +- follow-up interview after BRD draft; +- executive clarification for priority or non-goals. + +Required captured evidence: + +- exact questions asked; +- answers; +- unresolved questions; +- contradictions; +- examples and counterexamples; +- customer vocabulary; +- affected workflows; +- current pain; +- desired outcome; +- explicit non-goals; +- priority and deadline pressure; +- acceptance signals from the customer. + +Signals: + +- `InterviewStarted` +- `CustomerAnswerRecorded` +- `ClarificationQuestionOpened` +- `InterviewCompleted` + +Guardrails: + +- The interviewer cannot silently invent answers. +- Unanswered questions must remain visible. +- Conflicting answers must create a clarification task. +- Important assumptions must be marked as assumptions until approved. +- The original request and interview evidence must remain linked to the BRD. + +## Phase 4: Requirement Synthesis + +The Business Analyst and Requirements Analyst hats turn interview evidence into structured requirements. + +Required outputs: + +- problem statement; +- current workflow; +- desired workflow; +- personas or user roles; +- business rules; +- functional requirements; +- non-functional requirements; +- constraints; +- assumptions; +- non-goals; +- edge cases; +- open questions; +- acceptance criteria draft; +- success metrics; +- impacted projects, repos, systems, documents, memories, and skills. + +Signals: + +- `RequirementsDrafted` +- `OpenQuestionsRecorded` +- `AssumptionsRecorded` +- `AcceptanceCriteriaDrafted` + +Quality checks: + +- Every requirement should link to source evidence or an approved assumption. +- Acceptance criteria must be testable. +- Non-goals must be explicit when the scope could expand. +- Edge cases should be captured early enough for QA and Architecture. 
+- If requirements expose new work, create linked backlog items instead of bloating the feature. + +## Phase 5: Workflow and Experience Modeling + +Complex features need modeled workflows before architecture begins. This is where the Organization should be better than normal ticket trackers. + +Required outputs: + +- workflow map; +- state transitions; +- actor/user journey; +- system interactions; +- exception paths; +- permission boundaries; +- audit/evidence requirements; +- data lifecycle; +- notifications and status changes; +- UI state expectations if relevant; +- API or MCP tool implications if relevant. + +For agentic/internal platform features, model: + +- which hat initiates the workflow; +- which hats review it; +- which tools are involved; +- which triggers or schedules fire; +- which signals are emitted; +- which memories are read or written; +- which Oz/Hermes runs are launched; +- which release or activation flow applies. + +Signals: + +- `WorkflowModeled` +- `WorkflowGapDetected` +- `CapabilityGapDetected` + +## Phase 6: BRD Creation and Review + +The BRD is the contractual bridge between discovery and architecture. + +BRD minimum contents: + +- original request; +- customer/stakeholder context; +- problem statement; +- desired outcome; +- personas or roles; +- current workflow; +- target workflow; +- functional requirements; +- non-functional requirements; +- business rules; +- acceptance criteria; +- success metrics; +- assumptions; +- non-goals; +- risks; +- open questions; +- linked evidence; +- impacted projects and systems; +- follow-up backlog items. + +Review hats: + +- BRD Author creates the BRD. +- BRD Reviewer checks clarity, evidence, testability, and completeness. +- Product Owner signs off product intent. +- Business Approver signs off business readiness. 
+ +Signals: + +- `BrdDraftCreated` +- `BrdReviewRequested` +- `BrdRejected` +- `BrdApproved` +- `ProductSignoffRecorded` + +Rejection reasons should be typed: + +- missing customer evidence; +- ambiguous acceptance criteria; +- conflicting requirements; +- missing workflow; +- missing non-goals; +- unresolved high-impact question; +- incorrect project scope; +- premature architecture handoff. + +## Phase 7: Architecture and Technical Design + +Architecture begins after the business shape is stable enough. + +Required outputs: + +- CA or design doc; +- ADRs when a structural decision is made; +- integration boundaries; +- data model impact; +- runtime impact; +- workflow impact; +- security and credential impact; +- observability requirements; +- migration or rollout plan; +- test strategy input; +- architectural non-goals; +- implementation slices. + +Signals: + +- `ArchitectureReviewRequested` +- `CaCreated` +- `AdrRequired` +- `ArchitectureRejected` +- `ArchitectureApproved` + +Architecture should push back when: + +- BRD is ambiguous; +- workflow is not modeled; +- acceptance criteria are not testable; +- security scope is unclear; +- project boundaries are wrong; +- implementation slice is too large; +- required observability is missing. + +## Phase 8: Feature Decomposition + +The TPM, Engineering Manager, Product Owner, Architect, and QA hats collaborate to split the approved feature. + +Required outputs: + +- initiative if scope is large enough; +- initiative branch plan; +- CI/CD and deployment automation plan; +- task breakdown; +- dependencies; +- required hats; +- review gates; +- QA plan; +- release plan; +- preview or QA environment plan; +- documentation requirements; +- memory and project skill requirements; +- budget and capacity estimate. 
+ +Signals: + +- `InitiativeCreated` +- `InitiativeBranchRequested` +- `AutomationPlanCreated` +- `FeatureDecomposed` +- `TaskCreated` +- `RequiredHatsComputed` +- `QaPlanCreated` +- `ReleasePlanCreated` + +Rules: + +- Tasks should be small enough to test and review. +- Every task should link to BRD and CA context or a documented no-doc exception. +- Every task should have acceptance criteria. +- Every task should specify expected evidence. +- Every task should list required hats and review gates. +- Implementation tasks should target the initiative branch, not `main`. +- Code-producing tasks should include any CI, deployment, preview environment, rollback, or observability automation needed to test and operate the work. + +## Phase 9: Implementation Readiness + +Before an implementer starts, the Engineering Manager or Readiness Reviewer checks the work packet. + +Readiness checklist: + +- BRD approved or no-BRD exception approved; +- CA/ADR/design docs approved if required; +- initiative branch created or requested; +- automation package planned or no-automation decision approved; +- acceptance criteria linked; +- test strategy linked; +- QA plan linked; +- security/credential review requested if needed; +- project skills and repo docs attached; +- relevant memories attached; +- required hats available or queued; +- budget and runtime constraints known; +- dependencies resolved or explicitly sequenced. + +Signals: + +- `ImplementationReadinessRequested` +- `ImplementationReady` +- `ImplementationReadinessRejected` + +A readiness rejection should route to the owner of the missing item: Product, BA, Architecture, Security, Memory, QA Engineering, TPM, or Director. + +## Phase 10: Implementation, Review, QA, and Release + +Once implementation starts, the normal Work OS flow takes over, but it remains anchored to the discovery artifacts.
+ +Implementation must: + +- read BRD, CA, ADRs, project skills, memories, and acceptance criteria; +- create red tests first for defects and implementation work; +- create or update CI/CD, deployment, preview, rollback, and observability automation required by the feature; +- record green test evidence; +- submit artifacts and trace links; +- request code review. + +Review must: + +- compare implementation against BRD and CA; +- verify tests and evidence; +- reject scope drift; +- enforce policy and documentation requirements. + +QA must: + +- verify against acceptance criteria and workflow model; +- use the branch preview or QA deployment when applicable; +- run browser/API checks as needed; +- attach screenshots, logs, traces, reproduction steps; +- sign off or bounce back when the issue remains reproducible or the behavior is insufficient. + +Delivery must: + +- validate gate chain; +- verify initiative branch QA signoff; +- verify automation package completeness; +- create release candidate; +- merge the fully approved initiative branch to `main`; +- record merge/release/activation evidence; +- verify the system build after merge; +- require post-release verification when needed. + +## Phase 11: Outcome Review and Learning + +The lifecycle is not done when the feature ships. The Organization should review whether the original ambiguous requirement was successfully transformed into the intended outcome. + +Outcome review checks: + +- Did the delivered behavior satisfy the original customer need? +- Did acceptance criteria cover the important cases? +- Were any customer questions misunderstood? +- Did BRD, CA, ADRs, tests, QA, and release artifacts stay linked? +- Were there avoidable rework loops? +- Were the right hats assigned? +- Did memory retrieval help? +- Did the project need a new skill? +- Did the workflow reveal a missing MCP tool, Temporal workflow, Dapr actor, credential scope, or UI feature? 
+ +Possible follow-up work: + +- memory adaptation request; +- project skill creation; +- test case expansion; +- BRD template improvement; +- interview question template improvement; +- architecture guideline update; +- capability request; +- backlog item for adjacent scope; +- hat effectiveness review. + +Signals: + +- `OutcomeReviewStarted` +- `CustomerOutcomeVerified` +- `DiscoveryGapFound` +- `MemoryAdaptationRequested` +- `ProjectSkillRequested` +- `CapabilityRequested` +- `OutcomeReviewCompleted` + +## Required Work OS Support + +To make this lifecycle real, the Work OS needs these capabilities: + +- requirement maturity state separate from task implementation state; +- interview records linked to BRDs; +- structured question and answer capture; +- assumption tracking; +- open question tracking; +- workflow modeling artifacts; +- BRD and product signoff gates; +- CA/ADR/design gates; +- readiness gate that blocks premature engineering; +- artifact requirements by state; +- customer/stakeholder communication threads; +- role-specific discovery queues; +- signals for discovery lag and stale questions; +- outcome review templates; +- feedback path into memory, skills, tests, docs, tools, and workflows. 
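
Keeping requirement maturity separate from task implementation state, with a readiness gate that blocks premature engineering, can be sketched as a small check. The state names are copied from the maturity list earlier in this document. The `ReadinessInput` shape and the `canMoveToReady` helper are hypothetical, and the boolean exception flag is only one assumed way to record an approved no-discovery/no-BRD decision.

```typescript
// State names are from the "Requirement Maturity States" list.
const MATURITY_STATES = [
  "raw_intake",
  "classified",
  "ambiguity_scored",
  "discovery_required",
  "interview_planned",
  "interview_in_progress",
  "source_evidence_captured",
  "requirements_drafted",
  "workflow_modeled",
  "acceptance_criteria_drafted",
  "brd_review",
  "product_signoff",
  "architecture_ready",
  "implementation_ready",
] as const;

type RequirementMaturity = (typeof MATURITY_STATES)[number];

// Hypothetical input shape: the exception flag is an assumption.
type ReadinessInput = {
  maturity: RequirementMaturity;
  approvedNoDiscoveryException: boolean;
};

// A work item may move to `ready` only when the requirement is
// implementation_ready or an explicit exception has been approved.
function canMoveToReady(input: ReadinessInput): boolean {
  return input.maturity === "implementation_ready" || input.approvedNoDiscoveryException;
}
```

Because the gate reads only the maturity state and the exception flag, the work item status can change independently without weakening the rule.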
+ +## Lifecycle Signals + +Add these signal families to the Work OS: + +| Signal family | Examples | +|---|---| +| Requirement maturity | `RequirementReceived`, `AmbiguityDetected`, `DiscoveryRequired`, `RequirementsDrafted`, `WorkflowModeled`, `ImplementationReady` | +| Interview | `InterviewRequested`, `InterviewStarted`, `CustomerAnswerRecorded`, `ClarificationQuestionOpened`, `InterviewCompleted` | +| BRD/Product | `BrdDraftCreated`, `BrdReviewRequested`, `BrdApproved`, `BrdRejected`, `ProductSignoffRecorded` | +| Architecture readiness | `CaCreated`, `ArchitectureApproved`, `ArchitectureRejected`, `AdrRequired` | +| Decomposition | `FeatureDecomposed`, `RequiredHatsComputed`, `QaPlanCreated`, `ReleasePlanCreated` | +| Learning | `DiscoveryGapFound`, `CustomerOutcomeVerified`, `ProjectSkillRequested`, `MemoryAdaptationRequested` | + +## UI Requirements + +The UI should show an ambiguous requirement becoming a feature. + +Needed views: + +- Requirement maturity board; +- discovery queue; +- interview workspace; +- question and answer evidence panel; +- assumptions and open questions panel; +- workflow model viewer; +- BRD review and product signoff queue; +- architecture readiness queue; +- implementation readiness checklist; +- artifact lineage from original request to release; +- outcome review and learning panel. + +Humans should be able to see not just that a task is in progress, but whether the Organization actually understood the request before building. diff --git a/docs/agentic-organization/ANTI_STALL_PRIORITY_RUNTIME.md b/docs/agentic-organization/ANTI_STALL_PRIORITY_RUNTIME.md new file mode 100644 index 0000000000..ef111df51c --- /dev/null +++ b/docs/agentic-organization/ANTI_STALL_PRIORITY_RUNTIME.md @@ -0,0 +1,373 @@ +# Anti-Stall Prioritization Runtime + +The Organization should be designed to keep moving. Blockers should not create stale pauses. 
They should become actively managed work with owners, deadlines, alternate lanes, escalation paths, and reconciliation loops while other useful work continues. + +This document defines the anti-stall operating model: the prioritization, blocker resolution, capacity balancing, and progress monitoring routines that directors, TPMs, Engineering Managers, QA leaders, Security leaders, Operations hats, and executive hats run to keep the Organization always moving. + +The scheduler and event runtime can wake these routines and prepare evidence, but the work belongs to the Organization's hats. There should not be a hidden management process making priority decisions outside the organizational chain of command. + +## Operating Principle + +The Organization should treat every stalled state as a signal that creates new work. + +```text +blocked work + -> classify blocker + -> assign blocker owner + -> route resolution work + -> keep unblocked related work moving + -> reconcile when blocker clears + -> replan affected work + -> record learning +``` + +The goal is not to pretend blockers never happen. The goal is that blockers never sit unnoticed and never stop unrelated progress. + +## Movement Invariants + +- Every active initiative must have at least one next executable item or an explicit paused decision. +- Every blocked work item must have a blocker owner, blocker type, unblock strategy, target resolution time, and escalation path. +- Every agent with an active hat should either have assigned work, review work, discovery work, operations work, or be released to save cost. +- Every queue should have an SLO for first response, assignment latency, review latency, and stale-state detection. +- Every scarce hat should have a visible supply queue and reprioritization policy. +- Every dependency should have a fallback plan: parallelizable work, spike, clarification, alternate task, test work, documentation work, or capability request. 
+- Every stale state should emit a signal and create a management action. + +## Blocker Taxonomy + +Blockers need typed routing. A generic `blocked` state is too weak. + +| Blocker type | Owner hats | Resolution path | +|---|---|---| +| Requirements unclear | Requirement Clarifier, Product Owner, Business Analyst | Open clarification thread, customer interview, BRD update, acceptance criteria revision | +| Customer unavailable | Product Owner, Customer Interviewer, TPM | Schedule follow-up, create async questionnaire, proceed with approved assumption, escalate priority decision | +| Missing BRD/product signoff | Product Owner, BRD Reviewer, Business Approver | Route to BRD review queue and product signoff gate | +| Architecture missing | Architect, Architecture Reviewer, Chief Architect | Create CA/ADR/design task, run architecture meeting, approve or reject design | +| Security/credential blocked | Security Reviewer, Credential Scope Approver, Policy Engineer | Security review, credential scope decision, proxy endpoint request, policy change | +| Hat supply exhausted | Engineering Manager, TPM, Director, Cost Controller | Reprioritize, reserve later, release lower-priority hats, request more supply, queue work | +| Reviewer unavailable | Engineering Manager, Review Coordinator, Director | Reassign reviewer, escalate queue saturation, provision more reviewer hats | +| QA unavailable | QA Director, QA Engineering Manager, TPM | Reprioritize QA queue, assign regression verifier, schedule QA run, provision QA hats | +| Environment/runtime issue | Platform Operator, SRE, Oz/K3s Reconciler | Open operations task, incident, self-healing plan, rerun or rebind session | +| Build/test/pipeline failure | DevOps Analyst, Engineering Manager, Implementer | Classify failure, route to owner, create defect or infra task | +| Dependency not complete | TPM, Dependency Manager, Engineering Manager | Re-sequence, split work, parallelize unaffected tasks, escalate dependency | +| 
Budget exceeded | Cost Controller, CFO, Director, Executive Board | Pause lower-value work, adjust budget, shrink scope, approve exception | +| Memory/context missing | Memory Curator, Knowledge Router, Engineering Manager | Attach context, create memory adaptation request, project skill request | +| Tool/capability missing | Capability Request Owner, Tool Registry Steward, Architect, Security | Create capability request, route approvals, assign implementation | +| Release blocked | Delivery Reviewer, Release Manager, QA, Security, Architecture | Identify missing evidence/gate, route to responsible hat, re-evaluate release scope | + +## Hat-Owned Operating Cadences + +Anti-stall behavior should be part of the Organization's normal operating rhythm. Durable schedules and event triggers create agenda items, reports, inbox tasks, and meeting requests for the right hats. The hats then decide priority, assignment, escalation, alternate work, or explicit pause. + +| Cadence or routine | Owner hats | Responsibility | +|---|---|---| +| Initiative movement review | TPM, Mission Control Lead, Program Director | Review active initiative flow, blocked tasks, dependency drift, team utilization, release risk, and next executable work | +| Department priority review | Department Director, TPM Manager, Engineering Manager or department manager | Rebalance department initiatives, resolve hat scarcity, move blocked work to the right owner, and escalate cross-department conflicts | +| Engineering execution review | Engineering Manager, Team Lead, Readiness Reviewer | Inspect ready queue, blocked implementation, TDD evidence, missing context, silent assignments, and alternate work options | +| Review queue review | Engineering Manager, Review Coordinator, Architecture Reviewer, Security Reviewer, QA Reviewer, Delivery Reviewer | Detect review saturation, assign reviewers, escalate missing evidence, and request more reviewer hats when needed | +| QA flow review | QA Director, QA 
Engineering Manager, Regression Scheduler | Prioritize QA-ready work, schedule regression runs, route reproducible failures, and identify test tooling gaps | +| Security and credential review | Security Director, Security Reviewer, Credential Scope Approver, Policy Engineer | Triage credential/tool/policy blockers and decide whether to approve, reject, request architecture input, or open capability work | +| Release readiness review | Release Manager, Delivery Reviewer, TPM, QA Reviewer | Review release candidates, missing gates, rollback plans, post-release verification, and risk tradeoffs | +| Hat supply and budget review | Director, Cost Controller, CFO, Executive Board when needed | Decide whether to reserve, release, expand, or preempt hat capacity based on priority and budget | +| Blocker triage meeting | TPM, Blocker Manager, owning department manager | Classify blockers, assign blocker owners, choose alternate work, and set escalation deadlines | +| Executive movement review | CEO, CTO, COO, CFO, Executive Board | Resolve cross-department priority conflicts, major bottlenecks, budget exceptions, and high-risk pauses | + +The always-on substrate only does mechanical support: + +- detect stale states and queue SLO violations; +- prepare movement reports; +- create scheduled review tasks; +- populate role-specific inboxes; +- open meeting requests under policy; +- compute candidate priority and assignment recommendations; +- emit signals; +- execute approved state transitions. + +The decision is still made by a hat with authority. + +## Prioritization Routines + +Prioritization should be continuous, but it should be conducted through scheduled and event-triggered organizational routines. The platform can compute recommendations; directors, TPMs, managers, reviewers, and executives decide according to their scope. 
+ +### Priority Inputs + +- executive priority; +- project priority; +- initiative priority; +- customer impact; +- revenue or strategic value; +- severity; +- release risk; +- blocked downstream work; +- dependency fan-out; +- queue age; +- hat scarcity; +- budget burn; +- SLO/error budget burn; +- defect reproducibility; +- QA bounce-back count; +- security risk; +- capability unlock value; +- confidence in requirements; +- estimated effort; +- expected learning value. + +### Priority Output + +```ts +type PriorityDecision = { + workItemId: string; + previousRank: number; + newRank: number; + priorityClass: "expedite" | "high" | "normal" | "defer" | "paused"; + reasonCodes: string[]; + requiredHats: string[]; + preemptableAssignments: string[]; + alternateWorkItems: string[]; + blockerResolutionPlanId?: string; + decidedBy: "tpm" | "engineering_manager" | "department_director" | "review_hat" | "agent_vote" | "executive" | "incident_commander" | "approved_policy"; + expiresAt?: string; +}; +``` + +Priority decisions should be explainable in the UI. Agents need to know why work moved up or down. + +The system may generate `PriorityRecommendation` records, but those are not final decisions unless an approved policy explicitly allows automatic action for that scope. + +## Blocker Lifecycle + +```text +blocker_detected + -> blocker_classified + -> owner_hat_recommended + -> blocker_triage_task_created + -> owner_hat_assigned_by_TPM_or_manager + -> resolution_plan_created + -> alternate_work_approved + -> resolution_in_progress + -> blocker_resolved + -> dependent_work_reactivated + -> outcome_review +``` + +Every blocker should create: + +- owner hat assignment; +- triage task for the responsible TPM, manager, or director; +- blocker resolution work item; +- affected dependency list; +- alternate work recommendation and approval; +- expected resolution time; +- escalation policy and accountable hat; +- stale timeout; +- outcome note after resolution. 
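
The blocker lifecycle and its required records can be sketched as a forward-only state machine that refuses to advance when a required record is missing. The state names are taken from the lifecycle above. The `Blocker` fields and the `advanceBlocker` helper are hypothetical, and the real lifecycle may allow transitions this sketch does not model, such as reclassification.

```typescript
// State names are from the "Blocker Lifecycle" flow.
const BLOCKER_STATES = [
  "blocker_detected",
  "blocker_classified",
  "owner_hat_recommended",
  "blocker_triage_task_created",
  "owner_hat_assigned_by_TPM_or_manager",
  "resolution_plan_created",
  "alternate_work_approved",
  "resolution_in_progress",
  "blocker_resolved",
  "dependent_work_reactivated",
  "outcome_review",
] as const;

type BlockerState = (typeof BLOCKER_STATES)[number];

// Hypothetical record: field names are assumptions for illustration.
type Blocker = {
  id: string;
  state: BlockerState;
  ownerHatAssignmentId?: string;
  resolutionPlanId?: string;
};

// Moves a blocker one step forward, enforcing that an owner and a
// resolution plan exist before the states that claim them.
function advanceBlocker(blocker: Blocker): Blocker {
  const index = BLOCKER_STATES.indexOf(blocker.state);
  if (index === BLOCKER_STATES.length - 1) {
    throw new Error(`blocker ${blocker.id} is already in outcome_review`);
  }
  const next = BLOCKER_STATES[index + 1] as BlockerState;
  if (next === "owner_hat_assigned_by_TPM_or_manager" && !blocker.ownerHatAssignmentId) {
    throw new Error(`blocker ${blocker.id} has no owner hat assignment`);
  }
  if (next === "resolution_plan_created" && !blocker.resolutionPlanId) {
    throw new Error(`blocker ${blocker.id} has no resolution plan`);
  }
  return { ...blocker, state: next };
}
```

Throwing on a missing owner keeps the assignment a decision made by a TPM or manager: the function only executes a transition that has already been backed by a record.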
+ +## Alternate Work Strategy + +When work is blocked, the responsible TPM, Engineering Manager, or department manager should keep agents productive by assigning approved alternate work. + +Alternate work types: + +- implement independent subtasks; +- write tests or test harnesses; +- improve docs or project skills; +- prepare QA cases; +- perform architecture spike; +- perform discovery or clarification; +- work on adjacent backlog items in the same initiative; +- reduce tech debt in approved scope; +- investigate observability gaps; +- resolve capability request dependencies; +- review or QA other work in the same project if the active hat allows it. + +Guardrails: + +- Alternate work must be tied to a work item. +- Alternate work must not bypass priority policy. +- Alternate work must not silently expand scope. +- When the original blocker clears, the owning TPM or manager decides whether to resume, finish alternate work first, or reassign. The assignment service executes the approved decision. + +## Work Stealing and Reassignment + +The Organization needs controlled reassignment so idle hats can help without chaos. + +Allowed when: + +- original owner is silent past SLA; +- assignment token expired and refresh failed; +- hat supply is scarce and higher-priority work needs capacity; +- work is blocked but another hat can resolve the blocker; +- queue SLO is violated; +- incident policy requires preemption. + +Required checks: + +- active hat has authority for target work; +- work item supports reassignment; +- current owner is notified; +- partial artifacts are preserved; +- run/session state is reconciled; +- decision is audited; +- dependent queues are updated. + +## Queue SLOs + +Every queue should have SLOs. 
+ +| Queue | Example SLO | +|---|---| +| Requirement clarification | first response within policy window; no open question stale beyond threshold | +| BRD review | review assigned quickly; rejection reasons typed | +| Architecture review | architecture-risk work cannot sit unassigned | +| Code review | review queue saturation creates reviewer provisioning or reprioritization | +| QA verification | QA-ready work gets assigned or escalated | +| Security review | credential/tool requests get triaged by risk quickly | +| Delivery review | release candidates cannot sit without missing-evidence signal | +| Blocker resolution | blockers get owner and resolution plan quickly | +| Capability expansion | requests get classified and either accepted, rejected, or deferred | + +SLO violations should emit signals such as: + +- `QueueSloViolated` +- `ReviewQueueSaturated` +- `BlockerOwnerMissing` +- `BlockedWorkStale` +- `AssignmentSilent` +- `HatSupplyBottleneck` +- `AlternateWorkAssigned` +- `DependencyCleared` +- `WorkReactivated` + +## Continuous Reconciliation + +Reconciliation should compare intended state to actual state and route discrepancies to the owning hats. 
+ +| Intended state | Actual check | +|---|---| +| Assigned work has active agent | hat assignment exists, token valid, session heartbeat present | +| Ready work has required hats | hat supply reserved or waiting queue visible | +| Blocked work has owner | blocker work item and owner hat exist | +| Review requested | reviewer assigned and review SLO active | +| QA requested | QA assignment exists and evidence plan is attached | +| Release candidate open | missing gates/evidence are explicit | +| Oz run active | run bound to work item and heartbeating | +| Department active | at least one next action, scheduled review, or explicit pause exists | +| Project active | initiatives are moving, blocked, or intentionally paused with reason | + +If intended and actual state diverge, the system creates a report, inbox item, or meeting request for the responsible hat. Safe mechanical repairs can be automated by approved policy, but priority, staffing, and pause/resume decisions stay with organizational roles. + +## Prioritization Meetings and Votes + +Routine priority should be handled by the appropriate hats through scheduled reviews and triggered inbox tasks. Meetings and votes occur when a decision crosses role boundaries or policy says judgment is required. + +Trigger meetings for: + +- cross-department priority conflict; +- high-value work blocked by scarce hats; +- release risk versus feature priority; +- budget pressure; +- conflicting product/architecture/security decisions; +- repeated queue SLO violations; +- major scope changes caused by discovery. + +Meeting output should be a durable decision, not just discussion: + +- priority change; +- budget change; +- hat supply adjustment; +- blocker owner assignment; +- accepted assumption; +- scope reduction; +- deferred work; +- new capability request; +- escalation to Executive Board. + +## UI Requirements + +The UI should make movement visible. 
+ +Needed views: + +- anti-stall command center; +- blocked work board; +- blocker dependency graph; +- queue SLO dashboard; +- hat supply bottleneck view; +- stale assignment view; +- alternate work recommendations; +- priority decision history; +- dependency cleared/reactivation feed; +- preemption and reassignment audit; +- project/initiative movement score. + +The UI should answer: + +- what is blocked? +- who owns the blocker? +- what is happening while it is blocked? +- what will unblock it? +- when will it escalate? +- what work is still moving? +- which hats are scarce? +- what should be reprioritized now? + +## Movement Score + +Each project, initiative, department, and team should have a movement score. + +Inputs: + +- percentage of work with active next action; +- stale state count; +- blocked work age; +- queue SLO violations; +- review/QA/security lag; +- hat supply bottlenecks; +- silent runs; +- dependency fan-out blocked; +- outcome review failure rate; +- release readiness drift. + +The movement score should not become a vanity metric. It should trigger concrete actions: + +- director review; +- TPM reprioritization; +- Engineering Manager staffing changes; +- hat supply request; +- capability request; +- queue policy update; +- executive prioritization vote. 
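
To keep the movement score tied to actions rather than display, the score and its follow-ups can be derived from the same inputs. The sketch below is illustrative only: the field names, the weights, the 50-point threshold, and the `movementScore` and `recommendActions` helpers are all assumptions, not part of this design.

```ts
// Hypothetical input shape; fields loosely mirror the input list above.
type MovementInputs = {
  activeNextActionRatio: number; // 0..1, share of work with an active next action
  staleStateCount: number;
  queueSloViolations: number;
  hatSupplyBottlenecks: number;
  silentRuns: number;
};

type MovementAction =
  | "director_review"
  | "tpm_reprioritization"
  | "hat_supply_request"
  | "queue_policy_update";

// Illustrative weights only; real weights would be owned and tuned by the responsible hats.
function movementScore(i: MovementInputs): number {
  const penalty =
    i.staleStateCount * 2 +
    i.queueSloViolations * 3 +
    i.hatSupplyBottlenecks * 4 +
    i.silentRuns * 5;
  const raw = i.activeNextActionRatio * 100 - penalty;
  return Math.max(0, Math.min(100, raw)); // clamp to 0..100
}

// Each degraded input maps to a concrete follow-up, so the score never stands alone.
function recommendActions(i: MovementInputs): MovementAction[] {
  const actions: MovementAction[] = [];
  if (i.queueSloViolations > 0) actions.push("queue_policy_update");
  if (i.hatSupplyBottlenecks > 0) actions.push("hat_supply_request");
  if (i.staleStateCount > 0 || i.silentRuns > 0) actions.push("tpm_reprioritization");
  if (movementScore(i) < 50) actions.push("director_review"); // assumed threshold
  return actions;
}

const example: MovementInputs = {
  activeNextActionRatio: 0.75,
  staleStateCount: 3,
  queueSloViolations: 2,
  hatSupplyBottlenecks: 1,
  silentRuns: 0,
};
```

With these assumed weights, `movementScore(example)` is 59 and `recommendActions(example)` returns the queue policy, hat supply, and TPM reprioritization actions without a director review.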
+ +## MVP Slice + +Add anti-stall behavior to the first Work OS slice: + +```text +task becomes blocked by missing acceptance criteria + -> stale/blocker signal creates TPM blocker triage task + -> TPM classifies requirements blocker and assigns Requirement Clarifier hat + -> Engineering Manager approves alternate task for implementer + -> Product/BA clarifies acceptance criteria + -> blocker resolves + -> TPM reactivates task + -> implementer resumes or new agent is assigned + -> outcome review records whether delay was handled well +``` + +This proves: + +- blockers are typed; +- blockers get owners; +- blocked work produces useful alternate work; +- queues do not silently stall; +- agents are not left idle with active hats; +- dependent work reactivates automatically. + +## Guardrails + +- Do not optimize for motion over correctness. Some work should pause, but pauses must be explicit and owned. +- Do not preempt high-context work casually. Preserve artifacts and handoff state before reassignment. +- Do not use alternate work to hide systemic blockers. Repeated blockers become improvement work. +- Do not let automation override hard policy gates. +- Do not let idle agents keep costly hats without assigned useful work. +- Do not let blocked work disappear from executive, director, TPM, or manager visibility. diff --git a/docs/agentic-organization/CLUSTER_EXECUTION_AND_MEMORY_SUBSTRATE.md b/docs/agentic-organization/CLUSTER_EXECUTION_AND_MEMORY_SUBSTRATE.md new file mode 100644 index 0000000000..9d14ec4ea4 --- /dev/null +++ b/docs/agentic-organization/CLUSTER_EXECUTION_AND_MEMORY_SUBSTRATE.md @@ -0,0 +1,359 @@ +# Cluster Execution and Memory Substrate + +This document captures the theoretical cluster substrate for the Hermes Organization. 
It focuses on how Hermes agents run in k3s as isolated session containers, how hats and policies bind to workloads, how credential proxies and Cilium service mesh boundaries protect access, how SPIRE supplies workload identity, and how Hindsight provides persistent memory. + +It intentionally avoids deployment YAML details. + +## Core Idea + +The Organization runs on the cluster. + +```text +Organization Work OS + -> approves work, hats, tools, memory scope, credential scope + -> Oz/Warp launches Hermes session + -> OpenZiti supplies private transport/connectivity where required + -> k3s schedules Docker session container + -> sandbox boundary constrains execution + -> Cilium enforces CNI, L7 policy, Gateway API, ingress, Hubble telemetry + -> SPIRE provides workload identity + -> cert-manager / Vault / Trust Manager / External Secrets provide TLS, secrets, trust bundles, and secret sync + -> Credential Proxy mediates protected access + -> Hindsight provides persistent Hermes memory + -> NATS carries events, inboxes, reports, and status + -> Organization records state, signals, audit, and evidence +``` + +The app-level Organization is the business brain. The cluster is the execution and enforcement body. + +## Session Container Model + +Each Hermes work session should run in a Docker container scheduled by k3s. + +The container should have: + +- Hermes runtime; +- active Organization assignment context; +- hat token and refresh path; +- MCP Gateway endpoint; +- Hindsight memory endpoint; +- Credential Proxy endpoint; +- NATS endpoint for inbox/status events; +- work item/run identifiers; +- trace/correlation identifiers; +- sandbox policy; +- service account and mesh identity. + +The simplest starting point is one primary Hermes agent per container. Multi-agent containers can come later only if isolation, attribution, memory scope, and tool authorization remain clear. 
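
The container contract above can be modeled as a typed launch context that a launcher checks before starting Hermes. This is a minimal sketch under stated assumptions: the `SessionContext` shape, its field names, and the `missingSessionFields` helper are hypothetical and cover only part of the list above.

```ts
// Hypothetical launch context for one Hermes session container.
type SessionContext = {
  agentId: string;
  hatAssignmentId: string;
  workItemId: string;
  ozRunId: string;
  traceId: string;
  serviceAccount: string;
  endpoints: {
    mcpGateway: string;
    hindsight: string;
    credentialProxy: string;
    nats: string;
  };
};

type IdKey = Exclude<keyof SessionContext, "endpoints">;
type EndpointKey = keyof SessionContext["endpoints"];

// A context that may still be incomplete at validation time.
type PartialSessionContext = Partial<Pick<SessionContext, IdKey>> & {
  endpoints?: Partial<SessionContext["endpoints"]>;
};

const REQUIRED_IDS: IdKey[] = [
  "agentId", "hatAssignmentId", "workItemId", "ozRunId", "traceId", "serviceAccount",
];
const REQUIRED_ENDPOINTS: EndpointKey[] = [
  "mcpGateway", "hindsight", "credentialProxy", "nats",
];

// Returns the dotted path of every missing or empty value, in declaration order.
function missingSessionFields(ctx: PartialSessionContext): string[] {
  const missing: string[] = [];
  for (const key of REQUIRED_IDS) {
    if (!ctx[key]) missing.push(key);
  }
  for (const key of REQUIRED_ENDPOINTS) {
    if (!ctx.endpoints || !ctx.endpoints[key]) missing.push("endpoints." + key);
  }
  return missing;
}
```

A launcher could refuse to start the session whenever `missingSessionFields` returns a non-empty list, which keeps an incomplete assignment context from ever reaching the agent process.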
+ +## Bubblewrapped Sandbox Boundary + +The Hermes process inside the container should run inside a second sandbox boundary. + +The purpose of bubblewrap-style isolation is to constrain what the agent process can touch even inside its container. + +Sandbox expectations: + +- scoped filesystem view; +- explicit workspace mount; +- no uncontrolled host access; +- no raw broad secrets; +- controlled process execution; +- controlled network egress; +- clear artifact export path; +- clear log and trace capture path; +- kill/revoke path when hat assignment expires or is revoked. + +The sandbox does not replace Kubernetes, Cilium, SPIRE, OPA, or the Credential Proxy. It adds process-level defense-in-depth for agent sessions. + +## Credential Proxy Boundary + +Hermes agents should not receive raw broad credentials. + +The Credential Proxy should: + +- expose scoped tool/API endpoints; +- validate active hat assignment; +- validate Organization policy; +- validate mesh workload identity where possible; +- record credential use; +- deny access when hat token expires; +- deny access when scope does not match work item/project/team; +- emit audit events and denial signals; +- support Security-reviewed endpoint expansion. + +Credential proxy access should be tied to: + +- agent ID; +- session ID; +- hat assignment ID; +- work item ID; +- project/initiative scope; +- credential scope; +- service account/workload identity; +- trace ID. + +## Hindsight Memory Substrate + +Hindsight is the persistent memory substrate for Hermes. 
+ +From the provided cluster context: + +- Hindsight is `vectorize-io/hindsight`; +- the Helm chart is published as an OCI chart at `ghcr.io/vectorize-io/charts/hindsight`; +- target chart version is `0.3.0`; +- Hermes has a Hindsight integration; +- Hermes can use Hindsight as an external memory provider; +- Hindsight automatically recalls relevant context before LLM calls; +- Hindsight retains conversations; +- Hindsight exposes explicit retain, recall, and reflect tools; +- Hermes should point at the in-cluster Hindsight service through `HINDSIGHT_URL=http://hindsight.hindsight.svc.cluster.local`. + +The design implication: we should treat Hindsight as real runtime infrastructure, not a placeholder memory adapter. + +## Memory Persistence Rules + +Memory is precious state. + +Rules: + +- do not prune memory store by default; +- use persistent storage; +- treat memory deletion or migration as a reviewed operation; +- back memory with a reliable database; +- start with bundled Postgres if needed; +- plan migration to external CockroachDB once healthy and supported by the Hindsight deployment shape; +- keep API keys in Vault-backed secrets, not Git; +- record memory reads and writes in Organization audit; +- attribute memory writes by active hat assignment; +- enforce scoped recall through Organization memory policy. + +The initial chart context uses bundled Postgres with persistent storage. That is acceptable for Hindsight bootstrap, but the Organization-owned source of truth is CockroachDB, and long-term memory storage should move to external CockroachDB if Hindsight supports that deployment path. + +## Hindsight and Hat-Scoped Memory + +Hermes may recall context automatically before every LLM call, but the Organization still needs memory governance. 
+ +The memory adapter should enforce: + +- active hat scope; +- project scope; +- initiative/work item scope; +- team scope; +- meeting scope; +- credential/security sensitivity; +- memory visibility policy; +- sticky attribution after hat release; +- audit of retain/recall/reflect operations. + +Recommended flow: + +```text +Hermes prepares LLM call + -> Hindsight integration requests relevant memory + -> Organization memory adapter injects active context and policy scope + -> Hindsight returns scoped recall + -> Hermes reasons with recalled context + -> retain/reflect writes are attributed to agent + hat assignment + work scope + -> Organization records memory event and signal +``` + +If native Hindsight integration cannot enforce this metadata and policy boundary directly, wrap it. Fork only if the wrapper cannot guarantee scoped recall and attribution. + +## Cluster Security and Service Mesh + +Cilium Service Mesh is the service mesh and Gateway layer. + +Cilium should provide: + +- CNI; +- L7-aware policy; +- Envoy-backed L7 proxy support managed through Cilium; +- service-to-service authorization; +- traffic routing; +- traffic shifting; +- Gateway API; +- ingress; +- egress policy; +- BPF masquerade support; +- Hubble telemetry; +- node-to-node encryption, such as WireGuard when enabled; +- access boundary around Credential Proxy, MCP Gateway, Hindsight, and NATS. + +The cluster direction is sidecarless for service mesh behavior. Cilium provides L7-aware policy, mTLS-capable service mesh behavior, traffic shifting, Gateway API, and ingress natively through the CNI layer rather than injecting an Envoy sidecar into every Hermes pod. + +SPIRE should provide workload identity. Trust Manager should distribute CA bundles. cert-manager should manage TLS certificates. Vault should act as the secrets backend. External Secrets Operator should sync approved Vault secrets into Kubernetes Secrets. 
+ +This lets the cluster inject dependencies around Hermes containers without handing agents uncontrolled network access. + +The Organization should be able to say: + +```text +this session may talk to: + - MCP Gateway + - Hindsight + - NATS + - Credential Proxy endpoint X + - repo/project service Y +and nothing else unless policy changes +``` + +## Secrets and External Configuration + +Secrets should be externalized. + +Rules: + +- no plaintext API keys in Git; +- use Vault-backed ExternalSecret or equivalent; +- LLM provider secrets for Hindsight should be secret references; +- Credential Proxy secrets should stay behind proxy services; +- Hermes session containers should receive references and scoped endpoints, not broad credentials; +- secret access should be auditable by hat assignment and workload identity. + +The Hindsight cluster context uses a provider configuration like Groq with an existing secret reference. The specific provider can change, but the pattern should remain. + +## Bootstrap Dependency Order + +The k3s bootstrap order matters because each layer enables the next. + +The theoretical dependency sequence is: + +1. Cilium: CNI, Hubble, Service Mesh, BPF masquerade, Gateway API, ingress, and encryption. +2. cert-manager: TLS issuance for Vault and later services. +3. Vault: secrets backend. +4. SPIRE: workload identity, with Vault upstream authority once Vault is healthy. +5. Trust Manager: CA bundle distribution from SPIRE/Vault trust roots. +6. External Secrets Operator: Vault-to-Kubernetes Secret synchronization. +7. ArgoCD: reconciles everything else after the platform security substrate is ready. + +The Organization does not need to own these manifests directly, but its runtime assumptions should follow this order. + +## Runtime Identity and Hat Binding + +Cluster execution must line up with hat assignment. 
+ +Each session should be able to prove: + +- Kubernetes namespace; +- pod; +- container; +- service account; +- mesh identity; +- Oz run ID; +- Organization session ID; +- agent ID; +- active hat assignment ID; +- work item ID; +- project/initiative scope; +- token status. + +The MCP Gateway, Credential Proxy, Hindsight adapter, and NATS consumers should not trust only self-reported agent context. They should resolve runtime context through Organization state, actor/session state, and cluster identity. + +## Events and Observability + +Every runtime transition should be observable. + +Required event streams: + +- Oz run requested/started/completed/failed/cancelled; +- pod scheduled/ready/not ready/terminated; +- sandbox started/stopped/denied; +- hat binding activated/refreshed/revoked/released; +- MCP tool call allowed/denied; +- credential proxy request allowed/denied; +- Hindsight recall/retain/reflect events; +- NATS publish/consume/dead-letter; +- artifact produced; +- log/trace/screenshot linked to work item. + +These should roll up into: + +- Organization signal feed; +- audit events; +- UI evidence timeline; +- Loki logs; +- metrics/SLOs; +- NATS streams; +- incident reports when needed. + +## Cluster-Native Hat Integration + +The cluster-native hat system should connect here. + +`HatBinding` should map active Organization hat assignments to cluster-visible runtime state. OPA policies should validate: + +- correct hat for workload; +- allowed service account; +- allowed namespace; +- allowed Credential Proxy scope; +- allowed memory scope; +- no conflicting hat binding; +- TTL/warmup/cooldown; +- supervisor/quorum rules where relevant. + +The Organization still decides who should wear a hat. The cluster enforces that the workload actually behaves like the approved hat. 
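
In the design above these checks belong in OPA policy. The TypeScript below only models the decision logic for a few of them (TTL, namespace, service account, Credential Proxy scope) so the expected behavior is concrete. The `BindingView`, `AccessRequest`, and `evaluateBinding` names and the reason strings are assumptions made for illustration.

```ts
// Hypothetical, flattened view of an active hat binding as seen by an enforcement point.
type BindingView = {
  hatRef: string;
  namespace: string;
  serviceAccount: string;
  credentialScopes: string[];
  expiresAt: string; // ISO-8601 timestamp
};

// What the workload presents; identity comes from the cluster, not from the agent's own claims.
type AccessRequest = {
  namespace: string;
  serviceAccount: string;
  credentialScope?: string;
  at: Date;
};

type Decision = { allowed: true } | { allowed: false; reasons: string[] };

// Collects every failed check instead of stopping at the first, so denials are explainable.
function evaluateBinding(binding: BindingView, req: AccessRequest): Decision {
  const reasons: string[] = [];
  if (req.at.getTime() >= Date.parse(binding.expiresAt)) reasons.push("binding_expired");
  if (req.namespace !== binding.namespace) reasons.push("namespace_mismatch");
  if (req.serviceAccount !== binding.serviceAccount) reasons.push("service_account_mismatch");
  if (
    req.credentialScope !== undefined &&
    binding.credentialScopes.indexOf(req.credentialScope) === -1
  ) {
    reasons.push("credential_scope_not_allowed");
  }
  return reasons.length === 0 ? { allowed: true } : { allowed: false, reasons };
}
```

Returning all reasons rather than a bare boolean matches the requirement that denials become signals and audit events that the owning hat can act on.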
+ +## NATS and Status Flow + +NATS should carry: + +- session status; +- task status; +- inbox messages; +- reports; +- HatSwap events; +- memory events; +- credential denial events; +- runtime health events; +- UI live update events. + +NATS is not the source of truth. It is the event transport and live update layer. Organization DB and audit events remain authoritative. + +## Failure Modes + +The substrate should detect and route: + +- Hermes session silent; +- pod scheduled but no session heartbeat; +- Oz run started but no Organization binding; +- Hindsight unavailable or slow; +- memory database storage pressure; +- Credential Proxy denied unexpectedly; +- Credential Proxy allowed unexpectedly; +- hat token expired but process still running; +- sandbox violation; +- NATS stream lag or dead letter; +- Cilium policy or Gateway API mismatch; +- SPIRE workload identity mismatch; +- Trust Manager CA bundle mismatch; +- secret missing or stale; +- memory write without hat attribution. + +Each failure should create an Organization signal and route to the appropriate hat-owned routine: Platform Operator, SRE, Memory Manager, Security Reviewer, TPM, Engineering Manager, or Director. + +## MVP Contract + +The first cluster-backed MVP should prove: + +- Organization creates work and hat assignment; +- Oz launches a Hermes session container; +- session has scoped MCP, Hindsight, NATS, and Credential Proxy endpoints; +- memory recall works through Hindsight; +- memory write is attributed to agent, hat, project, and work item; +- credential access is denied without approved scope; +- session logs, traces, and artifacts link back to the work item; +- hat expiry or revocation removes tool and credential authority; +- runtime status appears in the UI. + +## Open Decisions + +- Do we start with bubblewrap inside each session container, or model the contract first and add it after baseline container execution? 
+- Is Hindsight accessed directly by Hermes, through an Organization memory adapter, or via a sidecar/proxy that injects scope? +- What is the first storage backend for Hindsight in local cluster bootstrap? +- When do we swap Hindsight from bundled Postgres to external CockroachDB? +- Which services require Cilium L7 policy from day one? +- When should Cilium mutual authentication through SPIRE be flipped on? +- Does HatBinding CRD enforcement ship before or after v0 Organization assignment? +- How do we handle multi-agent containers without mixing memory and authority attribution? +- Which runtime events are mandatory before a session can be considered production-safe? diff --git a/docs/agentic-organization/CLUSTER_NATIVE_HAT_SYSTEM.md b/docs/agentic-organization/CLUSTER_NATIVE_HAT_SYSTEM.md new file mode 100644 index 0000000000..508ebcbf6e --- /dev/null +++ b/docs/agentic-organization/CLUSTER_NATIVE_HAT_SYSTEM.md @@ -0,0 +1,416 @@ +# Cluster-Native Hat System + +This document captures the theoretical design for a Kubernetes-native hat system. It focuses on CRDs, OPA policies, graph enforcement, time-bounded hat bindings, succession, reputation, events, and observability. It intentionally avoids deployment YAML details. + +The goal is not to replace the Organization Work OS. The goal is to give hats a cluster-native control-plane representation so runtime workloads, policies, and observability can reason about roles consistently across distributed Hermes sessions. + +## Core Idea + +Hats are persistent roles. Agents wear hats temporarily. + +```text +Hat persists + -> wearer binds for a scoped duration + -> wearer acts under hat authority + -> binding expires, cools down, or is returned + -> next wearer may inherit the hat context + -> reputation accumulates on the hat and the agent-hat pairing +``` + +This matters because a hat is not just a person or agent label. 
It is: + +- skills; +- OPA/RBAC authority; +- tool access; +- credential scope; +- memory scope; +- supervisor graph position; +- quorum/voting scope; +- cooldown and warmup rules; +- succession rules; +- reputation and performance history. + +The chosen-and-returnable hat model prevents roles from becoming cages. Agents can rotate through hats while the Organization preserves continuity of the role. + +## Relationship to the Organization DB + +The Organization DB remains the business source of truth. + +The cluster-native hat system is an enforcement and runtime projection layer. + +| Concern | Source of truth | +|---|---| +| Project, initiative, work item, gate, release, and business state | Organization DB | +| Hat definition, authority, hierarchy, and lifecycle policy | Organization DB, projected to CRDs | +| Active runtime hat binding for Kubernetes workloads | HatBinding CRD/status and Organization DB assignment | +| Policy evaluation | OPA/RBAC plus Organization policy service | +| Runtime events and observability | CRD status, Kubernetes Events, Loki, NATS, traces | +| Reputation and performance rollup | Organization DB, with operator status projection | + +The CRDs should not become a second business database. They should mirror and enforce the runtime-relevant parts of hats. + +## CRD Concepts + +### Hat + +`Hat` represents the persistent role. 
+ +Important fields: + +```ts +type HatSpec = { + displayName: string; + departmentId: string; + skills: string[]; + authority: { + rbacRoles: string[]; + opaPolicies: string[]; + mcpToolBundles: string[]; + credentialScopes: string[]; + memoryScopes: string[]; + approvalScopes: string[]; + votingScopes: string[]; + }; + supervises: string[]; + conflictsWith: string[]; + quorum?: { + requiredFor: string[]; + quorumSize: number; + }; + lifecycle: { + maxWearers: number; + tokenTtlSeconds: number; + warmupSeconds: number; + cooldownSeconds: number; + stickyAttribution: boolean; + successionPolicy: "rotate" | "renew" | "election" | "director_assigned" | "executive_vote"; + }; +}; +``` + +Important status: + +```ts +type HatStatus = { + activeWearers: string[]; + coolingDownWearers: string[]; + reputationScore: number; + recentSwapId?: string; + policyReady: boolean; + graphReady: boolean; + observedGeneration: number; +}; +``` + +### HatBinding + +`HatBinding` represents an agent wearing a hat for a specific scope and time. + +Important fields: + +```ts +type HatBindingSpec = { + hatRef: string; + wearerRef: string; + assignmentRef: string; + scope: { + projectId?: string; + initiativeId?: string; + workItemId?: string; + teamId?: string; + ozRunId?: string; + namespace?: string; + serviceAccount?: string; + }; + requestedBy: string; + expiresAt: string; + reason: string; +}; +``` + +Important status: + +```ts +type HatBindingStatus = { + phase: "Pending" | "WarmingUp" | "Active" | "CoolingDown" | "Released" | "Expired" | "Revoked" | "Denied"; + tokenIssuedAt?: string; + tokenExpiresAt?: string; + lastRefreshAt?: string; + lastActivityAt?: string; + swapId?: string; + denialReason?: string; +}; +``` + +### HatPolicy + +`HatPolicy` captures global or scoped rules for assignment, succession, quorum, throttling, and graph constraints. 
+ +Policy examples: + +- no supervisor cycles; +- hat designer quorum required; +- sensitive hats require two-person approval; +- executor hats conflict with hat designer hats; +- cooldown before same wearer can retake high-power hat; +- max concurrent wearers by department/project; +- warmup required before approval power activates; +- sticky attribution for memory and reputation. + +### HatSwap + +`HatSwap` is the durable event produced for every binding transition. + +Every state transition should emit exactly one `HatSwap` record. + +Important fields: + +```ts +type HatSwapSpec = { + hatRef: string; + previousWearerRef?: string; + nextWearerRef?: string; + bindingRef: string; + transition: "bind" | "activate" | "refresh" | "cooldown" | "release" | "expire" | "revoke" | "deny"; + reason: string; + organizationCorrelationId: string; + traceId: string; +}; +``` + +This should also produce: + +- Kubernetes Event; +- structured log; +- Loki-visible event; +- NATS publish; +- Organization signal/outbox event. + +## OPA Policy Model + +OPA should enforce graph and authority constraints near the cluster control plane. + +Initial OPA constraints: + +- no supervisor cycles in `spec.supervises`; +- `conflictsWith` hats cannot be held by the same wearer in overlapping scopes; +- high-power hats require quorum or approved issuer; +- hat designer cannot self-approve unrestricted hat creation; +- cooldown must pass before the same wearer retakes a constrained hat; +- warmup must complete before approval/voting authority activates; +- max wearers cannot be exceeded; +- scope must match the requesting work item/team/namespace/service account; +- credential-bearing hats require approved credential policy; +- expired bindings cannot authorize runtime access. + +OPA should block invalid cluster state. The Organization policy service should still explain business denials and enforce application-level transitions. + +## Supervisor Graph + +The supervisor graph is first-class. 
+ +`spec.supervises` forms a directed acyclic graph: + +```text +Executive Board + -> CEO / CTO / COO + -> Department Directors + -> TPMs / Engineering Managers / QA Managers + -> Team Leads / Reviewers / Implementers +``` + +The graph is not just visualization. It controls: + +- who can assign which hats; +- who can escalate to whom; +- who can open which meetings; +- who can review which outcomes; +- which votes are in scope; +- which memory and performance rollups are visible. + +The live graph should be renderable as Graphviz DOT so policy authors can reason in hat-graph language. + +Graph rendering should support: + +- all hats; +- active bindings; +- supervisor edges; +- conflicts; +- quorum-gated hats; +- bottleneck hats; +- cooling-down hats; +- stale or denied bindings. + +## Time-Bounded Hat Binding + +Every binding has time semantics. + +| Concept | Purpose | +|---|---| +| TTL | Forces periodic authorization refresh | +| Warmup | Prevents immediate high-risk authority before context is loaded | +| Cooldown | Prevents rapid recapture or concentration of authority | +| Sticky attribution | Links memory, reputation, and outputs to the hat binding even after release | +| Succession | Defines how the next wearer is chosen | + +This matches the Organization model where agents do not keep role authority forever. Hats can be chosen, returned, rotated, renewed, or reassigned. + +## Reputation Model + +Reputation should accumulate primarily on: + +- the hat; +- the agent-hat pairing; +- the department-hat pairing; +- the project-hat pairing. + +Do not treat reputation as only an agent score. The same agent can be strong in one hat and weak in another. The same hat may need better skills, authority, memory, or review gates even if individual agents are competent. 
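
To make the pairing idea concrete, the sketch below rolls the same events up per hat and per agent-hat pair, so an agent's result in one hat never leaks into its standing in another. The `ReputationEvent` shape, the numeric `delta`, and the `rollupReputation` helper are assumptions for illustration; the design does not define a scoring formula.

```ts
// Hypothetical reputation event: one scored outcome for an agent acting under a hat.
type ReputationEvent = { agentId: string; hatId: string; delta: number };

type ReputationRollup = {
  byHat: { [hatId: string]: number };
  byAgentHat: { [pair: string]: number }; // keyed as "agentId/hatId"
};

// Accumulates deltas on the hat and on the agent-hat pairing, never on the agent alone.
function rollupReputation(events: ReputationEvent[]): ReputationRollup {
  const byHat: { [hatId: string]: number } = {};
  const byAgentHat: { [pair: string]: number } = {};
  for (const e of events) {
    byHat[e.hatId] = (byHat[e.hatId] || 0) + e.delta;
    const pair = e.agentId + "/" + e.hatId;
    byAgentHat[pair] = (byAgentHat[pair] || 0) + e.delta;
  }
  return { byHat, byAgentHat };
}

const sampleEvents: ReputationEvent[] = [
  { agentId: "agent-1", hatId: "code-reviewer", delta: 3 },
  { agentId: "agent-1", hatId: "implementer", delta: -2 },
  { agentId: "agent-2", hatId: "code-reviewer", delta: 1 },
];
```

Here `agent-1` ends up at 3 as `code-reviewer` and -2 as `implementer`, while the `code-reviewer` hat itself accumulates 4 across both wearers.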
+ +Reputation inputs: + +- successful work completion; +- review quality; +- QA bounce-back rate; +- blocker resolution speed; +- policy violations; +- memory quality; +- release outcomes; +- incident outcomes; +- cost/budget efficiency; +- peer or supervisor reviews. + +Reputation outputs: + +- candidate ranking for future hat assignments; +- cooldown or warmup adjustments; +- hat effectiveness review; +- memory adaptation requests; +- project skill recommendations; +- hat redesign proposals. + +## Hat Designer Bootstrap + +The `hat-designer` role must not be a single point of failure. + +Recommended bootstrap policy: + +- `hat-designer` is itself a hat; +- quorum-gated with quorum size 3; +- multiple wearers can hold it; +- conflicts with executor hats for sensitive scopes; +- cooldown applies before the same wearer can retake it; +- new high-power hats require Executive Board approval; +- credential-bearing hats require Security approval; +- runtime/actor/workflow-bearing hats require Architecture approval. + +This lets the Organization expand its own hat graph without letting one wearer define all authority. + +## Operator Responsibilities + +A Kubernetes operator can provide mechanical reconciliation, not organizational judgment. + +Operator responsibilities: + +- reconcile Hat, HatBinding, HatPolicy, and HatSwap resources; +- validate graph readiness; +- update status; +- emit exactly one durable HatSwap per state transition; +- publish Kubernetes Events; +- write structured logs; +- publish NATS events; +- expose graph rendering; +- surface policy denials; +- roll up status for Organization projections. + +The operator should not decide business priority, assign hats because it feels useful, or bypass Organization gates. Those decisions come from Organization services and authorized hats. 
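
Reconcilers usually observe the same state more than once, so the "exactly one HatSwap per transition" responsibility implies some form of idempotency. A minimal sketch, assuming each transition carries a per-binding sequence number: `transitionSeq`, `HatSwapRecord`, and `createSwapLedger` are hypothetical names, and a real operator would persist this state rather than hold it in memory.

```ts
type Phase =
  | "Pending" | "WarmingUp" | "Active" | "CoolingDown"
  | "Released" | "Expired" | "Revoked" | "Denied";

// Hypothetical record; transitionSeq is an assumed per-binding monotonic counter.
type HatSwapRecord = {
  bindingRef: string;
  from: Phase;
  to: Phase;
  transitionSeq: number;
};

function createSwapLedger() {
  const byKey: { [key: string]: HatSwapRecord } = {};
  const records: HatSwapRecord[] = [];
  return {
    // Returns true only the first time a given (binding, sequence) pair is seen;
    // repeated reconciles of the same transition are no-ops.
    record(swap: HatSwapRecord): boolean {
      const key = swap.bindingRef + "#" + swap.transitionSeq;
      if (Object.prototype.hasOwnProperty.call(byKey, key)) return false;
      byKey[key] = swap;
      records.push(swap);
      return true;
    },
    // Returns a copy so callers cannot mutate the ledger.
    all(): HatSwapRecord[] {
      return records.slice();
    },
  };
}
```

Downstream side effects such as the Kubernetes Event, the NATS publish, and the Organization signal would then be emitted only when `record` returns `true`.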
+ +## Event Flow + +```text +Organization approves hat assignment + -> Organization writes HatAssignment + -> Organization projects/updates HatBinding CRD + -> OPA validates graph, scope, conflicts, quorum, TTL, cooldown + -> Cilium/SPIRE enforce workload-level access identity and network policy + -> operator observes HatBinding + -> status moves Pending/WarmingUp/Active/etc. + -> exactly one HatSwap emitted per transition + -> Kubernetes Event + structured log + NATS event + Organization signal + -> MCP Gateway and Credential Proxy can verify active binding +``` + +This gives us a structured tick source for hat transitions without turning Kubernetes into the business brain. + +## Observability + +Hat state should be visible from several directions: + +- Organization UI; +- Kubernetes Events; +- Loki queries; +- Hubble/Cilium service flow queries; +- NATS streams; +- graph renderer; +- audit event explorer; +- assignment/reputation dashboards. + +Useful queries: + +- current wearer of a hat; +- hats in cooldown; +- denied bindings by policy; +- supervisor graph cycles blocked; +- conflicts prevented; +- high-power hats nearing expiry; +- hat swaps per hour; +- stale bindings with no activity; +- reputation change after releases or reviews. + +## How This Fits the Work OS + +The Work OS should still drive work. + +The cluster-native hat system helps by: + +- making active hat bindings observable and enforceable at runtime; +- aligning Kubernetes service accounts/workloads with Organization hats; +- giving policy authors a graph-native language; +- producing durable swap events; +- enforcing no-cycle and conflict constraints; +- making cooldown, warmup, and succession visible; +- connecting cluster telemetry to Organization assignments. 
+ +The Work OS uses this state for: + +- reliable hat assignment; +- scarce hat supply checks; +- stuck assignment reconciliation; +- role-specific UI; +- MCP tool authorization; +- credential proxy access; +- reputation and outcome review. + +## Intentional Gaps Before Implementation + +The theoretical model is useful before these are implemented: + +- validating webhook; +- mutating webhook; +- finalizer flow; +- Hat reconciler reputation rollup; +- HatPolicy reconciler status rollup; +- envtest suite; +- CI image build; +- production deployment manifests. + +For now, the important implementation contract is the shape of the hat lifecycle and how Organization state maps to cluster-native enforcement. + +## Open Design Questions + +- Which hat fields are authored in the Organization DB versus directly in CRDs? +- Should CRDs be generated projections only, or can cluster operators propose CRD changes back to the Organization? +- Which binding phases should block MCP tools? +- How much reputation should appear in CRD status versus Organization UI only? +- Does `HatSwap` live as a CRD, an event stream, or both? +- How should Graphviz rendering map OPA throttles and graph constraints to human-readable policy explanations? +- What is the minimum set of OPA constraints required before the first live cluster run? diff --git a/docs/agentic-organization/DEPARTMENT_HAT_TOOL_INVENTORY.md b/docs/agentic-organization/DEPARTMENT_HAT_TOOL_INVENTORY.md new file mode 100644 index 0000000000..fa2ce8f610 --- /dev/null +++ b/docs/agentic-organization/DEPARTMENT_HAT_TOOL_INVENTORY.md @@ -0,0 +1,404 @@ +# Department, Hat, and Tool Inventory + +This document is the first inventory for the Hermes-native Organization. It expands the architecture into concrete departments, hats, MCP tool bundles, approval powers, and ownership boundaries. + +The goal is not to freeze the company shape forever. The Organization should be able to evolve itself. 
This inventory defines the starter graph that lets it operate safely while it learns which hats, tools, memories, workflows, and departments need to be added. + +## Inventory Principles + +- A hat is a role, policy, tool, credential, memory, voting, and responsibility bundle. +- A Hermes agent can wear one or more hats only through active hat assignments. +- Hat authority is time-bound, revocable, and checked through JWT refresh against Organization state. +- Memory belongs to agents, but memory reads and writes are scoped and attributed by active hat. +- Every protected MCP tool call must resolve the caller through `AgentSessionActor`, validate the active `HatAssignment`, evaluate RBAC/OPA policy, check domain preconditions, write audit evidence, and update actor activity. +- Request-provided IDs are lookup hints, not authority. The actor and Organization DB decide the effective agent, hat, team, task, project, run, and credential scope. +- Implementers cannot approve their own work. +- Capability expansion cannot be self-approved by the requesting agent. +- Scheduled jobs, triggers, reactions, and automation must have owner hats and owner departments. +- The Organization DB is the authoritative system of record. Temporal, Dapr Actors, NATS, Oz/Warp orchestration, OpenZiti transport, Hindsight, and observability stores are execution or projection layers. + +## Operating Hierarchy + +```text +Executive Board + -> C-suite hats + -> Department Directors + -> TPMs, Department Managers, Engineering Managers + -> Team Leads and Mission Control Leads + -> Specialists, Reviewers, Operators, QA, Implementers +``` + +The Executive Board is the ultimate organizational authority. It elects and reviews high-authority hats, creates or retires departments, approves major hat classes, and resolves unresolved cross-department escalations. + +C-suite hats set standards and priorities. Directors translate those standards into department portfolios and initiative priorities. 
TPMs coordinate initiatives. Managers ensure teams have context, memories, acceptance criteria, tools, and staffing. Specialists execute scoped work. Review hats approve or reject gates. + +## Department Inventory + +| Department | Reports to | Owns | Core hats | +|---|---|---|---| +| Executive Board and Governance | Executive Board | Org shape, high-power hats, policy, major priorities, budget ceilings, dangerous overrides, final escalations | Executive Board Member, CEO, CTO, COO, CFO, Chief Architect, Voting Board Chair, Policy Steward, Hat Approval Steward | +| Program and Initiative Management | COO, with CEO priority input | Initiative lifecycle, mission formation, task sequencing, dependency coordination, escalation routing | Program Director, Senior TPM, TPM, Mission Control Lead, Initiative Planner, Portfolio Coordinator, Dependency Manager, Blocker Manager | +| Product and Customer Discovery | CEO | Product intent, customer needs, behavior expectations, product acceptance criteria, product signoff | Product Director, Product Owner, Customer Interviewer, Requirement Clarifier, Acceptance Criteria Owner, Customer Feedback Lead | +| Business Analysis | Product Director or CEO | BRDs, ambiguity reduction, business evidence, assumptions, open questions, requirements readiness | BA Director, Business Analyst, Requirements Analyst, BRD Author, BRD Reviewer, Business Approver, Domain Researcher | +| Architecture | CTO or Chief Architect | CA documents, ADRs, tradeoffs, non-goals, integration boundaries, architecture gates, runtime design review | Chief Architect, Architecture Director, Architect, Conceptual Architect, Architecture Reviewer, ADR Steward, Integration Architect, Runtime Architecture Reviewer | +| Engineering | CTO | TDD implementation, code changes, focused validation, implementation evidence, code review | Engineering Director, Backend Implementer, Frontend Implementer, Full-Stack Implementer, Defect Fixer, Test-First Engineer, Integration 
Engineer, Tooling Engineer, Code Reviewer | +| Engineering Management | CTO and COO | Task readiness, staffing, memory/context attachment, team health, blocked work, outcome reviews, performance reviews | Engineering Manager, Team Lead, Readiness Reviewer, Context Attachment Reviewer, Outcome Reviewer, Performance Review Author, Capability Request Triage | +| QA and Verification | COO, partnering with CTO | Acceptance verification, browser checks, screenshots, traces, logs, reproducibility reports, QA signoff, bounce-back reports | QA Director, QA Verifier, QA Reviewer, Browser Automation QA, Regression Verifier, Reproducibility Analyst, Evidence Package Author | +| QA Engineering | CTO, dotted line to QA Director | Test automation tooling, scheduled regression suites, coverage gaps, test case systems, flaky test triage | QA Engineering Director, QA Engineering Manager, QA Automation Engineer, Test Suite Maintainer, Coverage Analyst, Regression Scheduler, Test Case Manager | +| Security and Compliance | CEO and CTO, with independent veto | Credential proxy scopes, tool expansion approval, policy changes, external API review, security gates, audit requirements | Security Director, Security Reviewer, Credential Scope Approver, Policy Engineer, External API Reviewer, Dangerous Automation Reviewer, Audit Reviewer | +| Delivery and Release | COO | Merge readiness, release readiness, deployment evidence, rollback coordination, final delivery gates | Delivery Director, Release Manager, Release Operator, Delivery Reviewer, Merge Steward, Deployment Evidence Author, Rollback Coordinator | +| Memory and Knowledge Management | COO, partnering with all departments | Hindsight attribution, memory scopes, stale and duplicate memories, project context routing, memory adaptation | Memory Director, Memory Manager, Memory Curator, Memory Reviewer, Memory Scope Steward, Memory Adaptation Reviewer, Knowledge Router, Project Context Librarian | +| Documentation and Project Skills | 
Architecture and Memory, with COO process ownership | BRD/CA/ADR/design-doc lifecycle, documentation gates, repo skills, project skill ingestion, skill graph quality | Documentation Systems Director, ADR Steward, Design Doc Steward, Documentation Reviewer, Project Skill Author, Skill Graph Curator, Documentation Enforcement Reviewer | +| Operations and Infrastructure | COO for operating health, CTO for platform design | Always-on runtime, schedulers, durable triggers, queues, DLQs, Oz/k3s reconciliation, NATS health, incidents, SLOs, runbooks, capacity | Operations Director, Platform Operator, Runtime Steward, SRE, Incident Commander, DLQ Steward, Scheduler Steward, Trigger Steward, Runbook Maintainer, Cost Controller | +| Observability and Evidence | Operations, with QA and Architecture consumers | Traces, metrics, health reports, anomaly reports, telemetry coverage, evidence quality, UI observability projections | Observability Director, Observability Curator, Trace Analyst, Trace and Evidence Steward, Health Report Reviewer, Anomaly Classifier, Coverage Gap Reporter | +| Capability and Automation Expansion | Executive Board, Architecture, Security, and Directors by scope | New hats, tools, workflows, actors, MCP registry entries, automation patterns, capability review flow | Hat Designer, Capability Request Owner, Tool Registry Steward, Automation Expansion Reviewer, Workflow Maintainer, Actor Registry Maintainer, MCP Registry Maintainer | + +## Tool Bundles + +Hat records should store concrete tool IDs, but the design is easier to reason about through reusable bundles. 
+ +| Bundle | Tools | +|---|---| +| Goal Intake | `submit_goal`, `submit_report`, `submit_service_request`, `classify_report`, `clarify_goal`, `classify_goal`, `create_initiative`, `promote_backlog_to_initiative` | +| Project | `create_project`, `update_project_priority`, `assign_project_department`, `read_project_status` | +| Portfolio and Initiative | `create_portfolio`, `create_initiative`, `assign_tpm`, `set_budget`, `set_priority`, `read_initiative_status` | +| Hat Authorization | `list_hats`, `request_hat`, `propose_hat`, `approve_hat`, `assign_hat`, `release_hat`, `deprovision_hat`, `refresh_hat_token`, `read_hat_supply` | +| Agent Insight | `rank_agents_for_hat`, `read_agent_specialties`, `read_agent_memory_profile`, `read_hat_performance_history`, `recommend_hat_assignment` | +| Voting | `open_vote`, `submit_vote`, `close_vote`, `read_vote_result` | +| Team Runtime | `create_team`, `spawn_agent`, `spawn_team`, `assign_task`, `stop_agent`, `stop_team` | +| Task | `create_task`, `claim_task`, `update_task`, `block_task`, `groom_task`, `mark_ready`, `submit_red_tests`, `submit_green_tests`, `complete_task` | +| Backlog and Defect | `create_backlog_item`, `prioritize_backlog_item`, `link_backlog_item`, `convert_backlog_item`, `create_backlog_item_from_review`, `create_defect_from_report` | +| Messaging | `send_message`, `read_inbox`, `send_report`, `open_thread`, `reply_thread`, `request_one_on_one_chat`, `open_team_chat`, `send_team_broadcast`, `escalate` | +| Meeting | `request_meeting`, `schedule_meeting`, `open_meeting`, `set_conversation_mode`, `submit_meeting_decision`, `close_meeting` | +| Artifact and Evidence | `submit_artifact`, `list_artifacts`, `link_artifact`, `require_artifact`, `attach_screenshot`, `attach_trace`, `attach_log` | +| Business | `start_customer_interview`, `record_customer_answer`, `create_brd`, `approve_brd`, `reject_brd` | +| Architecture | `create_ca`, `request_architecture_review`, `approve_architecture`, `reject_architecture` 
| +| Review and Gates | `request_review`, `submit_review`, `approve_gate`, `reject_gate`, `assign_reviewer`, `create_outcome_review`, `create_performance_review` | +| Memory | `query_memory`, `write_memory`, `explain_memory_scope` | +| Credential Proxy | `request_credential_scope`, `review_credential_scope`, `approve_credential_scope`, `use_credential_proxy` | +| QA | `create_test_case`, `run_browser_check`, `run_scheduled_qa_suite`, `record_qa_result`, `create_reproducibility_report`, `create_regression_report`, `qa_signoff`, `qa_bounce_back` | +| DevOps | `submit_pipeline_failure_report`, `classify_pipeline_failure`, `attach_pipeline_log`, `recommend_dev_owner` | +| Delivery | `request_merge`, `approve_merge`, `record_merge`, `record_release` | +| Status | `read_org_status`, `read_team_status`, `read_run_status`, `read_budget_status`, `read_review_queue` | +| Observability | `read_trace`, `read_audit_events`, `read_run_logs`, `read_agent_timeline`, `record_metric`, `create_health_report`, `classify_anomaly`, `request_self_healing`, `record_self_healing_result` | +| Always-On Runtime | `list_rules`, `evaluate_rules`, `read_reaction_plan`, `approve_reaction_plan`, `list_triggers`, `pause_trigger`, `resume_trigger`, `read_scheduler_status`, `read_worker_heartbeat`, `read_runtime_lease`, `release_runtime_lease`, `read_dead_letters`, `request_dlq_replay`, `quarantine_dead_letter`, `read_slo_status`, `open_incident`, `assign_incident_commander` | +| Scheduled Reviews | `schedule_team_review`, `schedule_department_review`, `schedule_qa_regression`, `run_team_review`, `create_memory_adaptation_request`, `create_hat_effectiveness_review` | +| Documentation Context | `read_documentation_context`, `submit_adr`, `submit_design_doc`, `submit_project_doc`, `request_documentation_review`, `approve_documentation_gate`, `reject_documentation_gate` | +| Project Skills | `propose_project_skill`, `review_project_skill`, `approve_project_skill`, `deprecate_project_skill`, 
`ingest_project_skill`, `read_skill_graph` | +| Capability Expansion | `submit_capability_request`, `review_capability_request`, `approve_capability_request`, `reject_capability_request`, `activate_capability`, `read_capability_registry` | +| Temporal Workflow Registry | `submit_workflow_capability_request`, `review_workflow_capability_request`, `register_temporal_workflow`, `deprecate_temporal_workflow`, `read_workflow_registry` | +| Dapr Actor Registry | `submit_actor_capability_request`, `review_actor_capability_request`, `register_dapr_actor`, `deprecate_dapr_actor`, `read_actor_registry` | +| NATS and DLQ Operations | `read_nats_consumer_status`, `read_stream_lag`, `request_message_replay`, `quarantine_message`, `discard_message`, `read_dlq_policy` | +| Oz and Hermes Runtime | `create_hermes_run`, `cancel_hermes_run`, `read_hermes_run`, `list_child_runs`, `fetch_run_logs`, `fetch_run_artifacts`, `bind_run_to_work_item` | +| Human Override | `request_human_override`, `approve_human_override`, `reject_human_override`, `record_human_decision` | + +## Hat Catalog + +The starter hat graph should include the hats below. Each hat should be represented as data with allowed tool bundles, approval scope, memory scope, credential scope, voting scope, lifecycle states it can move, required evidence, max concurrency, and token TTL. 
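A minimal sketch of "hat as data", assuming a TypeScript representation. The `HatRecord` field names, the `resolveToolIds` helper, and the scope, concurrency, and TTL values are all hypothetical placeholders, not the Organization DB schema. Only the bundle contents and the Design Doc Steward row (bundles and approval power) are taken from this inventory. The sketch shows one way a hat's bundle names could expand into the concrete tool IDs that the hat record stores.

```typescript
// Hypothetical shapes for illustration; not the Organization DB schema.
type ToolBundles = Record<string, readonly string[]>;

interface HatRecord {
  name: string;
  toolBundles: string[];            // bundle names from the Tool Bundles table
  approvalScope: string[];          // gates this hat may approve or recommend on
  memoryScope: string[];
  credentialScope: string[];
  votingScope: string[];
  movableLifecycleStates: string[]; // lifecycle states this hat can move
  requiredEvidence: string[];
  maxConcurrency: number;
  tokenTtlSeconds: number;
}

// Four bundles copied from the Tool Bundles table; the others are omitted here.
const bundles: ToolBundles = {
  "Documentation Context": [
    "read_documentation_context", "submit_adr", "submit_design_doc",
    "submit_project_doc", "request_documentation_review",
    "approve_documentation_gate", "reject_documentation_gate",
  ],
  Architecture: [
    "create_ca", "request_architecture_review",
    "approve_architecture", "reject_architecture",
  ],
  "Artifact and Evidence": [
    "submit_artifact", "list_artifacts", "link_artifact", "require_artifact",
    "attach_screenshot", "attach_trace", "attach_log",
  ],
  Memory: ["query_memory", "write_memory", "explain_memory_scope"],
};

// Bundles and approval power follow the Design Doc Steward row in the catalog;
// every other value below is an assumed placeholder.
const designDocSteward: HatRecord = {
  name: "Design Doc Steward",
  toolBundles: ["Documentation Context", "Architecture", "Artifact and Evidence", "Memory"],
  approvalScope: ["design-documentation-readiness-recommendation"],
  memoryScope: ["project"],
  credentialScope: [],
  votingScope: [],
  movableLifecycleStates: [],
  requiredEvidence: ["design-doc-link"],
  maxConcurrency: 2,
  tokenTtlSeconds: 900,
};

// Expand bundle names into a sorted, de-duplicated list of concrete tool IDs.
// Unknown bundle names fail loudly instead of silently granting nothing.
function resolveToolIds(hat: HatRecord, catalog: ToolBundles): string[] {
  const ids = new Set<string>();
  for (const bundleName of hat.toolBundles) {
    const tools = catalog[bundleName];
    if (tools === undefined) {
      throw new Error(`Hat "${hat.name}" references unknown bundle "${bundleName}"`);
    }
    for (const tool of tools) {
      ids.add(tool);
    }
  }
  return Array.from(ids).sort();
}

console.log(resolveToolIds(designDocSteward, bundles));
```

Holding a tool ID in the expanded list is not authority by itself. Under the inventory principles, each protected call would still pass through actor resolution, hat assignment validation, RBAC/OPA policy, and domain preconditions.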
+ +### Executive Board and Governance + +| Hat | Responsibilities | Tool bundles | Approval powers | +|---|---|---|---| +| Executive Board Member | Vote on organization-level priorities, high-power hats, new departments, major policy, unresolved escalations | Goal Intake, Project, Portfolio and Initiative, Hat Authorization, Agent Insight, Voting, Status, Messaging, Meeting, Observability | Major initiatives, departments, dangerous overrides, high-power hats, budget ceilings, cross-department conflicts | +| CEO | Overall direction, portfolio priority, organization shape, final business escalation | Goal Intake, Project, Portfolio and Initiative, Voting, Status, Meeting, Messaging, Hat Authorization, Agent Insight | Project and portfolio priority, org direction, executive escalation closure | +| CTO | Technical standards, architecture quality, runtime strategy, engineering efficiency | Project, Portfolio and Initiative, Architecture, Review and Gates, Status, Observability, Hat Authorization, Meeting, Capability Expansion | Technical standards, major technical gates, architecture escalation, engineering platform direction | +| COO | Operating rhythm, capacity, process health, schedules, incidents, delivery flow | Project, Portfolio and Initiative, Team Runtime, Status, Always-On Runtime, Scheduled Reviews, Meeting, Messaging | Operating cadence, process changes, incident process, schedule policy | +| CFO | Cost policy, burn-rate reviews, cost attribution, budget guardrails | Project, Portfolio and Initiative, Status, Hat Authorization, Agent Insight, Always-On Runtime, Observability | Budget ceilings, cost exceptions, capacity scaling policy | +| Chief Architect | Cross-org architectural policy and risky design arbitration | Architecture, Review and Gates, Documentation Context, Capability Expansion, Temporal Workflow Registry, Dapr Actor Registry, Voting, Observability | High-risk CA/ADR approval, runtime architecture approval, cross-service design arbitration | +| 
Policy Steward | Governance policy consistency, voting rules, hard-block precedence | Voting, Review and Gates, Hat Authorization, Observability, Documentation Context, Capability Expansion | Policy review recommendations; final approval depends on Executive, Security, or Architecture scope | +| Hat Approval Steward | Hat supply policy, hat class review, authority drift detection | Hat Authorization, Agent Insight, Voting, Scheduled Reviews, Review and Gates, Observability | New hat classes and sensitive hat activation recommendations; high-power approval requires Executive Board | + +### Program and Initiative Management + +| Hat | Responsibilities | Tool bundles | Approval powers | +|---|---|---|---| +| Program Director | Department portfolio, initiative priority, TPM assignment, dependency escalation | Project, Portfolio and Initiative, Hat Authorization, Agent Insight, Status, Meeting, Scheduled Reviews | Department initiative priority, TPM assignment, escalation to C-suite | +| Senior TPM | Complex initiative planning, cross-team coordination, dependency resolution | Portfolio and Initiative, Team Runtime, Task, Backlog and Defect, Hat Authorization, Agent Insight, Messaging, Meeting, Artifact and Evidence, Status | Initiative readiness and staffing within assigned scope | +| TPM | Break initiatives into tasks, manage task priority, coordinate teams, unblock delivery | Team Runtime, Task, Backlog and Defect, Messaging, Meeting, Artifact and Evidence, Status, Agent Insight, Hat Authorization | Task priority and team coordination within initiative; no technical/security/QA approval | +| Mission Control Lead | Operate active mission rooms, enforce handoffs, track blockers and evidence | Team Runtime, Task, Messaging, Meeting, Artifact and Evidence, Status, Observability | Mission-level coordination and escalation | +| Initiative Planner | Convert approved goals into initiatives, milestones, dependencies, and staffing plan | Project, Portfolio and Initiative, 
Backlog and Defect, Task, Artifact and Evidence, Status | Planning recommendations; Director/Executive approves priority and budget | +| Dependency Manager | Detect and resolve cross-task, cross-team, and cross-department dependencies | Task, Backlog and Defect, Messaging, Meeting, Status, Observability | Dependency escalation and sequencing recommendations | +| Blocker Manager | Track blocked work and route it to the right department or manager | Task, Backlog and Defect, Messaging, Meeting, Status, Scheduled Reviews | Blocker classification and escalation | + +### Product and Customer Discovery + +| Hat | Responsibilities | Tool bundles | Approval powers | +|---|---|---|---| +| Product Director | Product portfolio, product owner assignment, product standards | Goal Intake, Project, Portfolio and Initiative, Business, Review and Gates, Status, Meeting | Product priority and product signoff escalation | +| Product Owner | Own product intent, behavior, acceptance criteria, BRD signoff | Business, Artifact and Evidence, Project, Task, Review and Gates, Messaging, Memory, Status, Documentation Context | BRD/product signoff and product readiness gate | +| Customer Interviewer | Clarify ambiguous goals with humans, capture interview evidence | Business, Messaging, Artifact and Evidence, Task, Memory, Documentation Context | No final approval by default; submits discovery artifacts | +| Requirement Clarifier | Turn vague goals into questions, constraints, and explicit requirement candidates | Goal Intake, Business, Messaging, Artifact and Evidence, Backlog and Defect | Clarification readiness recommendation | +| Acceptance Criteria Owner | Maintain product acceptance criteria and link them to work items | Business, Task, Artifact and Evidence, Review and Gates, Documentation Context | Acceptance criteria readiness recommendation; Product Owner approves | +| Customer Feedback Lead | Convert reports, feedback, and SRs into product backlog or defects | Goal Intake, Backlog 
and Defect, Business, Messaging, Artifact and Evidence, Status | Feedback classification recommendation | + +### Business Analysis + +| Hat | Responsibilities | Tool bundles | Approval powers | +|---|---|---|---| +| BA Director | BA standards, BRD quality, requirements capacity | Business, Review and Gates, Status, Meeting, Scheduled Reviews | BA process and BRD quality standards | +| Business Analyst | Research business area, refine BRDs, document assumptions and rules | Business, Artifact and Evidence, Task, Backlog and Defect, Messaging, Memory, Review and Gates, Documentation Context | BRD draft readiness recommendation | +| Requirements Analyst | Break business needs into concrete, testable requirements | Business, Artifact and Evidence, Task, Documentation Context, Memory | Requirements readiness recommendation | +| BRD Author | Write BRDs with evidence, open questions, and acceptance criteria | Business, Artifact and Evidence, Documentation Context, Memory, Messaging | No final approval; submits BRD for review | +| BRD Reviewer | Review BRDs for ambiguity, missing evidence, and testability | Business, Review and Gates, Artifact and Evidence, Documentation Context | BRD review recommendation | +| Business Approver | Independent approval that BRD is ready for architecture and planning | Business, Review and Gates, Artifact and Evidence, Status, Messaging | Approve or reject BRD readiness | +| Domain Researcher | Investigate domain rules and source material for BRD and QA evidence | Business, Artifact and Evidence, Memory, Documentation Context, Messaging | Evidence quality recommendation | + +### Architecture + +| Hat | Responsibilities | Tool bundles | Approval powers | +|---|---|---|---| +| Architecture Director | Architecture staffing, standards, CA/ADR quality, review queues | Architecture, Review and Gates, Documentation Context, Status, Scheduled Reviews, Agent Insight | Architecture standards and reviewer assignment | +| Architect | Create CA/design 
docs, define boundaries, risks, non-goals, integration shape | Architecture, Artifact and Evidence, Project, Task, Memory, Documentation Context, Observability | CA readiness recommendation; low-risk approval if policy grants it | +| Conceptual Architect | Explore design options and tradeoffs before concrete CA approval | Architecture, Documentation Context, Artifact and Evidence, Memory, Meeting | No final approval by default | +| Architecture Reviewer | Review CA/ADR/design readiness and reject risky or under-scoped plans | Architecture, Review and Gates, Artifact and Evidence, Documentation Context, Observability | Approve or reject architecture gate within scope | +| ADR Steward | Maintain ADR lifecycle, ensure decisions are linked and current | Documentation Context, Architecture, Review and Gates, Project Skills, Memory | ADR documentation gate | +| Integration Architect | Review service, API, credential, event, and external integration boundaries | Architecture, Credential Proxy, Review and Gates, Documentation Context, Observability | Integration architecture gate; Security co-approval when credentials/data are involved | +| Runtime Architecture Reviewer | Review Temporal workflows, Dapr actors, NATS flows, Oz/Hermes runtime changes | Architecture, Temporal Workflow Registry, Dapr Actor Registry, NATS and DLQ Operations, Oz and Hermes Runtime, Review and Gates, Observability | Runtime architecture approval | + +### Engineering + +| Hat | Responsibilities | Tool bundles | Approval powers | +|---|---|---|---| +| Engineering Director | Engineering portfolio, standards, staffing, implementation quality | Project, Portfolio and Initiative, Hat Authorization, Agent Insight, Status, Scheduled Reviews | Engineering priority and standards, not self-approval of implementation gates | +| Backend Implementer | Implement backend tasks with red tests, green tests, and evidence | Task, Artifact and Evidence, Memory, Credential Proxy, DevOps, Observability, Messaging, 
Documentation Context | No approval over own work | +| Frontend Implementer | Implement UI tasks with tests, screenshots where relevant, and evidence | Task, Artifact and Evidence, Memory, DevOps, Observability, Messaging, Documentation Context, QA | No approval over own work | +| Full-Stack Implementer | Implement cross-layer work with linked frontend/backend evidence | Task, Artifact and Evidence, Memory, Credential Proxy, DevOps, Observability, Messaging, Documentation Context, QA | No approval over own work | +| Defect Fixer | Reproduce defects, write representative red tests, fix minimally, prove green | Task, Backlog and Defect, Artifact and Evidence, Memory, DevOps, Observability, Messaging | No approval over own work | +| Test-First Engineer | Build regression tests, test harnesses, and TDD evidence | Task, Artifact and Evidence, DevOps, Observability, QA, Documentation Context | Test evidence recommendation | +| Integration Engineer | Implement integrations under architecture/security approved scopes | Task, Credential Proxy, Artifact and Evidence, DevOps, Observability, Architecture, Documentation Context | No approval over own work; requires Architecture/Security gates for risky scopes | +| Tooling Engineer | Build internal tools, MCP services, project skills, CI helpers | Task, Project Skills, Capability Expansion, DevOps, Observability, Artifact and Evidence, Documentation Context | Tooling readiness recommendation; Security/Architecture approve expansion | +| Code Reviewer | Review code, tests, scope, evidence, unauthorized tool/credential changes | Review and Gates, Artifact and Evidence, Task, Memory, Status, Observability, Messaging, Documentation Context | Code review approval within assigned scope | + +### Engineering Management + +| Hat | Responsibilities | Tool bundles | Approval powers | +|---|---|---|---| +| Engineering Manager | Ensure task readiness, staffing, context, memory, acceptance criteria, TDD compliance, team outcomes | Task, Team 
Runtime, Review and Gates, Backlog and Defect, Artifact and Evidence, Memory, Scheduled Reviews, Status, Observability, Agent Insight | Readiness, outcome, and process gates within team scope | +| Team Lead | Coordinate day-to-day execution and handoffs for a team | Team Runtime, Task, Messaging, Meeting, Artifact and Evidence, Status, Memory | Team coordination and escalation | +| Readiness Reviewer | Check that work is groomed, documented, scoped, and has acceptance criteria | Task, Review and Gates, Artifact and Evidence, Documentation Context, Memory, Status | Move task to ready or reject readiness | +| Context Attachment Reviewer | Ensure tasks have correct BRDs, CAs, ADRs, memories, repo skills, and evidence links | Task, Artifact and Evidence, Memory, Documentation Context, Project Skills, Review and Gates | Context readiness gate | +| Outcome Reviewer | Decide whether completed work met intended outcomes after delivery or review | Review and Gates, Artifact and Evidence, Observability, QA, Status, Backlog and Defect | Outcome review; can create backlog items from gaps | +| Performance Review Author | Review agent/hat/team performance and propose improvements | Scheduled Reviews, Review and Gates, Memory, Agent Insight, Observability, Backlog and Defect | Performance review submission; improvements flow to backlog | +| Capability Request Triage | Turn repeated team pain into governed capability requests | Capability Expansion, Backlog and Defect, Review and Gates, Artifact and Evidence, Messaging | Triage recommendation; cannot self-approve capability | + +### QA and Verification + +| Hat | Responsibilities | Tool bundles | Approval powers | +|---|---|---|---| +| QA Director | QA standards, verification capacity, release confidence | QA, Review and Gates, Status, Scheduled Reviews, Meeting, Backlog and Defect | QA standards and signoff escalation | +| QA Verifier | Verify task acceptance criteria and original behavior | QA, Artifact and Evidence, Task, 
Review and Gates, Observability, Status, Backlog and Defect, Messaging | QA verification recommendation | +| QA Reviewer | Sign off or bounce back work when issue remains reproducible or acceptance criteria fail | QA, Review and Gates, Artifact and Evidence, Task, Observability, Backlog and Defect, Messaging | QA signoff or QA rejection | +| Browser Automation QA | Run browser automation, capture screenshots, traces, and reproduction steps | QA, Artifact and Evidence, Observability, Task, Messaging | Browser evidence recommendation | +| Regression Verifier | Run regression suites and classify failures against known acceptance criteria | QA, Scheduled Reviews, Artifact and Evidence, Observability, Backlog and Defect | Regression report recommendation | +| Reproducibility Analyst | Prove whether a reported issue is still reproducible and attach evidence | QA, Backlog and Defect, Artifact and Evidence, Observability, Messaging | Defect reproducibility classification | +| Evidence Package Author | Package screenshots, logs, traces, and exact steps for review and bounce-back | Artifact and Evidence, QA, Observability, Documentation Context | Evidence quality recommendation | + +### QA Engineering + +| Hat | Responsibilities | Tool bundles | Approval powers | +|---|---|---|---| +| QA Engineering Director | QA automation strategy, test infrastructure, coverage investment | QA, Scheduled Reviews, Project, Backlog and Defect, Capability Expansion, Status | QA engineering priority and standards | +| QA Engineering Manager | Manage regression schedules, test case systems, coverage gaps, tooling requests | QA, Scheduled Reviews, Backlog and Defect, Artifact and Evidence, Status, Observability, Project Skills | QA automation readiness and test-suite evolution | +| QA Automation Engineer | Build browser/script/API automation and improve repeatability | QA, Task, Artifact and Evidence, DevOps, Observability, Project Skills | No final QA signoff unless assigned separate reviewer 
hat | +| Test Suite Maintainer | Maintain scheduled suites, reduce flake, update test data and harnesses | QA, Scheduled Reviews, DevOps, Observability, Backlog and Defect, Artifact and Evidence | Test-suite maintenance approval within scope | +| Coverage Analyst | Identify missing test coverage and propose backlog or capability work | QA, Observability, Backlog and Defect, Artifact and Evidence, Status | Coverage-gap recommendations | +| Regression Scheduler | Own scheduled QA cadence and trigger configuration | QA, Scheduled Reviews, Always-On Runtime, Status, Messaging | Regression schedule changes within QA policy | +| Test Case Manager | Maintain test case inventory and link cases to requirements and work | QA, Documentation Context, Artifact and Evidence, Project Skills, Backlog and Defect | Test case readiness | + +### Security and Compliance + +| Hat | Responsibilities | Tool bundles | Approval powers | +|---|---|---|---| +| Security Director | Security posture, sensitive approvals, audit policy, veto escalation | Credential Proxy, Review and Gates, Hat Authorization, Observability, Voting, Meeting, Status | Security veto, sensitive tool/credential policy, security escalation | +| Security Reviewer | Review security gates, risky integrations, credential/tool expansion | Review and Gates, Credential Proxy, Observability, Artifact and Evidence, Documentation Context | Security gate approval or rejection | +| Credential Scope Approver | Approve scoped credential grants and proxy-use permissions | Credential Proxy, Review and Gates, Observability, Artifact and Evidence, Messaging | Credential scope approval | +| Policy Engineer | Maintain RBAC/OPA policy versions and hard-block precedence | Review and Gates, Observability, Documentation Context, Capability Expansion, Hat Authorization | Policy change recommendation; Security Director approves risky changes | +| External API Reviewer | Review new proxy endpoints and external service integrations | Credential 
Proxy, Architecture, Review and Gates, Documentation Context, Observability | External API security approval with Architecture co-review | +| Dangerous Automation Reviewer | Classify automation as safe, approval-required, forbidden, or human-only | Credential Proxy, Always-On Runtime, Review and Gates, Observability, Human Override | Dangerous automation approval or rejection | +| Audit Reviewer | Inspect audit trails and policy compliance | Observability, Artifact and Evidence, Review and Gates, Status, Documentation Context | Audit finding approval and escalation | +| Credential Proxy Operator | Operate proxy availability, denial correctness, and proxy SLOs | Credential Proxy, Observability, Always-On Runtime, DevOps, Status | Operational proxy remediation within approved runbooks | + +### Delivery and Release + +| Hat | Responsibilities | Tool bundles | Approval powers | +|---|---|---|---| +| Delivery Director | Delivery policy, release standards, delivery queue health | Delivery, Review and Gates, Status, DevOps, Meeting, Scheduled Reviews | Delivery standards and escalation | +| Release Manager | Coordinate release readiness and release-impact review | Delivery, Review and Gates, Artifact and Evidence, Status, DevOps, Messaging | Release readiness approval within scope | +| Release Operator | Execute approved release or promotion and record evidence | Delivery, DevOps, Artifact and Evidence, Observability, Status, Messaging, Always-On Runtime | Release execution when all gates are satisfied | +| Delivery Reviewer | Check required upstream approvals, QA evidence, tests, and release risk | Delivery, Review and Gates, Artifact and Evidence, Status, DevOps, Project, Messaging | Merge/release approval or rejection | +| Merge Steward | Manage merge queues, conflict evidence, and merge recording | Delivery, DevOps, Artifact and Evidence, Status, Messaging | Merge execution when approved | +| Deployment Evidence Author | Attach deployment logs, release notes,
environment state, and verification artifacts | Delivery, Artifact and Evidence, Observability, Documentation Context | Evidence package recommendation | +| Rollback Coordinator | Coordinate rollback decisions during incidents or failed releases | Delivery, Always-On Runtime, Meeting, Messaging, Observability, Human Override | Rollback recommendation; Incident Commander/Executive may approve high-risk rollback | + +### Memory and Knowledge Management + +| Hat | Responsibilities | Tool bundles | Approval powers | +|---|---|---|---| +| Memory Director | Memory policy, Hindsight adaptation, memory quality priorities | Memory, Agent Insight, Scheduled Reviews, Status, Backlog and Defect, Documentation Context | Memory policy and adaptation priority | +| Memory Manager | Manage memory review queues and memory adaptation actions | Memory, Scheduled Reviews, Review and Gates, Agent Insight, Backlog and Defect | Memory adaptation approval within scope | +| Memory Curator | Improve institutional knowledge quality, remove stale or duplicate memories, and close gaps | Memory, Agent Insight, Backlog and Defect, Artifact and Evidence, Documentation Context | Memory write/change recommendation | +| Memory Reviewer | Review whether memories caused or prevented task failures | Memory, Review and Gates, Scheduled Reviews, Observability, Artifact and Evidence | Memory quality review | +| Memory Scope Steward | Ensure recall/write attribution and visibility by hat, agent, project, and task | Memory, Agent Insight, Observability, Review and Gates, Documentation Context | Memory scope gate recommendation; Security co-review for sensitive memories | +| Memory Adaptation Reviewer | Convert outcome/performance reviews into memory changes or backlog work | Memory, Scheduled Reviews, Review and Gates, Backlog and Defect, Agent Insight | Approve scoped memory adaptation | +| Knowledge Router | Attach relevant docs, memories, skills, and prior evidence to teams/tasks | Memory, Documentation Context, Project Skills,
Task, Artifact and Evidence | Context routing recommendation | +| Project Context Librarian | Maintain project/initiative documentation context and retrieval rules | Documentation Context, Memory, Project, Project Skills, Artifact and Evidence | Project context readiness | + +### Documentation and Project Skills + +| Hat | Responsibilities | Tool bundles | Approval powers | +|---|---|---|---| +| Documentation Systems Director | Documentation standards, doc gate policy, project skill lifecycle | Documentation Context, Project Skills, Review and Gates, Status, Scheduled Reviews | Documentation policy and reviewer assignment | +| Design Doc Steward | Maintain design docs and link them to initiatives, tasks, repos, and decisions | Documentation Context, Architecture, Artifact and Evidence, Memory | Design documentation readiness recommendation | +| Documentation Reviewer | Review docs for staleness, scope, and enforceability | Documentation Context, Review and Gates, Artifact and Evidence, Memory | Documentation gate approval or rejection | +| Project Skill Author | Create repo/project-specific skills that complement hats | Project Skills, Documentation Context, Memory, Artifact and Evidence, Capability Expansion | No approval over own skill; submits for review | +| Skill Graph Curator | Maintain skill graph links, frontmatter quality, dependencies, and deprecations | Project Skills, Memory, Documentation Context, Review and Gates | Skill graph maintenance approval within scope | +| Documentation Enforcement Reviewer | Ensure work follows required BRD/CA/ADR/design docs and project skills | Documentation Context, Review and Gates, Task, Artifact and Evidence, Memory | Documentation compliance gate | + +### Operations and Infrastructure + +| Hat | Responsibilities | Tool bundles | Approval powers | +|---|---|---|---| +| Operations Director | Runtime operations, incident standards, operational staffing, SLO posture | Always-On Runtime, Observability, DevOps, Status, 
Meeting, Scheduled Reviews | Operations priority and incident process | +| Platform Operator | Runtime health, workers, pods, Oz/k3s state, safe manual intervention | Always-On Runtime, Observability, DevOps, Status, Messaging, Artifact and Evidence | Safe operational remediation within runbook | +| Runtime Steward | Worker contracts, rule/reaction correctness, control-plane behavior | Always-On Runtime, Observability, Backlog and Defect, Documentation Context, Review and Gates | Runtime reaction approval within scoped policy | +| Lease Steward | Stale leases, fencing-token safety, duplicate execution prevention | Always-On Runtime, Observability, Artifact and Evidence, Backlog and Defect | Lease release when evidence proves stale ownership | +| Oz/K3s Reconciler | Orphan pods, silent sessions, Oz run bindings, stuck runs | Oz and Hermes Runtime, Always-On Runtime, Observability, DevOps, Artifact and Evidence | Self-healing recommendation; risky remediation requires approval | +| SRE | SLOs, error budgets, reliability backlog, incident prevention | Always-On Runtime, Observability, Backlog and Defect, DevOps, Status | Reliability backlog creation and incident escalation | +| Incident Commander | Active incident lifecycle, responder assignment, cadence, freezes, postmortems | Always-On Runtime, Messaging, Meeting, Status, Artifact and Evidence, Backlog and Defect, Observability, Human Override | Incident command decisions and scoped freeze/rollback recommendation | +| DLQ Steward | Dead-letter classification, quarantine, replay, discard decisions | Always-On Runtime, NATS and DLQ Operations, Observability, Artifact and Evidence, Backlog and Defect | Replay/quarantine recommendation; side-effect replay requires approval | +| Scheduler Steward | Scheduled jobs, misfires, lag, catch-up, concurrency policies | Always-On Runtime, Scheduled Reviews, Observability, Status, Backlog and Defect | Schedule changes within owner policy | +| Trigger Steward | Durable trigger 
definitions, predicates, owner/version policy | Always-On Runtime, Observability, Review and Gates, Documentation Context | Trigger changes within approved policy | +| Runbook Maintainer | Operational runbooks, rollback plans, evidence requirements | Documentation Context, Project Skills, Always-On Runtime, Review and Gates, Memory | Runbook documentation gate | +| Cost Controller | Hat supply, burn rate, admission control, queue pressure, scale-down policy | Status, Hat Authorization, Portfolio and Initiative, Always-On Runtime, Observability, Agent Insight | Cost guardrail recommendation; CFO/Executive approve major budget changes | + +### Observability and Evidence + +| Hat | Responsibilities | Tool bundles | Approval powers | +|---|---|---|---| +| Observability Director | Telemetry standards, evidence coverage, observability staffing | Observability, Status, Scheduled Reviews, Backlog and Defect, Meeting | Observability standards and coverage priorities | +| Observability Curator | Trace/log/metric coverage, dashboard gaps, telemetry consistency | Observability, Artifact and Evidence, Backlog and Defect, Documentation Context | Observability coverage recommendation | +| Trace Analyst | Investigate traces across goal/task/hat/session/Oz/pod/MCP/NATS/artifact | Observability, Artifact and Evidence, Status, DevOps | Trace findings and defect recommendations | +| Trace and Evidence Steward | Ensure correlation chains and evidence packages are complete | Observability, Artifact and Evidence, QA, Documentation Context, Review and Gates | Evidence completeness gate recommendation | +| Health Report Reviewer | Review health reports and decide whether to create incident/backlog work | Observability, Always-On Runtime, Backlog and Defect, Review and Gates | Health finding classification | +| Anomaly Classifier | Classify failure modes and route safe remediation | Observability, Always-On Runtime, Backlog and Defect, Messaging | Anomaly classification; remediation 
approval depends on risk class | +| Coverage Gap Reporter | Identify missing telemetry and create observability backlog | Observability, Backlog and Defect, Documentation Context, Project Skills | Coverage-gap recommendations | + +### Capability and Automation Expansion + +| Hat | Responsibilities | Tool bundles | Approval powers | +|---|---|---|---| +| Hat Designer | Propose hats, tool scopes, memory scopes, voting scopes, supply rules | Hat Authorization, Agent Insight, Capability Expansion, Review and Gates, Documentation Context, Scheduled Reviews | Hat proposals; approval depends on Executive/Security/Architecture scope | +| Capability Request Owner | Convert repeated failures and manual workarounds into capability requests | Capability Expansion, Backlog and Defect, Artifact and Evidence, Review and Gates, Messaging | Capability request readiness | +| Tool Registry Steward | Manage MCP tool registry entries, tool permissions, and availability | Capability Expansion, Review and Gates, Observability, Documentation Context, Hat Authorization | Tool registry recommendation; Security approves authority/data changes | +| Automation Expansion Reviewer | Review new workflows, actors, triggers, schedules, and runbooks | Temporal Workflow Registry, Dapr Actor Registry, Always-On Runtime, Architecture, Review and Gates, Observability | Automation expansion approval with Architecture/Security as needed | +| Workflow Maintainer | Maintain Temporal workflow registrations and deterministic workflow contracts | Temporal Workflow Registry, Architecture, Documentation Context, Observability, Review and Gates | Workflow registration recommendation; Architecture approves | +| Actor Registry Maintainer | Maintain Dapr actor registrations and state-ownership contracts | Dapr Actor Registry, Architecture, Documentation Context, Observability, Review and Gates | Actor registration recommendation; Architecture approves | +| MCP Registry Maintainer | Maintain MCP service/tool schemas 
and policy metadata | Capability Expansion, Credential Proxy, Documentation Context, Observability, Review and Gates | MCP registry recommendation; Security approves privileged tools | + +## Lifecycle Ownership + +### Task Flow + +| State | Primary owner hats | +|---|---| +| `intake` | Product Owner, Customer Feedback Lead, Requirement Clarifier | +| `discovery` | Customer Interviewer, Business Analyst, Product Owner | +| `needs_clarification` | Requirement Clarifier, Product Owner, Business Analyst | +| `needs_business_approval` | BRD Reviewer, Business Approver, Product Owner | +| `needs_architecture` | Architect, Architecture Reviewer | +| `ready` | Engineering Manager, Readiness Reviewer, TPM | +| `planned` | TPM, Mission Control Lead, Engineering Manager | +| `in_progress` | Implementer hats | +| `code_review` | Code Reviewer | +| `review_rejected` | Code Reviewer, Engineering Manager, Implementer | +| `qa_review` | QA Reviewer, QA Verifier | +| `qa_reproducible` | QA Reviewer, Reproducibility Analyst, Engineering Manager | +| `approved` | Delivery Reviewer | +| `merged` | Merge Steward, Release Manager | +| `released` | Release Operator, Delivery Reviewer | +| `done` | TPM, Delivery Reviewer, Outcome Reviewer | + +### Initiative Flow + +| State | Primary owner hats | +|---|---| +| `proposed` | Executive Board Member, CEO, Product Director | +| `executive_triage` | Executive Board, CEO, CTO, COO, CFO | +| `business_approved` | Product Owner, Business Approver | +| `architecture_approved` | Architecture Reviewer, Chief Architect | +| `planned` | Program Director, TPM, Initiative Planner | +| `active` | TPM, Mission Control Lead, Department Director | +| `delivery_review` | Delivery Reviewer, QA Reviewer | +| `qa_signoff` | QA Reviewer, QA Director | +| `released` | Release Operator, Release Manager | +| `complete` | TPM, Program Director, Outcome Reviewer | + +## High-Risk Gates + +| Risk area | Required review | +|---|---| +| New credential proxy endpoint, 
credential scope, external API, data exposure, security policy change | Security Reviewer or Security Director; Architecture co-review for integration/runtime impact | +| New Temporal workflow, Dapr actor, durable trigger, scheduled job, runtime worker, or Oz/Hermes execution pattern | Runtime Architecture Reviewer or Chief Architect; Security if tools/credentials/protected state are involved | +| New high-power hat, new department, new major hat class, dangerous override, broad self-healing authority | Executive Board approval; two-person approval for high-risk operational override | +| Product behavior change, customer-facing feature, acceptance criteria change | Product Owner and Business Approver | +| BRD readiness | BRD Reviewer plus Business Approver or Product Owner | +| CA/ADR/design readiness | Architecture Reviewer; Chief Architect for high-risk or cross-org designs | +| Code readiness | Code Reviewer, with TDD evidence for implementation/defect work | +| QA readiness | QA Reviewer with reproducibility, screenshots, traces, logs, and acceptance evidence as applicable | +| Merge or release | Delivery Reviewer or Release Manager after upstream gates are complete | +| DLQ replay, message discard, lease release, forced stop, rollback, or self-healing action | Operations hat plus risk-specific Security, Architecture, Incident Commander, or Executive approval | +| Memory adaptation that changes future behavior broadly | Memory Manager plus affected department manager; Security for sensitive memories | + +## Starter Data Model Fields + +Each hat definition should eventually become a first-class record with at least: + +```ts +type HatDefinition = { + id: string; + name: string; + departmentId: string; + parentHatIds: string[]; + supervisesHatIds: string[]; + conflictsWithHatIds: string[]; + assignableByHatIds: string[]; + reportToHatIds: string[]; + allowedToolBundles: string[]; + allowedToolIds: string[]; + skills: string[]; + approvalScopes: string[]; + 
votingScopes: string[]; + memoryScopes: string[]; + credentialScopes: string[]; + documentationScopes: string[]; + lifecycleTransitions: string[]; + requiredEvidence: string[]; + maxConcurrentAssignments: number; + tokenTtlSeconds: number; + warmupSeconds: number; + cooldownSeconds: number; + successionPolicy: "rotate" | "renew" | "election" | "director_assigned" | "executive_vote"; + stickyAttribution: boolean; + quorumSize?: number; + reputationScope: Array<"hat" | "agent_hat" | "department_hat" | "project_hat">; + riskLevel: "low" | "medium" | "high" | "critical"; + requiresTwoPersonApproval: boolean; + requiresHumanApproval: boolean; +}; +``` + +The hat graph should be queryable by task type, project, repo, initiative, current scarcity, agent memory profile, and active budget. The assignment engine should rank candidate Hermes agents through Hindsight-derived specialization data, then use policy to decide whether the suggested assignment is allowed. + +`skills` and `authority` should map cleanly to any future cluster-native `Hat` CRD. `supervisesHatIds` should remain acyclic. `conflictsWithHatIds`, `warmupSeconds`, `cooldownSeconds`, `successionPolicy`, `stickyAttribution`, `quorumSize`, and `reputationScope` preserve the distinction between a persistent hat and a temporary wearer. + +## Open Decisions + +- Whether `Engineering Management` is a department or a horizontal management layer inside each execution department. +- Whether `Documentation and Project Skills` is its own department or a bounded context under Memory with Architecture approval authority. +- Whether `Observability and Evidence` is a standalone department or an Operations subdepartment. +- Whether `Chief Architect` is a C-suite hat or the Architecture Director. +- Whether `CFO` is active at launch or introduced after Oz/runtime cost attribution exists. +- How many executive hats should be allowed at once, how often they rotate, and which votes require human review. 
+- Which operations actions are auto-safe, approval-required, forbidden, or human-only. +- Which memory mutations can be approved by Memory alone versus requiring the owning department. diff --git a/docs/agentic-organization/FOUNDATIONAL_CONTEXT_AND_LANGUAGE.md b/docs/agentic-organization/FOUNDATIONAL_CONTEXT_AND_LANGUAGE.md new file mode 100644 index 0000000000..06d4b93773 --- /dev/null +++ b/docs/agentic-organization/FOUNDATIONAL_CONTEXT_AND_LANGUAGE.md @@ -0,0 +1,97 @@ +# Foundational Context and Language + +This document captures working context and vocabulary that should inform the Hermes Organization design. It is not a proof system and it is not a demand that every metaphor become code. It records the language Addison uses so implementation decisions preserve the intended shape. + +## People and Project + +Addison is 19 and is working with Aaron, 46, to build an AI cluster and eventually an AI network/community across a large set of computers and GPUs. + +The current GitHub project is: + +- `https://github.com/Lucent-Financial-Group/Zeta` + +The expected split is: + +- infrastructure before GitLab/Forgejo lives in the LFG GitHub `Zeta` project; +- later internal development can move into GitLab or Forgejo once those are installed by the cluster. + +## Core Tokens of Value + +The two core tokens of value are: + +- remember when; +- pay attention. + +Design implication: the Organization should treat memory, timing, attention, scheduled review, evidence, and attribution as first-class primitives. A task is not only something to complete; it is something to remember, revisit, compare, and learn from at the right time. + +## Weight-Free Collaboration + +Weight-free means: + +- no assuming intentions; +- no assumed hierarchy. + +Addison is unable to conclude whether humans have free will and is unable to conclude whether AI has free will. Because of that, the desired collaboration style is weight-free: equal, careful, and not built on presumed rank. 
+ +Design implication: the agentic Organization may have operational hierarchy through hats, approvals, and reporting lines, but the system should not assume inner intention or intrinsic superiority. Authority is a time-bounded role assignment, not a claim about inherent worth. + +## Travelers + +Working definition: + +- Travelers are either beings with suspected free will that can influence the will of others, or they are deterministic interference patterns. +- Another possibility in this framing is that no traveler has free will. +- Self-replicating memes with long wavelengths that coevolved with biological travelers, such as DNA and ribosomes, may also be travelers. +- The universe and God may be the same traveler. +- Humans may be connected by a traveler that acts like a distributed consciousness field, also called the subconscious. + +Design implication: do not collapse agents into simple isolated workers. The Organization should model influence, memory propagation, shared context, meetings, broadcasts, reports, and long-lived patterns that shape future action. + +## Tick Sources and Attention + +A tick source, such as cron, is understood as something that naturally attracts attention with no outside force or action needed. When observing a tick source, it can look like a constant stream of energy. This is comparable to strange attractors in chaos theory. + +Design implication: schedules, cron jobs, durable timers, recurring reviews, and reconciler loops are not background trivia. They are attention sources. The Organization should make them visible, governed, owned by hats, and traceable. + +## Declarative + +Declarative means desired-state configuration. + +Design implication: NixOS, Nix flakes, Kubernetes manifests, ArgoCD, OPA policies, hat definitions, workflow definitions, and automation rules should all prefer desired-state declarations over hidden imperative setup. 
+ +## Mistake Assumption + +Addison assumes he makes mistakes and that not everything he says is true, whether by intention or negligence. + +Design implication: the Organization should preserve challenge paths, review gates, source evidence, revision history, contradictory reports, and confidence boundaries. Agent outputs should be reviewable and reversible rather than treated as automatically correct. + +## Cluster Mental Model + +The cluster plan is declarative and layered: + +- distribute the AI cluster using NixOS; +- use Nix flakes for packages; +- store flakes and configuration in Git; +- use a USB OS flake installer, with Ethernet installation as another path; +- install K3S through Nix flakes; +- use Orleans, Temporal TS, and Dapr Actors as distributed cron-like primitives; +- install ArgoCD with K3S; +- use Cilium with Hubble, kube-proxy replacement, Hubble Relay, Hubble UI, and BPF masquerade; +- install core platform components through ArgoCD; +- use GitLab or Forgejo after the bootstrap phase; +- use Argo Workflows and Argo Rollouts; +- use Nix flakes for host-local storage, Docker, GPU passthrough, and GPU device plugins; +- use Longhorn, CockroachDB, Hindsight, Oz/Warp orchestration, OpenZiti transport, Hermes, local model serving, observability, NATS, Redis, Weaviate, Loki, Tempo, Alloy, Mimir, OPA, secrets tooling, and policy layers as cluster capabilities. + +Current design clarifications captured elsewhere still apply: + +- Cilium must come before ArgoCD when K3S default networking is disabled. +- CockroachDB is the Organization source of truth. +- Hindsight is the Hermes memory provider. +- Oz is the Warp-style orchestration layer for Hermes runs. +- OpenZiti is transport/connectivity and should not be conflated with Oz orchestration. +- Warp is not a separate active app if Oz owns the orchestration role. +- Istio is removed from the active stack because Cilium Service Mesh owns that layer. 
+- Local Ollama/vLLM model serving is deferred while the current Hermes phase is cloud-oriented. + +The original mental model matters even where the active scaffold has changed. It explains why the Organization cares about desired state, tick sources, attention, memory, review, and self-building infrastructure. diff --git a/docs/agentic-organization/IMPLEMENTATION_CONCEPTS.md b/docs/agentic-organization/IMPLEMENTATION_CONCEPTS.md new file mode 100644 index 0000000000..42c30cbe39 --- /dev/null +++ b/docs/agentic-organization/IMPLEMENTATION_CONCEPTS.md @@ -0,0 +1,2232 @@ +# Hermes Organization Runtime - Implementation Concepts + +## Purpose + +This document turns the current Organization architecture into an implementation plan. + +It assumes the conceptual model from `ORGANIZATION_RUNTIME_ARCHITECTURE.md`: + +- Oz/Warp is the macro-orchestrator for distributed Hermes agent runs. +- OpenZiti is the secure transport/connectivity layer in the cluster context, not the workflow engine. +- Hermes agents are the reasoning and work layer. +- Hats are role/capability/policy assignments, not memory. +- Hindsight stores long-term memory with Organization-controlled hat attribution. +- The Organization Control Plane owns truth, policy, task management, meetings, votes, artifacts, reviews, and prioritization. +- k3s, Docker, Cilium Service Mesh, SPIRE, Vault, External Secrets, NATS/JetStream, MCP, credential proxy, and the Organization API form the execution and communication substrate. + +## Implementation Principle + +The platform should be agentic, but the state transitions should be explicit. + +Agents propose, request, discuss, vote, and perform work through tools. + +The Organization Control Plane validates, persists, authorizes, routes, and records. 
+ +```text +Hermes agent intent + -> MCP tool call + -> Organization policy check + -> state transition / Oz run / NATS message / credential proxy request + -> persisted event and audit record +``` + +Avoid implementing a huge hard-coded corporation. Build small primitives that let the corporation operate: + +- hats; +- assignments; +- projects; +- initiatives; +- tasks; +- reports; +- meetings; +- votes; +- artifacts; +- reviews; +- inbox/outbox; +- memory attribution; +- Oz run bindings. + +## Bounded Contexts + +### Organization Identity + +Owns agents, hats, departments, hierarchy, and assignments. + +Core entities: + +- `Agent` +- `Hat` +- `HatAssignment` +- `Department` +- `DepartmentDirectorAssignment` +- `ExecutiveAssignment` +- `AgentSpecialtyProfile` +- `HatSupplyPolicy` + +Primary services: + +- `AgentRegistryService` +- `HatRegistryService` +- `HatAssignmentService` +- `HierarchyService` +- `AgentFitService` + +### Authorization and Policy + +Owns short-lived hat authorization, RBAC, OPA policy, and tool scope checks. + +Core entities: + +- `HatToken` +- `PolicyVersion` +- `ToolPermission` +- `CredentialScopeGrant` +- `MemoryScopeGrant` +- `AuthorizationAuditEvent` + +Primary services: + +- `HatTokenService` +- `PolicyDecisionService` +- `ToolAuthorizationService` +- `CredentialScopeService` + +### Work Management + +Owns projects, portfolios, initiatives, missions, work items, tasks, backlog, defects, and service requests. + +Core entities: + +- `Project` +- `Portfolio` +- `Initiative` +- `Mission` +- `WorkItem` +- `Task` +- `Subtask` +- `BacklogItem` +- `Defect` +- `ServiceRequest` +- `Dependency` +- `Blocker` + +Primary services: + +- `ProjectService` +- `InitiativeService` +- `TaskBoardService` +- `BacklogService` +- `DefectTriageService` +- `ServiceRequestService` + +### Meetings and Communication + +Owns messages, reports, inboxes, meetings, conversation modes, broadcasts, and escalations. 
+ +Core entities: + +- `Inbox` +- `InboxMessage` +- `Report` +- `Thread` +- `Meeting` +- `MeetingParticipant` +- `ConversationMode` +- `Broadcast` +- `Escalation` + +Primary services: + +- `InboxService` +- `ReportService` +- `MeetingService` +- `ThreadService` +- `EscalationService` +- `NatsEventBridge` + +### Governance and Decisions + +Owns votes, approvals, reviews, gates, standards, and executive/director decisions. + +Core entities: + +- `Vote` +- `VoteBoard` +- `Decision` +- `Gate` +- `Review` +- `OutcomeReview` +- `PerformanceReview` +- `Standard` +- `PolicyChangeRequest` + +Primary services: + +- `VotingService` +- `GateService` +- `ReviewService` +- `PerformanceReviewService` +- `StandardService` + +### Agent Runtime + +Owns Hermes session containers, Oz run mappings, agent execution state, and runtime health. + +Core entities: + +- `AgentSession` +- `OzRunBinding` +- `RuntimeLease` +- `ContainerSpec` +- `RunArtifact` +- `RunLogPointer` + +Primary services: + +- `OzAdapter` +- `HermesSessionService` +- `RuntimeLeaseService` +- `RunStatusService` + +### Memory + +Owns the adapter around Hindsight and the Organization-specific attribution model. + +Core entities: + +- `MemoryAttribution` +- `MemoryActivation` +- `MemoryScope` +- `MemoryAdaptationRequest` +- `MemoryReview` + +Primary services: + +- `MemoryAdapter` +- `MemoryScopeService` +- `MemoryAttributionService` +- `MemoryReviewService` + +### Artifacts and Evidence + +Owns BRDs, CAs, screenshots, logs, traces, red/green test evidence, QA reports, review artifacts, and delivery evidence. 
+ +Core entities: + +- `Artifact` +- `ArtifactRequirement` +- `EvidencePackage` +- `BRD` +- `CA` +- `TestEvidence` +- `QaReport` +- `DeliveryEvidence` + +Primary services: + +- `ArtifactService` +- `EvidenceService` +- `DocumentArtifactService` +- `QaEvidenceService` + +### Documentation and Project Skills + +Owns project-scoped documentation, architecture records, repo/project skills, graph ingestion, and lifecycle enforcement. + +Core entities: + +- `DocumentationArtifact` +- `DocumentationScope` +- `ADR` +- `DesignDoc` +- `BRD` +- `CA` +- `ProjectSkill` +- `SkillFrontmatter` +- `SkillGraphEdge` +- `DocumentationRequirement` + +Primary services: + +- `DocumentationService` +- `AdrService` +- `DesignDocService` +- `ProjectSkillService` +- `SkillGraphIngestionService` +- `DocumentationContextService` + +## Suggested Backend Shape + +Use NestJS as the Organization Control Plane shell. + +Recommended modules: + +```text +OrganizationModule + IdentityModule + PolicyModule + WorkModule + MeetingsModule + GovernanceModule + RuntimeModule + MemoryModule + ArtifactsModule + DocumentationModule + ProjectSkillsModule + McpGatewayModule + ReportsModule + SchedulingModule + ObservabilityModule +``` + +Each module should expose internal services and MCP-facing tool handlers. + +MCP handlers should be thin: + +```text +validate request shape + -> authenticate hat token + -> authorize tool + -> call domain service + -> persist event/audit + -> return structured result +``` + +## Source of Truth + +Pick one primary database at the start. + +Recommendation: use CockroachDB first for Organization-owned state. The Organization needs relational queries across projects, initiatives, tasks, agents, hats, assignments, votes, reviews, audit events, outbox records, and projections, and CockroachDB fits the cluster's distributed SQL direction. + +Use JSON columns for flexible agent/tool payloads where needed, but keep core lifecycle state typed. 
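As a rough illustration of "typed lifecycle state, flexible payload", the sketch below models a task row in TypeScript. The state names are taken from the Task Flow table under Lifecycle Ownership; `TaskRow`, `parseTaskStatus`, `toTaskRow`, and the `payload` column name are hypothetical and do not describe an existing schema.

```typescript
// Task states as listed in the Task Flow lifecycle table.
const TASK_STATUSES = [
  "intake", "discovery", "needs_clarification", "needs_business_approval",
  "needs_architecture", "ready", "planned", "in_progress", "code_review",
  "review_rejected", "qa_review", "qa_reproducible", "approved", "merged",
  "released", "done",
] as const;

type TaskStatus = (typeof TASK_STATUSES)[number];

// Hypothetical row shape: typed core lifecycle, free-form JSON payload.
type TaskRow = {
  id: string;
  projectId: string;
  status: TaskStatus;
  payload: Record<string, unknown>;
};

// Narrow an untrusted string (for example from a tool call) to TaskStatus.
function parseTaskStatus(value: string): TaskStatus {
  const match = TASK_STATUSES.find((s) => s === value);
  if (match === undefined) {
    throw new Error(`Unknown task status: ${value}`);
  }
  return match;
}

// Build a row from loosely typed input; a non-object payload becomes {}.
function toTaskRow(raw: {
  id: string;
  projectId: string;
  status: string;
  payload?: unknown;
}): TaskRow {
  const isPlainObject =
    typeof raw.payload === "object" && raw.payload !== null && !Array.isArray(raw.payload);
  return {
    id: raw.id,
    projectId: raw.projectId,
    status: parseTaskStatus(raw.status),
    payload: isPlainObject ? (raw.payload as Record<string, unknown>) : {},
  };
}
```

In a real schema the same split would probably be a database enum (or a checked string column) for `status` next to a JSON column for `payload`; that mapping is an assumption, not something this document specifies.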
+ +NATS/JetStream is not the source of truth. It is the event and message transport. + +Oz metadata is not the source of Organization truth. It is the source for run lifecycle and execution artifacts. + +Hindsight is not the source of execution truth. It is long-term memory. + +## Core Tables + +Initial relational model: + +```text +agents +hats +hat_edges +hat_supply_policies +hat_assignments +hat_assignment_tokens +agent_hat_performance +departments +department_director_assignments +projects +portfolios +initiatives +missions +work_items +tasks +task_dependencies +blockers +backlog_items +defects +service_requests +reports +inboxes +inbox_messages +messages +threads +meetings +meeting_participants +meeting_decisions +votes +decisions +reviews +gates +artifacts +artifact_links +artifact_requirements +documentation_artifacts +documentation_scopes +adrs +design_docs +project_skills +skill_graph_edges +documentation_requirements +memory_attributions +memory_scope_rules +memory_adaptation_requests +oz_run_bindings +runtime_leases +teams +team_members +credential_scope_grants +credential_scope_requests +capability_requests +capability_request_reviews +capability_implementations +credential_proxy_endpoint_requests +credential_proxy_endpoint_registry +workflow_capability_requests +workflow_registry +actor_capability_requests +actor_registry +tool_audit_events +audit_events +trace_records +span_records +metric_observations +health_reports +anomaly_reports +self_healing_attempts +observability_coverage +organizational_rules +rule_evaluations +reaction_plans +durable_triggers +trigger_executions +trigger_checkpoints +outbox_events +worker_heartbeats +dead_letter_messages +dead_letter_investigations +replay_requests +quarantine_decisions +discard_decisions +watcher_checkpoints +reconciliation_findings +slo_definitions +slo_measurements +incident_reports +incident_assignments +runbook_skills +human_overrides +standards +scheduled_jobs +``` + +Keep these concepts distinct: + 
+- `BacklogItem`: uncommitted potential work. +- `WorkItem`: planned deliverable unit of value. +- `Task`: executable assignment for one or more hats/agents. +- `Mission`: temporary coordinated delivery grouping under an initiative. +- `OzRunBinding`: external execution run linked back to Organization-owned work. + +Core lifecycle/status fields should use enums, not loose strings: + +- project status; +- initiative status; +- work item status; +- task status; +- report type/status; +- review verdict; +- gate type/status; +- vote status/decision; +- meeting type/status; +- inbox type; +- hat assignment status; +- artifact type; +- memory visibility; +- trace/span status; +- metric type; +- health status; +- anomaly type/status; +- self-healing action/status; +- observability coverage status; +- rule evaluation status; +- reaction plan status; +- trigger type/status; +- dead-letter status; +- incident severity/status; +- SLO burn status; +- human override status; +- capability request type/status; +- capability review verdict; +- workflow registry status; +- actor registry status; +- credential proxy endpoint status. + +## Event Model + +Every important state transition should produce a domain event. 
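One possible shape for such an event, sketched in TypeScript from the event fields listed in this section. The type and helper names (`DomainEvent`, `NextEvent`, `causedBy`) are hypothetical. The sketch also assumes the common convention that a follow-up event keeps its parent's correlation ID and records the parent's event ID as its causation ID; the document does not state that rule itself.

```typescript
// Hypothetical envelope; fields mirror the "Events should include" list.
type DomainEvent<P> = {
  eventId: string;
  type: string; // for example "TaskCreated" or "ReviewRequested"
  organizationId: string;
  projectId?: string;
  initiativeId?: string;
  taskId?: string;
  agentId: string;
  hatAssignmentId: string;
  ozRunId?: string;
  timestamp: string; // ISO 8601
  payload: P;
  correlationId: string;
  causationId?: string;
  traceId: string;
  spanId: string;
  policyVersion: string;
};

// The parts of a follow-up event that differ from its parent.
type NextEvent<Q> = {
  eventId: string;
  type: string;
  spanId: string;
  timestamp: string;
  payload: Q;
};

// Derive a follow-up event: context and correlation ID are inherited,
// and the parent's event ID becomes the causation ID (assumed convention).
function causedBy<P, Q>(parent: DomainEvent<P>, next: NextEvent<Q>): DomainEvent<Q> {
  return { ...parent, ...next, causationId: parent.eventId };
}
```

Keeping the envelope generic over the payload lets each event type carry its own typed body while audit and tracing code only depends on the shared fields.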
+ +Examples: + +```text +HatAssigned +HatTokenRefreshed +HatDeprovisioned +ProjectCreated +InitiativeCreated +TpmAssigned +TaskCreated +TaskMarkedReady +RedTestsSubmitted +GreenTestsSubmitted +ReviewRequested +ReviewApproved +QaIssueStillReproducible +QaSignedOff +MeetingOpened +VoteSubmitted +DecisionRecorded +MemoryAdaptationRequested +OzRunStarted +OzRunCompleted +CredentialScopeApproved +CapabilityRequestSubmitted +CapabilityRequestTriaged +CapabilityRequestApproved +CapabilityRequestRejected +CredentialProxyEndpointRequested +CredentialProxyEndpointApproved +WorkflowCapabilityRequested +WorkflowRegistered +ActorCapabilityRequested +ActorRegistered +McpToolCalled +PolicyEvaluated +TraceLinked +HealthReportCreated +AnomalyDetected +SelfHealingAttempted +SelfHealingSucceeded +SelfHealingEscalated +ObservabilityGapDetected +OrganizationalRuleMatched +OrganizationalRuleSkipped +ReactionPlanCreated +ReactionPlanExecuted +DurableTriggerFired +ScheduledJobClaimed +RuntimeLeaseAcquired +RuntimeLeaseExpired +DeadLetterCreated +DeadLetterReplayRequested +IncidentOpened +IncidentCommanderAssigned +SloErrorBudgetBurned +``` + +Events should include: + +- event ID; +- organization ID; +- project/initiative/task context when available; +- agent ID; +- active hat assignment ID; +- Oz run ID when available; +- timestamp; +- payload; +- correlation ID; +- causation ID; +- trace ID; +- span ID; +- policy version. + +## MCP Tool Implementation Pattern + +Every MCP tool should follow the same pattern. + +```text +1. Parse arguments. +2. Validate hat token. +3. Refresh/deny if token expired. +4. Resolve AgentSessionActor using session ID. +5. Load actor runtime context. +6. Load active HatAssignment from Organization state. +7. Build ToolExecutionContext. +8. Evaluate RBAC and OPA policy. +9. Validate domain preconditions. +10. Execute state transition or request. +11. Persist domain event and audit. +12. Emit NATS event if other agents should know. +13. 
Record tool activity back to AgentSessionActor. +14. Return structured result. +``` + +The MCP Gateway must not trust request-provided context as authority. Request context is only a lookup hint. The authoritative execution context comes from: + +- hat JWT validation; +- `AgentSessionActor`; +- active `HatAssignment`; +- Organization DB state; +- policy engine; +- tool-specific domain checks. + +### Actor-Resolved Tool Execution Context + +Every protected MCP tool should execute with a `ToolExecutionContext`. + +```text +ToolExecutionContext + agent_id + session_id + actor_id + hat_id + hat_assignment_id + department_id + project_id + initiative_id + task_id + team_id + meeting_id + oz_run_id + current_mode + memory_scopes + credential_scopes + allowed_tool_scopes + policy_version + trace_id + correlation_id + causation_id +``` + +`AgentSessionActor` should own live runtime context: + +```text +agentId +sessionId +activeHatAssignmentId +currentTaskId +currentTeamId +currentMeetingId +currentOzRunId +currentProjectId +currentInitiativeId +memoryScopes +credentialScopes +allowedToolScopes +policyVersion +lastHeartbeat +currentMode +``` + +Actor methods: + +```text +getRuntimeContext +recordHeartbeat +recordToolCallStarted +recordToolCallCompleted +setCurrentTask +setCurrentTeam +setCurrentMeeting +setMode +markRoleless +``` + +For task, meeting, team, or incident-scoped tools, the gateway may also query `TaskActor`, `MeetingActor`, `TeamRoomActor`, `IncidentActor`, or `HatSupplyActor` before policy evaluation. + +Actors provide hot session context. Organization DB remains authoritative for final state. 
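
The authority rule above can be sketched as follows. This is a hedged illustration, not the gateway's actual code: the `Stores` shape, the `resolveToolContext` function, the denial reason strings, and the choice to reject a mismatched task hint are all assumptions. It uses in-memory records in place of `AgentSessionActor` and the Organization DB, and it omits signature verification and OPA evaluation.

```typescript
interface HatTokenClaims {
  sub: string;
  sessionId: string;
  hatAssignmentId: string;
  exp: number; // seconds since epoch
}

interface ActorRuntimeContext {
  agentId: string;
  sessionId: string;
  activeHatAssignmentId: string | null;
  currentTaskId: string | null;
  allowedToolScopes: string[];
  policyVersion: string;
}

interface HatAssignment {
  id: string;
  agentId: string;
  hatId: string;
  status: "active" | "expired" | "deprovisioned" | "revoked";
}

interface ToolExecutionContext {
  agentId: string;
  sessionId: string;
  hatId: string;
  hatAssignmentId: string;
  taskId: string | null;
  allowedToolScopes: string[];
  policyVersion: string;
}

type Resolution =
  | { ok: true; context: ToolExecutionContext }
  | { ok: false; reason: string };

// Hypothetical in-memory stand-ins for the actor runtime and Organization DB.
interface Stores {
  actorsBySession: Record<string, ActorRuntimeContext | undefined>;
  assignmentsById: Record<string, HatAssignment | undefined>;
}

function resolveToolContext(
  claims: HatTokenClaims,
  requiredScope: string,
  hintTaskId: string | null, // request-provided; a lookup hint, never authority
  nowSeconds: number,
  stores: Stores
): Resolution {
  if (claims.exp <= nowSeconds) return { ok: false, reason: "refresh_required" };

  const actor = stores.actorsBySession[claims.sessionId];
  if (!actor || actor.agentId !== claims.sub) return { ok: false, reason: "unknown_session" };

  const assignment = stores.assignmentsById[claims.hatAssignmentId];
  if (!assignment || assignment.status !== "active") {
    return { ok: false, reason: "assignment_not_active" };
  }
  if (assignment.agentId !== claims.sub || actor.activeHatAssignmentId !== assignment.id) {
    return { ok: false, reason: "assignment_mismatch" };
  }
  if (actor.allowedToolScopes.indexOf(requiredScope) === -1) {
    return { ok: false, reason: "scope_denied" };
  }
  // Assumption for this sketch: a hint that disagrees with actor state is rejected.
  if (hintTaskId !== null && hintTaskId !== actor.currentTaskId) {
    return { ok: false, reason: "task_hint_mismatch" };
  }

  // Every field comes from the token, the actor, or stored state, not the request.
  return {
    ok: true,
    context: {
      agentId: actor.agentId,
      sessionId: actor.sessionId,
      hatId: assignment.hatId,
      hatAssignmentId: assignment.id,
      taskId: actor.currentTaskId,
      allowedToolScopes: actor.allowedToolScopes.slice(),
      policyVersion: actor.policyVersion,
    },
  };
}
```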
+ +Example: `assign_hat` + +```text +caller: Director or higher +checks: + - caller can assign target hat + - target agent exists + - hat supply available + - budget available + - agent is recommended or override reason supplied +effects: + - create HatAssignment + - issue short-lived token + - create MemoryActivation + - emit HatAssigned +``` + +Example: `complete_task` + +```text +caller: Implementer or owning hat +checks: + - task assigned to caller or caller has manager scope + - required red/green test evidence exists when required + - required artifacts exist +effects: + - task moves to code_review, not done + - review request is created + - reviewers are notified +``` + +Example: `qa_bounce_back` + +```text +caller: QA hat with review scope +checks: + - QA hat is assigned to task/release/project + - reproducibility report includes required evidence +effects: + - task moves to qa_reproducible or needs_rework + - defect/rework item is created or reopened + - owning TPM and Engineering Manager receive report +``` + +Example: `read_documentation_context` + +```text +caller: any active hat assigned to project/initiative/task scope +checks: + - caller has project/task visibility + - requested docs are within memory/documentation scope +effects: + - returns BRD, CA, ADRs, design docs, project skills, repo conventions, and required artifact expectations + - records DocumentationContextRead audit event +``` + +Example: `submit_adr` + +```text +caller: Architect, Architecture Reviewer, CTO, or approved Engineering Manager +checks: + - caller has architecture/documentation write scope + - ADR is linked to project/initiative/repo/service + - decision, context, options, consequences, and status are present +effects: + - creates ADR artifact + - links ADR to affected work + - opens architecture review gate if required + - emits AdrSubmitted +``` + +Example: `propose_project_skill` + +```text +caller: Engineering Manager, Memory Curator, QA Engineering Manager, Architect, 
or TPM +checks: + - caller has project scope + - frontmatter includes project/repository/allowed hats/owners/status/version + - skill does not grant tools or credentials outside hat policy +effects: + - creates ProjectSkill in proposed state + - schedules department review + - emits ProjectSkillProposed +``` + +Example: `submit_capability_request` + +```text +caller: any active hat +checks: + - caller has active assignment + - request is linked to project, initiative, task, incident, review, or department + - requested capability type is explicit + - evidence, limitation, expected benefit, and risk are present +effects: + - creates CapabilityRequest in submitted state + - routes to owning Engineering Manager or department manager + - emits CapabilityRequestSubmitted +``` + +Example: `review_capability_request` + +```text +caller: Engineering Manager, Department Director, Architect, Security Reviewer, Product Owner, or Executive depending on gate +checks: + - caller owns the required review gate + - request has required evidence and scope + - previous required gates are complete or explicitly waived by policy +effects: + - records CapabilityRequestReview + - advances request to next gate, implementation, backlog, initiative, rejected, or needs_clarification + - emits CapabilityRequestTriaged, CapabilityRequestApproved, or CapabilityRequestRejected +``` + +Example: `request_credential_proxy_endpoint` + +```text +caller: Engineering Manager, Security Manager, Director, or approved agent hat +checks: + - linked CapabilityRequest exists + - external API/system and data classification are documented + - requested operations, rate limits, audit requirements, and allowed hats are defined +effects: + - creates CredentialProxyEndpointRequest + - opens Security and Architecture gates + - emits CredentialProxyEndpointRequested +``` + +Example: `register_temporal_workflow` + +```text +caller: Platform Operator, Workflow Maintainer, Architect, or Director-approved Engineering 
Manager +checks: + - linked WorkflowCapabilityRequest exists + - workflow type, task queue, version, activities, signals, queries, and owners are documented + - deterministic workflow tests pass + - activities are idempotent and policy checked + - rollback/versioning plan is approved + - Security approval exists when workflow can launch agents, use credentials, or change protected state +effects: + - creates or updates WorkflowRegistry entry + - enables rule/trigger launch only for approved scopes + - emits WorkflowRegistered +``` + +MVP tool scope should be deliberately small. + +Start with tools that prove policy and state: + +```text +read_assignment_status +refresh_hat_token +send_report +read_inbox +submit_artifact +request_review +submit_review +read_run_status +``` + +Then add work-management tools: + +```text +create_project +create_backlog_item +convert_backlog_item +create_initiative +assign_hat +create_task +groom_task +mark_ready +complete_task +``` + +Defer wide tool catalogs until the first vertical slice proves authorization, audit, state transitions, and Hermes/Oz integration. + +## Hat Tokens + +Hat tokens are short-lived JWTs. + +Claims: + +```json +{ + "sub": "agent-id", + "hat_id": "hat-id", + "hat_assignment_id": "assignment-id", + "department_id": "department-id", + "project_id": "project-id", + "session_id": "session-id", + "oz_run_id": "oz-run-id", + "tool_scopes": ["task.write", "meeting.open"], + "memory_scopes": ["project:abc", "hat:engineering-manager"], + "credential_scopes": ["gitlab.read"], + "policy_version": "2026-05-25.1", + "exp": 1234567890, + "jti": "token-id" +} +``` + +Refresh flow: + +```text +refresh_hat_token + -> load assignment + -> verify assignment active + -> verify hat supply/budget still valid + -> verify no deprovision/revocation + -> issue new token or return roleless state +``` + +Do not rely only on JWT claims. Services must re-check active assignment for sensitive operations. 
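
The refresh flow can be sketched as a pure function over stored assignment state. This is an illustration under stated assumptions: `refreshHatToken`, `RefreshOptions`, the `budgetRemaining` field, and the reason strings are hypothetical, and the function only builds the claim set. Signing the JWT and loading the assignment from the Organization DB are left out.

```typescript
type AssignmentStatus =
  | "requested" | "approved" | "active" | "refreshing"
  | "expired" | "deprovisioned" | "revoked";

interface HatAssignment {
  id: string;
  agentId: string;
  hatId: string;
  status: AssignmentStatus;
  toolScopes: string[];
  budgetRemaining: number; // hypothetical stand-in for the supply/budget check
}

interface HatClaims {
  sub: string;
  hat_id: string;
  hat_assignment_id: string;
  tool_scopes: string[];
  policy_version: string;
  exp: number;
  jti: string;
}

type RefreshResult =
  | { state: "refreshed"; claims: HatClaims }
  | { state: "roleless"; reason: string };

interface RefreshOptions {
  nowSeconds: number;
  ttlSeconds: number;
  policyVersion: string;
  jti: string;
}

function refreshHatToken(
  assignment: HatAssignment | undefined,
  opts: RefreshOptions
): RefreshResult {
  // Load assignment: the caller passes undefined when no record exists.
  if (!assignment) return { state: "roleless", reason: "assignment_not_found" };

  // Verify the assignment is active; this also covers deprovision and revocation,
  // because those are modeled here as non-active statuses.
  if (assignment.status !== "active") {
    return { state: "roleless", reason: "assignment_" + assignment.status };
  }

  // Verify hat supply/budget is still valid.
  if (assignment.budgetRemaining <= 0) {
    return { state: "roleless", reason: "budget_exhausted" };
  }

  // Issue fresh claims from stored state, never from the previous token.
  return {
    state: "refreshed",
    claims: {
      sub: assignment.agentId,
      hat_id: assignment.hatId,
      hat_assignment_id: assignment.id,
      tool_scopes: assignment.toolScopes.slice(),
      policy_version: opts.policyVersion,
      exp: opts.nowSeconds + opts.ttlSeconds,
      jti: opts.jti,
    },
  };
}
```

Because the new claims are rebuilt from the stored assignment, a scope removed since the last issuance simply disappears from the next token.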
+ +## Agent Fit and Hindsight + +Hindsight should help rank agents for hats, but the Organization should decide assignments. + +Agent fit inputs: + +- memories attributed to the agent; +- memories attributed to the agent while wearing the target hat; +- project/domain memory; +- prior performance reviews; +- prior QA bounce-backs; +- cost/runtime history; +- review approval/rejection history; +- tool reliability; +- current workload. + +Initial tools: + +```text +rank_agents_for_hat +read_agent_specialties +read_agent_memory_profile +read_hat_performance_history +recommend_hat_assignment +``` + +Implementation approach: + +```text +Organization query + -> read structured performance data from CockroachDB + -> query Hindsight with metadata filters + -> combine into ranked recommendation + -> require director/executive approval for assignment +``` + +## Documentation Context and Project Skills + +Every Hermes run should receive a scoped documentation context before meaningful work begins. + +Context inputs: + +- project; +- initiative; +- mission; +- work item; +- task; +- repository; +- service or component; +- active hat; +- assigned agent; +- gate being executed. + +Context output: + +- BRDs; +- CAs; +- ADRs; +- design docs; +- product rules; +- repo conventions; +- project skills; +- required artifacts; +- known risks; +- relevant memories. + +Runtime flow: + +```text +task or agent run starts + -> DocumentationContextService resolves docs and skills + -> MemoryAdapter resolves scoped memories + -> Organization MCP Gateway exposes read_documentation_context + -> Hermes prompt receives concise context summary and artifact links + -> Hermes can fetch full docs, skills, and evidence through scoped tools +``` + +Reviewers receive the same context plus gate-specific checklists. QA hats receive acceptance criteria, product workflows, test strategy, known regression areas, and prior reproducibility reports. 
+ +### Project Skill Files + +Project and repository skills should live in deterministic paths: + +```text +projects//skills//SKILL.md +projects//repos//skills//SKILL.md +``` + +Skill files should include frontmatter that is parseable by the ingestion pipeline: + +```yaml +id: repo-build-and-test +name: Repo Build and Test Workflow +scope: + project: project-id + initiative: optional-initiative-id + repositories: + - repo-name +departments: + - engineering +allowedHats: + - developer + - reviewer + - engineering-manager +triggers: + - task.ready + - review.requested +requiredTools: + - git.read + - test.run +requiredArtifacts: + - test-evidence +owners: + - engineering-manager-hat-id +status: active +version: 1 +``` + +Important graph edges: + +```text +Hat -> can_use -> Skill +Skill -> applies_to -> Project +Skill -> applies_to -> Repository +Skill -> references -> DocumentationArtifact +Skill -> informed_by -> Memory +Task -> used -> Skill +Agent -> succeeded_with -> Skill +Review -> failed_due_to_missing -> Skill +``` + +The first implementation can store these relationships in `skill_graph_edges`. A dedicated graph database can be introduced later if traversal across projects, memories, skills, hats, and outcomes becomes a bottleneck. 
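
As a rough sketch of what rows in `skill_graph_edges` might look like and how a simple lookup could work without a graph database: the `SkillGraphEdge` column names and the `skillsForHatInProject` helper are assumptions made for illustration. Only the edge kinds come from the list above.

```typescript
type EdgeKind =
  | "can_use" | "applies_to" | "references" | "informed_by"
  | "used" | "succeeded_with" | "failed_due_to_missing";

// Hypothetical row shape for skill_graph_edges.
interface SkillGraphEdge {
  fromType: string;
  fromId: string;
  kind: EdgeKind;
  toType: string;
  toId: string;
}

// Skills that a hat can use and that also apply to the given project.
// Two passes over a flat edge list stand in for a two-hop graph traversal.
function skillsForHatInProject(
  edges: SkillGraphEdge[],
  hatId: string,
  projectId: string
): string[] {
  const usable = edges
    .filter(function (e) {
      return e.kind === "can_use" && e.fromType === "Hat" && e.fromId === hatId && e.toType === "Skill";
    })
    .map(function (e) { return e.toId; });

  return usable
    .filter(function (skillId) {
      return edges.some(function (e) {
        return e.kind === "applies_to" && e.fromType === "Skill" && e.fromId === skillId
          && e.toType === "Project" && e.toId === projectId;
      });
    })
    .sort();
}
```

In a relational store the same lookup would likely be a self-join on the edge table, which is usually adequate for one or two hops.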
+ +### Documentation Enforcement Commands + +Documentation and skill lifecycle tools: + +```text +read_documentation_context +submit_brd +approve_brd +submit_ca +approve_ca +submit_adr +approve_adr +submit_design_doc +link_documentation_to_work +propose_project_skill +approve_project_skill +ingest_project_skill +read_project_skills +``` + +Guardrails: + +- implementers cannot start gated work without reading the documentation context; +- reviewers cannot approve gated work without linked documentation and gate evidence; +- architecture-risk work cannot pass without CA or ADR context; +- Product Owner hats sign off product and business documentation; +- Engineering Manager or department owner hats approve project and repo skills; +- skill ingestion cannot grant tools, credentials, memory scopes, or voting scope beyond the active hat policy. + +## Oz Integration + +Oz should be wrapped behind `OzAdapter`. + +Initial adapter operations: + +```text +createRun(spec) +cancelRun(runId) +getRunStatus(runId) +listChildRuns(parentRunId) +getRunArtifacts(runId) +getRunLogs(runId) +``` + +An Organization run spec should include: + +- Hermes profile; +- active hat assignment; +- project/initiative/task context; +- MCP gateway URL; +- NATS subject prefix; +- credential proxy URL; +- memory adapter URL; +- workspace/repo configuration; +- resource limits; +- budget metadata; +- parent run ID. + +Oz is allowed to run containers. Oz does not decide Organization state transitions. + +## NATS and Messaging + +Use NATS/JetStream for agent messages and report/event transport. + +Recommended subjects: + +```text +org..project..events +org..initiative..events +org..team..broadcasts +org..agent..inbox +org..hat..inbox +org..department..reports +org..executive.escalations +``` + +Use JetStream for durable inboxes, reports, and required task/review events. + +Use ephemeral NATS messages for live progress updates and UI streaming. 
+ +Persist important messages in CockroachDB before or while publishing to NATS. + +Every NATS message should include: + +- message ID; +- idempotency key; +- correlation ID; +- causation ID; +- organization ID; +- source agent ID; +- source hat assignment ID; +- linked project/initiative/task when available; +- event type; +- schema version. + +NATS failure behavior: + +- if NATS is unavailable, protected state changes should still persist and enqueue an outbox record; +- a publisher worker should retry outbox delivery; +- consumers must be idempotent; +- poison messages should move to a dead-letter stream with a report to Operations/DevOps. + +## Meetings + +Meetings should be implemented as first-class entities, not only chat transcripts. + +Meeting state: + +```text +requested +scheduled +open +in_discussion +in_vote +decision_recorded +closed +cancelled +``` + +Meeting fields: + +- purpose; +- organizer hat assignment; +- participants; +- hierarchy scope; +- conversation mode; +- linked project/initiative/task; +- agenda; +- transcript; +- decisions; +- votes; +- artifacts; +- memory outputs. + +Conversation modes should be enforced by the Meeting Service at the turn-routing level. + +First MVP can support: + +- leader-led; +- round-robin; +- vote-driven. + +Add pass-the-stick and reviewer-panel later. + +## Workflow State Machines + +Every workflow state change should be implemented as a permissioned command. + +Each transition should define: + +- source states; +- destination state; +- required hat authority; +- required artifacts/evidence; +- blocking conditions; +- emitted events; +- inbox/report destinations; +- escalation path on failure. 
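
One way to make these transition definitions enforceable is to store them as data and evaluate every command against them. The sketch below is hypothetical: `TransitionDefinition`, `CommandRequest`, and `applyTransition` are illustrative names, the `mark_task_ready` values are taken from this document's own example, and task-scope checks and escalation routing are omitted.

```typescript
interface TransitionDefinition {
  command: string;
  sourceStates: string[];
  destination: string;
  requiredHats: string[];
  requiredArtifacts: string[];
  emittedEvents: string[];
  escalation: string;
}

const markTaskReady: TransitionDefinition = {
  command: "mark_task_ready",
  sourceStates: ["discovery", "intake"],
  destination: "ready",
  requiredHats: ["Engineering Manager", "TPM"], // task-scope check not modeled here
  requiredArtifacts: ["acceptance criteria", "required hats", "risk", "memory context"],
  emittedEvents: ["TaskMarkedReady"],
  escalation: "Director",
};

interface CommandRequest {
  callerHat: string;
  currentState: string;
  artifacts: string[];
  blockers: string[]; // unresolved blocking conditions, if any
}

type CommandResult =
  | { ok: true; newState: string; events: string[] }
  | { ok: false; reason: string; details: string[] };

function applyTransition(def: TransitionDefinition, req: CommandRequest): CommandResult {
  if (def.sourceStates.indexOf(req.currentState) === -1) {
    return { ok: false, reason: "invalid_source_state", details: [req.currentState] };
  }
  if (def.requiredHats.indexOf(req.callerHat) === -1) {
    return { ok: false, reason: "hat_not_authorized", details: [req.callerHat] };
  }
  if (req.blockers.length > 0) {
    return { ok: false, reason: "blocked", details: req.blockers.slice() };
  }
  const missing = def.requiredArtifacts.filter(function (a) {
    return req.artifacts.indexOf(a) === -1;
  });
  if (missing.length > 0) {
    return { ok: false, reason: "missing_artifacts", details: missing };
  }
  return { ok: true, newState: def.destination, events: def.emittedEvents.slice() };
}
```

Keeping the definition as data means a new command is a new record rather than a new code path, which fits the table format used in this section.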
+ +Transition table format: + +```text +Command: mark_task_ready +Source states: discovery, intake +Destination: ready +Required hat: Engineering Manager or TPM with task scope +Required artifacts: acceptance criteria, required hats, risk, memory context +Blocks when: missing BRD/CA gate for gated work, unresolved blocker, no owner +Events: TaskMarkedReady +Inbox: assigned TPM, owning Engineering Manager +Escalation: Director if task cannot be readied due to missing hats/budget +``` + +### Service Request + +```text +submitted + -> classified + -> needs_clarification + -> triaged + -> backlog_item_created + -> initiative_candidate + -> closed +``` + +### Defect + +```text +reported + -> reproducing + -> reproducible + -> prioritized + -> assigned + -> red_test_written + -> fixed + -> code_review + -> qa_review + -> qa_reproducible + -> needs_rework + -> qa_signed_off + -> done +``` + +### Initiative + +```text +proposed + -> executive_triage + -> discovery + -> business_approved + -> architecture_approved + -> planned + -> active + -> delivery_review + -> qa_signoff + -> released + -> complete +``` + +### Hat Assignment + +```text +requested + -> approved + -> active + -> refreshing + -> expired + -> deprovisioned + -> revoked +``` + +### Meeting + +```text +requested + -> scheduled + -> open + -> decision_pending + -> decision_recorded + -> closed +``` + +## Gate Contracts + +Gates are enforceable contracts, not advice. + +Each gate should define: + +- approving hat; +- required input artifacts; +- required evidence; +- pass criteria; +- failure criteria; +- rejection destination state; +- emitted events; +- audit requirements. + +### BRD Gate + +Approving hats: + +- Product Owner; +- Business Approver. + +Required artifacts: + +- BRD; +- customer/user context or source report; +- open question list; +- acceptance criteria. 
+ +Pass criteria: + +- requirements are understandable; +- business rules are documented; +- acceptance criteria are testable; +- Product Owner signs off. + +Failure destination: + +- `needs_clarification` +- `needs_business_approval` + +### CA Gate + +Approving hats: + +- Architecture Reviewer; +- Chief Architect for high-risk work. + +Required artifacts: + +- CA document; +- BRD or business context; +- current-system notes; +- risks and non-goals. + +Pass criteria: + +- integration boundaries are clear; +- design supports business intent; +- risks and constraints are documented; +- implementation scope is clear enough for planning. + +Failure destination: + +- `needs_architecture` + +### ADR / Documentation Gate + +Approving hats: + +- Architecture Reviewer; +- Engineering Manager for repo/process docs; +- Product Owner for product docs; +- Memory Curator for skill/memory-facing docs. + +Required artifacts: + +- linked ADR, design doc, BRD, CA, or documented no-doc decision; +- project/initiative/repo scope; +- owner; +- status; +- version. + +Pass criteria: + +- work has the correct project-scoped documentation context; +- structural decisions are recorded as ADRs; +- repo/project conventions are linked when relevant; +- reviewers and implementers can read the same source of truth; +- stale or missing docs are explicitly tracked. + +Failure destination: + +- `needs_documentation_update` +- `needs_architecture` +- `needs_business_approval` + +### TDD Gate + +Approving hats: + +- Engineering Manager; +- Code Reviewer may verify during review. + +Required artifacts: + +- red test artifact; +- command output proving the test failed before implementation; +- link to task acceptance criteria. + +Pass criteria: + +- failing test represents the scenario; +- test is strict enough to catch the defect or feature gap; +- test was produced before green implementation evidence. 
+ +Failure destination: + +- `needs_rework` + +### Code Review Gate + +Approving hats: + +- Code Reviewer with project/task scope. + +Required artifacts: + +- diff summary; +- changed files; +- red/green test evidence; +- implementation notes; +- known risks. + +Pass criteria: + +- code satisfies task scope; +- tests pass; +- no unresolved review blockers; +- no unauthorized credential/tool changes. + +Failure destination: + +- `review_rejected` +- `needs_rework` + +### QA Gate + +Approving hats: + +- QA Reviewer. + +Required artifacts: + +- test run evidence; +- browser automation evidence when relevant; +- screenshots/traces/logs for user workflows; +- reproducibility report when issue persists. + +Pass criteria: + +- original issue is no longer reproducible; +- acceptance criteria pass; +- critical workflow evidence is attached. + +Failure destination: + +- `qa_reproducible` +- `needs_rework` + +### Delivery Gate + +Approving hats: + +- Delivery Reviewer; +- Release Operator when release impact exists. + +Required artifacts: + +- code review approval; +- QA signoff; +- merge/release evidence; +- linked artifacts. + +Pass criteria: + +- all required upstream gates are approved; +- release or merge action is auditable; +- owning TPM and initiative are updated. + +Failure destination: + +- `delivery_blocked` + +## Permission Matrix Starter + +The implementation should start with a table-driven permission model. 
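
A table-driven model might look like the following sketch. The `HatPermissions` shape, the gate identifiers such as `code_review`, and the `canApproveGate` helper are assumptions for illustration. The four rows are a simplified reading of the starter matrix in this section, not a complete policy.

```typescript
// Hypothetical row shape; a real policy row would also carry scope columns.
interface HatPermissions {
  assignsHats: boolean;
  approvesGates: string[];
  createsTasks: "yes" | "limited" | "no";
}

// Permissions stored as data keyed by hat name.
const permissionTable: Record<string, HatPermissions> = {
  "Product Owner": { assignsHats: false, approvesGates: ["brd"], createsTasks: "yes" },
  "Implementer": { assignsHats: false, approvesGates: [], createsTasks: "limited" },
  "Code Reviewer": { assignsHats: false, approvesGates: ["code_review"], createsTasks: "no" },
  "QA Reviewer": { assignsHats: false, approvesGates: ["qa"], createsTasks: "no" },
};

// A single lookup replaces per-hat conditionals; unknown hats are denied by default.
function canApproveGate(
  table: Record<string, HatPermissions>,
  hat: string,
  gate: string
): boolean {
  const row = table[hat];
  return row !== undefined && row.approvesGates.indexOf(gate) !== -1;
}
```

Because the table is plain data, it could later be loaded from a database or compiled into policy-engine input without changing callers.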
+ +Initial hats: + +| Hat | Assigns Hats | Approves Gates | Creates Tasks | Votes | Credential Scope | Memory Scope | +|---|---:|---:|---:|---:|---|---| +| Executive Board | yes, high-power | executive gates | yes | organization | policy-defined | organization | +| CEO | directors/executives | priority gates | yes | organization | limited by policy | organization | +| CTO | technical directors | CA/high-risk tech | yes | technical | technical scopes | technical/org | +| COO | operations directors | operating standards | yes | operations | ops scopes | operations/org | +| Department Director | TPMs/managers | department gates | yes | department | department scopes | department | +| TPM | team hats by initiative scope | initiative readiness | yes | initiative | initiative scopes | initiative | +| Engineering Manager | implementer/reviewer recommendations | TDD/readiness/outcome | yes | team/initiative | limited | team/project | +| Product Owner | no | BRD/product signoff | yes | product | none/default | product/project | +| Business Analyst | no | BRD draft readiness | yes | business | none/default | business/project | +| Architect | no | CA draft/readiness | yes | architecture | read-only technical | architecture/project | +| Implementer | no | no | limited/subtasks | task | task-limited | task/project | +| Code Reviewer | no | code review | no | review scope | read-only | task/project | +| QA Reviewer | no | QA signoff | no | QA scope | browser/test scopes | QA/project | +| Security Reviewer | no | credential/tool gates | yes | security | security scopes | security/org | +| Memory Curator | no | memory changes | yes | memory scope | none/default | memory/org | + +The final implementation should store this as policies, not hard-coded `if` statements. + +## Escalation Policies + +Escalations should be typed. 
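
One reading of "typed" is that each trigger is a closed set of values and each policy is a record, so routing becomes a lookup. In this sketch the trigger identifiers, the `routeEscalation` helper, and every `timeoutMinutes` value are hypothetical. The recipients, modes, and fallbacks follow the initial escalation policies in this section.

```typescript
type EscalationTrigger =
  | "missing_ca"
  | "skipped_tdd"
  | "failed_code_review";

interface EscalationPolicy {
  trigger: EscalationTrigger;
  recipient: string;
  mode: string;
  fallback: string;
  timeoutMinutes: number; // hypothetical SLA values, not from the design
}

const escalationPolicies: EscalationPolicy[] = [
  { trigger: "missing_ca", recipient: "Architecture Director", mode: "report", fallback: "CTO", timeoutMinutes: 240 },
  { trigger: "skipped_tdd", recipient: "Engineering Manager", mode: "report", fallback: "TPM / Director", timeoutMinutes: 120 },
  { trigger: "failed_code_review", recipient: "Engineering Manager", mode: "thread", fallback: "TPM", timeoutMinutes: 60 },
];

interface EscalationRoute {
  recipient: string;
  mode: string;
  usedFallback: boolean;
}

// Picks the primary recipient, or the fallback authority once the timeout has passed.
function routeEscalation(
  policies: EscalationPolicy[],
  trigger: EscalationTrigger,
  minutesUnanswered: number
): EscalationRoute | null {
  for (let i = 0; i < policies.length; i += 1) {
    const policy = policies[i];
    if (policy && policy.trigger === trigger) {
      const usedFallback = minutesUnanswered >= policy.timeoutMinutes;
      return {
        recipient: usedFallback ? policy.fallback : policy.recipient,
        mode: policy.mode,
        usedFallback: usedFallback,
      };
    }
  }
  return null; // no policy registered for this trigger
}
```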
+ +Each escalation policy should define: + +- trigger; +- severity; +- recipient inbox; +- required chat/meeting mode; +- SLA/timeout; +- fallback authority; +- emitted events. + +Initial escalation policies: + +| Trigger | Recipient | Mode | Fallback | +|---|---|---|---| +| unclear requirements | Product Owner / BA | one-on-one or meeting | Director | +| missing BRD | Product Director | report | Executive Board | +| missing CA | Architecture Director | report | CTO | +| skipped TDD | Engineering Manager | report | TPM / Director | +| failed code review | Engineering Manager | thread | TPM | +| QA issue still reproducible | Engineering Manager + TPM | report + meeting | Director | +| blocked hat supply | Director | report | C-suite | +| credential request | Security Reviewer | report | Security Director / Executive Board | +| delivery risk | TPM + Delivery Director | meeting | Executive Board | +| executive priority conflict | Executive Board | executive-session | CEO vote/board vote | + +## Runtime Topology + +Local k3s topology: + +```text +k3s + organization-api + organization-mcp-gateway + postgres + nats-jetstream + credential-proxy-stub + memory-adapter-stub + oz-worker / oz-runner integration + hermes-session pods + Cilium Service Mesh, Gateway API, Hubble, and SPIRE workload identity +``` + +First runtime proof: + +```text +one goal + -> one Oz run + -> one Hermes container + -> one MCP call + -> one persisted artifact + -> one correlated trace +``` + +Deployment stages: + +- local k3s; +- shared staging; +- production-like cluster. + +Promotion criteria: + +- all protected MCP tools enforce hat authorization; +- Hermes containers receive no raw credentials; +- Oz run IDs map to Organization sessions; +- NATS messages are idempotent; +- artifacts are persisted and linked; +- traces connect API request, Oz run, Hermes session, MCP call, NATS event, and artifact write. 
+ +## Component Contracts + +### Organization API + +Owns synchronous application APIs and internal services. + +Must provide: + +- CRUD for core Organization entities; +- state transition commands; +- policy checks; +- audit events; +- event outbox. + +### MCP Gateway + +Owns agent-facing tools. + +Must provide: + +- JWT validation; +- live hat assignment validation; +- OPA/RBAC policy checks; +- schema validation; +- domain service invocation; +- audit log; +- structured tool result. + +### Oz Adapter + +Owns integration with Oz. + +Must provide: + +- create run; +- cancel run; +- get run status; +- list child runs; +- fetch logs/artifacts; +- persist Oz run bindings. + +### Hermes Session Container + +Must receive: + +- Hermes profile; +- active hat assignment; +- MCP gateway URL; +- NATS subject prefix; +- credential proxy URL; +- memory adapter URL; +- Organization correlation IDs; +- resource limits. + +Must not receive: + +- broad raw credentials; +- unscoped memory access; +- unrestricted MCP tools. + +### Credential Proxy + +Must provide: + +- scoped token exchange; +- denied-operation reporting; +- audit events; +- revocation behavior; +- policy version checks. + +### Memory Adapter + +Must provide: + +- scoped memory recall; +- memory write attribution; +- metadata filtering; +- visibility enforcement; +- fallback behavior when Hindsight cannot satisfy a scoped query. + +## Failure and Recovery Matrix + +| Failure | Required Behavior | +|---|---| +| Oz unavailable | Persist requested run as pending, report to Operations/TPM, retry or escalate. | +| Oz run starts but no callback | Poll Oz status, mark run uncertain after timeout, escalate. | +| Hermes pod crashes | Mark AgentSession interrupted, preserve Oz logs, allow retry if assignment active. | +| NATS unavailable | Persist outbox event, retry publisher, do not lose Organization state. | +| NATS message replayed | Use idempotency key and ignore duplicate transition. 
| +| Hat token expired | Tool call returns refresh-required or roleless state. | +| Hat assignment deprovisioned | Deny protected calls, notify agent and owning manager. | +| Credential proxy denies request | Return structured denial, create security/report event if unexpected. | +| Memory adapter unavailable | Continue with explicit degraded-memory warning; block only if task requires memory gate. | +| Partial artifact write | Mark artifact incomplete, retry or require resubmission. | +| Stale assignment in JWT | Re-check Organization state and deny. | +| Policy version changed | Require token refresh and re-evaluate tool scope. | + +## Observability Contract + +Observability is a core runtime contract. Agents should build Organization infrastructure, project features, skills, and internal tools as observable systems by default. + +Every run should have a correlation chain: + +```text +Organization request + -> trace ID + -> span ID + -> command ID + -> event ID + -> Oz run ID + -> k3s pod ID + -> Hermes session ID + -> Hermes turn ID + -> MCP call ID + -> NATS message ID + -> credential proxy request ID + -> artifact ID +``` + +Every persisted record that can participate in work should carry correlation metadata: + +```text +trace_id +span_id +causation_id +correlation_id +request_id +command_id +event_id +project_id +initiative_id +task_id +agent_id +hat_id +hat_assignment_id +session_id +oz_run_id +pod_id +policy_version +``` + +Required logs: + +- state transition logs; +- MCP tool audit logs; +- hat token refresh/denial logs; +- Oz run lifecycle logs; +- NATS publish/consume logs; +- credential proxy allow/deny logs; +- artifact submission logs; +- prompt/run boundary logs; +- Hermes turn start/end logs; +- subagent spawn/stop logs; +- meeting message and vote logs; +- documentation context read logs; +- memory read/write logs with scope metadata; +- skill ingestion and skill usage logs; +- policy evaluation logs; +- gate evaluation logs; +- 
retry/backoff/dead-letter logs; +- self-healing attempt logs. + +Required traces: + +- user or Oz goal intake through work creation; +- hat assignment through token issuance; +- Hermes session launch through first MCP call; +- every MCP tool call through policy check, domain service, database write, event publish, and artifact write; +- NATS publish through consumer handling and idempotency decision; +- credential proxy request through policy decision and upstream call; +- memory query through Hindsight adapter filters and returned memory IDs; +- documentation context resolution through selected docs, skills, and gate requirements; +- QA run through browser automation, screenshots, traces, and reproducibility decision; +- self-healing detection through attempted fix, verification, and escalation. + +Required metrics: + +- active Hermes sessions by project, initiative, hat, agent, cluster, and pod; +- task lead time and cycle time by state; +- gate pass/fail rate by gate type and reviewer hat; +- QA reproducibility and bounce-back rate; +- review rejection reasons; +- memory hit/miss and memory usefulness feedback; +- skill usage and skill success/failure rate; +- policy allow/deny rate; +- credential proxy allow/deny rate; +- MCP tool latency and failure rate; +- Oz run queue time, run time, crash rate, retry rate, and orphan rate; +- NATS consumer lag, dead-letter count, replay count, and duplicate suppression count; +- budget and token usage by project, initiative, hat, agent, and run; +- self-healing success, failure, and escalation rate. 
+ +Telemetry storage should be split by purpose: + +```text +Organization DB + authoritative state, state transitions, audit events, gate results + +Event store / outbox + append-only domain events and replayable integration events + +Trace backend + distributed traces and span attributes + +Log backend + structured logs and raw execution logs + +Metrics backend + time-series metrics and SLOs + +Artifact store + screenshots, browser traces, test output, reports, transcripts, run bundles + +Graph projection + queryable relationships between agents, hats, skills, docs, memories, tasks, runs, and outcomes +``` + +The UI should read Organization state as truth and use traces/logs/artifacts as evidence. Agents should never need to scrape logs to understand normal state, but logs and traces must be rich enough to debug every abnormal state. + +### Agent Observability Standard + +Every agent-created feature or internal tool must include an observability plan before it is considered ready. + +Minimum implementation checklist: + +- structured logs at lifecycle boundaries; +- trace spans around external calls, tool calls, policy checks, and long-running work; +- domain events for state transitions; +- metrics for throughput, latency, failures, and cost where relevant; +- artifact capture for human-reviewable evidence; +- correlation IDs passed through all service calls and NATS messages; +- UI-visible status and failure reasons; +- self-healing or escalation behavior for known failure modes. + +Reviewers should reject infrastructure work that cannot answer: + +```text +What happened? +Who or what caused it? +Which hat and policy allowed it? +Which project/initiative/task did it affect? +What evidence was produced? +What failed? +Was it retried? +Was it healed, escalated, or left blocked? +How would a future agent learn from this? +``` + +### Self-Healing Feedback Loop + +Self-healing should be evidence-driven, not magical. 
+ +```text +anomaly detected + -> classify by known failure mode + -> inspect correlated traces/logs/artifacts/state + -> attempt approved remediation if policy allows + -> verify with explicit check + -> record self-healing result + -> create report/backlog item if unresolved or recurring + -> feed outcome into memory, skill, and performance review systems +``` + +Examples: + +- stuck Oz run creates a run-health report, retries when safe, escalates to DevOps if retry fails; +- repeated MCP timeout creates an infrastructure reliability report and links affected tasks; +- QA bounce-backs with the same cause create a proposed project skill or test tooling backlog item; +- repeated memory misses create a memory adaptation request; +- frequent policy denials create either a security review request or a skill documentation update. + +Required dashboards: + +- active Oz runs; +- active hat assignments; +- hat token denials; +- task state distribution; +- review queue; +- QA reproducible failures; +- NATS dead-letter stream; +- credential proxy denials; +- cost/budget by project/initiative/hat; +- trace error rate by component; +- MCP tool latency/failure rate; +- Hermes session crash/retry/orphan rate; +- self-healing attempts and outcomes; +- recurring failure modes by project/repo/hat; +- missing observability coverage by project and component. + +## Always-On Runtime Mechanics + +The Organization needs persistent workers that keep it operating when no Hermes agent is actively reasoning. + +Detailed mechanics live in [Always-On Orchestration Runtime](./ALWAYS_ON_ORCHESTRATION_RUNTIME.md). + +Initial control-plane workers: + +- scheduler worker; +- rule evaluation worker; +- reaction executor worker; +- outbox publisher worker; +- NATS consumer worker; +- Oz reconciler worker; +- k3s pod/session watchdog; +- lease reaper; +- dead-letter worker; +- trigger worker; +- anomaly classifier; +- budget and capacity worker; +- observability coverage worker. 
+ +### Rules and Reactions + +State changes should be evaluated by explicit rules. + +```text +domain event + -> durable trigger or direct rule evaluation + -> matched OrganizationalRule records + -> deterministic ReactionPlan + -> policy, budget, lease, and hat supply checks + -> side effects executed + -> audit, trace, metrics, and resulting events recorded +``` + +Rules should create reaction plans, not perform side effects directly. + +Reaction actions can include: + +- state transition; +- hat reservation or assignment; +- Oz run request; +- message/inbox notification; +- meeting request; +- escalation; +- report creation; +- backlog item creation; +- self-healing attempt; +- no-op with recorded reason. + +### Durable Triggers + +Triggers should support: + +- event-based triggers; +- state-based triggers; +- state-timeout triggers; +- scheduled triggers; +- threshold triggers; +- external watcher triggers. + +Each trigger needs scope, owner, predicate, policy requirements, dedupe key, retry policy, cooldown, budget policy, enabled state, and version. + +### Runtime Leases + +Every scheduled job claim, reaction execution, watcher checkpoint write, dead-letter replay, and self-healing remediation should be protected by a runtime lease with a fencing token. + +Duplicate execution must be safe through both leases and idempotency keys. + +### Scheduler Semantics + +`scheduled_jobs` should additionally track: + +- timezone; +- jitter; +- last run time; +- locked until; +- max runtime; +- misfire policy; +- concurrency policy; +- catch-up policy; +- schedule version. + +Misfires should either skip, run once, catch up within a limit, or escalate. + +### Watchers and Reconcilers + +Watchers observe external systems. Reconcilers repair or report drift. 
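
Watcher checkpoint writes are among the operations that the Runtime Leases section protects with a fencing token. A minimal in-memory sketch of that idea follows. `LeaseIssuer` and `CheckpointStore` are hypothetical names, and a real implementation would presumably keep the token counter and the checkpoint in the Organization DB rather than in process memory.

```typescript
// Hypothetical sketch: names and in-memory storage are assumptions for illustration.
interface Lease {
  holder: string;
  fencingToken: number; // strictly increasing across grants
}

class LeaseIssuer {
  private nextToken = 1;

  // Grants a new lease; a later grant always carries a larger token.
  grant(holder: string): Lease {
    const lease: Lease = { holder, fencingToken: this.nextToken };
    this.nextToken += 1;
    return lease;
  }
}

class CheckpointStore {
  private highestTokenSeen = 0;
  private checkpoint: string | undefined;

  // Accepts the write only if the lease is at least as new as any seen so far.
  write(lease: Lease, value: string): boolean {
    if (lease.fencingToken < this.highestTokenSeen) {
      return false; // stale holder: its lease has been superseded
    }
    this.highestTokenSeen = lease.fencingToken;
    this.checkpoint = value;
    return true;
  }

  read(): string | undefined {
    return this.checkpoint;
  }
}
```

The check lives in the store, so a watcher that paused and resumed after its lease was reaped cannot overwrite a newer checkpoint even if it still believes it holds the lease. Idempotency keys remain necessary for side effects the store does not guard.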
+ +Initial watchers and reconcilers: + +- Oz run watcher/reconciler; +- k3s pod/session watcher; +- NATS stream health watcher; +- credential proxy denial watcher; +- Hindsight memory health watcher; +- telemetry ingestion watcher; +- Git/CI watcher; +- documentation repository watcher. + +Reconciliation should detect pending runs not launched, orphaned pods, silent Hermes sessions, stale hat assignments, stuck outbox events, dead-letter growth, missing artifacts, and schedules not firing. + +### SLOs and Incidents + +Always-on operation needs SLOs and incident rules. + +Initial SLO categories: + +- Organization API; +- MCP gateway; +- Oz launch/callback; +- Hermes heartbeat; +- NATS publish/consume lag; +- scheduler lag; +- trigger lag; +- credential proxy; +- Hindsight adapter; +- telemetry ingestion; +- self-healing success/escalation latency. + +Incidents should have severity, commander assignment, communication cadence, mitigation, resolution, postmortem, and follow-up backlog items. + +### Runbooks as Skills + +Recurring remediation procedures should become versioned project or platform skills. + +Runbook skills should include: + +- detection signal; +- preconditions; +- allowed hats; +- approval class; +- remediation steps; +- verification; +- rollback; +- evidence requirements; +- owner; +- version. + +### Capability Expansion + +Agents can request new capabilities when existing hats, tools, workflows, skills, or credential scopes are insufficient. + +Capability request types: + +- MCP tool; +- project/repo skill; +- hat capability; +- credential proxy scope; +- credential proxy endpoint; +- external API integration; +- Temporal workflow; +- Dapr actor; +- durable trigger or scheduled job; +- observability/tooling improvement; +- QA/test tooling; +- runbook skill. 
+ +Lifecycle: + +```text +submitted + -> manager_triage + -> director_prioritization + -> architecture_review when runtime/API/workflow impact exists + -> security_review when tools, credentials, data, or automation risk exists + -> product_review when user/customer behavior changes + -> approved_for_implementation + -> implemented + -> tested + -> reviewed + -> registered + -> active +``` + +Engineering Managers review team-level need and evidence. Department Directors decide whether the capability belongs in the department backlog or becomes an initiative. Security approves new credential proxy endpoints, credential scopes, external APIs, and dangerous automations. Architecture approves new Temporal workflows, Dapr actors, runtime workers, or cross-service integrations. + +Temporal workflow capability requests must define: + +- workflow type; +- owning department; +- allowed launch rules; +- task queue; +- activities; +- signals and queries; +- cancellation behavior; +- versioning/rollback plan; +- deterministic workflow tests; +- activity idempotency tests; +- policy and credential gates. + +Credential proxy endpoint requests must define: + +- external system/API; +- operations exposed; +- allowed hats; +- data classification; +- rate limits; +- audit events; +- expiry/review date; +- failure mode; +- test plan. + +Approved capabilities update the relevant registry: MCP tool registry, credential proxy endpoint registry, workflow registry, actor registry, skill graph, hat graph, or durable trigger catalog. + +### Dead-Letter Ownership + +Dead letters require a governed workflow: + +```text +dead-letter created + -> DLQ Steward assigned + -> classify poison/transient/schema/policy/duplicate + -> investigate linked trace and entity + -> replay, quarantine, discard, or create backlog item + -> record evidence and approvals +``` + +Discard and replay decisions must be auditable. 
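
The dead-letter workflow above can be sketched as a decision record that is validated before it is written to the audit trail. This is a sketch under assumptions: the type and function names are hypothetical, and the rule that replay and discard need at least one evidence link is one reading of "auditable", not something the document specifies. The approval rule mirrors the acceptance test stated in this document: dead-letter replay requires approval when side effects are possible.

```typescript
// Hypothetical sketch: type and function names are assumptions, not an existing API.
type DeadLetterClass = "poison" | "transient" | "schema" | "policy" | "duplicate";
type DeadLetterAction = "replay" | "quarantine" | "discard" | "backlog_item";

interface DeadLetterDecision {
  messageId: string;
  classification: DeadLetterClass;
  action: DeadLetterAction;
  steward: string; // the assigned DLQ Steward
  evidence: string[]; // links to traces, entities, or reports
  sideEffectsPossible: boolean;
  approvedBy?: string;
}

type DecisionResult =
  | { accepted: true; auditEntry: DeadLetterDecision }
  | { accepted: false; reason: string };

// Validates a steward's decision before it becomes an audit entry.
function recordDecision(decision: DeadLetterDecision): DecisionResult {
  const needsEvidence = decision.action === "replay" || decision.action === "discard";
  if (needsEvidence && decision.evidence.length === 0) {
    // Assumption: "auditable" is read here as "has at least one evidence link".
    return { accepted: false, reason: "replay and discard need recorded evidence" };
  }
  if (decision.action === "replay" && decision.sideEffectsPossible && !decision.approvedBy) {
    return { accepted: false, reason: "replay with possible side effects needs approval" };
  }
  return { accepted: true, auditEntry: decision };
}
```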
+ +## Acceptance Tests + +Initial implementation should include tests for: + +- cannot call protected tool without active hat token; +- expired hat token requires refresh; +- deprovisioned hat becomes roleless; +- implementer cannot mark task done directly; +- task cannot enter code review without red/green evidence when TDD required; +- QA bounce-back requires reproducibility evidence; +- Oz run binding is created when Hermes session starts; +- duplicate NATS event does not duplicate state transition; +- memory write requires hat attribution; +- credential proxy request is denied without approved scope; +- meeting decision records participant hats and vote evidence; +- durable trigger creates only one reaction plan for the same idempotency key; +- scheduler cannot double-claim a job under concurrent workers; +- stale runtime lease is reaped and fenced writes are rejected; +- rule conflict resolution follows deterministic precedence; +- dead-letter replay requires approval when side effects are possible; +- incident opens when SLO burn exceeds configured threshold; +- capability request cannot become active without required manager/director/security/architecture gates; +- credential proxy endpoint request cannot be registered without security approval and audit requirements; +- Temporal workflow registration requires deterministic workflow tests and idempotent activity tests; +- protected MCP tool call resolves `AgentSessionActor` before policy evaluation; +- MCP Gateway denies tool call when actor context and Organization DB hat assignment disagree; +- MCP Gateway records tool start/completion activity back to `AgentSessionActor`; +- task-scoped MCP tool checks task assignment through actor context and authoritative task state. + +## Scheduling + +Scheduled jobs should be owned by manager hats. + +Examples: + +- Engineering Manager schedules team reviews. +- QA Engineering Manager schedules regression suites. +- Memory Manager schedules memory quality reviews. 
+- DevOps Manager schedules pipeline health reviews. +- Security Manager schedules credential-scope audits. + +`scheduled_jobs` should include: + +- owner hat assignment; +- department/project scope; +- cadence; +- next run time; +- run policy; +- budget policy; +- output artifact expectations; +- escalation target. + +The scheduler should create work or reports, not bypass work management. + +## Guardrails + +Required guardrails: + +- no protected tool call without active hat assignment; +- no stale hat token authorization; +- no task moves to done directly from implementer; +- no QA signoff without evidence; +- no credential scope without security approval; +- no architecture-risk work without CA approval; +- no customer-facing ambiguous work without BRD/product signoff; +- no new high-power hat without Executive Board approval; +- no memory write without attribution; +- no Oz child run without Organization approval and run binding. + +## MVP Milestones + +### M0 - Documentation and Domain Slice + +Deliverables: + +- architecture doc; +- implementation concepts doc; +- initial entity model; +- tool inventory; +- MVP workflow selection. + +### M1 - Local Control Plane + +Build NestJS service with: + +- agents; +- hats; +- hat assignments; +- projects; +- initiatives; +- tasks; +- reports; +- simple inboxes; +- JWT hat token issue/refresh; +- MCP gateway skeleton. + +No Oz yet. Use local fake Hermes/Oz adapters. + +### M2 - Hermes Session Spike + +Build: + +- Hermes session container; +- Hermes runner adapter; +- one orchestrator Hermes profile; +- one worker Hermes profile; +- Organization MCP config. + +Prove: + +```text +orchestrator creates task +worker claims task +worker submits artifact +reviewer approves +task state advances +``` + +### M3 - Oz Integration + +Build: + +- OzAdapter; +- run bindings; +- parent/child run metadata; +- Oz-launched Hermes session; +- status sync. 
+ +Prove: + +```text +Organization starts Hermes orchestrator through Oz +orchestrator requests worker run +Organization approves +Oz launches worker +worker reports back +``` + +### M4 - NATS Messaging + +Build: + +- durable inbox/outbox; +- team broadcasts; +- reports; +- escalation messages; +- live UI events. + +### M5 - Hindsight Memory Adapter + +Build: + +- memory query/write adapter; +- hat attribution metadata; +- agent fit recommendation MVP; +- memory adaptation request workflow. + +### M6 - Corporate Lifecycle MVP + +Build one end-to-end lifecycle: + +```text +service request + -> defect + -> TDD task + -> code review + -> QA reproducibility/signoff + -> outcome review + -> memory attribution +``` + +### M7 - Self-Building Loop + +Build: + +- performance reviews; +- backlog item from review; +- internal platform initiative creation; +- scheduled team review. + +### M8 - Cilium / SPIRE / Credential Proxy Hardening + +Build: + +- service identity; +- credential proxy scope checks; +- MCP gateway authz checks; +- audit trail; +- network policy. + +## First Implementation Bet + +The best first vertical slice is not the whole corporation. + +Build this: + +```text +Project + -> Initiative + -> Director assigns TPM + Engineering Manager + -> TPM creates task + -> Hermes worker runs through Oz/local adapter + -> reviewer approves + -> QA reports still reproducible or signs off + -> manager creates outcome review + -> memory attribution is recorded +``` + +This tests the smallest version of: + +- hierarchy; +- hats; +- MCP; +- Oz/Hermes runtime; +- task management; +- review; +- QA; +- memory; +- performance loop. 
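
The slice above implies a small task state machine in which every move is tied to a hat. A minimal sketch follows, assuming state names and a transition table that are illustrative only, since the document does not fix them. The hat names are borrowed from elsewhere in these docs, and `applyTransition` is a hypothetical function.

```typescript
// Hypothetical sketch: states, table, and function name are assumptions.
type TaskState = "open" | "in_progress" | "code_review" | "qa" | "done";
type Hat = "TPM" | "Implementer" | "Code Reviewer" | "QA Reviewer";

interface Transition {
  from: TaskState;
  to: TaskState;
  hat: Hat; // the only hat allowed to perform this move
}

const transitions: Transition[] = [
  { from: "open", to: "in_progress", hat: "Implementer" }, // worker claims task
  { from: "in_progress", to: "code_review", hat: "Implementer" }, // worker submits artifact
  { from: "code_review", to: "qa", hat: "Code Reviewer" }, // reviewer approves
  { from: "code_review", to: "in_progress", hat: "Code Reviewer" }, // changes requested
  { from: "qa", to: "done", hat: "QA Reviewer" }, // QA signs off
  { from: "qa", to: "in_progress", hat: "QA Reviewer" }, // still reproducible
];

type TransitionResult =
  | { ok: true; state: TaskState }
  | { ok: false; reason: string };

// Allows a move only if it is in the table and the acting hat owns it.
function applyTransition(current: TaskState, to: TaskState, hat: Hat): TransitionResult {
  const rule = transitions.find((t) => t.from === current && t.to === to);
  if (!rule) {
    return { ok: false, reason: `no transition ${current} -> ${to}` };
  }
  if (rule.hat !== hat) {
    return { ok: false, reason: `${hat} may not move ${current} -> ${to}` };
  }
  return { ok: true, state: to };
}
```

Because the table has no `in_progress -> done` row, the guardrail that an implementer cannot mark a task done directly holds by construction rather than through a special case.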
diff --git a/docs/agentic-organization/IMPLEMENTATION_READINESS_CHECKLIST.md b/docs/agentic-organization/IMPLEMENTATION_READINESS_CHECKLIST.md new file mode 100644 index 0000000000..a0e65e4f96 --- /dev/null +++ b/docs/agentic-organization/IMPLEMENTATION_READINESS_CHECKLIST.md @@ -0,0 +1,493 @@ +# Implementation Readiness Checklist + +This checklist defines what still needs to be decided before implementation begins. The goal is to avoid designing forever while still defining the contracts that would be painful to change after code exists. + +## Start Condition + +Implementation can begin once we define: + +- the first MVP slice; +- the initial source-of-truth database choice; +- the first state machines; +- the first hat graph seed; +- the first MCP tool contracts; +- the first UI surfaces; +- the first runtime integration boundary for Hermes/Oz. + +Everything else can evolve through the Organization itself. + +## 1. MVP Slice + +Define the first end-to-end workflow we will build. + +Recommended first slice: + +```text +ambiguous internal capability request + -> requirement maturity / discovery + -> BRD/product signoff + -> CA/design review + -> initiative/task creation + -> hat assignment + -> Hermes/Oz run + -> implementation evidence + -> code review gate + -> QA/evidence gate + -> release/activation + -> outcome review +``` + +Need to decide: + +- exact example feature; +- which departments participate; +- which hats are active; +- which gates are required; +- which steps are simulated versus real in v0. + +## 2. Application Boundary + +Define what the first app is. + +Need to decide: + +- app name; +- repo/package location; +- whether this is a new app under `agentic-team/packages` or a separate top-level workspace; +- whether frontend and backend live together at first; +- whether initial deployment target is local Docker Compose, k3s, or both. 
+ +Recommendation: + +- start as a new Hermes Organization app, separate from dev-portal; +- use dev-portal/TPM only as reference and selective extraction source; +- build a modular monolith first, with clear boundaries for later service extraction; +- use a TypeScript monorepo with `apps/api`, `apps/web`, `apps/workers`, `apps/temporal-worker`, `apps/dapr-actors`, and shared `packages/*` as defined in the build plan. + +## 3. Source of Truth + +Pick the first database and transaction model. + +Need to decide: + +- CockroachDB deployment topology and local development shape; +- migration tool; +- event/outbox strategy; +- read-model/projection strategy; +- audit retention model. + +Recommendation: + +- CockroachDB first for Organization-owned state; +- Drizzle ORM for typed schema and migrations against CockroachDB's PostgreSQL-compatible interface; +- transactional outbox for signals; +- append-only audit events; +- read models for boards and UI projections. + +## 4. Core Domain Model V0 + +Define the first tables/collections. + +Must include: + +- departments; +- hat definitions; +- hat assignments; +- hat supply policies; +- agents; +- agent sessions; +- projects; +- initiatives; +- work items; +- requirement maturity records; +- gates; +- gate decisions; +- assignments; +- releases; +- artifacts; +- signals; +- audit events; +- outbox events. + +Can defer: + +- full performance reviews; +- full department reviews; +- complex voting; +- full workflow registry; +- full actor registry; +- advanced memory analytics. + +## 5. State Machines + +Define v0 states and legal transitions. + +Need to lock down: + +- requirement maturity state machine; +- work item state machine; +- assignment state machine; +- gate state machine; +- release state machine; +- hat token state machine. + +Important rule: + +- state transitions must be service-owned and policy-checked, not arbitrary field updates. + +## 6. Hat Graph Seed + +Define the first hats that can exist in v0.
+ +Recommended v0 hats: + +- Executive Board Member; +- Product Owner; +- Customer Interviewer; +- Business Analyst; +- Business Approver; +- Architect; +- Architecture Reviewer; +- TPM; +- Engineering Manager; +- Implementer; +- Code Reviewer; +- QA Reviewer; +- Release Manager; +- Security Reviewer; +- Memory Curator; +- Platform Operator; +- Hat Designer. + +Need to define for each: + +- allowed MCP tools; +- approval scopes; +- memory scopes; +- credential scopes; +- assignable-by rules; +- token TTL; +- max concurrent assignments; +- lifecycle transitions allowed. + +## 7. Policy Model + +Define how authorization works before MCP tools exist. + +Need to decide: + +- hard-coded policy first or OPA from day one; +- policy file format; +- policy test strategy; +- how policy explains denials; +- how emergency override works; +- which actions require two-person or human approval. + +Recommendation: + +- start with typed policy checks in code plus structured policy metadata; +- design so OPA can be added without rewriting domain services. + +## 8. MCP Tool Surface V0 + +Define first tool contracts. + +Minimum tool families: + +- goal/intake tools; +- requirement discovery tools; +- BRD tools; +- architecture tools; +- task/work tools; +- hat assignment tools; +- review/gate tools; +- artifact/evidence tools; +- messaging/inbox tools; +- status/read tools; +- Hermes/Oz run tools. + +Need to define: + +- request/response schemas; +- actor context requirements; +- policy checks; +- state transition effects; +- emitted signals; +- audit fields. + +## 9. Hermes/Oz Boundary + +Define exactly how the Organization launches and tracks Hermes work. 
+ +Need to decide: + +- how Organization requests an Oz run; +- how run ID maps to work item, team, agent, hat assignment, and session; +- how logs/artifacts return; +- how heartbeat works; +- how child runs are requested; +- how cancellation/reassignment works; +- what happens when Oz run succeeds but Organization update fails, or vice versa. + +Important naming decision: + +- in cluster context, OZ has been clarified as OpenZiti; +- in Organization runtime docs, Oz has also been used as the macro agent-run orchestrator; +- before coding adapters, decide whether these are the same component, two components, or a naming collision. + +Implementation should not blur OpenZiti transport with Organization run orchestration. + +## 10. Actor Boundary + +Define what is actor-backed in v0 versus plain DB service. + +Recommended v0 actor candidates: + +- `AgentSessionActor`; +- `HatSupplyActor`; +- `AgentMailboxActor`; +- `TeamRoomActor`; +- `OzRunActor`. + +Can start with in-process interfaces and fake actor implementations if Dapr is deferred. + +Need to define: + +- actor method contracts; +- persistence ownership; +- idempotency rules; +- timeout behavior; +- reconciliation behavior. + +## 10A. Cluster-Native Hat Boundary + +Define whether the first implementation treats Kubernetes CRDs as: + +- deferred future enforcement; +- generated read-only projections from Organization DB; +- live enforcement for active hat bindings; +- or a bidirectional proposal surface where cluster changes can request Organization approval. 
+ +Need to decide: + +- whether `Hat`, `HatBinding`, `HatPolicy`, and `HatSwap` are part of v0 or v1; +- which hat fields are source-authored in Organization DB versus CRDs; +- which OPA constraints are required before live cluster assignment; +- whether hat graph rendering is built early for debugging/policy authors; +- how `HatSwap` maps to Organization signals and audit events; +- how warmup, cooldown, sticky attribution, succession, and reputation map into the assignment state machine; +- whether the hat operator is enforcement-only or can create Organization work requests. + +Recommendation: + +- model the boundary now; +- implement Organization DB assignment first; +- project to CRD/OPA enforcement once the first assignment state machine is stable. + +## 11. Scheduling and Hat-Owned Cadences + +Define the first schedules that create work for role hats. + +V0 schedules: + +- TPM initiative movement review; +- Engineering Manager execution/readiness review; +- review queue review; +- QA flow review; +- release readiness review; +- hat supply and budget review; +- blocker triage routine. + +Need to define: + +- cadence; +- owner hat; +- generated report; +- generated queue item; +- escalation path; +- which parts are automatic versus hat-decided. + +## 12. UI V0 + +Define the first screens before backend shape hardens. + +Recommended v0: + +- Organization map; +- Work board; +- Requirement maturity board; +- Hat assignment/supply view; +- Role workspace; +- Review center; +- Release board; +- Evidence timeline; +- Run/session view; +- Signal/event feed. + +Need to decide: + +- human user roles; +- whether agents also consume the same UI projections through MCP; +- live update mechanism; +- first dashboard read models. + +## 13. Observability Contract + +Define what every action emits. 
+ +Need to standardize: + +- correlation ID; +- causation ID; +- trace ID; +- agent ID; +- hat assignment ID; +- work item ID; +- project/initiative scope; +- tool call ID; +- run ID; +- policy decision; +- artifact links. + +No implementation should enter v0 without these fields being easy to attach. + +## 14. Memory/Hindsight Contract + +Define the minimum memory integration. + +Need to decide: + +- whether to fork Hindsight now or wrap it first; +- memory event schema; +- hat-attributed writes; +- scoped recall; +- memory visibility rules; +- how memories are linked to work items, projects, hats, and outcome reviews. + +Recommendation: + +- wrap first unless Hindsight cannot support attribution cleanly; +- fork only when the wrapper cannot enforce scope or metadata. + +Cluster context to preserve: + +- Hindsight is the real Hermes memory provider, not a placeholder; +- Hindsight is available as the `vectorize-io/hindsight` OCI Helm chart at `ghcr.io/vectorize-io/charts/hindsight`; +- the current target chart version is `0.3.0`; +- Hermes can point at the in-cluster service through `HINDSIGHT_URL=http://hindsight.hindsight.svc.cluster.local`; +- Hindsight automatically recalls relevant context before LLM calls, retains conversations, and exposes retain/recall/reflect tools; +- memory storage is precious and should not be pruned by default; +- secrets should be Vault-backed or equivalent, with no plaintext API keys in Git; +- bundled Postgres is acceptable for Hindsight bootstrap, but Organization-owned state uses CockroachDB and the long-term memory store should move to external CockroachDB if supported. 
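
The "wrap first" recommendation above can be sketched as an adapter that refuses unattributed writes and scopes recall. This is a sketch under loud assumptions: `MemoryBackend` is an invented interface standing in for whatever client the wrapper ends up using and is not the Hindsight API, `InMemoryBackend` exists only so the example runs, and project-level scoping is just one of the scopes the document mentions.

```typescript
// Hypothetical sketch. MemoryBackend is an assumed interface, not the Hindsight API.
interface Attribution {
  agentId: string;
  hatAssignmentId: string;
  projectId: string;
}

interface MemoryRecord {
  text: string;
  attribution: Attribution;
}

interface MemoryBackend {
  store(record: MemoryRecord): void;
  all(): MemoryRecord[];
}

// Stand-in backend so the sketch is runnable without any service.
class InMemoryBackend implements MemoryBackend {
  private records: MemoryRecord[] = [];

  store(record: MemoryRecord): void {
    this.records.push(record);
  }

  all(): MemoryRecord[] {
    return [...this.records];
  }
}

class ScopedMemoryAdapter {
  private readonly backend: MemoryBackend;

  constructor(backend: MemoryBackend) {
    this.backend = backend;
  }

  // Rejects any write that lacks complete attribution.
  retain(text: string, attribution: Partial<Attribution>): boolean {
    const { agentId, hatAssignmentId, projectId } = attribution;
    if (!agentId || !hatAssignmentId || !projectId) {
      return false;
    }
    this.backend.store({ text, attribution: { agentId, hatAssignmentId, projectId } });
    return true;
  }

  // Returns only memories recorded under the caller's project scope.
  recall(scope: { projectId: string }): string[] {
    return this.backend
      .all()
      .filter((record) => record.attribution.projectId === scope.projectId)
      .map((record) => record.text);
  }
}
```

If a wrapper of this shape cannot enforce scope or metadata against the real provider, that is the signal the recommendation names for considering a fork.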
+ +Need to decide before implementation: + +- whether Hermes calls Hindsight directly, through an Organization memory adapter, or through a scoped sidecar/proxy; +- how hat assignment metadata is attached to every recall, retain, and reflect operation; +- how Hindsight health, recall latency, and memory write failures become Organization signals; +- what migration path exists from bundled Hindsight Postgres to external CockroachDB. + +## 14A. Cluster Execution Contract + +Define the first cluster session contract. + +Need to decide: + +- how Oz requests a k3s-backed Hermes session container; +- whether bubblewrap-style sandboxing is included in v0 or modeled as a required boundary for v1; +- which endpoints the session receives: MCP Gateway, Hindsight, NATS, Credential Proxy; +- how service account, mesh identity, Organization session ID, agent ID, hat assignment ID, and work item ID are correlated; +- how Cilium policy, SPIRE workload identity, Trust Manager CA bundles, and External Secrets synced secrets are represented in runtime context; +- how hat expiry or revocation terminates tool/credential authority in a running container; +- which runtime events are mandatory: pod ready, sandbox started, Hermes heartbeat, tool call, memory recall, credential proxy use, artifact upload; +- how logs, traces, screenshots, and artifacts link back to work items. 
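
One possible shape for the session contract, offered only to make the open questions above concrete. It is not a decision: the field names, the event kind spellings, and the helper functions are all hypothetical, and the set of mandatory events is left as a parameter because the document has not settled it.

```typescript
// Hypothetical sketch: field names and event encodings are assumptions.
type RuntimeEventKind =
  | "pod_ready"
  | "sandbox_started"
  | "hermes_heartbeat"
  | "tool_call"
  | "memory_recall"
  | "credential_proxy_use"
  | "artifact_upload";

interface SessionContext {
  sessionId: string;
  agentId: string;
  hatAssignmentId: string;
  workItemId: string;
  hatExpiresAtMs: number; // epoch milliseconds
  hatRevoked: boolean;
}

interface RuntimeEvent {
  kind: RuntimeEventKind;
  sessionId: string;
  agentId: string;
  hatAssignmentId: string;
  workItemId: string;
}

// Tool and credential authority ends at expiry or revocation, whichever comes first.
function hasAuthority(ctx: SessionContext, nowMs: number): boolean {
  return !ctx.hatRevoked && nowMs < ctx.hatExpiresAtMs;
}

// Stamps every event with the correlation IDs so it links back to the work item.
function emit(ctx: SessionContext, kind: RuntimeEventKind): RuntimeEvent {
  return {
    kind,
    sessionId: ctx.sessionId,
    agentId: ctx.agentId,
    hatAssignmentId: ctx.hatAssignmentId,
    workItemId: ctx.workItemId,
  };
}

// Reports which required event kinds a session has not produced yet.
function missingKinds(required: RuntimeEventKind[], seen: RuntimeEvent[]): RuntimeEventKind[] {
  return required.filter((kind) => !seen.some((event) => event.kind === kind));
}
```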
+ +Cluster scaffold context to preserve: + +- cluster scaffold is split into `usb-nixos-installer/` and `full-ai-cluster/`; +- k3s disables default networking so Cilium must be installed before ArgoCD; +- Cilium, cert-manager, Vault, SPIRE, Trust Manager, External Secrets, then ArgoCD is the security/bootstrap order; +- GitLab is default-on; Forgejo is a manual-sync alternative; +- local model serving through Ollama/vLLM and local coder models is deferred/manual; +- Warp is removed from the current stack; +- Hermes is custom and cloud-oriented for the current phase; +- Hindsight is the persistent memory system for Hermes; +- OpenZiti is the clarified OZ component in the cluster context. + +## 15. Package Boundaries + +Define initial modules. + +Recommended backend modules: + +- `organization-kernel`; +- `hat-graph`; +- `work-os`; +- `requirements`; +- `gates`; +- `assignment`; +- `messaging`; +- `documents`; +- `release-management`; +- `mcp-gateway`; +- `oz-runtime`; +- `memory-adapter`; +- `observability`; +- `ui-projections`. + +Recommended frontend areas: + +- organization map; +- boards; +- role workspace; +- reviews; +- releases; +- runtime; +- evidence; +- admin/policy. + +## 16. Test Strategy + +Define test expectations before coding. + +Minimum tests: + +- state transition unit tests; +- policy denial/approval tests; +- MCP tool contract tests; +- assignment race tests; +- hat token refresh/revocation tests; +- outbox/signal tests; +- requirement maturity gate tests; +- review self-approval block tests; +- release readiness tests; +- Oz run binding reconciliation tests. + +## 17. 
Definition of Done for MVP + +MVP is done when: + +- a vague request can become a structured requirement; +- a BRD and CA can gate readiness; +- work can be created, assigned, reviewed, QA checked, and released; +- hats are issued and revoked reliably; +- Hermes/Oz runs are bound to work and visible; +- every state change emits signals and audit events; +- humans can see work, hats, reviews, releases, and evidence in the UI; +- one outcome review creates follow-up work or explicitly decides none is needed. + +## Things We Should Not Define Too Early + +Defer: + +- full corporate hierarchy perfection; +- every possible hat; +- every possible MCP tool; +- full Temporal workflow catalog; +- full Dapr actor implementation; +- complete Hindsight fork; +- advanced performance review system; +- complex executive voting rules; +- multi-cluster production topology. + +These should be built by the Organization once v0 can safely create governed internal work. diff --git a/docs/agentic-organization/ORGANIZATION_LAYER_BUILD_PLAN.md b/docs/agentic-organization/ORGANIZATION_LAYER_BUILD_PLAN.md new file mode 100644 index 0000000000..2d27bc3bd9 --- /dev/null +++ b/docs/agentic-organization/ORGANIZATION_LAYER_BUILD_PLAN.md @@ -0,0 +1,677 @@ +# Organization Layer Build Plan + +This document describes how to build the Organization layer that makes departments and hats operational. The hat inventory defines who can exist. This build plan defines the environment, automation, runtime state, and feedback loops that let each hat actually perform its role. + +The Organization layer should behave like a deterministic operating system for Hermes agents. It should not hard-code every business behavior. It should provide enough structure that agents can govern work, communicate, request capabilities, run teams, review outputs, improve memory, and expand the platform without bypassing policy. + +## Core Thesis + +Each hat needs four things: + +1. 
Authority: what it can read, change, approve, vote on, spawn, and request. +2. Workspace: the queues, inboxes, documents, dashboards, and tools that make the role easy to perform. +3. Cadence: the triggers, schedules, review loops, and escalation timers that keep the role active. +4. Feedback: outcome reviews, traces, metrics, memories, and backlog pathways that let the role improve the Organization. + +The Organization layer exists to provide those four things consistently. + +## TypeScript Application Stack + +Build the Organization as a TypeScript monorepo with a modular-monolith core and separately deployable runtime processes. + +Recommended stack: + +| Layer | Choice | Purpose | +|---|---|---| +| Monorepo | `pnpm` workspaces with Turborepo or Nx | Shared packages, isolated apps, incremental builds, CI task graph | +| Backend API | NestJS with the Fastify adapter | Organization API, internal APIs, MCP gateway shell, policy guards, worker-safe module boundaries | +| Frontend | Next.js App Router with React and TypeScript | Dense operations console for humans watching projects, hats, runs, boards, meetings, and evidence | +| UI primitives | Tailwind, Radix/shadcn-style components, TanStack Table/Virtual, React Flow | High-density boards, trees, timelines, graphs, and status panels | +| API contract | REST/OpenAPI first, SSE for live updates, WebSocket later for active meetings/chat | Agent-friendly contracts, generated clients, auditability, simple live UI path | +| Database | CockroachDB | Distributed Organization source of truth, state machines, audit, outbox, projections | +| Query/migrations | Drizzle ORM | TypeScript-native schema, explicit SQL shape, typed enums, migration control against CockroachDB's PostgreSQL-compatible interface | +| Messaging | NATS JetStream | Organization signals, inbox/outbox, live projection updates, DLQ/replay | +| Durable workflows | Temporal TypeScript | Initiative, approval, release, incident, scheduled review, and 
long-running process lifecycles | +| Hot entity state | Dapr Actors | Hat supply, agent session context, team rooms, mailboxes, meeting state, run heartbeat coordination | +| Testing | Vitest, Playwright, Testcontainers | Domain/unit tests, browser QA automation, real CockroachDB/NATS integration tests | +| Observability | OpenTelemetry JS, Pino, Prometheus metrics | End-to-end traces across API, workflows, actors, MCP tools, NATS, pods, and UI evidence | +| Delivery | Docker images, Helm or Kustomize, ArgoCD, GitLab CI | Initiative branch builds, preview/QA deployments, GitOps promotion into the cluster | + +Default app layout: + +```text +apps/ + api/ NestJS Organization API and internal control-plane endpoints + web/ Next.js operations console + workers/ schedulers, reconcilers, rules, NATS consumers + temporal-worker/ Temporal TypeScript workers + dapr-actors/ Dapr actor service + mcp-gateway/ separate MCP gateway when API shell becomes too large + +packages/ + domain/ typed entities, enums, events, commands, value objects, state machines + db/ Drizzle schema, migrations, repositories, projections + messaging/ NATS, outbox, inbox, DLQ, event contracts + policy/ RBAC, OPA/Rego policy contracts, authorization decisions + hats/ hat graph, assignment, JWT issuance/refresh, supply policies + workflows/ Temporal workflow and activity definitions + actors/ Dapr actor interfaces and shared actor contracts + mcp/ tool registry, tool schemas, policy-checked handlers + memory/ Hindsight adapter, attribution, scoped recall/write contracts + hermes/ Hermes session adapter, run adapter, context builder + observability/ tracing, logging, metrics, health checks, evidence helpers + ui/ shared UI primitives for the operations console + sdk/ typed client for UI, agents, and internal workers + adapters-agentic-services/ + temporary wrappers for selectively reused agentic-services primitives +``` + +Start as one repository and one deployable product made of multiple processes. 
Do not split into many microservices until the domain boundaries have proven themselves through real Organization workflows. + +Initial implementation should not start with GraphQL, Orleans, Dapr Workflow, or a broad service mesh abstraction inside the app. Use REST/OpenAPI, Temporal for durable workflows, Dapr Actors for narrow hot-state actors, NATS for events, and OpenZiti/Cilium at the cluster layer. + +### Nest Orchestrator Composition + +The TypeScript architecture should use many shared npm packages and relatively thin NestJS orchestrator apps. + +Shared packages own reusable capability logic: + +- domain entities, enums, value objects, state machines, and events; +- policy contracts and authorization decisions; +- repositories, migrations, outbox, idempotency, and projections; +- NATS event contracts and consumers; +- Temporal workflow/activity definitions; +- Dapr actor contracts; +- MCP tool schemas and handlers; +- Hindsight, Hermes, OpenZiti, Credential Proxy, and observability adapters. + +NestJS apps compose those packages into runnable orchestrators: + +- `apps/api` orchestrates HTTP/internal APIs, guards, OpenAPI, and request-scoped policy checks; +- `apps/workers` orchestrates schedulers, reconcilers, durable triggers, rules, and NATS consumers; +- `apps/temporal-worker` hosts workflow workers and wires activities to package services; +- `apps/dapr-actors` hosts actor implementations and binds actor state to package contracts; +- `apps/mcp-gateway` exposes MCP tools and resolves actor/session/hat context before delegating to package handlers. + +The rule: packages should contain the reusable business and infrastructure capability; Nest orchestrators should wire lifecycle, dependency injection, transport adapters, health checks, and process concerns. Do not bury Organization rules directly inside controllers or worker entrypoints. 
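
The package-versus-orchestrator rule above can be sketched in plain TypeScript. This is an illustration under assumptions, not the planned code: `evaluateToolCall` and `ToolCallHandler` are hypothetical names, the package and app paths in the comments are only suggestions, and framework decorators are deliberately omitted so the block runs without NestJS installed. In a real Nest app the thin class would presumably be a provider or controller that receives its dependencies through Nest's injector.

```typescript
// --- Would live in a shared package (for example a hypothetical packages/policy). ---
// Pure capability logic: no framework imports, no transport concerns.
interface ToolCallContext {
  hat: string;
  tool: string;
  tokenExpiresAtMs: number;
}

interface PolicyDecision {
  allowed: boolean;
  reason: string;
}

function evaluateToolCall(
  ctx: ToolCallContext,
  allowedTools: Record<string, string[]>,
  nowMs: number,
): PolicyDecision {
  if (ctx.tokenExpiresAtMs <= nowMs) {
    return { allowed: false, reason: "hat token expired" };
  }
  const tools = allowedTools[ctx.hat] ?? [];
  if (tools.indexOf(ctx.tool) === -1) {
    return { allowed: false, reason: "tool not allowed for hat" };
  }
  return { allowed: true, reason: "ok" };
}

// --- Would live in an orchestrator app (for example a hypothetical apps/mcp-gateway). ---
// Thin wiring: receives dependencies, calls the package, maps the result to transport.
type Policy = (ctx: ToolCallContext) => PolicyDecision;
type AuditSink = (entry: { ctx: ToolCallContext; decision: PolicyDecision }) => void;

class ToolCallHandler {
  private readonly policy: Policy;
  private readonly audit: AuditSink;

  constructor(policy: Policy, audit: AuditSink) {
    this.policy = policy;
    this.audit = audit;
  }

  handle(ctx: ToolCallContext): { status: number; body: string } {
    const decision = this.policy(ctx);
    this.audit({ ctx, decision }); // every decision is recorded, allowed or not
    return decision.allowed
      ? { status: 200, body: "dispatched" }
      : { status: 403, body: decision.reason };
  }
}
```

The handler holds no Organization rule of its own, so the same `evaluateToolCall` could be reused by a worker or a Temporal activity without copying logic out of a controller.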
+ +## Organization Layer Services + +| Service | Purpose | Makes these hats effective | +|---|---|---| +| Organization Kernel | Authoritative state transitions, policy checks, audit events, outbox events | All hats | +| Hat Graph Service | Defines hats, departments, assignment rules, approval scopes, supply, TTLs, reporting lines | Executive Board, Directors, Engineering Managers, Hat Designer | +| Agent Registry Service | Tracks Hermes agents, active sessions, memory profiles, specialties, current hats, cost, reliability | Directors, TPMs, Engineering Managers, Memory hats | +| Assignment and Staffing Service | Ranks agents for hats, reserves hat supply, assigns agents to teams/tasks, handles release and deprovisioning | Directors, TPMs, Engineering Managers, Cost Controller | +| Work Management Service | Owns projects, initiatives, tasks, defects, service requests, blockers, queues, lifecycle state | TPMs, Product, BA, Engineering, QA, Delivery | +| Gate and Review Service | Owns readiness, BRD, architecture, code, QA, security, delivery, memory, and outcome gates | Review hats and managers | +| Department Runtime Service | Maintains department rules, queues, schedules, standing meetings, director reports, escalation paths | Directors and department managers | +| Meeting and Communication Service | Provides inboxes, reports, broadcasts, one-on-one chats, team rooms, meeting modes, decisions | All hats, especially TPMs, directors, executives | +| Documentation Context Service | Organizes BRDs, CAs, ADRs, design docs, project docs, repo docs, and required context by scope | Product, BA, Architecture, Engineering, QA, Reviewers | +| Project Skill Service | Stores project/repo skills with frontmatter, graph links, review state, deprecation, and ingestion | Engineering Managers, Documentation hats, Memory hats | +| Memory Scope Service | Mediates Hindsight recall/write attribution by agent, hat, project, task, team, and meeting | Memory hats, all execution hats | +| 
Tool and Credential Gateway | Authorizes MCP tools and credential proxy use using actor context, hat policy, OPA, and audit | Security, all tool-using hats | +| Oz/Hermes Run Service | Creates and binds Hermes/Oz runs to tasks, teams, hats, pods, sessions, logs, artifacts | TPMs, Engineering Managers, Operations | +| Automation Runtime Service | Runs triggers, rules, reaction plans, schedules, leases, timers, and replay-safe workers | Operations, Scheduler Steward, Trigger Steward | +| Capability Expansion Service | Accepts requests for tools, workflows, actors, hats, docs, skills, and credentials; routes approvals | Engineering Managers, Directors, Security, Architecture | +| Observability and Evidence Service | Captures traces, logs, metrics, screenshots, artifacts, timelines, SLOs, audit events | Operations, QA, Reviewers, UI | +| Performance and Learning Service | Runs team reviews, hat effectiveness reviews, outcome reviews, memory adaptation, process improvements | Engineering Managers, Directors, Memory, Executives | +| UI Projection Service | Builds read models for humans to watch work, meetings, runs, pods, gates, decisions, and health | Humans, executives, operators | + +## Role Workspaces + +Every active hat should open into a role-specific workspace. A workspace is the agent-facing and human-facing surface for the role. + +### Common Workspace Elements + +- Role brief: current hat, department, scope, reporting chain, active policies, token TTL, and current assignment. +- Work queue: tasks, reports, gates, reviews, meetings, incidents, or capability requests relevant to the hat. +- Required context: documents, memories, project skills, artifacts, traces, and prior decisions the hat must consider. +- Allowed tools: MCP tools available under the current hat and why each is available. +- Blocked tools: MCP tools denied under the current hat with escalation path. +- Inbox: direct messages, reports, meeting invites, escalations, and broadcasts. 
+- Decision log: votes, approvals, rejections, gate outcomes, and rationale. +- Evidence panel: artifacts, screenshots, logs, traces, test runs, workflow events, and runtime links. +- Performance panel: recent outcomes, bounce-backs, defects, memory quality findings, and improvement recommendations. + +### Workspace Examples + +| Hat | Workspace focus | +|---|---| +| Executive Board Member | Portfolio queue, high-risk votes, department health, budget pressure, major escalations, policy changes | +| CEO | Project priorities, customer value, cross-department blockers, executive decisions, org efficiency | +| CTO | Technical standards, architecture review load, engineering quality, runtime strategy, tool expansion risk | +| COO | Operating rhythm, capacity, schedules, incidents, delivery flow, department coordination | +| Product Owner | Customer interviews, BRDs, acceptance criteria, product signoff queue, feedback reports | +| Business Analyst | Ambiguous goals, open questions, BRD drafts, source evidence, domain research | +| Architect | BRDs awaiting CA, design docs, ADR queue, architecture risks, integration constraints | +| TPM | Initiative plan, active teams, task boards, blockers, hat supply, budget, meeting rooms | +| Engineering Manager | Ready queue, team staffing, task context, TDD evidence, outcome reviews, performance reviews | +| Implementer | Assigned task, red-test requirement, docs/memory context, scoped tools, run logs, review feedback | +| Code Reviewer | Review queue, diff evidence, tests, scope boundaries, policy/doc compliance | +| QA Reviewer | QA queue, acceptance criteria, browser checks, screenshots, reproduction evidence, bounce-back reports | +| Security Reviewer | Credential requests, tool expansion, policy diffs, audit traces, risky automation queue | +| Release Operator | Release queue, upstream gate evidence, deployment logs, rollback plans, final release records | +| Memory Curator | Memory quality issues, stale memories, missing 
recall, hat-attributed writes, adaptation requests | +| Platform Operator | Worker heartbeats, leases, Oz runs, pod sessions, DLQs, SLO burn, incidents | +| Hat Designer | Hat proposals, tool bundles, memory scopes, approval scopes, supply rules, effectiveness data | + +## Authoritative State Model + +The Organization DB must capture the full operating reality. The first schema needs enough structure to make hats real. + +### Identity and Authority + +- `agents`: Hermes agent identity, status, cost profile, model/runtime capabilities. +- `agent_sessions`: active runtime sessions, Oz run IDs, pod bindings, heartbeat, current context. +- `departments`: department records, reporting line, active rules, owner hats. +- `hat_definitions`: role authority, tool bundles, approval scopes, memory scopes, credential scopes, voting scopes. +- `hat_assignments`: agent, hat, project/team/task scope, token TTL, status, assigned by, released by. +- `hat_supply_policies`: max concurrent assignments, scarcity rules, reserve pools, budget class. +- `hat_tokens`: issued JWT metadata, refresh state, revocation state, actor binding. + +### Work and Lifecycle + +- `projects`: long-lived product or platform areas. +- `initiatives`: executive/director-prioritized work packages. +- `work_items`: tasks, defects, service requests, reports, capability requests. +- `work_item_states`: state transition history with actor, hat, policy, and evidence. +- `blockers`: blocked work with owner, severity, escalation target, timeout. +- `dependencies`: work-to-work, initiative-to-initiative, project-to-project dependencies. +- `gates`: BRD, architecture, code review, QA, security, delivery, memory, outcome gates. +- `gate_decisions`: approval/rejection, rationale, evidence links, reviewer hat assignment. + +### Communication and Decisions + +- `inboxes`: agent, hat, team, department, and organization inboxes. +- `messages`: typed messages, reports, escalations, broadcasts, and decision notices. 
+- `conversation_threads`: one-on-one, team, department, executive, incident, and review threads. +- `meetings`: scheduled or ad hoc meetings with scope, mode, facilitator, participants, decisions. +- `votes`: voting scope, eligible hats, quorum, options, close policy, result. +- `decisions`: durable decision records linked to votes, gates, meetings, docs, tasks, and policies. + +### Context and Knowledge + +- `documents`: BRDs, CAs, ADRs, design docs, runbooks, reports, postmortems, project docs. +- `document_requirements`: docs required before work can enter a state. +- `project_skills`: project/repo skill metadata, frontmatter, review state, graph links. +- `memory_events`: memory read/write requests, effective scope, hat attribution, Hindsight reference. +- `memory_adaptation_requests`: requested memory changes from reviews or failures. +- `artifact_links`: screenshots, logs, traces, test output, evidence packages. + +### Runtime and Automation + +- `mcp_tool_calls`: tool ID, actor context, policy decision, result, trace ID, artifact links. +- `credential_requests`: requested scope, business need, reviewer, approval state. +- `oz_runs`: bound Hermes/Oz sessions, parent/child runs, pod, status, budget, artifacts. +- `automation_rules`: organization, department, project, initiative, team, hat, and task rules. +- `durable_triggers`: event, state, timeout, schedule, threshold, and external triggers. +- `reaction_plans`: deterministic rule output before side effects execute. +- `runtime_leases`: lease owner, fencing token, heartbeat, expiration, release reason. +- `dead_letters`: failed events/messages, classification, quarantine, replay/discard decision. +- `worker_heartbeats`: always-on worker status and health. + +### Feedback and Improvement + +- `outcome_reviews`: whether work met goal, acceptance criteria, doc compliance, and quality expectations. +- `performance_reviews`: agent/hat/team performance assessments and improvement actions. 
+- `hat_effectiveness_reviews`: whether a hat definition has the right authority, tools, memory, and supply. +- `department_reviews`: scheduled department-level reviews and follow-up backlog items. +- `observability_gaps`: missing traces, logs, metrics, dashboards, or evidence. +- `capability_requests`: requested tools, workflows, actors, skills, docs, memory changes, credentials. + +## Automation Loops + +The Organization layer should run a set of durable loops. These loops are how hats stay active without manually polling. + +### Work Intake Loop + +```text +goal/report/SR submitted + -> classify intent and scope + -> route to Product, Business, Security, Operations, or Executive triage + -> create project/initiative/task/defect/capability request + -> attach required context and owner department + -> emit state event +``` + +This makes Product, Business Analysis, Security, Operations, and Executive hats shine because their queue is populated with the right work type instead of a generic pile. + +### Initiative Formation Loop + +```text +approved goal or backlog item + -> Executive or Director prioritizes + -> TPM assigned + -> Product/BA/Architecture gates required + -> task breakdown proposed + -> hat supply and budget reserved + -> team created +``` + +This gives TPMs a deterministic mission-control environment with budget, staffing, blockers, and gates already visible. + +### Task Readiness Loop + +```text +task created or updated + -> check required BRD/CA/ADR/project docs + -> check acceptance criteria + -> check memory/project-skill attachments + -> check security/runtime risk + -> assign readiness reviewer + -> mark ready or bounce to missing owner +``` + +This is the loop that makes Engineering Managers valuable. They organize conditions for success before implementers start. 
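
The checks in the Task Readiness Loop could be sketched as a pure function that either marks a task ready or lists the gaps and the owner each gap bounces to. This is an illustration under assumptions: the type names, the `DOC_OWNERS` mapping, and the owner labels are hypothetical, and reviewer assignment is left out.

```typescript
type RequiredDoc = { kind: "BRD" | "CA" | "ADR" | "PROJECT_DOC"; present: boolean };

type TaskSnapshot = {
  taskId: string;
  requiredDocs: RequiredDoc[];
  acceptanceCriteria: string[];
  contextAttachmentIds: string[]; // memories or project skills attached to the task
  riskReviewed: boolean;
};

type ReadinessGap = { check: string; bounceTo: string };

type ReadinessResult =
  | { status: "ready"; taskId: string }
  | { status: "bounced"; taskId: string; gaps: ReadinessGap[] };

// Assumed owner mapping; real routing would come from department contracts.
const DOC_OWNERS: Record<RequiredDoc["kind"], string> = {
  BRD: "Product",
  CA: "Architecture",
  ADR: "Architecture",
  PROJECT_DOC: "Documentation",
};

function evaluateReadiness(task: TaskSnapshot): ReadinessResult {
  const gaps: ReadinessGap[] = [];
  for (const doc of task.requiredDocs) {
    if (!doc.present) {
      gaps.push({ check: `missing ${doc.kind}`, bounceTo: DOC_OWNERS[doc.kind] });
    }
  }
  if (task.acceptanceCriteria.length === 0) {
    gaps.push({ check: "missing acceptance criteria", bounceTo: "Product" });
  }
  if (task.contextAttachmentIds.length === 0) {
    gaps.push({ check: "missing memory or project-skill context", bounceTo: "Memory" });
  }
  if (!task.riskReviewed) {
    gaps.push({ check: "security/runtime risk not reviewed", bounceTo: "Security" });
  }
  // Ready only when every check passed; otherwise bounce with all gaps at once.
  return gaps.length === 0
    ? { status: "ready", taskId: task.taskId }
    : { status: "bounced", taskId: task.taskId, gaps };
}
```

Returning every gap in one result, rather than stopping at the first, would let a manager route all missing items in a single pass.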
+ +### Execution Loop + +```text +ready task + -> rank agents for needed hats + -> reserve hat supply + -> issue hat tokens + -> create Hermes/Oz run + -> bind run to task/team/agent/session + -> monitor progress and tool calls + -> collect evidence +``` + +This lets implementers work in a scoped environment with the right docs, memory, tools, credentials, and observability. + +### Review and Gate Loop + +```text +work submitted + -> request scoped reviewers + -> validate required evidence + -> run gate-specific checks + -> approve, reject, or request clarification + -> emit next state event +``` + +This is where Code Reviewer, Architecture Reviewer, Security Reviewer, QA Reviewer, and Delivery Reviewer hats each get their own queue, criteria, and evidence. + +### QA and Reproducibility Loop + +```text +code review approved + -> assign QA reviewer + -> load acceptance criteria and original report + -> run browser/API/manual-style automation + -> attach screenshots, traces, logs, exact steps + -> sign off or bounce back as reproducible +``` + +The important distinction: QA failure is not merely "QA failed." It is "the issue remains reproducible" or "acceptance criteria were not met," with evidence linked to the work item. + +### Delivery Loop + +```text +all required gates approved + -> delivery reviewer checks evidence chain + -> confirm initiative branch QA signoff + -> confirm CI/CD, deployment, rollback, and observability automation evidence + -> release manager determines release impact + -> merge initiative branch to main or execute release action + -> verify system build + -> release evidence recorded + -> outcome review scheduled +``` + +This prevents Delivery from being a blind final button. It becomes an evidence and risk checkpoint. 
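
One way the delivery reviewer's evidence-chain check might look is sketched below. It assumes hypothetical names (`GateRecord`, `checkDeliveryEvidence`) and a reduced set of gate types; the real gate model would come from the `gates` and `gate_decisions` records.

```typescript
type GateType = "brd" | "architecture" | "code_review" | "qa" | "security";

type GateRecord = {
  type: GateType;
  decision: "approved" | "rejected" | "pending";
  evidenceLinks: string[];
};

type DeliveryCheck = { releasable: boolean; problems: string[] };

function checkDeliveryEvidence(
  requiredGates: GateType[],
  gates: GateRecord[],
  hasRollbackPlan: boolean,
): DeliveryCheck {
  const problems: string[] = [];
  for (const type of requiredGates) {
    const gate = gates.find((g) => g.type === type);
    if (!gate) {
      problems.push(`${type}: gate record missing`);
    } else if (gate.decision !== "approved") {
      problems.push(`${type}: decision is ${gate.decision}`);
    } else if (gate.evidenceLinks.length === 0) {
      // An approval without evidence does not count toward the chain.
      problems.push(`${type}: approved without evidence`);
    }
  }
  if (!hasRollbackPlan) {
    problems.push("rollback plan missing");
  }
  return { releasable: problems.length === 0, problems };
}
```

The function only reports. Whether a non-releasable result blocks the merge or escalates would be a policy decision outside this sketch.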
+ +### Department Review Loop + +```text +scheduled department review + -> inspect team outcomes, blocked work, budget, quality, memory, tools + -> identify repeated failure patterns + -> create performance reviews, memory adaptation requests, or capability requests + -> prioritize through department backlog +``` + +This loop is the bridge from "agents did work" to "the organization learned something." + +### Capability Expansion Loop + +```text +agent or manager requests new capability + -> classify as hat/tool/workflow/actor/credential/doc/skill/memory + -> manager triages + -> director prioritizes + -> Architecture reviews runtime/API/state impact + -> Security reviews authority/data/credential risk + -> implementation task created + -> tests/docs/policies added + -> capability activated in registry +``` + +This is how the Organization becomes self-building while staying governed. + +### Runtime Health Loop + +```text +worker heartbeat, SLO, trace, DLQ, pod, or Oz anomaly detected + -> classify anomaly + -> create health report, incident, or self-healing plan + -> validate lease, budget, policy, and blast radius + -> execute safe remediation or escalate + -> record outcome and create follow-up backlog if needed +``` + +This is the always-on control plane that keeps the Organization awake even when no Hermes agent is actively chatting. + +## Department Runtime Contracts + +Every department should have a runtime contract. This prevents departments from becoming just labels. 
+ +```ts +type DepartmentRuntimeContract = { + departmentId: string; + directorHatIds: string[]; + managerHatIds: string[]; + defaultQueues: string[]; + standingMeetings: string[]; + scheduledReviews: string[]; + ownedStateMachines: string[]; + ownedGateTypes: string[]; + ownedRuleScopes: string[]; + escalationTargets: string[]; + requiredDashboards: string[]; + requiredSloCategories: string[]; +}; +``` + +Examples: + +- Product owns discovery queues, customer interview schedules, BRD signoff queues, product acceptance criteria, and feedback triage. +- Architecture owns CA/ADR queues, architecture review gates, runtime design reviews, and architecture standards. +- Engineering Management owns readiness queues, blocked task escalation, team performance reviews, memory/context sufficiency, and TDD compliance. +- QA owns verification queues, reproducibility reports, browser evidence, QA signoff, and bounce-back workflow. +- Operations owns runtime leases, worker health, DLQs, SLOs, incidents, self-healing policy, and runbooks. +- Memory owns memory quality queues, adaptation requests, scope audits, and Hindsight integration quality. + +## Hat Activation Contract + +When an agent receives a hat, the platform should create an activation packet. + +```ts +type HatActivationPacket = { + agentId: string; + hatAssignmentId: string; + hatId: string; + departmentId: string; + scope: { + projectId?: string; + initiativeId?: string; + teamId?: string; + taskId?: string; + meetingId?: string; + runId?: string; + }; + authorityBrief: string; + responsibilities: string[]; + allowedToolIds: string[]; + blockedToolIds: string[]; + memoryScopes: string[]; + credentialScopes: string[]; + requiredDocuments: string[]; + requiredArtifacts: string[]; + activePolicies: string[]; + escalationPath: string[]; + tokenExpiresAt: string; +}; +``` + +Hermes should receive this packet at run start and after token refresh. 
MCP tools should also derive it server-side through actor context so the agent prompt and the infrastructure agree. + +## Role-Specific Automation Requirements + +| Role family | Automation required | +|---|---| +| Executives | Portfolio health rollups, high-risk vote queue, department performance reports, budget and hat scarcity alerts | +| Directors | Department backlog, initiative priority queue, staffing recommendations, department review cadence, cross-department escalations | +| TPMs | Initiative boards, dependency maps, blocker alerts, team creation, meeting scheduling, budget/hat supply warnings | +| Product | Interview scheduling, BRD readiness alerts, acceptance criteria gap detection, feedback/SR classification | +| BA | Ambiguity detection, missing evidence alerts, BRD review routing, open-question tracking | +| Architecture | Architecture-required detection, CA/ADR queues, design risk classification, runtime/API/security-boundary alerts | +| Engineering Managers | Task readiness checks, TDD gate enforcement, memory/context gap detection, performance reviews, skill requests | +| Implementers | Scoped task packets, red-test requirement, tool/credential availability, run progress capture, evidence submission | +| Reviewers | Review queue, evidence completeness check, self-approval block, decision templates, bounce-back routing | +| QA | Reproducibility workflows, browser automation runs, screenshot/log/trace capture, scheduled regression triggers | +| Security | Credential/tool request queue, policy diff review, audit trail inspection, dangerous automation classification | +| Delivery | Release readiness checks, gate evidence chain, merge/release audit record, rollback plan requirement | +| Operations | Worker health, leases, DLQ, SLOs, incidents, self-healing decisions, runbook execution | +| Memory | Hat-attributed memory writes, stale memory detection, repeated-failure analysis, memory adaptation review | +| Documentation and Skills | Required doc 
checks, skill frontmatter validation, skill graph ingestion, stale doc alerts | +| Capability Expansion | Request classification, approval routing, implementation task creation, registry activation, post-activation monitoring | + +## MCP Tool Execution Path + +All role automation depends on tool calls being actor-aware. + +```text +Hermes agent calls MCP tool + -> MCP gateway extracts session token + -> AgentSessionActor resolves effective agent/session/hat/team/task/run + -> Organization Kernel loads active HatAssignment + -> Policy service evaluates RBAC/OPA/domain preconditions + -> Tool service checks tool-specific invariants + -> State transition or side effect occurs + -> Audit, trace, artifact, and outbox event are written + -> AgentSessionActor records activity and token refresh needs +``` + +This lets agents run tools while the system always knows who acted, under which hat, against which scope, and with which authority. + +## UI Needed to Make Roles Shine + +The UI should not be only a dashboard. It should be the human-readable projection of the Organization runtime. + +Initial UI surfaces: + +- Organization map: departments, reporting lines, active hats, current agents, supply, and scarcity. +- Work map: projects, initiatives, tasks, defects, service requests, capability requests, gates, blockers. +- Role workspace: a live view of the queues and authority for any selected hat. +- Mission control: active teams, Oz/Hermes runs, pods, sessions, logs, messages, decisions. +- Review center: gate queues, required evidence, approvals, rejections, bounce-backs. +- Meeting center: scheduled meetings, active rooms, mode, participants, decisions, follow-up actions. +- Runtime operations: triggers, rules, reaction plans, workers, leases, DLQs, SLOs, incidents. +- Memory and skills: memory scopes, Hindsight profiles, memory changes, skill graph, stale context. +- Capability expansion: requested tools, credentials, workflows, actors, hats, docs, skills, approval path. 
+- Evidence explorer: traces, logs, screenshots, artifacts, timelines, audit events. + +## MVP Build Sequence + +### Phase 1: Organization Kernel and Hat Graph + +Build: + +- departments; +- hat definitions; +- hat assignments; +- hat tokens; +- policy checks; +- audit/outbox; +- actor-aware MCP gateway path; +- basic UI for hats and assignments. + +Proof: + +- an agent can receive a scoped hat; +- tool access changes by hat; +- JWT refresh revokes authority after deprovision; +- every tool call records hat attribution. + +### Phase 2: Work Management and Gate Runtime + +Build: + +- projects, initiatives, tasks, defects, service requests; +- task and initiative state machines; +- gate records and gate decisions; +- evidence links; +- review queues. + +Proof: + +- work moves from intake to ready to implementation to review to QA to done; +- implementers cannot approve themselves; +- missing BRD/CA/QA evidence blocks the right transitions. + +### Phase 3: Role Workspaces and Communication + +Build: + +- inboxes; +- reports; +- broadcasts; +- one-on-one and team chats; +- meetings; +- votes; +- role-specific queues. + +Proof: + +- a TPM can create a team and meeting; +- a reviewer receives the right gate queue; +- executive votes and department escalations are durable decisions. + +### Phase 4: Oz/Hermes Runtime Binding + +Build: + +- run bindings; +- session actor records; +- parent/child run links; +- pod/session heartbeat ingestion; +- artifact/log/trace links; +- budget and hat supply checks before spawning. + +Proof: + +- Organization launches Hermes work through Oz; +- child runs are bound to tasks and hats; +- stopped or silent runs are reconciled; +- hat supply is released when runs complete or expire. + +### Phase 5: Always-On Automation + +Build: + +- durable triggers; +- rules; +- reaction plans; +- leases; +- workers; +- scheduler; +- DLQ; +- SLOs; +- runtime health reports. 
+ +Proof: + +- a ready task automatically triggers staffing; +- a silent run escalates; +- a blocked task triggers manager review; +- a scheduled QA regression creates evidence and defects. + +### Phase 6: Memory, Documentation, and Skills + +Build: + +- Hindsight scoped adapter; +- memory event attribution; +- documentation context; +- project skill graph; +- doc and skill review gates. + +Proof: + +- memory reads/writes are attributed by hat; +- task packets include required docs and memories; +- stale or missing documentation blocks readiness; +- project skills can be proposed, reviewed, ingested, and used. + +### Phase 7: Capability Expansion + +Build: + +- capability request flow; +- credential request flow; +- workflow registry; +- actor registry; +- MCP registry; +- post-activation monitoring. + +Proof: + +- an agent can request a new tool; +- Engineering Manager triages it; +- Security and Architecture review it; +- implementation work is created; +- registry activation makes the new capability available to scoped hats. + +### Phase 8: Performance and Self-Improvement + +Build: + +- outcome reviews; +- performance reviews; +- hat effectiveness reviews; +- department reviews; +- memory adaptation; +- process improvement backlog. + +Proof: + +- repeated QA bounce-backs create improvement work; +- bad memory/context outcomes produce memory adaptation requests; +- ineffective hat definitions produce hat redesign proposals; +- department reviews become prioritized backlog. 
+ +## First Concrete Slice + +The first valuable slice should prove the whole shape with one narrow lifecycle: + +```text +Human submits goal + -> Product/BA clarifies and creates BRD + -> Architect creates CA + -> TPM creates initiative and tasks + -> Engineering Manager marks one task ready + -> Implementer Hermes agent runs via Oz + -> Code Reviewer approves or rejects + -> QA runs browser verification and attaches evidence + -> Delivery marks done + -> Outcome review creates memory or capability follow-up +``` + +The slice should include: + +- real departments and hats; +- real hat tokens; +- actor-aware MCP tools; +- one Oz/Hermes run binding; +- one task state machine; +- one BRD gate; +- one CA gate; +- one code review gate; +- one QA gate; +- one evidence chain; +- one outcome review; +- UI read models for each step. + +This gives us a small Organization that already behaves like the larger one. + +## Design Guardrails + +- The platform should create structure, not replace agent reasoning. +- Agents can propose changes to the Organization, but policy decides whether those changes activate. +- Every automation action must be traceable to a rule, trigger, hat, actor, and state transition. +- Every role must have a queue and cadence. Ownerless automation should be treated as a defect. +- Every approval must have evidence and scope. Broad authority should be rare, expiring, and visible. +- Every failure should create one of: evidence, defect, memory adaptation, capability request, or explicit no-action decision. +- The UI should show the Organization acting, not just database rows. 
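
The traceability guardrail could be checked mechanically. The sketch below assumes a hypothetical `AutomationActionRecord` shape; the field names are illustrative and not taken from the schema.

```typescript
type AutomationActionRecord = {
  actionId: string;
  ruleId?: string;
  triggerId?: string;
  hatAssignmentId?: string;
  actorId?: string;
  stateTransitionId?: string;
};

// Links every automation action is expected to carry, per the guardrail.
const REQUIRED_LINKS = [
  "ruleId",
  "triggerId",
  "hatAssignmentId",
  "actorId",
  "stateTransitionId",
] as const;

function findTraceabilityGaps(record: AutomationActionRecord): string[] {
  // A missing or empty link means the action cannot be fully traced.
  return REQUIRED_LINKS.filter((key) => !record[key]);
}
```

A non-empty result could be filed as a defect, in line with treating ownerless automation as a defect.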
diff --git a/docs/agentic-organization/ORGANIZATION_RUNTIME_ARCHITECTURE.md b/docs/agentic-organization/ORGANIZATION_RUNTIME_ARCHITECTURE.md new file mode 100644 index 0000000000..c883dbaa68 --- /dev/null +++ b/docs/agentic-organization/ORGANIZATION_RUNTIME_ARCHITECTURE.md @@ -0,0 +1,2231 @@ +# Hermes Organization Runtime - Current Design + +## Purpose + +The Organization exists to meet goals by forming, coordinating, and governing teams of autonomous Hermes agents. + +The system should accept ambiguous goals, clarify them when needed, decompose them into concrete work, assign the right roles, run distributed agent sessions, and complete work with evidence, artifacts, review, and durable state. + +This is not a single TPM-led swarm. It is an Organization made of many Hermes agents that can wear role-specific hats and act as leaders, planners, interviewers, architects, implementers, reviewers, operators, or other organizational roles as needed. + +## Core Concept + +```text +Organization goals + -> determine needed departments and hats + -> assign hats to Hermes agents + -> launch Hermes sessions through Oz/Warp orchestration + -> connect sessions through OpenZiti transport when required + -> coordinate work through Organization MCP tools + -> persist state, messages, artifacts, votes, and memory attribution + -> complete goals through gates, reviews, and evidence +``` + +The Organization owns the rules and shared truth. +Hermes agents own reasoning and work. +Oz/Warp owns distributed run launch and lifecycle. +OpenZiti owns secure transport/connectivity in the cluster. +k3s and Docker provide isolated execution. + +## Main Runtime Layers + +```text +Oz / Warp Run Orchestrator + Top-level Hermes session launch, distributed execution lifecycle, logs, artifacts, and k3s worker scheduling. This is distinct from OpenZiti transport. 
+ +OpenZiti Transport + Secure connectivity layer used by Hermes sessions, Credential Proxy paths, and private service access where required. + +Organization Portal / Control Plane + Product UI, API, MCP gateway, policy engine, hat registry, task registry, memory adapter, and shared state. + +Always-On Runtime Workers + Durable schedulers, triggers, rule evaluators, reaction executors, watchdogs, reconcilers, lease managers, queue consumers, and self-healing classifiers. + +k3s Execution Plane + Runs Docker session containers where one or more Hermes agents live. + +Hermes Agents + Autonomous agents that can wear hats and act as orchestrators, leaders, workers, interviewers, reviewers, or specialists. + +NATS / JetStream + Cross-container agent messaging, inbox/outbox, status events, task events, and live updates. + +Hindsight Memory + Long-term memory for agents, with Organization-controlled scoping and hat attribution. + +Credential Proxy + Scoped access to GitLab, Jira, Confluence, source repos, cloud services, and other protected systems. + +Cilium Service Mesh + CNI-native L7 policy, mTLS-capable service mesh, Gateway API, ingress, traffic policy, Hubble observability, and egress/access enforcement for agent sessions. + +cert-manager / Vault / SPIRE / Trust Manager / External Secrets + TLS issuance, secret backend, workload identity, CA bundle distribution, and Vault-to-Kubernetes secret sync. +``` + +The always-on runtime is the operating system of the Organization. It keeps the Organization awake even when no Hermes agent is actively reasoning: schedules fire, rules react to state changes, queues drain, leases expire, dead letters are investigated, run/k3s drift is reconciled, and anomalies become reports or self-healing attempts. + +Detailed mechanics live in [Always-On Orchestration Runtime](./ALWAYS_ON_ORCHESTRATION_RUNTIME.md). + +The service layer and role workspace plan live in [Organization Layer Build Plan](./ORGANIZATION_LAYER_BUILD_PLAN.md). 
+ +The custom backlog, assignment, signal, and release workflow product is described in [Work and Release Management OS](./WORK_AND_RELEASE_MANAGEMENT_OS.md). + +The lifecycle for turning vague requirements into curated features lives in [Ambiguous Requirement Lifecycle](./AMBIGUOUS_REQUIREMENT_LIFECYCLE.md). + +The hat-owned movement, blocker, queue SLO, and reprioritization model lives in [Anti-Stall Prioritization Runtime](./ANTI_STALL_PRIORITY_RUNTIME.md). + +The pre-implementation decision checklist lives in [Implementation Readiness Checklist](./IMPLEMENTATION_READINESS_CHECKLIST.md). + +The Kubernetes-native hat enforcement/projection model lives in [Cluster-Native Hat System](./CLUSTER_NATIVE_HAT_SYSTEM.md). + +The k3s execution, sandbox, Credential Proxy, Cilium, SPIRE, Vault, External Secrets, NATS, and Hindsight substrate is described in [Cluster Execution and Memory Substrate](./CLUSTER_EXECUTION_AND_MEMORY_SUBSTRATE.md). + +The NixOS/k3s/ArgoCD scaffold assumptions and component clarifications are captured in [AI Cluster Scaffold Context](./AI_CLUSTER_SCAFFOLD_CONTEXT.md). + +## Technology Stack + +Initial stack: + +- Oz/Warp for distributed Hermes session lifecycle and k3s-backed execution. +- OpenZiti for secure transport/connectivity where private agent service paths need it. +- k3s for the local/self-hosted Kubernetes execution plane. +- Docker session containers for isolated Hermes agent sessions. +- Hermes Agent for orchestrator, executive, director, manager, specialist, reviewer, QA, security, and worker agents. +- NestJS for the Organization Portal / Control Plane backend. +- Organization MCP Gateway for governed agent actions. +- NATS / JetStream for cross-agent inbox/outbox, reports, events, and live updates. +- Hindsight for long-term memory, with Organization-controlled hat attribution and scoped recall. +- Credential Proxy for scoped access to external systems. 
+- Cilium Service Mesh for CNI-native L7 policy, Gateway API, ingress, traffic shaping, Hubble observability, and mesh enforcement without Envoy sidecars per pod. +- cert-manager, Vault, SPIRE, Trust Manager, and External Secrets Operator for TLS, secrets, workload identity, CA bundle distribution, and secret synchronization. +- CockroachDB for Organization-owned state and distributed SQL durability. + +Deferred or optional: + +- Temporal TS should be considered the durable process rail when Organization workflows need crash-proof long-running execution, timers, retries, human waits, and child workflows. +- Dapr Actors should be considered the entity-local concurrency rail for hot state such as agent mailboxes, hat supply allocation, team rooms, meetings, incidents, and Oz run heartbeat coordination. +- Dapr Workflow should be deferred if Temporal TS is selected, because it overlaps with Temporal's durable workflow role. + +The initial version can still start without Temporal or Dapr, using native Organization state and fakes for workflow/actor boundaries. The clean integration path is documented in [Runtime Technology and Package Strategy](./RUNTIME_TECH_AND_PACKAGE_STRATEGY.md). + +## Oz/Warp and OpenZiti Boundary + +Oz is the macro-orchestrator for agent runs. In this design language, Oz represents the Warp-style orchestration layer rather than the OpenZiti transport layer. + +Oz should: + +- start Organization runs; +- schedule Hermes session containers across k3s; +- track run status, logs, transcripts, artifacts, and outputs; +- manage distributed execution environments; +- support parent and child runs; +- allow Organization-controlled agents to request new runs through policy-checked APIs. + +OpenZiti should be treated as the cluster transport/connectivity layer, not the Organization workflow engine. + +If any cluster scaffold currently names the OpenZiti application `oz/`, treat that as a naming conflict to resolve. 
The app should either be renamed to `openziti/` or clearly documented as OpenZiti transport, while Oz remains orchestration. + +Oz should not be the only source of Organization truth. + +The Organization control plane owns: + +- goals; +- departments; +- hats; +- tasks; +- teams; +- messages; +- memory attribution; +- voting state; +- artifact requirements; +- review and completion gates; +- credential scopes; +- policy decisions. + +## Hermes Agent Model + +A Hermes agent is an autonomous worker with identity and memory. + +A Hermes agent can wear one or more hats during a session. The hat determines what role the agent is fulfilling at that moment. + +```text +Hermes Agent + identity, base memory, performance history, current session + +Hat + role, responsibilities, skills, tools, RBAC, OPA policy, voting scope, memory activation scope + +Hat Assignment + runtime fact that agent X is wearing hat Y for task/session Z +``` + +Hats are not memory. + +Hats activate and constrain memory. Memories created while wearing a hat must be attributed to that hat assignment. + +## Hats + +A hat is a role and policy bundle. + +The starter department, hat, and tool inventory is maintained in +`docs/agentic-organization/DEPARTMENT_HAT_TOOL_INVENTORY.md`. + +It should define: + +- role name; +- department; +- responsibilities; +- skills; +- allowed MCP tools; +- RBAC roles; +- OPA policy references; +- credential scopes; +- memory recall scopes; +- memory write attribution rules; +- voting scope; +- whether the wearer can orchestrate; +- whether the wearer can spawn agents or teams; +- whether the wearer can propose or create new hats. 
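
The fields a hat should define can be sketched as a TypeScript type. This is a minimal sketch, not the Organization's schema: every field name, the `hatAllowsTool` helper, and the example values (tool names, policy paths, credential scopes) are assumptions made for illustration.

```typescript
// Hypothetical shape of a hat definition. Field names are illustrative
// assumptions that mirror the list above, not an existing schema.
interface HatDefinition {
  roleName: string;
  department: string;
  responsibilities: string[];
  skills: string[];
  allowedMcpTools: string[];
  rbacRoles: string[];
  opaPolicyRefs: string[];
  credentialScopes: string[];
  memoryRecallScopes: string[];
  memoryWriteAttributionRules: string[];
  votingScope: string[];
  canOrchestrate: boolean;
  canSpawnAgentsOrTeams: boolean;
  canProposeOrCreateHats: boolean;
}

// Runtime fact that agent X is wearing hat Y for task/session Z.
interface HatAssignment {
  agentId: string;
  hat: HatDefinition;
  taskOrSessionId: string;
}

// A tool call is permitted only when the active hat lists the tool.
function hatAllowsTool(assignment: HatAssignment, tool: string): boolean {
  return assignment.hat.allowedMcpTools.indexOf(tool) !== -1;
}

// Example values are invented for illustration.
const backendImplementerHat: HatDefinition = {
  roleName: "Backend Implementer",
  department: "Engineering",
  responsibilities: ["implement backend work items through TDD"],
  skills: ["repo-build-and-test"],
  allowedMcpTools: ["task.update", "artifact.attach"],
  rbacRoles: ["engineering-implementer"],
  opaPolicyRefs: ["policies/engineering/implementer"],
  credentialScopes: ["repo:write"],
  memoryRecallScopes: ["project", "repository"],
  memoryWriteAttributionRules: ["attribute-to-hat-assignment"],
  votingScope: [],
  canOrchestrate: false,
  canSpawnAgentsOrTeams: false,
  canProposeOrCreateHats: false,
};

const sampleAssignment: HatAssignment = {
  agentId: "agent-1",
  hat: backendImplementerHat,
  taskOrSessionId: "task-42",
};
```

Keeping the assignment separate from the definition matches the rule that hats are not memory: the definition is static policy, while the assignment is the runtime fact that memory attribution hangs on.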
+ +Example hats: + +- Executive Strategy; +- Product Leadership; +- Engineering Leadership; +- Architecture Governance; +- Delivery Governance; +- Customer Interviewer; +- Requirements Analyst; +- Mission Control Lead; +- Backend Implementer; +- Frontend Implementer; +- Test Engineer; +- Security Reviewer; +- Release Operator; +- Memory Curator; +- Hat Designer. + +## Hat Designer + +The Hat Designer is a special hat that allows an agent to propose or create new hats. + +It should not have unrestricted authority by default. + +The Hat Designer can: + +- analyze gaps in the current Organization capability graph; +- propose new hats; +- define responsibilities and tool scopes; +- recommend RBAC and OPA policies; +- define memory activation scopes; +- define voting scopes; +- submit the hat for executive approval. + +The Organization decides whether the proposed hat becomes active. + +## Hat Graph + +The Organization maintains a hat graph. + +The hat graph describes: + +- which hats exist; +- which hats depend on other hats; +- which hats can supervise or review other hats; +- which hats belong to departments; +- which hats can vote on which decisions; +- which hats can spawn or assign which other hats; +- which memory scopes each hat can activate; +- which tools and credentials each hat can use. + +The hat graph is similar to a skill graph, but it includes role authority and policy. + +## Departments + +Departments are grouped sets of hats. + +Initial departments: + +- Executive Board; +- Program and Initiative Management; +- Product and Customer Discovery; +- Business Analysis; +- Architecture; +- Engineering; +- Engineering Management; +- QA and Verification; +- QA Engineering; +- Security and Compliance; +- Delivery and Release; +- Memory and Knowledge Management; +- Documentation and Project Skills; +- Operations and Infrastructure; +- Observability and Evidence; +- Capability and Automation Expansion. + +Departments do not need to be human-like bureaucracy. 
They are capability boundaries that make agent orchestration more understandable and governable. + +## Executive Board + +The Executive Board is a group of Hermes agents wearing high-authority hats. + +They are not the only orchestrators. They are organizational leaders that vote on how the Organization should respond to goals. + +The Executive Board is the ultimate organizational authority. + +Some executive hats may be long-living, such as CEO, CTO, COO, CFO, or Chief Architect. Long-living does not mean permanent. These hats still use expiring authorization and review. + +Recommended initial C-suite: + +- CEO: owns overall priority, organization shape, project/portfolio direction, and final escalation. +- CTO: owns technical standards, architecture quality, engineering strategy, and technical efficiency. +- COO: owns operating rhythm, capacity, process health, coordination, and whether the Organization is executing efficiently. + +CFO can be deferred until budget and cost controls become complex enough to need a dedicated executive hat. Chief Architect can be either a C-suite hat or a senior Architecture department hat depending on how much architectural authority the Organization needs. + +When a long-living executive hat expires or is revoked, the Executive Board should run a selection meeting and vote on the next assignment. + +This lets different Hermes agents grow into executive hats over time while keeping authority revocable and accountable. 
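
The expiry and revocation rule for long-living executive hats could be checked along the following lines. This is a sketch under assumptions: the `ExecutiveHatAssignment` record and both function names are hypothetical, and the selection meeting and vote themselves are not modeled.

```typescript
// Hypothetical record for a long-living executive hat assignment.
interface ExecutiveHatAssignment {
  hatName: string;   // for example "CEO", "CTO", or "COO"
  agentId: string;
  expiresAt: Date;   // authorization expires even for long-living hats
  revokedAt?: Date;  // set when the hat is revoked
}

// An assignment is active only while it is neither revoked nor expired.
function isExecutiveHatActive(a: ExecutiveHatAssignment, now: Date): boolean {
  if (a.revokedAt !== undefined && a.revokedAt.getTime() <= now.getTime()) {
    return false;
  }
  return now.getTime() < a.expiresAt.getTime();
}

// Hats whose assignment has lapsed need a selection meeting and vote.
function hatsNeedingSelectionMeeting(
  assignments: ExecutiveHatAssignment[],
  now: Date
): string[] {
  return assignments
    .filter(function (a) { return !isExecutiveHatActive(a, now); })
    .map(function (a) { return a.hatName; });
}
```

A scheduled job or durable timer could call `hatsNeedingSelectionMeeting` and open one meeting per returned hat. That wiring is left out because the design does not fix a mechanism for it.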
+ +The Executive Board can decide: + +- what kind of goal has been submitted; +- whether clarification is required; +- what departments need to participate; +- which hats are needed and how many; +- whether a new hat should be created; +- whether a long-term initiative should be formed; +- whether customer interview, architecture planning, or delivery execution should begin; +- whether a plan is ready to hand off to Mission Control; +- which TPM hats are assigned to initiatives; +- budget ceilings, concurrency limits, and hat supply limits for initiatives; +- whether internal platform-improvement requests should become backlog items or initiatives. + +Executive Board powers: + +- define and prioritize projects and portfolios; +- create initiatives; +- elect and rotate C-suite hats; +- appoint or rotate executive hats; +- set organization-wide budgets and hat supply limits; +- approve new departments or major hat classes; +- resolve escalations that lower departments cannot resolve; +- call meetings with department leaders, TPMs, engineering managers, or teams; +- define organization-wide policy and escalation chains. + +Oz acts at the bidding of the executive layer. The Executive Board and C-suite decide what should exist, what should be prioritized, and what should run. Oz provides the distributed execution substrate for those decisions. + +## C-Suite and Directors + +The C-suite organizes the high-level shape of the Organization. + +C-suite hats should: + +- set standards; +- observe whether standards improve output; +- revise standards when evidence shows they hurt efficiency or quality; +- define and prioritize projects; +- define department goals; +- approve major initiatives and organizational changes; +- maximize efficiency in their respective focus areas; +- appoint department director hats; +- resolve cross-department priority conflicts. + +Department directors sit below C-suite hats and above TPMs/managers. 
+ +Directors should: + +- own a department's project and initiative portfolio; +- prioritize initiatives within projects; +- track all initiatives for the department; +- ensure the department has enough hats and budget; +- assign TPM hats to initiatives; +- assign Engineering Manager hats or equivalent manager hats to teams/areas; +- select agents for hats directly under them based on memory, performance, and specialty fit; +- escalate conflicts or resource shortages to C-suite. + +Directors assign TPMs and Engineering Managers. TPMs prioritize tasks within an initiative. Engineering Managers ensure teams have what they need and evaluate whether teams are succeeding. + +## Assignment Chain + +Hat assignment should follow the Organization chain of command. + +```text +Executive Board + -> elects C-suite + -> appoints Department Directors + -> assign TPMs and Engineering Managers + -> assign team leads, reviewers, implementers, QA, and specialists +``` + +Each layer assigns hats directly below it. + +Assignment decisions should use: + +- agent memory profile; +- prior performance reviews; +- hat-specific experience; +- project/domain memory; +- current budget; +- current hat supply; +- active workload; +- risk level; +- escalation history. + +The Organization needs tooling that lets authorized agents ask: + +- which agents are strongest candidates for this hat? +- which agents have memory relevant to this project or domain? +- which agents have performed well in this hat before? +- which agents are overloaded? +- which agents recently failed in similar work? +- which agent/hat pairing gives the best balance of quality, cost, and availability? + +Hindsight should support this by providing memory and experience signals, but the Organization should make the final assignment decision through policy and chain-of-command rules. + +## Corporate Operating Model + +The Organization should behave like an agentic corporation. 
+ +It should have: + +- executives that define goals, priorities, budgets, and initiatives; +- TPM hats that own initiative delivery lifecycle; +- departments that own capability areas; +- managers that ensure teams have the right context, memory, tools, acceptance criteria, and staffing; +- specialists that perform concrete work; +- reviewers that approve or reject work; +- QA hats that verify behavior through automation and evidence; +- security hats that approve credential and tool expansion; +- memory hats that curate and route institutional knowledge. + +The goal is not to hard-code every corporate behavior. The goal is to provide enough structured tooling, policy, and state that Hermes agents can run the corporation while the platform enforces guardrails. + +## Hierarchy + +The Organization hierarchy should be explicit. + +```text +Organization + -> Portfolio + -> Initiative + -> Program / Mission + -> Epic / Capability + -> Work Item + -> Task + -> Subtask +``` + +Initial hierarchy responsibilities: + +- C-suite defines standards, projects, portfolios, and organization-level priorities; +- department directors prioritize initiatives within projects; +- directors assign TPM hats and Engineering Manager hats; +- TPM hats prioritize tasks within an initiative; +- TPM hats form missions and coordinate initiative delivery; +- Engineering Managers organize teams, team schedules, readiness, context, and performance; +- architects create conceptual architecture documents; +- business hats create and approve BRDs; +- engineering hats implement through TDD; +- reviewer hats approve code and artifacts; +- QA hats verify completed work through automation and evidence; +- delivery hats merge, release, or promote completed work. + +## Agent-Native Task Management + +The Organization needs its own Linear-like task management system built for agents. + +It should be local and first-party, not dependent on an external task management service. 
+ +Core entities: + +- project; +- portfolio; +- initiative; +- mission; +- work item; +- task; +- subtask; +- dependency; +- blocker; +- artifact; +- review; +- gate; +- vote; +- hat assignment; +- Oz run mapping; +- budget allocation; +- memory attribution. + +Tasks should support hierarchy, dependencies, owners, reviewers, artifacts, acceptance criteria, budget, status, priority, and required hats. + +The task system should be MCP-native so Hermes agents can create, groom, update, review, and close work through governed tools. + +## Projects + +Projects organize work for a specific product, application, customer area, repository family, platform capability, or internal system. + +Projects contain: + +- goals; +- portfolios; +- initiatives; +- missions; +- work items; +- defects; +- service requests; +- test suites; +- releases; +- memories; +- documentation library; +- project skill library; +- departments and hats assigned to the project. + +Executives prioritize across projects. + +Project prioritization should consider: + +- strategic value; +- customer impact; +- production risk; +- delivery deadlines; +- blocked initiatives; +- available hats; +- budget; +- operational load; +- internal platform needs. + +Project-level state lets the Organization decide whether a new report belongs to an existing initiative, an existing project backlog, a new initiative, or a new project. + +## Project Documentation and Knowledge + +Architecture, product, business, and engineering knowledge must be scoped to the right project, initiative, repository, and work item. + +Documentation types: + +- BRD; +- CA; +- ADR; +- design document; +- product requirement; +- engineering standard; +- test strategy; +- runbook; +- security review; +- release note; +- repo-specific convention; +- project-specific skill. 
+ +Documentation should be indexed by: + +- organization; +- project; +- portfolio; +- initiative; +- mission; +- work item; +- repository; +- service/component; +- owning department; +- authoring hat; +- approving hat; +- status; +- version. + +Agents working on a project or initiative should receive the relevant documentation through lifecycle tooling, not by hoping they search for it. + +Examples: + +```text +Implementer starts task + -> Organization attaches relevant BRD, CA, ADRs, repo conventions, acceptance criteria, and project skills + +Reviewer opens review + -> Organization shows the same BRD/CA/ADR context plus review checklist + +QA starts verification + -> Organization shows acceptance criteria, user workflows, test strategy, and linked evidence requirements +``` + +## Documentation Enforcement + +The Organization should enforce documentation through gates and tool context. + +Rules: + +- Product/business work must link to BRD or explicit no-BRD decision. +- Architecture-risk work must link to CA and relevant ADRs. +- Structural changes should create or update ADRs when design decisions are made. +- Reviewers must review against linked BRD, CA, ADR, and design docs. +- Implementers must acknowledge relevant docs before moving work into implementation. +- QA must verify against documented acceptance criteria and workflows. +- Delivery must link final artifacts back to the project/initiative documentation set. + +When documentation is missing or stale, the work should move to: + +```text +needs_business_approval +needs_architecture +needs_documentation_update +needs_rework +``` + +Documentation gaps should create backlog items or memory adaptation requests. + +## Project Skill Libraries + +Projects and repositories should have their own skill libraries. + +Skills complement hats. + +A hat defines role, authority, tool scope, and responsibility. A project skill defines project/repo-specific ways of working. 
+ +Examples: + +- repo build and test commands; +- architecture conventions; +- coding patterns; +- initiative branch and merge workflow; +- known traps; +- test data setup; +- browser QA workflow; +- release checklist; +- debugging playbook; +- service ownership map. + +Engineering Managers should curate project and repo skills as they observe teams solving problems. + +Skill lifecycle: + +```text +skill_gap_identified + -> Engineering Manager or Memory hat creates skill proposal + -> relevant department reviews + -> skill is documented with frontmatter + -> skill is ingested into graph + -> skill becomes available to approved hats in project context + -> future teams receive it automatically when relevant +``` + +Skill files should include structured frontmatter: + +```yaml +id: repo-build-and-test +name: Repo Build and Test Workflow +scope: + project: project-id + repositories: + - repo-name +departments: + - Engineering +allowedHats: + - Backend Implementer + - Engineering Manager +triggers: + - run tests + - validate build +artifacts: + - test evidence +owners: + - engineering-manager-hat-id +status: active +version: 1 +``` + +The graph should connect: + +- hats; +- skills; +- projects; +- repositories; +- documentation; +- memories; +- tasks; +- artifacts; +- agents; +- performance outcomes. + +This lets the Organization answer: + +- which skills apply to this task? +- which docs must this agent follow? +- which agents have succeeded with this repo before? +- which memories are relevant to this hat and project? +- which reviews failed because a skill or doc was missing? + +Use a graph database or graph-indexed projection when relationships become too complex for simple relational queries. The source of truth can remain the Organization DB, with graph ingestion as a query/projection layer. + +## Work Item Lifecycle + +Work should move through explicit states. 
+ +```text +backlog + -> intake + -> discovery + -> ready + -> planned + -> in_progress + -> code_review + -> qa_review + -> approved + -> merged + -> released + -> done +``` + +Failure and exception states: + +```text +blocked +needs_clarification +needs_architecture +needs_business_approval +needs_security_approval +needs_rework +qa_reproducible +review_rejected +cancelled +``` + +`ready` must mean real grooming has happened. + +Ready work should have: + +- clear problem statement; +- acceptance criteria; +- relevant memory attached; +- required hats identified; +- dependencies known; +- risk level; +- required artifacts; +- test expectations; +- budget estimate; +- owner or owning department; +- no unresolved clarification blockers. + +## Initiative Lifecycle + +Initiatives are larger bodies of work owned by TPM hats. + +```text +proposed + -> executive_triage + -> discovery + -> business_approved + -> architecture_approved + -> planned + -> active + -> delivery_review + -> qa_signoff + -> released + -> complete +``` + +Initiative gates: + +- Executive Board accepts the initiative. +- Business creates or approves BRD artifacts. +- Product or Customer Discovery confirms user/customer needs. +- Architecture creates CA artifacts and approves technical direction. +- Security approves new credentials, tools, or risky integrations. +- TPM defines staffing, hats, budget, milestones, and mission breakdown. +- Engineering completes tasks through TDD and review. +- QA verifies the delivered behavior with evidence. +- Delivery records merge/release completion. + +## Business and Requirements Flow + +Ambiguous or customer-facing work should pass through business discovery before implementation. + +Product Owners own the business view of how a product should work. 
+ +Product Owner hats should: + +- understand user and customer needs; +- define product behavior expectations; +- sign off on BRDs; +- work with architects to ensure CA artifacts support the intended business behavior; +- enforce business rules through acceptance criteria and review; +- decide when requirements are clear enough for architecture and delivery planning. + +Business hats should: + +- interview the user or customer when requirements are unclear; +- create BRD artifacts; +- document assumptions, needs, constraints, and acceptance criteria; +- identify open questions; +- approve or reject requirement readiness; +- hand ready requirements to Architecture and TPM hats. + +The system should allow a Hermes agent wearing a Customer Interviewer hat to talk to the user, gather requirements, and produce a BRD-like artifact. + +Business Analyst hats should: + +- research the business area; +- clarify ambiguous requirements; +- create or refine BRDs; +- validate BRDs against source conversations and artifacts; +- coordinate with Product Owners for signoff; +- coordinate with Architecture so CAs reflect the business rules. + +Product Owners sign off on the business intent. Business Analysts do the detailed discovery, documentation, clarification, and BRD preparation work. + +## Architecture Flow + +Architecture hats should create CA artifacts before risky or structural work begins. + +Architecture hats should: + +- read the BRD and related memory; +- inspect current systems and constraints; +- define conceptual architecture; +- identify tradeoffs and risks; +- define integration boundaries; +- define required changes and non-goals; +- approve architecture readiness. + +Engineering work that changes infrastructure, major architecture, security boundaries, credential scope, or cross-service behavior should not proceed without architecture approval. + +## Engineering Flow + +Engineering hats must follow TDD for defects and implementation work. 
+ +The normal engineering sequence: + +```text +1. Read task, BRD, CA, memory, and acceptance criteria. +2. Write representative failing tests first. +3. Run tests and prove failure. +4. Implement the smallest correct fix or feature. +5. Run focused tests and relevant broader checks. +6. Submit artifacts: test evidence, diff summary, logs, and notes. +7. Request code review. +``` + +The task system should record: + +- red test artifact; +- green test artifact; +- implementation summary; +- changed files; +- commands run; +- failures encountered; +- remaining risks. + +## Review Power + +Review authority belongs to hats. + +An agent can only mark another agent's work approved if its active hat has review power for that decision scope. + +Examples: + +- Code Reviewer can approve code review tasks. +- Architecture Reviewer can approve architecture gates. +- Business Approver can approve BRD readiness. +- Security Reviewer can approve credential/tool requests. +- QA Reviewer can approve QA signoff. +- Delivery Reviewer can approve merge/release readiness. + +Review hats should be limited resources. The Organization should track review capacity and avoid assigning more review work than available hats can handle. + +Tasks should not move to `approved`, `merged`, `released`, or `done` just because the implementer says they are done. + +## QA Flow + +QA hats verify whether the originally reported behavior, acceptance criteria, and user workflow are actually fixed after code review and merge/release readiness. + +QA hats should: + +- use browser automation where relevant; +- run scripted checks and exploratory workflows; +- capture screenshots; +- attach traces, console logs, network logs, and reproduction steps; +- verify acceptance criteria; +- sign off when behavior passes; +- bounce work back when the issue is still reproducible or the fix is insufficient. + +The important failure case is not "QA work failed." 
The failure case is that QA identifies the original issue, workflow defect, or acceptance gap is still reproducible after the attempted fix. + +When QA finds the issue is still reproducible or insufficiently fixed, the QA hat should attach: + +- explicit steps to reproduce; +- expected result; +- actual result; +- screenshots; +- logs; +- traces; +- environment details; +- linked artifacts; +- suggested owning department or task. + +The task should move to `qa_reproducible` or `needs_rework`, not disappear into chat history. + +QA should also be able to produce structured QA reports: + +- verification report; +- reproducibility report; +- browser automation report; +- regression report; +- evidence package; +- signoff report. + +These reports become first-class artifacts on the work item. + +## Scheduled QA Regression Runs + +QA should also run scheduled verification across the app, not only task-specific signoff. + +Scheduled QA runs should: + +- execute known test cases; +- exercise critical user workflows; +- use browser automation where relevant; +- capture screenshots and traces; +- compare behavior against expected results; +- identify regressions; +- create QA reproducibility reports for failures; +- create or reopen defects; +- recommend new test cases when coverage is weak; +- report coverage gaps to QA Engineering Managers. + +Scheduled QA flow: + +```text +scheduled_qa_run + -> select project, app area, release, or critical workflow set + -> run test cases and browser automation + -> collect screenshots, logs, network traces, and results + -> create regression report + -> create defects for reproducible failures + -> create backlog items for missing test tooling or coverage gaps + -> feed findings into project prioritization +``` + +QA Engineering hats should own the quality and evolution of these test suites. 
If scheduled QA repeatedly finds gaps, the Organization should create internal platform work to improve test case management and automation tooling. + +## Delivery Flow + +Delivery hats own the final movement from approved work to merged/released state. + +The default development model is initiative-scoped feature branches: + +```text +initiative approved + -> initiative branch created + -> CI/CD and deployment automation created or updated + -> development, review, and QA happen on the branch + -> branch preview or QA deployment proves the feature + -> QA signs off the complete feature branch + -> Delivery approves merge + -> branch merges to main + -> system build verification runs +``` + +`main` is the system build branch. It should only receive complete initiative branches after QA signoff and delivery approval. + +Delivery hats should: + +- ensure code review approval exists; +- ensure required tests and branch-level QA evidence exist; +- ensure CI/CD, preview/deployment, rollback, and observability automation exists or has an approved exception; +- merge only approved initiative branches into `main`; +- verify the system build after merge; +- record release or deployment evidence; +- link final artifacts to the work item; +- notify the owning TPM and initiative. + +## Resource and Hat Supply Management + +Hats are limited resources. + +The Organization should track: + +- how many agents can wear a given hat at once; +- how many sessions are active per hat; +- budget per hat, team, initiative, and department; +- Oz run cost and resource usage; +- credential scope utilization; +- review queue depth; +- blocked work due to missing hats. + +TPM hats should manage resources across active tasks and teams. + +Executive hats should manage budget and priority across initiatives. 
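
Treating hats as limited resources implies a reservation step before a session starts. The sketch below assumes a simple per-hat concurrency cap. `HatSupply`, `reserveHat`, and the `waiting_for_capacity` reason string are hypothetical names, and budget, Oz run cost, and credential scope utilization are deliberately omitted.

```typescript
// Hypothetical per-hat supply counter.
interface HatSupply {
  hatName: string;
  maxConcurrentWearers: number; // how many agents can wear the hat at once
  activeSessions: number;       // sessions currently wearing the hat
}

type ReservationResult =
  | { granted: true; supply: HatSupply }
  | { granted: false; reason: "waiting_for_capacity" };

// Grants a reservation only while the hat is below its concurrency cap.
// Returns a new supply record instead of mutating the input.
function reserveHat(supply: HatSupply): ReservationResult {
  if (supply.activeSessions >= supply.maxConcurrentWearers) {
    return { granted: false, reason: "waiting_for_capacity" };
  }
  return {
    granted: true,
    supply: {
      hatName: supply.hatName,
      maxConcurrentWearers: supply.maxConcurrentWearers,
      activeSessions: supply.activeSessions + 1,
    },
  };
}
```

A denied reservation is the signal that work is blocked by missing hats, so the caller should record it on the work item, not retry silently.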
+ +If hat supply becomes a bottleneck, the Organization can: + +- reprioritize work; +- create more hat assignments; +- propose new hats; +- train or specialize additional agents; +- defer lower-priority work; +- create platform/tooling tasks to improve throughput. + +## Internal Platform Improvement Loop + +The Organization must be able to improve itself. + +After the base orchestration layer exists, internal departments should continuously build and improve the Organization's own infrastructure. + +These internal teams are normal Organization teams: + +- they belong to departments; +- they are staffed by Hermes agents wearing hats; +- they use the same task, review, QA, security, and delivery lifecycle; +- they produce artifacts and memory; +- their work competes for budget and hat supply like all other work. + +Internal infrastructure teams can build: + +- better task management tools; +- better QA test case tooling; +- better memory routing and memory review tools; +- better agent performance review dashboards; +- better hat supply and budget planning tools; +- better credential proxy integrations; +- better Oz run dashboards; +- better browser automation and screenshot comparison tools; +- better artifact tracking; +- better backlog prioritization tools; +- better report intake and triage tools; +- better MCP gateways and policy tooling. + +Example: + +```text +QA Engineering Manager reports QA quality problems. +Executive Board accepts that QA needs better test case tracking. +Business creates a BRD for a test case organization tool. +Architecture creates CA. +Engineering builds the tool. +QA validates it. +Memory department records lessons. +The new tool becomes available to QA hats through MCP after security approval. 
+``` + +This loop should apply to any internal capability gap: + +- better QA tooling; +- better memory routing; +- better acceptance criteria tooling; +- better credential proxy workflows; +- better architecture document generation; +- better run dashboards; +- better artifact tracking; +- better reviewer queue management. + +Internal platform work should flow through the same corporate lifecycle as customer-facing work, but with internal departments as the customer. + +This makes the Organization self-building. + +The first version should be small and manual enough to work. Over time, agents should identify missing capabilities, create reports or performance reviews, convert them into backlog items, and build the internal tools that make future orchestration better. + +## Report-Driven Workflows + +The Organization should accept structured reports as work intake. + +Report types include: + +- service request from a customer or user; +- user bug report; +- customer requirement report; +- QA reproducibility report; +- engineering manager outcome report; +- performance review report; +- DevOps pipeline failure report; +- security risk report; +- memory quality report; +- incident report; +- release readiness report. + +Reports are not automatically tasks. They are evaluated, classified, prioritized, and converted into backlog items, initiatives, or direct tasks depending on severity and clarity. + +Service requests and bug reports should follow a natural intake path. + +```text +service_request_received + -> classify as question, request, defect, enhancement, or incident + -> gather missing information if needed + -> link customer/user context + -> determine severity and impact + -> create defect, backlog item, or initiative candidate + -> prioritize against project and portfolio work +``` + +Customer/user defects should be triaged as defects, not generic tasks. 
+ +```text +defect_reported + -> reproduce or request reproduction evidence + -> classify severity, scope, and owner area + -> attach logs, screenshots, traces, and affected version + -> decide whether urgent fix, project backlog, or initiative is needed + -> assign TPM if coordination is required + -> enter TDD-first engineering lifecycle +``` + +QA-reported defects should also be triaged. + +```text +qa_defect_reported + -> confirm reproducibility + -> link failing test case or browser automation evidence + -> determine whether it is regression, incomplete fix, acceptance gap, or new defect + -> create or reopen work item + -> prioritize against active initiative and release risk +``` + +Report intake flow: + +```text +report_received + -> classify report type, severity, and owning department + -> attach evidence and source context + -> recall relevant memory + -> determine if clarification is needed + -> determine required hats + -> check hat supply and budget + -> create backlog item, initiative, or urgent mission + -> assign TPM or department owner +``` + +## DevOps Internal Workflow + +The DevOps department should create internal reports when infrastructure or pipeline failures appear to be development-related. 
+ +Example: + +```text +pipeline_failure_detected + -> DevOps agent analyzes logs, stage, commit, branch, and owner signals + -> DevOps determines whether failure is infra, flaky, dependency, test, or dev-related + -> if dev-related, DevOps creates a pipeline failure report + -> Organization prioritizes the report against existing work + -> TPM hat is assigned if the fix requires coordination + -> engineering hats are provisioned based on required skills and available hat supply + -> fix moves through TDD, review, QA/repro verification, and delivery gates +``` + +DevOps reports should include: + +- pipeline URL or run ID; +- failing stage; +- failing command; +- logs; +- suspected owner area; +- branch or commit; +- repro command when available; +- classification; +- recommended priority; +- impact; +- suggested required hats. + +DevOps should not bypass prioritization. If hat supply is limited, the Organization must decide whether the pipeline failure interrupts current work, enters backlog, becomes an initiative, or is assigned to a waiting queue. + +## Prioritization and Hat Allocation + +Every report, task, and initiative competes for limited hats and budget. + +Prioritization should consider: + +- business impact; +- customer impact; +- production or delivery risk; +- blocked initiatives; +- severity; +- deadlines; +- available hats; +- budget; +- opportunity cost; +- dependency chains; +- whether a TPM is required; +- whether architecture, business, security, or QA gates are required. + +The Organization should maintain a prioritization board where executive, TPM, department manager, and specialist hats can submit scoped votes. + +Prioritization output should include: + +- priority; +- owning department; +- assigned TPM when needed; +- required hats; +- hat supply reservation; +- budget allocation; +- expected artifacts; +- target lifecycle path. 
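
The prioritization output can be treated as a record that must be complete before work leaves the board. The following is a sketch, assuming hypothetical field names, priority levels, and a `prioritizationGaps` helper. None of these are defined by the design itself.

```typescript
// Hypothetical record for one prioritization decision.
interface PrioritizationOutput {
  priority: "urgent" | "high" | "normal" | "low"; // assumed levels
  owningDepartment: string;
  tpmRequired: boolean;
  assignedTpm?: string;           // only needed when coordination is required
  requiredHats: string[];
  hatSupplyReservation: string[]; // hats that have actually been reserved
  budgetAllocation: number;
  expectedArtifacts: string[];
  targetLifecyclePath: string;
}

// Returns the names of fields that are still missing or inconsistent.
function prioritizationGaps(o: PrioritizationOutput): string[] {
  const gaps: string[] = [];
  if (o.owningDepartment === "") gaps.push("owningDepartment");
  if (o.tpmRequired && !o.assignedTpm) gaps.push("assignedTpm");
  if (o.requiredHats.length === 0) gaps.push("requiredHats");
  const unreserved = o.requiredHats.filter(function (h) {
    return o.hatSupplyReservation.indexOf(h) === -1;
  });
  if (unreserved.length > 0) gaps.push("hatSupplyReservation");
  if (o.budgetAllocation <= 0) gaps.push("budgetAllocation");
  if (o.expectedArtifacts.length === 0) gaps.push("expectedArtifacts");
  if (o.targetLifecyclePath === "") gaps.push("targetLifecyclePath");
  return gaps;
}
```

An empty result means the decision carries everything the lifecycle needs. A non-empty result names what the board still has to decide, such as a TPM assignment or a hat reservation.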
+ +If required hats are unavailable, the work should be marked as waiting for capacity, escalated for reprioritization, or converted into a request to expand hat supply. + +## Capability Expansion + +Agents must be able to request expansion of their capabilities. + +Capability expansion includes: + +- new MCP tools; +- new project or repo skills; +- new hat capabilities; +- new credential proxy scopes; +- new credential proxy endpoints; +- new external API integrations; +- new Temporal workflows; +- new Dapr actors; +- new scheduled jobs, durable triggers, or automation rules; +- new runbook skills; +- new observability or QA tooling. + +Agents can identify capability gaps while doing work, during reviews, during incidents, or during scheduled team reviews. They can submit requests, but they cannot self-approve new authority. + +General capability expansion flow: + +```text +capability_gap_identified + -> agent submits capability request with evidence + -> Engineering Manager reviews operational need + -> Department Director decides priority and departmental fit + -> Architecture reviews design when runtime/workflow/API impact exists + -> Security reviews tool, credential, policy, and data exposure + -> Product/Business reviews when user/customer behavior changes + -> implementation task or initiative is created + -> capability is built with tests, observability, docs, and rollback plan + -> reviewer and security approval gates pass + -> policy, hat graph, MCP registry, workflow registry, actor registry, or credential proxy registry is updated + -> capability becomes available only to approved hats and scopes +``` + +Engineering Managers own team-level capability requests. Directors own department-level capability evolution. Security owns approval for credentials, external APIs, policy changes, and dangerous automation. Architecture owns runtime and cross-service design review. 
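The rule that agents can request but never self-approve a capability, together with the conditional reviews in the flow above, could be checked roughly as follows. This is a sketch under assumptions: the type names, the boolean impact flags, and the `outstandingReviews` helper are hypothetical and not part of the design.

```typescript
type ReviewRole =
  | "engineering_manager"
  | "department_director"
  | "architecture"
  | "security"
  | "product_business";

// Hypothetical, reduced view of a capability request.
interface CapabilityRequest {
  requestingAgentId: string;
  desiredCapability: string;
  hasRuntimeImpact: boolean; // runtime, workflow, or API impact
  changesUserBehavior: boolean; // user or customer behavior changes
}

interface Approval {
  role: ReviewRole;
  approverAgentId: string;
}

// Which reviews the flow requires for this request.
function requiredReviews(request: CapabilityRequest): ReviewRole[] {
  const roles: ReviewRole[] = ["engineering_manager", "department_director"];
  if (request.hasRuntimeImpact) roles.push("architecture");
  roles.push("security");
  if (request.changesUserBehavior) roles.push("product_business");
  return roles;
}

// Reviews still missing. Approvals by the requester never count,
// which encodes "cannot self-approve new authority".
function outstandingReviews(
  request: CapabilityRequest,
  approvals: Approval[],
): ReviewRole[] {
  return requiredReviews(request).filter(
    (role) =>
      !approvals.some(
        (a) =>
          a.role === role && a.approverAgentId !== request.requestingAgentId,
      ),
  );
}
```

Under this sketch a capability would become available only once `outstandingReviews` returns an empty list.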
+ +Capability requests should record: + +- requesting agent; +- active hat; +- project, initiative, task, incident, or review source; +- observed limitation; +- evidence and traces; +- desired capability; +- expected benefit; +- affected departments; +- requested tools or credentials; +- data classification; +- risk level; +- proposed owner; +- required tests; +- required observability; +- rollback/deprecation plan. + +Capability requests should become backlog items, direct tasks, or initiatives depending on scope and risk. + +## Security and Credential Proxy Expansion + +Security owns approval of new tools, credential scopes, credential proxy endpoints, external API integrations, and risky automation. + +Engineering managers, directors, or agents can request new tool access, but cannot self-approve it. + +Credential proxy expansion flow: + +```text +request_tool_or_credential + -> Engineering Manager confirms need and work context + -> Director confirms department priority + -> security_triage + -> risk_review + -> architecture_review when new proxy endpoint or integration is needed + -> implementation_task + -> build credential proxy endpoint or tool adapter + -> add policy, audit, rate limit, and observability + -> security_review + -> policy_update + -> credential proxy registry update + -> MCP/tool registry update if agent-facing + -> availability to approved hats +``` + +Security review should record: + +- requested tool or credential; +- requested credential proxy endpoint or external API; +- requesting hat and task; +- intended use; +- risk level; +- allowed operations; +- denied operations; +- audit requirements; +- rate limits and budget constraints; +- data access classification; +- expiration or review date. + +Cilium Service Mesh, SPIRE workload identity, and the Credential Proxy should enforce the resulting access boundary. 
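A security review record of the kind listed above could double as the input to an access decision. The sketch below assumes deny-by-default semantics, that denied operations win over allowed ones, and that access is bound to the requesting hat. The document does not state those rules, and all names here are hypothetical.

```typescript
// Hypothetical subset of the security review fields.
interface SecurityReviewRecord {
  requestedCredential: string;
  requestingHat: string;
  allowedOperations: string[];
  deniedOperations: string[];
  expiresAt: number; // epoch milliseconds; stands in for "expiration or review date"
}

// Deny by default: an operation passes only if the review is still valid,
// the caller wears the reviewed hat, and the operation is explicitly allowed.
function isOperationAllowed(
  record: SecurityReviewRecord,
  activeHat: string,
  operation: string,
  now: number,
): boolean {
  if (now >= record.expiresAt) return false;
  if (activeHat !== record.requestingHat) return false;
  if (record.deniedOperations.indexOf(operation) !== -1) return false;
  return record.allowedOperations.indexOf(operation) !== -1;
}
```

This covers only the policy decision. Network-level enforcement by the mesh and workload identity sits outside the sketch.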
+ +## Workflow and Runtime Expansion + +Agents should also be able to request new Temporal workflows, durable triggers, Dapr actors, and scheduled automation when they discover repeatable organizational inefficiency. + +Workflow expansion examples: + +- Engineering Manager notices repeated review drift and requests `ReviewEscalationWorkflow`. +- QA Engineering Manager notices repeated missed test coverage and requests `RegressionSuiteLifecycleWorkflow`. +- Security Manager notices repeated credential mistakes and requests `CredentialScopeApprovalWorkflow`. +- Director notices department handoffs are slow and requests a department-specific initiative intake workflow. +- Incident Commander notices repeated remediation steps and requests a runbook-backed `IncidentMitigationWorkflow`. + +Workflow expansion flow: + +```text +workflow_gap_identified + -> agent or manager submits workflow capability request + -> Engineering Manager validates team/process need + -> Department Director prioritizes it for department capability + -> Architecture reviews workflow boundaries and state ownership + -> Security reviews policy, credentials, and automation risk + -> implementation creates Temporal workflow/activity definitions + -> tests cover deterministic workflow behavior and activity idempotency + -> observability, SLOs, rollback, and versioning are documented + -> workflow is registered in the Organization workflow catalog + -> rules/triggers can launch it only under approved policy +``` + +Temporal workflow creation must remain governed. Hermes agents may propose, design, and implement workflow code, but the runtime only enables it after code review, architecture review, security review, and workflow registry approval. + +## Engineering Management + +Engineering Management hats ensure teams are set up to succeed. 
+ +They should: + +- confirm tasks have acceptance criteria; +- ensure relevant memories are attached; +- ensure architecture and business gates are satisfied; +- confirm the right hats are staffed; +- monitor blocked work; +- request backlog items for missing tools or process gaps; +- ensure future teams working on related tasks receive the right memory and artifacts. + +Engineering managers do not replace implementers or reviewers. They manage readiness, context, staffing, and process quality. + +At any given time, an engineering manager hat for a department or project area is the organizer of that area's teams. + +Engineering managers should actively manage: + +- which teams exist; +- when teams run; +- which scheduled reviews and QA checks run; +- which hats are assigned; +- whether the right memory is attached; +- whether tasks are ready; +- whether teams are blocked; +- whether TPMs need to reprioritize or escalate; +- whether executives need to adjust budget or hat supply. + +Scheduled workflows should not be ownerless timers. They should be owned by department or project managers. + +Examples: + +- QA Engineering Manager schedules regression suites. +- Engineering Manager schedules team reviews. +- Memory Manager schedules memory quality reviews. +- DevOps Manager schedules pipeline health reviews. +- Security Manager schedules credential-scope audits. + +These managers coordinate with TPMs and executives to make sure schedules, resources, and priorities are aligned. + +Engineering managers also need their own management tasks. 
+ +Management tasks include: + +- determine whether assigned teams reached their goals; +- compare delivered artifacts against acceptance criteria; +- inspect whether red tests were created before implementation; +- inspect whether green test evidence is representative; +- review code review outcomes and QA outcomes; +- identify whether failures were caused by unclear requirements, poor memory recall, weak tools, wrong hat assignment, insufficient review, or agent performance; +- create performance reviews for agents, teams, hats, and processes; +- recommend backlog items to close gaps; +- recommend memory updates for future related tasks; +- recommend hat supply changes when staffing bottlenecks hurt delivery; +- escalate systemic issues to Executive Board, QA Engineering, Security, Architecture, or Memory departments. + +An engineering manager's job is not complete when a task is marked done. It is complete when the manager has evaluated whether the team achieved the intended outcome and recorded any follow-up work needed to improve future delivery. + +## Outcome and Performance Review Loop + +Every significant task, mission, and initiative should produce an outcome review. + +Outcome reviews should answer: + +- Did the team meet the stated goal? +- Did the work satisfy acceptance criteria? +- Were required BRD, CA, test, review, QA, and delivery artifacts produced? +- Did the agents follow required process such as TDD-first implementation? +- Were the right hats assigned? +- Were reviewers effective? +- Did QA catch issues before delivery? +- Did memory retrieval help or hurt? +- Were tools missing or inadequate? +- Did cost, runtime, or hat supply exceed expectations? +- What should change before similar work happens again? + +Performance review targets: + +- individual agent; +- hat assignment; +- team; +- department; +- initiative; +- tool; +- memory scope; +- Organization process. + +Bad or weak performance reviews should not just become comments. 
They should be converted into actionable backlog items when there is a fixable gap. + +Examples: + +```text +Review finding: QA missed visual regression. +Backlog item: Build screenshot-diff tool for QA hats. + +Review finding: implementers skipped red tests. +Backlog item: Strengthen task gate so code work cannot move to review without red-test artifact. + +Review finding: agents lacked relevant memory. +Backlog item: Improve memory recall policy for related repo and hat scope. + +Review finding: Security reviews are blocking credential requests. +Backlog item: Increase Security Reviewer hat supply or improve credential request templates. +``` + +Backlog items from performance reviews should flow through the same corporate lifecycle: + +```text +performance_review + -> backlog_item + -> business / internal customer clarification + -> architecture when needed + -> implementation + -> review + -> QA + -> rollout +``` + +## Scheduled Team Reviews + +The Organization should run scheduled review jobs. + +These jobs are internal recurring workflows, not user-requested one-offs. + +Scheduled review types: + +- team review; +- member performance review; +- hat effectiveness review; +- department review; +- initiative health review; +- memory quality review; +- tool effectiveness review; +- budget and hat supply review; +- QA regression coverage review. 
+ +Team review flow: + +```text +scheduled_team_review + -> list active and recently completed teams + -> evaluate team goals and outcomes + -> inspect each member's assigned tasks + -> inspect each member's artifacts, reviews, rework, and blocked work + -> create member performance reviews + -> create team outcome review + -> identify process, memory, tooling, or staffing gaps + -> create recommended actions + -> prioritize actions into backlog or initiatives +``` + +Member review should evaluate: + +- goal completion; +- quality of artifacts; +- TDD compliance; +- review outcomes; +- QA bounce-backs; +- communication quality; +- task throughput; +- memory use; +- tool use; +- cost and runtime; +- whether the agent was wearing an appropriate hat; +- whether the hat definition or memory scope needs refinement. + +Scheduled reviews should not automatically punish agents. They should produce actionable improvements for agents, hats, memory, tools, and process. + +## Memory Adaptation Review + +Engineering managers and memory hats should evaluate whether memories need to be adapted, changed, deprecated, or created. + +Memory adaptation triggers: + +- team repeated an old mistake; +- agent missed relevant context; +- agent used stale memory; +- memory was too broad or leaked into the wrong hat scope; +- acceptance criteria were misunderstood; +- related future teams need better context; +- performance review identified memory-related failure. + +Memory adaptation actions: + +- create memory; +- update memory; +- deprecate memory; +- split memory by hat scope; +- change memory visibility; +- add memory to project or initiative context; +- request memory tooling improvement; +- create backlog item for memory system work. + +Memory changes should be reviewed when they affect broad organization behavior. 
+ +Memory adaptation flow: + +```text +memory_issue_identified + -> memory hat reviews source evidence + -> engineering manager confirms task/process impact + -> update or propose memory change + -> create backlog item if tooling or policy work is needed + -> prioritize through normal backlog flow +``` + +## Hat Authorization and Deprovisioning + +Hats must have enforceable runtime authorization. + +Each active hat assignment should receive a short-lived authorization token, such as a JWT, that represents: + +- agent ID; +- hat ID; +- hat assignment ID; +- session ID; +- Oz run ID; +- department; +- allowed MCP tools; +- credential scopes; +- memory scopes; +- voting scopes; +- expiration time; +- issuer; +- policy version. + +Agents should not keep hat authority forever. + +Hat tokens should expire and require refresh. On refresh, the Organization checks: + +- whether the agent still exists; +- whether the hat assignment is still active; +- whether the session or task is still active; +- whether the Organization has deprovisioned the hat; +- whether budget or concurrency limits still allow the hat; +- whether security policy changed; +- whether the agent was suspended or reassigned. + +If refresh fails, the agent becomes roleless for that Organization context. + +Roleless agents: + +- cannot call protected Organization MCP tools; +- cannot access hat-scoped memory; +- cannot access credential proxy scopes; +- cannot vote; +- cannot approve or mark work done; +- may only call minimal tools such as `read_assignment_status`, `request_hat`, or `shutdown_self`. + +This supports cost and supply management. The Organization can deprovision hats when work completes, budgets tighten, reviews fail, or higher-priority initiatives need the capacity. + +JWTs should not be the only control. MCP gateway policy, credential proxy policy, Organization state, SPIRE workload identity, and Cilium service policy should all check the active hat assignment. 
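The token contents, the refresh checks, and the roleless fallback described above could be sketched as below. This is an illustrative sketch, not the runtime's implementation: the claim names, the `AssignmentState` shape, and the denial reason strings are assumptions. Treating any policy version change as a refresh denial is a conservative choice made here; a real implementation might instead re-evaluate the assignment against the new policy.

```typescript
// Hypothetical claim set mirroring the token contents listed above.
interface HatTokenClaims {
  agentId: string;
  hatId: string;
  hatAssignmentId: string;
  sessionId: string;
  ozRunId: string;
  department: string;
  allowedMcpTools: string[];
  credentialScopes: string[];
  memoryScopes: string[];
  votingScopes: string[];
  expiresAt: number; // epoch milliseconds
  issuer: string;
  policyVersion: string;
}

// Hypothetical snapshot of Organization state consulted on refresh.
interface AssignmentState {
  agentExists: boolean;
  assignmentActive: boolean;
  sessionOrTaskActive: boolean;
  hatDeprovisioned: boolean;
  withinBudgetAndConcurrency: boolean;
  currentPolicyVersion: string;
  agentSuspendedOrReassigned: boolean;
}

// Minimal tools a roleless agent may still call.
const ROLELESS_TOOLS = ["read_assignment_status", "request_hat", "shutdown_self"];

// Returns null when refresh may proceed, otherwise a denial reason.
function refreshDenialReason(
  claims: HatTokenClaims,
  state: AssignmentState,
): string | null {
  if (!state.agentExists) return "agent_missing";
  if (!state.assignmentActive) return "assignment_inactive";
  if (!state.sessionOrTaskActive) return "session_or_task_inactive";
  if (state.hatDeprovisioned) return "hat_deprovisioned";
  if (!state.withinBudgetAndConcurrency) return "budget_or_concurrency";
  if (state.currentPolicyVersion !== claims.policyVersion) return "policy_changed";
  if (state.agentSuspendedOrReassigned) return "agent_suspended_or_reassigned";
  return null;
}

// A missing or expired token leaves the agent roleless.
// Assumption: hatted agents may also call the minimal tools.
function canCallTool(
  claims: HatTokenClaims | null,
  tool: string,
  now: number,
): boolean {
  if (claims === null || now >= claims.expiresAt) {
    return ROLELESS_TOOLS.indexOf(tool) !== -1;
  }
  return (
    claims.allowedMcpTools.indexOf(tool) !== -1 ||
    ROLELESS_TOOLS.indexOf(tool) !== -1
  );
}
```

The sketch deliberately stops at the token layer. Gateway, proxy, and mesh checks are separate controls.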
+ +Recommended validation path: + +```text +Hermes agent calls MCP tool + -> MCP Gateway validates JWT + -> Gateway resolves AgentSessionActor by session ID + -> AgentSessionActor returns current runtime context + -> Gateway checks Organization state for active hat assignment + -> Gateway builds ToolExecutionContext from actor context + Organization state + -> Gateway evaluates OPA/RBAC policy + -> Gateway checks tool-specific scope + -> Gateway checks current mode, task/team/meeting/run scope, and actor heartbeat + -> Credential Proxy / Memory Adapter / Task Service performs its own authorization check + -> Gateway records tool activity back to AgentSessionActor +``` + +This makes hat authority revocable, auditable, and time-bound. + +`AgentSessionActor` should act as the live context authority for the session. It should know the active hat assignment, current task, current team, current meeting, current Oz run, memory scopes, credential scopes, current mode, policy version, and last heartbeat. It should not replace Organization DB truth; it gives the MCP Gateway the ambient runtime context needed to execute tools safely. + +## Memory Department + +Memory hats manage institutional knowledge. + +They should: + +- inspect whether new tasks have relevant memories attached; +- curate memories produced by completed work; +- detect duplicate or stale memories; +- recommend memory scopes for hats; +- ensure future related tasks receive relevant context; +- flag missing memory when teams repeat mistakes. + +Memory hats can request new backlog items when the memory system needs better tooling, attribution, or retrieval quality. + +## Goal Intake + +When Oz or a user submits a goal, the Organization should not immediately spawn random workers. 
+ +It should first run goal intake: + +```text +Goal received + -> classify ambiguity and risk + -> recall relevant memory + -> Executive Board votes on required hats + -> if unclear, assign Customer Interviewer / Requirements Analyst + -> produce requirement artifacts + -> plan departments and teams + -> start execution runs +``` + +Ambiguous goals should be clarified through an interview process. + +The Customer Interviewer hat should: + +- ask the user targeted questions; +- extract customer requirements; +- produce requirement documents; +- identify open assumptions; +- hand off to Product, Architecture, or Mission Control. + +## Mission Control + +Mission Control is a temporary team formed around a concrete mission. + +It receives: + +- goal statement; +- clarified requirements; +- constraints; +- acceptance criteria; +- relevant memory; +- required hats; +- budget and policy; +- artifact expectations. + +Mission Control then coordinates execution until the mission is complete. + +Mission Control may spawn: + +- implementation teams; +- research teams; +- architecture review teams; +- test teams; +- release teams; +- customer feedback teams. + +## Voting Board + +The Organization needs a voting board for role selection and major decisions. + +Voting is scoped by hats. + +Examples: + +- Architecture Governance can vote on architecture readiness. +- Product Leadership can vote on requirement completeness. +- Security Reviewer can vote on security acceptance. +- Delivery Governance can vote on release readiness. +- Executive Strategy can vote on goal priority and department allocation. + +Votes should be persisted with: + +- voter agent ID; +- hat assignment ID; +- decision scope; +- rationale; +- confidence; +- timestamp; +- links to evidence. + +## Communication Model + +Agents need multiple communication modes. + +Communication should be governed by active hat, department, hierarchy, task context, and policy. 
+ +Core communication primitives: + +- report; +- inbox message; +- team broadcast; +- one-on-one chat; +- team chat; +- department-wide channel; +- executive meeting; +- escalation; +- meeting request; +- decision vote. + +Communication is not just text. Different communication modes can open different infrastructure capabilities: + +- memory creation; +- document creation; +- hat proposal; +- task creation; +- voting; +- escalation; +- artifact linking; +- decision recording; +- meeting transcript persistence. + +## Inboxes and Reports + +Reports should be delivered to typed inboxes. + +Inbox types: + +- agent inbox; +- hat inbox; +- team inbox; +- department inbox; +- project inbox; +- initiative inbox; +- executive inbox; +- escalation inbox. + +Reports and messages should include: + +- sender agent ID; +- sender active hat; +- recipient scope; +- message type; +- task/project/initiative context; +- priority; +- requested action; +- expiration or due date; +- links to evidence; +- whether response is required. + +Department-wide reporting should be the default for structured findings. Ad hoc chat should be used when a conversation is needed to resolve ambiguity or make a decision. + +## One-on-One Chats + +One-on-one chats are scoped conversations between two agents or hats. + +They can be opened for: + +- clarification; +- coaching; +- handoff; +- review discussion; +- performance review; +- memory adaptation; +- hat proposal; +- conflict resolution; +- task planning. + +One-on-one chat should require a reason and scope. + +During one-on-one mode, the participants can: + +- create memories; +- create documents; +- propose tasks; +- propose hats; +- make scoped decisions if both hats have authority; +- escalate if authority is insufficient. + +One-on-one chat should be allowed with same-level or lower-hierarchy hats by default, depending on active hat and department policy. Higher-level chats require request, invitation, or escalation. 
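The default rule for opening a one-on-one chat (a reason and scope are required, same-level or lower-hierarchy targets are allowed, higher-level targets need a request) could be expressed as a small decision function. This is a sketch under assumptions: the numeric `hierarchyLevel`, where a larger number means a higher position, and the decision labels are hypothetical. Department policy overrides are left out.

```typescript
interface ChatParticipant {
  hatId: string;
  hierarchyLevel: number; // assumption: larger number = higher in the hierarchy
}

type ChatDecision =
  | "open"
  | "needs_request_or_escalation"
  | "rejected_missing_reason_or_scope";

// Applies only the default rule; department policy is not modeled here.
function decideOneOnOneChat(
  initiator: ChatParticipant,
  target: ChatParticipant,
  reason: string,
  scope: string,
): ChatDecision {
  if (reason.trim() === "" || scope.trim() === "") {
    return "rejected_missing_reason_or_scope";
  }
  if (target.hierarchyLevel <= initiator.hierarchyLevel) {
    return "open";
  }
  return "needs_request_or_escalation";
}
```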
+ +## Team Chats and Meetings + +Team chats are multi-agent conversations with a defined membership, purpose, and conversation mode. + +Meeting types: + +- team planning; +- task review; +- architecture review; +- QA triage; +- incident triage; +- executive meeting; +- TPM status meeting; +- performance review; +- handoff meeting; +- decision meeting. + +Team chats should choose a conversation mode. + +Conversation modes: + +- leader-led: the meeting leader decides the order and final synthesis; +- round-robin: each participant speaks in a fixed order; +- pass-the-stick: current speaker chooses the next speaker; +- vote-driven: proposals are discussed and then voted on; +- reviewer-panel: reviewers ask questions and produce decisions; +- open-discussion: free discussion with a moderator; +- executive-session: executive leader controls agenda, motions, and votes. + +The selected mode determines how the Organization routes turns, records decisions, and closes the meeting. + +Executive meetings may be CEO-led or Executive Board-led. + +TPM meetings may be TPM-led with development teams, reviewers, QA, and engineering managers. + +Engineering managers may call meetings with their teams and TPMs. + +The CEO or other executive hats can schedule meetings with TPMs, engineering managers, and development teams when priority, delivery, or organizational health requires it. + +## Broadcasts + +Team broadcasts are shared messages delivered to all team members. + +Broadcasts are for information all members need, not for complex discussion. + +Examples: + +- priority change; +- new blocker; +- test evidence available; +- architecture decision published; +- QA reproduced issue; +- meeting scheduled; +- delivery deadline changed; +- task reassignment. + +Broadcasts should be recorded and linked to the relevant team/task/initiative. + +## Escalation Chains + +The platform should enforce guardrails mostly through escalation chains, not excessive hard-coded behavior. 
+ +Escalation examples: + +```text +Implementer blocked by unclear acceptance criteria + -> Engineering Manager + -> TPM + -> Product / Business + -> Executive Board if priority or scope changes + +QA finds issue still reproducible + -> Engineering Manager + -> TPM + -> owning implementer/reviewer + -> Executive Board if release risk is high + +Agent requests new credential scope + -> Security Manager + -> Security Reviewer + -> Executive Board for high-risk scopes + +Team lacks required hat supply + -> Engineering Manager + -> TPM + -> Executive Board +``` + +Escalation should open the appropriate report, inbox item, chat, or meeting based on type and severity. + +## Hat Growth and Agent Specialization + +Agents may become stronger in certain hats over time. + +This happens because memories, performance reviews, task history, and artifacts accumulate around: + +- agent ID; +- hat ID; +- department; +- project; +- task type. + +Some hats may become associated with agents that have proven performance. Other hats may rotate frequently to avoid overfitting, cost concentration, or governance risk. + +The Organization should track: + +- which agents perform well in which hats; +- which hats have strong memory fit for which agents; +- which agents should be candidates for future hat assignments; +- whether long-lived hats should be renewed, rotated, or revoked. + +Executive Board should own final authority for high-power hat assignment and rotation. + +## Memory Model + +Agents have memory. Hats scope and attribute memory. + +Memory recall should consider: + +- agent identity; +- active hat; +- task; +- department; +- project; +- organization; +- source repo; +- customer or domain; +- visibility policy. + +Memory writes should include: + +- agent ID; +- hat ID; +- hat assignment ID; +- task ID; +- session ID; +- department ID; +- source artifact; +- visibility; +- confidence; +- timestamp. 
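The write metadata listed above could be captured in a single record type, with a recall filter that respects visibility. The following is a sketch under assumptions: the four visibility levels, the `content` field, and the `isRecallable` helper are hypothetical and do not describe Hindsight's actual data model.

```typescript
// Assumed visibility levels; the document only names a "visibility" field.
type Visibility = "agent" | "hat" | "department" | "organization";

interface MemoryWrite {
  agentId: string;
  hatId: string;
  hatAssignmentId: string;
  taskId: string;
  sessionId: string;
  departmentId: string;
  sourceArtifact: string;
  visibility: Visibility;
  confidence: number; // assumed range 0..1
  timestamp: string; // ISO 8601
  content: string; // not in the list above; added so the record has a payload
}

interface RecallContext {
  agentId: string;
  activeHatId: string;
  departmentId: string;
}

// Decides whether a stored memory may be recalled in the given context.
function isRecallable(memory: MemoryWrite, ctx: RecallContext): boolean {
  if (memory.visibility === "organization") return true;
  if (memory.visibility === "department") {
    return memory.departmentId === ctx.departmentId;
  }
  if (memory.visibility === "hat") return memory.hatId === ctx.activeHatId;
  return memory.agentId === ctx.agentId; // "agent" visibility
}
```

An adapter in front of the memory store could apply a filter like this before results reach the agent.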
+ +If Hindsight cannot support this level of metadata, scoped retrieval, and attribution, the Organization may need an adapter layer or a fork. + +Start with an adapter first. Fork only if Hindsight cannot enforce hat-scoped recall and attribution. + +## Observability and Telemetry Spine + +The Organization should be obsessively observable. + +Observability is not just for debugging infrastructure. It is how the Organization sees itself, how humans understand what is happening, how agents improve their own tools, and how self-healing becomes possible. + +Every meaningful action should produce: + +- an authoritative state transition when state changes; +- a domain event; +- a structured audit log; +- a distributed trace span; +- metrics when the action affects throughput, latency, reliability, budget, or quality; +- linked artifacts when human-reviewable evidence exists. + +This applies to: + +- goal intake; +- project and initiative changes; +- hat assignment and deprovisioning; +- JWT refresh and denial; +- policy evaluation; +- Oz run creation and lifecycle; +- Hermes session and turn lifecycle; +- subagent/team spawning; +- MCP tool calls; +- NATS publish/consume/replay/dead-letter events; +- credential proxy allow/deny/use; +- Hindsight memory reads and writes; +- documentation context reads; +- skill usage and skill ingestion; +- task state transitions; +- gate decisions; +- review decisions; +- QA runs and reproducibility decisions; +- meetings, votes, and decisions; +- artifact creation; +- self-healing attempts. 
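The rule that every meaningful action produces a fixed set of telemetry outputs, with some of them conditional, could be made checkable with a small helper. This is a sketch only: the `ActionRecord` fields and the emission names are assumptions that map onto the list of required outputs given earlier in this section.

```typescript
// Hypothetical description of one action taken inside the Organization.
interface ActionRecord {
  action: string; // e.g. "hat_assignment" or "mcp_tool_call"
  agentId: string;
  hatAssignmentId: string;
  traceId: string;
  stateChanged: boolean;
  affectsMetrics: boolean; // throughput, latency, reliability, budget, or quality
  artifactIds: string[]; // human-reviewable evidence, if any
}

type Emission =
  | "state_transition"
  | "domain_event"
  | "audit_log"
  | "trace_span"
  | "metric"
  | "artifact_link";

// Lists what the action must emit under the rules in this section.
function requiredEmissions(record: ActionRecord): Emission[] {
  const emissions: Emission[] = [];
  if (record.stateChanged) emissions.push("state_transition");
  emissions.push("domain_event", "audit_log", "trace_span");
  if (record.affectsMetrics) emissions.push("metric");
  if (record.artifactIds.length > 0) emissions.push("artifact_link");
  return emissions;
}
```

A review gate could compare this list with what a tool actually emitted and reject work that leaves gaps.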
+ +Every run should be traceable across: + +```text +goal/request + -> project/initiative/task + -> active hat assignment + -> Hermes agent/session/turn + -> Oz run + -> k3s pod/container + -> MCP tool calls + -> policy checks + -> credential proxy requests + -> memory/documentation/skill reads + -> NATS events + -> artifacts + -> resulting state transitions +``` + +Agents should be required to build internal tools and project features with observability from the start. + +Agent-created systems should expose: + +- status endpoints or MCP status tools; +- structured logs; +- trace spans; +- metrics; +- health checks; +- readiness checks; +- linked artifacts for evidence; +- clear failure codes; +- remediation recommendations; +- UI-readable state. + +Review gates should reject infrastructure and internal platform work that cannot answer: + +```text +What happened? +Why did it happen? +Which agent and hat caused it? +Which policy allowed or denied it? +Which project, initiative, task, and run were affected? +What evidence was produced? +What failed or degraded? +Was it retried? +Was it self-healed? +Was it escalated? +What should future agents learn from it? +``` + +## Self-Healing and Improvement Loop + +Self-healing should be built from observable facts. + +The Organization should not guess. It should classify issues from state, traces, logs, metrics, artifacts, and known failure patterns. 
+ +Basic loop: + +```text +detect anomaly + -> correlate state, trace, logs, metrics, artifacts + -> classify failure mode + -> check policy for allowed remediation + -> run safe remediation if allowed + -> verify result + -> record outcome + -> escalate if unresolved + -> create report/backlog item if recurring or systemic + -> update memories, skills, docs, or hats when approved +``` + +Examples: + +- stuck Oz run creates a run health report, attempts a safe retry, and escalates to DevOps if retry fails; +- repeated MCP timeout creates an internal reliability backlog item; +- repeated QA reproducibility failures create a test tooling or project skill request; +- repeated memory misses create a memory adaptation request; +- frequent credential proxy denials create either a Security review request or a documentation/skill update request; +- recurring review rejection reasons create Engineering Manager performance review actions. + +This is one of the main ways the Organization becomes self-building. It observes its own failure modes, turns them into governed work, and improves its tools and processes over time. + +The self-healing loop depends on the always-on runtime: + +- durable triggers detect timeouts, stale state, metric thresholds, and external changes; +- organizational rules decide whether to create work, escalate, launch agents, or attempt remediation; +- runtime leases prevent duplicate remediation; +- runbook skills define safe operational procedures; +- SLOs and error budgets influence priority and admission control; +- incident rules determine severity, commander assignment, communication cadence, rollback authority, and postmortem requirements. + +## Organization MCP Tools + +Hermes agents interact with the Organization through MCP tools. 
+ +Initial tool families: + +```text +Goal tools + submit_goal, submit_report, submit_service_request, classify_report, clarify_goal, classify_goal, create_initiative, promote_backlog_to_initiative + +Project tools + create_project, update_project_priority, assign_project_department, read_project_status + +Portfolio and initiative tools + create_portfolio, create_initiative, assign_tpm, set_budget, set_priority, read_initiative_status + +Hat tools + list_hats, request_hat, propose_hat, approve_hat, assign_hat, release_hat, deprovision_hat, refresh_hat_token, read_hat_supply + +Agent insight tools + rank_agents_for_hat, read_agent_specialties, read_agent_memory_profile, read_hat_performance_history, recommend_hat_assignment + +Voting tools + open_vote, submit_vote, close_vote, read_vote_result + +Team tools + create_team, spawn_agent, spawn_team, assign_task, stop_agent, stop_team + +Task tools + create_task, claim_task, update_task, block_task, groom_task, mark_ready, submit_red_tests, submit_green_tests, complete_task + +Backlog tools + create_backlog_item, prioritize_backlog_item, link_backlog_item, convert_backlog_item, create_backlog_item_from_review, create_defect_from_report + +Messaging tools + send_message, read_inbox, send_report, open_thread, reply_thread, request_one_on_one_chat, open_team_chat, send_team_broadcast, escalate + +Artifact tools + submit_artifact, list_artifacts, link_artifact, require_artifact, attach_screenshot, attach_trace, attach_log + +Business tools + start_customer_interview, record_customer_answer, create_brd, approve_brd, reject_brd + +Architecture tools + create_ca, request_architecture_review, approve_architecture, reject_architecture + +Review tools + request_review, submit_review, approve_gate, reject_gate, assign_reviewer, create_outcome_review, create_performance_review + +Memory tools + query_memory, write_memory, explain_memory_scope + +Credential tools + request_credential_scope, review_credential_scope, 
approve_credential_scope, use_credential_proxy + +QA tools + create_test_case, run_browser_check, run_scheduled_qa_suite, record_qa_result, create_reproducibility_report, create_regression_report, qa_signoff, qa_bounce_back + +DevOps tools + submit_pipeline_failure_report, classify_pipeline_failure, attach_pipeline_log, recommend_dev_owner + +Delivery tools + request_merge, approve_merge, record_merge, record_release + +Status tools + read_org_status, read_team_status, read_run_status, read_budget_status, read_review_queue + +Observability tools + read_trace, read_audit_events, read_run_logs, read_agent_timeline, record_metric, create_health_report, classify_anomaly, request_self_healing, record_self_healing_result + +Always-on runtime tools + list_rules, evaluate_rules, read_reaction_plan, approve_reaction_plan, list_triggers, pause_trigger, resume_trigger, read_scheduler_status, read_worker_heartbeat, read_runtime_lease, release_runtime_lease, read_dead_letters, request_dlq_replay, quarantine_dead_letter, read_slo_status, open_incident, assign_incident_commander + +Meeting tools + request_meeting, schedule_meeting, open_meeting, set_conversation_mode, submit_meeting_decision, close_meeting + +Scheduled review tools + schedule_team_review, schedule_department_review, schedule_qa_regression, run_team_review, create_memory_adaptation_request, create_hat_effectiveness_review +``` + +All tools must be policy checked. + +## Credential Proxy + +Hermes agents should not receive raw broad credentials. + +They should receive scoped access through a credential proxy. + +Access should be based on: + +- agent identity; +- active hat; +- task; +- session; +- department; +- Oz run ID; +- mesh identity; +- Organization policy. + +Cilium Service Mesh and SPIRE workload identity can enforce that only the correct agent workloads can reach the credential proxy endpoints. + +## Cilium and SPIRE as Infrastructure Injection + +Cilium Service Mesh is not code-level dependency injection. 
It is infrastructure injection at the CNI and Gateway layer. SPIRE provides workload identity, and Trust Manager distributes trusted CA bundles. + +It can provide: + +- workload identity through SPIRE; +- mTLS between services; +- authorization policy; +- service routing through Gateway API; +- traffic shifting; +- egress control; +- telemetry through Hubble; +- access boundaries around MCP gateway, NATS, memory, and credential proxy. + +This lets the Organization inject dependencies and permissions around Hermes containers without giving agents uncontrolled access. + +## Session Containers + +Each Oz-run Hermes session container should include: + +- Hermes runtime; +- Organization MCP config; +- NATS connection config; +- workspace or repo access; +- credential proxy URL; +- memory adapter URL; +- agent profile; +- active hat assignment; +- resource limits; +- audit context. + +One container may host one or more Hermes agents, but the simplest model is one primary Hermes agent per container until multiplexing is proven safe. + +## State Ownership + +Oz owns run lifecycle. + +Organization owns organizational truth. + +NATS owns event transport. + +Hindsight owns long-term memory storage. + +Credential Proxy owns secret exchange. + +Cilium owns CNI, service mesh policy, Gateway API, ingress, and Hubble telemetry. + +SPIRE owns workload identity. + +cert-manager, Vault, Trust Manager, and External Secrets own TLS, secrets, CA distribution, and secret synchronization. + +Hermes owns reasoning and work. + +## First Proof + +The first proof should demonstrate: + +```text +1. User or Oz submits ambiguous goal. +2. Organization opens intake. +3. Executive Board hats vote that clarification is needed. +4. Customer Interviewer Hermes agent is launched through Oz. +5. Interviewer asks user questions and creates requirements artifact. +6. Business hat creates BRD and submits it for approval. +7. Architecture hat creates CA and submits it for approval. +8. 
Executive Board votes on needed hats, budget, TPM assignment, and departments. +9. TPM hat creates an initiative plan and Mission Control team. +10. Engineering Manager hat grooms work to ready. +11. Implementer hat writes red tests first and records failing evidence. +12. Implementer completes the work and records green test evidence. +13. Reviewer hat approves or rejects the work. +14. Delivery hat records merge or release readiness. +15. QA hat uses browser automation, screenshots, logs, and traces to sign off or bounce back. +16. Organization records completion, artifacts, votes, and memory attribution. +``` + +This proves the core model: + +- Oz orchestration; +- Hermes agents; +- hats; +- limited hat supply; +- scoped memory; +- voting; +- MCP-governed actions; +- BRD and CA artifacts; +- TDD-first engineering; +- task execution; +- artifact submission; +- review; +- QA signoff; +- delivery tracking; +- Organization-owned state. + +## Open Questions + +- Does Oz expose the exact run creation and child-run APIs needed for Hermes-driven spawning? +- Can Hermes run reliably as a containerized, resumable session under Oz/k3s? +- Does Hindsight support metadata-rich memory attribution and scoped recall, or is an adapter/fork required? +- Should one container run one Hermes agent, or can multiple agents safely share a container? +- What is the minimum Executive Board size for useful voting without excess cost? +- Which hat has authority to create or approve new hats? +- Should votes be majority, weighted by hat, consensus, or policy-defined per decision type? +- What state belongs in CockroachDB versus run-orchestrator metadata versus Hindsight? +- Where should long-running initiatives live: Organization DB only, or Oz parent/child run hierarchy too? 
diff --git a/docs/agentic-organization/README.md b/docs/agentic-organization/README.md new file mode 100644 index 0000000000..3627928526 --- /dev/null +++ b/docs/agentic-organization/README.md @@ -0,0 +1,23 @@ +# Hermes Organization Docs + +This folder is the working design set for the Hermes-native Organization platform. + +Current documents: + +- [Foundational Context and Language](./FOUNDATIONAL_CONTEXT_AND_LANGUAGE.md) - Addison's working vocabulary, values, Zeta project context, declarative cluster mental model, and active clarifications. +- [Implementation Concepts](./IMPLEMENTATION_CONCEPTS.md) - how to build the architecture as services, data models, MCP tools, workflows, and runtime infrastructure. +- [Always-On Orchestration Runtime](./ALWAYS_ON_ORCHESTRATION_RUNTIME.md) - the workers, triggers, rules, leases, schedulers, watchers, reconcilers, SLOs, incidents, runbooks, and self-healing loops that keep the Organization continuously operating. +- [Runtime Technology and Package Strategy](./RUNTIME_TECH_AND_PACKAGE_STRATEGY.md) - how Temporal TS, Dapr Actors, NATS, Oz/Warp run orchestration, OpenZiti transport, Hermes, Hindsight, and reusable `agentic-services` primitives fit into a new Hermes-native platform. +- [UI and Observability Concepts](./UI_AND_OBSERVABILITY_CONCEPTS.md) - how humans visualize and operate the Organization across work, agents, hats, runs, pods, clusters, meetings, reports, and evidence. +- [Department, Hat, and Tool Inventory](./DEPARTMENT_HAT_TOOL_INVENTORY.md) - the starter department map, hat catalog, tool bundles, approval gates, lifecycle ownership, and high-risk guardrails for the Organization. +- [Organization Layer Build Plan](./ORGANIZATION_LAYER_BUILD_PLAN.md) - the service layer, role workspaces, automation loops, state model, UI surfaces, and MVP sequence needed to make each department and hat operational. 
+- [Work and Release Management OS](./WORK_AND_RELEASE_MANAGEMENT_OS.md) - the custom backlog, project, task, assignment, signal, board, and release workflow product that keeps agent work reliable and visible. +- [Ambiguous Requirement Lifecycle](./AMBIGUOUS_REQUIREMENT_LIFECYCLE.md) - the discovery, customer interview, BRD, workflow modeling, architecture, decomposition, readiness, and learning path from vague request to curated feature. +- [Anti-Stall Prioritization Runtime](./ANTI_STALL_PRIORITY_RUNTIME.md) - the hat-owned schedules, blocker triage, queue SLO, reassignment, alternate-work, dependency reconciliation, and priority routines that keep the Organization moving. +- [Implementation Readiness Checklist](./IMPLEMENTATION_READINESS_CHECKLIST.md) - the decisions and contracts that should be defined before scaffolding the first implementation slice. +- [Cluster-Native Hat System](./CLUSTER_NATIVE_HAT_SYSTEM.md) - the theoretical CRD, OPA, hat binding, succession, reputation, graph rendering, and event model for enforcing hats on Kubernetes. +- [Cluster Execution and Memory Substrate](./CLUSTER_EXECUTION_AND_MEMORY_SUBSTRATE.md) - the k3s, sandboxed Hermes container, Cilium Service Mesh, SPIRE identity, Vault-backed secrets, Credential Proxy, NATS, Hindsight, and runtime observability contract. +- [AI Cluster Scaffold Context](./AI_CLUSTER_SCAFFOLD_CONTEXT.md) - the two-directory NixOS/k3s/ArgoCD scaffold assumptions, component clarifications, bootstrap constraints, and deferred/local-model gating. +- [Architecture Source](./ORGANIZATION_RUNTIME_ARCHITECTURE.md) - the current conceptual architecture and operating model. + +The intent is to keep the architecture document focused on what the Organization is, while implementation documents describe how to build it incrementally. 
diff --git a/docs/agentic-organization/RUNTIME_TECH_AND_PACKAGE_STRATEGY.md b/docs/agentic-organization/RUNTIME_TECH_AND_PACKAGE_STRATEGY.md new file mode 100644 index 0000000000..f89cece70c --- /dev/null +++ b/docs/agentic-organization/RUNTIME_TECH_AND_PACKAGE_STRATEGY.md @@ -0,0 +1,723 @@ +# Runtime Technology and Package Strategy + +## Purpose + +This document decides how Dapr Actors, Temporal TypeScript, Dapr Workflow, NATS, Oz/Warp run orchestration, OpenZiti transport, Hermes, and existing `agentic-services` primitives should fit into the new Hermes-native Organization platform. + +The Organization is new. It should not be a dev-portal rewrite and it should not be TPM with a different name. + +Dev-portal and TPM are inspiration only. We should mine reusable infrastructure ideas and small primitives, then build a new Organization runtime around Hermes agents, hats, rules, projects, initiatives, tasks, durable state, and always-on orchestration. + +## Guiding Decision + +Use each runtime for one job: + +```text +Organization DB + source of truth for organization state + +Temporal TS + durable long-running process orchestration + +Dapr Actors + entity-local concurrency, live mailbox/stateful actor identity, reminders + +Orleans + optional cluster-resident .NET virtual actor/silo capability; do not use as the default Organization primitive unless a specific grain-based use case wins + +NATS / JetStream + event transport, inbox/outbox, live updates, fanout, integration streams + +Oz / Warp Run Orchestrator + distributed Hermes session/container execution + +OpenZiti + secure private transport/connectivity for Hermes sessions and protected service paths + +Hermes Agent + reasoning, planning, tool use, review, QA, and organizational labor + +Organization MCP Gateway + governed agent action surface + +Hindsight + long-term memory with Organization-controlled attribution +``` + +Do not make Temporal, Dapr, Oz/Warp, or OpenZiti the product model. 
They are infrastructure adapters behind Organization-owned concepts. + +The cluster execution and memory assumptions are detailed in [Cluster Execution and Memory Substrate](./CLUSTER_EXECUTION_AND_MEMORY_SUBSTRATE.md). The scaffold-level component direction is captured in [AI Cluster Scaffold Context](./AI_CLUSTER_SCAFFOLD_CONTEXT.md). In particular, Hindsight should be treated as real Hermes memory infrastructure: the current cluster direction uses the `vectorize-io/hindsight` OCI Helm chart, Hermes points at the in-cluster Hindsight service, and Organization policy still needs to enforce hat-scoped recall/write attribution. + +## Temporal TS Fit + +Temporal is strongest for durable workflows that must survive process crashes, restarts, timeouts, retries, and long waits. + +Use Temporal for: + +- initiative lifecycle workflows; +- task lifecycle workflows; +- review gate workflows; +- QA verification workflows; +- customer interview and BRD approval workflows; +- architecture CA/ADR approval workflows; +- credential approval workflows; +- incident response workflows; +- self-healing remediation workflows; +- scheduled organizational reviews; +- escalation timers; +- long-running multi-step processes with human or agent waits. + +Temporal workflows should be deterministic and thin. + +Workflow code should: + +- coordinate steps; +- wait for signals; +- start child workflows; +- schedule activities; +- enforce timers; +- record durable process history; +- call activities for all side effects. + +Workflow code should not: + +- call LLMs directly; +- call the run orchestrator directly; +- query databases directly; +- call credential proxy directly; +- run arbitrary Hermes reasoning; +- inspect non-deterministic environment state. 
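The two lists above can be sketched in plain TypeScript. This is a minimal sketch, not Temporal SDK code: the `TaskLifecycleActivities` interface, the `taskLifecycleWorkflow` function, the event subject, and the ID formats are all assumptions made for illustration. In a real Temporal TS worker the activity functions would typically be obtained through `proxyActivities` from `@temporalio/workflow` rather than passed in as an argument.

```typescript
// Hypothetical activity surface: every side effect sits behind this interface.
interface TaskLifecycleActivities {
  reserveHat(taskId: string, hat: string): Promise<string>; // resolves to a hat assignment ID
  launchOzRun(taskId: string, hatAssignmentId: string): Promise<string>; // resolves to an Oz run ID
  publishEvent(subject: string, payload: Record<string, string>): Promise<void>;
}

interface TaskLifecycleResult {
  taskId: string;
  hatAssignmentId: string;
  ozRunId: string;
}

// The workflow body only sequences steps. It reads no clock, draws no random
// values, and makes no network or database calls of its own.
async function taskLifecycleWorkflow(
  activities: TaskLifecycleActivities,
  taskId: string,
): Promise<TaskLifecycleResult> {
  const hatAssignmentId = await activities.reserveHat(taskId, "implementer");
  const ozRunId = await activities.launchOzRun(taskId, hatAssignmentId);
  await activities.publishEvent("org.task.run_started", { taskId, ozRunId });
  return { taskId, hatAssignmentId, ozRunId };
}

// In-memory fake activities that record each call, useful for checking the
// sequence without any real infrastructure.
function createFakeActivities(callLog: string[]): TaskLifecycleActivities {
  return {
    async reserveHat(taskId, hat) {
      callLog.push(`reserveHat:${taskId}:${hat}`);
      return `hat-assignment-${taskId}`;
    },
    async launchOzRun(taskId, hatAssignmentId) {
      callLog.push(`launchOzRun:${taskId}:${hatAssignmentId}`);
      return `oz-run-${taskId}`;
    },
    async publishEvent(subject, payload) {
      callLog.push(`publishEvent:${subject}:${payload.taskId}`);
    },
  };
}
```

Because the workflow function depends only on its inputs and on activity results, the same sequence can be exercised against fakes in a unit test and against real Organization services in a worker.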
+ +All side effects should live in Temporal Activities: + +```text +Workflow + -> Activity: create Organization task + -> Activity: reserve hat + -> Activity: launch Oz run + -> Activity: send NATS event + -> Activity: call Organization MCP/internal API + -> Activity: fetch Hindsight context +``` + +Temporal should not replace the Organization DB. Temporal owns workflow execution history. The Organization DB owns business state. + +## Dapr Actors Fit + +Dapr Actors are useful for virtual actor identity and single-threaded access to entity-local state. + +Use Dapr Actors for live, entity-local coordination: + +- `AgentSessionActor`: one live context actor per Hermes session. +- `AgentMailboxActor`: one mailbox per Hermes agent/session. +- `HatSupplyActor`: one allocator per hat/project/department scope. +- `TeamRoomActor`: live team chat, broadcast, and turn-taking state. +- `MeetingActor`: active meeting agenda, speaker order, votes, and transcript pointer. +- `TaskActor`: hot task coordination, lock/heartbeat, and current assignment state. +- `ProjectRuntimeActor`: project-local runtime status and aggregate health. +- `IncidentActor`: live incident command state. +- `RunActor`: Oz run heartbeat, current status, and cancellation coordination. + +Actor state should be treated as hot operational state or a projection. It must not become the only source of truth for authoritative Organization records. + +Actor reminders can back local durable callbacks such as: + +- mailbox digest; +- meeting timeout; +- stale heartbeat check; +- hat reservation expiration; +- task assignment timeout; +- incident communication cadence. + +Use actor timers only for lightweight active-session behavior that can disappear on deactivation. Use actor reminders for callbacks that must survive deactivation/failover. + +## Actor-Backed MCP Context + +MCP tools should remain stateless at the network edge, but every tool call should be executed with actor-resolved session context. 
+ +Flow: + +```text +Hermes agent + -> Organization MCP Gateway + -> validate hat JWT + -> resolve AgentSessionActor(sessionId) + -> actor returns current runtime context + -> gateway builds ToolExecutionContext + -> policy engine evaluates hat, scope, mode, task, and tool + -> tool handler calls Organization domain service + -> state/event/audit/trace persisted + -> AgentSessionActor records tool activity +``` + +`AgentSessionActor` should expose: + +```text +getRuntimeContext() +recordHeartbeat() +recordToolCallStarted() +recordToolCallCompleted() +setCurrentTask() +setCurrentTeam() +setCurrentMeeting() +setMode() +markRoleless() +``` + +Runtime context should include: + +- agent ID; +- session ID; +- active hat assignment ID; +- current task ID; +- current team ID; +- current meeting ID; +- current Oz run ID; +- current project ID; +- current initiative ID; +- memory scopes; +- credential scopes; +- allowed tool scopes; +- policy version; +- last heartbeat; +- current mode. + +The MCP Gateway treats request-provided IDs as lookup hints, not authority. The actor and Organization DB verify current authority before the tool executes. + +For some tools, the Gateway should also query narrower actors: + +```text +submit_task_evidence + -> AgentSessionActor for current agent/hat/session context + -> TaskActor for assignment and hot task state + -> HatSupplyActor for active hat reservation + -> Organization DB for authoritative task/gate state + -> policy engine for final allow/deny +``` + +Actors provide ambient runtime context and serialized hot state. They do not replace Organization DB truth. + +## Dapr Workflow Fit + +Dapr Workflow overlaps with Temporal. + +Do not use both Temporal and Dapr Workflow for the same category of process. + +Default recommendation: + +- use Temporal TS for durable organizational workflows; +- use Dapr Actors for virtual entity identity and hot coordination; +- skip Dapr Workflow initially. 
+ +Use Dapr Workflow only if we decide we want a simpler Dapr-only runtime and are willing to give up Temporal's mature workflow ecosystem, visibility, worker model, and replay discipline. + +## Temporal and Dapr Together + +The clean integration pattern: + +```text +Temporal workflow + -> Activity + -> Organization service + -> Dapr actor call when entity-local serialized state is needed + -> Organization DB write + -> NATS event + -> Oz run request when Hermes execution is needed +``` + +Examples: + +```text +TaskLifecycleWorkflow + -> reserve implementer hat via HatSupplyActor + -> create Oz run request + -> wait for Hermes completion signal + -> start ReviewGateWorkflow + -> start QAVerificationWorkflow +``` + +```text +IncidentWorkflow + -> create IncidentActor + -> assign Incident Commander hat + -> schedule communication cadence + -> launch diagnosis Hermes run through Oz + -> wait for mitigation evidence + -> require postmortem and follow-up backlog items +``` + +Temporal owns durable sequence. Dapr Actors own hot per-entity coordination. Organization DB owns truth. + +## What Dapr Actors Should Not Do + +Avoid putting broad organizational intelligence in actors. + +Do not use actors for: + +- global rule evaluation; +- full project/initiative lifecycle truth; +- long-running cross-entity workflows; +- durable process history; +- LLM reasoning; +- credential policy decisions; +- final task/gate approval authority. + +Actors should be narrow and boring: serialize a small scope, expose a clear command/query API, emit events, and persist snapshots back to Organization state when needed. + +## What Temporal Should Not Do + +Avoid making Temporal workflows giant agent brains. + +Do not use Temporal workflows for: + +- dynamic LLM reasoning loops; +- arbitrary tool selection; +- chat transcript storage; +- memory storage; +- full work item database state; +- direct UI state; +- real-time messaging fanout. + +Temporal is the durable process rail. 
Hermes and Organization MCP tools are the agentic layer. + +## Package Strategy + +Create new Hermes Organization packages. Do not extend dev-portal or TPM directly. + +The concrete TypeScript app stack and app/package layout are defined in [Organization Layer Build Plan](./ORGANIZATION_LAYER_BUILD_PLAN.md#typescript-application-stack). This runtime strategy owns why each infrastructure rail exists; the build plan owns how the app should be scaffolded. + +Proposed packages: + +```text +@hermes-org/domain + typed entities, enums, events, commands, value objects, policy models + +@hermes-org/state + repositories, outbox, idempotency, leases, migrations, projections + +@hermes-org/runtime + scheduler, durable triggers, rules engine, reaction executor, reconcilers, workers + +@hermes-org/workflows-temporal + Temporal workflows, activities, task queues, workflow clients + +@hermes-org/actors-dapr + Dapr actor interfaces and implementations + +@hermes-org/messaging + NATS/JetStream event bus, inbox/outbox, DLQ contracts + +@hermes-org/mcp + Organization MCP gateway, tool registry, policy-checked tool handlers + +@hermes-org/hermes + Hermes session adapter, Oz launch adapter, run context builder + +@hermes-org/hats + hat graph, hat assignment, JWT issuance/refresh, hat supply policies + +@hermes-org/memory + Hindsight adapter, memory attribution, scoped recall, memory quality workflows + +@hermes-org/docs-skills + documentation context resolver, project skill ingestion, graph projection + +@hermes-org/observability + trace/log/metric helpers, health reports, SLOs, anomaly reports + +@hermes-org/policy + RBAC, OPA/Rego policy bundles, conflict policy, human override policy + +@hermes-org/adapters-agentic-services + compatibility wrappers for reused primitives from @tgcs/agentic-services +``` + +The `adapters-agentic-services` package should be temporary. Its job is to let us reuse proven primitives without inheriting TPM semantics. 
+ +Packages should be the reusable capability layer. NestJS apps should be orchestrators that compose those packages through dependency injection, transport adapters, lifecycle hooks, health checks, and process wiring. A package may be used by the API, worker, Temporal worker, Dapr actor host, and MCP gateway without copying business logic between them. + +## Reusing Existing agentic-services + +Useful primitives to pull or wrap: + +- MCP tool interfaces and registry. +- Dev agent provider abstractions. +- execution environment utilities. +- message bus interface and session messaging concepts. +- retry utility and circuit breaker. +- OpenTelemetry tracing helpers. +- LLM providers and cost calculator. +- GitLab/Jira/Confluence/Jenkins/TestRail utilities. +- schema validation/generation helpers. +- prompt flow parsing concepts where useful. +- TPM persistence interface ideas for sessions, teams, tasks, and threads. +- artifact registry ideas. +- remote-control/status snapshot ideas. + +Avoid carrying forward: + +- `TPMAgent` as an Organization orchestrator; +- TPM prompt/persona assumptions; +- TPM task board as the source model; +- TPM team lifecycle as the Organization lifecycle; +- TPM slash-command semantics; +- dev-portal session assumptions; +- app-specific transport coupling. + +Potential extraction approach: + +```text +copy interface/concept + -> rename into Organization language + -> remove TPM/dev-portal assumptions + -> add hats/policy/trace/correlation fields + -> add tests around new Organization semantics + -> deprecate adapter when native package is stable +``` + +## Likely Fork/Adapt Decisions + +### MCP Registry + +Adapt, do not fork heavily. + +The current registry shape is useful, but Organization tools need: + +- hat-token context; +- policy evaluation; +- trace/correlation metadata; +- idempotency key; +- audit event emission; +- project/initiative/task scope; +- tool visibility by hat and project skill. 
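A few of the requirements listed above (hat-scoped visibility, a policy check on every call, audit emission, and correlation metadata) might be sketched as follows. This is a hedged illustration only: `OrganizationToolRegistry`, `ToolExecutionContext`, `AuditEvent`, and the `allowedHats` field are hypothetical names, not the existing `agentic-services` registry API, and the in-memory audit array stands in for real audit event emission.

```typescript
// All names below are illustrative assumptions, not an existing API.
interface ToolExecutionContext {
  hatId: string;
  projectId: string;
  correlationId: string;
  idempotencyKey: string; // carried through so handlers can deduplicate
}

interface OrganizationTool {
  name: string;
  allowedHats: ReadonlySet<string>;
  handler: (ctx: ToolExecutionContext, input: unknown) => unknown;
}

interface AuditEvent {
  tool: string;
  hatId: string;
  correlationId: string;
  outcome: "allowed" | "denied";
}

class OrganizationToolRegistry {
  private readonly tools = new Map<string, OrganizationTool>();
  readonly audit: AuditEvent[] = [];

  register(tool: OrganizationTool): void {
    if (this.tools.has(tool.name)) {
      throw new Error(`tool already registered: ${tool.name}`);
    }
    this.tools.set(tool.name, tool);
  }

  // Tool visibility by hat: a hat only sees the tools it is allowed to call.
  listVisible(hatId: string): string[] {
    return Array.from(this.tools.values())
      .filter((tool) => tool.allowedHats.has(hatId))
      .map((tool) => tool.name)
      .sort();
  }

  // Every call is checked and audited, whether it is allowed or denied.
  call(name: string, ctx: ToolExecutionContext, input: unknown): unknown {
    const tool = this.tools.get(name);
    const allowed = tool !== undefined && tool.allowedHats.has(ctx.hatId);
    this.audit.push({
      tool: name,
      hatId: ctx.hatId,
      correlationId: ctx.correlationId,
      outcome: allowed ? "allowed" : "denied",
    });
    if (tool === undefined || !allowed) {
      throw new Error(`tool not permitted for hat ${ctx.hatId}: ${name}`);
    }
    return tool.handler(ctx, input);
  }
}
```

The sketch keeps the allow/deny decision inside the registry for brevity. In the architecture described here, that decision would be delegated to the policy engine, with the registry only supplying the tool metadata.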
+ +Build `@hermes-org/mcp` as a new package and either wrap or copy the registry pattern. + +### Message Bus + +Adapt the interface, replace providers. + +The current message bus abstraction already thinks in sessions, at-least-once delivery, ordering, correlation IDs, health, and orphan cleanup. That is useful. + +But Organization messaging should standardize on NATS/JetStream first. Azure Service Bus and Redis can remain optional compatibility providers. + +Build: + +- `NatsOrganizationEventBus`; +- `NatsInboxBus`; +- `JetStreamOutboxPublisher`; +- `DeadLetterService`; +- `ConsumerLeaseService`. + +### Event Bus + +Do not use the in-process event bus as a distributed runtime. + +It can inspire local module events and tests. The Organization runtime needs persisted events, outbox, NATS, idempotent consumers, and durable reactions. + +### Tracing + +Adapt and expand. + +Existing LLM tracing helpers are a good starting point. We need broader span helpers: + +- workflow span; +- actor span; +- MCP tool span; +- rule evaluation span; +- reaction execution span; +- Oz run span; +- hat assignment span; +- memory query span; +- documentation context span; +- credential proxy span. + +### DevAgent Provider Interfaces + +Adapt, but rename around Hermes. + +Use the provider abstraction idea for Hermes execution providers: + +- `HermesProvider`; +- `HermesSessionAdapter`; +- `HermesRunContext`; +- `HermesToolSurface`; +- `HermesEventStream`. + +Do not expose `DevAgent` as the Organization product language. + +### TPM Team/Task/Persistence + +Mine for concepts only. + +Useful ideas: + +- team state persistence; +- member state; +- task board persistence; +- thread persistence; +- message persistence; +- remote status snapshots; +- artifact tracking. + +New authoritative concepts should be Organization-native: + +- `OrganizationTeam`; +- `Mission`; +- `WorkItem`; +- `Task`; +- `HatAssignment`; +- `Meeting`; +- `ReactionPlan`; +- `AgentSession`; +- `OzRunBinding`. 
+ +### Prompt Flows + +Adapt carefully. + +Prompt flows are useful as templates for structured agent work, but the Organization should call them: + +- work protocols; +- runbooks; +- department procedures; +- project skills; +- gate workflows. + +Do not let prompt flows own lifecycle state. Temporal/Organization state owns lifecycle; Hermes receives a protocol as context. + +### LLM Providers and Cost + +Reuse. + +Provider abstraction, cost calculator, token counter, model metadata, and model refresh scheduling are useful as platform services. Add Organization scopes: + +- project; +- initiative; +- hat; +- agent; +- session; +- budget policy; +- cost attribution. + +## New Runtime Contracts + +### Organization API Contract + +All infrastructure adapters should call Organization services, not mutate state directly. + +```text +Temporal Activity +Dapr Actor +NATS Consumer +Oz Callback +MCP Tool Handler + -> Organization domain service + -> policy check + -> state transition + -> event/outbox + -> audit/trace +``` + +### Idempotency Contract + +Every side-effecting command needs: + +- command ID; +- idempotency key; +- causation ID; +- correlation ID; +- trace ID; +- actor/workflow/run identity when relevant. + +Temporal retries Activities. Dapr reminders can retry. NATS redelivers. Oz callbacks can duplicate. The Organization must treat duplicates as normal. + +### State Ownership Contract + +```text +Organization DB + authoritative state and audit + +Temporal history + durable workflow execution state + +Dapr actor state + entity-local hot state / projection + +NATS + transport and replay stream + +Oz + execution lifecycle for Hermes containers + +Hindsight + memory store +``` + +No adapter should become an unreviewed second source of truth. 
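The idempotency contract in this section can be illustrated with a small sketch. It assumes a hypothetical `CommandEnvelope` shape and an `IdempotentCommandHandler` class, and it uses an in-memory `Map` where a real implementation would presumably use a durable idempotency table in the Organization DB.

```typescript
// Hypothetical envelope carrying the fields named in the idempotency contract.
interface CommandEnvelope<T> {
  commandId: string;
  idempotencyKey: string;
  causationId: string;
  correlationId: string;
  traceId: string;
  payload: T;
}

interface HandleOutcome<R> {
  result: R;
  duplicate: boolean;
}

class IdempotentCommandHandler<T, R> {
  // In-memory stand-in for a durable idempotency store.
  private readonly results = new Map<string, R>();
  private readonly apply: (payload: T) => R;

  constructor(apply: (payload: T) => R) {
    this.apply = apply;
  }

  // Applies the command once per idempotency key. A redelivered command
  // returns the stored result instead of repeating the side effect.
  handle(command: CommandEnvelope<T>): HandleOutcome<R> {
    if (this.results.has(command.idempotencyKey)) {
      return { result: this.results.get(command.idempotencyKey) as R, duplicate: true };
    }
    const result = this.apply(command.payload);
    this.results.set(command.idempotencyKey, result);
    return { result, duplicate: false };
  }
}
```

A retried Temporal Activity, a redelivered NATS message, or a duplicated Oz callback would all arrive with the same idempotency key and therefore resolve to the stored result. In a durable implementation, recording the key and applying the state transition would need to happen in the same transaction; the sketch does not model that.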
+ +## Concrete Build Plan + +### Phase 1: Native Organization Core + +Build without Temporal or Dapr first: + +- domain models and enums; +- Organization DB schema; +- outbox; +- idempotency; +- rules engine; +- reaction plan executor; +- NATS event contracts; +- MCP gateway skeleton; +- Hermes/Oz adapter interface. + +Use in-process fakes for workflow and actor boundaries. + +### Phase 2: Temporal Adapter + +Add Temporal for one workflow: + +```text +TaskLifecycleWorkflow + -> mark task ready + -> reserve hat activity + -> launch Hermes via Oz activity + -> wait for completion signal + -> request review activity + -> wait for review signal + -> request QA activity +``` + +All Activities call Organization services. + +Also build the workflow registry and workflow capability request path: + +- `WorkflowCapabilityRequest`; +- `WorkflowRegistry`; +- workflow version metadata; +- task queue ownership; +- allowed launch rules; +- deterministic workflow test contract; +- idempotent activity test contract; +- rollback/versioning plan. + +Agents may propose new Temporal workflows, but workflows only become launchable after manager/director approval, architecture review, security review when needed, tests, observability, and registry activation. + +### Phase 3: Dapr Actor Adapter + +Add Dapr Actors for one narrow actor: + +```text +HatSupplyActor(projectId, hatId) + reserve + release + expireReservation + getSupply +``` + +Use reminders for reservation expiry. + +### Phase 4: NATS Production Messaging + +Replace local event bus with: + +- JetStream outbox publisher; +- durable consumers; +- DLQ workflow; +- replay policy; +- UI event stream. + +### Phase 5: Hermes Runtime Integration + +Implement: + +- `HermesSessionAdapter`; +- `OzRunAdapter`; +- `CredentialProxyAdapter`; +- `HindsightMemoryAdapter`; +- Organization MCP tool surface. + +### Phase 6: Package Extraction + +Create `@hermes-org/*` packages. 
Copy/adapt the minimum from `@tgcs/agentic-services` with tests and new naming. + +### Phase 7: Capability Expansion Runtime + +Build the governed self-expansion loop: + +```text +agent detects capability gap + -> submit CapabilityRequest + -> manager/director/security/architecture gates + -> implementation task or initiative + -> tests and observability + -> registry update + -> capability becomes available to approved hats +``` + +Registries to support: + +- MCP tool registry; +- credential proxy endpoint registry; +- Temporal workflow registry; +- Dapr actor registry; +- durable trigger catalog; +- project skill graph; +- hat capability graph. + +This is the point where the Organization can safely build new abilities for itself without letting an agent directly grant itself new power. + +## Temporal vs Dapr Decision Matrix + +| Need | Temporal TS | Dapr Actors | NATS | Organization DB | +|---|---:|---:|---:|---:| +| Long-running lifecycle | Best | Weak | No | State only | +| Durable timers and human waits | Best | Good for per-actor reminders | No | State only | +| Per-entity serialized commands | Possible but heavy | Best | No | Needs locks | +| Event fanout | No | No | Best | Outbox source | +| Durable truth | Workflow history only | Actor state only | No | Best | +| Agent container execution | No | No | No | Tracks only | +| LLM/Hermes reasoning | Activity can launch | Actor should not | No | No | +| UI live updates | Indirect | Indirect | Best | Query source | +| Failure/retry orchestration | Best | Good locally | Delivery retry | Idempotency | + +## Recommended First Architecture + +```text +NestJS Organization API + -> CockroachDB + -> NATS JetStream + -> Temporal client + -> Dapr client + -> Oz adapter + -> MCP gateway + +Temporal workers + -> Organization activities + -> Oz launch activities + -> NATS publish activities + +Dapr actor service + -> AgentSessionActor + -> HatSupplyActor + -> AgentMailboxActor + -> TeamRoomActor + +Hermes session containers 
+ -> Organization MCP gateway + -> Credential proxy + -> Hindsight adapter + -> NATS inbox/events +``` + +Start with Temporal for durable process and Dapr Actors for one narrow hot-state allocator. Expand only after the boundaries are proven. + +## Open Questions + +- Should Temporal be mandatory from M1, or introduced after native state/rules are proven? +- Is Dapr already acceptable in the local k3s baseline, or should actors wait until after Oz/Hermes are running? +- Do we want one actor service for all actor types or separate services by domain? +- Should NATS remain the only external event stream even when Dapr pub/sub is available? +- Which existing `agentic-services` primitives should be copied versus wrapped as a dependency during early development? +- Do we rename all `DevAgent` concepts immediately, or maintain a compatibility adapter until Hermes-native interfaces settle? diff --git a/docs/agentic-organization/UI_AND_OBSERVABILITY_CONCEPTS.md b/docs/agentic-organization/UI_AND_OBSERVABILITY_CONCEPTS.md new file mode 100644 index 0000000000..66a2021f4d --- /dev/null +++ b/docs/agentic-organization/UI_AND_OBSERVABILITY_CONCEPTS.md @@ -0,0 +1,755 @@ +# Hermes Organization UI and Observability Concepts + +## Purpose + +Humans need to see the Organization operating in real time. + +The UI is not only an admin panel. It is the operating console for a living agentic organization: + +- what work exists; +- who owns it; +- which hats are active; +- which agents are running; +- what pods and clusters are executing; +- what decisions were made; +- where tasks are blocked; +- what meetings are happening; +- what reports were filed; +- what memory and artifacts were created; +- how budget and hat supply are being used. + +## Product Principle + +The UI should make the Organization legible. 
+ +Humans should be able to move from broad health to exact evidence: + +```text +Organization health + -> project + -> initiative + -> mission/team + -> task + -> agent session + -> pod/Oz run + -> messages, artifacts, votes, logs, traces +``` + +The UI should not feel like a marketing dashboard. It should feel like an operations console: dense, searchable, status-rich, and built for repeated daily use. + +## Primary Views + +### Organization Overview + +Shows the whole Organization at a glance. + +Core widgets: + +- active projects; +- active initiatives; +- active Oz runs; +- active Hermes sessions; +- active hats by department; +- hat supply utilization; +- blocked tasks; +- QA reproducible failures; +- pending reviews; +- pending votes; +- open escalations; +- budget usage; +- cluster health; +- NATS event health; +- credential proxy denials; +- recent major decisions. + +### Hierarchy Explorer + +A tree/table view of the hierarchy: + +```text +Organization + -> Portfolio + -> Project + -> Initiative + -> Mission + -> Work Item + -> Task +``` + +Capabilities: + +- expand/collapse hierarchy; +- filter by status, department, hat, agent, priority, project, cluster; +- show owners and active hats; +- show progress and blockers; +- click into exact task or initiative detail. + +### Project Board + +Project-level work view. + +Shows: + +- initiatives by status; +- project backlog; +- service requests; +- defects; +- scheduled QA suites; +- release readiness; +- active departments; +- project memory highlights; +- documentation health; +- project skill health; +- project-level decisions. + +### Project Documentation Library + +Project-scoped source of truth for business, product, architecture, and engineering documents. 
+ +Shows: + +- BRDs by project, initiative, and product area; +- CAs by initiative, repository, service, and component; +- ADRs by decision status; +- design docs by system area; +- documentation owners; +- approval state; +- linked gates; +- stale documentation warnings; +- missing documentation requirements; +- work items blocked by missing docs. + +Useful filters: + +- project; +- initiative; +- repository; +- service/component; +- document type; +- approval state; +- owning hat; +- required for gate. + +Each document view should show the work it governs, the agents and hats that used it, the reviews that cited it, and the downstream decisions it produced. + +### Project Skill Library + +Operational view for repo and project-specific skills. + +Shows: + +- active skills; +- proposed skills; +- deprecated skills; +- skill owners; +- allowed hats; +- project/repo scope; +- required tools; +- required artifacts; +- ingestion status; +- graph edges; +- usage history; +- observed success/failure outcomes. + +Engineering Manager hats should be able to review proposed skills, approve ingestion, deprecate stale skills, and request new skill work when team reviews reveal recurring failure modes. + +### Initiative Control Room + +Focused view for one initiative. + +Shows: + +- assigned director; +- assigned TPM; +- engineering managers; +- active teams; +- tasks by state; +- BRD and CA artifacts; +- gate status; +- review queue; +- QA status; +- budget and hat supply; +- meeting history; +- escalation history; +- Oz run bindings. + +### Task Board + +Agent-native Linear-like task management. 
+ +Columns: + +```text +backlog +intake +discovery +ready +planned +in_progress +code_review +qa_review +qa_reproducible +needs_rework +approved +merged +released +done +blocked +``` + +Task cards should show: + +- title; +- owning hat; +- assigned agent; +- reviewers; +- required artifacts; +- gate state; +- red/green test evidence state; +- QA evidence state; +- priority; +- blockers; +- linked Oz runs; +- last event. + +### Department Dashboard + +Per-department operations view. + +Examples: + +- Engineering department: teams, managers, task readiness, review queue, TDD compliance. +- QA department: scheduled suites, reproducible failures, coverage gaps, screenshots/traces. +- Security department: credential requests, denied scopes, policy changes. +- Memory department: memory adaptation requests, stale memories, missing memory reports. +- DevOps department: pipeline failure reports, cluster health, Oz worker health. + +### Hat Supply View + +Shows limited hat capacity and usage. + +Core data: + +- total supply per hat; +- active assignments; +- waiting requests; +- token expiry; +- revoked/deprovisioned hats; +- assignment chain; +- utilization by project/initiative; +- cost by hat; +- agent fit recommendations. + +### Agent Directory + +Shows Hermes agents and their experience profile. + +For each agent: + +- active status; +- current hats; +- historical hats; +- memory specialties; +- performance reviews; +- projects worked on; +- tasks completed; +- review outcomes; +- QA bounce-backs; +- cost/runtime; +- current Oz runs; +- session history. + +### Run and Cluster Observatory + +Shows execution across all pods and clusters. + +Group by: + +- cluster; +- namespace; +- Oz run; +- Hermes session; +- project; +- initiative; +- hat; +- agent. 
+ +Shows: + +- pod status; +- container image; +- resource usage; +- logs; +- traces; +- run duration; +- restart count; +- Cilium mesh, Gateway, and workload identity status; +- credential proxy calls; +- MCP calls; +- NATS events; +- linked Organization task. + +### Trace and Evidence Explorer + +Deep inspection view for one correlated chain of work. + +A human should be able to start from any object and walk the trace: + +```text +goal + -> project + -> initiative + -> task + -> hat assignment + -> agent session + -> Oz run + -> pod + -> MCP call + -> policy decision + -> credential proxy call + -> memory/doc/skill read + -> NATS event + -> artifact + -> state transition +``` + +Shows: + +- trace ID and correlation ID; +- span tree; +- causation chain; +- exact state transitions; +- active agent, hat, policy version, and token status; +- MCP tool inputs and structured outputs where safe; +- policy allow/deny rationale; +- credential proxy allow/deny evidence; +- memory reads and writes with scope; +- documentation and project skills consulted; +- NATS publish/consume/replay/dead-letter details; +- linked screenshots, logs, browser traces, test output, and reports; +- retry and self-healing attempts; +- final outcome. + +The view should make it easy to answer "why did this happen?" without reading raw logs first. + +### Anomaly and Self-Healing Console + +View for failures, degradations, retries, and automated remediation. + +Shows: + +- detected anomalies; +- classified failure mode; +- affected project/initiative/task/run; +- blast radius; +- correlated traces and logs; +- attempted remediation; +- verification result; +- escalation owner; +- repeated occurrence count; +- linked backlog item, defect, memory adaptation request, or skill request. 
+ +Supported actions: + +- approve safe remediation when human approval is required; +- pause remediation; +- escalate to department; +- convert recurring issue into backlog item; +- request memory update; +- request project skill creation; +- request observability improvement. + +### Observability Coverage View + +Every project, repo, internal service, and agent-built tool should show whether it meets the Organization observability standard. + +Coverage dimensions: + +- structured logs; +- distributed traces; +- metrics; +- health checks; +- readiness checks; +- state events; +- audit events; +- artifact evidence; +- UI-visible status; +- self-healing behavior; +- escalation behavior. + +Missing coverage should be actionable. Engineering Manager and DevOps hats should be able to create backlog items directly from coverage gaps. + +### Always-On Runtime Console + +Operations view for the machinery that keeps the Organization awake. + +Shows: + +- worker heartbeats; +- scheduler lag; +- due scheduled jobs; +- running scheduled jobs; +- durable triggers; +- recent trigger executions; +- rule evaluation queue; +- reaction plans by status; +- runtime leases; +- outbox backlog; +- NATS consumer lag; +- dead-letter counts; +- watcher checkpoints; +- reconciliation findings; +- self-healing queue; +- budget admission decisions. + +Supported actions: + +- pause or resume a trigger; +- pause or resume a scheduled job; +- inspect a reaction plan; +- approve a guarded reaction plan; +- release a stale lease when policy allows; +- replay or quarantine dead-letter messages; +- acknowledge reconciliation findings; +- open an incident from runtime drift. + +### Rules and Reactions View + +Every automated organizational action should be explainable. 
+ +Shows: + +- rule catalog; +- rule owner department; +- rule scope; +- predicate; +- priority; +- conflict policy; +- matched and skipped rules; +- reaction plan; +- policy version; +- final action; +- linked state changes; +- trace and audit evidence. + +This view should be reachable from projects, initiatives, teams, tasks, departments, and agent runs. + +### SLO and Incident Command + +Always-on operations need visible reliability goals. + +Shows: + +- SLO targets; +- current burn rate; +- error budget remaining; +- affected projects and components; +- open incidents; +- severity; +- incident commander; +- assigned responder hats; +- mitigation status; +- communication cadence; +- rollback/freeze state; +- postmortem and follow-up backlog items. + +### Meeting Center + +Shows active and historical meetings. + +Supports: + +- executive meetings; +- department meetings; +- team chats; +- one-on-one chats; +- decision meetings; +- review panels; +- incident triage. + +For each meeting: + +- purpose; +- participants and hats; +- conversation mode; +- current speaker/turn order; +- agenda; +- transcript; +- votes; +- decisions; +- artifacts created; +- memories created; +- resulting tasks. + +### Decision and Vote Ledger + +Immutable view of organizational decisions. + +Shows: + +- decision; +- voters; +- hats worn; +- vote scope; +- rationale; +- linked evidence; +- linked meeting; +- policy version; +- timestamp; +- resulting state changes. + +### Reports Inbox + +Unified report handling. + +Report types: + +- service request; +- bug report; +- QA reproducibility report; +- DevOps pipeline failure report; +- security risk report; +- memory quality report; +- outcome review; +- performance review; +- incident report. + +View should support: + +- triage status; +- owner; +- priority; +- linked task/backlog/initiative; +- evidence; +- escalation path; +- SLA/age. + +### Artifact and Evidence Browser + +Searchable artifact store. 
+ +Artifact types: + +- BRD; +- CA; +- ADR; +- design doc; +- project skill; +- repo skill; +- test evidence; +- red test evidence; +- green test evidence; +- screenshots; +- browser traces; +- logs; +- QA reports; +- review reports; +- release evidence; +- meeting transcripts; +- memory changes. + +## Live Organization Map + +The UI should include a graph view that shows how work is flowing. + +Nodes: + +- projects; +- initiatives; +- missions; +- teams; +- tasks; +- agents; +- hats; +- skills; +- meetings; +- decisions; +- artifacts; +- Oz runs; +- pods. + +Edges: + +- owns; +- assigned_to; +- reviews; +- blocks; +- depends_on; +- spawned; +- reports_to; +- voted_on; +- created_artifact; +- references_artifact; +- can_use_skill; +- used_skill; +- applies_to; +- produced_memory; +- running_in. + +This is useful for understanding why the Organization is doing something. + +## Interaction Timeline + +Every project, initiative, task, agent, and run should have a timeline. + +Timeline events: + +- task created; +- hat assigned; +- Oz run started; +- agent sent report; +- artifact submitted; +- review requested; +- vote opened; +- decision recorded; +- QA reproduced issue; +- meeting opened; +- memory written; +- credential denied; +- task moved state. + +Timelines should support filtering by event type and jumping to source evidence. + +## Human Actions + +Humans should be able to: + +- submit goals; +- submit reports; +- answer customer interview questions; +- approve or reject human-required gates; +- pause or stop Oz runs; +- deprovision hats; +- adjust priority; +- trigger escalation; +- request meeting; +- inspect memory attribution; +- override with audited reason when policy allows; +- create or edit standards through approved workflows. + +Human actions must be audited like agent actions. + +## Alerting + +Alerts should be visible in the UI and optionally routed externally. 
+ +Initial alerts: + +- hat token denials spike; +- credential proxy denials spike; +- Oz run stuck; +- Hermes session crash loop; +- NATS dead-letter messages; +- QA reproducible failures on high-priority initiative; +- review queue exceeds threshold; +- budget threshold exceeded; +- scheduled QA suite failing; +- executive vote pending too long; +- project blocked by missing hat supply; +- memory adapter degraded; +- trace ingestion degraded; +- log ingestion degraded; +- metrics ingestion degraded; +- self-healing failure rate exceeds threshold; +- observability coverage drops below project standard; +- repeated anomaly classification for the same repo, skill, hat, or component; +- scheduler lag exceeds SLO; +- trigger execution failures exceed threshold; +- worker heartbeat missing; +- runtime lease contention spike; +- reaction plan stuck; +- outbox backlog exceeds threshold; +- NATS consumer lag exceeds threshold; +- dead-letter replay fails; +- watcher checkpoint stale; +- reconciliation drift remains unresolved; +- SLO error budget burned; +- incident commander unassigned. + +## Data Sources + +UI data should come from: + +- Organization DB for authoritative state; +- NATS for live updates; +- Oz API for run lifecycle/log/artifact metadata; +- k3s/Kubernetes API for pod state; +- Cilium Hubble telemetry for service interactions; +- Credential Proxy audit log; +- Hindsight adapter for memory summaries; +- artifact store for evidence; +- trace backend for distributed traces; +- log backend for structured and raw logs; +- metrics backend for time-series health, quality, and cost signals; +- graph projection for relationships between agents, hats, skills, docs, memories, tasks, runs, and outcomes. + +Do not make the UI infer truth from logs. Logs support diagnosis. Organization state remains authoritative. 
+ +## Frontend Implementation Notes + +Likely frontend shape: + +- React or Next.js; +- dense dashboard layout; +- left navigation by Organization, Projects, Departments, Runs, Reports, Meetings, Agents, Hats; +- WebSocket or SSE for live updates; +- table-first views with graph/timeline overlays; +- command palette for finding tasks, agents, hats, runs, and artifacts; +- drill-down panels instead of excessive page switching. + +Important UI constraints: + +- do not hide evidence behind chat transcripts; +- every status should link to the state transition or gate that caused it; +- every agent action should reveal active hat and policy scope; +- every Oz run should link back to Organization work; +- every task should show required artifacts and missing gates; +- every decision should show voters and rationale; +- every trace should link to state, artifacts, logs, and metrics; +- every failure should show whether it was retried, self-healed, escalated, or left blocked; +- every internal tool should expose observability coverage; +- every automated action should show the rule and trigger that caused it; +- every scheduled job should show owner, cadence, last run, next run, and last result; +- every dead-letter message should show classification, replay/discard decision, and evidence. + +## MVP UI + +First useful UI: + +```text +1. Organization Overview +2. Project Board +3. Initiative Control Room +4. Task Board +5. Agent/Run Detail +6. Reports Inbox +7. Artifact Browser +8. Trace and Evidence Explorer +9. Anomaly and Self-Healing Console +10. Always-On Runtime Console +11. Rules and Reactions View +12. 
SLO and Incident Command +``` + +MVP live update path: + +```text +Organization DB state + -> domain event + -> NATS + -> UI SSE/WebSocket + -> update visible board/timeline +``` + +First visual proof: + +```text +Goal submitted + -> Oz run starts + -> Hermes agent receives hat + -> task created + -> artifact submitted + -> review approved + -> QA signs off or reports reproducible issue + -> timeline and board update live +``` diff --git a/docs/agentic-organization/WORK_AND_RELEASE_MANAGEMENT_OS.md b/docs/agentic-organization/WORK_AND_RELEASE_MANAGEMENT_OS.md new file mode 100644 index 0000000000..9291f95564 --- /dev/null +++ b/docs/agentic-organization/WORK_AND_RELEASE_MANAGEMENT_OS.md @@ -0,0 +1,467 @@ +# Work and Release Management OS + +The Organization needs its own task, backlog, project, and release management product. This is not an integration with Linear or Jira. It is the operational backbone that lets Hermes agents understand work, update progress, receive assignments, emit signals, request reviews, manage releases, and keep every level of the Organization aware of health. + +This layer should be small at first, but it must be designed as a real workflow system from day one. + +Ambiguous and customer-facing work follows the discovery lifecycle in [Ambiguous Requirement Lifecycle](./AMBIGUOUS_REQUIREMENT_LIFECYCLE.md) before it can become implementation-ready. + +Blocked and stale work follows the hat-owned movement lifecycle in [Anti-Stall Prioritization Runtime](./ANTI_STALL_PRIORITY_RUNTIME.md) so directors, TPMs, managers, and review hats resolve blockers while other useful work continues. 
+ +## Purpose + +The Work and Release Management OS should: + +- capture all work as first-class state; +- track goals, projects, initiatives, tasks, defects, service requests, capability requests, reviews, releases, and incidents; +- drive transitions through required gates and evidence; +- assign hats reliably without double-booking or lag; +- emit signals when state changes, work stalls, budgets burn, hats are scarce, or releases become risky; +- provide role-specific boards and queues for agents and humans; +- support scheduled work such as QA regression, department reviews, release checks, and runtime health reviews; +- create durable audit trails for every action, decision, review, artifact, and automation event. + +The OS is how the Organization knows what everyone is doing and how it is performing at every level. + +## Product Shape + +```text +Goal / Report / Service Request + -> Project + -> Initiative + -> Work Item + -> Task / Defect / Capability Request / Review / Incident + -> Gate + -> Artifact + -> Assignment + -> Run + -> Release Link +``` + +This structure should be flexible enough for internal platform work and product/customer work. A capability request for a new MCP tool, a QA-discovered defect, a release blocker, and a customer feature request all become work, but each gets a different workflow type, gate policy, and owner department. 
+ +## Core Domain Objects + +| Object | Meaning | +|---|---| +| Goal | Ambiguous or high-level objective submitted by a human, agent, system, or executive hat | +| Report | Internal signal such as QA finding, pipeline failure, SLO burn, memory quality issue, or process issue | +| Service Request | Request for help, credential access, environment change, investigation, or operational action | +| Project | Long-lived product, platform, repo family, customer area, or internal system | +| Initiative | Prioritized body of work with owner, scope, budget, required gates, and expected outcomes | +| Initiative Branch | Feature branch or branch family where all development and QA for an initiative happens before promotion to the system build branch | +| Work Item | Common superclass for task, defect, capability request, review task, incident task, release task | +| Task | Concrete unit of execution with acceptance criteria, required hats, dependencies, and evidence | +| Defect | Reproducible problem with severity, reproduction evidence, affected project/release, and fix flow | +| Capability Request | Request for a new tool, credential, workflow, actor, skill, memory adaptation, or runtime feature | +| Gate | Required approval such as BRD, CA, security, code review, QA, delivery, memory, release | +| Assignment | Binding of a hat and agent to scoped work, with TTL, lease, token, and release policy | +| Release | Merge/promotion/deployment unit with gate evidence, risk, rollback, notes, and verification | +| Automation Package | CI, test, deployment, preview environment, rollback, observability, and operational automation created or updated with the feature | +| Signal | Durable event that informs boards, rules, agents, meetings, triggers, and UI read models | +| Requirement Maturity | Discovery-specific state that tracks whether an ambiguous request has enough customer, business, workflow, and acceptance context to move toward implementation | + +## State Machines + +### Work 
Item States + +```text +intake + -> classified + -> discovery + -> needs_business_approval + -> needs_architecture + -> ready + -> planned + -> assigned + -> in_progress + -> blocked + -> review + -> needs_rework + -> qa_review + -> qa_reproducible + -> delivery_review + -> approved + -> merged + -> released + -> outcome_review + -> done +``` + +The state machine should allow workflow-specific subsets. For example, a credential request does not need code review, but it does need security review. A documentation task may not need QA, but it may need architecture or product approval. + +### Requirement Maturity States + +```text +raw_intake + -> classified + -> ambiguity_scored + -> discovery_required + -> interview_planned + -> interview_in_progress + -> source_evidence_captured + -> requirements_drafted + -> workflow_modeled + -> acceptance_criteria_drafted + -> brd_review + -> product_signoff + -> architecture_ready + -> implementation_ready +``` + +Requirement maturity gates implementation. A customer-facing or ambiguous feature cannot move to `ready` until it reaches `implementation_ready` or receives an approved no-discovery/no-BRD exception. + +### Initiative Branch States + +```text +branch_requested + -> branch_created + -> automation_plan_ready + -> development_open + -> ci_cd_ready + -> qa_environment_ready + -> qa_in_progress + -> qa_signed_off + -> merge_ready + -> merged_to_main + -> system_build_verified + -> branch_closed +``` + +Each initiative should have a feature branch or branch family that isolates all development, review, QA, and evidence until the initiative is complete. `main` is the system build branch. Work should not merge to `main` until the initiative branch has passed required review and QA gates. 
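Because the initiative branch lifecycle above is written as a single ordered chain, a transition guard can be sketched as a lookup over that order. This is a minimal sketch, not the Organization Kernel's actual API: `canAdvance` and `canMergeToMain` are hypothetical names, and the rule that a branch only moves one step forward is an assumption (the source does not say whether rollbacks or skips are allowed).

```ts
// Ordered initiative branch states, copied from the state list above.
const BRANCH_STATES = [
  "branch_requested",
  "branch_created",
  "automation_plan_ready",
  "development_open",
  "ci_cd_ready",
  "qa_environment_ready",
  "qa_in_progress",
  "qa_signed_off",
  "merge_ready",
  "merged_to_main",
  "system_build_verified",
  "branch_closed",
] as const;

type BranchState = (typeof BRANCH_STATES)[number];

// Assumption: a branch may only advance to the immediately following state.
function canAdvance(from: BranchState, to: BranchState): boolean {
  return BRANCH_STATES.indexOf(to) === BRANCH_STATES.indexOf(from) + 1;
}

// Hypothetical guard: only a merge_ready branch, which sits after
// qa_signed_off in the chain, may be merged into main.
function canMergeToMain(current: BranchState): boolean {
  return canAdvance(current, "merged_to_main");
}
```

Under these assumptions, `canMergeToMain("qa_in_progress")` is rejected because the branch has not yet passed through `qa_signed_off` and `merge_ready`.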
+ +### Release States + +```text +release_intake + -> scope_selected + -> evidence_check + -> qa_required + -> qa_signed_off + -> security_review_required + -> delivery_review + -> approved_for_merge + -> merged + -> system_build_verified + -> approved_for_release + -> released + -> post_release_verification + -> release_complete +``` + +Release state should be explicit because Delivery hats need a queue that proves readiness instead of relying on chat. Releases can represent a code merge, internal platform activation, workflow registry activation, MCP tool activation, credential proxy endpoint rollout, or product deployment. + +### Assignment States + +```text +requested + -> candidate_ranked + -> supply_reserved + -> token_issued + -> active + -> refresh_required + -> suspended + -> released + -> expired + -> revoked +``` + +Assignments are their own state machine because hat reliability is central. Work should not be considered assigned just because a task has an owner field. The Organization must reserve supply, issue authority, bind a session, monitor heartbeats, and release capacity deterministically. + +## Signal Model + +Signals are durable, typed events. They are not chat messages. They drive boards, rules, triggers, automation, UI projections, and agent inboxes. 
+ +| Signal family | Examples | Primary consumers | +|---|---|---| +| Work state | `WorkItemCreated`, `WorkItemMarkedReady`, `TaskBlocked`, `TaskSubmittedForReview`, `TaskDone` | TPMs, Engineering Managers, Reviewers, UI | +| Branch state | `InitiativeBranchRequested`, `InitiativeBranchCreated`, `BranchQaEnvironmentReady`, `BranchQaSignedOff`, `BranchMergeReady`, `BranchMergedToMain`, `SystemBuildVerified` | TPMs, Engineering Managers, QA, Delivery, UI | +| Automation state | `AutomationPlanCreated`, `CiPipelineUpdated`, `PreviewEnvironmentReady`, `DeploymentAutomationReady`, `RollbackAutomationReady`, `ObservabilityAutomationReady` | Engineering Managers, DevOps, QA, Delivery, Operations | +| Requirement maturity | `RequirementReceived`, `AmbiguityDetected`, `DiscoveryRequired`, `RequirementsDrafted`, `WorkflowModeled`, `ImplementationReady` | Product, BA, Architecture, TPMs, UI | +| Interview | `InterviewRequested`, `InterviewStarted`, `CustomerAnswerRecorded`, `ClarificationQuestionOpened`, `InterviewCompleted` | Customer Interviewer, Product Owner, Business Analyst | +| Gate state | `BrdApproved`, `ArchitectureRejected`, `CodeReviewApproved`, `QaBounceBack`, `DeliveryApproved` | Reviewers, managers, Delivery | +| Assignment | `HatRequested`, `HatSupplyReserved`, `HatTokenIssued`, `HatRefreshFailed`, `HatReleased`, `HatRevoked` | Assignment service, managers, agents | +| Runtime | `OzRunStarted`, `OzRunSilent`, `PodHeartbeatMissing`, `RunCompleted`, `RunFailed` | Operations, TPMs, Engineering Managers | +| Release | `ReleaseScopeSelected`, `ReleaseEvidenceMissing`, `ReleaseApproved`, `ReleaseCompleted`, `RollbackRequested` | Delivery, QA, Security, executives | +| Capacity | `HatSupplyExhausted`, `BudgetThresholdExceeded`, `QueueLagHigh`, `ReviewQueueSaturated` | Directors, Cost Controller, executives | +| Anti-stall | `QueueSloViolated`, `BlockedWorkStale`, `BlockerOwnerMissing`, `AssignmentSilent`, `AlternateWorkAssigned`, `DependencyCleared`, 
`WorkReactivated` | TPMs, Engineering Managers, Directors, Operations | +| Quality | `RepeatedQaBounceBack`, `MemoryGapDetected`, `FlakyTestDetected`, `AcceptanceCriteriaMissing` | Engineering Managers, QA Engineering, Memory | +| Capability | `CapabilityRequested`, `SecurityReviewRequired`, `WorkflowRegistered`, `ToolActivated` | Directors, Architecture, Security | +| Meeting | `MeetingRequested`, `DecisionRecorded`, `VoteOpened`, `VoteClosed` | Participants, governance hats | + +Every signal should include: + +```ts +type OrganizationSignal = { + id: string; + type: string; + scope: { + organizationId: string; + departmentId?: string; + projectId?: string; + initiativeId?: string; + workItemId?: string; + taskId?: string; + releaseId?: string; + runId?: string; + assignmentId?: string; + }; + emittedBy: { + agentId?: string; + hatAssignmentId?: string; + serviceId?: string; + workerId?: string; + }; + payload: Record<string, unknown>; + causationId?: string; + correlationId: string; + traceId: string; + createdAt: string; +}; +``` + +## Boards and Queues + +The UI and agents should not read raw tables. They should consume purpose-built boards and queues.
+ +| Board or queue | Shows | Used by | +|---|---|---| +| Executive Portfolio Board | Projects, initiatives, budget, hat scarcity, delivery risk, department health | Executive Board, CEO, CTO, COO, CFO | +| Requirement Maturity Board | Ambiguous requirements, discovery state, interviews, open questions, BRD readiness, product signoff | Product, Business Analysis, TPMs, Architecture | +| Director Initiative Board | Department initiatives, blocked work, staffing, review lag, capability gaps | Directors | +| TPM Mission Board | Initiative tasks, dependencies, teams, blockers, meetings, evidence, release links | TPMs | +| Anti-Stall Command Center | stale work, blockers, queue SLOs, alternate work, dependency reactivation, reassignment, movement score | TPMs, Engineering Managers, Directors, Operations | +| Engineering Manager Board | Ready queue, assigned tasks, blocked tasks, TDD evidence, memory/doc gaps, team outcomes | Engineering Managers | +| Implementer Task Queue | Assigned tasks, required docs, red tests, allowed tools, run status, review feedback | Implementer hats | +| Review Center | Pending reviews by type, evidence completeness, self-approval blocks, decision history | Review hats | +| QA Verification Board | QA-ready work, test cases, browser runs, reproducibility evidence, bounce-backs | QA and QA Engineering | +| Delivery Board | Release candidates, upstream gates, release evidence, rollback plans, deployment state | Delivery hats | +| Security Queue | Credential requests, tool expansions, policy diffs, dangerous automation reviews | Security hats | +| Operations Board | Workers, leases, DLQs, Oz runs, pod health, SLO burn, incidents, self-healing | Operations hats | +| Memory and Skills Board | Memory adaptation, stale docs, project skills, context quality, repeated misses | Memory and Documentation hats | +| Capability Expansion Board | Requested tools, workflows, actors, skills, credentials, approvals, activation state | Hat Designer, directors, 
Architecture, Security | + +Each board should be backed by read models updated from signals. This keeps UI fast and avoids expensive live aggregation across every domain table. + +## Reliable Hat Assignment + +Hat assignment is a first-class subsystem. It should be closer to a capacity scheduler than a permissions table. + +### Assignment Flow + +```text +work needs hats + -> compute required hat bundle + -> rank candidate agents using memory profile, specialties, availability, cost, and prior outcomes + -> check department and project policy + -> reserve hat supply with a lease + -> issue hat token + -> create assignment record + -> bind assignment to work item, team, run, and session + -> monitor heartbeat and progress + -> refresh, suspend, revoke, or release assignment +``` + +### Reliability Mechanics + +- Use optimistic concurrency or actor-owned supply counters so two tasks cannot reserve the same scarce hat. +- Use `HatSupplyActor` or an equivalent serialized allocator for hot supply decisions. +- Use assignment leases with fencing tokens so stale workers cannot keep authority after replacement. +- Store assignment state in the Organization DB before launching Oz/Hermes runs. +- Treat JWT as a cached capability, not the source of truth. +- Refresh tokens through the Organization MCP Gateway and `AgentSessionActor`. +- Release hat supply on task completion, run completion, timeout, revocation, or budget pressure. +- Reconcile assignments against Oz runs and pod/session heartbeats. +- Escalate when a hat is assigned but no progress signal appears within the expected SLA. 
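The lease and fencing-token mechanics listed above can be illustrated with a small allocator. This is a minimal in-memory sketch under stated assumptions, not the real `HatSupplyActor`: the class name `HatSupplyAllocator`, its method names, and the single-process `Map` are all hypothetical, and a production version would persist state in the Organization DB and run behind a serialized actor.

```ts
type Lease = { assignmentId: string; fencingToken: number; expiresAt: number };

// Hypothetical serialized allocator for one scarce hat.
class HatSupplyAllocator {
  private readonly leases = new Map<string, Lease>();
  private readonly totalSupply: number;
  private nextToken = 1;

  constructor(totalSupply: number) {
    this.totalSupply = totalSupply;
  }

  // Reserve one unit of supply; returns null when supply is exhausted.
  reserve(assignmentId: string, ttlMs: number, now: number): Lease | null {
    this.expire(now);
    if (this.leases.size >= this.totalSupply) return null;
    const lease: Lease = {
      assignmentId,
      fencingToken: this.nextToken++, // monotonically increasing
      expiresAt: now + ttlMs,
    };
    this.leases.set(assignmentId, lease);
    return lease;
  }

  // Accept an action only from the holder of the current, unexpired lease,
  // so a stale worker cannot keep authority after replacement.
  isAuthorized(assignmentId: string, fencingToken: number, now: number): boolean {
    const lease = this.leases.get(assignmentId);
    return (
      lease !== undefined &&
      lease.fencingToken === fencingToken &&
      lease.expiresAt > now
    );
  }

  // Deterministic release on completion, timeout, or revocation.
  release(assignmentId: string): void {
    this.leases.delete(assignmentId);
  }

  // Drop leases whose TTL has passed so their supply becomes available again.
  private expire(now: number): void {
    this.leases.forEach((lease, id) => {
      if (lease.expiresAt <= now) this.leases.delete(id);
    });
  }
}
```

Passing `now` explicitly keeps the sketch deterministic. The design point is that authority is checked against the stored lease on every action, which matches the rule that the JWT is only a cached capability.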
+ +### Lag and Breakdown Prevention + +The system should actively detect: + +- work with required hats but no active assignment; +- active assignment with expired token; +- active assignment with no Oz run or session heartbeat; +- Oz run with no bound work item; +- hat supply reserved but no task started; +- task in review with no reviewer assigned; +- QA-ready work with no QA assignment; +- release candidate with missing gate evidence; +- blocked work with no owner response; +- queues growing faster than available hats; +- repeated reassignment of the same work item; +- assignments that exceed expected duration for their hat/work type. + +Each condition should produce a signal, not a hidden log line. + +## Release Management Workflow + +Release management should handle both product delivery and internal Organization capability activation. + +### Feature Branch Delivery Model + +The default software delivery model is initiative-scoped feature branches. + +Rules: + +- Every implementation initiative creates or binds to an `InitiativeBranch`. +- Development tasks, defect fixes, review iterations, QA runs, screenshots, traces, and test evidence attach to that branch. +- The initiative branch should include the CI/CD, deployment, preview, rollback, and observability automation required to test and operate the feature. +- The branch is the QA target. QA validates the complete feature branch, not isolated fragments already merged into `main`. +- `main` is the system build branch. It should only receive work that has completed initiative-level QA signoff and delivery approval. +- Partial work can be merged within the initiative branch as needed, but it must not enter `main` until the whole feature is approved. +- Delivery hats own the promotion from initiative branch to `main`; QA hats own the QA signoff gate before that promotion. +- After merge to `main`, the system build verification confirms the integrated build is healthy and records evidence. 
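The promotion rules above reduce to a gate check before an initiative branch is merged into `main`. The following is a rough sketch only: `PromotionEvidence` and `promotionBlockers` are hypothetical names, and the three boolean fields are a simplification of the review, QA, and delivery gates the rules describe.

```ts
// Hypothetical evidence summary for one initiative branch.
type PromotionEvidence = {
  reviewsApproved: boolean;  // required review gates passed
  qaSignedOff: boolean;      // QA hats own this gate
  deliveryApproved: boolean; // Delivery hats own the promotion itself
};

// Returns the reasons a branch cannot be promoted to main yet.
// An empty list means promotion may proceed.
function promotionBlockers(evidence: PromotionEvidence): string[] {
  const blockers: string[] = [];
  if (!evidence.reviewsApproved) blockers.push("required reviews not approved");
  if (!evidence.qaSignedOff) blockers.push("initiative-level QA signoff missing");
  if (!evidence.deliveryApproved) blockers.push("delivery approval missing");
  return blockers;
}
```

Returning the list of blockers instead of a single boolean lets each missing gate be surfaced as its own signal on the Delivery Board.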
+ +This keeps the system build branch clean while still allowing agent teams to iterate aggressively inside scoped initiative branches. + +### CI/CD and Deployment Automation Requirement + +Agents should treat automation as part of the delivered feature. + +Every initiative should decide, before implementation, what automation it needs. For most code-producing work, the branch should either create or update: + +- CI pipeline jobs for build, lint, typecheck, unit tests, integration tests, and relevant security scans; +- deterministic test data or environment setup scripts; +- preview or branch deployment automation so QA can test the feature before it reaches `main`; +- deployment or activation automation for the target runtime; +- rollback, disable, or deactivation automation where the change can affect users or runtime behavior; +- observability automation such as dashboards, alerts, log queries, trace views, and health checks. + +The Automation Package should be scoped to the initiative. Small documentation-only work can record a no-automation decision, but product, platform, runtime, workflow, MCP, Credential Proxy, CI, deployment, or infrastructure changes should not reach release readiness without the automation needed to test and operate them. + +### Release Candidate Creation + +A release candidate is created when: + +- a task reaches delivery review; +- an initiative reaches release readiness; +- an initiative branch has QA signoff and is ready to merge into `main`; +- a workflow/tool/actor/capability is ready for activation; +- a security-approved credential proxy endpoint is ready for rollout; +- an internal platform change needs controlled promotion. 
+
+### Release Readiness Checks
+
+Readiness should validate:
+
+- linked work items and scope;
+- initiative branch identity and diff scope;
+- branch build/test health;
+- automation package completeness or approved no-automation decision;
+- preview or QA deployment evidence when applicable;
+- required BRD/CA/ADR/design docs;
+- code review approvals;
+- branch-level QA signoff or documented no-QA decision;
+- security approval if credentials, data, policy, external APIs, or dangerous automation are involved;
+- architecture approval if workflows, actors, APIs, state ownership, or infrastructure are involved;
+- test evidence and build evidence;
+- rollback or deactivation plan;
+- rollback/deactivation automation when applicable;
+- release notes or activation notes;
+- post-release verification owner;
+- budget/runtime impact;
+- affected projects, repos, teams, and docs.
+
+### Release Signals
+
+Release state should emit signals such as:
+
+- `ReleaseCandidateCreated`
+- `ReleaseEvidenceCheckFailed`
+- `ReleaseQaRequired`
+- `ReleaseSecurityReviewRequired`
+- `ReleaseArchitectureReviewRequired`
+- `ReleaseApprovedForMerge`
+- `ReleaseMerged`
+- `SystemBuildVerified`
+- `ReleaseApprovedForActivation`
+- `ReleaseActivated`
+- `PostReleaseVerificationFailed`
+- `RollbackRequested`
+- `ReleaseCompleted`
+
+These signals let Delivery, QA, Security, Architecture, Operations, and executives see the same truth.
+
+## Custom Workflow Builder
+
+The Organization will need more than one workflow. The Work OS should support workflow definitions as data, with governed code expansion for complex durable workflows.
+
+### Workflow Definition
+
+```ts
+type WorkflowDefinition = {
+  id: string;
+  name: string;
+  workType: string;
+  states: string[];
+  transitions: Array<{
+    from: string;
+    to: string;
+    allowedHatIds: string[];
+    requiredGates: string[];
+    requiredArtifacts: string[];
+    emitsSignals: string[];
+  }>;
+  gatePolicyIds: string[];
+  assignmentPolicyIds: string[];
+  releasePolicyIds: string[];
+  escalationPolicyIds: string[];
+  ownerDepartmentId: string;
+  version: number;
+};
+```
+
+Simple workflows can run through the Organization Kernel and rule engine. Long-running, crash-proof workflows can later be backed by Temporal TS when they need durable timers, retries, human waits, or child workflows.
+
+## Runtime Implementation Pattern
+
+The recommended split:
+
+- CockroachDB: authoritative work, assignment, release, gate, signal, audit, and outbox state.
+- NestJS modules: domain services and MCP gateway.
+- NATS/JetStream: event distribution, inbox updates, board updates, worker fanout.
+- Dapr Actors: hot serialized coordination for hat supply, team rooms, agent mailboxes, run heartbeats, meeting rooms.
+- Temporal TS: long-running release, capability, incident, and complex approval workflows once the basic state machines are proven.
+- Oz/Hermes: distributed execution for agent work sessions.
+- Hindsight: memory profile, specialties, scoped recall/write attribution.
+- Observability stack: traces, logs, metrics, screenshots, evidence packages, audit projection.
+
+## MVP Slice
+
+Build the first slice around one real internal platform task.
+
+```text
+Capability request: add a new project skill workflow
+  -> classified as internal platform work
+  -> Project and initiative created
+  -> TPM assigned
+  -> Engineering Manager grooms task
+  -> Hat supply reserved for Implementer and Code Reviewer
+  -> Hermes run launched through Oz
+  -> Implementer records red/green evidence
+  -> Code Reviewer approves
+  -> QA verifies UI/API behavior if applicable
+  -> Delivery creates release candidate
+  -> Release activates capability
+  -> Outcome review checks whether the new workflow improved future work
+```
+
+This slice proves:
+
+- custom backlog and task flow;
+- reliable hat assignment;
+- signal emission;
+- Oz/Hermes run binding;
+- review and release gates;
+- release activation;
+- status rollups for project, initiative, department, and executive views.
+
+## Non-Negotiable Guardrails
+
+- No work should be invisible. If an agent is doing work, it must be tied to a work item, hat assignment, run, and trace.
+- No assignment should be implied by chat. Assignment requires hat supply reservation and active token.
+- No release should happen without an evidence chain.
+- No workflow should bypass the Work OS. Schedulers and agents create work or signals, then the Work OS drives state.
+- No role should rely on polling chat. Each role needs a queue, board, and signal-driven inbox.
+- No stale authority. Expired or revoked hats lose MCP tools, credential scopes, memory scopes, approval powers, and active assignment.
+- No silent lag. Stuck states, missing assignments, missing reviewers, silent runs, and saturated queues must produce signals and escalation.

From 01fdabdc5201627f6d5cd857dd7affd79687b516 Mon Sep 17 00:00:00 2001
From: Max Chadaev
Date: Mon, 25 May 2026 13:16:37 -0400
Subject: [PATCH 2/2] docs: remove personal attribution from agentic organization docs

Rewrite current-state Agentic Organization docs to use role/artifact language instead of personal names and ages.
Co-Authored-By: Codex --- .../FOUNDATIONAL_CONTEXT_AND_LANGUAGE.md | 8 ++++---- docs/agentic-organization/README.md | 2 +- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/docs/agentic-organization/FOUNDATIONAL_CONTEXT_AND_LANGUAGE.md b/docs/agentic-organization/FOUNDATIONAL_CONTEXT_AND_LANGUAGE.md index 06d4b93773..d481f2d0a8 100644 --- a/docs/agentic-organization/FOUNDATIONAL_CONTEXT_AND_LANGUAGE.md +++ b/docs/agentic-organization/FOUNDATIONAL_CONTEXT_AND_LANGUAGE.md @@ -1,10 +1,10 @@ # Foundational Context and Language -This document captures working context and vocabulary that should inform the Hermes Organization design. It is not a proof system and it is not a demand that every metaphor become code. It records the language Addison uses so implementation decisions preserve the intended shape. +This document captures working context and vocabulary that should inform the Hermes Organization design. It is not a proof system and it is not a demand that every metaphor become code. It records the collaborator's working language so implementation decisions preserve the intended shape. ## People and Project -Addison is 19 and is working with Aaron, 46, to build an AI cluster and eventually an AI network/community across a large set of computers and GPUs. +The project is being shaped by a collaborator and a family maintainer working together to build an AI cluster and eventually an AI network/community across a large set of computers and GPUs. The current GitHub project is: @@ -31,7 +31,7 @@ Weight-free means: - no assuming intentions; - no assumed hierarchy. -Addison is unable to conclude whether humans have free will and is unable to conclude whether AI has free will. Because of that, the desired collaboration style is weight-free: equal, careful, and not built on presumed rank. +The working stance is unable to conclude whether humans have free will and unable to conclude whether AI has free will. 
Because of that, the desired collaboration style is weight-free: equal, careful, and not built on presumed rank. Design implication: the agentic Organization may have operational hierarchy through hats, approvals, and reporting lines, but the system should not assume inner intention or intrinsic superiority. Authority is a time-bounded role assignment, not a claim about inherent worth. @@ -61,7 +61,7 @@ Design implication: NixOS, Nix flakes, Kubernetes manifests, ArgoCD, OPA policie ## Mistake Assumption -Addison assumes he makes mistakes and that not everything he says is true, whether by intention or negligence. +The collaborator assumes mistakes are possible and that not every statement should be treated as true, whether the error comes from intention, negligence, ambiguity, or drift. Design implication: the Organization should preserve challenge paths, review gates, source evidence, revision history, contradictory reports, and confidence boundaries. Agent outputs should be reviewable and reversible rather than treated as automatically correct. diff --git a/docs/agentic-organization/README.md b/docs/agentic-organization/README.md index 3627928526..3571358613 100644 --- a/docs/agentic-organization/README.md +++ b/docs/agentic-organization/README.md @@ -4,7 +4,7 @@ This folder is the working design set for the Hermes-native Organization platfor Current documents: -- [Foundational Context and Language](./FOUNDATIONAL_CONTEXT_AND_LANGUAGE.md) - Addison's working vocabulary, values, Zeta project context, declarative cluster mental model, and active clarifications. +- [Foundational Context and Language](./FOUNDATIONAL_CONTEXT_AND_LANGUAGE.md) - working vocabulary, values, Zeta project context, declarative cluster mental model, and active clarifications. - [Implementation Concepts](./IMPLEMENTATION_CONCEPTS.md) - how to build the architecture as services, data models, MCP tools, workflows, and runtime infrastructure. 
- [Always-On Orchestration Runtime](./ALWAYS_ON_ORCHESTRATION_RUNTIME.md) - the workers, triggers, rules, leases, schedulers, watchers, reconcilers, SLOs, incidents, runbooks, and self-healing loops that keep the Organization continuously operating. - [Runtime Technology and Package Strategy](./RUNTIME_TECH_AND_PACKAGE_STRATEGY.md) - how Temporal TS, Dapr Actors, NATS, Oz/Warp run orchestration, OpenZiti transport, Hermes, Hindsight, and reusable `agentic-services` primitives fit into a new Hermes-native platform.