lobu-ai · buremba · May 20, 2026 · May 20, 2026
diff --git a/AGENTS.md b/AGENTS.md
@@ -40,8 +40,13 @@ All chat platforms (Telegram, Slack, Discord, WhatsApp, Teams) run through Chat
 `mode: "polling"` is rejected when `LOBU_CLOUD_MODE=1` — a polling worker shares one Telegram edge connection across tenants, so a misbehaving one degrades delivery for everyone. Self-hosters (`LOBU_CLOUD_MODE` unset/0) keep polling for tunnel-less dev.
 
 #### Orchestration
-- **Embedded-only deployment.** Gateway, workers, embeddings, and the Lobu memory backend run in one Node process (`lobu run`, or `bun run dev` in the monorepo). Workers spawn from `EmbeddedDeploymentManager` as `child_process.spawn` subprocesses with `cwd = ./workspaces/{agentId}/` and `WORKSPACE_DIR` env. On Linux production hosts the manager wraps the spawn in `systemd-run --user --scope` (MemoryMax, CPUQuota, IPAddressDeny=any + IPAddressAllow=127.0.0.1, capability drops). No Docker or Kubernetes.
+- **Embedded process model.** The gateway, the worker *orchestrator*, embeddings, and the Lobu memory backend run together in a single app process. Workers spawn from `EmbeddedDeploymentManager` as `child_process.spawn` subprocesses with `cwd = ./workspaces/{agentId}/` and the `WORKSPACE_DIR` env var; on Linux the spawn is wrapped in `systemd-run --user --scope` for MemoryMax, CPUQuota, IPAddressDeny, and capability drops. "No Docker/Kubernetes" applies to **worker orchestration only**: workers are child processes, never pods. The app process itself runs as a multi-replica k8s Deployment (see below).
 - Postgres (with `pgvector`; optionally `postgis` for geo enrichment) is the only user-provided external. The Node process connects out via `DATABASE_URL`. Runtime state — queues, chat connection rows, grant cache, MCP proxy sessions — lives in dedicated Postgres tables.
+- **🚨 Multi-replica k8s is the production reality — every change MUST be correct under N>1 app replicas.** The app ships as a k8s `Deployment` (`charts/lobu`) whose `app.replicaCount` is routinely >1, behind `sessionAffinity: ClientIP` because per-pod state — `SseManager` connections **and** its event backlog (`sse-manager.ts`), the in-process `workers` map, the deployment-creation lock cache — is **in-memory and pod-local with no cross-pod fan-out**. On every task:
+  - A client's SSE stream, its `POST /messages`, and its conversation's worker are co-located on one pod **only** because ClientIP affinity pins them. Don't assume two requests for the same conversation hit the same pod for any other reason.
+  - Cross-replica delivery rides Postgres: a worker reply reaches the client's SSE pod via the `thread_response` queue. Any pod's consumer may claim a row, and it broadcasts only to *its* local `SseManager`, so an event broadcast on the wrong pod is silently dropped. Platform responses are owner-routed (`ChatResponseBridge.canHandle` re-queues until the owning pod claims the row); **API/SSE responses are not**, which is a known gap.
+  - **Never introduce shared state as an in-memory Map/singleton that another replica needs to read or mutate.** Per-pod in-memory state is fine only for data that pod exclusively owns (its own SSE connections, its own spawned workers). Anything observed/coordinated across replicas goes in Postgres.
+  - Before claiming a feature works, answer explicitly: *"does this hold with 3 app replicas behind ClientIP affinity?"* If a fix relies on one component (dispatch) seeing another component's event (completion) and they can land on different pods, it is broken in prod — use a Postgres-mediated signal instead.
 - Workers are sandboxed and **never see real credentials**. The gateway's `secret-proxy` swaps `lobu_secret_<uuid>` placeholders for real keys at egress; workers receive only the placeholders.
 
 #### MCP