fix(gateway): Owletto MCP sync, Slack markdown/threading, tool-approval lifecycle, deployment coalescing#210
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b0c72efe80
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| // No per-message user equality check: deployment names are derived from | ||
| // the canonical conversation key (`platform:channelId:conversationId`) | ||
| // *without* userId, so a thread is intentionally shared by every user | ||
| // who posts in it. The SSE queue (`thread_message_<deployment>`) is the | ||
| // routing boundary — only messages for this deployment reach this | ||
| // worker. The per-message `data.userId` is the actual sender of *this* | ||
| // turn and is propagated downstream via `payloadToWorkerConfig` | ||
| // (response addressing, MCP namespacing, audit). `this.userId` is the | ||
| // user that originally spawned the deployment, kept for audit only. |
There was a problem hiding this comment.
Reinstate user isolation before queuing thread messages
Dropping the per-message userId check here allows any participant in a shared thread to execute turns inside a worker that is still authenticated with the deployment-scoped WORKER_TOKEN created for the original user. In that scenario, tool approvals/execution and MCP auth context are attributed to the wrong user, which can expose one user’s grants or integrations to another user in the same thread. The new comment says data.userId is enough for namespacing, but gateway auth for MCP routes is still derived from the worker token, not this payload field.
Useful? React with 👍 / 👎.
| stream = { | ||
| iterator: new AsyncPushIterator<string>(), | ||
| streamPromise: Promise.resolve(), | ||
| buffer: payload.delta, | ||
| streamFailed: true, // Force completion to use the post-buffer path | ||
| wasFullyReplaced: !!payload.isFullReplacement, | ||
| target: null, | ||
| }; |
There was a problem hiding this comment.
Preserve a Slack post fallback target for completion errors
In the Slack-specific delta path, stream.target is always initialized to null, but completion tries to fall back to stream.target.post(...) when postSlackMarkdown is unavailable or throws. That makes the fallback path unreachable and can silently drop Slack responses whenever the adapter client is missing/uninitialized or markdown posting fails. Populate target (or resolve one in completion) so the documented fallback actually works.
Useful? React with 👍 / 👎.
b0c72ef to
83a780a
Compare
…al lifecycle, deployment coalescing Bundled follow-up work on top of 283c66a (declared vs user auth split). Each theme is independent and fixes a concrete user-facing bug; kept as one PR because they share the same test surface and landed together during hardening. ## Owletto MCP global server sync - core-services: mirror the resolved `MEMORY_URL` env var into `McpConfigService` as a global `owletto` streamable-http server at startup AND on lobu.toml reload. Without this, `/mcp/owletto` proxy lookups via `getHttpServer("owletto")` return undefined and the worker-side plugin fails with "MCP server 'owletto' not found". - config-service: new `upsertGlobalServer()` (replace-on-write, vs the existing `registerGlobalServers()` which is register-once). Also switch the per-agent MCP lookup from `getSettings` → `getEffectiveSettings` so MCP servers declared in lobu.toml (via the declared-agent registry introduced in 283c66a) actually resolve. - Intentionally omit `oauth: {}` on the owletto entry: worker-side `owletto_login` owns the device-code flow, adding it would trigger a parallel gateway OAuth/PKCE discovery and surface two competing login links. ## Tool-approval lifecycle fixes - Bump `PENDING_TOOL_TTL` 5 min → 24 h (proxy.ts + in-memory card timer in interaction-bridge.ts). Tool-approval cards sit in Slack for hours; 5 min silently expired real clicks. - cli/gateway.ts `approveToolCall`: switch GET-then-DEL to atomic GETDEL so a double-POST to /api/v1/agents/approve can't double-execute the tool. Matches the interaction-bridge path. - Capture `channelId / conversationId / teamId / connectionId` on the pending-tool Redis record so approval results route to the conversation the tool call originated from, not the thread of the clicked card (which can differ on Slack). - Expired-card UX: when GETDEL returns null but the in-memory approval card is still tracked, edit the card and post a "this approval expired" notice instead of silently no-op'ing. Real retries still no-op silently (no tracked card). ## Slack markdown rendering (chat-response-bridge.ts) - Skip the SDK streaming path for Slack; buffer deltas, then at completion post a single chunked `chat.postMessage` with `markdown_text`. Fixes (a) mid-line splits from the Slack streaming API's fixed-size chunking and (b) `<`/`>`/`&` HTML-escaping that `text` field inflicts on code snippets. - Split on paragraph boundaries (`\n\n`), then line boundaries, then hard-slice, with an 11,000-char per-post margin under Slack's 12k limit. - Pre-decode HTML entities (worker may have produced them upstream from MCP tool results) and strip empty markdown links `[text]()` (some tools emit these as unlinked citation footnotes). - Resolve the SDK fallback target during delta setup so a missing Slack adapter at completion still delivers via `target.post()` instead of silently dropping. ## Slack session behavior - New `SlackInstructionProvider` contributes Slack bot identity (`@username` / `U0123ABC`) through the `InstructionProvider` interface so the model knows its own Slack handle and doesn't treat its own ID as a stranger. Auto-registered for Slack platforms; `gateway-main.ts` now eagerly registers providers for platforms added after `start()`. - chat-instance-manager: mirror `adapter.botUserId` onto `connection.metadata` alongside `botUsername` so the mention strip and the instruction provider can find it. - message-handler-bridge: strip `<@uxxx>` and bare `@Uxxx` bot-mention tokens Slack delivers raw. Collapse to canonical `thread.id` for DM thread replies (previously fell back to channelId, so replies to bot messages inside a DM thread landed in the main DM pane). - Thread-history backfill: on first mention in a group thread, fetch up to 50 prior messages via `adapter.fetchMessages` (Slack → conversations.replies) and append to conversation history, so mid-thread mentions have context. Atomic one-shot-per-thread guard via new `ConversationStateStore.claimThreadBackfill` / `releaseThreadBackfill` (Redis SETNX + 24h TTL, release on failure so transient errors can retry). ## Deployment ensure coalescing - `BaseDeploymentManager`: rename abstract `createDeployment` → `spawnDeployment` (protected) and add a public `ensureDeployment` wrapper with an in-flight Promise map. Concurrent calls for the same `deploymentName` within one gateway process share a single spawn; cross-process concurrency is handled by the orchestrator's atomic name uniqueness (Docker 409, K8s 409, embedded workers.has check). - docker-deployment, k8s/deployment: treat 409 AlreadyExists as benign success — inspect/start an existing container, no-op for k8s. - embedded-deployment: short-circuit on `workers.has(deploymentName)`. ## Smaller items - config/index.ts `loadEnvFile`: pass `override: true` to dotenv so `.env` wins over stale shell exports. Matches the single-source-of- truth contract introduced by PR #209 (compose no longer re-exports `DEPLOYMENT_MODE`). Production skips loadEnvFile entirely, so production deploys are unaffected. - config/index.ts `isPluginInstalled`: walk up parent directories for the `node_modules/<pkg>` folder, falling back from `require.resolve()`. Fixes detection of ESM-only packages whose `exports` map omits a `require` condition (e.g. @lobu/owletto-openclaw). - routes/public/connections.ts: bump webhook-processing log from debug → info so Slack/Telegram inbound traffic is visible in prod without flipping LOG_LEVEL. - worker/src/openclaw/worker.ts: drop the proactive `owletto_login` call at worker startup. Now that the gateway registers the owletto MCP server correctly (theme 1), the model can call `owletto_login` lazily when a memory tool first returns an auth error — more natural than pushing an unsolicited login link at every session start. - worker/src/gateway/sse-client.ts: signature updates matching the gateway-side changes above. ## Tests (+~1,000 LOC) Expanded coverage: base-deployment-grants (ensureDeployment coalescing, spawnDeployment rename), chat-response-bridge (Slack markdown buffer-and-post path), docker-deployment (409 handling), embedded- deployment (workers.has short-circuit), interaction-bridge action handlers (expired-card UX), mcp-config-service (upsertGlobalServer), message-handler-bridge (backfill, mention strip, DM-thread conversationId).
83a780a to
75cb4c8
Compare
|
Addressed both Codex comments in the latest push (75cb4c8): P1 (sse-client.ts:511 — per-user isolation): Fixed. You're right — Restored the P2 (chat-response-bridge.ts:363 — Slack fallback target): Fixed. Same fix I flagged during my self-review in the previous round: now resolving the SDK |
…r-channel by design
|
Correction on the P1 (sse-client.ts): I've reverted the userId equality check in 834d665. Channels/threads are intentionally one-worker-per- The underlying concern — |
Picks up: deploy: add Postgres PITR recovery runbook + seal summaries-db-app (#210) Before: 83c9a09 After: bf904a7
Bundle of follow-up hardening on top of #207 (declared vs user auth split). Each theme is independent and fixes a concrete user-facing bug; kept as one PR because they share the same test surface and landed together during hardening.
Theme A — Owletto MCP global server sync
Files:
services/core-services.ts,auth/mcp/config-service.tsMEMORY_URLenv var intoMcpConfigServiceas a globalowlettostreamable-http server at startup AND onlobu.tomlreload. Without this,/mcp/owlettoproxy lookups viagetHttpServer(\"owletto\")return undefined and the worker-side plugin fails with "MCP server 'owletto' not found".upsertGlobalServer()— replace-on-write (vs existingregisterGlobalServers()which is register-once).getSettings→getEffectiveSettingsso MCP servers declared inlobu.toml(via the declared-agent registry from refactor(gateway): split declared agent config from user-scoped auth state #207) actually resolve.oauth: {}on the owletto entry: worker-sideowletto_loginowns the device-code flow; adding oauth would trigger a parallel gateway OAuth/PKCE discovery and surface two competing login links.Theme B — Tool-approval lifecycle fixes
Files:
auth/mcp/proxy.ts,cli/gateway.ts,connections/interaction-bridge.tsPENDING_TOOL_TTL5 min → 24 h (proxy + in-memory card timer in interaction-bridge). Tool-approval cards sit in Slack for hours; 5 min silently expired real clicks.cli/gateway.ts approveToolCall: switch GET-then-DEL to atomic GETDEL so a double-POST to/api/v1/agents/approvecan't double-execute the tool. Matches the interaction-bridge path.channelId / conversationId / teamId / connectionIdon the pending-tool Redis record so approval results route to the conversation the tool call originated from, not the thread of the clicked card (which can differ on Slack).Theme C — Slack markdown rendering
Files:
connections/chat-response-bridge.ts(+229 LOC)chat.postMessagewithmarkdown_text. Fixes (a) mid-line splits from the Slack streaming API's fixed-size chunking and (b)</>/&HTML-escaping that thetextfield inflicts on code snippets.\n\n), then line boundaries, then hard-slice, with an 11,000-char per-post margin under Slack's 12k limit.[text]()(some tools emit these as unlinked citation footnotes).targetduring delta setup so a missing Slack adapter at completion still delivers viatarget.post()instead of silently dropping.Theme D — Slack session behavior
Files:
connections/{slack-instruction-provider.ts (new), chat-instance-manager.ts, message-handler-bridge.ts, conversation-state-store.ts},gateway-main.tsSlackInstructionProvidercontributes Slack bot identity (@username/U0123ABC) through theInstructionProviderinterface so the model knows its own Slack handle and doesn't treat its own ID as a stranger. Auto-registered for Slack platforms;gateway-main.tsnow eagerly registers providers for platforms added afterstart().chat-instance-manager: mirroradapter.botUserIdontoconnection.metadataalongsidebotUsernameso mention-strip and the instruction provider can find it.message-handler-bridge: strip<@Uxxx>and bare@Uxxxbot-mention tokens Slack delivers raw. Collapse to canonicalthread.idfor DM thread replies (previously fell back tochannelId, so replies to bot messages inside a DM thread landed in the main DM pane).adapter.fetchMessages(Slack →conversations.replies) and append to conversation history, so mid-thread mentions have context. Atomic one-shot-per-thread guard via newConversationStateStore.claimThreadBackfill/releaseThreadBackfill(Redis SETNX + 24h TTL, release on failure so transient errors can retry).Theme E — Deployment ensure coalescing
Files:
orchestration/base-deployment-manager.ts,orchestration/impl/{docker,embedded,k8s}-deployment.tsBaseDeploymentManager: rename abstractcreateDeployment→spawnDeployment(protected) and add publicensureDeploymentwrapper with an in-flight Promise map. Concurrent calls for the samedeploymentNamewithin one gateway process share a single spawn; cross-process concurrency is handled by the orchestrator's atomic name uniqueness (Docker 409, K8s 409, embeddedworkers.hascheck).docker-deployment,k8s/deployment: treat 409 AlreadyExists as benign success — inspect/start existing container for Docker, no-op for K8s.embedded-deployment: short-circuit onworkers.has(deploymentName).Smaller items (not a theme, bundled for proximity)
config/index.ts loadEnvFile: passoverride: trueto dotenv so.envwins over stale shell exports. Matches the single-source-of-truth contract introduced by chore(dev): simplify compose DEPLOYMENT_MODE and opt-in OpenAI for careops #209 (compose no longer re-exportsDEPLOYMENT_MODE). Production (NODE_ENV=production) skipsloadEnvFileentirely, so prod deploys are unaffected.config/index.ts isPluginInstalled: walk up parent directories for thenode_modules/<pkg>folder, falling back fromrequire.resolve(). Fixes detection of ESM-only packages whoseexportsmap omits arequirecondition (e.g.@lobu/owletto-openclaw).routes/public/connections.ts: bump webhook-processing logdebug→infoso Slack/Telegram inbound traffic is visible in prod without flippingLOG_LEVEL.worker/src/openclaw/worker.ts(-62 LOC): drop the proactiveowletto_logincall at worker startup. Now that the gateway registers the owletto MCP server correctly (Theme A), the model can callowletto_loginlazily when a memory tool first returns an auth error — more natural than pushing an unsolicited login link at every session start.worker/src/gateway/sse-client.ts: signature updates matching the gateway-side changes above.Tests (+~1,000 LOC)
Expanded coverage:
base-deployment-grants: ensureDeployment coalescing, spawnDeployment rename.chat-response-bridge: Slack markdown buffer-and-post path.docker-deployment: 409 handling.embedded-deployment: workers.has short-circuit.interaction-bridge-action-handlers: expired-card UX.mcp-config-service: upsertGlobalServer semantics.message-handler-bridge: backfill, mention strip, DM-thread conversationId.Test plan
make build-packagespasses (typecheck clean locally).bun test packages/gateway/srcgreen.make devwith a careops agent configured vialobu.toml— agent loads, Owletto MCP calls reach memory (no "MCP server 'owletto' not found").<@Uxxx>mention → bot responds in-thread with markdown rendering (no HTML entities visible for code blocks), replies to bot DM messages stay in the DM thread, tool-approval card expired after 24h+ shows the expiry notice.