fix(backend): close concurrent-apply races on agent + connection upsert by buremba · Pull Request #466 · lobu-ai/lobu

buremba · 2026-04-30T02:26:49Z

Summary

Fixes the check-then-write races pi flagged in the merged lobu apply v1:

POST /agents: same-org duplicate path was check-then-write. Two concurrent POSTs could both pass the existence check then both run saveMetadata, overwriting metadata + resetting the auto-injected Owletto MCP server. Now uses atomic ON CONFLICT DO NOTHING; only the actual insert path runs the MCP injection.
PUT /connections/by-stable-id/:stableId: create branch had the same race. Now atomic; concurrent creates serialize cleanly into one create + one noop/update.

Test plan

New concurrent-POST test: 2x parallel calls -> exactly 1x201, 1x200, 1 DB row
New concurrent-PUT test: 2x parallel calls (identical config) -> 1xcreated, 1xnoop/updated, 1 DB row
All existing tests still pass

Out of scope

The hardening branch is fixing the noop comparison (settings field). No overlap.
No org-level apply lock (deferred — needed if multi-resource transactional batch is ever introduced).

Known follow-up

Pi self-review summarized the patch correctly with no substantive bugs flagged but rate-limited before issuing an explicit SHIP/FIX verdict. Re-run pi when quota refreshes if a formal verdict is required.

github-actions · 2026-04-30T02:29:06Z

Triage decision: needs-human

Reasons:

PR size exceeds auto-merge threshold (416 lines, 2 files; limits 300 lines, 10 files)

Next: Manual merge required after final review due to size threshold. PR is otherwise well-formed with good test coverage for the concurrent race fix.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 32c0b340e4

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-30T02:31:59Z

+      // We lost the race — someone else inserted the row between our read and
+      // our INSERT. Re-read and fall through to the update path so concurrent
+      // PUTs with the same config converge on a single noop response.
+      const reread = await connectionStore.getConnection(stableId);
+      if (!reread) {


Block loser-path updates until create outcome is known

When a concurrent PUT loses the ON CONFLICT DO NOTHING claim, this branch immediately re-reads and falls into the normal update path against the placeholder row. In that path, chatManager.updateConnection() can return 200 without running create-time startup/validation (because the placeholder is stopped), so a malformed or non-startable config can be reported as success while the winning request later fails and deletes the row. This creates false-positive convergence for concurrent applies on new stable IDs.

Useful? React with 👍 / 👎.

POST /agents was check-then-write: two concurrent applies in the same org could both pass the existence check, both run saveMetadata, and both run the Owletto MCP auto-injection — clobbering operator-set mcpServers. Use INSERT ... ON CONFLICT (id) DO NOTHING RETURNING so only the actual inserter runs the auto-injection; losers fall through to the idempotent 200 path. PUT /:agentId/connections/by-stable-id/:stableId had the same race in its create branch: two concurrent PUTs both observed `existing===null`, both called chatManager.addConnection (or the fallback persist) with the same stableId, and the second clobbered the first via ON CONFLICT DO UPDATE plus spawned a duplicate platform instance. Atomically claim the row with INSERT ... ON CONFLICT DO NOTHING; on conflict, re-read and fall through to the existing update path (which keeps the hardening branch's settingsChanged + allowGroups defaults intact) so concurrent identical PUTs converge on a single noop or update response.

github-actions · 2026-04-30T03:04:12Z

Triage decision: needs-human

Reasons:

Review comment contains escalation keyword "P1" from trusted bot reviewer
PR size exceeds auto-merge threshold (435 lines; limit 300 lines)
Concurrent race condition issue flagged for human judgment on update path safety

Next: Human review required for P1-flagged concurrency issue. Bot identified potential false-positive convergence risk in loser-path updates that needs design review.

Pi flagged two follow-ups on PR #466: 1. POST /agents — INSERT and saveSettings(mcpServers) ran as two writes, so a concurrent loser returning 200 in the idempotent branch could PATCH operator config in between, only for the winner's deferred saveSettings to clobber it. Fold mcp_servers into the same INSERT so row + auto-injection land atomically. 2. PUT /:agentId/platforms/by-stable-id/:stableId — the PR-466 atomic-claim works for the no-manager fallback, but with the real ChatInstanceManager a "loser" could re-read the just-created row mid-addConnection and call updateConnection against half-initialized state (potentially double-spawning the chat instance). Add an in-process per-stableId Promise chain (primary serialization for embedded-only deployment) plus a short-lived pg_advisory_xact_lock keyed on (NAMESPACE, hashStableId(stableId)) for cross-process defense-in-depth and pg_locks visibility. Tests: 20-iteration concurrent POST asserts mcpServers.owletto.url is populated regardless of which handler ran the INSERT; concurrent PUT with a counting mock manager asserts addConnection is called exactly once.

* refactor(cli): merge owletto bin into lobu memory namespace Consolidates the standalone `owletto` CLI into `@lobu/cli` as the `lobu memory` namespace and deletes the entire `packages/owletto-cli` package. The merge: - Drops the `owletto` binary entirely (no compat wrapper). - Drops `owletto dev` (docker-compose, dead — there's no compose file). - Drops `owletto start` and `owletto version` (lobu run + lobu --version cover them; one boot path: the embedded backend bundle). - Folds 13 commands into `lobu memory <verb>`: seed, run, login, token, health, configure, browser-auth, skills list/add, init, org current/set. - Adds top-level `lobu doctor` (system deps + memory MCP reachability; folds owletto's `doctor` and `health` checks). - `lobu init` now wires the memory plugin config inline at the end of scaffold when memory is enabled (no separate `owletto init` step). Code moves: - packages/owletto-cli/src/lib/* → packages/cli/src/commands/memory/_lib/ - packages/owletto-cli/src/commands/{openclaw,seed,browser-auth}.ts → _lib/*-cmd.ts (citty wrappers stripped, citty defineCommand exports replaced by plain async functions consumed by thin commander handlers) - skills/owletto and skills/owletto-openclaw now bundle through cli's build.cjs alongside skills/lobu (one bundled-skills directory). Build / CI / release-please cleanup: - Drop packages/owletto-cli from workspace globs, tsconfig excludes, biome ignores, release-please-config.json, .release-please-manifest, publish-packages.mjs, bump-version.mjs, knip workspace overrides. - ci.yml: stop running `bun test packages/owletto-cli` and stop typechecking it in the matrix. - packages/cli/package.json: pull in @clack/prompts, @lobu/owletto-sdk, playwright (formerly transitive via owletto-cli). - README + landing site (rename reference/owletto-cli.md → reference/lobu-memory.md, update astro nav, sidebar links, internal anchors) + skills/owletto/references: replace `npx owletto@latest` with `npx @lobu/cli@latest memory`. - docs/RELEASING.md: drop the unscoped `owletto` package; six → five. - src/index.ts (lobu admin) test that scans owletto-cli for callTool refs now points at packages/cli/src/commands/memory/. Validation: - make build-packages clean. - bun run typecheck clean. - bun run check (biome) clean across 281 files. - bun test packages/owletto-backend/src/auth/__tests__/tool-access.test.ts: 47 pass (the cli-source walker correctly resolves the new memory dir). - lobu --help, lobu memory --help, lobu memory token --help, lobu memory init --help all render the new shape. * fix(cli): consolidate memory auth and skills into lobu * refactor(cli): drop dead memory-token helpers and restore doctor server ping - Remove unused `printMemoryToken`, `TokenOptions`, and `resolveConfiguredOrgUrl` from memory/_lib/openclaw-cmd.ts left behind after the auth consolidation moved memory token retrieval to top-level `lobu token`. - Restore the server-URL connectivity check in `lobu doctor` (default mode) that used to ship as part of `owletto doctor`. `--memory-only` still shortcuts to authenticated MCP validation. - Update plan doc references from `lobu memory token` to the actual unified surface `lobu token --raw`. * feat(backend): persist egressConfig + preApprovedTools + guardrails on agents (#460) * docs(plans): lobu apply — file → cloud converger plan Three-PR rollout for lobu apply, the file-first → cloud converger. Reuse-first design: existing /agents and /manage_entity_schema routes are mostly idempotent already; only one upsert route is added. CLI loops over them in dependency order. Re-running converges on partial failure. Defers: --prune, --force, lobu pull, watchers, raw SKILL.md round-trip, org-scoped agent IDs, secret upload, memory data — all v2/v3. * feat(backend): persist egressConfig + preApprovedTools + guardrails on agents The file-loader produces these three settings fields from lobu.toml but the agents table had no columns for them, so postgres-stores.ts silently dropped them on every saveSettings(). This blocks `lobu apply`: cloud would never accept the values local agents already have, leaving every re-run as drift. Adds the three jsonb columns (with empty defaults) and round-trips them through rowToSettings / saveSettings / deleteSettings. New unit tests cover populated, absent, and post-delete shapes against the test DB. PR-1 of docs/plans/lobu-apply.md. * feat(backend): idempotent agent create + stable-id connection upsert (#461) PR-2 of `docs/plans/lobu-apply.md`. Two narrow extensions to `lobu/agent-routes.ts` so `lobu apply` can converge by re-running: 1. `POST /` with an agent that already exists in the same org now returns 200 with the existing payload instead of 409. Cross-org collision still returns 409. The Owletto-MCP auto-injection only runs on the actual create path, never on the idempotent return — so re-applying never clobbers operator-edited `mcpServers`. 2. New `PUT /:agentId/connections/by-stable-id/:stableId` for deterministic-ID upsert. Reuses `ChatInstanceManager.addConnection` (which already accepts a caller-supplied `stableId`) for create, `ChatInstanceManager.updateConnection` for change. Returns `{ noop: true, connection }` when submitted config matches existing, `{ updated: true, willRestart: true, connection }` when it changes, 201 with the connection on first create. Tests cover the 200/409 shapes, the no-clobber MCP guarantee, the noop/updated/create branches of the upsert, and the unknown-agent 404. Test harness mocks `mcpAuth` + `getChatInstanceManager` so the routes exercise the no-manager fallback path against a pglite Postgres. * feat(cli): lobu apply — files → org converger (#462) * feat(cli): lobu apply — files → org converger PR-3 of docs/plans/lobu-apply.md. Implements `lobu apply`: - Parses lobu.toml via existing cli/src/config/loader.ts:loadConfig - Walks $VAR refs to enforce required-secrets check before any mutation - Computes per-resource diff client-side (create/update/noop/drift) - Loops over existing CRUD endpoints in dependency order: agents → settings → connections → entity types → relationship types - Re-runs converge on partial failure (every endpoint is idempotent) CLI surface: lobu apply [--dry-run] [--yes] [--only agents|memory] [--org <slug>] Reuses _lib/openclaw-auth.ts (PR #459) for token + URL resolution. Inlines buildStableConnectionId from owletto-backend's file-loader.ts:56 with a sync-comment — keep matching when either side changes. Snapshot tests cover diff rendering for create/update/noop/drift and the will-restart warning. End-to-end test runs once PR-1 + PR-2 land. * fix(cli): apply — substitute $VAR refs and align connection wire shape Two showstopper bugs caught by pi review before integration test: 1. Connection upsert was sending `type` but PR-2's `PUT /by-stable-id` route expects `platform`. Server would 400 on every connection apply. Aligns the client and apply-cmd payloads to `{ platform, name?, config }`. 2. `$VAR` references in connection configs were collected for the secrets check but never substituted. Apply would send literal `"botToken": "$TELEGRAM_BOT_TOKEN"` to the server. Now resolves against `process.env` (or an injected env for tests) and fails loudly with the var name when missing or empty. Adds `env` option to `loadDesiredState` for testability. Test coverage extended: existing test asserts the resolved value is in the desired config; new test asserts unset $VAR throws a clear error. Validation: typecheck, biome, 27 cli tests — all clean. * docs(plans): lobu secrets push — security-conscious design (#465) * fix(backend): align connection-config encrypt/decrypt prefix (#464) encryptConfig() in postgres-stores.ts wrote raw `iv:tag:ciphertext` output from @lobu/core's `encrypt()`, but decryptConfig() only decrypts strings prefixed with `enc:v1:`. So any secret-named field that ran through encryptConfig was stored as prefixless ciphertext and round-tripped as that ciphertext literal on read. Production was unaffected because ChatInstanceManager normalizes secrets to `secret://` refs before persisting to chat_connections. The agent_connections.config rows written by persistConnectionSnapshot on the older lobu agent-routes paths (and the no-manager fallback added in PR #461) are the affected surface. Fix: - encryptConfig prepends `enc:v1:` to the AES-GCM output. - decryptConfig strips the prefix before delegating to `decrypt()`. - Migration 20260430022231 backfills agent_connections.config rows whose values match the prefixless `iv:tag:ciphertext` shape (24 hex IV + 32 hex tag + hex ciphertext) by re-prefixing them. Idempotent — re-running is a noop because the regex skips already-prefixed values. Adds round-trip + boundary tests in packages/owletto-backend/src/lobu/stores/__tests__/postgres-stores.test.ts. * feat: lobu apply hardening + e2e harness (#468) Three things on top of the merged lobu apply v1: Bug A — PUT /agents/:id/connections/by-stable-id/:stableId now folds `settings` (allowFrom, allowGroups, …) into the noop comparison so settings-only edits trip willRestart instead of being silently noop'd. Symmetric default `allowGroups: true` on the create-fallback path keeps follow-up PUTs round-tripping as noop. Bug B — `diff.canonical` no longer collapses `[]` and `{}` to null; clearing a remote allowlist by setting it to `[]` now produces an `update` action. The `platform` key the route injects into stored connection config is stripped before comparing so an unchanged connection round-trips as `=`. Bootstrap PAT — `start-local.ts` mints a default user / org `dev` / owner-scoped PAT on first boot under `LOBU_LOCAL_BOOTSTRAP=true`, prints it once with the org slug + URL, and persists it under `${OWLETTO_DATA_DIR}/bootstrap-pat.txt`. Production never sets the flag, so the path is dead in cloud. Required collateral: `multi-tenant.ts` now hydrates `c.var.user` for PAT bearers so REST routes that read `c.get('user')` (e.g. POST /agents) work the same as for cli-tokens; the asymmetry was previously masked by the test mock setting `user` directly. CLI client.ts dropped the trailing slash on POST /api/:org/agents — Hono `app.route()` does not match `path/` against `routes.post('/')`. `scripts/e2e-lobu-apply.sh` exercises create → noop → update → drift detect against PGlite end-to-end, asserting REST rows landed. Self-cleans server, data dir, and project dir; tears down on any failure with a tail of the server log. Out of scope: concurrent-apply lock, connection encryption asymmetry, `lobu pull` (v2), secrets push (v3). * fix(backend): close concurrent-apply races on agent + connection upsert (#466) POST /agents was check-then-write: two concurrent applies in the same org could both pass the existence check, both run saveMetadata, and both run the Owletto MCP auto-injection — clobbering operator-set mcpServers. Use INSERT ... ON CONFLICT (id) DO NOTHING RETURNING so only the actual inserter runs the auto-injection; losers fall through to the idempotent 200 path. PUT /:agentId/connections/by-stable-id/:stableId had the same race in its create branch: two concurrent PUTs both observed `existing===null`, both called chatManager.addConnection (or the fallback persist) with the same stableId, and the second clobbered the first via ON CONFLICT DO UPDATE plus spawned a duplicate platform instance. Atomically claim the row with INSERT ... ON CONFLICT DO NOTHING; on conflict, re-read and fall through to the existing update path (which keeps the hardening branch's settingsChanged + allowGroups defaults intact) so concurrent identical PUTs converge on a single noop or update response. * docs(plans): lobu pull — cloud → files round-trip design (#463) * refactor: rename connections to platforms (CLI + TOML + server route) (#470) User-facing rename across the chat-platform CRUD surface: - `lobu connections (list|add)` → `lobu platforms (list|add)`; description reads "Manage chat platforms". - `packages/cli/src/commands/connections/` → `packages/cli/src/commands/platforms/`, with `platforms.ts` → `platform-prompts.ts` to avoid name clash with the command file. - `lobu.toml` schema: `[[agents.<id>.connections]]` → `[[agents.<id>.platforms]]`. The file-loader emits a clear migration error if the legacy key is encountered, pointing the user at the new block. No backwards-compat alias. - Server routes under `/api/:org/agents/:id/` rename `connections` → `platforms` for list / create / get / delete / start / stop / by-stable-id upsert. Body shape (`platform` field) and DB internals (`agent_connections` table, `ChatInstanceManager`) stay as-is — internal naming. - `lobu apply` plumbing: `DesiredConnection` → `DesiredPlatform`, `buildStableConnectionId` → `buildStablePlatformId`, `rowsByKind('connection')` → `rowsByKind('platform')`, plan renderer labels, snapshots, e2e script. - Docs sweep: `packages/landing/src/content/docs/{reference/lobu-toml.md, reference/lobu-apply.md, platforms/*.mdx}`, `docs/plans/lobu-apply.md`, `lobu init` wizard scaffolding + generated `lobu.toml`. Owletto-web submodule rename is a separate PR (the `useAgentConnections` hook collides with an existing `useAgentPlatforms` schema-catalog hook there). Tests: existing tests updated to use new field/route names + a new `file-loader-platforms.test.ts` pinning the migration-error path. * fix(backend): bootstrap PAT mode 0600 + PAT auth source disambiguation (#472) * fix(backend): also gate platform start/stop routes with requireSessionOrAdminPat (#473) Pi follow-up to #472: POST /:agentId/platforms/:platformId/start and .../stop were the last admin-tier mutations on the bare mcpAuth gate. A weak PAT (mcp:read or mcp:read+mcp:write but no mcp:admin) could bounce a connection's lifecycle. Add the helper + 2 admission tests. Also fixes a stale path in the existing PUT /:agentId/connections/by-stable-id admission test — #470 renamed the route to /platforms/by-stable-id but the admission test added in #472 still used the old path and only passed by landing in the 404 branch (now correctly 403 via the renamed path). * fix(backend): close residual races in agent + connection upsert (#474) Pi flagged two follow-ups on PR #466: 1. POST /agents — INSERT and saveSettings(mcpServers) ran as two writes, so a concurrent loser returning 200 in the idempotent branch could PATCH operator config in between, only for the winner's deferred saveSettings to clobber it. Fold mcp_servers into the same INSERT so row + auto-injection land atomically. 2. PUT /:agentId/platforms/by-stable-id/:stableId — the PR-466 atomic-claim works for the no-manager fallback, but with the real ChatInstanceManager a "loser" could re-read the just-created row mid-addConnection and call updateConnection against half-initialized state (potentially double-spawning the chat instance). Add an in-process per-stableId Promise chain (primary serialization for embedded-only deployment) plus a short-lived pg_advisory_xact_lock keyed on (NAMESPACE, hashStableId(stableId)) for cross-process defense-in-depth and pg_locks visibility. Tests: 20-iteration concurrent POST asserts mcpServers.owletto.url is populated regardless of which handler ran the INSERT; concurrent PUT with a counting mock manager asserts addConnection is called exactly once. * fix(cli): address apply review gaps * fix(cli): satisfy CodeQL token and regex checks * chore(cli): remove unused memory helpers * chore(submodule): bump owletto-web to current main * fix(backend): align platform-update fallback settings to manager semantics Fallback path was merging current.settings with the request, so apply could not drop fields like allowFrom — diverging from both the manager path (line 1079) and the change-detection at line 1068 which model replace semantics. Use { allowGroups: true, ...settings } to match.

github-actions Bot added the triage:needs-human Triage agent escalated for human review label Apr 30, 2026

github-actions Bot assigned buremba Apr 30, 2026

chatgpt-codex-connector Bot reviewed Apr 30, 2026

View reviewed changes

buremba force-pushed the feat/concurrent-apply-race-fix branch from 32c0b34 to de85b27 Compare April 30, 2026 03:02

buremba merged commit 3d5993c into feat/owletto-cli-merge Apr 30, 2026
5 checks passed

buremba deleted the feat/concurrent-apply-race-fix branch April 30, 2026 03:06

buremba mentioned this pull request Apr 30, 2026

fix(backend): close residual races in agent + connection upsert #474

Merged

6 tasks

buremba restored the feat/concurrent-apply-race-fix branch May 12, 2026 00:22

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(backend): close concurrent-apply races on agent + connection upsert#466

fix(backend): close concurrent-apply races on agent + connection upsert#466
buremba merged 1 commit into
feat/owletto-cli-mergefrom
feat/concurrent-apply-race-fix

buremba commented Apr 30, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Apr 30, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Apr 30, 2026

Uh oh!

github-actions Bot commented Apr 30, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

buremba commented Apr 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Out of scope

Known follow-up

Uh oh!

github-actions Bot commented Apr 30, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Apr 30, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented Apr 30, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

buremba commented Apr 30, 2026 •

edited

Loading