Skip to content

fix(memory): guard Qdrant singleton against non-default port overwrite#27459

Merged
dvargasfuertes merged 2 commits into
mainfrom
apollo/fix-qdrant-singleton-overwrite
Apr 22, 2026
Merged

fix(memory): guard Qdrant singleton against non-default port overwrite#27459
dvargasfuertes merged 2 commits into
mainfrom
apollo/fix-qdrant-singleton-overwrite

Conversation

@vellum-apollo-bot
Copy link
Copy Markdown
Contributor

@vellum-apollo-bot vellum-apollo-bot Bot commented Apr 22, 2026

Summary

The daemon's initQdrantClient singleton was being silently clobbered by background memory jobs — notably bootstrapFromHistory (triggered by the graph_bootstrap memory job) — that re-initialized against the schema-default URL http://127.0.0.1:6333.

When the CLI spawned the daemon with a non-default port via QDRANT_HTTP_PORT (e.g. 20200 for multi-local instances), every subsequent getQdrantClient() call — PKB reconciliation, all memory jobs — pointed at the wrong Qdrant instance and burned ConnectionRefused errors against the schema default.

Root cause

  1. initQdrantClient unconditionally overwrote the process-wide singleton on every call.
  2. bootstrapFromHistory and graph/inspect.ts re-initialized the client using config.memory.qdrant.url ?? "http://127.0.0.1:6333" — no env precedence, no fallback awareness.
  3. Both of those call sites wrapped the re-init in try { … } catch {}, so even if a type error had surfaced the fallback would have been invisible.

Fix

  • resolveQdrantUrl(config) — single authoritative helper with precedence: QDRANT_HTTP_PORTQDRANT_URLconfig.memory.qdrant.url. Every caller routes through it.
  • Idempotent initQdrantClient — if the singleton already exists, return it; log a warning on URL mismatch instead of replacing. Callers that genuinely need a new client must call resetQdrantClient() first (tests only).
  • resetQdrantClient() — explicit reset for tests.
  • Removed the silent-swallow try/catch around re-init in bootstrap.ts and inspect.ts.
  • Updated all call sites: daemon/lifecycle.ts, memory/graph/bootstrap.ts, memory/graph/inspect.ts, cli/commands/conversations.ts.

Tests

New qdrant-client.test.ts covers:

  • resolveQdrantUrl precedence (env > env > config, non-default config respected)
  • initQdrantClient idempotency — same URL returns same instance
  • initQdrantClient different-URL call is a no-op (regression test for the overwrite bug)
  • resetQdrantClient allows re-init with a new URL, and getQdrantClient() throws after reset without re-init

Checklist

  • tsc --noEmit clean
  • eslint clean
  • CI

Open in Devin Review

devin-ai-integration[bot]

This comment was marked as resolved.

* Logs a warning if a second init attempts a different URL so accidental
* divergence is visible in logs.
*/
export function initQdrantClient(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's focus this PR on just centralizing resolveQdrantUrl in the consumer sites instead of making changes to the qdrant initializer

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Scoped down — reverted the idempotency, _instanceConfig, and resetQdrantClient bits. initQdrantClient is back to unconditionally overwriting the singleton. The PR is now just: add resolveQdrantUrl and route every consumer site through it.

// Initialize DB and Qdrant before anything else
initializeDb();
const config = getConfig();
try {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As part of OOS'ing initQdrantClient, keep out try/catch wrapping

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done — restored the try/catch here and in bootstrap.ts as part of OOS-ing the initializer changes.

Comment thread assistant/src/daemon/lifecycle.ts Outdated
Comment on lines +742 to +746
// `resolveQdrantUrl` encapsulates the precedence:
// QDRANT_HTTP_PORT → QDRANT_URL → config.memory.qdrant.url
// Keep every caller routed through it so a background job that reads
// the config directly can never overwrite the daemon's live client
// with a URL pointing at the wrong Qdrant instance.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can remove this comment now that all of the commented logic is in resolveQdrantUrl

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

@@ -247,7 +246,7 @@ Examples:
);

const config = getConfig();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would've expected qdrant.url to be "http://127.0.0.1:20200" in the workspace config if "resources.qdrantPort": 20200 in the lock file

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. The CLI does not currently write memory.qdrant.url from the lock file — cli/src/lib/local.ts sets env.QDRANT_HTTP_PORT when spawning the daemon but never materializes the port into the workspace config. That is why resolveQdrantUrl has to fall back to the env var.

Treating that as a follow-up: the CLI should either (a) update memory.qdrant.url in the workspace config whenever resources.qdrantPort changes, or (b) remove memory.qdrant.url from the static config entirely and make the resolved URL derive solely from allocated resources + env overrides. Option (b) is probably cleaner — the static config default has caused divergence more than once.

Happy to open a separate ticket/PR for that once this centralization lands.

Every site that constructs a Qdrant URL (daemon/lifecycle.ts,
memory/graph/bootstrap.ts, memory/graph/inspect.ts,
cli/commands/conversations.ts) now routes through a single helper
`resolveQdrantUrl(config)` with the canonical precedence:

  1. `QDRANT_HTTP_PORT` — locally-spawned sidecar on 127.0.0.1:<port>
     (set by the CLI for multi-local instances)
  2. `QDRANT_URL` — external Qdrant (K8s sidecar, remote URL)
  3. `config.memory.qdrant.url` — static config default

Previously bootstrap.ts and inspect.ts constructed the URL as
`config.memory.qdrant.url ?? "http://127.0.0.1:6333"` — ignoring
both env vars. When the CLI spawned the daemon on a non-default port
via `QDRANT_HTTP_PORT` (e.g. 20200), the background `graph_bootstrap`
memory job re-initialized the singleton against the schema default,
silently redirecting every subsequent `getQdrantClient()` call away
from the real sidecar.
@vellum-apollo-bot vellum-apollo-bot Bot force-pushed the apollo/fix-qdrant-singleton-overwrite branch from d3706cb to 5fba201 Compare April 22, 2026 14:43
dvargasfuertes
dvargasfuertes previously approved these changes Apr 22, 2026
Ten test files use `mock.module("../qdrant-client.js", ...)` with
explicit export shapes. Adding `resolveQdrantUrl` to the exported
surface broke graph-search.test.ts because its mock didn't declare
the new symbol, and a later module-reload pathway needed it.

Stubs return the schema default (`http://127.0.0.1:6333`); no test
exercises the URL resolution itself, so the value is inert.
@dvargasfuertes dvargasfuertes merged commit b9a2b00 into main Apr 22, 2026
17 of 18 checks passed
@dvargasfuertes dvargasfuertes deleted the apollo/fix-qdrant-singleton-overwrite branch April 22, 2026 16:44
vellum-apollo-bot Bot added a commit that referenced this pull request May 27, 2026
…k.module sites

Bun's `mock.module(...)` fully replaces a module's exports, so any test
that mocks `memory/conversation-crud.js` and transitively loads the
new persistence default plugin (which now imports `reserveMessage`)
crashes at import-time with:

  SyntaxError: Export named 'reserveMessage' not found in module

CI caught `background-workers-disk-pressure.test.ts` on this PR
because its fixture chain loads the persistence plugin. The other 80
mock.module sites don't trigger today but will the moment any of them
gain a transitive import that reaches the persistence pipeline — so
we patch them all up-front, the same way PR #27459 taught us to handle
`resolveQdrantUrl`.

Stub shape mirrors the other primitives (`mock(async () => ({ id: ... }))`)
so callers that `await reserveMessage(...)` see a Promise-shaped return.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant