
QVAC-19900 feat[api]: add managed mode to @qvac/ai-sdk-provider #2408

Merged
simon-iribarren merged 22 commits into tetherto:main from simon-iribarren:feat/qvac-19900-ai-sdk-provider-managed-mode on Jun 10, 2026

@simon-iribarren simon-iribarren commented Jun 3, 2026

Contributor

🎯 What problem does this PR solve?

@qvac/ai-sdk-provider v1 (0.1.0) is external-mode only: the user must hand-author a qvac.config.json and keep qvac serve openai running in a separate terminal before the provider works. This PR adds managed mode (QVAC-19900) so the provider runs qvac serve itself.

The hard part isn't spawning a serve — it's the lifecycle once you do it for real:

  • A naive "spawn on start, kill on close" design orphans serves (a hard crash leaves qvac serve running forever — we hit exactly this in testing) and duplicates them (every session/tool reloads the model into memory).
  • Coding-agent harnesses (OpenCode, Cline, Aider) connect straight to the baseURL, so any "is it still in use?" heuristic based on the provider's own request traffic is blind to them.
  • The serve's default ctx_size of 1024 is far too small for an agent's tool-laden system prompt, and reasoning models spam <think> blocks.

So managed mode is built as a shared, self-cleaning serve daemon that is robust standalone and usable by any tool, not just one in-process session. External mode is left byte-for-byte unchanged.

📝 How does it solve it?

createQvac({ mode: 'managed', models: [...] }) synthesizes an ephemeral qvac.config.json from SDK model-constant names and brings up a serve — but the serve is owned by a detached runner, not by your process:

  • Reuse via a fleet key (src/managed/fleet-key.ts): a hash of { model set + per-model config + host } keys a cross-process registry under ~/.qvac/managed-serves/. createQvac attaches to a matching healthy serve instead of cold-starting a duplicate. Different models/config ⇒ different key ⇒ its own serve.
  • Detached runner owns the serve (src/managed/runner.ts): it spawns qvac serve, publishes the registry record, and reaps the serve once no consumer process has been alive for serveIdleTimeout (default 5 min). Consumer-process liveness, not request traffic, is the signal, so a serve stays up while a registered consumer is alive even when all requests come from tools that hit baseURL directly. The runner survives your script exiting, so a serve can be shared and still cleaned up.
  • close() detaches, it doesn't kill: it deregisters this session as a consumer. A serve still in use by another session keeps running; an unused one is reaped after the idle timeout. Abrupt exits (Ctrl-C/crash) are handled by dead-PID pruning.
  • Non-destructive sweep (src/managed/registry.ts): every start reaps only dead/orphaned serves — never a healthy serve a live runner owns. This fixes the original bug where a second session would SIGKILL a serve another session was still downloading into.
  • Respawn-on-failure: the provider's fetch wrapper transparently re-resolves (reattach or spawn) and retries once on ECONNREFUSED, so a serve crash mid-session self-heals.
  • Per-model config (QvacManagedModel): models entries may be { name, config, preload, default }, so callers set ctx_size / reasoning_budget per model — the agent-friendly setup. Duplicate model names are rejected up front (DuplicateManagedModelError) instead of silently overwriting.
  • Private serves: reuse: false (or pinning servePort) gives a dedicated, non-shared serve reaped as soon as its owner exits.
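The fleet key in the first bullet can be sketched as a hash over a canonicalized description of the serve. This is a minimal sketch, not the code in src/managed/fleet-key.ts: the `FleetModel` shape, the `canonical` and `fleetKey` names, and the choice of SHA-256 are assumptions made for illustration.

```typescript
import { createHash } from 'node:crypto'

// Hypothetical shape: one entry per model the serve should load.
interface FleetModel {
  name: string
  config?: Record<string, string | number | boolean>
}

// Serialize a flat config with its keys sorted, so key order never changes the hash.
function canonical(config: Record<string, string | number | boolean>): string {
  const sortedKeys = Object.keys(config).sort()
  return JSON.stringify(sortedKeys.map((key) => [key, config[key]]))
}

// Same model set + per-model config + host => same key, regardless of list order.
function fleetKey(models: FleetModel[], host: string): string {
  const entries = models
    .map((model) => `${model.name}:${canonical(model.config ?? {})}`)
    .sort()
  return createHash('sha256')
    .update(JSON.stringify({ host, entries }))
    .digest('hex')
}
```

Two sessions that request the same models with the same config on the same host compute the same key and can attach to one serve; changing a `ctx_size` or the host yields a different key and therefore a separate serve.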

src/managed/serve-process.ts holds the low-level spawn / health-poll / SIGTERM→grace→SIGKILL primitives used by the runner. @qvac/cli is an optional peer dependency, resolved via process.execPath + its entry (or an explicit serveBinPath). The whole managed subsystem is lazily dynamic-imported only when mode: 'managed' is set, so external mode stays synchronous and pays nothing. Portable node: APIs only — identical on Node 20+ and Bun.
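The SIGTERM→grace→SIGKILL escalation mentioned above could look roughly like the following. It is a sketch under assumptions, not the contents of serve-process.ts: `stopWithEscalation` and `demo` are hypothetical names, the grace window is arbitrary, and only the direct child is signalled here.

```typescript
import { spawn, type ChildProcess } from 'node:child_process'

// Ask the child to stop, then force-kill it if it is still alive after graceMs.
// Resolves with the signal that ended the child, or null if it exited by itself.
function stopWithEscalation(
  child: ChildProcess,
  graceMs: number
): Promise<NodeJS.Signals | null> {
  return new Promise((resolve) => {
    // Already exited: nothing to signal.
    if (child.exitCode !== null || child.signalCode !== null) {
      resolve(child.signalCode)
      return
    }
    // Fallback for a wedged child that ignores SIGTERM.
    const timer = setTimeout(() => {
      child.kill('SIGKILL')
    }, graceMs)
    child.once('exit', (_code, signal) => {
      clearTimeout(timer)
      resolve(signal)
    })
    child.kill('SIGTERM')
  })
}

// Usage: stop an idle Node child, giving it two seconds to exit gracefully.
async function demo(): Promise<NodeJS.Signals | null> {
  const child = spawn(process.execPath, ['-e', 'setInterval(() => {}, 1000)'], {
    stdio: 'ignore'
  })
  return stopWithEscalation(child, 2_000)
}
```

Sending the graceful signal first gives a healthy process the chance to run its own shutdown; the timer only matters when that shutdown never completes.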

🧪 How was it tested?

  • bun run lint (tsc), bun run build, bun run test — green: 51 pass / 1 skip (the skip is the real-model integration test, gated by QVAC_INTEGRATION_TEST=1).
  • New/updated unit + integration suites:
    • managed-registry — record round-trip, consumer markers + dead-PID pruning, findReusableServe health gating, and a non-destructive sweep that drops dead records, kills runner-orphaned serves, and leaves a healthy owned serve untouched.
    • managed-fleet-key — stable, order-insensitive key; changes with per-model config and host.
    • managed-runner — pure idle-reap decision (consumer liveness + timeout, incl. zero-timeout private serves).
    • managed-serve-process — fake-serve driven: healthy, start-timeout, early-exit/crash, SIGKILL-escalation.
    • managed-provider — end-to-end through the real detached runner: auto-spawn, reuse (second createQvac → same serve pid), and idle-reap (serve gone + registry cleared after the timeout).
    • managed-config — per-model config emission + duplicate-name rejection.
  • Manually verified end-to-end against a real cached Qwen3-8B serve (tasks harness): healthy spawn → real /v1/chat/completions → second session reused the same pid → close() left it running while a consumer remained → idle-reaped (process gone + registry empty) once the last consumer left.
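For readers unfamiliar with the two ideas the managed-registry and managed-runner suites exercise, here is a hedged sketch of a dead-PID probe and a pure idle-reap decision. The names `isPidAlive`, `ReapInput`, and `shouldReap` are illustrative assumptions, not the functions under test in this PR.

```typescript
// Signal 0 runs the existence/permission check without delivering a signal.
function isPidAlive(pid: number): boolean {
  try {
    process.kill(pid, 0)
    return true
  } catch (err) {
    // EPERM means the process exists but belongs to another user.
    return (err as NodeJS.ErrnoException).code === 'EPERM'
  }
}

interface ReapInput {
  consumerPids: number[]
  lastConsumerSeenAt: number // ms timestamp when a live consumer was last observed
  now: number
  idleTimeoutMs: number // 0 models a private serve reaped as soon as its owner exits
}

// Reap only when no consumer is alive AND the idle window has elapsed.
function shouldReap(
  input: ReapInput,
  isAlive: (pid: number) => boolean = isPidAlive
): boolean {
  if (input.consumerPids.some((pid) => isAlive(pid))) return false
  return input.now - input.lastConsumerSeenAt >= input.idleTimeoutMs
}
```

Keeping the decision free of timers and file access is what makes it testable as a plain function: the caller supplies the clock and the liveness probe.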

🔌 API Changes

New mode: 'managed' overload of createQvac (returns a Promise<ManagedQvacProvider>), plus the QvacManagedOptions / QvacManagedModel / ManagedQvacProvider types. External createQvac is unchanged.

import { createQvac } from '@qvac/ai-sdk-provider'
import { generateText } from 'ai'

// Auto-spawns (or REUSES) a shared `qvac serve`; resolves once it's healthy.
const qvac = await createQvac({
  mode: 'managed',
  models: [
    // Per-model serve config — agents need a real ctx window + no <think> blocks.
    { name: 'QWEN3_8B_INST_Q4_K_M', config: { ctx_size: 16384, reasoning_budget: 0 }, default: true }
  ],
  serveIdleTimeout: 300_000 // keep the shared serve 5 min after the last user exits
})

try {
  const { text } = await generateText({
    model: qvac('QWEN3_8B_INST_Q4_K_M'),
    prompt: 'Write a haiku about local-first AI.'
  })
  console.log(text)
} finally {
  await qvac.close() // detaches this session; a shared serve keeps running for others
}

// AsyncDisposable — `await using` handles detach automatically:
// await using qvac = await createQvac({ mode: 'managed', models: ['QWEN3_600M_INST_Q4'] })

QvacManagedOptions (all optional besides mode/models): servePort, serveHost, serveStartTimeout, serveBinPath, reuse (default true; false ⇒ private serve), serveIdleTimeout (default 300000), plus the shared apiKey / headers / fetch. QvacManagedModel: { name, config?, preload?, default? }. ManagedQvacProvider exposes close(), [Symbol.asyncDispose], port, pid, baseURL.
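How `reuse`, `servePort`, and `serveIdleTimeout` interact can be illustrated with a small resolver. This is a sketch of the behaviour described in this PR description, not the provider's code: `ManagedLifecycleOptions`, `resolveLifecycle`, and the zero idle timeout for private serves are assumptions inferred from "reaped as soon as its owner exits".

```typescript
// Hypothetical subset of the managed options; field types are assumptions.
interface ManagedLifecycleOptions {
  servePort?: number
  reuse?: boolean
  serveIdleTimeout?: number
}

interface ResolvedLifecycle {
  shared: boolean
  idleTimeoutMs: number
}

const DEFAULT_IDLE_TIMEOUT_MS = 300_000 // documented default: 5 minutes

function resolveLifecycle(options: ManagedLifecycleOptions): ResolvedLifecycle {
  // reuse: false or a pinned servePort yields a private, non-shared serve.
  const shared = (options.reuse ?? true) && options.servePort === undefined
  return {
    shared,
    // Assumption: a private serve has no idle window at all.
    idleTimeoutMs: shared ? (options.serveIdleTimeout ?? DEFAULT_IDLE_TIMEOUT_MS) : 0
  }
}
```

With no options set, the result is a shared serve kept for five minutes after the last consumer exits; `{ reuse: false }` or a pinned port resolves to a private serve with no idle window.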


Follow-up (separate task): version bump to 0.2.0 + publish via the fork → release branch → main flow.

Add `mode: 'managed'` so the provider can synthesize an ephemeral
qvac.config.json from a model-constant list, spawn and supervise
`qvac serve` on a free port, and tear it down on host exit. External
mode is unchanged and stays synchronous; the managed supervisor is
lazily dynamic-imported so external-mode users pay no startup cost.

@qvac/cli becomes an optional peer dependency.
@simon-iribarren simon-iribarren requested review from a team as code owners June 3, 2026 09:47
@simon-iribarren simon-iribarren added tier1 verified Authorize secrets / label-gate in PR workflows labels Jun 3, 2026
…json (QVAC-19900)

The published @qvac/cli ships a string `exports` field ("./dist/index.js"),
which makes the `./package.json` subpath non-resolvable
(ERR_PACKAGE_PATH_NOT_EXPORTED). Managed mode relied on resolving
`@qvac/cli/package.json` to locate the bin, so it would fail to find the CLI
on a clean install. Fall back to resolving the package main entry, which for
@qvac/cli is the same file as the `qvac` bin.
Comment thread packages/ai-sdk-provider/src/managed/supervisor.ts Outdated
Comment thread packages/ai-sdk-provider/src/managed/config-synthesizer.ts Outdated
Comment thread packages/ai-sdk-provider/src/managed/supervisor.ts Outdated
Managed mode `models` now accepts spec objects ({ name, config, preload,
default }) alongside bare constant names, so callers can set per-model serve
options — notably `ctx_size` and `reasoning_budget` — that coding agents like
OpenCode require. The synthesized qvac.config.json carries the config block,
honors explicit `preload`/`default`, and validates names inside spec objects.

Exports the new `QvacManagedModel` type and documents per-model config plus a
managed-mode OpenCode example in the README.
Rework managed mode from a per-provider supervisor into a shared,
self-cleaning serve daemon so it is robust standalone and usable by any
tool, not just a single session.

- Reuse via a fleet key (model set + per-model config + host) keyed in a
  cross-process registry under ~/.qvac/managed-serves/; createQvac attaches
  to a matching healthy serve instead of cold-starting a duplicate.
- A detached runner owns the qvac serve child and reaps it once no consumer
  process has been alive for serveIdleTimeout (default 5m). Liveness, not
  request traffic, is the signal, so it works for tools that hit baseURL
  directly (OpenCode/Cline/Aider).
- close() now detaches (deregisters the consumer) instead of killing; a
  shared serve survives until its last user is gone.
- Sweep only reaps dead/orphaned serves, never a healthy serve a live
  process owns (fixes a second session SIGKILLing a downloading serve).
- Respawn-on-failure: fetch re-resolves and retries once on ECONNREFUSED.
- reuse:false (or a pinned servePort) yields a private serve reaped as soon
  as its owner exits.

Refactor into serve-process.ts (spawn/health/stop), registry.ts,
fleet-key.ts, runner.ts; remove supervisor.ts and pid-tracker.ts. Add
reuse and serveIdleTimeout options. Rewrite tests and add reuse/idle-reap
end-to-end coverage; document the shared lifecycle in the README.
Each managed model maps to a single serve alias keyed by its name, so a
repeated name silently overwrote the earlier entry — and could drop its
`default: true`. Reject duplicates up front with DuplicateManagedModelError
instead of resolving them ambiguously. Addresses PR review feedback.
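A minimal sketch of the up-front rejection, assuming a normalization step that turns bare names into spec objects. Only the `DuplicateManagedModelError` name comes from this PR; the `ManagedModel` shape, the error message, and `normalizeModels` are illustrative.

```typescript
interface ManagedModel {
  name: string
  config?: Record<string, unknown>
  preload?: boolean
  default?: boolean
}

class DuplicateManagedModelError extends Error {
  modelName: string
  constructor(modelName: string) {
    super(`Managed model "${modelName}" is listed more than once`)
    this.name = 'DuplicateManagedModelError'
    this.modelName = modelName
  }
}

// Turn bare names into spec objects and fail on the first repeated name,
// so a later entry can never overwrite an earlier one (or its `default: true`).
function normalizeModels(models: Array<string | ManagedModel>): ManagedModel[] {
  const seen = new Set<string>()
  return models.map((entry) => {
    const spec: ManagedModel = typeof entry === 'string' ? { name: entry } : { ...entry }
    if (seen.has(spec.name)) throw new DuplicateManagedModelError(spec.name)
    seen.add(spec.name)
    return spec
  })
}
```

Throwing before any config is written means the caller sees the mistake at `createQvac` time rather than discovering later that the wrong entry won.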
- Per-instance consumer markers (<pid>.<rand>) so two providers in one
  process sharing a fleet key don't deregister each other on close (A).
- Restrict respawn retry to ECONNREFUSED so an in-flight completion is
  never blindly replayed on ECONNRESET/EPIPE (C).
- Health-check the recorded baseURL before SIGTERM-ing an orphaned serve,
  guarding against killing a recycled pid (D).
- Use dirname() instead of a posix-only regex for ephemeral config cleanup (E).
- Fold serveBinPath into the fleet key so distinct local builds don't share
  a serve (G).
- Export managed error classes + QvacManagedErrorCode for instanceof checks (H).
- Reject more than one explicit default: true (I).
- Deregister the consumer if resolveServe throws (F); drop dead
  firstConsumerPid runner param (J).

Tests: per-instance markers, health-gated orphan sweep (kills serving
orphan, spares non-serving stranger pid), fleet-key serveBinPath sensitivity,
multiple-default rejection. README updated.
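Item (C) restricts the retry to refused connections. A hedged sketch of that rule follows; `errorCode`, `isConnectionRefused`, and `retryOnceOnRefused` are hypothetical names, and the sketch assumes the failure code is found on the error or somewhere in its `cause` chain, which is how Node's fetch typically reports it.

```typescript
// Walk a short `cause` chain looking for a Node-style error code.
function errorCode(err: unknown): string | undefined {
  let current: unknown = err
  for (let depth = 0; depth < 5; depth++) {
    if (typeof current !== 'object' || current === null) return undefined
    const candidate = current as { code?: unknown; cause?: unknown }
    if (typeof candidate.code === 'string') return candidate.code
    current = candidate.cause
  }
  return undefined
}

// ECONNREFUSED means nothing accepted the connection, so no request was processed.
function isConnectionRefused(err: unknown): boolean {
  return errorCode(err) === 'ECONNREFUSED'
}

// Run `attempt`; on a refused connection, re-resolve the serve and try exactly once more.
async function retryOnceOnRefused<T>(
  attempt: () => Promise<T>,
  reresolve: () => Promise<void>
): Promise<T> {
  try {
    return await attempt()
  } catch (err) {
    if (!isConnectionRefused(err)) throw err // e.g. ECONNRESET/EPIPE: never replay
    await reresolve()
    return attempt()
  }
}
```

The distinction matters because a reset or broken pipe can occur after the serve has started working on a completion, whereas a refused connection cannot.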
Lifecycle correctness:
- Spawn lock: steal only when the owner pid is dead (with an mtime fallback for
  an unreadable lock), so a legitimate multi-minute cold start no longer loses
  its lock after 30s and spawns a duplicate runner/serve (tetherto#1).
- close(): the fetch path now bails out instead of re-resolving once closed, so
  a request racing close() can't silently re-add a consumer / spawn a runner (tetherto#3).
- sweepServes: when an orphaned serve's pid is alive but its health check fails,
  keep the record instead of dropping it — dropping stranded a live serve with
  no registry trace. We only reap once it answers as ours, or drop once its pid
  dies (tetherto#4).
- servePort: fold a pinned port into the fleet key so pinned-port callers don't
  reuse an auto-allocated serve on a different port, and distinct pins don't
  collide (tetherto#5).
- Respawn: expose baseURL/port/pid as getters over live state, updated on every
  reconnect, so diagnostics/external clients see the real serve after recovery (tetherto#6).
- retargetUrl now handles Request inputs (not just string/URL) so a respawn stays
  transparent if the SDK ever switches input shapes (tetherto#8).

Docs:
- README + docs-site: direct-baseURL tools (OpenCode/Cline/Aider) don't extend
  liveness; document the long-lived-sentinel/wrapper pattern and fix the
  misleading "the script doesn't have to stay running" note (tetherto#2).
- Reconcile version wording: README/changelog now describe managed mode as
  unreleased (package is 0.1.0); docs-site integration page documents managed
  mode + the async overload (tetherto#7).

Tests: spawn-lock steal/keep matrix, fleet-key pinned-port sensitivity, and the
runner-dead + serve-alive + health-failing sweep case. Build + suite green
(60 pass / 1 integration skip).
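The spawn-lock rule in the first lifecycle item reduces to a small decision function. The sketch below is an assumption about its shape: `SpawnLock`, `shouldStealLock`, and the staleness parameter are illustrative names, and the liveness probe is passed in so the decision stays pure.

```typescript
interface SpawnLock {
  ownerPid: number | null // null when the lock file could not be read or parsed
  mtimeMs: number // last modification time of the lock file
}

// Steal only from a dead owner; age is a fallback used solely for unreadable locks.
function shouldStealLock(
  lock: SpawnLock,
  now: number,
  staleAfterMs: number,
  isAlive: (pid: number) => boolean
): boolean {
  if (lock.ownerPid !== null) return !isAlive(lock.ownerPid)
  return now - lock.mtimeMs > staleAfterMs
}
```

Because a readable lock is judged by owner liveness alone, a cold start that takes several minutes keeps its lock for as long as the owning process is alive.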
Comment thread docs/website/content/docs/cli/http-server/integration.mdx Outdated
…naged mode

Add `models.qvacCatalog`, a public models.dev-style catalog that maps
friendly ids (`qwen3.5-9b`) to the SDK constant the serve loads
(`QWEN3_5_9B_MULTIMODAL_Q4_K_M`), so the id a user picks from models.dev
resolves end-to-end with no translation layer in front of the serve.

Managed mode now accepts catalog ids as model names: the synthesized
serve config keys the alias by the friendly id while `model` resolves to
the underlying SDK constant, so the serve answers `qwen3.5-9b` directly.
Bare SDK constants keep working unchanged. A drift unit test fails CI if
any catalog constant disappears from the generated SDK catalog.
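The alias resolution and the drift check could be sketched as follows. The single catalog entry is the one named in this commit message; the `Map` representation and the `resolveModel` and `findDrift` helpers are assumptions, not the shape of `models.qvacCatalog`.

```typescript
// Friendly id -> SDK constant. Only this pair is taken from the commit message.
const catalog = new Map<string, string>([
  ['qwen3.5-9b', 'QWEN3_5_9B_MULTIMODAL_Q4_K_M']
])

// The serve alias keeps the name the caller used; `model` is what the serve loads.
function resolveModel(name: string): { alias: string; model: string } {
  return { alias: name, model: catalog.get(name) ?? name }
}

// Drift check: list catalog constants that the SDK no longer provides.
function findDrift(sdkConstants: Set<string>): string[] {
  return [...catalog.values()].filter((constant) => !sdkConstants.has(constant))
}
```

A bare SDK constant passes through untouched, which matches the note that existing callers keep working unchanged; a non-empty `findDrift` result is what a CI test would fail on.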

@lauripiisang lauripiisang left a comment

Copy link

Copy Markdown
Contributor


A few notes that should probably be addressed.
Also, the naming isn't very obvious, e.g. const re = await reresolve()

Comment thread packages/ai-sdk-provider/src/managed/index.ts Outdated
Comment thread packages/ai-sdk-provider/src/managed/registry.ts Outdated
Harden managed-mode lifecycle so a managed serve never leaks its `bare`
inference worker or outlives the process that owns it.

- Process-group teardown: spawn `qvac serve` detached (its own group) and,
  when stopServe must escalate past the grace window, SIGKILL the whole
  group. A plain SIGKILL of the serve pid never cascades to the grandchild
  bare worker, so previously a wedged serve orphaned the worker. The
  graceful SIGTERM is still sent to the serve process only, so a healthy
  serve orchestrates its own shutdown and releases the global worker lock
  (no stale lock left behind); the group SIGKILL is the wedged-path fallback.

- `closeOnParentExit` option: for a daemon-style host whose sole job is to
  keep a managed serve alive for a parent process (e.g. an editor/agent
  plugin). The provider watches its parent pid and, the moment the parent
  exits (on POSIX we are reparented to init, ppid → 1), closes itself —
  deregistering the consumer so the runner reaps the serve — and exits.
  Without it a hard-killed parent would leave a reparented host alive,
  keeping its consumer marker forever so the serve was never reaped.

Tests: a stubborn-grandchild fake serve proves group teardown reaps the
worker; `parentIsGone` unit-tests the parent-watch decision.
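The parent-watch decision can be sketched as a pure comparison plus a polling loop. Only the `parentIsGone` name appears in this commit; its signature, the `watchParent` helper, and the one-second interval are assumptions. The sketch treats any change of parent pid as "gone", which covers reparenting to init.

```typescript
// The parent is considered gone once our ppid no longer matches the one we started with.
function parentIsGone(originalPpid: number, currentPpid: number): boolean {
  return currentPpid !== originalPpid
}

// Poll process.ppid and invoke onGone once when the parent disappears.
// Returns a function that stops the watcher.
function watchParent(onGone: () => void, intervalMs = 1_000): () => void {
  const originalPpid = process.ppid
  const timer = setInterval(() => {
    if (parentIsGone(originalPpid, process.ppid)) {
      clearInterval(timer)
      onGone()
    }
  }, intervalMs)
  timer.unref() // the watcher alone should not keep the process alive
  return () => clearInterval(timer)
}
```

In a daemon-style host, `onGone` would close the provider and exit, so the consumer marker disappears and the runner can reap the serve.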
…ce and crash-respawn

- Undo the consumer re-registration when close() wins the race against an
  in-flight fetch retry: resolveServe re-adds the marker after close() removed
  it, which would keep the shared serve warm until the process exits.
- Preserve live consumer markers when sweepServes reaps a crashed/orphaned
  serve, so a respawned runner inherits the still-alive sessions instead of
  idle-reaping the fresh serve out from under them.
- docs: bump managed-mode ctx_size examples to 32768 for agent-sized prompts.
opaninakuffo previously approved these changes Jun 9, 2026
Comment thread packages/ai-sdk-provider/src/managed/registry.ts Outdated
Comment thread packages/ai-sdk-provider/src/managed/registry.ts Outdated
removeConsumer/removeConsumerSync and removeRecord/removeRecordSync were a
confusing sync/async mirror: the async removeConsumer was only ever called right
after the sync one (a guaranteed no-op), and the removeRecord pair was really two
teardown semantics under near-identical names. Marker/record teardown is a single
unlink/rm, cheap enough to be synchronous everywhere — including process 'exit'
handlers where async can't run — so collapse each pair into one sync function.
No behaviour change; addresses review feedback on tetherto#2408.
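The point about 'exit' handlers is that only synchronous work runs there. A hedged sketch of a consumer marker with synchronous teardown follows; the directory layout and the `addConsumer`, `removeConsumer`, `listConsumers`, and `registerConsumer` names are illustrative rather than the registry's real API, and the `<pid>.<rand>` naming follows the per-instance markers described earlier in this PR.

```typescript
import { mkdirSync, readdirSync, rmSync, writeFileSync } from 'node:fs'
import { randomBytes } from 'node:crypto'
import { join } from 'node:path'

// Create a per-instance marker named <pid>.<rand> and return its path.
function addConsumer(consumersDir: string): string {
  mkdirSync(consumersDir, { recursive: true })
  const marker = join(consumersDir, `${process.pid}.${randomBytes(4).toString('hex')}`)
  writeFileSync(marker, '')
  return marker
}

// A single synchronous rm; `force` makes a second call a harmless no-op.
function removeConsumer(marker: string): void {
  rmSync(marker, { force: true })
}

function listConsumers(consumersDir: string): string[] {
  return readdirSync(consumersDir)
}

// Register this session and make sure the marker is removed on process exit.
// Returns a detach function for explicit close().
function registerConsumer(consumersDir: string): () => void {
  const marker = addConsumer(consumersDir)
  const cleanup = (): void => removeConsumer(marker)
  process.once('exit', cleanup) // 'exit' listeners must be synchronous
  return () => {
    process.off('exit', cleanup)
    cleanup()
  }
}
```

Because removal is one cheap synchronous call, the same function serves both the explicit close path and the exit handler, which is the simplification this commit describes.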
Tighten the sync-rationale comments on removeRecord/removeConsumer and drop a
stale, broken leftover comment above ensureDirSync. Keeps the non-obvious intent
(why sync, preserveConsumers semantics) without the narration.
Comment thread packages/ai-sdk-provider/src/defaults.ts Outdated
Comment thread packages/ai-sdk-provider/src/managed/config-synthesizer.ts Outdated
Both were dead: DEFAULT_SERVE_BIN was never imported (serve-process spawns the
resolved CLI path verbatim) and ephemeralConfigName was an unused helper
(writeEphemeralConfig uses a fixed name inside an mkdtemp dir). Removing the
latter also drops the now-unused randomBytes import.
@simon-iribarren

Contributor Author

/review

@github-actions

Contributor

Tier-based Approval Status

**PR Tier:** TIER1

**Current Status:** ✅ APPROVED

**Requirements:**
- 1 Team Member approval ✅ (2/1)
- 1 Team Lead OR Management approval ✅ (1/1)



---
*This comment is automatically updated when reviews change.*

@simon-iribarren simon-iribarren merged commit 95eb489 into tetherto:main Jun 10, 2026
4 checks passed


4 participants