testing pull request event #8
Closed
Proletter wants to merge 1 commit into
Requesting review from: @ignaciolarranaga [auto_pr_review_request]
4 tasks
simon-iribarren added a commit to simon-iribarren/qvac that referenced this pull request Jun 8, 2026
Lifecycle correctness:
- Spawn lock: steal only when the owner pid is dead (with an mtime fallback for
an unreadable lock), so a legitimate multi-minute cold start no longer loses
its lock after 30s and spawns a duplicate runner/serve (tetherto#1).
- close(): the fetch path now bails out instead of re-resolving once closed, so
a request racing close() can't silently re-add a consumer / spawn a runner (tetherto#3).
- sweepServes: when an orphaned serve's pid is alive but its health check fails,
keep the record instead of dropping it; dropping it stranded a live serve with
no registry trace. We only reap once it answers as ours, or drop once its pid
dies (tetherto#4).
- servePort: fold a pinned port into the fleet key so pinned-port callers don't
reuse an auto-allocated serve on a different port, and distinct pins don't
collide (tetherto#5).
- Respawn: expose baseURL/port/pid as getters over live state, updated on every
reconnect, so diagnostics/external clients see the real serve after recovery (tetherto#6).
- retargetUrl now handles Request inputs (not just string/URL) so a respawn stays
transparent if the SDK ever switches input shapes (tetherto#8).
Docs:
- README + docs-site: direct-baseURL tools (OpenCode/Cline/Aider) don't extend
liveness; document the long-lived-sentinel/wrapper pattern and fix the
misleading "the script doesn't have to stay running" note (tetherto#2).
- Reconcile version wording: README/changelog now describe managed mode as
unreleased (package is 0.1.0); docs-site integration page documents managed
mode + the async overload (tetherto#7).
Tests: spawn-lock steal/keep matrix, fleet-key pinned-port sensitivity, and the
runner-dead + serve-alive + health-failing sweep case. Build + suite green
(60 pass / 1 integration skip).
simon-iribarren added a commit that referenced this pull request Jun 10, 2026
* feat[api]: add managed mode to @qvac/ai-sdk-provider (QVAC-19900)
Add `mode: 'managed'` so the provider can synthesize an ephemeral
qvac.config.json from a model-constant list, spawn and supervise
`qvac serve` on a free port, and tear it down on host exit. External
mode is unchanged and stays synchronous; the managed supervisor is
lazily dynamic-imported so external-mode users pay no startup cost.
@qvac/cli becomes an optional peer dependency.
* fix: resolve @qvac/cli via main entry when its exports block package.json (QVAC-19900)
The published @qvac/cli ships a string `exports` field ("./dist/index.js"),
which makes the `./package.json` subpath non-resolvable
(ERR_PACKAGE_PATH_NOT_EXPORTED). Managed mode relied on resolving
`@qvac/cli/package.json` to locate the bin, so it would fail to find the CLI
on a clean install. Fall back to resolving the package main entry, which for
@qvac/cli is the same file as the `qvac` bin.
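The fallback described in this commit can be sketched as follows. This is a minimal illustration, not the provider's actual code: `resolveCliEntry`, `hasErrorCode`, and the injected `resolve` parameter are hypothetical names. Only the error code `ERR_PACKAGE_PATH_NOT_EXPORTED` and the two resolution targets come from the commit message.

```typescript
// Hypothetical helper. `resolve` is injected so the fallback can be exercised
// without @qvac/cli installed; in real use it could be the `resolve` function
// of a `require` created with createRequire from node:module.
type Resolve = (specifier: string) => string;

function hasErrorCode(err: unknown, code: string): boolean {
  return typeof err === "object" && err !== null && (err as { code?: unknown }).code === code;
}

function resolveCliEntry(
  pkg: string,
  resolve: Resolve,
): { via: "package.json" | "main"; path: string } {
  try {
    // Preferred path: locate package.json (the caller would then read its `bin`).
    return { via: "package.json", path: resolve(`${pkg}/package.json`) };
  } catch (err) {
    // Only the "subpath not exported" failure triggers the fallback.
    if (!hasErrorCode(err, "ERR_PACKAGE_PATH_NOT_EXPORTED")) throw err;
    // Fallback: the main entry, which for @qvac/cli is the same file as the bin.
    return { via: "main", path: resolve(pkg) };
  }
}
```

Any other resolution error (for example the package not being installed at all) is rethrown unchanged, so a missing optional peer dependency still surfaces clearly.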
* doc: update ai-sdk provider agent setup after queue (QVAC-19900)
* QVAC-19900 feat[api]: per-model config for managed mode
Managed mode `models` now accepts spec objects ({ name, config, preload,
default }) alongside bare constant names, so callers can set per-model serve
options — notably `ctx_size` and `reasoning_budget` — that coding agents like
OpenCode require. The synthesized qvac.config.json carries the config block,
honors explicit `preload`/`default`, and validates names inside spec objects.
Exports the new `QvacManagedModel` type and documents per-model config plus a
managed-mode OpenCode example in the README.
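As a rough sketch of how bare names and spec objects might be normalized into one serve config block: the field names `name`, `config`, `preload`, and `default` come from the commit message, but the output shape, the `synthesizeServeModels` name, and the fallback rules (preload defaults to true, first model is the default when none is marked) are assumptions made for illustration.

```typescript
type ModelConfig = Record<string, unknown>;

interface ModelSpec {
  name: string;
  config?: ModelConfig;
  preload?: boolean;
  default?: boolean;
}

type ModelInput = string | ModelSpec;

// Assumed output shape; the real qvac.config.json layout may differ.
interface ServeEntry {
  model: string;
  config: ModelConfig;
  preload: boolean;
  default: boolean;
}

function synthesizeServeModels(models: ModelInput[]): Record<string, ServeEntry> {
  // Bare constant names become spec objects with no overrides.
  const specs = models.map((m): ModelSpec => (typeof m === "string" ? { name: m } : m));
  const hasExplicitDefault = specs.some((s) => s.default === true);
  const out: Record<string, ServeEntry> = {};
  specs.forEach((s, i) => {
    out[s.name] = {
      model: s.name,
      config: s.config ?? {},
      preload: s.preload ?? true, // assumption: preload unless told otherwise
      // Assumption: honor an explicit default, else fall back to the first model.
      default: hasExplicitDefault ? s.default === true : i === 0,
    };
  });
  return out;
}
```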
* QVAC-19900 feat[api]: shared idle-reaped managed serve daemon
Rework managed mode from a per-provider supervisor into a shared,
self-cleaning serve daemon so it is robust standalone and usable by any
tool, not just a single session.
- Reuse via a fleet key (model set + per-model config + host) keyed in a
cross-process registry under ~/.qvac/managed-serves/; createQvac attaches
to a matching healthy serve instead of cold-starting a duplicate.
- A detached runner owns the qvac serve child and reaps it once no consumer
process has been alive for serveIdleTimeout (default 5m). Liveness, not
request traffic, is the signal, so it works for tools that hit baseURL
directly (OpenCode/Cline/Aider).
- close() now detaches (deregisters the consumer) instead of killing; a
shared serve survives until its last user is gone.
- Sweep only reaps dead/orphaned serves, never a healthy serve a live
process owns (fixes a second session SIGKILLing a downloading serve).
- Respawn-on-failure: fetch re-resolves and retries once on ECONNREFUSED.
- reuse:false (or a pinned servePort) yields a private serve reaped as soon
as its owner exits.
Refactor into serve-process.ts (spawn/health/stop), registry.ts,
fleet-key.ts, runner.ts; remove supervisor.ts and pid-tracker.ts. Add
reuse and serveIdleTimeout options. Rewrite tests and add reuse/idle-reap
end-to-end coverage; document the shared lifecycle in the README.
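A fleet key of the kind described above could be derived by hashing a canonical form of the inputs. The sketch below assumes SHA-256 over a key-sorted JSON encoding and a 16-character prefix; the real fleet-key.ts may use a different encoding, and `fleetKey`, `stableStringify`, and `FleetInputs` are illustrative names.

```typescript
import { createHash } from "node:crypto";

// Serialize with sorted object keys so that property order never changes the key.
function stableStringify(value: unknown): string {
  if (value === undefined) return "null";
  if (Array.isArray(value)) return `[${value.map(stableStringify).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const body = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${stableStringify(obj[k])}`)
      .join(",");
    return `{${body}}`;
  }
  return JSON.stringify(value);
}

interface FleetInputs {
  host: string;
  models: { name: string; config?: Record<string, unknown> }[];
}

function fleetKey(inputs: FleetInputs): string {
  // Sort models by name so the key depends on the model set, not on list order.
  const models = [...inputs.models]
    .sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0))
    .map((m) => ({ name: m.name, config: m.config ?? {} }));
  return createHash("sha256")
    .update(stableStringify({ host: inputs.host, models }))
    .digest("hex")
    .slice(0, 16);
}
```

Two callers that request the same models with the same per-model config on the same host get the same key and can attach to one serve; changing any config value yields a different key and therefore a separate serve.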
* QVAC-19900 fix: reject duplicate model names in managed mode
Each managed model maps to a single serve alias keyed by its name, so a
repeated name silently overwrote the earlier entry — and could drop its
`default: true`. Reject duplicates up front with DuplicateManagedModelError
instead of resolving them ambiguously. Addresses PR review feedback.
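The up-front rejection might look like the sketch below. `DuplicateManagedModelError` is the name used in the commit, but this class is a local stand-in, and `assertUniqueModelNames` is a hypothetical helper name.

```typescript
// Local stand-in for the exported error class; the real one may carry more fields.
class DuplicateManagedModelError extends Error {
  readonly modelName: string;

  constructor(modelName: string) {
    super(`Duplicate managed model name: ${modelName}`);
    this.name = "DuplicateManagedModelError";
    this.modelName = modelName;
  }
}

// Fail on the first repeated name instead of letting a later entry
// silently overwrite an earlier alias.
function assertUniqueModelNames(names: readonly string[]): void {
  const seen = new Set<string>();
  for (const name of names) {
    if (seen.has(name)) throw new DuplicateManagedModelError(name);
    seen.add(name);
  }
}
```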
* QVAC-19900 fix[api]: address managed-mode self-review findings
- Per-instance consumer markers (<pid>.<rand>) so two providers in one
process sharing a fleet key don't deregister each other on close (A).
- Restrict respawn retry to ECONNREFUSED so an in-flight completion is
never blindly replayed on ECONNRESET/EPIPE (C).
- Health-check the recorded baseURL before SIGTERM-ing an orphaned serve,
guarding against killing a recycled pid (D).
- Use dirname() instead of a posix-only regex for ephemeral config cleanup (E).
- Fold serveBinPath into the fleet key so distinct local builds don't share
a serve (G).
- Export managed error classes + QvacManagedErrorCode for instanceof checks (H).
- Reject more than one explicit default: true (I).
- Deregister the consumer if resolveServe throws (F); drop dead
firstConsumerPid runner param (J).
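Item (C) restricts the retry to connection-refused failures. One way to express that check, assuming the error code may sit on the error itself or on a nested `cause` (as with the `TypeError` that Node's built-in `fetch` throws), is sketched here; `errorCode` and `shouldRespawnAndRetry` are hypothetical names.

```typescript
// Walk a short `cause` chain looking for a string `code`.
function errorCode(err: unknown): string | undefined {
  let current: unknown = err;
  for (let depth = 0; depth < 5 && typeof current === "object" && current !== null; depth++) {
    const { code, cause } = current as { code?: unknown; cause?: unknown };
    if (typeof code === "string") return code;
    current = cause;
  }
  return undefined;
}

// ECONNREFUSED means nothing accepted the connection, so the request never
// reached a serve and is safe to replay. ECONNRESET/EPIPE can occur after the
// serve has started working on the request, so those are not retried.
function shouldRespawnAndRetry(err: unknown): boolean {
  return errorCode(err) === "ECONNREFUSED";
}
```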
Tests: per-instance markers, health-gated orphan sweep (kills serving
orphan, spares non-serving stranger pid), fleet-key serveBinPath sensitivity,
multiple-default rejection. README updated.
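The per-instance marker scheme from item (A) can be illustrated as below. The `<pid>.<rand>` shape is from the commit; the 4-byte hex suffix, the helper names, and the use of signal 0 for the liveness probe are assumptions.

```typescript
import { randomBytes } from "node:crypto";

// One marker per provider instance: two providers in the same process get
// distinct markers, so closing one does not deregister the other.
function newConsumerMarker(pid: number = process.pid): string {
  return `${pid}.${randomBytes(4).toString("hex")}`;
}

function markerPid(marker: string): number | undefined {
  const match = /^(\d+)\./.exec(marker);
  return match ? Number(match[1]) : undefined;
}

// Signal 0 performs the existence/permission check without delivering a signal.
function isAlive(pid: number): boolean {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM: the process exists but belongs to another user.
    return (err as { code?: string }).code === "EPERM";
  }
}

// Keep only markers whose owning process still exists.
function liveMarkers(
  markers: readonly string[],
  alive: (pid: number) => boolean = isAlive,
): string[] {
  return markers.filter((m) => {
    const pid = markerPid(m);
    return pid !== undefined && alive(pid);
  });
}
```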
* QVAC-19900 fix[api]: address managed-mode lifecycle review (round 2)
Lifecycle correctness:
- Spawn lock: steal only when the owner pid is dead (with an mtime fallback for
an unreadable lock), so a legitimate multi-minute cold start no longer loses
its lock after 30s and spawns a duplicate runner/serve (#1).
- close(): the fetch path now bails out instead of re-resolving once closed, so
a request racing close() can't silently re-add a consumer / spawn a runner (#3).
- sweepServes: when an orphaned serve's pid is alive but its health check fails,
keep the record instead of dropping it — dropping stranded a live serve with
no registry trace. We only reap once it answers as ours, or drop once its pid
dies (#4).
- servePort: fold a pinned port into the fleet key so pinned-port callers don't
reuse an auto-allocated serve on a different port, and distinct pins don't
collide (#5).
- Respawn: expose baseURL/port/pid as getters over live state, updated on every
reconnect, so diagnostics/external clients see the real serve after recovery (#6).
- retargetUrl now handles Request inputs (not just string/URL) so a respawn stays
transparent if the SDK ever switches input shapes (#8).
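The spawn-lock rule in the first bullet reduces to a small decision function. In this sketch the lock is modelled as an optional owner pid plus a file mtime; `mayStealLock`, `LockState`, and the 30-second staleness threshold for the unreadable-lock fallback are illustrative assumptions.

```typescript
interface LockState {
  // undefined when the lock file exists but its contents could not be read.
  ownerPid: number | undefined;
  mtimeMs: number;
}

function mayStealLock(
  lock: LockState,
  nowMs: number,
  isAlive: (pid: number) => boolean,
  staleAfterMs: number = 30_000, // assumed threshold, used only for the fallback
): boolean {
  if (lock.ownerPid !== undefined) {
    // Readable lock: age is irrelevant. A slow cold start keeps its lock
    // for as long as the owning process is alive.
    return !isAlive(lock.ownerPid);
  }
  // Unreadable lock: fall back to the file's modification time.
  return nowMs - lock.mtimeMs > staleAfterMs;
}
```

The point of the change is the first branch: a lock held by a live owner is never stolen, however old it is.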
Docs:
- README + docs-site: direct-baseURL tools (OpenCode/Cline/Aider) don't extend
liveness; document the long-lived-sentinel/wrapper pattern and fix the
misleading "the script doesn't have to stay running" note (#2).
- Reconcile version wording: README/changelog now describe managed mode as
unreleased (package is 0.1.0); docs-site integration page documents managed
mode + the async overload (#7).
Tests: spawn-lock steal/keep matrix, fleet-key pinned-port sensitivity, and the
runner-dead + serve-alive + health-failing sweep case. Build + suite green
(60 pass / 1 integration skip).
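The sweep cases listed in this commit can be summarized as a decision table. The sketch below is a reading of the commit text, not the actual sweepServes code; the `SweepAction` labels and the three boolean inputs are hypothetical.

```typescript
type SweepAction = "keep" | "reap" | "drop-record";

interface ServeObservation {
  runnerAlive: boolean; // the process that owns the serve
  servePidAlive: boolean; // the recorded serve pid still exists
  healthy: boolean; // the recorded baseURL answers the health check as ours
}

function sweepAction(obs: ServeObservation): SweepAction {
  // A serve with a live owner is never touched by the sweep.
  if (obs.runnerAlive) return "keep";
  // Orphaned and the pid is gone: only the registry record remains to clean up.
  if (!obs.servePidAlive) return "drop-record";
  // Orphaned, pid alive: reap only once it answers as ours. A failing health
  // check may mean a recycled pid or a serve that is still starting, so keep
  // the record rather than strand a live serve with no registry trace.
  return obs.healthy ? "reap" : "keep";
}
```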
* docs: use canonical qvac.tether.io URL in ai-sdk-provider README
* QVAC-19900 feat[api]: public model catalog + catalog-id aliases in managed mode
Add `models.qvacCatalog`, a public models.dev-style catalog that maps
friendly ids (`qwen3.5-9b`) to the SDK constant the serve loads
(`QWEN3_5_9B_MULTIMODAL_Q4_K_M`), so the id a user picks from models.dev
resolves end-to-end with no translation layer in front of the serve.
Managed mode now accepts catalog ids as model names: the synthesized
serve config keys the alias by the friendly id while `model` resolves to
the underlying SDK constant, so the serve answers `qwen3.5-9b` directly.
Bare SDK constants keep working unchanged. A drift unit test fails CI if
any catalog constant disappears from the generated SDK catalog.
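A minimal sketch of the alias resolution and the drift check follows. The single catalog entry is the one named in the commit; the `Map` representation and the `resolveModelAlias` and `missingFromSdk` helpers are assumptions about shape, not the published `models.qvacCatalog` API.

```typescript
// Friendly catalog id -> SDK constant the serve loads.
const catalog = new Map<string, string>([
  ["qwen3.5-9b", "QWEN3_5_9B_MULTIMODAL_Q4_K_M"],
]);

// The serve alias keeps the name the caller used; `model` is the SDK constant.
// Bare SDK constants are not in the catalog and pass through unchanged.
function resolveModelAlias(name: string): { alias: string; model: string } {
  return { alias: name, model: catalog.get(name) ?? name };
}

// Drift check: every constant the catalog points at must still exist in the
// generated SDK catalog. A non-empty result would fail the unit test.
function missingFromSdk(sdkConstants: ReadonlySet<string>): string[] {
  return [...catalog.values()].filter((constant) => !sdkConstants.has(constant));
}
```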
* QVAC-19900 feat[api]: process-group serve teardown + closeOnParentExit
Harden managed-mode lifecycle so a managed serve never leaks its `bare`
inference worker or outlives the process that owns it.
- Process-group teardown: spawn `qvac serve` detached (its own group) and,
when stopServe must escalate past the grace window, SIGKILL the whole
group. A plain SIGKILL of the serve pid never cascades to the grandchild
bare worker, so previously a wedged serve orphaned the worker. The
graceful SIGTERM is still sent to the serve process only, so a healthy
serve orchestrates its own shutdown and releases the global worker lock
(no stale lock left behind); the group SIGKILL is the wedged-path fallback.
- `closeOnParentExit` option: for a daemon-style host whose sole job is to
keep a managed serve alive for a parent process (e.g. an editor/agent
plugin). The provider watches its parent pid and, the moment the parent
exits (on POSIX we are reparented to init, ppid → 1), closes itself —
deregistering the consumer so the runner reaps the serve — and exits.
Without it a hard-killed parent would leave a reparented host alive,
keeping its consumer marker forever so the serve was never reaped.
Tests: a stubborn-grandchild fake serve proves group teardown reaps the
worker; `parentIsGone` unit-tests the parent-watch decision.
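Both mechanisms are small at their core. The sketch below shows the group kill (a negative pid addresses the whole process group on POSIX, which requires the child to have been spawned with `detached: true`) and a parent-watch decision. `parentIsGone` is the name mentioned in the commit, but its signature here is a guess, and `killProcessGroup` with its injected `kill` is hypothetical.

```typescript
type Kill = (pid: number, signal: string | number) => unknown;

// Wedged-path fallback: SIGKILL the serve's whole process group so the
// grandchild worker dies with it. Returns false if the group is already gone.
function killProcessGroup(servePid: number, kill: Kill = process.kill): boolean {
  try {
    kill(-servePid, "SIGKILL"); // negative pid targets the process group
    return true;
  } catch (err) {
    if ((err as { code?: string }).code === "ESRCH") return false;
    throw err;
  }
}

// When the original parent exits, the host is reparented (to init, ppid 1,
// in the common POSIX case), so its current ppid no longer matches the one
// recorded at startup.
function parentIsGone(originalPpid: number, currentPpid: number): boolean {
  return currentPpid !== originalPpid;
}
```

A host using this would record `process.ppid` at startup and poll it on a timer, closing the provider and exiting once `parentIsGone` returns true.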
* QVAC-19900 fix: keep managed serve lifecycle correct under close() race and crash-respawn
- Undo the consumer re-registration when close() wins the race against an
in-flight fetch retry: resolveServe re-adds the marker after close() removed
it, which would keep the shared serve warm until the process exits.
- Preserve live consumer markers when sweepServes reaps a crashed/orphaned
serve, so a respawned runner inherits the still-alive sessions instead of
idle-reaping the fresh serve out from under them.
- docs: bump managed-mode ctx_size examples to 32768 for agent-sized prompts.
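The close() race in the first bullet can be modelled with a marker set and a closed flag. This is a deliberately simplified, synchronous stand-in for the real async fetch path; `ConsumerHandle` and `onRetryResolved` are hypothetical names.

```typescript
class ConsumerHandle {
  private closed = false;
  private readonly markers: Set<string>;
  private readonly id: string;

  constructor(markers: Set<string>, id: string) {
    this.markers = markers;
    this.id = id;
    this.markers.add(id); // register as a consumer of the shared serve
  }

  close(): void {
    this.closed = true;
    this.markers.delete(this.id);
  }

  // Stand-in for the end of a fetch retry: re-resolving the serve re-adds the
  // marker. If close() ran in the meantime, undo that and report failure so
  // the caller bails out instead of keeping the serve warm.
  onRetryResolved(): boolean {
    this.markers.add(this.id);
    if (this.closed) {
      this.markers.delete(this.id);
      return false;
    }
    return true;
  }
}
```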
* QVAC-19900 fix: rename reresolve result to resolved for clarity in managed fetch
* QVAC-19900 mod: collapse redundant sync/async registry teardown helpers
removeConsumer/removeConsumerSync and removeRecord/removeRecordSync were a
confusing sync/async mirror: the async removeConsumer was only ever called right
after the sync one (a guaranteed no-op), and the removeRecord pair was really two
teardown semantics under near-identical names. Marker/record teardown is a single
unlink/rm, cheap enough to be synchronous everywhere — including process 'exit'
handlers where async can't run — so collapse each pair into one sync function.
No behaviour change; addresses review feedback on #2408.
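The reasoning for a single synchronous helper can be seen in a short sketch: `'exit'` listeners must finish synchronously, and `rmSync` with `force: true` is idempotent, so one function serves both the normal close path and the exit handler. `removeConsumer` is the name from the commit; its signature, `removeOnExit`, and the demo paths are assumptions.

```typescript
import { existsSync, mkdtempSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Single synchronous teardown: `force: true` makes a missing file a no-op,
// so calling this twice (close() and then the exit handler) is harmless.
function removeConsumer(markerPath: string): void {
  rmSync(markerPath, { force: true });
}

// 'exit' listeners cannot wait for asynchronous work, so only the sync
// variant is usable here.
function removeOnExit(markerPath: string): void {
  process.once("exit", () => removeConsumer(markerPath));
}

// Usage with a throwaway marker in a temp directory.
const demoDir = mkdtempSync(join(tmpdir(), "consumers-"));
const demoMarker = join(demoDir, `${process.pid}.demo`);
writeFileSync(demoMarker, "");
removeOnExit(demoMarker);
removeConsumer(demoMarker);
removeConsumer(demoMarker); // second call is a no-op
const markerStillExists = existsSync(demoMarker);
rmSync(demoDir, { recursive: true, force: true });
```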
* QVAC-19900 mod: trim verbose comments in managed registry
Tighten the sync-rationale comments on removeRecord/removeConsumer and drop a
stale, broken leftover comment above ensureDirSync. Keeps the non-obvious intent
(why sync, preserveConsumers semantics) without the narration.
* QVAC-19900 mod: drop unused DEFAULT_SERVE_BIN and ephemeralConfigName
Both were dead: DEFAULT_SERVE_BIN was never imported (serve-process spawns the
resolved CLI path verbatim) and ephemeralConfigName was an unused helper
(writeEphemeralConfig uses a fixed name inside an mkdtemp dir). Removing the
latter also drops the now-unused randomBytes import.
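The "fixed name inside an mkdtemp dir" approach makes a random file name unnecessary, because the directory itself is unique. A sketch under assumptions: the `qvac-managed-` prefix, the `qvac.config.json` file name, and `removeEphemeralConfig` are illustrative; only `writeEphemeralConfig`, `mkdtemp`, and the `dirname()` cleanup are mentioned in the commits.

```typescript
import { existsSync, mkdtempSync, readFileSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { dirname, join } from "node:path";

// mkdtemp guarantees a unique directory, so the file inside can use a fixed name.
function writeEphemeralConfig(config: unknown): string {
  const dir = mkdtempSync(join(tmpdir(), "qvac-managed-"));
  const file = join(dir, "qvac.config.json");
  writeFileSync(file, JSON.stringify(config, null, 2));
  return file;
}

// dirname() is platform-aware, unlike a regex that assumes "/" separators.
function removeEphemeralConfig(configPath: string): void {
  rmSync(dirname(configPath), { recursive: true, force: true });
}

// Usage: write, read back, clean up.
const configPath = writeEphemeralConfig({ models: ["MODEL_A"] });
const roundTripped = JSON.parse(readFileSync(configPath, "utf8")) as { models: string[] };
removeEphemeralConfig(configPath);
const configDirRemoved = !existsSync(dirname(configPath));
```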
No description provided.