Skip to content

testing pull request event#8

Closed
Proletter wants to merge 1 commit into
mainfrom
qvac-cli-integration-test2
Closed

testing pull request event#8
Proletter wants to merge 1 commit into
mainfrom
qvac-cli-integration-test2

Conversation

@Proletter

Copy link
Copy Markdown
Collaborator

No description provided.

@github-actions

github-actions Bot commented Jan 8, 2026

Copy link
Copy Markdown
Contributor

Requesting review from: @ignaciolarranaga [auto_pr_review_request]

@Proletter Proletter closed this Jan 8, 2026
@Proletter Proletter deleted the qvac-cli-integration-test2 branch January 8, 2026 15:44
simon-iribarren added a commit to simon-iribarren/qvac that referenced this pull request Jun 8, 2026
Lifecycle correctness:
- Spawn lock: steal only when the owner pid is dead (with an mtime fallback for
  an unreadable lock), so a legitimate multi-minute cold start no longer loses
  its lock after 30s and spawns a duplicate runner/serve (tetherto#1).
- close(): the fetch path now bails out instead of re-resolving once closed, so
  a request racing close() can't silently re-add a consumer / spawn a runner (tetherto#3).
- sweepServes: when an orphaned serve's pid is alive but its health check fails,
  keep the record instead of dropping it — dropping stranded a live serve with
  no registry trace. We only reap once it answers as ours, or drop once its pid
  dies (tetherto#4).
- servePort: fold a pinned port into the fleet key so pinned-port callers don't
  reuse an auto-allocated serve on a different port, and distinct pins don't
  collide (tetherto#5).
- Respawn: expose baseURL/port/pid as getters over live state, updated on every
  reconnect, so diagnostics/external clients see the real serve after recovery (tetherto#6).
- retargetUrl now handles Request inputs (not just string/URL) so a respawn stays
  transparent if the SDK ever switches input shapes (tetherto#8).

Docs:
- README + docs-site: direct-baseURL tools (OpenCode/Cline/Aider) don't extend
  liveness; document the long-lived-sentinel/wrapper pattern and fix the
  misleading "the script doesn't have to stay running" note (tetherto#2).
- Reconcile version wording: README/changelog now describe managed mode as
  unreleased (package is 0.1.0); docs-site integration page documents managed
  mode + the async overload (tetherto#7).

Tests: spawn-lock steal/keep matrix, fleet-key pinned-port sensitivity, and the
runner-dead + serve-alive + health-failing sweep case. Build + suite green
(60 pass / 1 integration skip).
simon-iribarren added a commit that referenced this pull request Jun 10, 2026
* feat[api]: add managed mode to @qvac/ai-sdk-provider (QVAC-19900)

Add `mode: 'managed'` so the provider can synthesize an ephemeral
qvac.config.json from a model-constant list, spawn and supervise
`qvac serve` on a free port, and tear it down on host exit. External
mode is unchanged and stays synchronous; the managed supervisor is
lazily dynamic-imported so external-mode users pay no startup cost.

@qvac/cli becomes an optional peer dependency.

* fix: resolve @qvac/cli via main entry when its exports block package.json (QVAC-19900)

The published @qvac/cli ships a string `exports` field ("./dist/index.js"),
which makes the `./package.json` subpath non-resolvable
(ERR_PACKAGE_PATH_NOT_EXPORTED). Managed mode relied on resolving
`@qvac/cli/package.json` to locate the bin, so it would fail to find the CLI
on a clean install. Fall back to resolving the package main entry, which for
@qvac/cli is the same file as the `qvac` bin.

* doc: update ai-sdk provider agent setup after queue (QVAC-19900)

* QVAC-19900 feat[api]: per-model config for managed mode

Managed mode `models` now accepts spec objects ({ name, config, preload,
default }) alongside bare constant names, so callers can set per-model serve
options — notably `ctx_size` and `reasoning_budget` — that coding agents like
OpenCode require. The synthesized qvac.config.json carries the config block,
honors explicit `preload`/`default`, and validates names inside spec objects.

Exports the new `QvacManagedModel` type and documents per-model config plus a
managed-mode OpenCode example in the README.

* QVAC-19900 feat[api]: shared idle-reaped managed serve daemon

Rework managed mode from a per-provider supervisor into a shared,
self-cleaning serve daemon so it is robust standalone and usable by any
tool, not just a single session.

- Reuse via a fleet key (model set + per-model config + host) keyed in a
  cross-process registry under ~/.qvac/managed-serves/; createQvac attaches
  to a matching healthy serve instead of cold-starting a duplicate.
- A detached runner owns the qvac serve child and reaps it once no consumer
  process has been alive for serveIdleTimeout (default 5m). Liveness, not
  request traffic, is the signal, so it works for tools that hit baseURL
  directly (OpenCode/Cline/Aider).
- close() now detaches (deregisters the consumer) instead of killing; a
  shared serve survives until its last user is gone.
- Sweep only reaps dead/orphaned serves, never a healthy serve a live
  process owns (fixes a second session SIGKILLing a downloading serve).
- Respawn-on-failure: fetch re-resolves and retries once on ECONNREFUSED.
- reuse:false (or a pinned servePort) yields a private serve reaped as soon
  as its owner exits.

Refactor into serve-process.ts (spawn/health/stop), registry.ts,
fleet-key.ts, runner.ts; remove supervisor.ts and pid-tracker.ts. Add
reuse and serveIdleTimeout options. Rewrite tests and add reuse/idle-reap
end-to-end coverage; document the shared lifecycle in the README.

* QVAC-19900 fix: reject duplicate model names in managed mode

Each managed model maps to a single serve alias keyed by its name, so a
repeated name silently overwrote the earlier entry — and could drop its
`default: true`. Reject duplicates up front with DuplicateManagedModelError
instead of resolving them ambiguously. Addresses PR review feedback.

* QVAC-19900 fix[api]: address managed-mode self-review findings

- Per-instance consumer markers (<pid>.<rand>) so two providers in one
  process sharing a fleet key don't deregister each other on close (A).
- Restrict respawn retry to ECONNREFUSED so an in-flight completion is
  never blindly replayed on ECONNRESET/EPIPE (C).
- Health-check the recorded baseURL before SIGTERM-ing an orphaned serve,
  guarding against killing a recycled pid (D).
- Use dirname() instead of a posix-only regex for ephemeral config cleanup (E).
- Fold serveBinPath into the fleet key so distinct local builds don't share
  a serve (G).
- Export managed error classes + QvacManagedErrorCode for instanceof checks (H).
- Reject more than one explicit default: true (I).
- Deregister the consumer if resolveServe throws (F); drop dead
  firstConsumerPid runner param (J).

Tests: per-instance markers, health-gated orphan sweep (kills serving
orphan, spares non-serving stranger pid), fleet-key serveBinPath sensitivity,
multiple-default rejection. README updated.

* QVAC-19900 fix[api]: address managed-mode lifecycle review (round 2)

Lifecycle correctness:
- Spawn lock: steal only when the owner pid is dead (with an mtime fallback for
  an unreadable lock), so a legitimate multi-minute cold start no longer loses
  its lock after 30s and spawns a duplicate runner/serve (#1).
- close(): the fetch path now bails out instead of re-resolving once closed, so
  a request racing close() can't silently re-add a consumer / spawn a runner (#3).
- sweepServes: when an orphaned serve's pid is alive but its health check fails,
  keep the record instead of dropping it — dropping stranded a live serve with
  no registry trace. We only reap once it answers as ours, or drop once its pid
  dies (#4).
- servePort: fold a pinned port into the fleet key so pinned-port callers don't
  reuse an auto-allocated serve on a different port, and distinct pins don't
  collide (#5).
- Respawn: expose baseURL/port/pid as getters over live state, updated on every
  reconnect, so diagnostics/external clients see the real serve after recovery (#6).
- retargetUrl now handles Request inputs (not just string/URL) so a respawn stays
  transparent if the SDK ever switches input shapes (#8).

Docs:
- README + docs-site: direct-baseURL tools (OpenCode/Cline/Aider) don't extend
  liveness; document the long-lived-sentinel/wrapper pattern and fix the
  misleading "the script doesn't have to stay running" note (#2).
- Reconcile version wording: README/changelog now describe managed mode as
  unreleased (package is 0.1.0); docs-site integration page documents managed
  mode + the async overload (#7).

Tests: spawn-lock steal/keep matrix, fleet-key pinned-port sensitivity, and the
runner-dead + serve-alive + health-failing sweep case. Build + suite green
(60 pass / 1 integration skip).

* docs: use canonical qvac.tether.io URL in ai-sdk-provider README

* QVAC-19900 feat[api]: public model catalog + catalog-id aliases in managed mode

Add `models.qvacCatalog`, a public models.dev-style catalog that maps
friendly ids (`qwen3.5-9b`) to the SDK constant the serve loads
(`QWEN3_5_9B_MULTIMODAL_Q4_K_M`), so the id a user picks from models.dev
resolves end-to-end with no translation layer in front of the serve.

Managed mode now accepts catalog ids as model names: the synthesized
serve config keys the alias by the friendly id while `model` resolves to
the underlying SDK constant, so the serve answers `qwen3.5-9b` directly.
Bare SDK constants keep working unchanged. A drift unit test fails CI if
any catalog constant disappears from the generated SDK catalog.

* QVAC-19900 feat[api]: process-group serve teardown + closeOnParentExit

Harden managed-mode lifecycle so a managed serve never leaks its `bare`
inference worker or outlives the process that owns it.

- Process-group teardown: spawn `qvac serve` detached (its own group) and,
  when stopServe must escalate past the grace window, SIGKILL the whole
  group. A plain SIGKILL of the serve pid never cascades to the grandchild
  bare worker, so previously a wedged serve orphaned the worker. The
  graceful SIGTERM is still sent to the serve process only, so a healthy
  serve orchestrates its own shutdown and releases the global worker lock
  (no stale lock left behind); the group SIGKILL is the wedged-path fallback.

- `closeOnParentExit` option: for a daemon-style host whose sole job is to
  keep a managed serve alive for a parent process (e.g. an editor/agent
  plugin). The provider watches its parent pid and, the moment the parent
  exits (on POSIX we are reparented to init, ppid → 1), closes itself —
  deregistering the consumer so the runner reaps the serve — and exits.
  Without it a hard-killed parent would leave a reparented host alive,
  keeping its consumer marker forever so the serve was never reaped.

Tests: a stubborn-grandchild fake serve proves group teardown reaps the
worker; `parentIsGone` unit-tests the parent-watch decision.

* QVAC-19900 fix: keep managed serve lifecycle correct under close() race and crash-respawn

- Undo the consumer re-registration when close() wins the race against an
  in-flight fetch retry: resolveServe re-adds the marker after close() removed
  it, which would keep the shared serve warm until the process exits.
- Preserve live consumer markers when sweepServes reaps a crashed/orphaned
  serve, so a respawned runner inherits the still-alive sessions instead of
  idle-reaping the fresh serve out from under them.
- docs: bump managed-mode ctx_size examples to 32768 for agent-sized prompts.

* QVAC-19900 fix: rename reresolve result to resolved for clarity in managed fetch

* QVAC-19900 mod: collapse redundant sync/async registry teardown helpers

removeConsumer/removeConsumerSync and removeRecord/removeRecordSync were a
confusing sync/async mirror: the async removeConsumer was only ever called right
after the sync one (a guaranteed no-op), and the removeRecord pair was really two
teardown semantics under near-identical names. Marker/record teardown is a single
unlink/rm, cheap enough to be synchronous everywhere — including process 'exit'
handlers where async can't run — so collapse each pair into one sync function.
No behaviour change; addresses review feedback on #2408.

* QVAC-19900 mod: trim verbose comments in managed registry

Tighten the sync-rationale comments on removeRecord/removeConsumer and drop a
stale, broken leftover comment above ensureDirSync. Keeps the non-obvious intent
(why sync, preserveConsumers semantics) without the narration.

* QVAC-19900 mod: drop unused DEFAULT_SERVE_BIN and ephemeralConfigName

Both were dead: DEFAULT_SERVE_BIN was never imported (serve-process spawns the
resolved CLI path verbatim) and ephemeralConfigName was an unused helper
(writeEphemeralConfig uses a fixed name inside an mkdtemp dir). Removing the
latter also drops the now-unused randomBytes import.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant