Skip to content

chore[bc|notask]: migrate SDK plugins to new addon constructor shape#1510

Closed
donriddo wants to merge 106 commits into
tetherto:mainfrom
donriddo:chore/sdk-addon-interface-refactor
Closed

chore[bc|notask]: migrate SDK plugins to new addon constructor shape#1510
donriddo wants to merge 106 commits into
tetherto:mainfrom
donriddo:chore/sdk-addon-interface-refactor

Conversation

@donriddo

@donriddo donriddo commented Apr 10, 2026

Copy link
Copy Markdown
Contributor

🎯 What problem does this PR solve?

The addon interface refactors landed on main without the matching SDK plugin migration:

The SDK plugins (llamacpp-completion, llamacpp-embedding, sdcpp-generation) were still instantiating the addons with the old two-argument shape (new Addon({ loader, diskPath, modelName, ... }, config)), which the refactored addons no longer accept.

📝 How does it solve it?

  • Migrate the three SDK plugins to the new addon constructor: single { files, config, logger, opts } argument, { model, loader: undefined } return.
  • New helper packages/sdk/server/utils/expand-gguf-shards.ts turns a single sharded GGUF path into the ordered list the addon's files.model expects (metadata .tensors.txt first, then each shard). Pure string manipulation, no filesystem access, cross-platform separator handling.
  • Drop FilesystemDL, parseModelPath, and the asLoader adapter from the migrated plugins — addons now take absolute paths directly. (nmtcpp-translation and parakeet-transcription still use the old pattern until their own refactors land; their imports are unaffected.)
  • Bump SDK deps to the published addon versions: @qvac/embed-llamacpp@^0.14.0, @qvac/llm-llamacpp@^0.16.0, @qvac/diffusion-cpp@^0.3.0.

🧪 How was it tested?

  • bun run build — lint + typecheck clean (--max-warnings=0).
  • bun run test:unit — new expand-gguf-shards.test.ts passes (8/8 tests, 28/28 asserts).
  • bun install --frozen-lockfile resolves cleanly; no file:../ links remain in bun.lock.
  • packages/cli/ and packages/sdk/tests-qvac/ grepped for stale references to old addon shapes — none.

🔌 API Changes

No public SDK API change. Plugin internals only.

💥 Breaking Changes

None for SDK consumers. Internal plugin contract with the native addons is the only thing that changed, and that contract was already broken by the addon-side refactors.

donriddo added 27 commits April 9, 2026 06:17
…embed addon

Replace class inheritance with composable utilities from @qvac/infer-base@0.4.0:
- createJobHandler() for single-job lifecycle management
- exclusiveRunQueue() for run serialization
- Direct shard streaming via bare-fs instead of WeightsProvider

Constructor now takes { files: { model: string[] }, config, logger, opts }
instead of { loader, diskPath, modelName }.
…LLM addon

Replace class inheritance with composable utilities from @qvac/infer-base@0.4.0:
- createJobHandler() for single-job lifecycle management
- exclusiveRunQueue() for run serialization
- Direct shard streaming via bare-fs instead of WeightsProvider

Constructor now takes { files: { model: string[], projectionModel?: string }, config, logger, opts }
instead of { loader, diskPath, modelName, projectionModel } + config.

All finetune, media, and filtered logger functionality preserved.
…t callback

FinetuneProgress must call updateStats(data.stats), not updateOutput(data).
Finetune terminal JobEnded must call ended(data) as result, not updateStats.
Remove FilesystemDL dependency, use files: { model: [...] } pattern.
…r shape

Update 13 examples and sharded model test to use files: { model: [...] } pattern.
Remove FilesystemDL dependency from all examples and tests.
The network loader test used the old loader-based constructor.
Rewritten to download shards via HttpDL to disk, then pass absolute paths.
Replace class inheritance with composable utilities from @qvac/infer-base@0.4.0:
- createJobHandler() for single-job lifecycle management
- exclusiveRunQueue() for run serialization

Constructor now takes { files: { model, clipL?, clipG?, t5Xxl?, llm?, vae? }, config, logger, opts }
instead of { diskPath, modelName, clipLModel, ... } + config.

All examples and tests updated to new constructor shape.
Add optional backendDevice (cpu | gpu) to completion and embed stats.
Maps through from addon RuntimeStats to SDK response schemas.
Depends on addon PR tetherto#1393 which adds the stat at the C++ layer.
The C++ embed addon converts config map to llama.cpp CLI flags via
--key value format. Boolean flags like --no-mmap must be passed with
an empty string value so the C++ side emits just --no-mmap (no value).
String true was being passed as --no-mmap true which llama.cpp rejects.
Adds embedWithStats() to the SDK client API so consumers can read the
embed addon runtime stats (totalTime, tokensPerSecond, totalTokens,
backendDevice) without giving up the simple embed() shape that returns
just the vectors. Re-exports EmbedStats for typed consumption.

Adds unit tests covering backendDevice on both embedStatsSchema and
completionStatsSchema, including round-tripping through the response
schemas, to lock in the cpu/gpu enum and prevent regressions.
The embed, LLM, and diffusion addons (PRs tetherto#1493, tetherto#1494, tetherto#1496) drop
BaseInference + WeightsProvider in favour of composable utilities and
expose a single-arg constructor of the form
`new Addon({ files, config, logger, opts })` where `files.model` is
either a string (diffusion) or an array of absolute paths to all GGUF
shards (embed/LLM). Per the addon-loader-abstraction proposal, shard
expansion now belongs to the SDK plugin layer rather than the addon.

This commit:

- Adds `expandGGUFIntoShards()` in `server/utils`, mirroring the C++
  `GGUFShards::expandGGUFIntoShards` regex contract. It is pure string
  manipulation so it works under both bun (unit tests) and bare
  (production), and handles POSIX and Windows separators.

- Rewrites `llamacpp-embedding`, `llamacpp-completion`, and
  `sdcpp-generation` plugins to call the new constructor. FilesystemDL
  and the loader-adapter shim are dropped from these three plugins —
  they no longer create or pass a loader, returning
  `{ model, loader: undefined }` (the SD plugin already followed this
  pattern). The diffusion artifact keys are renamed
  `clipLModel`/`t5XxlModel`/etc. → `clipL`/`t5Xxl`/etc. to match the
  new `DiffusionFiles` shape.

- Pins `@qvac/embed-llamacpp`, `@qvac/llm-llamacpp`, and
  `@qvac/diffusion-cpp` to `file:../...` so the SDK consumes the
  in-monorepo refactored addons until the addon PRs land and publish
  versioned releases. These pins must be bumped back to semver ranges
  before merging.

- Adds a unit test for `expandGGUFIntoShards` covering non-sharded
  paths, sharded paths supplied at any index, nested directories,
  single-shard sharded models, relative paths, and Windows
  separators.

`@qvac/dl-filesystem` stays in `dependencies` because OCR, parakeet,
nmtcpp, and whispercpp plugins still rely on it.
donriddo added 27 commits April 14, 2026 15:32
…oval

The extensive JSDoc documenting every run() parameter (prompt, steps,
width, height, guidance, cfg_scale, sampling_method, scheduler, seed,
batch_count, vae_tiling, cache_preset, init_image, strength) was
accidentally removed during the BaseInference removal refactor when
run() was split into run() + _runInternal(). Restore it on the public
run() method since that is the caller-facing contract.
…oval

The JSDoc documenting run()'s prompt and runOptions parameters was
accidentally removed during the BaseInference removal refactor when
run() was split into run() + _runInternal(). Restore it on the public
run() method, and reference the full RunOptions type (which already
documents prefill / generationParams / cacheKey / saveCacheToDisk in
index.d.ts) so the docs stay authoritative in one place.
…erface-refactor

# Conflicts:
#	packages/qvac-lib-infer-llamacpp-llm/test/integration/afriquegemma-translation.test.js
…ckend-device-stat

# Conflicts:
#	packages/sdk/bun.lock
#	packages/sdk/client/api/index.ts
#	packages/sdk/index.ts
#	packages/sdk/package.json
…n-interface-refactor

# Conflicts:
#	packages/sdk/bun.lock
#	packages/sdk/client/api/index.ts
#	packages/sdk/index.ts
#	packages/sdk/package.json
…don-interface-refactor

# Conflicts:
#	packages/sdk/bun.lock
#	packages/sdk/package.json
…to SDK root

Address PR tetherto#1495 review (Opanin):
- Revert SDK version bump (0.9.0 → 0.8.3) — version is handled during release.
- Revert SDK CHANGELOG entry — handled during release.
- Drop duplicate `export type { EmbedStats }` from client/api/embed.ts; the
  sdk/index.ts re-export (sourced from ./schemas) is sufficient and matches
  the precedent of CompletionStats / DiffusionStats.
- Move `type EmbedStats` in sdk/index.ts from the `./client/api` block to
  the `./schemas` block so all stat types have one consistent home.
The afriquegemma-edge-cases.test.js file came in via the upstream/main
merge but still used the pre-refactor constructor shape:
  new LlmLlamacpp({ loader, modelName, diskPath, ... }, config)
with a FilesystemDL loader. All 7 tests in the file are now migrated to:
  new LlmLlamacpp({ files: { model: [path.join(dirPath, modelName)] },
                    config, logger, opts })
Removed FilesystemDL import and all loader.close() calls. Added
isMobile skip flag matching the pattern in afriquegemma-translation.

Caught by the qvac-staff-code-reviewer agent as a "merge brought in a
new consumer of the old API" — restore-the-class issue across the family.
…rn on unknown events, revert AddonCtor rename

Address review findings from the qvac-staff-code-reviewer agent:

- CHANGELOG: document the capability loss (in-memory streaming from network
  sources — URLs, Hyperdrive — is no longer supported). Matches the LLM
  addon's CHANGELOG and is what the PR description promised but the
  CHANGELOG was missing.
- index.js: extract pickPrimaryGgufPath(files) as a named function with
  JSDoc (mirrors packages/qvac-lib-infer-llamacpp-llm/index.js precedent).
  Export via module.exports.pickPrimaryGgufPath for unit testing.
- test/unit/pick-primary-gguf-path.test.js: 4 unit tests documenting the
  tensors.txt-first ordering contract and single-file fallback, matching
  the LLM addon's test shape.
- index.js: unknown addon events now log at warn (not debug). The inline
  comment already said "reaching this branch indicates a native-layer
  change worth surfacing" — debug level was hiding exactly what we want
  to surface.
- CHANGELOG: update the Bug Fixes entry to reflect the warn level.
- benchmarks: revert the AddonCtor → addonCtor casing rename. PascalCase
  for a constructor reference is the conventional JS style, and the
  rename was a while-I'm-here change unrelated to the refactor.
Address review findings from the qvac-staff-code-reviewer agent:

- CHANGELOG quoted a fabricated error message ("must be an absolute path
  to the main model weights") that the code does not throw. Replace with
  the actual messages emitted by assertAbsolute(): "must be an absolute
  path string" and "must be an absolute path (got: <value>)". Note that
  the same validation applies to the optional companion fields.
- index.js: remove `this._files.model || ''` fallbacks in _load(). The
  constructor's assertAbsolute('model', files.model) already guarantees
  a non-empty absolute string, so the fallbacks are unreachable and
  encode a phantom contract (empty model path) that can never hold.
…ttern)

Per the thread review, Yury pushed back on the earlier throw behavior:
  - A second load() is always accidental — SDK has no code path that
    calls load twice on the same instance.
  - If it's always accidental, throwing forces SDK to try/catch and
    swallow, which is wasted ceremony vs. silent no-op.
  - Aligns with the ReadyResource pattern QVAC already uses elsewhere.

Change: `load()` on an already-loaded instance now returns immediately
instead of throwing. CHANGELOG updated to document idempotency.
Callers that intentionally want to swap weights still must call
unload() first.
Second load() on an already-loaded instance returns immediately instead
of throwing. Matches the ReadyResource pattern used elsewhere in QVAC:
open/load is idempotent; explicit unload() is required to swap weights.

CHANGELOG updated.
Second load() on an already-loaded instance returns immediately instead
of throwing. Matches the ReadyResource pattern used elsewhere in QVAC:
open/load is idempotent; explicit unload() is required to swap weights.

CHANGELOG updated.
… shape

The SDK's embed() now returns `{ embedding, stats? }` instead of raw
vectors. Update the SDKModule interface to match, and destructure the
result inside sdkEmbed() so the CLI's internal `number[] | number[][]`
contract is preserved — callers of sdkEmbed (notably the OpenAI
embeddings route) stay unchanged.
…erface-refactor

# Conflicts:
#	packages/lib-infer-diffusion/CHANGELOG.md
#	packages/lib-infer-diffusion/README.md
#	packages/lib-infer-diffusion/index.d.ts
#	packages/lib-infer-diffusion/index.js
#	packages/lib-infer-diffusion/package.json
#	packages/sdk/bun.lock
#	packages/sdk/package.json
…erface-refactor

# Conflicts:
#	packages/lib-infer-diffusion/CHANGELOG.md
#	packages/lib-infer-diffusion/addon.js
#	packages/lib-infer-diffusion/index.d.ts
#	packages/lib-infer-diffusion/index.js
#	packages/lib-infer-diffusion/package.json
#	packages/sdk/bun.lock
#	packages/sdk/package.json
The three refactored addons are now published:
@qvac/embed-llamacpp@0.14.0, @qvac/llm-llamacpp@0.16.0,
@qvac/diffusion-cpp@0.3.0. Move SDK dependency ranges to
the same caret style used by every other @qvac/* dep so patch
uptake is automatic and `bun install --frozen-lockfile`
resolves through the npm registry.

bun.lock regenerated; no file:../ links remain.
@donriddo

Copy link
Copy Markdown
Contributor Author

Superseded by #1688.

This branch carried intermediate merge commits from the three addon-refactor side-branches (#1493, #1494, #1496). After each refactor merged to main via its own PR, those commits became redundant and the branch went into permanent conflict with main — a rebase would have replayed 65 commits, each conflicting.

#1688 re-raises the SDK migration on a clean branch cut directly from current upstream/main, with the same 7-file net diff (the three plugin.ts migrations, expand-gguf-shards.ts helper + tests, and dep bumps to the published addon versions).

@github-actions

Copy link
Copy Markdown
Contributor

Tier-based Approval Status

**PR Tier:** TIER1

**Current Status:** ❌ PENDING

**Requirements:**
- 1 Team Member approval ❌ (0/1)
- 1 Team Lead OR Management approval ❌ (0/1)



---
*This comment is automatically updated when reviews change.*

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant