chore[bc|notask]: migrate SDK plugins to new addon constructor shape#1510
Closed
donriddo wants to merge 106 commits into
Closed
chore[bc|notask]: migrate SDK plugins to new addon constructor shape#1510donriddo wants to merge 106 commits into
donriddo wants to merge 106 commits into
Conversation
…embed addon
Replace class inheritance with composable utilities from @qvac/infer-base@0.4.0:
- createJobHandler() for single-job lifecycle management
- exclusiveRunQueue() for run serialization
- Direct shard streaming via bare-fs instead of WeightsProvider
Constructor now takes { files: { model: string[] }, config, logger, opts }
instead of { loader, diskPath, modelName }.
…LLM addon
Replace class inheritance with composable utilities from @qvac/infer-base@0.4.0:
- createJobHandler() for single-job lifecycle management
- exclusiveRunQueue() for run serialization
- Direct shard streaming via bare-fs instead of WeightsProvider
Constructor now takes { files: { model: string[], projectionModel?: string }, config, logger, opts }
instead of { loader, diskPath, modelName, projectionModel } + config.
All finetune, media, and filtered logger functionality preserved.
…t callback FinetuneProgress must call updateStats(data.stats), not updateOutput(data). Finetune terminal JobEnded must call ended(data) as result, not updateStats.
Remove FilesystemDL dependency, use files: { model: [...] } pattern.
…r shape
Update 13 examples and sharded model test to use files: { model: [...] } pattern.
Remove FilesystemDL dependency from all examples and tests.
The network loader test used the old loader-based constructor. Rewritten to download shards via HttpDL to disk, then pass absolute paths.
Replace class inheritance with composable utilities from @qvac/infer-base@0.4.0:
- createJobHandler() for single-job lifecycle management
- exclusiveRunQueue() for run serialization
Constructor now takes { files: { model, clipL?, clipG?, t5Xxl?, llm?, vae? }, config, logger, opts }
instead of { diskPath, modelName, clipLModel, ... } + config.
All examples and tests updated to new constructor shape.
Add optional backendDevice (cpu | gpu) to completion and embed stats. Maps through from addon RuntimeStats to SDK response schemas. Depends on addon PR tetherto#1393 which adds the stat at the C++ layer.
The C++ embed addon converts config map to llama.cpp CLI flags via --key value format. Boolean flags like --no-mmap must be passed with an empty string value so the C++ side emits just --no-mmap (no value). String true was being passed as --no-mmap true which llama.cpp rejects.
Adds embedWithStats() to the SDK client API so consumers can read the embed addon runtime stats (totalTime, tokensPerSecond, totalTokens, backendDevice) without giving up the simple embed() shape that returns just the vectors. Re-exports EmbedStats for typed consumption. Adds unit tests covering backendDevice on both embedStatsSchema and completionStatsSchema, including round-tripping through the response schemas, to lock in the cpu/gpu enum and prevent regressions.
The embed, LLM, and diffusion addons (PRs tetherto#1493, tetherto#1494, tetherto#1496) drop BaseInference + WeightsProvider in favour of composable utilities and expose a single-arg constructor of the form `new Addon({ files, config, logger, opts })` where `files.model` is either a string (diffusion) or an array of absolute paths to all GGUF shards (embed/LLM). Per the addon-loader-abstraction proposal, shard expansion now belongs to the SDK plugin layer rather than the addon. This commit: - Adds `expandGGUFIntoShards()` in `server/utils`, mirroring the C++ `GGUFShards::expandGGUFIntoShards` regex contract. It is pure string manipulation so it works under both bun (unit tests) and bare (production), and handles POSIX and Windows separators. - Rewrites `llamacpp-embedding`, `llamacpp-completion`, and `sdcpp-generation` plugins to call the new constructor. FilesystemDL and the loader-adapter shim are dropped from these three plugins — they no longer create or pass a loader, returning `{ model, loader: undefined }` (the SD plugin already followed this pattern). The diffusion artifact keys are renamed `clipLModel`/`t5XxlModel`/etc. → `clipL`/`t5Xxl`/etc. to match the new `DiffusionFiles` shape. - Pins `@qvac/embed-llamacpp`, `@qvac/llm-llamacpp`, and `@qvac/diffusion-cpp` to `file:../...` so the SDK consumes the in-monorepo refactored addons until the addon PRs land and publish versioned releases. These pins must be bumped back to semver ranges before merging. - Adds a unit test for `expandGGUFIntoShards` covering non-sharded paths, sharded paths supplied at any index, nested directories, single-shard sharded models, relative paths, and Windows separators. `@qvac/dl-filesystem` stays in `dependencies` because OCR, parakeet, nmtcpp, and whispercpp plugins still rely on it.
…oval The extensive JSDoc documenting every run() parameter (prompt, steps, width, height, guidance, cfg_scale, sampling_method, scheduler, seed, batch_count, vae_tiling, cache_preset, init_image, strength) was accidentally removed during the BaseInference removal refactor when run() was split into run() + _runInternal(). Restore it on the public run() method since that is the caller-facing contract.
…oval The JSDoc documenting run()'s prompt and runOptions parameters was accidentally removed during the BaseInference removal refactor when run() was split into run() + _runInternal(). Restore it on the public run() method, and reference the full RunOptions type (which already documents prefill / generationParams / cacheKey / saveCacheToDisk in index.d.ts) so the docs stay authoritative in one place.
…-interface-refactor
…n-interface-refactor
…erface-refactor # Conflicts: # packages/qvac-lib-infer-llamacpp-llm/test/integration/afriquegemma-translation.test.js
…ckend-device-stat # Conflicts: # packages/sdk/bun.lock # packages/sdk/client/api/index.ts # packages/sdk/index.ts # packages/sdk/package.json
…n-interface-refactor # Conflicts: # packages/sdk/bun.lock # packages/sdk/client/api/index.ts # packages/sdk/index.ts # packages/sdk/package.json
…don-interface-refactor # Conflicts: # packages/sdk/bun.lock # packages/sdk/package.json
…to SDK root Address PR tetherto#1495 review (Opanin): - Revert SDK version bump (0.9.0 → 0.8.3) — version is handled during release. - Revert SDK CHANGELOG entry — handled during release. - Drop duplicate `export type { EmbedStats }` from client/api/embed.ts; the sdk/index.ts re-export (sourced from ./schemas) is sufficient and matches the precedent of CompletionStats / DiffusionStats. - Move `type EmbedStats` in sdk/index.ts from the `./client/api` block to the `./schemas` block so all stat types have one consistent home.
…don-interface-refactor
The afriquegemma-edge-cases.test.js file came in via the upstream/main
merge but still used the pre-refactor constructor shape:
new LlmLlamacpp({ loader, modelName, diskPath, ... }, config)
with a FilesystemDL loader. All 7 tests in the file are now migrated to:
new LlmLlamacpp({ files: { model: [path.join(dirPath, modelName)] },
config, logger, opts })
Removed FilesystemDL import and all loader.close() calls. Added
isMobile skip flag matching the pattern in afriquegemma-translation.
Caught by the qvac-staff-code-reviewer agent as a "merge brought in a
new consumer of the old API" — restore-the-class issue across the family.
…n-interface-refactor
…rn on unknown events, revert AddonCtor rename Address review findings from the qvac-staff-code-reviewer agent: - CHANGELOG: document the capability loss (in-memory streaming from network sources — URLs, Hyperdrive — is no longer supported). Matches the LLM addon's CHANGELOG and is what the PR description promised but the CHANGELOG was missing. - index.js: extract pickPrimaryGgufPath(files) as a named function with JSDoc (mirrors packages/qvac-lib-infer-llamacpp-llm/index.js precedent). Export via module.exports.pickPrimaryGgufPath for unit testing. - test/unit/pick-primary-gguf-path.test.js: 4 unit tests documenting the tensors.txt-first ordering contract and single-file fallback, matching the LLM addon's test shape. - index.js: unknown addon events now log at warn (not debug). The inline comment already said "reaching this branch indicates a native-layer change worth surfacing" — debug level was hiding exactly what we want to surface. - CHANGELOG: update the Bug Fixes entry to reflect the warn level. - benchmarks: revert the AddonCtor → addonCtor casing rename. PascalCase for a constructor reference is the conventional JS style, and the rename was a while-I'm-here change unrelated to the refactor.
Address review findings from the qvac-staff-code-reviewer agent:
- CHANGELOG quoted a fabricated error message ("must be an absolute path
to the main model weights") that the code does not throw. Replace with
the actual messages emitted by assertAbsolute(): "must be an absolute
path string" and "must be an absolute path (got: <value>)". Note that
the same validation applies to the optional companion fields.
- index.js: remove `this._files.model || ''` fallbacks in _load(). The
constructor's assertAbsolute('model', files.model) already guarantees
a non-empty absolute string, so the fallbacks are unreachable and
encode a phantom contract (empty model path) that can never hold.
…don-interface-refactor
…-interface-refactor
…ttern)
Per the thread review, Yury pushed back on the earlier throw behavior:
- A second load() is always accidental — SDK has no code path that
calls load twice on the same instance.
- If it's always accidental, throwing forces SDK to try/catch and
swallow, which is wasted ceremony vs. silent no-op.
- Aligns with the ReadyResource pattern QVAC already uses elsewhere.
Change: `load()` on an already-loaded instance now returns immediately
instead of throwing. CHANGELOG updated to document idempotency.
Callers that intentionally want to swap weights still must call
unload() first.
Second load() on an already-loaded instance returns immediately instead of throwing. Matches the ReadyResource pattern used elsewhere in QVAC: open/load is idempotent; explicit unload() is required to swap weights. CHANGELOG updated.
Second load() on an already-loaded instance returns immediately instead of throwing. Matches the ReadyResource pattern used elsewhere in QVAC: open/load is idempotent; explicit unload() is required to swap weights. CHANGELOG updated.
…don-interface-refactor
…n-interface-refactor
…-interface-refactor
… shape
The SDK's embed() now returns `{ embedding, stats? }` instead of raw
vectors. Update the SDKModule interface to match, and destructure the
result inside sdkEmbed() so the CLI's internal `number[] | number[][]`
contract is preserved — callers of sdkEmbed (notably the OpenAI
embeddings route) stay unchanged.
…don-interface-refactor
…erface-refactor # Conflicts: # packages/lib-infer-diffusion/CHANGELOG.md # packages/lib-infer-diffusion/README.md # packages/lib-infer-diffusion/index.d.ts # packages/lib-infer-diffusion/index.js # packages/lib-infer-diffusion/package.json # packages/sdk/bun.lock # packages/sdk/package.json
…erface-refactor # Conflicts: # packages/lib-infer-diffusion/CHANGELOG.md # packages/lib-infer-diffusion/addon.js # packages/lib-infer-diffusion/index.d.ts # packages/lib-infer-diffusion/index.js # packages/lib-infer-diffusion/package.json # packages/sdk/bun.lock # packages/sdk/package.json
The three refactored addons are now published: @qvac/embed-llamacpp@0.14.0, @qvac/llm-llamacpp@0.16.0, @qvac/diffusion-cpp@0.3.0. Move SDK dependency ranges to the same caret style used by every other @qvac/* dep so patch uptake is automatic and `bun install --frozen-lockfile` resolves through the npm registry. bun.lock regenerated; no file:../ links remain.
Contributor
Author
|
Superseded by #1688. This branch carried intermediate merge commits from the three addon-refactor side-branches (#1493, #1494, #1496). After each refactor merged to main via its own PR, those commits became redundant and the branch went into permanent conflict with #1688 re-raises the SDK migration on a clean branch cut directly from current |
Contributor
Tier-based Approval Status |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
🎯 What problem does this PR solve?
The addon interface refactors landed on main without the matching SDK plugin migration:
The SDK plugins (
llamacpp-completion,llamacpp-embedding,sdcpp-generation) were still instantiating the addons with the old two-argument shape (new Addon({ loader, diskPath, modelName, ... }, config)), which the refactored addons no longer accept.📝 How does it solve it?
{ files, config, logger, opts }argument,{ model, loader: undefined }return.packages/sdk/server/utils/expand-gguf-shards.tsturns a single sharded GGUF path into the ordered list the addon'sfiles.modelexpects (metadata.tensors.txtfirst, then each shard). Pure string manipulation, no filesystem access, cross-platform separator handling.FilesystemDL,parseModelPath, and theasLoaderadapter from the migrated plugins — addons now take absolute paths directly. (nmtcpp-translationandparakeet-transcriptionstill use the old pattern until their own refactors land; their imports are unaffected.)@qvac/embed-llamacpp@^0.14.0,@qvac/llm-llamacpp@^0.16.0,@qvac/diffusion-cpp@^0.3.0.🧪 How was it tested?
bun run build— lint + typecheck clean (--max-warnings=0).bun run test:unit— newexpand-gguf-shards.test.tspasses (8/8 tests, 28/28 asserts).bun install --frozen-lockfileresolves cleanly; nofile:../links remain inbun.lock.packages/cli/andpackages/sdk/tests-qvac/grepped for stale references to old addon shapes — none.🔌 API Changes
No public SDK API change. Plugin internals only.
💥 Breaking Changes
None for SDK consumers. Internal plugin contract with the native addons is the only thing that changed, and that contract was already broken by the addon-side refactors.