chore[bc]: LLM addon interface refactor - remove BaseInference and WeightsProvider #1494
Merged
gianni-cor merged 53 commits on Apr 17, 2026
…LLM addon
Replace class inheritance with composable utilities from @qvac/infer-base@0.4.0:
- createJobHandler() for single-job lifecycle management
- exclusiveRunQueue() for run serialization
- Direct shard streaming via bare-fs instead of WeightsProvider
Constructor now takes { files: { model: string[], projectionModel?: string }, config, logger, opts }
instead of { loader, diskPath, modelName, projectionModel } + config.
All finetune, media, and filtered logger functionality preserved.
…t callback FinetuneProgress must call updateStats(data.stats), not updateOutput(data). Finetune terminal JobEnded must call ended(data) as result, not updateStats.
…r shape
Update 13 examples and sharded model test to use files: { model: [...] } pattern.
Remove FilesystemDL dependency from all examples and tests.
The network loader test used the old loader-based constructor. Rewritten to download shards via HttpDL to disk, then pass absolute paths.
donriddo added a commit to donriddo/qvac that referenced this pull request on Apr 10, 2026
The embed, LLM, and diffusion addons (PRs tetherto#1493, tetherto#1494, tetherto#1496) drop BaseInference + WeightsProvider in favour of composable utilities and expose a single-arg constructor of the form `new Addon({ files, config, logger, opts })` where `files.model` is either a string (diffusion) or an array of absolute paths to all GGUF shards (embed/LLM). Per the addon-loader-abstraction proposal, shard expansion now belongs to the SDK plugin layer rather than the addon.
This commit:
- Adds `expandGGUFIntoShards()` in `server/utils`, mirroring the C++ `GGUFShards::expandGGUFIntoShards` regex contract. It is pure string manipulation so it works under both bun (unit tests) and bare (production), and handles POSIX and Windows separators.
- Rewrites `llamacpp-embedding`, `llamacpp-completion`, and `sdcpp-generation` plugins to call the new constructor. FilesystemDL and the loader-adapter shim are dropped from these three plugins: they no longer create or pass a loader, returning `{ model, loader: undefined }` (the SD plugin already followed this pattern). The diffusion artifact keys are renamed `clipLModel`/`t5XxlModel`/etc. → `clipL`/`t5Xxl`/etc. to match the new `DiffusionFiles` shape.
- Pins `@qvac/embed-llamacpp`, `@qvac/llm-llamacpp`, and `@qvac/diffusion-cpp` to `file:../...` so the SDK consumes the in-monorepo refactored addons until the addon PRs land and publish versioned releases. These pins must be bumped back to semver ranges before merging.
- Adds a unit test for `expandGGUFIntoShards` covering non-sharded paths, sharded paths supplied at any index, nested directories, single-shard sharded models, relative paths, and Windows separators.
`@qvac/dl-filesystem` stays in `dependencies` because OCR, parakeet, nmtcpp, and whispercpp plugins still rely on it.
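As a rough illustration of the shard expansion described in this commit, here is a minimal sketch. It is not the SDK's actual `expandGGUFIntoShards()`; it assumes shards follow the `<prefix>-00001-of-00003.gguf` naming that llama.cpp's split tooling produces, and the `SHARD_SUFFIX` regex name is hypothetical. It ignores any companion files the real helper may also return.

```javascript
// Illustrative sketch only; not the SDK's actual `expandGGUFIntoShards`.
// Assumption: shards are named `<prefix>-NNNNN-of-MMMMM.gguf` (five digits each).
const SHARD_SUFFIX = /-(\d{5})-of-(\d{5})\.gguf$/

function expandGGUFIntoShards (modelPath) {
  const match = SHARD_SUFFIX.exec(modelPath)
  // Non-sharded model: the given path is the only file.
  if (match === null) return [modelPath]

  const totalDigits = match[2]
  const total = Number.parseInt(totalDigits, 10)
  // Everything before the suffix, directories included. Path separators are
  // never parsed, so POSIX and Windows paths are handled the same way.
  const prefix = modelPath.slice(0, match.index)

  const shards = []
  for (let i = 1; i <= total; i++) {
    const index = String(i).padStart(5, '0')
    shards.push(`${prefix}-${index}-of-${totalDigits}.gguf`)
  }
  return shards
}
```

Because the function only rewrites the suffix, passing any shard of a model yields the same ordered list.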
The comment said "Event-name normalization lives in addon.js (mapAddonEvent)", but the very next line imports and calls mapAddonEvent — the code already tells the reader where event mapping lives. Remove the line so the code speaks for itself.
The refactor commit unintentionally rephrased FinetuneOptions JSDoc lines that the refactor itself did not change. Revert those fields back to main's original wording so the diff only carries structural changes tied to the interface migration.
The refactor commit silently dropped the _load() progress logs ('Creating
addon with configuration', 'Activating addon'), the 'Error during model
load' error log, and the JSDoc block on _createAddon(). Put them back so
the refactor only changes what needs to change.
The 'test' alias was only consumed by 'test:all', and neither was referenced in CI workflows or the README. 'test:all' ran test:unit twice because it called both test:unit and the 'test' alias. Remove 'test' and rewrite 'test:all' to run test:unit, test:integration, and test:cpp directly.
0.15.x still used the old (args, config) constructor shape; the old example applies to any 0.15.x caller, not just 0.14.x. Align the CHANGELOG marker with the PR body.
The backmerge of upstream/main carried a stale 'skip: isMobile' from the pre-refactor translation test into the six new translation tests and the edge-cases migration. Main's a570189 deliberately dropped the mobile skip; restore that intent. The isMobile constant is unused after this and dropped.
_createAddon() JSDoc referenced 'configurationParams.settings' and
omitted 'projectionPath'. The actual shape built in _load() is
{ path, projectionPath, config }; align the JSDoc with that.
UserMediaMessage.content widened to Uint8Array | string earlier in
this PR but no integration test exercised the string-path branch.
Add one elephant-image test that passes the absolute path as
message content, exercising the loadMedia(string) path through the
JS-to-C++ handoff.
index.js calls require('@qvac/logging') at runtime, so it belongs under
dependencies, not devDependencies. Previously it worked only because
another runtime dep pulled it in transitively — fragile for publish
and can break under stricter package managers.
Previous commit 979a070 reworded only my own addition (line 251) but the block still failed at the same position because the surrounding pre-existing message bodies still used ; as a statement separator. Mermaid sequenceDiagram parses ; as end-of-statement, so every message containing it broke the diagram. Replace ; with , or a separator word across all four affected lines (block #1 lines 251, 256, 266 and block #2 line 296) so the finetune and pause flow diagrams render on GitHub.
maxim-smotrov previously approved these changes on Apr 17, 2026
yuranich previously approved these changes on Apr 17, 2026
gianni-cor reviewed on Apr 17, 2026
_createAddon() was outside the try so a synchronous throw in
require('./binding') or binding.createInstance() would leave
this.addon set to a partial native handle and never reach the
cleanup path. Route addon construction through the same try the
shard-streaming and activate() calls use.
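The failure mode this commit addresses can be sketched as follows. This is a hypothetical reduction, not the addon's code: `loadAddon`, `createAddon`, `activate()`, and `destroy()` are stand-in names, and the real `_load()` also streams shards inside the same `try`.

```javascript
// Hypothetical sketch; `createAddon`, `activate`, and `destroy` are stand-ins,
// not the addon's real binding API.
async function loadAddon (state, createAddon) {
  try {
    // Construction sits inside the try, so a synchronous throw here
    // reaches the same cleanup path as a failed activate().
    state.addon = createAddon()
    await state.addon.activate()
    state.configLoaded = true
  } catch (err) {
    // Reset state first so later calls hit the "not initialized" guards,
    // then release whatever was partially created.
    const partial = state.addon
    state.addon = null
    state.configLoaded = false
    if (partial && typeof partial.destroy === 'function') await partial.destroy()
    throw err
  }
}
```

With construction outside the `try`, the `catch` block would never run for a constructor failure, which is the gap the commit closes.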
maxim-smotrov approved these changes on Apr 17, 2026
dev-nid approved these changes on Apr 17, 2026
jesusmb1995 approved these changes on Apr 17, 2026
gianni-cor approved these changes on Apr 17, 2026
gianni-cor approved these changes on Apr 17, 2026
JamieBohannaWebDev approved these changes on Apr 17, 2026
elchiapp approved these changes on Apr 17, 2026
🎯 What problem does this PR solve?
- … `BaseInference` and uses the `WeightsProvider` download layer, so the constructor always needs a `Loader` even when model files are already on disk.
- … `SHARD_REGEX` global, with no tests.
- … `index.js` (including the stateful post-finetune TPS skip flag), spreading bridge logic across the class.
📝 How does it solve it?
- … `BaseInference` inheritance; compose `createJobHandler()` + `exclusiveRunQueue()` from `@qvac/infer-base@^0.4.0` directly.
- … `{ files: { model: string[], projectionModel?: string }, config, logger?, opts? }` with pre-resolved absolute paths.
- … `bare-fs.createReadStream` directly (no `WeightsProvider`).
- … `addon.js` (`mapAddonEvent`), including the `skipNextRuntimeStats` state used to drop the stale TPS trailer the C++ addon emits after a finetune terminal.
- … `pickPrimaryGgufPath(files)` function exported for testing (with unit tests documenting the `.tensors.txt`-first ordering and single-file fallback).
- … `TypeError('files.model must be a non-empty array of absolute paths')` for missing/empty input.
- `run()`-before-`load()` guard: throws `Error('Addon not initialized. Call load() first.')` instead of dereferencing `null`. `finetune()` already had this guard.
- `load()` on an already-loaded instance is now a silent no-op (ReadyResource pattern). Call `unload()` first if you intend to swap weights, then `load()` again.
- `load()` is serialized through the same `exclusiveRunQueue` as `run()`, `finetune()`, and `unload()`, so two concurrent `load()` calls on the same instance no longer race past the `configLoaded` guard and leak a native handle.
- … `files.model` entry with `path.isAbsolute()`, plus `files.projectionModel` when supplied. Relative paths are rejected at construction time.
- … `this.addon`. `unload()` clears `this.addon` and `state.configLoaded` so post-unload calls hit the guards instead of dereferencing a disposed native handle. `pause()`, `cancel()`, and the job-handler cancel closure use optional chaining for the same reason.
- … `cacheKey` and `saveCacheToDisk` from `runOptions` into the text message sent to the addon (from upstream/main's 0.15.0 KV cache API simplification).
- `@qvac/infer-base` `^0.3.0` → `^0.4.0`, `@qvac/dl-base` and `@qvac/dl-filesystem` removed from `devDependencies`. `bare-fs` remains a runtime dependency (already present).
- … `FinetuneValidationSplit.fraction` and `FinetuneValidationDataset.path` that were accidentally dropped during the `BaseInference` removal merge.
- … `docs/architecture.md` and `docs/data-flows-detailed.md`: four occurrences incorrectly stated the "last shard" was the primary path; actual code selects the first shard-regex match.
- … `_isSuppressedNoResponseLog` filter tied to the old `BaseInference._jobToResponseMap`; the new architecture cannot emit that message at all, so the wrapped-logger indirection is gone. The user-supplied logger is used directly.
🧪 How was it tested?
- `npm run lint` (standard) and `npm run test:dts` clean.
- … `_hasActiveResponse` clearing was aligned with the embed pattern (`.finally()` only; no redundant synchronous clear in `_handleAddonOutputEvent`).
- … `test/unit/map-addon-event.test.js` (9 tests) and `test/unit/pick-primary-gguf-path.test.js` (4 tests).
💥 Breaking Changes
BEFORE (≤ 0.15.x):
AFTER (0.16.0):
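A minimal sketch of the 0.16.0 argument shape and the construction-time checks this PR describes. It assumes Node's `path` module (the addon itself targets bare), and `validateFiles` is a hypothetical helper name. Only the first error message is taken from the PR description; the other two are invented for illustration.

```javascript
const path = require('path')

// Hypothetical validator mirroring the checks described in this PR;
// not the addon's actual source.
function validateFiles (files) {
  const model = files && files.model
  if (!Array.isArray(model) || model.length === 0) {
    // Message quoted from the PR description.
    throw new TypeError('files.model must be a non-empty array of absolute paths')
  }
  for (const entry of model) {
    if (typeof entry !== 'string' || !path.isAbsolute(entry)) {
      // Assumed wording; the real message may differ.
      throw new TypeError(`files.model entry is not an absolute path: ${entry}`)
    }
  }
  const projection = files.projectionModel
  if (projection !== undefined && (typeof projection !== 'string' || !path.isAbsolute(projection))) {
    // Assumed wording; the real message may differ.
    throw new TypeError('files.projectionModel must be an absolute path')
  }
  return files
}

// New shape: every shard is passed as a pre-resolved absolute path.
// The old shape was { loader, diskPath, modelName, projectionModel } plus a separate config.
const args = {
  files: {
    model: ['/models/llm-00001-of-00002.gguf', '/models/llm-00002-of-00002.gguf']
  },
  config: {}
}
validateFiles(args.files)
```

Rejecting relative paths up front means a misconfigured caller fails at construction time rather than partway through shard streaming.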
Additional behavior changes:
- … `Loader` abstraction.
- … `load()` is a silent no-op (ReadyResource pattern). Call `unload()` first if you intend to swap weights.
- `downloadWeights`, the `Loader` interface, and loader-based progress callbacks are gone.
- `destroy()` is removed; callers use `unload()` instead.
Version bump: `0.15.0` → `0.16.0`. Upstream/main took `0.15.0` for the KV cache API simplification; this refactor lands on the next minor to avoid collision.
🔌 API Changes
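The `load()` / `run()` / `unload()` behavior described in this PR can be sketched as follows. This is an illustration under stated assumptions, not the addon's code: `createQueue` is a hypothetical stand-in for `exclusiveRunQueue()` (whose real signature may differ), and `SketchModel` and its `createHandle` factory are invented names.

```javascript
// Hypothetical stand-in for a run queue; the real `exclusiveRunQueue()`
// from `@qvac/infer-base` may have a different signature.
function createQueue () {
  let tail = Promise.resolve()
  return function run (task) {
    const result = tail.then(task)
    // Keep the chain usable even if a task rejects.
    tail = result.catch(() => {})
    return result
  }
}

class SketchModel {
  constructor (createHandle) {
    this._createHandle = createHandle // assumed factory for the native handle
    this._run = createQueue()
    this.addon = null
    this.configLoaded = false
  }

  load () {
    return this._run(async () => {
      // Silent no-op on an already-loaded instance.
      if (this.configLoaded) return
      this.addon = await this._createHandle()
      this.configLoaded = true
    })
  }

  run (input) {
    return this._run(async () => {
      if (!this.addon) throw new Error('Addon not initialized. Call load() first.')
      return this.addon.run(input)
    })
  }

  unload () {
    return this._run(async () => {
      // Clear both fields so later calls hit the guard above.
      this.addon = null
      this.configLoaded = false
    })
  }
}
```

Because every operation goes through the same queue, the second of two concurrent `load()` calls only starts after the first has set `configLoaded`, so it returns without creating a second handle.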