chore[bc]: embed addon interface refactor - remove BaseInference and WeightsProvider#1493
Merged
gianni-cor merged 41 commits intoApr 17, 2026
Merged
Conversation
…embed addon
Replace class inheritance with composable utilities from @qvac/infer-base@0.4.0:
- createJobHandler() for single-job lifecycle management
- exclusiveRunQueue() for run serialization
- Direct shard streaming via bare-fs instead of WeightsProvider
Constructor now takes { files: { model: string[] }, config, logger, opts }
instead of { loader, diskPath, modelName }.
Remove FilesystemDL dependency, use files: { model: [...] } pattern.
The C++ embed addon converts config map to llama.cpp CLI flags via --key value format. Boolean flags like --no-mmap must be passed with an empty string value so the C++ side emits just --no-mmap (no value). String true was being passed as --no-mmap true which llama.cpp rejects.
donriddo
added a commit
to donriddo/qvac
that referenced
this pull request
Apr 10, 2026
The embed, LLM, and diffusion addons (PRs tetherto#1493, tetherto#1494, tetherto#1496) drop BaseInference + WeightsProvider in favour of composable utilities and expose a single-arg constructor of the form `new Addon({ files, config, logger, opts })` where `files.model` is either a string (diffusion) or an array of absolute paths to all GGUF shards (embed/LLM). Per the addon-loader-abstraction proposal, shard expansion now belongs to the SDK plugin layer rather than the addon. This commit: - Adds `expandGGUFIntoShards()` in `server/utils`, mirroring the C++ `GGUFShards::expandGGUFIntoShards` regex contract. It is pure string manipulation so it works under both bun (unit tests) and bare (production), and handles POSIX and Windows separators. - Rewrites `llamacpp-embedding`, `llamacpp-completion`, and `sdcpp-generation` plugins to call the new constructor. FilesystemDL and the loader-adapter shim are dropped from these three plugins — they no longer create or pass a loader, returning `{ model, loader: undefined }` (the SD plugin already followed this pattern). The diffusion artifact keys are renamed `clipLModel`/`t5XxlModel`/etc. → `clipL`/`t5Xxl`/etc. to match the new `DiffusionFiles` shape. - Pins `@qvac/embed-llamacpp`, `@qvac/llm-llamacpp`, and `@qvac/diffusion-cpp` to `file:../...` so the SDK consumes the in-monorepo refactored addons until the addon PRs land and publish versioned releases. These pins must be bumped back to semver ranges before merging. - Adds a unit test for `expandGGUFIntoShards` covering non-sharded paths, sharded paths supplied at any index, nested directories, single-shard sharded models, relative paths, and Windows separators. `@qvac/dl-filesystem` stays in `dependencies` because OCR, parakeet, nmtcpp, and whispercpp plugins still rely on it.
maxim-smotrov
requested changes
Apr 16, 2026
maxim-smotrov
left a comment
Contributor
There was a problem hiding this comment.
Please fix the mermaid diagrams syntax.
…ailed.md
architecture.md:
- Line 108: `BARE[Bare Runtime<br/>(bare-fs)]` — unquoted parens in a
`graph` node label triggered mermaid's circle-shape interpretation.
Wrap the label in double quotes.
- Line 157: classDiagram constructor signature used an inline
destructured-object literal (`{ files, config, logger, opts }`),
and { } are class-body delimiters. Replace with the canonical named
type `GGMLBertArgs` from index.d.ts.
data-flows-detailed.md:
- Line 340: flowchart node label `ArrayPath[type: 'sequences'<br/>input: string[]]`
— the nested `[]` on `string[]` collided with the outer node-label
brackets. Wrap the label in double quotes; do the same for the
sibling StringPath node.
- Line 354: edge label `-->|vector<string>|` — unquoted angle brackets
rendered as HTML. Quote the label and escape the angle brackets.
Reported by maxim-smotrov.
…nterface-refactor
The run-lint-and-unit-tests action runs `npm run lint` and `npm run test:unit`. The step name "Run JavaScript tests" hides the lint half. Rename to "Run lint and unit tests" and update the step id accordingly.
…d API architecture.md: - Bump package version header to v0.14.0. - Bump `@qvac/infer-base` min version in the dependency table to ^0.4.0 to match package.json. - Update the Exclusive Run Queue decision to reflect that `load()` is now serialized alongside `run()` and `unload()`. data-flows-detailed.md: - Drop the `jobId` argument from the `jsOutputCallback` and `outputCb` calls in the two inference-flow sequence diagrams. The refactor dropped `jobId` from the `BertInterface` outputCb signature; the diagrams were still showing the pre-refactor shape.
Restore three pieces of documentation that were deleted when the BaseInference removal was applied but whose content is still accurate against the refactored code: - Class-level JSDoc on `GGMLBert` describing what the class does. - Inline comment in `_runInternal` explaining array-vs-string input routing (`type: 'sequences'` vs `type: 'text'`). - Inline comment noting the addon-cpp accept contract (no events fire until `runJob` returns true). - Short one-line JSDoc on `cancel()` and `unload()`.
Declaration-file JSDoc surfaces in IDE hover tooltips, so multi-paragraph prose is noise. Trim `pickPrimaryGgufPath` to a one-liner covering the only behavior the type hover needs to convey. `loadWeights` in addon.js had a paragraph explaining JsBlobsStream.hpp and field-name load-bearing-ness. The pre-refactor JSDoc was a plain list of params; the field-name rationale belongs in commit history / architecture doc, not hovering on every caller. Trim to the original style with updated field names (`chunk` not `contents`). architecture.md and data-flows-detailed.md: bump Last Updated from 2026-04-07 to 2026-04-16 since both were updated on this branch.
Tighten comments this PR introduced that drifted into over-explanation. Leave pre-existing comments as-is. - addon.js mapAddonEvent JSDoc: drop the multi-paragraph prose; keep the one-sentence contract plus the param block. - addon.js stats-detection inline comment: reduce to one line. - index.js pickPrimaryGgufPath JSDoc: replace multi-paragraph prose with a single-line summary citing the C++ contract. - index.js class header: one-line purpose statement. - index.js constructor: collapse the cancel-closure rationale to one line. - index.js _addonOutputCallback: drop the narration comment pointing at addon.js — the very next line imports and calls mapAddonEvent, so the code already makes the reader know where event mapping lives. - index.js unknown-event handler: reduce to two lines.
The refactor commit silently dropped three info logs from _load()
('Creating addon with configuration', 'Activating addon') and _createAddon()
('Creating Bert interface with configuration'), plus the JSDoc block on
_createAddon(). Put them back so the refactor only changes what needs to
change.
BertInterface.loadWeights destructure signature {filename, chunk,
completed} collided with the classDiagram class-body braces. Replace
with the scalar 'data' param so mermaid parses the block.
The 'test' alias was redundant with 'test:all' (which already runs test:unit, test:integration, and test:cpp directly) and had no workflow or README caller. Drop it.
_createAddon() JSDoc said 'configurationParams.settings' but the field is actually 'config' (same pre-existing typo on main; restoring it carried the typo forward). Two spots in docs/architecture.md still said the exclusive run queue serialises only run()/unload() even though load() now also wraps in this._run(...); align both.
The usage example set ctx_size: '512' but the config table below listed every other key and omitted it, leaving readers without a description of what the value controls or its default. Add a row for ctx_size so the example keys are all explained.
jesusmb1995
previously approved these changes
Apr 17, 2026
maxim-smotrov
previously approved these changes
Apr 17, 2026
Inner [] inside the quoted flowchart node label ('input: string[]')
tripped the mermaid parser on GitHub's renderer. Swap for plain English
so the diagram renders.
yuranich
previously approved these changes
Apr 17, 2026
gianni-cor
reviewed
Apr 17, 2026
_createAddon() was outside the try so a synchronous throw in
require('./binding') or binding.createInstance() would leave
this.addon set to a partial native handle and never reach the
cleanup path. Route addon construction through the same try the
shard-streaming and activate() calls use.
data-flows-detailed.md:80 advertised the regex as '^(.+)-(\d+)-of-(\d+)\.gguf$' but index.js:19 uses the simpler non-capturing form '/-\d+-of-\d+\.gguf$/'. Update the doc to match the code so readers who copy the regex get the real contract.
Contributor
Author
|
/review |
maxim-smotrov
approved these changes
Apr 17, 2026
dev-nid
approved these changes
Apr 17, 2026
jesusmb1995
approved these changes
Apr 17, 2026
Contributor
|
/review |
gianni-cor
approved these changes
Apr 17, 2026
Contributor
|
/review |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
🎯 What problem does this PR solve?
BaseInferenceand uses theWeightsProviderdownload layer, so the constructor always needs aLoadereven when model files are already on disk.index.js, spreading bridge logic across the class.📝 How does it solve it?
BaseInferenceinheritance; composecreateJobHandler()+exclusiveRunQueue()from@qvac/infer-base@^0.4.0directly.{ files: { model: string[] }, config?, logger?, opts? }with pre-resolved absolute paths. For sharded models the caller passes the full ordered list (<base>.tensors.txtfirst, then each shard).bare-fs.createReadStreamdirectly (noWeightsProvider).addon.js(mapAddonEvent) so the binding wrapper owns the C++ event vocabulary; the JS class only sees logical events (Output,Error,JobEnded).TypeError('files.model must be a non-empty array of absolute paths')for missing/empty input.run()-before-load()guard: throwsError('Addon not initialized. Call load() first.')instead of dereferencingnull.load()on an already-loaded instance is now a silent no-op (ReadyResource pattern). Callunload()first if you intend to swap weights, thenload()again.load()is serialized through the sameexclusiveRunQueueasrun()andunload(), so two concurrentload()calls on the same instance no longer race past theconfigLoadedguard and leak a native handle.files.modelentry withpath.isAbsolute(). Relative paths are rejected at construction time instead of bubbling up frombare-fsor the native load.this.addon.unload()clearsthis.addonandstate.configLoadedso post-unload calls hit the guards instead of dereferencing a disposed native handle.@qvac/infer-base^0.2.2→^0.4.0,bare-fsmoved to runtime dependencies,@qvac/dl-filesystemand@qvac/dl-hyperdriveremoved.addon.jsJSDoc trimmed of internal task-doc references.🧪 How was it tested?
npm run lint(standard) andnpm run test:dtsclean.test/unit/map-addon-event.test.js— 9 tests covering stats detection,backendDevicemapping (0 → cpu, 1 → gpu), Error/Embeddings event names, stats precedence over event name, and null-returns for unknown shapes.💥 Breaking Changes
BEFORE (≤ 0.13.x):
AFTER (0.14.0):
Additional behavior changes:
Loaderabstraction.load()is a silent no-op (ReadyResource pattern). Callunload()first if you intend to swap weights.downloadWeights, theLoaderinterface, and loader-based progress callbacks are gone.🔌 API Changes