Skip to content

chore[bc]: embed addon interface refactor - remove BaseInference and WeightsProvider#1493

Merged
gianni-cor merged 41 commits into
tetherto:mainfrom
donriddo:chore/embed-addon-interface-refactor
Apr 17, 2026
Merged

chore[bc]: embed addon interface refactor - remove BaseInference and WeightsProvider#1493
gianni-cor merged 41 commits into
tetherto:mainfrom
donriddo:chore/embed-addon-interface-refactor

Conversation

@donriddo

@donriddo donriddo commented Apr 9, 2026

Copy link
Copy Markdown
Contributor

🎯 What problem does this PR solve?

  • Embed addon extends BaseInference and uses the WeightsProvider download layer, so the constructor always needs a Loader even when model files are already on disk.
  • Native C++ event vocabulary is normalized inline in index.js, spreading bridge logic across the class.

📝 How does it solve it?

  • Remove BaseInference inheritance; compose createJobHandler() + exclusiveRunQueue() from @qvac/infer-base@^0.4.0 directly.
  • Constructor now takes { files: { model: string[] }, config?, logger?, opts? } with pre-resolved absolute paths. For sharded models the caller passes the full ordered list (<base>.tensors.txt first, then each shard).
  • Shard streaming via bare-fs.createReadStream directly (no WeightsProvider).
  • Extract native event normalization into addon.js (mapAddonEvent) so the binding wrapper owns the C++ event vocabulary; the JS class only sees logical events (Output, Error, JobEnded).
  • Constructor validation: throws TypeError('files.model must be a non-empty array of absolute paths') for missing/empty input.
  • run()-before-load() guard: throws Error('Addon not initialized. Call load() first.') instead of dereferencing null.
  • load() on an already-loaded instance is now a silent no-op (ReadyResource pattern). Call unload() first if you intend to swap weights, then load() again.
  • load() is serialized through the same exclusiveRunQueue as run() and unload(), so two concurrent load() calls on the same instance no longer race past the configLoaded guard and leak a native handle.
  • Constructor now validates each files.model entry with path.isAbsolute(). Relative paths are rejected at construction time instead of bubbling up from bare-fs or the native load.
  • Crash-safe shard streaming: partial load failure best-effort-unloads and nulls this.addon.
  • unload() clears this.addon and state.configLoaded so post-unload calls hit the guards instead of dereferencing a disposed native handle.
  • Unknown addon events are now logged at warn level (these indicate a native-layer change worth surfacing), not forwarded to the response output stream.
  • Dependency changes: @qvac/infer-base ^0.2.2^0.4.0, bare-fs moved to runtime dependencies, @qvac/dl-filesystem and @qvac/dl-hyperdrive removed.
  • README and docs updated; addon.js JSDoc trimmed of internal task-doc references.

🧪 How was it tested?

  • npm run lint (standard) and npm run test:dts clean.
  • New unit test test/unit/map-addon-event.test.js — 9 tests covering stats detection, backendDevice mapping (0 → cpu, 1 → gpu), Error/Embeddings event names, stats precedence over event name, and null-returns for unknown shapes.
  • Integration tests and examples migrated to the new constructor shape; all pass locally.

💥 Breaking Changes

BEFORE (≤ 0.13.x):

const FilesystemDL = require('@qvac/dl-filesystem')
const loader = new FilesystemDL({ dirPath: '/models' })
const model = new GGMLBert({
  loader,
  modelName: 'bge-small-en-v1.5-q4_0.gguf',
  diskPath: '/models',
  logger: console,
  opts: { stats: true }
}, { device: 'gpu', batch_size: '512' })

AFTER (0.14.0):

const model = new GGMLBert({
  files: { model: ['/models/bge-small-en-v1.5-q4_0.gguf'] },
  config: { device: 'gpu', batch_size: '512' },
  logger: console,
  opts: { stats: true }
})

Additional behavior changes:

  • In-memory streaming from network sources is no longer supported — the SDK does not use it today (stores models to disk first); can be re-added via a future loader abstraction. Previously possible through the Loader abstraction.
  • Second load() is a silent no-op (ReadyResource pattern). Call unload() first if you intend to swap weights.
  • downloadWeights, the Loader interface, and loader-based progress callbacks are gone.

🔌 API Changes

// Constructor: single object with files.model (string array of absolute paths)
// Removed: downloadWeights(), Loader interface, WeightsProvider, BaseInference inheritance
// Removed deps: @qvac/dl-filesystem (devDep), @qvac/dl-hyperdrive (peerDep)
// Kept: load(), run(), cancel(), unload()
// Added: getState()

donriddo added 2 commits April 9, 2026 06:17
…embed addon

Replace class inheritance with composable utilities from @qvac/infer-base@0.4.0:
- createJobHandler() for single-job lifecycle management
- exclusiveRunQueue() for run serialization
- Direct shard streaming via bare-fs instead of WeightsProvider

Constructor now takes { files: { model: string[] }, config, logger, opts }
instead of { loader, diskPath, modelName }.
Remove FilesystemDL dependency, use files: { model: [...] } pattern.
The C++ embed addon converts config map to llama.cpp CLI flags via
--key value format. Boolean flags like --no-mmap must be passed with
an empty string value so the C++ side emits just --no-mmap (no value).
String true was being passed as --no-mmap true which llama.cpp rejects.
@donriddo donriddo marked this pull request as ready for review April 10, 2026 09:50
@donriddo donriddo requested review from a team as code owners April 10, 2026 09:50
donriddo added a commit to donriddo/qvac that referenced this pull request Apr 10, 2026
The embed, LLM, and diffusion addons (PRs tetherto#1493, tetherto#1494, tetherto#1496) drop
BaseInference + WeightsProvider in favour of composable utilities and
expose a single-arg constructor of the form
`new Addon({ files, config, logger, opts })` where `files.model` is
either a string (diffusion) or an array of absolute paths to all GGUF
shards (embed/LLM). Per the addon-loader-abstraction proposal, shard
expansion now belongs to the SDK plugin layer rather than the addon.

This commit:

- Adds `expandGGUFIntoShards()` in `server/utils`, mirroring the C++
  `GGUFShards::expandGGUFIntoShards` regex contract. It is pure string
  manipulation so it works under both bun (unit tests) and bare
  (production), and handles POSIX and Windows separators.

- Rewrites `llamacpp-embedding`, `llamacpp-completion`, and
  `sdcpp-generation` plugins to call the new constructor. FilesystemDL
  and the loader-adapter shim are dropped from these three plugins —
  they no longer create or pass a loader, returning
  `{ model, loader: undefined }` (the SD plugin already followed this
  pattern). The diffusion artifact keys are renamed
  `clipLModel`/`t5XxlModel`/etc. → `clipL`/`t5Xxl`/etc. to match the
  new `DiffusionFiles` shape.

- Pins `@qvac/embed-llamacpp`, `@qvac/llm-llamacpp`, and
  `@qvac/diffusion-cpp` to `file:../...` so the SDK consumes the
  in-monorepo refactored addons until the addon PRs land and publish
  versioned releases. These pins must be bumped back to semver ranges
  before merging.

- Adds a unit test for `expandGGUFIntoShards` covering non-sharded
  paths, sharded paths supplied at any index, nested directories,
  single-shard sharded models, relative paths, and Windows
  separators.

`@qvac/dl-filesystem` stays in `dependencies` because OCR, parakeet,
nmtcpp, and whispercpp plugins still rely on it.

@maxim-smotrov maxim-smotrov left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix the mermaid diagrams syntax.

Comment thread packages/qvac-lib-infer-llamacpp-embed/docs/architecture.md Outdated
Comment thread packages/qvac-lib-infer-llamacpp-embed/docs/architecture.md Outdated
Comment thread packages/qvac-lib-infer-llamacpp-embed/docs/data-flows-detailed.md Outdated
donriddo added 12 commits April 16, 2026 13:40
…ailed.md

architecture.md:
- Line 108: `BARE[Bare Runtime<br/>(bare-fs)]` — unquoted parens in a
  `graph` node label triggered mermaid's circle-shape interpretation.
  Wrap the label in double quotes.
- Line 157: classDiagram constructor signature used an inline
  destructured-object literal (`{ files, config, logger, opts }`),
  and { } are class-body delimiters. Replace with the canonical named
  type `GGMLBertArgs` from index.d.ts.

data-flows-detailed.md:
- Line 340: flowchart node label `ArrayPath[type: 'sequences'<br/>input: string[]]`
  — the nested `[]` on `string[]` collided with the outer node-label
  brackets. Wrap the label in double quotes; do the same for the
  sibling StringPath node.
- Line 354: edge label `-->|vector<string>|` — unquoted angle brackets
  rendered as HTML. Quote the label and escape the angle brackets.

Reported by maxim-smotrov.
The run-lint-and-unit-tests action runs `npm run lint` and
`npm run test:unit`. The step name "Run JavaScript tests" hides
the lint half. Rename to "Run lint and unit tests" and update the
step id accordingly.
…d API

architecture.md:
- Bump package version header to v0.14.0.
- Bump `@qvac/infer-base` min version in the dependency table to ^0.4.0
  to match package.json.
- Update the Exclusive Run Queue decision to reflect that `load()` is
  now serialized alongside `run()` and `unload()`.

data-flows-detailed.md:
- Drop the `jobId` argument from the `jsOutputCallback` and `outputCb`
  calls in the two inference-flow sequence diagrams. The refactor dropped
  `jobId` from the `BertInterface` outputCb signature; the diagrams were
  still showing the pre-refactor shape.
Restore three pieces of documentation that were deleted when the
BaseInference removal was applied but whose content is still accurate
against the refactored code:

- Class-level JSDoc on `GGMLBert` describing what the class does.
- Inline comment in `_runInternal` explaining array-vs-string input
  routing (`type: 'sequences'` vs `type: 'text'`).
- Inline comment noting the addon-cpp accept contract (no events fire
  until `runJob` returns true).
- Short one-line JSDoc on `cancel()` and `unload()`.
Declaration-file JSDoc surfaces in IDE hover tooltips, so multi-paragraph
prose is noise. Trim `pickPrimaryGgufPath` to a one-liner covering the
only behavior the type hover needs to convey.

`loadWeights` in addon.js had a paragraph explaining JsBlobsStream.hpp
and field-name load-bearing-ness. The pre-refactor JSDoc was a plain
list of params; the field-name rationale belongs in commit history /
architecture doc, not hovering on every caller. Trim to the original
style with updated field names (`chunk` not `contents`).

architecture.md and data-flows-detailed.md: bump Last Updated from
2026-04-07 to 2026-04-16 since both were updated on this branch.
Tighten comments this PR introduced that drifted into over-explanation.
Leave pre-existing comments as-is.

- addon.js mapAddonEvent JSDoc: drop the multi-paragraph prose; keep the
  one-sentence contract plus the param block.
- addon.js stats-detection inline comment: reduce to one line.
- index.js pickPrimaryGgufPath JSDoc: replace multi-paragraph prose with
  a single-line summary citing the C++ contract.
- index.js class header: one-line purpose statement.
- index.js constructor: collapse the cancel-closure rationale to one line.
- index.js _addonOutputCallback: drop the narration comment pointing at
  addon.js — the very next line imports and calls mapAddonEvent, so the
  code already makes the reader know where event mapping lives.
- index.js unknown-event handler: reduce to two lines.
The refactor commit silently dropped three info logs from _load()
('Creating addon with configuration', 'Activating addon') and _createAddon()
('Creating Bert interface with configuration'), plus the JSDoc block on
_createAddon(). Put them back so the refactor only changes what needs to
change.
BertInterface.loadWeights destructure signature {filename, chunk,
completed} collided with the classDiagram class-body braces. Replace
with the scalar 'data' param so mermaid parses the block.
The 'test' alias was redundant with 'test:all' (which already runs
test:unit, test:integration, and test:cpp directly) and had no
workflow or README caller. Drop it.
_createAddon() JSDoc said 'configurationParams.settings' but the field
is actually 'config' (same pre-existing typo on main; restoring it
carried the typo forward). Two spots in docs/architecture.md still
said the exclusive run queue serialises only run()/unload() even
though load() now also wraps in this._run(...); align both.
The usage example set ctx_size: '512' but the config table below
listed every other key and omitted it, leaving readers without a
description of what the value controls or its default. Add a row
for ctx_size so the example keys are all explained.
jesusmb1995
jesusmb1995 previously approved these changes Apr 17, 2026
maxim-smotrov
maxim-smotrov previously approved these changes Apr 17, 2026
Inner [] inside the quoted flowchart node label ('input: string[]')
tripped the mermaid parser on GitHub's renderer. Swap for plain English
so the diagram renders.
yuranich
yuranich previously approved these changes Apr 17, 2026
Comment thread packages/qvac-lib-infer-llamacpp-embed/index.js Outdated
_createAddon() was outside the try so a synchronous throw in
require('./binding') or binding.createInstance() would leave
this.addon set to a partial native handle and never reach the
cleanup path. Route addon construction through the same try the
shard-streaming and activate() calls use.
data-flows-detailed.md:80 advertised the regex as
'^(.+)-(\d+)-of-(\d+)\.gguf$' but index.js:19 uses the simpler
non-capturing form '/-\d+-of-\d+\.gguf$/'. Update the doc to match
the code so readers who copy the regex get the real contract.
@donriddo

Copy link
Copy Markdown
Contributor Author

/review

@gianni-cor

Copy link
Copy Markdown
Contributor

/review

@gianni-cor

Copy link
Copy Markdown
Contributor

/review

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants