Skip to content
Merged
Show file tree
Hide file tree
Changes from 37 commits
Commits
Show all changes
41 commits
Select commit Hold shift + click to select a range
f014e66
chore[bc]: remove BaseInference inheritance and WeightsProvider from …
donriddo Apr 7, 2026
9dd5690
fix: update embed examples to use new constructor shape
donriddo Apr 9, 2026
a419b58
fix: update embed benchmark tooling to new constructor shape
donriddo Apr 9, 2026
2871e15
fix: pass typed config object to embed addon and restore addonCtor pa…
donriddo Apr 10, 2026
efaf21e
fix: pass no-mmap as empty string flag in embed benchmark config
donriddo Apr 10, 2026
5a32ab7
docs: update embed README and data-flows for new constructor pattern
donriddo Apr 10, 2026
8cf9e0e
docs: update embed architecture, data-flows, README sharded contract
donriddo Apr 10, 2026
975ee3d
fix: drop loader destructuring from embed multi-instance test
donriddo Apr 10, 2026
b12f22a
chore[bc]: address PR #1493 review findings and bump to 0.14.0
donriddo Apr 10, 2026
b213769
refactor: move embed C++ event normalization into addon.js
donriddo Apr 10, 2026
eeba652
docs: address PR #1493 second-round review findings
donriddo Apr 10, 2026
b6bbc60
Merge remote-tracking branch 'origin/main' into chore/embed-addon-int…
donriddo Apr 10, 2026
47b4034
fix: remove internal task-doc reference from mapAddonEvent JSDoc
donriddo Apr 12, 2026
69dae27
fix: accurate README busy-state, log rejected inferences, add mapAddo…
donriddo Apr 14, 2026
8d12649
fix: throw on second load() instead of silently unload+reload
donriddo Apr 14, 2026
96fc198
fix: extract pickPrimaryGgufPath, document network-streaming loss, wa…
donriddo Apr 14, 2026
4d3a303
fix: make load() a silent no-op when already loaded (ReadyResource pa…
donriddo Apr 15, 2026
aaf53c9
Merge remote-tracking branch 'upstream/main' into chore/embed-addon-i…
donriddo Apr 15, 2026
2e1a99e
doc: correct mapAddonEvent return-null contract
donriddo Apr 15, 2026
8374e67
Merge remote-tracking branch 'upstream/main' into chore/embed-addon-i…
donriddo Apr 15, 2026
5736849
doc: document missing breaking changes from BaseInference removal
donriddo Apr 15, 2026
6887423
fix: address lifecycle, validation, and CI-surface review findings
donriddo Apr 16, 2026
90a1af3
doc: add CHANGELOG entries for load() serialization and absolute-path…
donriddo Apr 16, 2026
34f5641
fix[ci]: run test:unit via run-lint-and-unit-tests action
donriddo Apr 16, 2026
c6c2ddc
doc: fix mermaid parsing errors in architecture.md and data-flows-det…
donriddo Apr 16, 2026
ed81d8d
Merge remote-tracking branch 'upstream/main' into chore/embed-addon-i…
donriddo Apr 16, 2026
7644449
chore[ci]: rename step to reflect what the action actually runs
donriddo Apr 16, 2026
7c7366d
doc: refresh architecture.md and data-flows-detailed.md for refactore…
donriddo Apr 16, 2026
bd13da5
doc: restore class-level JSDoc and inline comments the refactor dropped
donriddo Apr 16, 2026
8041a30
doc: tighten d.ts JSDoc, loadWeights JSDoc, doc dates
donriddo Apr 16, 2026
17e2416
doc: trim verbose comments added during the refactor
donriddo Apr 16, 2026
0923dae
doc: restore pre-refactor load/createAddon logs and JSDoc
donriddo Apr 16, 2026
af5daee
doc: fix classDiagram parse error in architecture.md
donriddo Apr 16, 2026
98bffc3
chore: drop unused 'test' script
donriddo Apr 16, 2026
690fe02
doc: fix _createAddon config field and queue serialization scope
donriddo Apr 16, 2026
07820ea
doc: document ctx_size in the embed config table
donriddo Apr 16, 2026
f6bbcfc
doc: fix flowchart mermaid parse error in data-flows
donriddo Apr 17, 2026
3e9886f
fix: move addon construction into crash-safe try block
donriddo Apr 17, 2026
4c3ac1f
doc: align shard regex in data-flows with index.js
donriddo Apr 17, 2026
261f770
Merge branch 'main' into chore/embed-addon-interface-refactor
donriddo Apr 17, 2026
2cb1da9
Merge branch 'main' into chore/embed-addon-interface-refactor
gianni-cor Apr 17, 2026
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 9 additions & 0 deletions .github/workflows/on-pr-qvac-lib-infer-llamacpp-embed.yml
Original file line number Diff line number Diff line change
Expand Up @@ -113,6 +113,15 @@ jobs:
working-directory: packages/qvac-lib-infer-llamacpp-embed
run: npm run test:dts

- name: Run lint and unit tests
id: run_lint_and_unit_tests
uses: tetherto/oss-actions/.github/actions/run-lint-and-unit-tests@4c64bed91fc8eba3a201adb1495e61b4c1a2246d
with:
gpr-token: ${{ secrets.GITHUB_TOKEN }}
pat-token: ${{ secrets.GITHUB_TOKEN }}
registry-type: gpr
workdir: packages/qvac-lib-infer-llamacpp-embed

prebuild:
permissions:
contents: write
Expand Down
3 changes: 3 additions & 0 deletions packages/qvac-lib-infer-llamacpp-embed/.gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,9 @@ package-lock.json
.cache/
.npmrc

test/unit/all.js
test/integration/all.js

__pycache__/
.pytest_cache/
poetry.lock
Expand Down
140 changes: 140 additions & 0 deletions packages/qvac-lib-infer-llamacpp-embed/CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,145 @@
# Changelog

## [0.14.0] - 2026-04-10
Comment thread
maxim-smotrov marked this conversation as resolved.

This release migrates the embed addon off `BaseInference` inheritance and the `WeightsProvider` download layer onto the composable `createJobHandler` + `exclusiveRunQueue` utilities from `@qvac/infer-base@^0.4.0`. The constructor signature is replaced with a single object whose `files.model` field is an ordered array of absolute paths, mirroring the parallel LLM and diffusion addon refactors. This is a breaking change β€” every caller must update.

## Breaking Changes

### Constructor signature: single object with `files`, no `Loader`

`GGMLBert` now takes a single `{ files, config?, logger?, opts? }` object. The old `Loader` + `diskPath` + `modelName` + two-arg `(args, config)` shape is gone β€” callers pre-resolve absolute paths and supply them as `files.model`.

```js
// BEFORE (≀ 0.13.x)
const FilesystemDL = require('@qvac/dl-filesystem')
const loader = new FilesystemDL({ dirPath: '/models' })
const model = new GGMLBert({
loader,
modelName: 'bge-small-en-v1.5-q4_0.gguf',
diskPath: '/models',
logger: console,
opts: { stats: true }
}, { device: 'gpu', batch_size: '512' })

// AFTER (0.14.0)
const model = new GGMLBert({
files: {
model: ['/models/bge-small-en-v1.5-q4_0.gguf']
},
config: { device: 'gpu', batch_size: '512' },
logger: console,
opts: { stats: true }
})
```

For sharded models the caller passes the full ordered list β€” the `<basename>.tensors.txt` companion first, followed by every `<basename>-NNNNN-of-MMMMM.gguf` shard in ascending order:

```js
const model = new GGMLBert({
files: {
model: [
'/models/big-embed-model.tensors.txt',
'/models/big-embed-model-00001-of-00003.gguf',
'/models/big-embed-model-00002-of-00003.gguf',
'/models/big-embed-model-00003-of-00003.gguf'
]
},
config: { device: 'gpu' }
})
```

### `BaseInference` inheritance and `WeightsProvider` removed

`GGMLBert` no longer extends `BaseInference` and no longer touches the `WeightsProvider` download layer. The class composes `createJobHandler` and `exclusiveRunQueue` from `@qvac/infer-base@^0.4.0` directly. Public lifecycle methods (`load` / `run` / `cancel` / `unload` / `getState`) are unchanged in shape, but `downloadWeights` and the loader-based progress callbacks are gone β€” the caller is responsible for placing files on disk before constructing the model.

In-memory streaming from network sources (URLs, Hyperdrive) is no longer supported in the current API. The SDK does not currently use it (models are stored to disk first); this can be re-added when/if the SDK plans to support that feature. Before, it was possible through the `Loader` abstraction.

### Dependency changes

- `@qvac/infer-base` bumped from `^0.2.2` to `^0.4.0`.
- `bare-fs` is now a runtime dependency (used to stream shards from disk).
- `@qvac/dl-filesystem` and `@qvac/dl-hyperdrive` are no longer used by this package and have been removed from `devDependencies` / `peerDependencies`.

### `getState()` returns a narrower shape

`getState()` previously returned `{ configLoaded, weightsLoaded, destroyed }` (the three-field shape inherited from `BaseInference`). It now returns `{ configLoaded }` only. The `weightsLoaded` and `destroyed` fields are gone β€” `weightsLoaded` collapsed into `configLoaded` because the refactored `load()` does both in one step, and `destroyed` is no longer tracked since `unload()` resets `configLoaded` and nulls the addon handle instead. Callers reading `state.weightsLoaded` or `state.destroyed` must switch to `state.configLoaded`.

### Public methods removed from `GGMLBert`

`GGMLBert` previously exposed these methods via `BaseInference` inheritance, all of which are now gone:

- `downloadWeights(onDownloadProgress, opts)` β€” the download layer is removed; the caller places files on disk and passes absolute paths in `files.model`.
- `pause()` / `unpause()` / `stop()` β€” BaseInference job-lifecycle helpers. The refactor uses `createJobHandler` directly; use `cancel()` to terminate an in-flight run.
- `status()` β€” replaced by `getState()` for the static readiness flag; per-job state is observed via the `QvacResponse` returned by `run()`.
- `destroy()` β€” folded into `unload()`, which now both releases native resources and nulls `this.addon`.
- `getApiDefinition()` β€” no longer exposed; consumers should import types from `index.d.ts`.

### `load()` takes no arguments

`load()` previously forwarded `...args` through `BaseInference.load` into embed's `_load(closeLoader, reportProgressCallback)`. Both arguments are gone β€” `closeLoader` is meaningless without a `Loader`, and `reportProgressCallback` is superseded by the caller owning download-and-placement before construction. Call `await model.load()` with no arguments.

### Type exports removed from `index.d.ts`

The following exports are no longer part of the package's public type surface because the loader/download layer they described is gone: `ReportProgressCallback`, `Loader`, `GGMLArgs`, `DownloadWeightsOptions`, `DownloadResult`. TypeScript consumers importing any of these must update to the new `GGMLBertArgs` / `files` shape.

### `BertInterface` `outputCb` signature: `jobId` dropped

The exported `BertInterface` class's constructor still takes `(binding, configurationParams, outputCb)`, but the `outputCb` signature changed:

```ts
// BEFORE
(addon: unknown, event: string, jobId: number, data: unknown, error?: Error) => void
// AFTER
(addon: unknown, event: string, data: unknown, error?: Error) => void
```

The `jobId: number` argument is gone because `createJobHandler` owns the single active job directly; the wrapper no longer needs a per-job identifier in the callback chain. External callers constructing `BertInterface` with a custom `outputCb` must drop the third argument.

### `BertInterface.runJob` return type

`BertInterface.runJob(input)` previously returned `Promise<void>`. It now returns `Promise<boolean>` β€” `true` if the job was accepted, `false` if the addon was already busy. `GGMLBert` uses this return to surface a busy error to the caller instead of silently dropping the job.

## Features

### Constructor input validation

The constructor now throws `TypeError('files.model must be a non-empty array of absolute paths')` when `files` or `files.model` is missing or empty. This produces a clear error for callers porting old code instead of a confusing `Cannot read properties of undefined`.

### `run()`-before-`load()` guard

Calling `run()` before `load()` now throws `Error('Addon not initialized. Call load() first.')` instead of dereferencing `null` and crashing.

### `load()` is now idempotent when already loaded

A second `load()` call on an already-loaded instance is now a silent no-op instead of unloading and reloading. This aligns with the ReadyResource pattern used elsewhere in QVAC and prevents accidental double-loads from triggering expensive work. Callers that intentionally want to swap weights must call `unload()` first (which clears `configLoaded`) and then `load()` again.

### Crash-safe shard streaming

If `_streamShards()` or `addon.activate()` throws mid-load (for example a corrupted shard file or a native init failure), the partially-initialized addon is now best-effort-unloaded and `this.addon` is reset to `null`. A subsequent `load()` call starts cleanly instead of leaking a zombie native instance.

## Bug Fixes

### `unload()` clears the addon reference

`unload()` now sets `this.addon = null` after `await this.addon.unload()`, so post-unload `cancel()` / `run()` calls hit the explicit guards rather than dereferencing a disposed native handle. `cancel()` and the job-handler cancel closure both use optional chaining for the same reason.

### Unknown addon events no longer pollute the output stream

`_addonOutputCallback` previously fed any non-stats / non-error event payload into `response.output`, including unknown events. It now logs unknown events at warn level (these indicate a native-layer change worth surfacing) and only forwards `Embeddings` payloads to the active response.

### `load()` is serialized through the exclusive run queue

`load()` is now routed through the same `exclusiveRunQueue` used by `run()` and `unload()`. Previously two overlapping `load()` calls on the same instance could both pass the `configLoaded` guard before it flipped to `true`, both stream shards into and activate the native addon, and clobber `this.addon` β€” leaking one native handle. Concurrent `load()` on a single instance is now safe.

### Constructor rejects non-absolute path entries

Each entry in `files.model` is now validated with `path.isAbsolute()` (matching the existing error-message contract). Relative paths are rejected at construction time instead of bubbling up from `bare-fs` or the native load.

## Pull Requests

- [#1493](https://github.com/tetherto/qvac/pull/1493) - chore[bc]: embed addon interface refactor β€” remove BaseInference and WeightsProvider

## [0.13.4] - 2026-04-03

### Changed
Expand Down
Loading
Loading