chore[bc]: LLM addon interface refactor - remove BaseInference and WeightsProvider by donriddo · Pull Request #1494 · tetherto/qvac

donriddo · 2026-04-09T20:24:40Z

🎯 What problem does this PR solve?

LLM addon extends BaseInference and uses the WeightsProvider download layer, so the constructor always needs a Loader even when model files are already on disk.
Primary GGUF path selection (first shard vs single file) was inlined as a one-liner using a module-level SHARD_REGEX global, with no tests.
Native C++ event vocabulary is normalized inline in index.js (including the stateful post-finetune TPS skip flag), spreading bridge logic across the class.

📝 How does it solve it?

Remove BaseInference inheritance; compose createJobHandler() + exclusiveRunQueue() from @qvac/infer-base@^0.4.0 directly.
Constructor now takes { files: { model: string[], projectionModel?: string }, config, logger?, opts? } with pre-resolved absolute paths.
Shard streaming via bare-fs.createReadStream directly (no WeightsProvider).
Extract native event normalization into addon.js (mapAddonEvent), including the skipNextRuntimeStats state used to drop the stale TPS trailer the C++ addon emits after a finetune terminal.
Extract primary-path selection into a named pickPrimaryGgufPath(files) function exported for testing (with unit tests documenting the .tensors.txt-first ordering and single-file fallback).
Constructor validation: throws TypeError('files.model must be a non-empty array of absolute paths') for missing/empty input.
run()-before-load() guard: throws Error('Addon not initialized. Call load() first.') instead of dereferencing null. finetune() already had this guard.
load() on an already-loaded instance is now a silent no-op (ReadyResource pattern). Call unload() first if you intend to swap weights, then load() again.
load() is serialized through the same exclusiveRunQueue as run(), finetune(), and unload(), so two concurrent load() calls on the same instance no longer race past the configLoaded guard and leak a native handle.
Constructor now validates each files.model entry with path.isAbsolute(), plus files.projectionModel when supplied. Relative paths are rejected at construction time.
Crash-safe shard streaming: partial load failure best-effort-unloads and nulls this.addon.
unload() clears this.addon and state.configLoaded so post-unload calls hit the guards instead of dereferencing a disposed native handle. pause(), cancel(), and the job-handler cancel closure use optional chaining for the same reason.
Forwards cacheKey and saveCacheToDisk from runOptions into the text message sent to the addon (from upstream/main's 0.15.0 KV cache API simplification).
Dependency changes: @qvac/infer-base ^0.3.0 → ^0.4.0, @qvac/dl-base and @qvac/dl-filesystem removed from devDependencies. bare-fs remains a runtime dependency (already present).
Restored JSDoc on FinetuneValidationSplit.fraction and FinetuneValidationDataset.path that were accidentally dropped during the BaseInference removal merge.
Fixed docs/architecture.md and docs/data-flows-detailed.md: four occurrences incorrectly stated the "last shard" was the primary path; actual code selects the first shard-regex match.
Hardcoded shard filenames in the sharded model-loading integration test (was previously regex-derived).
Removed dead _isSuppressedNoResponseLog filter tied to the old BaseInference._jobToResponse Map; the new architecture cannot emit that message at all, so the wrapped-logger indirection is gone. The user-supplied logger is used directly.

🧪 How was it tested?

npm run lint (standard) and npm run test:dts clean.
31 finetuning unit tests pass after _hasActiveResponse clearing was aligned with the embed pattern (.finally() only; no redundant synchronous clear in _handleAddonOutputEvent).
New unit tests: test/unit/map-addon-event.test.js (9 tests) and test/unit/pick-primary-gguf-path.test.js (4 tests).
All 17 integration test files updated to the new constructor shape.
Examples and benchmarks parse-checked.

💥 Breaking Changes

BEFORE (≤ 0.15.x):

const FilesystemDL = require('@qvac/dl-filesystem')
const loader = new FilesystemDL({ dirPath: '/models' })
const model = new LlmLlamacpp({
  loader,
  modelName: 'Qwen3-1.7B-Q4_0.gguf',
  diskPath: '/models',
  logger: console,
  opts: { stats: true }
}, { ctx_size: '4096', gpu_layers: '99' })

AFTER (0.16.0):

const model = new LlmLlamacpp({
  files: {
    model: ['/models/Qwen3-1.7B-Q4_0.gguf'],
    projectionModel: '/models/mmproj-model-f16.gguf'
  },
  config: { ctx_size: '4096', gpu_layers: '99' },
  logger: console,
  opts: { stats: true }
})

Additional behavior changes:

In-memory streaming from network sources is no longer supported — the SDK does not use it today (stores models to disk first); can be re-added via a future loader abstraction. Previously possible through the Loader abstraction.
Second load() is a silent no-op (ReadyResource pattern). Call unload() first if you intend to swap weights.
downloadWeights, the Loader interface, and loader-based progress callbacks are gone.
The inherited destroy() is removed; callers use unload() instead.

Version bump: 0.15.0 → 0.16.0. Upstream/main took 0.15.0 for the KV cache API simplification; this refactor lands on the next minor to avoid collision.

🔌 API Changes

// Constructor: single object with files.model (string array) and optional files.projectionModel
// Removed: downloadWeights(), Loader interface, WeightsProvider, BaseInference inheritance, destroy()
// Removed deps: @qvac/dl-filesystem, @qvac/dl-base
// Kept: load(), run(), finetune(), pause(), cancel(), unload()
// Added: getState(), pickPrimaryGgufPath() (exported for unit testing)

…LLM addon Replace class inheritance with composable utilities from @qvac/infer-base@0.4.0: - createJobHandler() for single-job lifecycle management - exclusiveRunQueue() for run serialization - Direct shard streaming via bare-fs instead of WeightsProvider Constructor now takes { files: { model: string[], projectionModel?: string }, config, logger, opts } instead of { loader, diskPath, modelName, projectionModel } + config. All finetune, media, and filtered logger functionality preserved.

…t callback FinetuneProgress must call updateStats(data.stats), not updateOutput(data). Finetune terminal JobEnded must call ended(data) as result, not updateStats.

…r shape Update 13 examples and sharded model test to use files: { model: [...] } pattern. Remove FilesystemDL dependency from all examples and tests.

The network loader test used the old loader-based constructor. Rewritten to download shards via HttpDL to disk, then pass absolute paths.

…nstructor

…eRunQueue

… contract

The embed, LLM, and diffusion addons (PRs tetherto#1493, tetherto#1494, tetherto#1496) drop BaseInference + WeightsProvider in favour of composable utilities and expose a single-arg constructor of the form `new Addon({ files, config, logger, opts })` where `files.model` is either a string (diffusion) or an array of absolute paths to all GGUF shards (embed/LLM). Per the addon-loader-abstraction proposal, shard expansion now belongs to the SDK plugin layer rather than the addon. This commit: - Adds `expandGGUFIntoShards()` in `server/utils`, mirroring the C++ `GGUFShards::expandGGUFIntoShards` regex contract. It is pure string manipulation so it works under both bun (unit tests) and bare (production), and handles POSIX and Windows separators. - Rewrites `llamacpp-embedding`, `llamacpp-completion`, and `sdcpp-generation` plugins to call the new constructor. FilesystemDL and the loader-adapter shim are dropped from these three plugins — they no longer create or pass a loader, returning `{ model, loader: undefined }` (the SD plugin already followed this pattern). The diffusion artifact keys are renamed `clipLModel`/`t5XxlModel`/etc. → `clipL`/`t5Xxl`/etc. to match the new `DiffusionFiles` shape. - Pins `@qvac/embed-llamacpp`, `@qvac/llm-llamacpp`, and `@qvac/diffusion-cpp` to `file:../...` so the SDK consumes the in-monorepo refactored addons until the addon PRs land and publish versioned releases. These pins must be bumped back to semver ranges before merging. - Adds a unit test for `expandGGUFIntoShards` covering non-sharded paths, sharded paths supplied at any index, nested directories, single-shard sharded models, relative paths, and Windows separators. `@qvac/dl-filesystem` stays in `dependencies` because OCR, parakeet, nmtcpp, and whispercpp plugins still rely on it.

The comment said "Event-name normalization lives in addon.js (mapAddonEvent)", but the very next line imports and calls mapAddonEvent — the code already tells the reader where event mapping lives. Remove the line so the code speaks for itself.

The refactor commit unintentionally rephrased FinetuneOptions JSDoc lines that the refactor itself did not change. Revert those fields back to main's original wording so the diff only carries structural changes tied to the interface migration.

The refactor commit silently dropped the _load() progress logs ('Creating addon with configuration', 'Activating addon'), the 'Error during model load' error log, and the JSDoc block on _createAddon(). Put them back so the refactor only changes what needs to change.

The 'test' alias was only consumed by 'test:all', and neither was referenced in CI workflows or the README. 'test:all' ran test:unit twice because it called both test:unit and the 'test' alias. Remove 'test' and rewrite 'test:all' to run test:unit, test:integration, and test:cpp directly.

0.15.x still used the old (args, config) constructor shape; the old example applies to any 0.15.x caller, not just 0.14.x. Align the CHANGELOG marker with the PR body.

The backmerge of upstream/main carried a stale 'skip: isMobile' from the pre-refactor translation test into the six new translation tests and the edge-cases migration. Main's a570189 deliberately dropped the mobile skip; restore that intent. The isMobile constant is unused after this and dropped.

_createAddon() JSDoc referenced 'configurationParams.settings' and omitted 'projectionPath'. The actual shape built in _load() is { path, projectionPath, config }; align the JSDoc with that. UserMediaMessage.content widened to Uint8Array | string earlier in this PR but no integration test exercised the string-path branch. Add one elephant-image test that passes the absolute path as message content, exercising the loadMedia(string) path through the JS-to-C++ handoff.

index.js requires('@qvac/logging') at runtime, so it belongs under dependencies, not devDependencies. Previously it worked only because another runtime dep pulled it in transitively — fragile for publish and can break under stricter package managers.

Previous commit 979a070 reworded only my own addition (line 251) but the block still failed at the same position because the surrounding pre-existing message bodies still used ; as a statement separator. Mermaid sequenceDiagram parses ; as end-of-statement, so every message containing it broke the diagram. Replace ; with , or a separator word across all four affected lines (block tetherto#1 lines 251, 256, 266 and block tetherto#2 line 296) so the finetune and pause flow diagrams render on GitHub.

_createAddon() was outside the try so a synchronous throw in require('./binding') or binding.createInstance() would leave this.addon set to a partial native handle and never reach the cleanup path. Route addon construction through the same try the shard-streaming and activate() calls use.

donriddo · 2026-04-17T14:20:34Z

/review

github-actions · 2026-04-17T14:21:05Z

Tier-based Approval Status

**PR Tier:** TIER1

**Current Status:** ✅ APPROVED

**Requirements:**
- 1 Team Member approval ✅ (4/1)
- 1 Team Lead OR Management approval ✅ (2/1)

**Bypass rule:** Triggered (2+ Team Lead approvals (Tier 1 exception)). This PR is approved regardless of tier.

---
*This comment is automatically updated when reviews change.*

gianni-cor · 2026-04-17T16:04:03Z

/review

gianni-cor · 2026-04-17T16:29:14Z

/review

gianni-cor · 2026-04-17T16:43:21Z

/review

donriddo added 4 commits April 9, 2026 06:19

fix: correct FinetuneProgress and finetune terminal handling in outpu…

7494997

…t callback FinetuneProgress must call updateStats(data.stats), not updateOutput(data). Finetune terminal JobEnded must call ended(data) as result, not updateStats.

fix: update all LLM examples and model-loading test to new constructo…

ace7b57

…r shape Update 13 examples and sharded model test to use files: { model: [...] } pattern. Remove FilesystemDL dependency from all examples and tests.

fix: update sharded model test to download shards to disk first

91c2375

The network loader test used the old loader-based constructor. Rewritten to download shards via HttpDL to disk, then pass absolute paths.

donriddo had a problem deploying to release April 9, 2026 20:25 — with GitHub Actions Failure

fix: update LLM benchmark tooling to new constructor shape

502be84

donriddo had a problem deploying to release April 9, 2026 22:16 — with GitHub Actions Failure

fix: update LLM perf benchmark sweep and judge to new constructor shape

b3dc214

donriddo had a problem deploying to release April 10, 2026 03:27 — with GitHub Actions Failure

docs: update LLM README, finetuning, and afriquegemma docs for new co…

a087ffd

…nstructor

donriddo had a problem deploying to release April 10, 2026 04:28 — with GitHub Actions Failure

donriddo added 3 commits April 10, 2026 06:16

fix: update LLM prepare-prompts and verify-prompts to new constructor

ebcf734

fix: update LLM finetuning unit tests to new constructor and exclusiv…

0ada4a5

…eRunQueue

docs: update LLM architecture, data-flows, finetuning, README sharded…

26bb3c0

… contract

donriddo had a problem deploying to release April 10, 2026 05:28 — with GitHub Actions Failure

donriddo marked this pull request as ready for review April 10, 2026 09:50

donriddo requested review from a team as code owners April 10, 2026 09:50

donriddo mentioned this pull request Apr 10, 2026

chore[bc|notask]: migrate SDK plugins to new addon constructor shape #1510

Closed

donriddo added 9 commits April 16, 2026 20:13

doc: correct pre-refactor constructor marker to <= 0.15.x

94e4d27

0.15.x still used the old (args, config) constructor shape; the old example applies to any 0.15.x caller, not just 0.14.x. Align the CHANGELOG marker with the PR body.

maxim-smotrov previously approved these changes Apr 17, 2026

View reviewed changes

Merge branch 'main' into chore/llm-addon-interface-refactor

de3d93e

yuranich previously approved these changes Apr 17, 2026

View reviewed changes

gianni-cor reviewed Apr 17, 2026

View reviewed changes

Comment thread packages/qvac-lib-infer-llamacpp-llm/index.js Outdated

donriddo added 2 commits April 17, 2026 14:58

Merge branch 'main' into chore/llm-addon-interface-refactor

25d37c2

maxim-smotrov approved these changes Apr 17, 2026

View reviewed changes

dev-nid approved these changes Apr 17, 2026

View reviewed changes

jesusmb1995 approved these changes Apr 17, 2026

View reviewed changes

donriddo and others added 2 commits April 17, 2026 17:02

Merge branch 'main' into chore/llm-addon-interface-refactor

24eb15c

Merge branch 'main' into chore/llm-addon-interface-refactor

a1dfe8f

gianni-cor approved these changes Apr 17, 2026

View reviewed changes

JamieBohannaWebDev approved these changes Apr 17, 2026

View reviewed changes

elchiapp approved these changes Apr 17, 2026

View reviewed changes

donriddo mentioned this pull request Apr 21, 2026

QVAC-17434 chore[bc]: migrate SDK plugins to new addon constructor shape #1688

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

chore[bc]: LLM addon interface refactor - remove BaseInference and WeightsProvider#1494

chore[bc]: LLM addon interface refactor - remove BaseInference and WeightsProvider#1494
gianni-cor merged 53 commits into
tetherto:mainfrom
donriddo:chore/llm-addon-interface-refactor

donriddo commented Apr 9, 2026 •

edited

Loading

Uh oh!

Uh oh!

donriddo commented Apr 17, 2026

Uh oh!

github-actions Bot commented Apr 17, 2026 •

edited

Loading

Uh oh!

gianni-cor commented Apr 17, 2026

Uh oh!

gianni-cor commented Apr 17, 2026

Uh oh!

gianni-cor commented Apr 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

8 participants

Conversation

donriddo commented Apr 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🎯 What problem does this PR solve?

📝 How does it solve it?

🧪 How was it tested?

💥 Breaking Changes

🔌 API Changes

Uh oh!

Uh oh!

donriddo commented Apr 17, 2026

Uh oh!

github-actions Bot commented Apr 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Tier-based Approval Status

Uh oh!

gianni-cor commented Apr 17, 2026

Uh oh!

gianni-cor commented Apr 17, 2026

Uh oh!

gianni-cor commented Apr 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

8 participants

donriddo commented Apr 9, 2026 •

edited

Loading

github-actions Bot commented Apr 17, 2026 •

edited

Loading