Skip to content

chore[bc]: LLM addon interface refactor - remove BaseInference and WeightsProvider#1494

Merged
gianni-cor merged 53 commits into
tetherto:mainfrom
donriddo:chore/llm-addon-interface-refactor
Apr 17, 2026
Merged

chore[bc]: LLM addon interface refactor - remove BaseInference and WeightsProvider#1494
gianni-cor merged 53 commits into
tetherto:mainfrom
donriddo:chore/llm-addon-interface-refactor

Conversation

@donriddo

@donriddo donriddo commented Apr 9, 2026

Copy link
Copy Markdown
Contributor

🎯 What problem does this PR solve?

  • LLM addon extends BaseInference and uses the WeightsProvider download layer, so the constructor always needs a Loader even when model files are already on disk.
  • Primary GGUF path selection (first shard vs single file) was inlined as a one-liner using a module-level SHARD_REGEX global, with no tests.
  • Native C++ event vocabulary is normalized inline in index.js (including the stateful post-finetune TPS skip flag), spreading bridge logic across the class.

📝 How does it solve it?

  • Remove BaseInference inheritance; compose createJobHandler() + exclusiveRunQueue() from @qvac/infer-base@^0.4.0 directly.
  • Constructor now takes { files: { model: string[], projectionModel?: string }, config, logger?, opts? } with pre-resolved absolute paths.
  • Shard streaming via bare-fs.createReadStream directly (no WeightsProvider).
  • Extract native event normalization into addon.js (mapAddonEvent), including the skipNextRuntimeStats state used to drop the stale TPS trailer the C++ addon emits after a finetune terminal.
  • Extract primary-path selection into a named pickPrimaryGgufPath(files) function exported for testing (with unit tests documenting the .tensors.txt-first ordering and single-file fallback).
  • Constructor validation: throws TypeError('files.model must be a non-empty array of absolute paths') for missing/empty input.
  • run()-before-load() guard: throws Error('Addon not initialized. Call load() first.') instead of dereferencing null. finetune() already had this guard.
  • load() on an already-loaded instance is now a silent no-op (ReadyResource pattern). Call unload() first if you intend to swap weights, then load() again.
  • load() is serialized through the same exclusiveRunQueue as run(), finetune(), and unload(), so two concurrent load() calls on the same instance no longer race past the configLoaded guard and leak a native handle.
  • Constructor now validates each files.model entry with path.isAbsolute(), plus files.projectionModel when supplied. Relative paths are rejected at construction time.
  • Crash-safe shard streaming: partial load failure best-effort-unloads and nulls this.addon.
  • unload() clears this.addon and state.configLoaded so post-unload calls hit the guards instead of dereferencing a disposed native handle. pause(), cancel(), and the job-handler cancel closure use optional chaining for the same reason.
  • Forwards cacheKey and saveCacheToDisk from runOptions into the text message sent to the addon (from upstream/main's 0.15.0 KV cache API simplification).
  • Dependency changes: @qvac/infer-base ^0.3.0^0.4.0, @qvac/dl-base and @qvac/dl-filesystem removed from devDependencies. bare-fs remains a runtime dependency (already present).
  • Restored JSDoc on FinetuneValidationSplit.fraction and FinetuneValidationDataset.path that were accidentally dropped during the BaseInference removal merge.
  • Fixed docs/architecture.md and docs/data-flows-detailed.md: four occurrences incorrectly stated the "last shard" was the primary path; actual code selects the first shard-regex match.
  • Hardcoded shard filenames in the sharded model-loading integration test (was previously regex-derived).
  • Removed dead _isSuppressedNoResponseLog filter tied to the old BaseInference._jobToResponse Map; the new architecture cannot emit that message at all, so the wrapped-logger indirection is gone. The user-supplied logger is used directly.

🧪 How was it tested?

  • npm run lint (standard) and npm run test:dts clean.
  • 31 finetuning unit tests pass after _hasActiveResponse clearing was aligned with the embed pattern (.finally() only; no redundant synchronous clear in _handleAddonOutputEvent).
  • New unit tests: test/unit/map-addon-event.test.js (9 tests) and test/unit/pick-primary-gguf-path.test.js (4 tests).
  • All 17 integration test files updated to the new constructor shape.
  • Examples and benchmarks parse-checked.

💥 Breaking Changes

BEFORE (≤ 0.15.x):

const FilesystemDL = require('@qvac/dl-filesystem')
const loader = new FilesystemDL({ dirPath: '/models' })
const model = new LlmLlamacpp({
  loader,
  modelName: 'Qwen3-1.7B-Q4_0.gguf',
  diskPath: '/models',
  logger: console,
  opts: { stats: true }
}, { ctx_size: '4096', gpu_layers: '99' })

AFTER (0.16.0):

const model = new LlmLlamacpp({
  files: {
    model: ['/models/Qwen3-1.7B-Q4_0.gguf'],
    projectionModel: '/models/mmproj-model-f16.gguf'
  },
  config: { ctx_size: '4096', gpu_layers: '99' },
  logger: console,
  opts: { stats: true }
})

Additional behavior changes:

  • In-memory streaming from network sources is no longer supported — the SDK does not use it today (stores models to disk first); can be re-added via a future loader abstraction. Previously possible through the Loader abstraction.
  • Second load() is a silent no-op (ReadyResource pattern). Call unload() first if you intend to swap weights.
  • downloadWeights, the Loader interface, and loader-based progress callbacks are gone.
  • The inherited destroy() is removed; callers use unload() instead.

Version bump: 0.15.00.16.0. Upstream/main took 0.15.0 for the KV cache API simplification; this refactor lands on the next minor to avoid collision.

🔌 API Changes

// Constructor: single object with files.model (string array) and optional files.projectionModel
// Removed: downloadWeights(), Loader interface, WeightsProvider, BaseInference inheritance, destroy()
// Removed deps: @qvac/dl-filesystem, @qvac/dl-base
// Kept: load(), run(), finetune(), pause(), cancel(), unload()
// Added: getState(), pickPrimaryGgufPath() (exported for unit testing)

donriddo added 4 commits April 9, 2026 06:19
…LLM addon

Replace class inheritance with composable utilities from @qvac/infer-base@0.4.0:
- createJobHandler() for single-job lifecycle management
- exclusiveRunQueue() for run serialization
- Direct shard streaming via bare-fs instead of WeightsProvider

Constructor now takes { files: { model: string[], projectionModel?: string }, config, logger, opts }
instead of { loader, diskPath, modelName, projectionModel } + config.

All finetune, media, and filtered logger functionality preserved.
…t callback

FinetuneProgress must call updateStats(data.stats), not updateOutput(data).
Finetune terminal JobEnded must call ended(data) as result, not updateStats.
…r shape

Update 13 examples and sharded model test to use files: { model: [...] } pattern.
Remove FilesystemDL dependency from all examples and tests.
The network loader test used the old loader-based constructor.
Rewritten to download shards via HttpDL to disk, then pass absolute paths.
@donriddo donriddo marked this pull request as ready for review April 10, 2026 09:50
@donriddo donriddo requested review from a team as code owners April 10, 2026 09:50
donriddo added a commit to donriddo/qvac that referenced this pull request Apr 10, 2026
The embed, LLM, and diffusion addons (PRs tetherto#1493, tetherto#1494, tetherto#1496) drop
BaseInference + WeightsProvider in favour of composable utilities and
expose a single-arg constructor of the form
`new Addon({ files, config, logger, opts })` where `files.model` is
either a string (diffusion) or an array of absolute paths to all GGUF
shards (embed/LLM). Per the addon-loader-abstraction proposal, shard
expansion now belongs to the SDK plugin layer rather than the addon.

This commit:

- Adds `expandGGUFIntoShards()` in `server/utils`, mirroring the C++
  `GGUFShards::expandGGUFIntoShards` regex contract. It is pure string
  manipulation so it works under both bun (unit tests) and bare
  (production), and handles POSIX and Windows separators.

- Rewrites `llamacpp-embedding`, `llamacpp-completion`, and
  `sdcpp-generation` plugins to call the new constructor. FilesystemDL
  and the loader-adapter shim are dropped from these three plugins —
  they no longer create or pass a loader, returning
  `{ model, loader: undefined }` (the SD plugin already followed this
  pattern). The diffusion artifact keys are renamed
  `clipLModel`/`t5XxlModel`/etc. → `clipL`/`t5Xxl`/etc. to match the
  new `DiffusionFiles` shape.

- Pins `@qvac/embed-llamacpp`, `@qvac/llm-llamacpp`, and
  `@qvac/diffusion-cpp` to `file:../...` so the SDK consumes the
  in-monorepo refactored addons until the addon PRs land and publish
  versioned releases. These pins must be bumped back to semver ranges
  before merging.

- Adds a unit test for `expandGGUFIntoShards` covering non-sharded
  paths, sharded paths supplied at any index, nested directories,
  single-shard sharded models, relative paths, and Windows
  separators.

`@qvac/dl-filesystem` stays in `dependencies` because OCR, parakeet,
nmtcpp, and whispercpp plugins still rely on it.
The comment said "Event-name normalization lives in addon.js
(mapAddonEvent)", but the very next line imports and calls
mapAddonEvent — the code already tells the reader where event mapping
lives. Remove the line so the code speaks for itself.
The refactor commit unintentionally rephrased FinetuneOptions JSDoc
lines that the refactor itself did not change. Revert those fields back
to main's original wording so the diff only carries structural changes
tied to the interface migration.
The refactor commit silently dropped the _load() progress logs ('Creating
addon with configuration', 'Activating addon'), the 'Error during model
load' error log, and the JSDoc block on _createAddon(). Put them back so
the refactor only changes what needs to change.
The 'test' alias was only consumed by 'test:all', and neither was
referenced in CI workflows or the README. 'test:all' ran test:unit
twice because it called both test:unit and the 'test' alias. Remove
'test' and rewrite 'test:all' to run test:unit, test:integration, and
test:cpp directly.
0.15.x still used the old (args, config) constructor shape; the old
example applies to any 0.15.x caller, not just 0.14.x. Align the
CHANGELOG marker with the PR body.
The backmerge of upstream/main carried a stale 'skip: isMobile' from
the pre-refactor translation test into the six new translation tests
and the edge-cases migration. Main's a570189 deliberately dropped
the mobile skip; restore that intent. The isMobile constant is
unused after this and dropped.
_createAddon() JSDoc referenced 'configurationParams.settings' and
omitted 'projectionPath'. The actual shape built in _load() is
{ path, projectionPath, config }; align the JSDoc with that.

UserMediaMessage.content widened to Uint8Array | string earlier in
this PR but no integration test exercised the string-path branch.
Add one elephant-image test that passes the absolute path as
message content, exercising the loadMedia(string) path through the
JS-to-C++ handoff.
index.js requires('@qvac/logging') at runtime, so it belongs under
dependencies, not devDependencies. Previously it worked only because
another runtime dep pulled it in transitively — fragile for publish
and can break under stricter package managers.
Previous commit 979a070 reworded only my own addition (line 251) but
the block still failed at the same position because the surrounding
pre-existing message bodies still used ; as a statement separator.
Mermaid sequenceDiagram parses ; as end-of-statement, so every message
containing it broke the diagram.

Replace ; with , or a separator word across all four affected lines
(block tetherto#1 lines 251, 256, 266 and block tetherto#2 line 296) so the finetune
and pause flow diagrams render on GitHub.
maxim-smotrov
maxim-smotrov previously approved these changes Apr 17, 2026
yuranich
yuranich previously approved these changes Apr 17, 2026
Comment thread packages/qvac-lib-infer-llamacpp-llm/index.js Outdated
_createAddon() was outside the try so a synchronous throw in
require('./binding') or binding.createInstance() would leave
this.addon set to a partial native handle and never reach the
cleanup path. Route addon construction through the same try the
shard-streaming and activate() calls use.
@donriddo

Copy link
Copy Markdown
Contributor Author

/review

@github-actions

github-actions Bot commented Apr 17, 2026

Copy link
Copy Markdown
Contributor

Tier-based Approval Status

**PR Tier:** TIER1

**Current Status:** ✅ APPROVED

**Requirements:**
- 1 Team Member approval ✅ (4/1)
- 1 Team Lead OR Management approval ✅ (2/1)

**Bypass rule:** Triggered (2+ Team Lead approvals (Tier 1 exception)). This PR is approved regardless of tier.

---
*This comment is automatically updated when reviews change.*

@gianni-cor

Copy link
Copy Markdown
Contributor

/review

@gianni-cor

Copy link
Copy Markdown
Contributor

/review

@gianni-cor

Copy link
Copy Markdown
Contributor

/review

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants