feat[api|notask]: expose backendDevice stat in SDK for LLM and embed addons #1495
Merged
NamelsKing merged 14 commits on Apr 15, 2026
Add optional backendDevice (cpu | gpu) to completion and embed stats. The value is mapped through from the addon's RuntimeStats to the SDK response schemas. Depends on addon PR tetherto#1393, which adds the stat at the C++ layer.
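A minimal sketch of what "mapping through" an optional stat looks like. The type names `AddonRuntimeStats` and `SdkStats` and the helper `toSdkStats` are hypothetical stand-ins; the real types live in the addon packages and the SDK's schemas. The point is that `backendDevice` is only set when the addon reports it, so responses from addons that predate the stat stay valid.

```typescript
// Hypothetical shapes for illustration; the real addon RuntimeStats and SDK
// stats types live in the addon packages and the SDK's schemas.
type BackendDevice = "cpu" | "gpu";

interface AddonRuntimeStats {
  totalTime?: number;
  tokensPerSecond?: number;
  totalTokens?: number;
  backendDevice?: BackendDevice; // absent on addons that predate the stat
}

interface SdkStats {
  totalTime?: number;
  tokensPerSecond?: number;
  totalTokens?: number;
  backendDevice?: BackendDevice;
}

// Copy the addon's stats into the SDK response shape. backendDevice is only
// added when the addon reported it, so the key stays absent otherwise.
function toSdkStats(raw: AddonRuntimeStats): SdkStats {
  return {
    totalTime: raw.totalTime,
    tokensPerSecond: raw.tokensPerSecond,
    totalTokens: raw.totalTokens,
    ...(raw.backendDevice !== undefined ? { backendDevice: raw.backendDevice } : {}),
  };
}

const mapped = toSdkStats({ totalTime: 42, totalTokens: 8, backendDevice: "gpu" });
console.log(mapped.backendDevice ?? "not reported");
```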
Adds embedWithStats() to the SDK client API so consumers can read the embed addon runtime stats (totalTime, tokensPerSecond, totalTokens, backendDevice) without giving up the simple embed() shape that returns just the vectors. Re-exports EmbedStats for typed consumption. Adds unit tests covering backendDevice on both embedStatsSchema and completionStatsSchema, including round-tripping through the response schemas, to lock in the cpu/gpu enum and prevent regressions.
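The schema tests pin down three behaviors of `backendDevice`: `'cpu'` and `'gpu'` are accepted, any other value is rejected, and the field may be omitted. The sketch below illustrates that contract. It deliberately swaps the SDK's Zod `embedStatsSchema` for a hand-written validator so it runs without dependencies; `parseEmbedStats` and `EmbedStatsSketch` are hypothetical names, not SDK exports.

```typescript
// Dependency-free stand-in for the SDK's Zod embedStatsSchema. The
// backendDevice check enforces roughly what z.enum(["cpu", "gpu"]).optional()
// would.
type BackendDevice = "cpu" | "gpu";

interface EmbedStatsSketch {
  totalTime?: number;
  tokensPerSecond?: number;
  totalTokens?: number;
  backendDevice?: BackendDevice;
}

type ParseResult<T> =
  | { success: true; data: T }
  | { success: false; error: string };

const BACKEND_DEVICES: readonly string[] = ["cpu", "gpu"];
const NUMERIC_FIELDS = ["totalTime", "tokensPerSecond", "totalTokens"] as const;

function parseEmbedStats(input: unknown): ParseResult<EmbedStatsSketch> {
  if (typeof input !== "object" || input === null) {
    return { success: false, error: "stats must be an object" };
  }
  const record = input as Record<string, unknown>;
  for (const field of NUMERIC_FIELDS) {
    const value = record[field];
    if (value !== undefined && typeof value !== "number") {
      return { success: false, error: `${field} must be a number` };
    }
  }
  // backendDevice is optional, but when present it must be "cpu" or "gpu".
  const device = record["backendDevice"];
  if (device !== undefined && (typeof device !== "string" || !BACKEND_DEVICES.includes(device))) {
    return { success: false, error: 'backendDevice must be "cpu" or "gpu"' };
  }
  return { success: true, data: input as EmbedStatsSketch };
}

console.log(parseEmbedStats({ totalTime: 10, backendDevice: "gpu" }).success);
```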
… inputs

Adds three new fields to the SDK's llama.cpp config schemas, mirroring exactly the inputs introduced by tetherto#1506 in `@qvac/llm-llamacpp@0.14.4` and `@qvac/embed-llamacpp@0.13.4`:

- `LlmConfig.openclCacheDir` (string): writable directory for the OpenCL kernel binary cache. Required on Android for fast GPU startup on Adreno devices; ignored on every other platform.
- `LlmConfig['cache-type-k']` (string): KV cache K-tensor quantization, forwarded directly to llama.cpp's `--cache-type-k` flag. No JSDoc: the addon's own `index.d.ts` ships none, and I have not verified the set of accepted values from source.
- `LlmConfig['cache-type-v']` (string): same, for `--cache-type-v`.
- `EmbedConfig.openclCacheDir` (string): same as the LLM field; embed is the only other addon that ships an OpenCL backend.

The SDK schema key names are exactly the strings the addon expects on its config map (verified against `index.d.ts` on `upstream/main` at the new versions): `'cache-type-k'` and `'cache-type-v'` are kebab-case, while `openclCacheDir` is camelCase. The plugin transforms now forward them as-is:

- `transformLlmConfig` no longer rewrites camelCase keys to snake_case. The previous regex was effectively dead code (every current LlmConfig field is either already snake_case or is `projectionModelSrc`, which `resolveConfig` filters out before this transform runs), and it would have actively broken the new `openclCacheDir` by mangling it to `opencl_cache_dir`, a key the addon does not recognize.
- `transformEmbedConfig` adds an explicit `openclCacheDir` mapping. Bracket notation is used so that the addon's index-signature-only declaration in the currently installed 0.13.3 d.ts type-checks cleanly under `noPropertyAccessFromIndexSignature`.
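To make the mangling argument concrete, here is a sketch that assumes the legacy regex inserts an underscore before each capital letter and lowercases it (the SDK's actual pattern is not quoted in this PR). Under that assumption the kebab-case keys pass through untouched because they contain no capitals, while `openclCacheDir` becomes `opencl_cache_dir`. `camelToSnake`, `rewriteKeys`, and the `n_ctx` key are hypothetical.

```typescript
// Hypothetical stand-in for the legacy camelCase-to-snake_case rewrite in
// transformLlmConfig; the SDK's exact regex is not shown in this PR.
function camelToSnake(key: string): string {
  return key.replace(/[A-Z]/g, (ch) => `_${ch.toLowerCase()}`);
}

// Apply the rewrite to every top-level key of a config object.
function rewriteKeys(config: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(config)) {
    out[camelToSnake(key)] = value;
  }
  return out;
}

const rewritten = rewriteKeys({
  openclCacheDir: "/data/opencl-cache", // camelCase: mangled
  "cache-type-k": "f16", // kebab-case, no capitals: unchanged
  n_ctx: 2048, // hypothetical snake_case key: unchanged
});
// keys: opencl_cache_dir, cache-type-k, n_ctx
console.log(Object.keys(rewritten));
```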
Bumps the SDK addon dependency ranges so consumers actually pull versions that recognize these keys:

- `@qvac/embed-llamacpp`: `^0.12.0` → `^0.13.4`
- `@qvac/llm-llamacpp`: `^0.12.1` → `^0.14.4`

A caret range on a 0.x version does not float across the minor boundary, so the old `^0.12.x` ranges could not pull `0.13.x` / `0.14.x` even though those versions are committed to upstream/main. Note that 0.13.4 / 0.14.4 are not yet on the public npm registry: they are committed to upstream/main but pending the next release cycle. Once they are published, `bun install` will resolve to the latest published 0.13.x / 0.14.x at or above those versions.
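The caret rule the commit relies on can be sketched as a small range check. This follows npm's documented caret semantics for plain `major.minor.patch` versions (changes may not touch the left-most non-zero component, so `^0.y.z` with `y > 0` means `>=0.y.z <0.(y+1).0`) and ignores prerelease tags. `satisfiesCaret` is a hypothetical helper, not something the SDK or bun exposes.

```typescript
// Minimal caret-range check for plain major.minor.patch versions, following
// npm's documented caret semantics. Prerelease and build tags are not handled.
type Version = [number, number, number];

function parseVersion(text: string): Version {
  const parts = text.split(".").map(Number);
  if (parts.length !== 3 || parts.some((n) => !Number.isInteger(n) || n < 0)) {
    throw new Error(`unsupported version: ${text}`);
  }
  return parts as Version;
}

// Negative if a < b, zero if equal, positive if a > b.
function compareVersions([a1, a2, a3]: Version, [b1, b2, b3]: Version): number {
  if (a1 !== b1) return a1 - b1;
  if (a2 !== b2) return a2 - b2;
  return a3 - b3;
}

// Exclusive upper bound of ^base: the left-most non-zero component may not
// change, which is why ^0.12.x stops before 0.13.0.
function caretUpperBound([major, minor, patch]: Version): Version {
  if (major > 0) return [major + 1, 0, 0];
  if (minor > 0) return [0, minor + 1, 0];
  return [0, 0, patch + 1];
}

function satisfiesCaret(version: string, base: string): boolean {
  const v = parseVersion(version);
  const lower = parseVersion(base);
  return compareVersions(v, lower) >= 0 && compareVersions(v, caretUpperBound(lower)) < 0;
}

console.log(satisfiesCaret("0.13.4", "0.12.0")); // false: ^0.12.0 means <0.13.0
console.log(satisfiesCaret("0.13.9", "0.13.4")); // true
```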
…ckend-device-stat
After merging upstream/main, the local @qvac/embed-llamacpp source is at 0.13.4, which declares openclCacheDir as a typed field on GGMLConfig (not just an index-signature key). The bracket-notation workaround for TS4111 is no longer necessary — dot notation type-checks cleanly.
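The TS4111 rule mentioned here can be illustrated with two hypothetical config types standing in for the index-signature-only declaration in 0.13.3 and the typed field in 0.13.4 (the real `GGMLConfig` d.ts is not reproduced in this PR). With `noPropertyAccessFromIndexSignature` enabled, a key that exists only through an index signature must be read with bracket notation, whereas a declared property can use dot notation. The runtime result is the same either way; only type-checking differs.

```typescript
// Hypothetical stand-ins for the addon's config type before and after the
// field was declared; not the real GGMLConfig declarations.
interface IndexOnlyConfig {
  [key: string]: unknown; // 0.13.3-style: openclCacheDir is only an index key
}

interface TypedConfig {
  openclCacheDir?: string; // 0.13.4-style: declared as a typed field
  [key: string]: unknown;
}

function readIndexOnly(cfg: IndexOnlyConfig): unknown {
  // With noPropertyAccessFromIndexSignature on, `cfg.openclCacheDir` would be
  // error TS4111 here, so bracket notation is required.
  return cfg["openclCacheDir"];
}

function readTyped(cfg: TypedConfig): string | undefined {
  // A declared property may use dot notation under the same flag.
  return cfg.openclCacheDir;
}

console.log(readIndexOnly({ openclCacheDir: "/cache/a" }), readTyped({ openclCacheDir: "/cache/b" }));
```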
Reverts the removal of the .replace() regex that converts camelCase JSON keys to snake_case. The regex may serve a purpose not visible in the current code — need to confirm with the team before removing it.
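Keeping the regex means `openclCacheDir` has to be moved back to its camelCase key after the rewrite, as the PR description explains. A minimal sketch of that restore step, assuming the same hypothetical underscore-before-capitals regex; `transformLlmConfigSketch` is illustrative only and omits the other renames the real `transformLlmConfig` performs (such as `stop_sequences` → `reverse_prompt`). The `someFlag` key in the usage line is hypothetical.

```typescript
// Hypothetical underscore-before-capitals rewrite; the SDK's exact regex is
// not quoted in this PR.
function toSnakeCaseKey(key: string): string {
  return key.replace(/[A-Z]/g, (ch) => `_${ch.toLowerCase()}`);
}

// Illustrative only: the real transformLlmConfig also renames other keys,
// which is omitted here.
function transformLlmConfigSketch(config: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(config)) {
    out[toSnakeCaseKey(key)] = value;
  }
  // The generic rewrite produced opencl_cache_dir, which the addon does not
  // recognize; move the value back under the camelCase key it expects.
  if ("opencl_cache_dir" in out) {
    out["openclCacheDir"] = out["opencl_cache_dir"];
    delete out["opencl_cache_dir"];
  }
  return out;
}

console.log(transformLlmConfigSketch({ openclCacheDir: "/data/cache", someFlag: true }));
```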
Move embed schema tests to embed-schemas.test.ts and completion-stream schema tests to completion-stream-schemas.test.ts, matching the SDK convention of naming test files after the module they cover.
Remove embedWithStats() and modify embed() to return { embedding, stats }
matching the pattern used by completionStream(), diffusion(), translate(),
and ocr(). Update all example callers and JSDoc references.
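For callers, the change means destructuring the result instead of using the resolved value directly. A minimal sketch, assuming a hypothetical `fakeEmbed()` stub in place of the SDK's `embed()` (whose exact parameters are not shown here) and a local `EmbedStats` type mirroring the fields named in this PR rather than importing the SDK's export.

```typescript
// Local mirror of the stats fields named in this PR; the SDK exports its own
// EmbedStats type from its root entrypoint.
interface EmbedStats {
  totalTime?: number;
  tokensPerSecond?: number;
  totalTokens?: number;
  backendDevice?: "cpu" | "gpu";
}

interface EmbedResult {
  embedding: number[] | number[][];
  stats?: EmbedStats;
}

// Hypothetical stub standing in for embed(): one input string yields one
// vector, an array of strings yields one vector per string.
async function fakeEmbed(input: string | string[]): Promise<EmbedResult> {
  const vectorFor = (text: string): number[] => [text.length, 0.5];
  const embedding = Array.isArray(input) ? input.map(vectorFor) : vectorFor(input);
  return { embedding, stats: { backendDevice: "cpu" } };
}

// Caller code after the change: destructure instead of using raw vectors.
function describeResult({ embedding, stats }: EmbedResult): string {
  const vectorCount = Array.isArray(embedding[0]) ? embedding.length : 1;
  const device = stats?.backendDevice ?? "unknown device";
  return `${vectorCount} vector(s) on ${device}`;
}

fakeEmbed(["first", "second"]).then((result) => {
  console.log(describeResult(result)); // 2 vector(s) on cpu
});
```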
…ckend-device-stat
Conflicts:
- packages/sdk/CHANGELOG.md
- packages/sdk/package.json

…ckend-device-stat
Conflicts:
- packages/sdk/bun.lock
- packages/sdk/client/api/index.ts
- packages/sdk/index.ts
- packages/sdk/package.json
…to SDK root

Address PR tetherto#1495 review (Opanin):

- Revert SDK version bump (0.9.0 → 0.8.3); the version is handled during release.
- Revert SDK CHANGELOG entry; it is handled during release.
- Drop duplicate `export type { EmbedStats }` from client/api/embed.ts; the sdk/index.ts re-export (sourced from ./schemas) is sufficient and matches the precedent of CompletionStats / DiffusionStats.
- Move `type EmbedStats` in sdk/index.ts from the `./client/api` block to the `./schemas` block so all stat types have one consistent home.
opaninakuffo approved these changes on Apr 14, 2026

Contributor (Author): /review

Contributor: Tier-based Approval Status

gianni-cor approved these changes on Apr 15, 2026

Contributor: /review
This was referenced Apr 15, 2026
Proletter pushed a commit that referenced this pull request on May 24, 2026:
…addons (#1495)

* feat[api]: expose backendDevice stat in SDK for LLM and embed addons
* feat[api]: add embedWithStats client helper and backendDevice tests
* feat[api]: re-export embedWithStats and EmbedStats from SDK root entrypoint
* feat[api]: expose openclCacheDir / cache-type-k / cache-type-v config inputs
* fix: use dot notation for openclCacheDir now that GGMLConfig types it
* fix: restore camelCase-to-snake_case regex in transformLlmConfig
* test: split backendDevice schema tests into per-module test files
* fix[bc]: return { embedding, stats } from embed() instead of raw vectors
* chore[bc]: add SDK CHANGELOG for 0.9.0 and bump version
  Documents the embed() breaking return shape change, the backendDevice stat addition, the new config fields (openclCacheDir, cache-type-k, cache-type-v), the EmbedStats export, and the addon dependency bumps.
* fix: restore openclCacheDir after camelCase-to-snake_case transform
  The generic regex in transformLlmConfig mangles openclCacheDir to opencl_cache_dir, but the addon expects the camelCase form. Restore it after the transform, using the same pattern as stop_sequences → reverse_prompt.
* chore: revert SDK version + CHANGELOG, collapse EmbedStats re-export to SDK root
🎯 What problem does this PR solve?
- The SDK does not expose the `backendDevice` (cpu/gpu) stat that the LLM and embed addons now emit.
- The `embed()` client function returns raw `number[]` / `number[][]`, so consumers have no way to read addon runtime stats (`totalTime`, `tokensPerSecond`, `totalTokens`, `backendDevice`).
- The SDK does not expose the `openclCacheDir`, `cache-type-k`, and `cache-type-v` config fields accepted by the LLM and embed addons.

📝 How does it solve it?
- Add `backendDevice: 'cpu' | 'gpu'` to `completionStatsSchema` and `embedStatsSchema` (Zod), and to the `LlmStats` / `EmbedStats` addon-response interfaces.
- Map `backendDevice` through `server/bare/ops/embed.ts` and `server/bare/plugins/llamacpp-completion/ops/completion-stream.ts` so addon stats reach the client response.
- Change the `embed()` return shape from `Promise<number[] | number[][]>` to `Promise<{ embedding, stats? }>`, matching the pattern used by `completionStream()`, `diffusion()`, `translate()`, and `ocr()`. Re-export `EmbedStats` from the SDK root for typed consumption.
- Update all `embed()` callers (examples + `rag.ts` JSDoc) to destructure the new shape.
- Add `openclCacheDir` (camelCase), `'cache-type-k'`, and `'cache-type-v'` (kebab-case) to `llmConfigBaseSchema`; `openclCacheDir` is also added to `embedConfigBaseSchema`. Key names match exactly what the native addons expect.
- The LLM plugin (`transformLlmConfig`) restores `openclCacheDir` after the legacy camelCase-to-snake_case regex (which would otherwise mangle it to `opencl_cache_dir`, silently dropping the OpenCL kernel cache directory on Android and slowing first-time GPU startup). The embed plugin passes `openclCacheDir` through directly, with no regex to fight against.

🧪 How was it tested?
- `bun run lint` (eslint + tsc) clean.
- `bun run test:unit` clean. New tests:
  - `test/unit/embed-schemas.test.ts`: `backendDevice` is accepted for `'cpu'` / `'gpu'`, rejected for unknown values, optional, and round-trips through `embedResponseSchema`.
  - `test/unit/completion-stream-schemas.test.ts`: same coverage on `completionStatsSchema` and `completionStreamResponseSchema`.

💥 Breaking Changes
`embed()` return shape: the client function now returns `{ embedding, stats? }` instead of raw vectors. Every caller must destructure.

BEFORE (≤ 0.8.x):
AFTER:
🔌 API Changes
📋 Dependency bumps
- `@qvac/embed-llamacpp`: `^0.12.0` → `^0.13.4`
- `@qvac/llm-llamacpp`: `^0.12.1` → `^0.14.4`