
feat[api|notask]: expose backendDevice stat in SDK for LLM and embed addons #1495

Merged
NamelsKing merged 14 commits into tetherto:main from donriddo:chore/sdk-expose-backend-device-stat on Apr 15, 2026

Conversation

@donriddo commented Apr 9, 2026

Contributor

🎯 What problem does this PR solve?

  • SDK does not expose the backendDevice (cpu/gpu) stat that LLM and embed addons now emit.
  • SDK's embed() client function returns raw number[] / number[][], so consumers have no way to read addon runtime stats (totalTime, tokensPerSecond, totalTokens, backendDevice).
  • SDK schemas do not forward the openclCacheDir, cache-type-k, and cache-type-v config fields accepted by the LLM and embed addons.

📝 How does it solve it?

  • Add optional backendDevice: 'cpu' | 'gpu' to completionStatsSchema and embedStatsSchema (Zod), and to LlmStats / EmbedStats addon-response interfaces.
  • Thread backendDevice through server/bare/ops/embed.ts and server/bare/plugins/llamacpp-completion/ops/completion-stream.ts so addon stats reach the client response.
  • Change embed() return shape from Promise<number[] | number[][]> to Promise<{ embedding, stats? }>, matching the pattern used by completionStream(), diffusion(), translate(), and ocr(). Re-export EmbedStats from the SDK root for typed consumption.
  • Update all embed() callers (examples + rag.ts JSDoc) to destructure the new shape.
  • Add openclCacheDir (camelCase), 'cache-type-k', and 'cache-type-v' (kebab-case) to llmConfigBaseSchema; openclCacheDir also added to embedConfigBaseSchema. Key names match exactly what the native addons expect.
  • LLM plugin (transformLlmConfig) restores openclCacheDir after the legacy camelCase-to-snake_case regex (which would otherwise mangle it to opencl_cache_dir, silently dropping the OpenCL kernel cache directory on Android and slowing first-time GPU startup). Embed plugin passes openclCacheDir through directly with no regex to fight against.
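A minimal sketch of the restore step, assuming the legacy transform rewrites each top-level key with a camelCase-to-snake_case regex. The function name transformLlmConfigSketch, the PRESERVED_KEYS list, and the ctx_size key are hypothetical; the SDK's real transformLlmConfig may apply the regex to serialized JSON and performs other renames (such as stop_sequences to reverse_prompt) that are not shown here.

```typescript
// Hypothetical sketch, not the SDK's transformLlmConfig.
// Keys the addon expects in camelCase, which the legacy rewrite would mangle.
const PRESERVED_KEYS: readonly string[] = ["openclCacheDir"];

// Legacy rule: prefix each uppercase letter with "_" and lowercase it,
// so "openclCacheDir" becomes "opencl_cache_dir".
function camelToSnake(key: string): string {
  return key.replace(/[A-Z]/g, (ch) => `_${ch.toLowerCase()}`);
}

function transformLlmConfigSketch(
  config: Record<string, unknown>,
): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(config)) {
    out[camelToSnake(key)] = value;
  }
  // Undo the rewrite for keys the addon reads verbatim.
  for (const key of PRESERVED_KEYS) {
    const mangled = camelToSnake(key);
    if (mangled !== key && mangled in out) {
      out[key] = out[mangled];
      delete out[mangled];
    }
  }
  return out;
}

// Example input mixing snake_case, kebab-case, and the preserved camelCase key.
const addonConfig = transformLlmConfigSketch({
  ctx_size: 2048, // hypothetical snake_case key, passes through unchanged
  openclCacheDir: "/data/local/tmp/opencl-cache", // hypothetical path
  "cache-type-k": "f16", // placeholder value
});
console.log(addonConfig);
```

Kebab-case keys such as 'cache-type-k' contain no uppercase letters, so the regex leaves them alone; only camelCase keys that the addon reads verbatim need an explicit restore.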

🧪 How was it tested?

  • bun run lint (eslint + tsc) passes cleanly.
  • bun run test:unit passes cleanly, with new tests:
    • test/unit/embed-schemas.test.ts: backendDevice is accepted for 'cpu' / 'gpu', rejected for unknown values, accepted when omitted, and round-trips through embedResponseSchema.
    • test/unit/completion-stream-schemas.test.ts: the same coverage on completionStatsSchema and completionStreamResponseSchema.
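The accepted, rejected, and omitted cases can be mirrored without Zod, as a sketch. The helper isValidBackendDevice is hypothetical and only approximates the schema rule, which we assume is roughly z.enum(["cpu", "gpu"]).optional(); the SDK's actual Zod definition is not reproduced here.

```typescript
// Hypothetical, dependency-free approximation of the backendDevice rule;
// the SDK itself validates with a Zod schema.
type BackendDevice = "cpu" | "gpu";

const BACKEND_DEVICES: readonly string[] = ["cpu", "gpu"];

// Accepts an omitted value (the field is optional) or one of the enum members.
function isValidBackendDevice(value: unknown): value is BackendDevice | undefined {
  return (
    value === undefined ||
    (typeof value === "string" && BACKEND_DEVICES.includes(value))
  );
}

// The same categories of cases the new unit tests describe.
const cases: ReadonlyArray<[unknown, boolean]> = [
  ["cpu", true],
  ["gpu", true],
  [undefined, true], // optional
  ["npu", false], // unknown value is rejected
];

for (const [input, expected] of cases) {
  const actual = isValidBackendDevice(input);
  console.log(`${String(input)} -> ${actual ? "accepted" : "rejected"}`);
  if (actual !== expected) {
    throw new Error(`unexpected result for ${String(input)}`);
  }
}
```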

💥 Breaking Changes

embed() return shape: the client function now returns { embedding, stats? } instead of raw vectors, so every caller must destructure the result.

BEFORE (≤ 0.8.x):

const vectors = await embed({ modelId, text: "hello" });

AFTER:

const { embedding, stats } = await embed({ modelId, text: "hello" });
console.log(stats?.backendDevice); // "cpu" | "gpu" | undefined
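For codebases with many embed() call sites, one option is a small adapter that keeps the old vectors-only shape in one place. This is a hypothetical helper, not part of the SDK: the local EmbedResult / EmbedFn types are assumptions that mirror the shape above, and the text: string | string[] parameter type is inferred from the number[] | number[][] return type rather than confirmed.

```typescript
// Hypothetical local types mirroring the new embed() result shape;
// the SDK ships its own (EmbedStats is re-exported from the SDK root).
interface EmbedStatsLike {
  totalTime?: number;
  tokensPerSecond?: number;
  totalTokens?: number;
  backendDevice?: "cpu" | "gpu";
}

interface EmbedResult {
  embedding: number[] | number[][];
  stats?: EmbedStatsLike;
}

// Assumed parameter shape: text may be a single string or a batch.
interface EmbedParams {
  modelId: string;
  text: string | string[];
}

type EmbedFn = (params: EmbedParams) => Promise<EmbedResult>;

// Hypothetical adapter that keeps the old (<= 0.8.x) "vectors only" shape
// for call sites that never read stats.
async function embedVectors(
  embed: EmbedFn,
  params: EmbedParams,
): Promise<number[] | number[][]> {
  const { embedding } = await embed(params);
  return embedding;
}
```

Passing the embed function in as an argument (assuming the SDK's signature matches EmbedFn) keeps the adapter testable with a stub; callers that want backendDevice or tokensPerSecond should destructure stats directly instead.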

🔌 API Changes

// CompletionStats
{
  timeToFirstToken?: number
  tokensPerSecond?: number
  cacheTokens?: number
  backendDevice?: "cpu" | "gpu"  // NEW
}

// EmbedStats (newly re-exported from SDK root)
{
  totalTime?: number
  tokensPerSecond?: number
  totalTokens?: number
  backendDevice?: "cpu" | "gpu"  // NEW
}

// LlmConfig — new optional fields
{
  openclCacheDir?: string     // camelCase
  'cache-type-k'?: string     // kebab-case
  'cache-type-v'?: string     // kebab-case
}

// EmbedConfig — new optional field
{
  openclCacheDir?: string
}
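Because backendDevice is optional on both stat types, consumer code has to handle its absence. A minimal sketch with a hypothetical describeRun helper and a local StatsWithBackend type; it works structurally with either CompletionStats or EmbedStats, since both declare tokensPerSecond and backendDevice. Reporting a missing value as "unknown" rather than "cpu" is a design choice of this sketch, not SDK behavior.

```typescript
// Hypothetical local type covering the fields both CompletionStats and
// EmbedStats declare; in real code, import the SDK's own types instead.
interface StatsWithBackend {
  tokensPerSecond?: number;
  backendDevice?: "cpu" | "gpu";
}

// Builds a short log line. An absent backendDevice is reported as "unknown"
// instead of being assumed to mean CPU.
function describeRun(stats?: StatsWithBackend): string {
  const device = stats?.backendDevice ?? "unknown";
  const tps = stats?.tokensPerSecond;
  return tps === undefined
    ? `device=${device}`
    : `device=${device} tps=${tps.toFixed(1)}`;
}

console.log(describeRun({ backendDevice: "gpu", tokensPerSecond: 12.5 })); // device=gpu tps=12.5
console.log(describeRun(undefined)); // device=unknown
```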

📋 Dependency bumps

  • @qvac/embed-llamacpp: ^0.12.0 → ^0.13.4
  • @qvac/llm-llamacpp: ^0.12.1 → ^0.14.4

Add optional backendDevice (cpu | gpu) to completion and embed stats.
Maps through from addon RuntimeStats to SDK response schemas.
Depends on addon PR tetherto#1393 which adds the stat at the C++ layer.
@donriddo marked this pull request as ready for review April 9, 2026 20:35
@donriddo requested review from a team as code owners April 9, 2026 20:35
Adds embedWithStats() to the SDK client API so consumers can read the
embed addon runtime stats (totalTime, tokensPerSecond, totalTokens,
backendDevice) without giving up the simple embed() shape that returns
just the vectors. Re-exports EmbedStats for typed consumption.

Adds unit tests covering backendDevice on both embedStatsSchema and
completionStatsSchema, including round-tripping through the response
schemas, to lock in the cpu/gpu enum and prevent regressions.
@donriddo changed the title feat[api]: expose backendDevice stat in SDK for LLM and embed addons to feat[api|notask]: expose backendDevice stat in SDK for LLM and embed addons Apr 10, 2026
… inputs

Adds three new fields to the SDK's llama.cpp config schemas, mirroring
exactly the inputs introduced by tetherto#1506 in
`@qvac/llm-llamacpp@0.14.4` and `@qvac/embed-llamacpp@0.13.4`:

- `LlmConfig.openclCacheDir` (string) — writable directory for the
  OpenCL kernel binary cache. Required on Android for fast GPU startup
  on Adreno devices; ignored on every other platform.
- `LlmConfig['cache-type-k']` (string) — KV cache K-tensor quantization
  forwarded directly to llama.cpp's `--cache-type-k` flag. No JSDoc:
  the addon's own `index.d.ts` ships none and I have not verified the
  set of accepted values from source.
- `LlmConfig['cache-type-v']` (string) — same for `--cache-type-v`.
- `EmbedConfig.openclCacheDir` (string) — same as the LLM field; embed
  is the only other addon that ships an OpenCL backend.
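Because two of the new keys are kebab-case, TypeScript callers must quote them in object literals and read them with bracket notation. A sketch using a hypothetical local LlmConfigNewFields interface (import the SDK's LlmConfig in real code); the cache directory path is made up, and "f16" is only a placeholder, consistent with the note above that the accepted set of values was not verified.

```typescript
// Hypothetical local interface mirroring the new LlmConfig fields listed above.
interface LlmConfigNewFields {
  openclCacheDir?: string; // camelCase, dot notation works
  "cache-type-k"?: string; // kebab-case, must be quoted
  "cache-type-v"?: string;
}

const config: LlmConfigNewFields = {
  openclCacheDir: "/data/user/0/com.example.app/cache/opencl", // hypothetical app cache path
  "cache-type-k": "f16", // placeholder value
  "cache-type-v": "f16",
};

// config.cache-type-k would parse as `config.cache - type - k`, so read the
// kebab-case keys with bracket notation (or a renaming destructure).
const kType = config["cache-type-k"];
const { "cache-type-v": vType } = config;
const cacheDir = config.openclCacheDir;
console.log(kType, vType, cacheDir);
```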

The SDK schema key names are the exact same strings the addon expects
on its config map (verified against `index.d.ts` on `upstream/main` at
the new versions): `'cache-type-k'` and `'cache-type-v'` are
kebab-case, `openclCacheDir` is camelCase. The plugin transforms now
forward them as-is:

- `transformLlmConfig` no longer rewrites camelCase keys to snake_case.
  The previous regex was effectively dead code (all current LlmConfig
  fields are either snake_case to begin with, or `projectionModelSrc`
  which is filtered out by `resolveConfig` before this transform
  runs), and it would have actively broken the new `openclCacheDir`
  by mangling it to `opencl_cache_dir` — a key the addon does not
  recognize.
- `transformEmbedConfig` adds an explicit `openclCacheDir` mapping.
  Bracket notation is used so the addon's index-signature-only
  declaration on the currently-installed 0.13.3 d.ts type-checks
  cleanly under `noPropertyAccessFromIndexSignature`.

Bumps the SDK addon dep ranges so consumers actually pull versions
that recognize these keys:

- `@qvac/embed-llamacpp`: `^0.12.0` → `^0.13.4`
- `@qvac/llm-llamacpp`:   `^0.12.1` → `^0.14.4`

Caret on a 0.x range does not float across the minor boundary, so the
old `^0.12.x` pin could not pull `0.13.x` / `0.14.x` even though those
versions are committed to upstream/main. Note that 0.13.4 / 0.14.4 are
not yet on the public npm registry; they are committed to
upstream/main but pending the next release cycle. Because `^0.13.4` /
`^0.14.4` only match versions at or above those patches, `bun install`
cannot fall back to an earlier published 0.13.x / 0.14.x; once
0.13.4 / 0.14.4 are published, it will resolve to the latest matching
release.
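The caret behavior described above can be checked with a small sketch. satisfiesCaret is a hypothetical, simplified implementation of npm-style caret ranges for plain x.y.z versions only (no prerelease tags, build metadata, or partial versions); real projects should rely on the semver package or the package manager itself.

```typescript
// Simplified npm-style caret matching for plain "x.y.z" versions.
// Hypothetical helper: no prerelease tags, build metadata, or partial versions.
type Version = [number, number, number];

function parseVersion(v: string): Version {
  const m = /^(\d+)\.(\d+)\.(\d+)$/.exec(v);
  if (m === null) {
    throw new Error(`unsupported version: ${v}`);
  }
  return [Number(m[1]), Number(m[2]), Number(m[3])];
}

// Negative if a < b, zero if equal, positive if a > b.
function compareVersions([a0, a1, a2]: Version, [b0, b1, b2]: Version): number {
  if (a0 !== b0) return a0 - b0;
  if (a1 !== b1) return a1 - b1;
  return a2 - b2;
}

// ^X.Y.Z allows changes that keep the left-most non-zero component fixed:
// ^1.2.3 means <2.0.0, ^0.12.0 means <0.13.0, ^0.0.3 means <0.0.4.
function satisfiesCaret(version: string, range: string): boolean {
  if (!range.startsWith("^")) {
    throw new Error(`not a caret range: ${range}`);
  }
  const lower = parseVersion(range.slice(1));
  const [major, minor, patch] = lower;
  const upper: Version =
    major > 0 ? [major + 1, 0, 0] : minor > 0 ? [0, minor + 1, 0] : [0, 0, patch + 1];
  const v = parseVersion(version);
  return compareVersions(v, lower) >= 0 && compareVersions(v, upper) < 0;
}

console.log(satisfiesCaret("0.13.4", "^0.12.0")); // false: crosses the minor boundary
console.log(satisfiesCaret("0.13.3", "^0.13.4")); // false: below the new floor
console.log(satisfiesCaret("0.13.9", "^0.13.4")); // true
```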
After merging upstream/main, the local @qvac/embed-llamacpp source is at
0.13.4, which declares openclCacheDir as a typed field on GGMLConfig
(not just an index-signature key). The bracket-notation workaround for
TS4111 is no longer necessary — dot notation type-checks cleanly.
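The difference between the two declarations can be sketched with hypothetical interfaces (not the addon's actual GGMLConfig). The code compiles with or without noPropertyAccessFromIndexSignature; the comments mark where that flag would report TS4111.

```typescript
// Hypothetical shapes for illustration only.
// Before: the key is covered only by an index signature.
interface ConfigIndexOnly {
  [key: string]: string | undefined;
}

// After: the key is declared as a typed field alongside the index signature.
interface ConfigTyped {
  openclCacheDir?: string;
  [key: string]: string | undefined;
}

function setCacheDirIndexOnly(cfg: ConfigIndexOnly, dir: string): void {
  // Under noPropertyAccessFromIndexSignature, `cfg.openclCacheDir = dir`
  // would be error TS4111 here, so bracket notation is required.
  cfg["openclCacheDir"] = dir;
}

function setCacheDirTyped(cfg: ConfigTyped, dir: string): void {
  // A declared property may use dot notation under the same flag.
  cfg.openclCacheDir = dir;
}
```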
Reverts the removal of the .replace() regex that converts camelCase JSON
keys to snake_case. The regex may serve a purpose not visible in the
current code — need to confirm with the team before removing it.
Move embed schema tests to embed-schemas.test.ts and completion-stream
schema tests to completion-stream-schemas.test.ts, matching the SDK
convention of naming test files after the module they cover.
Remove embedWithStats() and modify embed() to return { embedding, stats }
matching the pattern used by completionStream(), diffusion(), translate(),
and ocr(). Update all example callers and JSDoc references.
…ckend-device-stat

# Conflicts:
#	packages/sdk/CHANGELOG.md
#	packages/sdk/package.json
…ckend-device-stat

# Conflicts:
#	packages/sdk/bun.lock
#	packages/sdk/client/api/index.ts
#	packages/sdk/index.ts
#	packages/sdk/package.json
Comment thread packages/sdk/client/api/embed.ts Outdated
Comment thread packages/sdk/client/api/index.ts Outdated
Comment thread packages/sdk/CHANGELOG.md Outdated
Comment thread packages/sdk/package.json Outdated
…to SDK root

Address PR tetherto#1495 review (Opanin):
- Revert SDK version bump (0.9.0 → 0.8.3) — version is handled during release.
- Revert SDK CHANGELOG entry — handled during release.
- Drop duplicate `export type { EmbedStats }` from client/api/embed.ts; the
  sdk/index.ts re-export (sourced from ./schemas) is sufficient and matches
  the precedent of CompletionStats / DiffusionStats.
- Move `type EmbedStats` in sdk/index.ts from the `./client/api` block to
  the `./schemas` block so all stat types have one consistent home.
@donriddo

Contributor Author

/review

@github-actions bot commented Apr 14, 2026

Contributor

Tier-based Approval Status

**PR Tier:** TIER1

**Current Status:** ✅ APPROVED

**Requirements:**
- 1 Team Member approval ✅ (1/1)
- 1 Team Lead OR Management approval ✅ (1/1)



---
*This comment is automatically updated when reviews change.*

@gianni-cor

Contributor

/review

@NamelsKing merged commit ac0b77a into tetherto:main Apr 15, 2026
19 of 22 checks passed
Proletter pushed a commit that referenced this pull request May 24, 2026
…addons (#1495)

* feat[api]: expose backendDevice stat in SDK for LLM and embed addons

* feat[api]: add embedWithStats client helper and backendDevice tests

* feat[api]: re-export embedWithStats and EmbedStats from SDK root entrypoint

* feat[api]: expose openclCacheDir / cache-type-k / cache-type-v config inputs

* fix: use dot notation for openclCacheDir now that GGMLConfig types it

* fix: restore camelCase-to-snake_case regex in transformLlmConfig

* test: split backendDevice schema tests into per-module test files

* fix[bc]: return { embedding, stats } from embed() instead of raw vectors

* chore[bc]: add SDK CHANGELOG for 0.9.0 and bump version

Documents the embed() breaking return shape change, backendDevice stat
addition, new config fields (openclCacheDir, cache-type-k, cache-type-v),
EmbedStats export, and addon dependency bumps.

* fix: restore openclCacheDir after camelCase-to-snake_case transform

The generic regex in transformLlmConfig mangles openclCacheDir to
opencl_cache_dir, but the addon expects the camelCase form. Restore it
after the transform, same pattern as stop_sequences → reverse_prompt.

* chore: revert SDK version + CHANGELOG, collapse EmbedStats re-export to SDK root


4 participants