
QVAC-13658 feat[api]: SDK Profiler #836

Merged
opaninakuffo merged 9 commits into tetherto:main from opaninakuffo:feat/sdk-profiler-rebased Mar 15, 2026


@opaninakuffo (Contributor) commented Mar 11, 2026

🎯 What problem does this PR solve?

  • We did not have an end-to-end profiling surface spanning client RPC, server handling, and delegation paths.
  • Profiling behavior was inconsistent across operations and lacked a clear precedence model for runtime vs per-call control.
  • Delegated requests (consumer server -> provider server) did not expose timing visibility back to the originating client.

📝 How does it solve it?

  • Adds a centralized profiler module with runtime API:
    • profiler.enable(...), profiler.disable(), profiler.exportJSON(), profiler.exportTable(), profiler.exportSummary().
  • Implements deterministic precedence:
    • per-call override > runtime enablement > default disabled.
  • Adds internal __profiling envelope propagation end-to-end.
  • Instruments client RPC transport (unary + streaming).
  • Instruments server request funnel.
  • Instruments delegation transport and peer connection lifecycle.
  • Adds operation-level profiling wrappers and metric extraction maps for handler-level stats (plugin/RAG and related handlers).
  • Finalizes profiler mode (summary vs. verbose) and session behavior.
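The precedence rule above (per-call override > runtime enablement > default disabled) can be sketched as a small resolver. This is a minimal sketch, not the SDK's code: `resolveProfilingEnabled`, `RuntimeProfilerState`, and `ProfilingCallOptions` are hypothetical names, and only the `{ profiling: { enabled } }` per-call option shape mirrors the API example in this PR.

```typescript
// Hypothetical shapes: only `profiling.enabled` mirrors the per-call option in this PR.
interface ProfilingCallOptions {
  profiling?: { enabled?: boolean };
}

interface RuntimeProfilerState {
  // Set by runtime calls such as profiler.enable()/disable(); undefined if never called.
  enabled?: boolean;
}

// Decide whether a single call is profiled.
// Precedence: per-call override > runtime enablement > default (disabled).
function resolveProfilingEnabled(
  runtime: RuntimeProfilerState,
  callOptions?: ProfilingCallOptions,
): boolean {
  const perCall = callOptions?.profiling?.enabled;
  if (perCall !== undefined) return perCall; // explicit per-call true/false always wins
  if (runtime.enabled !== undefined) return runtime.enabled; // then the runtime state
  return false; // default: profiling disabled
}
```

With this ordering, `{ profiling: { enabled: false } }` suppresses profiling for one call even while the runtime profiler is on, and `{ profiling: { enabled: true } }` profiles one call while it is off.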

🧪 How was it tested?

  • Focused unit tests added in:
    • packages/sdk/test/unit/profiler.test.ts
  • Ran the full test suite (320 tests) with the profiler enabled in summary mode. It captured 4,874 events across 450 RPC calls with full server breakdown, validating that profiler overhead remains negligible during a comprehensive test run.
[Screenshot: partial profiler output printed on consumer shutdown]

🔌 API Changes

```typescript
// embed, completion, invokePlugin, ragSearch, transcribe, and translate are
// assumed to be exported from "@qvac/sdk" alongside profiler.
import {
  profiler,
  embed,
  completion,
  invokePlugin,
  ragSearch,
  transcribe,
  translate,
} from "@qvac/sdk";

// Runtime control
profiler.enable({
  mode: "verbose", // "summary" | "verbose"
  includeServerBreakdown: true,
});

// ... run operations ...

console.log(profiler.exportSummary());
console.log(profiler.exportTable());
console.log(profiler.exportJSON({ includeRecentEvents: true }));

profiler.disable();

// Per-call control (overrides runtime enablement)
await embed(
  { modelId: "m1", text: "hello" },
  { profiling: { enabled: true } },
);

await completion(
  { modelId: "m1", history: [{ role: "user", content: "Hello" }] },
  { profiling: { enabled: false } },
);

// Additional client APIs now accept RPC options passthrough
await invokePlugin({ modelId: "m1", handler: "h", params: {} }, { profiling: { enabled: true } });
await ragSearch({ workspace: "default", query: "q" }, { profiling: { enabled: true } });
await transcribe({ modelId: "m1", audioChunk: "..." }, { profiling: { enabled: true } });
await translate({ modelId: "m1", text: "hello" }, { profiling: { enabled: true } });
```

📊 Metrics Catalog

Notes

  • Duration metrics are in milliseconds (ms).
  • Aggregated metric keys use the format op.phase when a phase exists, otherwise just op.
  • op is the operation name (e.g., loadModel, completionStream, unloadModel, pluginInvoke).
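A minimal sketch of the key format plus per-key aggregation, assuming each recorded duration is folded into count/total/min/max statistics. `metricKey`, `MetricAggregator`, and the choice of statistics are hypothetical; this is not a description of the columns exportTable() actually prints.

```typescript
// Aggregated key: "op.phase" when a phase exists, otherwise just "op".
function metricKey(op: string, phase?: string): string {
  return phase ? `${op}.${phase}` : op;
}

// Hypothetical per-key statistics (durations in ms).
interface Aggregate {
  count: number;
  totalMs: number;
  minMs: number;
  maxMs: number;
}

class MetricAggregator {
  private readonly stats = new Map<string, Aggregate>();

  // Fold one duration sample into the aggregate for its key.
  record(op: string, durationMs: number, phase?: string): void {
    const key = metricKey(op, phase);
    const prev = this.stats.get(key);
    if (prev === undefined) {
      this.stats.set(key, { count: 1, totalMs: durationMs, minMs: durationMs, maxMs: durationMs });
      return;
    }
    prev.count += 1;
    prev.totalMs += durationMs;
    prev.minMs = Math.min(prev.minMs, durationMs);
    prev.maxMs = Math.max(prev.maxMs, durationMs);
  }

  get(key: string): Aggregate | undefined {
    return this.stats.get(key);
  }
}
```

Multi-segment phases such as request.zodValidation work unchanged, since the phase is appended as-is.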

Aggregated Timing Metrics (exportTable())

| Metric key pattern | Source | What it measures |
|---|---|---|
| `rpc.connection` | Client RPC transport | Time to establish the first RPC connection in the client lifecycle. |
| `delegation.connection` | Server delegation profiler | Time to establish a provider peer connection (recorded once per peer lifecycle). |
| `op.request.zodValidation` | Client RPC transport | Request schema validation time on the client before sending. |
| `op.request.stringify` | Client RPC transport; delegation transport | JSON serialization time for the outbound request payload. |
| `op.request.totalSerialization` | Client RPC transport | `request.zodValidation` + `request.stringify`. |
| `op.serverWait` | Client RPC transport; delegation transport | Network/server wait from send until the full unary response is received. |
| `op.ttfb` | Client stream transport; delegation stream transport | Time from send to the first streamed chunk/token. |
| `op.streamDuration` | Client stream transport; delegation stream transport | Time from the first streamed chunk to the last streamed chunk. |
| `op.response.jsonParse` | Client RPC transport; delegation transport | JSON parse time for the inbound response payload. |
| `op.response.zodValidation` | Client RPC transport | Response schema validation time on the client. |
| `op.response.totalParsing` | Client RPC transport | `response.jsonParse` + `response.zodValidation`. |
| `op.totalClientTime` | Client RPC transport | End-to-end client-observed duration for the operation. |
| `op.clientOverhead` | Client RPC transport | `totalClientTime` - `server.totalServerTime`, when a server breakdown is present. |
| `op.server.request.jsonParse` | Server breakdown (injected into client) | Server-side JSON parse time for the incoming request. |
| `op.server.request.zodValidation` | Server breakdown (injected into client) | Server-side request schema validation time. |
| `op.server.handlerExecution` | Server breakdown (injected into client) | Server-side handler execution time. |
| `op.server.response.zodValidation` | Server breakdown (injected into client) | Server-side response schema validation time. |
| `op.server.response.stringify` | Server breakdown (injected into client) | Server-side response serialization time. |
| `op.server.totalServerTime` | Server breakdown (injected into client) | Total server request lifecycle time (parse -> handler -> serialize). |
| `op.delegation.connection` | Delegation breakdown (injected into client) | Connection establishment time for the delegated provider hop, attached to the unary response. |
| `op.delegation.request.stringify` | Delegation breakdown (injected into client) | Delegation request serialization time. |
| `op.delegation.serverWait` | Delegation breakdown (injected into client) | Wait time for the delegated provider's unary response. |
| `op.delegation.response.jsonParse` | Delegation breakdown (injected into client) | Delegated response JSON parse time. |
| `op.delegation.totalDelegationTime` | Delegation breakdown (injected into client) | End-to-end delegated hop time for the unary flow. |
| `op.delegated.request.jsonParse` | Delegation profiler with provider-server metadata | Provider-server parse time, recorded under the `delegated.*` prefix. |
| `op.delegated.request.zodValidation` | Delegation profiler with provider-server metadata | Provider-server request validation time. |
| `op.delegated.handlerExecution` | Delegation profiler with provider-server metadata | Provider-server handler execution time. |
| `op.delegated.response.zodValidation` | Delegation profiler with provider-server metadata | Provider-server response validation time. |
| `op.delegated.response.stringify` | Delegation profiler with provider-server metadata | Provider-server response stringify time. |
| `op.delegated.totalServerTime` | Delegation profiler with provider-server metadata | Provider-server total server lifecycle time. |
| `op.totalDelegationTime` | Server delegation transport | Total delegation hop time recorded by the server-side delegation transport. |
| `op.failed` | Generic failure recorder | Duration from operation start until the error is captured. |
| `op` (no phase, e.g., `embed`, `rag`) | Operation wrappers | Handler wall-clock execution duration (reply/stream wrapper scope). |
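The three composite rows above (request.totalSerialization, response.totalParsing, and clientOverhead) are plain arithmetic over other phases. A sketch, assuming a hypothetical `PhaseTimings` record of one call's durations in ms; the field names mirror the phase names in the table, but the record and `deriveTimings` are illustrative only.

```typescript
// Hypothetical per-call phase durations in milliseconds.
interface PhaseTimings {
  requestZodValidation: number;
  requestStringify: number;
  responseJsonParse: number;
  responseZodValidation: number;
  totalClientTime: number;
  // Present only when the server breakdown came back with the response.
  serverTotalServerTime?: number;
}

interface DerivedTimings {
  totalSerialization: number;
  totalParsing: number;
  clientOverhead?: number;
}

function deriveTimings(t: PhaseTimings): DerivedTimings {
  const derived: DerivedTimings = {
    totalSerialization: t.requestZodValidation + t.requestStringify,
    totalParsing: t.responseJsonParse + t.responseZodValidation,
  };
  // clientOverhead is only defined when the server reported its own total.
  if (t.serverTotalServerTime !== undefined) {
    derived.clientOverhead = t.totalClientTime - t.serverTotalServerTime;
  }
  return derived;
}
```

Defined this way, clientOverhead includes network transit as well as client-side serialization and parsing, since everything outside the server's own lifecycle is counted against the client.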

Event Gauges and Counters (exportJSON().recentEvents / onRecord)

| Metric field | Where emitted | What it measures |
|---|---|---|
| `gauges.ttfb` | Operation wrappers (stream handlers) | Time to first yielded chunk within wrapper scope. |
| `gauges.timeToFirstToken` | `completionStream` operation metrics | Model-reported time to first token (TTFT) from completion stats. |
| `gauges.tokensPerSecond` | `completionStream` operation metrics | Model-reported output throughput from completion stats. |
| `gauges.cacheTokens` | `completionStream` operation metrics | Model-reported cache token count. |
| `gauges.processedTokens` | `translate` operation metrics | Model-reported number of processed translation tokens. |
| `gauges.processingTime` | `translate` operation metrics | Model-reported translation processing time. |
| `gauges.detectionTime` | `ocrStream` operation metrics | OCR detection stage time from response stats. |
| `gauges.recognitionTime` | `ocrStream` operation metrics | OCR recognition stage time from response stats. |
| `gauges.totalTime` | `ocrStream` operation metrics | Total OCR time from response stats. |
| `gauges.processed` | `rag` operation metrics | Number of processed items returned by the RAG operation. |
| `gauges.resultsCount` | `rag` operation metrics | Number of results in the RAG response. |
| `count` | Stream transports and wrappers | Chunk/token/event count captured for streaming flows. |
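The stream-wrapper gauges (ttfb and count), together with the transport-level streamDuration, come from timing an async stream as it is consumed. A minimal sketch of such a wrapper, assuming an injectable clock so it can be tested; `profileStream`, `StreamTiming`, and the `onDone` callback are hypothetical and not the SDK's wrapper API.

```typescript
interface StreamTiming {
  ttfbMs?: number; // wrapper start -> first chunk (absent if nothing was yielded)
  streamDurationMs?: number; // first chunk -> last chunk
  count: number; // number of chunks yielded
}

// Re-yield every chunk from `source` and report timings once the stream ends,
// whether it completes, throws, or the consumer breaks out early.
async function* profileStream<T>(
  source: AsyncIterable<T>,
  onDone: (timing: StreamTiming) => void,
  now: () => number = () => Date.now(),
): AsyncGenerator<T> {
  // Generator bodies run lazily, so this is taken on the consumer's first pull.
  const start = now();
  let first: number | undefined;
  let last: number | undefined;
  let count = 0;
  try {
    for await (const chunk of source) {
      const t = now();
      if (first === undefined) first = t;
      last = t;
      count += 1;
      yield chunk;
    }
  } finally {
    onDone({
      count,
      ...(first !== undefined ? { ttfbMs: first - start } : {}),
      ...(first !== undefined && last !== undefined ? { streamDurationMs: last - first } : {}),
    });
  }
}
```

Putting the report in `finally` is the design choice that matters: an early `break` calls the generator's `return()`, which still runs the `finally` block, so partially consumed streams are recorded too.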

Summary Rows (exportSummary())

| Summary row | Derived from aggregates | What it represents |
|---|---|---|
| RPC Total | keys ending in `.totalClientTime` | Aggregate client-observed end-to-end RPC time across operations. |
| Handler | keys ending in `.server.handlerExecution`, OR phaseless keys (e.g., `embed`) | Server-side handler execution times combined with operation-wrapper durations. |
| Model Load | `load.totalTime` or `*.load.totalTime` | Load timing roll-up, if and when emitted by load-metrics instrumentation. |
| Download | `download.time` or `*.download.time` | Download timing roll-up, if and when emitted by download instrumentation. |
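The summary rows are roll-ups of aggregate keys matched by name or suffix. A sketch under the assumption that aggregates are available as a `Map<string, number>` of total milliseconds per key; the input shape and `summarize` are hypothetical, and only the key patterns come from the table above.

```typescript
interface SummaryRow {
  label: string;
  totalMs: number;
}

// Roll per-key totals (ms) up into the summary rows described above.
function summarize(totals: ReadonlyMap<string, number>): SummaryRow[] {
  const sum = (match: (key: string) => boolean): number => {
    let total = 0;
    for (const [key, ms] of totals) if (match(key)) total += ms;
    return total;
  };
  return [
    { label: "RPC Total", totalMs: sum((k) => k.endsWith(".totalClientTime")) },
    // Server-side handler time plus phaseless operation-wrapper keys (no "." in the key).
    { label: "Handler", totalMs: sum((k) => k.endsWith(".server.handlerExecution") || !k.includes(".")) },
    { label: "Model Load", totalMs: sum((k) => k === "load.totalTime" || k.endsWith(".load.totalTime")) },
    { label: "Download", totalMs: sum((k) => k === "download.time" || k.endsWith(".download.time")) },
  ];
}
```

Keys that match no pattern (for example rpc.connection) simply do not contribute to any summary row.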

@opaninakuffo changed the title from "QVAC-13658: SDK Profiler" to "QVAC-13658 feat[api]: SDK Profiler" Mar 11, 2026
@opaninakuffo marked this pull request as ready for review March 12, 2026 09:30
@opaninakuffo requested review from a team as code owners March 12, 2026 09:30
NamelsKing previously approved these changes Mar 12, 2026
Review comment thread on packages/sdk/profiling/ring-buffer.ts
@opaninakuffo force-pushed the feat/sdk-profiler-rebased branch from 87dba62 to f5060bf March 13, 2026 22:48
@opaninakuffo (Contributor, Author)

/review

@github-actions (Contributor)

Tier-based Approval Status

**PR Tier:** TIER1

**Current Status:** ✅ APPROVED

**Requirements:**
- 1 Team Member approval ✅ (1/1)
- 1 Team Lead OR Management approval ✅ (1/1)



---
*This comment is automatically updated when reviews change.*

@opaninakuffo merged commit 36f2690 into tetherto:main Mar 15, 2026
11 of 12 checks passed
NamelsKing pushed a commit that referenced this pull request Apr 3, 2026
* feat: add diffusion SDK plugin integration

Wire up the stable-diffusion.cpp plugin through all SDK layers:
- Schema: sdcpp-config.ts with config, stats, request/response schemas
- Plugin: resolveConfig for companion artifacts, createModel, streaming handlers
- Load model: diffusion entries in all 4 schema locations
- Registration: model type, alias, engine-addon map, worker, pear pre-hook
- Type widening: FilesystemDL | undefined for loader-less plugins

* feat(diffusion): consolidate SDK plugin, fix sampling_method schema, add integration tests

- Fix sampling_method enum to match C++ addon ground truth (dpm++2m not dpm++_2m)
- Add 6 missing sampler values (ipndm, ipndm_v, ddim_trailing, tcd, res_multistep, res_2s)
- Fix addon index.d.ts SamplerMethod type to match C++ parser
- Consolidate generation ops into single unified handler (txt2img + img2img)
- Add dedicated RPC handler, client API, and first-class generation() export
- Add 15 integration test definitions and desktop executor
- Add examples: txt2img, img2img, flux2-klein
- Add comprehensive unit tests for schemas, plugin dispatch, and stats
- Wire diffusion into handler-registry, common schemas, model-config-utils, get-model-info

* fix(diffusion): register generationStream in bare-client handler map

The bare-client dispatches via handlers/index.ts (direct mode), not
handler-registry.ts (IPC worker mode). Missing entry caused
RPC_NO_HANDLER when running examples via bare runtime.
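The failure mode here is a lookup miss in a handler map: when the direct-mode map has no entry for a request type, dispatch has nowhere to route it. A minimal sketch, assuming a plain `Map` keyed by request type; `dispatch`, `Handler`, and `RpcError` are hypothetical, and only the RPC_NO_HANDLER code and the generationStream name come from the commit message.

```typescript
type Handler = (params: unknown) => Promise<unknown>;

class RpcError extends Error {
  readonly code: string;
  constructor(code: string, message: string) {
    super(message);
    this.name = "RpcError";
    this.code = code;
  }
}

// Route a request by its type; a missing entry surfaces as RPC_NO_HANDLER.
async function dispatch(
  handlers: ReadonlyMap<string, Handler>,
  type: string,
  params: unknown,
): Promise<unknown> {
  const handler = handlers.get(type);
  if (handler === undefined) {
    throw new RpcError("RPC_NO_HANDLER", `No handler registered for "${type}"`);
  }
  return handler(params);
}
```

With two separate maps (direct mode and IPC worker mode), a new operation has to be registered in both; this commit adds the missing direct-mode entry.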

* feat(diffusion): add diffusion naming handler for update-models codegen

Add dedicated generateDiffusionName() to produce clean export constants
for diffusion registry models (SD → SD_V2_1, SDXL → SDXL_BASE, FLUX,
VAE). Includes 4 unit tests covering all model families.

* feat(diffusion): sync registry models and use FLUX constant in tests

Run bun update-models to pull 21 new models (including diffusion) from
the live registry. Replace QVAC_DIFFUSION_MODEL env var in model-manager
with the FLUX_2_KLEIN_4B_Q4_0 registry constant.

* fix(diffusion): prevent statsPromise hang and fix lint issues

Resolve statsPromise after stream loop exits (not only on done:true),
add statsRejecter for error propagation, derive GenerationClientParams
from schema type to prevent drift, and fix lint warnings in generation
ops and test executor.

* revert: remove non-matching patterns from generation client

Revert statsPromise try/catch/rejecter and GenerationClientParams
Omit<> derivation — these diverged from the established patterns
in ocr.ts, translate.ts, and transcription.ts. Also remove unrelated
model history file that was incorrectly included.

* chore: remove unrelated model history file

ecb1bf8.txt was a codegen artifact from bun run update-models
during the merge — it should not have been committed here.

* fix: configure FLUX companion models and GPU device for diffusion tests

FLUX.2 models require companion LLM (Qwen3-4B) and VAE models to create
the stable-diffusion context. Without them, SdModel::load() fails.
Also switches device from CPU to GPU and adds img2img test fixture.

* fix(examples): configure FLUX companion models consistently across all diffusion examples

All three diffusion examples now default to the required companion
LLM (QWEN3_4B_Q4_K_M) and VAE (FLUX_2_KLEIN_4B_VAE) models, matching
the desktop test configuration. Also switches device from cpu to gpu.

* fix(tests): use llm addon elephant.jpg for img2img test fixture

Replace photo.png with elephant.jpg from lib-infer-llamacpp-llm/media.
Update generation test definitions to reference the new filename.

* fix(examples): use path.resolve for img2img default image path

import.meta.dirname is undefined in Bare runtime. Use path.resolve
with a CWD-relative path instead, matching the documented convention
of running examples from the SDK root.

* fix(tests): migrate generation executor to ResourceManager pattern

Replace ModelManager usage with AbstractModelExecutor base class,
matching the pattern used by all other executors after PR #836.

* feat(api): expose progressStream in generation() client helper

The server already emits progress ticks (step/totalSteps/elapsedMs) during
diffusion generation but the client was silently dropping them. Add a
progressStream async generator to the generation() return type so SDK
callers can show progress UI. Update the streaming-progress integration
test to assert progress tick presence and field validity.

* refactor(api): use background fan-out loop for generation() streams

Refactor generation() to follow the completion() multi-stream pattern:
a single background processResponses() task drives the RPC stream and
fans out to outputStream, progressStream, outputs, and stats independently.

This fixes two issues with the previous implementation:
- consuming progressStream alone now works (no longer requires
  outputStream iteration to drive the RPC stream)
- RPC errors propagate to all consumers (streams throw, promises reject)
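The background fan-out described above can be sketched generically: one task drains the source stream and pushes into per-consumer channels, so each consumer reads independently and a source error reaches all of them. This is a minimal sketch under stated assumptions; `fanOut`, `AsyncQueue`, `StreamEvent`, and the two-channel split (progress ticks vs. collected outputs) are hypothetical and not the SDK's processResponses().

```typescript
// Minimal unbounded async queue for a single consumer: the producer calls
// push/close/fail. Buffered items are delivered before close or failure is seen.
class AsyncQueue<T> implements AsyncIterable<T> {
  private readonly items: T[] = [];
  private readonly waiters: Array<{
    resolve: (result: IteratorResult<T>) => void;
    reject: (err: unknown) => void;
  }> = [];
  private closed = false;
  private failed = false;
  private error: unknown;

  push(item: T): void {
    const waiter = this.waiters.shift();
    if (waiter) waiter.resolve({ value: item, done: false });
    else this.items.push(item);
  }

  close(): void {
    this.closed = true;
    for (const w of this.waiters.splice(0)) w.resolve({ value: undefined, done: true });
  }

  fail(err: unknown): void {
    this.failed = true;
    this.error = err;
    for (const w of this.waiters.splice(0)) w.reject(err);
  }

  [Symbol.asyncIterator](): AsyncIterator<T> {
    return {
      next: async (): Promise<IteratorResult<T>> => {
        if (this.items.length > 0) return { value: this.items.shift() as T, done: false };
        if (this.failed) throw this.error;
        if (this.closed) return { value: undefined, done: true };
        return new Promise<IteratorResult<T>>((resolve, reject) => {
          this.waiters.push({ resolve, reject });
        });
      },
    };
  }
}

// Hypothetical tagged event from the underlying RPC stream.
type StreamEvent<P, O> = { type: "progress"; data: P } | { type: "output"; data: O };

// One background task drives `source`, so reading only `progressStream` still
// makes progress, and a source error reaches both stream and promise consumers.
function fanOut<P, O>(source: AsyncIterable<StreamEvent<P, O>>): {
  progressStream: AsyncIterable<P>;
  outputs: Promise<O[]>;
} {
  const progressStream = new AsyncQueue<P>();
  const outputs = (async (): Promise<O[]> => {
    const collected: O[] = [];
    try {
      for await (const event of source) {
        if (event.type === "progress") progressStream.push(event.data);
        else collected.push(event.data);
      }
    } catch (err) {
      progressStream.fail(err); // stream consumers throw...
      throw err; // ...and `outputs` rejects
    }
    progressStream.close();
    return collected;
  })();
  // Mark the rejection as handled so a caller that only reads the stream does
  // not trigger an unhandled rejection; awaiting `outputs` still throws.
  outputs.catch(() => {});
  return { progressStream, outputs };
}
```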

* chore: regenerate bun.lock and models registry after rebase

Regenerates bun.lock and models/registry/models.ts to restore FLUX
model entries that were lost during rebase conflict resolution.

* fix[api]: align SDK diffusion schemas with addon contract

- Rename config field `wtype` → `type` to match C++ context handler key
- Expand weight type enum to match addon: add auto, bf16, q2_k, q3_k,
  q4_k, q5_k, q6_k; remove invalid "default"
- Remove `schedule` config field (no C++ context handler exists for it)
- Fix per-request scheduler enum: remove "default" (addon rejects it),
  add sgm_uniform, simple, lcm, smoothstep, kl_optimal, bong_tangent
- Remove phantom stats fields from diffusionStatsSchema (generation_time,
  totalTime, stepsPerSecond, msPerStep, megapixelsPerSecond, steps,
  output_count) — addon RuntimeStats never emits these
- Update unit tests and generation executor to use real addon fields

* fix[api]: align SDK rng config with addon contract

- Add std_default to rng enum to match addon RngType
- Add sampler_rng config field (separate RNG for sampler)
- Forward sampler_rng from plugin to addon

* mod[api]: rename public API from generation() to diffusion()

Aligns top-level API naming with other addon-specific surfaces (completion,
ocr, embed) — "diffusion" is specific to the stable-diffusion.cpp backend,
while "generation" is too generic and could apply to any inference addon.

Rename covers: public function, types, schemas, RPC routing literal,
handler registry, plugin handler key, examples, integration tests, and
unit tests. Addon RuntimeStats field names (generationMs, etc.) are
unchanged — those are wire-format names from the C++ addon.

* fix: resolve pre-existing lint errors in diffusion client and load-model

- Cast streamError to Error to satisfy @typescript-eslint/only-throw-error
  (closure type narrowing false positive)
- Remove unnecessary SdcppConfig type assertion and unused import in
  load-model.ts

* mod[api]: remove img2img functionality until addon support lands

Strip init_image, strength, and all img2img code paths from the SDK
surface. Will be re-added when the addon fully supports it (PR #884).

* feat[api]: wire up profiler and device defaults for diffusion addon

Register diffusionStream operation metrics (generationMs, totalSteps,
totalImages, totalPixels) following the pattern of all other addons.
Add sdcppGeneration to deviceConfigDefaultsSchema so device-specific
config defaults can be applied to diffusion models.
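A metric extraction map of the kind registered here can be sketched as a lookup from operation name to a function that pulls numeric gauges out of that operation's stats. Only the diffusionStream operation name and its four field names come from this commit; `extractors`, `pickNumbers`, `Gauges`, and the rule of skipping non-numeric values are hypothetical.

```typescript
type Gauges = Record<string, number>;
type Extractor = (stats: Readonly<Record<string, unknown>>) => Gauges;

// Build an extractor that copies the listed fields when they are finite numbers.
function pickNumbers(fields: readonly string[]): Extractor {
  return (stats) => {
    const gauges: Gauges = {};
    for (const field of fields) {
      const value = stats[field];
      if (typeof value === "number" && Number.isFinite(value)) gauges[field] = value;
    }
    return gauges;
  };
}

// Hypothetical extraction map keyed by operation name.
const extractors = new Map<string, Extractor>([
  ["diffusionStream", pickNumbers(["generationMs", "totalSteps", "totalImages", "totalPixels"])],
]);

// Operations without a registered extractor contribute no gauges.
function extractGauges(op: string, stats: Readonly<Record<string, unknown>>): Gauges {
  const extractor = extractors.get(op);
  return extractor ? extractor(stats) : {};
}
```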

* fix[api]: align diffusion client API with actual streaming behavior

C++ generate_image() is synchronous — images are delivered only after
generation completes, not streamed during inference. Remove misleading
outputStream generator and stream param from the client API. The correct
surface is: progressStream (real-time step ticks), outputs (final images),
and stats.

Also update @qvac/diffusion-cpp dependency from file: link to 0.1.0 now
that the package is published.

* chore: clean up internal comments from public-facing API

Remove implementation details (RPC wire format, C++ internals) from
JSDoc and schema comments that end users would see.

* fix[api]: add positive constraint to width/height, describe config fields

- Add .positive() to width and height in diffusionRequestSchema
- Add .describe() to companion model fields (clipLModelSrc, clipGModelSrc,
  t5XxlModelSrc, llmModelSrc, vaeModelSrc) documenting which architectures
  require them
- Add .describe() to prediction, type, clip_on_cpu, vae_on_cpu, vae_tiling,
  flash_attn config fields
- Add diffusion-simple.ts example showing minimal config with a single
  all-in-one GGUF model (no companion files)

* fix: add missing validator to download test custom expectation

DownloadExecutor constructor takes no arguments — remove resources param.
download-tests custom expectation requires validator field per TestDefinition schema.

* fix: add mobile diffusion support, move executor to shared, bump test timeouts

- Move diffusion executor from desktop/ to shared/ (no platform-specific APIs)
- Add skipPreDownload to desktop diffusion resource (companion models resolve at load time)
- Add mobile consumer: SD 2.1 Q8_0 model, device gpu, threads 4, prediction v, vae_on_cpu true
- Bump test timeouts: 300s default, 600s for batch/seed tests
- Fix DownloadExecutor() constructor call (takes no args)

* fix: use exported SDK model constant in diffusion-simple example

Replace hardcoded local file path with SD_V2_1_1B_Q8_0 constant.
Add prediction: "v" config required by SD 2.1 models.

* fix[api]: address PR review comments for diffusion SDK integration

- plugin.ts: import addon types (ImgStableDiffusionArgs, SdConfig), remove
  as any/as never casts. Refactor resolveConfig to use destructure + explicit
  Promise.all matching TTS pattern. Remove SRC_TO_ARTIFACT mapping constant.
  Pass config directly to addon constructor.
- ops/diffusion.ts: pass params inline to model.run() matching TTS/OCR pattern.
- model-registry.ts: loader field optional (loader?: FilesystemDL) with
  conditional spread for exactOptionalPropertyTypes.
- sdcpp-config.ts: derive DiffusionClientParams from DiffusionRequest. Add
  descriptions to cfg_scale and guidance fields.
- bun.lock: regenerated, removes file:../lib-infer-diffusion leak.
- Remove shared-test-data/ directory (elephant.jpg leftover from img2img).
- Remove dead verify* params from diffusion test definitions.
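Under exactOptionalPropertyTypes, an optional property such as loader?: FilesystemDL may be omitted but may not be explicitly set to undefined, which is why a conditional spread is needed. A minimal sketch; `FilesystemDL` here is a stand-in interface, and `ModelEntry`/`makeEntry` are hypothetical rather than the SDK's model-registry types.

```typescript
// Stand-in for the real loader type; only its optionality matters here.
interface FilesystemDL {
  readonly id: string;
}

interface ModelEntry {
  name: string;
  loader?: FilesystemDL; // absent for loader-less plugins
}

function makeEntry(name: string, loader: FilesystemDL | undefined): ModelEntry {
  // `{ name, loader }` would not compile under exactOptionalPropertyTypes when
  // loader may be undefined; the spread adds the key only when a value exists.
  return { name, ...(loader !== undefined ? { loader } : {}) };
}
```

The difference is also visible at runtime: `"loader" in entry` and `Object.keys(entry)` see the key only when a loader was actually provided.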

* fix: add eslint-disable for optional MCP SDK import in example

The @modelcontextprotocol/sdk is a user-installed optional dependency,
not a project dependency. Suppress import/no-unresolved to unblock CI.

* fix: bump diffusion-cpp to 0.1.1 for absolute path fix

* fix: bump diffusion-cpp to 0.1.1 for absolute path fix
Proletter pushed a commit that referenced this pull request May 24, 2026
Replace ModelManager usage with AbstractModelExecutor base class,
matching the pattern used by all other executors after PR #836.
Proletter pushed a commit that referenced this pull request May 24, 2026
* feat: add core profiler module and public runtime API

* feat: instrument client rpc transport and wire per-call profiling options

* feat: add server rpc profiling modules

* feat: integrate profiling into server funnel delegation and operation handlers

* chore: add profiler usage examples

* chore: apply lint-only formatting

* chore: profiler unit tests

---------

Co-authored-by: Ridwan Taiwo <donriddo@gmail.com>
Proletter pushed a commit that referenced this pull request May 24, 2026
Proletter pushed a commit that referenced this pull request May 24, 2026

4 participants