QVAC-13658 feat[api]: SDK Profiler by opaninakuffo · Pull Request #836 · tetherto/qvac

opaninakuffo · 2026-03-11T22:52:56Z

🎯 What problem does this PR solve?

We did not have an end-to-end profiling surface across client RPC, server handling, and delegation paths.
Profiling behavior was inconsistent across operations and lacked a clear precedence model for runtime vs per-call control.
Delegated requests (consumer server -> provider server) did not expose timing visibility back to the originating client.

📝 How does it solve it?

Adds a centralized profiler module with runtime API:
- profiler.enable(...), profiler.disable(), profiler.exportJSON(), profiler.exportTable(), profiler.exportSummary().
Implements deterministic precedence:
- per-call override > runtime enablement > default disabled.
Adds internal __profiling envelope propagation end-to-end.
Instruments client RPC transport (unary + streaming).
Instruments server request funnel.
Instruments delegation transport and peer connection lifecycle.
Adds operation-level profiling wrappers + metric extraction maps for handler-level stats (plugin/rag + related handlers).
Finalizes mode/session behavior.
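
The precedence rule above (per-call override > runtime enablement > default disabled) can be sketched as a small resolver. This is an illustration only, not the code in this PR: `PerCallOptions`, `RuntimeState`, and `shouldProfile` are hypothetical names, and the only shape taken from the PR is the `{ profiling: { enabled } }` per-call option.

```typescript
// Hypothetical shapes for illustration; only `profiling.enabled` appears in the PR.
interface PerCallOptions {
  profiling?: { enabled?: boolean };
}

interface RuntimeState {
  // Assumed to be flipped by profiler.enable() / profiler.disable().
  enabled: boolean;
}

// Decide whether a single call should be profiled.
function shouldProfile(
  runtime: RuntimeState | undefined,
  options?: PerCallOptions,
): boolean {
  const perCall = options?.profiling?.enabled;
  if (perCall !== undefined) return perCall; // 1. per-call override wins
  if (runtime !== undefined) return runtime.enabled; // 2. runtime enablement
  return false; // 3. default: disabled
}
```

Under this sketch, a call made with `{ profiling: { enabled: false } }` is never profiled, even while the runtime profiler is enabled.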

🧪 How was it tested?

Focused unit tests added in:
- packages/sdk/test/unit/profiler.test.ts
Ran the full test suite (320 tests) with the profiler enabled in summary mode. The run captured 4,874 events across 450 RPC calls with full server breakdown, indicating that profiler overhead remains negligible during comprehensive test execution. The attached screenshot shows partial profiler output on consumer shutdown.

🔌 API Changes

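// Assumption: the client helpers used below are exported from the same package as `profiler`.
import {
  completion,
  embed,
  invokePlugin,
  profiler,
  ragSearch,
  transcribe,
  translate,
} from "@qvac/sdk";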

// Runtime control
profiler.enable({
  mode: "verbose", // "summary" | "verbose"
  includeServerBreakdown: true,
});

// ... run operations ...

console.log(profiler.exportSummary());
console.log(profiler.exportTable());
console.log(profiler.exportJSON({ includeRecentEvents: true }));

profiler.disable();

// Per-call control
await embed(
  { modelId: "m1", text: "hello" },
  { profiling: { enabled: true } },
);

await completion(
  { modelId: "m1", history: [{ role: "user", content: "Hello" }] },
  { profiling: { enabled: false } },
);

// Additional client APIs now accept RPC options passthrough
await invokePlugin({ modelId: "m1", handler: "h", params: {} }, { profiling: { enabled: true } });
await ragSearch({ workspace: "default", query: "q" }, { profiling: { enabled: true } });
await transcribe({ modelId: "m1", audioChunk: "..." }, { profiling: { enabled: true } });
await translate({ modelId: "m1", text: "hello" }, { profiling: { enabled: true } });

📊 Metrics Catalog

Notes

Duration metrics are in milliseconds (ms).
Aggregated metric key format is op.phase when phase exists, otherwise just op.
op is the operation name (e.g., loadModel, completionStream, unloadModel, pluginInvoke).
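
The key format described in these notes can be sketched as follows. `metricKey` and `splitMetricKey` are hypothetical helpers, not functions from the SDK, and the split assumes operation names never contain a dot.

```typescript
// Build an aggregated metric key: "op.phase" when a phase exists, otherwise "op".
function metricKey(op: string, phase?: string): string {
  return phase ? `${op}.${phase}` : op;
}

// Inverse helper (assumption: op names contain no "."; phases may, e.g. "server.request.jsonParse").
function splitMetricKey(key: string): { op: string; phase?: string } {
  const dot = key.indexOf(".");
  return dot === -1
    ? { op: key }
    : { op: key.slice(0, dot), phase: key.slice(dot + 1) };
}
```

For example, `metricKey("completionStream", "ttfb")` yields `completionStream.ttfb`, while `metricKey("embed")` yields the phaseless key `embed`.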

Aggregated Timing Metrics (`exportTable()`)

| Metric key pattern | Source | What it measures |
| --- | --- | --- |
| `rpc.connection` | Client RPC transport | Time to establish the first RPC connection in the client lifecycle. |
| `delegation.connection` | Server delegation profiler | Time to establish a provider peer connection (recorded once per peer lifecycle). |
| `op.request.zodValidation` | Client RPC transport | Request schema validation time on the client before sending. |
| `op.request.stringify` | Client RPC transport; delegation transport | JSON serialization time for outbound request payload. |
| `op.request.totalSerialization` | Client RPC transport | `request.zodValidation + request.stringify`. |
| `op.serverWait` | Client RPC transport; delegation transport | Network/server wait from send to first full unary response receipt. |
| `op.ttfb` | Client stream transport; delegation stream transport | Time to first streamed chunk/token from send time. |
| `op.streamDuration` | Client stream transport; delegation stream transport | Duration from first streamed chunk to last streamed chunk. |
| `op.response.jsonParse` | Client RPC transport; delegation transport | JSON parse time for inbound response payload. |
| `op.response.zodValidation` | Client RPC transport | Response schema validation time on the client. |
| `op.response.totalParsing` | Client RPC transport | `response.jsonParse + response.zodValidation`. |
| `op.totalClientTime` | Client RPC transport | End-to-end client-observed duration for the operation. |
| `op.clientOverhead` | Client RPC transport | `totalClientTime - server.totalServerTime` when server breakdown is present. |
| `op.server.request.jsonParse` | Server breakdown (injected to client) | Server-side JSON parse time for the incoming request. |
| `op.server.request.zodValidation` | Server breakdown (injected to client) | Server-side request schema validation time. |
| `op.server.handlerExecution` | Server breakdown (injected to client) | Server-side handler execution time. |
| `op.server.response.zodValidation` | Server breakdown (injected to client) | Server-side response schema validation time. |
| `op.server.response.stringify` | Server breakdown (injected to client) | Server-side response serialization time. |
| `op.server.totalServerTime` | Server breakdown (injected to client) | Total server request lifecycle time (parse -> handler -> serialize). |
| `op.delegation.connection` | Delegation breakdown (injected to client) | Connection establishment time for delegated provider hop attached to unary response. |
| `op.delegation.request.stringify` | Delegation breakdown (injected to client) | Delegation request serialization time. |
| `op.delegation.serverWait` | Delegation breakdown (injected to client) | Wait time for delegated provider unary response. |
| `op.delegation.response.jsonParse` | Delegation breakdown (injected to client) | Delegated response JSON parse time. |
| `op.delegation.totalDelegationTime` | Delegation breakdown (injected to client) | End-to-end delegated hop time for unary flow. |
| `op.delegated.request.jsonParse` | Delegation profiler with provider server meta | Provider-server parse time, recorded under `delegated.*` prefix. |
| `op.delegated.request.zodValidation` | Delegation profiler with provider server meta | Provider-server request validation time. |
| `op.delegated.handlerExecution` | Delegation profiler with provider server meta | Provider-server handler execution time. |
| `op.delegated.response.zodValidation` | Delegation profiler with provider server meta | Provider-server response validation time. |
| `op.delegated.response.stringify` | Delegation profiler with provider server meta | Provider-server response stringify time. |
| `op.delegated.totalServerTime` | Delegation profiler with provider server meta | Provider-server total server lifecycle time. |
| `op.totalDelegationTime` | Server delegation transport | Delegation hop total time recorded by server-side delegation transport. |
| `op.failed` | Generic failure recorder | Duration from operation start until error capture point. |
| `op` (no phase, e.g., `embed`, `rag`) | Operation wrappers | Handler wall-clock execution duration (reply/stream wrapper scope). |
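
Three of the metrics above are derived from other timings rather than measured directly. A minimal sketch of that arithmetic, assuming a hypothetical `ClientTimings` record (the field names are illustrative, not the SDK's internal types):

```typescript
// Hypothetical per-call timing record, all values in milliseconds.
interface ClientTimings {
  requestZodValidation: number;
  requestStringify: number;
  responseJsonParse: number;
  responseZodValidation: number;
  totalClientTime: number;
  // Present only when the server breakdown was injected into the response.
  serverTotalServerTime?: number;
}

// Compute the derived phases exactly as the table defines them.
function deriveMetrics(t: ClientTimings): Record<string, number> {
  const derived: Record<string, number> = {
    "request.totalSerialization": t.requestZodValidation + t.requestStringify,
    "response.totalParsing": t.responseJsonParse + t.responseZodValidation,
  };
  // clientOverhead is only defined when a server breakdown is available.
  if (t.serverTotalServerTime !== undefined) {
    derived["clientOverhead"] = t.totalClientTime - t.serverTotalServerTime;
  }
  return derived;
}
```

With 100 ms of total client time and 80 ms of total server time, this sketch reports a `clientOverhead` of 20 ms; without a server breakdown the key is simply absent.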

Event Gauges and Counters (`exportJSON().recentEvents` / `onRecord`)

| Metric field | Where emitted | What it measures |
| --- | --- | --- |
| `gauges.ttfb` | Operation wrappers (stream handlers) | Time to first yielded chunk within wrapper scope. |
| `gauges.timeToFirstToken` | `completionStream` operation metrics | Model-reported TTFT from completion stats. |
| `gauges.tokensPerSecond` | `completionStream` operation metrics | Model-reported output throughput from completion stats. |
| `gauges.cacheTokens` | `completionStream` operation metrics | Model-reported cache token count. |
| `gauges.processedTokens` | `translate` operation metrics | Model-reported number of processed translation tokens. |
| `gauges.processingTime` | `translate` operation metrics | Model-reported translation processing time. |
| `gauges.detectionTime` | `ocrStream` operation metrics | OCR detection stage time from response stats. |
| `gauges.recognitionTime` | `ocrStream` operation metrics | OCR recognition stage time from response stats. |
| `gauges.totalTime` | `ocrStream` operation metrics | OCR total time from response stats. |
| `gauges.processed` | `rag` operation metrics | Number of processed items returned by RAG operation. |
| `gauges.resultsCount` | `rag` operation metrics | Number of results in RAG response. |
| `count` | Stream transports and wrappers | Chunk/token/event count captured for streaming flows. |
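
To illustrate how a stream wrapper could capture `gauges.ttfb` and `count`, here is a sketch around an async iterable. `profileStream`, `StreamStats`, and the `onDone` callback are hypothetical names, not the wrappers added in this PR, and `Date.now` stands in for whatever clock the profiler actually uses.

```typescript
// Hypothetical stats record for one streamed operation.
interface StreamStats {
  ttfb?: number; // ms from wrapper start to first yielded chunk
  count: number; // number of chunks yielded
}

// Wrap a stream, passing chunks through unchanged while recording stats.
async function* profileStream<T>(
  source: AsyncIterable<T>,
  onDone: (stats: StreamStats) => void,
  now: () => number = () => Date.now(), // injectable clock for testing
): AsyncGenerator<T> {
  const start = now();
  const stats: StreamStats = { count: 0 };
  try {
    for await (const chunk of source) {
      // Record time-to-first-chunk exactly once.
      if (stats.count === 0) stats.ttfb = now() - start;
      stats.count += 1;
      yield chunk;
    }
  } finally {
    // Report even if the consumer stops early or the source throws.
    onDone(stats);
  }
}
```

Because reporting happens in `finally`, a stream that fails before its first chunk still produces a record, with `count` of 0 and no `ttfb`.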

Summary Rows (`exportSummary()`)

| Summary row | Derived from aggregates | What it represents |
| --- | --- | --- |
| `RPC Total` | keys ending in `.totalClientTime` | Aggregate client-observed end-to-end RPC time across operations. |
| `Handler` | keys ending in `.server.handlerExecution` OR phaseless keys (e.g., `embed`) | Server-side handler execution times combined with operation-wrapper durations. |
| `Model Load` | `load.totalTime` or `*.load.totalTime` | Load timing roll-up if/when emitted by load metrics instrumentation. |
| `Download` | `download.time` or `*.download.time` | Download timing roll-up if/when emitted by download instrumentation. |
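
The mapping from aggregate keys to summary rows can be sketched as a classifier. `summaryRowFor` is a hypothetical helper that follows the matching rules in the table; how `exportSummary()` actually groups keys may differ.

```typescript
type SummaryRow = "RPC Total" | "Handler" | "Model Load" | "Download";

// Map an aggregated metric key to the summary row it rolls up into, if any.
function summaryRowFor(key: string): SummaryRow | undefined {
  if (key.endsWith(".totalClientTime")) return "RPC Total";
  if (key === "load.totalTime" || key.endsWith(".load.totalTime")) return "Model Load";
  if (key === "download.time" || key.endsWith(".download.time")) return "Download";
  // Handler: server handler execution, or a phaseless key such as "embed".
  if (key.endsWith(".server.handlerExecution") || !key.includes(".")) return "Handler";
  return undefined; // key does not contribute to any summary row
}
```

Keys such as `embed.serverWait` match none of the patterns and fall outside the summary in this sketch.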

opaninakuffo · 2026-03-15T17:50:17Z

/review

github-actions · 2026-03-15T17:50:45Z

Tier-based Approval Status

**PR Tier:** TIER1

**Current Status:** ✅ APPROVED

**Requirements:**
- 1 Team Member approval ✅ (1/1)
- 1 Team Lead OR Management approval ✅ (1/1)



---
*This comment is automatically updated when reviews change.*

* feat: add diffusion SDK plugin integration Wire up the stable-diffusion.cpp plugin through all SDK layers: - Schema: sdcpp-config.ts with config, stats, request/response schemas - Plugin: resolveConfig for companion artifacts, createModel, streaming handlers - Load model: diffusion entries in all 4 schema locations - Registration: model type, alias, engine-addon map, worker, pear pre-hook - Type widening: FilesystemDL | undefined for loader-less plugins * feat(diffusion): consolidate SDK plugin, fix sampling_method schema, add integration tests - Fix sampling_method enum to match C++ addon ground truth (dpm++2m not dpm++_2m) - Add 6 missing sampler values (ipndm, ipndm_v, ddim_trailing, tcd, res_multistep, res_2s) - Fix addon index.d.ts SamplerMethod type to match C++ parser - Consolidate generation ops into single unified handler (txt2img + img2img) - Add dedicated RPC handler, client API, and first-class generation() export - Add 15 integration test definitions and desktop executor - Add examples: txt2img, img2img, flux2-klein - Add comprehensive unit tests for schemas, plugin dispatch, and stats - Wire diffusion into handler-registry, common schemas, model-config-utils, get-model-info * fix(diffusion): register generationStream in bare-client handler map The bare-client dispatches via handlers/index.ts (direct mode), not handler-registry.ts (IPC worker mode). Missing entry caused RPC_NO_HANDLER when running examples via bare runtime. * feat(diffusion): add diffusion naming handler for update-models codegen Add dedicated generateDiffusionName() to produce clean export constants for diffusion registry models (SD → SD_V2_1, SDXL → SDXL_BASE, FLUX, VAE). Includes 4 unit tests covering all model families. * feat(diffusion): sync registry models and use FLUX constant in tests Run bun update-models to pull 21 new models (including diffusion) from the live registry. Replace QVAC_DIFFUSION_MODEL env var in model-manager with the FLUX_2_KLEIN_4B_Q4_0 registry constant. 
* fix(diffusion): prevent statsPromise hang and fix lint issues Resolve statsPromise after stream loop exits (not only on done:true), add statsRejecter for error propagation, derive GenerationClientParams from schema type to prevent drift, and fix lint warnings in generation ops and test executor. * revert: remove non-matching patterns from generation client Revert statsPromise try/catch/rejecter and GenerationClientParams Omit<> derivation — these diverged from the established patterns in ocr.ts, translate.ts, and transcription.ts. Also remove unrelated model history file that was incorrectly included. * chore: remove unrelated model history file ecb1bf8.txt was a codegen artifact from bun run update-models during the merge — it should not have been committed here. * fix: configure FLUX companion models and GPU device for diffusion tests FLUX.2 models require companion LLM (Qwen3-4B) and VAE models to create the stable-diffusion context. Without them, SdModel::load() fails. Also switches device from CPU to GPU and adds img2img test fixture. * fix(examples): configure FLUX companion models consistently across all diffusion examples All three diffusion examples now default to the required companion LLM (QWEN3_4B_Q4_K_M) and VAE (FLUX_2_KLEIN_4B_VAE) models, matching the desktop test configuration. Also switches device from cpu to gpu. * fix(tests): use llm addon elephant.jpg for img2img test fixture Replace photo.png with elephant.jpg from lib-infer-llamacpp-llm/media. Update generation test definitions to reference the new filename. * fix(examples): use path.resolve for img2img default image path import.meta.dirname is undefined in Bare runtime. Use path.resolve with a CWD-relative path instead, matching the documented convention of running examples from the SDK root. * fix(tests): migrate generation executor to ResourceManager pattern Replace ModelManager usage with AbstractModelExecutor base class, matching the pattern used by all other executors after PR #836. 
* feat(api): expose progressStream in generation() client helper The server already emits progress ticks (step/totalSteps/elapsedMs) during diffusion generation but the client was silently dropping them. Add a progressStream async generator to the generation() return type so SDK callers can show progress UI. Update the streaming-progress integration test to assert progress tick presence and field validity. * refactor(api): use background fan-out loop for generation() streams Refactor generation() to follow the completion() multi-stream pattern: a single background processResponses() task drives the RPC stream and fans out to outputStream, progressStream, outputs, and stats independently. This fixes two issues with the previous implementation: - consuming progressStream alone now works (no longer requires outputStream iteration to drive the RPC stream) - RPC errors propagate to all consumers (streams throw, promises reject) * chore: regenerate bun.lock and models registry after rebase Regenerates bun.lock and models/registry/models.ts to restore FLUX model entries that were lost during rebase conflict resolution. 
* fix[api]: align SDK diffusion schemas with addon contract - Rename config field `wtype` → `type` to match C++ context handler key - Expand weight type enum to match addon: add auto, bf16, q2_k, q3_k, q4_k, q5_k, q6_k; remove invalid "default" - Remove `schedule` config field (no C++ context handler exists for it) - Fix per-request scheduler enum: remove "default" (addon rejects it), add sgm_uniform, simple, lcm, smoothstep, kl_optimal, bong_tangent - Remove phantom stats fields from diffusionStatsSchema (generation_time, totalTime, stepsPerSecond, msPerStep, megapixelsPerSecond, steps, output_count) — addon RuntimeStats never emits these - Update unit tests and generation executor to use real addon fields * fix[api]: align SDK rng config with addon contract - Add std_default to rng enum to match addon RngType - Add sampler_rng config field (separate RNG for sampler) - Forward sampler_rng from plugin to addon * mod[api]: rename public API from generation() to diffusion() Aligns top-level API naming with other addon-specific surfaces (completion, ocr, embed) — "diffusion" is specific to the stable-diffusion.cpp backend, while "generation" is too generic and could apply to any inference addon. Rename covers: public function, types, schemas, RPC routing literal, handler registry, plugin handler key, examples, integration tests, and unit tests. Addon RuntimeStats field names (generationMs, etc.) are unchanged — those are wire-format names from the C++ addon. * fix: resolve pre-existing lint errors in diffusion client and load-model - Cast streamError to Error to satisfy @typescript-eslint/only-throw-error (closure type narrowing false positive) - Remove unnecessary SdcppConfig type assertion and unused import in load-model.ts * mod[api]: remove img2img functionality until addon support lands Strip init_image, strength, and all img2img code paths from the SDK surface. Will be re-added when the addon fully supports it (PR #884). 
* feat[api]: wire up profiler and device defaults for diffusion addon Register diffusionStream operation metrics (generationMs, totalSteps, totalImages, totalPixels) following the pattern of all other addons. Add sdcppGeneration to deviceConfigDefaultsSchema so device-specific config defaults can be applied to diffusion models. * fix[api]: align diffusion client API with actual streaming behavior C++ generate_image() is synchronous — images are delivered only after generation completes, not streamed during inference. Remove misleading outputStream generator and stream param from the client API. The correct surface is: progressStream (real-time step ticks), outputs (final images), and stats. Also update @qvac/diffusion-cpp dependency from file: link to 0.1.0 now that the package is published. * chore: clean up internal comments from public-facing API Remove implementation details (RPC wire format, C++ internals) from JSDoc and schema comments that end users would see. * fix[api]: add positive constraint to width/height, describe config fields - Add .positive() to width and height in diffusionRequestSchema - Add .describe() to companion model fields (clipLModelSrc, clipGModelSrc, t5XxlModelSrc, llmModelSrc, vaeModelSrc) documenting which architectures require them - Add .describe() to prediction, type, clip_on_cpu, vae_on_cpu, vae_tiling, flash_attn config fields - Add diffusion-simple.ts example showing minimal config with a single all-in-one GGUF model (no companion files) * fix: add missing validator to download test custom expectation DownloadExecutor constructor takes no arguments — remove resources param. download-tests custom expectation requires validator field per TestDefinition schema. 
* fix: add mobile diffusion support, move executor to shared, bump test timeouts - Move diffusion executor from desktop/ to shared/ (no platform-specific APIs) - Add skipPreDownload to desktop diffusion resource (companion models resolve at load time) - Add mobile consumer: SD 2.1 Q8_0 model, device gpu, threads 4, prediction v, vae_on_cpu true - Bump test timeouts: 300s default, 600s for batch/seed tests - Fix DownloadExecutor() constructor call (takes no args) * fix: use exported SDK model constant in diffusion-simple example Replace hardcoded local file path with SD_V2_1_1B_Q8_0 constant. Add prediction: "v" config required by SD 2.1 models. * fix[api]: address PR review comments for diffusion SDK integration - plugin.ts: import addon types (ImgStableDiffusionArgs, SdConfig), remove as any/as never casts. Refactor resolveConfig to use destructure + explicit Promise.all matching TTS pattern. Remove SRC_TO_ARTIFACT mapping constant. Pass config directly to addon constructor. - ops/diffusion.ts: pass params inline to model.run() matching TTS/OCR pattern. - model-registry.ts: loader field optional (loader?: FilesystemDL) with conditional spread for exactOptionalPropertyTypes. - sdcpp-config.ts: derive DiffusionClientParams from DiffusionRequest. Add descriptions to cfg_scale and guidance fields. - bun.lock: regenerated, removes file:../lib-infer-diffusion leak. - Remove shared-test-data/ directory (elephant.jpg leftover from img2img). - Remove dead verify* params from diffusion test definitions. * fix: add eslint-disable for optional MCP SDK import in example The @modelcontextprotocol/sdk is a user-installed optional dependency, not a project dependency. Suppress import/no-unresolved to unblock CI. * fix: bump diffusion-cpp to 0.1.1 for absolute path fix * fix: bump diffusion-cpp to 0.1.1 for absolute path fix

* feat: add core profiler module and public runtime API
* feat: instrument client rpc transport and wire per-call profiling options
* feat: add server rpc profiling modules
* feat: integrate profiling into server funnel delegation and operation handlers
* chore: add profiler usage examples
* chore: apply lint-only formatting
* chore: profiler unit tests

---------

Co-authored-by: Ridwan Taiwo <donriddo@gmail.com>

opaninakuffo changed the title ~~QVAC-13658: SDK Profiler~~ QVAC-13658 feat[api]: SDK Profiler Mar 11, 2026

opaninakuffo marked this pull request as ready for review March 12, 2026 09:30

opaninakuffo requested review from a team as code owners March 12, 2026 09:30

opaninakuffo added the tier1 label Mar 12, 2026

NamelsKing previously approved these changes Mar 12, 2026

simon-iribarren reviewed Mar 13, 2026

Comment thread packages/sdk/profiling/ring-buffer.ts

simon-iribarren previously approved these changes Mar 13, 2026

opaninakuffo mentioned this pull request Mar 13, 2026

QVAC-13715 feat: Profiler Operation Transport + Load/Download Metrics + Stream Profiling #899

Merged

opaninakuffo dismissed stale reviews from simon-iribarren and NamelsKing via 87dba62 March 13, 2026 22:41

opaninakuffo added 7 commits March 13, 2026 22:43

feat: add core profiler module and public runtime API

dc37f82

feat: instrument client rpc transport and wire per-call profiling options

bf82c3c

feat: add server rpc profiling modules

d8eff21

feat: integrate profiling into server funnel delegation and operation handlers

6a5f116

chore: add profiler usage examples

fcad050

chore: apply lint-only formatting

f7eaab6

chore: profiler unit tests

f5060bf

opaninakuffo force-pushed the feat/sdk-profiler-rebased branch from 87dba62 to f5060bf Compare March 13, 2026 22:48

NamelsKing approved these changes Mar 14, 2026

This was referenced Mar 14, 2026

QVAC-14104 feat[api]: SDK diffusion plugin integration #873

Closed

[POC] SDK finetuning support #578

Closed

Merge branch 'main' into feat/sdk-profiler-rebased

31c3815

simon-iribarren approved these changes Mar 15, 2026

Merge branch 'main' into feat/sdk-profiler-rebased

e477f3f

opaninakuffo merged commit 36f2690 into tetherto:main Mar 15, 2026
11 of 12 checks passed

Proletter pushed a commit that referenced this pull request May 24, 2026

fix(tests): migrate generation executor to ResourceManager pattern

bf6e3cf

Replace ModelManager usage with AbstractModelExecutor base class, matching the pattern used by all other executors after PR #836.

Proletter pushed a commit that referenced this pull request May 24, 2026

fix(tests): migrate generation executor to ResourceManager pattern

0e56871

Replace ModelManager usage with AbstractModelExecutor base class, matching the pattern used by all other executors after PR #836.

opaninakuffo merged 9 commits into tetherto:main from opaninakuffo:feat/sdk-profiler-rebased

4 participants
