QVAC-13658 feat[api]: SDK Profiler #836
Merged
opaninakuffo merged 9 commits on Mar 15, 2026
NamelsKing previously approved these changes on Mar 12, 2026
simon-iribarren previously approved these changes on Mar 13, 2026
87dba62 to f5060bf
NamelsKing approved these changes on Mar 14, 2026
This was referenced Mar 14, 2026
simon-iribarren approved these changes on Mar 15, 2026
NamelsKing pushed a commit that referenced this pull request on Apr 3, 2026
* feat: add diffusion SDK plugin integration
  Wire up the stable-diffusion.cpp plugin through all SDK layers:
  - Schema: sdcpp-config.ts with config, stats, request/response schemas
  - Plugin: resolveConfig for companion artifacts, createModel, streaming handlers
  - Load model: diffusion entries in all 4 schema locations
  - Registration: model type, alias, engine-addon map, worker, pear pre-hook
  - Type widening: FilesystemDL | undefined for loader-less plugins
* feat(diffusion): consolidate SDK plugin, fix sampling_method schema, add integration tests
  - Fix sampling_method enum to match C++ addon ground truth (dpm++2m not dpm++_2m)
  - Add 6 missing sampler values (ipndm, ipndm_v, ddim_trailing, tcd, res_multistep, res_2s)
  - Fix addon index.d.ts SamplerMethod type to match C++ parser
  - Consolidate generation ops into single unified handler (txt2img + img2img)
  - Add dedicated RPC handler, client API, and first-class generation() export
  - Add 15 integration test definitions and desktop executor
  - Add examples: txt2img, img2img, flux2-klein
  - Add comprehensive unit tests for schemas, plugin dispatch, and stats
  - Wire diffusion into handler-registry, common schemas, model-config-utils, get-model-info
* fix(diffusion): register generationStream in bare-client handler map
  The bare-client dispatches via handlers/index.ts (direct mode), not handler-registry.ts (IPC worker mode). Missing entry caused RPC_NO_HANDLER when running examples via bare runtime.
* feat(diffusion): add diffusion naming handler for update-models codegen
  Add dedicated generateDiffusionName() to produce clean export constants for diffusion registry models (SD → SD_V2_1, SDXL → SDXL_BASE, FLUX, VAE). Includes 4 unit tests covering all model families.
* feat(diffusion): sync registry models and use FLUX constant in tests
  Run bun update-models to pull 21 new models (including diffusion) from the live registry. Replace QVAC_DIFFUSION_MODEL env var in model-manager with the FLUX_2_KLEIN_4B_Q4_0 registry constant.
* fix(diffusion): prevent statsPromise hang and fix lint issues
  Resolve statsPromise after stream loop exits (not only on done:true), add statsRejecter for error propagation, derive GenerationClientParams from schema type to prevent drift, and fix lint warnings in generation ops and test executor.
* revert: remove non-matching patterns from generation client
  Revert statsPromise try/catch/rejecter and GenerationClientParams Omit<> derivation; these diverged from the established patterns in ocr.ts, translate.ts, and transcription.ts. Also remove unrelated model history file that was incorrectly included.
* chore: remove unrelated model history file
  ecb1bf8.txt was a codegen artifact from bun run update-models during the merge; it should not have been committed here.
* fix: configure FLUX companion models and GPU device for diffusion tests
  FLUX.2 models require companion LLM (Qwen3-4B) and VAE models to create the stable-diffusion context. Without them, SdModel::load() fails. Also switches device from CPU to GPU and adds img2img test fixture.
* fix(examples): configure FLUX companion models consistently across all diffusion examples
  All three diffusion examples now default to the required companion LLM (QWEN3_4B_Q4_K_M) and VAE (FLUX_2_KLEIN_4B_VAE) models, matching the desktop test configuration. Also switches device from cpu to gpu.
* fix(tests): use llm addon elephant.jpg for img2img test fixture
  Replace photo.png with elephant.jpg from lib-infer-llamacpp-llm/media. Update generation test definitions to reference the new filename.
* fix(examples): use path.resolve for img2img default image path
  import.meta.dirname is undefined in Bare runtime. Use path.resolve with a CWD-relative path instead, matching the documented convention of running examples from the SDK root.
* fix(tests): migrate generation executor to ResourceManager pattern
  Replace ModelManager usage with AbstractModelExecutor base class, matching the pattern used by all other executors after PR #836.
* feat(api): expose progressStream in generation() client helper
  The server already emits progress ticks (step/totalSteps/elapsedMs) during diffusion generation but the client was silently dropping them. Add a progressStream async generator to the generation() return type so SDK callers can show progress UI. Update the streaming-progress integration test to assert progress tick presence and field validity.
* refactor(api): use background fan-out loop for generation() streams
  Refactor generation() to follow the completion() multi-stream pattern: a single background processResponses() task drives the RPC stream and fans out to outputStream, progressStream, outputs, and stats independently. This fixes two issues with the previous implementation:
  - consuming progressStream alone now works (no longer requires outputStream iteration to drive the RPC stream)
  - RPC errors propagate to all consumers (streams throw, promises reject)
* chore: regenerate bun.lock and models registry after rebase
  Regenerates bun.lock and models/registry/models.ts to restore FLUX model entries that were lost during rebase conflict resolution.
* fix[api]: align SDK diffusion schemas with addon contract
  - Rename config field `wtype` → `type` to match C++ context handler key
  - Expand weight type enum to match addon: add auto, bf16, q2_k, q3_k, q4_k, q5_k, q6_k; remove invalid "default"
  - Remove `schedule` config field (no C++ context handler exists for it)
  - Fix per-request scheduler enum: remove "default" (addon rejects it), add sgm_uniform, simple, lcm, smoothstep, kl_optimal, bong_tangent
  - Remove phantom stats fields from diffusionStatsSchema (generation_time, totalTime, stepsPerSecond, msPerStep, megapixelsPerSecond, steps, output_count); addon RuntimeStats never emits these
  - Update unit tests and generation executor to use real addon fields
* fix[api]: align SDK rng config with addon contract
  - Add std_default to rng enum to match addon RngType
  - Add sampler_rng config field (separate RNG for sampler)
  - Forward sampler_rng from plugin to addon
* mod[api]: rename public API from generation() to diffusion()
  Aligns top-level API naming with other addon-specific surfaces (completion, ocr, embed): "diffusion" is specific to the stable-diffusion.cpp backend, while "generation" is too generic and could apply to any inference addon. Rename covers: public function, types, schemas, RPC routing literal, handler registry, plugin handler key, examples, integration tests, and unit tests. Addon RuntimeStats field names (generationMs, etc.) are unchanged; those are wire-format names from the C++ addon.
* fix: resolve pre-existing lint errors in diffusion client and load-model
  - Cast streamError to Error to satisfy @typescript-eslint/only-throw-error (closure type narrowing false positive)
  - Remove unnecessary SdcppConfig type assertion and unused import in load-model.ts
* mod[api]: remove img2img functionality until addon support lands
  Strip init_image, strength, and all img2img code paths from the SDK surface. Will be re-added when the addon fully supports it (PR #884).
* feat[api]: wire up profiler and device defaults for diffusion addon
  Register diffusionStream operation metrics (generationMs, totalSteps, totalImages, totalPixels) following the pattern of all other addons. Add sdcppGeneration to deviceConfigDefaultsSchema so device-specific config defaults can be applied to diffusion models.
* fix[api]: align diffusion client API with actual streaming behavior
  C++ generate_image() is synchronous: images are delivered only after generation completes, not streamed during inference. Remove misleading outputStream generator and stream param from the client API. The correct surface is: progressStream (real-time step ticks), outputs (final images), and stats. Also update @qvac/diffusion-cpp dependency from file: link to 0.1.0 now that the package is published.
* chore: clean up internal comments from public-facing API
  Remove implementation details (RPC wire format, C++ internals) from JSDoc and schema comments that end users would see.
* fix[api]: add positive constraint to width/height, describe config fields
  - Add .positive() to width and height in diffusionRequestSchema
  - Add .describe() to companion model fields (clipLModelSrc, clipGModelSrc, t5XxlModelSrc, llmModelSrc, vaeModelSrc) documenting which architectures require them
  - Add .describe() to prediction, type, clip_on_cpu, vae_on_cpu, vae_tiling, flash_attn config fields
  - Add diffusion-simple.ts example showing minimal config with a single all-in-one GGUF model (no companion files)
* fix: add missing validator to download test custom expectation
  DownloadExecutor constructor takes no arguments, so remove the resources param. download-tests custom expectation requires validator field per TestDefinition schema.
* fix: add mobile diffusion support, move executor to shared, bump test timeouts
  - Move diffusion executor from desktop/ to shared/ (no platform-specific APIs)
  - Add skipPreDownload to desktop diffusion resource (companion models resolve at load time)
  - Add mobile consumer: SD 2.1 Q8_0 model, device gpu, threads 4, prediction v, vae_on_cpu true
  - Bump test timeouts: 300s default, 600s for batch/seed tests
  - Fix DownloadExecutor() constructor call (takes no args)
* fix: use exported SDK model constant in diffusion-simple example
  Replace hardcoded local file path with SD_V2_1_1B_Q8_0 constant. Add prediction: "v" config required by SD 2.1 models.
* fix[api]: address PR review comments for diffusion SDK integration
  - plugin.ts: import addon types (ImgStableDiffusionArgs, SdConfig), remove as any/as never casts. Refactor resolveConfig to use destructure + explicit Promise.all matching TTS pattern. Remove SRC_TO_ARTIFACT mapping constant. Pass config directly to addon constructor.
  - ops/diffusion.ts: pass params inline to model.run() matching TTS/OCR pattern.
  - model-registry.ts: loader field optional (loader?: FilesystemDL) with conditional spread for exactOptionalPropertyTypes.
  - sdcpp-config.ts: derive DiffusionClientParams from DiffusionRequest. Add descriptions to cfg_scale and guidance fields.
  - bun.lock: regenerated, removes file:../lib-infer-diffusion leak.
  - Remove shared-test-data/ directory (elephant.jpg leftover from img2img).
  - Remove dead verify* params from diffusion test definitions.
* fix: add eslint-disable for optional MCP SDK import in example
  The @modelcontextprotocol/sdk is a user-installed optional dependency, not a project dependency. Suppress import/no-unresolved to unblock CI.
* fix: bump diffusion-cpp to 0.1.1 for absolute path fix
* fix: bump diffusion-cpp to 0.1.1 for absolute path fix
Proletter pushed a commit that referenced this pull request on May 24, 2026
Replace ModelManager usage with AbstractModelExecutor base class, matching the pattern used by all other executors after PR #836.
Proletter pushed a commit that referenced this pull request on May 24, 2026
* feat: add core profiler module and public runtime API
* feat: instrument client rpc transport and wire per-call profiling options
* feat: add server rpc profiling modules
* feat: integrate profiling into server funnel delegation and operation handlers
* chore: add profiler usage examples
* chore: apply lint-only formatting
* chore: profiler unit tests
---------
Co-authored-by: Ridwan Taiwo <donriddo@gmail.com>
Proletter pushed a commit that referenced this pull request on May 24, 2026, with the same diffusion SDK plugin integration commit message as the Apr 3, 2026 commit above.
🎯 What problem does this PR solve?
📝 How does it solve it?
- `profiler.enable(...)`, `profiler.disable()`, `profiler.exportJSON()`, `profiler.exportTable()`, `profiler.exportSummary()`.
- `__profiling` envelope propagation end-to-end.

🧪 How was it tested?
- `packages/sdk/test/unit/profiler.test.ts`

🔌 API Changes

📊 Metrics Catalog

Notes
- […] (`ms`).
- […] `op.phase` when `phase` exists, otherwise just `op`.
- `op` is the operation name (e.g., `loadModel`, `completionStream`, `unloadModel`, `pluginInvoke`).

Aggregated Timing Metrics (`exportTable()`)
- `rpc.connection`
- `delegation.connection`
- `op.request.zodValidation`
- `op.request.stringify`
- `op.request.totalSerialization`: `request.zodValidation + request.stringify`.
- `op.serverWait`
- `op.ttfb`
- `op.streamDuration`
- `op.response.jsonParse`
- `op.response.zodValidation`
- `op.response.totalParsing`: `response.jsonParse + response.zodValidation`.
- `op.totalClientTime`
- `op.clientOverhead`: `totalClientTime - server.totalServerTime` when server breakdown is present.
- `op.server.request.jsonParse`
- `op.server.request.zodValidation`
- `op.server.handlerExecution`
- `op.server.response.zodValidation`
- `op.server.response.stringify`
- `op.server.totalServerTime`
- `op.delegation.connection`
- `op.delegation.request.stringify`
- `op.delegation.serverWait`
- `op.delegation.response.jsonParse`
- `op.delegation.totalDelegationTime`
- `op.delegated.request.jsonParse`: […] `delegated.*` prefix.
- `op.delegated.request.zodValidation`
- `op.delegated.handlerExecution`
- `op.delegated.response.zodValidation`
- `op.delegated.response.stringify`
- `op.delegated.totalServerTime`
- `op.totalDelegationTime`
- `op.failed`
- `op` (no phase, e.g., `embed`, `rag`)

Event Gauges and Counters (`exportJSON().recentEvents` / `onRecord`)
- `gauges.ttfb`
- `gauges.timeToFirstToken`: `completionStream` operation metrics
- `gauges.tokensPerSecond`: `completionStream` operation metrics
- `gauges.cacheTokens`: `completionStream` operation metrics
- `gauges.processedTokens`: `translate` operation metrics
- `gauges.processingTime`: `translate` operation metrics
- `gauges.detectionTime`: `ocrStream` operation metrics
- `gauges.recognitionTime`: `ocrStream` operation metrics
- `gauges.totalTime`: `ocrStream` operation metrics
- `gauges.processed`: `rag` operation metrics
- `gauges.resultsCount`: `rag` operation metrics
- `count`

Summary Rows (`exportSummary()`)
- RPC Total: `.totalClientTime`
- Handler: `.server.handlerExecution` OR phaseless keys (e.g., `embed`)
- Model Load: `load.totalTime` or `*.load.totalTime`
- Download: `download.time` or `*.download.time`
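
The key naming rule, the derived timing metrics, and the summary row grouping in the catalog above can be sketched in TypeScript. This is a minimal illustration under stated assumptions, not the SDK's implementation: `metricKey`, `deriveClientMetrics`, and `summaryRow` are hypothetical helpers, and reading `.totalClientTime` and `*.load.totalTime` as suffix matches, and any dot-free key as "phaseless", is an interpretation of the catalog rather than something the PR confirms.

```typescript
// Hypothetical helpers; none of these names come from the SDK itself.
type Timings = Record<string, number>; // values in milliseconds

// Key is `op.phase` when a phase exists, otherwise just `op`.
function metricKey(op: string, phase?: string): string {
  return phase ? `${op}.${phase}` : op;
}

// Compute the composite client-side metrics for one operation from raw phase timings.
function deriveClientMetrics(op: string, raw: Timings): Timings {
  const get = (phase: string): number => raw[metricKey(op, phase)] ?? 0;
  const out: Timings = { ...raw };

  // request.totalSerialization = request.zodValidation + request.stringify
  out[metricKey(op, "request.totalSerialization")] =
    get("request.zodValidation") + get("request.stringify");
  // response.totalParsing = response.jsonParse + response.zodValidation
  out[metricKey(op, "response.totalParsing")] =
    get("response.jsonParse") + get("response.zodValidation");

  // clientOverhead is only derived when the server breakdown is present.
  const serverTotal = raw[metricKey(op, "server.totalServerTime")];
  if (serverTotal !== undefined) {
    out[metricKey(op, "clientOverhead")] = get("totalClientTime") - serverTotal;
  }
  return out;
}

type SummaryRow = "RPC Total" | "Handler" | "Model Load" | "Download";

// Map a metric key to its summary row (suffix matching is an assumption).
function summaryRow(key: string): SummaryRow | undefined {
  if (key.endsWith(".totalClientTime")) return "RPC Total";
  if (key.endsWith(".server.handlerExecution")) return "Handler";
  if (key === "load.totalTime" || key.endsWith(".load.totalTime")) return "Model Load";
  if (key === "download.time" || key.endsWith(".download.time")) return "Download";
  // Assumption: a key with no dot is a phaseless operation key such as `embed`.
  if (!key.includes(".")) return "Handler";
  return undefined;
}

// Usage with made-up timings for a single completionStream call.
const derived = deriveClientMetrics("completionStream", {
  "completionStream.request.zodValidation": 2,
  "completionStream.request.stringify": 3,
  "completionStream.response.jsonParse": 4,
  "completionStream.response.zodValidation": 1,
  "completionStream.totalClientTime": 120,
  "completionStream.server.totalServerTime": 100,
});
console.log(derived["completionStream.clientOverhead"]); // 20
console.log(summaryRow("embed")); // "Handler"
```

Keeping the derived values as plain sums and differences of the raw phase timings means the composite rows can always be recomputed from the recorded events, which is one plausible reason the catalog defines them by formula.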