QVAC-18873 feat[api|mod]: expose diffusion_fa, drop flux_flow, sync model registry #2046
Merged
gianni-cor merged 13 commits on May 20, 2026
Adds diffusion_fa to sdcppConfigSchema so callers can explicitly control per-transformer flash attention. The addon enables this by default (required for FLUX.2 to avoid materialising the full Q·Kᵀ attention matrix); the field is a no-op escape hatch for backends that don't support ggml_flash_attn_ext. The plugin's ...rest spread already forwards it to the native layer; no plugin changes required.
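To illustrate why a rest spread picks up a newly added schema field without further changes, here is a minimal sketch. It is not the plugin's code: `SdcppConfigSketch`, `toNativeOptions`, `modelPath`, and `path` are hypothetical names chosen for the example; only `diffusion_fa` and `flash_attn` come from the description above.

```typescript
// Hypothetical config shape. Only flash_attn and diffusion_fa are taken from the PR text.
interface SdcppConfigSketch {
  modelPath: string; // hypothetical field the plugin handles itself
  flash_attn?: boolean;
  diffusion_fa?: boolean;
}

// Pull out the one field that needs special handling and forward everything
// else untouched. Any field added to the config type later rides along in `rest`.
function toNativeOptions(config: SdcppConfigSketch) {
  const { modelPath, ...rest } = config;
  return { path: modelPath, ...rest };
}

const native = toNativeOptions({
  modelPath: "/models/example.gguf", // placeholder path
  diffusion_fa: false,
});

console.log(native.diffusion_fa); // false: the explicit opt-out reaches the native options
```

Because the forwarding function never names `diffusion_fa`, adding the field to the schema is enough for it to reach the next layer in this sketch.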
flux_flow (FLUX.1) was never a supported model family — only flux2_flow (FLUX.2) is. Remove the stale enum value so the SDK schema matches the diffusion addon surface.
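For callers who want to check a config before handing it to the SDK, the two rules described above can be sketched as a small hand-written check. This is an assumption-laden illustration, not the SDK's Zod schema: `checkSdcppConfig` is a hypothetical helper, and it only looks at the two fields this PR touches.

```typescript
// Hypothetical pre-flight check mirroring the two schema rules from this PR.
// It does not validate any other field and is not part of the SDK.
function checkSdcppConfig(input: Record<string, unknown>): string[] {
  const problems: string[] = [];

  // diffusion_fa is optional, but must be a boolean when present.
  const fa = input["diffusion_fa"];
  if (fa !== undefined && typeof fa !== "boolean") {
    problems.push("diffusion_fa must be a boolean when present");
  }

  // flux_flow was removed from the prediction enum; flux2_flow replaces it.
  if (input["prediction"] === "flux_flow") {
    problems.push('prediction "flux_flow" is no longer accepted; use "flux2_flow"');
  }

  return problems;
}

const ok = checkSdcppConfig({ prediction: "flux2_flow", diffusion_fa: false });
const bad = checkSdcppConfig({ prediction: "flux_flow", diffusion_fa: "yes" });

console.log(ok.length, bad.length); // 0 2
```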
NamelsKing approved these changes on May 20, 2026
Proletter pushed a commit that referenced this pull request on May 24, 2026
…odel registry (#2046)

* feat(sdk): expose diffusion_fa in sdcppConfigSchema
  Adds diffusion_fa to sdcppConfigSchema so callers can explicitly control per-transformer flash attention. The addon enables this by default (required for FLUX.2 to avoid materialising the full Q·Kᵀ attention matrix); the field is a no-op escape hatch for backends that don't support ggml_flash_attn_ext. The plugin's ...rest spread already forwards it to the native layer; no plugin changes required.
* fix(sdk): remove flux_flow from prediction enum
  flux_flow (FLUX.1) was never a supported model family; only flux2_flow (FLUX.2) is. Remove the stale enum value so the SDK schema matches the diffusion addon surface.
* fix(sdk): simplify diffusion_fa description and add unit test coverage
  Shorten the describe() string to match the terse style of adjacent boolean fields. Add diffusion_fa to the "accepts valid full config" fixture in sdcpp-plugin.test.ts so the field has schema-parse coverage.
* test(sdk): add diffusion_fa E2E test to tests-qvac
  Adds a dedicated 'diffusion-fa' resource in the desktop consumer loaded with diffusion_fa: true, a matching executor method that calls ensureLoaded('diffusion-fa'), and a test definition 'diffusion-fa-accepted' that generates a 256x256 image through the full SDK -> plugin -> addon path, confirming the field is accepted and forwarded without breaking inference.
* test(sdk): remove misleading comment from diffusion-fa resource
* test(sdk): add rejection tests for flux_flow and diffusion_fa type; fix E2E test name and remove redundant preload
  Add two missing schema rejection tests: non-boolean diffusion_fa and the removed flux_flow prediction value. Rename diffusion-fa-accepted to diffusion-fa-loads-and-runs to match what the test actually verifies (load + generate, not FA effect). Remove preLoadUnload from the diffusion-fa resource: it reuses the same Flux2 model files as the diffusion resource, so the extra load+unload at bootstrap is redundant cost.
* feat[mod](sdk): add Gemma4-E2B/E4B/31B, Qwen3.5-0.8B/2B/4B/9B, Qwen3.6-27B/35B-A3B to SDK registry
* fix[notask]: bump @qvac/diffusion-cpp to ^0.8.0
* test(sdk): prove diffusion_fa:false override path end-to-end
  Unit test verifies sdcppConfigSchema preserves false through parsing (not just rejects non-booleans). E2E adds a diffusion-fa-disabled resource with diffusion_fa:false and a matching test so the full SDK→plugin→addon path is exercised for the opt-out case, not just the addon default.
* fix(sdk): replace hardcoded HF URLs with registry constants; add qwen35/gemma4 dialect E2E tests
  Examples llamacpp-tools-qwen35 and llamacpp-tools-gemma4 were using raw HuggingFace URLs as fallback defaults because the registry had not yet been seeded with Qwen3.5 and Gemma4 models. Now that those constants exist (QWEN3_5_0_8B_MULTIMODAL_Q8_0, GEMMA4_2B_MULTIMODAL_Q4_K_M), use them directly, matching the pattern of all other SDK examples. Adds tools-qwen35 and tools-gemma4 resources to the desktop consumer and two dialect-specific E2E tests (tools-simple-function-qwen35, tools-simple-function-gemma4). PR #1974 wired toolDialect and resourceKey through ToolsExecutor and createToolsTest specifically to enable these tests once constants were available.

---------

Co-authored-by: gianni-cor <gianfranco.cordella@tether.io>
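The commit that proves the diffusion_fa:false override path matters because an explicit false is easy to lose when a default of true is applied downstream. The sketch below shows the general pitfall. It assumes a default of true, as the PR text describes, and does not reproduce how the SDK, plugin, or addon actually resolve the value; both function names are hypothetical.

```typescript
// Assumed default, per the PR text: diffusion flash attention is on unless overridden.
const ADDON_DEFAULT_DIFFUSION_FA = true;

// Pitfall: `||` treats an explicit false like "not set" and falls back to the default.
function resolveWithOr(value?: boolean): boolean {
  return value || ADDON_DEFAULT_DIFFUSION_FA;
}

// `??` falls back only for null or undefined, so an explicit false survives.
function resolveWithNullish(value?: boolean): boolean {
  return value ?? ADDON_DEFAULT_DIFFUSION_FA;
}

console.log(resolveWithOr(false)); // true: the opt-out is silently lost
console.log(resolveWithNullish(false)); // false: the opt-out is honoured
console.log(resolveWithNullish(undefined)); // true: the default applies
```

A test that only checks rejection of non-booleans would not catch the first variant, which is why a dedicated preservation test for false is useful.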
🎯 What problem does this PR solve?
* sdcppConfigSchema was missing the diffusion_fa field, making it impossible to set the per-transformer flash attention flag from the SDK layer.
* flux_flow was still in the SDK's prediction enum. @qvac/diffusion-cpp@0.8.0 removed flux_flow entirely: from the C++ handler, JS validator, and TypeScript type. Callers passing flux_flow now receive InvalidArgument from the addon. The SDK Zod guard is updated to match.

📝 How does it solve it?
* Adds diffusion_fa: z.boolean().optional() to sdcppConfigSchema, consistent with the adjacent flash_attn field. The plugin at sdcpp-generation/plugin.ts already forwards all schema fields via ...rest, so no plugin change is needed.
* Removes flux_flow from the prediction enum, matching @qvac/diffusion-cpp@0.8.0's removal across its JS and C++ layers.
* The addon defaults to diffusionFlashAttn = true as of @qvac/diffusion-cpp@0.8.0 (PR #1952, fix(diffusion-cpp): prevent FLUX2 img2img OOM on large input images). This SDK PR exposes the override so callers can explicitly opt out by passing diffusion_fa: false.
* Ran bun run update-models against registry commit 0e91a173a: 87 models added, 10 updated.
* Replaces the hardcoded HuggingFace URLs in examples/tools/llamacpp-tools-qwen35.ts and llamacpp-tools-gemma4.ts with the QWEN3_5_0_8B_MULTIMODAL_Q8_0 and GEMMA4_2B_MULTIMODAL_Q4_K_M registry constants.
* Adds tools-qwen35 and tools-gemma4 resources to the desktop E2E consumer and two dialect-specific tests (tools-simple-function-qwen35, tools-simple-function-gemma4) to verify the Qwen3.5 Pythonic-XML and Gemma4 native dialects end-to-end.

🧪 How was it tested?
* Unit tests (bun run test:unit), including new rejection tests for non-boolean diffusion_fa and the removed flux_flow, and a schema preservation test confirming diffusion_fa: false survives Zod parsing without being stripped.
* diffusion-fa-loads-and-runs in tests-qvac: loads FLUX.2-klein-4B with diffusion_fa: true and generates a 256×256 image, confirming the field is accepted and forwarded through the full SDK → plugin → addon path without error.
* diffusion-fa-disabled-loads-and-runs in tests-qvac: loads the same model with diffusion_fa: false and generates successfully, proving the opt-out override path is forwarded end-to-end and does not error.
* bun run update-models ran cleanly against the live registry.

💥 Breaking changes
prediction: "flux_flow" now fails Zod validation at the SDK boundary. This matches @qvac/diffusion-cpp@0.8.0, which already throws InvalidArgument if flux_flow reaches the addon. External consumers must switch to flux2_flow.

🔌 API Changes

New optional field in SdcppConfig: diffusion_fa (optional boolean).

Removed from prediction enum: "flux_flow" (removed in @qvac/diffusion-cpp@0.8.0; use flux2_flow).

📦 Models
Added models
Updated models
Renumbered metadata constants (not removed)
Inserting 11 new Bergamot language pairs (EN_BS, EN_HI, EN_NB, EN_NO, EN_SR, EN_TH, EN_VI, EN_ZH, base, NO_EN, TH_EN, ZH_EN) in alphabetical order shifted the sequential BERGAMOT_METADATA_N numbering for all entries after the insertion points. Two constants that existed before now resolve to different language pairs:

* BERGAMOT_METADATA_28: was bergamot-enhr (en→Croatian), now bergamot-enhi (en→Hindi)
* BERGAMOT_METADATA_40: was bergamot-ennl (en→Dutch), now bergamot-enms (en→Malay)

Both constants still exist. Consumers using the stable language-pair constants (BERGAMOT_EN_HR, BERGAMOT_EN_NL) are unaffected. The sequential BERGAMOT_METADATA_N naming is a pre-existing update-models tooling limitation.
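The renumbering effect can be reproduced with a toy model of sequential naming over an alphabetically sorted list. This is only a sketch of the mechanism: `numberSequentially` is a hypothetical function, the `METADATA_` prefix and 1-based indices are assumptions, and the three ids are a tiny sample rather than the real registry, so the indices do not correspond to 28 or 40.

```typescript
// Toy model: sort ids alphabetically, then name them by position.
// Assumption: 1-based numbering; the real update-models tool may differ.
function numberSequentially(ids: string[]): Map<string, string> {
  const sorted = [...ids].sort();
  return new Map(
    sorted.map((id, i): [string, string] => [`METADATA_${i + 1}`, id]),
  );
}

const before = numberSequentially(["bergamot-enhr", "bergamot-ennl"]);
// A new id that sorts ahead of both existing entries.
const after = numberSequentially(["bergamot-enhr", "bergamot-ennl", "bergamot-enhi"]);

console.log(before.get("METADATA_1")); // bergamot-enhr
console.log(after.get("METADATA_1")); // bergamot-enhi: same name, different pair
console.log(after.get("METADATA_2")); // bergamot-enhr: pushed down by one
```

Names derived from the language pair itself do not depend on position, which is why the stable language-pair constants are unaffected by insertions.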