QVAC-18873 feat[api|mod]: expose diffusion_fa, drop flux_flow, sync model registry #2046

Merged
gianni-cor merged 13 commits into tetherto:main from donriddo:feat/sdk-expose-diffusion-fa on May 20, 2026

@donriddo donriddo commented May 14, 2026

🎯 What problem does this PR solve?

📝 How does it solve it?

  • Adds diffusion_fa: z.boolean().optional() to sdcppConfigSchema, consistent with the adjacent flash_attn field. The plugin at sdcpp-generation/plugin.ts already forwards all schema fields via ...rest, so no plugin change is needed.
  • Removes flux_flow from the prediction enum, matching @qvac/diffusion-cpp@0.8.0's removal across its JS and C++ layers.
  • The addon now defaults diffusionFlashAttn = true as of @qvac/diffusion-cpp@0.8.0 (PR #1952, "fix(diffusion-cpp): prevent FLUX2 img2img OOM on large input images"). This SDK PR exposes the override so callers can explicitly opt out by passing diffusion_fa: false.
  • Runs bun run update-models against registry commit 0e91a173a: 87 models added, 10 updated.
  • Replaces hardcoded HuggingFace URLs in examples/tools/llamacpp-tools-qwen35.ts and llamacpp-tools-gemma4.ts with QWEN3_5_0_8B_MULTIMODAL_Q8_0 and GEMMA4_2B_MULTIMODAL_Q4_K_M registry constants.
  • Adds tools-qwen35 and tools-gemma4 resources to the desktop E2E consumer and two dialect-specific tests (tools-simple-function-qwen35, tools-simple-function-gemma4) to verify the Qwen3.5 Pythonic-XML and Gemma4 native dialects end-to-end.
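
The "no plugin change is needed" point in the first bullet can be illustrated with a minimal sketch of rest-spread forwarding. This is not the code in sdcpp-generation/plugin.ts: the `SdcppConfigSketch` shape, the `modelPath` field, and `buildAddonOptions` are hypothetical names used only for illustration. Only `flash_attn` and `diffusion_fa` come from this PR.

```typescript
// Hypothetical, trimmed-down config shape. Only flash_attn and diffusion_fa
// are taken from the PR text; modelPath is an assumed field.
interface SdcppConfigSketch {
  modelPath: string;
  flash_attn?: boolean;
  diffusion_fa?: boolean;
}

// Hypothetical stand-in for the plugin's forwarding step.
function buildAddonOptions(config: SdcppConfigSketch): Record<string, unknown> {
  // Fields the plugin handles itself are destructured by name;
  // every other schema field is collected into `rest`.
  const { modelPath, ...rest } = config;
  // Spreading `rest` forwards all remaining fields, including ones
  // added to the schema after this function was written.
  return { path: modelPath, ...rest };
}

const forwarded = buildAddonOptions({ modelPath: "model.gguf", diffusion_fa: false });
console.log(forwarded);
```

Because the new field travels inside `rest`, adding it to the schema is enough for it to reach the native layer under this pattern.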

🧪 How was it tested?

  • SDK unit tests pass locally (bun run test:unit), including new rejection tests for non-boolean diffusion_fa and removed flux_flow, and a schema preservation test confirming diffusion_fa: false survives Zod parsing without being stripped.
  • E2E test diffusion-fa-loads-and-runs in tests-qvac: loads FLUX.2-klein-4B with diffusion_fa: true and generates a 256×256 image, confirming the field is accepted and forwarded through the full SDK → plugin → addon path without error.
  • E2E test diffusion-fa-disabled-loads-and-runs in tests-qvac: loads the same model with diffusion_fa: false and generates successfully, proving the opt-out override path is forwarded end-to-end and does not error.
  • bun run update-models ran cleanly against the live registry.
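
The three schema cases covered by the unit tests (reject non-boolean diffusion_fa, reject flux_flow, preserve diffusion_fa: false) can be sketched without any dependencies. The real schema is a Zod object; `parseSdcppConfigSketch`, `ParseResult`, and `ALLOWED_PREDICTIONS` below are hypothetical stand-ins, and the allowed set lists only the one enum member named in this PR.

```typescript
type ParsedSketch = { prediction?: string; diffusion_fa?: boolean };
type ParseResult =
  | { ok: true; value: ParsedSketch }
  | { ok: false; error: string };

// Assumed subset of the prediction enum: only flux2_flow is named in this PR.
const ALLOWED_PREDICTIONS = new Set<string>(["flux2_flow"]);

// Dependency-free stand-in for the schema's parse step.
function parseSdcppConfigSketch(input: Record<string, unknown>): ParseResult {
  const value: ParsedSketch = {};
  const { prediction, diffusion_fa } = input;

  if (prediction !== undefined) {
    if (typeof prediction !== "string" || !ALLOWED_PREDICTIONS.has(prediction)) {
      return { ok: false, error: `invalid prediction: ${String(prediction)}` };
    }
    value.prediction = prediction;
  }

  // Compare against undefined rather than testing truthiness,
  // so an explicit `false` is kept instead of being dropped.
  if (diffusion_fa !== undefined) {
    if (typeof diffusion_fa !== "boolean") {
      return { ok: false, error: "diffusion_fa must be a boolean" };
    }
    value.diffusion_fa = diffusion_fa;
  }

  return { ok: true, value };
}
```

The `!== undefined` check is the detail the preservation test guards: a truthiness check would silently discard the opt-out value.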

💥 Breaking changes

prediction: "flux_flow" now fails Zod validation at the SDK boundary. This matches @qvac/diffusion-cpp@0.8.0, which already throws InvalidArgument if flux_flow reaches the addon. External consumers must switch to flux2_flow.
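
For consumers that still carry the old value in stored configs, one possible migration is a small shim applied before the config reaches the SDK. `migratePrediction` is a hypothetical helper, not part of the SDK, and it assumes that rewriting the value is acceptable for the model being loaded.

```typescript
// Hypothetical consumer-side shim: rewrite the removed enum value
// to its replacement and pass every other value through unchanged.
function migratePrediction(prediction: string): string {
  return prediction === "flux_flow" ? "flux2_flow" : prediction;
}

const legacyConfig = { prediction: "flux_flow" };
const migratedConfig = {
  ...legacyConfig,
  prediction: migratePrediction(legacyConfig.prediction),
};
console.log(migratedConfig.prediction);
```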

🔌 API Changes

New optional field in SdcppConfig:

diffusion_fa?: boolean  // enable per-transformer flash attention (addon default: true as of @qvac/diffusion-cpp@0.8.0)

Removed from prediction enum: "flux_flow" (removed in @qvac/diffusion-cpp@0.8.0; use flux2_flow).
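
A sketch of how the optional field interacts with the addon default, assuming the default of true described above. `SdcppConfigFields` and `effectiveDiffusionFa` are hypothetical names; the actual default is resolved inside the addon, not in the SDK.

```typescript
// Hypothetical subset of SdcppConfig covering only the fields discussed here.
interface SdcppConfigFields {
  flash_attn?: boolean;
  diffusion_fa?: boolean;
}

// Assumption: mirrors the addon default as of @qvac/diffusion-cpp@0.8.0.
const ADDON_DEFAULT_DIFFUSION_FA = true;

// Models what the addon ends up using: the caller's explicit value
// when present, otherwise the addon default.
function effectiveDiffusionFa(config: SdcppConfigFields): boolean {
  // `??` only falls back on undefined or null, so an explicit false is respected.
  return config.diffusion_fa ?? ADDON_DEFAULT_DIFFUSION_FA;
}

const omitted = effectiveDiffusionFa({});
const optedOut = effectiveDiffusionFa({ diffusion_fa: false });
console.log({ omitted, optedOut });
```

Omitting the field keeps the addon default, while diffusion_fa: false is the explicit opt-out this PR exposes.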

📦 Models

Added models

GEMMA4_31B_MULTIMODAL_Q4_K_M
GEMMA4_31B_MULTIMODAL_Q6_K
MMPROJ_GEMMA4_31B_MULTIMODAL_BF16
MMPROJ_GEMMA4_31B_MULTIMODAL_F16
GEMMA4_2B_MULTIMODAL_Q4_K_M
GEMMA4_2B_MULTIMODAL_Q6_K
MMPROJ_GEMMA4_2B_MULTIMODAL_BF16
MMPROJ_GEMMA4_2B_MULTIMODAL_F16
GEMMA4_4B_MULTIMODAL_Q4_K_M
GEMMA4_4B_MULTIMODAL_Q6_K
MMPROJ_GEMMA4_4B_MULTIMODAL_BF16
MMPROJ_GEMMA4_4B_MULTIMODAL_F16
BERGAMOT_METADATA
BERGAMOT_EN_BS_LEX
BERGAMOT_METADATA_1
BERGAMOT_EN_BS
BERGAMOT_EN_BS_VOCAB
BERGAMOT_METADATA_2
BERGAMOT_EN_NB_LEX
BERGAMOT_METADATA_3
BERGAMOT_EN_NB
BERGAMOT_EN_NB_VOCAB
BERGAMOT_METADATA_4
BERGAMOT_EN_NO_LEX
BERGAMOT_METADATA_5
BERGAMOT_EN_NO
BERGAMOT_EN_NO_VOCAB
BERGAMOT_EN_SR_LEX
BERGAMOT_METADATA_6
BERGAMOT_EN_SR
BERGAMOT_EN_SR_VOCAB
BERGAMOT_EN_TH_LEX
BERGAMOT_METADATA_7
BERGAMOT_EN_TH
BERGAMOT_EN_TH_VOCAB
BERGAMOT_EN_VI_LEX
BERGAMOT_METADATA_8
BERGAMOT_EN_VI
BERGAMOT_EN_VI_VOCAB
BERGAMOT_EN_ZH_LEX
BERGAMOT_METADATA_9
BERGAMOT_EN_ZH
BERGAMOT_EN_ZH_SRCVOCAB
BERGAMOT_EN_ZH_TRGVOCAB
BERGAMOT_LEX
BERGAMOT_METADATA_10
BERGAMOT
BERGAMOT_VOCAB
BERGAMOT_NO_EN_LEX
BERGAMOT_METADATA_11
BERGAMOT_NO_EN
BERGAMOT_NO_EN_VOCAB
BERGAMOT_TH_EN_LEX
BERGAMOT_METADATA_12
BERGAMOT_TH_EN
BERGAMOT_TH_EN_VOCAB
BERGAMOT_ZH_EN_LEX
BERGAMOT_ZH_EN
BERGAMOT_ZH_EN_VOCAB
PARAKEET_TDT_PARAKEET_CTC_0_6B_Q8_0_Q8_0
PARAKEET_TDT_PARAKEET_EOU_120M_V1_Q8_0_Q8_0
PARAKEET_TDT_PARAKEET_TDT_0_6B_V3_Q8_0_Q8_0
PARAKEET_TDT_Q8_0
QWEN3_5_0_8B_MULTIMODAL_Q4_K_M
QWEN3_5_0_8B_MULTIMODAL_Q8_0
MMPROJ_QWEN3_5_0_8B_MULTIMODAL_BF16
MMPROJ_QWEN3_5_0_8B_MULTIMODAL_F16
QWEN3_5_2B_MULTIMODAL_Q4_K_M
QWEN3_5_2B_MULTIMODAL_Q6_K
MMPROJ_QWEN3_5_2B_MULTIMODAL_BF16
MMPROJ_QWEN3_5_2B_MULTIMODAL_F16
QWEN3_5_4B_MULTIMODAL_Q4_K_M
QWEN3_5_4B_MULTIMODAL_Q6_K
MMPROJ_QWEN3_5_4B_MULTIMODAL_BF16
MMPROJ_QWEN3_5_4B_MULTIMODAL_F16
QWEN3_5_9B_MULTIMODAL_Q4_K_M
QWEN3_5_9B_MULTIMODAL_Q6_K
MMPROJ_QWEN3_5_9B_MULTIMODAL_BF16
MMPROJ_QWEN3_5_9B_MULTIMODAL_F16
QWEN3_6_27B_MULTIMODAL_Q4_K_XL
QWEN3_6_27B_MULTIMODAL_Q6_K_XL
MMPROJ_QWEN3_6_27B_MULTIMODAL_BF16
MMPROJ_QWEN3_6_27B_MULTIMODAL_F16
QWEN3_6_35B_A3B_MULTIMODAL_Q4_K_M
QWEN3_6_35B_A3B_MULTIMODAL_Q6_K_XL
MMPROJ_QWEN3_6_35B_A3B_MULTIMODAL_BF16
MMPROJ_QWEN3_6_35B_A3B_MULTIMODAL_F16

Updated models

BERGAMOT_EN_BG_LEX
BERGAMOT_EN_BG
BERGAMOT_EN_BG_VOCAB
BERGAMOT_EN_HR_LEX
BERGAMOT_EN_HR
BERGAMOT_EN_HR_VOCAB
BERGAMOT_EN_NL_LEX
BERGAMOT_EN_NL
BERGAMOT_EN_NL_VOCAB
BERGAMOT_METADATA_13

Renumbered metadata constants (not removed)

Inserting 12 new Bergamot language pairs (EN_BS, EN_HI, EN_NB, EN_NO, EN_SR, EN_TH, EN_VI, EN_ZH, base, NO_EN, TH_EN, ZH_EN) in alphabetical order shifted the sequential BERGAMOT_METADATA_N numbering for all entries after the insertion points. Two constants that existed before now resolve to different language pairs:

  • BERGAMOT_METADATA_28: was bergamot-enhr (en→Croatian), now bergamot-enhi (en→Hindi)
  • BERGAMOT_METADATA_40: was bergamot-ennl (en→Dutch), now bergamot-enms (en→Malay)

Both constants still exist. Consumers using the stable language-pair constants (BERGAMOT_EN_HR, BERGAMOT_EN_NL) are unaffected. The sequential BERGAMOT_METADATA_N naming is a pre-existing update-models tooling limitation.
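
The shift can be reproduced with a small sketch of sequential naming over a sorted list. The update-models naming logic is not shown in this PR, so `metadataConstantNames` and its numbering rule (no suffix for the first entry, then _1, _2, and so on) are assumptions chosen to resemble the constant names above.

```typescript
// Hypothetical naming rule: sort model ids alphabetically, then number
// the metadata constants by position in the sorted list.
function metadataConstantNames(modelIds: string[]): Map<string, string> {
  const sorted = [...modelIds].sort();
  const names = new Map<string, string>();
  sorted.forEach((id, index) => {
    const name = index === 0 ? "BERGAMOT_METADATA" : `BERGAMOT_METADATA_${index}`;
    names.set(id, name);
  });
  return names;
}

const before = metadataConstantNames(["bergamot-enhr", "bergamot-ennl"]);
// Adding bergamot-enhi places it ahead of bergamot-enhr alphabetically,
// so every later entry moves down by one.
const after = metadataConstantNames(["bergamot-enhr", "bergamot-ennl", "bergamot-enhi"]);

console.log(before.get("bergamot-enhr"), after.get("bergamot-enhr"));
```

Under a rule like this, any position-derived constant is only stable until the next alphabetical insertion, which is why the language-pair constants are the safer reference.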

@donriddo donriddo changed the title feat(sdk): expose diffusion_fa in sdcppConfigSchema feat[api]: expose diffusion_fa in sdcppConfigSchema May 14, 2026
@donriddo donriddo force-pushed the feat/sdk-expose-diffusion-fa branch from c225999 to a67d93a Compare May 14, 2026 07:57
@donriddo donriddo changed the title feat[api]: expose diffusion_fa in sdcppConfigSchema QVAC-18873 feat[api]: expose diffusion_fa in sdcppConfigSchema May 15, 2026
@donriddo donriddo added the test-e2e-smoke label (Triggers smoke e2e test suite [Currently SDK-only]) May 15, 2026
@donriddo donriddo force-pushed the feat/sdk-expose-diffusion-fa branch from 66c2a6c to 1afbc75 Compare May 17, 2026 09:38
@donriddo donriddo marked this pull request as ready for review May 17, 2026 09:38
@donriddo donriddo requested review from a team as code owners May 17, 2026 09:38
@donriddo donriddo changed the title QVAC-18873 feat[api]: expose diffusion_fa in sdcppConfigSchema QVAC-18873 feat[api,mod]: expose diffusion_fa, drop flux_flow, sync model registry May 17, 2026
donriddo added 2 commits May 18, 2026 21:43

feat(sdk): expose diffusion_fa in sdcppConfigSchema

Adds diffusion_fa to sdcppConfigSchema so callers can explicitly
control per-transformer flash attention. The addon enables this by
default (required for FLUX.2 to avoid materialising the full Q·Kᵀ
attention matrix); the field is a no-op escape hatch for backends
that don't support ggml_flash_attn_ext.

The plugin's ...rest spread already forwards it to the native layer;
no plugin changes required.

fix(sdk): remove flux_flow from prediction enum

flux_flow (FLUX.1) was never a supported model family; only flux2_flow
(FLUX.2) is. Remove the stale enum value so the SDK schema matches the
diffusion addon surface.

@donriddo donriddo added the test-e2e-smoke label (Triggers smoke e2e test suite [Currently SDK-only]) May 20, 2026
@github-actions

github-actions Bot commented May 20, 2026

QVAC E2E — android — ✅ all tests passed (83/91, 2374s)

Config: suite=smoke · filter=(none) · exclude=(none)
View run · Artifacts: reports · Device Farm logs

@github-actions

github-actions Bot commented May 20, 2026

QVAC E2E — windows — ✅ all tests passed (91/91, 372s)

Config: suite=smoke · filter=(none) · exclude=(none)
View run · Artifacts: reports

@github-actions

github-actions Bot commented May 20, 2026

QVAC E2E — ios — ✅ all tests passed (82/91, 1282s)

Config: suite=smoke · filter=(none) · exclude=(none)
View run · Artifacts: reports · Device Farm logs

@github-actions

github-actions Bot commented May 20, 2026

QVAC E2E — linux — ✅ all tests passed (91/91, 246s)

Config: suite=smoke · filter=(none) · exclude=(none)
View run · Artifacts: reports

@github-actions

github-actions Bot commented May 20, 2026

QVAC E2E — macos — ✅ all tests passed (91/91, 321s)

Config: suite=smoke · filter=(none) · exclude=(none)
View run · Artifacts: reports

@github-actions github-actions Bot added the e2e-tested label (Test suite has run on this PR. Does not indicate tests pass/fail - see results in comments.) May 20, 2026
@gianni-cor

/review

@github-actions

github-actions Bot commented May 20, 2026

Tier-based Approval Status

**PR Tier:** TIER1

**Current Status:** ✅ APPROVED

**Requirements:**
- 1 Team Member approval ✅ (1/1)
- 1 Team Lead OR Management approval ✅ (1/1)



---
*This comment is automatically updated when reviews change.*

@donriddo

/review

@gianni-cor

/review

@gianni-cor gianni-cor merged commit e21fda0 into tetherto:main May 20, 2026
18 checks passed
Proletter pushed a commit that referenced this pull request May 24, 2026
…odel registry (#2046)

* feat(sdk): expose diffusion_fa in sdcppConfigSchema

Adds diffusion_fa to sdcppConfigSchema so callers can explicitly
control per-transformer flash attention. The addon enables this by
default (required for FLUX.2 to avoid materialising the full Q·Kᵀ
attention matrix); the field is a no-op escape hatch for backends
that don't support ggml_flash_attn_ext.

The plugin's ...rest spread already forwards it to the native layer;
no plugin changes required.

* fix(sdk): remove flux_flow from prediction enum

flux_flow (FLUX.1) was never a supported model family — only flux2_flow
(FLUX.2) is. Remove the stale enum value so the SDK schema matches the
diffusion addon surface.

* fix(sdk): simplify diffusion_fa description and add unit test coverage

Shorten the describe() string to match the terse style of adjacent
boolean fields. Add diffusion_fa to the "accepts valid full config"
fixture in sdcpp-plugin.test.ts so the field has schema-parse coverage.

* test(sdk): add diffusion_fa E2E test to tests-qvac

Adds a dedicated 'diffusion-fa' resource in the desktop consumer loaded with diffusion_fa: true, a matching executor method that calls ensureLoaded('diffusion-fa'), and a test definition 'diffusion-fa-accepted' that generates a 256x256 image through the full SDK -> plugin -> addon path, confirming the field is accepted and forwarded without breaking inference.

* test(sdk): remove misleading comment from diffusion-fa resource

* test(sdk): add rejection tests for flux_flow and diffusion_fa type; fix E2E test name and remove redundant preload

Add two missing schema rejection tests: non-boolean diffusion_fa and the removed flux_flow prediction value. Rename diffusion-fa-accepted to diffusion-fa-loads-and-runs to match what the test actually verifies (load + generate, not FA effect). Remove preLoadUnload from diffusion-fa resource — it reuses the same Flux2 model files as the diffusion resource, so the extra load+unload at bootstrap is redundant cost.

* feat[mod](sdk): add Gemma4-E2B/E4B/31B, Qwen3.5-0.8B/2B/4B/9B, Qwen3.6-27B/35B-A3B to SDK registry

* fix[notask]: bump @qvac/diffusion-cpp to ^0.8.0

* test(sdk): prove diffusion_fa:false override path end-to-end

Unit test verifies sdcppConfigSchema preserves false through parsing (not
just rejects non-booleans). E2E adds diffusion-fa-disabled resource with
diffusion_fa:false and a matching test so the full SDK→plugin→addon path
is exercised for the opt-out case, not just the addon default.

* fix(sdk): replace hardcoded HF URLs with registry constants; add qwen35/gemma4 dialect E2E tests

Examples llamacpp-tools-qwen35 and llamacpp-tools-gemma4 were using raw
HuggingFace URLs as fallback defaults because the registry had not yet been
seeded with Qwen3.5 and Gemma4 models. Now that those constants exist
(QWEN3_5_0_8B_MULTIMODAL_Q8_0, GEMMA4_2B_MULTIMODAL_Q4_K_M), use them
directly, matching the pattern of all other SDK examples.

Adds tools-qwen35 and tools-gemma4 resources to the desktop consumer and
two dialect-specific E2E tests (tools-simple-function-qwen35,
tools-simple-function-gemma4). PR #1974 wired toolDialect and resourceKey
through ToolsExecutor and createToolsTest specifically to enable these tests
once constants were available.

---------

Co-authored-by: gianni-cor <gianfranco.cordella@tether.io>
Labels

e2e-tested: Test suite has run on this PR. Does not indicate tests pass/fail - see results in comments.
test-e2e-smoke: Triggers smoke e2e test suite [Currently SDK-only]
verified: Authorize secrets / label-gate in PR workflows
verify
