Fix QMoE blockwise quantization support for TRT-RTX execution provider by anujj · Pull Request #1926 · microsoft/onnxruntime-genai

anujj · 2025-12-19T13:35:19Z

Add QMoE and BF16 support for TRT-RTX execution provider

- Enable blockwise quantization for the TRT-RTX/NvTensorRtRtx EPs
- Add a gpt_oss_swiglu_fusion option for separate gate/up weights
- Add int4_qdq_block_size to set the MatMul quantization block size
- Add BF16 precision support for TRT-RTX
- Keep padding in QMoE weights for proper alignment
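The description does not show how blockwise quantization works, so here is a minimal pure-Python sketch of symmetric int4 block-wise quantization. Each run of `block_size` weights gets its own scale, and the row is zero-padded to a multiple of `block_size`. This is a generic illustration, not the model builder's code: the function names are hypothetical, and reading "padding" as zero-padding up to a block multiple is an assumption. Real kernels also pack two int4 values per byte, which is omitted here.

```python
import math
from typing import List, Tuple


def quantize_row_blockwise_int4(row: List[float], block_size: int) -> Tuple[List[int], List[float]]:
    """Symmetric int4 block-wise quantization of one weight row (illustrative only)."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    # Assumption: pad with zeros up to a multiple of block_size so every block is full.
    padded_len = math.ceil(len(row) / block_size) * block_size
    padded = row + [0.0] * (padded_len - len(row))

    q: List[int] = []
    scales: List[float] = []
    for start in range(0, padded_len, block_size):
        block = padded[start:start + block_size]
        amax = max(abs(v) for v in block)
        # One scale per block maps the largest magnitude to 7; an all-zero block uses 1.0.
        scale = amax / 7.0 if amax > 0 else 1.0
        scales.append(scale)
        # Round to nearest and clamp into the signed int4 range [-8, 7].
        q.extend(max(-8, min(7, round(v / scale))) for v in block)
    return q, scales


def dequantize_row(q: List[int], scales: List[float], block_size: int, orig_len: int) -> List[float]:
    """Undo the quantization and drop the padding."""
    values = [q[i] * scales[i // block_size] for i in range(len(q))]
    return values[:orig_len]


row = [0.5, -1.0, 0.25, 0.0, 2.0]
q, scales = quantize_row_blockwise_int4(row, block_size=4)
print(q)       # 8 codes: the 5 inputs plus 3 padding zeros
print(scales)  # one scale per block of 4
print(dequantize_row(q, scales, 4, len(row)))
```

Smaller blocks follow local weight ranges more closely at the cost of storing more scales; that is the trade-off a block-size option controls.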

anujj · 2026-01-06T08:35:51Z

@kunal-vaishnavi @baijumeswani for review


anskumar01 · 2026-01-12T14:26:33Z

#1861 broke the model builder for the TRT-RTX EP in cases where the Olive recipe uses int4_block_size. We need this fix for that.

- Remove the bfloat16 scale conversion workaround (ORT 1.24 supports it natively)
- Fix zero_points: skip them for TRT-RTX, always include them for other EPs
- Remove NvTensorRtRtx from internal EP checks (use 'trt-rtx' only)
- Simplify make_qmoe_weights() to use int4_qmoe_block_size for all supported EPs (trt-rtx defaults to 128, cpu/webgpu default to 0)
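The zero_points rule above only matters for asymmetric schemes (such as the Quark asymmetric quantization mentioned in this PR's commits), where each block stores an integer offset alongside its scale. Below is a minimal sketch, assuming uint4 codes in [0, 15]. The `qmoe_quant_inputs()` helper and its input names are hypothetical and are not the QMoE operator's actual input names.

```python
from typing import Dict, List, Tuple


def quantize_block_asym_uint4(block: List[float]) -> Tuple[List[int], float, int]:
    """Asymmetric uint4 quantization of one block: x ~ (q - zero_point) * scale."""
    # Widen the range to include 0 so that 0.0 maps exactly to the zero point.
    lo = min(min(block), 0.0)
    hi = max(max(block), 0.0)
    scale = (hi - lo) / 15.0 if hi > lo else 1.0
    zero_point = max(0, min(15, round(-lo / scale)))
    q = [max(0, min(15, round(v / scale) + zero_point)) for v in block]
    return q, scale, zero_point


def qmoe_quant_inputs(ep: str, weights: List[int], scales: List[float],
                      zero_points: List[int]) -> Dict[str, list]:
    """Hypothetical helper mirroring the rule in the commit message:
    TRT-RTX gets no zero_points input, every other EP gets one."""
    inputs: Dict[str, list] = {"weights": weights, "scales": scales}
    if ep != "trt-rtx":
        inputs["zero_points"] = zero_points
    return inputs


q, scale, zp = quantize_block_asym_uint4([-1.0, 0.0, 2.0])
print(q, scale, zp)
print(sorted(qmoe_quant_inputs("trt-rtx", q, [scale], [zp])))
print(sorted(qmoe_quant_inputs("cpu", q, [scale], [zp])))
```

With a symmetric scheme the zero point is a fixed constant, which is why an EP can reasonably omit the input entirely.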

anujj · 2026-01-15T17:23:08Z

@kunal-vaishnavi I have addressed the issues; can you please take a look?

thiagocrepaldi · 2026-01-20T01:36:54Z

@thpereir FYI

…aults
- Rename int4_qmoe_block_size to qmoe_block_size (the op supports int4 and int8)
- Add CUDA to supported blockwise quantization EPs
- Change default qmoe_block_size: 128 (trt-rtx), 32 (others)
- Remove bfloat16 workarounds (ORT 1.24 supports them natively)
- Rename quant_attrs key from 'block_size' to 'qmoe_block_size'
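Here is a sketch of how the default `qmoe_block_size` described in this commit could be resolved per EP. `resolve_qmoe_block_size()`, the constants, and passing overrides as string-valued `extra_options` are all hypothetical. Although this commit lists CUDA as supported, the review later notes that CUDA does not yet support block-wise quantization for QMoE, so the sketch leaves CUDA out of the supported set. The power-of-two check is also an assumption.

```python
from typing import Dict, Optional

# Defaults as described in the commit message: 128 for trt-rtx, 32 for other EPs.
TRT_RTX_DEFAULT_QMOE_BLOCK_SIZE = 128
OTHER_EP_DEFAULT_QMOE_BLOCK_SIZE = 32

# Assumption: EPs that accept block-wise QMoE weights. CUDA is deliberately
# absent because the review notes it does not yet support block-wise QMoE.
BLOCKWISE_QMOE_EPS = frozenset({"trt-rtx", "cpu", "webgpu"})


def resolve_qmoe_block_size(ep: str, extra_options: Dict[str, str]) -> Optional[int]:
    """Return the QMoE block size for `ep`, or None when block-wise QMoE is unsupported."""
    if ep not in BLOCKWISE_QMOE_EPS:
        return None
    if "qmoe_block_size" in extra_options:
        block_size = int(extra_options["qmoe_block_size"])  # user override
    elif ep == "trt-rtx":
        block_size = TRT_RTX_DEFAULT_QMOE_BLOCK_SIZE
    else:
        block_size = OTHER_EP_DEFAULT_QMOE_BLOCK_SIZE
    # Assumption: require a positive power of two, as block-wise kernels commonly do.
    if block_size <= 0 or block_size & (block_size - 1):
        raise ValueError(f"qmoe_block_size must be a positive power of two, got {block_size}")
    return block_size


for ep in ("trt-rtx", "cpu", "cuda"):
    print(ep, resolve_qmoe_block_size(ep, {}))
print("webgpu", resolve_qmoe_block_size("webgpu", {"qmoe_block_size": "64"}))
```

Returning None for an unsupported EP lets the caller fall back to non-blockwise QMoE weights instead of failing the whole build.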

anujj · 2026-01-20T17:02:10Z

Addressed the comments in the latest commit

thpereir · 2026-01-20T22:14:21Z

Still going over the PR and reviewing it

CUDA does not yet support block-wise quantization for QMoE

anujj · 2026-01-21T18:17:11Z

Addressed all of @kunal-vaishnavi's comments.

thpereir approved these changes

Code lgtm. Also ran a quick gpt-oss regression and everything is working as expected

thiagocrepaldi · 2026-01-21T22:46:43Z

Reacted [like] to @thpereir's approval of this pull request.

anujj marked this pull request as draft December 19, 2025 13:35

anujj marked this pull request as ready for review January 6, 2026 08:37

anujj added 2 commits January 6, 2026 17:01

Fix QMoE blockwise quantization support for TRT-RTX execution provider

581a564

removed padding

9f88bcd

anujj force-pushed the gpt_oss_trt_rtx branch from b9d8d44 to 9f88bcd January 6, 2026 11:33

anujj added 5 commits January 6, 2026 17:39

trt-rtx guard

9ae34f6

Only add zero_points inputs to QMoE when needed for Quark asymmetric quantization

6d4ebca

Remove unfused SwiGLU, int4_block_size for MatMulNBits, int4_qmoe_block_size for QMoE (default 128)

60dfcdf

CUDA doesn't support block-size quantization for MoE

18edebb

minor fixes

73c67ed

kunal-vaishnavi reviewed Jan 7, 2026


Comment thread src/python/py/models/builders/base.py Outdated

kunal-vaishnavi reviewed Jan 7, 2026


Comment thread src/python/py/models/builders/base.py

kunal-vaishnavi reviewed Jan 7, 2026


Comment thread src/python/py/models/builders/gptoss.py Outdated

kunal-vaishnavi reviewed Jan 7, 2026


Comment thread src/python/py/models/builder.py Outdated

kunal-vaishnavi reviewed Jan 7, 2026


Comment thread src/python/py/models/builders/base.py Outdated