
convert : support mixed-precision ModelOpt models with per-tensor NVFP4/FP8 quantization #20539

Merged
CISC merged 3 commits into ggml-org:master from richarddd:fix/nvfp4-mixed-precision-convert on Mar 16, 2026

Conversation

@richarddd
Contributor

Adds support for converting mixed-precision ModelOpt models (e.g. nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4) that use per-tensor quant_algo entries mixing NVFP4 and FP8 layers, instead of a single global quant_algo: "NVFP4". NVFP4 tensors (2D block scales) are repacked natively, while FP8 tensors (1D scales) are dequantized to float.
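The routing the description outlines can be sketched as follows. This is a minimal illustration, not the actual convert_hf_to_gguf.py code: the function name, argument shapes, and return tags are assumptions; only the dispatch rule (2D scale means NVFP4 repack, 1D scale means FP8 dequantize) comes from the PR description.

```python
def route_modelopt_tensor(name, weight_rows, scale):
    """Hypothetical sketch: route one ModelOpt tensor by the
    dimensionality of its scale tensor.

    - NVFP4 layers carry a 2D block-scale tensor, so the packed
      payload can be repacked natively into a GGUF quant type.
    - FP8 layers carry a 1D per-channel scale, so the simplest
      path is dequantizing back to float.
    """
    is_2d = bool(scale) and isinstance(scale[0], (list, tuple))
    if is_2d:
        # NVFP4: keep the packed payload; a real converter would repack here.
        return ("repack_nvfp4", weight_rows)
    # FP8: multiply each row by its per-channel scale to recover floats.
    deq = [[float(v) * float(s) for v in row]
           for row, s in zip(weight_rows, scale)]
    return ("dequantize_fp8", deq)
```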

Fixes: #20504
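For context on why NVFP4 tensors are handled as packed payloads rather than a plain multiply: NVFP4 stores 4-bit e2m1 values with one FP8 scale per small block of elements. A minimal decode sketch, assuming two values per byte with the low nibble first (the packing order and helper names here are assumptions, not the format's authoritative definition):

```python
# Positive magnitudes representable in 4-bit e2m1 (sign + 2 exp + 1 mantissa bits).
E2M1 = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def decode_fp4(nibble):
    """Decode one 4-bit e2m1 value; bit 3 is the sign bit."""
    sign = -1.0 if nibble & 0x8 else 1.0
    return sign * E2M1[nibble & 0x7]

def decode_block(packed_bytes, block_scale):
    """Decode a packed NVFP4 block, applying its per-block scale.

    Assumes each byte holds two values, low nibble first.
    """
    out = []
    for b in packed_bytes:
        out.append(decode_fp4(b & 0xF) * block_scale)
        out.append(decode_fp4(b >> 4) * block_scale)
    return out
```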

@richarddd richarddd requested a review from CISC as a code owner March 14, 2026 07:21
@github-actions github-actions bot added the python python script changes label Mar 14, 2026
@vbooka1

vbooka1 commented Mar 14, 2026

Either llama.cpp or convert_hf_to_gguf.py is broken, model https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4 does not work with builds 8304 (first supporting Nemotron 3) and 8334 (latest): I'm getting error llama_model_load: error loading model: done_getting_tensors: wrong number of tensors; expected 844, got 843.

Please check.

convert.txt

launch.txt

@richarddd
Contributor Author

> Either llama.cpp or convert_hf_to_gguf.py is broken, model https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4 does not work with builds 8304 (first supporting Nemotron 3) and 8334 (latest): I'm getting error llama_model_load: error loading model: done_getting_tensors: wrong number of tensors; expected 844, got 843.
>
> Please check.

Yeah something is off. I didn’t properly smoke test due to lack of memory.

@richarddd richarddd marked this pull request as draft March 14, 2026 20:47
@CISC
Member

CISC commented Mar 14, 2026

@vbooka1 @richarddd Fixed by #20506

@richarddd richarddd marked this pull request as ready for review March 15, 2026 06:16
@richarddd richarddd force-pushed the fix/nvfp4-mixed-precision-convert branch from 3530623 to 585e8da Compare March 15, 2026 06:19
@CISC CISC merged commit 079e5a4 into ggml-org:master Mar 16, 2026
6 checks passed
@richarddd richarddd deleted the fix/nvfp4-mixed-precision-convert branch March 16, 2026 10:34
