examples/models/core/qwen

trtllm-eval --model=Qwen3-30B-A3B/ --tokenizer=Qwen3-30B-A3B/ --backend=pytorch
```

### Model Quantization

To quantize the Qwen3 model for use with the PyTorch backend, we'll use NVIDIA's Model Optimizer (ModelOpt) tool. Follow these steps:

```bash
pushd TensorRT-Model-Optimizer
pip install -e .

# Quantize the Qwen3-235B-A22B model with nvfp4
# By default, the checkpoint will be stored in `TensorRT-Model-Optimizer/examples/llm_ptq/saved_models_Qwen3-235B-A22B_nvfp4_hf/`.
./examples/llm_ptq/scripts/huggingface_example.sh --model Qwen3-235B-A22B/ --quant nvfp4 --export_fmt hf

# Quantize the Qwen3-32B model with fp8_pc_pt
# By default, the checkpoint will be stored in `TensorRT-Model-Optimizer/examples/llm_ptq/saved_models_Qwen3-32B_fp8_pc_pt_hf/`.
./examples/llm_ptq/scripts/huggingface_example.sh --model Qwen3-32B/ --quant fp8_pc_pt --export_fmt hf
popd
```
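The two default checkpoint paths above follow a visible naming pattern: `saved_models_<model>_<quant>_hf/` under `examples/llm_ptq/`. A minimal sketch of that pattern, assuming it generalizes to other model/format pairs — `default_export_dir` is a hypothetical helper for illustration, not part of ModelOpt's API:

```python
def default_export_dir(model_dir: str, quant: str) -> str:
    """Reproduce the naming pattern of the two default paths above.

    Hypothetical helper, not part of ModelOpt's public API; the path is
    relative to the TensorRT-Model-Optimizer checkout.
    """
    name = model_dir.rstrip("/").split("/")[-1]  # e.g. "Qwen3-32B/" -> "Qwen3-32B"
    return f"examples/llm_ptq/saved_models_{name}_{quant}_hf/"

print(default_export_dir("Qwen3-235B-A22B/", "nvfp4"))
# → examples/llm_ptq/saved_models_Qwen3-235B-A22B_nvfp4_hf/
print(default_export_dir("Qwen3-32B/", "fp8_pc_pt"))
# → examples/llm_ptq/saved_models_Qwen3-32B_fp8_pc_pt_hf/
```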
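For a rough sense of what these formats buy you: nvfp4 stores weights in 4 bits and fp8_pc_pt in 8 bits, versus 16 bits for bf16. A back-of-the-envelope estimate of weight memory for Qwen3-235B-A22B (parameter count taken from the model name; this ignores quantization scaling factors and any layers kept in higher precision, so real checkpoints will be somewhat larger):

```python
def approx_weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight-only memory in GiB for a given storage width.

    Rough estimate: ignores per-block scaling factors and layers that
    remain in higher precision.
    """
    return n_params * bits_per_weight / 8 / 2**30

# Qwen3-235B-A22B: ~235e9 total parameters
for fmt, bits in [("bf16", 16), ("fp8_pc_pt", 8), ("nvfp4", 4)]:
    print(f"{fmt:>10}: ~{approx_weight_gib(235e9, bits):.0f} GiB")
# bf16 ~438 GiB, fp8_pc_pt ~219 GiB, nvfp4 ~109 GiB
```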
### Benchmark

To run the benchmark, we suggest using the `trtllm-bench` tool. Please refer to the following script on B200: