Commit 75e0535

add doc for qwen3
Signed-off-by: Dylan Chen <[email protected]>
1 parent 26e5494 commit 75e0535

File tree

1 file changed: +6 −3 lines changed

examples/models/core/qwen/README.md

@@ -652,7 +652,7 @@ trtllm-eval --model=Qwen3-30B-A3B/ --tokenizer=Qwen3-30B-A3B/ --backend=pytorch
 
 ```
 
-### Model Quantization to FP4
+### Model Quantization
 
 To quantize the Qwen3 model for use with the PyTorch backend, we'll use NVIDIA's Model Optimizer (ModelOpt) tool. Follow these steps:
 

@@ -665,12 +665,15 @@ pushd TensorRT-Model-Optimizer
 pip install -e .
 
 # Quantize the Qwen3-235B-A22B model by nvfp4
+# By default, the checkpoint will be stored in `TensorRT-Model-Optimizer/examples/llm_ptq/saved_models_Qwen3-235B-A22B_nvfp4_hf/`.
 ./examples/llm_ptq/scripts/huggingface_example.sh --model Qwen3-235B-A22B/ --quant nvfp4 --export_fmt hf
+
+# Quantize the Qwen3-32B model by fp8_pc_pt
+# By default, the checkpoint will be stored in `TensorRT-Model-Optimizer/examples/llm_ptq/saved_models_Qwen3-32B_fp8_pc_pt_hf/`.
+./examples/llm_ptq/scripts/huggingface_example.sh --model Qwen3-32B/ --quant fp8_pc_pt --export_fmt hf
 popd
 ```
 
-By default, the checkpoint would be stored in `TensorRT-Model-Optimizer/examples/llm_ptq/saved_models_Qwen3-235B-A22B_nvfp4_hf/`.
-
 ### Benchmark
 
 To run the benchmark, we suggest using the `trtllm-bench` tool. Please refer to the following script on B200:
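As a side note on the diff above: the default export paths in the added comments follow a `saved_models_<model>_<quant>_hf` naming pattern. A minimal shell sketch of that pattern, assuming a hypothetical `export_dir` helper (not part of ModelOpt itself):

```shell
# Hypothetical helper: derive the default ModelOpt export directory name
# from a model directory and a quantization format, mirroring the
# `saved_models_<model>_<quant>_hf` pattern seen in the diff above.
export_dir() {
  model_dir="$1"
  quant="$2"
  # Strip any trailing slash and leading path components from the model dir.
  model_name=$(basename "${model_dir%/}")
  printf 'saved_models_%s_%s_hf\n' "$model_name" "$quant"
}

export_dir Qwen3-235B-A22B/ nvfp4      # saved_models_Qwen3-235B-A22B_nvfp4_hf
export_dir Qwen3-32B/ fp8_pc_pt       # saved_models_Qwen3-32B_fp8_pc_pt_hf
```

This only illustrates the naming convention; the actual directory is created by `huggingface_example.sh` under `TensorRT-Model-Optimizer/examples/llm_ptq/`.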
