diff --git a/docs/advanced_features/quantization.md b/docs/advanced_features/quantization.md
index 18ef0e8a0516..da4c6d57b718 100644
--- a/docs/advanced_features/quantization.md
+++ b/docs/advanced_features/quantization.md
@@ -189,7 +189,7 @@ python3 -m sglang.launch_server \
   --port 30000 --host 0.0.0.0
 ```
 
-#### Using [NVIDIA ModelOpt](https://github.com/NVIDIA/TensorRT-Model-Optimizer)
+#### Using [NVIDIA ModelOpt](https://github.com/NVIDIA/Model-Optimizer)
 
 NVIDIA Model Optimizer (ModelOpt) provides advanced quantization techniques optimized for NVIDIA hardware. SGLang includes a streamlined workflow for quantizing models with ModelOpt and automatically exporting them for deployment.
 
@@ -378,7 +378,7 @@ python3 -m sglang.launch_server \
 
 - [GPTQModel](https://github.com/ModelCloud/GPTQModel)
 - [LLM Compressor](https://github.com/vllm-project/llm-compressor/)
-- [NVIDIA Model Optimizer (ModelOpt)](https://github.com/NVIDIA/TensorRT-Model-Optimizer)
+- [NVIDIA Model Optimizer (ModelOpt)](https://github.com/NVIDIA/Model-Optimizer)
 - [Torchao: PyTorch Architecture Optimization](https://github.com/pytorch/ao)
 - [vLLM Quantization](https://docs.vllm.ai/en/latest/quantization/)
 - [auto-round](https://github.com/intel/auto-round)