| 9 | 9 |    Before the pre-built Python wheel can be installed via `pip`, a few | 
| 10 | 10 |    prerequisites must be put into place: | 
| 11 | 11 | 
 | 
|  | 12 | +   Install the CUDA Toolkit following the [CUDA Installation Guide for Linux](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/) and | 
|  | 13 | +   make sure the `CUDA_HOME` environment variable is properly set. | 
|  | 14 | + | 
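The `CUDA_HOME` requirement above can be checked with a short sketch (assuming the common default install root `/usr/local/cuda`; adjust the path for your system):

```shell
# Sketch: point CUDA_HOME at the toolkit root if it is not already set.
# /usr/local/cuda is an assumed default; a versioned path such as
# /usr/local/cuda-12.8 also works.
export CUDA_HOME="${CUDA_HOME:-/usr/local/cuda}"
echo "CUDA_HOME=$CUDA_HOME"
# nvcc should resolve once the toolkit is installed and on PATH.
command -v nvcc >/dev/null && nvcc --version || echo "nvcc not found; check your CUDA install"
```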
| 12 | 15 |    ```bash | 
| 13 |  | -   # Optional step: Only required for Blackwell and Grace Hopper | 
|  | 16 | +   # Optional step: Only required for NVIDIA Blackwell GPUs and SBSA platform | 
| 14 | 17 |    pip3 install torch==2.7.1 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128 | 
| 15 | 18 | 
 | 
|  | 19 | +   # Optional step: Workaround for deep_gemm installation failure on SBSA platform | 
|  | 20 | +   # The actual deep_gemm package and version should be obtained from the requirements.txt file. | 
|  | 21 | +   pip3 install 'deep_gemm @ git+https://github.com/zongfeijing/DeepGEMM.git@a9d538ef4dff0326fe521c6ca0bfde115703b56a' \ | 
|  | 22 | +       --extra-index-url https://download.pytorch.org/whl/cu128 | 
|  | 23 | + | 
| 16 | 24 |    sudo apt-get -y install libopenmpi-dev | 
| 17 | 25 |    ``` | 
| 18 | 26 | 
 | 
| 19 |  | -   PyTorch CUDA 12.8 package is required for supporting NVIDIA Blackwell and Grace Hopper GPUs. On prior GPUs, this extra installation is not required. | 
|  | 27 | +   The PyTorch CUDA 12.8 package is required to support NVIDIA Blackwell GPUs and the SBSA platform. On prior GPUs or the Linux x86_64 platform, this extra installation is not required. | 
| 20 | 28 | 
 | 
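One way to tell whether the extra cu128 wheel applies to your machine is to check the CPU architecture (a sketch; SBSA server platforms report `aarch64`, while a Blackwell GPU on x86_64 still needs the cu128 wheel):

```shell
# Sketch: SBSA (Arm server) platforms report aarch64 from uname; x86_64 hosts
# do not need the cu128-specific PyTorch wheel unless they carry a Blackwell GPU.
if [ "$(uname -m)" = "aarch64" ]; then
  echo "SBSA platform detected: install the cu128 PyTorch wheel"
else
  echo "non-SBSA platform: default PyTorch wheel is usually sufficient"
fi
```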
| 21 | 29 |    ```{tip} | 
| 22 | 30 |    Instead of manually installing the prerequisites as described | 
| @@ -55,16 +63,3 @@ There are some known limitations when you pip install pre-built TensorRT-LLM whe | 
| 55 | 63 |     when OMPI was not configured --with-slurm and we weren't able | 
| 56 | 64 |     to discover a SLURM installation in the usual places. | 
| 57 | 65 |     ``` | 
| 58 |  | - | 
| 59 |  | -2. CUDA Toolkit | 
| 60 |  | - | 
| 61 |  | -    `pip install tensorrt-llm` does not install the CUDA Toolkit on your system, and the CUDA Toolkit is not required if you only want to deploy a TensorRT-LLM engine. | 
| 62 |  | -    TensorRT-LLM uses [ModelOpt](https://nvidia.github.io/TensorRT-Model-Optimizer/) to quantize models; ModelOpt JIT-compiles certain kernels, not shipped with PyTorch, that require the CUDA Toolkit for efficient quantization. | 
| 63 |  | -    Please install the CUDA Toolkit if you see the following message while running ModelOpt quantization. | 
| 64 |  | - | 
| 65 |  | -    ``` | 
| 66 |  | -    /usr/local/lib/python3.10/dist-packages/modelopt/torch/utils/cpp_extension.py:65: | 
| 67 |  | -    UserWarning: CUDA_HOME environment variable is not set. Please set it to your CUDA install root. | 
| 68 |  | -    Unable to load extension modelopt_cuda_ext and falling back to CPU version. | 
| 69 |  | -    ``` | 
| 70 |  | -    Instructions for installing the CUDA Toolkit are available in the [CUDA Toolkit Documentation](https://docs.nvidia.com/cuda/). | 