
Commit d53cc23 (1 parent: a178cea)

[https://nvbugs/5433581][infra] Update install docs and CI script for SBSA deep_gemm workaround (#6607)

Signed-off-by: Yanchao Lu <[email protected]>

File tree

2 files changed: +24 −17 lines changed


docs/source/installation/linux.md (10 additions, 15 deletions)

@@ -9,14 +9,22 @@
 Before the pre-built Python wheel can be installed via `pip`, a few
 prerequisites must be put into place:
 
+Install CUDA Toolkit following the [CUDA Installation Guide for Linux](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/) and
+make sure `CUDA_HOME` environment variable is properly set.
+
 ```bash
-# Optional step: Only required for Blackwell and Grace Hopper
+# Optional step: Only required for NVIDIA Blackwell GPUs and SBSA platform
 pip3 install torch==2.7.1 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
 
+# Optional step: Workaround for deep_gemm installation failure on SBSA platform
+# The actual deep_gemm package and version should be obtained from the requirements.txt file.
+pip3 install 'deep_gemm @ git+https://github.com/zongfeijing/DeepGEMM.git@a9d538ef4dff0326fe521c6ca0bfde115703b56a' \
+    --extra-index-url https://download.pytorch.org/whl/cu128
+
 sudo apt-get -y install libopenmpi-dev
 ```
 
-PyTorch CUDA 12.8 package is required for supporting NVIDIA Blackwell and Grace Hopper GPUs. On prior GPUs, this extra installation is not required.
+PyTorch CUDA 12.8 package is required for supporting NVIDIA Blackwell GPUs and SBSA platform. On prior GPUs or Linux x86_64 platform, this extra installation is not required.
 
 ```{tip}
 Instead of manually installing the preqrequisites as described
@@ -55,16 +63,3 @@ There are some known limitations when you pip install pre-built TensorRT-LLM whe
 when OMPI was not configured --with-slurm and we weren't able
 to discover a SLURM installation in the usual places.
 ```
-
-2. CUDA Toolkit
-
-`pip install tensorrt-llm` won't install CUDA toolkit in your system, and the CUDA Toolkit is not required if want to just deploy a TensorRT-LLM engine.
-TensorRT-LLM uses the [ModelOpt](https://nvidia.github.io/TensorRT-Model-Optimizer/) to quantize a model, while the ModelOpt requires CUDA toolkit to jit compile certain kernels which is not included in the pytorch to do quantization effectively.
-Please install CUDA toolkit when you see the following message when running ModelOpt quantization.
-
-```
-/usr/local/lib/python3.10/dist-packages/modelopt/torch/utils/cpp_extension.py:65:
-UserWarning: CUDA_HOME environment variable is not set. Please set it to your CUDA install root.
-Unable to load extension modelopt_cuda_ext and falling back to CPU version.
-```
-The installation of CUDA toolkit can be found in [CUDA Toolkit Documentation](https://docs.nvidia.com/cuda/).
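The docs change above tells readers to take the actual deep_gemm package and version from `requirements.txt` rather than hard-coding the git pin. A minimal sketch of that lookup in Python, assuming a `requirements.txt` in the current directory (the function name and path argument are illustrative, not from the repo):

```python
# Sketch: locate the pinned deep_gemm requirement in a requirements.txt
# before installing it manually. Helper name and path are assumptions.
from pathlib import Path


def find_deep_gemm_requirement(requirements_path):
    """Return the first (stripped) requirement line pinning deep_gemm, or None."""
    for line in Path(requirements_path).read_text().splitlines():
        if line.strip().startswith("deep_gemm"):
            return line.strip()
    return None
```

The returned line could then be handed to `pip3 install '<line>' --extra-index-url https://download.pytorch.org/whl/cu128`, mirroring the workaround command shown in the docs.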

jenkins/L0_Test.groovy (14 additions, 2 deletions)

@@ -2091,19 +2091,31 @@ def launchTestJobs(pipeline, testFilter, dockerNode=null)
                 trtllm_utils.llmExecStepWithRetry(pipeline, script: "pip3 uninstall -y tensorrt")
                 if (values[5] != DLFW_IMAGE) {
                     def ubuntu_version = key.contains("UB2404") ? "ubuntu2404" : "ubuntu2204"
-                    def platform = values[2] == X86_64_TRIPLE ? "x86_64" : "sbsa"
+                    def platform = cpu_arch == X86_64_TRIPLE ? "x86_64" : "sbsa"
                     trtllm_utils.llmExecStepWithRetry(pipeline, script: "wget https://developer.download.nvidia.com/compute/cuda/repos/${ubuntu_version}/${platform}/cuda-keyring_1.1-1_all.deb")
                     trtllm_utils.llmExecStepWithRetry(pipeline, script: "dpkg -i cuda-keyring_1.1-1_all.deb")
                     trtllm_utils.llmExecStepWithRetry(pipeline, script: "apt-get update")
                     trtllm_utils.llmExecStepWithRetry(pipeline, script: "apt-get -y install cuda-toolkit-12-9")
                 }
 
-                // Extra PyTorch CUDA 12.8 install
+                // Extra PyTorch CUDA 12.8 install for SBSA platform and Blackwell GPUs bare-metal environments
                 if (values[6]) {
                     echo "###### Extra PyTorch CUDA 12.8 install Start ######"
                     trtllm_utils.llmExecStepWithRetry(pipeline, script: "pip3 install torch==2.7.1 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128")
                 }
 
+                // Workaround for https://nvbugs/5433581 where deep_gemm installation fails on SBSA platform
+                if (cpu_arch == AARCH64_TRIPLE) {
+                    echo "###### Workaround for https://nvbugs/5433581 Start ######"
+                    def deepGemmLine = readFile("${LLM_ROOT}/requirements.txt").readLines().find { it.trim().startsWith('deep_gemm') }
+                    if (deepGemmLine) {
+                        trtllm_utils.llmExecStepWithRetry(pipeline, script: "pip3 install '${deepGemmLine.trim()}' --extra-index-url https://download.pytorch.org/whl/cu128")
+                    }
+                    else {
+                        echo "deep_gemm package not found in requirements.txt"
+                    }
+                }
+
                 def libEnv = []
                 if (env.alternativeTRT) {
                     stage("Replace TensorRT") {
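The cuda-keyring download URL in the Jenkins change is assembled from an Ubuntu version and a platform segment derived from the CPU architecture triple. A small Python sketch of that mapping, assuming conventional triple values (the Jenkins script defines its own `X86_64_TRIPLE`/`AARCH64_TRIPLE` constants, which are not shown in this diff):

```python
# Sketch of the platform mapping used when building the cuda-keyring URL:
# x86_64 triples map to "x86_64", anything else (aarch64/SBSA) to "sbsa".
# The triple value below is an assumption, not taken from the Jenkins script.
X86_64_TRIPLE = "x86_64-linux-gnu"


def cuda_keyring_url(cpu_arch, ubuntu_version):
    """Build the cuda-keyring .deb URL for the given arch triple and Ubuntu tag."""
    platform = "x86_64" if cpu_arch == X86_64_TRIPLE else "sbsa"
    return ("https://developer.download.nvidia.com/compute/cuda/repos/"
            f"{ubuntu_version}/{platform}/cuda-keyring_1.1-1_all.deb")
```

The bug being fixed in the diff is exactly this selector: the old code compared `values[2]` instead of `cpu_arch`, so SBSA jobs could be pointed at the wrong repository path.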
