From f85cfcfacdbb593d2ec6c5409e99194d87599352 Mon Sep 17 00:00:00 2001
From: Zhenhua Wang <4936589+zhenhuaw-me@users.noreply.github.com>
Date: Wed, 13 Aug 2025 14:46:21 +0800
Subject: [PATCH] Add the workaround doc for H200 OOM

Signed-off-by: Zhenhua Wang <4936589+zhenhuaw-me@users.noreply.github.com>
---
 .../quick-start-recipe-for-deepseek-r1-on-trtllm.md | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/docs/source/deployment-guide/quick-start-recipe-for-deepseek-r1-on-trtllm.md b/docs/source/deployment-guide/quick-start-recipe-for-deepseek-r1-on-trtllm.md
index 8e06b8c55f9..070b2c18038 100644
--- a/docs/source/deployment-guide/quick-start-recipe-for-deepseek-r1-on-trtllm.md
+++ b/docs/source/deployment-guide/quick-start-recipe-for-deepseek-r1-on-trtllm.md
@@ -235,11 +235,12 @@ Here is an example response, showing that the TRT-LLM server returns “New York
 
 ### Troubleshooting Tips
 
-* If you encounter CUDA out-of-memory errors, try reducing max\_batch\_size or max\_seq\_len
-* Ensure your model checkpoints are compatible with the expected format
-* For performance issues, check GPU utilization with nvidia-smi while the server is running
-* If the container fails to start, verify that the NVIDIA Container Toolkit is properly installed
-* For connection issues, make sure port 8000 is not being used by another application
+* If you encounter CUDA out-of-memory errors, try reducing `max_batch_size` or `max_seq_len`.
+  * For input/output sequence lengths of 8K/1K on H200, there is a known CUDA out-of-memory issue caused by the PyTorch CUDA caching allocator fragmenting memory. As a workaround, set the environment variable `PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:8192`. For more details, refer to the [PyTorch documentation on optimizing memory usage](https://docs.pytorch.org/docs/stable/notes/cuda.html#optimizing-memory-usage-with-pytorch-cuda-alloc-conf).
+* Ensure your model checkpoints are compatible with the expected format.
+* For performance issues, check GPU utilization with `nvidia-smi` while the server is running.
+* If the container fails to start, verify that the NVIDIA Container Toolkit is properly installed.
+* For connection issues, make sure port 8000 is not being used by another application.
 
 ### Running Evaluations to Verify Accuracy (Optional)
 
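
As a minimal sketch of the workaround this patch documents: the environment variable is exported before starting the server so the PyTorch CUDA caching allocator picks it up. The model name, host, port, and `trtllm-serve` arguments below are illustrative assumptions; adjust them to match the serve command used earlier in the guide.

```bash
# Sketch only: apply the H200 OOM workaround before launching the server.
# Cap the allocator's split size to reduce memory fragmentation (value from the patch above).
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:8192

# Assumed invocation for illustration; the environment variable is inherited by the server process.
trtllm-serve deepseek-ai/DeepSeek-R1 --host 0.0.0.0 --port 8000
```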