Description
System Info
NVIDIA H200
NVIDIA A100
[TensorRT-LLM] TensorRT LLM version: 1.1.0rc6
- TensorRT-LLM Sep 27, 2025 Version: https://github.com/NVIDIA/TensorRT-LLM/tree/a36b48bcab30dfbb64a98c299718aa64d06359d9
  Related PRs:
  - [TRTLLM-6341][feature] Support SWA KV cache #6768 (merged)
  - [None][feature] Add environment variable to adjust block pool allocation ration under kv cache manager #7923 (merged)
  - [TLLM-6777][feature] Support SWA KV cache reuse OOW block detach #7922 (not merged yet)
- TensorRT-LLM Sep 24, 2025 Version: https://github.com/NVIDIA/TensorRT-LLM/tree/b890d7fea40667fc7131bcdc26da3cc8f91e3bde (similar error)
  Related PR:
  - [TRTLLM-6341][feature] Support SWA KV cache #6768 (merged)
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
trtllm-serve command
CUDA_VISIBLE_DEVICES=0,1,2,3 trtllm-serve EXAONE-4.0-32B --backend pytorch --tp_size 1 --extra_llm_api_options config.yml --kv_cache_free_gpu_memory_fraction 0.9
config.yml
kv_cache_config:
  enable_block_reuse: false
  max_attention_window: [4096, 4096, 4096, 131072]
enable_chunked_prefill: true
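For readers less familiar with VSWA (variable sliding window attention), the sketch below illustrates how a short max_attention_window list is generally understood to map onto layers: when the list is shorter than the number of layers, the pattern repeats cyclically. The function name expand_attention_windows and the layer count of 64 are assumptions made for illustration. They are not taken from this report or from the TensorRT-LLM API.

```python
from collections import Counter


def expand_attention_windows(pattern, num_layers):
    """Repeat a per-layer window pattern cyclically until it covers num_layers."""
    if not pattern:
        raise ValueError("pattern must not be empty")
    return [pattern[i % len(pattern)] for i in range(num_layers)]


# Pattern copied from config.yml; num_layers=64 is an assumed value.
windows = expand_attention_windows([4096, 4096, 4096, 131072], num_layers=64)

# Count how many layers end up with each window size.
layers_per_window = Counter(windows)
print(layers_per_window)
```

Under these assumptions, three out of every four layers use the 4096-token sliding window and the fourth uses the full 131072-token window, so the KV cache manager has to track two window sizes with different block lifetimes.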
client with openai-python API
Input Sequence Length: 14336
Output Sequence Length: 2048
Number of Concurrent Clients: 16
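The client script itself is not included in this report. A minimal sketch of a comparable load generator with openai-python might look like the following. The base URL, the dummy API key, the synthetic token-ID prompts, and the helper names (make_prompt_token_ids, send_request, run_load) are all assumptions; only the sequence lengths and the client count come from the settings above.

```python
import random
from concurrent.futures import ThreadPoolExecutor

INPUT_LEN = 14336   # input sequence length from the report
OUTPUT_LEN = 2048   # output sequence length from the report
NUM_CLIENTS = 16    # number of concurrent clients from the report


def make_prompt_token_ids(length, vocab_size=1000, seed=0):
    """Build a synthetic prompt of exactly `length` token IDs (hypothetical helper)."""
    rng = random.Random(seed)
    return [rng.randrange(vocab_size) for _ in range(length)]


def send_request(client, model, token_ids):
    """Send one completion request, passing the token IDs directly as the prompt."""
    return client.completions.create(
        model=model,
        prompt=token_ids,
        max_tokens=OUTPUT_LEN,
    )


def run_load(base_url="http://localhost:8000/v1", model="EXAONE-4.0-32B"):
    """Fire NUM_CLIENTS requests concurrently; the URL and model name are assumed."""
    from openai import OpenAI  # imported lazily so the helpers work without the package

    client = OpenAI(base_url=base_url, api_key="dummy")
    prompts = [make_prompt_token_ids(INPUT_LEN, seed=i) for i in range(NUM_CLIENTS)]
    with ThreadPoolExecutor(max_workers=NUM_CLIENTS) as pool:
        futures = [pool.submit(send_request, client, model, p) for p in prompts]
        return [f.result() for f in futures]
```

Calling run_load() against a running trtllm-serve instance would issue the 16 requests in parallel. Whether this reproduces the crash depends on details of the original client that are not shown here.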
Expected behavior
The model with VSWA should be served successfully through trtllm-serve.
actual behavior
- TensorRT-LLM Sep 27, 2025 Version (H200)
Caught signal 11 (Segmentation fault: address not mapped to object at address 0x8)
==== backtrace (tid: 676) ====
0 /opt/hpcx/ompi/lib/openmpi/../../../ucx/lib/libucs.so.0(ucs_handle_error+0x2e4) [0x7551f1c93774]
1 /opt/hpcx/ompi/lib/openmpi/../../../ucx/lib/libucs.so.0(+0x3796a) [0x7551f1c9396a]
2 /opt/hpcx/ompi/lib/openmpi/../../../ucx/lib/libucs.so.0(+0x37ba8) [0x7551f1c93ba8]
3 /usr/local/lib/python3.12/dist-packages/tensorrt_llm/libs/libtensorrt_llm.so(_ZNK12tensorrt_llm13batch_manager16kv_cache_manager12KVCacheBlock7hasRefsEv+0x4) [0x754eb804d2f4]
4 /usr/local/lib/python3.12/dist-packages/tensorrt_llm/libs/libtensorrt_llm.so(_ZN12tensorrt_llm13batch_manager16kv_cache_manager18WindowBlockManager13releaseBlocksERNS1_17GenerationRequestENS_6common11OptionalRefIKNS0_10LlmRequestEEE+0x3ee) [0x754eb805d54e]
5 /usr/local/lib/python3.12/dist-packages/tensorrt_llm/libs/libtensorrt_llm.so(_ZN12tensorrt_llm13batch_manager16kv_cache_manager12BlockManager13releaseBlocksERNS1_17GenerationRequestENS_6common11OptionalRefIKNS0_10LlmRequestEEE+0xa3) [0x754eb805dd13]
6 /usr/local/lib/python3.12/dist-packages/tensorrt_llm/libs/libtensorrt_llm.so(_ZN12tensorrt_llm13batch_manager16kv_cache_manager14KVCacheManager14removeSequenceEmNS_6common11OptionalRefIKNS0_10LlmRequestEEE+0x333) [0x754eb805e073]
7 /usr/local/lib/python3.12/dist-packages/tensorrt_llm/bindings.cpython-312-x86_64-linux-gnu.so(+0x101d5b) [0x754eeb798d5b]
8 /usr/local/lib/python3.12/dist-packages/tensorrt_llm/bindings.cpython-312-x86_64-linux-gnu.so(+0x2cae8c) [0x754eeb961e8c]
9 /usr/bin/python(PyObject_Vectorcall+0x35) [0x549985]
10 /usr/bin/python(_PyEval_EvalFrameDefault+0xadf) [0x5d6b2f]
11 /usr/bin/python() [0x54cb32]
12 /usr/bin/python(_PyEval_EvalFrameDefault+0x4cc6) [0x5dad16]
13 /usr/bin/python() [0x54cb32]
14 /usr/bin/python() [0x6f80dc]
15 /usr/bin/python() [0x6b931c]
16 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x9caa4) [0x7551f463baa4]
17 /usr/lib/x86_64-linux-gnu/libc.so.6(__clone+0x44) [0x7551f46c8a64]
=================================
*** Process received signal ***
Signal: Segmentation fault (11)
Signal code: (-6)
Failing at address: 0x230
[ 0] /usr/lib/x86_64-linux-gnu/libc.so.6(+0x45330)[0x7551f45e4330]
[ 1] /usr/local/lib/python3.12/dist-packages/tensorrt_llm/libs/libtensorrt_llm.so(_ZNK12tensorrt_llm13batch_manager16kv_cache_manager12KVCacheBlock7hasRefsEv+0x4)[0x754eb804d2f4]
[ 2] /usr/local/lib/python3.12/dist-packages/tensorrt_llm/libs/libtensorrt_llm.so(_ZN12tensorrt_llm13batch_manager16kv_cache_manager18WindowBlockManager13releaseBlocksERNS1_17GenerationRequestENS_6common11OptionalRefIKNS0_10LlmRequestEEE+0x3ee)[0x754eb805d54e]
[ 3] /usr/local/lib/python3.12/dist-packages/tensorrt_llm/libs/libtensorrt_llm.so(_ZN12tensorrt_llm13batch_manager16kv_cache_manager12BlockManager13releaseBlocksERNS1_17GenerationRequestENS_6common11OptionalRefIKNS0_10LlmRequestEEE+0xa3)[0x754eb805dd13]
[ 4] /usr/local/lib/python3.12/dist-packages/tensorrt_llm/libs/libtensorrt_llm.so(_ZN12tensorrt_llm13batch_manager16kv_cache_manager14KVCacheManager14removeSequenceEmNS_6common11OptionalRefIKNS0_10LlmRequestEEE+0x333)[0x754eb805e073]
[ 5] /usr/local/lib/python3.12/dist-packages/tensorrt_llm/bindings.cpython-312-x86_64-linux-gnu.so(+0x101d5b)[0x754eeb798d5b]
[ 6] /usr/local/lib/python3.12/dist-packages/tensorrt_llm/bindings.cpython-312-x86_64-linux-gnu.so(+0x2cae8c)[0x754eeb961e8c]
[ 7] /usr/bin/python(PyObject_Vectorcall+0x35)[0x549985]
[ 8] /usr/bin/python(_PyEval_EvalFrameDefault+0xadf)[0x5d6b2f]
[ 9] /usr/bin/python[0x54cb32]
[10] /usr/bin/python(_PyEval_EvalFrameDefault+0x4cc6)[0x5dad16]
[11] /usr/bin/python[0x54cb32]
[12] /usr/bin/python[0x6f80dc]
[13] /usr/bin/python[0x6b931c]
[14] /usr/lib/x86_64-linux-gnu/libc.so.6(+0x9caa4)[0x7551f463baa4]
[15] /usr/lib/x86_64-linux-gnu/libc.so.6(__clone+0x44)[0x7551f46c8a64]
*** End of error message ***
- TensorRT-LLM Sep 24, 2025 Version (A100)
Caught signal 11 (Segmentation fault: Sent by the kernel at address (nil))
==== backtrace (tid: 1728) ====
0 /opt/hpcx/ompi/lib/openmpi/../../../ucx/lib/libucs.so.0(ucs_handle_error+0x2e4) [0x7effbd6dd774]
1 /opt/hpcx/ompi/lib/openmpi/../../../ucx/lib/libucs.so.0(+0x3796a) [0x7effbd6dd96a]
2 /opt/hpcx/ompi/lib/openmpi/../../../ucx/lib/libucs.so.0(+0x37ba8) [0x7effbd6ddba8]
3 /usr/local/lib/python3.12/dist-packages/tensorrt_llm/libs/libtensorrt_llm.so(_ZNK12tensorrt_llm13batch_manager16kv_cache_manager12KVCacheBlock7hasRefsEv+0x4) [0x7efdf321ce14]
4 /usr/local/lib/python3.12/dist-packages/tensorrt_llm/libs/libtensorrt_llm.so(_ZN12tensorrt_llm13batch_manager16kv_cache_manager18WindowBlockManager13releaseBlocksERNS1_17GenerationRequestENS_6common11OptionalRefIKNS0_10LlmRequestEEE+0x3ee) [0x7efdf322ca5e]
5 /usr/local/lib/python3.12/dist-packages/tensorrt_llm/libs/libtensorrt_llm.so(_ZN12tensorrt_llm13batch_manager16kv_cache_manager12BlockManager13releaseBlocksERNS1_17GenerationRequestENS_6common11OptionalRefIKNS0_10LlmRequestEEE+0xa3) [0x7efdf322d223]
6 /usr/local/lib/python3.12/dist-packages/tensorrt_llm/libs/libtensorrt_llm.so(_ZN12tensorrt_llm13batch_manager16kv_cache_manager14KVCacheManager14removeSequenceEmNS_6common11OptionalRefIKNS0_10LlmRequestEEE+0x353) [0x7efdf322d5a3]
7 /usr/local/lib/python3.12/dist-packages/tensorrt_llm/bindings.cpython-312-x86_64-linux-gnu.so(+0x101d5b) [0x7efe31da2d5b]
8 /usr/local/lib/python3.12/dist-packages/tensorrt_llm/bindings.cpython-312-x86_64-linux-gnu.so(+0x2cae8c) [0x7efe31f6be8c]
9 /usr/bin/python(PyObject_Vectorcall+0x35) [0x549985]
10 /usr/bin/python(_PyEval_EvalFrameDefault+0xadf) [0x5d6b2f]
11 /usr/bin/python() [0x54cb32]
12 /usr/bin/python(_PyEval_EvalFrameDefault+0x4cc6) [0x5dad16]
13 /usr/bin/python() [0x54cb32]
14 /usr/bin/python() [0x6f80dc]
15 /usr/bin/python() [0x6b931c]
16 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x9caa4) [0x7effc0494aa4]
17 /usr/lib/x86_64-linux-gnu/libc.so.6(__clone+0x44) [0x7effc0521a64]
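Both traces fault in the same four TensorRT-LLM frames, which appear as mangled C++ symbols. As a reading aid, the sketch below decodes only the qualified function names from those symbols. qualified_name is a hypothetical helper that handles just the simple nested-name form seen here and ignores argument types; a tool such as c++filt is the usual choice for full demangling.

```python
import re


def qualified_name(mangled):
    """Extract the nested qualified name from an Itanium-ABI mangled symbol.

    Supports only the `_ZN[K]<len><name>...E` form that appears in this trace.
    """
    match = re.match(r"_ZNK?", mangled)
    if not match:
        raise ValueError(f"unsupported symbol: {mangled}")
    pos = match.end()
    parts = []
    # Each component is a decimal length followed by that many characters.
    while pos < len(mangled) and mangled[pos].isdigit():
        end = pos
        while mangled[end].isdigit():
            end += 1
        length = int(mangled[pos:end])
        parts.append(mangled[end:end + length])
        pos = end + length
    return "::".join(parts)


# Frames 3 to 6 of the backtrace, innermost first, copied from the log.
frames = [
    "_ZNK12tensorrt_llm13batch_manager16kv_cache_manager12KVCacheBlock7hasRefsEv",
    "_ZN12tensorrt_llm13batch_manager16kv_cache_manager18WindowBlockManager13releaseBlocksERNS1_17GenerationRequestENS_6common11OptionalRefIKNS0_10LlmRequestEEE",
    "_ZN12tensorrt_llm13batch_manager16kv_cache_manager12BlockManager13releaseBlocksERNS1_17GenerationRequestENS_6common11OptionalRefIKNS0_10LlmRequestEEE",
    "_ZN12tensorrt_llm13batch_manager16kv_cache_manager14KVCacheManager14removeSequenceEmNS_6common11OptionalRefIKNS0_10LlmRequestEEE",
]

names = [qualified_name(symbol) for symbol in frames]
for name in names:
    print(name)
```

Read from the outermost frame inward, the crash path is KVCacheManager::removeSequence, then BlockManager::releaseBlocks, then WindowBlockManager::releaseBlocks, and finally KVCacheBlock::hasRefs, where the segmentation fault is raised.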
additional notes
This is a re-test after issue #7741 was resolved.
Before submitting a new issue...
- Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.