[Bug]: EXAONE 4.0 with VSWA trtllm-serve Segmentation fault during inference #8038

@lkm2835

Description

System Info

NVIDIA H200
NVIDIA A100

[TensorRT-LLM] TensorRT LLM version: 1.1.0rc6

  1. TensorRT-LLM Sep 27, 2025 Version: https://github.com/NVIDIA/TensorRT-LLM/tree/a36b48bcab30dfbb64a98c299718aa64d06359d9
    Related PRs:
    [TRTLLM-6341][feature] Support SWA KV cache #6768 (merged)
    [None][feature] Add environment variable to adjust block pool allocation ration under kv cache manager #7923 (merged)
    [TLLM-6777][feature] Support SWA KV cache reuse OOW block detach #7922 (not merged yet)

  2. TensorRT-LLM Sep 24, 2025 Version: https://github.com/NVIDIA/TensorRT-LLM/tree/b890d7fea40667fc7131bcdc26da3cc8f91e3bde (produces a similar error)
    Related PR:
    [TRTLLM-6341][feature] Support SWA KV cache #6768 (merged)

Who can help?

@qixiang-99
@eopXD

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

trtllm-serve command

CUDA_VISIBLE_DEVICES=0,1,2,3 trtllm-serve EXAONE-4.0-32B --backend pytorch --tp_size 1 --extra_llm_api_options config.yml --kv_cache_free_gpu_memory_fraction 0.9

config.yml

kv_cache_config:
  enable_block_reuse: false
  max_attention_window: [4096,4096,4096,131072]
enable_chunked_prefill: true

Client (openai-python API)

Input sequence length: 14336
Output sequence length: 2048
Number of concurrent clients: 16
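Here is a minimal sketch of the client load described above. It assumes openai-python's AsyncOpenAI client, a server at http://localhost:8000/v1, and the served model name EXAONE-4.0-32B; the issue states none of these. The prompt is a repeated word that only approximates 14336 tokens, and run_load / make_openai_sender are hypothetical helpers. The request function is injected, so the concurrency logic can be checked without a live server.

```python
# Sketch of a 16-client load generator. Assumptions (not from the issue):
# endpoint http://localhost:8000/v1, model name "EXAONE-4.0-32B", and a
# synthetic prompt that only approximates the 14336-token input length.
import asyncio
from typing import Awaitable, Callable

ISL_WORDS = 14336   # approximation: one repeated word ~ one token (assumption)
OSL = 2048
CONCURRENCY = 16

SendFn = Callable[[str, int], Awaitable[str]]


def make_openai_sender(base_url: str = "http://localhost:8000/v1",
                       model: str = "EXAONE-4.0-32B") -> SendFn:
    """Build a sender backed by openai-python's AsyncOpenAI completions API."""
    from openai import AsyncOpenAI  # lazy import keeps run_load testable offline

    client = AsyncOpenAI(base_url=base_url, api_key="not-used")

    async def send(prompt: str, max_tokens: int) -> str:
        resp = await client.completions.create(
            model=model, prompt=prompt, max_tokens=max_tokens
        )
        return resp.choices[0].text

    return send


async def run_load(send: SendFn, num_requests: int, concurrency: int) -> list[str]:
    """Issue num_requests requests with at most `concurrency` in flight."""
    sem = asyncio.Semaphore(concurrency)
    prompt = "hello " * ISL_WORDS

    async def one() -> str:
        async with sem:
            return await send(prompt, OSL)

    return await asyncio.gather(*(one() for _ in range(num_requests)))
```

Usage against a running server: `asyncio.run(run_load(make_openai_sender(), 64, CONCURRENCY))`. Nothing here forces the model to ignore EOS, so individual responses may stop before 2048 tokens.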

Expected behavior

The model with VSWA should be served successfully through trtllm-serve.

Actual behavior

  1. TensorRT-LLM Sep 27, 2025 Version (H200)
Caught signal 11 (Segmentation fault: address not mapped to object at address 0x8)
==== backtrace (tid:    676) ====
 0  /opt/hpcx/ompi/lib/openmpi/../../../ucx/lib/libucs.so.0(ucs_handle_error+0x2e4) [0x7551f1c93774]
 1  /opt/hpcx/ompi/lib/openmpi/../../../ucx/lib/libucs.so.0(+0x3796a) [0x7551f1c9396a]
 2  /opt/hpcx/ompi/lib/openmpi/../../../ucx/lib/libucs.so.0(+0x37ba8) [0x7551f1c93ba8]
 3  /usr/local/lib/python3.12/dist-packages/tensorrt_llm/libs/libtensorrt_llm.so(_ZNK12tensorrt_llm13batch_manager16kv_cache_manager12KVCacheBlock7hasRefsEv+0x4) [0x754eb804d2f4]
 4  /usr/local/lib/python3.12/dist-packages/tensorrt_llm/libs/libtensorrt_llm.so(_ZN12tensorrt_llm13batch_manager16kv_cache_manager18WindowBlockManager13releaseBlocksERNS1_17GenerationRequestENS_6common11OptionalRefIKNS0_10LlmRequestEEE+0x3ee) [0x754eb805d54e]
 5  /usr/local/lib/python3.12/dist-packages/tensorrt_llm/libs/libtensorrt_llm.so(_ZN12tensorrt_llm13batch_manager16kv_cache_manager12BlockManager13releaseBlocksERNS1_17GenerationRequestENS_6common11OptionalRefIKNS0_10LlmRequestEEE+0xa3) [0x754eb805dd13]
 6  /usr/local/lib/python3.12/dist-packages/tensorrt_llm/libs/libtensorrt_llm.so(_ZN12tensorrt_llm13batch_manager16kv_cache_manager14KVCacheManager14removeSequenceEmNS_6common11OptionalRefIKNS0_10LlmRequestEEE+0x333) [0x754eb805e073]
 7  /usr/local/lib/python3.12/dist-packages/tensorrt_llm/bindings.cpython-312-x86_64-linux-gnu.so(+0x101d5b) [0x754eeb798d5b]
 8  /usr/local/lib/python3.12/dist-packages/tensorrt_llm/bindings.cpython-312-x86_64-linux-gnu.so(+0x2cae8c) [0x754eeb961e8c]
 9  /usr/bin/python(PyObject_Vectorcall+0x35) [0x549985]
10  /usr/bin/python(_PyEval_EvalFrameDefault+0xadf) [0x5d6b2f]
11  /usr/bin/python() [0x54cb32]
12  /usr/bin/python(_PyEval_EvalFrameDefault+0x4cc6) [0x5dad16]
13  /usr/bin/python() [0x54cb32]
14  /usr/bin/python() [0x6f80dc]
15  /usr/bin/python() [0x6b931c]
16  /usr/lib/x86_64-linux-gnu/libc.so.6(+0x9caa4) [0x7551f463baa4]
17  /usr/lib/x86_64-linux-gnu/libc.so.6(__clone+0x44) [0x7551f46c8a64]
=================================
 *** Process received signal ***
 Signal: Segmentation fault (11)
 Signal code:  (-6)
 Failing at address: 0x230
 [ 0] /usr/lib/x86_64-linux-gnu/libc.so.6(+0x45330)[0x7551f45e4330]
 [ 1] /usr/local/lib/python3.12/dist-packages/tensorrt_llm/libs/libtensorrt_llm.so(_ZNK12tensorrt_llm13batch_manager16kv_cache_manager12KVCacheBlock7hasRefsEv+0x4)[0x754eb804d2f4]
 [ 2] /usr/local/lib/python3.12/dist-packages/tensorrt_llm/libs/libtensorrt_llm.so(_ZN12tensorrt_llm13batch_manager16kv_cache_manager18WindowBlockManager13releaseBlocksERNS1_17GenerationRequestENS_6common11OptionalRefIKNS0_10LlmRequestEEE+0x3ee)[0x754eb805d54e]
 [ 3] /usr/local/lib/python3.12/dist-packages/tensorrt_llm/libs/libtensorrt_llm.so(_ZN12tensorrt_llm13batch_manager16kv_cache_manager12BlockManager13releaseBlocksERNS1_17GenerationRequestENS_6common11OptionalRefIKNS0_10LlmRequestEEE+0xa3)[0x754eb805dd13]
 [ 4] /usr/local/lib/python3.12/dist-packages/tensorrt_llm/libs/libtensorrt_llm.so(_ZN12tensorrt_llm13batch_manager16kv_cache_manager14KVCacheManager14removeSequenceEmNS_6common11OptionalRefIKNS0_10LlmRequestEEE+0x333)[0x754eb805e073]
 [ 5] /usr/local/lib/python3.12/dist-packages/tensorrt_llm/bindings.cpython-312-x86_64-linux-gnu.so(+0x101d5b)[0x754eeb798d5b]
 [ 6] /usr/local/lib/python3.12/dist-packages/tensorrt_llm/bindings.cpython-312-x86_64-linux-gnu.so(+0x2cae8c)[0x754eeb961e8c]
 [ 7] /usr/bin/python(PyObject_Vectorcall+0x35)[0x549985]
 [ 8] /usr/bin/python(_PyEval_EvalFrameDefault+0xadf)[0x5d6b2f]
 [ 9] /usr/bin/python[0x54cb32]
 [10] /usr/bin/python(_PyEval_EvalFrameDefault+0x4cc6)[0x5dad16]
 [11] /usr/bin/python[0x54cb32]
 [12] /usr/bin/python[0x6f80dc]
 [13] /usr/bin/python[0x6b931c]
 [14] /usr/lib/x86_64-linux-gnu/libc.so.6(+0x9caa4)[0x7551f463baa4]
 [15] /usr/lib/x86_64-linux-gnu/libc.so.6(__clone+0x44)[0x7551f46c8a64]
 *** End of error message ***
  2. TensorRT-LLM Sep 24, 2025 Version (A100)
Caught signal 11 (Segmentation fault: Sent by the kernel at address (nil))
==== backtrace (tid:   1728) ====
 0  /opt/hpcx/ompi/lib/openmpi/../../../ucx/lib/libucs.so.0(ucs_handle_error+0x2e4) [0x7effbd6dd774]
 1  /opt/hpcx/ompi/lib/openmpi/../../../ucx/lib/libucs.so.0(+0x3796a) [0x7effbd6dd96a]
 2  /opt/hpcx/ompi/lib/openmpi/../../../ucx/lib/libucs.so.0(+0x37ba8) [0x7effbd6ddba8]
 3  /usr/local/lib/python3.12/dist-packages/tensorrt_llm/libs/libtensorrt_llm.so(_ZNK12tensorrt_llm13batch_manager16kv_cache_manager12KVCacheBlock7hasRefsEv+0x4) [0x7efdf321ce14]
 4  /usr/local/lib/python3.12/dist-packages/tensorrt_llm/libs/libtensorrt_llm.so(_ZN12tensorrt_llm13batch_manager16kv_cache_manager18WindowBlockManager13releaseBlocksERNS1_17GenerationRequestENS_6common11OptionalRefIKNS0_10LlmRequestEEE+0x3ee) [0x7efdf322ca5e]
 5  /usr/local/lib/python3.12/dist-packages/tensorrt_llm/libs/libtensorrt_llm.so(_ZN12tensorrt_llm13batch_manager16kv_cache_manager12BlockManager13releaseBlocksERNS1_17GenerationRequestENS_6common11OptionalRefIKNS0_10LlmRequestEEE+0xa3) [0x7efdf322d223]
 6  /usr/local/lib/python3.12/dist-packages/tensorrt_llm/libs/libtensorrt_llm.so(_ZN12tensorrt_llm13batch_manager16kv_cache_manager14KVCacheManager14removeSequenceEmNS_6common11OptionalRefIKNS0_10LlmRequestEEE+0x353) [0x7efdf322d5a3]
 7  /usr/local/lib/python3.12/dist-packages/tensorrt_llm/bindings.cpython-312-x86_64-linux-gnu.so(+0x101d5b) [0x7efe31da2d5b]
 8  /usr/local/lib/python3.12/dist-packages/tensorrt_llm/bindings.cpython-312-x86_64-linux-gnu.so(+0x2cae8c) [0x7efe31f6be8c]
 9  /usr/bin/python(PyObject_Vectorcall+0x35) [0x549985]
10  /usr/bin/python(_PyEval_EvalFrameDefault+0xadf) [0x5d6b2f]
11  /usr/bin/python() [0x54cb32]
12  /usr/bin/python(_PyEval_EvalFrameDefault+0x4cc6) [0x5dad16]
13  /usr/bin/python() [0x54cb32]
14  /usr/bin/python() [0x6f80dc]
15  /usr/bin/python() [0x6b931c]
16  /usr/lib/x86_64-linux-gnu/libc.so.6(+0x9caa4) [0x7effc0494aa4]
17  /usr/lib/x86_64-linux-gnu/libc.so.6(__clone+0x44) [0x7effc0521a64]

Additional notes

This is a re-test after issue #7741 was resolved.

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.

Metadata

Labels

  • KV-Cache Management: kv-cache management for efficient LLM inference
  • Pytorch<NV>: Pytorch backend related issues
  • bug: Something isn't working

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests