[Bug]: EXAONE 4.0 with VSWA trtllm-serve Segmentation fault during inference

### System Info

NVIDIA H200
NVIDIA A100

[TensorRT-LLM] TensorRT LLM version: 1.1.0rc6

1. TensorRT-LLM Sep 27, 2025 Version: https://github.com/NVIDIA/TensorRT-LLM/tree/a36b48bcab30dfbb64a98c299718aa64d06359d9
Related PRs:
https://github.com/NVIDIA/TensorRT-LLM/pull/6768 merged
https://github.com/NVIDIA/TensorRT-LLM/pull/7923 merged
https://github.com/NVIDIA/TensorRT-LLM/pull/7922 not merged yet

2. TensorRT-LLM Sep 24, 2025 Version: https://github.com/NVIDIA/TensorRT-LLM/tree/b890d7fea40667fc7131bcdc26da3cc8f91e3bde (similar error)
Related PR:
https://github.com/NVIDIA/TensorRT-LLM/pull/6768 merged


### Who can help?

@qixiang-99 
@eopXD 

### Information

- [ ] The official example scripts
- [ ] My own modified scripts

### Tasks

- [ ] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)

### Reproduction

trtllm-serve command

```
CUDA_VISIBLE_DEVICES=0,1,2,3 trtllm-serve EXAONE-4.0-32B --backend pytorch --tp_size 1 --extra_llm_api_options config.yml --kv_cache_free_gpu_memory_fraction 0.9
```
config.yml
```
kv_cache_config:
  enable_block_reuse: false
  max_attention_window: [4096,4096,4096,131072]
enable_chunked_prefill: true
```
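For context on the `max_attention_window` list: as I understand the TensorRT-LLM KV cache documentation, a list shorter than the number of layers is repeated cyclically across the layers. The sketch below illustrates that expansion under this assumption. `expand_windows`, `layers_by_window`, and the layer count of 8 are hypothetical and are not part of TensorRT-LLM.

```python
from collections import defaultdict
from itertools import cycle, islice


def expand_windows(pattern, num_layers):
    """Repeat the window pattern cyclically until every layer has a window size."""
    return list(islice(cycle(pattern), num_layers))


def layers_by_window(windows):
    """Group layer indices by their attention window size."""
    groups = defaultdict(list)
    for layer, window in enumerate(windows):
        groups[window].append(layer)
    return dict(groups)


pattern = [4096, 4096, 4096, 131072]  # taken from config.yml in this report
windows = expand_windows(pattern, num_layers=8)  # 8 is a hypothetical layer count
print(windows)
print(layers_by_window(windows))
```

With this pattern, three of every four layers use the 4096-token sliding window and the fourth uses the full 131072-token window, so two distinct window sizes are in play.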

Client load, sent with the openai-python API:
```
Input sequence length: 14336
Output sequence length: 2048
Number of concurrent clients: 16
```
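A rough sketch of a client that produces a comparable load is shown below. It is not the exact script used for this report. The completions endpoint, the server address `http://localhost:8000/v1`, the placeholder API key, and the filler prompt (a word count is only a proxy for the 14336-token input) are all assumptions, and the helper names are hypothetical.

```python
import concurrent.futures


def build_prompt(n_words):
    """Build a filler prompt; the word count only approximates the token count."""
    return " ".join(["hello"] * n_words)


def send_one(client, model, prompt, max_tokens):
    """Send one completion request and return the generated text."""
    resp = client.completions.create(model=model, prompt=prompt, max_tokens=max_tokens)
    return resp.choices[0].text


def run_load(client, model, prompt, max_tokens, concurrency):
    """Issue `concurrency` identical requests in parallel and collect the results."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [
            pool.submit(send_one, client, model, prompt, max_tokens)
            for _ in range(concurrency)
        ]
        return [f.result() for f in futures]


def main():
    # Imported lazily so the helpers above can be used without the openai package.
    from openai import OpenAI

    # Assumed address of the trtllm-serve OpenAI-compatible endpoint.
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")
    outputs = run_load(
        client,
        model="EXAONE-4.0-32B",
        prompt=build_prompt(14336),
        max_tokens=2048,
        concurrency=16,
    )
    print(f"received {len(outputs)} responses")
```

Calling `main()` against a running server sends 16 concurrent requests that each ask for up to 2048 output tokens.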

### Expected behavior

The model with VSWA (variable sliding window attention) should serve this workload through trtllm-serve without crashing.

### Actual behavior

1. TensorRT-LLM Sep 27, 2025 Version (H200)
```
Caught signal 11 (Segmentation fault: address not mapped to object at address 0x8)
==== backtrace (tid:    676) ====
 0  /opt/hpcx/ompi/lib/openmpi/../../../ucx/lib/libucs.so.0(ucs_handle_error+0x2e4) [0x7551f1c93774]
 1  /opt/hpcx/ompi/lib/openmpi/../../../ucx/lib/libucs.so.0(+0x3796a) [0x7551f1c9396a]
 2  /opt/hpcx/ompi/lib/openmpi/../../../ucx/lib/libucs.so.0(+0x37ba8) [0x7551f1c93ba8]
 3  /usr/local/lib/python3.12/dist-packages/tensorrt_llm/libs/libtensorrt_llm.so(_ZNK12tensorrt_llm13batch_manager16kv_cache_manager12KVCacheBlock7hasRefsEv+0x4) [0x754eb804d2f4]
 4  /usr/local/lib/python3.12/dist-packages/tensorrt_llm/libs/libtensorrt_llm.so(_ZN12tensorrt_llm13batch_manager16kv_cache_manager18WindowBlockManager13releaseBlocksERNS1_17GenerationRequestENS_6common11OptionalRefIKNS0_10LlmRequestEEE+0x3ee) [0x754eb805d54e]
 5  /usr/local/lib/python3.12/dist-packages/tensorrt_llm/libs/libtensorrt_llm.so(_ZN12tensorrt_llm13batch_manager16kv_cache_manager12BlockManager13releaseBlocksERNS1_17GenerationRequestENS_6common11OptionalRefIKNS0_10LlmRequestEEE+0xa3) [0x754eb805dd13]
 6  /usr/local/lib/python3.12/dist-packages/tensorrt_llm/libs/libtensorrt_llm.so(_ZN12tensorrt_llm13batch_manager16kv_cache_manager14KVCacheManager14removeSequenceEmNS_6common11OptionalRefIKNS0_10LlmRequestEEE+0x333) [0x754eb805e073]
 7  /usr/local/lib/python3.12/dist-packages/tensorrt_llm/bindings.cpython-312-x86_64-linux-gnu.so(+0x101d5b) [0x754eeb798d5b]
 8  /usr/local/lib/python3.12/dist-packages/tensorrt_llm/bindings.cpython-312-x86_64-linux-gnu.so(+0x2cae8c) [0x754eeb961e8c]
 9  /usr/bin/python(PyObject_Vectorcall+0x35) [0x549985]
10  /usr/bin/python(_PyEval_EvalFrameDefault+0xadf) [0x5d6b2f]
11  /usr/bin/python() [0x54cb32]
12  /usr/bin/python(_PyEval_EvalFrameDefault+0x4cc6) [0x5dad16]
13  /usr/bin/python() [0x54cb32]
14  /usr/bin/python() [0x6f80dc]
15  /usr/bin/python() [0x6b931c]
16  /usr/lib/x86_64-linux-gnu/libc.so.6(+0x9caa4) [0x7551f463baa4]
17  /usr/lib/x86_64-linux-gnu/libc.so.6(__clone+0x44) [0x7551f46c8a64]
=================================
 *** Process received signal ***
 Signal: Segmentation fault (11)
 Signal code:  (-6)
 Failing at address: 0x230
 [ 0] /usr/lib/x86_64-linux-gnu/libc.so.6(+0x45330)[0x7551f45e4330]
 [ 1] /usr/local/lib/python3.12/dist-packages/tensorrt_llm/libs/libtensorrt_llm.so(_ZNK12tensorrt_llm13batch_manager16kv_cache_manager12KVCacheBlock7hasRefsEv+0x4)[0x754eb804d2f4]
 [ 2] /usr/local/lib/python3.12/dist-packages/tensorrt_llm/libs/libtensorrt_llm.so(_ZN12tensorrt_llm13batch_manager16kv_cache_manager18WindowBlockManager13releaseBlocksERNS1_17GenerationRequestENS_6common11OptionalRefIKNS0_10LlmRequestEEE+0x3ee)[0x754eb805d54e]
 [ 3] /usr/local/lib/python3.12/dist-packages/tensorrt_llm/libs/libtensorrt_llm.so(_ZN12tensorrt_llm13batch_manager16kv_cache_manager12BlockManager13releaseBlocksERNS1_17GenerationRequestENS_6common11OptionalRefIKNS0_10LlmRequestEEE+0xa3)[0x754eb805dd13]
 [ 4] /usr/local/lib/python3.12/dist-packages/tensorrt_llm/libs/libtensorrt_llm.so(_ZN12tensorrt_llm13batch_manager16kv_cache_manager14KVCacheManager14removeSequenceEmNS_6common11OptionalRefIKNS0_10LlmRequestEEE+0x333)[0x754eb805e073]
 [ 5] /usr/local/lib/python3.12/dist-packages/tensorrt_llm/bindings.cpython-312-x86_64-linux-gnu.so(+0x101d5b)[0x754eeb798d5b]
 [ 6] /usr/local/lib/python3.12/dist-packages/tensorrt_llm/bindings.cpython-312-x86_64-linux-gnu.so(+0x2cae8c)[0x754eeb961e8c]
 [ 7] /usr/bin/python(PyObject_Vectorcall+0x35)[0x549985]
 [ 8] /usr/bin/python(_PyEval_EvalFrameDefault+0xadf)[0x5d6b2f]
 [ 9] /usr/bin/python[0x54cb32]
 [10] /usr/bin/python(_PyEval_EvalFrameDefault+0x4cc6)[0x5dad16]
 [11] /usr/bin/python[0x54cb32]
 [12] /usr/bin/python[0x6f80dc]
 [13] /usr/bin/python[0x6b931c]
 [14] /usr/lib/x86_64-linux-gnu/libc.so.6(+0x9caa4)[0x7551f463baa4]
 [15] /usr/lib/x86_64-linux-gnu/libc.so.6(__clone+0x44)[0x7551f46c8a64]
 *** End of error message ***
```

2. TensorRT-LLM Sep 24, 2025 Version (A100)
```
Caught signal 11 (Segmentation fault: Sent by the kernel at address (nil))
==== backtrace (tid:   1728) ====
 0  /opt/hpcx/ompi/lib/openmpi/../../../ucx/lib/libucs.so.0(ucs_handle_error+0x2e4) [0x7effbd6dd774]
 1  /opt/hpcx/ompi/lib/openmpi/../../../ucx/lib/libucs.so.0(+0x3796a) [0x7effbd6dd96a]
 2  /opt/hpcx/ompi/lib/openmpi/../../../ucx/lib/libucs.so.0(+0x37ba8) [0x7effbd6ddba8]
 3  /usr/local/lib/python3.12/dist-packages/tensorrt_llm/libs/libtensorrt_llm.so(_ZNK12tensorrt_llm13batch_manager16kv_cache_manager12KVCacheBlock7hasRefsEv+0x4) [0x7efdf321ce14]
 4  /usr/local/lib/python3.12/dist-packages/tensorrt_llm/libs/libtensorrt_llm.so(_ZN12tensorrt_llm13batch_manager16kv_cache_manager18WindowBlockManager13releaseBlocksERNS1_17GenerationRequestENS_6common11OptionalRefIKNS0_10LlmRequestEEE+0x3ee) [0x7efdf322ca5e]
 5  /usr/local/lib/python3.12/dist-packages/tensorrt_llm/libs/libtensorrt_llm.so(_ZN12tensorrt_llm13batch_manager16kv_cache_manager12BlockManager13releaseBlocksERNS1_17GenerationRequestENS_6common11OptionalRefIKNS0_10LlmRequestEEE+0xa3) [0x7efdf322d223]
 6  /usr/local/lib/python3.12/dist-packages/tensorrt_llm/libs/libtensorrt_llm.so(_ZN12tensorrt_llm13batch_manager16kv_cache_manager14KVCacheManager14removeSequenceEmNS_6common11OptionalRefIKNS0_10LlmRequestEEE+0x353) [0x7efdf322d5a3]
 7  /usr/local/lib/python3.12/dist-packages/tensorrt_llm/bindings.cpython-312-x86_64-linux-gnu.so(+0x101d5b) [0x7efe31da2d5b]
 8  /usr/local/lib/python3.12/dist-packages/tensorrt_llm/bindings.cpython-312-x86_64-linux-gnu.so(+0x2cae8c) [0x7efe31f6be8c]
 9  /usr/bin/python(PyObject_Vectorcall+0x35) [0x549985]
10  /usr/bin/python(_PyEval_EvalFrameDefault+0xadf) [0x5d6b2f]
11  /usr/bin/python() [0x54cb32]
12  /usr/bin/python(_PyEval_EvalFrameDefault+0x4cc6) [0x5dad16]
13  /usr/bin/python() [0x54cb32]
14  /usr/bin/python() [0x6f80dc]
15  /usr/bin/python() [0x6b931c]
16  /usr/lib/x86_64-linux-gnu/libc.so.6(+0x9caa4) [0x7effc0494aa4]
17  /usr/lib/x86_64-linux-gnu/libc.so.6(__clone+0x44) [0x7effc0521a64]
```
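Both traces fault in the same place: `KVCacheBlock::hasRefs`, reached from `KVCacheManager::removeSequence` through `BlockManager::releaseBlocks` and `WindowBlockManager::releaseBlocks`. To make the mangled frames easier to read, here is a minimal sketch that extracts the qualified function names. It handles only plain length-prefixed identifiers in Itanium-mangled nested names (no templates, substitutions, or argument types); `c++filt` is the complete tool. The helper names are hypothetical.

```python
import re

# Matches "(<mangled symbol>+0x<offset>)" as printed in the backtrace frames.
FRAME_RE = re.compile(r"\((_Z\w+)\+0x[0-9a-fA-F]+\)")


def qualified_name(mangled):
    """Return the '::'-joined name of a simple Itanium-mangled nested name."""
    if not mangled.startswith("_ZN"):
        raise ValueError("not a nested Itanium-mangled name")
    pos = 3
    is_const = False
    # Optional cv-qualifiers on the member function: r, V, K (const).
    while pos < len(mangled) and mangled[pos] in "rVK":
        is_const = is_const or mangled[pos] == "K"
        pos += 1
    parts = []
    # Each component is "<decimal length><identifier>".
    while pos < len(mangled) and mangled[pos].isdigit():
        digits = re.match(r"\d+", mangled[pos:]).group()
        pos += len(digits)
        parts.append(mangled[pos:pos + int(digits)])
        pos += int(digits)
    if pos >= len(mangled) or mangled[pos] != "E":
        raise ValueError("unsupported mangling construct")
    return "::".join(parts) + (" const" if is_const else "")


def frame_names(trace):
    """Extract qualified names for every mangled frame found in a backtrace."""
    return [qualified_name(sym) for sym in FRAME_RE.findall(trace)]


# Two frames copied from the traces in this report (library paths shortened).
trace = """
 3  libtensorrt_llm.so(_ZNK12tensorrt_llm13batch_manager16kv_cache_manager12KVCacheBlock7hasRefsEv+0x4) [0x754eb804d2f4]
 6  libtensorrt_llm.so(_ZN12tensorrt_llm13batch_manager16kv_cache_manager14KVCacheManager14removeSequenceEmNS_6common11OptionalRefIKNS0_10LlmRequestEEE+0x333) [0x754eb805e073]
"""
for name in frame_names(trace):
    print(name)
```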

### Additional notes

This is a re-test after https://github.com/NVIDIA/TensorRT-LLM/issues/7741 was resolved.



### Before submitting a new issue...

- [x] Make sure you already searched for relevant issues, and checked the [documentation](https://nvidia.github.io/TensorRT-LLM/) and [examples](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples) for answers to frequently asked questions.
