Description
System Info
tensorrt-llm:release-0.20.0
Who can help?
Error in the Disaggregated Serving example
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
Following https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/disaggregated, I reproduced the requirements of this example exactly, using the same model. My GPUs are NVIDIA L20 (46 GB).
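For reference, the disaggregated launcher reads a YAML config along these lines (a sketch of the shape used by the example; the field names and values are assumptions reconstructed from the server commands shown below, and may differ slightly across versions):

hostname: localhost
port: 8000
backend: pytorch
context_servers:
  num_instances: 2
  urls:
    - "localhost:8001"
    - "localhost:8002"
generation_servers:
  num_instances: 1
  urls:
    - "localhost:8003"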
Expected behavior
Successful execution
Actual behavior
My command:
curl http://localhost:8000/v1/completions -H "Content-Type: application/json" -d '{
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"prompt": "NVIDIA is a great company because",
"max_tokens": 16,
"temperature": 0
}' -w "\n"
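The same request can also be issued from Python (a minimal sketch using the requests library; the endpoint and payload are taken verbatim from the curl call above):

import requests

# POST the completion request to the disaggregated server's
# OpenAI-compatible endpoint, mirroring the curl command above.
resp = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
        "prompt": "NVIDIA is a great company because",
        "max_tokens": 16,
        "temperature": 0,
    },
)
print(resp.status_code, resp.json())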
Shell history and output:
root@56acd428936d:/app/models# jobs -l
[1] 64414 Running CUDA_VISIBLE_DEVICES=0 trtllm-serve TinyLlama-1.1B-Chat-v1.0 --host localhost --port 8001 --backend pytorch --extra_llm_api_options config/context_extra-llm-api-config.yml &> log_ctx_0 &
[2]- 64415 Running CUDA_VISIBLE_DEVICES=1 trtllm-serve TinyLlama-1.1B-Chat-v1.0 --host localhost --port 8002 --backend pytorch --extra_llm_api_options config/context_extra-llm-api-config.yml &> log_ctx_1 &
[4]+ 66403 Running CUDA_VISIBLE_DEVICES=2 trtllm-serve TinyLlama-1.1B-Chat-v1.0 --host localhost --port 8003 --backend pytorch --extra_llm_api_options config/gen_extra-llm-api-config.yml &> log_gen_0 &
root@56acd428936d:/app/models# trtllm-serve disaggregated -c config/disagg_config.yaml
2025-07-07 09:03:34,710 - INFO - flashinfer.jit: Prebuilt kernels not found, using JIT backend
[TensorRT-LLM] TensorRT-LLM version: 0.20.0
INFO: Started server process [66764]
INFO: Waiting for application startup.
INFO:root:Waiting for context and generation servers to be ready
INFO: Application startup complete.
INFO: Uvicorn running on http://localhost:8000 (Press CTRL+C to quit)
INFO:root:Sending request to ctx server: http://localhost:8001
INFO:root:Sending request to gen server: http://localhost:8003
ERROR:root:Received failed response {'object': 'error', 'message': 'Failed during generation: init(): incompatible constructor arguments. The following argument types are supported:\n 1. tensorrt_llm.bindings.executor.ContextPhaseParams(arg0: list[int], arg1: int, arg2: Optional[bytes], arg3: Optional[list[int]])\n\nInvoked with: None, None, None, None', 'type': 'BadRequestError', 'param': None, 'code': 400}
ERROR:root:400, message='Bad Request', url='http://localhost:8003/v1/completions'
Traceback (most recent call last):
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/serve/openai_disagg_server.py", line 140, in openai_completion
return await self._process_generation_server_request(gen_req, ctx_response)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/serve/openai_disagg_server.py", line 214, in _process_generation_server_request
gen_response = await self.send_completion_request(gen_server, gen_req)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/serve/openai_disagg_server.py", line 281, in send_completion_request
return await self.send_request(url, request, "/v1/completions", CompletionResponse, self.create_completion_generator)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/serve/openai_disagg_server.py", line 277, in send_request
response.raise_for_status()
File "/usr/local/lib/python3.12/dist-packages/aiohttp/client_reqrep.py", line 1161, in raise_for_status
raise ClientResponseError(
aiohttp.client_exceptions.ClientResponseError: 400, message='Bad Request', url='http://localhost:8003/v1/completions'
INFO: 127.0.0.1:56126 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error
Additional notes
The client-side error response was:
{"detail":"Internal server error 400, message='Bad Request', url='http://localhost:8003/v1/completions'"}
root@56acd428936d:/app/models# ls
InternVL2_5-8B Qwen3-0.6B Qwen3-30B-A3B Qwen3-4B TinyLlama-1.1B-Chat-v1.0 config log_ctx_0 log_ctx_1 log_gen_0 qwen
root@56acd428936d:/app/models# tail -f log_gen_0
Encountered an exception: Failed during generation: init(): incompatible constructor arguments. The following argument types are supported:
1. tensorrt_llm.bindings.executor.ContextPhaseParams(arg0: list[int], arg1: int, arg2: Optional[bytes], arg3: Optional[list[int]])
Invoked with: None, None, None, None
INFO: 127.0.0.1:37166 - "POST /v1/completions HTTP/1.1" 400 Bad Request
[07/07/2025-09:03:46] [TRT-LLM] [I] Thread await_response_thread stopped.
[07/07/2025-09:03:46] [TRT-LLM] [I] Thread proxy_dispatch_stats_thread stopped.
[07/07/2025-09:03:46] [TRT-LLM] [I] Thread proxy_dispatch_kv_cache_events_thread stopped.
[07/07/2025-09:03:46] [TRT-LLM] [I] Thread dispatch_stats_thread stopped.
[07/07/2025-09:03:46] [TRT-LLM] [I] Thread dispatch_kv_cache_events_thread stopped.
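The failure is the generation server rejecting the forwarded request because every context-phase field is None. Judging from the pybind signature printed in the log, here is a minimal sketch of the mismatch (the import path comes from the error message itself; the argument meanings are my assumption based on the disaggregated flow: first tokens, request id, opaque transceiver state, draft tokens):

from tensorrt_llm.bindings import executor as trtllm_executor

# Signature reported in the error:
#   ContextPhaseParams(arg0: list[int], arg1: int,
#                      arg2: Optional[bytes], arg3: Optional[list[int]])

# A well-formed call matching that signature:
ok = trtllm_executor.ContextPhaseParams([42], 0, None, None)

# What the gen server effectively received -- all context-phase fields
# missing -- which reproduces the TypeError seen in log_gen_0:
try:
    trtllm_executor.ContextPhaseParams(None, None, None, None)
except TypeError as e:
    print(e)  # init(): incompatible constructor arguments ...

This suggests the context server's response reached the disaggregated router without the context-phase metadata it is supposed to carry on to the generation server.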