
Error in the Disaggregated Serving example #5792

@pandalee99

Description

System Info

tensorrt-llm:release-0.20.0

Who can help?

Error in the Disaggregated Serving example

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

In https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/disaggregated

I reproduced the requirements of this example exactly, using the same model. My GPU is an L20 (46 GB).
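
For reference, the example's disagg_config.yaml has roughly this shape (a sketch reconstructed from the example README and the ports used below; exact fields may differ from the current version of the example):

hostname: localhost
port: 8000
backend: pytorch
context_servers:
  num_instances: 2
  urls:
    - "localhost:8001"
    - "localhost:8002"
generation_servers:
  num_instances: 1
  urls:
    - "localhost:8003"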

Expected behavior

Successful execution

Actual behavior

My command:

curl http://localhost:8000/v1/completions -H "Content-Type: application/json" -d '{
  "model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
  "prompt": "NVIDIA is a great company because",
  "max_tokens": 16,
  "temperature": 0
}' -w "\n"
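
The same request as a short Python sketch (using the requests library, just to show the payload shape; the endpoint and fields are exactly those from the curl above):

import requests

# Completion request against the disaggregated server on port 8000
payload = {
    "model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    "prompt": "NVIDIA is a great company because",
    "max_tokens": 16,
    "temperature": 0,
}
resp = requests.post("http://localhost:8000/v1/completions", json=payload)
print(resp.status_code, resp.json())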

The resulting session history:

root@56acd428936d:/app/models# jobs -l
[1] 64414 Running CUDA_VISIBLE_DEVICES=0 trtllm-serve TinyLlama-1.1B-Chat-v1.0 --host localhost --port 8001 --backend pytorch --extra_llm_api_options config/context_extra-llm-api-config.yml &> log_ctx_0 &
[2]- 64415 Running CUDA_VISIBLE_DEVICES=1 trtllm-serve TinyLlama-1.1B-Chat-v1.0 --host localhost --port 8002 --backend pytorch --extra_llm_api_options config/context_extra-llm-api-config.yml &> log_ctx_1 &
[4]+ 66403 Running CUDA_VISIBLE_DEVICES=2 trtllm-serve TinyLlama-1.1B-Chat-v1.0 --host localhost --port 8003 --backend pytorch --extra_llm_api_options config/gen_extra-llm-api-config.yml &> log_gen_0 &
root@56acd428936d:/app/models# trtllm-serve disaggregated -c config/disagg_config.yaml
2025-07-07 09:03:34,710 - INFO - flashinfer.jit: Prebuilt kernels not found, using JIT backend
[TensorRT-LLM] TensorRT-LLM version: 0.20.0
INFO: Started server process [66764]
INFO: Waiting for application startup.
INFO:root:Waiting for context and generation servers to be ready
INFO: Application startup complete.
INFO: Uvicorn running on http://localhost:8000 (Press CTRL+C to quit)
INFO:root:Sending request to ctx server: http://localhost:8001
INFO:root:Sending request to gen server: http://localhost:8003
ERROR:root:Received failed response {'object': 'error', 'message': 'Failed during generation: init(): incompatible constructor arguments. The following argument types are supported:\n 1. tensorrt_llm.bindings.executor.ContextPhaseParams(arg0: list[int], arg1: int, arg2: Optional[bytes], arg3: Optional[list[int]])\n\nInvoked with: None, None, None, None', 'type': 'BadRequestError', 'param': None, 'code': 400}
ERROR:root:400, message='Bad Request', url='http://localhost:8003/v1/completions'
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/serve/openai_disagg_server.py", line 140, in openai_completion
    return await self._process_generation_server_request(gen_req, ctx_response)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/serve/openai_disagg_server.py", line 214, in _process_generation_server_request
    gen_response = await self.send_completion_request(gen_server, gen_req)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/serve/openai_disagg_server.py", line 281, in send_completion_request
    return await self.send_request(url, request, "/v1/completions", CompletionResponse, self.create_completion_generator)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/serve/openai_disagg_server.py", line 277, in send_request
    response.raise_for_status()
  File "/usr/local/lib/python3.12/dist-packages/aiohttp/client_reqrep.py", line 1161, in raise_for_status
    raise ClientResponseError(
aiohttp.client_exceptions.ClientResponseError: 400, message='Bad Request', url='http://localhost:8003/v1/completions'
INFO: 127.0.0.1:56126 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error
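
Reading the trace, the generation server appears to construct ContextPhaseParams from a context response whose disaggregated fields are all missing. A minimal sketch of the failing call, based only on the signature printed in the error (the meaning of the four arguments is my assumption):

from tensorrt_llm.bindings.executor import ContextPhaseParams

# Signature from the error message:
#   ContextPhaseParams(arg0: list[int], arg1: int,
#                      arg2: Optional[bytes], arg3: Optional[list[int]])

# What the gen server apparently invoked -- every field None -> TypeError:
# ContextPhaseParams(None, None, None, None)

# A well-formed call per the signature (values are placeholders; my guess is
# first-generation tokens, a context request id, an opaque KV-transfer state
# blob, and optional draft tokens):
params = ContextPhaseParams([1, 2, 3], 42, b"opaque-state", None)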

Additional notes

Relevant log output:

"detail":"Internal server error 400, message='Bad Request', url='http://localhost:8003/v1/completions'"}
root@56acd428936d:/app/models# ls
InternVL2_5-8B Qwen3-0.6B Qwen3-30B-A3B Qwen3-4B TinyLlama-1.1B-Chat-v1.0 config log_ctx_0 log_ctx_1 log_gen_0 qwen
root@56acd428936d:/app/models# tail -f log_gen_0
Encountered an exception: Failed during generation: init(): incompatible constructor arguments. The following argument types are supported:
1. tensorrt_llm.bindings.executor.ContextPhaseParams(arg0: list[int], arg1: int, arg2: Optional[bytes], arg3: Optional[list[int]])

Invoked with: None, None, None, None
INFO: 127.0.0.1:37166 - "POST /v1/completions HTTP/1.1" 400 Bad Request
[07/07/2025-09:03:46] [TRT-LLM] [I] Thread await_response_thread stopped.
[07/07/2025-09:03:46] [TRT-LLM] [I] Thread proxy_dispatch_stats_thread stopped.
[07/07/2025-09:03:46] [TRT-LLM] [I] Thread proxy_dispatch_kv_cache_events_thread stopped.
[07/07/2025-09:03:46] [TRT-LLM] [I] Thread dispatch_stats_thread stopped.
[07/07/2025-09:03:46] [TRT-LLM] [I] Thread dispatch_kv_cache_events_thread stopped.
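
If it helps triage: this looks like the context server's response reached the disagg server without any context-phase metadata, so all four ContextPhaseParams fields came through as None. One way to check would be to hit the ctx server directly and see whether disaggregated params come back in the response (the disaggregated_params field and request_type value are my assumption from the disaggregated serving docs):

curl http://localhost:8001/v1/completions -H "Content-Type: application/json" -d '{
  "model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
  "prompt": "NVIDIA is a great company because",
  "max_tokens": 1,
  "disaggregated_params": {"request_type": "context_only"}
}' -w "\n"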
