
Conversation

@juncaipeng (Collaborator) commented Oct 30, 2025

Motivation

Refine splitwise deployment.

Modifications

  • Add a simple router that supports both mixed and splitwise deployments
  • Refine PD communication in the splitwise deployment
  • Add examples
  • Fix the unit test for splitwise deployment on multiple nodes
  • Run benchmarks with the splitwise deployment

TODO:

  • Add docs
  • Add a unit test that uses RDMA to transfer the cache

Usage or Command

Refer to the example scripts under examples/splitwise.

Accuracy Tests

Benchmark of v1 splitwise:

benchmark_duration: 16.98276040190831 s
============ Serving Benchmark Result ============
Successful requests:                     997       
Benchmark duration (s):                  16.98     
Total input tokens:                      1275938   
Total generated tokens:                  1994      
Request throughput (req/s):              58.707    
Output token throughput (tok/s):         117.41    
Total Token throughput (tok/s):          75248.78  
---------------Decode Speed (tok/s)---------------
Mean Decode:                             12.51     
Median Decode:                           12.01     
P80 Decode:                              14.32     
P95 Decode:                              17.76     
P99 Decode:                              29.91     
P99.9 Decode:                            48.12     
P99.95 Decode:                           48.38     
P99.99 Decode:                           48.58     
---------------Time to First Token----------------
Mean TTFT (ms):                          3036.59   
Median TTFT (ms):                        3092.77   
P80 TTFT (ms):                           3325.40   
P95 TTFT (ms):                           3671.44   
P99 TTFT (ms):                           4170.79   
P99.9 TTFT (ms):                         4279.39   
P99.95 TTFT (ms):                        4279.76   
P99.99 TTFT (ms):                        4280.05   
------------Infer Time to First Token-------------
Mean S_TTFT (ms):                        77.57     
Median S_TTFT (ms):                      78.24     
P80 S_TTFT (ms):                         103.77    
P95 S_TTFT (ms):                         128.60    
P99 S_TTFT (ms):                         147.21    
P99.9 S_TTFT (ms):                       158.14    
P99.95 S_TTFT (ms):                      160.26    
P99.99 S_TTFT (ms):                      161.96 

Benchmark of v2 splitwise (using the router):

benchmark_duration: 16.793025318998843 s
============ Serving Benchmark Result ============
Successful requests:                     997       
Benchmark duration (s):                  16.79     
Total input tokens:                      1275938   
Total generated tokens:                  1994      
Request throughput (req/s):              59.370    
Output token throughput (tok/s):         118.74    
Total Token throughput (tok/s):          76098.97  
---------------Decode Speed (tok/s)---------------
Mean Decode:                             13.51     
Median Decode:                           12.60     
P80 Decode:                              14.95     
P95 Decode:                              20.80     
P99 Decode:                              43.47     
P99.9 Decode:                            54.75     
P99.95 Decode:                           69.16     
P99.99 Decode:                           80.69     
---------------Time to First Token----------------
Mean TTFT (ms):                          3026.10   
Median TTFT (ms):                        3079.82   
P80 TTFT (ms):                           3209.71   
P95 TTFT (ms):                           4362.91   
P99 TTFT (ms):                           4838.31   
P99.9 TTFT (ms):                         4930.10   
P99.95 TTFT (ms):                        4931.17   
P99.99 TTFT (ms):                        4932.04   
------------Infer Time to First Token-------------
Mean S_TTFT (ms):                        72.28     
Median S_TTFT (ms):                      70.63     
P80 S_TTFT (ms):                         93.56     
P95 S_TTFT (ms):                         119.08    
P99 S_TTFT (ms):                         147.24    
P99.9 S_TTFT (ms):                       273.10    
P99.95 S_TTFT (ms):                      277.48    
P99.99 S_TTFT (ms):                      280.98 

Checklist

  • Add at least one tag in the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code and run pre-commit before committing.
  • Add unit tests. If no unit tests are added, please explain why in this PR.
  • Provide accuracy results.
  • If the current PR is submitted to the release branch, make sure it has already been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

@CLAassistant commented Oct 30, 2025

CLA assistant check
All committers have signed the CLA.

@paddle-bot (bot) commented Oct 30, 2025

Thanks for your contribution!


export CUDA_VISIBLE_DEVICES=0
export FD_DEBUG=1
export ENABLE_V1_KVCACHE_SCHEDULER=0
Collaborator:

This switch should already be deprecated, and it also turns on DEBUG logging here.

Collaborator Author:

DEBUG is enabled here to make troubleshooting easier; it can be removed later. ENABLE_V1_KVCACHE_SCHEDULER is still effective; v1 support will be adapted later.

self.ips = self.ips.split(",")

self.host_ip = get_host_ip()
self.port = port
Collaborator:

Is this pushing the service-layer port number down into the config? It should only be a configuration parameter at the APIServer layer.

Collaborator Author:

Yes. The config needs to know the API server's port so it can report it to the router.
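
For illustration, a minimal sketch of what such a registration report could look like; the endpoint path, payload fields, and helper name below are assumptions for this sketch, not the PR's actual code:

import requests  # assumed dependency

def report_to_router(router_addr: str, host_ip: str, api_port: int, role: str) -> None:
    """Hypothetical: an instance announces its API endpoint to the router."""
    payload = {
        "role": role,        # e.g. "prefill" or "decode"
        "host": host_ip,     # e.g. the value returned by get_host_ip()
        "port": api_port,    # the APIServer port the config must be aware of
    }
    # The /register path is illustrative only.
    requests.post(f"http://{router_addr}/register", json=payload, timeout=5)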

@juncaipeng force-pushed the pd branch 2 times, most recently from 7903bb4 to 2f16499 on November 4, 2025 07:22
logger = get_logger("cache_messager", "cache_messager.log")


def parse_args():
Collaborator:

With the logger declaration removed here, can the functions below still use logger normally?

Collaborator Author:

This file uses a logger tagged with the rank_id, so the declaration here was redundant.
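
A minimal sketch of the per-rank logger pattern described above, reusing the get_logger(name, file) call shape seen elsewhere in this PR (the rank-suffixed filename is an assumption):

from fastdeploy.utils import get_logger

rank_id = 0  # in the real file this would come from the worker's rank
logger = get_logger("cache_messager", f"cache_messager_rank{rank_id}.log")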


self.read_from_config()
self.postprocess()
self.init_cache_info()
Collaborator:

Please confirm with tingdan whether init_cache_info should be placed in the config step, and whether it affects profiling and weight reloading in training scenarios.

Collaborator Author:

ok

from fastdeploy.utils import llm_logger
from fastdeploy.utils import get_logger, llm_logger

config_logger = get_logger("config", "config.log")
Collaborator:

There are already quite a few log files; unless it is necessary, config should not add another one.

Collaborator Author:

config.log existed before; the config used to be written to llm_logger, and now it is all written to config.log.

error_msg=task["error_msg"],
)
)
output = RequestOutput.from_dict(task)
Collaborator:

Is this code equivalent to the version on the left, e.g. for fields like send_idx and finished?

Collaborator Author:

In my runs it is equivalent; from_dict covers more fields.
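
As a hedged illustration of the equivalence under discussion, using the RequestOutput class from the diff above (the dict fields beyond error_msg, send_idx, and finished are hypothetical):

# Both construction paths should yield the same RequestOutput for such a task;
# from_dict simply picks up every field present in the dict.
task = {
    "request_id": "req-0",  # hypothetical value
    "send_idx": 3,
    "finished": True,
    "error_msg": None,
}
output = RequestOutput.from_dict(task)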

@juncaipeng changed the title from "[Feature] [PD] splitwise deployment on multi node supports router" to "[Feature] [PD] add simple router and refine splitwise deployment" on Nov 4, 2025
@Jiang-Jia-Jun requested a review from Copilot on November 4, 2025 13:43
Copilot AI (Contributor) left a comment:

Pull Request Overview

This PR implements a router-based splitwise deployment architecture (v2) for distributed LLM inference, supporting flexible prefill/decode instance management across multiple nodes.

  • Introduces a new router service that manages prefill/decode instance registration and request routing
  • Adds support for dynamic instance health monitoring and automatic removal of unhealthy instances
  • Implements request ID propagation through the router to enable proper request tracking in splitwise deployments
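
As a hedged illustration of the health-monitoring behavior described above (every name, path, and interval below is an assumption, not the PR's actual router code):

import threading
import time

import requests  # assumed dependency

# Registered instances, keyed by "host:port" (illustrative).
INSTANCES = {"10.0.0.1:8000": {"role": "prefill"}}

def monitor_instances(interval_s: float = 10.0) -> None:
    """Periodically probe each instance and drop the ones that stop answering."""
    while True:
        for addr in list(INSTANCES):
            try:
                healthy = requests.get(f"http://{addr}/health", timeout=2).status_code == 200
            except requests.RequestException:
                healthy = False
            if not healthy:
                # Automatic removal of unhealthy instances.
                INSTANCES.pop(addr, None)
        time.sleep(interval_s)

threading.Thread(target=monitor_instances, daemon=True).start()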

Reviewed Changes

Copilot reviewed 33 out of 38 changed files in this pull request and generated 7 comments.

Summary per file:

  • tests/e2e/test_ernie_03b_pd_multi_node.py: New e2e test for multi-node splitwise deployment with router
  • tests/e2e/test_ernie_03b_pd.py: Updated to use router-based deployment (v2) instead of direct instance communication
  • requirements.txt: Added setproctitle and aistudio_sdk dependencies
  • fastdeploy/router/*.py: New router module with launch script, router server, and utilities
  • fastdeploy/config.py: Added RouterConfig class and splitwise version detection logic
  • fastdeploy/engine/args_utils.py: Added router and port CLI arguments, reorganized splitwise args
  • fastdeploy/engine/common_engine.py: Implemented router registration and refactored splitwise task processing
  • fastdeploy/entrypoints/openai/*.py: Added request_id and disaggregate_info support to API protocols
  • fastdeploy/splitwise/splitwise_connector.py: Enhanced logging and refactored message handling
  • fastdeploy/scheduler/local_scheduler.py: Added has_request method and enhanced logging
  • fastdeploy/worker/worker_process.py: Added execution time logging
  • examples/splitwise/*.sh: New example scripts for v0/v1/v2 splitwise deployments
  • docs/**/multi-node_deployment.md: Removed trailing whitespace
  • benchmarks/*.py: Code formatting fixes

)
from fastdeploy.model_executor.ops.gpu import get_output_kv_signal, set_data_ipc
from fastdeploy.utils import envs, get_logger

Copilot AI commented Nov 4, 2025:

The logger initialization at module level (lines 38-39 in the original code) was removed but the code still uses logger variable throughout the file (e.g., line 348, 378-379). This will cause NameError at runtime since logger is now only defined inside a function at line 844. Either restore the module-level logger or ensure all usage sites have access to the logger instance.

Suggested change
logger = get_logger()

Comment on lines 160 to 162
env_decode["CUDA_VISIBLE_DEVICES"] = "1"
env_prefill["ENABLE_V1_KVCACHE_SCHEDULER"] = "0"
env_decode["INFERENCE_MSG_QUEUE_ID"] = str(FD_API_PORT + 1)
Copilot AI commented Nov 4, 2025:

Line 161 incorrectly sets env_prefill["ENABLE_V1_KVCACHE_SCHEDULER"] instead of env_decode["ENABLE_V1_KVCACHE_SCHEDULER"]. This causes the environment variable to be set on the wrong process environment.

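Applying the fix Copilot describes, the decode environment block would presumably read:

env_decode["CUDA_VISIBLE_DEVICES"] = "1"
env_decode["ENABLE_V1_KVCACHE_SCHEDULER"] = "0"  # was mistakenly set on env_prefill
env_decode["INFERENCE_MSG_QUEUE_ID"] = str(FD_API_PORT + 1)
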
Comment on lines 242 to 244
env_decode["CUDA_VISIBLE_DEVICES"] = "1"
env_prefill["ENABLE_V1_KVCACHE_SCHEDULER"] = "0"
env_decode["INFERENCE_MSG_QUEUE_ID"] = str(FD_API_PORT + 1)
Copilot AI commented Nov 4, 2025:

Line 243 incorrectly sets env_prefill["ENABLE_V1_KVCACHE_SCHEDULER"] instead of env_decode["ENABLE_V1_KVCACHE_SCHEDULER"]. This causes the environment variable to be set on the wrong process environment.

Returns:
bool: True if the service is healthy, False otherwise.
"""
Copilot AI commented Nov 4, 2025:

The function doesn't ensure the base_url starts with 'http' before using it, unlike the test file version at line 74. If a URL without a protocol is passed, it will make an invalid request. Add the protocol check from the test version: if not base_url.startswith("http"): base_url = f"http://{base_url}"

Suggested change
"""
"""
if not base_url.startswith(("http://", "https://")):
    base_url = f"http://{base_url}"

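Putting the docstring context and the suggestion together, a self-contained sketch of such a health check could look like this (the /health path and the function name are assumptions):

import requests  # assumed dependency

def check_service_health(base_url: str, timeout_s: float = 2.0) -> bool:
    """Probe a service endpoint.

    Returns:
        bool: True if the service is healthy, False otherwise.
    """
    # Normalize the URL first, as the suggested change recommends.
    if not base_url.startswith(("http://", "https://")):
        base_url = f"http://{base_url}"
    try:
        # The /health path is illustrative; the PR's actual endpoint may differ.
        return requests.get(f"{base_url}/health", timeout=timeout_s).status_code == 200
    except requests.RequestException:
        return False
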
Comment on lines +1 to +3
pkill -9 -f python
pkill -9 -f fastdeploy
pkill -f -9 gunicorn
Copilot AI commented Nov 4, 2025:

Using pkill -9 to forcefully kill all Python processes is dangerous in a shared environment as it will terminate unrelated Python processes. Consider using more targeted process management (e.g., storing PIDs in files and killing specific processes) or at least warning users about this behavior in comments.

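A minimal sketch of the PID-file approach Copilot suggests (the path and helper name are hypothetical):

import os
import signal

PID_FILE = "/tmp/splitwise_pids.txt"  # each launch script appends its PID here

def kill_recorded_processes() -> None:
    """Kill only the processes recorded at launch time, not every python process."""
    if not os.path.exists(PID_FILE):
        return
    with open(PID_FILE) as f:
        for line in f:
            try:
                os.kill(int(line.strip()), signal.SIGKILL)
            except (ValueError, ProcessLookupError):
                pass  # blank line, or the process already exited
    os.remove(PID_FILE)
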
Comment on lines 418 to 419
# for idx, chunk in enumerate(chunks):
# print(f"\nchunk[{idx}]:\n{json.dumps(chunk, indent=2, ensure_ascii=False)}")
Copilot AI commented Nov 4, 2025:

This comment appears to contain commented-out code.

continue
os.kill(pid, signal.SIGKILL)
print(f"Killed process on port {port}, pid={pid}")
except subprocess.CalledProcessError:
Copilot AI commented Nov 4, 2025:

'except' clause does nothing but pass and there is no explanatory comment.

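A minimal fix, assuming the port lookup above is a subprocess call (e.g. lsof) that exits non-zero when no process is listening on the port:

except subprocess.CalledProcessError:
    # Nothing is listening on this port, so there is nothing to kill.
    pass
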
@juncaipeng force-pushed the pd branch 2 times, most recently from 1fd1ad6 to c7b0414 on November 5, 2025 04:05
)
item["layer_idx"] = current_layer_idx
if item["layer_idx"] == self.num_layers:
item["status"] = "finished"
Collaborator:

if 'error' not in item['status']:
    item["status"] = "finished"

Collaborator Author:

ok

@Jiang-Jia-Jun merged commit 08ca0f6 into PaddlePaddle:develop on Nov 6, 2025
10 of 13 checks passed