[Feature] [PD] add simple router and refine splitwise deployment #4709
Conversation
Thanks for your contribution!
```bash
export CUDA_VISIBLE_DEVICES=0
export FD_DEBUG=1
export ENABLE_V1_KVCACHE_SCHEDULER=0
```
This switch should already be deprecated; also, this enables DEBUG logging.
Enabling debug here makes troubleshooting easier and can be removed later. ENABLE_V1_KVCACHE_SCHEDULER is still effective; adapting to v1 will come later.
fastdeploy/config.py (outdated)
```python
self.ips = self.ips.split(",")

self.host_ip = get_host_ip()
self.port = port
```
Is the service-layer port number being pushed down into the config here? That should only be an APIServer-level configuration parameter.
Yes. The config needs to know the API server's port so it can report it to the router.
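To make this concrete, here is a minimal, hypothetical sketch of the instance-side registration call; the endpoint path, payload fields, and function name are assumptions for illustration, not the PR's actual schema (which lives in fastdeploy/router/*.py).

```python
import requests


def register_with_router(router_address: str, host_ip: str, port: int, role: str) -> None:
    # Hypothetical endpoint and payload shape; shown only to illustrate why the
    # config must carry the API server port (host_ip comes from get_host_ip()).
    payload = {"role": role, "host_ip": host_ip, "port": port}
    requests.post(f"http://{router_address}/register", json=payload, timeout=5)
```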
Force-pushed 7903bb4 to 2f16499.
```python
logger = get_logger("cache_messager", "cache_messager.log")


def parse_args():
```
If the logger declaration is removed here, can the functions below still use logger normally?
This file uses a logger that carries the rank_id, so the declaration here was redundant.
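For reference, a small sketch of the per-rank logger pattern this reply describes; the rank-suffixed file name is an assumption, while the get_logger(name, file) call shape matches the snippet quoted above.

```python
from fastdeploy.utils import get_logger

rank_id = 0  # illustrative: normally taken from the parsed launch arguments
logger = get_logger("cache_messager", f"cache_messager_rank{rank_id}.log")
logger.info("cache messager started on rank %s", rank_id)
```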
```python
self.read_from_config()
self.postprocess()
self.init_cache_info()
```
For init_cache_info, please confirm with tingdan whether it belongs in this config step, and whether it affects profiling and weight reloading in training scenarios.
ok
```diff
-from fastdeploy.utils import llm_logger
+from fastdeploy.utils import get_logger, llm_logger

+config_logger = get_logger("config", "config.log")
```
There are already quite a few log files; unless it's necessary, config shouldn't add an extra one.
config.log is a pre-existing log file. Previously the config was written to llm_logger; here it is unified so that everything goes to config.log.
```python
            error_msg=task["error_msg"],
        )
    )
output = RequestOutput.from_dict(task)
```
Is this code equivalent to the left-hand (old) version, e.g., for fields like send_idx and finished?
In my runs it is equivalent; from_dict actually covers more fields.
Pull Request Overview
This PR implements a router-based splitwise deployment architecture (v2) for distributed LLM inference, supporting flexible prefill/decode instance management across multiple nodes.
- Introduces a new router service that manages prefill/decode instance registration and request routing
- Adds support for dynamic instance health monitoring and automatic removal of unhealthy instances (see the sketch after this list)
- Implements request ID propagation through the router to enable proper request tracking in splitwise deployments
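The following is a hedged sketch of the registration and health-monitoring loop described above; the class name, the /health endpoint, and the data layout are assumptions for illustration, not the actual router implementation in fastdeploy/router/*.py.

```python
import threading
import time

import requests


class SimpleRouterSketch:
    """Tracks prefill/decode instances and evicts the ones that stop responding."""

    def __init__(self, health_interval: float = 5.0):
        self.instances: dict[tuple[str, str], float] = {}  # (role, "host:port") -> last seen
        self.health_interval = health_interval
        self._lock = threading.Lock()

    def register(self, role: str, host: str, port: int) -> None:
        # Called by an instance at startup (see the registration payload sketched earlier).
        with self._lock:
            self.instances[(role, f"{host}:{port}")] = time.time()

    def _is_healthy(self, address: str) -> bool:
        try:
            return requests.get(f"http://{address}/health", timeout=2).status_code == 200
        except requests.RequestException:
            return False

    def monitor(self) -> None:
        # Periodically probe every registered instance and drop the unhealthy ones.
        while True:
            with self._lock:
                snapshot = list(self.instances)
            unhealthy = [key for key in snapshot if not self._is_healthy(key[1])]
            with self._lock:
                for key in unhealthy:
                    self.instances.pop(key, None)
            time.sleep(self.health_interval)
```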
Reviewed Changes
Copilot reviewed 33 out of 38 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| tests/e2e/test_ernie_03b_pd_multi_node.py | New e2e test for multi-node splitwise deployment with router |
| tests/e2e/test_ernie_03b_pd.py | Updated to use router-based deployment (v2) instead of direct instance communication |
| requirements.txt | Added setproctitle and aistudio_sdk dependencies |
| fastdeploy/router/*.py | New router module with launch script, router server, and utilities |
| fastdeploy/config.py | Added RouterConfig class and splitwise version detection logic |
| fastdeploy/engine/args_utils.py | Added router and port CLI arguments, reorganized splitwise args |
| fastdeploy/engine/common_engine.py | Implemented router registration and refactored splitwise task processing |
| fastdeploy/entrypoints/openai/*.py | Added request_id and disaggregate_info support to API protocols |
| fastdeploy/splitwise/splitwise_connector.py | Enhanced logging and refactored message handling |
| fastdeploy/scheduler/local_scheduler.py | Added has_request method and enhanced logging |
| fastdeploy/worker/worker_process.py | Added execution time logging |
| examples/splitwise/*.sh | New example scripts for v0/v1/v2 splitwise deployments |
| docs/**/multi-node_deployment.md | Removed trailing whitespace |
| benchmarks/*.py | Code formatting fixes |
```python
)
from fastdeploy.model_executor.ops.gpu import get_output_kv_signal, set_data_ipc
from fastdeploy.utils import envs, get_logger
```
Copilot AI (Nov 4, 2025):
The logger initialization at module level (lines 38-39 in the original code) was removed, but the code still uses the `logger` variable throughout the file (e.g., lines 348 and 378-379). This will cause a NameError at runtime since `logger` is now only defined inside a function at line 844. Either restore the module-level logger or ensure all usage sites have access to the logger instance.
```python
logger = get_logger()
```
```python
env_decode["CUDA_VISIBLE_DEVICES"] = "1"
env_prefill["ENABLE_V1_KVCACHE_SCHEDULER"] = "0"
env_decode["INFERENCE_MSG_QUEUE_ID"] = str(FD_API_PORT + 1)
```
Copilot AI (Nov 4, 2025):
Line 161 incorrectly sets `env_prefill["ENABLE_V1_KVCACHE_SCHEDULER"]` instead of `env_decode["ENABLE_V1_KVCACHE_SCHEDULER"]`. This causes the environment variable to be set on the wrong process environment.
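The fix this comment points at is a one-line change (variable names come from the test snippet quoted above):

```python
# Set the scheduler flag on the decode environment rather than the prefill one.
env_decode["ENABLE_V1_KVCACHE_SCHEDULER"] = "0"
```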
```python
env_decode["CUDA_VISIBLE_DEVICES"] = "1"
env_prefill["ENABLE_V1_KVCACHE_SCHEDULER"] = "0"
env_decode["INFERENCE_MSG_QUEUE_ID"] = str(FD_API_PORT + 1)
```
Copilot AI (Nov 4, 2025):
Line 243 incorrectly sets `env_prefill["ENABLE_V1_KVCACHE_SCHEDULER"]` instead of `env_decode["ENABLE_V1_KVCACHE_SCHEDULER"]`. This causes the environment variable to be set on the wrong process environment.
```python
    Returns:
        bool: True if the service is healthy, False otherwise.
    """
```
Copilot AI (Nov 4, 2025):
The function doesn't ensure the base_url starts with 'http' before using it, unlike the test file version at line 74. If a URL without a protocol is passed, it will make an invalid request. Add the protocol check from the test version: `if not base_url.startswith("http"): base_url = f"http://{base_url}"`.
| """ | |
| """ | |
| if not base_url.startswith(("http://", "https://")): | |
| base_url = f"http://{base_url}" |
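For context, here is a hedged sketch of how the complete helper could look with the suggested prefix normalization applied; the function name and the /health path are assumptions, not necessarily what the PR uses.

```python
import requests


def check_service_health(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if the service answers its health probe, False otherwise."""
    if not base_url.startswith(("http://", "https://")):
        base_url = f"http://{base_url}"
    try:
        return requests.get(f"{base_url}/health", timeout=timeout).status_code == 200
    except requests.RequestException:
        return False
```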
```bash
pkill -9 -f python
pkill -9 -f fastdeploy
pkill -f -9 gunicorn
```
Copilot AI (Nov 4, 2025):
Using pkill -9 to forcefully kill all Python processes is dangerous in a shared environment as it will terminate unrelated Python processes. Consider using more targeted process management (e.g., storing PIDs in files and killing specific processes) or at least warning users about this behavior in comments.
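As a hedged sketch of the more targeted cleanup this comment suggests: the snippet below assumes each launch script records its PID under a pids/ directory, which is an illustrative convention rather than something the PR provides.

```python
import os
import signal
from pathlib import Path

# Kill only the processes we launched ourselves, identified by recorded PIDs.
for pid_file in Path("pids").glob("*.pid"):
    pid = int(pid_file.read_text().strip())
    try:
        os.kill(pid, signal.SIGTERM)  # attempt a graceful shutdown first
    except ProcessLookupError:
        pass  # the process already exited
    pid_file.unlink()
```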
```python
# for idx, chunk in enumerate(chunks):
#     print(f"\nchunk[{idx}]:\n{json.dumps(chunk, indent=2, ensure_ascii=False)}")
```
Copilot AI (Nov 4, 2025):
This comment appears to contain commented-out code.
```python
        continue
    os.kill(pid, signal.SIGKILL)
    print(f"Killed process on port {port}, pid={pid}")
except subprocess.CalledProcessError:
```
Copilot AI (Nov 4, 2025):
The `except` clause does nothing but `pass`, and there is no explanatory comment.
Force-pushed 1fd1ad6 to c7b0414.
```python
)
item["layer_idx"] = current_layer_idx
if item["layer_idx"] == self.num_layers:
    item["status"] = "finished"
```
```python
if 'error' not in item['status']:
    item["status"] = "finished"
```
ok
Motivation
Refine splitwise deployment.
Modifications
TODO:
Usage or Command
Refer to the example scripts under examples/splitwise/.
Accuracy Tests
Benchmark of v1 splitwise:
Benchmark of v2 splitwise (using router):
Checklist
- PR title tags: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
- Run pre-commit before commit.
- If the PR targets the release branch, make sure it has been submitted to the develop branch first, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.