-
Notifications
You must be signed in to change notification settings - Fork 5k
[PD Disaggregation] Add the KVCache transfer latency monitor metric #7944
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
8f2208a
7b2a6c8
bd1b6aa
291399c
6fd9f1a
d7ff04d
342cc6c
763c2b4
9118c71
76039d3
e1a67a0
face952
f4e3742
325d56a
3c01861
1d8e524
8037779
7d77b50
6ee4e17
d2111b1
2466bf7
e450144
1db01fa
7988552
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -26,6 +26,7 @@ | |
| ) | ||
| from sglang.srt.disaggregation.common.utils import group_concurrent_contiguous | ||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It seems you don't implement the same latency monitor for nixl
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes. So far, I'm not quite familiar with the NIXL part code, may be later I would go through that then add support for it. But for interface compatibility, the |
||
| from sglang.srt.disaggregation.utils import DisaggregationMode | ||
| from sglang.srt.metrics.collector import SchedulerMetricsCollector | ||
| from sglang.srt.server_args import ServerArgs | ||
| from sglang.srt.utils import ( | ||
| format_tcp_address, | ||
|
|
@@ -117,6 +118,7 @@ def __init__( | |
| disaggregation_mode: DisaggregationMode, | ||
| server_args: ServerArgs, | ||
| is_mla_backend: Optional[bool] = False, | ||
| scheduler_metrics_collector: Optional[SchedulerMetricsCollector] = None, | ||
| ): | ||
| super().__init__(args, disaggregation_mode, server_args, is_mla_backend) | ||
| try: | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as the other pr. don't pass collector into kv manager
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After a deep analysis, I find it difficult to achieve a fine-grained latency tracking without passing the
collectorto the KVManager && the coordination inside theconn.py🤔 The purpose of this PR is to track the KV transfer latency of the exact network stack, reflecting the real-time network performance. However, if the timestamp collecting is only allowed in theprefill.py, though generality is preserved, other irrelevant latencies like request queueing, scheduler dispatching and result polling would be included in this metric, which would mislead the operators. Hence, I think passing the collector into the KVManager in this case is quite necessary.