feat: SLA-based Planner #1420
Conversation
Co-authored-by: hhzhang16 <[email protected]> Signed-off-by: Hongkuan Zhou <[email protected]>
…ynamo into hzhou/sla_planner_v2
Actionable comments posted: 2
🔭 Outside diff range comments (1)
components/planner/src/dynamo/planner/__init__.py (1)
16-27: ⚠️ Potential issue

`__all__` exports a stale symbol & makes Ruff unhappy

`PlannerDefaults` no longer exists after the split; meanwhile the two new defaults classes are imported but unused, triggering F401.

```diff
-__all__ = [
-    "CircusController",
-    "LocalConnector",
-    "PlannerConnector",
-    "KubernetesConnector",
-    "PlannerDefaults",
-]
+__all__ = [
+    "CircusController",
+    "LocalConnector",
+    "PlannerConnector",
+    "KubernetesConnector",
+    "LoadPlannerDefaults",
+    "SLAPlannerDefaults",
+]
```

After this change the two imports are "used" via `__all__`, silencing Ruff and preventing consumers from importing a non-existent symbol.

🧰 Tools
🪛 Ruff (0.11.9)
26-26: `dynamo.planner.defaults.LoadPlannerDefaults` imported but unused (F401)
26-26: `dynamo.planner.defaults.SLAPlannerDefaults` imported but unused (F401)
🪛 Pylint (3.3.7)
[error] 21-21: Undefined variable name 'PlannerDefaults' in `__all__`
(E0603)
♻️ Duplicate comments (2)
components/planner/src/dynamo/planner/planner_sla.py (1)
32-35: Fixed 30-second sleep is still here

The hard-coded `INIT_PLANNER_START_DELAY = 30` remains, despite previous feedback. This blocks the event loop and hard-codes an environment assumption. Please make it configurable (default 0) or poll for readiness of dependent components instead.

Also applies to: 105-110
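A minimal sketch of the readiness-polling alternative suggested above; `check_ready` is a hypothetical probe (e.g. hitting a dependency's health endpoint), not an API taken from the planner source:

```python
import asyncio

# Instead of a fixed INIT_PLANNER_START_DELAY sleep, poll dependent
# components until they report ready, yielding to the event loop between
# probes so nothing is blocked.
async def wait_until_ready(check_ready, timeout: float = 30.0,
                           interval: float = 0.01) -> bool:
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while loop.time() < deadline:
        if await check_ready():
            return True
        await asyncio.sleep(interval)  # non-blocking wait between probes
    return False

async def demo() -> bool:
    calls = {"n": 0}

    async def check_ready() -> bool:
        calls["n"] += 1
        return calls["n"] >= 3  # dependency becomes ready on the third probe

    return await wait_until_ready(check_ready)
```

This keeps startup latency bounded by the dependency's actual readiness rather than a worst-case constant.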
components/planner/src/dynamo/planner/utils/planner_core.py (1)
221-235: Division by zero still possible for corrected ITL

`corrected_itl = self.args.itl / self.d_correction_factor` is executed without guarding against `self.d_correction_factor == 0`, an issue already raised earlier.

```diff
-        corrected_itl = self.args.itl / self.d_correction_factor
+        if self.d_correction_factor == 0:
+            logger.error("Decode correction factor is zero – skipping scaling step to avoid div-by-zero")
+            return
+        corrected_itl = self.args.itl / self.d_correction_factor
```
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
- components/planner/src/dynamo/planner/__init__.py (1 hunks)
- components/planner/src/dynamo/planner/defaults.py (1 hunks)
- components/planner/src/dynamo/planner/planner_sla.py (1 hunks)
- components/planner/src/dynamo/planner/utils/load_predictor.py (1 hunks)
- components/planner/src/dynamo/planner/utils/perf_interpolation.py (1 hunks)
- components/planner/src/dynamo/planner/utils/planner_core.py (1 hunks)
- deploy/sdk/src/dynamo/sdk/cli/utils.py (3 hunks)
- examples/llm/components/planner.py (2 hunks)
- examples/llm/components/planner_service.py (2 hunks)
✅ Files skipped from review due to trivial changes (2)
- examples/llm/components/planner_service.py
- deploy/sdk/src/dynamo/sdk/cli/utils.py
🚧 Files skipped from review as they are similar to previous changes (2)
- examples/llm/components/planner.py
- components/planner/src/dynamo/planner/utils/load_predictor.py
🧰 Additional context used
🧬 Code Graph Analysis (1)
components/planner/src/dynamo/planner/planner_sla.py (6)
components/planner/src/dynamo/planner/defaults.py (1)
- SLAPlannerDefaults (38-46)
components/planner/src/dynamo/planner/utils/planner_core.py (2)
- start_sla_planner (306-308)
- Planner (51-303)
lib/bindings/python/src/dynamo/runtime/logging.py (1)
- configure_dynamo_logging (77-105)
deploy/sdk/src/dynamo/sdk/lib/decorators.py (1)
- async_on_start (95-99)
deploy/sdk/src/dynamo/sdk/core/protocol/interface.py (1)
- ComponentType (43-46)
examples/llm/components/planner_service.py (4)
- RequestType (32-33)
- Planner (45-114)
- async_init (103-109)
- generate (112-114)
🪛 Ruff (0.11.9)
components/planner/src/dynamo/planner/__init__.py
26-26: dynamo.planner.defaults.LoadPlannerDefaults imported but unused
(F401)
26-26: dynamo.planner.defaults.SLAPlannerDefaults imported but unused
(F401)
🪛 Pylint (3.3.7)
components/planner/src/dynamo/planner/utils/planner_core.py
[refactor] 40-40: Too many instance attributes (8/7)
(R0902)
[refactor] 51-51: Too many instance attributes (18/7)
(R0902)
[refactor] 162-162: Too many branches (13/12)
(R0912)
components/planner/src/dynamo/planner/defaults.py
[refactor] 18-18: Too few public methods (0/2)
(R0903)
[refactor] 30-30: Too few public methods (0/2)
(R0903)
[refactor] 38-38: Too few public methods (0/2)
(R0903)
components/planner/src/dynamo/planner/planner_sla.py
[refactor] 37-37: Too few public methods (0/2)
(R0903)
components/planner/src/dynamo/planner/utils/perf_interpolation.py
[refactor] 56-56: Too many instance attributes (12/7)
(R0902)
🔇 Additional comments (3)
components/planner/src/dynamo/planner/utils/perf_interpolation.py (1)
137-145: Axis-swap issue fixed – good catch

The earlier mix-up between x/y axes is now resolved (`self.itl_interpolator[iy, ix]`, and same for throughput). ✅
No further concerns here.

components/planner/src/dynamo/planner/utils/planner_core.py (1)
152-156: ❌ Incorrect review comment

Formatting assumes metrics are non-None

`{self.last_metrics.num_req:.2f}` etc. will raise `TypeError` if Prometheus returns `None`. Either validate each metric or supply a fallback value before formatting.
🏁 Script executed:

```shell
#!/bin/bash
set -e
# Find references to last_metrics in planner_core.py
rg "last_metrics" -n components/planner/src/dynamo/planner/utils/planner_core.py
# Show lines around the formatting snippet
sed -n '140,180p' components/planner/src/dynamo/planner/utils/planner_core.py
# Locate any Metrics class or dataclass definitions under the dynamo folder
rg "class .*Metrics" -n components/planner/src/dynamo
rg "@dataclass" -n components/planner/src/dynamo
```

Length of output: 3623

🏁 Script executed:

```shell
#!/bin/bash
set -e
# Display the Metrics dataclass definition to inspect default values and types
sed -n '30,80p' components/planner/src/dynamo/planner/utils/planner_core.py
```

Length of output: 1896

🏁 Script executed:

```shell
#!/bin/bash
set -e
# Locate PrometheusAPIClient definition and related get_avg methods
rg "class PrometheusAPIClient" -n components/planner/src/dynamo/planner/utils/prometheus.py
# Show the file to inspect get_avg_* implementations
sed -n '1,200p' components/planner/src/dynamo/planner/utils/prometheus.py
```

Length of output: 4176
Ignore None-check for metrics formatting

`PrometheusAPIClient.get_avg_*` methods always return a float (0.0 on error) and each metric is assigned immediately before the f-string logs. Therefore `self.last_metrics.num_req`, `isl`, `osl`, `ttft`, and `itl` are guaranteed to be floats, and using `.2f`/`.3f` will not raise a `TypeError`. No additional validation or fallbacks are required here.

Likely an incorrect or invalid review comment.
components/planner/src/dynamo/planner/defaults.py (1)
18-28: All good – clear split between base / load / SLA defaults

The refactor cleanly separates shared and planner-specific parameters.
🧰 Tools
🪛 Pylint (3.3.7)
[refactor] 18-18: Too few public methods (0/2)
(R0903)
Actionable comments posted: 1
♻️ Duplicate comments (5)
components/planner/src/dynamo/planner/utils/perf_interpolation.py (1)
72-73: Hard-coded token-capacity fixed – looks good

Loading `max_kv_tokens` from the profiling artifact instead of a constant resolves the skew highlighted earlier.

components/planner/src/dynamo/planner/utils/planner_core.py (4)
170-186: ⚠️ Potential issue

Division by zero still possible in correction-factor maths

`expect_ttft` or `expect_itl` can be `0` (or `None` – see previous comment), causing `ZeroDivisionError`.
The identical issue was flagged in the previous review but is still present.

```python
if expect_ttft == 0:
    logger.warning("Expected TTFT is zero – skipping correction factor update")
else:
    self.p_correction_factor = self.last_metrics.ttft / expect_ttft
...
if expect_itl == 0:
    logger.warning("Expected ITL is zero – skipping correction factor update")
else:
    self.d_correction_factor = self.last_metrics.itl / expect_itl
```
220-226: ⚠️ Potential issue

Guard against zero decode correction factor

`self.d_correction_factor` may be zero after the above fix or on first run, leading to a crash when computing `corrected_itl`.

```python
if self.d_correction_factor == 0:
    logger.error("Decode correction factor is zero – cannot compute corrected ITL")
    return
corrected_itl = self.args.itl / self.d_correction_factor
```
269-284: 🛠️ Refactor suggestion

Scaling race TODO still unresolved – risk of concurrent operations

Without a lock/flag, a new adjustment cycle may enqueue scaling while the previous one is ongoing, especially in Kubernetes where scale-up is slow.
Implement a simple `asyncio.Lock` or an atomic "scaling-in-progress" flag before releasing this feature.
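A minimal sketch of such a guard, assuming the adjustment loop is a coroutine; the class name and `_apply_scaling` are hypothetical stand-ins for the planner's real scaling path:

```python
import asyncio

class Scaler:
    """Sketch of an adjustment loop guarded by a scaling-in-progress lock."""

    def __init__(self):
        self._scaling_lock = asyncio.Lock()
        self.applied = 0

    async def maybe_scale(self) -> bool:
        # Skip this adjustment cycle if a previous scaling is still in flight.
        if self._scaling_lock.locked():
            return False
        async with self._scaling_lock:
            await self._apply_scaling()
            return True

    async def _apply_scaling(self):
        await asyncio.sleep(0.01)  # stands in for a slow Kubernetes scale-up
        self.applied += 1

async def demo():
    s = Scaler()
    # Two overlapping cycles: only the first performs a scaling operation.
    results = await asyncio.gather(s.maybe_scale(), s.maybe_scale())
    return results, s.applied
```

Checking `locked()` before entering the `async with` block turns the lock into a skip-if-busy flag rather than a queue, which matches the "don't enqueue scaling while one is ongoing" intent.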
125-156: ⚠️ Potential issue

Metric values may be `None` – logging & math will crash

`PrometheusAPIClient` can return `None` on query failure.
Formatting with `:.2f`/`:.3f` or later arithmetic (`num_req`, `isl`, etc.) will raise `TypeError`.

```python
self.last_metrics.ttft = ...
...
# validate
for name, val in vars(self.last_metrics).items():
    if val is None:
        logger.warning("%s metric unavailable, skipping adjustment interval", name)
        return  # abort this cycle early
```
🧹 Nitpick comments (1)
components/planner/src/dynamo/planner/utils/perf_interpolation.py (1)
158-161: Backward scan in `find_best_throughput_per_gpu` is O(resolution) – use vectorised search

A full Python `for` loop over `resolution` (100) is fine now, but a larger grid will regress.
Consider `np.where` to locate the first valid index in native code:

```python
valid_ix = np.where(self.itl_interpolator[iy, :] <= itl)[0]
if valid_ix.size:
    ix = valid_ix[-1]  # right-most idx
    return self.thpt_interpolator[iy, ix]
return self.thpt_interpolator[iy, 0]
```
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- components/planner/src/dynamo/planner/__init__.py (1 hunks)
- components/planner/src/dynamo/planner/utils/perf_interpolation.py (1 hunks)
- components/planner/src/dynamo/planner/utils/planner_core.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- components/planner/src/dynamo/planner/__init__.py
🧰 Additional context used
🪛 Pylint (3.3.7)
components/planner/src/dynamo/planner/utils/planner_core.py
[refactor] 40-40: Too many instance attributes (8/7)
(R0902)
[refactor] 51-51: Too many instance attributes (18/7)
(R0902)
[refactor] 162-162: Too many branches (13/12)
(R0912)
components/planner/src/dynamo/planner/utils/perf_interpolation.py
[refactor] 56-56: Too many instance attributes (12/7)
(R0902)
Actionable comments posted: 0
🧹 Nitpick comments (1)
docs/architecture/planner.md (1)
38-40: Improve link text clarity for planner documentation.

Using generic link text like "docs" can be ambiguous. Consider adopting more descriptive labels, for example:

```diff
-1. Load-based planner: [docs](load_planner.md)
-2. SLA-based planner: [docs](sla_planner.md)
+1. Load-based planner documentation: see [load-based planner docs](load_planner.md)
+2. SLA-based planner documentation: see [SLA-based planner docs](sla_planner.md)
```
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- docs/architecture/planner.md (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (4)
- GitHub Check: pre-merge-rust (lib/runtime/examples)
- GitHub Check: pre-merge-rust (lib/bindings/python)
- GitHub Check: pre-merge-rust (.)
- GitHub Check: Build and Test - vllm
It still feels like it's in a POC phase. What are the largest trials we have run? Can we note down the largest-scale experiments and testing that we have done?
Co-authored-by: Alec <[email protected]> Signed-off-by: Hongkuan Zhou <[email protected]>
@alec-flowers Yes, SLA planner in 0.3.1 will be an experimental release. I will run large-scale exps in k8s to tune it, and it will land in the next release.
Actionable comments posted: 1
♻️ Duplicate comments (4)
components/planner/src/dynamo/planner/prometheus.py (3)
38-38: Doc-string still references "Frontend service"

The constructor comment is unchanged from the original copy-paste and is misleading for a Prometheus service.

```diff
-        """Initialize Frontend service with HTTP server and model configuration."""
+        """Spin-up an embedded Prometheus server using the service configuration."""
```
46-55: ⚠️ Potential issue

Temp file never cleaned up & not wrapped in a context manager

`NamedTemporaryFile(delete=False)` persists on disk but there is no teardown path, so every container run leaves an orphaned YAML file. This was flagged previously.

```diff
-        self.temp_file = tempfile.NamedTemporaryFile(
-            mode="w", suffix=".yml", delete=False
-        )
-        yaml.dump(self.config, self.temp_file)
-        self.temp_file.close()
+        with tempfile.NamedTemporaryFile(mode="w",
+                                         suffix=".yml",
+                                         delete=False) as tmp:
+            yaml.dump(self.config, tmp)
+            self._config_path = tmp.name
```

Add a destructor (or framework-specific shutdown hook) to remove the file:

```diff
+    def __del__(self):
+        """Terminate child process and delete temporary config on GC/shutdown."""
+        if getattr(self, "process", None) and self.process.poll() is None:
+            self.process.terminate()
+        if getattr(self, "_config_path", None):
+            try:
+                os.remove(self._config_path)
+            except OSError:
+                logger.warning("Failed to delete Prometheus config %s", self._config_path)
```

(Remember `import os` at the top.)

🧰 Tools
🪛 Pylint (3.3.7)
[refactor] 49-51: Consider using 'with' for resource-allocating operations
(R1732)
56-67: 🛠️ Refactor suggestion

`subprocess.Popen` launched without safety nets

No error handling for a missing `prometheus` binary and stdout/stderr are discarded, making failures silent. This was noted in the last review.

```diff
-        self.process = subprocess.Popen(
-            cmd,
-            stdout=None,
-            stderr=None,
-        )
+        try:
+            self.process = subprocess.Popen(
+                cmd,
+                stdout=subprocess.PIPE,
+                stderr=subprocess.PIPE,
+                text=True,
+            )
+        except FileNotFoundError as exc:
+            logger.error("Prometheus binary not found: %s", exc)
+            raise
```

Consider a watchdog or `self.process.wait()` in a background task to restart on crash.

🧰 Tools
🪛 Pylint (3.3.7)
[refactor] 63-67: Consider using 'with' for resource-allocating operations
(R1732)
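The two fixes above (config-file lifecycle and safe process launch) can be combined; a hedged sketch follows, where `launch_with_config` is a hypothetical helper and the Python interpreter stands in for the real `prometheus` binary:

```python
import atexit
import os
import subprocess
import sys
import tempfile

def launch_with_config(config_text, argv):
    # Write the config inside a context manager so the handle is closed
    # deterministically; keep the path around for cleanup.
    with tempfile.NamedTemporaryFile(mode="w", suffix=".yml", delete=False) as tmp:
        tmp.write(config_text)
        config_path = tmp.name
    # Best-effort cleanup at interpreter exit (a __del__ or framework
    # shutdown hook works equally well).
    atexit.register(lambda: os.path.exists(config_path) and os.remove(config_path))
    try:
        return subprocess.Popen(argv + [config_path],
                                stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE,
                                text=True)
    except FileNotFoundError:
        os.remove(config_path)  # don't leak the temp file if the binary is missing
        raise

# Stand-in command: echoes the config path it was handed.
proc = launch_with_config("scrape_interval: 15s\n",
                          [sys.executable, "-c", "import sys; print(sys.argv[1])"])
out, _ = proc.communicate(timeout=10)
```

Capturing stdout/stderr means a crashing child leaves diagnostics behind instead of failing silently, and the `FileNotFoundError` branch surfaces a missing binary immediately at startup.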
components/planner/src/dynamo/planner/utils/perf_interpolation.py (1)
62-78: ⚠️ Potential issue

Guard against `resolution < 2` to avoid div-by-zero

`compute_idx` divides by `(self.xi[1] - self.xi[0])`. With `resolution==1` this is `0`, raising `ZeroDivisionError`. The same concern was raised earlier but is still unresolved.

```diff
-        self.resolution = resolution
+        if resolution < 2:
+            raise ValueError("resolution must be ≥ 2")
+        self.resolution = resolution
```
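A small sketch of the guard in isolation; `build_axis` and `compute_idx` are hypothetical reductions of the interpolator's setup and index maths, not its real API:

```python
# With resolution >= 2 the axis always has at least two grid points, so the
# step used by the index computation is well-defined (and non-zero for lo < hi).
def build_axis(lo, hi, resolution):
    if resolution < 2:
        raise ValueError("resolution must be >= 2")
    step = (hi - lo) / (resolution - 1)  # safe: resolution - 1 >= 1
    return [lo + i * step for i in range(resolution)]

def compute_idx(x, xi):
    # xi is guaranteed to hold >= 2 points, so xi[1] - xi[0] exists and the
    # ZeroDivisionError path from resolution == 1 is gone.
    return round((x - xi[0]) / (xi[1] - xi[0]))

xi = build_axis(0.0, 10.0, 11)  # unit-step grid 0..10
idx = compute_idx(3.2, xi)      # rounds to the nearest grid point
```

Validating at construction time fails fast with a clear message instead of a cryptic `ZeroDivisionError` deep inside an interpolation call.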
🧹 Nitpick comments (2)
components/planner/src/dynamo/planner/utils/perf_interpolation.py (2)
16-18: Prefer module-level import for `interpolate`

Importing the full `scipy` package loads a large namespace unnecessarily. Import only the required sub-module to keep import time and memory footprint low:

```diff
-import scipy
+from scipy import interpolate
```

and adjust calls (`scipy.interpolate.*` → `interpolate.*`).
146-161: Minor efficiency nit: break early once a match is found

The reverse scan exits via `return`, but Python still evaluates the loop condition after `return` is prepared. Slight micro-optimisation:

```diff
-        for ix in range(self.resolution - 1, -1, -1):
-            if self.itl_interpolator[iy, ix] <= itl:
-                return self.thpt_interpolator[iy, ix]
+        for ix in range(self.resolution - 1, -1, -1):
+            if self.itl_interpolator[iy, ix] <= itl:
+                return self.thpt_interpolator[iy, ix]  # early exit
```
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- components/planner/src/dynamo/planner/prometheus.py (1 hunks)
- components/planner/src/dynamo/planner/utils/perf_interpolation.py (1 hunks)
🧰 Additional context used
🪛 Pylint (3.3.7)
components/planner/src/dynamo/planner/prometheus.py
[refactor] 49-51: Consider using 'with' for resource-allocating operations
(R1732)
[refactor] 63-67: Consider using 'with' for resource-allocating operations
(R1732)
[refactor] 36-36: Too few public methods (1/2)
(R0903)
components/planner/src/dynamo/planner/utils/perf_interpolation.py
[refactor] 56-56: Too many instance attributes (12/7)
(R0902)
🔇 Additional comments (2)
components/planner/src/dynamo/planner/prometheus.py (1)
61-61: Log level correctly set to INFO – good catch

Previous warning-level log has been adjusted to `logger.info`, matching reviewer feedback.

components/planner/src/dynamo/planner/utils/perf_interpolation.py (1)
31-45: Double-check TTFT units

`DecodeInterpolator` converts ITL from ms → s (line 97) but `PrefillInterpolator` leaves TTFT untouched. If the downstream planner expects both latency metrics in the same unit, this asymmetry will introduce subtle bugs.

Confirm the unit of `prefill_ttft` in the NPZ; convert to seconds if it is stored in milliseconds.

```diff
-        self.prefill_ttft = raw_data["prefill_ttft"]
+        self.prefill_ttft = raw_data["prefill_ttft"] / 1000  # ms → s
```
Actionable comments posted: 1
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- container/Dockerfile.vllm (1 hunks)
- docs/architecture/planner.md (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- container/Dockerfile.vllm
⏰ Context from checks skipped due to timeout of 90000ms (4)
- GitHub Check: pre-merge-rust (.)
- GitHub Check: pre-merge-rust (lib/bindings/python)
- GitHub Check: pre-merge-rust (lib/runtime/examples)
- GitHub Check: Build and Test - vllm
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Hongkuan Zhou <[email protected]>
Signed-off-by: Hongkuan Zhou <[email protected]> Co-authored-by: hhzhang16 <[email protected]> Co-authored-by: Alec <[email protected]> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
See sla_planner.md for documentation
pending:
Summary by CodeRabbit
New Features
Bug Fixes
Documentation
Refactor
Chores