
Conversation

@tedzhouhk
Contributor

@tedzhouhk tedzhouhk commented Jun 6, 2025

See sla_planner.md for documentation

Pending:

  • Support common components in dynamo serve
  • Testing in k8s

Summary by CodeRabbit

  • New Features

    • Introduced an SLA-based planner for dynamic autoscaling of prefill and decode workers based on predictive load forecasting and SLA targets.
    • Added Prometheus service integration for real-time metric collection and monitoring.
    • Implemented load prediction models (Constant, ARIMA, Prophet) and performance interpolators for more accurate scaling decisions.
    • Added new configuration options and YAML files for disaggregated planner setups.
    • Added a new Planner service component with asynchronous startup and minimal endpoint.
    • Integrated Planner and Prometheus dependencies into the frontend service.
  • Bug Fixes

    • Improved removal logic for worker processes to support non-blocking operation and better error handling.
  • Documentation

    • Added comprehensive documentation for both load-based and SLA-based planners, including deployment and profiling instructions.
    • Updated architecture docs to reference new planner documentation.
  • Refactor

    • Split and reorganized planner default configurations into base, load-based, and SLA-based classes.
    • Updated code to use new configuration classes and improved argument organization.
  • Chores

    • Updated dependencies to include packages required for load prediction and monitoring.
    • Enhanced Dockerfile to install Prometheus for monitoring support.

@copy-pr-bot

copy-pr-bot bot commented Jun 6, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@tedzhouhk tedzhouhk marked this pull request as ready for review June 12, 2025 23:22
@tedzhouhk tedzhouhk requested a review from hutm as a code owner June 13, 2025 17:13
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🔭 Outside diff range comments (1)
components/planner/src/dynamo/planner/__init__.py (1)

16-27: ⚠️ Potential issue

__all__ exports stale symbol & makes Ruff unhappy

PlannerDefaults no longer exists after the split; meanwhile the two new defaults classes are imported but unused, triggering F401.

-__all__ = [
-    "CircusController",
-    "LocalConnector",
-    "PlannerConnector",
-    "KubernetesConnector",
-    "PlannerDefaults",
-]
+__all__ = [
+    "CircusController",
+    "LocalConnector",
+    "PlannerConnector",
+    "KubernetesConnector",
+    "LoadPlannerDefaults",
+    "SLAPlannerDefaults",
+]

After this change the two imports are “used” via __all__, silencing Ruff and preventing consumers from importing a non-existent symbol.

🧰 Tools
🪛 Ruff (0.11.9)

26-26: dynamo.planner.defaults.LoadPlannerDefaults imported but unused

(F401)


26-26: dynamo.planner.defaults.SLAPlannerDefaults imported but unused

(F401)

🪛 Pylint (3.3.7)

[error] 21-21: Undefined variable name 'PlannerDefaults' in all

(E0603)

♻️ Duplicate comments (2)
components/planner/src/dynamo/planner/planner_sla.py (1)

32-35: Fixed 30-second sleep is still here

The hard-coded INIT_PLANNER_START_DELAY = 30 remains, despite previous feedback. This blocks the event-loop and hard-codes an environment assumption. Please make it configurable (default 0) or poll for readiness of dependent components instead.

Also applies to: 105-110
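
The requested fix can be sketched as a small readiness poll replacing the fixed sleep. This is a minimal illustration only: `wait_for_dependencies` and the `is_ready` callable are hypothetical stand-ins, not the actual symbols in planner_sla.py.

```python
import asyncio

async def wait_for_dependencies(is_ready, timeout=60.0, poll_interval=0.5):
    """Poll a readiness check instead of sleeping for a fixed 30 s.

    `is_ready` is a hypothetical callable standing in for a real health
    check on the dependent prefill/decode components.
    """
    elapsed = 0.0
    while elapsed < timeout:
        if is_ready():
            return True
        await asyncio.sleep(poll_interval)
        elapsed += poll_interval
    return False

# Demo: the dependency reports ready on the third poll.
state = {"calls": 0}

def fake_ready():
    state["calls"] += 1
    return state["calls"] >= 3

ready = asyncio.run(wait_for_dependencies(fake_ready, timeout=5.0, poll_interval=0.01))
print(ready)  # True
```

With this shape, the default delay can be zero and the planner starts as soon as its dependencies actually respond.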

components/planner/src/dynamo/planner/utils/planner_core.py (1)

221-235: Division by zero still possible for corrected ITL

corrected_itl = self.args.itl / self.d_correction_factor is executed without guarding against self.d_correction_factor == 0, an issue already raised earlier.

-            corrected_itl = self.args.itl / self.d_correction_factor
+            if self.d_correction_factor == 0:
+                logger.error("Decode correction factor is zero – skipping scaling step to avoid div-by-zero")
+                return
+            corrected_itl = self.args.itl / self.d_correction_factor
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 90b1267 and b6f3161.

📒 Files selected for processing (9)
  • components/planner/src/dynamo/planner/__init__.py (1 hunks)
  • components/planner/src/dynamo/planner/defaults.py (1 hunks)
  • components/planner/src/dynamo/planner/planner_sla.py (1 hunks)
  • components/planner/src/dynamo/planner/utils/load_predictor.py (1 hunks)
  • components/planner/src/dynamo/planner/utils/perf_interpolation.py (1 hunks)
  • components/planner/src/dynamo/planner/utils/planner_core.py (1 hunks)
  • deploy/sdk/src/dynamo/sdk/cli/utils.py (3 hunks)
  • examples/llm/components/planner.py (2 hunks)
  • examples/llm/components/planner_service.py (2 hunks)
✅ Files skipped from review due to trivial changes (2)
  • examples/llm/components/planner_service.py
  • deploy/sdk/src/dynamo/sdk/cli/utils.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • examples/llm/components/planner.py
  • components/planner/src/dynamo/planner/utils/load_predictor.py
🧰 Additional context used
🧬 Code Graph Analysis (1)
components/planner/src/dynamo/planner/planner_sla.py (6)
components/planner/src/dynamo/planner/defaults.py (1)
  • SLAPlannerDefaults (38-46)
components/planner/src/dynamo/planner/utils/planner_core.py (2)
  • start_sla_planner (306-308)
  • Planner (51-303)
lib/bindings/python/src/dynamo/runtime/logging.py (1)
  • configure_dynamo_logging (77-105)
deploy/sdk/src/dynamo/sdk/lib/decorators.py (1)
  • async_on_start (95-99)
deploy/sdk/src/dynamo/sdk/core/protocol/interface.py (1)
  • ComponentType (43-46)
examples/llm/components/planner_service.py (4)
  • RequestType (32-33)
  • Planner (45-114)
  • async_init (103-109)
  • generate (112-114)
🪛 Ruff (0.11.9)
components/planner/src/dynamo/planner/__init__.py

26-26: dynamo.planner.defaults.LoadPlannerDefaults imported but unused

(F401)


26-26: dynamo.planner.defaults.SLAPlannerDefaults imported but unused

(F401)

🪛 Pylint (3.3.7)
components/planner/src/dynamo/planner/utils/planner_core.py

[refactor] 40-40: Too many instance attributes (8/7)

(R0902)


[refactor] 51-51: Too many instance attributes (18/7)

(R0902)


[refactor] 162-162: Too many branches (13/12)

(R0912)

components/planner/src/dynamo/planner/defaults.py

[refactor] 18-18: Too few public methods (0/2)

(R0903)


[refactor] 30-30: Too few public methods (0/2)

(R0903)


[refactor] 38-38: Too few public methods (0/2)

(R0903)

components/planner/src/dynamo/planner/planner_sla.py

[refactor] 37-37: Too few public methods (0/2)

(R0903)

components/planner/src/dynamo/planner/utils/perf_interpolation.py

[refactor] 56-56: Too many instance attributes (12/7)

(R0902)

🔇 Additional comments (3)
components/planner/src/dynamo/planner/utils/perf_interpolation.py (1)

137-145: Axis-swap issue fixed – good catch

The earlier mix-up between x/y axes is now resolved (self.itl_interpolator[iy, ix] and same for throughput). ✅
No further concerns here.

components/planner/src/dynamo/planner/utils/planner_core.py (1)

152-156:

❌ Incorrect review comment

Formatting assumes metrics are non-None

{self.last_metrics.num_req:.2f} etc. will raise TypeError if Prometheus returns None. Either validate each metric or supply a fallback value before formatting.


🏁 Script executed:

#!/bin/bash
set -e

# Find references to last_metrics in planner_core.py
rg "last_metrics" -n components/planner/src/dynamo/planner/utils/planner_core.py

# Show lines around the formatting snippet
sed -n '140,180p' components/planner/src/dynamo/planner/utils/planner_core.py

# Locate any Metrics class or dataclass definitions under the dynamo folder
rg "class .*Metrics" -n components/planner/src/dynamo
rg "@dataclass" -n components/planner/src/dynamo

Length of output: 3623


🏁 Script executed:

#!/bin/bash
set -e

# Display the Metrics dataclass definition to inspect default values and types
sed -n '30,80p' components/planner/src/dynamo/planner/utils/planner_core.py

Length of output: 1896


🏁 Script executed:

#!/bin/bash
set -e

# Locate PrometheusAPIClient definition and related get_avg methods
rg "class PrometheusAPIClient" -n components/planner/src/dynamo/planner/utils/prometheus.py

# Show the file to inspect get_avg_* implementations
sed -n '1,200p' components/planner/src/dynamo/planner/utils/prometheus.py

Length of output: 4176


Ignore None‐check for metrics formatting
PrometheusAPIClient.get_avg_* methods always return a float (0.0 on error) and each metric is assigned immediately before the f-string logs. Therefore, self.last_metrics.num_req, isl, osl, ttft, and itl are guaranteed to be floats, and using .2f/.3f will not raise a TypeError. No additional validation or fallbacks are required here.

Likely an incorrect or invalid review comment.

components/planner/src/dynamo/planner/defaults.py (1)

18-28: All good – clear split between base / load / SLA defaults

The refactor cleanly separates shared and planner-specific parameters.

🧰 Tools
🪛 Pylint (3.3.7)

[refactor] 18-18: Too few public methods (0/2)

(R0903)

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (5)
components/planner/src/dynamo/planner/utils/perf_interpolation.py (1)

72-73: Hard-coded token-capacity fixed – looks good

Loading max_kv_tokens from the profiling artifact instead of a constant resolves the skew highlighted earlier.

components/planner/src/dynamo/planner/utils/planner_core.py (4)

170-186: ⚠️ Potential issue

Division by zero still possible in correction-factor maths

expect_ttft or expect_itl can be 0 (or None – see previous comment), causing ZeroDivisionError.
The identical issue was flagged in the previous review but is still present.

if expect_ttft == 0:
    logger.warning("Expected TTFT is zero – skipping correction factor update")
else:
    self.p_correction_factor = self.last_metrics.ttft / expect_ttft

...
if expect_itl == 0:
    logger.warning("Expected ITL is zero – skipping correction factor update")
else:
    self.d_correction_factor = self.last_metrics.itl / expect_itl

220-226: ⚠️ Potential issue

Guard against zero decode correction factor

self.d_correction_factor may be zero after the above fix or on first run, leading to a crash when computing corrected_itl.

if self.d_correction_factor == 0:
    logger.error("Decode correction factor is zero – cannot compute corrected ITL")
    return
corrected_itl = self.args.itl / self.d_correction_factor

269-284: 🛠️ Refactor suggestion

Scaling race TODO still unresolved – risk of concurrent operations

Without a lock/flag, a new adjustment cycle may enqueue scaling while the previous one is ongoing, especially in Kubernetes where scale-up is slow.
Implement a simple asyncio.Lock or an atomic “scaling-in-progress” flag before releasing this feature.
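
The suggested lock can be sketched as follows, assuming an asyncio-based adjustment loop. The class and method names are illustrative, not the planner's actual API.

```python
import asyncio

class ScalingGuard:
    """Skip a new adjustment cycle while a previous scale operation is in flight."""

    def __init__(self):
        self._lock = asyncio.Lock()
        self.applied = []

    async def adjust(self, target_workers: int) -> bool:
        if self._lock.locked():
            # A previous scaling operation is still running: skip this cycle
            # instead of enqueuing a concurrent one.
            return False
        async with self._lock:
            # Simulate a slow scale-up (e.g. Kubernetes pod start).
            await asyncio.sleep(0.05)
            self.applied.append(target_workers)
            return True

async def main():
    guard = ScalingGuard()
    # Two overlapping adjustment cycles: the second is skipped.
    results = await asyncio.gather(guard.adjust(2), guard.adjust(3))
    return results, guard.applied

results, applied = asyncio.run(main())
print(results, applied)  # [True, False] [2]
```

The same effect can be had with a plain boolean "scaling-in-progress" flag, since asyncio is single-threaded; the lock just makes the intent explicit.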


125-156: ⚠️ Potential issue

Metric values may be None – logging & math will crash

PrometheusAPIClient can return None on query failure.
Formatting with :.2f / :.3f or later arithmetic (num_req, isl, etc.) will raise TypeError.

self.last_metrics.ttft = ...
...
# validate
for name, val in vars(self.last_metrics).items():
    if val is None:
        logger.warning("%s metric unavailable, skipping adjustment interval", name)
        return  # abort this cycle early
🧹 Nitpick comments (1)
components/planner/src/dynamo/planner/utils/perf_interpolation.py (1)

158-161: Backward scan in find_best_throughput_per_gpu is O(resolution) – use vectorised search

A full Python for loop over resolution (100) is fine now, but a larger grid will regress.
Consider np.where to locate the first valid index in native code:

valid_ix = np.where(self.itl_interpolator[iy, :] <= itl)[0]
if valid_ix.size:
    ix = valid_ix[-1]          # right-most idx
    return self.thpt_interpolator[iy, ix]
return self.thpt_interpolator[iy, 0]
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b6f3161 and a4acd2b.

📒 Files selected for processing (3)
  • components/planner/src/dynamo/planner/__init__.py (1 hunks)
  • components/planner/src/dynamo/planner/utils/perf_interpolation.py (1 hunks)
  • components/planner/src/dynamo/planner/utils/planner_core.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • components/planner/src/dynamo/planner/__init__.py
🧰 Additional context used
🪛 Pylint (3.3.7)
components/planner/src/dynamo/planner/utils/planner_core.py

[refactor] 40-40: Too many instance attributes (8/7)

(R0902)


[refactor] 51-51: Too many instance attributes (18/7)

(R0902)


[refactor] 162-162: Too many branches (13/12)

(R0912)

components/planner/src/dynamo/planner/utils/perf_interpolation.py

[refactor] 56-56: Too many instance attributes (12/7)

(R0902)

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
docs/architecture/planner.md (1)

38-40: Improve link text clarity for planner documentation.

Using generic link text like “docs” can be ambiguous. Consider adopting more descriptive labels, for example:

-1. Load-based planner: [docs](load_planner.md)
-2. SLA-based planner: [docs](sla_planner.md)
+1. Load-based planner documentation: see [load-based planner docs](load_planner.md)
+2. SLA-based planner documentation: see [SLA-based planner docs](sla_planner.md)
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a4acd2b and e024c59.

📒 Files selected for processing (1)
  • docs/architecture/planner.md (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (4)
  • GitHub Check: pre-merge-rust (lib/runtime/examples)
  • GitHub Check: pre-merge-rust (lib/bindings/python)
  • GitHub Check: pre-merge-rust (.)
  • GitHub Check: Build and Test - vllm

@alec-flowers
Contributor

It still feels like it's in a POC phase. What are the largest trials we have run so far? Can we note down the largest-scale experiments and testing that we have done?

@tedzhouhk
Contributor Author

It still feels like it's in a POC phase. What are the largest trials we have run so far? Can we note down the largest-scale experiments and testing that we have done?

@alec-flowers Yes, the SLA planner in 0.3.1 will be an experimental release. I will run large-scale experiments in k8s to tune it, and the tuned version will land in the next release.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (4)
components/planner/src/dynamo/planner/prometheus.py (3)

38-38: Doc-string still references “Frontend service”
The constructor comment is unchanged from the original copy-paste and is misleading for a Prometheus service.

-        """Initialize Frontend service with HTTP server and model configuration."""
+        """Spin-up an embedded Prometheus server using the service configuration."""

46-55: ⚠️ Potential issue

Temp file never cleaned up & not wrapped in a context manager
NamedTemporaryFile(delete=False) persists on disk but there is no teardown path, so every container run leaves an orphaned YAML file. This was flagged previously.

@@
-        self.temp_file = tempfile.NamedTemporaryFile(
-            mode="w", suffix=".yml", delete=False
-        )
-        yaml.dump(self.config, self.temp_file)
-        self.temp_file.close()
+        with tempfile.NamedTemporaryFile(mode="w",
+                                         suffix=".yml",
+                                         delete=False) as tmp:
+            yaml.dump(self.config, tmp)
+            self._config_path = tmp.name

Add a destructor (or framework-specific shutdown hook) to remove the file:

+    def __del__(self):
+        """Terminate child process and delete temporary config on GC/shutdown."""
+        if getattr(self, "process", None) and self.process.poll() is None:
+            self.process.terminate()
+        if getattr(self, "_config_path", None):
+            try:
+                os.remove(self._config_path)
+            except OSError:
+                logger.warning("Failed to delete Prometheus config %s", self._config_path)

(Remember import os at the top.)

🧰 Tools
🪛 Pylint (3.3.7)

[refactor] 49-51: Consider using 'with' for resource-allocating operations

(R1732)


56-67: 🛠️ Refactor suggestion

subprocess.Popen launched without safety nets
No error handling for a missing prometheus binary and stdout/stderr are discarded, making failures silent. This was noted in the last review.

-        self.process = subprocess.Popen(
-            cmd,
-            stdout=None,
-            stderr=None,
-        )
+        try:
+            self.process = subprocess.Popen(
+                cmd,
+                stdout=subprocess.PIPE,
+                stderr=subprocess.PIPE,
+                text=True,
+            )
+        except FileNotFoundError as exc:
+            logger.error("Prometheus binary not found: %s", exc)
+            raise

Consider a watchdog or self.process.wait() in a background task to restart on crash.

🧰 Tools
🪛 Pylint (3.3.7)

[refactor] 63-67: Consider using 'with' for resource-allocating operations

(R1732)
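
The launch-with-error-handling plus watchdog pattern can be sketched like this. The function names are illustrative, and the demo uses the Python interpreter as a stand-in command rather than the real prometheus binary.

```python
import logging
import shutil
import subprocess
import sys

logger = logging.getLogger(__name__)

def launch(cmd):
    """Launch a child process, failing loudly if the binary is missing."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} not found on PATH")
    return subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True
    )

def watch(process, restart):
    """Blocking watchdog: wait for exit and restart on a non-zero code.

    In the service this would run in a background thread or asyncio task
    rather than on the main path.
    """
    code = process.wait()
    if code != 0:
        logger.error("child exited with %s; restarting", code)
        return restart()
    return process

# Demo with the interpreter standing in for the prometheus binary.
proc = launch([sys.executable, "-c", "print('up')"])
out, _ = proc.communicate()
print(out.strip())  # up
```

Capturing stdout/stderr also means a crashing Prometheus leaves its last output available for the error log instead of disappearing silently.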

components/planner/src/dynamo/planner/utils/perf_interpolation.py (1)

62-78: ⚠️ Potential issue

Guard against resolution < 2 to avoid div-by-zero

compute_idx divides by (self.xi[1] - self.xi[0]). With resolution==1 this is 0, raising ZeroDivisionError. The same concern was raised earlier but is still unresolved.

-        self.resolution = resolution
+        if resolution < 2:
+            raise ValueError("resolution must be ≥ 2")
+        self.resolution = resolution
🧹 Nitpick comments (2)
components/planner/src/dynamo/planner/utils/perf_interpolation.py (2)

16-18: Prefer module-level import for interpolate

Importing the full scipy package loads a large namespace unnecessarily. Import only the required sub-module to keep import time and memory footprint low:

-import scipy
+from scipy import interpolate

and adjust call sites accordingly (scipy.interpolate.* → interpolate.*).


146-161: Minor readability nit: make the early exit explicit

The reverse scan already returns as soon as a match is found; an inline comment makes the early-exit intent clearer:

-        for ix in range(self.resolution - 1, -1, -1):
-            if self.itl_interpolator[iy, ix] <= itl:
-                return self.thpt_interpolator[iy, ix]
+        for ix in range(self.resolution - 1, -1, -1):
+            if self.itl_interpolator[iy, ix] <= itl:
+                return self.thpt_interpolator[iy, ix]  # early exit
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e024c59 and 6a9437e.

📒 Files selected for processing (2)
  • components/planner/src/dynamo/planner/prometheus.py (1 hunks)
  • components/planner/src/dynamo/planner/utils/perf_interpolation.py (1 hunks)
🧰 Additional context used
🪛 Pylint (3.3.7)
components/planner/src/dynamo/planner/prometheus.py

[refactor] 49-51: Consider using 'with' for resource-allocating operations

(R1732)


[refactor] 63-67: Consider using 'with' for resource-allocating operations

(R1732)


[refactor] 36-36: Too few public methods (1/2)

(R0903)

components/planner/src/dynamo/planner/utils/perf_interpolation.py

[refactor] 56-56: Too many instance attributes (12/7)

(R0902)

🔇 Additional comments (2)
components/planner/src/dynamo/planner/prometheus.py (1)

61-61: Log level correctly set to INFO – good catch
Previous warning-level log has been adjusted to logger.info, matching reviewer feedback.

components/planner/src/dynamo/planner/utils/perf_interpolation.py (1)

31-45: Double-check TTFT units

DecodeInterpolator converts ITL from ms → s (line 97) but PrefillInterpolator leaves TTFT untouched. If the downstream planner expects both latency metrics in the same unit, this asymmetry will introduce subtle bugs.

Confirm the unit of prefill_ttft in the NPZ; convert to seconds if it is stored in milliseconds.

-        self.prefill_ttft = raw_data["prefill_ttft"]
+        self.prefill_ttft = raw_data["prefill_ttft"] / 1000  # ms → s

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6a9437e and b08ddfd.

📒 Files selected for processing (2)
  • container/Dockerfile.vllm (1 hunks)
  • docs/architecture/planner.md (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • container/Dockerfile.vllm
⏰ Context from checks skipped due to timeout of 90000ms (4)
  • GitHub Check: pre-merge-rust (.)
  • GitHub Check: pre-merge-rust (lib/bindings/python)
  • GitHub Check: pre-merge-rust (lib/runtime/examples)
  • GitHub Check: Build and Test - vllm

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: Hongkuan Zhou <[email protected]>
@tedzhouhk tedzhouhk merged commit 3f53a78 into main Jun 14, 2025
11 checks passed
@tedzhouhk tedzhouhk deleted the hzhou/sla_planner_v2 branch June 14, 2025 01:40
tedzhouhk added a commit that referenced this pull request Jun 26, 2025
Signed-off-by: Hongkuan Zhou <[email protected]>
Co-authored-by: hhzhang16 <[email protected]>
Co-authored-by: Alec <[email protected]>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>