Closed
25 commits
211bbab
fix: circular rust dynamo-parsers, dynamo-llm dependency (#3607) (#3…
saturley-hall Oct 14, 2025
0b6ef5f
chore: update the relevant my-registry and my-tag (#3611)
saturley-hall Oct 14, 2025
a25268d
chore: typo and new commands (#3617) (#3625)
saturley-hall Oct 14, 2025
a61a800
feat: cherry pick PR#3306 benchmarks use aiperf (#3626)
saturley-hall Oct 14, 2025
c4b41fd
feat: add pre-deployment check for storageclass (#3573) (#3608)
biswapanda Oct 14, 2025
58e063c
docs: Reorganize structure - move files to logical sections
athreesh Oct 15, 2025
7988582
docs: Update all internal links to new structure
athreesh Oct 15, 2025
1031848
docs: Fix hidden_toctree.rst paths
athreesh Oct 15, 2025
a4bd70e
docs: Add URL migration guide for external maintainers
athreesh Oct 15, 2025
798c337
docs: Add reorganization summary
athreesh Oct 15, 2025
3e72444
docs: Move kvbm/, planner/, router/ to top level
athreesh Oct 15, 2025
50321ff
docs: Add final structure documentation
athreesh Oct 15, 2025
013aa24
docs: Fix all internal links after docs reorganization
athreesh Oct 15, 2025
5a34105
docs: Fix broken relative link to planner docs
athreesh Oct 15, 2025
41c0585
docs: Remove leading slashes from README doc links
athreesh Oct 15, 2025
66f6083
docs: Fix all broken documentation links after reorganization
athreesh Oct 15, 2025
4e00ecc
Merge branch 'main' into docs-reorg
athreesh Oct 15, 2025
7d69a60
Merge branch 'main' into docs-reorg
athreesh Oct 15, 2025
165276f
chore: update sglang container and version (#3647)
ishandhanani Oct 15, 2025
ec47178
fix: cherrypick cuda 129 (#3652)
alec-flowers Oct 15, 2025
23c6273
Merge branch 'main' into docs-reorg
athreesh Oct 16, 2025
cc220c6
Merge branch 'docs-reorg' of https://github.com/ai-dynamo/dynamo into…
athreesh Oct 16, 2025
c62ac73
Merge remote-tracking branch 'origin/main' into docs-reorg
athreesh Oct 16, 2025
af01be2
Merge docs-reorg into release/0.6.0
athreesh Oct 16, 2025
b52f23a
docs: fix broken links after documentation reorganization
dagil-nvidia Oct 20, 2025
481 changes: 481 additions & 0 deletions DGD_ARCHITECTURE_ANALYSIS.md

Large diffs are not rendered by default.

6 changes: 3 additions & 3 deletions Earthfile
@@ -134,7 +134,7 @@ dynamo-build:

dynamo-base-docker:
ARG IMAGE=dynamo-base-docker
-ARG DOCKER_SERVER=my-registry
+ARG DOCKER_SERVER=nvcr.io/nvidia/ai-dynamo
ARG IMAGE_TAG=latest

FROM ubuntu:24.04
@@ -175,7 +175,7 @@ all-test:
BUILD ./deploy/cloud/operator+test

all-docker:
-ARG DOCKER_SERVER=my-registry
+ARG DOCKER_SERVER=nvcr.io/nvidia/ai-dynamo
ARG IMAGE_TAG=latest
BUILD ./deploy/cloud/operator+docker --DOCKER_SERVER=$DOCKER_SERVER --IMAGE_TAG=$IMAGE_TAG

@@ -189,6 +189,6 @@ all:

# For testing
custom:
-ARG DOCKER_SERVER=my-registry
+ARG DOCKER_SERVER=nvcr.io/nvidia/ai-dynamo
ARG IMAGE_TAG=latest
BUILD +all-test
8 changes: 4 additions & 4 deletions README.md
@@ -59,9 +59,9 @@ Dynamo is designed to be inference engine agnostic (supports TRT-LLM, vLLM, SGLa
| [**Disaggregated Serving**](/docs/architecture/disagg_serving.md) | ✅ | ✅ | ✅ |
| [**Conditional Disaggregation**](/docs/architecture/disagg_serving.md#conditional-disaggregation) | 🚧 | 🚧 | 🚧 |
| [**KV-Aware Routing**](/docs/architecture/kv_cache_routing.md) | ✅ | ✅ | ✅ |
-| [**Load Based Planner**](/docs/architecture/load_planner.md) | 🚧 | 🚧 | 🚧 |
-| [**SLA-Based Planner**](/docs/architecture/sla_planner.md) | ✅ | ✅ | ✅ |
-| [**KVBM**](/docs/architecture/kvbm_architecture.md) | ✅ | 🚧 | ✅ |
+| [**Load Based Planner**](docs/planner/load_planner.md) | 🚧 | 🚧 | 🚧 |
+| [**SLA-Based Planner**](docs/planner/sla_planner.md) | ✅ | ✅ | ✅ |
+| [**KVBM**](docs/kvbm/kvbm_architecture.md) | ✅ | 🚧 | ✅ |

To learn more about each framework and their capabilities, check out each framework's README!

@@ -74,7 +74,7 @@ Built in Rust for performance and in Python for extensibility, Dynamo is fully o
# Installation

The following examples require a few system level packages.
-Recommended to use Ubuntu 24.04 with a x86_64 CPU. See [docs/support_matrix.md](docs/support_matrix.md)
+Recommended to use Ubuntu 24.04 with a x86_64 CPU. See [docs/reference/support-matrix.md](docs/reference/support-matrix.md)

## 1. Initial setup

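The commit list includes "docs: Remove leading slashes from README doc links", and the README hunk above moves the planner and KVBM links to their new locations. A minimal sketch of a local check that relative Markdown links still resolve, assuming links use the inline [text](target) form; broken_relative_links is a hypothetical helper, not part of the repository.

```python
import re
from pathlib import Path

# Target of an inline Markdown link: [text](target)
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")
# A URL scheme such as "https:" or "mailto:" at the start of the target
SCHEME_RE = re.compile(r"^[a-z][a-z0-9+.-]*:")


def broken_relative_links(readme: Path) -> list[str]:
    """Return link targets in `readme` that do not resolve to an existing path.

    URLs with a scheme and pure in-page anchors ("#section") are skipped;
    a "#fragment" suffix is ignored when checking a file path.
    """
    text = readme.read_text(encoding="utf-8")
    broken = []
    for target in LINK_RE.findall(text):
        if SCHEME_RE.match(target):
            continue
        path_part = target.split("#", 1)[0]
        if not path_part:
            continue
        # A leading "/" makes pathlib discard readme.parent, so such links are
        # checked against the filesystem root and normally reported as broken.
        if not (readme.parent / path_part).exists():
            broken.append(target)
    return broken
```

Run from the repository root against README.md, this would list any remaining /docs/... targets alongside links whose files were moved.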
2 changes: 1 addition & 1 deletion benchmarks/incluster/benchmark_job.yaml
@@ -18,7 +18,7 @@ spec:
containers:
- name: benchmark-runner
# TODO: update to latest public image in next release
-image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:my-tag
+image: nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.0
securityContext:
allowPrivilegeEscalation: false
capabilities:
170 changes: 165 additions & 5 deletions benchmarks/profiler/profile_sla.py
@@ -13,6 +13,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.

+import argparse
import asyncio
import logging
import math
@@ -22,9 +23,13 @@
import yaml

from benchmarks.profiler.utils.aiperf import benchmark_decode, benchmark_prefill
-from benchmarks.profiler.utils.config import generate_dgd_config_with_planner
-from benchmarks.profiler.utils.config_modifiers import CONFIG_MODIFIERS
+from benchmarks.profiler.utils.config import (
+CONFIG_MODIFIERS,
+WORKER_COMPONENT_NAMES,
+generate_dgd_config_with_planner,
+)
from benchmarks.profiler.utils.estimate_perf import AIConfiguratorPerfEstimator
+from benchmarks.profiler.utils.planner_utils import add_planner_arguments_to_parser
from benchmarks.profiler.utils.plot import (
plot_decode_performance,
plot_prefill_performance,
@@ -44,12 +49,10 @@
profile_prefill,
profile_prefill_aiconfigurator,
)
-from benchmarks.profiler.utils.profiler_argparse import create_profiler_parser
from deploy.utils.dynamo_deployment import (
DynamoDeploymentClient,
cleanup_remaining_deployments,
)
-from dynamo.planner.defaults import WORKER_COMPONENT_NAMES

logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)
@@ -740,9 +743,166 @@ async def run_profile(args):
await cleanup_remaining_deployments(deployment_clients, args.namespace)
logger.info("Final cleanup completed.")

+# deploy the optimized DGD with planner
+if args.deploy_after_profile and not args.dry_run:
+logger.info("Deploying the optimized DGD with planner...")
+# TODO: check conflicts for dynamo namespace and DGD name
+# TODO: handle deployment errors and propagate proper error messages to users
+client = DynamoDeploymentClient(
+namespace=args.namespace,
+base_log_dir=f"{args.output_dir}/final_deployment",
+model_name=model_name,
+service_name=args.service_name,
+frontend_port=frontend_port,
+deployment_name=config["metadata"]["name"],
+)
+await client.create_deployment(f"{args.output_dir}/config_with_planner.yaml")

Comment on lines +746 to +760


⚠️ Potential issue | 🔴 Critical

Critical: Risk of undefined variables and missing deployment verification.

This post-profile deployment logic has several critical issues:

  1. Undefined variable risk: The variables model_name (line 754), frontend_port (line 756), and config (line 757) are assigned inside the try block (lines 121, 171, 721). If an exception prevents those assignments and execution still reaches this block, the lookup fails with an UnboundLocalError (a subclass of NameError).

  2. No deployment verification: The deployment is created (line 759) but there's no await client.wait_for_deployment_ready() call. This means the function returns without confirming the deployment is functional.

  3. No error handling: There's no try-except block around the deployment creation. Deployment failures will either crash the script or leave broken deployments.

  4. No cleanup mechanism: Unlike the deployments in the try block which are tracked in deployment_clients and cleaned up in the finally block, this deployment is never added to that list and has no cleanup path.

  5. Incomplete implementation: The TODO comments (lines 749-750) confirm that conflict checking and error handling are missing.
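Item 1 rests on Python's scoping rules: a name assigned anywhere in a function is local to that function, so reading it before any assignment has run raises UnboundLocalError. This only happens if an except clause lets execution continue past the try; with a bare try/finally the exception propagates and the deploy block is never reached. The function below is a hypothetical reduction of that control flow, not code from profile_sla.py.

```python
def run_profile_like(fail_early: bool) -> str:
    # Hypothetical reduction of run_profile: model_name is assigned only
    # inside the try block, mirroring the variables flagged in item 1.
    try:
        if fail_early:
            raise RuntimeError("profiling failed before model_name was assigned")
        model_name = "example-model"
    except RuntimeError:
        # Swallowing the error lets execution fall through to the read below.
        pass
    # Raises UnboundLocalError (a NameError subclass) when fail_early is True.
    return model_name
```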

Consider this structure:

+    # deploy the optimized DGD with planner
+    if args.deploy_after_profile and not args.dry_run:
+        client = None
+        try:
+            logger.info("Deploying the optimized DGD with planner...")
+            client = DynamoDeploymentClient(
+                namespace=args.namespace,
+                base_log_dir=f"{args.output_dir}/final_deployment",
+                model_name=model_name,
+                service_name=args.service_name,
+                frontend_port=frontend_port,
+                deployment_name=config["metadata"]["name"],
+            )
+            await client.create_deployment(f"{args.output_dir}/config_with_planner.yaml")
+            logger.info("Waiting for final deployment to be ready...")
+            await client.wait_for_deployment_ready(timeout=1800)
+            logger.info("Final deployment is ready and operational")
+        except Exception as e:
+            logger.error(f"Failed to deploy optimized DGD with planner: {e}")
+            if client is not None:
+                logger.info("Attempting to clean up failed deployment...")
+                try:
+                    await client.delete_deployment()
+                except Exception as cleanup_error:
+                    logger.warning(f"Failed to clean up deployment: {cleanup_error}")
+            raise

Additionally, move this block inside the try block (before line 736) or ensure all required variables are defined with fallback values before this block executes.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In benchmarks/profiler/profile_sla.py around lines 746 to 760, the post-profile
deployment uses variables (model_name, frontend_port, config) that may be
undefined, lacks error handling, verification and cleanup registration; move
this deployment block inside the existing try block (or ensure those variables
have safe defaults set earlier), add a pre-deploy check for namespace/DGD name
conflicts, wrap client.create_deployment in a try/except to log and surface
errors, call await client.wait_for_deployment_ready() after creation to verify
readiness, and append the client to deployment_clients so it will be cleaned up
in the finally block.
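The review's suggested structure can be exercised without a cluster by substituting a stub client. A minimal sketch, assuming the client exposes the create_deployment, wait_for_deployment_ready, and delete_deployment coroutines named in the review; FakeDeploymentClient and deploy_final are hypothetical names. Binding client = None before the try keeps the cleanup path safe when the constructor itself raises.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


class FakeDeploymentClient:
    """Hypothetical stand-in for DynamoDeploymentClient (stub methods only)."""

    def __init__(self, fail_ready: bool = False) -> None:
        self.fail_ready = fail_ready
        self.created = False
        self.deleted = False

    async def create_deployment(self, path: str) -> None:
        self.created = True

    async def wait_for_deployment_ready(self, timeout: int = 1800) -> None:
        if self.fail_ready:
            raise TimeoutError(f"deployment not ready after {timeout}s")

    async def delete_deployment(self) -> None:
        self.deleted = True


async def deploy_final(make_client, config_path: str):
    """Create and verify the final deployment; clean up only on failure."""
    client = None  # bound before try, so the except block can test it safely
    try:
        client = make_client()
        await client.create_deployment(config_path)
        await client.wait_for_deployment_ready(timeout=1800)
        return client  # success: leave the deployment running
    except Exception as e:
        logger.error("Failed to deploy optimized DGD with planner: %s", e)
        if client is not None:
            try:
                await client.delete_deployment()
            except Exception as cleanup_error:
                logger.warning("Failed to clean up deployment: %s", cleanup_error)
        raise
```

The sketch deletes the deployment only when creation or readiness fails. Registering the client in deployment_clients, as the prompt above proposes, would likely place it on the path that the finally block tears down, which conflicts with keeping the optimized deployment running after profiling.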


if __name__ == "__main__":
-args = create_profiler_parser()
+parser = argparse.ArgumentParser(
+description="Profile the TTFT and ITL of the Prefill and Decode engine with different parallelization mapping. When profiling prefill we mock/fix decode,when profiling decode we mock/fix prefill."
+)
+parser.add_argument(
+"--namespace",
+type=str,
+default="dynamo-sla-profiler",
+help="Kubernetes namespace to deploy the DynamoGraphDeployment",
+)
+parser.add_argument(
+"--backend",
+type=str,
+default="vllm",
+choices=["vllm", "sglang", "trtllm"],
+help="backend type, currently support [vllm, sglang, trtllm]",
+)
+parser.add_argument(
+"--config",
+type=str,
+required=True,
+help="Path to the DynamoGraphDeployment config file",
+)
+parser.add_argument(
+"--output-dir",
+type=str,
+default="profiling_results",
+help="Path to the output results directory",
+)
+parser.add_argument(
+"--min-num-gpus-per-engine",
+type=int,
+default=1,
+help="minimum number of GPUs per engine",
+)
+parser.add_argument(
+"--max-num-gpus-per-engine",
+type=int,
+default=8,
+help="maximum number of GPUs per engine",
+)
+parser.add_argument(
+"--skip-existing-results",
+action="store_true",
+help="Skip TP sizes that already have results in the output directory",
+)
+parser.add_argument(
+"--force-rerun",
+action="store_true",
+help="Force re-running all tests even if results already exist (overrides --skip-existing-results)",
+)
+parser.add_argument(
+"--isl", type=int, default=3000, help="target input sequence length"
+)
+parser.add_argument(
+"--osl", type=int, default=500, help="target output sequence length"
+)
+parser.add_argument(
+"--ttft", type=int, default=50, help="target Time To First Token in ms"
+)
+parser.add_argument(
+"--itl", type=int, default=10, help="target Inter Token Latency in ms"
+)
+
+# arguments used for interpolating TTFT and ITL under different ISL/OSL
+parser.add_argument(
+"--max-context-length",
+type=int,
+default=16384,
+help="maximum context length supported by the served model",
+)
+parser.add_argument(
+"--prefill-interpolation-granularity",
+type=int,
+default=16,
+help="how many samples to benchmark to interpolate TTFT under different ISL",
+)
+parser.add_argument(
+"--decode-interpolation-granularity",
+type=int,
+default=6,
+help="how many samples to benchmark to interpolate ITL under different active kv cache size and decode context length",
+)
+parser.add_argument(
+"--service-name",
+type=str,
+default="",
+help="Service name for port forwarding (default: {deployment_name}-frontend)",
+)
+parser.add_argument(
+"--dry-run",
+action="store_true",
+help="Dry run the profile job",
+)
+parser.add_argument(
+"--is-moe-model",
+action="store_true",
+dest="is_moe_model",
+help="Enable MoE (Mixture of Experts) model support, use TEP for prefill and DEP for decode",
+)
+parser.add_argument(
+"--num-gpus-per-node",
+type=int,
+default=8,
+help="Number of GPUs per node for MoE models - this will be the granularity when searching for the best TEP/DEP size",
+)
+
+# arguments for dgd config generation and deployment
+parser.add_argument(
+"--deploy-after-profile",
+action="store_true",
+help="deploy the optimized DGD with planner",
+)
+# Dynamically add all planner arguments from planner_argparse.py
+add_planner_arguments_to_parser(parser, prefix="planner-")
+
+# arguments if using aiconfigurator
+parser.add_argument(
+"--use-ai-configurator",
+action="store_true",
+help="Use ai-configurator to estimate benchmarking results instead of running actual deployment.",
+)
+parser.add_argument(
+"--aic-system",
+type=str,
+help="Target system for use with aiconfigurator (e.g. h100_sxm, h200_sxm)",
+)
+parser.add_argument(
+"--aic-model-name",
+type=str,
+help="aiconfigurator name of the target model (e.g. QWEN3_32B, DEEPSEEK_V3)",
+)
+parser.add_argument(
+"--aic-backend",
+type=str,
+default="",
+help="aiconfigurator backend of the target model, if not provided, will use args.backend",
+)
+parser.add_argument(
+"--aic-backend-version",
+type=str,
+help="Specify backend version when using aiconfigurator to estimate perf.",
+)
+args = parser.parse_args()

# setup file logging
os.makedirs(args.output_dir, exist_ok=True)
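Three of the help strings in the argparse block describe fallbacks rather than fixed defaults: --service-name falls back to {deployment_name}-frontend, --aic-backend falls back to --backend, and --force-rerun overrides --skip-existing-results. A minimal sketch of resolving them after parse_args(), on a reduced parser that declares only those flags; build_subset_parser and resolve_profiler_defaults are hypothetical, and the profiler may apply these rules elsewhere or differently. The deployment name is passed in explicitly here; the deploy block takes it from config["metadata"]["name"].

```python
import argparse


def build_subset_parser() -> argparse.ArgumentParser:
    # Only the flags whose help text describes a fallback, copied from the diff.
    parser = argparse.ArgumentParser()
    parser.add_argument("--backend", type=str, default="vllm", choices=["vllm", "sglang", "trtllm"])
    parser.add_argument("--service-name", type=str, default="")
    parser.add_argument("--aic-backend", type=str, default="")
    parser.add_argument("--skip-existing-results", action="store_true")
    parser.add_argument("--force-rerun", action="store_true")
    return parser


def resolve_profiler_defaults(args: argparse.Namespace, deployment_name: str) -> argparse.Namespace:
    # An empty string means "not provided" for both string flags.
    if not args.service_name:
        args.service_name = f"{deployment_name}-frontend"
    if not args.aic_backend:
        args.aic_backend = args.backend
    # --force-rerun takes precedence over --skip-existing-results.
    if args.force_rerun:
        args.skip_existing_results = False
    return args
```

argparse derives the attribute names (service_name, aic_backend, skip_existing_results, force_rerun) by replacing dashes with underscores, which is why the resolver reads them that way.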
2 changes: 1 addition & 1 deletion benchmarks/profiler/utils/config.py
@@ -106,7 +106,7 @@ class DgdPlannerServiceConfig(BaseModel):
volumeMounts: list[VolumeMount] = [VolumeMount()]
extraPodSpec: PodSpec = PodSpec(
mainContainer=Container(
-image="my-registry/dynamo-runtime:my-tag", # placeholder
+image="nvcr.io/nvidia/ai-dynamo/dynamo-runtime:0.6.0", # placeholder
workingDir="/workspace/components/src/dynamo/planner",
command=["python3", "-m", "planner_sla"],
args=[],
4 changes: 2 additions & 2 deletions components/backends/sglang/deploy/README.md
@@ -61,7 +61,7 @@ resources:
```yaml
extraPodSpec:
mainContainer:
-image: my-registry/sglang-runtime:my-tag
+image: nvcr.io/nvidia/ai-dynamo/sglang-runtime:0.6.0
workingDir: /workspace/components/backends/sglang
args:
- "python3"
@@ -92,7 +92,7 @@ Edit the template to match your environment:

```yaml
# Update image registry and tag
-image: my-registry/sglang-runtime:my-tag
+image: nvcr.io/nvidia/ai-dynamo/sglang-runtime:0.6.0

# Configure your model
args:
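Most of the image hunks in this PR make the same mechanical substitution: a my-registry/<image>:my-tag placeholder becomes nvcr.io/nvidia/ai-dynamo/<image>:0.6.0. A minimal sketch of that rewrite with a regular expression, assuming placeholders always take exactly that form; pin_images is a hypothetical helper. The Earthfile's bare ARG DOCKER_SERVER=my-registry default does not match the pattern and would need separate handling.

```python
import re

# Matches "my-registry/<image>:my-tag" and captures the image name.
PLACEHOLDER_RE = re.compile(r"my-registry/([A-Za-z0-9._-]+):my-tag")


def pin_images(text: str, registry: str = "nvcr.io/nvidia/ai-dynamo", tag: str = "0.6.0") -> str:
    """Replace every placeholder image reference with registry/image:tag."""
    return PLACEHOLDER_RE.sub(lambda m: f"{registry}/{m.group(1)}:{tag}", text)
```

Passing a different registry and tag points a template at your own images, which is the edit the sglang README asks for under "Update image registry and tag".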
4 changes: 2 additions & 2 deletions components/backends/sglang/deploy/agg.yaml
@@ -13,7 +13,7 @@ spec:
replicas: 1
extraPodSpec:
mainContainer:
-image: my-registry/sglang-runtime:my-tag
+image: nvcr.io/nvidia/ai-dynamo/sglang-runtime:0.6.0
decode:
envFromSecret: hf-token-secret
dynamoNamespace: sglang-agg
@@ -24,7 +24,7 @@ spec:
gpu: "1"
extraPodSpec:
mainContainer:
-image: my-registry/sglang-runtime:my-tag
+image: nvcr.io/nvidia/ai-dynamo/sglang-runtime:0.6.0
workingDir: /workspace/components/backends/sglang
command:
- python3
4 changes: 2 additions & 2 deletions components/backends/sglang/deploy/agg_logging.yaml
@@ -16,7 +16,7 @@ spec:
replicas: 1
extraPodSpec:
mainContainer:
-image: my-registry/sglang-runtime:my-tag
+image: nvcr.io/nvidia/ai-dynamo/sglang-runtime:0.6.0
decode:
envFromSecret: hf-token-secret
dynamoNamespace: sglang-agg
@@ -27,7 +27,7 @@ spec:
gpu: "1"
extraPodSpec:
mainContainer:
-image: my-registry/sglang-runtime:my-tag
+image: nvcr.io/nvidia/ai-dynamo/sglang-runtime:0.6.0
workingDir: /workspace/components/backends/sglang
command:
- python3
4 changes: 2 additions & 2 deletions components/backends/sglang/deploy/agg_router.yaml
@@ -13,7 +13,7 @@ spec:
replicas: 1
extraPodSpec:
mainContainer:
-image: my-registry/sglang-runtime:my-tag
+image: nvcr.io/nvidia/ai-dynamo/sglang-runtime:0.6.0
envs:
- name: DYN_ROUTER_MODE
value: kv
@@ -27,7 +27,7 @@ spec:
gpu: "1"
extraPodSpec:
mainContainer:
-image: my-registry/sglang-runtime:my-tag
+image: nvcr.io/nvidia/ai-dynamo/sglang-runtime:0.6.0
workingDir: /workspace/components/backends/sglang
command:
- python3
6 changes: 3 additions & 3 deletions components/backends/sglang/deploy/disagg-multinode.yaml
@@ -22,7 +22,7 @@ spec:
replicas: 1
extraPodSpec:
mainContainer:
-image: my-registry/sglang-runtime:my-tag
+image: nvcr.io/nvidia/ai-dynamo/sglang-runtime:0.6.0
decode:
multinode:
nodeCount: 2
@@ -35,7 +35,7 @@ spec:
gpu: "4"
extraPodSpec:
mainContainer:
-image: my-registry/sglang-runtime:my-tag
+image: nvcr.io/nvidia/ai-dynamo/sglang-runtime:0.6.0
workingDir: /workspace/components/backends/sglang
command:
- python3
@@ -72,7 +72,7 @@ spec:
gpu: "4"
extraPodSpec:
mainContainer:
-image: my-registry/sglang-runtime:my-tag
+image: nvcr.io/nvidia/ai-dynamo/sglang-runtime:0.6.0
workingDir: /workspace/components/backends/sglang
command:
- python3
6 changes: 3 additions & 3 deletions components/backends/sglang/deploy/disagg.yaml
@@ -13,7 +13,7 @@ spec:
replicas: 1
extraPodSpec:
mainContainer:
-image: my-registry/sglang-runtime:my-tag
+image: nvcr.io/nvidia/ai-dynamo/sglang-runtime:0.6.0
decode:
envFromSecret: hf-token-secret
dynamoNamespace: sglang-disagg
@@ -25,7 +25,7 @@ spec:
gpu: "1"
extraPodSpec:
mainContainer:
-image: my-registry/sglang-runtime:my-tag
+image: nvcr.io/nvidia/ai-dynamo/sglang-runtime:0.6.0
workingDir: /workspace/components/backends/sglang
command:
- python3
@@ -61,7 +61,7 @@ spec:
gpu: "1"
extraPodSpec:
mainContainer:
-image: my-registry/sglang-runtime:my-tag
+image: nvcr.io/nvidia/ai-dynamo/sglang-runtime:0.6.0
workingDir: /workspace/components/backends/sglang
command:
- python3