fix: fix how model card is found in router bindings by atchernych · Pull Request #3753 · ai-dynamo/dynamo

atchernych · 2025-10-20T21:23:24Z

Overview:

Details:

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

closes GitHub issue: #xxx

Summary by CodeRabbit

New Features
- Added pre-deployment validation scripts for Kubernetes cluster checks
- Added AIPerf benchmarking tool integration for performance profiling
- Enhanced SGLang disaggregation documentation and deployment configurations
Bug Fixes & Improvements
- Standardized container image registry to nvcr.io/nvidia/ai-dynamo
- Pinned container versions to 0.6.0 for consistency
- Converted timing metrics (TTFT/ITL) from seconds to milliseconds
- Added Git LFS support for Python dependencies
- Updated AIConfigurator to use official repository
Documentation
- Expanded deployment guides and pre-deployment setup documentation
- Updated benchmarking references from GenAI-Perf to AIPerf
- Added NIXL benchmark deployment guide
Chores
- Updated all deployment manifests with standardized image references
- Replaced GenAI-Perf references with AIPerf throughout codebase

coderabbitai · 2025-10-20T21:29:41Z

Caution

Review failed

The pull request is closed.

Walkthrough

This pull request integrates multiple major updates: (1) enhances the Docker build GitHub Action with configurable build parameters (base image tag, runtime image tag, CUDA version, Torch backend), (2) migrates all benchmarking from genai-perf to aiperf, (3) transitions container image registries from generic (my-registry) to NVIDIA's official (nvcr.io/nvidia/ai-dynamo) with version 0.6.0, (4) updates TTFT/ITL units from seconds to milliseconds, and (5) adds Kubernetes pre-deployment validation infrastructure.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| GitHub Actions & Workflows `.github/actions/docker-build/action.yml`, `.github/workflows/container-validation-backends.yml` | Added four optional input parameters (base_image_tag, runtime_image_tag, cuda_version, torch_backend) to the Docker build action for propagating build-time overrides. Updated container-validation workflow to include new operator job with arm64-specific conditional tag inputs and dependencies. |
| Build & Registry Configuration `Earthfile`, `deploy/cloud/operator/Earthfile`, `container/deps/requirements.txt` | Updated DOCKER_SERVER registry from my-registry to nvcr.io/nvidia/ai-dynamo. Added UV_GIT_LFS=1 flag during Python package installation. Updated aiconfigurator dependency to git-sourced version and added aiperf as git dependency; updated pydantic and scipy constraints. |
| Benchmarking Tool Migration `benchmarks/llm/perf.sh`, `benchmarks/llm/plot_pareto.py`, `benchmarks/profiler/utils/aiperf.py`, `benchmarks/utils/aiperf.py`, `benchmarks/router/prefix_ratio_benchmark.py`, `benchmarks/router/real_data_benchmark.py`, `benchmarks/utils/workflow.py`, `benchmarks/utils/plot.py`, `benchmarks/profiler/profile_sla.py`, `benchmarks/profiler/utils/profile_cache.py`, `benchmarks/profiler/utils/profile_decode.py`, `benchmarks/profiler/utils/profile_prefill.py` | Comprehensive migration from genai-perf to aiperf: renamed functions, updated CLI commands, replaced artifact paths from profile_export_genai_perf.json to profile_export_aiperf.json, removed --max-threads arguments, added aggregate_results function for router benchmarks. Updated CLI parameter types for TTFT/ITL from int to float. |
| Benchmarking Configuration & Utils `benchmarks/profiler/utils/config.py`, `benchmarks/profiler/utils/estimate_perf.py`, `benchmarks/profiler/deploy/profile_sla_aic_job.yaml` | Guarded kv_cache_config initialization in convert_config for multiple backends; improved break_arguments to preserve JSON-like values; added backend argument to aiconfigurator.sdk.models.get_model call. Renamed --backend-version flag to --aic-backend-version. Updated DgdPlannerServiceConfig image from my-registry/dynamo-runtime:my-tag to nvcr.io/nvidia/ai-dynamo/dynamo-runtime:0.6.0. |
| vLLM Runtime Deployments `components/backends/vllm/deploy/agg.yaml`, `components/backends/vllm/deploy/agg_kvbm.yaml`, `components/backends/vllm/deploy/agg_router.yaml`, `components/backends/vllm/deploy/disagg.yaml`, `components/backends/vllm/deploy/disagg-multinode.yaml`, `components/backends/vllm/deploy/disagg_kvbm.yaml`, `components/backends/vllm/deploy/disagg_kvbm_2p2d.yaml`, `components/backends/vllm/deploy/disagg_kvbm_tp2.yaml`, `components/backends/vllm/deploy/disagg_planner.yaml`, `components/backends/vllm/deploy/disagg_router.yaml`, `components/backends/vllm/deploy/README.md`, `components/backends/vllm/launch/dsr1_dep.sh` | Updated container image tags from nvcr.io/nvidia/ai-dynamo/vllm-runtime:my-tag to nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.0 across all deployments. Changed max-model-len from 10240 to 4096 and gpu-memory-utilization from 0.95 to 0.9 in dsr1_dep.sh. |
| SGLang Runtime Deployments `components/backends/sglang/deploy/agg.yaml`, `components/backends/sglang/deploy/agg_logging.yaml`, `components/backends/sglang/deploy/agg_router.yaml`, `components/backends/sglang/deploy/disagg.yaml`, `components/backends/sglang/deploy/disagg-multinode.yaml`, `components/backends/sglang/deploy/disagg_planner.yaml`, `components/backends/sglang/deploy/README.md` | Updated container images from my-registry/sglang-runtime:my-tag to nvcr.io/nvidia/ai-dynamo/sglang-runtime:0.6.0. Added --host 0.0.0.0 and --disaggregation-bootstrap-port 12345 flags to disaggregation configurations. |
| SGLang Launch & Documentation `components/backends/sglang/launch/disagg_dp_attn.sh`, `docs/backends/sglang/README.md`, `docs/backends/sglang/dsr1-wideep-h100.md`, `docs/backends/sglang/dsr1-wideep-gb200.md`, `docs/backends/sglang/multinode-examples.md`, `docs/backends/sglang/sglang-disaggregation.md` | Removed expert-distribution-recorder configuration; replaced with load-balance-method and prefill-round-robin-balance flags. Added comprehensive SGLang disaggregation documentation. Updated build and deployment guidance with multi-arch and BRANCH_TYPE support. |
| TensorRT-LLM Runtime Deployments `components/backends/trtllm/deploy/agg.yaml`, `components/backends/trtllm/deploy/agg_router.yaml`, `components/backends/trtllm/deploy/disagg.yaml`, `components/backends/trtllm/deploy/disagg-multinode.yaml`, `components/backends/trtllm/deploy/disagg_planner.yaml`, `components/backends/trtllm/deploy/disagg_router.yaml`, `components/backends/trtllm/deploy/agg-with-config.yaml`, `components/backends/trtllm/deploy/README.md`, `components/backends/trtllm/performance_sweeps/...` | Updated container images to nvcr.io/nvidia/ai-dynamo/trtllm-runtime:0.6.0. Migrated benchmarking artifacts from genai_perf_artifacts to aiperf_artifacts and updated related file patterns. |
| Container Dockerfiles `container/Dockerfile`, `container/Dockerfile.sglang`, `container/Dockerfile.sglang-wideep`, `container/Dockerfile.trtllm`, `container/Dockerfile.vllm`, `container/deps/vllm/install_vllm.sh`, `deploy/cloud/operator/Dockerfile` | Added git-lfs package, UV_GIT_LFS=1 environment variable during pip installs, and --no-cache flags. Refactored Dockerfile.sglang-wideep with BRANCH_TYPE conditional logic for local/remote Dynamo installation. Updated vLLM PyPI installation condition to support arm64 with cu129 backend. Removed TARGETARCH default in operator Dockerfile. |
| Timing Units (TTFT/ITL) `benchmarks/profiler/profile_sla.py`, `components/src/dynamo/planner/defaults.py`, `components/src/dynamo/planner/utils/perf_interpolation.py`, `components/src/dynamo/planner/utils/planner_argparse.py`, `components/src/dynamo/planner/utils/planner_core.py` | Changed TTFT and ITL units from seconds to milliseconds: updated defaults from 0.5/0.05 to 500.0/50.0, added millisecond unit specifications to help text, converted Prometheus metrics by multiplying by 1000 in planner_core. |
| Kubernetes Pre-deployment Infrastructure `deploy/cloud/pre-deployment/pre-deployment-check.sh`, `deploy/cloud/pre-deployment/nixl/build_and_deploy.sh`, `deploy/cloud/pre-deployment/nixl/nixlbench-deployment.yaml`, `deploy/cloud/pre-deployment/nixl/README.md`, `deploy/cloud/pre-deployment/README.md` | Added comprehensive pre-deployment validation script (kubectl, StorageClass, GPU nodes, GPU operator checks). New NIXL build and deploy helper script with dependency validation, architecture selection, and workflow orchestration. Updated NIXL deployment YAML with ETCD configuration and resource specifications. |
| Example Deployments & Configurations `benchmarks/incluster/benchmark_job.yaml`, `benchmarks/nixl/README.md`, `components/backends/sglang/slurm_jobs/scripts/gap/bench.sh`, `examples/basics/kubernetes//agg_router.yaml`, `examples/custom_backend/hello_world/deploy/hello_world.yaml`, `examples/deployments/ECS/task_definition_.json`, `examples/deployments/router_standalone/perf.sh`, `examples/multimodal/deploy/agg_*.yaml` | Updated container image tags to 0.6.0. Migrated benchmarking from genai-perf to aiperf in bench scripts. Removed NIXL benchmark README. Updated ECS task definitions and router deployments with new image versions and aiperf integration. |
| Documentation Updates `README.md`, `benchmarks/README.md`, `benchmarks/router/README.md`, `benchmarks/profiler/deploy/README.md`, `components/backends/trtllm/performance_sweeps/README.md`, `docs/backends/trtllm/README.md`, `docs/backends/trtllm/gpt-oss.md`, `docs/benchmarks/benchmarking.md`, `docs/backends/trtllm/gpt-oss.md`, `docs/backends/sglang/README.md`, `docs/kubernetes/README.md`, `docs/kubernetes/create_deployment.md`, `docs/kubernetes/sla_planner_quickstart.md`, `docs/guides/disagg_perf_tuning.md`, `examples/basics/kubernetes/Distributed_Inference/README.md`, `examples/deployments/router_standalone/README.md` | Global terminology updates from GenAI-Perf to AIPerf in all benchmarking documentation. Updated example image references from my-tag to 0.6.0. Added pre-deployment check section to Kubernetes docs. Updated timing units in examples from seconds to milliseconds. |
| Sphinx Documentation Configuration `docs/conf.py`, `docs/project.json`, `docs/versions1.json`, `docs/hidden_toctree.rst`, `docs/_includes/install.rst` | Added release variable and version switcher configuration with Adobe Launch/Satellite scripts. Created project.json and versions1.json for version management. Added sglang-disaggregation.md to hidden toctree. Pinned ai-dynamo sglang dependency to 0.6.0 in install docs. |
| Source Code Updates `benchmarks/profiler/profile_endpoint.py`, `components/src/dynamo/sglang/request_handlers/handler_base.py` | Updated import paths to full module paths for profile utilities. Changed logging level from info to debug for cancellation monitor task creation. |
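
The "preserve JSON-like values" change to break_arguments noted in the table can be pictured with a small sketch. This is only an illustration of the idea, not the code in `benchmarks/profiler/utils/config.py`: the helper names `looks_like_json` and `split_preserving_json` are hypothetical, and the rule "split on whitespace unless the value parses as a JSON object or array" is an assumption about the intended behaviour.

```python
import json
from typing import List


def looks_like_json(value: str) -> bool:
    """Return True if the string parses as a JSON object or array."""
    text = value.strip()
    if not text or text[0] not in "{[":
        return False
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


def split_preserving_json(args: List[str]) -> List[str]:
    """Split whitespace-separated arguments, keeping JSON values whole.

    Hypothetical helper: it models the idea of the fix, not the real
    break_arguments implementation.
    """
    out: List[str] = []
    for arg in args:
        if looks_like_json(arg):
            out.append(arg)          # keep the JSON string intact
        else:
            out.extend(arg.split())  # ordinary "--flag value" splitting
    return out
```

With this sketch, `split_preserving_json(["--model foo", '{"a": 1}'])` splits the first element into two tokens and leaves the JSON string as a single argument.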

Sequence Diagram(s)

sequenceDiagram
    participant GHA as GitHub Action<br/>(docker-build)
    participant Workflow as Workflow<br/>(container-validation)
    participant BuildScript as build.sh
    participant Docker as Docker Build
    
    Workflow->>GHA: trigger with base_image_tag,<br/>runtime_image_tag,<br/>cuda_version,<br/>torch_backend
    GHA->>GHA: collect inputs into<br/>EXTRA_ARGS string
    GHA->>BuildScript: invoke with EXTRA_ARGS<br/>--base-image-tag VALUE<br/>--build-arg RUNTIME_IMAGE_TAG=VALUE<br/>etc.
    BuildScript->>Docker: pass EXTRA_ARGS to<br/>docker build command
    Docker->>Docker: apply overrides during<br/>image build process
    Docker-->>GHA: return image_tag
    GHA-->>Workflow: output image_tag to<br/>GITHUB_OUTPUT
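
The input-collection step in the diagram above can be modelled roughly as follows. The real action is a composite GitHub Action driving build.sh, so this Python function is only a sketch of the logic. The `--base-image-tag` flag and the `RUNTIME_IMAGE_TAG` build argument come from the diagram; the `CUDA_VERSION` and `TORCH_BACKEND` build-argument names are assumptions, since the diagram only says "etc." for them.

```python
from typing import List, Optional


def build_extra_args(base_image_tag: Optional[str] = None,
                     runtime_image_tag: Optional[str] = None,
                     cuda_version: Optional[str] = None,
                     torch_backend: Optional[str] = None) -> str:
    """Assemble an EXTRA_ARGS string from the optional inputs.

    Inputs that are unset (None or empty) contribute nothing, so the
    build falls back to its own defaults.
    """
    parts: List[str] = []
    if base_image_tag:
        parts += ["--base-image-tag", base_image_tag]
    if runtime_image_tag:
        parts += ["--build-arg", f"RUNTIME_IMAGE_TAG={runtime_image_tag}"]
    # Assumed build-arg names; the source does not spell these out.
    if cuda_version:
        parts += ["--build-arg", f"CUDA_VERSION={cuda_version}"]
    if torch_backend:
        parts += ["--build-arg", f"TORCH_BACKEND={torch_backend}"]
    return " ".join(parts)
```

Keeping every input optional means existing callers of the action are unaffected, which matches the "optional input parameters" wording in the change summary.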

sequenceDiagram
    participant Client
    participant Script as perf.sh
    participant AIPerf as AIPerf Tool
    participant Artifacts as Result Artifacts
    
    Note over Client,Artifacts: Old (GenAI-Perf Flow)
    Client->>Script: invoke perf.sh
    Script->>Script: build genai-perf command
    Script->>GenAI: run genai-perf profile
    GenAI-->>Artifacts: write profile_export_genai_perf.json
    
    Note over Client,Artifacts: New (AIPerf Flow)
    Client->>Script: invoke perf.sh
    Script->>Script: build aiperf command<br/>(removed --max-threads)
    Script->>AIPerf: run aiperf profile
    AIPerf-->>Artifacts: write profile_export_aiperf.json
    Script->>Artifacts: parse results (TTFT, ITL in ms)
    Script-->>Client: return aggregated metrics
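
One practical consequence of the flow above is that result-parsing code has to look for a different export file name. A minimal sketch of locating the new exports is shown below. The two file names are taken from the change summary; the function `find_exports` and the assumption that exports sit somewhere beneath a single artifact directory are hypothetical.

```python
from pathlib import Path
from typing import List

NEW_EXPORT = "profile_export_aiperf.json"      # written by AIPerf
OLD_EXPORT = "profile_export_genai_perf.json"  # written by GenAI-Perf


def find_exports(artifact_dir: str, name: str = NEW_EXPORT) -> List[Path]:
    """Recursively collect export files with the given name.

    Hypothetical helper: the directory layout is an assumption, so the
    search simply walks everything under artifact_dir.
    """
    return sorted(Path(artifact_dir).rglob(name))
```

Calling `find_exports(path, OLD_EXPORT)` on an existing results directory is one way to spot artifacts produced before the migration.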

sequenceDiagram
    participant Prometheus as Prometheus<br/>(seconds)
    participant PlannerCore as planner_core.py
    participant SLA as SLA Calculator
    
    Prometheus->>PlannerCore: observe_metrics()<br/>returns ttft, itl in seconds
    PlannerCore->>PlannerCore: convert to ms<br/>ttft_ms = ttft * 1000<br/>itl_ms = itl * 1000
    Note over PlannerCore: Log message updated<br/>to show .2f ms units
    PlannerCore->>SLA: pass ttft_ms, itl_ms<br/>(milliseconds)
    SLA->>SLA: compare against<br/>defaults (500ms, 50ms)
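
The unit handling in the diagram above can be sketched as follows. This is a minimal illustration rather than the code in planner_core.py: the names `to_ms` and `meets_sla` are hypothetical, and treating "compare against defaults" as a less-than-or-equal check is an assumption. Only the factor of 1000 and the 500.0 / 50.0 ms defaults come from the change summary.

```python
# Hypothetical sketch; names are illustrative, not from planner_core.py.
DEFAULT_TTFT_MS = 500.0  # default TTFT target in milliseconds
DEFAULT_ITL_MS = 50.0    # default ITL target in milliseconds


def to_ms(seconds: float) -> float:
    """Convert a latency observed in seconds to milliseconds."""
    return seconds * 1000.0


def meets_sla(ttft_s: float, itl_s: float,
              ttft_target_ms: float = DEFAULT_TTFT_MS,
              itl_target_ms: float = DEFAULT_ITL_MS) -> bool:
    """Compare second-based observations against millisecond targets."""
    ttft_ms = to_ms(ttft_s)
    itl_ms = to_ms(itl_s)
    # Log with explicit units so a seconds/milliseconds mix-up is visible.
    print(f"ttft={ttft_ms:.2f} ms, itl={itl_ms:.2f} ms")
    return ttft_ms <= ttft_target_ms and itl_ms <= itl_target_ms
```

Converting once at the boundary, where metrics enter the planner, keeps every later comparison in a single unit.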

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Rationale: This PR involves substantial heterogeneous changes across multiple domains:

- Scope: 100+ files touched with varied types of edits (CI/CD, Docker configurations, benchmarking tool migration, documentation, source code, Kubernetes manifests)
- Logic density: Several non-trivial logic changes (guarded kv_cache_config initialization in config.py, conditional vLLM installation logic, BRANCH_TYPE-driven Dockerfile logic, Kubernetes pre-deployment validation with multiple checks)
- Heterogeneity: Mixed pattern changes (benchmarking migration has repetitive string replacements but also semantic function renames and new utilities; image tag updates are homogeneous; unit conversions are systematic; documentation is mostly repetitive)
- Risky areas: Dockerfile.sglang-wideep significant refactoring; planner timing unit changes affecting SLA calculations; vLLM installation conditional logic; benchmarking aggregation logic
- Testing surface: Changes span CI/CD workflows, deployment manifests, benchmarking scripts, and core planner logic; each domain requires distinct validation

Possibly related PRs

- fix: Cherry-pick in last of aiperf replacements #3683: Adds the same four optional inputs (base_image_tag, runtime_image_tag, cuda_version, torch_backend) to .github/actions/docker-build/action.yml and propagates them in the container-validation-backends.yml workflow with arm64-specific overrides.
- fix: cherry pick sglang bump + fix k8s yamls #3708: Modifies the Docker build GitHub Action with identical input additions and build argument propagation for configurable build parameters.
- feat: update gpt-oss 120b model recipe #3143 #3454 #3431: Overlaps in benchmarking infrastructure updates (genai-perf → aiperf migration, runtime image tag updates to 0.6.0, deployment manifest modifications) and profiling utilities integration.

Poem

🐰 Registry doors swing wide open to NVIDIA's keep,
AIPerf charts the benchmarking deep,
Milliseconds measure the tokens that leap,
Pre-flight checks ensure clusters don't weep,
With git-lfs holding secrets so steep,
Version 0.6.0—a promise to keep! 🚀

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 67d27bc and f4864e6.

⛔ Files ignored due to path filters (1)

Cargo.lock is excluded by !**/*.lock

📒 Files selected for processing (107)

.github/actions/docker-build/action.yml (2 hunks)
.github/workflows/container-validation-backends.yml (3 hunks)
Earthfile (4 hunks)
README.md (1 hunks)
benchmarks/README.md (1 hunks)
benchmarks/incluster/benchmark_job.yaml (1 hunks)
benchmarks/llm/perf.sh (3 hunks)
benchmarks/llm/plot_pareto.py (6 hunks)
benchmarks/nixl/README.md (0 hunks)
benchmarks/profiler/deploy/profile_sla_aic_job.yaml (1 hunks)
benchmarks/profiler/profile_endpoint.py (1 hunks)
benchmarks/profiler/profile_sla.py (4 hunks)
benchmarks/profiler/utils/aiperf.py (8 hunks)
benchmarks/profiler/utils/config.py (4 hunks)
benchmarks/profiler/utils/estimate_perf.py (1 hunks)
benchmarks/profiler/utils/profile_cache.py (5 hunks)
benchmarks/profiler/utils/profile_decode.py (2 hunks)
benchmarks/profiler/utils/profile_prefill.py (2 hunks)
benchmarks/pyproject.toml (1 hunks)
benchmarks/router/README.md (2 hunks)
benchmarks/router/prefix_ratio_benchmark.py (9 hunks)
benchmarks/router/real_data_benchmark.py (6 hunks)
benchmarks/sin_load_generator/README.md (1 hunks)
benchmarks/utils/aiperf.py (4 hunks)
benchmarks/utils/plot.py (1 hunks)
benchmarks/utils/workflow.py (1 hunks)
components/backends/sglang/deploy/README.md (2 hunks)
components/backends/sglang/deploy/agg.yaml (2 hunks)
components/backends/sglang/deploy/agg_logging.yaml (2 hunks)
components/backends/sglang/deploy/agg_router.yaml (2 hunks)
components/backends/sglang/deploy/disagg-multinode.yaml (5 hunks)
components/backends/sglang/deploy/disagg.yaml (5 hunks)
components/backends/sglang/deploy/disagg_planner.yaml (6 hunks)
components/backends/sglang/launch/disagg_dp_attn.sh (2 hunks)
components/backends/sglang/slurm_jobs/scripts/gap/bench.sh (4 hunks)
components/backends/trtllm/deploy/README.md (4 hunks)
components/backends/trtllm/deploy/agg-with-config.yaml (2 hunks)
components/backends/trtllm/deploy/agg.yaml (2 hunks)
components/backends/trtllm/deploy/agg_router.yaml (2 hunks)
components/backends/trtllm/deploy/disagg-multinode.yaml (3 hunks)
components/backends/trtllm/deploy/disagg.yaml (3 hunks)
components/backends/trtllm/deploy/disagg_planner.yaml (4 hunks)
components/backends/trtllm/deploy/disagg_router.yaml (3 hunks)
components/backends/trtllm/performance_sweeps/README.md (3 hunks)
components/backends/trtllm/performance_sweeps/benchmark_agg.slurm (1 hunks)
components/backends/trtllm/performance_sweeps/post_process.py (3 hunks)
components/backends/trtllm/performance_sweeps/scripts/bench.sh (2 hunks)
components/backends/vllm/deploy/README.md (2 hunks)
components/backends/vllm/deploy/agg.yaml (2 hunks)
components/backends/vllm/deploy/agg_kvbm.yaml (2 hunks)
components/backends/vllm/deploy/agg_router.yaml (2 hunks)
components/backends/vllm/deploy/disagg-multinode.yaml (3 hunks)
components/backends/vllm/deploy/disagg.yaml (3 hunks)
components/backends/vllm/deploy/disagg_kvbm.yaml (3 hunks)
components/backends/vllm/deploy/disagg_kvbm_2p2d.yaml (3 hunks)
components/backends/vllm/deploy/disagg_kvbm_tp2.yaml (3 hunks)
components/backends/vllm/deploy/disagg_planner.yaml (4 hunks)
components/backends/vllm/deploy/disagg_router.yaml (3 hunks)
components/backends/vllm/launch/dsr1_dep.sh (1 hunks)
components/src/dynamo/planner/defaults.py (1 hunks)
components/src/dynamo/planner/utils/perf_interpolation.py (6 hunks)
components/src/dynamo/planner/utils/planner_argparse.py (1 hunks)
components/src/dynamo/planner/utils/planner_core.py (2 hunks)
components/src/dynamo/sglang/request_handlers/handler_base.py (1 hunks)
container/Dockerfile (3 hunks)
container/Dockerfile.sglang (4 hunks)
container/Dockerfile.sglang-wideep (1 hunks)
container/Dockerfile.trtllm (3 hunks)
container/Dockerfile.vllm (3 hunks)
container/deps/requirements.txt (2 hunks)
container/deps/vllm/install_vllm.sh (1 hunks)
deploy/cloud/operator/Dockerfile (1 hunks)
deploy/cloud/operator/Earthfile (1 hunks)
deploy/cloud/pre-deployment/README.md (1 hunks)
deploy/cloud/pre-deployment/nixl/README.md (1 hunks)
deploy/cloud/pre-deployment/nixl/build_and_deploy.sh (1 hunks)
deploy/cloud/pre-deployment/nixl/nixlbench-deployment.yaml (1 hunks)
deploy/cloud/pre-deployment/pre-deployment-check.sh (1 hunks)
docs/_includes/install.rst (2 hunks)
docs/backends/sglang/README.md (7 hunks)
docs/backends/sglang/dsr1-wideep-gb200.md (2 hunks)
docs/backends/sglang/dsr1-wideep-h100.md (4 hunks)
docs/backends/sglang/multinode-examples.md (4 hunks)
docs/backends/sglang/sgl-hicache-example.md (1 hunks)
docs/backends/sglang/sglang-disaggregation.md (1 hunks)
docs/backends/trtllm/README.md (2 hunks)
docs/backends/trtllm/gpt-oss.md (5 hunks)
docs/benchmarks/benchmarking.md (9 hunks)
docs/benchmarks/pre_deployment_profiling.md (2 hunks)
docs/conf.py (2 hunks)
docs/guides/disagg_perf_tuning.md (1 hunks)
docs/hidden_toctree.rst (1 hunks)
docs/kubernetes/README.md (1 hunks)
docs/kubernetes/create_deployment.md (1 hunks)
docs/kubernetes/sla_planner_quickstart.md (2 hunks)
docs/project.json (1 hunks)
docs/versions1.json (1 hunks)
examples/basics/kubernetes/Distributed_Inference/README.md (1 hunks)
examples/basics/kubernetes/Distributed_Inference/agg_router.yaml (2 hunks)
examples/basics/kubernetes/shared_frontend/README.md (1 hunks)
examples/custom_backend/hello_world/deploy/hello_world.yaml (2 hunks)
examples/deployments/ECS/task_definition_frontend.json (1 hunks)
examples/deployments/ECS/task_definition_prefillworker.json (1 hunks)
examples/deployments/router_standalone/README.md (1 hunks)
examples/deployments/router_standalone/perf.sh (1 hunks)
examples/multimodal/deploy/agg_llava.yaml (4 hunks)
examples/multimodal/deploy/agg_qwen.yaml (4 hunks)

⛔ Files not processed due to max files limit (33)

lib/bindings/c/src/lib.rs
lib/llm/src/kv_router/subscriber.rs
lib/parsers/Cargo.toml
lib/parsers/tests/mod.rs
lib/runtime/src/transports/etcd.rs
lib/runtime/src/transports/etcd/lock.rs
pyproject.toml
recipes/README.md
recipes/deepseek-r1/sglang-wideep/tep16p-dep16d-disagg.yaml
recipes/deepseek-r1/sglang-wideep/tep8p-dep8d-disagg.yaml
recipes/gpt-oss-120b/trtllm/agg/deploy.yaml
recipes/gpt-oss-120b/trtllm/agg/perf.yaml
recipes/llama-3-70b/model-cache/model-download.yaml
recipes/llama-3-70b/vllm/agg/deploy.yaml
recipes/llama-3-70b/vllm/agg/perf.yaml
recipes/llama-3-70b/vllm/disagg-multi-node/deploy.yaml
recipes/llama-3-70b/vllm/disagg-multi-node/perf.yaml
recipes/llama-3-70b/vllm/disagg-single-node/deploy.yaml
recipes/llama-3-70b/vllm/disagg-single-node/perf.yaml
tests/fault_tolerance/deploy/client.py
tests/fault_tolerance/deploy/parse_results.py
tests/planner/README.md
tests/planner/perf_test_configs/agg_8b.yaml
tests/planner/perf_test_configs/disagg_8b_2p2d.yaml
tests/planner/perf_test_configs/disagg_8b_3p1d.yaml
tests/planner/perf_test_configs/disagg_8b_planner.yaml
tests/planner/perf_test_configs/disagg_8b_tp2.yaml
tests/planner/perf_test_configs/image_cache_daemonset.yaml
tests/planner/scaling/disagg_planner.yaml
tests/planner/scaling/run_scaling_test.sh
tests/planner/test_replica_calculation.py
tests/planner/utils/load_generator.py
tests/profiler/test_profile_sla_aiconfigurator.py

saturley-hall and others added 26 commits October 14, 2025 17:38

fix: circular rust dynamo-parsers, dynamo-llm dependency (#3607) (#3609)

211bbab

Signed-off-by: Graham King <grahamk@nvidia.com> Signed-off-by: Harrison Saturley-Hall <hsaturleyhal@nvidia.com> Co-authored-by: Graham King <grahamk@nvidia.com>

chore: update the relevant my-registry and my-tag (#3611)

0b6ef5f

Signed-off-by: Harrison Saturley-Hall <hsaturleyhal@nvidia.com>

chore: typo and new commands (#3617) (#3625)

a25268d

Signed-off-by: Harrison King Saturley-Hall <hsaturleyhal@nvidia.com> Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>

feat: add pre-deployment check for storageclass (#3573) (#3608)

c4b41fd

Signed-off-by: Harrison Saturley-Hall <harrison.saturley.hall@gmail.com> Co-authored-by: Harrison Saturley-Hall <hsaturleyhal@nvidia.com>

chore: update sglang container and version (#3647)

165276f

fix: cherrypick cuda 129 (#3652)

ec47178

Signed-off-by: alec-flowers <aflowers@nvidia.com>

fix: update model recipe for llama-3 70b to match with common recipe …

1ef8cc1

…template #3637 (#3656)

fix: copy commit info in trtllm build (#3619) (#3670)

bf73dde

Signed-off-by: Anant Sharma <anants@nvidia.com>

fix: update invalid AIPerf scripts and parsing logic (#3681)

048ebd8

Signed-off-by: Anthony Casagrande <acasagrande@nvidia.com>

fix: aiconfigurator breaking tests due to not being installed correct…

cbe523f

…ly (#3686) Signed-off-by: Harrison Saturley-Hall <hsaturleyhal@nvidia.com>

fix: Cherry-pick in last of aiperf replacements (#3683)

b28b8bb

Signed-off-by: lkomali <lkomali@nvidia.com> Signed-off-by: Harrison Saturley-Hall <hsaturleyhal@nvidia.com> Co-authored-by: Harshini Komali <157742537+lkomali@users.noreply.github.com>

fix: Reduce memory usage to avoid vLLM dsr1 OOM (#3660) (#3661)

c55f34a

fix: cherry pick sglang bump + fix k8s yamls (#3708)

b08e97b

fix: json strings should remain intact through profiler arg processin… (

249c21a

#3689) Signed-off-by: Hannah Zhang <hannahz@nvidia.com>

feat: (cherrypick) custom distributed rw lock for radix snapshotting …

8bc9f2f

…and downloading (#3692) Signed-off-by: PeaBrane <yanrpei@gmail.com>

chore: Fix cuda lock in trtllm dockerfile (#3684) (#3704)

c77b5dd

Signed-off-by: Indrajit Bhosale <iamindrajitb@gmail.com>

docs: add gpu details for model recipes #3594 (#3707)

b2053cc

docs: Adding elements required for version switcher (#3521) (#3711)

7a22663

Signed-off-by: Andrew Schilling <aschilling@nvidia.com> Co-authored-by: Andrew Schilling <85314306+aschilling-nv@users.noreply.github.com>

ci: OPS-980: Add operator build and push per-commit (#3620) (#3712)

e8531f5

Signed-off-by: Dillon Cullinan <dcullinan@nvidia.com> Signed-off-by: Harrison Saturley-Hall <hsaturleyhal@nvidia.com> Co-authored-by: Dillon Cullinan <dcullinan92@gmail.com>

fix: (cherry-pick) update k8s aic profile job arguments (#3699) (#3706)

7ae690f

Signed-off-by: hongkuanz <hongkuanz@nvidia.com>

fix: cherry-pick to standardize planner units (#3713)

34c4231

Signed-off-by: William Arnold <7565007+Aphoh@users.noreply.github.com>

fix: rename folder (#3718)

a52f59e

Signed-off-by: Anna Tchernych <atchernych@nvidia.com>

fix: remove invalid aiperf args (#3710) (#3719)

4ebc72d

Signed-off-by: Hannah Zhang <hannahz@nvidia.com>

fix ModelCard code

f4864e6

Signed-off-by: Anna Tchernych <atchernych@nvidia.com>

atchernych requested a review from a team as a code owner October 20, 2025 21:23

atchernych requested a review from a team October 20, 2025 21:23

atchernych requested review from a team as code owners October 20, 2025 21:23

github-actions Bot added the fix label Oct 20, 2025

pull-request-size Bot added the size/XXL label Oct 20, 2025

atchernych closed this Oct 20, 2025

atchernych wants to merge 26 commits into main from patch-fix

14 participants
