fix: fix how model card is found in router bindings #3753

Closed
atchernych wants to merge 26 commits into main from patch-fix

@atchernych
Contributor

@atchernych atchernych commented Oct 20, 2025

Overview:

Details:

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

Summary by CodeRabbit

  • New Features

    • Added pre-deployment validation scripts for Kubernetes cluster checks
    • Added AIPerf benchmarking tool integration for performance profiling
    • Enhanced SGLang disaggregation documentation and deployment configurations
  • Bug Fixes & Improvements

    • Standardized container image registry to nvcr.io/nvidia/ai-dynamo
    • Pinned container versions to 0.6.0 for consistency
    • Converted timing metrics (TTFT/ITL) from seconds to milliseconds
    • Added Git LFS support for Python dependencies
    • Updated AIConfigurator to use official repository
  • Documentation

    • Expanded deployment guides and pre-deployment setup documentation
    • Updated benchmarking references from GenAI-Perf to AIPerf
    • Added NIXL benchmark deployment guide
  • Chores

    • Updated all deployment manifests with standardized image references
    • Replaced GenAI-Perf references with AIPerf throughout codebase

saturley-hall and others added 26 commits October 14, 2025 17:38
Signed-off-by: Graham King <grahamk@nvidia.com>
Signed-off-by: Harrison Saturley-Hall <hsaturleyhal@nvidia.com>
Co-authored-by: Graham King <grahamk@nvidia.com>
Signed-off-by: Harrison Saturley-Hall <hsaturleyhal@nvidia.com>
Signed-off-by: Harrison King Saturley-Hall <hsaturleyhal@nvidia.com>
Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>
Signed-off-by: Biswa Panda <biswa.panda@gmail.com>
Signed-off-by: lkomali <lkomali@nvidia.com>
Signed-off-by: Harrison King Saturley-Hall <hsaturleyhal@nvidia.com>
Co-authored-by: Biswa Panda <biswa.panda@gmail.com>
Co-authored-by: lkomali <lkomali@nvidia.com>
Co-authored-by: Harshini Komali <157742537+lkomali@users.noreply.github.com>
Signed-off-by: Harrison Saturley-Hall <harrison.saturley.hall@gmail.com>
Co-authored-by: Harrison Saturley-Hall <hsaturleyhal@nvidia.com>
Signed-off-by: alec-flowers <aflowers@nvidia.com>
Signed-off-by: Anant Sharma <anants@nvidia.com>
Signed-off-by: Anthony Casagrande <acasagrande@nvidia.com>
…ly (#3686)

Signed-off-by: Harrison Saturley-Hall <hsaturleyhal@nvidia.com>
…3682)

Signed-off-by: lkomali <lkomali@nvidia.com>
Signed-off-by: Harrison Saturley-Hall <hsaturleyhal@nvidia.com>
Co-authored-by: Harshini Komali <157742537+lkomali@users.noreply.github.com>
Signed-off-by: lkomali <lkomali@nvidia.com>
Signed-off-by: Harrison Saturley-Hall <hsaturleyhal@nvidia.com>
Co-authored-by: Harshini Komali <157742537+lkomali@users.noreply.github.com>
…and downloading (#3692)

Signed-off-by: PeaBrane <yanrpei@gmail.com>
Signed-off-by: Indrajit Bhosale <iamindrajitb@gmail.com>
Signed-off-by: Andrew Schilling <aschilling@nvidia.com>
Co-authored-by: Andrew Schilling <85314306+aschilling-nv@users.noreply.github.com>
Signed-off-by: Dillon Cullinan <dcullinan@nvidia.com>
Signed-off-by: Harrison Saturley-Hall <hsaturleyhal@nvidia.com>
Co-authored-by: Dillon Cullinan <dcullinan92@gmail.com>
Signed-off-by: hongkuanz <hongkuanz@nvidia.com>
Signed-off-by: William Arnold <7565007+Aphoh@users.noreply.github.com>
Signed-off-by: Anna Tchernych <atchernych@nvidia.com>
Signed-off-by: Hannah Zhang <hannahz@nvidia.com>
Signed-off-by: Anna Tchernych <atchernych@nvidia.com>
@atchernych atchernych requested a review from a team as a code owner October 20, 2025 21:23
@atchernych atchernych requested a review from a team October 20, 2025 21:23
@atchernych atchernych requested review from a team as code owners October 20, 2025 21:23
@coderabbitai
Contributor

coderabbitai Bot commented Oct 20, 2025

Caution

Review failed

The pull request is closed.

Walkthrough

This pull request integrates multiple major updates: (1) enhances the Docker build GitHub Action with configurable build parameters (base image tag, runtime image tag, CUDA version, Torch backend), (2) migrates all benchmarking from genai-perf to aiperf, (3) transitions container image registries from generic (my-registry) to NVIDIA's official (nvcr.io/nvidia/ai-dynamo) with version 0.6.0, (4) updates TTFT/ITL units from seconds to milliseconds, and (5) adds Kubernetes pre-deployment validation infrastructure.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| GitHub Actions & Workflows | .github/actions/docker-build/action.yml, .github/workflows/container-validation-backends.yml | Added four optional input parameters (base_image_tag, runtime_image_tag, cuda_version, torch_backend) to the Docker build action for propagating build-time overrides. Updated container-validation workflow to include new operator job with arm64-specific conditional tag inputs and dependencies. |
| Build & Registry Configuration | Earthfile, deploy/cloud/operator/Earthfile, container/deps/requirements.txt | Updated DOCKER_SERVER registry from my-registry to nvcr.io/nvidia/ai-dynamo. Added UV_GIT_LFS=1 flag during Python package installation. Updated aiconfigurator dependency to git-sourced version and added aiperf as git dependency; updated pydantic and scipy constraints. |
| Benchmarking Tool Migration | benchmarks/llm/perf.sh, benchmarks/llm/plot_pareto.py, benchmarks/profiler/utils/aiperf.py, benchmarks/utils/aiperf.py, benchmarks/router/prefix_ratio_benchmark.py, benchmarks/router/real_data_benchmark.py, benchmarks/utils/workflow.py, benchmarks/utils/plot.py, benchmarks/profiler/profile_sla.py, benchmarks/profiler/utils/profile_cache.py, benchmarks/profiler/utils/profile_decode.py, benchmarks/profiler/utils/profile_prefill.py | Comprehensive migration from genai-perf to aiperf: renamed functions, updated CLI commands, replaced artifact paths from profile_export_genai_perf.json to profile_export_aiperf.json, removed --max-threads arguments, added aggregate_results function for router benchmarks. Updated CLI parameter types for TTFT/ITL from int to float. |
| Benchmarking Configuration & Utils | benchmarks/profiler/utils/config.py, benchmarks/profiler/utils/estimate_perf.py, benchmarks/profiler/deploy/profile_sla_aic_job.yaml | Guarded kv_cache_config initialization in convert_config for multiple backends; improved break_arguments to preserve JSON-like values; added backend argument to aiconfigurator.sdk.models.get_model call. Renamed --backend-version flag to --aic-backend-version. Updated DgdPlannerServiceConfig image from my-registry/dynamo-runtime:my-tag to nvcr.io/nvidia/ai-dynamo/dynamo-runtime:0.6.0. |
| vLLM Runtime Deployments | components/backends/vllm/deploy/agg.yaml, components/backends/vllm/deploy/agg_kvbm.yaml, components/backends/vllm/deploy/agg_router.yaml, components/backends/vllm/deploy/disagg.yaml, components/backends/vllm/deploy/disagg-multinode.yaml, components/backends/vllm/deploy/disagg_kvbm.yaml, components/backends/vllm/deploy/disagg_kvbm_2p2d.yaml, components/backends/vllm/deploy/disagg_kvbm_tp2.yaml, components/backends/vllm/deploy/disagg_planner.yaml, components/backends/vllm/deploy/disagg_router.yaml, components/backends/vllm/deploy/README.md, components/backends/vllm/launch/dsr1_dep.sh | Updated container image tags from nvcr.io/nvidia/ai-dynamo/vllm-runtime:my-tag to nvcr.io/nvidia/ai-dynamo/vllm-runtime:0.6.0 across all deployments. Changed max-model-len from 10240 to 4096 and gpu-memory-utilization from 0.95 to 0.9 in dsr1_dep.sh. |
| SGLang Runtime Deployments | components/backends/sglang/deploy/agg.yaml, components/backends/sglang/deploy/agg_logging.yaml, components/backends/sglang/deploy/agg_router.yaml, components/backends/sglang/deploy/disagg.yaml, components/backends/sglang/deploy/disagg-multinode.yaml, components/backends/sglang/deploy/disagg_planner.yaml, components/backends/sglang/deploy/README.md | Updated container images from my-registry/sglang-runtime:my-tag to nvcr.io/nvidia/ai-dynamo/sglang-runtime:0.6.0. Added --host 0.0.0.0 and --disaggregation-bootstrap-port 12345 flags to disaggregation configurations. |
| SGLang Launch & Documentation | components/backends/sglang/launch/disagg_dp_attn.sh, docs/backends/sglang/README.md, docs/backends/sglang/dsr1-wideep-h100.md, docs/backends/sglang/dsr1-wideep-gb200.md, docs/backends/sglang/multinode-examples.md, docs/backends/sglang/sglang-disaggregation.md | Removed expert-distribution-recorder configuration; replaced with load-balance-method and prefill-round-robin-balance flags. Added comprehensive SGLang disaggregation documentation. Updated build and deployment guidance with multi-arch and BRANCH_TYPE support. |
| TensorRT-LLM Runtime Deployments | components/backends/trtllm/deploy/agg.yaml, components/backends/trtllm/deploy/agg_router.yaml, components/backends/trtllm/deploy/disagg.yaml, components/backends/trtllm/deploy/disagg-multinode.yaml, components/backends/trtllm/deploy/disagg_planner.yaml, components/backends/trtllm/deploy/disagg_router.yaml, components/backends/trtllm/deploy/agg-with-config.yaml, components/backends/trtllm/deploy/README.md, components/backends/trtllm/performance_sweeps/... | Updated container images to nvcr.io/nvidia/ai-dynamo/trtllm-runtime:0.6.0. Migrated benchmarking artifacts from genai_perf_artifacts to aiperf_artifacts and updated related file patterns. |
| Container Dockerfiles | container/Dockerfile, container/Dockerfile.sglang, container/Dockerfile.sglang-wideep, container/Dockerfile.trtllm, container/Dockerfile.vllm, container/deps/vllm/install_vllm.sh, deploy/cloud/operator/Dockerfile | Added git-lfs package, UV_GIT_LFS=1 environment variable during pip installs, and --no-cache flags. Refactored Dockerfile.sglang-wideep with BRANCH_TYPE conditional logic for local/remote Dynamo installation. Updated vLLM PyPI installation condition to support arm64 with cu129 backend. Removed TARGETARCH default in operator Dockerfile. |
| Timing Units (TTFT/ITL) | benchmarks/profiler/profile_sla.py, components/src/dynamo/planner/defaults.py, components/src/dynamo/planner/utils/perf_interpolation.py, components/src/dynamo/planner/utils/planner_argparse.py, components/src/dynamo/planner/utils/planner_core.py | Changed TTFT and ITL units from seconds to milliseconds: updated defaults from 0.5/0.05 to 500.0/50.0, added millisecond unit specifications to help text, converted Prometheus metrics by multiplying by 1000 in planner_core. |
| Kubernetes Pre-deployment Infrastructure | deploy/cloud/pre-deployment/pre-deployment-check.sh, deploy/cloud/pre-deployment/nixl/build_and_deploy.sh, deploy/cloud/pre-deployment/nixl/nixlbench-deployment.yaml, deploy/cloud/pre-deployment/nixl/README.md, deploy/cloud/pre-deployment/README.md | Added comprehensive pre-deployment validation script (kubectl, StorageClass, GPU nodes, GPU operator checks). New NIXL build and deploy helper script with dependency validation, architecture selection, and workflow orchestration. Updated NIXL deployment YAML with ETCD configuration and resource specifications. |
| Example Deployments & Configurations | benchmarks/incluster/benchmark_job.yaml, benchmarks/nixl/README.md, components/backends/sglang/slurm_jobs/scripts/gap/bench.sh, examples/basics/kubernetes/*/agg_router.yaml, examples/custom_backend/hello_world/deploy/hello_world.yaml, examples/deployments/ECS/task_definition_*.json, examples/deployments/router_standalone/perf.sh, examples/multimodal/deploy/agg_*.yaml | Updated container image tags to 0.6.0. Migrated benchmarking from genai-perf to aiperf in bench scripts. Removed NIXL benchmark README. Updated ECS task definitions and router deployments with new image versions and aiperf integration. |
| Documentation Updates | README.md, benchmarks/README.md, benchmarks/router/README.md, benchmarks/profiler/deploy/README.md, components/backends/trtllm/performance_sweeps/README.md, docs/backends/trtllm/README.md, docs/backends/trtllm/gpt-oss.md, docs/benchmarks/benchmarking.md, docs/backends/trtllm/gpt-oss.md, docs/backends/sglang/README.md, docs/kubernetes/README.md, docs/kubernetes/create_deployment.md, docs/kubernetes/sla_planner_quickstart.md, docs/guides/disagg_perf_tuning.md, examples/basics/kubernetes/Distributed_Inference/README.md, examples/deployments/router_standalone/README.md | Global terminology updates from GenAI-Perf to AIPerf in all benchmarking documentation. Updated example image references from my-tag to 0.6.0. Added pre-deployment check section to Kubernetes docs. Updated timing units in examples from seconds to milliseconds. |
| Sphinx Documentation Configuration | docs/conf.py, docs/project.json, docs/versions1.json, docs/hidden_toctree.rst, docs/_includes/install.rst | Added release variable and version switcher configuration with Adobe Launch/Satellite scripts. Created project.json and versions1.json for version management. Added sglang-disaggregation.md to hidden toctree. Pinned ai-dynamo sglang dependency to 0.6.0 in install docs. |
| Source Code Updates | benchmarks/profiler/profile_endpoint.py, components/src/dynamo/sglang/request_handlers/handler_base.py | Updated import paths to full module paths for profile utilities. Changed logging level from info to debug for cancellation monitor task creation. |
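
The "guarded kv_cache_config initialization" noted for benchmarks/profiler/utils/config.py can be pictured roughly as below. This is a minimal sketch and not the PR's code: the helper name `set_kv_cache_option`, the plain-dict config shape, and the example key `free_gpu_memory_fraction` are assumptions made for illustration.

```python
from typing import Any


def set_kv_cache_option(config: dict[str, Any], key: str, value: Any) -> dict[str, Any]:
    """Set config["kv_cache_config"][key] without assuming the section exists.

    Hypothetical helper: the real convert_config may be structured differently.
    """
    section = config.get("kv_cache_config")
    # Guard: create the section only when it is missing or not a mapping.
    # An existing mapping is reused, so its other keys survive.
    if not isinstance(section, dict):
        section = {}
        config["kv_cache_config"] = section
    section[key] = value
    return config


if __name__ == "__main__":
    # Works whether or not the section was present beforehand.
    print(set_kv_cache_option({}, "free_gpu_memory_fraction", 0.9))
    print(set_kv_cache_option({"kv_cache_config": {"dtype": "fp8"}}, "free_gpu_memory_fraction", 0.9))
```

The point of the guard is that a backend config lacking the section no longer raises a `KeyError`, while a config that already carries settings keeps them.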

Sequence Diagram(s)

sequenceDiagram
    participant GHA as GitHub Action<br/>(docker-build)
    participant Workflow as Workflow<br/>(container-validation)
    participant BuildScript as build.sh
    participant Docker as Docker Build
    
    Workflow->>GHA: trigger with base_image_tag,<br/>runtime_image_tag,<br/>cuda_version,<br/>torch_backend
    GHA->>GHA: collect inputs into<br/>EXTRA_ARGS string
    GHA->>BuildScript: invoke with EXTRA_ARGS<br/>--base-image-tag VALUE<br/>--build-arg RUNTIME_IMAGE_TAG=VALUE<br/>etc.
    BuildScript->>Docker: pass EXTRA_ARGS to<br/>docker build command
    Docker->>Docker: apply overrides during<br/>image build process
    Docker-->>GHA: return image_tag
    GHA-->>Workflow: output image_tag to<br/>GITHUB_OUTPUT
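
The input-collection step in this diagram could be sketched as follows. The action itself is defined in YAML and shell, so this Python version is only an illustration of the logic; `build_extra_args` is a hypothetical name, and the flag shapes for `cuda_version` and `torch_backend` are assumptions, since the diagram only spells out the first two and says "etc." for the rest.

```python
from typing import Optional


def build_extra_args(
    base_image_tag: Optional[str] = None,
    runtime_image_tag: Optional[str] = None,
    cuda_version: Optional[str] = None,
    torch_backend: Optional[str] = None,
) -> list[str]:
    """Collect only the overrides that were actually provided."""
    args: list[str] = []
    if base_image_tag:
        args += ["--base-image-tag", base_image_tag]
    if runtime_image_tag:
        args += ["--build-arg", f"RUNTIME_IMAGE_TAG={runtime_image_tag}"]
    # Assumed flag shapes for the remaining two inputs (not shown in the diagram).
    if cuda_version:
        args += ["--build-arg", f"CUDA_VERSION={cuda_version}"]
    if torch_backend:
        args += ["--build-arg", f"TORCH_BACKEND={torch_backend}"]
    return args


if __name__ == "__main__":
    # Unset inputs contribute nothing, so the default build is unchanged.
    print(" ".join(build_extra_args(base_image_tag="example-base", cuda_version="12.9")))
```

Because every input is optional, callers that pass nothing get an empty argument list and the build behaves as it did before the inputs were added.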
sequenceDiagram
    participant Client
    participant Script as perf.sh
    participant AIPerf as AIPerf Tool
    participant Artifacts as Result Artifacts
    
    Note over Client,Artifacts: Old (GenAI-Perf Flow)
    Client->>Script: invoke perf.sh
    Script->>Script: build genai-perf command
    Script->>GenAI: run genai-perf profile
    GenAI-->>Artifacts: write profile_export_genai_perf.json
    
    Note over Client,Artifacts: New (AIPerf Flow)
    Client->>Script: invoke perf.sh
    Script->>Script: build aiperf command<br/>(removed --max-threads)
    Script->>AIPerf: run aiperf profile
    AIPerf-->>Artifacts: write profile_export_aiperf.json
    Script->>Artifacts: parse results (TTFT, ITL in ms)
    Script-->>Client: return aggregated metrics
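
The "parse results" step in the new flow might look something like the sketch below. The JSON layout is an assumption (a per-metric object with `avg` and `unit` fields, keyed by `time_to_first_token` and `inter_token_latency`); the actual profile_export_aiperf.json schema should be checked against the AIPerf output before relying on it. `read_latency_ms` is a hypothetical helper name.

```python
import json
import tempfile
from pathlib import Path


def read_latency_ms(export_path: Path) -> dict[str, float]:
    """Return average TTFT and ITL in milliseconds from an export file."""
    data = json.loads(export_path.read_text())
    result: dict[str, float] = {}
    # Assumed layout: {"time_to_first_token": {"avg": ..., "unit": "ms"}, ...}
    for name, key in (("ttft_ms", "time_to_first_token"), ("itl_ms", "inter_token_latency")):
        metric = data[key]
        unit = metric.get("unit", "ms")
        if unit != "ms":
            # Fail loudly rather than silently mixing seconds and milliseconds.
            raise ValueError(f"{key} is reported in {unit}, expected ms")
        result[name] = float(metric["avg"])
    return result


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "profile_export_aiperf.json"
        sample = {
            "time_to_first_token": {"avg": 120.5, "unit": "ms"},
            "inter_token_latency": {"avg": 8.0, "unit": "ms"},
        }
        path.write_text(json.dumps(sample))
        print(read_latency_ms(path))
```

Checking the unit field explicitly is a defensive choice here, given that this PR changes TTFT/ITL handling from seconds to milliseconds elsewhere.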
sequenceDiagram
    participant Prometheus as Prometheus<br/>(seconds)
    participant PlannerCore as planner_core.py
    participant SLA as SLA Calculator
    
    Prometheus->>PlannerCore: observe_metrics()<br/>returns ttft, itl in seconds
    PlannerCore->>PlannerCore: convert to ms<br/>ttft_ms = ttft * 1000<br/>itl_ms = itl * 1000
    Note over PlannerCore: Log message updated<br/>to show .2f ms units
    PlannerCore->>SLA: pass ttft_ms, itl_ms<br/>(milliseconds)
    SLA->>SLA: compare against<br/>defaults (500ms, 50ms)
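
The conversion and comparison in this diagram reduce to a few lines. The sketch below is illustrative only: `SlaTargets`, `seconds_to_ms`, and `meets_sla` are hypothetical names, and treating "meets the SLA" as less-than-or-equal is an assumption. The 500 ms and 50 ms defaults and the multiply-by-1000 step come from the summary above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlaTargets:
    """Targets in milliseconds, matching the new defaults (500.0 / 50.0)."""
    ttft_ms: float = 500.0
    itl_ms: float = 50.0


def seconds_to_ms(value_s: float) -> float:
    """Convert a metric observed in seconds to milliseconds."""
    return value_s * 1000.0


def meets_sla(ttft_s: float, itl_s: float, targets: SlaTargets = SlaTargets()) -> bool:
    """Convert observed metrics to ms, then compare against the ms targets."""
    ttft_ms = seconds_to_ms(ttft_s)
    itl_ms = seconds_to_ms(itl_s)
    print(f"observed ttft={ttft_ms:.2f} ms, itl={itl_ms:.2f} ms")
    # Assumption: a value equal to the target still counts as meeting it.
    return ttft_ms <= targets.ttft_ms and itl_ms <= targets.itl_ms


if __name__ == "__main__":
    print(meets_sla(ttft_s=0.3, itl_s=0.02))
    print(meets_sla(ttft_s=0.8, itl_s=0.02))
```

Converting once at the boundary and keeping every downstream value in milliseconds avoids the mixed-unit comparison that the reviewer flags as a risk area.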

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Rationale: This PR involves substantial heterogeneous changes across multiple domains:

  • Scope: 100+ files touched with varied types of edits (CI/CD, Docker configurations, benchmarking tool migration, documentation, source code, Kubernetes manifests)
  • Logic density: Several non-trivial logic changes (guarded kv_cache_config initialization in config.py, conditional vLLM installation logic, BRANCH_TYPE-driven Dockerfile logic, Kubernetes pre-deployment validation with multiple checks)
  • Heterogeneity: Mixed pattern changes (benchmarking migration has repetitive string replacements but also semantic function renames and new utilities; image tag updates are homogeneous; unit conversions are systematic; documentation is mostly repetitive)
  • Risky areas: significant refactoring of Dockerfile.sglang-wideep; planner timing-unit changes that affect SLA calculations; conditional vLLM installation logic; benchmarking aggregation logic
  • Testing surface: Changes span CI/CD workflows, deployment manifests, benchmarking scripts, and core planner logic—each domain requires distinct validation

Possibly related PRs

Poem

🐰 Registry doors swing wide open to NVIDIA's keep,
AIPerf charts the benchmarking deep,
Milliseconds measure the tokens that leap,
Pre-flight checks ensure clusters don't weep,
With git-lfs holding secrets so steep,
Version 0.6.0—a promise to keep! 🚀


📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 67d27bc and f4864e6.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (107)
  • .github/actions/docker-build/action.yml (2 hunks)
  • .github/workflows/container-validation-backends.yml (3 hunks)
  • Earthfile (4 hunks)
  • README.md (1 hunks)
  • benchmarks/README.md (1 hunks)
  • benchmarks/incluster/benchmark_job.yaml (1 hunks)
  • benchmarks/llm/perf.sh (3 hunks)
  • benchmarks/llm/plot_pareto.py (6 hunks)
  • benchmarks/nixl/README.md (0 hunks)
  • benchmarks/profiler/deploy/profile_sla_aic_job.yaml (1 hunks)
  • benchmarks/profiler/profile_endpoint.py (1 hunks)
  • benchmarks/profiler/profile_sla.py (4 hunks)
  • benchmarks/profiler/utils/aiperf.py (8 hunks)
  • benchmarks/profiler/utils/config.py (4 hunks)
  • benchmarks/profiler/utils/estimate_perf.py (1 hunks)
  • benchmarks/profiler/utils/profile_cache.py (5 hunks)
  • benchmarks/profiler/utils/profile_decode.py (2 hunks)
  • benchmarks/profiler/utils/profile_prefill.py (2 hunks)
  • benchmarks/pyproject.toml (1 hunks)
  • benchmarks/router/README.md (2 hunks)
  • benchmarks/router/prefix_ratio_benchmark.py (9 hunks)
  • benchmarks/router/real_data_benchmark.py (6 hunks)
  • benchmarks/sin_load_generator/README.md (1 hunks)
  • benchmarks/utils/aiperf.py (4 hunks)
  • benchmarks/utils/plot.py (1 hunks)
  • benchmarks/utils/workflow.py (1 hunks)
  • components/backends/sglang/deploy/README.md (2 hunks)
  • components/backends/sglang/deploy/agg.yaml (2 hunks)
  • components/backends/sglang/deploy/agg_logging.yaml (2 hunks)
  • components/backends/sglang/deploy/agg_router.yaml (2 hunks)
  • components/backends/sglang/deploy/disagg-multinode.yaml (5 hunks)
  • components/backends/sglang/deploy/disagg.yaml (5 hunks)
  • components/backends/sglang/deploy/disagg_planner.yaml (6 hunks)
  • components/backends/sglang/launch/disagg_dp_attn.sh (2 hunks)
  • components/backends/sglang/slurm_jobs/scripts/gap/bench.sh (4 hunks)
  • components/backends/trtllm/deploy/README.md (4 hunks)
  • components/backends/trtllm/deploy/agg-with-config.yaml (2 hunks)
  • components/backends/trtllm/deploy/agg.yaml (2 hunks)
  • components/backends/trtllm/deploy/agg_router.yaml (2 hunks)
  • components/backends/trtllm/deploy/disagg-multinode.yaml (3 hunks)
  • components/backends/trtllm/deploy/disagg.yaml (3 hunks)
  • components/backends/trtllm/deploy/disagg_planner.yaml (4 hunks)
  • components/backends/trtllm/deploy/disagg_router.yaml (3 hunks)
  • components/backends/trtllm/performance_sweeps/README.md (3 hunks)
  • components/backends/trtllm/performance_sweeps/benchmark_agg.slurm (1 hunks)
  • components/backends/trtllm/performance_sweeps/post_process.py (3 hunks)
  • components/backends/trtllm/performance_sweeps/scripts/bench.sh (2 hunks)
  • components/backends/vllm/deploy/README.md (2 hunks)
  • components/backends/vllm/deploy/agg.yaml (2 hunks)
  • components/backends/vllm/deploy/agg_kvbm.yaml (2 hunks)
  • components/backends/vllm/deploy/agg_router.yaml (2 hunks)
  • components/backends/vllm/deploy/disagg-multinode.yaml (3 hunks)
  • components/backends/vllm/deploy/disagg.yaml (3 hunks)
  • components/backends/vllm/deploy/disagg_kvbm.yaml (3 hunks)
  • components/backends/vllm/deploy/disagg_kvbm_2p2d.yaml (3 hunks)
  • components/backends/vllm/deploy/disagg_kvbm_tp2.yaml (3 hunks)
  • components/backends/vllm/deploy/disagg_planner.yaml (4 hunks)
  • components/backends/vllm/deploy/disagg_router.yaml (3 hunks)
  • components/backends/vllm/launch/dsr1_dep.sh (1 hunks)
  • components/src/dynamo/planner/defaults.py (1 hunks)
  • components/src/dynamo/planner/utils/perf_interpolation.py (6 hunks)
  • components/src/dynamo/planner/utils/planner_argparse.py (1 hunks)
  • components/src/dynamo/planner/utils/planner_core.py (2 hunks)
  • components/src/dynamo/sglang/request_handlers/handler_base.py (1 hunks)
  • container/Dockerfile (3 hunks)
  • container/Dockerfile.sglang (4 hunks)
  • container/Dockerfile.sglang-wideep (1 hunks)
  • container/Dockerfile.trtllm (3 hunks)
  • container/Dockerfile.vllm (3 hunks)
  • container/deps/requirements.txt (2 hunks)
  • container/deps/vllm/install_vllm.sh (1 hunks)
  • deploy/cloud/operator/Dockerfile (1 hunks)
  • deploy/cloud/operator/Earthfile (1 hunks)
  • deploy/cloud/pre-deployment/README.md (1 hunks)
  • deploy/cloud/pre-deployment/nixl/README.md (1 hunks)
  • deploy/cloud/pre-deployment/nixl/build_and_deploy.sh (1 hunks)
  • deploy/cloud/pre-deployment/nixl/nixlbench-deployment.yaml (1 hunks)
  • deploy/cloud/pre-deployment/pre-deployment-check.sh (1 hunks)
  • docs/_includes/install.rst (2 hunks)
  • docs/backends/sglang/README.md (7 hunks)
  • docs/backends/sglang/dsr1-wideep-gb200.md (2 hunks)
  • docs/backends/sglang/dsr1-wideep-h100.md (4 hunks)
  • docs/backends/sglang/multinode-examples.md (4 hunks)
  • docs/backends/sglang/sgl-hicache-example.md (1 hunks)
  • docs/backends/sglang/sglang-disaggregation.md (1 hunks)
  • docs/backends/trtllm/README.md (2 hunks)
  • docs/backends/trtllm/gpt-oss.md (5 hunks)
  • docs/benchmarks/benchmarking.md (9 hunks)
  • docs/benchmarks/pre_deployment_profiling.md (2 hunks)
  • docs/conf.py (2 hunks)
  • docs/guides/disagg_perf_tuning.md (1 hunks)
  • docs/hidden_toctree.rst (1 hunks)
  • docs/kubernetes/README.md (1 hunks)
  • docs/kubernetes/create_deployment.md (1 hunks)
  • docs/kubernetes/sla_planner_quickstart.md (2 hunks)
  • docs/project.json (1 hunks)
  • docs/versions1.json (1 hunks)
  • examples/basics/kubernetes/Distributed_Inference/README.md (1 hunks)
  • examples/basics/kubernetes/Distributed_Inference/agg_router.yaml (2 hunks)
  • examples/basics/kubernetes/shared_frontend/README.md (1 hunks)
  • examples/custom_backend/hello_world/deploy/hello_world.yaml (2 hunks)
  • examples/deployments/ECS/task_definition_frontend.json (1 hunks)
  • examples/deployments/ECS/task_definition_prefillworker.json (1 hunks)
  • examples/deployments/router_standalone/README.md (1 hunks)
  • examples/deployments/router_standalone/perf.sh (1 hunks)
  • examples/multimodal/deploy/agg_llava.yaml (4 hunks)
  • examples/multimodal/deploy/agg_qwen.yaml (4 hunks)
⛔ Files not processed due to max files limit (33)
  • lib/bindings/c/src/lib.rs
  • lib/llm/src/kv_router/subscriber.rs
  • lib/parsers/Cargo.toml
  • lib/parsers/tests/mod.rs
  • lib/runtime/src/transports/etcd.rs
  • lib/runtime/src/transports/etcd/lock.rs
  • pyproject.toml
  • recipes/README.md
  • recipes/deepseek-r1/sglang-wideep/tep16p-dep16d-disagg.yaml
  • recipes/deepseek-r1/sglang-wideep/tep8p-dep8d-disagg.yaml
  • recipes/gpt-oss-120b/trtllm/agg/deploy.yaml
  • recipes/gpt-oss-120b/trtllm/agg/perf.yaml
  • recipes/llama-3-70b/model-cache/model-download.yaml
  • recipes/llama-3-70b/vllm/agg/deploy.yaml
  • recipes/llama-3-70b/vllm/agg/perf.yaml
  • recipes/llama-3-70b/vllm/disagg-multi-node/deploy.yaml
  • recipes/llama-3-70b/vllm/disagg-multi-node/perf.yaml
  • recipes/llama-3-70b/vllm/disagg-single-node/deploy.yaml
  • recipes/llama-3-70b/vllm/disagg-single-node/perf.yaml
  • tests/fault_tolerance/deploy/client.py
  • tests/fault_tolerance/deploy/parse_results.py
  • tests/planner/README.md
  • tests/planner/perf_test_configs/agg_8b.yaml
  • tests/planner/perf_test_configs/disagg_8b_2p2d.yaml
  • tests/planner/perf_test_configs/disagg_8b_3p1d.yaml
  • tests/planner/perf_test_configs/disagg_8b_planner.yaml
  • tests/planner/perf_test_configs/disagg_8b_tp2.yaml
  • tests/planner/perf_test_configs/image_cache_daemonset.yaml
  • tests/planner/scaling/disagg_planner.yaml
  • tests/planner/scaling/run_scaling_test.sh
  • tests/planner/test_replica_calculation.py
  • tests/planner/utils/load_generator.py
  • tests/profiler/test_profile_sla_aiconfigurator.py
