fix: fix how model card is found in router bindings #3753
Signed-off-by: Harrison Saturley-Hall <hsaturleyhal@nvidia.com>
Signed-off-by: Biswa Panda <biswa.panda@gmail.com> Signed-off-by: lkomali <lkomali@nvidia.com> Signed-off-by: Harrison King Saturley-Hall <hsaturleyhal@nvidia.com> Co-authored-by: Biswa Panda <biswa.panda@gmail.com> Co-authored-by: lkomali <lkomali@nvidia.com> Co-authored-by: Harshini Komali <157742537+lkomali@users.noreply.github.com>
Signed-off-by: alec-flowers <aflowers@nvidia.com>
Signed-off-by: Anthony Casagrande <acasagrande@nvidia.com>
…ly (#3686) Signed-off-by: Harrison Saturley-Hall <hsaturleyhal@nvidia.com>
Signed-off-by: lkomali <lkomali@nvidia.com> Signed-off-by: Harrison Saturley-Hall <hsaturleyhal@nvidia.com> Co-authored-by: Harshini Komali <157742537+lkomali@users.noreply.github.com>
#3689) Signed-off-by: Hannah Zhang <hannahz@nvidia.com>
…and downloading (#3692) Signed-off-by: PeaBrane <yanrpei@gmail.com>
Signed-off-by: William Arnold <7565007+Aphoh@users.noreply.github.com>
Signed-off-by: Anna Tchernych <atchernych@nvidia.com>
Signed-off-by: Anna Tchernych <atchernych@nvidia.com>
Caution: Review failed. The pull request is closed.
Walkthrough
This pull request integrates multiple major updates: (1) enhances the Docker build GitHub Action with configurable build parameters (base image tag, runtime image tag, CUDA version, Torch backend); (2) migrates all benchmarking from genai-perf to aiperf; (3) switches container image references from the generic placeholder registry (my-registry) to NVIDIA's official registry (nvcr.io/nvidia/ai-dynamo) at version 0.6.0; (4) changes TTFT/ITL units from seconds to milliseconds; and (5) adds Kubernetes pre-deployment validation infrastructure.
Changes
Sequence Diagram(s)
sequenceDiagram
participant GHA as GitHub Action<br/>(docker-build)
participant Workflow as Workflow<br/>(container-validation)
participant BuildScript as build.sh
participant Docker as Docker Build
Workflow->>GHA: trigger with base_image_tag,<br/>runtime_image_tag,<br/>cuda_version,<br/>torch_backend
GHA->>GHA: collect inputs into<br/>EXTRA_ARGS string
GHA->>BuildScript: invoke with EXTRA_ARGS<br/>--base-image-tag VALUE<br/>--build-arg RUNTIME_IMAGE_TAG=VALUE<br/>etc.
BuildScript->>Docker: pass EXTRA_ARGS to<br/>docker build command
Docker->>Docker: apply overrides during<br/>image build process
Docker-->>GHA: return image_tag
GHA-->>Workflow: output image_tag to<br/>GITHUB_OUTPUT
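The input-collection step in the first diagram can be sketched in Python. This is a minimal illustration, not the action's actual code. The helper name `build_extra_args` is hypothetical, and only the two flags that the diagram spells out (`--base-image-tag` and `--build-arg RUNTIME_IMAGE_TAG=...`) are handled. The flags used for `cuda_version` and `torch_backend` are elided ("etc.") in the diagram, so they are left out rather than guessed.

```python
import shlex
from typing import Optional


def build_extra_args(base_image_tag: Optional[str] = None,
                     runtime_image_tag: Optional[str] = None) -> str:
    """Hypothetical helper mirroring the EXTRA_ARGS assembly in the diagram.

    Only inputs that were actually provided contribute flags, so an empty
    workflow input leaves the build script's defaults untouched.
    """
    args: list[str] = []
    if base_image_tag:
        args += ["--base-image-tag", base_image_tag]
    if runtime_image_tag:
        args += ["--build-arg", f"RUNTIME_IMAGE_TAG={runtime_image_tag}"]
    # shlex.join quotes each token so values containing spaces survive the shell.
    return shlex.join(args)


# Placeholder tag values, for illustration only.
print(build_extra_args(base_image_tag="my-base-tag", runtime_image_tag="my-runtime-tag"))
# prints: --base-image-tag my-base-tag --build-arg RUNTIME_IMAGE_TAG=my-runtime-tag
```

Building a list and quoting once at the end keeps each flag paired with its value, which is harder to get right with ad hoc string concatenation.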
sequenceDiagram
participant Client
participant Script as perf.sh
participant GenAI as GenAI-Perf Tool
participant AIPerf as AIPerf Tool
participant Artifacts as Result Artifacts
Note over Client,Artifacts: Old (GenAI-Perf Flow)
Client->>Script: invoke perf.sh
Script->>Script: build genai-perf command
Script->>GenAI: run genai-perf profile
GenAI-->>Artifacts: write profile_export_genai_perf.json
Note over Client,Artifacts: New (AIPerf Flow)
Client->>Script: invoke perf.sh
Script->>Script: build aiperf command<br/>(removed --max-threads)
Script->>AIPerf: run aiperf profile
AIPerf-->>Artifacts: write profile_export_aiperf.json
Script->>Artifacts: parse results (TTFT, ITL in ms)
Script-->>Client: return aggregated metrics
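The parsing step at the end of the AIPerf flow can be sketched as follows. The JSON layout is an assumption made for illustration: the key names `time_to_first_token`, `inter_token_latency`, `avg`, and `unit` are not confirmed by this PR and should be checked against a real `profile_export_aiperf.json`. `extract_latency_ms` is a hypothetical helper, and the sample values are made up.

```python
import json
from typing import Any

# ASSUMPTION: illustrative key names; verify against an actual aiperf export.
METRIC_KEYS = {"ttft_ms": "time_to_first_token", "itl_ms": "inter_token_latency"}


def extract_latency_ms(export: dict[str, Any]) -> dict[str, float]:
    """Hypothetical helper: pull average TTFT and ITL, in ms, from a parsed export."""
    out: dict[str, float] = {}
    for name, key in METRIC_KEYS.items():
        metric = export[key]
        # Fail loudly rather than silently mixing seconds and milliseconds.
        if metric.get("unit", "ms") != "ms":
            raise ValueError(f"{key}: expected ms, got {metric['unit']!r}")
        out[name] = float(metric["avg"])
    return out


# Made-up sample document in the assumed layout.
sample = json.loads('{"time_to_first_token": {"unit": "ms", "avg": 182.5},'
                    ' "inter_token_latency": {"unit": "ms", "avg": 11.2}}')
print(extract_latency_ms(sample))
```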
sequenceDiagram
participant Prometheus as Prometheus<br/>(seconds)
participant PlannerCore as planner_core.py
participant SLA as SLA Calculator
Prometheus->>PlannerCore: observe_metrics()<br/>returns ttft, itl in seconds
PlannerCore->>PlannerCore: convert to ms<br/>ttft_ms = ttft * 1000<br/>itl_ms = itl * 1000
Note over PlannerCore: Log message updated<br/>to show .2f ms units
PlannerCore->>SLA: pass ttft_ms, itl_ms<br/>(milliseconds)
SLA->>SLA: compare against<br/>defaults (500ms, 50ms)
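The planner-side unit change in the third diagram is a multiply-by-1000 before the SLA comparison. Below is a minimal Python sketch. `check_sla` and `SlaResult` are hypothetical names, and the sketch assumes an SLA is met when the observed latency is at or below the target. Only the 500 ms / 50 ms defaults and the seconds-to-milliseconds conversion come from the diagram, and the log wording is a guess.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger(__name__)

# Default targets as stated in the diagram: TTFT 500 ms, ITL 50 ms.
DEFAULT_TTFT_SLA_MS = 500.0
DEFAULT_ITL_SLA_MS = 50.0


@dataclass
class SlaResult:
    ttft_ms: float
    itl_ms: float
    ttft_met: bool
    itl_met: bool


def check_sla(ttft_s: float, itl_s: float,
              ttft_sla_ms: float = DEFAULT_TTFT_SLA_MS,
              itl_sla_ms: float = DEFAULT_ITL_SLA_MS) -> SlaResult:
    """Hypothetical helper: convert Prometheus seconds to ms, then compare."""
    # Prometheus observations arrive in seconds; the SLA targets are in ms.
    ttft_ms = ttft_s * 1000
    itl_ms = itl_s * 1000
    # Two decimals, following the ".2f ms" note (exact message text assumed).
    logger.info("Observed ttft: %.2fms, itl: %.2fms", ttft_ms, itl_ms)
    # ASSUMPTION: "met" means observed latency <= target.
    return SlaResult(ttft_ms, itl_ms, ttft_ms <= ttft_sla_ms, itl_ms <= itl_sla_ms)


result = check_sla(ttft_s=0.25, itl_s=0.08)
print(result)
```

Converting once, where the Prometheus values enter the planner, keeps every later comparison in milliseconds.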
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Rationale: This PR involves substantial, heterogeneous changes across multiple domains.
📜 Recent review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
📒 Files selected for processing (107)
⛔ Files not processed due to max files limit (33)
Overview:
Details:
Where should the reviewer start?
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
Summary by CodeRabbit
New Features
Bug Fixes & Improvements
Documentation
Chores