feat: Add a "model" label to Component metrics #2383

tzulingk · 2025-08-08T23:44:28Z

Overview:

Add a "model" label to Component metrics.

Details:

This pull request introduces model-specific metrics by adding a model field to the Component struct and updating the metrics labeling logic.

Changes

- `Component` struct: updated to include an optional `model` field.
- `create_metrics()` function: now checks whether the metrics registry contains a model name. If present, the model name is added as a label to the generated metrics.
- Related code: all affected code paths have been modified to handle the new `model` field.
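
The optional-label behavior described above can be sketched as follows. This is a minimal illustration, not the code from this PR: the `namespace` and `name` fields, the `metric_labels` helper, and the label keys other than `model` are assumptions made for the example.

```rust
// Hypothetical stand-in for the runtime's Component; only the optional
// `model` field is taken from the PR description.
#[derive(Debug, Clone)]
struct Component {
    namespace: String,
    name: String,
    model: Option<String>,
}

impl Component {
    // Build the label set for this component's metrics.
    // The "model" label is appended only when a model name is set.
    fn metric_labels(&self) -> Vec<(String, String)> {
        let mut labels = vec![
            ("namespace".to_string(), self.namespace.clone()),
            ("component".to_string(), self.name.clone()),
        ];
        if let Some(model) = &self.model {
            labels.push(("model".to_string(), model.clone()));
        }
        labels
    }
}

fn main() {
    let with_model = Component {
        namespace: "ns".to_string(),
        name: "worker".to_string(),
        model: Some("example-model".to_string()),
    };
    let without_model = Component {
        model: None,
        ..with_model.clone()
    };

    // Three labels when a model is set, two otherwise.
    println!("{:?}", with_model.metric_labels());
    println!("{:?}", without_model.metric_labels());
}
```

Keeping `model` as an `Option` means components created without a model name produce the same label set as before.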

Where should the reviewer start?

- `lib/runtime/src/component.rs`: where `model` is added to `Component`/`Endpoint`.
- `lib/runtime/src/metrics.rs`: how the `model` label is added to the metrics.

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

DIS-360 Add a "model" label to Component metrics

Summary by CodeRabbit

Documentation
- Major updates and additions across backend and deployment documentation for SGLang, vLLM, and TensorRT-LLM, including comprehensive READMEs, Kubernetes and SLURM deployment guides, and multi-node setup instructions.
- Improved and reorganized quick start, troubleshooting, and example guides for easier onboarding and deployment.
- Enhanced feature support matrices and clarified request migration, disaggregation strategies, and configuration options.
- Numerous link corrections, content consolidations, and removal of redundant or outdated documentation sections.
- Added detailed local deployment quick start and expanded engine usage documentation.
- Introduced manual Helm deployment guide and updated operator deployment references.
New Features
- Added detailed documentation and deployment guides for new backend integrations and advanced distributed deployment patterns.
- Introduced model-aware component resolution across runtime and metrics systems, improving model-specific tracking and observability.
Bug Fixes
- Corrected documentation links and improved clarity in several guides and READMEs.
Chores
- Removed outdated or duplicate documentation files and sections for improved maintainability.
- Refactored code to support optional model name parameters in component creation and metric labeling without changing external APIs.

copy-pr-bot · 2025-08-08T23:44:31Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

rmccorm4 · 2025-08-08T23:50:49Z

The base isn't right here: there are many extra commits and a huge diff included. Please clean up the branch or start a new one with only the net new changes.

coderabbitai · 2025-08-08T23:53:17Z

Caution

Review failed

Failed to post review comments.

Walkthrough

This update introduces major enhancements and restructuring to documentation across the project, especially for backend integrations (vLLM, SGLang, TensorRT-LLM) and their deployment guides. It adds detailed feature support matrices, clarifies installation steps (notably for SGLang), consolidates and corrects documentation links, and provides new or updated guides for Kubernetes, SLURM, and multi-node deployments. Minor code changes propagate model-awareness in component and metrics handling.

Changes

| Cohort / File(s) | Change Summary |
| --- | --- |
| Top-Level and Example Documentation Updates: `README.md`, `examples/README.md`, `examples/runtime/hello_world/README.md`, `docs/examples/README.md`, `docs/examples/runtime/hello_world/README.md` | Moved and enhanced the "Framework Support Matrix", clarified framework support, improved installation instructions, removed duplication, and corrected doc links. Added new "Hello World" example documentation. |
| Backend Documentation Overhaul (vLLM, SGLang, TRTLLM): `components/backends/vllm/README.md`, `components/backends/sglang/README.md`, `components/backends/trtllm/README.md`, `docs/components/backends/vllm/README.md`, `docs/components/backends/sglang/README.md`, `docs/components/backends/trtllm/README.md` | Rewrote and expanded backend READMEs with detailed feature matrices, deployment instructions, and request migration guidance. Added or updated advanced deployment and usage documentation for each backend. |
| Kubernetes and Deployment Guides: `docs/components/backends/vllm/deploy/README.md`, `docs/components/backends/sglang/deploy/README.md`, `docs/components/backends/trtllm/deploy/README.md`, `deploy/cloud/README.md`, `deploy/inference-gateway/README.md`, `docs/guides/dynamo_deploy/README.md`, `docs/guides/dynamo_deploy/helm_install.md`, `docs/guides/dynamo_deploy/dynamo_operator.md`, `docs/guides/dynamo_deploy/quickstart.md`, `docs/guides/dynamo_deploy/gke_setup.md` | Added, corrected, or expanded deployment guides for Kubernetes, Helm, and operator-based deployments. Clarified prerequisites, secret handling, custom image usage, and troubleshooting. Added new Helm install guide and improved cloud deployment references. |
| SLURM and Multi-Node Deployment Documentation: `docs/components/backends/sglang/slurm_jobs/README.md`, `docs/components/backends/sglang/docs/dsr1-wideep-h100.md`, `docs/components/backends/sglang/docs/multinode-examples.md`, `docs/components/backends/vllm/multi-node.md`, `components/backends/sglang/slurm_jobs/README.md`, `components/backends/sglang/slurm_jobs/scripts/worker_setup.py` | Added or replaced detailed SLURM and multi-node deployment guides and scripts, especially for SGLang and vLLM. Updated worker setup logic for SLURM jobs and corrected references to advanced multinode examples. |
| Metrics and Model-Aware Component Propagation: `lib/runtime/src/component.rs`, `lib/runtime/src/metrics.rs`, `lib/bindings/python/rust/lib.rs`, `lib/llm/src/discovery/watcher.rs`, `lib/llm/src/entrypoint/input/endpoint.rs`, `lib/llm/src/mocker/engine.rs`, `components/metrics/src/lib.rs`, `components/metrics/src/main.rs`, `components/metrics/src/bin/mock_worker.rs`, `components/router/src/main.rs`, `launch/dynamo-run/src/lib.rs`, `lib/bindings/python/rust/llm/entrypoint.rs`, `components/backends/vllm/src/dynamo/vllm/main.py`, `lib/bindings/c/src/lib.rs` | Added optional `model` parameter to components, updated metrics registry and CLI config to propagate model-awareness, and updated all relevant code paths and bindings to support model-aware metrics labeling and component resolution. |
| Container and Installation Script Updates: `container/Dockerfile.sglang`, `container/Dockerfile.sglang-wideep`, `container/Dockerfile.tensorrt_llm`, `container/deps/vllm/install_vllm.sh` | Improved Dockerfiles and install scripts: clarified and fixed Python package installation (notably for SGLang and TRTLLM), added explicit pre-release FlashInfer install, pinned PyTorch version for vLLM, and improved CUDA copying. |
| Reference and Navigation Fixes: `components/README.md`, `components/backends/vllm/deploy/README.md`, `components/backends/trtllm/deploy/README.md`, `docs/components/backends/llm/README.md`, `docs/components/backends/sglang/docs/sgl-http-server.md`, `docs/components/backends/trtllm/kv-cache-tranfer.md`, `docs/components/backends/trtllm/llama4_plus_eagle.md`, `benchmarks/llm/README.md`, `examples/basics/quickstart/README.md`, `examples/basics/disaggregated_serving/README.md`, `examples/basics/multinode/README.md`, `examples/deployments/EKS/Deploy_VLLM_example.md`, `components/backends/trtllm/README.md~HEAD` | Fixed, updated, or removed redundant or broken references and navigation links across various documentation and example files. |
| Docs Index and Toctree Restructuring: `docs/index.rst`, `docs/hidden_toctree.rst` | Reorganized documentation index and hidden toctree, added new guides and references, improved quick start and example navigation, and included more files in the Sphinx build without exposing them in the main navigation. |
| API and Architecture Docs: `docs/API/nixl_connect/README.md`, `docs/API/nixl_connect/connector.md`, `docs/architecture/dynamo_flow.md`, `docs/architecture/kv_cache_routing.md`, `docs/architecture/planner_intro.rst`, `docs/runtime/README.md` | Updated or removed examples, clarified technical details, corrected links, and improved descriptions in API and architecture documentation. |

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Docs
    participant Backend (vLLM/SGLang/TRTLLM)
    participant Metrics
    participant DeployTool (K8s/SLURM/Helm)

    User->>Docs: Reads feature matrix, install, and deployment guides
    User->>DeployTool: Follows deployment instructions (K8s/SLURM/Helm)
    DeployTool->>Backend: Launches backend with specified model/config
    Backend->>Metrics: Reports metrics with model-aware labels
    User->>Backend: Sends inference requests
    Backend->>User: Returns results
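
To make the "Reports metrics with model-aware labels" step concrete, here is a rough sketch of how a single sample with a `model` label could look in the Prometheus text exposition format. The `render_sample` and `escape_label_value` helpers and the metric name are hypothetical and are not taken from this PR; only the escaping rules (backslash, double quote, newline) follow the Prometheus text format.

```rust
// Escape a label value per the Prometheus text exposition format:
// backslash, double quote, and newline must be escaped.
fn escape_label_value(v: &str) -> String {
    v.replace('\\', "\\\\")
        .replace('"', "\\\"")
        .replace('\n', "\\n")
}

// Render one sample line such as: name{key="value",...} 3
// Hypothetical helper for illustration only.
fn render_sample(name: &str, labels: &[(&str, &str)], value: f64) -> String {
    if labels.is_empty() {
        return format!("{name} {value}");
    }
    let body = labels
        .iter()
        .map(|(k, v)| format!("{k}=\"{}\"", escape_label_value(v)))
        .collect::<Vec<_>>()
        .join(",");
    format!("{name}{{{body}}} {value}")
}

fn main() {
    // Hypothetical metric name and label values.
    let line = render_sample(
        "requests_total",
        &[("component", "worker"), ("model", "example-model")],
        3.0,
    );
    println!("{line}");
}
```

With a per-model label in place, a dashboard or alert can filter or aggregate the same metric by model rather than only by component.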

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

chore: fix QA bugs in documentation/readmes #2199: Also updates the README.md to move and enhance the "Framework Support Matrix" and SGLang installation steps—overlapping documentation changes.
feat: add sgl deploy readme #2238: Adds and updates SGLang deploy README with Kubernetes deployment documentation, closely related to new and updated deployment docs in this PR.
fix: doc links #2309: Fixes broken documentation links for backend deployment examples, directly related to the documentation link corrections and restructuring in this PR.

Poem

In burrows deep, I hop and write,
New docs and guides to shed some light.
With matrices clear and links anew,
Deploying backends is easy to do!
From vLLM to SGLang’s might,
Kubernetes or SLURM—your path is right.
🐇✨ Happy reading, day or night!

ishandhanani and others added 22 commits August 8, 2025 16:17

chore: fix install (#2191)

a8dd326

Co-authored-by: Anant Sharma <[email protected]> Co-authored-by: Ishan Dhanani <[email protected]>

chore: fix QA bugs in documentation/readmes (#2199)

79e6711

fix(sglang): disagg yaml worker change and agg kv router fix (#2205)

9320d68

Co-authored-by: Dmitry Tokarev <[email protected]>

chore: cleanup dead links (#2208)

1c9c7d3

Co-authored-by: Dmitry Tokarev <[email protected]>

chore: Remove multimodal readme. (#2212) (#2234)

a6d48bd

fix: drop cuda graph bs (batch size) on dsr1 h100 sgl (#2235)

44cbf88

fix: Locked triton==3.3.1 since triton 3.4.0 breaks tensorrt-llm 1.0.…

a57bade

…0rc4 (#2233)

fix: sgl instructions point to new frontend (#2245)

95c8b58

fix: readme instruction (#2265)

bfe2808

docs: Backport: Dyn 591 (#2247) to 0.4.0 (#2251)

e2552ed

Signed-off-by: Anish <[email protected]> Co-authored-by: Anish <[email protected]>

fix: trtllm container - ENV var used before declaration (#2277)

9af0a01

docs: add instruction to deploy model with inference gateway #2257 (#…

d60af96

…2260) Signed-off-by: Biswa Panda <[email protected]>

fix: fix broken doc links (#2308)

c948f1d

fix: Copy cuda libraries from devel to runtime stage (#2298)

add5fa8

docs: update deploy readme (#2306)

f8b95fd

fix: Add common and test dependencies to sglang runtime build (#2279) (…

3f7c7a7

…#2322)

fix: Backport/anish index rst into 0.4.0 - fix links in docs and more (…

741496e

…#2319) Signed-off-by: Anish <[email protected]> Co-authored-by: Kristen Kelleher <[email protected]> Co-authored-by: Biswa Panda <[email protected]> Co-authored-by: Neal Vaidya <[email protected]>

docs: Final fixes to links reported by QA (#2334)

b4be3c2

Signed-off-by: Anish <[email protected]>

docs: address sphinx build errors for docs.nvidia.com (#2346)

2ed36b8

Signed-off-by: Anish <[email protected]>

docs: Address vincent issue with trtllm symlink (#2351)

2846f9e

Pinned PyTorch version

b4a3cb3

Add model label to Component

59a2005

tzulingk requested review from a team, hutm, ishandhanani, nealvaidya, nnshah1 and whoisj as code owners August 8, 2025 23:44

github-actions bot added the feat label Aug 8, 2025

pull-request-size bot added the size/XXL label Aug 8, 2025

Use ModelDeploymentCard.slug() for model name. ModelDeploymentCard.se…

6c95b2b

…rvice_name has been removed in PR#2349

tzulingk closed this Aug 9, 2025

11 participants
