@tzulingk tzulingk commented Aug 8, 2025

Overview:

Add a "model" label to Component metrics.

Details:

This pull request introduces model-specific metrics by adding a model field to the Component struct and updating the metrics labeling logic.

Changes

  • Component Struct: The Component struct has been updated to include an optional model field.
  • create_metrics() Function: This function now checks if the metrics registry contains a model name. If present, the model name is added as a label to the generated metrics.
  • Related Code: All affected code paths have been modified to handle the new model field.
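The labeling rule described above can be sketched in a few lines of Rust. This is a minimal, hypothetical stand-in and not the code from this PR: the field names `namespace` and `name`, the label keys `namespace` and `component`, and the `metric_labels` helper are all assumptions made for illustration. Only the optional `model` field and the "add the label when a model name is present" behavior come from the description.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified stand-in for the runtime's `Component`.
#[derive(Debug, Clone)]
pub struct Component {
    pub namespace: String,
    pub name: String,
    /// Optional model name; `None` means no `model` label is emitted.
    pub model: Option<String>,
}

impl Component {
    /// Convenience constructor (assumed for this sketch).
    pub fn new(namespace: &str, name: &str, model: Option<&str>) -> Self {
        Self {
            namespace: namespace.to_string(),
            name: name.to_string(),
            model: model.map(str::to_string),
        }
    }

    /// Build the label set for this component's metrics.
    /// The `model` label is added only when a model name is set.
    pub fn metric_labels(&self) -> BTreeMap<String, String> {
        let mut labels = BTreeMap::new();
        labels.insert("namespace".to_string(), self.namespace.clone());
        labels.insert("component".to_string(), self.name.clone());
        if let Some(model) = &self.model {
            labels.insert("model".to_string(), model.clone());
        }
        labels
    }
}
```

Keeping `model` as an `Option<String>` means components created without a model name produce the same label set as before, which matches the stated goal of not changing external APIs.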

Where should the reviewer start?

lib/runtime/src/component.rs: adds the model field to Component/Endpoint.
lib/runtime/src/metrics.rs: shows how the model label is added to the metrics.
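
For orientation before reading those two files, here is one way a model name could flow from a component to its endpoints and into an exported sample. It is a sketch under assumptions, not the implementation in this PR: the `endpoint` constructor, the `render_sample` helper, the `endpoint` label key, and the metric name used in the usage note are hypothetical. The output follows the Prometheus text exposition shape `name{label="value"} value`.

```rust
/// Hypothetical, simplified component carrying an optional model name.
#[derive(Debug, Clone)]
pub struct Component {
    pub name: String,
    pub model: Option<String>,
}

/// Hypothetical endpoint that inherits the model from its parent component.
#[derive(Debug, Clone)]
pub struct Endpoint {
    pub name: String,
    pub model: Option<String>,
}

impl Component {
    /// Create an endpoint, copying the component's model name (assumed behavior).
    pub fn endpoint(&self, name: &str) -> Endpoint {
        Endpoint {
            name: name.to_string(),
            model: self.model.clone(),
        }
    }
}

/// Render one sample line in Prometheus text format.
/// Label values are not escaped here; a real exporter would need to do that.
pub fn render_sample(metric: &str, endpoint: &Endpoint, value: u64) -> String {
    let mut labels = vec![format!("endpoint=\"{}\"", endpoint.name)];
    if let Some(model) = &endpoint.model {
        labels.push(format!("model=\"{}\"", model));
    }
    format!("{}{{{}}} {}", metric, labels.join(","), value)
}
```

With a component whose model is `example-model`, calling `render_sample("requests_total", &component.endpoint("generate"), 7)` yields a line carrying both the `endpoint` and `model` labels. With `model: None`, only the `endpoint` label appears.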

Related Issues:

DIS-360 Add a "model" label to Component metrics

Summary by CodeRabbit

  • Documentation
    • Major updates and additions across backend and deployment documentation for SGLang, vLLM, and TensorRT-LLM, including comprehensive READMEs, Kubernetes and SLURM deployment guides, and multi-node setup instructions.
    • Improved and reorganized quick start, troubleshooting, and example guides for easier onboarding and deployment.
    • Enhanced feature support matrices and clarified request migration, disaggregation strategies, and configuration options.
    • Numerous link corrections, content consolidations, and removal of redundant or outdated documentation sections.
    • Added detailed local deployment quick start and expanded engine usage documentation.
    • Introduced manual Helm deployment guide and updated operator deployment references.
  • New Features
    • Added detailed documentation and deployment guides for new backend integrations and advanced distributed deployment patterns.
    • Introduced model-aware component resolution across runtime and metrics systems, improving model-specific tracking and observability.
  • Bug Fixes
    • Corrected documentation links and improved clarity in several guides and READMEs.
  • Chores
    • Removed outdated or duplicate documentation files and sections for improved maintainability.
    • Refactored code to support optional model name parameters in component creation and metric labeling without changing external APIs.

ishandhanani and others added 22 commits August 8, 2025 16:17
Co-authored-by: Anant Sharma <[email protected]>
Co-authored-by: Ishan Dhanani <[email protected]>
…#2319)

Signed-off-by: Anish <[email protected]>
Co-authored-by: Kristen Kelleher <[email protected]>
Co-authored-by: Biswa Panda <[email protected]>
Co-authored-by: Neal Vaidya <[email protected]>
copy-pr-bot bot commented Aug 8, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

rmccorm4 commented Aug 8, 2025

The base isn't right here: there are many extra commits and a huge diff included. Please clean up the branch or start a new one with only the net new changes.

coderabbitai bot commented Aug 8, 2025

Caution

Review failed

Failed to post review comments.

Walkthrough

This update introduces major enhancements and restructuring to documentation across the project, especially for backend integrations (vLLM, SGLang, TensorRT-LLM) and their deployment guides. It adds detailed feature support matrices, clarifies installation steps (notably for SGLang), consolidates and corrects documentation links, and provides new or updated guides for Kubernetes, SLURM, and multi-node deployments. Minor code changes propagate model-awareness in component and metrics handling.

Changes

| Cohort / File(s) | Change Summary |
| --- | --- |
| Top-Level and Example Documentation Updates: README.md, examples/README.md, examples/runtime/hello_world/README.md, docs/examples/README.md, docs/examples/runtime/hello_world/README.md | Moved and enhanced the "Framework Support Matrix", clarified framework support, improved installation instructions, removed duplication, and fixed/corrected doc links. Added new "Hello World" example documentation. |
| Backend Documentation Overhaul (vLLM, SGLang, TRTLLM): components/backends/vllm/README.md, components/backends/sglang/README.md, components/backends/trtllm/README.md, docs/components/backends/vllm/README.md, docs/components/backends/sglang/README.md, docs/components/backends/trtllm/README.md | Rewrote and expanded backend READMEs with detailed feature matrices, deployment instructions, and request migration guidance. Added or updated advanced deployment and usage documentation for each backend. |
| Kubernetes and Deployment Guides: docs/components/backends/vllm/deploy/README.md, docs/components/backends/sglang/deploy/README.md, docs/components/backends/trtllm/deploy/README.md, deploy/cloud/README.md, deploy/inference-gateway/README.md, docs/guides/dynamo_deploy/README.md, docs/guides/dynamo_deploy/helm_install.md, docs/guides/dynamo_deploy/dynamo_operator.md, docs/guides/dynamo_deploy/quickstart.md, docs/guides/dynamo_deploy/gke_setup.md | Added, corrected, or expanded deployment guides for Kubernetes, Helm, and operator-based deployments. Clarified prerequisites, secret handling, custom image usage, and troubleshooting. Added new Helm install guide and improved cloud deployment references. |
| SLURM and Multi-Node Deployment Documentation: docs/components/backends/sglang/slurm_jobs/README.md, docs/components/backends/sglang/docs/dsr1-wideep-h100.md, docs/components/backends/sglang/docs/multinode-examples.md, docs/components/backends/vllm/multi-node.md, components/backends/sglang/slurm_jobs/README.md, components/backends/sglang/slurm_jobs/scripts/worker_setup.py | Added or replaced detailed SLURM and multi-node deployment guides and scripts, especially for SGLang and vLLM. Updated worker setup logic for SLURM jobs and corrected references to advanced multinode examples. |
| Metrics and Model-Aware Component Propagation: lib/runtime/src/component.rs, lib/runtime/src/metrics.rs, lib/bindings/python/rust/lib.rs, lib/llm/src/discovery/watcher.rs, lib/llm/src/entrypoint/input/endpoint.rs, lib/llm/src/mocker/engine.rs, components/metrics/src/lib.rs, components/metrics/src/main.rs, components/metrics/src/bin/mock_worker.rs, components/router/src/main.rs, launch/dynamo-run/src/lib.rs, lib/bindings/python/rust/llm/entrypoint.rs, components/backends/vllm/src/dynamo/vllm/main.py, lib/bindings/c/src/lib.rs | Added optional model parameter to components, updated metrics registry and CLI config to propagate model-awareness, and updated all relevant code paths and bindings to support model-aware metrics labeling and component resolution. |
| Container and Installation Script Updates: container/Dockerfile.sglang, container/Dockerfile.sglang-wideep, container/Dockerfile.tensorrt_llm, container/deps/vllm/install_vllm.sh | Improved Dockerfiles and install scripts: clarified and fixed Python package installation (notably for SGLang and TRTLLM), added explicit pre-release FlashInfer install, pinned PyTorch version for vLLM, and improved CUDA copying. |
| Reference and Navigation Fixes: components/README.md, components/backends/vllm/deploy/README.md, components/backends/trtllm/deploy/README.md, docs/components/backends/llm/README.md, docs/components/backends/sglang/docs/sgl-http-server.md, docs/components/backends/trtllm/kv-cache-tranfer.md, docs/components/backends/trtllm/llama4_plus_eagle.md, benchmarks/llm/README.md, examples/basics/quickstart/README.md, examples/basics/disaggregated_serving/README.md, examples/basics/multinode/README.md, examples/deployments/EKS/Deploy_VLLM_example.md, components/backends/trtllm/README.md~HEAD | Fixed, updated, or removed redundant or broken references and navigation links across various documentation and example files. |
| Docs Index and Toctree Restructuring: docs/index.rst, docs/hidden_toctree.rst | Reorganized documentation index and hidden toctree, added new guides and references, improved quick start and example navigation, and included more files in the Sphinx build without exposing them in the main navigation. |
| API and Architecture Docs: docs/API/nixl_connect/README.md, docs/API/nixl_connect/connector.md, docs/architecture/dynamo_flow.md, docs/architecture/kv_cache_routing.md, docs/architecture/planner_intro.rst, docs/runtime/README.md | Updated or removed examples, clarified technical details, corrected links, and improved descriptions in API and architecture documentation. |

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Docs
    participant Backend (vLLM/SGLang/TRTLLM)
    participant Metrics
    participant DeployTool (K8s/SLURM/Helm)

    User->>Docs: Reads feature matrix, install, and deployment guides
    User->>DeployTool: Follows deployment instructions (K8s/SLURM/Helm)
    DeployTool->>Backend: Launches backend with specified model/config
    Backend->>Metrics: Reports metrics with model-aware labels
    User->>Backend: Sends inference requests
    Backend->>User: Returns results

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • chore: fix QA bugs in documentation/readmes #2199: Also updates the README.md to move and enhance the "Framework Support Matrix" and SGLang installation steps—overlapping documentation changes.
  • feat: add sgl deploy readme #2238: Adds and updates SGLang deploy README with Kubernetes deployment documentation, closely related to new and updated deployment docs in this PR.
  • fix: doc links #2309: Fixes broken documentation links for backend deployment examples, directly related to the documentation link corrections and restructuring in this PR.

Poem

In burrows deep, I hop and write,
New docs and guides to shed some light.
With matrices clear and links anew,
Deploying backends is easy to do!
From vLLM to SGLang’s might,
Kubernetes or SLURM—your path is right.
🐇✨ Happy reading, day or night!

@tzulingk tzulingk closed this Aug 9, 2025