Skip to content

Conversation

@keivenchang
Copy link
Contributor

@keivenchang keivenchang commented Jul 30, 2025

Overview:

Standardizes metric naming convention by adding dynamo_component_ prefix to all metric names and updating labels to use dynamo_ prefixed versions to avoid Kubernetes collisions.

Details:

  • Metric Naming: All metrics now consistently use dynamo_component_ prefix (e.g., dynamo_component_uptime_seconds)
  • Label Standardization: Updated labels from namespace, component, endpoint to dynamo_namespace, dynamo_component, dynamo_endpoint to avoid Kubernetes label collisions
  • Documentation Updates: Updated system_metrics README to reflect new naming convention and changed "Work Handlers" to "Request Handlers" terminology
  • Test Fixes: Updated test expectations to match new metric format and fixed HTTP server test assertions
  • Code Simplification: Simplified build_metric_name function to always use dynamo_component_ prefix

Where should the reviewer start?

  • lib/runtime/src/metrics.rs - Core metric naming logic and label handling
  • lib/runtime/src/http_server.rs - HTTP server metric test updates
  • lib/runtime/examples/system_metrics/README.md - Documentation updates and terminology changes

Related Issues:

Relates to DIS-339

Summary by CodeRabbit

  • Documentation

    • Updated documentation and examples to reflect new naming conventions for metrics and labels, including prefixing with "dynamo_" and terminology changes from "work handlers" to "request handlers".
  • Refactor

    • Standardized all metric names and labels to use the "dynamo_component_" and "dynamo_" prefixes for consistency and to avoid conflicts with other monitoring systems.
    • Updated internal logic for metric name and label construction to enforce the new prefixing scheme.
  • Bug Fixes

    • Corrected test header construction and error handling in integration tests.

@keivenchang keivenchang requested a review from a team as a code owner July 30, 2025 00:33
@copy-pr-bot
Copy link

copy-pr-bot bot commented Jul 30, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jul 30, 2025

Walkthrough

The changes standardize metric naming and labeling conventions throughout the documentation, implementation, and tests. All metrics now use a "dynamo_" prefix for both names and labels to avoid conflicts with Kubernetes and other systems. Related documentation, code, and tests were updated to reflect this unified scheme, with minor improvements to test syntax and comments.

Changes

Cohort / File(s) Change Summary
Documentation & Examples
lib/runtime/examples/system_metrics/README.md
Updated all references from "work handlers" to "request handlers." Changed metric system and label names to use "dynamo_" prefix. Updated example outputs and curl commands. Clarified documentation to reflect new naming conventions.
HTTP Server Implementation & Tests
lib/runtime/src/http_server.rs
Changed uptime gauge metric name and label to use new "dynamo_" prefix. Updated test assertions and HTTP header construction for correctness. Removed outdated comments. No changes to core logic or public APIs.
Metrics Registry & Tests
lib/runtime/src/metrics.rs
Unified metric naming to always use "dynamo_component_" prefix. Changed auto-labeling to use "dynamo_"-prefixed labels. Updated validation to prevent label collisions. Adjusted function signatures and documentation. Modified tests to expect new conventions and corrected a test comment.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~15 minutes

Poem

In fields of code where metrics grow,
Prefixes bloom—"dynamo" in tow!
Labels and names, all neat and bright,
No more collisions in the night.
With tests and docs now in accord,
This bunny hops and gives reward:
A tidy patch, metrics restored! 🐇✨

Note

⚡️ Unit Test Generation is now available in beta!

Learn more here, or try it out under "Finishing Touches" below.


📜 Recent review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between da0c572 and f5a08a4.

📒 Files selected for processing (3)
  • lib/runtime/examples/system_metrics/README.md (6 hunks)
  • lib/runtime/src/http_server.rs (3 hunks)
  • lib/runtime/src/metrics.rs (11 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: oandreeva-nv
PR: ai-dynamo/dynamo#1195
File: lib/llm/tests/block_manager.rs:260-275
Timestamp: 2025-05-29T21:46:45.112Z
Learning: The team prefers to evaluate performance-related changes like channel capacity limits during dedicated performance testing rather than implementing them immediately, especially for components like DynamoEventManager where the performance impact needs careful measurement.
Learnt from: biswapanda
PR: ai-dynamo/dynamo#1412
File: lib/bindings/python/src/dynamo/runtime/logging.py:100-100
Timestamp: 2025-06-06T21:48:35.214Z
Learning: In the Dynamo codebase, BentoML has been completely removed from all executable code, with only documentation and attribution references remaining. The error_loggers configuration in lib/bindings/python/src/dynamo/runtime/logging.py should not include "bentoml" since those modules no longer exist.
lib/runtime/src/http_server.rs (1)

Learnt from: ryanolson
PR: #1919
File: lib/runtime/src/engine.rs:168-168
Timestamp: 2025-07-14T21:25:56.930Z
Learning: The AsyncEngineContextProvider trait in lib/runtime/src/engine.rs was intentionally changed from Send + Sync + Debug to Send + Debug because the Sync bound was overly constraining. The trait should only require Send + Debug as designed.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: pre-merge-rust (lib/runtime/examples)
  • GitHub Check: pre-merge-rust (lib/bindings/python)
  • GitHub Check: pre-merge-rust (.)
  • GitHub Check: Build and Test - vllm
🔇 Additional comments (13)
lib/runtime/src/http_server.rs (3)

101-101: LGTM: Correct alignment with new metric naming convention.

The change from "dynamo_uptime_seconds" to "uptime_seconds" is correct. The build_metric_name function will now automatically prepend "dynamo_component_" to create the final metric name "dynamo_component_uptime_seconds".


369-372: LGTM: Test expectations correctly updated for new naming scheme.

The integration test expectations have been properly updated to reflect:

  • Metric name: dynamo_component_uptime_seconds (with the new prefix)
  • Label key: dynamo_namespace (prefixed to avoid Kubernetes collisions)

These changes align perfectly with the standardized naming conventions.


446-447: LGTM: Correct syntax fix for HTTP header construction.

The HTTP header construction has been fixed with proper Rust syntax:

  • Using :: instead of incorrect method calls on types
  • Using .unwrap() instead of ? for error propagation, which is appropriate in test code
lib/runtime/examples/system_metrics/README.md (4)

1-1: LGTM: Improved terminology for clarity.

The change from "work handlers" to "request handlers" improves clarity and uses more standard terminology in the web/API context.

Also applies to: 3-3, 7-7, 18-18, 38-38, 44-44


32-32: LGTM: Comprehensive metric naming standardization.

All metric names have been systematically updated to use the "dynamo_component_" prefix and labels now use "dynamo_" prefixes to avoid collisions with Kubernetes labels. The documentation accurately reflects the new naming conventions.

Also applies to: 47-50, 53-53, 62-62, 65-65, 68-68, 72-76


83-119: LGTM: Example Prometheus output correctly updated.

The example metrics output has been thoroughly updated to reflect:

  • All metric names with dynamo_component_ prefix
  • All labels with dynamo_ prefix (dynamo_namespace, dynamo_component, dynamo_endpoint)
  • Consistent formatting and proper Prometheus syntax

213-213: LGTM: Curl commands updated for new naming scheme.

The curl commands have been correctly updated to grep for "dynamo_component" and "dynamo_frontend" patterns to match the new metric naming conventions.

Also applies to: 216-216

lib/runtime/src/metrics.rs (6)

35-37: LGTM: Core metric naming standardization implemented correctly.

The build_metric_name function now unconditionally prepends "dynamo_component_" to all metric names, which is the key implementation of the standardized naming convention to avoid Kubernetes label collisions.


19-21: LGTM: Documentation updated to reflect new naming conventions.

The module documentation and comments have been properly updated to describe:

  • The universal "dynamo_component_" prefix for metric names
  • The "dynamo_" prefix for auto-generated labels
  • The rationale of avoiding collisions with Kubernetes and other monitoring systems

Also applies to: 28-30


224-228: LGTM: Auto-label system correctly updated for prefixed labels.

The auto-labeling logic has been properly updated to:

  • Validate against the new "dynamo_" prefixed reserved label keys
  • Generate auto-labels with "dynamo_namespace", "dynamo_component", and "dynamo_endpoint" keys
  • Maintain the same hierarchy logic while using the new prefixed keys

This prevents conflicts with user-provided labels and ensures consistency.

Also applies to: 234-257


556-562: LGTM: Test updated to validate new metric naming.

The test for build_metric_name has been correctly updated to expect the "dynamo_component_" prefix in all metric names.


811-815: LGTM: Test comment clarification for namespace naming.

The comment has been correctly updated to clarify that hyphens are allowed in namespace names, as they are sanitized by the lint_prometheus_name function.


864-867: LGTM: Integration tests comprehensively updated for new naming scheme.

All integration test expected outputs have been systematically updated to reflect:

  • "dynamo_component_" prefixed metric names
  • "dynamo_" prefixed auto-generated labels
  • Consistent formatting and proper test validation

The tests will now correctly validate the standardized metric naming and labeling conventions.

Also applies to: 892-897, 922-931, 998-1027


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@rmccorm4 rmccorm4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!

Copy link
Contributor

@nnshah1 nnshah1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - we can address documentation in the guide PR

Copy link
Contributor Author

@keivenchang keivenchang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New PR coming up

@keivenchang keivenchang force-pushed the keivenchang/DIS-339/fix-component-metric-names branch from a8d2018 to 20bb59b Compare August 1, 2025 00:29
@keivenchang keivenchang force-pushed the keivenchang/DIS-339/fix-component-metric-names branch from 7abaa92 to 36a7e0b Compare August 1, 2025 01:34
@keivenchang keivenchang merged commit efd863d into main Aug 1, 2025
10 checks passed
@keivenchang keivenchang deleted the keivenchang/DIS-339/fix-component-metric-names branch August 1, 2025 02:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants