Conversation

@tedzhouhk (Contributor) commented May 31, 2025

Add the following metrics to the Rust frontend (stream mode only):

  • ISL (input sequence length)
  • OSL (output sequence length)
  • TTFT (time to first token)
  • ITL (inter-token latency)

Special thanks to @jthomson04 for the help with the Rust implementation!

Summary by CodeRabbit

  • New Features
    • Added detailed token usage metrics to streaming responses, including per-chunk tokens, input tokens, and cumulative output tokens.
    • Enhanced API responses to include richer metadata about token counts for better usage insights.
    • Introduced new metrics for input/output sequence length, time to first token, and inter-token latency, improving observability and monitoring.
    • Added input sequence length retrieval method to token generators for improved token count tracking.
    • Extended annotated response structures with new optional token count fields.
    • Improved event processing in streaming endpoints to update response metrics in real time.
  • Bug Fixes
    • Improved thread safety and reliability in metrics tracking during streaming responses.
  • Documentation
    • Updated API to reflect new optional token-related fields in responses.

copy-pr-bot bot commented May 31, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai bot commented May 31, 2025

Walkthrough

This update introduces new optional token-related metadata fields—chunk_tokens, input_tokens, and output_tokens—to the Annotated struct and propagates them throughout the LLM pipeline, including engines, protocol codecs, preprocessors, and HTTP service metrics. It also adds new Prometheus metrics and thread-safe handling for inflight metric tracking, with event processing updated to record token-level metrics.
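
For orientation, here is a minimal sketch of what the extended Annotated wrapper might look like. Only the three token fields come from this PR's description; the payload field, derives, and serde attributes shown here are assumptions for illustration, and the real struct carries additional fields omitted here.

use serde::{Deserialize, Serialize};

// Hypothetical sketch of the extended wrapper; not the verbatim source.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Annotated<R> {
    // The wrapped response payload, if any.
    pub data: Option<R>,

    // New optional token metadata; defaulting to None keeps the change
    // non-breaking for existing producers and consumers.
    #[serde(skip_serializing_if = "Option::is_none")]
    pub chunk_tokens: Option<u32>,
    #[serde(skip_serializing_if = "Option::is_none")]
    pub input_tokens: Option<u32>,
    #[serde(skip_serializing_if = "Option::is_none")]
    pub output_tokens: Option<u32>,
}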

Changes

  • lib/runtime/src/protocols/annotated.rs: Added chunk_tokens, input_tokens, and output_tokens fields to Annotated<R>; updated constructors and methods to handle them.
  • lib/llm/src/protocols/codec.rs: Set the new token fields to None in the MessageAnnotated<T> conversion.
  • lib/llm/src/protocols/openai.rs, lib/llm/src/protocols/openai/chat_completions/delta.rs, lib/llm/src/protocols/openai/completions/delta.rs: Added a get_isl() method to the DeltaGeneratorExt trait and its implementations for prompt token count retrieval.
  • lib/llm/src/protocols/openai/chat_completions/aggregator.rs, lib/llm/src/protocols/openai/completions/aggregator.rs: Added the new token fields to test data in Annotated instances.
  • lib/engines/mistralrs/src/lib.rs, lib/llm/src/engines.rs: Set the new token fields to None in Annotated construction in the engine implementations.
  • lib/llm/src/preprocessor.rs: Tracks per-chunk and cumulative output tokens in OpenAIPreprocessor; attaches token metrics to responses.
  • lib/llm/src/http/service/metrics.rs: Added Prometheus histograms for input/output sequence length, TTFT, and ITL; added ResponseMetricCollector for detailed tracking; updated InflightGuard to lowercase model names.
  • lib/llm/src/http/service/openai.rs: Refactored event processing to use a process_event_converter function that updates response metrics; replaced TryFrom<EventConverter>; made InflightGuard mutable and thread-safe.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant HTTP_Service
    participant Preprocessor
    participant Engine
    participant Metrics

    Client->>HTTP_Service: Sends completion/chat request
    HTTP_Service->>Preprocessor: Forwards request
    Preprocessor->>Engine: Sends processed request
    Engine-->>Preprocessor: Streams Annotated response chunks (with token fields)
    Preprocessor-->>HTTP_Service: Streams Annotated responses (with token fields)
    HTTP_Service->>Metrics: Updates TTFT, ITL, input/output token metrics
    HTTP_Service-->>Client: Streams SSE events

Poem

In fields of code where tokens grow,
Three new seeds now softly sow—
Chunk, input, output, counted true,
Metrics bloom where once were few.
Thread-safe bunnies guard the gate,
Prometheus watches, annotate!
Hop along, the data flows anew.

((\
( -.-)
o_(")(")


📜 Recent review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c1dc774 and 664bb54.

📒 Files selected for processing (1)
  • lib/llm/src/http/service/openai.rs (12 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • lib/llm/src/http/service/openai.rs
⏰ Context from checks skipped due to timeout of 90000ms (4)
  • GitHub Check: pre-merge-rust (.)
  • GitHub Check: pre-merge-rust (lib/runtime/examples)
  • GitHub Check: pre-merge-rust (lib/bindings/python)
  • GitHub Check: Build and Test - vllm

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve to resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
lib/llm/src/http/service/metrics.rs (1)

128-177: Review bucket configurations for production appropriateness.

The histogram bucket configurations look reasonable, but consider whether they align with the token ranges and latencies expected in your production environment.

Consider validating these bucket ranges against expected production metrics:

  • ISL buckets: 0-128K tokens might be appropriate for most use cases
  • OSL buckets: 0-32K tokens seems reasonable for typical responses
  • TTFT buckets: 0-480 seconds covers a wide range
  • ITL buckets: 0-2 seconds should capture most inter-token delays

You may want to adjust based on your specific model performance characteristics.
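
If the ranges ever need tuning, bucket boundaries are supplied when the histogram is registered. Below is a hedged sketch using the prometheus crate; the metric name, help text, label, and bucket values are illustrative assumptions, not the ones used in this PR.

use prometheus::{exponential_buckets, HistogramOpts, HistogramVec, Registry};

fn register_isl_histogram(registry: &Registry) -> prometheus::Result<HistogramVec> {
    // Illustrative buckets: powers of two from 1 up to 131072 (128K) tokens.
    let buckets = exponential_buckets(1.0, 2.0, 18)?;
    let opts = HistogramOpts::new(
        "input_sequence_tokens", // hypothetical metric name
        "Input sequence length in tokens per request",
    )
    .buckets(buckets);
    let histogram = HistogramVec::new(opts, &["model"])?;
    registry.register(Box::new(histogram.clone()))?;
    Ok(histogram)
}

Swapping the bucket vector here (or in the PR's equivalent metric constructors) is all that is needed to retune histogram resolution for a different workload.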

lib/llm/src/http/service/openai.rs (1)

447-489: Consider potential performance impact of frequent mutex locking.

The process_event_converter function correctly handles event conversion and metrics updates, but consider the performance impact of locking the mutex on every event.

If performance becomes an issue with high-throughput streams, consider:

  • Batching metrics updates
  • Using lock-free atomic operations for simple counters
  • Measuring the actual impact before optimizing

The current implementation prioritizes correctness and is likely acceptable for most use cases.
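
To illustrate the lock-free option mentioned above, simple counters can be accumulated in atomics on the hot path and folded into Prometheus when the stream ends. This is a sketch of the idea only, not code from the PR; latency histograms (TTFT/ITL) still need timing state, so this helps only the counter-style metrics.

use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

// Hypothetical lock-free accumulator for per-stream counters.
#[derive(Default)]
struct StreamCounters {
    output_tokens: AtomicU64,
    chunks: AtomicU64,
}

impl StreamCounters {
    // Called from the per-event hot path; no mutex required.
    fn record_chunk(&self, tokens: u64) {
        self.output_tokens.fetch_add(tokens, Ordering::Relaxed);
        self.chunks.fetch_add(1, Ordering::Relaxed);
    }

    // Called once when the stream ends to publish the totals.
    fn totals(&self) -> (u64, u64) {
        (
            self.output_tokens.load(Ordering::Relaxed),
            self.chunks.load(Ordering::Relaxed),
        )
    }
}

fn main() {
    let counters = Arc::new(StreamCounters::default());
    counters.record_chunk(3);
    counters.record_chunk(5);
    let (tokens, chunks) = counters.totals();
    println!("tokens={tokens}, chunks={chunks}");
}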

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c939da0 and a0d2711.

📒 Files selected for processing (12)
  • lib/engines/mistralrs/src/lib.rs (2 hunks)
  • lib/llm/src/engines.rs (2 hunks)
  • lib/llm/src/http/service/metrics.rs (7 hunks)
  • lib/llm/src/http/service/openai.rs (10 hunks)
  • lib/llm/src/preprocessor.rs (3 hunks)
  • lib/llm/src/protocols/codec.rs (1 hunks)
  • lib/llm/src/protocols/openai.rs (1 hunks)
  • lib/llm/src/protocols/openai/chat_completions/aggregator.rs (2 hunks)
  • lib/llm/src/protocols/openai/chat_completions/delta.rs (1 hunks)
  • lib/llm/src/protocols/openai/completions/aggregator.rs (2 hunks)
  • lib/llm/src/protocols/openai/completions/delta.rs (1 hunks)
  • lib/runtime/src/protocols/annotated.rs (6 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (3)
lib/llm/src/protocols/openai.rs (2)
lib/llm/src/protocols/openai/chat_completions/delta.rs (1)
  • get_isl (216-218)
lib/llm/src/protocols/openai/completions/delta.rs (1)
  • get_isl (131-133)
lib/llm/src/protocols/openai/completions/delta.rs (2)
lib/llm/src/protocols/openai/chat_completions/delta.rs (1)
  • get_isl (216-218)
lib/llm/src/protocols/openai.rs (1)
  • get_isl (312-312)
lib/llm/src/protocols/openai/chat_completions/delta.rs (2)
lib/llm/src/protocols/openai/completions/delta.rs (1)
  • get_isl (131-133)
lib/llm/src/protocols/openai.rs (1)
  • get_isl (312-312)
⏰ Context from checks skipped due to timeout of 90000ms (4)
  • GitHub Check: pre-merge-rust (lib/runtime/examples)
  • GitHub Check: pre-merge-rust (.)
  • GitHub Check: pre-merge-rust (lib/bindings/python)
  • GitHub Check: Build and Test - vllm
🔇 Additional comments (26)
lib/llm/src/protocols/codec.rs (1)

121-123: LGTM! Clean token metadata infrastructure addition.

The addition of the three token-related fields (chunk_tokens, input_tokens, output_tokens) initialized to None is consistent with the broader effort to add stream mode metrics. This non-breaking change properly extends the Annotated struct for token tracking.

lib/llm/src/protocols/openai/chat_completions/aggregator.rs (1)

287-289: LGTM! Test cases properly updated for new token fields.

The test helper functions correctly initialize the new token metadata fields to None, maintaining consistency with the production code changes. This ensures proper test coverage for the enhanced Annotated struct.

Also applies to: 433-435

lib/llm/src/protocols/openai.rs (1)

311-312: LGTM! Well-designed trait extension for ISL metrics.

The addition of the get_isl() method to the DeltaGeneratorExt trait provides a clean interface for accessing input sequence length (ISL) metrics. The method signature and documentation are clear, and based on the relevant code snippets, implementations correctly return prompt_tokens from usage data.
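
The signature fn get_isl(&self) -> Option<u32> is confirmed by the verification output later in this review; the surrounding trait shape and the implementation below are assumptions sketched for illustration.

// Sketch of the trait extension; the trait's other methods are omitted.
pub trait DeltaGeneratorExt<ResponseType> {
    // Returns the input sequence length (ISL), i.e. the prompt token count,
    // when usage data is available.
    fn get_isl(&self) -> Option<u32>;
}

// Hypothetical implementation mirroring the description above: the generator
// exposes the prompt token count from its usage accounting.
struct DeltaGenerator {
    prompt_tokens: u32, // assumed field; the real generator tracks usage elsewhere
}

impl DeltaGeneratorExt<()> for DeltaGenerator {
    fn get_isl(&self) -> Option<u32> {
        Some(self.prompt_tokens)
    }
}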

lib/engines/mistralrs/src/lib.rs (1)

410-412: LGTM! Engine implementation consistently updated for token tracking.

Both the chat completion and text completion generation paths correctly include the new token metadata fields initialized to None. This maintains consistency across the engine implementations and prepares the infrastructure for token metrics collection in the processing pipeline.

Also applies to: 572-574

lib/llm/src/protocols/openai/completions/aggregator.rs (1)

208-210: LGTM! Consistent test data structure updates.

The addition of the new optional token-related fields (chunk_tokens, input_tokens, output_tokens) initialized to None in test data structures is consistent with the updated Annotated struct definition. This ensures test compatibility without introducing any functional changes.

Also applies to: 320-322

lib/llm/src/engines.rs (1)

205-205: LGTM! Consistent struct initialization in echo engine.

The explicit initialization of the new token-related fields to None is appropriate for the EchoEngineFull implementation, which simulates responses without tracking actual token metrics. This maintains consistency with the updated Annotated struct definition.

Also applies to: 213-213, 237-237, 241-241

lib/llm/src/protocols/openai/chat_completions/delta.rs (1)

216-218: LGTM! Clean implementation of ISL metric retrieval.

The get_isl method implementation correctly returns the prompt tokens count as the input sequence length (ISL) metric. The implementation is clean and aligns with the trait definition in lib/llm/src/protocols/openai.rs.

lib/runtime/src/protocols/annotated.rs (6)

40-44: LGTM! Well-structured token metadata addition.

The three new optional token fields are properly defined with appropriate serde attributes for optional serialization.


56-58: LGTM! Consistent field initialization.

The new token fields are properly initialized to None in the from_error constructor, maintaining consistency with the struct design.


69-71: LGTM! Consistent field initialization.

The new token fields are properly initialized to None in the from_data constructor.


87-89: LGTM! Consistent field initialization.

The new token fields are properly initialized to None in the from_annotation constructor.


125-127: LGTM! Proper field propagation.

The transfer method correctly propagates the token fields from the source to the target Annotated instance.


143-145: LGTM! Proper field propagation in transformation.

The map_data method correctly propagates the token fields during data transformation, ensuring metadata is preserved.

lib/llm/src/preprocessor.rs (4)

196-196: LGTM! Appropriate field addition for token tracking.

The cumulative_output_tokens field is correctly added to track output tokens across response chunks.


204-204: LGTM! Proper field initialization.

The cumulative_output_tokens field is correctly initialized to zero.


254-256: LGTM! Proper token metadata attachment.

The token counts are correctly attached to the response, providing comprehensive token usage information.
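
Building on the Annotated sketch earlier in this conversation, the attachment pattern being described might look roughly like the following. The TokenTracking type and annotate helper are hypothetical names; cumulative_output_tokens and the three Annotated fields are the names discussed in this review.

// Hypothetical per-stream state kept by the preprocessor.
struct TokenTracking {
    cumulative_output_tokens: u32,
}

impl TokenTracking {
    // Attach token metadata to one streamed chunk. `chunk_token_count` would be
    // the number of tokens in this delta and `isl` the value of get_isl().
    fn annotate<R>(
        &mut self,
        mut response: Annotated<R>,
        chunk_token_count: u32,
        isl: Option<u32>,
    ) -> Annotated<R> {
        self.cumulative_output_tokens += chunk_token_count;
        response.chunk_tokens = Some(chunk_token_count);
        response.input_tokens = isl;
        response.output_tokens = Some(self.cumulative_output_tokens);
        response
    }
}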


225-236:

✅ Verification successful

Verify the get_isl() method availability.

The token calculation logic looks correct, but ensure that the get_isl() method is properly implemented in the DeltaGeneratorExt trait.


🏁 Script executed:

#!/bin/bash
# Description: Verify that get_isl() method exists in DeltaGeneratorExt trait
# Expected: Find the method definition and implementations

# Search for the get_isl method definition in the trait
ast-grep --pattern $'trait DeltaGeneratorExt<$_> {
  $$$
  fn get_isl($_) -> $_ {
    $$$
  }
  $$$
}'

# Also search for get_isl method implementations
rg -A 3 "fn get_isl"

Length of output: 913


get_isl() implementation verified

  • The DeltaGeneratorExt trait in lib/llm/src/protocols/openai.rs declares fn get_isl(&self) -> Option<u32>;
  • Concrete implementations exist in:
    • lib/llm/src/protocols/openai/completions/delta.rs
    • lib/llm/src/protocols/openai/chat_completions/delta.rs
  • No missing or unimplemented usages of get_isl() were found.

All checks pass—no further changes needed.

lib/llm/src/http/service/metrics.rs (4)

31-34: LGTM! Comprehensive token metrics addition.

The four new histogram metrics provide excellent coverage for token-related performance monitoring: ISL, OSL, TTFT, and ITL.


48-50: LGTM! Appropriate fields for token tracking.

The new fields in InflightGuard properly support token metrics calculation and timing.


299-301: LGTM! Proper field initialization.

The new fields in InflightGuard are correctly initialized with appropriate default values.


317-352: LGTM! Well-implemented metrics observation methods.

The observe_response method properly handles TTFT and ITL calculations, with appropriate logic for first token detection and per-token ITL recording.
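
To make the first-token logic concrete, here is a hedged sketch of how TTFT and ITL observations can be split; the timing fields mirror those named in this review, but this is not the PR's observe_response implementation.

use std::time::Instant;

// Hypothetical timing state; the PR keeps equivalent fields on its collector.
struct Timing {
    start_time: Instant,
    last_response_time: Instant,
    is_first_token: bool, // false until the first token is returned
}

impl Timing {
    // Returns (ttft_secs, itl_secs_per_token) for a chunk carrying `num_tokens` tokens.
    fn observe(&mut self, num_tokens: usize) -> (Option<f64>, Option<f64>) {
        if num_tokens == 0 {
            return (None, None); // nothing to attribute latency to
        }
        let now = Instant::now();
        let result = if self.is_first_token {
            // Subsequent chunks: average the gap since the previous chunk over its tokens.
            let gap = now.duration_since(self.last_response_time).as_secs_f64();
            (None, Some(gap / num_tokens as f64))
        } else {
            // First chunk: the gap since request start is the time to first token.
            self.is_first_token = true;
            (Some(now.duration_since(self.start_time).as_secs_f64()), None)
        };
        self.last_response_time = now;
        result
    }
}

The caller would then observe the returned ITL value once per token in the chunk, as noted above.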

lib/llm/src/http/service/openai.rs (5)

156-160: LGTM! Proper thread-safe wrapping of InflightGuard.

The Arc<Mutex<_>> wrapping enables safe sharing of the inflight guard across async tasks for metrics updates.


180-182: LGTM! Efficient stream processing with metrics.

The stream mapping correctly uses the new process_event_converter function to handle both event conversion and metrics updates.


277-281: LGTM! Consistent thread-safe implementation.

The chat completions handler uses the same thread-safe pattern as the completions handler.


303-305: LGTM! Consistent stream processing.

The chat completions stream processing follows the same pattern as completions.


469-480: LGTM! Proper conditional metrics updates.

The metrics updates are correctly gated by the presence of token data, ensuring metrics are only recorded when relevant information is available.

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

♻️ Duplicate comments (2)
lib/llm/src/http/service/metrics.rs (2)

313-313: Address model name sanitization concern.

This line records metrics using the raw model name. As mentioned in past reviews, consider sanitizing the model name to prevent issues with metric labels.

Apply sanitization to the model name before using it in metric labels:

-            .with_label_values(&[&self.model])
+            .with_label_values(&[&self.model.to_lowercase().replace('-', "_")])

335-335: TODO comment matches previous review feedback.

This TODO about publishing ISL as soon as tokenization completes aligns with previous review feedback. The current implementation publishes ISL with the first token response, which may not be optimal for timing accuracy.

Would you like me to help design a solution to publish ISL earlier in the pipeline, such as after request preprocessing?

🧹 Nitpick comments (2)
lib/llm/src/http/service/metrics.rs (2)

48-54: Good field documentation, but consider boolean naming convention.

The comments clearly explain the purpose of each field. However, based on past feedback about boolean naming, consider renaming is_first_token to has_first_token for clarity.

-    // we use is_first_token to distinguish TTFT from ITL. It is false by default and
-    // flip to true when the first token is returned and TTFT is published.
-    is_first_token: bool,
+    // we use has_first_token to distinguish TTFT from ITL. It is false by default and
+    // flips to true when the first token is returned and TTFT is published.
+    has_first_token: bool,

346-352: ITL calculation logic is correct but could be optimized.

The logic correctly calculates inter-token latency and records it once per token. However, the loop could be replaced with a single observation with the count parameter if Prometheus supports it.

Consider if this can be optimized (depending on Prometheus library capabilities):

-            for _ in 0..num_tokens {
-                self.metrics
-                    .inter_token_latency
-                    .with_label_values(&[&self.model])
-                    .observe(itl);
-            }
+            let histogram = self.metrics
+                .inter_token_latency
+                .with_label_values(&[&self.model]);
+            for _ in 0..num_tokens {
+                histogram.observe(itl);
+            }
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c332158 and f650358.

📒 Files selected for processing (1)
  • lib/llm/src/http/service/metrics.rs (7 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (4)
  • GitHub Check: Build and Test - vllm
  • GitHub Check: pre-merge-rust (lib/runtime/examples)
  • GitHub Check: pre-merge-rust (.)
  • GitHub Check: pre-merge-rust (lib/bindings/python)
🔇 Additional comments (3)
lib/llm/src/http/service/metrics.rs (3)

6-9: LGTM: Import additions are appropriate.

The additional imports for Duration and Instant are necessary for the new timing functionality.


31-34: LGTM: New metric fields follow established patterns.

The four new histogram fields are well-named and consistent with existing metrics structure.


132-180: Verify histogram bucket ranges for production workloads.

The bucket ranges look reasonable, but please confirm they align with expected production workloads:

  • ISL buckets go up to 128k tokens (good for long context models)
  • OSL buckets max at 32k tokens
  • TTFT buckets range from 1ms to 8 minutes
  • ITL buckets range from 1ms to 2 seconds

Consider validating these ranges against your typical workload patterns to ensure good histogram resolution.

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
lib/llm/src/http/service/metrics.rs (1)

434-446: ITL calculation methodology is reasonable but consider documenting the averaging approach.

The implementation calculates average inter-token latency across all tokens in a response chunk, which is a practical approximation given the available timing data. However, this approach may not capture true per-token latency variations within a chunk.

Consider adding a comment to clarify this design choice:

+        // Calculate average ITL for all tokens in this response chunk
+        // This is an approximation since individual token timing isn't available
         let response_duration = current_duration - last_response_time;
         let itl = response_duration.as_secs_f64() / num_tokens as f64;
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 68912c5 and d2a1b63.

📒 Files selected for processing (2)
  • lib/llm/src/http/service/metrics.rs (8 hunks)
  • lib/llm/src/http/service/openai.rs (10 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • lib/llm/src/http/service/openai.rs
⏰ Context from checks skipped due to timeout of 90000ms (4)
  • GitHub Check: Build and Test - vllm
  • GitHub Check: pre-merge-rust (lib/runtime/examples)
  • GitHub Check: pre-merge-rust (lib/bindings/python)
  • GitHub Check: pre-merge-rust (.)
🔇 Additional comments (7)
lib/llm/src/http/service/metrics.rs (7)

6-9: LGTM! Proper imports for timing functionality.

The additional imports for Duration and Instant are correctly added to support the new timing-based metrics.


31-34: LGTM! New histogram metrics properly declared.

The four new histogram metrics are correctly added to the Metrics struct for tracking input/output sequence lengths and timing metrics.


78-90: Well-designed ResponseMetricCollector struct.

The struct properly encapsulates all the state needed to track response metrics (a rough sketch follows the list):

  • Uses Arc<Metrics> for thread-safe sharing
  • Tracks timing state with start_time and last_response_time
  • Uses boolean flag to distinguish first token (TTFT) vs subsequent tokens (ITL)
  • Maintains running count of output sequence length
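
As a rough sketch, such a collector might hold the following state; start_time, last_response_time, and is_first_token are named in this review, while the remaining field names are assumptions.

use std::sync::Arc;
use std::time::Instant;

// Placeholder for the registered Prometheus handles added in this PR.
struct Metrics;

// Sketch of the per-response collector: created when a stream starts,
// dropped when it ends.
struct ResponseMetricCollector {
    metrics: Arc<Metrics>,         // shared, thread-safe handle to the histograms
    model: String,                 // metric label value, lowercased per the review
    start_time: Instant,           // request start, used for TTFT
    last_response_time: Instant,   // previous chunk time, used for ITL
    is_first_token: bool,          // distinguishes TTFT from ITL
    output_sequence_length: usize, // running OSL, published when the collector is dropped
}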

139-187: Excellent bucket configurations for different metric types.

The histogram buckets are well-chosen for each metric:

  • Input/output sequence length: Exponential buckets from 0 to 128K tokens covering typical use cases
  • TTFT: Fine-grained buckets for sub-second measurements up to 8 minutes
  • ITL: Very fine-grained buckets for measuring inter-token delays in milliseconds

The bucket ranges align well with expected LLM performance characteristics.


285-296: Good integration of model name sanitization.

The to_lowercase() conversion addresses the previous review comment about model name sanitization and ensures consistent metric labeling.


407-430: Robust implementation with proper edge case handling.

The method correctly:

  • Guards against division by zero with early return
  • Distinguishes first token (TTFT) from subsequent tokens (ITL)
  • Publishes ISL only once on first token
  • Includes the TODO comment as requested in previous reviews

449-457: Proper RAII pattern for final metric publication.

The Drop implementation ensures OSL is always published when the collector is destroyed, preventing metric loss even if the response processing is interrupted.
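
A sketch of that RAII pattern, building on the collector sketch above; publish_osl is a hypothetical stand-in for the real histogram observation.

impl Drop for ResponseMetricCollector {
    // Publish the final output sequence length exactly once, even if the
    // stream is cancelled or response processing returns early.
    fn drop(&mut self) {
        publish_osl(&self.metrics, &self.model, self.output_sequence_length);
    }
}

// Hypothetical helper standing in for the Prometheus histogram observation.
fn publish_osl(_metrics: &Metrics, model: &str, osl: usize) {
    println!("osl[{model}] = {osl}");
}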
