Conversation

@tedzhouhk (Contributor) commented Aug 13, 2025

Summary by CodeRabbit

  • Chores
    • Enhanced internal observability for non-streaming completions and chat completions by recording response and token metrics.
    • Improved monitoring and analytics without changing outputs, error handling, or the public API.
    • Streaming behavior remains unchanged; only non-streaming paths gain telemetry.
    • Aids capacity planning, usage reporting, and troubleshooting through more accurate metric collection.
    • No user-facing changes; functionality and performance are unchanged while backend insight improves.

@tedzhouhk tedzhouhk requested a review from a team as a code owner August 13, 2025 15:40
copy-pr-bot (bot) commented Aug 13, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai bot (Contributor) commented Aug 13, 2025

Walkthrough

Non-streaming completions and chat completions now pass each response through a new helper that extracts LLM metric annotations and updates response metrics; the original response is then forwarded unchanged for aggregation. Streaming paths and public APIs remain unchanged.

Changes

Cohort: OpenAI non-streaming metrics integration
File(s): lib/llm/src/http/service/openai.rs
Summary: Added a process_metrics_only(...) helper that extracts LLMMetricAnnotation and updates the ResponseMetricCollector via observe_current_osl/observe_response; wired it into the non-streaming completions and chat_completions map/collect flow. Streaming and error-handling surfaces are untouched.
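
In code terms, the non-streaming tap looks roughly like this (condensed from the review diffs quoted later in this thread; not the verbatim file contents):

// New helper: pull the LLM metrics annotation off an item, if present,
// and record it; items without a metrics annotation are ignored.
fn process_metrics_only<T>(
    annotated: &Annotated<T>,
    response_collector: &mut ResponseMetricCollector,
) {
    if let Ok(Some(metrics)) = LLMMetricAnnotation::from_annotation(annotated) {
        response_collector.observe_current_osl(metrics.output_tokens);
        response_collector.observe_response(metrics.input_tokens, metrics.chunk_tokens);
    }
}

// Wiring in the non-streaming completions and chat_completions paths:
// tap each item for metrics, then forward it unchanged for aggregation.
let stream = stream.map(move |response| {
    process_metrics_only(&response, &mut response_collector);
    response
});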

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant OpenAIService
  participant Provider
  participant Metrics as ResponseMetricCollector

  Client->>OpenAIService: completions/chat (non-streaming)
  OpenAIService->>Provider: send request
  Provider-->>OpenAIService: responses[]
  loop each response
    OpenAIService->>Metrics: process_metrics_only(annotated)
    Metrics-->>OpenAIService: metrics observed
    OpenAIService-->>OpenAIService: keep original response
  end
  OpenAIService-->>Client: aggregated response

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Poem

A bunny counts tokens with delicate care,
Hops through responses, metrics everywhere.
No streams disturbed, just gentle taps,
Annotations whisper their tidy maps.
Ears up, charts up—collect, then go,
Another neat burrow in data’s flow. 🐇📈



coderabbitai bot (Contributor) left a comment


Actionable comments posted: 0

🧹 Nitpick comments (3)
lib/llm/src/http/service/openai.rs (3)

293-299: Prefer StreamExt::inspect to tap metrics without altering stream items

Using map here returns the original item unmodified. inspect expresses the intent better and avoids unnecessary moves.

Two options:

  • Option A (preferred): switch to inspect.
  • Option B: if NvCreateCompletionResponse::from_annotated_stream does not already ignore LLM metrics annotations, keep map but also chomp the metrics event (parity with streaming) to prevent internal annotations from leaking into the aggregated JSON.

Option A:

-        // Process the stream to collect metrics for non-streaming requests
-        let stream = stream.map(move |response| {
-            // Process metrics but return the original response for aggregation
-            process_metrics_only(&response, &mut response_collector);
-            response
-        });
+        // Tap the stream to collect metrics for non-streaming requests without altering items
+        let stream = stream.inspect(move |response| {
+            process_metrics_only(response, &mut response_collector);
+        });

Option B (parity with streaming; remove metrics annotations from items):

-        // Process the stream to collect metrics for non-streaming requests
-        let stream = stream.map(move |response| {
-            // Process metrics but return the original response for aggregation
-            process_metrics_only(&response, &mut response_collector);
-            response
-        });
+        // Process metrics and chomp metrics annotations to avoid leaking internal events
+        let stream = stream.map(move |mut response| {
+            process_metrics_only(&response, &mut response_collector);
+            if response.event.as_deref() == Some(crate::preprocessor::ANNOTATION_LLM_METRICS) {
+                response.event = None;
+                response.comment = None;
+            }
+            response
+        });

Follow-up: Please confirm whether from_annotated_stream discards ANNOTATION_LLM_METRICS events so we can safely use Option A. If not, Option B maintains user-visible parity with the streaming path.
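
(For readers unfamiliar with the combinator, a minimal self-contained illustration of the map-vs-inspect distinction, written against the futures crate and independent of this codebase: inspect borrows each item and re-yields it unchanged, so it can only observe, whereas map takes ownership and must hand the item back.)

use futures::{executor::block_on, stream, stream::StreamExt};

fn main() {
    block_on(async {
        let mut seen = 0usize;

        // `inspect` receives `&i32`: it can record a side effect but
        // cannot modify or drop the item flowing through the stream.
        let items: Vec<i32> = stream::iter(vec![1, 2, 3])
            .inspect(|x| {
                seen += 1;
                println!("saw {x}");
            })
            .collect()
            .await;

        assert_eq!(items, vec![1, 2, 3]); // items pass through unchanged
        assert_eq!(seen, 3);              // the tap observed every item
    });
}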


524-530: Mirror the non-streaming tap with inspect and consider chomping metrics annotations

Same rationale as in completions: inspect better conveys tapping for metrics; chomp the annotation if the fold (from_annotated_stream) doesn't already ignore LLM metrics annotations.

Option A:

-        // Process the stream to collect metrics for non-streaming requests
-        let stream = stream.map(move |response| {
-            // Process metrics but return the original response for aggregation
-            process_metrics_only(&response, &mut response_collector);
-            response
-        });
+        // Tap the stream to collect metrics for non-streaming requests without altering items
+        let stream = stream.inspect(move |response| {
+            process_metrics_only(response, &mut response_collector);
+        });

Option B (parity with streaming; remove metrics annotations from items):

-        let stream = stream.map(move |response| {
-            // Process metrics but return the original response for aggregation
-            process_metrics_only(&response, &mut response_collector);
-            response
-        });
+        let stream = stream.map(move |mut response| {
+            process_metrics_only(&response, &mut response_collector);
+            if response.event.as_deref() == Some(crate::preprocessor::ANNOTATION_LLM_METRICS) {
+                response.event = None;
+                response.comment = None;
+            }
+            response
+        });

Please verify whether NvCreateChatCompletionResponse::from_annotated_stream intentionally drops internal metrics events; if not, Option B prevents them from surfacing in the final JSON.


926-936: Helper looks good; add a brief doc comment (and optionally #[inline])

Small readability nit: document intent and parity with streaming, since this is now shared across endpoints.

-fn process_metrics_only<T>(
+/// Tap LLMMetricAnnotation events to update response metrics without mutating or filtering the item.
+/// Used in non-streaming paths to collect token counts while preserving original stream items.
+#[inline]
+fn process_metrics_only<T>(
     annotated: &Annotated<T>,
     response_collector: &mut ResponseMetricCollector,
 ) {
     // update metrics
     if let Ok(Some(metrics)) = LLMMetricAnnotation::from_annotation(annotated) {
         response_collector.observe_current_osl(metrics.output_tokens);
         response_collector.observe_response(metrics.input_tokens, metrics.chunk_tokens);
     }
 }
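
(To make the collector contract concrete, here is a toy stand-in; ToyCollector is hypothetical and only mimics the two observe_* calls the helper uses, while the real ResponseMetricCollector lives in this crate and may record into exported metrics instead:)

/// Hypothetical stand-in for ResponseMetricCollector, for illustration only.
#[derive(Default)]
struct ToyCollector {
    current_osl: usize,  // latest output sequence length observed
    input_tokens: usize, // input tokens accumulated across responses
    chunk_tokens: usize, // tokens accumulated across response chunks
}

impl ToyCollector {
    fn observe_current_osl(&mut self, output_tokens: usize) {
        // Assumed semantics: track the most recent output length.
        self.current_osl = output_tokens;
    }
    fn observe_response(&mut self, input_tokens: usize, chunk_tokens: usize) {
        // Assumed semantics: accumulate totals per observed response.
        self.input_tokens += input_tokens;
        self.chunk_tokens += chunk_tokens;
    }
}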
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 72ec5f5 and 9d2323e.

📒 Files selected for processing (1)
  • lib/llm/src/http/service/openai.rs (3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Build and Test - dynamo
  • GitHub Check: pre-merge-rust (.)
  • GitHub Check: pre-merge-rust (lib/bindings/python)
  • GitHub Check: pre-merge-rust (lib/runtime/examples)

@grahamking (Contributor)

@tedzhouhk What do you think about the CodeRabbit nit about stream.inspect?
#2427 (review)

@tedzhouhk (Contributor, Author)

> @tedzhouhk What do you think about the CodeRabbit nit about stream.inspect? #2427 (review)

Good call! I didn't know we had inspect; let me modify and test.

@tedzhouhk merged commit c3ecaf6 into main on Aug 13, 2025 (10 checks passed).
@tedzhouhk deleted the hzhou/nonstream-metric branch on Aug 13, 2025 at 17:01.
hhzhang16 pushed a commit that referenced this pull request Aug 27, 2025