refactor: use comment field in annotated to pass metric-related information #1385
Conversation
Walkthrough
This change removes the explicit token-count fields from the Annotated response type and instead passes the token counts (metric-related information) through its comment field, which the HTTP service parses to update metrics.
Sequence Diagram(s)
sequenceDiagram
participant Engine
participant Preprocessor
participant Annotated
participant HTTPService
Engine->>Preprocessor: Generate response with token counts
Preprocessor->>Annotated: Serialize token counts into comment field (LLMMetricAnnotation)
Annotated->>HTTPService: Yield response (token counts in comment)
HTTPService->>HTTPService: Parse token counts from comment for metrics via LLMMetricAnnotation
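For concreteness, here is a minimal Rust sketch of the flow in the diagram, assuming a simplified Annotated wrapper and the names that appear later in this review (LLMMetricAnnotation, input_tokens, output_tokens, chunk_tokens, ANNOTATION_LLM_METRICS); the actual types in the repository may differ in detail.

```rust
use serde::{Deserialize, Serialize};

/// Simplified stand-in for the runtime's Annotated wrapper (only the fields
/// used in this sketch).
pub struct Annotated<T> {
    pub data: Option<T>,
    pub event: Option<String>,
    pub comment: Option<Vec<String>>,
}

/// Event name used to tag metric-bearing responses (taken from the review
/// comments below; treat the exact value as illustrative).
pub const ANNOTATION_LLM_METRICS: &str = "llm_metrics";

/// Token counts carried in the comment field as a JSON string.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct LLMMetricAnnotation {
    pub input_tokens: usize,
    pub output_tokens: usize,
    pub chunk_tokens: usize,
}

impl LLMMetricAnnotation {
    /// Preprocessor side: stamp the event name and serialize the metrics into
    /// the comment list of an existing response.
    pub fn to_annotation<T>(&self, annotated: &mut Annotated<T>) -> anyhow::Result<()> {
        annotated.event = Some(ANNOTATION_LLM_METRICS.to_string());
        annotated.comment = Some(vec![serde_json::to_string(self)?]);
        Ok(())
    }

    /// HTTP-service side: recover the metrics if this response carries them.
    pub fn from_annotation<T>(annotated: &Annotated<T>) -> anyhow::Result<Option<Self>> {
        if annotated.event.as_deref() != Some(ANNOTATION_LLM_METRICS) {
            return Ok(None);
        }
        let comment = annotated
            .comment
            .as_ref()
            .and_then(|c| c.first())
            .ok_or_else(|| anyhow::anyhow!("llm_metrics annotation missing comment payload"))?;
        Ok(Some(serde_json::from_str(comment)?))
    }
}
```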
Actionable comments posted: 1
🧹 Nitpick comments (1)
lib/llm/src/preprocessor.rs (1)
254-260: Consider using constants for token metric prefixes to ensure consistency.
The hardcoded string prefixes ("chunk_tokens: ", "input_tokens: ", "output_tokens: ") create a dependency between this encoding logic and the parsing logic in other modules. Consider defining these as constants to prevent inconsistencies and make future changes easier.
+const TOKEN_COMMENT_PREFIX_CHUNK: &str = "chunk_tokens: ";
+const TOKEN_COMMENT_PREFIX_INPUT: &str = "input_tokens: ";
+const TOKEN_COMMENT_PREFIX_OUTPUT: &str = "output_tokens: ";
+
 // Store token information in comment field
 let token_info = vec![
-    format!("chunk_tokens: {}", chunk_tokens),
-    format!("input_tokens: {}", isl),
-    format!("output_tokens: {}", current_osl),
+    format!("{}{}", TOKEN_COMMENT_PREFIX_CHUNK, chunk_tokens),
+    format!("{}{}", TOKEN_COMMENT_PREFIX_INPUT, isl),
+    format!("{}{}", TOKEN_COMMENT_PREFIX_OUTPUT, current_osl),
 ];
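To illustrate the coupling this comment points at, the consuming side (for example the HTTP service) has to strip exactly the same prefixes. The sketch below is hypothetical, not code from the repository; it only shows how shared constants would keep the encoding and parsing sides in sync.

```rust
const TOKEN_COMMENT_PREFIX_CHUNK: &str = "chunk_tokens: ";
const TOKEN_COMMENT_PREFIX_INPUT: &str = "input_tokens: ";
const TOKEN_COMMENT_PREFIX_OUTPUT: &str = "output_tokens: ";

/// Hypothetical consumer of the prefix-encoded comments: returns
/// (chunk_tokens, input_tokens, output_tokens) when present.
fn parse_token_comments(comments: &[String]) -> (Option<u64>, Option<u64>, Option<u64>) {
    let (mut chunk, mut input, mut output) = (None, None, None);
    for c in comments {
        if let Some(v) = c.strip_prefix(TOKEN_COMMENT_PREFIX_CHUNK) {
            chunk = v.trim().parse().ok();
        } else if let Some(v) = c.strip_prefix(TOKEN_COMMENT_PREFIX_INPUT) {
            input = v.trim().parse().ok();
        } else if let Some(v) = c.strip_prefix(TOKEN_COMMENT_PREFIX_OUTPUT) {
            output = v.trim().parse().ok();
        }
    }
    (chunk, input, output)
}
```

Later in the thread this prefix scheme is replaced by a single JSON-encoded LLMMetricAnnotation comment, which removes the shared-prefix coupling altogether.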
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
- lib/engines/mistralrs/src/lib.rs (0 hunks)
- lib/llm/src/engines.rs (2 hunks)
- lib/llm/src/http/service/openai.rs (1 hunks)
- lib/llm/src/preprocessor.rs (1 hunks)
- lib/llm/src/protocols/codec.rs (0 hunks)
- lib/llm/src/protocols/openai/chat_completions/aggregator.rs (0 hunks)
- lib/llm/src/protocols/openai/completions/aggregator.rs (0 hunks)
- lib/runtime/src/protocols/annotated.rs (0 hunks)
💤 Files with no reviewable changes (5)
- lib/llm/src/protocols/openai/completions/aggregator.rs
- lib/llm/src/protocols/codec.rs
- lib/llm/src/protocols/openai/chat_completions/aggregator.rs
- lib/engines/mistralrs/src/lib.rs
- lib/runtime/src/protocols/annotated.rs
🧰 Additional context used
🧬 Code Graph Analysis (1)
lib/llm/src/http/service/openai.rs (1)
lib/bindings/python/rust/lib.rs (1)
comments (846-848)
⏰ Context from checks skipped due to timeout of 90000ms (4)
- GitHub Check: pre-merge-rust (lib/bindings/python)
- GitHub Check: pre-merge-rust (.)
- GitHub Check: pre-merge-rust (lib/runtime/examples)
- GitHub Check: Build and Test - vllm
🔇 Additional comments (1)
lib/llm/src/engines.rs (1)
205-205: LGTM: Annotated struct initialization correctly updated.
The removal of explicit token field assignments aligns perfectly with the struct refactoring. The simplified initialization is cleaner and maintains the same functionality.
Also applies to: 213-213, 237-237, 241-241
How about something like
then to parse, use from_annotation
@ryanolson thanks for the suggestion; I'm not sure I fully understand the to_annotation part, but I made some changes and it seems to work. Please take another look. The part I don't understand well is whether we want to use the current event field of a normal response (since a normal response will have event None) or whether we want to send out a new response with only the metrics. Currently I am doing the first approach.
@ryanolson we do have NvExt. This is available to use for Completions Requests and for ChatCompletions Requests. We can choose to add NvExt to both Completions Responses and ChatCompletions Responses as well?
@ryanolson @paulhendricks based on our discussion today, I think the current approach is the right way to do it, IIUC. Could you please take another look? Thanks.
This is great - LGTM
Actionable comments posted: 1
🧹 Nitpick comments (1)
lib/llm/src/preprocessor.rs (1)
76-96: Minor: tighten error handling & return types in from_annotation
- Returning Box<dyn std::error::Error> introduces dynamic dispatch for a simple parsing routine; the rest of the crate predominantly uses anyhow::Error.
- Re-creating ad-hoc validation logic duplicates behaviour already enforced by Annotated::from_annotation.
- comments.len() != 1 is a strong restriction – consider at least allowing >= 1 and picking the first element to stay forward-compatible.

-pub fn from_annotation<T>(
-    annotation: &Annotated<T>,
-) -> Result<Option<LLMMetricAnnotation>, Box<dyn std::error::Error>> {
+pub fn from_annotation<T>(
+    annotation: &Annotated<T>,
+) -> anyhow::Result<Option<Self>> {
     if annotation.event.as_deref() != Some(ANNOTATION_LLM_METRICS) {
         return Ok(None);
     }
-    let comments = annotation
-        .comment
-        .as_ref()
-        .ok_or("missing comments block")?;
-    if comments.len() != 1 {
-        return Err("malformed comments block - expected exactly 1 comment".into());
-    }
-    let metrics: LLMMetricAnnotation = serde_json::from_str(&comments[0])?;
-    Ok(Some(metrics))
+    let comment = annotation
+        .comment
+        .as_ref()
+        .and_then(|v| v.first())
+        .context("llm_metrics annotation missing comment payload")?;
+
+    Ok(Some(serde_json::from_str(comment)?))
 }

This keeps error semantics consistent and removes a few unwrap/manual checks.
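As a usage note for from_annotation, the HTTP service side (per the sequence diagram above) would inspect each streamed response and act only when the metrics annotation is present. The following is a rough sketch reusing the types from the earlier sketch, with logging standing in for the real Prometheus counters:

```rust
/// Sketch of the consumption site; not the repository's actual handler.
fn observe_response<T>(annotated: &Annotated<T>) {
    match LLMMetricAnnotation::from_annotation(annotated) {
        // Metrics-bearing response: the real service would bump its metric
        // counters here; this sketch only logs the values.
        Ok(Some(m)) => println!(
            "llm_metrics: input={} output={} chunk={}",
            m.input_tokens, m.output_tokens, m.chunk_tokens
        ),
        // Ordinary response chunk with no metrics annotation.
        Ok(None) => {}
        Err(e) => eprintln!("failed to parse llm_metrics annotation: {e}"),
    }
}
```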
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- lib/llm/src/preprocessor.rs (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (4)
- GitHub Check: pre-merge-rust (lib/runtime/examples)
- GitHub Check: pre-merge-rust (lib/bindings/python)
- GitHub Check: Build and Test - vllm
- GitHub Check: pre-merge-rust (.)
Use the comment field in Annotated to pass metric-related information; this allows Annotated to be aligned with SSE.
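To make the "aligned with SSE" point concrete: an Annotated response's event and comment fields map directly onto the event: line and the ':'-prefixed comment lines of an SSE frame. The rendering function below is an illustration reusing the simplified Annotated sketch from earlier, not the service's actual serializer.

```rust
/// Render one response as an SSE frame to show the field-to-line mapping.
fn to_sse_frame(annotated: &Annotated<String>) -> String {
    let mut frame = String::new();
    if let Some(event) = &annotated.event {
        frame.push_str(&format!("event: {event}\n"));
    }
    // SSE comment lines begin with ':'; this is where the JSON-encoded
    // LLMMetricAnnotation would travel.
    for comment in annotated.comment.iter().flatten() {
        frame.push_str(&format!(": {comment}\n"));
    }
    if let Some(data) = &annotated.data {
        frame.push_str(&format!("data: {data}\n"));
    }
    frame.push('\n'); // blank line terminates the event
    frame
}
```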