Feat/completion request template 2494 #2984
Conversation
- Add reasoning_token_count field to DeltaGenerator struct
- Implement update_reasoning_tokens() method for accurate token counting
- Populate completion_tokens_details.reasoning_tokens in usage statistics
- Support both streaming and non-streaming responses
- Enable observability of reasoning token usage for reasoning models

Fixes ai-dynamo#2941
Signed-off-by: Harshith Akkapelli <[email protected]>
Implements request template functionality for /v1/completions endpoint
to match existing /v1/chat/completions behavior. Templates now provide
default values for model, temperature, and max_tokens when not specified
in completion requests.
Changes:
- Modified handler_completions to parse JSON before deserialization
- Added template application logic for completion requests
- Updated completions_router to accept template parameter
- Added --request-template CLI argument to Python frontend
- Template values are applied before JSON validation, allowing
minimal requests like {"prompt": "hello"} to work with templates
Fixes ai-dynamo#2494
Signed-off-by: Harshith Akkapelli [email protected]
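As a rough illustration of the intended behavior, the sketch below merges template defaults into a minimal completion request with serde_json. It is not the PR's code: the field names (model, temperature, max_completion_tokens) come from the diff under review, but the real handler also treats empty or zero values as unset.

```rust
use serde_json::{json, Value};

// Hypothetical stand-in for the template loaded from --request-template.
struct RequestTemplate {
    model: String,
    temperature: f32,
    max_completion_tokens: u32,
}

// Fill in model/temperature/max_tokens only when the request omits them.
fn apply_template(mut body: Value, template: &RequestTemplate) -> Value {
    if let Some(obj) = body.as_object_mut() {
        obj.entry("model")
            .or_insert_with(|| Value::String(template.model.clone()));
        obj.entry("temperature")
            .or_insert_with(|| json!(template.temperature));
        obj.entry("max_tokens")
            .or_insert_with(|| json!(template.max_completion_tokens));
    }
    body
}

fn main() {
    let template = RequestTemplate {
        model: "example-model".to_string(),
        temperature: 0.7,
        max_completion_tokens: 256,
    };
    // A minimal request like {"prompt": "hello"} picks up the template defaults.
    let merged = apply_template(json!({"prompt": "hello"}), &template);
    println!("{merged}");
}
```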
👋 Hi hakkapelli! Thank you for contributing to ai-dynamo/dynamo.
Signed-off-by: Harshith Akkapelli <[email protected]>
Walkthrough
Adds an optional --request-template CLI flag to pass a template file into engine startup, threads RequestTemplate through HTTP routing for completions/chat/responses, applies template defaults to completion requests with improved input error handling, and augments chat streaming with internal reasoning token counting surfaced in usage details.

Changes
Sequence Diagram(s)
sequenceDiagram
autonumber
actor User
participant CLI as Frontend CLI
participant Svc as service_v2 Router
participant OAI as OpenAI Routers
participant H as Completions Handler
participant T as RequestTemplate
participant Eng as Engine Task
User->>CLI: run with --request-template=path.json (optional)
CLI->>Svc: start service with template_file (if provided)
Svc->>OAI: build routers with (state, template)
User->>OAI: POST /v1/completions (JSON body)
OAI->>H: invoke handler with raw body + template from state
H->>H: parse JSON or return 400 "Invalid JSON: …"
H->>T: apply defaults (model, temperature, max_tokens) when missing/zero
H->>H: deserialize to NvCreateCompletionRequest or 400 "Invalid request format: …"
H->>Eng: spawn/await completions task
Eng-->>H: result or error
H-->>User: response
note right of H: Logs "Received completions request: …"
sequenceDiagram
autonumber
participant Gen as DeltaGenerator
participant Stream as Stream Loop
participant Usage as Usage Details
Stream->>Gen: emit delta with content + optional reasoning_text
alt reasoning content present
Gen->>Gen: update_reasoning_tokens(reasoning_text, total_text, token_ids)
end
Stream->>Usage: assemble completion_tokens_details
alt reasoning_token_count > 0
Usage->>Usage: include reasoning_tokens
end
Usage-->>Stream: usage snapshot per choice
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs
Pre-merge checks (3 passed, 2 warnings)
❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
Actionable comments posted: 3
🧹 Nitpick comments (6)
components/frontend/src/dynamo/frontend/main.py (1)
203-208: Validate template path and clarify help text
- Add a validator to ensure the file exists and is JSON.
- Clarify “max_tokens vs max_completion_tokens” to avoid confusion across endpoints.
- parser.add_argument(
-     "--request-template",
-     type=pathlib.Path,
-     default=None,
-     help="Path to JSON file containing default request parameters (model, temperature, max_completion_tokens).",
- )
+ parser.add_argument(
+     "--request-template",
+     type=validate_request_template_path,
+     default=None,
+     help="Path to JSON file with default request parameters (model, temperature, max_tokens/max_completion_tokens).",
+ )

Add this helper (outside the shown hunk) to mirror existing validators:
def validate_request_template_path(value: pathlib.Path):
    p = pathlib.Path(value)
    if not p.is_file():
        raise argparse.ArgumentTypeError(f"--request-template must point to an existing file, got: {value}")
    if p.suffix.lower() != ".json":
        raise argparse.ArgumentTypeError(f"--request-template must be a .json file, got: {value}")
    return p

lib/llm/src/protocols/openai/chat_completions/delta.rs (2)
138-158: Make token accounting safer (avoid overflows, skip zero-token deltas)
- Skip computation when token_ids is empty to save work.
- Use saturating_add for defensive accumulation.
- fn update_reasoning_tokens(&mut self, reasoning_text: &str, total_text: &str, token_ids: &[u32]) {
-     if !reasoning_text.is_empty() && !total_text.is_empty() {
+ fn update_reasoning_tokens(&mut self, reasoning_text: &str, total_text: &str, token_ids: &[u32]) {
+     if token_ids.is_empty() {
+         return;
+     }
+     if !reasoning_text.is_empty() && !total_text.is_empty() {
          // Estimate reasoning tokens based on the proportion of reasoning text to total text
          // This is more accurate than word counting alone
          let reasoning_chars = reasoning_text.chars().count() as f32;
          let total_chars = total_text.chars().count() as f32;
          if total_chars > 0.0 {
              let reasoning_proportion = reasoning_chars / total_chars;
              let estimated_reasoning_tokens = (token_ids.len() as f32 * reasoning_proportion).ceil() as u32;
-             self.reasoning_token_count += estimated_reasoning_tokens;
+             self.reasoning_token_count = self.reasoning_token_count.saturating_add(estimated_reasoning_tokens);
          }
      }
  }
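To illustrate the proportional estimate: a delta carrying 10 token IDs whose reasoning text is 120 of 200 total characters would add ceil(10 × 0.6) = 6 to reasoning_token_count.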
378-383: Micro-guard to avoid unnecessary work when no tokens emitted
Saves char counting on empty-token deltas.
- if let Some(ref reasoning_text) = reasoning_content {
-     let total_text = delta.text.as_deref().unwrap_or("");
-     self.update_reasoning_tokens(reasoning_text, total_text, &delta.token_ids);
- }
+ if let Some(ref reasoning_text) = reasoning_content {
+     if !delta.token_ids.is_empty() {
+         let total_text = delta.text.as_deref().unwrap_or("");
+         self.update_reasoning_tokens(reasoning_text, total_text, &delta.token_ids);
+     }
+ }

lib/llm/src/http/service/openai.rs (3)
214-217: Prefer Bytes extractor to avoid extra copy and enable zero-copy JSON parsing
Using String forces an extra allocation and UTF-8 validation. Bytes + from_slice is leaner.

- async fn handler_completions(
-     State((state, template)): State<(Arc<service_v2::State>, Option<RequestTemplate>)>,
-     headers: HeaderMap,
-     body: String,
- ) -> Result<Response, ErrorResponse> {
+ async fn handler_completions(
+     State((state, template)): State<(Arc<service_v2::State>, Option<RequestTemplate>)>,
+     headers: HeaderMap,
+     body: axum::body::Bytes,
+ ) -> Result<Response, ErrorResponse> {
221-231: Parse from bytes, not string
Follow-up to the Bytes change: use serde_json::from_slice.

-     let mut json_value: serde_json::Value = serde_json::from_str(&body)
+     let mut json_value: serde_json::Value = serde_json::from_slice(&body)
          .map_err(|e| {
              (
                  StatusCode::BAD_REQUEST,
                  Json(ErrorMessage {
                      error: format!("Invalid JSON: {}", e),
                  }),
              )
          })?;
292-292: Unused template in completions()
_template isn't used. Either drop the parameter and the corresponding argument at the call site, or leave as-is if you plan to use it soon. Minor clarity win to remove now.

- async fn completions(
-     state: Arc<service_v2::State>,
-     _template: Option<RequestTemplate>,
+ async fn completions(
+     state: Arc<service_v2::State>,
      request: Context<NvCreateCompletionRequest>,
      stream_handle: ConnectionHandle,
  ) -> Result<Response, ErrorResponse> {

And at the call site:

- let response = tokio::spawn(completions(state, template, request, stream_handle).in_current_span())
+ let response = tokio::spawn(completions(state, request, stream_handle).in_current_span())
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- components/frontend/src/dynamo/frontend/main.py (2 hunks)
- lib/llm/src/http/service/openai.rs (4 hunks)
- lib/llm/src/http/service/service_v2.rs (1 hunks)
- lib/llm/src/protocols/openai/chat_completions/delta.rs (5 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-08-22T19:55:41.608Z
Learnt from: nachiketb-nvidia
PR: ai-dynamo/dynamo#2656
File: lib/llm/src/protocols/openai/chat_completions/delta.rs:320-327
Timestamp: 2025-08-22T19:55:41.608Z
Learning: There are two separate DeltaGenerator classes in the codebase: one for chat completions (lib/llm/src/protocols/openai/chat_completions/delta.rs with object "chat.completion.chunk") and one for text completions (lib/llm/src/protocols/openai/completions/delta.rs with object "text_completion"). They have different create_choice method signatures and serve different OpenAI API endpoints. The reasoning parsing functionality is only relevant to the chat completions DeltaGenerator.
Applied to files:
lib/llm/src/protocols/openai/chat_completions/delta.rs
📚 Learning: 2025-08-22T19:55:41.608Z
Learnt from: nachiketb-nvidia
PR: ai-dynamo/dynamo#2656
File: lib/llm/src/protocols/openai/chat_completions/delta.rs:320-327
Timestamp: 2025-08-22T19:55:41.608Z
Learning: The create_choice method exists on multiple different objects in the codebase. The DeltaGenerator::create_choice in lib/llm/src/protocols/openai/chat_completions/delta.rs has its own signature that was updated to include reasoning_content, but other objects in lib/llm/src/engines.rs have their own separate create_choice methods with different signatures that are not related to chat completions.
Applied to files:
lib/llm/src/protocols/openai/chat_completions/delta.rs
🧬 Code graph analysis (2)
components/frontend/src/dynamo/frontend/main.py (1)
lib/llm/src/local_model.rs (3)
- default (67-89)
- request_template (154-157)
- request_template (350-352)
lib/llm/src/http/service/service_v2.rs (2)
lib/llm/src/http/service/openai.rs (1)
- completions_router (1067-1079)

lib/llm/src/local_model.rs (2)
- request_template (154-157)
- request_template (350-352)
🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/2984/merge) by hakkapelli.
lib/llm/src/http/service/openai.rs
[error] 228-236: Trailing whitespace detected by pre-commit 'trailing-whitespace' hook; file was auto-fixed. CI exited with code 1.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Build and Test - dynamo
- GitHub Check: pre-merge-rust (lib/bindings/python)
- GitHub Check: pre-merge-rust (lib/runtime/examples)
- GitHub Check: pre-merge-rust (.)
🔇 Additional comments (6)
components/frontend/src/dynamo/frontend/main.py (1)
271-272: No change needed for template_file
Pyo3's Option<PathBuf> parameters accept Python str, os.PathLike, and pathlib.Path natively, so passing flags.request_template directly is safe and casting to str is unnecessary. (pyo3.rs)

lib/llm/src/protocols/openai/chat_completions/delta.rs (1)
60-61: LGTM: field added and initialized
The dedicated reasoning_token_count with zero-init is straightforward and contained.
Also applies to: 123-127
lib/llm/src/http/service/service_v2.rs (1)
371-375: LGTM – RequestTemplate passed through completions_router for parity. Matches chat/responses routing and enables templated defaults on /v1/completions; no remaining outdated call sites found.

lib/llm/src/http/service/openai.rs (3)
253-262: Clear 400 mapping for deserialization errors — looks good
The "Invalid request format: …" 400 is appropriate after templating and before engine invocation.
273-281: Tokio spawn with span propagation — LGTM
Spawning the task with in_current_span() and a tailored await error message is correct.
1069-1077: Router state wiring with template — LGTM
Passing (state, template) and setting a body limit here aligns with the handler signature change.
// Apply template values to JSON before deserialization
if let Some(template) = &template {
    if let Some(obj) = json_value.as_object_mut() {
        // Apply model from template if not present or empty
        if !obj.contains_key("model") || obj.get("model").and_then(|v| v.as_str()).map_or(true, |s| s.is_empty()) {
            obj.insert("model".to_string(), serde_json::Value::String(template.model.clone()));
        }

        // Apply temperature from template if not present or zero
        if !obj.contains_key("temperature") || obj.get("temperature").and_then(|v| v.as_f64()).map_or(true, |t| t == 0.0) {
            obj.insert("temperature".to_string(), serde_json::Value::Number(serde_json::Number::from_f64(template.temperature as f64).unwrap()));
        }

        // Apply max_tokens from template if not present or zero
        if !obj.contains_key("max_tokens") || obj.get("max_tokens").and_then(|v| v.as_u64()).map_or(true, |t| t == 0) {
            obj.insert("max_tokens".to_string(), serde_json::Value::Number(serde_json::Number::from(template.max_completion_tokens)));
        }
    }
}
💡 Verification agent
❓ Verification inconclusive
Template application: avoid overriding explicit values and remove potential panic
- Don't override user-specified temperature/max_tokens when explicitly set to 0; 0.0 can be a deliberate choice. The current chat path treats 0.0 as "unset"; confirm whether that coupling is intentional. If not, only apply when the field is missing or null.
- Number::from_f64(...).unwrap() can panic for NaN/Infinity. Handle None defensively.
- if let Some(obj) = json_value.as_object_mut() {
- // Apply model from template if not present or empty
- if !obj.contains_key("model") || obj.get("model").and_then(|v| v.as_str()).map_or(true, |s| s.is_empty()) {
- obj.insert("model".to_string(), serde_json::Value::String(template.model.clone()));
- }
-
- // Apply temperature from template if not present or zero
- if !obj.contains_key("temperature") || obj.get("temperature").and_then(|v| v.as_f64()).map_or(true, |t| t == 0.0) {
- obj.insert("temperature".to_string(), serde_json::Value::Number(serde_json::Number::from_f64(template.temperature as f64).unwrap()));
- }
-
- // Apply max_tokens from template if not present or zero
- if !obj.contains_key("max_tokens") || obj.get("max_tokens").and_then(|v| v.as_u64()).map_or(true, |t| t == 0) {
- obj.insert("max_tokens".to_string(), serde_json::Value::Number(serde_json::Number::from(template.max_completion_tokens)));
- }
- }
+ if let Some(obj) = json_value.as_object_mut() {
+ // model: apply if missing, null, or blank
+ if obj
+ .get("model")
+ .and_then(|v| v.as_str())
+ .map(|s| s.trim().is_empty())
+ .unwrap_or(true)
+ {
+ obj.insert("model".into(), serde_json::Value::String(template.model.clone()));
+ }
+
+ // temperature: apply only if missing or null (do not override explicit 0.0)
+ if obj.get("temperature").map(|v| v.is_null()).unwrap_or(true) {
+ if let Some(num) = serde_json::Number::from_f64(template.temperature as f64) {
+ obj.insert("temperature".into(), serde_json::Value::Number(num));
+ } else {
+ tracing::warn!("Ignoring non-finite template.temperature={}", template.temperature);
+ }
+ }
+
+ // max_tokens: apply only if missing or null (do not override explicit 0)
+ if obj.get("max_tokens").map(|v| v.is_null()).unwrap_or(true) {
+ obj.insert(
+ "max_tokens".into(),
+ serde_json::Value::Number(serde_json::Number::from(template.max_completion_tokens as u64)),
+ );
+ }
+ }

Avoid overriding explicit zero values and guard potential panic
- Apply temperature and max_tokens only when the fields are missing or null; an explicit 0.0 or 0 may be a deliberate choice.
- Replace .unwrap() on serde_json::Number::from_f64(…) (which returns None for NaN/Infinity) with an if let Some(num) = … or equivalent fallback, and log or error on non-finite values to prevent panics. (docs.rs)
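As a quick standalone illustration of that panic risk (not project code):

```rust
fn main() {
    // serde_json::Number::from_f64 returns None for non-finite input,
    // so unwrapping it panics on NaN or infinity.
    assert!(serde_json::Number::from_f64(0.7).is_some());
    assert!(serde_json::Number::from_f64(f64::NAN).is_none());
    assert!(serde_json::Number::from_f64(f64::INFINITY).is_none());
}
```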
🤖 Prompt for AI Agents
lib/llm/src/http/service/openai.rs around lines 232-250: change the template
application logic so temperature and max_tokens are only applied when the JSON
object does not contain the field or the field is null (do not treat explicit
0.0 or 0 as absent), and replace the serde_json::Number::from_f64(...).unwrap()
with a safe handling pattern (e.g. if let Some(num) = Number::from_f64(...) {
insert num } else { log or return an error/skip insertion }) to avoid panics on
NaN/Infinity.
check_ready(&state)?;

let request_id = request.id().to_string();
tracing::trace!("Received completions request: {:?}", request.inner);
Avoid logging full request bodies (PII/secrets risk)
{:?} request.inner can leak prompts and user data. Log only safe metadata (e.g., request_id/model/stream flags) as done in chat with content()-style logging.
- tracing::trace!("Received completions request: {:?}", request.inner);
+ // Avoid logging prompts/PII; keep metadata-only.
+ tracing::trace!("Received completions request: request_id={}", request_id);If you want model/stream info too, move this log below the model and streaming derivations and log:
tracing::trace!("Received completions request: request_id={}, model={}, streaming={}", request_id, model, streaming);🤖 Prompt for AI Agents
In lib/llm/src/http/service/openai.rs around line 300, the current trace log
prints the full request body (tracing::trace!("Received completions request:
{:?}", request.inner)), which can leak prompts/PII; replace this by removing the
full request dump and instead log only safe metadata after you derive
request_id, model, and streaming: move the trace call to below those derivations
and emit a trace like "Received completions request: request_id={}, model={},
streaming={}" (populate with the local variables) so no request.inner or prompt
content is logged.
// Update completion_tokens_details with reasoning tokens if we have any
if self.reasoning_token_count > 0 {
    // Preserve existing completion_tokens_details if they exist, or create new ones
    let mut details = usage.completion_tokens_details.unwrap_or_default();
    details.reasoning_tokens = Some(self.reasoning_token_count);
    usage.completion_tokens_details = Some(details);
}
🛠️ Refactor suggestion
Clamp reasoning_tokens to completion_tokens
Prevents reporting more reasoning_tokens than completion_tokens due to proportional rounding across chunks.
- // Update completion_tokens_details with reasoning tokens if we have any
- if self.reasoning_token_count > 0 {
- // Preserve existing completion_tokens_details if they exist, or create new ones
- let mut details = usage.completion_tokens_details.unwrap_or_default();
- details.reasoning_tokens = Some(self.reasoning_token_count);
- usage.completion_tokens_details = Some(details);
- }
+ // Update completion_tokens_details with reasoning tokens if we have any
+ if self.reasoning_token_count > 0 {
+ let mut details = usage.completion_tokens_details.unwrap_or_default();
+ let clamped = self.reasoning_token_count.min(usage.completion_tokens);
+ details.reasoning_tokens = Some(clamped);
+ usage.completion_tokens_details = Some(details);
+ }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
Current:

// Update completion_tokens_details with reasoning tokens if we have any
if self.reasoning_token_count > 0 {
    // Preserve existing completion_tokens_details if they exist, or create new ones
    let mut details = usage.completion_tokens_details.unwrap_or_default();
    details.reasoning_tokens = Some(self.reasoning_token_count);
    usage.completion_tokens_details = Some(details);
}

Suggested:

// Update completion_tokens_details with reasoning tokens if we have any
if self.reasoning_token_count > 0 {
    let mut details = usage.completion_tokens_details.unwrap_or_default();
    let clamped = self.reasoning_token_count.min(usage.completion_tokens);
    details.reasoning_tokens = Some(clamped);
    usage.completion_tokens_details = Some(details);
}
🤖 Prompt for AI Agents
In lib/llm/src/protocols/openai/chat_completions/delta.rs around lines 285 to
291, clamp reasoning_tokens so it never exceeds completion_tokens: when building
or updating completion_tokens_details, read completion_tokens from usage
(fallback 0 if None), compute the minimum of self.reasoning_token_count and that
completion_tokens value, and assign that min value to details.reasoning_tokens
before storing details back on usage.
  State((state, template)): State<(Arc<service_v2::State>, Option<RequestTemplate>)>,
  headers: HeaderMap,
- Json(request): Json<NvCreateCompletionRequest>,
+ body: String,
Why this change?
Could you use the same approach as chat completions, modifying request.inner.temperature etc?
| kwargs["namespace"] = flags.namespace | ||
| if flags.request_template: | ||
| kwargs["template_file"] = flags.request_template | ||
|
|
In case the rest of the PR doesn't land, I would be happy for us to take just this file. It brings the python CLI closer to dynamo-run (https://github.com/ai-dynamo/dynamo/blob/main/launch/dynamo-run/src/flags.rs#L139), and that flag is already useful for chat completions.
This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.
This PR has been closed due to inactivity. If you believe this PR is still relevant, please feel free to reopen it with additional context or information. |
Overview:
Adds request template support to the /v1/completions endpoint to match existing /v1/chat/completions functionality. Templates now provide default values for model, temperature, and max_tokens when not specified in completion requests.

Details:
- Modified handler_completions to parse JSON before deserialization and apply template values
- Updated completions_router to accept a template parameter
- Added --request-template CLI argument to the Python frontend
- Template values are applied before JSON validation, allowing minimal requests like {"prompt": "hello"} to work with templates

Where should the reviewer start?
- lib/llm/src/http/service/openai.rs - Main template application logic in handler_completions
- components/frontend/src/dynamo/frontend/main.py - CLI argument addition

Related Issues:
- ai-dynamo#2494
Summary by CodeRabbit
New Features
Bug Fixes