
Conversation


@ericevans-nv ericevans-nv commented Oct 1, 2025

Description

This PR significantly improves OpenAI Chat Completions API compatibility by fixing response format compliance and removing unused code. The changes ensure that NAT's OpenAI-compatible endpoints fully adhere to the OpenAI specification for both streaming and non-streaming responses.
Closes: #818
A follow-up issue, #891, has been created to address accurate calculation and passing of usage statistics from workflows to ChatResponse objects in OpenAI-compatible endpoints.

By Submitting this PR I confirm:

  • I am familiar with the Contributing Guidelines.
  • We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
    • Any contribution which contains commits that are not Signed-Off will not be accepted.
  • When the PR is ready for review, new or existing tests cover these changes.
  • When the PR is ready for review, the documentation is up to date with these changes.

Summary by CodeRabbit

  • New Features

    • All responses — including error replies — now include usage statistics (prompt, completion, total tokens).
  • Refactor

    • OpenAI-compatible non‑streaming path simplified to return a single JSON response; Content-Type set explicitly for JSON and streaming.
    • Default model identifier standardized to "unknown-model" in responses.
  • Compatibility

    • Streaming chunk roles standardized to an enum-style role; response payloads and tests now include and expect usage metadata.

@ericevans-nv ericevans-nv requested a review from a team as a code owner October 1, 2025 17:07
coderabbitai bot commented Oct 1, 2025

Walkthrough

Adds Usage metadata to responses, replaces string role fields with a UserMessageContentRoleType enum, introduces specialized choice models, updates from_string/create_streaming_chunk signatures and call sites (agents, examples, tests), and simplifies FastAPI non-streaming handler to return single JSON responses directly.

Changes

Cohort / File(s): Summary of changes
Data models API overhaul
src/nat/data_models/api_server.py
Add UserMessageContentRoleType enum; introduce ChoiceBase, ChatResponseChoice, ChatResponseChunkChoice, ChoiceDelta, and Usage (token fields optional); change Message.role/ChoiceDelta.role to enum; ChatResponse/ChatResponseChunk now use new choice types, include required usage: Usage, default model="unknown-model", and update .from_string/streaming helpers; keep Choice = ChatResponseChoice alias.
Agent response updates (ReAct / ReWOO)
src/nat/agent/react_agent/register.py, src/nat/agent/rewoo_agent/register.py
Import Usage; compute prompt/completion/total token counts from messages and outputs; return ChatResponse.from_string(..., usage=usage) at response sites; ReAct error path now logs and re-raises RuntimeError.
HITL example: error responses include usage
examples/HITL/simple_calculator_hitl/src/nat_simple_calculator_hitl/retry_react_agent.py
Import Usage; construct a Usage (fields None) in error branches and pass it to ChatResponse.from_string(..., usage=Usage(...)) so error responses include usage metadata.
Test utilities: include usage in responses
packages/nvidia_nat_test/src/nat/test/functions.py
Import Usage; compute token counts from input message content/tokens and return ChatResponse.from_string(content, usage=usage) instead of a bare string.
FastAPI OpenAI-compatible handler simplification
src/nat/front_ends/fastapi/fastapi_front_end_plugin_worker.py
Remove fallback aggregation for non-streaming requests; set Content-Type: application/json early and directly return generate_single_response(..., result_type=ChatResponse) for non-streaming requests; streaming SSE path unchanged.
Message validator branching for chunks vs responses
src/nat/front_ends/fastapi/message_validator.py
Split isinstance check into separate branches for ChatResponse vs ChatResponseChunk; extract content from choices[0].message.content (ChatResponse) or choices[0].delta.content (ChatResponseChunk).
Tests updated for enum roles, usage, and streaming deltas
tests/nat/front_ends/fastapi/test_openai_compatibility.py, tests/nat/front_ends/fastapi/test_fastapi_front_end_plugin.py, tests/nat/server/test_unified_api_server.py
Update tests to import/use Usage and UserMessageContentRoleType; construct/expect usage in non-streaming responses; streaming tests expect delta populated with enum-based role and message null; adjust SSE parsing to read delta.content.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant C as Client
  participant FE as FastAPI Endpoint
  participant SM as SessionManager
  participant AG as Agent
  participant DM as DataModels

  Note over FE: Non-streaming request
  C->>FE: POST /chat/completions (stream=false)
  FE->>SM: generate_single_response(payload, ChatResponse)
  SM->>AG: _response_fn(messages)
  AG->>AG: Compute output and token usage
  AG->>DM: ChatResponse.from_string(content, usage)
  DM-->>AG: ChatResponse (choices with ASSISTANT role)
  AG-->>SM: ChatResponse
  SM-->>FE: ChatResponse
  FE-->>C: 200 application/json

  rect rgba(230,245,255,0.6)
  Note right of FE: Streaming request
  C->>FE: POST /chat/completions (stream=true)
  FE->>SM: stream_response(payload, ChatResponseChunk)
  SM->>AG: stream tokens
  loop per token/chunk
    AG->>DM: ChatResponseChunk.create_streaming_chunk(delta, role=ASSISTANT)
    DM-->>FE: chunk with delta (message=null)
    FE-->>C: text/event-stream data: { ... delta: {content, role}, message: null }
  end
  end
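On the client side, the streaming leg of the diagram produces OpenAI-style SSE "data:" lines carrying JSON chunks. A minimal accumulator might look like this; the "[DONE]" sentinel is the OpenAI streaming convention and is assumed here rather than confirmed by this PR:

```python
import json


def collect_stream_text(sse_lines):
    """Accumulate assistant text from OpenAI-style SSE 'data:' lines (sketch)."""
    parts = []
    for line in sse_lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alives and comment lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            if delta.get("content"):
                parts.append(delta["content"])
    return "".join(parts)


lines = [
    'data: {"choices": [{"delta": {"role": "assistant", "content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
print(collect_stream_text(lines))  # Hello
```

Note that per the diagram each chunk populates delta and leaves message null, which is why the accumulator reads delta.content only.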

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested labels

external

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
Check name Status Explanation Resolution
Linked Issues Check ⚠️ Warning While the PR updates response formatting to populate delta fields for streaming chunks and adjusts content extraction paths to align with the OpenAI spec, it does not address the requirement to simplify configuration by consolidating openai_api_v1_path and openai_api_path settings as specified in issue #818. Please include code changes to merge or unify configuration parameters for the OpenAI-compatible endpoint as described in issue #818 to fully satisfy the linked issue objectives.
Out of Scope Changes Check ⚠️ Warning The PR introduces extensive data model refactoring such as new enums, updated ChatResponse and ChatResponseChunk APIs, and broad usage telemetry integration that go beyond the streaming format and configuration fixes requested in issue #818. Consider extracting the data model updates and usage integration into a separate follow-up PR and limit this change set to the streaming format and configuration improvements outlined in the linked issue.
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title “Enhance OpenAI Chat API Compatibility” is concise, descriptive, and written in the imperative mood, clearly reflecting the intent to improve API compatibility without exceeding recommended length.
Docstring Coverage ✅ Passed No functions found in the changes. Docstring coverage check skipped.
✨ Finishing touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 52b8f61 and e00ec82.

📒 Files selected for processing (1)
  • tests/nat/server/test_unified_api_server.py (3 hunks)
🧰 Additional context used
📓 Path-based instructions (5)
**/*.{py,yaml,yml}

📄 CodeRabbit inference engine (.cursor/rules/nat-test-llm.mdc)

**/*.{py,yaml,yml}: Configure response_seq as a list of strings; values cycle per call, and [] yields an empty string.
Configure delay_ms to inject per-call artificial latency in milliseconds for nat_test_llm.

Files:

  • tests/nat/server/test_unified_api_server.py
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/nat-test-llm.mdc)

**/*.py: Programmatic use: create TestLLMConfig(response_seq=[...], delay_ms=...), add with builder.add_llm("", cfg).
When retrieving the test LLM wrapper, use builder.get_llm(name, wrapper_type=LLMFrameworkEnum.) and call the framework’s method (e.g., ainvoke, achat, call).

**/*.py: In code comments/identifiers use NAT abbreviations as specified: nat for API namespace/CLI, nvidia-nat for package name, NAT for env var prefixes; do not use these abbreviations in documentation
Follow PEP 20 and PEP 8; run yapf with column_limit=120; use 4-space indentation; end files with a single trailing newline
Run ruff check --fix as linter (not formatter) using pyproject.toml config; fix warnings unless explicitly ignored
Respect naming: snake_case for functions/variables, PascalCase for classes, UPPER_CASE for constants
Treat pyright warnings as errors during development
Exception handling: use bare raise to re-raise; log with logger.error() when re-raising to avoid duplicate stack traces; use logger.exception() when catching without re-raising
Provide Google-style docstrings for every public module, class, function, and CLI command; first line concise and ending with a period; surround code entities with backticks
Validate and sanitize all user input, especially in web or CLI interfaces
Prefer httpx with SSL verification enabled by default and follow OWASP Top-10 recommendations
Use async/await for I/O-bound work; profile CPU-heavy paths with cProfile or mprof before optimizing; cache expensive computations with functools.lru_cache or external cache; leverage NumPy vectorized operations when beneficial

Files:

  • tests/nat/server/test_unified_api_server.py
tests/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Unit tests reside under tests/ and should use markers defined in pyproject.toml (e.g., integration)

Files:

  • tests/nat/server/test_unified_api_server.py

⚙️ CodeRabbit configuration file

tests/**/*.py: - Ensure that tests are comprehensive, cover edge cases, and validate the functionality of the code. - Test functions should be named using the test_ prefix, using snake_case. - Any frequently repeated code should be extracted into pytest fixtures. - Pytest fixtures should define the name argument when applying the pytest.fixture decorator. The fixture
function being decorated should be named using the fixture_ prefix, using snake_case. Example:
@pytest.fixture(name="my_fixture")
def fixture_my_fixture():
    pass

Files:

  • tests/nat/server/test_unified_api_server.py
{tests/**/*.py,examples/*/tests/**/*.py}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

{tests/**/*.py,examples/*/tests/**/*.py}: Use pytest (with pytest-asyncio for async); name test files test_*.py; test functions start with test_; extract repeated code into fixtures; fixtures must set name in decorator and be named with fixture_ prefix
Mock external services with pytest_httpserver or unittest.mock; do not hit live endpoints
Mark expensive tests with @pytest.mark.slow or @pytest.mark.integration

Files:

  • tests/nat/server/test_unified_api_server.py
**/*

⚙️ CodeRabbit configuration file

**/*: # Code Review Instructions

  • Ensure the code follows best practices and coding standards. - For Python code, follow
    PEP 20 and
    PEP 8 for style guidelines.
  • Check for security vulnerabilities and potential issues. - Python methods should use type hints for all parameters and return values.
    Example:
    def my_function(param1: int, param2: str) -> bool:
        pass
  • For Python exception handling, ensure proper stack trace preservation:
    • When re-raising exceptions: use bare raise statements to maintain the original stack trace,
      and use logger.error() (not logger.exception()) to avoid duplicate stack trace output.
    • When catching and logging exceptions without re-raising: always use logger.exception()
      to capture the full stack trace information.
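The exception-handling convention quoted above can be illustrated with a short sketch (names here are illustrative, not from the codebase):

```python
import logging

logger = logging.getLogger(__name__)


def reraise_example() -> None:
    """Re-raising path: logger.error() plus a bare raise preserves the traceback."""
    try:
        int("not-a-number")
    except ValueError:
        logger.error("Parsing failed; re-raising")
        raise  # bare raise keeps the original stack trace


def swallow_example() -> int:
    """Catch-without-re-raise path: logger.exception() records the full traceback."""
    try:
        return int("not-a-number")
    except ValueError:
        logger.exception("Parsing failed; returning fallback")
        return 0


print(swallow_example())  # 0
```

Using logger.error() on the re-raise path avoids printing the stack trace twice, since whatever finally handles the exception will log it in full.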

Documentation Review Instructions - Verify that documentation and comments are clear and comprehensive. - Verify that the documentation doesn't contain any TODOs, FIXMEs or placeholder text like "lorem ipsum". - Verify that the documentation doesn't contain any offensive or outdated terms. - Verify that documentation and comments are free of spelling mistakes, ensure the documentation doesn't contain any

words listed in the ci/vale/styles/config/vocabularies/nat/reject.txt file, words that might appear to be
spelling mistakes but are listed in the ci/vale/styles/config/vocabularies/nat/accept.txt file are OK.

Misc. - All code (except .mdc files that contain Cursor rules) should be licensed under the Apache License 2.0,

and should contain an Apache License 2.0 header comment at the top of each file.

  • Confirm that copyright years are up-to date whenever a file is changed.

Files:

  • tests/nat/server/test_unified_api_server.py
🧬 Code graph analysis (1)
tests/nat/server/test_unified_api_server.py (1)
src/nat/data_models/api_server.py (6)
  • ChatResponseChoice (215-217)
  • ChatResponseChunk (311-399)
  • ChatResponseChunkChoice (220-222)
  • ChoiceDelta (202-205)
  • Usage (229-232)
  • ChoiceMessage (197-199)
🔇 Additional comments (3)
tests/nat/server/test_unified_api_server.py (3)

35-35: LGTM!

The new imports (ChatResponseChoice, ChatResponseChunkChoice, ChoiceDelta, Usage) correctly reflect the updated public API surface from src/nat/data_models/api_server.py.

Also applies to: 37-38, 47-47


468-469: LGTM! Past review comment addressed.

The ChatResponse construction now correctly uses ChatResponseChoice (instead of the previously flagged Choice), and includes a non-null Usage instance. This aligns with the updated API expectations where ChatResponse.choices should be list[ChatResponseChoice] and usage should be explicitly set.


471-471: LGTM!

The ChatResponseChunk construction correctly uses ChatResponseChunkChoice with a delta field (containing ChoiceDelta), which aligns with the OpenAI streaming response spec where chunks should populate delta rather than message.



@ericevans-nv ericevans-nv changed the title Updating openAI Chat endpoints to be fully compliant to spec. Adding … Enhance OpenAI Chat API Compatibility Oct 1, 2025
@ericevans-nv ericevans-nv self-assigned this Oct 1, 2025
@ericevans-nv ericevans-nv added improvement Improvement to existing functionality non-breaking Non-breaking change labels Oct 1, 2025
@coderabbitai coderabbitai bot added the breaking Breaking change label Oct 1, 2025

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
src/nat/agent/react_agent/register.py (1)

163-164: Provide context when raising RuntimeError.

Raising RuntimeError without a message or exception chaining loses the original exception context and makes debugging difficult.

Apply this diff to preserve context:

         except Exception as ex:
             logger.exception("%s ReAct Agent failed with exception: %s", AGENT_LOG_PREFIX, str(ex))
-            raise RuntimeError
+            raise RuntimeError(f"ReAct Agent failed: {ex}") from ex
src/nat/agent/rewoo_agent/register.py (1)

170-171: Provide context when raising RuntimeError.

Raising RuntimeError without a message or exception chaining loses the original exception context and makes debugging difficult.

Apply this diff to preserve context:

         except Exception as ex:
             logger.exception("ReWOO Agent failed with exception: %s", ex)
-            raise RuntimeError
+            raise RuntimeError(f"ReWOO Agent failed: {ex}") from ex
src/nat/front_ends/fastapi/fastapi_front_end_plugin_worker.py (1)

696-709: Add missing test for unsupported single-output workflows
No existing test asserts that generate_single_response raises ValueError("Cannot get a single output value for streaming workflows") and returns a 500 response for streaming-only workflows. Add an integration/unit test targeting the OpenAI-compatible endpoint (e.g. in tests/nat/front_ends/fastapi/test_openai_compatibility.py) that submits a streaming workflow without stream=true and verifies the 500 status and error message.

🧹 Nitpick comments (12)
src/nat/agent/react_agent/register.py (1)

153-160: Token counting method is not OpenAI-compliant.

The word-based token counting (split()) differs from OpenAI's subword tokenization (tiktoken). This will produce inaccurate token counts for production usage.

Consider using the tiktoken library for accurate token counting:

import tiktoken

encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")  # or appropriate model
prompt_tokens = sum(len(encoding.encode(str(msg.content))) for msg in input_message.messages)
completion_tokens = len(encoding.encode(content)) if content else 0
total_tokens = prompt_tokens + completion_tokens
src/nat/agent/rewoo_agent/register.py (1)

162-167: Token counting method is not OpenAI-compliant.

The word-based token counting (split()) differs from OpenAI's subword tokenization (tiktoken). This will produce inaccurate token counts for production usage.

Consider using the tiktoken library for accurate token counting to align with OpenAI's specification.

examples/HITL/simple_calculator_hitl/src/nat_simple_calculator_hitl/retry_react_agent.py (2)

165-169: Consider more specific error messages.

The error message "I seem to be having a problem." is generic. Consider providing more context about the specific failure (recursion limit, user declined approval, etc.) to help with debugging and user experience.

For example:

-                    error_msg = "I seem to be having a problem."
+                    error_msg = "Operation cancelled: recursion limit reached and retry declined by user."

210-214: Consider more specific error messages.

The error message "I seem to be having a problem." is generic. Consider indicating that the operation was cancelled due to user declining the retry.

packages/nvidia_nat_test/src/nat/test/functions.py (1)

38-46: Handle non-string content and document token counting.
Message.content is typed as str | list[UserContent], so it cannot be None. Drop the if content None-guard, and either assert the value is a str or branch on list to avoid calling .split() on a list. Also note that .split() yields only a rough, word-based token count; document this approximation or swap in tiktoken for more realistic tests.
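The branching asked for above can be sketched like this; the function name is illustrative, and the word-split is deliberately a rough approximation rather than real subword tokenization:

```python
def rough_token_count(content) -> int:
    """Rough, word-based token count handling str | list content (sketch).

    This approximates token usage by whitespace splitting; it is NOT
    OpenAI's subword tokenization (see tiktoken for that).
    """
    if isinstance(content, str):
        return len(content.split())
    if isinstance(content, list):
        # Assume list items stringify to their text; adjust for real UserContent.
        return sum(len(str(item).split()) for item in content)
    raise TypeError(f"Unsupported content type: {type(content)!r}")


print(rough_token_count("hello world"))         # 2
print(rough_token_count(["one two", "three"]))  # 3
```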

tests/nat/front_ends/fastapi/test_openai_compatibility.py (3)

137-145: Prefer using the enum in tests to avoid implicit coercion.

Use UserMessageContentRoleType.ASSISTANT directly to avoid relying on string→enum coercion in Pydantic.

-    # Test delta with role
-    delta = ChoiceDelta(role="assistant")
+    # Test delta with role
+    delta = ChoiceDelta(role=UserMessageContentRoleType.ASSISTANT)
     assert delta.content is None
-    assert delta.role == "assistant"
+    assert delta.role == UserMessageContentRoleType.ASSISTANT

302-375: Non‑streaming response shape checks are thorough; consider minimal logprobs/usage asserts.

Validates id/object/created/choices/message and usage presence. Optionally assert choice.get("delta") is None to harden against regressions.

Also applies to: 445-519


520-586: Streaming response shape checks: LGTM; optional: assert presence of a usage‑only chunk when requested.

You already tolerate empty choices for usage chunks. Consider adding a separate test with stream_options={"include_usage": True} to assert a final usage summary chunk.

Example new test (sketch):

@pytest.mark.asyncio
async def test_openai_streaming_includes_usage_summary_when_requested():
    fec = FastApiFrontEndConfig()
    fec.workflow.openai_api_v1_path = "/v1/chat/completions"
    cfg = Config(general=GeneralConfig(front_end=fec), workflow=StreamingEchoFunctionConfig(use_openai_api=True))
    async with _build_client(cfg) as client:
        saw_usage_summary = False
        async with aconnect_sse(client, "POST", "/v1/chat/completions",
                                json={"messages":[{"role":"user","content":"hi"}],
                                     "stream": True, "stream_options":{"include_usage": True}}) as es:
            async for sse in es.aiter_sse():
                if sse.data == "[DONE]":
                    break
                chunk = sse.json()
                if not chunk.get("choices") and "usage" in chunk:
                    saw_usage_summary = True
        assert saw_usage_summary
src/nat/data_models/api_server.py (4)

31-31: Unused import.

model_serializer isn’t used in this module. Remove to satisfy ruff F401 and keep imports clean.

-from pydantic import model_serializer

40-43: Public enum lacks a docstring.

Add a concise docstring per guidelines to clarify roles and keep API self‑documenting.

-class UserMessageContentRoleType(str, Enum):
+class UserMessageContentRoleType(str, Enum):
+    """Chat message roles supported by NAT/OpenAI-compatible chat APIs."""
     USER = "user"
     ASSISTANT = "assistant"

226-230: Usage model: add invariants and docstring; compute total when missing.

Make the contract explicit and prevent negative/invalid counts.

-class Usage(BaseModel):
-    prompt_tokens: int | None = None
-    completion_tokens: int | None = None
-    total_tokens: int | None = None
+from pydantic import model_validator
+
+class Usage(BaseModel):
+    """Token accounting for OpenAI-compatible responses."""
+    prompt_tokens: int | None = None
+    completion_tokens: int | None = None
+    total_tokens: int | None = None
+
+    @model_validator(mode="after")
+    def _validate_and_fill_totals(self) -> "Usage":
+        for name in ("prompt_tokens", "completion_tokens", "total_tokens"):
+            v = getattr(self, name)
+            if v is not None and v < 0:
+                raise ValueError(f"{name} must be non-negative")
+        if self.prompt_tokens is not None and self.completion_tokens is not None:
+            expected = self.prompt_tokens + self.completion_tokens
+            if self.total_tokens is None:
+                self.total_tokens = expected
+            elif self.total_tokens != expected:
+                raise ValueError("total_tokens must equal prompt_tokens + completion_tokens")
+        return self

360-396: Allow content=None for usage‑only or role‑only chunks; signature and guard already anticipate it.

The body checks content is not None, but the signature requires str. Make it optional to match the guard and enable empty-delta summary chunks cleanly.

-    def create_streaming_chunk(content: str,
+    def create_streaming_chunk(content: str | None = None,
                                *,
@@
-        delta = ChoiceDelta(content=content, role=role) if content is not None or role is not None else ChoiceDelta()
+        delta = ChoiceDelta(content=content, role=role) if (content is not None or role is not None) else ChoiceDelta()
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9cedfcc and dd262b7.

📒 Files selected for processing (7)
  • examples/HITL/simple_calculator_hitl/src/nat_simple_calculator_hitl/retry_react_agent.py (3 hunks)
  • packages/nvidia_nat_test/src/nat/test/functions.py (2 hunks)
  • src/nat/agent/react_agent/register.py (2 hunks)
  • src/nat/agent/rewoo_agent/register.py (2 hunks)
  • src/nat/data_models/api_server.py (13 hunks)
  • src/nat/front_ends/fastapi/fastapi_front_end_plugin_worker.py (2 hunks)
  • tests/nat/front_ends/fastapi/test_openai_compatibility.py (4 hunks)
🧰 Additional context used
📓 Path-based instructions (11)
**/*.{py,yaml,yml}

📄 CodeRabbit inference engine (.cursor/rules/nat-test-llm.mdc)

**/*.{py,yaml,yml}: Configure response_seq as a list of strings; values cycle per call, and [] yields an empty string.
Configure delay_ms to inject per-call artificial latency in milliseconds for nat_test_llm.

Files:

  • src/nat/agent/rewoo_agent/register.py
  • tests/nat/front_ends/fastapi/test_openai_compatibility.py
  • packages/nvidia_nat_test/src/nat/test/functions.py
  • src/nat/front_ends/fastapi/fastapi_front_end_plugin_worker.py
  • examples/HITL/simple_calculator_hitl/src/nat_simple_calculator_hitl/retry_react_agent.py
  • src/nat/agent/react_agent/register.py
  • src/nat/data_models/api_server.py
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/nat-test-llm.mdc)

**/*.py: Programmatic use: create TestLLMConfig(response_seq=[...], delay_ms=...), add with builder.add_llm("", cfg).
When retrieving the test LLM wrapper, use builder.get_llm(name, wrapper_type=LLMFrameworkEnum.) and call the framework’s method (e.g., ainvoke, achat, call).

**/*.py: In code comments/identifiers use NAT abbreviations as specified: nat for API namespace/CLI, nvidia-nat for package name, NAT for env var prefixes; do not use these abbreviations in documentation
Follow PEP 20 and PEP 8; run yapf with column_limit=120; use 4-space indentation; end files with a single trailing newline
Run ruff check --fix as linter (not formatter) using pyproject.toml config; fix warnings unless explicitly ignored
Respect naming: snake_case for functions/variables, PascalCase for classes, UPPER_CASE for constants
Treat pyright warnings as errors during development
Exception handling: use bare raise to re-raise; log with logger.error() when re-raising to avoid duplicate stack traces; use logger.exception() when catching without re-raising
Provide Google-style docstrings for every public module, class, function, and CLI command; first line concise and ending with a period; surround code entities with backticks
Validate and sanitize all user input, especially in web or CLI interfaces
Prefer httpx with SSL verification enabled by default and follow OWASP Top-10 recommendations
Use async/await for I/O-bound work; profile CPU-heavy paths with cProfile or mprof before optimizing; cache expensive computations with functools.lru_cache or external cache; leverage NumPy vectorized operations when beneficial

Files:

  • src/nat/agent/rewoo_agent/register.py
  • tests/nat/front_ends/fastapi/test_openai_compatibility.py
  • packages/nvidia_nat_test/src/nat/test/functions.py
  • src/nat/front_ends/fastapi/fastapi_front_end_plugin_worker.py
  • examples/HITL/simple_calculator_hitl/src/nat_simple_calculator_hitl/retry_react_agent.py
  • src/nat/agent/react_agent/register.py
  • src/nat/data_models/api_server.py
src/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

All importable Python code must live under src/ (or packages//src/)

Files:

  • src/nat/agent/rewoo_agent/register.py
  • src/nat/front_ends/fastapi/fastapi_front_end_plugin_worker.py
  • src/nat/agent/react_agent/register.py
  • src/nat/data_models/api_server.py
src/nat/**/*

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Changes in src/nat should prioritize backward compatibility

Files:

  • src/nat/agent/rewoo_agent/register.py
  • src/nat/front_ends/fastapi/fastapi_front_end_plugin_worker.py
  • src/nat/agent/react_agent/register.py
  • src/nat/data_models/api_server.py

⚙️ CodeRabbit configuration file

This directory contains the core functionality of the toolkit. Changes should prioritize backward compatibility.

Files:

  • src/nat/agent/rewoo_agent/register.py
  • src/nat/front_ends/fastapi/fastapi_front_end_plugin_worker.py
  • src/nat/agent/react_agent/register.py
  • src/nat/data_models/api_server.py
{src/**/*.py,packages/*/src/**/*.py}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

All public APIs must have Python 3.11+ type hints on parameters and return values; prefer typing/collections.abc abstractions; use typing.Annotated when useful

Files:

  • src/nat/agent/rewoo_agent/register.py
  • packages/nvidia_nat_test/src/nat/test/functions.py
  • src/nat/front_ends/fastapi/fastapi_front_end_plugin_worker.py
  • src/nat/agent/react_agent/register.py
  • src/nat/data_models/api_server.py
**/*

⚙️ CodeRabbit configuration file

**/*: # Code Review Instructions

  • Ensure the code follows best practices and coding standards. - For Python code, follow
    PEP 20 and
    PEP 8 for style guidelines.
  • Check for security vulnerabilities and potential issues. - Python methods should use type hints for all parameters and return values.
    Example:
    def my_function(param1: int, param2: str) -> bool:
        pass
  • For Python exception handling, ensure proper stack trace preservation:
    • When re-raising exceptions: use bare raise statements to maintain the original stack trace,
      and use logger.error() (not logger.exception()) to avoid duplicate stack trace output.
    • When catching and logging exceptions without re-raising: always use logger.exception()
      to capture the full stack trace information.

Documentation Review Instructions - Verify that documentation and comments are clear and comprehensive. - Verify that the documentation doesn't contain any TODOs, FIXMEs or placeholder text like "lorem ipsum". - Verify that the documentation doesn't contain any offensive or outdated terms. - Verify that documentation and comments are free of spelling mistakes, ensure the documentation doesn't contain any

words listed in the ci/vale/styles/config/vocabularies/nat/reject.txt file, words that might appear to be
spelling mistakes but are listed in the ci/vale/styles/config/vocabularies/nat/accept.txt file are OK.

Misc. - All code (except .mdc files that contain Cursor rules) should be licensed under the Apache License 2.0,

and should contain an Apache License 2.0 header comment at the top of each file.

  • Confirm that copyright years are up-to date whenever a file is changed.

Files:

  • src/nat/agent/rewoo_agent/register.py
  • tests/nat/front_ends/fastapi/test_openai_compatibility.py
  • packages/nvidia_nat_test/src/nat/test/functions.py
  • src/nat/front_ends/fastapi/fastapi_front_end_plugin_worker.py
  • examples/HITL/simple_calculator_hitl/src/nat_simple_calculator_hitl/retry_react_agent.py
  • src/nat/agent/react_agent/register.py
  • src/nat/data_models/api_server.py
tests/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Unit tests reside under tests/ and should use markers defined in pyproject.toml (e.g., integration)

Files:

  • tests/nat/front_ends/fastapi/test_openai_compatibility.py

⚙️ CodeRabbit configuration file

tests/**/*.py: - Ensure that tests are comprehensive, cover edge cases, and validate the functionality of the code. - Test functions should be named using the test_ prefix, using snake_case. - Any frequently repeated code should be extracted into pytest fixtures. - Pytest fixtures should define the name argument when applying the pytest.fixture decorator. The fixture
function being decorated should be named using the fixture_ prefix, using snake_case. Example:
@pytest.fixture(name="my_fixture")
def fixture_my_fixture():
    pass

Files:

  • tests/nat/front_ends/fastapi/test_openai_compatibility.py
{tests/**/*.py,examples/*/tests/**/*.py}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

{tests/**/*.py,examples/*/tests/**/*.py}: Use pytest (with pytest-asyncio for async); name test files test_*.py; test functions start with test_; extract repeated code into fixtures; fixtures must set name in decorator and be named with fixture_ prefix
Mock external services with pytest_httpserver or unittest.mock; do not hit live endpoints
Mark expensive tests with @pytest.mark.slow or @pytest.mark.integration

Files:

  • tests/nat/front_ends/fastapi/test_openai_compatibility.py
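The "mock external services" guideline above can be sketched with the stdlib unittest.mock module, so no live endpoint is hit. The `fetch_completion` helper and its client are hypothetical names used only for illustration:

```python
from unittest import mock


# Hypothetical code under test: a thin wrapper around an external client.
def fetch_completion(client, prompt: str) -> str:
    return client.complete(prompt)


def test_fetch_completion_uses_mocked_client():
    # Replace the external service with a Mock; no network call happens.
    fake_client = mock.Mock()
    fake_client.complete.return_value = "4"

    result = fetch_completion(fake_client, "What is 2 + 2?")

    assert result == "4"
    fake_client.complete.assert_called_once_with("What is 2 + 2?")
```

The same pattern extends to `pytest_httpserver` when the code under test speaks HTTP rather than taking an injectable client.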
packages/*/src/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Importable Python code inside packages must live under packages/*/src/

Files:

  • packages/nvidia_nat_test/src/nat/test/functions.py
packages/**/*

⚙️ CodeRabbit configuration file

packages/**/*:

  • This directory contains optional plugin packages for the toolkit; each should contain a pyproject.toml file.
  • The pyproject.toml file should declare a dependency on nvidia-nat or another package with a name starting with nvidia-nat-. This dependency should be declared using ~=<version>, and the version should be a two-digit version (ex: ~=1.0).
  • Not all packages contain Python code; if they do, they should also contain their own set of tests, in a tests/ directory at the same level as the pyproject.toml file.

Files:

  • packages/nvidia_nat_test/src/nat/test/functions.py
examples/**/*

⚙️ CodeRabbit configuration file

examples/**/*:

  • This directory contains example code and usage scenarios for the toolkit; at a minimum an example should contain a README.md or README.ipynb file.
  • If an example contains Python code, it should be placed in a subdirectory named src/ and should contain a pyproject.toml file. Optionally, it might also contain scripts in a scripts/ directory.
  • If an example contains YAML files, they should be placed in a subdirectory named configs/.
  • If an example contains sample data files, they should be placed in a subdirectory named data/, and should be checked into git-lfs.

Files:

  • examples/HITL/simple_calculator_hitl/src/nat_simple_calculator_hitl/retry_react_agent.py
🧬 Code graph analysis (7)
src/nat/agent/rewoo_agent/register.py (1)
src/nat/data_models/api_server.py (5)
  • Usage (226-229)
  • ChatResponse (255-305)
  • from_string (166-177)
  • from_string (278-305)
  • from_string (332-358)
tests/nat/front_ends/fastapi/test_openai_compatibility.py (2)
src/nat/data_models/api_server.py (8)
  • Usage (226-229)
  • UserMessageContentRoleType (40-42)
  • ChatResponseChunk (308-396)
  • create_streaming_chunk (361-396)
  • ChatResponse (255-305)
  • from_string (166-177)
  • from_string (278-305)
  • from_string (332-358)
src/nat/front_ends/fastapi/fastapi_front_end_config.py (1)
  • FastApiFrontEndConfig (136-264)
packages/nvidia_nat_test/src/nat/test/functions.py (1)
src/nat/data_models/api_server.py (5)
  • Usage (226-229)
  • ChatResponse (255-305)
  • from_string (166-177)
  • from_string (278-305)
  • from_string (332-358)
src/nat/front_ends/fastapi/fastapi_front_end_plugin_worker.py (3)
src/nat/runtime/session.py (1)
  • session (92-127)
src/nat/front_ends/fastapi/response_helpers.py (1)
  • generate_single_response (108-117)
src/nat/data_models/api_server.py (1)
  • ChatResponse (255-305)
examples/HITL/simple_calculator_hitl/src/nat_simple_calculator_hitl/retry_react_agent.py (1)
src/nat/data_models/api_server.py (5)
  • Usage (226-229)
  • ChatResponse (255-305)
  • from_string (166-177)
  • from_string (278-305)
  • from_string (332-358)
src/nat/agent/react_agent/register.py (1)
src/nat/data_models/api_server.py (5)
  • Usage (226-229)
  • ChatResponse (255-305)
  • from_string (166-177)
  • from_string (278-305)
  • from_string (332-358)
src/nat/data_models/api_server.py (1)
packages/nvidia_nat_agno/tests/test_tool_wrapper.py (1)
  • Choice (404-407)
🪛 Ruff (0.13.2)
examples/HITL/simple_calculator_hitl/src/nat_simple_calculator_hitl/retry_react_agent.py

216-216: Do not catch blind exception: Exception

(BLE001)

🔇 Additional comments (13)
packages/nvidia_nat_test/src/nat/test/functions.py (1)

24-24: LGTM: Usage import added.

The import aligns with the new requirement for attaching usage metadata to ChatResponse objects.

src/nat/agent/react_agent/register.py (1)

28-28: LGTM: Usage import added.

Aligns with the new requirement for attaching usage metadata to ChatResponse objects.

src/nat/agent/rewoo_agent/register.py (1)

29-29: LGTM: Usage import added.

Aligns with the new requirement for attaching usage metadata to ChatResponse objects.

src/nat/front_ends/fastapi/fastapi_front_end_plugin_worker.py (1)

692-693: LGTM: Content-Type header set correctly.

Setting Content-Type: application/json upfront for the non-streaming path is correct. The streaming path at line 700 appropriately overrides this with text/event-stream.

examples/HITL/simple_calculator_hitl/src/nat_simple_calculator_hitl/retry_react_agent.py (2)

27-27: LGTM: Usage import added.

Aligns with the requirement for attaching usage metadata to ChatResponse objects, including error responses.


216-222: Bare Exception catch is acceptable here.

While static analysis flags the bare Exception catch, this is a top-level handler that needs to catch all exceptions to return a user-friendly error response. This is an appropriate use case.

However, consider providing more context in the error message or logging the exception type for debugging purposes.
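That logging suggestion follows the repo's exception-handling rule: use `logger.exception()` when catching without re-raising, so the full stack trace is captured. A minimal stdlib sketch; `handle_request` and `run_workflow` are hypothetical names, not the example's actual API:

```python
import logging

logger = logging.getLogger(__name__)


def handle_request(run_workflow) -> str:
    """Return a user-friendly message, logging the full traceback on failure."""
    try:
        return run_workflow()
    except Exception:  # top-level handler: must always return a response
        # Catching without re-raising, so logger.exception() records the stack trace.
        logger.exception("Workflow failed; returning fallback error response")
        return "I seem to be having a problem."
```

If the handler re-raised instead, the rule flips: a bare `raise` plus `logger.error()` avoids duplicating the stack trace.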

tests/nat/front_ends/fastapi/test_openai_compatibility.py (3)

151-156: Good: streaming chunk uses delta with enum role (spec-compliant).

Delta is populated and message omitted in streaming; finish_reason None on non-final chunk. Matches OpenAI spec.


169-175: Non‑streaming timestamp + required usage: LGTM.

Unix timestamp serialization and mandatory Usage on ChatResponse are correctly exercised.


187-201: No action needed: async tests run under global auto mode.
The root pyproject.toml declares [tool.pytest.ini_options] asyncio_mode = "auto", so pytest-asyncio will execute async def tests without individual @pytest.mark.asyncio markers.

src/nat/data_models/api_server.py (4)

205-224: Choice models split (message vs delta): spec‑aligned and backward‑compatible via alias.

Clear separation of non‑streaming and streaming shapes; alias Choice = ChatResponseChoice maintains compatibility.


261-271: ChatResponse defaults and required usage: LGTM.

model="unknown-model" default and mandatory usage align with tests and OpenAI spec for non‑streaming.

Also applies to: 277-306


655-656: Converters: sensible defaults and usage synthesis.

Defaulting request model to "unknown-model" and injecting computed usage keeps converters deterministic.

Also applies to: 671-682, 699-704


348-358: from_string() is only used in the non-streaming path (src/nat/data_models/api_server.py:703); streaming handlers use create_streaming_chunk(), which defaults finish_reason=None.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
src/nat/data_models/api_server.py (1)

31-31: Potentially unused import.

The model_serializer import on line 31 does not appear to be used in this file. If it's not required for future changes or other modules, consider removing it to keep the imports clean.

-from pydantic import model_serializer
examples/HITL/simple_calculator_hitl/src/nat_simple_calculator_hitl/retry_react_agent.py (1)

214-219: Bare except is acceptable here, but consider narrowing if possible.

The bare except Exception: on line 214 catches all exceptions and returns a generic error message. While Ruff flags this (BLE001), it's acceptable in a top-level handler that ensures the function always returns a ChatResponse. However, if you can identify specific exceptions that should be caught (e.g., LLMError, ValidationError), narrowing the except clause would improve clarity and allow unexpected errors to propagate for debugging.

The construction of Usage() and passing it to ChatResponse.from_string is correct.

If specific exceptions are expected, consider:

-        except Exception:
+        except (GraphRecursionError, ValueError, RuntimeError):
             # Handle any other unexpected exceptions
             error_msg = "I seem to be having a problem."
             
             # Create usage statistics for error response
             return ChatResponse.from_string(error_msg, usage=Usage())

Otherwise, the current implementation is acceptable for ensuring robustness.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dd262b7 and 3137510.

📒 Files selected for processing (2)
  • examples/HITL/simple_calculator_hitl/src/nat_simple_calculator_hitl/retry_react_agent.py (3 hunks)
  • src/nat/data_models/api_server.py (13 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
src/nat/data_models/api_server.py (1)
packages/nvidia_nat_agno/tests/test_tool_wrapper.py (1)
  • Choice (404-407)
examples/HITL/simple_calculator_hitl/src/nat_simple_calculator_hitl/retry_react_agent.py (1)
src/nat/data_models/api_server.py (5)
  • Usage (226-229)
  • ChatResponse (255-305)
  • from_string (166-177)
  • from_string (278-305)
  • from_string (332-358)
🪛 Ruff (0.13.2)
examples/HITL/simple_calculator_hitl/src/nat_simple_calculator_hitl/retry_react_agent.py

214-214: Do not catch blind exception: Exception

(BLE001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: CI Pipeline / Check
🔇 Additional comments (10)
src/nat/data_models/api_server.py (7)

117-117: LGTM: Role fields now use enum for type safety.

The role fields in Message, ChoiceMessage, and ChoiceDelta have been correctly updated to use UserMessageContentRoleType (or UserMessageContentRoleType | None for optional cases). The factory methods ChatRequest.from_string and ChatRequest.from_content correctly use UserMessageContentRoleType.USER. This change improves type safety and aligns with the PR's goal of enforcing OpenAI API compliance.

Note: As flagged in a previous review comment on lines 115-118, downstream code (tests and plugin packages) still uses raw role strings and must be updated to use the enum. That refactoring is tracked separately.

Also applies to: 173-173, 187-187, 196-196, 202-202


205-223: LGTM: Specialized choice types align with OpenAI spec.

The introduction of ChoiceBase, ChatResponseChoice, and ChatResponseChunkChoice correctly separates streaming (delta) from non-streaming (message) responses, matching the OpenAI Chat Completions API specification. The backward compatibility alias Choice = ChatResponseChoice preserves existing API surface. Docstrings are concise and appropriate.


318-396: LGTM: ChatResponseChunk correctly implements streaming spec.

The updates to ChatResponseChunk correctly implement the OpenAI streaming response format:

  • Uses ChatResponseChunkChoice with delta field (not message).
  • role parameter in create_streaming_chunk is correctly typed as UserMessageContentRoleType | None.
  • Optional usage parameter in create_streaming_chunk allows final chunks to include usage stats per the spec.
  • Default model "unknown-model" is a reasonable fallback.

654-656: LGTM: Converter updated for new default model.

The converter _string_to_nat_chat_request correctly passes model="unknown-model" to ChatRequest.from_string, consistent with the updated default model handling.


671-681: LGTM: Converter builds usage stats (word-based approximation).

The converter _string_to_nat_chat_response correctly constructs a Usage object and passes it to ChatResponse.from_string. The token count is approximated by word count (len(data.split())), which is a reasonable heuristic for a type converter where exact tokenization is unavailable.

Note: This approximation means token counts in converted responses may not match actual model token usage, but this is acceptable for a fallback converter.


284-305: All ChatResponse.from_string call sites updated with usage. No missing invocations—breaking-change handled.


226-229: Please verify ChatResponse serialization of optional usage
sed -n '150,260p' src/nat/data_models/api_server.py
sed -n '260,350p' src/nat/data_models/api_server.py

This will confirm whether from_string omits or includes None usage fields when serializing.

examples/HITL/simple_calculator_hitl/src/nat_simple_calculator_hitl/retry_react_agent.py (3)

27-27: LGTM: Usage import added for error responses.

The import of Usage on line 27 is correctly added to support the new requirement for ChatResponse.from_string to include usage statistics.


165-168: LGTM: Error response correctly includes usage.

The error path in handle_recursion_error (lines 165-168) correctly constructs a Usage() instance (with all token counts as None) and passes it to ChatResponse.from_string. This aligns with the updated API signature and is semantically appropriate for an error response where no tokens were consumed.


209-212: LGTM: Error response correctly includes usage.

The error path when the user declines retry (lines 209-212) correctly constructs a Usage() instance and passes it to ChatResponse.from_string, consistent with the updated signature.

@ericevans-nv ericevans-nv removed the breaking Breaking change label Oct 1, 2025
@coderabbitai coderabbitai bot added the breaking Breaking change label Oct 1, 2025
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (6)
src/nat/data_models/api_server.py (6)

31-31: Import model_validator to support a small Usage consistency check (see Usage comment).

Add this alongside existing imports.

-from pydantic import model_serializer
+from pydantic import model_serializer
+from pydantic import model_validator

169-175: Add short Google‑style docstrings to public constructors.

Clarify defaults and intent for API users.

 @staticmethod
 def from_string(data: str,
                 *,
                 model: str | None = None,
                 temperature: float | None = None,
                 max_tokens: int | None = None,
                 top_p: float | None = None) -> "ChatRequest":
-
-        return ChatRequest(messages=[Message(content=data, role=UserMessageContentRoleType.USER)],
+        """Build a ChatRequest from plain text.
+
+        Args:
+            data: User prompt text.
+            model: Optional model identifier.
+            temperature: Sampling temperature.
+            max_tokens: Max tokens to generate.
+            top_p: Nucleus sampling parameter.
+        Returns:
+            ChatRequest with a single user message.
+        """
+        return ChatRequest(messages=[Message(content=data, role=UserMessageContentRoleType.USER)],
                            model=model,
                            temperature=temperature,
                            max_tokens=max_tokens,
                            top_p=top_p)
@@
 @staticmethod
 def from_content(content: list[UserContent],
                  *,
                  model: str | None = None,
                  temperature: float | None = None,
                  max_tokens: int | None = None,
                  top_p: float | None = None) -> "ChatRequest":
-
-        return ChatRequest(messages=[Message(content=content, role=UserMessageContentRoleType.USER)],
+        """Build a ChatRequest from structured content parts.
+
+        Args:
+            content: User content parts.
+            model: Optional model identifier.
+            temperature: Sampling temperature.
+            max_tokens: Max tokens to generate.
+            top_p: Nucleus sampling parameter.
+        Returns:
+            ChatRequest with a single user message.
+        """
+        return ChatRequest(messages=[Message(content=content, role=UserMessageContentRoleType.USER)],
                            model=model,
                            temperature=temperature,
                            max_tokens=max_tokens,
                            top_p=top_p)

Also applies to: 183-189


198-201: Add a brief docstring to ChoiceMessage.

-class ChoiceMessage(BaseModel):
+class ChoiceMessage(BaseModel):
+    """Message object in non‑streaming chat completion choices."""
     content: str | None = None
     role: UserMessageContentRoleType | None = None

209-214: Consider adding logprobs for closer spec parity.

Not required (extra fields allowed), but an explicit field helps typing and consumers that expect it.

 class ChoiceBase(BaseModel):
     """Base choice model with common fields for both streaming and non-streaming responses"""
     model_config = ConfigDict(extra="allow")
     finish_reason: typing.Literal['stop', 'length', 'tool_calls', 'content_filter', 'function_call'] | None = None
     index: int
+    logprobs: dict[str, typing.Any] | None = None

231-233: Auto‑compute total_tokens when not provided.

Prevents inconsistent Usage payloads until #891 lands.

 class Usage(BaseModel):
     prompt_tokens: int | None = None
     completion_tokens: int | None = None
     total_tokens: int | None = None
+
+    @model_validator(mode="after")
+    def _ensure_total(self) -> "Usage":
+        # If either component is provided, backfill total_tokens.
+        if self.total_tokens is None and (self.prompt_tokens is not None or self.completion_tokens is not None):
+            self.total_tokens = (self.prompt_tokens or 0) + (self.completion_tokens or 0)
+        return self

370-373: Treat empty string content as “no delta” to better match final-chunk shape.

Prevents emitting {"content": ""} when callers pass "" on terminal chunks.

-        delta = ChoiceDelta(content=content, role=role) if content is not None or role is not None else ChoiceDelta()
+        include_content = (content is not None and content != "")
+        delta = ChoiceDelta(content=content, role=role) if include_content or role is not None else ChoiceDelta()

Also applies to: 380-383

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3137510 and 6e95991.

📒 Files selected for processing (1)
  • src/nat/data_models/api_server.py (13 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/nat/data_models/api_server.py (1)
packages/nvidia_nat_agno/tests/test_tool_wrapper.py (1)
  • Choice (404-407)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: CI Pipeline / Check
🔇 Additional comments (7)
src/nat/data_models/api_server.py (7)

40-46: Roles enum with SYSTEM + docstring looks good.


119-122: Message.role migrated to enum — good interoperability.

Pydantic will serialize to the string values; schema stays OpenAI-friendly.
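The serialization property this relies on can be shown with a stdlib str-mixin enum; `Role` here is a hypothetical stand-in for `UserMessageContentRoleType`, not the actual class:

```python
import json
from enum import Enum


class Role(str, Enum):
    # Hypothetical stand-in for UserMessageContentRoleType.
    USER = "user"
    ASSISTANT = "assistant"
    SYSTEM = "system"


message = {"role": Role.USER, "content": "hello"}
# A str-mixin enum member serializes as its plain string value,
# so the JSON schema stays OpenAI-friendly.
payload = json.dumps(message)
```

Consumers comparing roles can therefore keep using string equality (`role == "user"`) across the enum migration.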


203-207: Delta.role optional aligns with OpenAI streaming. LGTM.


216-224: Specialized choice types and alias keep BC while improving clarity.


388-395: Choice construction and finish_reason casting are sound.


659-660: Defaulting to "unknown-model" in converters keeps responses consistent.


269-273: All ChatResponse.from_string callsites include the required usage parameter. Verified in API server, agents, tests, and examples.

@ericevans-nv ericevans-nv added the DO NOT MERGE PR should not be merged; see PR for details label Oct 1, 2025
@dagardner-nv dagardner-nv deleted the branch NVIDIA:release/1.3 October 1, 2025 20:06
@willkill07 willkill07 reopened this Oct 1, 2025
@ericevans-nv ericevans-nv removed breaking Breaking change DO NOT MERGE PR should not be merged; see PR for details labels Oct 2, 2025
@ericevans-nv
Copy link
Contributor Author

ericevans-nv commented Oct 2, 2025

/ok-to-test

Signed-off-by: Eric Evans <[email protected]>
@coderabbitai coderabbitai bot added the breaking Breaking change label Oct 2, 2025
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 47328bb and 52b8f61.

📒 Files selected for processing (3)
  • src/nat/front_ends/fastapi/message_validator.py (1 hunks)
  • tests/nat/front_ends/fastapi/test_fastapi_front_end_plugin.py (2 hunks)
  • tests/nat/server/test_unified_api_server.py (3 hunks)
🧬 Code graph analysis (3)
src/nat/front_ends/fastapi/message_validator.py (1)
src/nat/data_models/api_server.py (3)
  • ChatResponse (258-308)
  • SystemResponseContent (566-569)
  • ChatResponseChunk (311-399)
tests/nat/server/test_unified_api_server.py (1)
src/nat/data_models/api_server.py (4)
  • ChatResponseChunkChoice (220-222)
  • ChoiceDelta (202-205)
  • Usage (229-232)
  • ChatResponseChunk (311-399)
tests/nat/front_ends/fastapi/test_fastapi_front_end_plugin.py (1)
src/nat/data_models/api_server.py (1)
  • ChatResponseChunk (311-399)
🔇 Additional comments (6)
tests/nat/front_ends/fastapi/test_fastapi_front_end_plugin.py (2)

140-140: LGTM: Correct extraction from streaming delta.

The change correctly extracts content from `delta.content` instead of `message.content`, aligning with the OpenAI Chat Completions API specification for streaming chunks. The `or ""` fallback appropriately handles `None` values.


162-162: LGTM: Consistent streaming content extraction.

This change mirrors the correction at line 140, ensuring consistent extraction of streaming content from `delta.content` across both test paths. The implementation correctly validates OpenAI-compatible streaming behavior.
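For reference, a minimal sketch of the extraction pattern these tests validate, written against a plain dict in the OpenAI streaming-chunk shape (the NAT model classes are assumed to expose the same `choices[0].delta.content` path):

```python
# A streaming chunk whose delta carries no content (e.g. a role-only first chunk).
chunk = {"choices": [{"index": 0, "delta": {"role": "assistant", "content": None}}]}

# The `or ""` fallback maps a missing or None delta.content to an empty string,
# so accumulating chunk text never raises a TypeError.
content = chunk["choices"][0]["delta"].get("content") or ""
assert content == ""

# A chunk that does carry text passes through unchanged.
chunk["choices"][0]["delta"]["content"] = "Hel"
content = chunk["choices"][0]["delta"].get("content") or ""
assert content == "Hel"
```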

src/nat/front_ends/fastapi/message_validator.py (1)

142-145: Correctly aligned with updated data models.

The content extraction paths now correctly use `message.content` for `ChatResponse` and `delta.content` for `ChatResponseChunk`, matching the new `ChatResponseChoice` and `ChatResponseChunkChoice` structures introduced in the PR.

tests/nat/server/test_unified_api_server.py (3)

36-36: LGTM: Correct imports for updated data models.

The new imports for `ChatResponseChunkChoice`, `ChoiceDelta`, and `Usage` align with the updated API surface introduced in this PR.

Also applies to: 38-38, 47-47


469-469: LGTM: Correct addition of required usage field.

The `usage` field is now required in `ChatResponse`. Using zero values for token counts is appropriate for this test fixture.
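A hedged sketch of what such a zero-valued fixture produces in the final OpenAI-compatible response body; field names follow the OpenAI spec, and the `id` is a hypothetical placeholder:

```python
import json

# Illustrative OpenAI-style response body; zero usage counts mirror the test
# fixture (real token accounting is tracked in follow-up issue 891).
response = {
    "id": "chatcmpl-example",  # hypothetical id
    "object": "chat.completion",
    "model": "unknown-model",  # the default model identifier noted in this PR
    "choices": [{
        "index": 0,
        "message": {"role": "assistant", "content": "Hello!"},
        "finish_reason": "stop",
    }],
    "usage": {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0},
}

body = json.dumps(response)
assert json.loads(body)["usage"]["total_tokens"] == 0
```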


471-471: LGTM: Correct usage of ChatResponseChunkChoice with delta.

The change correctly uses `ChatResponseChunkChoice` with a `delta` field, matching the updated `ChatResponseChunk` model structure for streaming responses.

@coderabbitai coderabbitai bot added the external (This issue was filed by someone outside of the NeMo Agent toolkit team) label Oct 2, 2025
@ericevans-nv ericevans-nv removed the breaking (Breaking change) and external (This issue was filed by someone outside of the NeMo Agent toolkit team) labels Oct 2, 2025
@ericevans-nv
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit d8becf6 into NVIDIA:release/1.3 Oct 2, 2025
17 checks passed
yczhang-nv pushed a commit to yczhang-nv/NeMo-Agent-Toolkit that referenced this pull request Oct 2, 2025
This PR significantly improves OpenAI Chat Completions API compatibility by fixing response format compliance and removing unused code. The changes ensure that NAT's OpenAI-compatible endpoints fully adhere to the OpenAI specification for both streaming and non-streaming responses.
Closes: NVIDIA#818
A follow-up issue has been created to address accurate calculation and passing of usage statistics from workflows to ChatResponse objects in OpenAI-compatible endpoints: [Issue: 891](NVIDIA#891).

## By Submitting this PR I confirm:
- I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/NeMo-Agent-Toolkit/blob/develop/docs/source/resources/contributing.md).
- We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
  - Any contribution which contains commits that are not Signed-Off will not be accepted.
- When the PR is ready for review, new or existing tests cover these changes.
- When the PR is ready for review, the documentation is up to date with these changes.

## Summary by CodeRabbit

- New Features
  - All responses — including error replies — now include usage statistics (prompt, completion, total tokens).

- Refactor
  - OpenAI-compatible non‑streaming path simplified to return a single JSON response; Content-Type set explicitly for JSON and streaming.
  - Default model identifier standardized to "unknown-model" in responses.

- Compatibility
  - Streaming chunk roles standardized to an enum-style role; response payloads and tests now include and expect usage metadata.

Authors:
  - Eric Evans II (https://github.com/ericevans-nv)

Approvers:
  - Will Killian (https://github.com/willkill07)

URL: NVIDIA#889
Signed-off-by: Yuchen Zhang <[email protected]>
Labels

improvement (Improvement to existing functionality) · non-breaking (Non-breaking change)
