Enhance OpenAI Chat API Compatibility #889
…tests. Signed-off-by: Eric Evans <[email protected]>
Walkthrough

Adds `Usage` metadata to responses, replaces string role fields with a `UserMessageContentRoleType` enum, introduces specialized choice models, updates `from_string`/`create_streaming_chunk` signatures and call sites (agents, examples, tests), and simplifies the FastAPI non-streaming handler to return single JSON responses directly.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant C as Client
    participant FE as FastAPI Endpoint
    participant SM as SessionManager
    participant AG as Agent
    participant DM as DataModels
    Note over FE: Non-streaming request
    C->>FE: POST /chat/completions (stream=false)
    FE->>SM: generate_single_response(payload, ChatResponse)
    SM->>AG: _response_fn(messages)
    AG->>AG: Compute output and token usage
    AG->>DM: ChatResponse.from_string(content, usage)
    DM-->>AG: ChatResponse (choices with ASSISTANT role)
    AG-->>SM: ChatResponse
    SM-->>FE: ChatResponse
    FE-->>C: 200 application/json
    rect rgba(230,245,255,0.6)
        Note right of FE: Streaming request
        C->>FE: POST /chat/completions (stream=true)
        FE->>SM: stream_response(payload, ChatResponseChunk)
        SM->>AG: stream tokens
        loop per token/chunk
            AG->>DM: ChatResponseChunk.create_streaming_chunk(delta, role=ASSISTANT)
            DM-->>FE: chunk with delta (message=null)
            FE-->>C: text/event-stream data: { ... delta: {content, role}, message: null }
        end
    end
```
Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested labels
Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests

📜 Recent review details

Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📒 Files selected for processing (1)
🧰 Additional context used

📓 Path-based instructions (5)

**/*.{py,yaml,yml}: 📄 CodeRabbit inference engine (.cursor/rules/nat-test-llm.mdc)
**/*.py: 📄 CodeRabbit inference engine (.cursor/rules/nat-test-llm.mdc)
tests/**/*.py: 📄 CodeRabbit inference engine (.cursor/rules/general.mdc); ⚙️ CodeRabbit configuration file
{tests/**/*.py,examples/*/tests/**/*.py}: 📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
**/*: ⚙️ CodeRabbit configuration file
🧬 Code graph analysis (1)

tests/nat/server/test_unified_api_server.py (1)
🔇 Additional comments (3)
…to openai-v1-endpoint-response
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
src/nat/agent/react_agent/register.py (1)

163-164: Provide context when raising RuntimeError.

The bare `raise RuntimeError` without a message loses the original exception context and makes debugging difficult. Apply this diff to preserve context:

```diff
 except Exception as ex:
     logger.exception("%s ReAct Agent failed with exception: %s", AGENT_LOG_PREFIX, str(ex))
-    raise RuntimeError
+    raise RuntimeError(f"ReAct Agent failed: {ex}") from ex
```

src/nat/agent/rewoo_agent/register.py (1)

170-171: Provide context when raising RuntimeError.

The bare `raise RuntimeError` without a message loses the original exception context and makes debugging difficult. Apply this diff to preserve context:

```diff
 except Exception as ex:
     logger.exception("ReWOO Agent failed with exception: %s", ex)
-    raise RuntimeError
+    raise RuntimeError(f"ReWOO Agent failed: {ex}") from ex
```

src/nat/front_ends/fastapi/fastapi_front_end_plugin_worker.py (1)
696-709: Add missing test for unsupported single-output workflows.

No existing test asserts that `generate_single_response` raises `ValueError("Cannot get a single output value for streaming workflows")` and returns a 500 response for streaming-only workflows. Add an integration/unit test targeting the OpenAI-compatible endpoint (e.g. in `tests/nat/front_ends/fastapi/test_openai_compatibility.py`) that submits a streaming workflow without `stream=true` and verifies the 500 status and error message.
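The `raise RuntimeError(f"...") from ex` pattern recommended for both agent registers above preserves the original exception as `__cause__`; a minimal, self-contained sketch (the function name is illustrative, not the toolkit's API):

```python
def run_agent():
    """Toy stand-in for an agent body that fails and re-raises with context."""
    try:
        raise ValueError("boom")
    except Exception as ex:
        # A bare `raise RuntimeError` would discard this context; chaining
        # with `from ex` keeps the original traceback reachable.
        raise RuntimeError(f"Agent failed: {ex}") from ex

try:
    run_agent()
except RuntimeError as err:
    chained_cause = err.__cause__  # the original ValueError
    message = str(err)
```

Debuggers and log formatters can then walk `__cause__` to show the full chain instead of an opaque `RuntimeError`.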
🧹 Nitpick comments (12)
src/nat/agent/react_agent/register.py (1)
153-160: Token counting method is not OpenAI-compliant.

The word-based token counting (`split()`) differs from OpenAI's subword tokenization (tiktoken). This will produce inaccurate token counts for production usage. Consider using the `tiktoken` library for accurate token counting:

```python
import tiktoken

encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")  # or appropriate model
prompt_tokens = sum(len(encoding.encode(str(msg.content))) for msg in input_message.messages)
completion_tokens = len(encoding.encode(content)) if content else 0
total_tokens = prompt_tokens + completion_tokens
```

src/nat/agent/rewoo_agent/register.py (1)
162-167: Token counting method is not OpenAI-compliant.

The word-based token counting (`split()`) differs from OpenAI's subword tokenization (tiktoken). This will produce inaccurate token counts for production usage. Consider using the `tiktoken` library for accurate token counting to align with OpenAI's specification.

examples/HITL/simple_calculator_hitl/src/nat_simple_calculator_hitl/retry_react_agent.py (2)
165-169: Consider more specific error messages.

The error message "I seem to be having a problem." is generic. Consider providing more context about the specific failure (recursion limit, user declined approval, etc.) to help with debugging and user experience. For example:

```diff
-        error_msg = "I seem to be having a problem."
+        error_msg = "Operation cancelled: recursion limit reached and retry declined by user."
```
210-214: Consider more specific error messages.

The error message "I seem to be having a problem." is generic. Consider indicating that the operation was cancelled due to the user declining the retry.
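As a framework-free illustration of the approximation flagged in the token-counting comments above (the helper name is hypothetical, not part of the toolkit):

```python
def approx_token_count(content) -> int:
    """Rough word-based token count; `content` may be a str or a list of parts.

    Whitespace splitting undercounts subword tokens (tiktoken would break a
    long word into several tokens), so treat this as a test-only approximation.
    """
    if isinstance(content, list):
        content = " ".join(str(part) for part in content)
    return len(content.split()) if content else 0

prompt_tokens = approx_token_count("What is 2 + 2?")
completion_tokens = approx_token_count("2 + 2 equals 4.")
total_tokens = prompt_tokens + completion_tokens
```

Swapping in tiktoken only changes the counting function; the usage arithmetic stays the same.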
packages/nvidia_nat_test/src/nat/test/functions.py (1)
38-46: Handle non-string content and document token counting.

`Message.content` is typed as `str | list[UserContent]`, so it can't be `None`. Drop the `if content` guard for `None`, but either assert it's a `str` or branch on `list` to avoid calling `.split()` on a list. Also note that using `.split()` is only a rough, word-based token count; document this approximation or swap in tiktoken for more realistic tests.

tests/nat/front_ends/fastapi/test_openai_compatibility.py (3)
137-145: Prefer using the enum in tests to avoid implicit coercion.

Use `UserMessageContentRoleType.ASSISTANT` directly to avoid relying on string→enum coercion in Pydantic.

```diff
-        # Test delta with role
-        delta = ChoiceDelta(role="assistant")
+        # Test delta with role
+        delta = ChoiceDelta(role=UserMessageContentRoleType.ASSISTANT)
         assert delta.content is None
-        assert delta.role == "assistant"
+        assert delta.role == UserMessageContentRoleType.ASSISTANT
```
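The string→enum coercion the comment refers to can be seen with a plain `str`-backed enum (a minimal stand-in for `UserMessageContentRoleType`, not the real class):

```python
from enum import Enum

class Role(str, Enum):
    """Minimal stand-in for the role enum discussed above."""
    USER = "user"
    ASSISTANT = "assistant"

# Constructing from the raw string yields the enum member, which is what
# Pydantic does implicitly when a model field typed as the enum receives
# role="assistant".
member = Role("assistant")

# Because the enum subclasses `str`, comparisons against the raw string still
# pass, which is why the old string-based test assertions kept working.
same_as_string = (Role.ASSISTANT == "assistant")
```

Using the enum member directly in tests removes the dependency on this implicit coercion.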
302-375: Non-streaming response shape checks are thorough; consider minimal logprobs/usage asserts.

Validates id/object/created/choices/message and usage presence. Optionally assert `choice.get("delta") is None` to harden against regressions.

Also applies to: 445-519
520-586: Streaming response shape checks: LGTM; optional: assert presence of a usage-only chunk when requested.

You already tolerate empty choices for usage chunks. Consider adding a separate test with `stream_options={"include_usage": True}` to assert a final usage summary chunk. Example new test (sketch):

```python
@pytest.mark.asyncio
async def test_openai_streaming_includes_usage_summary_when_requested():
    fec = FastApiFrontEndConfig()
    fec.workflow.openai_api_v1_path = "/v1/chat/completions"
    cfg = Config(general=GeneralConfig(front_end=fec),
                 workflow=StreamingEchoFunctionConfig(use_openai_api=True))
    async with _build_client(cfg) as client:
        saw_usage_summary = False
        async with aconnect_sse(client,
                                "POST",
                                "/v1/chat/completions",
                                json={
                                    "messages": [{"role": "user", "content": "hi"}],
                                    "stream": True,
                                    "stream_options": {"include_usage": True},
                                }) as es:
            async for sse in es.aiter_sse():
                if sse.data == "[DONE]":
                    break
                chunk = sse.json()
                if not chunk.get("choices") and "usage" in chunk:
                    saw_usage_summary = True
        assert saw_usage_summary
```
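Outside the test harness, detecting a usage-only chunk comes down to scanning SSE `data:` lines for a payload with empty `choices`; a minimal sketch assuming OpenAI-style chunk shapes (the sample stream below is fabricated for illustration):

```python
import json

def find_usage_summary(sse_lines):
    """Scan `data:` SSE lines; return the usage dict from a choices-empty chunk, if any."""
    for line in sse_lines:
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        # Usage summary chunks carry an empty choices list plus a usage object.
        if not chunk.get("choices") and "usage" in chunk:
            return chunk["usage"]
    return None

stream = [
    'data: {"choices": [{"delta": {"content": "hi"}}]}',
    'data: {"choices": [], "usage": {"prompt_tokens": 1, "completion_tokens": 1, "total_tokens": 2}}',
    "data: [DONE]",
]
```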
31-31: Unused import.

`model_serializer` isn't used in this module. Remove to satisfy ruff F401 and keep imports clean.

```diff
-from pydantic import model_serializer
```
40-43: Public enum lacks a docstring.

Add a concise docstring per guidelines to clarify roles and keep the API self-documenting.

```diff
 class UserMessageContentRoleType(str, Enum):
+    """Chat message roles supported by NAT/OpenAI-compatible chat APIs."""
     USER = "user"
     ASSISTANT = "assistant"
```
226-230: Usage model: add invariants and docstring; compute total when missing.

Make the contract explicit and prevent negative/invalid counts.

```diff
-class Usage(BaseModel):
-    prompt_tokens: int | None = None
-    completion_tokens: int | None = None
-    total_tokens: int | None = None
+from pydantic import model_validator
+
+class Usage(BaseModel):
+    """Token accounting for OpenAI-compatible responses."""
+    prompt_tokens: int | None = None
+    completion_tokens: int | None = None
+    total_tokens: int | None = None
+
+    @model_validator(mode="after")
+    def _validate_and_fill_totals(self) -> "Usage":
+        for name in ("prompt_tokens", "completion_tokens", "total_tokens"):
+            v = getattr(self, name)
+            if v is not None and v < 0:
+                raise ValueError(f"{name} must be non-negative")
+        if self.prompt_tokens is not None and self.completion_tokens is not None:
+            expected = self.prompt_tokens + self.completion_tokens
+            if self.total_tokens is None:
+                self.total_tokens = expected
+            elif self.total_tokens != expected:
+                raise ValueError("total_tokens must equal prompt_tokens + completion_tokens")
+        return self
```
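The suggested validator's fill-and-check logic can be exercised without Pydantic; a framework-free sketch (the function name is illustrative):

```python
def fill_usage_totals(prompt_tokens=None, completion_tokens=None, total_tokens=None):
    """Validate token counts and derive total_tokens when both parts are known."""
    for name, value in (("prompt_tokens", prompt_tokens),
                        ("completion_tokens", completion_tokens),
                        ("total_tokens", total_tokens)):
        if value is not None and value < 0:
            raise ValueError(f"{name} must be non-negative")
    if prompt_tokens is not None and completion_tokens is not None:
        expected = prompt_tokens + completion_tokens
        if total_tokens is None:
            # Derive the total rather than forcing every caller to supply it.
            total_tokens = expected
        elif total_tokens != expected:
            raise ValueError("total_tokens must equal prompt_tokens + completion_tokens")
    return total_tokens
```

In the Pydantic version this runs once per model construction, so every `Usage` instance either carries consistent counts or fails fast.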
360-396: Allow `content=None` for usage-only or role-only chunks; the signature and guard already anticipate it.

The body checks `content is not None`, but the signature requires `str`. Make it optional to match the guard and enable empty-delta summary chunks cleanly.

```diff
-    def create_streaming_chunk(content: str,
+    def create_streaming_chunk(content: str | None = None,
                                *,
@@
-        delta = ChoiceDelta(content=content, role=role) if content is not None or role is not None else ChoiceDelta()
+        delta = ChoiceDelta(content=content, role=role) if (content is not None or role is not None) else ChoiceDelta()
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
examples/HITL/simple_calculator_hitl/src/nat_simple_calculator_hitl/retry_react_agent.py (3 hunks)
packages/nvidia_nat_test/src/nat/test/functions.py (2 hunks)
src/nat/agent/react_agent/register.py (2 hunks)
src/nat/agent/rewoo_agent/register.py (2 hunks)
src/nat/data_models/api_server.py (13 hunks)
src/nat/front_ends/fastapi/fastapi_front_end_plugin_worker.py (2 hunks)
tests/nat/front_ends/fastapi/test_openai_compatibility.py (4 hunks)
🧰 Additional context used
📓 Path-based instructions (11)
**/*.{py,yaml,yml}
📄 CodeRabbit inference engine (.cursor/rules/nat-test-llm.mdc)
**/*.{py,yaml,yml}: Configure response_seq as a list of strings; values cycle per call, and [] yields an empty string.
Configure delay_ms to inject per-call artificial latency in milliseconds for nat_test_llm.
Files:
src/nat/agent/rewoo_agent/register.py
tests/nat/front_ends/fastapi/test_openai_compatibility.py
packages/nvidia_nat_test/src/nat/test/functions.py
src/nat/front_ends/fastapi/fastapi_front_end_plugin_worker.py
examples/HITL/simple_calculator_hitl/src/nat_simple_calculator_hitl/retry_react_agent.py
src/nat/agent/react_agent/register.py
src/nat/data_models/api_server.py
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/nat-test-llm.mdc)
**/*.py: Programmatic use: create TestLLMConfig(response_seq=[...], delay_ms=...), add with builder.add_llm("", cfg).
When retrieving the test LLM wrapper, use builder.get_llm(name, wrapper_type=LLMFrameworkEnum.) and call the framework’s method (e.g., ainvoke, achat, call).
**/*.py: In code comments/identifiers use NAT abbreviations as specified: nat for API namespace/CLI, nvidia-nat for package name, NAT for env var prefixes; do not use these abbreviations in documentation
Follow PEP 20 and PEP 8; run yapf with column_limit=120; use 4-space indentation; end files with a single trailing newline
Run ruff check --fix as linter (not formatter) using pyproject.toml config; fix warnings unless explicitly ignored
Respect naming: snake_case for functions/variables, PascalCase for classes, UPPER_CASE for constants
Treat pyright warnings as errors during development
Exception handling: use bare raise to re-raise; log with logger.error() when re-raising to avoid duplicate stack traces; use logger.exception() when catching without re-raising
Provide Google-style docstrings for every public module, class, function, and CLI command; first line concise and ending with a period; surround code entities with backticks
Validate and sanitize all user input, especially in web or CLI interfaces
Prefer httpx with SSL verification enabled by default and follow OWASP Top-10 recommendations
Use async/await for I/O-bound work; profile CPU-heavy paths with cProfile or mprof before optimizing; cache expensive computations with functools.lru_cache or external cache; leverage NumPy vectorized operations when beneficial
Files:
src/nat/agent/rewoo_agent/register.py
tests/nat/front_ends/fastapi/test_openai_compatibility.py
packages/nvidia_nat_test/src/nat/test/functions.py
src/nat/front_ends/fastapi/fastapi_front_end_plugin_worker.py
examples/HITL/simple_calculator_hitl/src/nat_simple_calculator_hitl/retry_react_agent.py
src/nat/agent/react_agent/register.py
src/nat/data_models/api_server.py
src/**/*.py
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
All importable Python code must live under src/ (or packages//src/)
Files:
src/nat/agent/rewoo_agent/register.py
src/nat/front_ends/fastapi/fastapi_front_end_plugin_worker.py
src/nat/agent/react_agent/register.py
src/nat/data_models/api_server.py
src/nat/**/*
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
Changes in src/nat should prioritize backward compatibility
Files:
src/nat/agent/rewoo_agent/register.py
src/nat/front_ends/fastapi/fastapi_front_end_plugin_worker.py
src/nat/agent/react_agent/register.py
src/nat/data_models/api_server.py
⚙️ CodeRabbit configuration file
This directory contains the core functionality of the toolkit. Changes should prioritize backward compatibility.
Files:
src/nat/agent/rewoo_agent/register.py
src/nat/front_ends/fastapi/fastapi_front_end_plugin_worker.py
src/nat/agent/react_agent/register.py
src/nat/data_models/api_server.py
{src/**/*.py,packages/*/src/**/*.py}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
All public APIs must have Python 3.11+ type hints on parameters and return values; prefer typing/collections.abc abstractions; use typing.Annotated when useful
Files:
src/nat/agent/rewoo_agent/register.py
packages/nvidia_nat_test/src/nat/test/functions.py
src/nat/front_ends/fastapi/fastapi_front_end_plugin_worker.py
src/nat/agent/react_agent/register.py
src/nat/data_models/api_server.py
**/*
⚙️ CodeRabbit configuration file
**/*: # Code Review Instructions

- Ensure the code follows best practices and coding standards.
- For Python code, follow PEP 20 and PEP 8 for style guidelines.
- Check for security vulnerabilities and potential issues.
- Python methods should use type hints for all parameters and return values. Example: `def my_function(param1: int, param2: str) -> bool: pass`
- For Python exception handling, ensure proper stack trace preservation:
  - When re-raising exceptions: use bare `raise` statements to maintain the original stack trace, and use `logger.error()` (not `logger.exception()`) to avoid duplicate stack trace output.
  - When catching and logging exceptions without re-raising: always use `logger.exception()` to capture the full stack trace information.

Documentation Review Instructions

- Verify that documentation and comments are clear and comprehensive.
- Verify that the documentation doesn't contain any TODOs, FIXMEs or placeholder text like "lorem ipsum".
- Verify that the documentation doesn't contain any offensive or outdated terms.
- Verify that documentation and comments are free of spelling mistakes; ensure the documentation doesn't contain any words listed in the `ci/vale/styles/config/vocabularies/nat/reject.txt` file; words that might appear to be spelling mistakes but are listed in the `ci/vale/styles/config/vocabularies/nat/accept.txt` file are OK.

Misc.

- All code (except .mdc files that contain Cursor rules) should be licensed under the Apache License 2.0, and should contain an Apache License 2.0 header comment at the top of each file.
- Confirm that copyright years are up-to-date whenever a file is changed.

Files:

src/nat/agent/rewoo_agent/register.py
tests/nat/front_ends/fastapi/test_openai_compatibility.py
packages/nvidia_nat_test/src/nat/test/functions.py
src/nat/front_ends/fastapi/fastapi_front_end_plugin_worker.py
examples/HITL/simple_calculator_hitl/src/nat_simple_calculator_hitl/retry_react_agent.py
src/nat/agent/react_agent/register.py
src/nat/data_models/api_server.py
tests/**/*.py
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
Unit tests reside under tests/ and should use markers defined in pyproject.toml (e.g., integration)
Files:
tests/nat/front_ends/fastapi/test_openai_compatibility.py
⚙️ CodeRabbit configuration file
tests/**/*.py:
- Ensure that tests are comprehensive, cover edge cases, and validate the functionality of the code.
- Test functions should be named using the `test_` prefix, using snake_case.
- Any frequently repeated code should be extracted into pytest fixtures.
- Pytest fixtures should define the name argument when applying the pytest.fixture decorator. The fixture function being decorated should be named using the `fixture_` prefix, using snake_case. Example:
@pytest.fixture(name="my_fixture")
def fixture_my_fixture():
pass
Files:
tests/nat/front_ends/fastapi/test_openai_compatibility.py
{tests/**/*.py,examples/*/tests/**/*.py}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
{tests/**/*.py,examples/*/tests/**/*.py}: Use pytest (with pytest-asyncio for async); name test files test_*.py; test functions start with test_; extract repeated code into fixtures; fixtures must set name in decorator and be named with fixture_ prefix
Mock external services with pytest_httpserver or unittest.mock; do not hit live endpoints
Mark expensive tests with @pytest.mark.slow or @pytest.mark.integration
Files:
tests/nat/front_ends/fastapi/test_openai_compatibility.py
packages/*/src/**/*.py
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
Importable Python code inside packages must live under packages//src/
Files:
packages/nvidia_nat_test/src/nat/test/functions.py
packages/**/*
⚙️ CodeRabbit configuration file
packages/**/*:
- This directory contains optional plugin packages for the toolkit; each should contain a `pyproject.toml` file.
- The `pyproject.toml` file should declare a dependency on `nvidia-nat` or another package with a name starting with `nvidia-nat-`. This dependency should be declared using `~=<version>`, and the version should be a two digit version (ex: `~=1.0`).
- Not all packages contain Python code; if they do, they should also contain their own set of tests, in a `tests/` directory at the same level as the `pyproject.toml` file.
Files:
packages/nvidia_nat_test/src/nat/test/functions.py
examples/**/*
⚙️ CodeRabbit configuration file
examples/**/*:
- This directory contains example code and usage scenarios for the toolkit; at a minimum an example should contain a README.md or a README.ipynb file.
- If an example contains Python code, it should be placed in a subdirectory named `src/` and should contain a `pyproject.toml` file. Optionally, it might also contain scripts in a `scripts/` directory.
- If an example contains YAML files, they should be placed in a subdirectory named `configs/`.
- If an example contains sample data files, they should be placed in a subdirectory named `data/`, and should be checked into git-lfs.
Files:
examples/HITL/simple_calculator_hitl/src/nat_simple_calculator_hitl/retry_react_agent.py
🧬 Code graph analysis (7)

src/nat/agent/rewoo_agent/register.py (1)
  src/nat/data_models/api_server.py (5): Usage (226-229), ChatResponse (255-305), from_string (166-177), from_string (278-305), from_string (332-358)
tests/nat/front_ends/fastapi/test_openai_compatibility.py (2)
  src/nat/data_models/api_server.py (8): Usage (226-229), UserMessageContentRoleType (40-42), ChatResponseChunk (308-396), create_streaming_chunk (361-396), ChatResponse (255-305), from_string (166-177), from_string (278-305), from_string (332-358)
  src/nat/front_ends/fastapi/fastapi_front_end_config.py (1): FastApiFrontEndConfig (136-264)
packages/nvidia_nat_test/src/nat/test/functions.py (1)
  src/nat/data_models/api_server.py (5): Usage (226-229), ChatResponse (255-305), from_string (166-177), from_string (278-305), from_string (332-358)
src/nat/front_ends/fastapi/fastapi_front_end_plugin_worker.py (3)
  src/nat/runtime/session.py (1): session (92-127)
  src/nat/front_ends/fastapi/response_helpers.py (1): generate_single_response (108-117)
  src/nat/data_models/api_server.py (1): ChatResponse (255-305)
examples/HITL/simple_calculator_hitl/src/nat_simple_calculator_hitl/retry_react_agent.py (1)
  src/nat/data_models/api_server.py (5): Usage (226-229), ChatResponse (255-305), from_string (166-177), from_string (278-305), from_string (332-358)
src/nat/agent/react_agent/register.py (1)
  src/nat/data_models/api_server.py (5): Usage (226-229), ChatResponse (255-305), from_string (166-177), from_string (278-305), from_string (332-358)
src/nat/data_models/api_server.py (1)
  packages/nvidia_nat_agno/tests/test_tool_wrapper.py (1): Choice (404-407)
🪛 Ruff (0.13.2)
examples/HITL/simple_calculator_hitl/src/nat_simple_calculator_hitl/retry_react_agent.py
216-216: Do not catch blind exception: Exception
(BLE001)
🔇 Additional comments (13)
packages/nvidia_nat_test/src/nat/test/functions.py (1)
24-24: LGTM: Usage import added. The import aligns with the new requirement for attaching usage metadata to `ChatResponse` objects.
src/nat/agent/react_agent/register.py (1)
28-28: LGTM: Usage import added. Aligns with the new requirement for attaching usage metadata to `ChatResponse` objects.
src/nat/agent/rewoo_agent/register.py (1)
29-29: LGTM: Usage import added. Aligns with the new requirement for attaching usage metadata to `ChatResponse` objects.
src/nat/front_ends/fastapi/fastapi_front_end_plugin_worker.py (1)
692-693: LGTM: Content-Type header set correctly.

Setting `Content-Type: application/json` upfront for the non-streaming path is correct. The streaming path at line 700 appropriately overrides this with `text/event-stream`.

examples/HITL/simple_calculator_hitl/src/nat_simple_calculator_hitl/retry_react_agent.py (2)
27-27: LGTM: Usage import added. Aligns with the requirement for attaching usage metadata to `ChatResponse` objects, including error responses.
216-222: Bare Exception catch is acceptable here.

While static analysis flags the bare `Exception` catch, this is a top-level handler that needs to catch all exceptions to return a user-friendly error response. This is an appropriate use case. However, consider providing more context in the error message or logging the exception type for debugging purposes.
tests/nat/front_ends/fastapi/test_openai_compatibility.py (3)
151-156: Good: streaming chunk uses delta with enum role (spec-compliant). Delta is populated and message omitted in streaming; finish_reason is None on a non-final chunk. Matches the OpenAI spec.
169-175: Non-streaming timestamp + required usage: LGTM. Unix timestamp serialization and mandatory Usage on ChatResponse are correctly exercised.
187-201: No action needed: async tests run under global auto mode.

The root pyproject.toml declares `[tool.pytest.ini_options] asyncio_mode = "auto"`, so pytest-asyncio will execute `async def` tests without individual `@pytest.mark.asyncio` markers.
205-224: Choice models split (message vs delta): spec-aligned and backward-compatible via alias.

Clear separation of non-streaming and streaming shapes; the alias `Choice = ChatResponseChoice` maintains compatibility.
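The backward-compatible alias noted above is plain module-level name binding; a tiny sketch with class bodies elided (names mirror the review, not the real definitions):

```python
class ChatResponseChoice:
    """Non-streaming choice shape: carries a full message."""

class ChatResponseChunkChoice:
    """Streaming choice shape: carries an incremental delta."""

# Backward-compatible alias: existing `from ... import Choice` call sites
# keep resolving to the non-streaming model.
Choice = ChatResponseChoice
```

Because the alias is the same object, `isinstance` checks and imports written against the old name continue to work unchanged.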
261-271: ChatResponse defaults and required usage: LGTM.

The `model="unknown-model"` default and mandatory `usage` align with tests and the OpenAI spec for non-streaming.

Also applies to: 277-306
655-656: Converters: sensible defaults and usage synthesis.

Defaulting the request model to `"unknown-model"` and injecting computed usage keeps converters deterministic.

Also applies to: 671-682, 699-704
348-358: from_string() is only used in the non-streaming path (src/nat/data_models/api_server.py:703); streaming handlers use create_streaming_chunk(), which defaults finish_reason=None.
Signed-off-by: Eric Evans <[email protected]>
Actionable comments posted: 1
🧹 Nitpick comments (2)
src/nat/data_models/api_server.py (1)
31-31: Potentially unused import.

The `model_serializer` import on line 31 does not appear to be used in this file. If it's not required for future changes or other modules, consider removing it to keep the imports clean.

```diff
-from pydantic import model_serializer
```

examples/HITL/simple_calculator_hitl/src/nat_simple_calculator_hitl/retry_react_agent.py (1)
214-219: Bare except is acceptable here, but consider narrowing if possible.

The bare `except Exception:` on line 214 catches all exceptions and returns a generic error message. While Ruff flags this (BLE001), it's acceptable in a top-level handler that ensures the function always returns a `ChatResponse`. However, if you can identify specific exceptions that should be caught (e.g., `LLMError`, `ValidationError`), narrowing the except clause would improve clarity and allow unexpected errors to propagate for debugging.

The construction of `Usage()` and passing it to `ChatResponse.from_string` is correct. If specific exceptions are expected, consider:

```diff
-    except Exception:
+    except (GraphRecursionError, ValueError, RuntimeError):
         # Handle any other unexpected exceptions
         error_msg = "I seem to be having a problem."
         # Create usage statistics for error response
         return ChatResponse.from_string(error_msg, usage=Usage())
```

Otherwise, the current implementation is acceptable for ensuring robustness.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
examples/HITL/simple_calculator_hitl/src/nat_simple_calculator_hitl/retry_react_agent.py (3 hunks)
src/nat/data_models/api_server.py (13 hunks)
🧰 Additional context used
📓 Path-based instructions (7)
**/*.{py,yaml,yml}: 📄 CodeRabbit inference engine (.cursor/rules/nat-test-llm.mdc)
**/*.py: 📄 CodeRabbit inference engine (.cursor/rules/nat-test-llm.mdc)
src/**/*.py: 📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
src/nat/**/*: 📄 CodeRabbit inference engine (.cursor/rules/general.mdc); ⚙️ CodeRabbit configuration file
{src/**/*.py,packages/*/src/**/*.py}: 📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
**/*: ⚙️ CodeRabbit configuration file
examples/**/*: ⚙️ CodeRabbit configuration file

Files:

src/nat/data_models/api_server.py
examples/HITL/simple_calculator_hitl/src/nat_simple_calculator_hitl/retry_react_agent.py
🧬 Code graph analysis (2)
src/nat/data_models/api_server.py (1)
packages/nvidia_nat_agno/tests/test_tool_wrapper.py (1)
Choice (404-407)
examples/HITL/simple_calculator_hitl/src/nat_simple_calculator_hitl/retry_react_agent.py (1)
src/nat/data_models/api_server.py (5)
Usage (226-229), ChatResponse (255-305), from_string (166-177), from_string (278-305), from_string (332-358)
🪛 Ruff (0.13.2)
examples/HITL/simple_calculator_hitl/src/nat_simple_calculator_hitl/retry_react_agent.py
214-214: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: CI Pipeline / Check
🔇 Additional comments (10)
src/nat/data_models/api_server.py (7)
117-117: LGTM: Role fields now use enum for type safety.

The role fields in `Message`, `ChoiceMessage`, and `ChoiceDelta` have been correctly updated to use `UserMessageContentRoleType` (or `UserMessageContentRoleType | None` for optional cases). The factory methods `ChatRequest.from_string` and `ChatRequest.from_content` correctly use `UserMessageContentRoleType.USER`. This change improves type safety and aligns with the PR's goal of enforcing OpenAI API compliance.

Note: As flagged in a previous review comment on lines 115-118, downstream code (tests and plugin packages) still uses raw role strings and must be updated to use the enum. That refactoring is tracked separately.

Also applies to: 173-173, 187-187, 196-196, 202-202
205-223: LGTM: Specialized choice types align with OpenAI spec.

The introduction of `ChoiceBase`, `ChatResponseChoice`, and `ChatResponseChunkChoice` correctly separates streaming (`delta`) from non-streaming (`message`) responses, matching the OpenAI Chat Completions API specification. The backward compatibility alias `Choice = ChatResponseChoice` preserves the existing API surface. Docstrings are concise and appropriate.
318-396: LGTM: ChatResponseChunk correctly implements streaming spec.

The updates to `ChatResponseChunk` correctly implement the OpenAI streaming response format:

- Uses `ChatResponseChunkChoice` with a `delta` field (not `message`).
- The `role` parameter in `create_streaming_chunk` is correctly typed as `UserMessageContentRoleType | None`.
- The optional `usage` parameter in `create_streaming_chunk` allows final chunks to include usage stats per the spec.
- The default model `"unknown-model"` is a reasonable fallback.
654-656: LGTM: Converter updated for new default model.

The converter `_string_to_nat_chat_request` correctly passes `model="unknown-model"` to `ChatRequest.from_string`, consistent with the updated default model handling.
671-681: LGTM: Converter builds usage stats (word-based approximation).

The converter `_string_to_nat_chat_response` correctly constructs a `Usage` object and passes it to `ChatResponse.from_string`. The token count is approximated by word count (`len(data.split())`), which is a reasonable heuristic for a type converter where exact tokenization is unavailable.

Note: This approximation means token counts in converted responses may not match actual model token usage, but this is acceptable for a fallback converter.
284-305: All `ChatResponse.from_string` call sites updated with `usage`. No missing invocations; the breaking change is handled.
226-229: Please verify `ChatResponse` serialization of optional `usage`:

```shell
sed -n '150,260p' src/nat/data_models/api_server.py
sed -n '260,350p' src/nat/data_models/api_server.py
```

This will confirm whether `from_string` omits or includes `None` usage fields when serializing.

examples/HITL/simple_calculator_hitl/src/nat_simple_calculator_hitl/retry_react_agent.py (3)
27-27: LGTM: Usage import added for error responses.

The import of `Usage` on line 27 is correctly added to support the new requirement for `ChatResponse.from_string` to include usage statistics.
165-168: LGTM: Error response correctly includes usage.

The error path in `handle_recursion_error` (lines 165-168) correctly constructs a `Usage()` instance (with all token counts as `None`) and passes it to `ChatResponse.from_string`. This aligns with the updated API signature and is semantically appropriate for an error response where no tokens were consumed.
209-212: LGTM: Error response correctly includes usage.

The error path when the user declines retry (lines 209-212) correctly constructs a `Usage()` instance and passes it to `ChatResponse.from_string`, consistent with the updated signature.
Signed-off-by: Eric Evans <[email protected]>
…to openai-v1-endpoint-response
Actionable comments posted: 1
🧹 Nitpick comments (6)
src/nat/data_models/api_server.py (6)
31-31: Import `model_validator` to support a small `Usage` consistency check (see the `Usage` comment). Add this alongside existing imports.

```diff
-from pydantic import model_serializer
+from pydantic import model_serializer
+from pydantic import model_validator
```
169-175: Add short Google-style docstrings to public constructors. Clarify defaults and intent for API users.

```diff
 @staticmethod
 def from_string(data: str,
                 *,
                 model: str | None = None,
                 temperature: float | None = None,
                 max_tokens: int | None = None,
                 top_p: float | None = None) -> "ChatRequest":
-
-    return ChatRequest(messages=[Message(content=data, role=UserMessageContentRoleType.USER)],
+    """Build a ChatRequest from plain text.
+
+    Args:
+        data: User prompt text.
+        model: Optional model identifier.
+        temperature: Sampling temperature.
+        max_tokens: Max tokens to generate.
+        top_p: Nucleus sampling parameter.
+
+    Returns:
+        ChatRequest with a single user message.
+    """
+    return ChatRequest(messages=[Message(content=data, role=UserMessageContentRoleType.USER)],
                        model=model,
                        temperature=temperature,
                        max_tokens=max_tokens,
                        top_p=top_p)
@@
 @staticmethod
 def from_content(content: list[UserContent],
                  *,
                  model: str | None = None,
                  temperature: float | None = None,
                  max_tokens: int | None = None,
                  top_p: float | None = None) -> "ChatRequest":
-
-    return ChatRequest(messages=[Message(content=content, role=UserMessageContentRoleType.USER)],
+    """Build a ChatRequest from structured content parts.
+
+    Args:
+        content: User content parts.
+        model: Optional model identifier.
+        temperature: Sampling temperature.
+        max_tokens: Max tokens to generate.
+        top_p: Nucleus sampling parameter.
+
+    Returns:
+        ChatRequest with a single user message.
+    """
+    return ChatRequest(messages=[Message(content=content, role=UserMessageContentRoleType.USER)],
                        model=model,
                        temperature=temperature,
                        max_tokens=max_tokens,
                        top_p=top_p)
```

Also applies to: 183-189
198-201: Add a brief docstring to `ChoiceMessage`.

```diff
-class ChoiceMessage(BaseModel):
+class ChoiceMessage(BaseModel):
+    """Message object in non-streaming chat completion choices."""
     content: str | None = None
     role: UserMessageContentRoleType | None = None
```
209-214: Consider adding `logprobs` for closer spec parity. Not required (extra fields are allowed), but an explicit field helps typing and consumers that expect it.

```diff
 class ChoiceBase(BaseModel):
     """Base choice model with common fields for both streaming and non-streaming responses"""
     model_config = ConfigDict(extra="allow")
     finish_reason: typing.Literal['stop', 'length', 'tool_calls', 'content_filter', 'function_call'] | None = None
     index: int
+    logprobs: dict[str, typing.Any] | None = None
```
231-233: Auto-compute `total_tokens` when not provided. Prevents inconsistent `Usage` payloads until #891 lands.

```diff
 class Usage(BaseModel):
     prompt_tokens: int | None = None
     completion_tokens: int | None = None
     total_tokens: int | None = None
+
+    @model_validator(mode="after")
+    def _ensure_total(self) -> "Usage":
+        # If either component is provided, backfill total_tokens.
+        if self.total_tokens is None and (self.prompt_tokens is not None or self.completion_tokens is not None):
+            self.total_tokens = (self.prompt_tokens or 0) + (self.completion_tokens or 0)
+        return self
```
370-373: Treat empty-string content as "no delta" to better match the final-chunk shape. Prevents emitting `{"content": ""}` when callers pass `""` on terminal chunks.

```diff
-        delta = ChoiceDelta(content=content, role=role) if content is not None or role is not None else ChoiceDelta()
+        include_content = (content is not None and content != "")
+        delta = ChoiceDelta(content=content, role=role) if include_content or role is not None else ChoiceDelta()
```

Also applies to: 380-383
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/nat/data_models/api_server.py (13 hunks)
🧰 Additional context used
📓 Path-based instructions (6)
**/*.{py,yaml,yml}
📄 CodeRabbit inference engine (.cursor/rules/nat-test-llm.mdc)
**/*.{py,yaml,yml}: Configure response_seq as a list of strings; values cycle per call, and [] yields an empty string.
Configure delay_ms to inject per-call artificial latency in milliseconds for nat_test_llm.
Files:
src/nat/data_models/api_server.py
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/nat-test-llm.mdc)
**/*.py: Programmatic use: create TestLLMConfig(response_seq=[...], delay_ms=...), add with builder.add_llm("", cfg).
When retrieving the test LLM wrapper, use builder.get_llm(name, wrapper_type=LLMFrameworkEnum.) and call the framework’s method (e.g., ainvoke, achat, call).
**/*.py: In code comments/identifiers use NAT abbreviations as specified: nat for API namespace/CLI, nvidia-nat for package name, NAT for env var prefixes; do not use these abbreviations in documentation
Follow PEP 20 and PEP 8; run yapf with column_limit=120; use 4-space indentation; end files with a single trailing newline
Run ruff check --fix as linter (not formatter) using pyproject.toml config; fix warnings unless explicitly ignored
Respect naming: snake_case for functions/variables, PascalCase for classes, UPPER_CASE for constants
Treat pyright warnings as errors during development
Exception handling: use bare raise to re-raise; log with logger.error() when re-raising to avoid duplicate stack traces; use logger.exception() when catching without re-raising
Provide Google-style docstrings for every public module, class, function, and CLI command; first line concise and ending with a period; surround code entities with backticks
Validate and sanitize all user input, especially in web or CLI interfaces
Prefer httpx with SSL verification enabled by default and follow OWASP Top-10 recommendations
Use async/await for I/O-bound work; profile CPU-heavy paths with cProfile or mprof before optimizing; cache expensive computations with functools.lru_cache or external cache; leverage NumPy vectorized operations when beneficial
Files:
src/nat/data_models/api_server.py
src/**/*.py
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
All importable Python code must live under src/ (or packages//src/)
Files:
src/nat/data_models/api_server.py
src/nat/**/*
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
Changes in src/nat should prioritize backward compatibility
Files:
src/nat/data_models/api_server.py
⚙️ CodeRabbit configuration file
This directory contains the core functionality of the toolkit. Changes should prioritize backward compatibility.
Files:
src/nat/data_models/api_server.py
{src/**/*.py,packages/*/src/**/*.py}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
All public APIs must have Python 3.11+ type hints on parameters and return values; prefer typing/collections.abc abstractions; use typing.Annotated when useful
Files:
src/nat/data_models/api_server.py
**/*
⚙️ CodeRabbit configuration file
Files:
src/nat/data_models/api_server.py
🧬 Code graph analysis (1)
src/nat/data_models/api_server.py (1)
packages/nvidia_nat_agno/tests/test_tool_wrapper.py (1)
Choice (404-407)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: CI Pipeline / Check
🔇 Additional comments (7)
src/nat/data_models/api_server.py (7)
40-46: Roles enum with SYSTEM + docstring looks good.
119-122: `Message.role` migrated to enum; good interoperability. Pydantic will serialize to the string values; the schema stays OpenAI-friendly.

203-207: Optional `Delta.role` aligns with OpenAI streaming. LGTM.

216-224: Specialized choice types and alias keep backward compatibility while improving clarity.

388-395: Choice construction and `finish_reason` casting are sound.

659-660: Defaulting to `"unknown-model"` in converters keeps responses consistent.

269-273: All `ChatResponse.from_string` call sites include the required `usage` parameter. Verified in API server, agents, tests, and examples.
…to openai-v1-endpoint-response
…to openai-v1-endpoint-response
/ok-to-test
Signed-off-by: Eric Evans <[email protected]>
Signed-off-by: Eric Evans <[email protected]>
Actionable comments posted: 2
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
src/nat/front_ends/fastapi/message_validator.py (1 hunks)
tests/nat/front_ends/fastapi/test_fastapi_front_end_plugin.py (2 hunks)
tests/nat/server/test_unified_api_server.py (3 hunks)
🧰 Additional context used
📓 Path-based instructions (8)
**/*.{py,yaml,yml}
📄 CodeRabbit inference engine (.cursor/rules/nat-test-llm.mdc)
Files:
src/nat/front_ends/fastapi/message_validator.py
tests/nat/server/test_unified_api_server.py
tests/nat/front_ends/fastapi/test_fastapi_front_end_plugin.py
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/nat-test-llm.mdc)
Files:
src/nat/front_ends/fastapi/message_validator.py
tests/nat/server/test_unified_api_server.py
tests/nat/front_ends/fastapi/test_fastapi_front_end_plugin.py
src/**/*.py
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
All importable Python code must live under src/ (or packages//src/)
Files:
src/nat/front_ends/fastapi/message_validator.py
src/nat/**/*
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
Changes in src/nat should prioritize backward compatibility
Files:
src/nat/front_ends/fastapi/message_validator.py
⚙️ CodeRabbit configuration file
This directory contains the core functionality of the toolkit. Changes should prioritize backward compatibility.
Files:
src/nat/front_ends/fastapi/message_validator.py
{src/**/*.py,packages/*/src/**/*.py}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
All public APIs must have Python 3.11+ type hints on parameters and return values; prefer typing/collections.abc abstractions; use typing.Annotated when useful
Files:
src/nat/front_ends/fastapi/message_validator.py
**/*
⚙️ CodeRabbit configuration file
Files:
src/nat/front_ends/fastapi/message_validator.py
tests/nat/server/test_unified_api_server.py
tests/nat/front_ends/fastapi/test_fastapi_front_end_plugin.py
tests/**/*.py
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
Unit tests reside under tests/ and should use markers defined in pyproject.toml (e.g., integration)
Files:
tests/nat/server/test_unified_api_server.pytests/nat/front_ends/fastapi/test_fastapi_front_end_plugin.py
⚙️ CodeRabbit configuration file
tests/**/*.py: - Ensure that tests are comprehensive, cover edge cases, and validate the functionality of the code. - Test functions should be named using the `test_` prefix, using snake_case. - Any frequently repeated code should be extracted into pytest fixtures. - Pytest fixtures should define the name argument when applying the pytest.fixture decorator. The fixture function being decorated should be named using the `fixture_` prefix, using snake_case. Example:
@pytest.fixture(name="my_fixture")
def fixture_my_fixture():
pass
Files:
tests/nat/server/test_unified_api_server.py
tests/nat/front_ends/fastapi/test_fastapi_front_end_plugin.py
{tests/**/*.py,examples/*/tests/**/*.py}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
{tests/**/*.py,examples/*/tests/**/*.py}: Use pytest (with pytest-asyncio for async); name test files test_*.py; test functions start with test_; extract repeated code into fixtures; fixtures must set name in decorator and be named with fixture_ prefix
Mock external services with pytest_httpserver or unittest.mock; do not hit live endpoints
Mark expensive tests with @pytest.mark.slow or @pytest.mark.integration
Files:
tests/nat/server/test_unified_api_server.py
tests/nat/front_ends/fastapi/test_fastapi_front_end_plugin.py
🧬 Code graph analysis (3)
src/nat/front_ends/fastapi/message_validator.py (1)
src/nat/data_models/api_server.py (3)
ChatResponse (258-308), SystemResponseContent (566-569), ChatResponseChunk (311-399)
tests/nat/server/test_unified_api_server.py (1)
src/nat/data_models/api_server.py (4)
ChatResponseChunkChoice (220-222), ChoiceDelta (202-205), Usage (229-232), ChatResponseChunk (311-399)
tests/nat/front_ends/fastapi/test_fastapi_front_end_plugin.py (1)
src/nat/data_models/api_server.py (1)
ChatResponseChunk (311-399)
🔇 Additional comments (6)
tests/nat/front_ends/fastapi/test_fastapi_front_end_plugin.py (2)
140-140: LGTM: Correct extraction from streaming delta.

The change correctly extracts content from `delta.content` instead of `message.content`, aligning with the OpenAI Chat Completions API specification for streaming chunks. The `or ""` fallback appropriately handles `None` values.
162-162: LGTM: Consistent streaming content extraction.

This change mirrors the correction at line 140, ensuring consistent extraction of streaming content from `delta.content` across both test paths. The implementation correctly validates OpenAI-compatible streaming behavior.

src/nat/front_ends/fastapi/message_validator.py (1)
142-145: Correctly aligned with updated data models.

The content extraction paths now correctly use `message.content` for `ChatResponse` and `delta.content` for `ChatResponseChunk`, matching the new `ChatResponseChoice` and `ChatResponseChunkChoice` structures introduced in the PR.

tests/nat/server/test_unified_api_server.py (3)
36-36: LGTM: Correct imports for updated data models.

The new imports for `ChatResponseChunkChoice`, `ChoiceDelta`, and `Usage` align with the updated API surface introduced in this PR.

Also applies to: 38-38, 47-47
469-469: LGTM: Correct addition of required `usage` field.

The `usage` field is now required in `ChatResponse`. Using zero values for token counts is appropriate for this test fixture.
471-471: LGTM: Correct usage of `ChatResponseChunkChoice` with `delta`.

The change correctly uses `ChatResponseChunkChoice` with a `delta` field, matching the updated `ChatResponseChunk` model structure for streaming responses.
Signed-off-by: Eric Evans <[email protected]>
/merge
This PR significantly improves OpenAI Chat Completions API compatibility by fixing response format compliance and removing unused code. The changes ensure that NAT's OpenAI-compatible endpoints fully adhere to the OpenAI specification for both streaming and non-streaming responses.

Closes: NVIDIA#818

A follow-up issue has been created to address accurate calculation and passing of usage statistics from workflows to ChatResponse objects in OpenAI-compatible endpoints: [Issue: 891](NVIDIA#891).

## By Submitting this PR I confirm:

- I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/NeMo-Agent-Toolkit/blob/develop/docs/source/resources/contributing.md).
- We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
- Any contribution which contains commits that are not Signed-Off will not be accepted.
- When the PR is ready for review, new or existing tests cover these changes.
- When the PR is ready for review, the documentation is up to date with these changes.

## Summary by CodeRabbit

- New Features
  - All responses, including error replies, now include usage statistics (prompt, completion, total tokens).
- Refactor
  - OpenAI-compatible non-streaming path simplified to return a single JSON response; Content-Type set explicitly for JSON and streaming.
  - Default model identifier standardized to "unknown-model" in responses.
- Compatibility
  - Streaming chunk roles standardized to an enum-style role; response payloads and tests now include and expect usage metadata.

Authors:
- Eric Evans II (https://github.com/ericevans-nv)

Approvers:
- Will Killian (https://github.com/willkill07)

URL: NVIDIA#889
Signed-off-by: Yuchen Zhang <[email protected]>
Description
This PR significantly improves OpenAI Chat Completions API compatibility by fixing response format compliance and removing unused code. The changes ensure that NAT's OpenAI-compatible endpoints fully adhere to the OpenAI specification for both streaming and non-streaming responses.
Closes: #818
A follow-up issue has been created to address accurate calculation and passing of usage statistics from workflows to ChatResponse objects in OpenAI-compatible endpoints: Issue: 891.
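The compliance being claimed can be checked with a small shape test on a non-streaming response body: it should carry `message` (not `delta`) plus a `usage` object. The payload below is hand-written for illustration, not actual server output:

```python
response = {
    "object": "chat.completion",
    "model": "unknown-model",
    "choices": [{
        "index": 0,
        "message": {"role": "assistant", "content": "4"},
        "finish_reason": "stop",
    }],
    "usage": {"prompt_tokens": 5, "completion_tokens": 1, "total_tokens": 6},
}

choice = response["choices"][0]
assert "message" in choice and "delta" not in choice  # non-streaming uses message
assert choice["message"]["role"] == "assistant"
assert set(response["usage"]) == {"prompt_tokens", "completion_tokens", "total_tokens"}
```

A streaming body inverts the first assertion: each chunk's choice carries `delta` and omits `message`.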
By Submitting this PR I confirm:
Summary by CodeRabbit
New Features
Refactor
Compatibility