@asimurka
@asimurka asimurka commented Dec 3, 2025

Description

Bumps Llama Stack to version 0.3.0. Adds query and conversation endpoints, built on the Responses API, that are compatible with the OLS endpoints.

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

  • Assisted-by: Cursor

Related Tickets & Documents

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please provide detailed steps to perform tests related to this code change.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

  • New Features

    • Added Conversations API v3 (list/get/update/delete) and conversation ID normalization
    • Enhanced query responses with tool call/result summaries and expanded streaming end metadata (tokens, quotas, referenced documents)
  • Bug Fixes

    • Conversation deletion now returns a successful JSON response instead of 404 for non-existent conversations
  • Documentation

    • Added comprehensive Conversations API integration guide
  • Chores

    • Upgraded Llama Stack to v0.3.0 and migrated configs from Ollama-style to Starter-style

@coderabbitai
coderabbitai bot commented Dec 3, 2025

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

Adds Conversations v3 endpoints, integrates conversation lifecycle with Responses API for query and streaming flows, replaces ToolCall with ToolCallSummary and adds ToolResultSummary, enhances SSE streaming/error payloads, bumps Llama Stack to 0.3.0, and migrates config/docs for Starter distribution.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Documentation & OpenAPI: docs/conversations_api.md, docs/openapi.json | New conversations guide; OpenAPI updated with versioned tags, streaming schema changes, new ToolCallSummary/ToolResultSummary schemas, and expanded error examples. |
| Dependency & Build: pyproject.toml, src/constants.py, test.containerfile | Bumped llama-stack to 0.3.0, updated MAXIMAL_SUPPORTED_LLAMA_STACK_VERSION, added pyright excludes, and base image tag moved to rhoai-v3.0-latest. |
| Runtime Configuration: run.yaml, tests/e2e/configs/run-ci.yaml, tests/e2e/configs/run-azure.yaml, tests/configuration/minimal-stack.yaml | Migrated from Ollama-style config to Starter-oriented structure; added conversations_store, unified providers, and moved persistence paths to /opt/app-root/... starter distribution. |
| New Conversations API: src/app/endpoints/conversations_v3.py | New router implementing GET list, GET detail, DELETE, and PUT update handlers with Llama Stack integration, ID validation, access checks, and DB sync. |
| Conversations v2 adjustments: src/app/endpoints/conversations_v2.py, tests/unit/app/endpoints/test_conversations_v2.py, tests/e2e/features/conversations.feature | Removed 404 pre-check from GET and DELETE flows; delete behavior changed to return success-like response for missing conversations; tests updated accordingly. |
| Query & Streaming (Responses API integration): src/app/endpoints/query.py, src/app/endpoints/query_v2.py, src/app/endpoints/streaming_query.py, src/app/endpoints/streaming_query_v2.py | Conversation lifecycle added (create/normalize/convert IDs), conversation passed to Responses API, content_to_str adoption, tool_results tracking added, referenced_documents extraction, SSE error streaming, enriched end-event metadata (token usage, available_quotas, referenced_documents). |
| Router wiring: src/app/routers.py, tests/unit/app/test_routers.py | V1 routes now wired to v2 implementations (query_v2, streaming_query_v2) and conversations_v3; router counts and tests adjusted. |
| Models & Types: src/models/responses.py, src/utils/types.py, tests/unit/models/responses/*, tests/unit/models/responses/test_query_response.py | Introduced ToolCallSummary and ToolResultSummary; QueryResponse schema updated to include tool_calls/tool_results (removed rag_chunks public surface); StreamingQueryResponse and PromptTooLongResponse (413) added; RAGChunk moved to utils.types. |
| Utilities & Helpers: src/utils/endpoints.py, src/utils/suid.py, src/utils/llama_stack_version.py, src/utils/transcripts.py, src/utils/token_counter.py, src/metrics/utils.py | delete_conversation now returns bool; added normalize_conversation_id and to_llama_stack_conversation_id; llama-stack version parsing hardened; transcripts now include tool_results; several import paths moved to alpha submodules. |
| Tests - Integration & Unit updates: tests/integration/*, tests/unit/*, tests/e2e/* | Many tests updated to reflect conv_id normalization, new tool_call/tool_result shapes, SSE/error stream changes, removal of 404 from DELETE expectations, and config changes; new mocks for conversation creation added; many legacy tests marked skipped where deprecated. |
| Misc (CI/test infra): test.containerfile, tests/e2e/configs/*, tests/configuration/minimal-stack.yaml | CI/e2e configs and minimal stack updated to Starter layout and sqlite-backed stores for persistence. |

Sequence Diagram(s)

mermaid
sequenceDiagram
autonumber
participant Client
participant App as Lightspeed App
participant Llama as Llama Stack (Responses API)
participant DB as Local DB / Conversations Store
Note over Client,App: Query or Streaming request with optional conversation_id
Client->>App: POST /v1/query or /v1/streaming_query (conversation_id?)
alt no conversation_id
App->>Llama: POST /conversations.create
Llama-->>App: { id: "conv_" }
App->>DB: store conversation mapping (normalized id)
DB-->>App: stored
else conversation_id provided
App->>App: normalize / to_llama_stack_conversation_id(conversation_id)
end
App->>Llama: Call Responses API (include "conversation": llama_conv_id)
alt streaming
Llama-->>App: stream chunks (tokens, tool_call events, end)
App->>Client: SSE chunks (START with normalized id on first chunk, token/tool events...)
else non-streaming
Llama-->>App: full response
App->>Client: JSON response (includes tool_calls, tool_results, referenced_documents, token usage)
end
Note over App,DB: On end-event, App records transcript (including tool_results) and returns available_quotas & referenced_documents in final payload

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45–75 minutes

  • Areas needing extra attention:
    • conversation ID normalization and conversion utilities in src/utils/suid.py and their use across query_v2/streaming_query_v2
    • ToolCallSummary/ToolResultSummary model migrations and QueryResponse/StreamingQueryResponse compatibility (src/models/responses.py, src/utils/types.py)
    • SSE error streaming and end-event schema changes in streaming_query.py / streaming_query_v2.py
    • delete_conversation behavior change and implications in utils/endpoints.py and conversations_v2/v3
    • Import path moves to alpha submodules (ensure runtime/package compatibility)

Possibly related PRs

Suggested reviewers

  • eranco74
  • tisnik
  • manstis

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title directly relates to the main objective of bumping llama stack to 0.3.0, which is reflected in the primary code change (dependency updates, version constants, and configuration adjustments throughout the changeset). |
| Docstring Coverage | ✅ Passed | Docstring coverage is 90.23% which is sufficient. The required threshold is 80.00%. |

@asimurka asimurka force-pushed the bump_llama-stack_to_0.3.0 branch from 5099065 to 3d4e2c5 Compare December 3, 2025 13:52
@asimurka asimurka changed the title from "Bump llama stack to 0.3.0" to "LCORE-956: Bump llama stack to 0.3.0" Dec 3, 2025
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 14

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/models/responses.py (1)

417-451: Update model_config example to match the new ToolCallSummary and ToolResultSummary structure.

The example includes rag_chunks (which is no longer a field) and uses the old tool_calls format (tool_name, arguments, result) instead of the new ToolCallSummary structure (id, name, args, type). The example should also include tool_results.

 model_config = {
     "json_schema_extra": {
         "examples": [
             {
                 "conversation_id": "123e4567-e89b-12d3-a456-426614174000",
                 "response": "Operator Lifecycle Manager (OLM) helps users install...",
-                "rag_chunks": [
-                    {
-                        "content": "OLM is a component of the Operator Framework toolkit...",
-                        "source": "kubernetes-docs/operators.md",
-                        "score": 0.95,
-                    }
-                ],
                 "tool_calls": [
                     {
-                        "tool_name": "knowledge_search",
-                        "arguments": {"query": "operator lifecycle manager"},
-                        "result": {"chunks_found": 5},
+                        "id": "call_123",
+                        "name": "knowledge_search",
+                        "args": {"query": "operator lifecycle manager"},
+                        "type": "tool_call",
+                    }
+                ],
+                "tool_results": [
+                    {
+                        "id": "call_123",
+                        "status": "success",
+                        "content": {"chunks_found": 5},
+                        "type": "tool_result",
+                        "round": 1,
                     }
                 ],
                 "referenced_documents": [
src/utils/suid.py (1)

19-44: Align check_suid type handling with its docstring (bytes vs str)

The extended check_suid logic for conv_... and plain hex conversation IDs looks good, and the new normalize_conversation_id / to_llama_stack_conversation_id helpers are straightforward and correct for the conv_<hex> vs normalized formats.

However, the docstring now advertises:

  • suid (str | bytes) and
  • acceptance of a “byte representation”,

while the function is annotated as check_suid(suid: str) -> bool and, in practice, treats non‑str inputs as invalid (a bytes value falls through to uuid.UUID(suid) and is rejected with TypeError, returning False).

To avoid confusion for callers, consider either:

  • Updating the signature and implementation to genuinely support bytes, e.g.:
-def check_suid(suid: str) -> bool:
+def check_suid(suid: str | bytes) -> bool:
@@
-    try:
+    try:
+        if isinstance(suid, bytes):
+            try:
+                suid = suid.decode("ascii")
+            except UnicodeDecodeError:
+                return False
+
         # Accept llama-stack conversation IDs (conv_<hex> format)
-        if isinstance(suid, str) and suid.startswith("conv_"):
+        if suid.startswith("conv_"):
@@
-        if isinstance(suid, str) and len(suid) >= 32:
+        if len(suid) >= 32:
@@
-        uuid.UUID(suid)
+        uuid.UUID(str(suid))
  • Or, if bytes are not needed, simplifying the docstring to mention only str inputs.

Either way, keeping type hints and documentation consistent will make this utility safer to reuse.

Also applies to: 45-98, 101-145
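
For orientation, the conv_<hex> vs normalized conversion mentioned above might look roughly like the following. This is a minimal sketch, not the code in src/utils/suid.py: the two function names match the helpers named in this review, but the bodies and the LLAMA_STACK_PREFIX constant are assumptions made for illustration.

```python
# Hypothetical sketch; the real helpers live in src/utils/suid.py and may differ.
LLAMA_STACK_PREFIX = "conv_"  # assumed constant name, not taken from the PR


def normalize_conversation_id(conversation_id: str) -> str:
    """Return the ID without the llama-stack 'conv_' prefix, if present."""
    if conversation_id.startswith(LLAMA_STACK_PREFIX):
        return conversation_id[len(LLAMA_STACK_PREFIX):]
    return conversation_id


def to_llama_stack_conversation_id(conversation_id: str) -> str:
    """Return the ID in llama-stack form, adding 'conv_' only when missing."""
    if conversation_id.startswith(LLAMA_STACK_PREFIX):
        return conversation_id
    return LLAMA_STACK_PREFIX + conversation_id
```

Written this way, both helpers are idempotent, so callers would not need to know which format they currently hold before converting.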

🧹 Nitpick comments (29)
tests/e2e/features/info.feature (1)

19-19: Exact llama-stack version assertion may be too brittle

The step now hard-codes 0.3.0rc3+rhai0. If the underlying stack moves to another rc/build of 0.3.0, this e2e will fail even though the service is healthy and still on the intended major/minor version.

Consider relaxing this to:

  • Match just the semantic version prefix (e.g., 0.3.0), or
  • Delegate the exact expected value to a shared constant/env so it’s updated in one place when bumping the stack.

Also please confirm that the /info endpoint actually returns 0.3.0rc3+rhai0 in your target deployment before merging.
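
If the prefix-matching option is preferred, a step implementation could compare only the leading MAJOR.MINOR.PATCH part. The helper below is a hypothetical sketch using only the standard library; version_prefix is not an existing function in this repository, and how it would be wired into the behave steps is left open.

```python
import re
from typing import Optional

# Matches the leading MAJOR.MINOR.PATCH of a string such as "0.3.0rc3+rhai0".
_VERSION_PREFIX = re.compile(r"(\d+)\.(\d+)\.(\d+)")


def version_prefix(version: str) -> Optional[str]:
    """Return the 'MAJOR.MINOR.PATCH' prefix, or None if the string has none."""
    match = _VERSION_PREFIX.match(version)
    return match.group(0) if match else None
```

An e2e assertion could then check version_prefix(reported) == "0.3.0" and keep passing across rc/build suffix changes.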

pyproject.toml (1)

64-70: Track technical debt for type checking exclusions.

Adding 4 more files to pyright exclusions increases the untyped surface area. The comment indicates these are deprecated v1 endpoints - consider adding a TODO with a tracking issue to either fix the type errors or remove these endpoints when v1 is fully deprecated.

src/utils/llama_stack_version.py (1)

43-60: Update docstring to reflect new behavior.

The docstring states version_info "must be parseable by semver.Version.parse", but the implementation now silently skips the check on parse failure rather than raising an exception. Update the docstring to document this behavior accurately.

src/utils/types.py (2)

113-124: Consider using Literal types for status field.

The status field accepts any string, but based on usage it appears to only be "success" or "failure". Consider using Literal["success", "failure"] for better type safety.

+from typing import Literal
+
 class ToolResultSummary(BaseModel):
     """Model representing a result from a tool call (for tool_results list)."""

     id: str = Field(
         description="ID of the tool call/result, matches the corresponding tool call 'id'"
     )
-    status: str = Field(
+    status: Literal["success", "failure"] = Field(
         ..., description="Status of the tool execution (e.g., 'success')"
     )

163-171: Address the TODO comment about the round attribute.

The comment # clarify meaning of this attribute on line 169 indicates uncertainty about the round field's semantics. Either document its intended meaning or track this as a follow-up issue.

Do you want me to open an issue to track clarifying the round attribute's meaning?

src/models/responses.py (1)

28-41: Remove commented-out code instead of leaving it in the codebase.

Dead code should be removed rather than commented out. Version control preserves history if these definitions are needed later.

-# class ToolCall(BaseModel):
-#     """Model representing a tool call made during response generation."""
-
-#     tool_name: str = Field(description="Name of the tool called")
-#     arguments: dict[str, Any] = Field(description="Arguments passed to the tool")
-#     result: dict[str, Any] | None = Field(None, description="Result from the tool")
-
-
-# class ToolResult(BaseModel):
-#     """Model representing a tool result."""
-
-#     tool_name: str = Field(description="Name of the tool")
-#     result: dict[str, Any] = Field(description="Result from the tool")
-
src/app/endpoints/streaming_query_v2.py (3)

76-76: Enable or remove the commented-out PromptTooLongResponse entry.

The 413 response is commented out but PromptTooLongResponse was added in this PR. If prompt-too-long errors can occur in this endpoint, enable the response; otherwise, remove the comment.

-    # 413: PromptTooLongResponse.openapi_response(),
+    413: PromptTooLongResponse.openapi_response(),

If enabling, also add the import:

from models.responses import (
    ...
    PromptTooLongResponse,
    ...
)

296-296: Fix typo: double closing parenthesis in comment.

-        # Perform cleanup tasks (database and cache operations))
+        # Perform cleanup tasks (database and cache operations)

457-461: Remove commented-out debug code.

Debug logging statements should be removed rather than left commented in production code.

     response_stream = cast(AsyncIterator[OpenAIResponseObjectStream], response)
-    # async for chunk in response_stream:
-    #     logger.error("Chunk: %s", chunk.model_dump_json())
-    # Return the normalized conversation_id (already normalized above)
-    # The response_generator will emit it in the start event
+    # Return the normalized conversation_id (already normalized above).
+    # The response_generator will emit it in the start event.
     return response_stream, conversation_id
src/utils/token_counter.py (1)

72-83: Input-token comment vs. implementation mismatch (system vs. user role)

The comment says this block uses the same logic as metrics.utils.update_llm_token_count_from_turn, but here the system_prompt is wrapped as RawMessage(role="system", ...) while the metrics helper uses role="user". This may slightly change tokenization and contradicts the “same logic” claim. Consider either aligning the role for true parity or updating the comment to reflect the intentional difference.

tests/unit/utils/test_endpoints.py (1)

261-803: Deprecated agent/temp-agent tests are now skipped

Marking these legacy get_agent / get_temp_agent async tests with @pytest.mark.skip(reason="Deprecated API test") is a pragmatic way to avoid failures against removed/changed APIs while preserving the scenarios for reference. You may eventually want to replace these with coverage for the new v2/v3 flows or group them under a custom marker (e.g. @pytest.mark.deprecated_api) to make later cleanup easier, but the current change is fine.

tests/e2e/features/conversations.feature (1)

173-181: Non-existent DELETE now returns 200 with explanatory payload

The DELETE /conversations/{conversation_id} scenario for a non-existent ID now expects a 200 with {"success": true, "response": "Conversation cannot be deleted"} (ignoring conversation_id). This matches the new idempotent delete behavior but is a semantic shift from the earlier 404. Please ensure client logic and API docs are updated to rely on the response body/message for “not found” vs. “deleted” distinctions rather than the status code alone.
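
As an illustration of what that client-side change could look like, the sketch below branches on the payload instead of the status code. It assumes the response body shown in this scenario; classify_delete_result and its return labels are hypothetical, and matching on the message text is only illustrative because the exact strings may change.

```python
from typing import Any

# Message text taken from the e2e scenario above; treat it as an assumption
# about the current API rather than a stable contract.
NOT_DELETED_MESSAGE = "Conversation cannot be deleted"


def classify_delete_result(status_code: int, body: dict[str, Any]) -> str:
    """Map a DELETE /conversations/{conversation_id} response to a coarse label."""
    if status_code != 200:
        return "error"
    if body.get("response") == NOT_DELETED_MESSAGE:
        # 200 with this message replaces the earlier 404 for unknown IDs.
        return "not_deleted"
    return "deleted"
```

A dedicated boolean or enum field in the response would be more robust than string matching, if the API can expose one.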

docs/conversations_api.md (1)

14-31: Fix markdownlint issues: list indentation and fenced code languages

The doc reads well, but the markdownlint findings are valid and may break CI:

  • MD007: nested list items in the TOC (Lines ~16–27) are indented with three spaces instead of two.
  • MD040: several fenced code blocks lack an explicit language (e.g., the conv_... example, SSE snippets, error snippets).

You can address both with a small style-only patch, for example:

-   * [Llama Stack Format](#llama-stack-format)
-   * [Normalized Format](#normalized-format)
-   * [ID Conversion Utilities](#id-conversion-utilities)
+  * [Llama Stack Format](#llama-stack-format)
+  * [Normalized Format](#normalized-format)
+  * [ID Conversion Utilities](#id-conversion-utilities)
@@
-```
+```text
 conv_<48-character-hex-string>

@@
-```
+```text
conv_0d21ba731f21f798dc9680125d5d6f493e4a7ab79f25670e

@@
-```
+```text
data: {"event": "start", "data": {"conversation_id": "0d21ba7..."}}
@@
-```
+```text
Error: Conversation not found (HTTP 404)

Apply the same text (or another appropriate language like bash, json, or sql) to the remaining unlabeled fenced blocks.

Also applies to: 58-80, 234-245, 369-379, 469-471

tests/unit/models/responses/test_successful_responses.py (1)

41-45: Strengthening QueryResponse and StreamingQueryResponse tests

The switch to ToolCallSummary / ToolResultSummary and the new TestStreamingQueryResponse look consistent with the updated response models:

  • test_constructor_minimal correctly expects tool_calls and tool_results to default to None.
  • TestStreamingQueryResponse validates the SSE text/event-stream OpenAPI shape and example presence exactly as implemented.

One small improvement: in test_constructor_full, you already construct tool_results and pass them into QueryResponse, but never assert they are preserved. Adding a single check would tighten coverage on the new field:

@@
-        response = QueryResponse(  # type: ignore[call-arg]
+        response = QueryResponse(  # type: ignore[call-arg]
@@
-        assert response.conversation_id == "conv-123"
-        assert response.tool_calls == tool_calls
+        assert response.conversation_id == "conv-123"
+        assert response.tool_calls == tool_calls
+        assert response.tool_results == tool_results
         assert response.referenced_documents == referenced_docs

Also applies to: 262-317, 968-996

tests/integration/endpoints/test_query_v2_integration.py (1)

84-90: Integration tests correctly exercise ID normalization, tool calls, and transcript handling

  • Mocking conversations.create to return conv_ + 48 hex chars and asserting response.conversation_id == "a" * 48 gives good end‑to‑end coverage of the new normalization flow.
  • The updated tool‑call tests now operate on structured results and assert tool_calls[0].name / aggregated tool_names, matching the new ToolCallSummary shape.
  • In the transcript behavior test, introducing mocker and patching store_transcript avoids real file I/O while still verifying that conversations are persisted regardless of transcript configuration.

You might optionally also assert len(response.conversation_id) == 48 in the normalization checks to make the invariant explicit, but the current tests are already solid.

Also applies to: 166-170, 346-355, 375-378, 508-510, 1198-1227
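
To make that invariant explicit in one place, the integration tests could share small format checks such as the ones below. This is a sketch under assumptions: the conv_<48-character-hex-string> shape comes from docs/conversations_api.md, lowercase hex is assumed, and both helper names are hypothetical. The strict length check would only suit fixtures that use full-length IDs; shorter IDs such as "conv_abc123def456" used in the unit tests would not pass it.

```python
import re

# Shapes assumed from the docs: conv_<48-character-hex-string> and its
# normalized form without the prefix. Lowercase hex is an assumption.
_LLAMA_STACK_ID = re.compile(r"conv_[0-9a-f]{48}")
_NORMALIZED_ID = re.compile(r"[0-9a-f]{48}")


def is_llama_stack_id(value: str) -> bool:
    """True if value looks like a full-length llama-stack conversation ID."""
    return _LLAMA_STACK_ID.fullmatch(value) is not None


def is_normalized_id(value: str) -> bool:
    """True if value looks like a full-length normalized conversation ID."""
    return _NORMALIZED_ID.fullmatch(value) is not None
```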

tests/unit/app/endpoints/test_query_v2.py (1)

108-146: Unit tests comprehensively cover query v2 behavior; tighten one referenced-docs test

Overall, these unit tests give strong coverage of the new query v2 behavior:

  • retrieve_response_* tests now mock conversations.create and assert that conversation IDs are normalized (e.g., "conv_abc123def456" → "abc123def456"), exercise RAG/MCP tool preparation, shields/guardrails wiring, and token‑usage extraction.
  • Output parsing tests correctly handle content parts with annotations lists and ensure tool calls are surfaced as ToolCallSummary instances.
  • The new test_retrieve_response_parses_referenced_documents nicely validates URL citations, file citations, and file_search_call results feeding into referenced_docs.

One small consistency improvement: in test_retrieve_response_parses_referenced_documents you don’t mock conversations.create, so the returned conv_id depends on AsyncMock’s default behavior and is unused. For clarity and future robustness (if the helper ever starts validating or persisting that ID), you could align it with the other tests:

@@
-    mock_client = mocker.AsyncMock()
+    mock_client = mocker.AsyncMock()
@@
-    mock_client.responses.create = mocker.AsyncMock(return_value=response_obj)
+    mock_client.responses.create = mocker.AsyncMock(return_value=response_obj)
+    # Ensure deterministic conversation_id normalization
+    mock_conversation = mocker.Mock()
+    mock_conversation.id = "conv_abc123def456"
+    mock_client.conversations.create = mocker.AsyncMock(return_value=mock_conversation)
@@
-    _summary, _conv_id, referenced_docs, _token_usage = await retrieve_response(
+    _summary, conv_id, referenced_docs, _token_usage = await retrieve_response(
         mock_client, "model-docs", qr, token="tkn", provider_id="test-provider"
     )
+
+    assert conv_id == "abc123def456"

This keeps the conversation‑ID behavior under test consistent with the rest of the suite without changing the core assertions around referenced documents.

Also applies to: 151-199, 209-265, 268-399, 400-444, 468-475, 535-579, 582-629, 676-775, 779-860

src/app/endpoints/query.py (2)

197-201: Consider removing commented-out code.

The # toolgroups=None, comment on line 201 and commented # documents=documents on line 754 appear to be temporary placeholders. If these parameters are intentionally removed from the API call, the comments should be cleaned up. If they represent pending work, consider adding a TODO with a ticket reference.


742-757: Clarify status of commented-out code blocks.

Multiple code sections are commented out with no clear explanation:

  • Lines 742-749: documents construction
  • Lines 754: # documents=documents
  • Lines 756: # toolgroups=toolgroups

If this represents intentional API changes in llama-stack 0.3.0, consider adding brief comments explaining why. If temporary, add TODOs with ticket references.

tests/unit/app/endpoints/test_streaming_query.py (1)

46-144: Mock classes provide backward compatibility for deprecated Agent API tests.

These mock classes (TextDelta, ToolCallDelta, TurnResponseEvent, etc.) emulate the Agent API types that no longer exist in llama-stack-client 0.3.x. The docstrings clearly indicate their purpose for backward compatibility.

Consider consolidating these mocks into a shared test utilities module if they're needed across multiple test files, to reduce duplication.

src/utils/endpoints.py (2)

313-316: Clarify intent of commented-out agent retrieval code.

Multiple sections of agent retrieval and session management code are commented out with ... ellipsis placeholders:

  • Lines 313-315: Agent retrieval
  • Lines 335-342: Orphan agent deletion and session retrieval

If this represents a temporary state during the llama-stack 0.3.0 migration, add a TODO comment with a ticket reference. If these features are intentionally removed, the dead code should be cleaned up.

     if conversation_id:
         with suppress(ValueError):
-            # agent_response = await client.agents.retrieve(agent_id=conversation_id)
-            # existing_agent_id = agent_response.agent_id
-            ...
+            # TODO(LCORE-XXX): Agent retrieval disabled during llama-stack 0.3.0 migration
+            pass

Also applies to: 335-342


323-329: Shield parameters commented out with type:ignore.

The input_shields and output_shields parameters are commented out with # type: ignore[call-arg] annotations. If shields are intentionally disabled in llama-stack 0.3.0, add a clear comment explaining why. The enable_session_persistence parameter has a similar pattern.

src/app/endpoints/conversations_v3.py (1)

47-48: Logger name should use __name__ pattern; router tag inconsistent with filename.

Per coding guidelines, use logger = logging.getLogger(__name__) for module logging. Additionally, the router tag "conversations_v1" is inconsistent with the filename conversations_v3.py, which may cause confusion in API documentation.

-logger = logging.getLogger("app.endpoints.handlers")
-router = APIRouter(tags=["conversations_v1"])
+logger = logging.getLogger(__name__)
+router = APIRouter(tags=["conversations_v3"])
tests/unit/app/endpoints/test_query.py (1)

609-609: Consider tracking or removing deprecated tests.

Multiple tests are marked as skipped with "Deprecated API test". While this is acceptable during migration, consider:

  1. Creating an issue to track removal of these deprecated tests
  2. Converting them to test the new Responses API equivalents
  3. Removing them if the functionality is covered elsewhere

Leaving many skipped tests long-term can obscure coverage metrics and create maintenance burden.

Would you like me to open an issue to track the cleanup of these deprecated tests?

src/app/endpoints/streaming_query.py (2)

363-378: Improve error logging message in stream_http_error.

logger.exception(error) logs the error object as the message. While this works, it's more conventional to provide a descriptive message string:

 async def stream_http_error(error: AbstractErrorResponse) -> AsyncGenerator[str, None]:
     ...
     logger.error("Error while obtaining answer for user question")
-    logger.exception(error)
+    logger.exception("Streaming error: %s", type(error).__name__)
 
     yield format_stream_data({"event": "error", "data": {**error.detail.model_dump()}})
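
As a standalone illustration of the difference (a sketch using only the standard library; StreamError is a hypothetical stand-in, not AbstractErrorResponse): logger.exception takes a message string and appends the active traceback itself, so a descriptive message keeps the first log line readable.

```python
import io
import logging


class StreamError(Exception):
    """Hypothetical stand-in for an error raised while streaming."""


def log_stream_failure(logger: logging.Logger) -> None:
    """Log a failure with a descriptive message plus the active traceback."""
    try:
        raise StreamError("backend unavailable")
    except StreamError as error:
        # Message first; logger.exception attaches the traceback on its own.
        logger.exception("Streaming error: %s", type(error).__name__)


# Capture the output in memory to inspect what ends up in the log.
buffer = io.StringIO()
demo_logger = logging.getLogger("stream_error_demo")
demo_logger.addHandler(logging.StreamHandler(buffer))
demo_logger.propagate = False
log_stream_failure(demo_logger)
log_output = buffer.getvalue()
```

Note that logger.exception is intended to be called from an exception handler; outside one there is no active traceback to attach, in which case logger.error may be the better fit.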

1110-1116: Commented-out parameters need clarification or cleanup.

The documents parameter has a TODO reference (LCORE-881), but the toolgroups parameter is also commented out without explanation. Either add a tracking reference or remove if intentionally disabled.

     response = await agent.create_turn(
         messages=[UserMessage(role="user", content=query_request.query).model_dump()],
         session_id=session_id,
-        # documents=documents,
+        # TODO: LCORE-881 - documents temporarily disabled
+        # documents=documents,
         stream=True,
-        # toolgroups=toolgroups,
+        # TODO: Add ticket reference for toolgroups disabled status
+        # toolgroups=toolgroups,
     )
src/app/endpoints/query_v2.py (1)

390-390: Consider reducing log verbosity for response object.

Logging the entire response object at INFO level could produce large log entries and impact performance. Consider using DEBUG level or logging only summary information:

-    logger.info("Response: %s", response)
+    logger.debug("Response object: %s", response)
docs/openapi.json (3)

3431-3442: Minor docs polish: consistent error examples and typos

  • Ensure example labels/messages are consistent (e.g., “cannot be deleted” vs “can not be deleted” elsewhere).
  • Typo “attatchment” appears in validation examples; prefer “attachment”.

No schema change needed; search/replace in example texts is sufficient.

Also applies to: 3451-3470, 3479-3492, 3497-3511, 3526-3539, 3542-3551


6990-7014: ToolCallSummary: consider making args optional-but-documented and align example usage

args is optional in schema but not shown in examples anywhere else. Add a minimal example or clarify default (empty object) to improve client codegen hints. Also ensure all endpoint examples use name/args/id consistently.


1517-1525: SSE schema representation — optional enhancement

OpenAPI can’t fully model SSE; consider linking to an explicit event schema (start/token/turn_complete/end) in description for stronger client expectations.
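
One lightweight way to pin down the event contract is to write it as code that clients and tests can share. The sketch below is illustrative only: the event names are the ones mentioned in this review, the data: {...} framing mirrors the example in docs/conversations_api.md, and format_sse_event / parse_sse_event are hypothetical helpers, not the project's format_stream_data.

```python
import json
from typing import Any

# Event names mentioned in this review; the real stream may emit others.
KNOWN_EVENTS = {"start", "token", "turn_complete", "end", "error"}


def format_sse_event(event: str, data: dict[str, Any]) -> str:
    """Serialize one event as a single SSE 'data:' frame."""
    if event not in KNOWN_EVENTS:
        raise ValueError(f"unknown event type: {event}")
    payload = json.dumps({"event": event, "data": data})
    return f"data: {payload}\n\n"


def parse_sse_event(frame: str) -> dict[str, Any]:
    """Parse a frame produced by format_sse_event back into a dict."""
    text = frame.strip()
    if not text.startswith("data: "):
        raise ValueError("not an SSE data frame")
    return json.loads(text[len("data: "):])
```

Linking a description like this from the OpenAPI text would give client authors a concrete shape to code against, even though the schema itself cannot express it.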

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8aa441c and 3d4e2c5.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (40)
  • docs/conversations_api.md (1 hunks)
  • docs/openapi.json (11 hunks)
  • pyproject.toml (2 hunks)
  • run.yaml (1 hunks)
  • src/app/endpoints/conversations_v2.py (0 hunks)
  • src/app/endpoints/conversations_v3.py (1 hunks)
  • src/app/endpoints/query.py (9 hunks)
  • src/app/endpoints/query_v2.py (16 hunks)
  • src/app/endpoints/streaming_query.py (19 hunks)
  • src/app/endpoints/streaming_query_v2.py (14 hunks)
  • src/app/routers.py (2 hunks)
  • src/constants.py (1 hunks)
  • src/metrics/utils.py (1 hunks)
  • src/models/requests.py (1 hunks)
  • src/models/responses.py (7 hunks)
  • src/utils/endpoints.py (5 hunks)
  • src/utils/llama_stack_version.py (2 hunks)
  • src/utils/suid.py (1 hunks)
  • src/utils/token_counter.py (1 hunks)
  • src/utils/transcripts.py (1 hunks)
  • src/utils/types.py (3 hunks)
  • test.containerfile (1 hunks)
  • tests/configuration/minimal-stack.yaml (1 hunks)
  • tests/e2e/configs/run-ci.yaml (1 hunks)
  • tests/e2e/features/conversations.feature (1 hunks)
  • tests/e2e/features/info.feature (1 hunks)
  • tests/integration/endpoints/test_query_v2_integration.py (8 hunks)
  • tests/integration/test_openapi_json.py (4 hunks)
  • tests/unit/app/endpoints/test_conversations_v2.py (1 hunks)
  • tests/unit/app/endpoints/test_query.py (30 hunks)
  • tests/unit/app/endpoints/test_query_v2.py (27 hunks)
  • tests/unit/app/endpoints/test_streaming_query.py (28 hunks)
  • tests/unit/app/endpoints/test_streaming_query_v2.py (13 hunks)
  • tests/unit/app/test_routers.py (3 hunks)
  • tests/unit/models/responses/test_error_responses.py (3 hunks)
  • tests/unit/models/responses/test_query_response.py (3 hunks)
  • tests/unit/models/responses/test_rag_chunk.py (1 hunks)
  • tests/unit/models/responses/test_successful_responses.py (6 hunks)
  • tests/unit/utils/test_endpoints.py (11 hunks)
  • tests/unit/utils/test_transcripts.py (3 hunks)
💤 Files with no reviewable changes (1)
  • src/app/endpoints/conversations_v2.py
🧰 Additional context used
📓 Path-based instructions (9)
tests/e2e/features/**/*.feature

📄 CodeRabbit inference engine (CLAUDE.md)

Use behave (BDD) framework with Gherkin feature files for end-to-end tests

Files:

  • tests/e2e/features/conversations.feature
  • tests/e2e/features/info.feature
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.py: Use absolute imports for internal modules in LCS project (e.g., from auth import get_auth_dependency)
All modules must start with descriptive docstrings explaining their purpose
Use logger = logging.getLogger(__name__) pattern for module logging
All functions must include complete type annotations for parameters and return types, using modern syntax (str | int) and Optional[Type] or Type | None
All functions must have docstrings with brief descriptions following Google Python docstring conventions
Function names must use snake_case with descriptive, action-oriented names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying input parameters
Use async def for I/O operations and external API calls
All classes must include descriptive docstrings explaining their purpose following Google Python docstring conventions
Class names must use PascalCase with descriptive names and standard suffixes: Configuration for config classes, Error/Exception for exceptions, Resolver for strategy patterns, Interface for abstract base classes
Abstract classes must use ABC with @abstractmethod decorators
Include complete type annotations for all class attributes in Python classes
Use import logging and module logger pattern with standard log levels: debug, info, warning, error

Files:

  • src/utils/suid.py
  • src/utils/llama_stack_version.py
  • src/metrics/utils.py
  • src/utils/transcripts.py
  • src/utils/token_counter.py
  • src/app/endpoints/conversations_v3.py
  • src/constants.py
  • src/utils/types.py
  • src/utils/endpoints.py
  • src/app/routers.py
  • src/models/responses.py
  • src/models/requests.py
  • src/app/endpoints/query_v2.py
  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/streaming_query_v2.py
  • src/app/endpoints/query.py
tests/{unit,integration}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests; do not use unittest framework
Unit tests must achieve 60% code coverage; integration tests must achieve 10% coverage

Files:

  • tests/unit/app/endpoints/test_conversations_v2.py
  • tests/unit/utils/test_endpoints.py
  • tests/unit/models/responses/test_successful_responses.py
  • tests/unit/models/responses/test_error_responses.py
  • tests/unit/app/endpoints/test_streaming_query.py
  • tests/unit/models/responses/test_rag_chunk.py
  • tests/unit/app/endpoints/test_query_v2.py
  • tests/unit/app/endpoints/test_streaming_query_v2.py
  • tests/unit/app/test_routers.py
  • tests/integration/test_openapi_json.py
  • tests/unit/utils/test_transcripts.py
  • tests/integration/endpoints/test_query_v2_integration.py
  • tests/unit/models/responses/test_query_response.py
  • tests/unit/app/endpoints/test_query.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use pytest-mock with AsyncMock objects for mocking in tests

Files:

  • tests/unit/app/endpoints/test_conversations_v2.py
  • tests/unit/utils/test_endpoints.py
  • tests/unit/models/responses/test_successful_responses.py
  • tests/unit/models/responses/test_error_responses.py
  • tests/unit/app/endpoints/test_streaming_query.py
  • tests/unit/models/responses/test_rag_chunk.py
  • tests/unit/app/endpoints/test_query_v2.py
  • tests/unit/app/endpoints/test_streaming_query_v2.py
  • tests/unit/app/test_routers.py
  • tests/integration/test_openapi_json.py
  • tests/unit/utils/test_transcripts.py
  • tests/integration/endpoints/test_query_v2_integration.py
  • tests/unit/models/responses/test_query_response.py
  • tests/unit/app/endpoints/test_query.py
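
A minimal sketch of the AsyncMock pattern this rule refers to. It uses `unittest.mock.AsyncMock` directly and `asyncio.run` so that it is self-contained; in the project's tests the mock would more likely be obtained through the pytest-mock `mocker` fixture. The `get_model_ids` helper and the client shape are hypothetical:

```python
import asyncio
from typing import Any
from unittest.mock import AsyncMock


async def get_model_ids(client: Any) -> list[str]:
    """Return the identifiers of all models reported by the client."""
    models = await client.models.list()
    return [model["identifier"] for model in models]


def test_get_model_ids() -> None:
    """The awaited client call is replaced by an AsyncMock with a canned result."""
    client = AsyncMock()
    client.models.list.return_value = [{"identifier": "m1"}, {"identifier": "m2"}]

    result = asyncio.run(get_model_ids(client))

    assert result == ["m1", "m2"]
    # Verifies the coroutine was actually awaited, not just called.
    client.models.list.assert_awaited_once()
```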
pyproject.toml

📄 CodeRabbit inference engine (CLAUDE.md)

pyproject.toml: Configure pylint with source-roots = "src"
Exclude src/auth/k8s.py from pyright type checking

Files:

  • pyproject.toml
src/app/endpoints/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use FastAPI HTTPException with appropriate status codes for API endpoint error handling

Files:

  • src/app/endpoints/conversations_v3.py
  • src/app/endpoints/query_v2.py
  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/streaming_query_v2.py
  • src/app/endpoints/query.py
src/**/{client,app/endpoints/**}.py

📄 CodeRabbit inference engine (CLAUDE.md)

Handle APIConnectionError from Llama Stack in integration code

Files:

  • src/app/endpoints/conversations_v3.py
  • src/app/endpoints/query_v2.py
  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/streaming_query_v2.py
  • src/app/endpoints/query.py
src/**/constants.py

📄 CodeRabbit inference engine (CLAUDE.md)

Define shared constants in central constants.py file with descriptive comments

Files:

  • src/constants.py
src/models/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/models/**/*.py: Use @field_validator and @model_validator for custom validation in Pydantic models
Pydantic configuration classes must extend ConfigurationBase; data models must extend BaseModel

Files:

  • src/models/responses.py
  • src/models/requests.py
🧠 Learnings (7)
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to pyproject.toml : Exclude `src/auth/k8s.py` from pyright type checking

Applied to files:

  • pyproject.toml
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to tests/**/*.py : Use `pytest-mock` with AsyncMock objects for mocking in tests

Applied to files:

  • tests/unit/utils/test_endpoints.py
  • tests/unit/app/endpoints/test_streaming_query.py
  • tests/unit/app/endpoints/test_query.py
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to tests/{unit,integration}/**/*.py : Use pytest for all unit and integration tests; do not use unittest framework

Applied to files:

  • tests/unit/utils/test_endpoints.py
📚 Learning: 2025-08-06T06:02:21.060Z
Learnt from: eranco74
Repo: lightspeed-core/lightspeed-stack PR: 348
File: src/utils/endpoints.py:91-94
Timestamp: 2025-08-06T06:02:21.060Z
Learning: The direct assignment to `agent._agent_id` in `src/utils/endpoints.py` is a necessary workaround for the missing agent rehydration feature in the LLS client SDK. This allows preserving conversation IDs when handling existing agents.

Applied to files:

  • tests/unit/utils/test_endpoints.py
  • src/utils/endpoints.py
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to src/**/{client,app/endpoints/**}.py : Handle `APIConnectionError` from Llama Stack in integration code

Applied to files:

  • src/metrics/utils.py
  • src/app/endpoints/streaming_query.py
  • tests/unit/app/endpoints/test_query.py
  • src/app/endpoints/query.py
📚 Learning: 2025-08-18T10:56:55.349Z
Learnt from: matysek
Repo: lightspeed-core/lightspeed-stack PR: 292
File: pyproject.toml:0-0
Timestamp: 2025-08-18T10:56:55.349Z
Learning: The lightspeed-stack project intentionally uses a "generic image" approach, bundling many dependencies directly in the base runtime image to work for everyone, rather than using lean base images with optional dependency groups.

Applied to files:

  • test.containerfile
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to src/app/endpoints/**/*.py : Use FastAPI `HTTPException` with appropriate status codes for API endpoint error handling

Applied to files:

  • tests/unit/app/endpoints/test_query.py
🧬 Code graph analysis (16)
tests/unit/app/endpoints/test_conversations_v2.py (4)
tests/unit/app/endpoints/test_tools.py (1)
  • mock_configuration (28-47)
src/cache/postgres_cache.py (1)
  • delete (312-347)
src/cache/sqlite_cache.py (1)
  • delete (310-343)
src/app/endpoints/conversations_v2.py (1)
  • delete_conversation_endpoint_handler (145-168)
tests/unit/models/responses/test_successful_responses.py (2)
src/models/responses.py (5)
  • StreamingQueryResponse (454-519)
  • openapi_response (47-60)
  • openapi_response (458-480)
  • openapi_response (858-882)
  • openapi_response (1187-1210)
src/utils/types.py (2)
  • ToolCallSummary (102-110)
  • ToolResultSummary (113-124)
src/utils/endpoints.py (3)
src/models/responses.py (2)
  • model (1624-1628)
  • NotFoundResponse (1431-1486)
src/utils/types.py (2)
  • GraniteToolParser (60-99)
  • get_parser (82-99)
src/utils/suid.py (1)
  • get_suid (6-16)
tests/unit/models/responses/test_error_responses.py (1)
src/models/responses.py (7)
  • PromptTooLongResponse (1489-1518)
  • AbstractErrorResponse (1157-1210)
  • DetailModel (1150-1154)
  • openapi_response (47-60)
  • openapi_response (458-480)
  • openapi_response (858-882)
  • openapi_response (1187-1210)
tests/unit/app/endpoints/test_streaming_query.py (3)
src/models/responses.py (2)
  • ReferencedDocument (328-338)
  • model (1624-1628)
src/models/requests.py (1)
  • QueryRequest (73-233)
src/app/endpoints/streaming_query.py (2)
  • streaming_query_endpoint_handler (961-995)
  • stream_end_event (148-197)
tests/unit/models/responses/test_rag_chunk.py (1)
src/utils/types.py (1)
  • RAGChunk (127-132)
tests/unit/app/endpoints/test_query_v2.py (2)
src/app/endpoints/streaming_query.py (1)
  • retrieve_response (998-1119)
src/app/endpoints/streaming_query_v2.py (1)
  • retrieve_response (364-461)
tests/unit/app/endpoints/test_streaming_query_v2.py (4)
src/app/endpoints/streaming_query_v2.py (1)
  • streaming_query_endpoint_handler_v2 (327-361)
tests/unit/app/endpoints/test_query.py (1)
  • dummy_request (59-68)
tests/unit/app/endpoints/test_query_v2.py (1)
  • dummy_request (31-34)
src/models/responses.py (1)
  • model (1624-1628)
src/app/routers.py (1)
tests/unit/app/test_routers.py (1)
  • include_router (36-51)
tests/unit/utils/test_transcripts.py (1)
src/utils/types.py (3)
  • ToolCallSummary (102-110)
  • ToolResultSummary (113-124)
  • TurnSummary (135-220)
src/models/responses.py (1)
src/utils/types.py (2)
  • ToolCallSummary (102-110)
  • ToolResultSummary (113-124)
src/app/endpoints/query_v2.py (4)
src/utils/suid.py (2)
  • normalize_conversation_id (101-122)
  • to_llama_stack_conversation_id (125-145)
src/utils/responses.py (1)
  • extract_text_from_response_output_item (6-56)
src/utils/types.py (2)
  • ToolCallSummary (102-110)
  • ToolResultSummary (113-124)
src/models/responses.py (2)
  • conversation (1383-1392)
  • ReferencedDocument (328-338)
src/app/endpoints/streaming_query.py (2)
src/app/endpoints/query.py (1)
  • parse_referenced_documents (605-632)
src/utils/types.py (1)
  • content_to_str (20-39)
tests/unit/models/responses/test_query_response.py (2)
src/models/responses.py (2)
  • QueryResponse (341-451)
  • ReferencedDocument (328-338)
src/utils/types.py (2)
  • ToolCallSummary (102-110)
  • ToolResultSummary (113-124)
tests/unit/app/endpoints/test_query.py (2)
tests/unit/app/endpoints/test_streaming_query.py (2)
  • ToolExecutionStep (128-133)
  • ToolResponse (136-141)
src/utils/types.py (1)
  • TurnSummary (135-220)
src/app/endpoints/query.py (2)
src/models/responses.py (7)
  • PromptTooLongResponse (1489-1518)
  • openapi_response (47-60)
  • openapi_response (458-480)
  • openapi_response (858-882)
  • openapi_response (1187-1210)
  • QuotaExceededResponse (1563-1643)
  • model (1624-1628)
src/utils/types.py (2)
  • TurnSummary (135-220)
  • content_to_str (20-39)
🪛 markdownlint-cli2 (0.18.1)
docs/conversations_api.md

16-16: Unordered list indentation
Expected: 2; Actual: 3

(MD007, ul-indent)


17-17: Unordered list indentation
Expected: 2; Actual: 3

(MD007, ul-indent)


18-18: Unordered list indentation
Expected: 2; Actual: 3

(MD007, ul-indent)


20-20: Unordered list indentation
Expected: 2; Actual: 3

(MD007, ul-indent)


21-21: Unordered list indentation
Expected: 2; Actual: 3

(MD007, ul-indent)


22-22: Unordered list indentation
Expected: 2; Actual: 3

(MD007, ul-indent)


24-24: Unordered list indentation
Expected: 2; Actual: 3

(MD007, ul-indent)


25-25: Unordered list indentation
Expected: 2; Actual: 3

(MD007, ul-indent)


26-26: Unordered list indentation
Expected: 2; Actual: 3

(MD007, ul-indent)


27-27: Unordered list indentation
Expected: 2; Actual: 3

(MD007, ul-indent)


58-58: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


63-63: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


77-77: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


234-234: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


369-369: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


469-469: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-pr
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
  • GitHub Check: e2e_tests (ci)

Comment on lines +7035 to 7057
"content": {
"title": "Content",
"description": "Content/result returned from the tool"
},
"type": {
"type": "string",
"title": "Type",
"description": "Type indicator for tool result",
"default": "tool_result"
},
"round": {
"type": "integer",
"title": "Round",
"description": "Round number or step of tool execution"
}
},
"type": "object",
"required": [
-    "tool_name",
-    "arguments"
+    "id",
+    "status",
+    "content",
+    "round"
],

⚠️ Potential issue | 🟠 Major

ToolResultSummary.content lacks type; validation is ambiguous

content is required but unconstrained. Define an explicit union to reflect “any JSON value”.

-                    "content": {
-                        "title": "Content",
-                        "description": "Content/result returned from the tool"
-                    },
+                    "content": {
+                        "anyOf": [
+                            { "type": "string" },
+                            { "type": "number" },
+                            { "type": "boolean" },
+                            { "type": "object", "additionalProperties": true },
+                            { "type": "array", "items": {} },
+                            { "type": "null" }
+                        ],
+                        "title": "Content",
+                        "description": "Content/result returned from the tool"
+                    },
🤖 Prompt for AI Agents
In docs/openapi.json around lines 7035-7057, the ToolResultSummary.content
property is required but has no type, making validation ambiguous; replace the
unconstrained content schema with an explicit union that accepts any JSON value
(object with additionalProperties: true, array, string, number, boolean — and
nullable if null is allowed) so validators and generators know content can be
any JSON value, and keep content listed in required.
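
To make "any JSON value" concrete, the sketch below mirrors the six branches of the suggested anyOf union as a recursive Python check. It is an illustration of the intended constraint, not code from this PR; the function name is hypothetical:

```python
import math
from typing import Any


def is_json_value(value: Any) -> bool:
    """Return True if value is representable as a JSON value."""
    # null, boolean, string, and integer numbers
    if value is None or isinstance(value, (bool, str, int)):
        return True
    if isinstance(value, float):
        # NaN and infinity have no JSON representation.
        return math.isfinite(value)
    if isinstance(value, list):
        return all(is_json_value(item) for item in value)
    if isinstance(value, dict):
        # JSON object keys must be strings; values are checked recursively.
        return all(isinstance(k, str) and is_json_value(v) for k, v in value.items())
    return False


print(is_json_value({"rows": [1, "x", None, True, {"score": 2.5}]}))
print(is_json_value({"ids": {1, 2}}))
```

A Python set, for instance, is rejected because it has no JSON counterpart, which is the kind of ambiguity the untyped content property currently leaves to each client.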

@asimurka asimurka force-pushed the bump_llama-stack_to_0.3.0 branch from 11d5075 to d4bda49 Compare December 3, 2025 14:29
@radofuchs radofuchs marked this pull request as draft December 4, 2025 07:49
@asimurka asimurka force-pushed the bump_llama-stack_to_0.3.0 branch 2 times, most recently from 89d02ab to e00fca7 Compare December 4, 2025 09:12
@asimurka asimurka force-pushed the bump_llama-stack_to_0.3.0 branch 3 times, most recently from 7fcd1be to 80cfa20 Compare December 4, 2025 10:12
@asimurka asimurka force-pushed the bump_llama-stack_to_0.3.0 branch from 80cfa20 to 6d3c0db Compare December 4, 2025 10:51
@asimurka asimurka force-pushed the bump_llama-stack_to_0.3.0 branch 2 times, most recently from ed0159e to af81305 Compare December 4, 2025 15:22
@asimurka asimurka force-pushed the bump_llama-stack_to_0.3.0 branch from af81305 to f137d04 Compare December 4, 2025 15:24
@asimurka asimurka force-pushed the bump_llama-stack_to_0.3.0 branch from f137d04 to 93dccdc Compare December 4, 2025 15:28
@asimurka asimurka force-pushed the bump_llama-stack_to_0.3.0 branch from 91ceee4 to a458bd2 Compare December 4, 2025 17:53
@asimurka asimurka force-pushed the bump_llama-stack_to_0.3.0 branch from a458bd2 to 7f07d7a Compare December 4, 2025 18:15


2 participants