-
Notifications
You must be signed in to change notification settings - Fork 58
LCORE-956: Bump llama stack to 0.3.0 #866
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
LCORE-956: Bump llama stack to 0.3.0 #866
Conversation
|
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the WalkthroughAdds Conversations v3 endpoints, integrates conversation lifecycle with Responses API for query and streaming flows, replaces ToolCall with ToolCallSummary and adds ToolResultSummary, enhances SSE streaming/error payloads, bumps Llama Stack to 0.3.0, and migrates config/docs for Starter distribution. Changes
Sequence Diagram(s)mermaid Estimated code review effort🎯 4 (Complex) | ⏱️ ~45–75 minutes
Possibly related PRs
Suggested reviewers
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
5099065 to
3d4e2c5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 14
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/models/responses.py (1)
417-451: Updatemodel_configexample to match the newToolCallSummaryandToolResultSummarystructure.The example includes
rag_chunks(which is no longer a field) and uses the oldtool_callsformat (tool_name,arguments,result) instead of the newToolCallSummarystructure (id,name,args,type). The example should also includetool_results.model_config = { "json_schema_extra": { "examples": [ { "conversation_id": "123e4567-e89b-12d3-a456-426614174000", "response": "Operator Lifecycle Manager (OLM) helps users install...", - "rag_chunks": [ - { - "content": "OLM is a component of the Operator Framework toolkit...", - "source": "kubernetes-docs/operators.md", - "score": 0.95, - } - ], "tool_calls": [ { - "tool_name": "knowledge_search", - "arguments": {"query": "operator lifecycle manager"}, - "result": {"chunks_found": 5}, + "id": "call_123", + "name": "knowledge_search", + "args": {"query": "operator lifecycle manager"}, + "type": "tool_call", + } + ], + "tool_results": [ + { + "id": "call_123", + "status": "success", + "content": {"chunks_found": 5}, + "type": "tool_result", + "round": 1, } ], "referenced_documents": [src/utils/suid.py (1)
19-44: Aligncheck_suidtype handling with its docstring (bytes vs str)The extended
check_suidlogic forconv_...and plain hex conversation IDs looks good, and the newnormalize_conversation_id/to_llama_stack_conversation_idhelpers are straightforward and correct for theconv_<hex>vs normalized formats.However, the docstring now advertises:
suid (str | bytes)and- acceptance of a “byte representation”,
while the function is annotated as
check_suid(suid: str) -> booland, in practice, treats non‑strinputs as invalid (abytesvalue falls through touuid.UUID(suid)and is rejected withTypeError, returningFalse).To avoid confusion for callers, consider either:
- Updating the signature and implementation to genuinely support bytes, e.g.:
-def check_suid(suid: str) -> bool: +def check_suid(suid: str | bytes) -> bool: @@ - try: + try: + if isinstance(suid, bytes): + try: + suid = suid.decode("ascii") + except UnicodeDecodeError: + return False + # Accept llama-stack conversation IDs (conv_<hex> format) - if isinstance(suid, str) and suid.startswith("conv_"): + if suid.startswith("conv_"): @@ - if isinstance(suid, str) and len(suid) >= 32: + if len(suid) >= 32: @@ - uuid.UUID(suid) + uuid.UUID(str(suid))
- Or, if bytes are not needed, simplifying the docstring to mention only
strinputs.Either way, keeping type hints and documentation consistent will make this utility safer to reuse.
Also applies to: 45-98, 101-145
🧹 Nitpick comments (29)
tests/e2e/features/info.feature (1)
19-19: Exact llama-stack version assertion may be too brittleThe step now hard-codes
0.3.0rc3+rhai0. If the underlying stack moves to another rc/build of 0.3.0, this e2e will fail even though the service is healthy and still on the intended major/minor version.Consider relaxing this to:
- Match just the semantic version prefix (e.g.,
0.3.0), or- Delegate the exact expected value to a shared constant/env so it’s updated in one place when bumping the stack.
Also please confirm that the
/infoendpoint actually returns0.3.0rc3+rhai0in your target deployment before merging.pyproject.toml (1)
64-70: Track technical debt for type checking exclusions.Adding 4 more files to pyright exclusions increases the untyped surface area. The comment indicates these are deprecated v1 endpoints - consider adding a TODO with a tracking issue to either fix the type errors or remove these endpoints when v1 is fully deprecated.
src/utils/llama_stack_version.py (1)
43-60: Update docstring to reflect new behavior.The docstring states
version_info"must be parseable by semver.Version.parse", but the implementation now silently skips the check on parse failure rather than raising an exception. Update the docstring to document this behavior accurately.src/utils/types.py (2)
113-124: Consider using Literal types for status field.The
statusfield accepts any string, but based on usage it appears to only be "success" or "failure". Consider usingLiteral["success", "failure"]for better type safety.+from typing import Literal + class ToolResultSummary(BaseModel): """Model representing a result from a tool call (for tool_results list).""" id: str = Field( description="ID of the tool call/result, matches the corresponding tool call 'id'" ) - status: str = Field( + status: Literal["success", "failure"] = Field( ..., description="Status of the tool execution (e.g., 'success')" )
163-171: Address the TODO comment about theroundattribute.The comment
# clarify meaning of this attributeon line 169 indicates uncertainty about theroundfield's semantics. Either document its intended meaning or track this as a follow-up issue.Do you want me to open an issue to track clarifying the
roundattribute's meaning?src/models/responses.py (1)
28-41: Remove commented-out code instead of leaving it in the codebase.Dead code should be removed rather than commented out. Version control preserves history if these definitions are needed later.
-# class ToolCall(BaseModel): -# """Model representing a tool call made during response generation.""" - -# tool_name: str = Field(description="Name of the tool called") -# arguments: dict[str, Any] = Field(description="Arguments passed to the tool") -# result: dict[str, Any] | None = Field(None, description="Result from the tool") - - -# class ToolResult(BaseModel): -# """Model representing a tool result.""" - -# tool_name: str = Field(description="Name of the tool") -# result: dict[str, Any] = Field(description="Result from the tool") -src/app/endpoints/streaming_query_v2.py (3)
76-76: Enable or remove the commented-outPromptTooLongResponseentry.The 413 response is commented out but
PromptTooLongResponsewas added in this PR. If prompt-too-long errors can occur in this endpoint, enable the response; otherwise, remove the comment.- # 413: PromptTooLongResponse.openapi_response(), + 413: PromptTooLongResponse.openapi_response(),If enabling, also add the import:
from models.responses import ( ... PromptTooLongResponse, ... )
296-296: Fix typo: double closing parenthesis in comment.- # Perform cleanup tasks (database and cache operations)) + # Perform cleanup tasks (database and cache operations)
457-461: Remove commented-out debug code.Debug logging statements should be removed rather than left commented in production code.
response_stream = cast(AsyncIterator[OpenAIResponseObjectStream], response) - # async for chunk in response_stream: - # logger.error("Chunk: %s", chunk.model_dump_json()) - # Return the normalized conversation_id (already normalized above) - # The response_generator will emit it in the start event + # Return the normalized conversation_id (already normalized above). + # The response_generator will emit it in the start event. return response_stream, conversation_idsrc/utils/token_counter.py (1)
72-83: Input-token comment vs. implementation mismatch (system vs. user role)The comment says this block uses the same logic as
metrics.utils.update_llm_token_count_from_turn, but here thesystem_promptis wrapped asRawMessage(role="system", ...)while the metrics helper usesrole="user". This may slightly change tokenization and contradicts the “same logic” claim. Consider either aligning the role for true parity or updating the comment to reflect the intentional difference.tests/unit/utils/test_endpoints.py (1)
261-803: Deprecated agent/temp-agent tests are now skippedMarking these legacy
get_agent/get_temp_agentasync tests with@pytest.mark.skip(reason="Deprecated API test")is a pragmatic way to avoid failures against removed/changed APIs while preserving the scenarios for reference. You may eventually want to replace these with coverage for the new v2/v3 flows or group them under a custom marker (e.g.@pytest.mark.deprecated_api) to make later cleanup easier, but the current change is fine.tests/e2e/features/conversations.feature (1)
173-181: Non-existent DELETE now returns 200 with explanatory payloadThe DELETE
/conversations/{conversation_id}scenario for a non-existent ID now expects a 200 with{"success": true, "response": "Conversation cannot be deleted"}(ignoringconversation_id). This matches the new idempotent delete behavior but is a semantic shift from the earlier 404. Please ensure client logic and API docs are updated to rely on the response body/message for “not found” vs. “deleted” distinctions rather than the status code alone.docs/conversations_api.md (1)
14-31: Fix markdownlint issues: list indentation and fenced code languagesThe doc reads well, but the markdownlint findings are valid and may break CI:
- MD007: nested list items in the TOC (Lines ~16–27) are indented with three spaces instead of two.
- MD040: several fenced code blocks lack an explicit language (e.g., the
conv_...example, SSE snippets, error snippets).You can address both with a small style-only patch, for example:
- * [Llama Stack Format](#llama-stack-format) - * [Normalized Format](#normalized-format) - * [ID Conversion Utilities](#id-conversion-utilities) + * [Llama Stack Format](#llama-stack-format) + * [Normalized Format](#normalized-format) + * [ID Conversion Utilities](#id-conversion-utilities) @@ -``` +```text conv_<48-character-hex-string>@@
-+text
conv_0d21ba731f21f798dc9680125d5d6f493e4a7ab79f25670e@@ -``` +```text data: {"event": "start", "data": {"conversation_id": "0d21ba7..."}} @@ -``` +```text Error: Conversation not found (HTTP 404)Apply the same `text` (or another appropriate language like `bash`, `json`, or `sql`) to the remaining unlabeled fenced blocks. Also applies to: 58-80, 234-245, 369-379, 469-471 </blockquote></details> <details> <summary>tests/unit/models/responses/test_successful_responses.py (1)</summary><blockquote> `41-45`: **Strengthening QueryResponse and StreamingQueryResponse tests** The switch to `ToolCallSummary` / `ToolResultSummary` and the new `TestStreamingQueryResponse` look consistent with the updated response models: - `test_constructor_minimal` correctly expects `tool_calls` and `tool_results` to default to `None`. - `TestStreamingQueryResponse` validates the SSE `text/event-stream` OpenAPI shape and example presence exactly as implemented. One small improvement: in `test_constructor_full`, you already construct `tool_results` and pass them into `QueryResponse`, but never assert they are preserved. Adding a single check would tighten coverage on the new field: ```diff @@ - response = QueryResponse( # type: ignore[call-arg] + response = QueryResponse( # type: ignore[call-arg] @@ - assert response.conversation_id == "conv-123" - assert response.tool_calls == tool_calls + assert response.conversation_id == "conv-123" + assert response.tool_calls == tool_calls + assert response.tool_results == tool_results assert response.referenced_documents == referenced_docsAlso applies to: 262-317, 968-996
tests/integration/endpoints/test_query_v2_integration.py (1)
84-90: Integration tests correctly exercise ID normalization, tool calls, and transcript handling
- Mocking
conversations.createto returnconv_+ 48 hex chars and assertingresponse.conversation_id == "a" * 48gives good end‑to‑end coverage of the new normalization flow.- The updated tool‑call tests now operate on structured
resultsand asserttool_calls[0].name/ aggregatedtool_names, matching the new ToolCallSummary shape.- In the transcript behavior test, introducing
mockerand patchingstore_transcriptavoids real file I/O while still verifying that conversations are persisted regardless of transcript configuration.You might optionally also assert
len(response.conversation_id) == 48in the normalization checks to make the invariant explicit, but the current tests are already solid.Also applies to: 166-170, 346-355, 375-378, 508-510, 1198-1227
tests/unit/app/endpoints/test_query_v2.py (1)
108-146: Unit tests comprehensively cover query v2 behavior; tighten one referenced-docs testOverall, these unit tests give strong coverage of the new query v2 behavior:
retrieve_response_*tests now mockconversations.createand assert that conversation IDs are normalized (e.g.,"conv_abc123def456"→"abc123def456"), exercise RAG/MCP tool preparation, shields/guardrails wiring, and token‑usage extraction.- Output parsing tests correctly handle content parts with
annotationslists and ensure tool calls are surfaced asToolCallSummaryinstances.- The new
test_retrieve_response_parses_referenced_documentsnicely validates URL citations, file citations, andfile_search_callresults feeding intoreferenced_docs.One small consistency improvement: in
test_retrieve_response_parses_referenced_documentsyou don’t mockconversations.create, so the returnedconv_iddepends on AsyncMock’s default behavior and is unused. For clarity and future robustness (if the helper ever starts validating or persisting that ID), you could align it with the other tests:@@ - mock_client = mocker.AsyncMock() + mock_client = mocker.AsyncMock() @@ - mock_client.responses.create = mocker.AsyncMock(return_value=response_obj) + mock_client.responses.create = mocker.AsyncMock(return_value=response_obj) + # Ensure deterministic conversation_id normalization + mock_conversation = mocker.Mock() + mock_conversation.id = "conv_abc123def456" + mock_client.conversations.create = mocker.AsyncMock(return_value=mock_conversation) @@ - _summary, _conv_id, referenced_docs, _token_usage = await retrieve_response( + _summary, conv_id, referenced_docs, _token_usage = await retrieve_response( mock_client, "model-docs", qr, token="tkn", provider_id="test-provider" ) + + assert conv_id == "abc123def456"This keeps the conversation‑ID behavior under test consistent with the rest of the suite without changing the core assertions around referenced documents.
Also applies to: 151-199, 209-265, 268-399, 400-444, 468-475, 535-579, 582-629, 676-775, 779-860
src/app/endpoints/query.py (2)
197-201: Consider removing commented-out code.The
# toolgroups=None,comment on line 201 and commented# documents=documentson line 754 appear to be temporary placeholders. If these parameters are intentionally removed from the API call, the comments should be cleaned up. If they represent pending work, consider adding a TODO with a ticket reference.
742-757: Clarify status of commented-out code blocks.Multiple code sections are commented out with no clear explanation:
- Lines 742-749:
documentsconstruction- Lines 754:
# documents=documents- Lines 756:
# toolgroups=toolgroupsIf this represents intentional API changes in llama-stack 0.3.0, consider adding brief comments explaining why. If temporary, add TODOs with ticket references.
tests/unit/app/endpoints/test_streaming_query.py (1)
46-144: Mock classes provide backward compatibility for deprecated Agent API tests.These mock classes (
TextDelta,ToolCallDelta,TurnResponseEvent, etc.) emulate the Agent API types that no longer exist in llama-stack-client 0.3.x. The docstrings clearly indicate their purpose for backward compatibility.Consider consolidating these mocks into a shared test utilities module if they're needed across multiple test files, to reduce duplication.
src/utils/endpoints.py (2)
313-316: Clarify intent of commented-out agent retrieval code.Multiple sections of agent retrieval and session management code are commented out with
...ellipsis placeholders:
- Lines 313-315: Agent retrieval
- Lines 335-342: Orphan agent deletion and session retrieval
If this represents a temporary state during the llama-stack 0.3.0 migration, add a TODO comment with a ticket reference. If these features are intentionally removed, the dead code should be cleaned up.
if conversation_id: with suppress(ValueError): - # agent_response = await client.agents.retrieve(agent_id=conversation_id) - # existing_agent_id = agent_response.agent_id - ... + # TODO(LCORE-XXX): Agent retrieval disabled during llama-stack 0.3.0 migration + passAlso applies to: 335-342
323-329: Shield parameters commented out with type:ignore.The
input_shieldsandoutput_shieldsparameters are commented out with# type: ignore[call-arg]annotations. If shields are intentionally disabled in llama-stack 0.3.0, add a clear comment explaining why. Theenable_session_persistenceparameter has a similar pattern.src/app/endpoints/conversations_v3.py (1)
47-48: Logger name should use__name__pattern; router tag inconsistent with filename.Per coding guidelines, use
logger = logging.getLogger(__name__)for module logging. Additionally, the router tag"conversations_v1"is inconsistent with the filenameconversations_v3.py, which may cause confusion in API documentation.-logger = logging.getLogger("app.endpoints.handlers") -router = APIRouter(tags=["conversations_v1"]) +logger = logging.getLogger(__name__) +router = APIRouter(tags=["conversations_v3"])tests/unit/app/endpoints/test_query.py (1)
609-609: Consider tracking or removing deprecated tests.Multiple tests are marked as skipped with "Deprecated API test". While this is acceptable during migration, consider:
- Creating an issue to track removal of these deprecated tests
- Converting them to test the new Responses API equivalents
- Removing them if the functionality is covered elsewhere
Leaving many skipped tests long-term can obscure coverage metrics and create maintenance burden.
Would you like me to open an issue to track the cleanup of these deprecated tests?
src/app/endpoints/streaming_query.py (2)
363-378: Improve error logging message in stream_http_error.
logger.exception(error)logs the error object as the message. While this works, it's more conventional to provide a descriptive message string:async def stream_http_error(error: AbstractErrorResponse) -> AsyncGenerator[str, None]: ... logger.error("Error while obtaining answer for user question") - logger.exception(error) + logger.exception("Streaming error: %s", type(error).__name__) yield format_stream_data({"event": "error", "data": {**error.detail.model_dump()}})
1110-1116: Commented-out parameters need clarification or cleanup.The
documentsparameter has a TODO reference (LCORE-881), but thetoolgroupsparameter is also commented out without explanation. Either add a tracking reference or remove if intentionally disabled.response = await agent.create_turn( messages=[UserMessage(role="user", content=query_request.query).model_dump()], session_id=session_id, - # documents=documents, + # TODO: LCORE-881 - documents temporarily disabled + # documents=documents, stream=True, - # toolgroups=toolgroups, + # TODO: Add ticket reference for toolgroups disabled status + # toolgroups=toolgroups, )src/app/endpoints/query_v2.py (1)
390-390: Consider reducing log verbosity for response object.Logging the entire
responseobject at INFO level could produce large log entries and impact performance. Consider using DEBUG level or logging only summary information:- logger.info("Response: %s", response) + logger.debug("Response object: %s", response)docs/openapi.json (3)
3431-3442: Minor docs polish: consistent error examples and typos
- Ensure example labels/messages are consistent (e.g., “cannot be deleted” vs “can not be deleted” elsewhere).
- Typo “attatchment” appears in validation examples; prefer “attachment”.
No schema change needed; search/replace in example texts is sufficient.
Also applies to: 3451-3470, 3479-3492, 3497-3511, 3526-3539, 3542-3551
6990-7014: ToolCallSummary: consider making args optional-but-documented and align example usageargs is optional in schema but not shown in examples anywhere else. Add a minimal example or clarify default (empty object) to improve client codegen hints. Also ensure all endpoint examples use name/args/id consistently.
1517-1525: SSE schema representation — optional enhancementOpenAPI can’t fully model SSE; consider linking to an explicit event schema (start/token/turn_complete/end) in description for stronger client expectations.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (40)
docs/conversations_api.md(1 hunks)docs/openapi.json(11 hunks)pyproject.toml(2 hunks)run.yaml(1 hunks)src/app/endpoints/conversations_v2.py(0 hunks)src/app/endpoints/conversations_v3.py(1 hunks)src/app/endpoints/query.py(9 hunks)src/app/endpoints/query_v2.py(16 hunks)src/app/endpoints/streaming_query.py(19 hunks)src/app/endpoints/streaming_query_v2.py(14 hunks)src/app/routers.py(2 hunks)src/constants.py(1 hunks)src/metrics/utils.py(1 hunks)src/models/requests.py(1 hunks)src/models/responses.py(7 hunks)src/utils/endpoints.py(5 hunks)src/utils/llama_stack_version.py(2 hunks)src/utils/suid.py(1 hunks)src/utils/token_counter.py(1 hunks)src/utils/transcripts.py(1 hunks)src/utils/types.py(3 hunks)test.containerfile(1 hunks)tests/configuration/minimal-stack.yaml(1 hunks)tests/e2e/configs/run-ci.yaml(1 hunks)tests/e2e/features/conversations.feature(1 hunks)tests/e2e/features/info.feature(1 hunks)tests/integration/endpoints/test_query_v2_integration.py(8 hunks)tests/integration/test_openapi_json.py(4 hunks)tests/unit/app/endpoints/test_conversations_v2.py(1 hunks)tests/unit/app/endpoints/test_query.py(30 hunks)tests/unit/app/endpoints/test_query_v2.py(27 hunks)tests/unit/app/endpoints/test_streaming_query.py(28 hunks)tests/unit/app/endpoints/test_streaming_query_v2.py(13 hunks)tests/unit/app/test_routers.py(3 hunks)tests/unit/models/responses/test_error_responses.py(3 hunks)tests/unit/models/responses/test_query_response.py(3 hunks)tests/unit/models/responses/test_rag_chunk.py(1 hunks)tests/unit/models/responses/test_successful_responses.py(6 hunks)tests/unit/utils/test_endpoints.py(11 hunks)tests/unit/utils/test_transcripts.py(3 hunks)
💤 Files with no reviewable changes (1)
- src/app/endpoints/conversations_v2.py
🧰 Additional context used
📓 Path-based instructions (9)
tests/e2e/features/**/*.feature
📄 CodeRabbit inference engine (CLAUDE.md)
Use behave (BDD) framework with Gherkin feature files for end-to-end tests
Files:
tests/e2e/features/conversations.featuretests/e2e/features/info.feature
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.py: Use absolute imports for internal modules in LCS project (e.g.,from auth import get_auth_dependency)
All modules must start with descriptive docstrings explaining their purpose
Uselogger = logging.getLogger(__name__)pattern for module logging
All functions must include complete type annotations for parameters and return types, using modern syntax (str | int) andOptional[Type]orType | None
All functions must have docstrings with brief descriptions following Google Python docstring conventions
Function names must use snake_case with descriptive, action-oriented names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying input parameters
Useasync deffor I/O operations and external API calls
All classes must include descriptive docstrings explaining their purpose following Google Python docstring conventions
Class names must use PascalCase with descriptive names and standard suffixes:Configurationfor config classes,Error/Exceptionfor exceptions,Resolverfor strategy patterns,Interfacefor abstract base classes
Abstract classes must use ABC with@abstractmethoddecorators
Include complete type annotations for all class attributes in Python classes
Useimport loggingand module logger pattern with standard log levels: debug, info, warning, error
Files:
src/utils/suid.pysrc/utils/llama_stack_version.pysrc/metrics/utils.pysrc/utils/transcripts.pysrc/utils/token_counter.pysrc/app/endpoints/conversations_v3.pysrc/constants.pysrc/utils/types.pysrc/utils/endpoints.pysrc/app/routers.pysrc/models/responses.pysrc/models/requests.pysrc/app/endpoints/query_v2.pysrc/app/endpoints/streaming_query.pysrc/app/endpoints/streaming_query_v2.pysrc/app/endpoints/query.py
tests/{unit,integration}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests; do not use unittest framework
Unit tests must achieve 60% code coverage; integration tests must achieve 10% coverage
Files:
tests/unit/app/endpoints/test_conversations_v2.pytests/unit/utils/test_endpoints.pytests/unit/models/responses/test_successful_responses.pytests/unit/models/responses/test_error_responses.pytests/unit/app/endpoints/test_streaming_query.pytests/unit/models/responses/test_rag_chunk.pytests/unit/app/endpoints/test_query_v2.pytests/unit/app/endpoints/test_streaming_query_v2.pytests/unit/app/test_routers.pytests/integration/test_openapi_json.pytests/unit/utils/test_transcripts.pytests/integration/endpoints/test_query_v2_integration.pytests/unit/models/responses/test_query_response.pytests/unit/app/endpoints/test_query.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use
pytest-mockwith AsyncMock objects for mocking in tests
Files:
tests/unit/app/endpoints/test_conversations_v2.pytests/unit/utils/test_endpoints.pytests/unit/models/responses/test_successful_responses.pytests/unit/models/responses/test_error_responses.pytests/unit/app/endpoints/test_streaming_query.pytests/unit/models/responses/test_rag_chunk.pytests/unit/app/endpoints/test_query_v2.pytests/unit/app/endpoints/test_streaming_query_v2.pytests/unit/app/test_routers.pytests/integration/test_openapi_json.pytests/unit/utils/test_transcripts.pytests/integration/endpoints/test_query_v2_integration.pytests/unit/models/responses/test_query_response.pytests/unit/app/endpoints/test_query.py
pyproject.toml
📄 CodeRabbit inference engine (CLAUDE.md)
pyproject.toml: Configure pylint withsource-roots = "src"
Excludesrc/auth/k8s.pyfrom pyright type checking
Files:
pyproject.toml
src/app/endpoints/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use FastAPI
HTTPExceptionwith appropriate status codes for API endpoint error handling
Files:
src/app/endpoints/conversations_v3.pysrc/app/endpoints/query_v2.pysrc/app/endpoints/streaming_query.pysrc/app/endpoints/streaming_query_v2.pysrc/app/endpoints/query.py
src/**/{client,app/endpoints/**}.py
📄 CodeRabbit inference engine (CLAUDE.md)
Handle
APIConnectionErrorfrom Llama Stack in integration code
Files:
src/app/endpoints/conversations_v3.pysrc/app/endpoints/query_v2.pysrc/app/endpoints/streaming_query.pysrc/app/endpoints/streaming_query_v2.pysrc/app/endpoints/query.py
src/**/constants.py
📄 CodeRabbit inference engine (CLAUDE.md)
Define shared constants in central
constants.pyfile with descriptive comments
Files:
src/constants.py
src/models/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/models/**/*.py: Use@field_validatorand@model_validatorfor custom validation in Pydantic models
Pydantic configuration classes must extendConfigurationBase; data models must extendBaseModel
Files:
src/models/responses.pysrc/models/requests.py
🧠 Learnings (7)
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to pyproject.toml : Exclude `src/auth/k8s.py` from pyright type checking
Applied to files:
pyproject.toml
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to tests/**/*.py : Use `pytest-mock` with AsyncMock objects for mocking in tests
Applied to files:
tests/unit/utils/test_endpoints.pytests/unit/app/endpoints/test_streaming_query.pytests/unit/app/endpoints/test_query.py
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to tests/{unit,integration}/**/*.py : Use pytest for all unit and integration tests; do not use unittest framework
Applied to files:
tests/unit/utils/test_endpoints.py
📚 Learning: 2025-08-06T06:02:21.060Z
Learnt from: eranco74
Repo: lightspeed-core/lightspeed-stack PR: 348
File: src/utils/endpoints.py:91-94
Timestamp: 2025-08-06T06:02:21.060Z
Learning: The direct assignment to `agent._agent_id` in `src/utils/endpoints.py` is a necessary workaround for the missing agent rehydration feature in the LLS client SDK. This allows preserving conversation IDs when handling existing agents.
Applied to files:
tests/unit/utils/test_endpoints.pysrc/utils/endpoints.py
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to src/**/{client,app/endpoints/**}.py : Handle `APIConnectionError` from Llama Stack in integration code
Applied to files:
src/metrics/utils.pysrc/app/endpoints/streaming_query.pytests/unit/app/endpoints/test_query.pysrc/app/endpoints/query.py
📚 Learning: 2025-08-18T10:56:55.349Z
Learnt from: matysek
Repo: lightspeed-core/lightspeed-stack PR: 292
File: pyproject.toml:0-0
Timestamp: 2025-08-18T10:56:55.349Z
Learning: The lightspeed-stack project intentionally uses a "generic image" approach, bundling many dependencies directly in the base runtime image to work for everyone, rather than using lean base images with optional dependency groups.
Applied to files:
test.containerfile
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to src/app/endpoints/**/*.py : Use FastAPI `HTTPException` with appropriate status codes for API endpoint error handling
Applied to files:
tests/unit/app/endpoints/test_query.py
🧬 Code graph analysis (16)
tests/unit/app/endpoints/test_conversations_v2.py (4)
tests/unit/app/endpoints/test_tools.py (1)
mock_configuration(28-47)src/cache/postgres_cache.py (1)
delete(312-347)src/cache/sqlite_cache.py (1)
delete(310-343)src/app/endpoints/conversations_v2.py (1)
delete_conversation_endpoint_handler(145-168)
tests/unit/models/responses/test_successful_responses.py (2)
src/models/responses.py (5)
StreamingQueryResponse(454-519)openapi_response(47-60)openapi_response(458-480)openapi_response(858-882)openapi_response(1187-1210)src/utils/types.py (2)
ToolCallSummary(102-110)ToolResultSummary(113-124)
src/utils/endpoints.py (3)
src/models/responses.py (2)
model(1624-1628)NotFoundResponse(1431-1486)src/utils/types.py (2)
GraniteToolParser(60-99)get_parser(82-99)src/utils/suid.py (1)
get_suid(6-16)
tests/unit/models/responses/test_error_responses.py (1)
src/models/responses.py (7)
PromptTooLongResponse(1489-1518)AbstractErrorResponse(1157-1210)DetailModel(1150-1154)openapi_response(47-60)openapi_response(458-480)openapi_response(858-882)openapi_response(1187-1210)
tests/unit/app/endpoints/test_streaming_query.py (3)
src/models/responses.py (2)
ReferencedDocument(328-338)model(1624-1628)src/models/requests.py (1)
QueryRequest(73-233)src/app/endpoints/streaming_query.py (2)
streaming_query_endpoint_handler(961-995)stream_end_event(148-197)
tests/unit/models/responses/test_rag_chunk.py (1)
src/utils/types.py (1)
RAGChunk(127-132)
tests/unit/app/endpoints/test_query_v2.py (2)
src/app/endpoints/streaming_query.py (1)
retrieve_response(998-1119)src/app/endpoints/streaming_query_v2.py (1)
retrieve_response(364-461)
tests/unit/app/endpoints/test_streaming_query_v2.py (4)
src/app/endpoints/streaming_query_v2.py (1)
streaming_query_endpoint_handler_v2(327-361)tests/unit/app/endpoints/test_query.py (1)
dummy_request(59-68)tests/unit/app/endpoints/test_query_v2.py (1)
dummy_request(31-34)src/models/responses.py (1)
model(1624-1628)
src/app/routers.py (1)
tests/unit/app/test_routers.py (1)
include_router(36-51)
tests/unit/utils/test_transcripts.py (1)
src/utils/types.py (3)
ToolCallSummary(102-110)ToolResultSummary(113-124)TurnSummary(135-220)
src/models/responses.py (1)
src/utils/types.py (2)
ToolCallSummary(102-110)ToolResultSummary(113-124)
src/app/endpoints/query_v2.py (4)
src/utils/suid.py (2)
normalize_conversation_id(101-122)to_llama_stack_conversation_id(125-145)src/utils/responses.py (1)
extract_text_from_response_output_item(6-56)src/utils/types.py (2)
ToolCallSummary(102-110)ToolResultSummary(113-124)src/models/responses.py (2)
conversation(1383-1392)ReferencedDocument(328-338)
src/app/endpoints/streaming_query.py (2)
src/app/endpoints/query.py (1)
parse_referenced_documents(605-632)src/utils/types.py (1)
content_to_str(20-39)
tests/unit/models/responses/test_query_response.py (2)
src/models/responses.py (2)
QueryResponse(341-451)ReferencedDocument(328-338)src/utils/types.py (2)
ToolCallSummary(102-110)ToolResultSummary(113-124)
tests/unit/app/endpoints/test_query.py (2)
tests/unit/app/endpoints/test_streaming_query.py (2)
ToolExecutionStep(128-133)ToolResponse(136-141)src/utils/types.py (1)
TurnSummary(135-220)
src/app/endpoints/query.py (2)
src/models/responses.py (7)
PromptTooLongResponse(1489-1518)openapi_response(47-60)openapi_response(458-480)openapi_response(858-882)openapi_response(1187-1210)QuotaExceededResponse(1563-1643)model(1624-1628)src/utils/types.py (2)
TurnSummary(135-220)content_to_str(20-39)
🪛 markdownlint-cli2 (0.18.1)
docs/conversations_api.md
16-16: Unordered list indentation
Expected: 2; Actual: 3
(MD007, ul-indent)
17-17: Unordered list indentation
Expected: 2; Actual: 3
(MD007, ul-indent)
18-18: Unordered list indentation
Expected: 2; Actual: 3
(MD007, ul-indent)
20-20: Unordered list indentation
Expected: 2; Actual: 3
(MD007, ul-indent)
21-21: Unordered list indentation
Expected: 2; Actual: 3
(MD007, ul-indent)
22-22: Unordered list indentation
Expected: 2; Actual: 3
(MD007, ul-indent)
24-24: Unordered list indentation
Expected: 2; Actual: 3
(MD007, ul-indent)
25-25: Unordered list indentation
Expected: 2; Actual: 3
(MD007, ul-indent)
26-26: Unordered list indentation
Expected: 2; Actual: 3
(MD007, ul-indent)
27-27: Unordered list indentation
Expected: 2; Actual: 3
(MD007, ul-indent)
58-58: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
63-63: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
77-77: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
234-234: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
369-369: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
469-469: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build-pr
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
- GitHub Check: e2e_tests (ci)
| "content": { | ||
| "title": "Content", | ||
| "description": "Content/result returned from the tool" | ||
| }, | ||
| "type": { | ||
| "type": "string", | ||
| "title": "Type", | ||
| "description": "Type indicator for tool result", | ||
| "default": "tool_result" | ||
| }, | ||
| "round": { | ||
| "type": "integer", | ||
| "title": "Round", | ||
| "description": "Round number or step of tool execution" | ||
| } | ||
| }, | ||
| "type": "object", | ||
| "required": [ | ||
| "tool_name", | ||
| "arguments" | ||
| "id", | ||
| "status", | ||
| "content", | ||
| "round" | ||
| ], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ToolResultSummary.content lacks type; validation is ambiguous
content is required but unconstrained. Define an explicit union to reflect “any JSON value”.
- "content": {
- "title": "Content",
- "description": "Content/result returned from the tool"
- },
+ "content": {
+ "anyOf": [
+ { "type": "string" },
+ { "type": "number" },
+ { "type": "boolean" },
+ { "type": "object", "additionalProperties": true },
+ { "type": "array", "items": {} },
+ { "type": "null" }
+ ],
+ "title": "Content",
+ "description": "Content/result returned from the tool"
+ },🤖 Prompt for AI Agents
In docs/openapi.json around lines 7035-7057, the ToolResultSummary.content
property is required but has no type, making validation ambiguous; replace the
unconstrained content schema with an explicit union that accepts any JSON value
(object with additionalProperties: true, array, string, number, boolean — and
nullable if null is allowed) so validators and generators know content can be
any JSON value, and keep content listed in required.
11d5075 to
d4bda49
Compare
89d02ab to
e00fca7
Compare
Implements the function parse_referenced_documents_from_responses_api checking at the Response API output at: - file_search_call objects (filename and attributes) - annotations within messages content (type, url, title) - 2 type of annoations, url_citation and file_citation
7fcd1be to
80cfa20
Compare
80cfa20 to
6d3c0db
Compare
ed0159e to
af81305
Compare
af81305 to
f137d04
Compare
f137d04 to
93dccdc
Compare
91ceee4 to
a458bd2
Compare
a458bd2 to
7f07d7a
Compare
Description
Bump to llama stack version 0.3.0. Adds query and conversation endpoints using responses API that are compatible with OLS endpoints.
Type of change
Tools used to create PR
Identify any AI code assistants used in this PR (for transparency and review context)
Related Tickets & Documents
Checklist before requesting a review
Testing
Summary by CodeRabbit
New Features
Bug Fixes
Documentation
Chores
✏️ Tip: You can customize this high-level summary in your review settings.