Add tool calls to stored transcripts #404
Conversation
Walkthrough

Refactors sync and streaming query endpoints to produce and persist structured TurnSummary objects (llm_response + tool_calls) instead of raw LLM strings; moves transcript path/storage logic into utils.transcripts.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant QueryHandler as /query
    participant Agent
    participant Transcripts as utils.transcripts
    Client->>QueryHandler: POST /query (QueryRequest)
    QueryHandler->>Agent: create_turn(...)
    Agent-->>QueryHandler: Turn (output_message, steps)
    QueryHandler->>QueryHandler: Build TurnSummary from output_message
    loop each step
        QueryHandler->>QueryHandler: append_tool_calls_from_llama(step.tool_execution)
    end
    QueryHandler->>Transcripts: store_transcript(..., summary, ...)
    QueryHandler-->>Client: { response: summary.llm_response }
```
```mermaid
sequenceDiagram
    participant Client
    participant StreamHandler as /streaming_query
    participant AgentStream as Agent(stream)
    participant Transcripts as utils.transcripts
    Client->>StreamHandler: GET (SSE / streaming)
    StreamHandler->>AgentStream: create_turn_stream(...)
    AgentStream-->>StreamHandler: AsyncIterator[AgentTurnResponseStreamChunk]
    loop stream chunks
        StreamHandler->>StreamHandler: update TurnSummary (tokens/tool_calls)
        StreamHandler-->>Client: SSE events (partial tokens / tool call events)
    end
    StreamHandler->>Transcripts: store_transcript(..., summary, ...)
    StreamHandler-->>Client: stream closed
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Actionable comments posted: 2
🧹 Nitpick comments (15)
tests/unit/utils/test_transcripts.py (2)
50-97: store_transcript test is on-point; consider adding attachment serialization coverage

This verifies metadata, llm_response, tool_calls, and flags well. One gap: attachments are asserted as an empty list. Since store_transcript serializes attachments via model_dump(), a test including a non-empty attachments list would harden coverage against regressions in serialization.
I can provide a follow-up test variant that includes an Attachment and asserts its serialized shape if you want.
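A minimal sketch of what that follow-up test variant could check. The `Attachment` class below is a hypothetical stand-in for the project's pydantic model, and `serialize_attachments` mirrors the `model_dump()` path described above; the real test would import both from the project instead.

```python
import json


class Attachment:
    """Hypothetical stand-in for the project's pydantic Attachment model."""

    def __init__(self, attachment_type: str, content_type: str, content: str):
        self.attachment_type = attachment_type
        self.content_type = content_type
        self.content = content

    def model_dump(self) -> dict:
        return {
            "attachment_type": self.attachment_type,
            "content_type": self.content_type,
            "content": self.content,
        }


def serialize_attachments(attachments: list) -> list:
    # Mirrors the model_dump() serialization path used by store_transcript.
    return [a.model_dump() for a in attachments]


att = Attachment("log", "text/plain", "line 1\nline 2")
# Round-trip through JSON, as store_transcript does when writing to disk.
stored = json.loads(json.dumps({"attachments": serialize_attachments([att])}))
assert stored["attachments"] == [att.model_dump()]
```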
59-66: Consider asserting UTC timestamp format and attachment model_dump check

Minor enhancements:
- Validate that metadata.timestamp ends with "Z" or contains "+00:00" (UTC ISO 8601).
- Add a fixture with one Attachment and assert attachments contains its model_dump, ensuring the serialization path doesn’t regress.
I can supply a concrete test patch if you want to include attachments and a stricter timestamp assertion.
Also applies to: 100-127
tests/unit/app/endpoints/test_streaming_query.py (1)
295-302: Make the streamed content join robust to bytes vs str

Depending on Starlette/FastAPI versions, StreamingResponse.body_iterator may yield bytes. Joining with "" may fail. Join as bytes and decode to avoid brittle behavior.
Apply this diff:
```diff
-    # Convert to string for assertions
-    full_content = "".join(streaming_content)
+    # Convert to string for assertions (handle bytes or str chunks)
+    full_content = (
+        b"".join(
+            c if isinstance(c, (bytes, bytearray)) else c.encode("utf-8")
+            for c in streaming_content
+        )
+    ).decode("utf-8")
```

src/utils/types.py (1)
10-10: Prefer importing BaseModel from pydantic’s public API

Importing from pydantic.main is an internal path and can change. Use the public import to avoid breakage.
Apply this diff:
```diff
-from pydantic.main import BaseModel
+from pydantic import BaseModel
```

src/app/endpoints/streaming_query.py (4)
52-52: Make metadata extraction regex non-greedy and more tolerant of whitespace

The current pattern r"\nMetadata: ({.+})\n" is greedy and assumes a trailing newline, which can over-capture or miss matches. A non-greedy pattern tolerant of optional whitespace and EOF improves resilience.
Apply this diff:
```diff
-METADATA_PATTERN = re.compile(r"\nMetadata: (\{.+})\n")
+METADATA_PATTERN = re.compile(r"\nMetadata:\s*(\{.*?\})\s*(?:\n|$)")
```
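A quick illustration of the difference between the two patterns. The pattern strings are copied from the diff above; the sample chunks are made up:

```python
import re

# Original greedy pattern and the proposed non-greedy, whitespace-tolerant one.
GREEDY = re.compile(r"\nMetadata: (\{.+})\n")
TOLERANT = re.compile(r"\nMetadata:\s*(\{.*?\})\s*(?:\n|$)")

# Metadata line at end-of-string (no trailing newline): the greedy pattern
# requires the trailing "\n" and misses the match entirely.
text = '\nMetadata: {"docs": 2}'
assert GREEDY.search(text) is None
assert TOLERANT.search(text).group(1) == '{"docs": 2}'

# Lazy capture stops at the first closing brace instead of over-capturing.
text2 = '\nMetadata: {"a": 1}\n\nMetadata: {"b": 2}\n'
assert TOLERANT.search(text2).group(1) == '{"a": 1}'
```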
143-152: Guard against missing error.message in error chunks

Accessing chunk.error["message"] assumes a dict with a "message" key. Fall back gracefully to the stringified error if the structure differs.
Apply this diff:
```diff
-def _handle_error_event(chunk: Any, chunk_id: int) -> Iterator[str]:
-    yield format_stream_data(
+def _handle_error_event(chunk: Any, chunk_id: int) -> Iterator[str]:
+    message = (
+        chunk.error["message"]
+        if isinstance(getattr(chunk, "error", None), dict) and "message" in chunk.error
+        else str(getattr(chunk, "error", "Unknown error"))
+    )
+    yield format_stream_data(
         {
             "event": "error",
             "data": {
                 "id": chunk_id,
-                "token": chunk.error["message"],
+                "token": message,
             },
         }
     )
```
499-499: Set correct SSE media type to ensure proper client handling

StreamingResponse should specify media_type="text/event-stream" for SSE. Some clients rely on this header for correct parsing.
Apply this diff:
```diff
-    return StreamingResponse(response_generator(response))
+    return StreamingResponse(
+        response_generator(response),
+        media_type="text/event-stream",
+    )
```
425-441: Persisting conversation details after response setup is fine; consider failure scenarios

Current order persists user conversation details after preparing the generator. If an exception occurs before first iteration, persistence still succeeds. If you want persistence only after the first successful stream token, move the persist call into response_generator after emitting "start". Optional.
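On the broader question of doing persistence work inside a streaming generator, a `try/finally` is one way to make sure it survives an early client disconnect, since closing an async generator raises at the suspended `yield`. A minimal sketch with stand-in objects (`store_transcript` and the summary dict here are hypothetical simplifications, not the project's real types):

```python
import asyncio

stored = []


def store_transcript(summary):
    # Stand-in for utils.transcripts.store_transcript.
    stored.append(summary)


async def response_generator(chunks):
    summary = {"llm_response": "", "tool_calls": []}
    try:
        for chunk in chunks:
            summary["llm_response"] += chunk
            yield chunk
    finally:
        # Runs on normal completion AND when the generator is closed early
        # (e.g. the SSE client disconnected mid-stream).
        store_transcript(summary)


async def main():
    gen = response_generator(["a", "b", "c"])
    await gen.__anext__()  # client reads one chunk...
    await gen.aclose()     # ...then disconnects

asyncio.run(main())
assert stored == [{"llm_response": "a", "tool_calls": []}]
```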
tests/unit/app/endpoints/test_query.py (1)
11-13: Import path consistency for TextContentItem

Elsewhere (streaming tests), TextContentItem is imported from llama_stack_client.types.shared.interleaved_content_item. Here it’s imported from llama_stack_client.types.shared.interleaved_content. Both may work in current versions, but consider unifying the import path across tests to avoid surprises with upstream package changes.
src/utils/transcripts.py (3)
55-57: Docstring/type mismatch for rag_chunks

Docstring mentions `RagChunk` objects but the type hint is `list[str]`. Align the docs with the current type or introduce the model type if available.

Apply this docstring tweak:

```diff
-        rag_chunks: The list of `RagChunk` objects.
+        rag_chunks: The list of retrieved chunk strings.
```
72-73: Confirm “redacted_query” is actually redacted

The key name implies the text is scrubbed, but `query` is passed through directly. If redaction is not happening upstream, we should either rename the field to avoid a false sense of safety or actually redact here.

Option A (preferable): introduce a simple redaction helper and use it here:

```diff
-        "redacted_query": query,
+        "redacted_query": redact_text_for_transcripts(query),
```

Example helper you can place in this module:

```python
def redact_text_for_transcripts(text: str) -> str:
    # Very basic examples; we can expand with configuration-driven patterns:
    # - Strip bearer tokens
    # - Mask API keys and UUIDs
    # - Remove email addresses and probable secrets
    import re

    text = re.sub(r'Bearer\s+[A-Za-z0-9\-\._~\+\/]+=*', 'Bearer [REDACTED]', text)
    text = re.sub(r'(?i)api[_-]?key\s*[:=]\s*[^\s,;"]+', 'api_key=[REDACTED]', text)
    text = re.sub(
        r'[A-F0-9]{8}-[A-F0-9]{4}-4[A-F0-9]{3}-[89AB][A-F0-9]{3}-[A-F0-9]{12}',
        '[UUID]',
        text,
    )
    text = re.sub(r'[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+', '[EMAIL]', text)
    return text
```

Option B: rename the field to `"query"` if we don't intend to redact at this stage (requires test updates).

I can implement a configurable redaction pipeline (patterns + toggles) if desired.
84-85: Improve JSON readability and Unicode handling (optional)

For local inspection, pretty-print and preserve Unicode characters.

Apply this diff:

```diff
-        json.dump(data_to_store, transcript_file)
+        json.dump(data_to_store, transcript_file, ensure_ascii=False, indent=2)
```
206-218: Don’t fail the request if transcript persistence errors

Disk issues shouldn’t break user-facing responses. Log and continue.

Apply this diff:

```diff
-    store_transcript(
-        user_id=user_id,
-        conversation_id=conversation_id,
-        model_id=model_id,
-        provider_id=provider_id,
-        query_is_valid=True,  # TODO(lucasagomes): implement as part of query validation
-        query=query_request.query,
-        query_request=query_request,
-        summary=summary,
-        rag_chunks=[],  # TODO(lucasagomes): implement rag_chunks
-        truncated=False,  # TODO(lucasagomes): implement truncation as part of quota work
-        attachments=query_request.attachments or [],
-    )
+    try:
+        store_transcript(
+            user_id=user_id,
+            conversation_id=conversation_id,
+            model_id=model_id,
+            provider_id=provider_id,
+            query_is_valid=True,  # TODO: implement as part of query validation
+            query=query_request.query,
+            query_request=query_request,
+            summary=summary,
+            rag_chunks=[],  # TODO: implement rag_chunks
+            truncated=False,  # TODO: implement truncation as part of quota work
+            attachments=query_request.attachments or [],
+        )
+    except Exception as e:
+        logger.warning("Failed to store transcript: %s", e)
```
410-413: Guard against missing output_message to avoid attribute errors

Depending on Llama Stack behavior, `output_message` might be absent in rare error/violation-only scenarios. A small guard makes this robust.

Apply this diff:

```diff
-    summary = TurnSummary(
-        llm_response=interleaved_content_as_str(response.output_message.content),
-        tool_calls=[],
-    )
+    output_message = getattr(response, "output_message", None)
+    llm_text = (
+        interleaved_content_as_str(output_message.content)
+        if output_message and getattr(output_message, "content", None) is not None
+        else ""
+    )
+    summary = TurnSummary(llm_response=llm_text, tool_calls=[])
```

Please confirm whether `output_message` is guaranteed on successful turns by the client library you’re targeting. If guaranteed, this can be skipped.
160-161: Reduce log verbosity for Llama stack configuration

Configuration dumps can be noisy and sometimes sensitive. Consider moving to debug.

Apply this diff:

```diff
-    logger.info("LLama stack config: %s", llama_stack_config)
+    logger.debug("Llama stack config: %s", llama_stack_config)
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (7)
- src/app/endpoints/query.py (7 hunks)
- src/app/endpoints/streaming_query.py (6 hunks)
- src/utils/transcripts.py (1 hunks)
- src/utils/types.py (2 hunks)
- tests/unit/app/endpoints/test_query.py (23 hunks)
- tests/unit/app/endpoints/test_streaming_query.py (3 hunks)
- tests/unit/utils/test_transcripts.py (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (6)
tests/unit/utils/test_transcripts.py (4)
- src/configuration.py (3): configuration (55-60), AppConfig (27-129), init_from_dict (50-52)
- src/models/requests.py (1): QueryRequest (70-220)
- src/utils/transcripts.py (2): construct_transcripts_path (21-30), store_transcript (33-86)
- src/utils/types.py (2): ToolCallSummary (43-56), TurnSummary (59-78)

src/utils/transcripts.py (4)
- src/configuration.py (2): configuration (55-60), user_data_collection_configuration (79-84)
- src/models/requests.py (2): Attachment (15-67), QueryRequest (70-220)
- src/utils/suid.py (1): get_suid (6-12)
- src/utils/types.py (1): TurnSummary (59-78)

tests/unit/app/endpoints/test_streaming_query.py (1)
- src/utils/types.py (2): ToolCallSummary (43-56), TurnSummary (59-78)

src/app/endpoints/streaming_query.py (2)
- src/utils/transcripts.py (1): store_transcript (33-86)
- src/utils/types.py (2): TurnSummary (59-78), append_tool_calls_from_llama (65-78)

src/app/endpoints/query.py (4)
- src/utils/transcripts.py (1): store_transcript (33-86)
- src/utils/types.py (2): TurnSummary (59-78), append_tool_calls_from_llama (65-78)
- src/app/endpoints/streaming_query.py (1): retrieve_response (514-603)
- src/models/responses.py (1): QueryResponse (48-79)

tests/unit/app/endpoints/test_query.py (2)
- src/utils/types.py (2): ToolCallSummary (43-56), TurnSummary (59-78)
- src/app/endpoints/query.py (1): retrieve_response (323-424)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: build-pr
- GitHub Check: e2e_tests
🔇 Additional comments (19)
tests/unit/utils/test_transcripts.py (1)
13-47: Good coverage and correct sanitization verification for construct_transcripts_path

The test correctly initializes AppConfig, patches the module-level configuration, and asserts the sanitized, normalized path. Looks solid.
tests/unit/app/endpoints/test_streaming_query.py (3)
317-341: TurnSummary expectation matches implementation — nice

Asserting store_transcript with a structured TurnSummary including tool_calls aligns with utils.types.TurnSummary.append_tool_calls_from_llama. This ensures tool call data is persisted, not just the LLM text.
317-341: TurnSummary expectation is precise — mirrors append_tool_calls_from_llama behavior

The asserted tool_calls entry matches how interleaved content is concatenated and indexed by call_id. This is the right test surface for regression safety.
90-116: Setup for configuration-not-loaded and connection error tests is adequate

The negative-path tests cover configuration absence and connection failures and assert correct HTTPException handling. Looks good.
Also applies to: 118-173
src/utils/types.py (2)
65-78: append_tool_calls_from_llama matches expected llama structures

Mapping tool_calls/tool_responses by call_id and interleaving content into a string is the right approach for consistent storage. No issues found.
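As a rough illustration of the call_id mapping described here, assuming simplified dict shapes rather than the real llama-stack types (field names below are illustrative):

```python
def pair_tool_calls(tool_calls, tool_responses):
    # Index responses by call_id so each call can be joined with its result,
    # mirroring the pairing approach the review describes.
    responses_by_id = {r["call_id"]: r for r in tool_responses}
    summaries = []
    for call in tool_calls:
        resp = responses_by_id.get(call["call_id"], {})
        summaries.append(
            {
                "id": call["call_id"],
                "name": call["tool_name"],
                "args": call.get("arguments", {}),
                # Interleaved content chunks are concatenated into one string.
                "response": "".join(resp.get("content", [])),
            }
        )
    return summaries


calls = [{"call_id": "c1", "tool_name": "knowledge_search", "arguments": {"q": "x"}}]
responses = [{"call_id": "c1", "content": ["chunk-1 ", "chunk-2"]}]
assert pair_tool_calls(calls, responses)[0]["response"] == "chunk-1 chunk-2"
```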
43-57: Models are minimal and future-proof for transcripts

The ToolCallSummary fields and union-typed args accommodate current and future tool call shapes. Looks good.
src/app/endpoints/streaming_query.py (5)
443-487: Summary aggregation inside response_generator is correct and minimal

Capturing llm_response on turn_complete and appending tool calls on tool_execution step_complete aligns with TurnSummary semantics and test expectations. Good separation from SSE emission.
514-603: Typed streaming iterator casting is appropriate

Returning an AsyncIterator[AgentTurnResponseStreamChunk] and casting the agent.create_turn result clarifies usage and improves type safety in the streaming loop.
260-271: Tool call delta handling covers both str and ToolCall scenarios

This ensures tool call progress events are surfaced in SSE. No changes needed.
170-181: turn_complete SSE token composition is correct

Using interleaved_content_as_str against output_message.content avoids downstream parsing churn. No issues here.
360-373: Knowledge search metadata parsing is pragmatic; small regex/parse improvements already suggested

The logic to extract and summarize knowledge_search responses and collect referenced documents works. With the regex fix above, this stays robust.
tests/unit/app/endpoints/test_query.py (3)
127-137: Using TurnSummary/ToolCallSummary in tests is aligned with the new contract

Good refactor: tests now validate structured summaries end-to-end, reducing brittleness compared to raw strings.
423-437: End-to-end tests faithfully validate model/toolgroup selection and MCP headers

These unit tests are thorough in validating query flow: shields/mcp/vector db selection, extra_headers encoding, and toolgroups assembly. The expectations match the query endpoint’s retrieve_response logic.
Also applies to: 461-473, 510-523, 616-633, 672-689, 726-747, 781-821, 850-876, 921-971, 1221-1243, 1286-1313
162-184: Transcript assertion updated to summary looks correct

Switching the store_transcript expectation to pass summary preserves the intent of previous tests and aligns with the new signature.
src/utils/transcripts.py (1)
23-27: Path traversal hardening looks good

The normpath + lstrip pattern effectively constrains user-provided IDs to safe path components. Good call-out in the comment as well.
src/app/endpoints/query.py (4)
193-200: Good shift to TurnSummary flow

Switching retrieve_response to return a TurnSummary keeps API compatibility while enabling richer transcript storage. Clean handoff.
227-230: API response remains stable while leveraging TurnSummary

Returning `summary.llm_response` maintains the existing contract. Nice separation of API vs. persistence concerns.
421-423: Tool call aggregation via TurnSummary is clean and localized

Collecting tool calls only when step_type is "tool_execution" and appending to the summary keeps the core logic tidy.
323-330: Signature change ripple-check — OK

Quick check ran across the repo: there are two distinct retrieve_response definitions and their call sites have been adapted to each variant.
Definitions found
- src/app/endpoints/query.py:323 — retrieve_response (used in non-streaming flow)
- src/app/endpoints/streaming_query.py:514 — retrieve_response (streaming flow)
Representative call sites (already adapted)
- src/app/endpoints/query.py:193 — summary, conversation_id = await retrieve_response(...)
- src/app/endpoints/streaming_query.py:434 — response, conversation_id = await retrieve_response(...)
Tests updated accordingly:
- tests/unit/app/endpoints/test_query.py — many places unpack as (summary, conversation_id)
- tests/unit/app/endpoints/test_streaming_query.py — many places unpack as (response, conversation_id)
No lingering callers in query.py expecting (str, str) were found; streaming_query.py provides its own variant and its callers/tests expect the streaming return shape. No changes required here.
```python
        configuration.user_data_collection_configuration.transcripts_storage or ""
    )
    return Path(file_path, uid, cid)
```
💡 Verification agent
🧩 Analysis chain
Expand configured base path and handle user home (~) safely
If transcripts_storage uses ~ or relative paths, expanding to an absolute path avoids surprises. Also worth clarifying the empty-string fallback behavior.
Apply this diff to expand the base path before composing:
```diff
-    file_path = (
-        configuration.user_data_collection_configuration.transcripts_storage or ""
-    )
-    return Path(file_path, uid, cid)
+    base = (
+        configuration.user_data_collection_configuration.transcripts_storage or ""
+    )
+    base_path = Path(base).expanduser()
+    return base_path / uid / cid
```

If the empty fallback is intentional, consider documenting it (e.g., transcripts stored relative to CWD), or set a default directory (e.g., "./transcripts") in configuration. Do you want me to add that default?
You can also check where transcripts_storage is set or defaulted:
🏁 Script executed:
```bash
#!/bin/bash
rg -nP 'transcripts_storage' -C3
```

Length of output: 9905
🏁 Script executed:
```bash
#!/bin/bash
set -euo pipefail
echo "=== src/utils/transcripts.py ==="
sed -n '1,240p' src/utils/transcripts.py || true
echo "=== src/models/config.py ==="
sed -n '1,340p' src/models/config.py || true
echo "=== tests/unit/utils/test_transcripts.py ==="
sed -n '1,240p' tests/unit/utils/test_transcripts.py || true
```

Length of output: 17876
Expand configured base path and handle user home (~) safely
Quick check: src/models/config.py enforces that transcripts_storage is set when transcripts_enabled is true; examples/docs use absolute paths (e.g. /tmp/data/transcripts). Changing construct_transcripts_path to expand ~ is safe and won't break existing tests that use absolute paths.
Files to update:
- src/utils/transcripts.py — construct_transcripts_path (around lines 25–30)
Apply this diff:
```diff
-    file_path = (
-        configuration.user_data_collection_configuration.transcripts_storage or ""
-    )
-    return Path(file_path, uid, cid)
+    base = (
+        configuration.user_data_collection_configuration.transcripts_storage or ""
+    )
+    base_path = Path(base).expanduser()
+    return base_path / uid / cid
```

If the empty-string fallback is intentional (transcripts stored relative to CWD), consider documenting that or setting a default (e.g., "./transcripts"). Do you want me to add a default or documentation note?
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
-        configuration.user_data_collection_configuration.transcripts_storage or ""
-    )
-    return Path(file_path, uid, cid)
+    base = (
+        configuration.user_data_collection_configuration.transcripts_storage or ""
+    )
+    base_path = Path(base).expanduser()
+    return base_path / uid / cid
```
🤖 Prompt for AI Agents
In src/utils/transcripts.py around lines 25–30, the construct_transcripts_path
currently uses the raw configuration value and can leave a leading '~'
unexpanded; update it to safely expand user home before building the Path.
Replace using the raw string with an expanded base path (e.g.,
Path(configuration.user_data_collection_configuration.transcripts_storage or
"").expanduser() or use os.path.expanduser on the string) and then join uid and
cid (return Path(expanded_base, uid, cid)). Keep the current empty-string
fallback behavior unless you want to change it to a specific default like
"./transcripts" — if you prefer that, substitute the fallback accordingly.
```python
    transcripts_path = construct_transcripts_path(user_id, conversation_id)
    transcripts_path.mkdir(parents=True, exist_ok=True)
```
Harden on-disk permissions for transcripts (PII/secret safety)
Transcripts can contain sensitive data. Use restrictive permissions for the directory and files to avoid accidental leakage on multi-user hosts.
Apply this diff:
```diff
-    transcripts_path.mkdir(parents=True, exist_ok=True)
+    transcripts_path.mkdir(parents=True, exist_ok=True, mode=0o700)
@@
     with open(transcript_file_path, "w", encoding="utf-8") as transcript_file:
         json.dump(data_to_store, transcript_file)
+    try:
+        os.chmod(transcript_file_path, 0o600)
+    except Exception:  # best-effort on non-POSIX systems
+        logger.debug("Unable to set restrictive permissions on %s", transcript_file_path)
```

Also applies to: 83-86
🤖 Prompt for AI Agents
In src/utils/transcripts.py around lines 59-61 (and similarly lines 83-86), the
code creates transcript directories/files with default permissions; change
creation to enforce restrictive on-disk permissions: after ensuring the
directory exists, explicitly set its mode to 0o700 (owner read/write/execute)
using os.chmod (don’t rely solely on mkdir(mode=...) because umask can alter
it). For transcript files, create/write them and then set their mode to 0o600
(owner read/write) using os.chmod, or create them with os.open and the mode flag
to guarantee permissions atomically; ensure errors are handled or logged if
chmod fails.
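A sketch of the `os.open` variant mentioned above, which applies the mode at creation time rather than chmod-after-write. The path is temporary and illustrative, and the final mode is still subject to the process umask:

```python
import json
import os
import stat
import tempfile

data = {"llm_response": "hi", "tool_calls": []}
path = os.path.join(tempfile.mkdtemp(), "transcript.json")

# O_CREAT with an explicit mode applies owner-only permissions when the file
# is created; O_EXCL ensures we never reuse an existing (possibly more
# permissive) file.
fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
with os.fdopen(fd, "w", encoding="utf-8") as f:
    json.dump(data, f)

# No group/other access bits should be set (POSIX semantics assumed).
mode = stat.S_IMODE(os.stat(path).st_mode)
assert mode & 0o077 == 0
```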
tisnik
left a comment
LGTM, thank you
umago
left a comment
Looks pretty good! Thank you!
f9ad22d to
951a0ca
Compare
Actionable comments posted: 1
🔭 Outside diff range comments (2)
src/app/endpoints/streaming_query.py (1)
539-539: Avoid logging full Llama stack configuration (leaks secrets such as API keys)

Line 539 logs the entire configuration at INFO. This can expose `api_key` and other secrets to logs.

Apply this diff to log only non-sensitive fields at DEBUG:

```diff
-    logger.info("LLama stack config: %s", llama_stack_config)
+    logger.debug(
+        "Llama stack URL: %s",
+        getattr(llama_stack_config, "url", "<unknown>"),
+    )
```

src/app/endpoints/query.py (1)
174-174: Avoid logging full Llama stack configuration (leaks secrets such as API keys)

Same as in streaming: logging the full config at INFO can expose secrets.
Apply this diff:
```diff
-    logger.info("LLama stack config: %s", llama_stack_config)
+    logger.debug(
+        "Llama stack URL: %s",
+        getattr(llama_stack_config, "url", "<unknown>"),
+    )
```
♻️ Duplicate comments (2)
src/utils/transcripts.py (2)
27-30: Expand configured base path and handle user home (~) safely

If `transcripts_storage` contains `~` or is relative, paths won’t expand and may resolve unexpectedly. Expand it before joining components.

Apply this diff:

```diff
-    file_path = (
-        configuration.user_data_collection_configuration.transcripts_storage or ""
-    )
-    return Path(file_path, uid, cid)
+    base = (
+        configuration.user_data_collection_configuration.transcripts_storage or ""
+    )
+    base_path = Path(base).expanduser()
+    return base_path / uid / cid
```
59-61: Harden on-disk permissions for transcripts (PII/secret safety)

Directories and files are created with default umask-controlled modes. Given transcripts may contain sensitive data, set restrictive permissions explicitly.
Apply this diff:
```diff
-    transcripts_path = construct_transcripts_path(user_id, conversation_id)
-    transcripts_path.mkdir(parents=True, exist_ok=True)
+    transcripts_path = construct_transcripts_path(user_id, conversation_id)
+    transcripts_path.mkdir(parents=True, exist_ok=True, mode=0o700)
+    try:
+        os.chmod(transcripts_path, 0o700)  # ensure mode regardless of umask
+    except Exception:
+        logger.debug("Unable to set restrictive permissions on %s", transcripts_path)
@@
     with open(transcript_file_path, "w", encoding="utf-8") as transcript_file:
         json.dump(data_to_store, transcript_file)
+    try:
+        os.chmod(transcript_file_path, 0o600)
+    except Exception:
+        logger.debug(
+            "Unable to set restrictive permissions on %s", transcript_file_path
+        )
```

Also applies to: 83-86
🧹 Nitpick comments (4)
src/app/endpoints/streaming_query.py (3)
596-599: Initialize llm_response to an empty string instead of a placeholder

Avoid persisting “No response from the model” in transcripts; default to empty to reflect the absence of a turn_complete.

```diff
-    summary = TurnSummary(
-        llm_response="No response from the model", tool_calls=[]
-    )
+    summary = TurnSummary(llm_response="", tool_calls=[])
```
647-647: Set correct SSE media type for StreamingResponse

Explicitly set `media_type="text/event-stream"` so clients treat the stream as SSE.

```diff
-    return StreamingResponse(response_generator(response))
+    return StreamingResponse(
+        response_generator(response), media_type="text/event-stream"
+    )
```
619-636: Transcript store runs inside the streaming generator; may be skipped on early client disconnect

If the client disconnects before the end event, the generator can be cancelled and `store_transcript(...)` may never run. Consider moving persistence outside the generator or guarding with a `try/finally` to ensure execution on cancellation.

I can draft a small refactor that collects the `TurnSummary` outside the generator and persists in the endpoint after streaming finishes (or in a `background_task`) if that better fits your lifecycle.

tests/unit/app/endpoints/test_query.py (1)
11-13: Align TextContentItem import path with application code to reduce fragility

App code imports `TextContentItem` from `llama_stack_client.types.shared.interleaved_content_item`, while this test imports from `...interleaved_content`. Prefer a single import path to avoid version drift issues.

```diff
-from llama_stack_client.types.shared.interleaved_content import (
-    TextContentItem,
-)
+from llama_stack_client.types.shared.interleaved_content_item import TextContentItem
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (7)
- src/app/endpoints/query.py (8 hunks)
- src/app/endpoints/streaming_query.py (7 hunks)
- src/utils/transcripts.py (1 hunks)
- src/utils/types.py (2 hunks)
- tests/unit/app/endpoints/test_query.py (23 hunks)
- tests/unit/app/endpoints/test_streaming_query.py (3 hunks)
- tests/unit/utils/test_transcripts.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- tests/unit/utils/test_transcripts.py
- tests/unit/app/endpoints/test_streaming_query.py
- src/utils/types.py
🧰 Additional context used
🧬 Code Graph Analysis (4)
src/utils/transcripts.py (4)
- src/configuration.py (2): configuration (55-60), user_data_collection_configuration (79-84)
- src/models/requests.py (2): Attachment (15-67), QueryRequest (70-220)
- src/utils/suid.py (1): get_suid (6-12)
- src/utils/types.py (1): TurnSummary (59-78)

tests/unit/app/endpoints/test_query.py (2)
- src/utils/types.py (2): ToolCallSummary (43-56), TurnSummary (59-78)
- src/app/endpoints/query.py (1): retrieve_response (382-507)

src/app/endpoints/streaming_query.py (2)
- src/utils/transcripts.py (1): store_transcript (33-86)
- src/utils/types.py (2): TurnSummary (59-78), append_tool_calls_from_llama (65-78)

src/app/endpoints/query.py (4)
- src/utils/transcripts.py (1): store_transcript (33-86)
- src/utils/types.py (2): TurnSummary (59-78), append_tool_calls_from_llama (65-78)
- src/app/endpoints/streaming_query.py (1): retrieve_response (662-773)
- src/models/responses.py (1): QueryResponse (48-79)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: e2e_tests
🔇 Additional comments (5)
src/utils/transcripts.py (1)
74-79: TurnSummary serialization looks correct

Storing `llm_response` and serializing `tool_calls` via `model_dump()` aligns with the new data model and keeps room for extensibility.

src/app/endpoints/streaming_query.py (1)
581-599: Typed streaming + TurnSummary accumulation is a solid improvement

Defining `response_generator` over `AsyncIterator[AgentTurnResponseStreamChunk]` and accumulating `llm_response` + tool calls into a `TurnSummary` avoids fragile string aggregation. Good use of `append_tool_calls_from_llama`.

tests/unit/app/endpoints/test_query.py (1)
127-137: Tests correctly updated to use TurnSummary and ToolCallSummary

Good transition from raw strings to structured summaries; assertions on `summary.llm_response` and propagation through endpoint handlers look consistent.

Also applies to: 105-1132

src/app/endpoints/query.py (2)
src/app/endpoints/query.py (2)
207-214: End-to-end switch to TurnSummary is clean

Returning `(summary, conversation_id)` from `retrieve_response`, persisting via `store_transcript(summary=summary)`, and replying with `summary.llm_response` streamlines the flow without breaking the API surface. Nicely done.

Also applies to: 241-245
493-507: Summary building and tool-call aggregation LGTMCasting to
Turn, extracting content viainterleaved_content_as_str, and appending tool_execution details intoTurnSummaryis correct and resilient. Good guard onresponse.steps or [].
| "redacted_query": query, | ||
| "query_is_valid": query_is_valid, | ||
| "llm_response": summary.llm_response, | ||
| "rag_chunks": rag_chunks, | ||
| "truncated": truncated, | ||
| "attachments": [attachment.model_dump() for attachment in attachments], | ||
| "tool_calls": [tc.model_dump() for tc in summary.tool_calls], | ||
| } |
💡 Verification agent
🧩 Analysis chain
"redacted_query" currently stores unredacted content; either redact or rename key to avoid misleading consumers
The field name implies PII-safe content, but Line 72 stores the raw query. This is risky and misleading for downstream tooling and auditors.
Two safe options:
- Implement real redaction before persisting (preferred).
- Or, rename the key to “query” (or “raw_query”) until redaction is in place.
If you opt for the rename now, apply this diff:
- "redacted_query": query,
+ "query": query,If backward compatibility requires keeping the existing key, add a clear code comment and a TODO noting it is currently unredacted, and track a follow-up to implement redaction. I can provide a minimal redactor (regex-based for emails, tokens, etc.) if you’d like.
🏁 Script executed:
```shell
#!/bin/bash
# List function definitions to locate where query is handled
rg -n '^def ' src/utils/transcripts.py
# Search for any redaction or sanitization logic
rg -n 'redact|sanitize|mask' src/utils/transcripts.py
# Show the first 200 lines to understand context and variable origins
sed -n '1,200p' src/utils/transcripts.py
```

Length of output: 3390
Avoid storing unredacted PII as redacted_query
The field name implies sanitized content, but store_transcript is currently persisting the raw query string. This misleads downstream consumers and risks exposing sensitive data.
Locations to update:
`src/utils/transcripts.py`, in `store_transcript`, at the `"redacted_query": query,` entry.
Recommended fixes:
- Implement proper redaction (e.g., regex-based removal of emails, tokens) before assigning to `redacted_query`.
- Or, if redaction isn't yet available, rename the key to reflect reality:

```diff
-        "redacted_query": query,
+        "query": query,
```

- If you must retain `redacted_query` for compatibility, add a clear comment and a TODO linking to a follow-up issue:

```python
# TODO: redact sensitive data before storing under 'redacted_query' (see ISSUE-1234)
"redacted_query": query,
```

🤖 Prompt for AI Agents
In src/utils/transcripts.py around lines 72 to 79, the key "redacted_query" is
being populated with the raw query string which falsely implies PII has been
sanitized; update this by either (1) applying a redaction step before assignment
(e.g., run the query through a small sanitizer that strips emails, tokens, SSNs,
API keys via regexes or a shared redact utility and assign the sanitized result
to "redacted_query"), or (2) if redaction is not ready, rename the key to
"original_query" (or similar) to avoid implying sanitization, or (3) keep the
key for compatibility but add a clear TODO comment referencing an issue (e.g.,
TODO: redact sensitive data before storing under 'redacted_query' — ISSUE-1234)
and ensure downstream consumers are aware; choose one approach and make sure
tests/consumers are updated accordingly.
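As offered above, a minimal redactor could look like the sketch below. The patterns are assumptions about what counts as sensitive (emails, bearer tokens, long hex secrets), not a vetted PII policy, and would need review before production use.

```python
import re

# Illustrative patterns only; tune these to the data you actually need to scrub.
_PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), "<email>"),
    (re.compile(r"(?i)bearer\s+[a-z0-9._-]+"), "<token>"),
    (re.compile(r"\b[0-9a-f]{32,}\b", re.IGNORECASE), "<secret>"),
]


def redact(text: str) -> str:
    """Replace each sensitive match with a placeholder before persisting."""
    for pattern, placeholder in _PATTERNS:
        text = pattern.sub(placeholder, text)
    return text
```

The result of `redact(query)` would then be what gets stored under `"redacted_query"`, making the key name honest.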
@keitwb thanks yeah! LGTM :D
@keitwb apparently there are some conflicts. This needs to be rebased
Also moved all of the transcript handling to its own module as it grew a bit with this.
951a0ca to
906e60a
Actionable comments posted: 0
🧹 Nitpick comments (8)
src/app/endpoints/streaming_query.py (4)
581-599: Initialize TurnSummary with empty response for parity and avoid misleading transcripts

Defaulting to "No response from the model" will end up in stored transcripts when the model emits no turn_complete (e.g., error/interrupts). In query.py we default to an empty string; mirroring that here keeps semantics consistent.
Apply this diff:

```diff
-        summary = TurnSummary(
-            llm_response="No response from the model", tool_calls=[]
-        )
+        summary = TurnSummary(
+            llm_response="",
+            tool_calls=[],
+        )
```

Additionally, ensure the StreamingResponse is served with the correct SSE media type to prevent client/parsing issues:

```python
# near the return statement
return StreamingResponse(response_generator(response), media_type="text/event-stream")
```
604-612: Guard against missing output_message on turn_complete

Defensive check avoids attribute access errors if a turn_complete lacks an output_message/content (seen in some edge cases). Also logs for observability.

```diff
-            if p.event_type == "turn_complete":
-                summary.llm_response = interleaved_content_as_str(
-                    p.turn.output_message.content
-                )
+            if p.event_type == "turn_complete":
+                content = getattr(getattr(p.turn, "output_message", None), "content", None)
+                if content is not None:
+                    summary.llm_response = interleaved_content_as_str(content)
+                else:
+                    logger.warning(
+                        "turn_complete without output_message.content (conversation_id=%s)",
+                        conversation_id,
+                    )
```
630-631: Do not let transcript persistence failures break the stream completion

If store_transcript raises (I/O errors, serialization issues), the generator will error post-stream and may degrade client UX. Log and continue.

You can wrap the storage in a try/except:

```python
try:
    store_transcript(
        user_id=user_id,
        conversation_id=conversation_id,
        model_id=model_id,
        provider_id=provider_id,
        query_is_valid=True,
        query=query_request.query,
        query_request=query_request,
        summary=summary,
        rag_chunks=[],
        truncated=False,
        attachments=query_request.attachments or [],
    )
except Exception:
    logger.exception("Failed to store transcript for conversation_id=%s", conversation_id)
```
692-699: Avoid duplicate network calls to shields.list()

You call client.shields.list() twice (once per filter). Fetch once and derive both input/output sets to save latency.

Example refactor (outside this range):

```python
shields = await client.shields.list()
available_input_shields = [s.identifier for s in filter(is_input_shield, shields)]
available_output_shields = [s.identifier for s in filter(is_output_shield, shields)]
```

tests/unit/app/endpoints/test_query.py (2)
127-137: Good migration to TurnSummary in tests; consider a helper/fixture to reduce duplication

TurnSummary/ToolCallSummary objects are re-created in multiple tests. A fixture (e.g., make_turn_summary) would DRY this up and ease maintenance.

Example fixture:

```python
import pytest

from utils.types import TurnSummary, ToolCallSummary


@pytest.fixture
def sample_summary():
    return TurnSummary(
        llm_response="LLM answer",
        tool_calls=[
            ToolCallSummary(id="123", name="test-tool", args="testing", response="tool response")
        ],
    )
```

Then use sample_summary in tests instead of re-declaring.
493-503: Add a test for tool_execution steps populating summary.tool_calls

We validate shield violations and llm_response, but not that summary.append_tool_calls_from_llama aggregates tool calls. Adding this improves confidence in transcript content.
I can draft a test that injects a response with a tool_execution step and asserts summary.tool_calls contains id/name/args/response as expected. Want me to add it?
src/app/endpoints/query.py (2)
281-299: Docstring return description is outdated (tuple size)

The function returns three values (llama_stack_model_id, model_id, provider_id), but the Returns section mentions only combined model and provider. Update to avoid confusion.
Suggested doc tweak (outside this range):
```
Returns:
    tuple[str, str, str]: (llama_stack_model_id, model_id, provider_id).
```
415-421: Avoid duplicate shields.list() call to reduce latency

You call await client.shields.list() twice to derive input and output sets. Fetch once and split results.
Example refactor (outside this range):
```python
shields = await client.shields.list()
available_input_shields = [s.identifier for s in filter(is_input_shield, shields)]
available_output_shields = [s.identifier for s in filter(is_output_shield, shields)]
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (7)
- src/app/endpoints/query.py (8 hunks)
- src/app/endpoints/streaming_query.py (7 hunks)
- src/utils/transcripts.py (1 hunks)
- src/utils/types.py (2 hunks)
- tests/unit/app/endpoints/test_query.py (25 hunks)
- tests/unit/app/endpoints/test_streaming_query.py (3 hunks)
- tests/unit/utils/test_transcripts.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
- src/utils/transcripts.py
- tests/unit/utils/test_transcripts.py
- src/utils/types.py
- tests/unit/app/endpoints/test_streaming_query.py
🧰 Additional context used
🧬 Code Graph Analysis (3)
src/app/endpoints/streaming_query.py (2)
- src/utils/transcripts.py (1)
  - `store_transcript` (33-86)
- src/utils/types.py (2)
  - `TurnSummary` (59-78)
  - `append_tool_calls_from_llama` (65-78)

tests/unit/app/endpoints/test_query.py (2)
- src/utils/types.py (2)
  - `ToolCallSummary` (43-56)
  - `TurnSummary` (59-78)
- src/app/endpoints/query.py (1)
  - `retrieve_response` (382-519)

src/app/endpoints/query.py (2)
- src/utils/transcripts.py (1)
  - `store_transcript` (33-86)
- src/utils/types.py (2)
  - `TurnSummary` (59-78)
  - `append_tool_calls_from_llama` (65-78)
🔇 Additional comments (12)
src/app/endpoints/streaming_query.py (2)
668-669: Typed return for streaming retrieve_response is a solid improvement

Returning AsyncIterator[AgentTurnResponseStreamChunk] clarifies the contract and enables tighter downstream handling.

771-772: Explicit cast is appropriate

Casting the agent.create_turn response to AsyncIterator[AgentTurnResponseStreamChunk] is pragmatic given the API behavior.
tests/unit/app/endpoints/test_query.py (4)
422-429: Assertion update aligns with new fallback behavior

Asserting an empty llm_response when output_message is None matches the TurnSummary construction in query.retrieve_response.

485-493: End-to-end TurnSummary verification LGTM

Validates expected content propagation and agent invocation parameters under vector DB availability.

1048-1050: Verify TextContentItem shape matches interleaved_content_as_str expectations

If interleaved_content_as_str expects an InterleavedContent sequence, providing a single TextContentItem instance may be brittle. Consider wrapping in a list to mirror actual API shapes.

Apply this diff if needed:

```diff
-    mock_agent.create_turn.return_value.output_message.content = TextContentItem(
-        text="LLM answer", type="text"
-    )
+    mock_agent.create_turn.return_value.output_message.content = [
+        TextContentItem(text="LLM answer", type="text")
+    ]
```

843-886: MCP headers propagation and toolgroup formation tests are thorough

Nice coverage ensuring header normalization to URLs and toolgroup composition from MCP server names.
src/app/endpoints/query.py (6)
38-40: Centralizing transcripts and introducing TurnSummary is the right direction

Importing store_transcript from utils.transcripts and using TurnSummary improves cohesion and future extensibility.

207-213: Returning TurnSummary from retrieve_response simplifies endpoint plumbing

The endpoint consuming summary.llm_response and passing the full summary to store_transcript is clean and extensible.

241-244: API response built from TurnSummary is correct

Returning QueryResponse with summary.llm_response preserves the external API while enabling richer transcript data internally.

388-412: Updated signature and docstring clearly communicate TurnSummary return

Accurately reflects the new contract of retrieve_response.

493-503: Summary construction handles absent content gracefully

Using interleaved_content_as_str with guards is robust. Good call defaulting to "" when content is missing.

506-513: Tool call aggregation and violation metric tracking are correct

Incrementing validation error metric and delegating tool call extraction to TurnSummary keeps logic focused.
Also moved all of the transcript handling to its own module as it grew a bit with this.
Added a `TurnSummary` type in order to encapsulate both the LLM's text response as well as the tool calls. This can be expanded in the future if other turn data is needed in either the transcripts or elsewhere.

Type of change
Related Tickets & Documents
Checklist before requesting a review
Testing
Ran this locally with an assisted-chat instance. Saw the tool calls in the transcripts when I made a query to list all my clusters.
Summary by CodeRabbit
New Features
Refactor
Tests