fix(otel): reduce span attribute duplication and fix orphaned guardrail traces #3

Harshit28j wants to merge 183 commits into main from
Conversation
Update the root documentation page at https://docs.litellm.ai to match the current documentation at https://docs.litellm.ai/docs/:
- Update model names to latest versions (openai/gpt-5, anthropic/claude-sonnet-4-5, vertex_ai/gemini-1.5-pro)
- Add xAI (grok-2-latest) and Vercel AI Gateway providers
- Update "How to use LiteLLM" section with comparison table
- Add Response Format sections for completions and streaming
- Update endpoint descriptions to reflect current capabilities
- Update exception handling example with proper litellm exceptions
- Update environment variable names (VERTEXAI_PROJECT/LOCATION)
- Fix data.token → hash_token(data.key) and remove non-existent data.is_active in create endpoint
- Fix mapping_id → id in update, delete, and info endpoints to match Prisma schema
- Extract direct DB query into get_jwt_key_mapping_object helper in auth_checks.py
- Add hash_token import for proper key hashing before storage

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Add page/size query params with take/skip to prevent unbounded queries. Returns paginated response with total_count, current_page, total_pages. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
- Persist description field on create (was silently dropped)
- Remove phantom key_alias from JWTKeyMappingResponse (not in schema)
- Populate created_by/updated_by audit fields from authenticated user
- Pass actual jwt_valid_token in admin path instead of empty dict
- Restore hash_token on create and fix duplicate try block

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add @log_db_metrics decorator to get_jwt_key_mapping_object for consistent DB latency/error tracking with other helpers
- Move virtual key mapping lookup before auth_builder() to avoid unnecessary team/user/org DB queries when mapping resolves
- JWT is decoded early; auth_builder only runs when no mapping found

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tion_call

The response.output_item.done handler for the function_call type was emitting finish_reason='tool_calls' and a duplicate tool_call delta. This caused premature stream termination after the first tool call in multi-tool scenarios — downstream wrappers (e.g. AnthropicStreamWrapper) would close the stream before subsequent tool calls arrived.

The response.completed event already inspects the response output list and emits finish_reason='tool_calls' when function_call items are present, so output_item.done does not need to (and must not) do so. This mirrors the existing fix for message-type output_item.done (BerriAI#17246).

Updated test_function_call_done_emits_is_finished (renamed) to assert finish_reason=None and no duplicate delta. Updated test_text_plus_tool_calls_sequence to match. Added test_multi_tool_call_stream_no_premature_finish, which exercises a synthetic 2-tool-call stream and verifies no premature termination.
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Add test_parallel_tool_calls_comprehensive_streaming_integration, which synthesizes the full 10-event Responses API SSE sequence with split argument deltas and asserts all fix invariants together:
1. output_item.done emits no finish_reason (no premature stream end)
2. Each call_id appears exactly once (no duplicate tool_call chunks)
3. Split argument deltas assemble to correct final JSON
4. Exactly one finish event, at the terminal response.completed chunk
5. Parallel tool calls have distinct indices (output_index 0 and 1)

All 24 unit tests pass.
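The finish-reason invariant these tests assert can be sketched as a minimal event transformer. This is illustrative only: the event shapes and the `transform_event` name are assumptions, not LiteLLM's actual handler.

```python
def transform_event(event: dict) -> dict:
    """Map a Responses-API-style SSE event to a chat chunk (illustrative only)."""
    if event["type"] == "response.output_item.done":
        # Do NOT emit finish_reason here: a downstream wrapper would close
        # the stream before later tool calls arrive.
        return {"finish_reason": None}
    if event["type"] == "response.completed":
        # Only the terminal event inspects the output list and finishes.
        has_tool_calls = any(
            item["type"] == "function_call" for item in event["response"]["output"]
        )
        return {"finish_reason": "tool_calls" if has_tool_calls else "stop"}
    return {"finish_reason": None}


# Synthetic 2-tool-call stream, in the spirit of
# test_multi_tool_call_stream_no_premature_finish.
events = [
    {"type": "response.output_item.done", "item": {"type": "function_call"}},
    {"type": "response.output_item.done", "item": {"type": "function_call"}},
    {
        "type": "response.completed",
        "response": {"output": [{"type": "function_call"}, {"type": "function_call"}]},
    },
]
chunks = [transform_event(e) for e in events]
```

Exactly one finish event is produced, and it lands on the terminal `response.completed` chunk, so wrappers never close the stream early.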
…ndex docs: sync main page with docs/index.md
…sh_reason races content
…odes, add tests

- Remove token field from JWTKeyMappingResponse to prevent hashed key exposure
- Use _to_response() helper on all CRUD endpoints to control returned fields
- Return 409 for unique constraint violations, 400 for FK violations, 404 for not found
- Add response_model to endpoint decorators
- Add 8 new unit tests covering error handling and token redaction

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
The gemini-live-2.5-flash-preview-native-audio-09-2025 model only works with WebSocket (Live API), not REST endpoints. Changed supported_endpoints from /v1/chat/completions to /vertex_ai/live to reflect the actual passthrough endpoint available in LiteLLM proxy.
The gemini/ prefix indicates Google AI Studio, which uses /v1/realtime endpoint (OpenAI-compatible), not /vertex_ai/live.
The mode field is used by health checks to determine the correct check method (WebSocket for realtime vs REST for chat).
…ed-endpoints fix: update gemini-live model endpoints and mode to realtime
…reating keys (BerriAI#22816)

* fix(proxy): add guardrails list routes for internal users
* fix(ui): add guardrails fetch with v1/v2 fallback in networking
* fix(ui): allow internal users/team admins to select guardrails in create key modal
* fix(ui): show guardrails selector for internal users in key edit view
* fix(ui): pass canEditGuardrails flag to key info view
* test(ui): add tests for role-based guardrails access in key info view
* test(ui): update key edit view test for guardrails
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
…or DB storage (BerriAI#22936)

When the messages or response JSON fields in spend logs are truncated before being written to the database, the truncation marker now includes a note explaining:
- This is a DB storage safeguard
- Full, untruncated data is still sent to logging callbacks (OTEL, Datadog, etc.)
- The MAX_STRING_LENGTH_PROMPT_IN_DB env var can be used to increase the limit

Also emits a verbose_proxy_logger.info message when truncation occurs in the request body or response spend log paths.

Adds 3 new tests:
- test_truncation_includes_db_safeguard_note
- test_response_truncation_logs_info_message
- test_request_body_truncation_logs_info_message

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
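A minimal sketch of the truncation-with-marker behavior described in that commit. The helper name and marker wording are assumptions; only the MAX_STRING_LENGTH_PROMPT_IN_DB env var comes from the source.

```python
import os

# Env var name is from the commit message; default value here is arbitrary.
MAX_LEN = int(os.environ.get("MAX_STRING_LENGTH_PROMPT_IN_DB", "1000"))

# Marker paraphrases the safeguard note; exact wording is illustrative.
DB_SAFEGUARD_NOTE = (
    "...[truncated: DB storage safeguard only; full data is still sent to "
    "logging callbacks (OTEL, Datadog, etc.). Increase "
    "MAX_STRING_LENGTH_PROMPT_IN_DB to raise this limit]"
)


def truncate_for_db(value: str, max_len: int = MAX_LEN) -> str:
    """Truncate a spend-log field for DB storage, appending the explanatory marker."""
    if len(value) <= max_len:
        return value
    return value[:max_len] + DB_SAFEGUARD_NOTE
```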
The /team/daily/activity endpoint used Prisma pagination (page_size=1000) but the UI only fetched page 1. Teams with many keys/models easily exceed 1000 rows in LiteLLM_DailyTeamSpend, causing truncated totals. Switches the endpoint to use SQL GROUP BY via get_daily_activity_aggregated with include_entity_breakdown=True, returning all data in a single response while preserving per-team breakdown. Also adds timezone parameter support. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The /organization/list endpoint only checked for PROXY_ADMIN role, causing PROXY_ADMIN_VIEW_ONLY users to fall into the else branch which restricts results to orgs the user is a member of. Use the existing _user_has_admin_view() helper to include both roles.
…r_org_list Fix admin viewer unable to see all organizations
…eaming (BerriAI#22937)

* feat(ui): add chat message and conversation types
* feat(ui): add useChatHistory hook for localStorage-backed conversations
* feat(ui): add ConversationList sidebar component
* feat(ui): add MCPConnectPicker for attaching MCP servers to chat
* feat(ui): add ModelSelector dropdown for chat
* feat(ui): add ChatInputBar with MCP tool attachment support
* feat(ui): add MCPAppsPanel with list/detail view for MCP servers
* feat(ui): add ChatMessages component; remove auto-scrollIntoView that caused scroll-lock bypass
* feat(ui): add ChatPage — ChatGPT-like UI with scroll lock, MCP tools, streaming
* feat(ui): add /chat route wired to ChatPage
* feat(ui): remove chat from leftnav — chat accessible via navbar button
* feat(ui): add Chat button to top navbar
* feat(ui): add dismissible Chat UI announcement banner to Playground page
* feat(proxy): add Chat UI link to Swagger description
* feat(ui): add react-markdown and syntax-highlighter deps for chat UI
* fix(ui): replace missing BorderOutlined import with inline stop icon div
* fix(ui): apply remark-gfm plugin to ReactMarkdown for GFM support
* fix(ui): remove unused isEvenRow variable in MCPAppsPanel
* fix(ui): add ellipsis when truncating conversation title
* fix(ui): wire search button to chats view; remove non-functional keyboard hint
* fix(ui): use serverRootPath in navbar chat link for sub-path deployments
* fix(ui): remove unused ChatInputBar and ModelSelector files
* fix(ui): correct grid bottom-border condition for odd server count
* fix(chat): move localStorage writes out of setConversations updater (React purity)
* fix(chat): fix stale closure in handleEditAndResend - compute history before async state update
* fix(chat): fix 4 issues in ChatMessages - array redaction, clipboard error, inline detection, remove unused ref
- Add PayGo / Priority Cost Tracking section to Vertex AI provider docs - Document trafficType to service_tier mapping (ON_DEMAND_PRIORITY, FLEX, etc.) - Add service tier cost keys to custom pricing docs - Add provider-specific cost tracking note to spend tracking overview Made-with: Cursor
docs: add PayGo/priority cost tracking for Gemini Vertex AI
…asoning_effort fix(gemini): handle 'minimal' reasoning_effort param for gemini-3.1-f…
…gpt-53-oauth-models feat(models): add ChatGPT 5.3/5.4 aliases + OpenAI gpt-5.4-pro
[Chore] update mcp documentation for header forwarding
…lize-detection fix(o-series): generalize is_model_o_series_model to match any o+digit prefix
…pport (BerriAI#22945)

* fix(chat): fix router.push paths to use /ui/chat with serverRootPath support
* fix(chat): wrap chat page in Suspense boundary for Next.js static export
* fix(chat): fix clipboard writeText rejection handler - remove undefined message.error call
* feat(chat): rebuild UI with routing fixes
* fix(chat): use useTheme logoUrl + /get_image fallback for sidebar logo
* feat(chat): rebuild UI with logo fix
* fix(chat): use /get_image directly for logo (no ThemeProvider outside dashboard layout)
* feat(chat): add multi-model comparison and provider logos in chat UI
  - Replace single model selector with multi-select (up to 3 models)
  - Show provider logos next to model names in dropdown (openai, anthropic, gemini, mistral, groq, etc.)
  - Selected models float to the top of the dropdown list
  - Multi-model mode: responses stream in parallel side-by-side cards below each user message
  - Multi-turn: each follow-up message carries full per-model history as context
  - Surface API errors inline in response cards instead of silently swallowing them
  - Rebuild UI
- Add gpt-5.4-pro and gpt-5.4-pro-2026-03-05 snapshot
- Input: $30/1M tokens, Output: $180/1M tokens
- 1.05M context window, 128K max output tokens
- Priority pricing for >272K input tokens (2x input, 1.5x output)
- Supports reasoning.effort: medium, high, xhigh
- Responses API, Chat Completions, Batch endpoints

Made-with: Cursor
…el-map feat(openai): add gpt-5.4-pro to model map
…ls (BerriAI#22934) OpenAI gpt-5.1, gpt-5.2, and gpt-5.3 chat models all support the `web_search_options` parameter, but the model cost registry was missing the `supports_web_search` flag. Only `gpt-5.2-pro` had it set. Models updated: - gpt-5.1, gpt-5.1-2025-11-13, gpt-5.1-chat-latest - gpt-5.2, gpt-5.2-2025-12-11, gpt-5.2-chat-latest - gpt-5.3-chat-latest
…spend [Fix] Team Usage Spend Truncated Due to Pagination
Support passing duration=null on /key/update to reset a key's expiry to never expires, alongside the existing "-1" magic string (kept for backward compat). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
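The expiry semantics above can be sketched as a small resolver. The helper name is hypothetical and the parser is deliberately simplified; the real endpoint accepts richer duration strings.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def resolve_expiry(duration: Optional[str]) -> Optional[datetime]:
    """Return the new expiry for /key/update, or None for 'never expires'."""
    if duration is None or duration == "-1":
        # duration=null resets the key to never expire; "-1" is the legacy
        # magic string, kept for backward compatibility.
        return None
    # Simplified parser: only handles day durations like "30d".
    days = int(duration.rstrip("d"))
    return datetime.now(timezone.utc) + timedelta(days=days)
```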
[Fix] Key Expiry Default Duration
…asoning_effort to model cost map

- Shift from hardcoded model checks to dynamic lookup via _supports_factory
- Add supports_none_reasoning_effort for gpt-5.1/5.2/5.4 chat variants
- Add supports_xhigh_reasoning_effort for gpt-5.1-codex-max, gpt-5.2, gpt-5.4+
- Update model_prices_and_context_window.json and backup
- Add ProviderSpecificModelInfo types for new fields
- Fix Azure: use _supports_reasoning_effort_level instead of removed is_model_gpt_5_1_model

Made-with: Cursor
…model_map feat(gpt-5): Add supports_none_reasoning_effort and supports_xhigh_reasoning_effort to model cost map
- Pass request_model to Azure AI cost calculator to detect router requests
- Add router flat cost ($0.14/M input tokens) even when Azure returns actual model in response
- Add test for router flat cost with response containing actual model
- Update docs with cost calculation flow and configuration requirements

Made-with: Cursor
…ter-cost-tracking feat(azure_ai): add router flat cost when response contains actual model
…1/messages

- Add forward_llm_provider_auth_headers support from litellm_settings
- When enabled, client x-api-key takes precedence over deployment keys
- Forward x-api-key when x-litellm-api-key or Authorization used for auth
- Fix duplicate patch lines in test_byok_oauth_endpoints.py
- Add Claude Code BYOK documentation with /login and ANTHROPIC_CUSTOM_HEADERS
- Add unit tests for clean_headers x-api-key forwarding logic
- Sync model_prices backup (pre-commit hook)

Made-with: Cursor
…used for LiteLLM proxy auth When forward_llm_provider_auth_headers=true, Authorization: Bearer <litellm-key> was being forwarded to Anthropic if it looked like an OAuth key, causing auth failures. Now checked against authenticated_with_header: if Authorization was used to authenticate with the proxy, it is always stripped before forwarding to the LLM provider. Made-with: Cursor
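The stripping rule described here can be sketched as follows. The function name is hypothetical (the real logic lives in the clean_headers path mentioned above); only the rule itself mirrors the fix.

```python
def forward_headers(headers: dict, authenticated_with_header: str) -> dict:
    """Drop the header that authenticated with the proxy before forwarding upstream."""
    forwarded = dict(headers)
    # If Authorization carried the LiteLLM proxy key, never forward it to the
    # LLM provider, even if its value "looks like" a provider OAuth key.
    forwarded.pop(authenticated_with_header, None)
    return forwarded
```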
feat(proxy): Client-side provider API key precedence for Anthropic /v1/messages (BYOK)
[Feat] Add a2a custom headers
…il traces
- Remove redundant set_attributes() call from raw_gen_ai_request spans so they only contain provider-specific llm.{provider}.{param} attributes (Issue #3)
- Replace full set_attributes() on litellm_proxy_request with minimal attributes (model, call_type, response.id) when litellm_request child span exists (Issue #4)
- Skip guardrail span creation when no parent context exists to prevent orphaned root-level traces in the "All Spans" view (Issue BerriAI#5)
- Fall back to litellm_call_id for gen_ai.response.id when response object has no id field (EmbeddingResponse, ImageResponse) (Issue BerriAI#8)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
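The response-id fallback in the last bullet amounts to a two-step lookup. A minimal sketch, assuming a hypothetical helper name; only the lookup order mirrors the change:

```python
def resolve_response_id(response_obj: dict, standard_logging_payload: dict) -> str:
    """Prefer the provider response id; fall back to the LiteLLM call id."""
    provider_id = response_obj.get("id")
    if provider_id:
        return provider_id  # e.g. "chatcmpl-xxx" for completions
    # EmbeddingResponse / ImageResponse carry no provider id; use the
    # litellm_call_id so traces stay correlatable across UIs and logs.
    return standard_logging_payload["id"]
```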
```python
params = urlencode({"code": auth_code, "state": state})
separator = "&" if "?" in redirect_uri else "?"
location = f"{redirect_uri}{separator}{params}"
return RedirectResponse(url=location, status_code=302)
```
Check warning: Code scanning / CodeQL: URL redirection from remote source (Medium)
Copilot Autofix (AI, 11 days ago):
In general, to fix untrusted URL redirection you should not redirect to arbitrary user-provided URLs. Instead, either (a) maintain a whitelist of allowed redirect URIs (for example, pre-registered per client_id or server_id) and ensure the provided redirect_uri matches one of them, or (b) restrict redirects to same-origin or relative paths by validating the URL structure (e.g., no external host). For OAuth-like flows, the standard approach is to validate that the incoming redirect_uri exactly matches one of the URIs registered for that client.
The least-invasive fix here is to restrict redirect_uri to be either a relative URL (no scheme/host) or, if absolute, to share the same scheme/host/port as the current request (same-origin). That maintains flexibility (callers can still specify paths and query strings) without allowing redirection to arbitrary external domains. We can also treat state as a plain value in the query string, which is already safely encoded by urlencode; the primary vulnerability is the host of the redirect target, not embedding state.
Concretely, in byok_authorize_post (lines 629–676):
- Parse the `redirect_uri` with `urlparse`.
- If it is absolute (has a scheme), require that its scheme is `http` or `https` and that its `netloc` matches the current request's base URL host (we can derive this via the existing `get_request_base_url(request)` helper or directly from `request.url`).
- If it is relative (no scheme and no netloc), normalize it to start with `/` and treat it as a path relative to the current origin.
- Build the final redirect URL by combining a trusted base (from `get_request_base_url(request)`) with the validated/normalized path and query parameters.
- Use this trusted `location` for `RedirectResponse`.
This preserves existing functionality (clients can still choose where within this site to be redirected and still get their state back) while preventing open redirects to attacker-controlled domains.
```diff
@@ -645,11 +645,32 @@
     """
     _purge_expired_codes()
 
-    # Validate redirect_uri scheme to prevent open redirect
+    # Validate redirect_uri to prevent open redirects. Allow only same-origin absolute
+    # URLs or relative paths, and normalize the final redirect against the request base URL.
     parsed_uri = urlparse(redirect_uri)
-    if parsed_uri.scheme not in ("http", "https"):
-        raise HTTPException(status_code=400, detail="Invalid redirect_uri scheme")
+    base_url = get_request_base_url(request)
+    base_parsed = urlparse(base_url)
+
+    if parsed_uri.scheme:
+        # Absolute URL: require http/https and same netloc as the current request.
+        if parsed_uri.scheme not in ("http", "https"):
+            raise HTTPException(status_code=400, detail="Invalid redirect_uri scheme")
+        if parsed_uri.netloc != base_parsed.netloc:
+            raise HTTPException(status_code=400, detail="redirect_uri must be same-origin")
+        safe_path_and_query = parsed_uri.path or "/"
+        if parsed_uri.query:
+            safe_path_and_query += f"?{parsed_uri.query}"
+    else:
+        # Relative URL: do not allow it to specify a host; normalize to a path.
+        if parsed_uri.netloc:
+            raise HTTPException(status_code=400, detail="Invalid redirect_uri")
+        safe_path = parsed_uri.path or "/"
+        if not safe_path.startswith("/"):
+            safe_path = "/" + safe_path
+        safe_path_and_query = safe_path
+        if parsed_uri.query:
+            safe_path_and_query += f"?{parsed_uri.query}"
 
     # Reject new codes if the store is at capacity (prevents memory exhaustion
     # from a burst of abandoned OAuth flows).
     if len(_byok_auth_codes) >= _AUTH_CODES_MAX_SIZE:
@@ -671,8 +690,8 @@
     }
 
     params = urlencode({"code": auth_code, "state": state})
-    separator = "&" if "?" in redirect_uri else "?"
-    location = f"{redirect_uri}{separator}{params}"
+    separator = "&" if "?" in safe_path_and_query else "?"
+    location = f"{base_parsed.scheme}://{base_parsed.netloc}{safe_path_and_query}{separator}{params}"
     return RedirectResponse(url=location, status_code=302)
```
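The same-origin strategy from the autofix can be condensed into a standalone helper. This is a sketch under the same assumptions; the function name and ValueError handling are illustrative, not the actual proxy code.

```python
from urllib.parse import urlparse


def safe_redirect_target(redirect_uri: str, base_url: str) -> str:
    """Return a same-origin redirect target, or raise on cross-origin input."""
    parsed = urlparse(redirect_uri)
    base = urlparse(base_url)
    if parsed.scheme:
        # Absolute URL: require http(s) and the same host as the request.
        if parsed.scheme not in ("http", "https") or parsed.netloc != base.netloc:
            raise ValueError("redirect_uri must be same-origin")
        path = parsed.path or "/"
    else:
        # Relative URL: forbid a netloc (catches "//evil.com/x"), normalize
        # to a leading slash.
        if parsed.netloc:
            raise ValueError("invalid redirect_uri")
        path = parsed.path or "/"
        if not path.startswith("/"):
            path = "/" + path
    query = f"?{parsed.query}" if parsed.query else ""
    return f"{base.scheme}://{base.netloc}{path}{query}"
```

Rebuilding the final URL from the trusted base (rather than echoing the caller's string) is what closes the open-redirect hole.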
…onse IDs

Addresses 4 critical OpenTelemetry span issues in LiteLLM:

Issue #3: Remove redundant attributes from raw_gen_ai_request spans
- Removed self.set_attributes() call that was duplicating all parent span attributes (gen_ai.*, metadata.*) onto the raw span
- Raw span now only contains provider-specific llm.{provider}.* attributes
- Reduces storage and eliminates search confusion from duplicate data

Issue #4: Prevent attribute duplication on litellm_proxy_request parent span
- When litellm_request child span exists, removed redundant set_attributes() call on the parent proxy span
- Child span already carries all attributes; parent duplication doubles storage and complicates search

Issue BerriAI#5: Fix orphaned guardrail traces
- Guardrail spans were created with context=None when no parent proxy span existed, resulting in orphaned root spans (separate trace_id)
- Added _resolve_guardrail_context() helper to ensure guardrails always have a valid parent (litellm_request or proxy span)
- Applied fix to both _handle_success and _handle_failure paths

Issue BerriAI#8: Add gen_ai.response.id for embeddings and image generation
- EmbeddingResponse and ImageResponse types don't have provider response IDs
- Added fallback to standard_logging_payload["id"] (litellm call ID) for correlation across LiteLLM UI, Phoenix traces, and provider logs
- Completions still use provider ID (e.g. "chatcmpl-xxx") when available

Tests added:
- TestRawSpanAttributeIsolation: Verify raw span has no gen_ai/metadata attrs
- TestNoParentSpanDuplication: Verify parent span doesn't get duplicated attrs
- TestGuardrailSpanParenting: Verify guardrails are children (not orphaned)
- TestResponseIdFallback: Verify response ID set for all call types

All existing OTEL tests pass (73 passed, 14 pre-existing protocol failures).

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Summary
Fixes 4 OpenTelemetry tracing issues related to span attribute duplication and missing trace data:
- Redundant attributes duplicated onto `raw_gen_ai_request` spans
- Attribute duplication on `litellm_proxy_request` when child span exists
- Guardrail spans created as orphaned root traces when no parent context exists
- Fall back to `litellm_call_id` for responses without native id field (embeddings, image generation)

Changes
- `set_raw_request_attributes()`: Removed initial `set_attributes()` call to avoid duplicating all parent span data. Now only sets provider-specific `llm.{provider}.{param}` attributes.
- Proxy span optimization: Created `_set_proxy_span_minimal_attributes()` to set only model, call_type, and response.id on proxy request spans instead of copying all attributes from the child `litellm_request` span.
- Guardrail span protection: Added `context is None` check at start of `_create_guardrail_span()` to prevent creation of orphaned root-level traces.
- Response ID coverage: Modified OTEL span attribute setting to fall back to `litellm_call_id` when `response_obj.get("id")` returns None (for EmbeddingResponse and ImageResponse types).

Test Plan
✅ Updated existing guardrail span tests with valid parent context
✅ Added new test verifying guardrail spans are not created without parent context
✅ OTEL unit tests: 71 passed (2 previously failing guardrail tests now pass)
🤖 Generated with Claude Code