
fix(otel): reduce span attribute duplication and fix orphaned guardrail traces#3

Closed
Harshit28j wants to merge 183 commits into main from litellm_branch_name_fix_otel_spans
Conversation

@Harshit28j (Owner)

Summary

Fixes 4 OpenTelemetry tracing issues related to span attribute duplication and missing trace data:

Changes

  1. set_raw_request_attributes(): Removed initial set_attributes() call to avoid duplicating all parent span data. Now only sets provider-specific llm.{provider}.{param} attributes.

  2. Proxy span optimization: Created _set_proxy_span_minimal_attributes() to set only model, call_type, and response.id on proxy request spans instead of copying all attributes from child litellm_request span.

  3. Guardrail span protection: Added context is None check at start of _create_guardrail_span() to prevent creation of orphaned root-level traces.

  4. Response ID coverage: Modified OTEL span attribute setting to fall back to litellm_call_id when response_obj.get("id") returns None (for EmbeddingResponse and ImageResponse types).
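The response-ID fallback in change 4 can be sketched in a few lines. This is an illustrative helper, not LiteLLM's actual function name or signature: it assumes a dict-like response object and a `litellm_call_id` string.

```python
def resolve_response_id(response_obj, litellm_call_id):
    """Sketch of the fallback described in change 4; names are illustrative.

    Chat completions carry a provider id (e.g. "chatcmpl-xxx"), while
    EmbeddingResponse/ImageResponse may return None for "id", in which
    case we fall back to the LiteLLM call id so the span attribute is
    never missing.
    """
    provider_id = response_obj.get("id") if response_obj else None
    return provider_id if provider_id is not None else litellm_call_id


# Chat completion: the provider id wins
print(resolve_response_id({"id": "chatcmpl-123"}, "call-abc"))  # chatcmpl-123
# Embedding response without an id: fall back to litellm_call_id
print(resolve_response_id({"id": None}, "call-abc"))            # call-abc
```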

Test Plan

✅ Updated existing guardrail span tests with valid parent context
✅ Added new test verifying guardrail spans are not created without parent context
✅ OTEL unit tests: 71 passed (2 previously failing guardrail tests now pass)

🤖 Generated with Claude Code

Chesars and others added 30 commits January 17, 2026 14:41
Update the root documentation page at https://docs.litellm.ai to match
the current documentation at https://docs.litellm.ai/docs/:

- Update model names to latest versions (openai/gpt-5, anthropic/claude-sonnet-4-5, vertex_ai/gemini-1.5-pro)
- Add xAI (grok-2-latest) and Vercel AI Gateway providers
- Update "How to use LiteLLM" section with comparison table
- Add Response Format sections for completions and streaming
- Update endpoint descriptions to reflect current capabilities
- Update exception handling example with proper litellm exceptions
- Update environment variable names (VERTEXAI_PROJECT/LOCATION)
- Fix data.token → hash_token(data.key) and remove non-existent data.is_active in create endpoint
- Fix mapping_id → id in update, delete, and info endpoints to match Prisma schema
- Extract direct DB query into get_jwt_key_mapping_object helper in auth_checks.py
- Add hash_token import for proper key hashing before storage

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Add page/size query params with take/skip to prevent unbounded queries.
Returns paginated response with total_count, current_page, total_pages.
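The page/size to take/skip mapping described above can be sketched as follows. Field names in the returned dict are assumptions modeled on the commit message, not the actual LiteLLM response shape:

```python
def paginate(page: int, size: int, total_count: int) -> dict:
    """Illustrative sketch of page/size -> Prisma-style take/skip plus
    the pagination metadata named in the commit message."""
    take = size                      # rows per page
    skip = (page - 1) * size         # rows to skip before this page
    total_pages = (total_count + size - 1) // size  # ceiling division
    return {
        "take": take,
        "skip": skip,
        "total_count": total_count,
        "current_page": page,
        "total_pages": total_pages,
    }


print(paginate(page=2, size=50, total_count=120))
# skip=50, total_pages=3
```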

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
- Persist description field on create (was silently dropped)
- Remove phantom key_alias from JWTKeyMappingResponse (not in schema)
- Populate created_by/updated_by audit fields from authenticated user
- Pass actual jwt_valid_token in admin path instead of empty dict
- Restore hash_token on create and fix duplicate try block

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add @log_db_metrics decorator to get_jwt_key_mapping_object for
  consistent DB latency/error tracking with other helpers
- Move virtual key mapping lookup before auth_builder() to avoid
  unnecessary team/user/org DB queries when mapping resolves
- JWT is decoded early; auth_builder only runs when no mapping found

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tion_call

The response.output_item.done handler for function_call type was emitting
finish_reason='tool_calls' and a duplicate tool_call delta. This caused
premature stream termination after the first tool call in multi-tool
scenarios — downstream wrappers (e.g. AnthropicStreamWrapper) would close
the stream before subsequent tool calls arrived.

The response.completed event already inspects the response output list and
emits finish_reason='tool_calls' when function_call items are present, so
output_item.done does not need to (and must not) do so.

This mirrors the existing fix for message-type output_item.done (BerriAI#17246).

Updated test_function_call_done_emits_is_finished (renamed) to assert
finish_reason=None and no duplicate delta. Updated test_text_plus_tool_calls_sequence
to match. Added test_multi_tool_call_stream_no_premature_finish which exercises
a synthetic 2-tool-call stream and verifies no premature termination.
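The division of labor described above can be sketched as two handlers. Event shapes are simplified assumptions; the real converter handles many more event types:

```python
def handle_output_item_done(item: dict) -> dict:
    """Sketch of the fixed output_item.done handler.

    For function_call items, emit neither finish_reason nor a duplicate
    tool_call delta: response.completed is the single place that emits
    finish_reason='tool_calls', so the stream stays open for later
    tool calls in multi-tool scenarios.
    """
    if item.get("type") == "function_call":
        return {"finish_reason": None, "delta": None, "is_finished": False}
    # other item types are handled elsewhere in the real converter
    return {"finish_reason": None, "delta": None, "is_finished": False}


def handle_response_completed(output: list) -> dict:
    """response.completed inspects the response output list and is the
    only event that terminates the stream."""
    has_tool_calls = any(o.get("type") == "function_call" for o in output)
    return {
        "finish_reason": "tool_calls" if has_tool_calls else "stop",
        "is_finished": True,
    }
```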
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Add test_parallel_tool_calls_comprehensive_streaming_integration which
synthesizes the full 10-event Responses API SSE sequence with split
argument deltas and asserts all fix invariants together:

1. output_item.done emits no finish_reason (no premature stream end)
2. Each call_id appears exactly once (no duplicate tool_call chunks)
3. Split argument deltas assemble to correct final JSON
4. Exactly one finish event, at the terminal response.completed chunk
5. Parallel tool calls have distinct indices (output_index 0 and 1)

All 24 unit tests pass.
…ndex

docs: sync main page with docs/index.md
…odes, add tests

- Remove token field from JWTKeyMappingResponse to prevent hashed key exposure
- Use _to_response() helper on all CRUD endpoints to control returned fields
- Return 409 for unique constraint violations, 400 for FK violations, 404 for not found
- Add response_model to endpoint decorators
- Add 8 new unit tests covering error handling and token redaction

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
The gemini-live-2.5-flash-preview-native-audio-09-2025 model only works
with WebSocket (Live API), not REST endpoints. Changed supported_endpoints
from /v1/chat/completions to /vertex_ai/live to reflect the actual
passthrough endpoint available in LiteLLM proxy.
The gemini/ prefix indicates Google AI Studio, which uses /v1/realtime
endpoint (OpenAI-compatible), not /vertex_ai/live.
The mode field is used by health checks to determine the correct
check method (WebSocket for realtime vs REST for chat).
…ed-endpoints

fix: update gemini-live model endpoints and mode to realtime
…reating keys (BerriAI#22816)

* fix(proxy): add guardrails list routes for internal users

* fix(ui): add guardrails fetch with v1/v2 fallback in networking

* fix(ui): allow internal users/team admins to select guardrails in create key modal

* fix(ui): show guardrails selector for internal users in key edit view

* fix(ui): pass canEditGuardrails flag to key info view

* test(ui): add tests for role-based guardrails access in key info view

* test(ui): update key edit view test for guardrails
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
krrishdholakia and others added 27 commits March 5, 2026 16:58
…or DB storage (BerriAI#22936)

When the messages or response JSON fields in spend logs are truncated
before being written to the database, the truncation marker now includes
a note explaining:
- This is a DB storage safeguard
- Full, untruncated data is still sent to logging callbacks (OTEL, Datadog, etc.)
- The MAX_STRING_LENGTH_PROMPT_IN_DB env var can be used to increase the limit

Also emits a verbose_proxy_logger.info message when truncation occurs in
the request body or response spend log paths.

Adds 3 new tests:
- test_truncation_includes_db_safeguard_note
- test_response_truncation_logs_info_message
- test_request_body_truncation_logs_info_message

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
The /team/daily/activity endpoint used Prisma pagination (page_size=1000)
but the UI only fetched page 1. Teams with many keys/models easily exceed
1000 rows in LiteLLM_DailyTeamSpend, causing truncated totals.

Switches the endpoint to use SQL GROUP BY via get_daily_activity_aggregated
with include_entity_breakdown=True, returning all data in a single response
while preserving per-team breakdown. Also adds timezone parameter support.
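The semantics of the GROUP BY this commit moves into SQL can be shown with a pure-Python sketch. Row and field names are assumptions; the point is that summing over all rows per (team_id, date) cannot be truncated the way a single page of 1000 raw rows can:

```python
from collections import defaultdict


def aggregate_daily_team_spend(rows: list) -> dict:
    """Pure-Python equivalent of the SQL GROUP BY described above.

    Each row is an assumed {"team_id", "date", "spend"} dict; totals are
    keyed by (team_id, date), so every row contributes regardless of how
    many rows a team has.
    """
    totals: dict = defaultdict(float)
    for row in rows:
        totals[(row["team_id"], row["date"])] += row["spend"]
    return dict(totals)
```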

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The /organization/list endpoint only checked for PROXY_ADMIN role,
causing PROXY_ADMIN_VIEW_ONLY users to fall into the else branch
which restricts results to orgs the user is a member of. Use the
existing _user_has_admin_view() helper to include both roles.
…r_org_list

Fix admin viewer unable to see all organizations
…eaming (BerriAI#22937)

* feat(ui): add chat message and conversation types

* feat(ui): add useChatHistory hook for localStorage-backed conversations

* feat(ui): add ConversationList sidebar component

* feat(ui): add MCPConnectPicker for attaching MCP servers to chat

* feat(ui): add ModelSelector dropdown for chat

* feat(ui): add ChatInputBar with MCP tool attachment support

* feat(ui): add MCPAppsPanel with list/detail view for MCP servers

* feat(ui): add ChatMessages component; remove auto-scrollIntoView that caused scroll-lock bypass

* feat(ui): add ChatPage — ChatGPT-like UI with scroll lock, MCP tools, streaming

* feat(ui): add /chat route wired to ChatPage

* feat(ui): remove chat from leftnav — chat accessible via navbar button

* feat(ui): add Chat button to top navbar

* feat(ui): add dismissible Chat UI announcement banner to Playground page

* feat(proxy): add Chat UI link to Swagger description

* feat(ui): add react-markdown and syntax-highlighter deps for chat UI

* fix(ui): replace missing BorderOutlined import with inline stop icon div

* fix(ui): apply remark-gfm plugin to ReactMarkdown for GFM support

* fix(ui): remove unused isEvenRow variable in MCPAppsPanel

* fix(ui): add ellipsis when truncating conversation title

* fix(ui): wire search button to chats view; remove non-functional keyboard hint

* fix(ui): use serverRootPath in navbar chat link for sub-path deployments

* fix(ui): remove unused ChatInputBar and ModelSelector files

* fix(ui): correct grid bottom-border condition for odd server count

* fix(chat): move localStorage writes out of setConversations updater (React purity)

* fix(chat): fix stale closure in handleEditAndResend - compute history before async state update

* fix(chat): fix 4 issues in ChatMessages - array redaction, clipboard error, inline detection, remove unused ref
- Add PayGo / Priority Cost Tracking section to Vertex AI provider docs
- Document trafficType to service_tier mapping (ON_DEMAND_PRIORITY, FLEX, etc.)
- Add service tier cost keys to custom pricing docs
- Add provider-specific cost tracking note to spend tracking overview

Made-with: Cursor
docs: add PayGo/priority cost tracking for Gemini Vertex AI
…asoning_effort

fix(gemini): handle 'minimal' reasoning_effort param for gemini-3.1-f…
…gpt-53-oauth-models

feat(models): add ChatGPT 5.3/5.4 aliases + OpenAI gpt-5.4-pro
[Chore] update mcp documentation for header forwarding
…lize-detection

fix(o-series): generalize is_model_o_series_model to match any o+digit prefix
…pport (BerriAI#22945)

* fix(chat): fix router.push paths to use /ui/chat with serverRootPath support

* fix(chat): wrap chat page in Suspense boundary for Next.js static export

* fix(chat): fix clipboard writeText rejection handler - remove undefined message.error call

* feat(chat): rebuild UI with routing fixes

* fix(chat): use useTheme logoUrl + /get_image fallback for sidebar logo

* feat(chat): rebuild UI with logo fix

* fix(chat): use /get_image directly for logo (no ThemeProvider outside dashboard layout)

* feat(chat): add multi-model comparison and provider logos in chat UI

- Replace single model selector with multi-select (up to 3 models)
- Show provider logos next to model names in dropdown (openai, anthropic, gemini, mistral, groq, etc.)
- Selected models float to the top of the dropdown list
- Multi-model mode: responses stream in parallel side-by-side cards below each user message
- Multi-turn: each follow-up message carries full per-model history as context
- Surface API errors inline in response cards instead of silently swallowing them
- Rebuild UI
- Add gpt-5.4-pro and gpt-5.4-pro-2026-03-05 snapshot
- Input: $30/1M tokens, Output: $180/1M tokens
- 1.05M context window, 128K max output tokens
- Priority pricing for >272K input tokens (2x input, 1.5x output)
- Supports reasoning.effort: medium, high, xhigh
- Responses API, Chat Completions, Batch endpoints
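The pricing above can be sketched as a cost function. This assumes priority pricing applies to the whole request once input exceeds 272K tokens; the real cost map may apply it differently (e.g. marginally), so treat this as arithmetic illustration only:

```python
def gpt_5_4_pro_cost(input_tokens: int, output_tokens: int) -> float:
    """Hypothetical cost sketch using the rates listed above.

    Base rates: $30/1M input, $180/1M output. Priority pricing
    (assumed whole-request here): 2x input, 1.5x output when input
    exceeds 272K tokens.
    """
    input_rate = 30 / 1_000_000
    output_rate = 180 / 1_000_000
    if input_tokens > 272_000:
        input_rate *= 2.0
        output_rate *= 1.5
    return input_tokens * input_rate + output_tokens * output_rate
```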

Made-with: Cursor
…el-map

feat(openai): add gpt-5.4-pro to model map
…ls (BerriAI#22934)

OpenAI gpt-5.1, gpt-5.2, and gpt-5.3 chat models all support the
`web_search_options` parameter, but the model cost registry was missing
the `supports_web_search` flag. Only `gpt-5.2-pro` had it set.

Models updated:
- gpt-5.1, gpt-5.1-2025-11-13, gpt-5.1-chat-latest
- gpt-5.2, gpt-5.2-2025-12-11, gpt-5.2-chat-latest
- gpt-5.3-chat-latest
…spend

[Fix] Team Usage Spend Truncated Due to Pagination
Support passing duration=null on /key/update to reset a key's expiry so the key never expires, alongside the existing "-1" magic string (kept for backward compat).
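The expiry semantics can be sketched as follows. The helper name and the simple "<n>d" duration parsing are illustrative assumptions, not LiteLLM's actual parser:

```python
from datetime import datetime, timedelta, timezone


def resolve_expiry(duration):
    """Sketch of the /key/update semantics described above.

    duration=None or the "-1" magic string -> never expires (None).
    Otherwise parse an assumed "<n>d" day duration into an absolute
    UTC expiry timestamp.
    """
    if duration is None or duration == "-1":
        return None
    days = int(duration.rstrip("d"))
    return datetime.now(timezone.utc) + timedelta(days=days)
```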

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…asoning_effort to model cost map

- Shift from hardcoded model checks to dynamic lookup via _supports_factory
- Add supports_none_reasoning_effort for gpt-5.1/5.2/5.4 chat variants
- Add supports_xhigh_reasoning_effort for gpt-5.1-codex-max, gpt-5.2, gpt-5.4+
- Update model_prices_and_context_window.json and backup
- Add ProviderSpecificModelInfo types for new fields
- Fix Azure: use _supports_reasoning_effort_level instead of removed is_model_gpt_5_1_model

Made-with: Cursor
…model_map

feat(gpt-5): Add supports_none_reasoning_effort and supports_xhigh_reasoning_effort to model cost map
- Pass request_model to Azure AI cost calculator to detect router requests
- Add router flat cost ($0.14/M input tokens) even when Azure returns actual model in response
- Add test for router flat cost with response containing actual model
- Update docs with cost calculation flow and configuration requirements

Made-with: Cursor
…ter-cost-tracking

feat(azure_ai): add router flat cost when response contains actual model
…1/messages

- Add forward_llm_provider_auth_headers support from litellm_settings
- When enabled, client x-api-key takes precedence over deployment keys
- Forward x-api-key when x-litellm-api-key or Authorization used for auth
- Fix duplicate patch lines in test_byok_oauth_endpoints.py
- Add Claude Code BYOK documentation with /login and ANTHROPIC_CUSTOM_HEADERS
- Add unit tests for clean_headers x-api-key forwarding logic
- Sync model_prices backup (pre-commit hook)

Made-with: Cursor
…used for LiteLLM proxy auth

When forward_llm_provider_auth_headers=true, Authorization: Bearer <litellm-key> was
being forwarded to Anthropic if it looked like an OAuth key, causing auth failures.

Now checked against authenticated_with_header: if Authorization was used to authenticate
with the proxy, it is always stripped before forwarding to the LLM provider.
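The stripping rule can be sketched as a small filter. The function and parameter names are assumptions, not LiteLLM's actual API; the invariant is that whatever header authenticated the request against the proxy never reaches the provider:

```python
def clean_forward_headers(headers: dict, authenticated_with_header: str) -> dict:
    """Illustrative sketch of the rule described above.

    The header used for proxy auth is always stripped before forwarding,
    so a LiteLLM key in Authorization never reaches Anthropic even when
    it happens to look like an OAuth key.
    """
    return {
        k: v
        for k, v in headers.items()
        if k.lower() != authenticated_with_header.lower()
    }
```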

Made-with: Cursor
feat(proxy): Client-side provider API key precedence for Anthropic /v1/messages (BYOK)
…il traces

- Remove redundant set_attributes() call from raw_gen_ai_request spans so they
  only contain provider-specific llm.{provider}.{param} attributes (Issue #3)
- Replace full set_attributes() on litellm_proxy_request with minimal attributes
  (model, call_type, response.id) when litellm_request child span exists (Issue #4)
- Skip guardrail span creation when no parent context exists to prevent orphaned
  root-level traces in the "All Spans" view (Issue BerriAI#5)
- Fall back to litellm_call_id for gen_ai.response.id when response object has no
  id field (EmbeddingResponse, ImageResponse) (Issue BerriAI#8)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
params = urlencode({"code": auth_code, "state": state})
separator = "&" if "?" in redirect_uri else "?"
location = f"{redirect_uri}{separator}{params}"
return RedirectResponse(url=location, status_code=302)

Check warning: Code scanning / CodeQL

URL redirection from remote source (Medium severity)

Untrusted URL redirection depends on a user-provided value.

Copilot Autofix (AI, 11 days ago)

In general, to fix untrusted URL redirection you should not redirect to arbitrary user-provided URLs. Instead, either (a) maintain a whitelist of allowed redirect URIs (for example, pre-registered per client_id or server_id) and ensure the provided redirect_uri matches one of them, or (b) restrict redirects to same-origin or relative paths by validating the URL structure (e.g., no external host). For OAuth-like flows, the standard approach is to validate that the incoming redirect_uri exactly matches one of the URIs registered for that client.

The least-invasive fix here is to restrict redirect_uri to be either a relative URL (no scheme/host) or, if absolute, to share the same scheme/host/port as the current request (same-origin). That maintains flexibility (callers can still specify paths and query strings) without allowing redirection to arbitrary external domains. We can also treat state as a plain value in the query string, which is already safely encoded by urlencode; the primary vulnerability is the host of the redirect target, not embedding state.

Concretely, in byok_authorize_post (lines 629–676):

  1. Parse the redirect_uri with urlparse.
  2. If it is absolute (has a scheme), require that its scheme is http or https and that its netloc matches the current request’s base URL host (we can derive this via the existing get_request_base_url(request) helper or directly from request.url).
  3. If it is relative (no scheme and no netloc), normalize it to start with / and treat it as a path relative to the current origin.
  4. Build the final redirect URL by combining a trusted base (from get_request_base_url(request)) with the validated/normalized path and query parameters.
  5. Use this trusted location for RedirectResponse.

This preserves existing functionality (clients can still choose where within this site to be redirected and still get their state back) while preventing open redirects to attacker-controlled domains.

Suggested changeset 1: litellm/proxy/_experimental/mcp_server/byok_oauth_endpoints.py

Autofix patch. Run the following command in your local git repository to apply this patch:
cat << 'EOF' | git apply
diff --git a/litellm/proxy/_experimental/mcp_server/byok_oauth_endpoints.py b/litellm/proxy/_experimental/mcp_server/byok_oauth_endpoints.py
--- a/litellm/proxy/_experimental/mcp_server/byok_oauth_endpoints.py
+++ b/litellm/proxy/_experimental/mcp_server/byok_oauth_endpoints.py
@@ -645,11 +645,32 @@
     """
     _purge_expired_codes()
 
-    # Validate redirect_uri scheme to prevent open redirect
+    # Validate redirect_uri to prevent open redirects. Allow only same-origin absolute
+    # URLs or relative paths, and normalize the final redirect against the request base URL.
     parsed_uri = urlparse(redirect_uri)
-    if parsed_uri.scheme not in ("http", "https"):
-        raise HTTPException(status_code=400, detail="Invalid redirect_uri scheme")
+    base_url = get_request_base_url(request)
+    base_parsed = urlparse(base_url)
 
+    if parsed_uri.scheme:
+        # Absolute URL: require http/https and same netloc as the current request.
+        if parsed_uri.scheme not in ("http", "https"):
+            raise HTTPException(status_code=400, detail="Invalid redirect_uri scheme")
+        if parsed_uri.netloc != base_parsed.netloc:
+            raise HTTPException(status_code=400, detail="redirect_uri must be same-origin")
+        safe_path_and_query = parsed_uri.path or "/"
+        if parsed_uri.query:
+            safe_path_and_query += f"?{parsed_uri.query}"
+    else:
+        # Relative URL: do not allow it to specify a host; normalize to a path.
+        if parsed_uri.netloc:
+            raise HTTPException(status_code=400, detail="Invalid redirect_uri")
+        safe_path = parsed_uri.path or "/"
+        if not safe_path.startswith("/"):
+            safe_path = "/" + safe_path
+        safe_path_and_query = safe_path
+        if parsed_uri.query:
+            safe_path_and_query += f"?{parsed_uri.query}"
+
     # Reject new codes if the store is at capacity (prevents memory exhaustion
     # from a burst of abandoned OAuth flows).
     if len(_byok_auth_codes) >= _AUTH_CODES_MAX_SIZE:
@@ -671,8 +690,8 @@
     }
 
     params = urlencode({"code": auth_code, "state": state})
-    separator = "&" if "?" in redirect_uri else "?"
-    location = f"{redirect_uri}{separator}{params}"
+    separator = "&" if "?" in safe_path_and_query else "?"
+    location = f"{base_parsed.scheme}://{base_parsed.netloc}{safe_path_and_query}{separator}{params}"
     return RedirectResponse(url=location, status_code=302)
 
 
EOF
Unable to commit as this autofix suggestion is now outdated
@Harshit28j Harshit28j closed this Mar 6, 2026
Harshit28j added a commit that referenced this pull request Mar 6, 2026
…onse IDs

Addresses 4 critical OpenTelemetry span issues in LiteLLM:

Issue #3: Remove redundant attributes from raw_gen_ai_request spans
- Removed self.set_attributes() call that was duplicating all parent span
  attributes (gen_ai.*, metadata.*) onto the raw span
- Raw span now only contains provider-specific llm.{provider}.* attributes
- Reduces storage and eliminates search confusion from duplicate data

Issue #4: Prevent attribute duplication on litellm_proxy_request parent span
- When litellm_request child span exists, removed redundant
  set_attributes() call on the parent proxy span
- Child span already carries all attributes; parent duplication doubles
  storage and complicates search

Issue BerriAI#5: Fix orphaned guardrail traces
- Guardrail spans were created with context=None when no parent proxy span
  existed, resulting in orphaned root spans (separate trace_id)
- Added _resolve_guardrail_context() helper to ensure guardrails always
  have a valid parent (litellm_request or proxy span)
- Applied fix to both _handle_success and _handle_failure paths
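The parent-resolution logic described for Issue #5 can be sketched in simplified form. The real helper presumably returns an OpenTelemetry Context; here the span objects are stand-ins and only the selection logic is shown:

```python
def resolve_guardrail_parent(litellm_request_span, proxy_span):
    """Simplified sketch of _resolve_guardrail_context as described above.

    Prefer the litellm_request span, fall back to the proxy span, and
    return None when neither exists so the caller skips guardrail span
    creation entirely instead of creating an orphaned root span.
    """
    parent = litellm_request_span or proxy_span
    if parent is None:
        return None  # no valid parent: do not create the guardrail span
    return parent
```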

Issue BerriAI#8: Add gen_ai.response.id for embeddings and image generation
- EmbeddingResponse and ImageResponse types don't have provider response IDs
- Added fallback to standard_logging_payload["id"] (litellm call ID) for
  correlation across LiteLLM UI, Phoenix traces, and provider logs
- Completions still use provider ID (e.g. "chatcmpl-xxx") when available

Tests added:
- TestRawSpanAttributeIsolation: Verify raw span has no gen_ai/metadata attrs
- TestNoParentSpanDuplication: Verify parent span doesn't get duplicated attrs
- TestGuardrailSpanParenting: Verify guardrails are children (not orphaned)
- TestResponseIdFallback: Verify response ID set for all call types

All existing OTEL tests pass (73 passed, 14 pre-existing protocol failures).

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
