[Staging] - Ishaan March 17th #23903

Merged
ishaan-jaff merged 17 commits into main from litellm_ishaan_march_17
Mar 18, 2026
Conversation

@ishaan-jaff
Contributor

[Staging] - Ishaan March 17th

Relevant issues

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have added testing in the tests/test_litellm/ directory. Adding at least 1 test is a hard requirement - see details
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible, it only solves 1 specific problem
  • I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

Delays in PR merge?

If you're seeing a delay in your PR being merged, ping the LiteLLM Team on Slack (#pr-review).

CI (LiteLLM team)

CI status guideline:

  • 50-55 passing tests: main is stable with minor issues.
  • 45-49 passing tests: acceptable, but needs attention.
  • <= 40 passing tests: unstable; be careful with your merges and assess the risk.
  • Branch creation CI run
    Link:

  • CI run for the last commit
    Link:

  • Merge / cherry-pick CI run
    Links:

Type

🆕 New Feature
🐛 Bug Fix
🧹 Refactoring
📖 Documentation
🚄 Infrastructure
✅ Test

Changes

Add three grok-4.20-beta-0309 model variants from xAI:
- grok-4.20-multi-agent-beta-0309 (reasoning + multi-agent)
- grok-4.20-beta-0309-reasoning (reasoning)
- grok-4.20-beta-0309-non-reasoning

Pricing (from https://docs.x.ai/docs/models):
- Input: $2.00/1M tokens ($0.20/1M cached)
- Output: $6.00/1M tokens
- Context: 2M tokens

All variants support vision, function calling, tool choice, and web search.
Closes LIT-2171
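The pricing above can be illustrated with a small cost helper. This is a hypothetical sketch, not LiteLLM's cost calculator; `grok_420_cost` and its signature are made up for illustration, using only the per-token rates listed in this PR:

```python
# Hypothetical helper illustrating the listed grok-4.20 pricing:
# $2.00/1M input tokens, $0.20/1M cached input tokens, $6.00/1M output tokens.
def grok_420_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Return total USD cost; cached input tokens are billed at the discounted rate."""
    billable_input = input_tokens - cached_tokens
    return (
        billable_input * 2e-06    # $2.00 per 1M input tokens
        + cached_tokens * 2e-07   # $0.20 per 1M cached input tokens
        + output_tokens * 6e-06   # $6.00 per 1M output tokens
    )

# 1M uncached input + 1M output -> $2.00 + $6.00
print(round(grok_420_cost(1_000_000, 1_000_000), 2))  # 8.0
```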
@vercel

vercel bot commented Mar 17, 2026

The latest updates on your projects.

Project: litellm · Deployment: Ready · Actions: Preview, Comment · Updated (UTC): Mar 18, 2026 10:14pm


@codspeed-hq
Contributor

codspeed-hq bot commented Mar 17, 2026

Merging this PR will not alter performance

✅ 16 untouched benchmarks


Comparing litellm_ishaan_march_17 (8c23f9f) with main (cbb4c2c)


@greptile-apps
Contributor

greptile-apps bot commented Mar 17, 2026

Greptile Summary

This staging PR introduces several distinct features and fixes across 61 files: the MCPJWTSigner zero-trust guardrail (signs outbound MCP tool calls with a LiteLLM-issued RS256 JWT), an interactive setup wizard (litellm --setup), supporting JWKS/OIDC discovery endpoints, jwt_claims propagation through the auth stack, a correctness fix for a model-variable shadowing bug in the router, and a large volume of cosmetic Black reformatting.

Key changes:

  • MCPJWTSigner guardrail (mcp_jwt_signer.py, ~891 lines): Signs outbound MCP requests after optionally verifying the incoming Bearer token against an upstream IdP. Supports configurable claim operations, two-token model, end-user identity mapping, and required/optional claim validation. Multiple bugs were flagged in previous review rounds (falsy-claim truthiness checks, wrong exception type for revoked tokens, broad except Exception masking infra failures, FastAPI imports in guardrail code, JWKS cache concurrency).
  • OIDC discovery augmentation (discoverable_endpoints.py): Adds /.well-known/jwks.json endpoint and augments /.well-known/openid-configuration when MCPJWTSigner is active. A new issue was found: the augmented response does not update the issuer field to reflect signer.issuer; if a custom issuer is configured, MCP servers performing issuer validation will reject all tokens because the JWT iss won't match the discovery doc issuer.
  • Setup wizard (setup_wizard.py, scripts/install.sh): Arrow-key interactive provider selection, API key validation, and YAML config generation. Previously flagged issues (silent overwrite, hardcoded model lists, unquoted YAML scalars, invalid-key propagation) remain open.
  • Router bug fix (router.py): _deployment_model variable prevents the outer model group name from being clobbered per loop iteration — a real correctness fix.
  • Auth propagation (user_api_key_auth.py, _types.py): jwt_claims is now threaded from JWT auth into UserAPIKeyAuth for downstream guardrail use; jwt_claims: Optional[dict] annotation-without-assignment (previously flagged) is a latent uninitialized-variable risk.
  • Cosmetic reformatting: The bulk of the line-count delta (litellm_logging.py, responses/main.py, etc.) is Black reformatting with no logic changes.

Confidence Score: 2/5

  • Not safe to merge until the MCPJWTSigner bugs and OIDC issuer mismatch are resolved; the setup wizard issues are lower severity but should also be addressed.
  • The PR introduces a large new security-critical feature (MCPJWTSigner) that has multiple confirmed bugs spanning previous review rounds: falsy claims treated as absent in _validate_required_claims and _resolve_end_user_identity, semantically wrong exception type for revoked tokens, a broad except Exception hiding infrastructure failures as HTTP 401, FastAPI imports in guardrail code, and a JWKS cache without concurrency protection. A new issue was found in this round: the OIDC discovery document does not reflect signer.issuer, which will cause token validation failures for any deployment using a custom issuer. The router bug fix and auth jwt_claims threading are good changes. The cosmetic reformatting is safe. The overall score is constrained by the unresolved security guardrail issues.
  • litellm/proxy/guardrails/guardrail_hooks/mcp_jwt_signer/mcp_jwt_signer.py (multiple logic bugs), litellm/proxy/_experimental/mcp_server/discoverable_endpoints.py (OIDC issuer mismatch), litellm/setup_wizard.py (silent config overwrite, unquoted YAML, invalid key propagation)
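The OIDC issuer-mismatch fix flagged above can be sketched as follows. This is a hedged illustration, not the actual code in discoverable_endpoints.py; the `Signer` stand-in class and the function name are assumptions:

```python
from dataclasses import dataclass


@dataclass
class Signer:
    # Stand-in for the MCPJWTSigner configuration (illustrative only)
    issuer: str
    jwks_uri: str


def augment_oidc_discovery(upstream_doc: dict, signer: Signer) -> dict:
    """Copy the discovery doc and align BOTH jwks_uri and issuer with the signer."""
    doc = dict(upstream_doc)        # copy, don't mutate the cached upstream doc
    doc["jwks_uri"] = signer.jwks_uri
    doc["issuer"] = signer.issuer   # the missing piece: without it, MCP servers
                                    # validating `iss` against the discovery doc
                                    # reject every LiteLLM-issued token
    return doc


signer = Signer(
    issuer="https://litellm.example.com",
    jwks_uri="https://litellm.example.com/.well-known/jwks.json",
)
doc = augment_oidc_discovery({"issuer": "https://idp.example.com"}, signer)
```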

Important Files Changed

Filename Overview
litellm/proxy/guardrails/guardrail_hooks/mcp_jwt_signer/mcp_jwt_signer.py New MCPJWTSigner guardrail with several known issues: falsy claim values incorrectly rejected in _validate_required_claims (truthiness check vs. key-presence), wrong exception type (ExpiredSignatureError) for inactive/revoked tokens from introspection, falsy values skipped in _resolve_end_user_identity, broad except Exception masking infrastructure failures as HTTP 401, FastAPI imports inside the guardrail, and module-level JWKS cache lacking concurrency safety — all flagged in previous review threads.
litellm/proxy/_experimental/mcp_server/discoverable_endpoints.py OIDC discovery augmentation adds jwks_uri but doesn't update the issuer field to reflect signer.issuer; when a custom issuer is configured, the discovery doc's issuer will mismatch the JWT's iss claim, causing MCP servers to reject all tokens.
litellm/setup_wizard.py New interactive setup wizard; issues previously flagged include hardcoded model/env-key lists, silent overwrite of existing config, invalid keys silently written to config, and unquoted model_name/model YAML fields. Core wizard flow and fallback terminal handling are well-structured.
litellm/proxy/_experimental/mcp_server/mcp_server_manager.py pre_call_tool_check now returns a hook_result dict with arguments and extra_headers; JWT from hook is correctly forwarded to SSE/HTTP MCP servers but silently discarded for OpenAPI-backed servers (previously flagged). Bearer token extraction from raw headers is correct.
litellm/proxy/guardrails/guardrail_hooks/mcp_jwt_signer/__init__.py Clean guardrail initializer with proper registry entries; mode=None produces an unclean error message (previously flagged). Signer is registered as a logging callback correctly.
litellm/router.py Correct bug fix: introduces _deployment_model to prevent the outer model group name from being overwritten by a per-deployment base model on each loop iteration, fixing incorrect context-window error messages.
litellm/proxy/auth/user_api_key_auth.py Adds jwt_claims propagation through JWT auth paths; jwt_claims: Optional[dict] annotation without initialization (previously flagged) could be unbound in edge cases. The standard JWT auth path correctly assigns jwt_claims from result.get("jwt_claims", None).
scripts/install.sh New POSIX-compatible install script with correct Python version detection, pip/uv/venv handling, and colored output. Uses set -eu correctly for sh compatibility. Resolves the previously flagged missing scripts/install.sh referenced by documentation.
litellm/proxy/proxy_cli.py Adds --setup flag that invokes run_setup_wizard() and returns early; correctly resolves previously flagged missing CLI implementation.
litellm/litellm_core_utils/litellm_logging.py 288 lines changed, all purely cosmetic Black-style reformatting (parenthesis placement for long assignments). No logic changes.

Sequence Diagram

sequenceDiagram
    participant Client
    participant LiteLLMProxy
    participant MCPJWTSigner
    participant IdP
    participant MCPServer

    Client->>LiteLLMProxy: MCP tool call (Bearer token or API key)
    LiteLLMProxy->>MCPJWTSigner: async_pre_call_hook(call_type="call_mcp_tool")

    alt FR-5: Verify + re-sign configured
        MCPJWTSigner->>IdP: Fetch OIDC discovery / JWKS
        IdP-->>MCPJWTSigner: JWKS keys (cached 1h)
        MCPJWTSigner->>MCPJWTSigner: Verify incoming JWT (RS256)
        MCPJWTSigner->>MCPJWTSigner: Validate required_claims (FR-15)
    end

    MCPJWTSigner->>MCPJWTSigner: Resolve end-user identity (FR-12)
    MCPJWTSigner->>MCPJWTSigner: Build outbound claims + scope (FR-10, FR-13)
    MCPJWTSigner->>MCPJWTSigner: Sign JWT with LiteLLM RSA key (RS256)

    opt FR-14: Two-token model
        MCPJWTSigner->>MCPJWTSigner: Sign channel token (separate aud/TTL)
    end

    MCPJWTSigner-->>LiteLLMProxy: extra_headers["Authorization"] = Bearer <signed-JWT>
    LiteLLMProxy->>MCPServer: Tool call + Authorization: Bearer <signed-JWT>

    MCPServer->>LiteLLMProxy: GET /.well-known/jwks.json
    LiteLLMProxy-->>MCPServer: RSA public key (JWKS)
    MCPServer->>MCPServer: Verify JWT signature + claims
    MCPServer-->>LiteLLMProxy: Tool result
    LiteLLMProxy-->>Client: Tool result

Last reviewed commit: "Merge branch 'main' ..."

Comment on lines +32357 to +32403
"xai/grok-4.20-multi-agent-beta-0309": {
"cache_read_input_token_cost": 2e-07,
"input_cost_per_token": 2e-06,
"litellm_provider": "xai",
"max_input_tokens": 2000000,
"max_output_tokens": 2000000,
"max_tokens": 2000000,
"mode": "chat",
"output_cost_per_token": 6e-06,
"source": "https://docs.x.ai/docs/models",
"supports_function_calling": true,
"supports_reasoning": true,
"supports_tool_choice": true,
"supports_vision": true,
"supports_web_search": true
},
"xai/grok-4.20-beta-0309-reasoning": {
"cache_read_input_token_cost": 2e-07,
"input_cost_per_token": 2e-06,
"litellm_provider": "xai",
"max_input_tokens": 2000000,
"max_output_tokens": 2000000,
"max_tokens": 2000000,
"mode": "chat",
"output_cost_per_token": 6e-06,
"source": "https://docs.x.ai/docs/models",
"supports_function_calling": true,
"supports_reasoning": true,
"supports_tool_choice": true,
"supports_vision": true,
"supports_web_search": true
},
"xai/grok-4.20-beta-0309-non-reasoning": {
"cache_read_input_token_cost": 2e-07,
"input_cost_per_token": 2e-06,
"litellm_provider": "xai",
"max_input_tokens": 2000000,
"max_output_tokens": 2000000,
"max_tokens": 2000000,
"mode": "chat",
"output_cost_per_token": 6e-06,
"source": "https://docs.x.ai/docs/models",
"supports_function_calling": true,
"supports_tool_choice": true,
"supports_vision": true,
"supports_web_search": true
},
Contributor


P2 Inconsistent max_tokens value type and generic source URL

Two minor inconsistencies compared to the surrounding xAI entries:

  1. Integer vs float: All nearby xAI entries use floats (2000000.0) for max_input_tokens, max_output_tokens, and max_tokens. The three new entries use bare integers (2000000). While JSON parsing in Python treats them identically at runtime, this is inconsistent with the file's established conventions.

  2. Generic source URL: All other recently-added xAI models link to model-specific documentation pages (e.g., https://docs.x.ai/docs/models/grok-4-1-fast-non-reasoning). The new entries use the generic https://docs.x.ai/docs/models. A model-specific URL improves traceability for future pricing verification.

Suggested change
"xai/grok-4.20-multi-agent-beta-0309": {
"cache_read_input_token_cost": 2e-07,
"input_cost_per_token": 2e-06,
"litellm_provider": "xai",
"max_input_tokens": 2000000,
"max_output_tokens": 2000000,
"max_tokens": 2000000,
"mode": "chat",
"output_cost_per_token": 6e-06,
"source": "https://docs.x.ai/docs/models",
"supports_function_calling": true,
"supports_reasoning": true,
"supports_tool_choice": true,
"supports_vision": true,
"supports_web_search": true
},
"xai/grok-4.20-beta-0309-reasoning": {
"cache_read_input_token_cost": 2e-07,
"input_cost_per_token": 2e-06,
"litellm_provider": "xai",
"max_input_tokens": 2000000,
"max_output_tokens": 2000000,
"max_tokens": 2000000,
"mode": "chat",
"output_cost_per_token": 6e-06,
"source": "https://docs.x.ai/docs/models",
"supports_function_calling": true,
"supports_reasoning": true,
"supports_tool_choice": true,
"supports_vision": true,
"supports_web_search": true
},
"xai/grok-4.20-beta-0309-non-reasoning": {
"cache_read_input_token_cost": 2e-07,
"input_cost_per_token": 2e-06,
"litellm_provider": "xai",
"max_input_tokens": 2000000,
"max_output_tokens": 2000000,
"max_tokens": 2000000,
"mode": "chat",
"output_cost_per_token": 6e-06,
"source": "https://docs.x.ai/docs/models",
"supports_function_calling": true,
"supports_tool_choice": true,
"supports_vision": true,
"supports_web_search": true
},
"xai/grok-4.20-multi-agent-beta-0309": {
"cache_read_input_token_cost": 2e-07,
"input_cost_per_token": 2e-06,
"litellm_provider": "xai",
"max_input_tokens": 2000000.0,
"max_output_tokens": 2000000.0,
"max_tokens": 2000000.0,
"mode": "chat",
"output_cost_per_token": 6e-06,
"source": "https://docs.x.ai/docs/models",
"supports_function_calling": true,
"supports_reasoning": true,
"supports_tool_choice": true,
"supports_vision": true,
"supports_web_search": true
},
"xai/grok-4.20-beta-0309-reasoning": {
"cache_read_input_token_cost": 2e-07,
"input_cost_per_token": 2e-06,
"litellm_provider": "xai",
"max_input_tokens": 2000000.0,
"max_output_tokens": 2000000.0,
"max_tokens": 2000000.0,
"mode": "chat",
"output_cost_per_token": 6e-06,
"source": "https://docs.x.ai/docs/models",
"supports_function_calling": true,
"supports_reasoning": true,
"supports_tool_choice": true,
"supports_vision": true,
"supports_web_search": true
},
"xai/grok-4.20-beta-0309-non-reasoning": {
"cache_read_input_token_cost": 2e-07,
"input_cost_per_token": 2e-06,
"litellm_provider": "xai",
"max_input_tokens": 2000000.0,
"max_output_tokens": 2000000.0,
"max_tokens": 2000000.0,
"mode": "chat",
"output_cost_per_token": 6e-06,
"source": "https://docs.x.ai/docs/models",
"supports_function_calling": true,
"supports_tool_choice": true,
"supports_vision": true,
"supports_web_search": true
},

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Comment on lines +395 to +409
def test_grok_4_20_beta_reasoning_cost_calculation(self):
    """Test cost calculation for grok-4.20-beta-0309-reasoning model."""
    usage = Usage(prompt_tokens=100, completion_tokens=200, total_tokens=300)

    prompt_cost, completion_cost = cost_per_token(
        model="grok-4.20-beta-0309-reasoning", usage=usage
    )

    # Input: 100 tokens * $2e-6 = $0.0002
    # Output: 200 tokens * $6e-6 = $0.0012
    expected_prompt_cost = 100 * 2e-6
    expected_completion_cost = 200 * 6e-6

    assert math.isclose(prompt_cost, expected_prompt_cost, rel_tol=1e-10)
    assert math.isclose(completion_cost, expected_completion_cost, rel_tol=1e-10)
Contributor


P2 Missing reasoning token test for reasoning model

grok-4.20-beta-0309-reasoning is a reasoning model ("supports_reasoning": true in the JSON), yet the test only validates basic token pricing with no reasoning tokens in completion_tokens_details. The XAI cost calculator has a key billing rule: reasoning tokens are added to completion tokens for billing purposes. Without a test case that includes reasoning_tokens, the XAI-specific billing logic isn't exercised for this new model.

Consider adding a variant that mirrors test_reasoning_tokens_cost_calculation — pass a CompletionTokensDetailsWrapper with reasoning_tokens set, and assert that the expected completion cost is (completion_tokens + reasoning_tokens) * output_cost_per_token.
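The billing rule in question can be sketched with the arithmetic inlined as a plain function. This is an illustrative sketch only; the real test would go through `cost_per_token` with a `CompletionTokensDetailsWrapper`, and `xai_completion_cost` is a made-up name for the rule described above:

```python
import math

# xAI billing rule under test: reasoning tokens are billed as additional
# completion tokens at the output rate.
OUTPUT_COST_PER_TOKEN = 6e-06  # grok-4.20 output price from this PR


def xai_completion_cost(completion_tokens: int, reasoning_tokens: int) -> float:
    """Completion cost with reasoning tokens added to the billable output count."""
    return (completion_tokens + reasoning_tokens) * OUTPUT_COST_PER_TOKEN


# 200 visible completion tokens + 300 reasoning tokens -> 500 billable tokens
expected = (200 + 300) * OUTPUT_COST_PER_TOKEN
assert math.isclose(xai_completion_cost(200, 300), expected, rel_tol=1e-10)
```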

* docs: add Quick Install section for litellm --setup wizard

* docs: clarify setup wizard is for local/beginner use
Comment on lines +22 to +24

This detects your OS, installs `litellm[proxy]`, and drops you straight into the setup wizard.

Contributor


P2 Referenced scripts/install.sh does not exist

The install command instructs users to run:

curl -fsSL https://raw.githubusercontent.com/BerriAI/litellm/main/scripts/install.sh | sh

However, scripts/install.sh does not exist anywhere in this repository. A user following this documentation will get a curl error (the -f flag causes a silent failure on HTTP 404), resulting in nothing being installed with no useful error message to diagnose the problem.

This entire "Quick Install" section should only be merged after the referenced scripts/install.sh is created and committed.

Comment on lines +30 to +55
Welcome to LiteLLM

Choose your LLM providers
○ 1. OpenAI GPT-4o, GPT-4o-mini, o1
○ 2. Anthropic Claude Opus, Sonnet, Haiku
○ 3. Azure OpenAI GPT-4o via Azure
○ 4. Google Gemini Gemini 2.0 Flash, 1.5 Pro
○ 5. AWS Bedrock Claude, Llama via AWS
○ 6. Ollama Local models

❯ Provider(s): 1,2

❯ OpenAI API key: sk-...
❯ Anthropic API key: sk-ant-...

❯ Port [4000]:
❯ Master key [auto-generate]:

✔ Config saved → ./litellm_config.yaml

❯ Start the proxy now? (Y/n):

The wizard walks you through:
1. Pick your LLM providers (OpenAI, Anthropic, Azure, Bedrock, Gemini, Ollama)
2. Enter API keys for each provider
Contributor


P2 litellm --setup wizard is not implemented

The docs present a detailed interactive wizard invoked via litellm --setup, but this flag/command does not exist in the litellm CLI (litellm/proxy/proxy_cli.py). Searching the entire codebase finds zero implementations of a --setup argument or an interactive provider wizard.

Users who install the package and then run litellm --setup will get an unrecognized argument error. This documentation should not be published until the wizard is actually implemented.

* feat(setup): add interactive setup wizard + install.sh

Adds `litellm --setup` — a Claude Code-style TUI onboarding wizard that
guides users through provider selection, API key entry, and proxy config
generation, then optionally starts the proxy immediately.

- litellm/setup_wizard.py: wizard with ASCII art, numbered provider menu
  (OpenAI, Anthropic, Azure, Gemini, Bedrock, Ollama), API key prompts,
  port/master-key config, and litellm_config.yaml generation
- litellm/proxy/proxy_cli.py: adds --setup flag that invokes the wizard
- scripts/install.sh: curl-installable script (detect OS/Python, pip
  install litellm[proxy], launch wizard)

Usage:
  curl -fsSL https://raw.githubusercontent.com/BerriAI/litellm/main/scripts/install.sh | sh
  litellm --setup

* fix(install.sh): remove orange color, add LITELLM_BRANCH env var for branch installs

* fix(install.sh): install from git branch so --setup is available for QA

* fix(install.sh): remove stale LITELLM_BRANCH reference that caused unbound variable error

* fix(install.sh): force-reinstall from git to bypass cached PyPI version

* fix(install.sh): show pip progress bar during install

* fix(install.sh): always launch wizard via $PYTHON_BIN -m litellm, not PATH binary

* fix(install.sh): use litellm.proxy.proxy_cli module (no __main__.py exists)

* fix(install.sh): suppress RuntimeWarning from module invocation

* fix(install.sh): use Python bin-dir litellm binary to avoid CWD sys.path shadowing

* fix(install.sh): use sysconfig.get_path('scripts') to find pip-installed litellm binary

* fix(install.sh): redirect stdin from /dev/tty on exec so wizard gets terminal, not exhausted pipe

* fix(install.sh): warn about git clone duration, drop --no-cache-dir so re-runs are faster

* feat(setup_wizard): arrow-key selector, updated model names

* fix(setup_wizard): use sysconfig binary to start proxy, not python -m litellm

* feat(setup_wizard): credential validation after key entry + clear next-steps after proxy start

* style(install.sh): show git clone warning in blue

* refactor(setup_wizard): class with static methods, use check_valid_key from litellm.utils

* address greptile review: fix yaml escaping, port validation, display name collisions, tests

- setup_wizard.py: add _yaml_escape() for safe YAML embedding of API keys
- setup_wizard.py: add _styled_input() with readline ANSI ignore markers
- setup_wizard.py: change DIVIDER to _divider() fn to avoid import-time color capture
- setup_wizard.py: validate port range 1-65535, initialize before loop
- setup_wizard.py: qualify azure display names (azure-gpt-4o) to avoid collision with openai
- setup_wizard.py: work on env_copy in _build_config to avoid mutating caller's dict
- setup_wizard.py: skip model_list entries for providers with no credentials
- setup_wizard.py: prompt for azure deployment name
- setup_wizard.py: wrap os.execlp in try/except with friendly fallback
- setup_wizard.py: wrap config write in try/except OSError
- setup_wizard.py: fix _validate_and_report to use two print lines (no \r overwrite)
- setup_wizard.py: add .gitignore tip next to key storage notice
- setup_wizard.py: fix run_setup_wizard() return type annotation to None
- scripts/install.sh: drop pipefail (not supported by dash on Ubuntu when invoked as sh)
- scripts/install.sh: use litellm[proxy] from PyPI (not hardcoded dev branch)
- scripts/install.sh: guard /dev/tty read with -r check for Docker/CI compat
- scripts/install.sh: remove --force-reinstall to avoid downgrading dependencies
- tests/test_litellm/test_setup_wizard.py: 13 unit tests for _build_config and _yaml_escape

* style: black format setup_wizard.py

* fix: address remaining greptile issues - Windows compat, YAML quoting, credential flow

- guard termios/tty imports with try/except ImportError for Windows compat
- quote master_key as YAML double-quoted scalar (same as env vars)
- remove unused port param from _build_config signature
- _validate_and_report now returns the final key so re-entered creds are stored
- add test for master_key YAML quoting

* fix: add --port to suggested command, guard /dev/tty exec in install.sh

* fix: quote api_base in YAML, skip azure if no deployment, only redraw on state change

* fix: address greptile review comments

- _yaml_escape: add control character escaping (\n, \r, \t)
- test: fix tautological assertion in test_build_config_azure_no_deployment_skipped
- test: add tests for control character escaping in _yaml_escape
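A minimal sketch of the `_yaml_escape` helper described in the commits above: emit a YAML double-quoted scalar with backslashes, quotes, and the flagged control characters escaped. The actual implementation in setup_wizard.py may cover more cases; this is an assumption-laden illustration:

```python
def yaml_escape(value: str) -> str:
    """Return `value` as a YAML double-quoted scalar, escaping unsafe characters."""
    escaped = (
        value.replace("\\", "\\\\")  # backslash first, so later escapes survive
             .replace('"', '\\"')
             .replace("\n", "\\n")
             .replace("\r", "\\r")
             .replace("\t", "\\t")
    )
    return f'"{escaped}"'


# A key containing quotes and a newline becomes a single safe scalar:
print(yaml_escape('sk-"abc"\n'))  # "sk-\"abc\"\n"
```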
Comment on lines +254 to +315
            version = importlib.metadata.version("litellm")
        except Exception:
            version = "unknown"
        print()
        print(orange(LITELLM_ASCII.rstrip("\n")))
        print(f" {orange('Welcome')} to {bold('LiteLLM')} {grey('v' + version)}")
        print()
        print(_divider())
        print()

    # ── provider selector ───────────────────────────────────────────────────

    @staticmethod
    def _select_providers() -> List[Dict]:
        """Arrow-key multi-select. Falls back to number input if /dev/tty unavailable."""
        if not _HAS_RAW_TERMINAL:
            return SetupWizard._select_fallback()
        try:
            return SetupWizard._select_interactive()
        except OSError:
            return SetupWizard._select_fallback()

    @staticmethod
    def _read_key() -> str:
        """Read one keypress from /dev/tty in raw mode."""
        assert (
            termios is not None and tty is not None
        )  # only called when _HAS_RAW_TERMINAL
        with open("/dev/tty", "rb") as tty_fh:
            fd = tty_fh.fileno()
            old = termios.tcgetattr(fd)
            try:
                tty.setraw(fd)
                ch = tty_fh.read(1)
                if ch == b"\x1b":
                    ch2 = tty_fh.read(1)
                    if ch2 == b"[":
                        ch3 = tty_fh.read(1)
                        return "\x1b[" + ch3.decode("utf-8", errors="replace")
                    return "\x1b" + ch2.decode("utf-8", errors="replace")
                return ch.decode("utf-8", errors="replace")
            finally:
                termios.tcsetattr(fd, termios.TCSADRAIN, old)

    @staticmethod
    def _render_selector(cursor: int, selected: Set[int], first_render: bool) -> int:
        """Draw or redraw the provider list. Returns the number of lines printed."""
        lines = [
            f"\n {bold('Add your first model')}\n",
            grey(" ↑↓ to navigate · Space to select · Enter to confirm") + "\n",
            "\n",
        ]
        for i, p in enumerate(PROVIDERS):
            arrow = blue("❯") if i == cursor else " "
            bullet = green("◉") if i in selected else grey("○")
            name_str = bold(p["name"]) if i == cursor else p["name"]
            lines.append(f" {arrow} {bullet} {name_str} {grey(p['description'])}\n")
        lines.append("\n")

        content = "".join(lines)
        if not first_render and _supports_color():
            sys.stdout.write(_MOVE_UP.format(content.count("\n")))
Contributor


P2 Hardcoded env-key names and model lists contradict CLAUDE.md guidance

The PROVIDERS list hardcodes both provider env-key names (e.g. OPENAI_API_KEY, ANTHROPIC_API_KEY) and model lists (e.g. ["gpt-4o", "gpt-4o-mini"]). The CLAUDE.md section added in this same PR explicitly states:

Do not hardcode provider env-key names or model lists that already exist in the codebase.

These env-key names already exist in the respective provider implementations (e.g. litellm.utils, individual provider configs) and the model lists exist in model_prices_and_context_window.json. This creates a maintenance burden: when new popular models are released, two places must be updated (the JSON and the wizard's PROVIDERS list).

The test_model field pattern is a good start — but the models list could be driven by querying get_model_info for each provider's top models rather than duplicating them inline.

Rule Used: CLAUDE.md (source)
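The reviewer's suggestion to drive the menu from the pricing map rather than a hardcoded PROVIDERS list can be sketched as follows. The inline `PRICES` dict stands in for model_prices_and_context_window.json, and `models_for_provider` is a hypothetical helper, not an existing LiteLLM function:

```python
# Stand-in for model_prices_and_context_window.json (illustrative subset).
PRICES = {
    "gpt-4o": {"litellm_provider": "openai", "mode": "chat"},
    "gpt-4o-mini": {"litellm_provider": "openai", "mode": "chat"},
    "claude-3-5-sonnet-latest": {"litellm_provider": "anthropic", "mode": "chat"},
}


def models_for_provider(provider: str, prices: dict) -> list:
    """Return chat models registered for a provider, sorted for a stable menu."""
    return sorted(
        name
        for name, info in prices.items()
        if info.get("litellm_provider") == provider and info.get("mode") == "chat"
    )


print(models_for_provider("openai", PRICES))  # ['gpt-4o', 'gpt-4o-mini']
```

With this approach, a newly released model only needs to be added to the JSON; the wizard's menu picks it up automatically.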

Comment on lines +236 to +244
        config_path = Path(os.getcwd()) / "litellm_config.yaml"
        try:
            config_path.write_text(
                SetupWizard._build_config(providers, env_vars, master_key)
            )
        except OSError as exc:
            print(f"\n {bold(_CROSS + ' Could not write config:')} {exc}")
            print(" Try running from a directory you have write access to.\n")
            return
Contributor


P1 Silent overwrite of existing config

config_path.write_text(...) overwrites litellm_config.yaml without checking whether the file already exists. A user who runs litellm --setup a second time after having manually customised their config will silently lose all those edits. There is no backup, no diff, and no confirmation prompt.

Consider adding an existence check before writing:

Suggested change
        config_path = Path(os.getcwd()) / "litellm_config.yaml"
        try:
            config_path.write_text(
                SetupWizard._build_config(providers, env_vars, master_key)
            )
        except OSError as exc:
            print(f"\n {bold(_CROSS + ' Could not write config:')} {exc}")
            print(" Try running from a directory you have write access to.\n")
            return
        config_path = Path(os.getcwd()) / "litellm_config.yaml"
        if config_path.exists():
            overwrite = _styled_input(
                f" {blue('❯')} {bold(str(config_path))} already exists. Overwrite? {grey('(y/N)')}: "
            )
            if overwrite.lower() != "y":
                print(f"\n {grey('Setup cancelled — existing config was not changed.')}\n")
                return
        try:
            config_path.write_text(
                SetupWizard._build_config(providers, env_vars, master_key)
            )

Comment on lines +480 to +485
print(f" {_CROSS} {bold(provider['name'])} {grey('— invalid API key')}")
if (
_styled_input(f" {blue('❯')} Re-enter key? {grey('(y/N)')}: ").lower()
!= "y"
):
return api_key
Contributor


P1 Invalid key silently written to config

When check_valid_key reports that the key is invalid and the user answers "N" to the re-entry prompt, _validate_and_report returns the original (invalid) key unchanged. That key then flows back through _collect_keys into env_vars and is ultimately written verbatim into the environment_variables block of litellm_config.yaml.

When the user later starts the proxy and all requests to that provider fail with authentication errors, there is nothing in the config or any log to indicate this was flagged during setup. At a minimum, the caller (_collect_keys) should be told that validation failed so it can skip that provider's models in the generated config, or the return value should carry an is_valid flag.

The current callsite:

env_vars[p["env_key"]] = SetupWizard._validate_and_report(p, key)

silently uses whatever is returned, including a key already known to be wrong.
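The suggested fix can be sketched as returning a `(key, is_valid)` pair so the caller can skip providers whose keys failed validation. All names here are illustrative, not the actual setup_wizard.py API; the validator is injected as a plain callable to keep the sketch self-contained:

```python
def validate_and_report(key: str, is_key_valid) -> tuple:
    """Return the key plus a validity flag instead of silently passing bad keys on."""
    return key, bool(is_key_valid(key))


def collect_keys(entered: dict, is_key_valid) -> dict:
    """Only keys that validated successfully reach the generated config."""
    env_vars = {}
    for env_key, key in entered.items():
        key, ok = validate_and_report(key, is_key_valid)
        if ok:
            env_vars[env_key] = key
        # invalid providers are skipped rather than written verbatim to YAML
    return env_vars


good = collect_keys(
    {"OPENAI_API_KEY": "sk-good", "ANTHROPIC_API_KEY": "bad"},
    lambda k: k.startswith("sk-"),  # stand-in for check_valid_key
)
print(good)  # {'OPENAI_API_KEY': 'sk-good'}
```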

Comment on lines +547 to +549
        lines += [
            f"  - model_name: {display}",
            "    litellm_params:",
Contributor


P2 Unquoted model_name and model fields in generated YAML

model_name and model values are written as bare YAML scalars without quoting:

lines += [
    f"  - model_name: {display}",
    "    litellm_params:",
    f"      model: {model}",
]

Current hardcoded names happen to be safe, but the Bedrock model (bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0) produces model_name: anthropic.claude-3-5-sonnet-20241022-v2:0 — a colon-containing unquoted scalar. YAML parsers handle v2:0 (no space after :) correctly today, but any future model whose display name contains : (colon-space) or starts with {, [, >, |, &, *, # would silently break the generated config. Quoting both fields defensively avoids this class of failure:

lines += [
    f'  - model_name: "{display}"',
    "    litellm_params:",
    f'      model: "{model}"',
]

…P auth (#23897)

* Allow pre_mcp_call guardrail hooks to mutate outbound MCP headers

* Enhance MCPServerManager to support hook-modified arguments and extra headers. Update tests to validate argument mutation and header injection behavior, including warnings for OpenAPI-backed servers when headers are present.

* Refactor MCPServerManager to raise HTTPException for extra headers in OpenAPI-backed servers. Update tests to reflect this change, ensuring proper exception handling instead of logging warnings.

* feat(guardrails): add MCPJWTSigner built-in guardrail for zero trust MCP auth

Signs outbound MCP tool calls with a LiteLLM-issued RS256 JWT so MCP servers
can trust a single signing authority instead of every upstream IdP.

Enable in config.yaml:
  guardrails:
    - guardrail_name: mcp-jwt-signer
      litellm_params:
        guardrail: mcp_jwt_signer
        mode: pre_mcp_call
        default_on: true

JWT carries sub (user_id), act.sub (team_id, RFC 8693), tool-level scope, iss,
aud, iat/exp/nbf. RSA-2048 keypair auto-generated at startup unless
MCP_JWT_SIGNING_KEY env var is set.

Adds /.well-known/jwks.json endpoint and jwks_uri to /.well-known/openid-configuration
so MCP servers can verify LiteLLM-issued tokens via OIDC discovery.
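
For concreteness, a claim set following the commit description might look like the sketch below. All values are illustrative; the `scope` string format in particular is an assumption, not taken from the PR:

```python
import json
import time

now = int(time.time())

# Illustrative claim set: sub carries the LiteLLM user, act.sub carries
# the team per RFC 8693, and scope is limited to the tool being invoked.
claims = {
    "iss": "https://litellm.example.com",   # assumed issuer URL
    "aud": "mcp-server",                     # assumed audience value
    "sub": "user_123",
    "act": {"sub": "team_456"},
    "scope": "mcp:tools/call:get_weather",   # assumed scope format
    "iat": now,
    "nbf": now,
    "exp": now + 300,                        # short-lived token
}
print(json.dumps(claims, indent=2))
```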

* Update MCPServerManager to raise HTTPException with status code 400 for extra headers in OpenAPI-backed servers. Adjust tests to verify the correct status code and exception message.

* fix: address P1 issues in MCPJWTSigner

- OpenAPI servers: warn + skip header injection instead of 500
- JWKS Cache-Control: 5min for auto-generated keys, 1h for persistent
- sub claim: fallback to apikey:{token_hash} for anonymous callers
- ttl_seconds: validate > 0 at init time

* docs: add MCP zero trust auth guide with architecture diagram

* docs: add FastMCP JWT verification guide to zero trust doc

* fix: address remaining Greptile review issues (round 2)

- mcp_server_manager: warn when hook Authorization overwrites existing header
- __init__: remove _mcp_jwt_signer_instance from __all__ (private internal)
- discoverable_endpoints: copy dict instead of mutating in-place on OIDC augmentation
- test docstring: reflect warn-and-continue behavior for OpenAPI servers
- test: update scope assertions for least-privilege (no mcp:tools/list on tool-call JWTs)

* fix: address Greptile round 3 feedback

- initialize_guardrail: validate mode='pre_mcp_call' at init time — misconfigured
  mode silently bypasses JWT injection, which is a zero-trust bypass
- _build_claims: remove duplicate inline 'import re' (module-level import already present)
- _types.py: add TODO comment explaining jwt_claims is forward-compat plumbing
  for a follow-up PR that will forward upstream IdP claims into outbound MCP JWTs

* feat(mcp_jwt_signer): add verify+re-sign, claim ops, two-token model, configurable scopes

Addresses all missing pieces from the scoping doc review:

FR-5 (Verify + re-sign): MCPJWTSigner now accepts access_token_discovery_uri
and token_introspection_endpoint.  When set, the incoming Bearer token is
extracted from raw_headers (threaded through pre_call_tool_check), verified
against the IdP's JWKS (JWT) or introspected (opaque), and only re-signed if
valid.  Falls back to user_api_key_dict.jwt_claims for LiteLLM JWT-auth mode.

FR-12 (Configurable end-user identity mapping): end_user_claim_sources
ordered list drives sub resolution — sources: token:<claim>, litellm:user_id,
litellm:email, litellm:end_user_id, litellm:team_id.

FR-13 (Claim operations): add_claims (insert-if-absent), set_claims (always
override), remove_claims (delete) applied in that order.

FR-14 (Two-token model): channel_token_audience + channel_token_ttl issue a
second JWT injected as x-mcp-channel-token: Bearer <token>.

FR-15 (Incoming claim validation): required_claims raises HTTP 403 when any
listed claim is absent; optional_claims passes listed claims from verified
token into the outbound JWT.

FR-9 (Debug headers): debug_headers: true emits x-litellm-mcp-debug with kid,
sub, iss, exp, scope.

FR-10 (Configurable scopes): allowed_scopes replaces auto-generation.  Also
fixed: tool-call JWTs no longer grant mcp:tools/list (overpermission).

P1 fixes:
- proxy/utils.py: _convert_mcp_hook_response_to_kwargs merges rather than
  replaces extra_headers, preserving headers from prior guardrails.
- mcp_server_manager.py: warns when hook injects Authorization alongside a
  server-configured authentication_token (previously silent).
- mcp_server_manager.py: pre_call_tool_check now accepts raw_headers and
  extracts incoming_bearer_token so FR-5 verification has the raw token.
- proxy/utils.py: remove stray inline import inspect inside loop (pre-existing
  lint error, now cleaned up).

Tests: 43 passing (28 new tests covering all FR flags + P1 fixes).

* feat(mcp_jwt_signer): add verify+re-sign, claim ops, two-token model, configurable scopes (core)

Remaining files from the FR implementation:

mcp_jwt_signer.py — full rewrite with all new params:
  FR-5:  access_token_discovery_uri, token_introspection_endpoint,
         verify_issuer, verify_audience + _verify_incoming_jwt(),
         _introspect_opaque_token()
  FR-12: end_user_claim_sources ordered resolution chain
  FR-13: add_claims, set_claims, remove_claims
  FR-14: channel_token_audience, channel_token_ttl → x-mcp-channel-token
  FR-15: required_claims (raises 403), optional_claims (passthrough)
  FR-9:  debug_headers → x-litellm-mcp-debug
  FR-10: allowed_scopes; tool-call JWTs no longer over-grant tools/list

mcp_server_manager.py:
  - pre_call_tool_check gains raw_headers param to extract incoming_bearer_token
  - Silent Authorization override warning fixed: now fires when server has
    authentication_token AND hook injects Authorization

tests/test_mcp_jwt_signer.py:
  28 new tests covering all FR flags + P1 fixes (43 total, all passing)

* fix(mcp_jwt_signer): address pre-landing review issues

- Remove stale TODO comment on UserAPIKeyAuth.jwt_claims — the field is
  already populated and consumed by MCPJWTSigner in the same PR
- Fix _get_oidc_discovery to only cache the OIDC discovery doc when
  jwks_uri is present; a malformed/empty doc now retries on the next
  request instead of being permanently cached until proxy restart
- Add FR-5 test coverage for _fetch_jwks (cache hit/miss),
  _get_oidc_discovery (cache/no-cache on bad doc), _verify_incoming_jwt
  (valid token, expired token), _introspect_opaque_token (active,
  inactive, no endpoint), and the end-to-end 401 hook path — 53 tests
  total, all passing

* docs(mcp_zero_trust): rewrite as use-case guide covering all new JWT signer features

Add scenario-driven sections for each new config area:
- Verify+re-sign with Okta/Azure AD (access_token_discovery_uri,
  end_user_claim_sources, token_introspection_endpoint)
- Enforcing caller attributes with required_claims / optional_claims
- Adding metadata via add_claims / set_claims / remove_claims
- Two-token model for AWS Bedrock AgentCore Gateway
  (channel_token_audience / channel_token_ttl)
- Controlling scopes with allowed_scopes
- Debugging JWT rejections with debug_headers

Update JWT claims table to reflect configurable sub (end_user_claim_sources)

* fix(mcp_jwt_signer): wire all config.yaml params through initialize_guardrail

The factory was only passing issuer/audience/ttl_seconds to MCPJWTSigner.
All FR-5/9/10/12/13/14/15 params (access_token_discovery_uri,
end_user_claim_sources, add/set/remove_claims, channel_token_audience,
required/optional_claims, debug_headers, allowed_scopes, etc.) were
silently dropped, making every advertised advanced feature non-functional
when loaded from config.yaml.

Add regression test that asserts every param is wired through correctly.

* docs(mcp_zero_trust): add hero image

* docs(mcp_zero_trust): apply Linear-style edits

- Lead with the problem (unsigned direct calls bypass access controls)
- Shorter statement section headers instead of question-form headers
- Move diagram/OIDC discovery block after the reader is bought in
- Add 'read further only if you need to' callout after basic setup
- Two-token section now opens from the user problem not product jargon
- Add concrete 403 error response example in required_claims section
- Debug section opens from the symptom (MCP server returning 401)
- Lowercase claims reference header for consistency

* fix(mcp_jwt_signer): fix algorithm confusion attack + add OIDC discovery 24h TTL

- Remove alg from unverified JWT header; use signing_jwk.algorithm_name from JWKS key instead.
  Reading alg from attacker-controlled headers enables alg:none / HS256 confusion attacks.
- Add _oidc_discovery_fetched_at timestamp and _OIDC_DISCOVERY_TTL = 86400 (24h).
  Without a TTL the cached discovery doc never refreshes, so IdP key rotation is invisible.

---------

Co-authored-by: Noah Nistler <60981020+noahnistler@users.noreply.github.com>
Comment on lines +526 to +536
missing = [c for c in self.required_claims if not (jwt_claims or {}).get(c)]
if missing:
    raise HTTPException(
        status_code=403,
        detail={
            "error": (
                f"MCPJWTSigner: incoming token is missing required claims: "
                f"{missing}. Configure the IdP to include these claims."
            )
        },
    )

P1 Falsy-value claims incorrectly reported as missing

The check not (jwt_claims or {}).get(c) treats falsy claim values (False, 0, "", []) as absent — meaning a JWT containing {"email_verified": false} or {"retry_count": 0} would fail required_claims validation and block the request with HTTP 403, even though the claim is present.

The correct check tests for key absence, not value truthiness:

Suggested change
missing = [c for c in self.required_claims if not (jwt_claims or {}).get(c)]
if missing:
    raise HTTPException(
        status_code=403,
        detail={
            "error": (
                f"MCPJWTSigner: incoming token is missing required claims: "
                f"{missing}. Configure the IdP to include these claims."
            )
        },
    )
missing = [c for c in self.required_claims if c not in (jwt_claims or {})]
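
The difference is easy to demonstrate. This standalone sketch contrasts the two checks against a claim set with present-but-falsy values:

```python
jwt_claims = {"email_verified": False, "retry_count": 0, "sub": "user_123"}
required = ["email_verified", "retry_count", "aud"]

# Truthiness check: present-but-falsy claims are wrongly flagged as missing.
missing_truthy = [c for c in required if not jwt_claims.get(c)]

# Key-presence check: only the genuinely absent claim is flagged.
missing_present = [c for c in required if c not in jwt_claims]

print(missing_truthy)   # ['email_verified', 'retry_count', 'aud']
print(missing_present)  # ['aud']
```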

Comment on lines +503 to +507
if not result.get("active", False):
    raise jwt.exceptions.ExpiredSignatureError(  # type: ignore[attr-defined]
        "MCPJWTSigner: incoming token is inactive (introspection returned active=false)"
    )
return result

P1 Wrong exception type for inactive token

jwt.exceptions.ExpiredSignatureError semantically means the token's exp claim is in the past. Using it to signal that an introspection endpoint returned active=false is misleading — the two conditions are distinct (e.g., a token could be revoked long before it expires).

When this is caught in async_pre_call_hook and re-raised as an HTTP 401, the error message will claim "expired signature" even when the real issue is token revocation or suspension, making production debugging much harder.

Use a more semantically accurate exception:

Suggested change
if not result.get("active", False):
    raise jwt.exceptions.ExpiredSignatureError(  # type: ignore[attr-defined]
        "MCPJWTSigner: incoming token is inactive (introspection returned active=false)"
    )
return result
raise jwt.exceptions.InvalidTokenError(
    "MCPJWTSigner: incoming token is inactive (introspection returned active=false)"
)

Comment on lines 2282 to +2293
  # For OpenAPI servers, call the tool handler directly instead of via MCP client
  if mcp_server.spec_path:
      verbose_logger.debug(
-         f"Calling OpenAPI tool {name} directly via HTTP handler"
+         "Calling OpenAPI tool %s directly via HTTP handler", name
      )
      if hook_result.get("extra_headers"):
          verbose_logger.warning(
              "pre_mcp_call hook returned extra_headers for OpenAPI-backed "
              "MCP server '%s' — header injection is not supported for "
              "OpenAPI servers; headers will be ignored. Use SSE/HTTP "
              "transport to enable hook header injection.",
              server_name,

P1 JWT silently dropped for OpenAPI-backed servers

When MCPJWTSigner is active and a tool call goes to an OpenAPI-backed MCP server, the signed JWT in hook_result["extra_headers"] is silently discarded after the warning log. The caller receives no indication that JWT authentication was not applied to the upstream call.

This creates a silent security gap: an operator may believe JWT enforcement is active, but calls to OpenAPI-backed servers bypass it entirely. The warning in the server log is easy to miss.

Consider either:

  1. Raising an HTTPException (or GuardrailRaisedException) to fail fast if MCPJWTSigner injects an Authorization header that cannot be forwarded, or
  2. Documenting this limitation prominently in the config YAML doc comment so operators are warned at setup time.

At minimum, the warning message should be escalated to error level so it stands out in logs.

Comment on lines +92 to +97
# Module-level singleton for the JWKS discovery endpoint to access.
_mcp_jwt_signer_instance: Optional["MCPJWTSigner"] = None

# Simple in-memory JWKS cache: keyed by JWKS URI → (keys_list, fetched_at).
_jwks_cache: Dict[str, tuple] = {}
_JWKS_CACHE_TTL = 3600 # 1 hour

P2 Module-level JWKS cache is not concurrency-safe

_jwks_cache is a plain module-level dict with a non-atomic read-check-write pattern in _fetch_jwks. In a high-concurrency async environment multiple coroutines can simultaneously observe a stale/missing cache entry and all proceed to fire HTTP requests to the same JWKS URI. This results in redundant IdP requests rather than incorrect behavior — but it is worth noting, especially for deployments under heavy load.

Consider using asyncio.Lock() per URI or an asyncio.Semaphore to serialise JWKS refreshes:

_jwks_cache: Dict[str, tuple] = {}
_jwks_cache_locks: Dict[str, "asyncio.Lock"] = {}

Then acquire the lock per jwks_uri before checking-and-refreshing the cache.

…ty CVEs, router bug, batch resolution

Fix 1: Run Black formatter on 35 files
Fix 2: Fix MyPy type errors:
  - setup_wizard.py: add type annotation for 'selected' set variable
  - user_api_key_auth.py: remove redundant type annotation on jwt_claims reassignment
Fix 3: Fix spend accuracy test burst 2 polling to wait for expected total
  spend instead of just 'any increase' from burst 2
Fix 4: Bump Next.js 16.1.6 -> 16.1.7 to fix CVE-2026-27978, CVE-2026-27979,
  CVE-2026-27980, CVE-2026-29057
Fix 5: Fix router _pre_call_checks model variable being overwritten inside
  loop, causing wrong model lookups on subsequent deployments. Use local
  _deployment_model variable instead.
Fix 6: Add missing resolve_output_file_ids_to_unified call in batch retrieve
  non-terminal-to-terminal path (matching the terminal path behavior)

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ ishaan-jaff
❌ cursoragent

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

from fastapi import HTTPException

missing = [c for c in self.required_claims if not (jwt_claims or {}).get(c)]

P2 Falsy-value claims silently treated as absent

not (jwt_claims or {}).get(c) treats any falsy value (False, 0, "", []) as absent. A JWT containing {"email_verified": false} or {"retry_count": 0} would incorrectly fail required_claims validation and block the request with HTTP 403, even though the claim is present in the token.

The correct check should test for key absence, not value truthiness:

Suggested change
missing = [c for c in self.required_claims if not (jwt_claims or {}).get(c)]
missing = [c for c in self.required_claims if c not in (jwt_claims or {})]

Comment on lines +505 to +509
if not result.get("active", False):
    raise jwt.exceptions.ExpiredSignatureError(  # type: ignore[attr-defined]
        "MCPJWTSigner: incoming token is inactive (introspection returned active=false)"
    )
return result

P2 Wrong exception type for inactive/revoked token

jwt.exceptions.ExpiredSignatureError semantically means the token's exp claim is in the past. Using it to signal that an introspection endpoint returned active=false conflates two distinct conditions — a token can be revoked long before it expires.

When caught upstream in async_pre_call_hook and re-raised as HTTP 401, the error message will say "expired signature" even for revoked or suspended tokens, making production debugging much harder.

Use a more semantically accurate exception:

Suggested change
if not result.get("active", False):
    raise jwt.exceptions.ExpiredSignatureError(  # type: ignore[attr-defined]
        "MCPJWTSigner: incoming token is inactive (introspection returned active=false)"
    )
return result
if not result.get("active", False):
    raise jwt.exceptions.InvalidTokenError(
        "MCPJWTSigner: incoming token is inactive (introspection returned active=false)"
    )

Comment on lines +530 to +542
        raise HTTPException(
            status_code=403,
            detail={
                "error": (
                    f"MCPJWTSigner: incoming token is missing required claims: "
                    f"{missing}. Configure the IdP to include these claims."
                )
            },
        )

    # ------------------------------------------------------------------
    # FR-12: End-user identity mapping
    # ------------------------------------------------------------------

P2 FastAPI import inside the litellm/ SDK directory

from fastapi import HTTPException is imported in _validate_required_claims (line ~533) and async_pre_call_hook (line ~595). Per the project rule, FastAPI is a proxy-only dependency and must not be imported in SDK files. While this file lives under litellm/proxy/guardrails/, the import pattern used (lazy inside a method) is still fragile and would fail if the guardrail is ever invoked in a non-proxy SDK context.

Consider raising a litellm.exceptions.AuthenticationError or a plain ValueError/PermissionError that the proxy layer can then translate into an HTTPException. This also decouples the guardrail from FastAPI's request lifecycle.

Rule Used: What: Do not allow fastapi imports on files outsid... (source)

Comment on lines +92 to +115
# Module-level singleton for the JWKS discovery endpoint to access.
_mcp_jwt_signer_instance: Optional["MCPJWTSigner"] = None

# Simple in-memory JWKS cache: keyed by JWKS URI → (keys_list, fetched_at).
_jwks_cache: Dict[str, tuple] = {}
_JWKS_CACHE_TTL = 3600  # 1 hour


def get_mcp_jwt_signer() -> Optional["MCPJWTSigner"]:
    """Return the active MCPJWTSigner singleton, or None if not initialized."""
    return _mcp_jwt_signer_instance


def _load_private_key_from_env(env_var: str) -> RSAPrivateKey:
    """Load an RSA private key from an env var (PEM string or file:// path)."""
    key_material = os.environ.get(env_var, "")
    if not key_material:
        raise ValueError(
            f"MCPJWTSigner: environment variable '{env_var}' is set but empty."
        )
    if key_material.startswith("file://"):
        path = key_material[len("file://") :]
        with open(path, "rb") as f:
            key_bytes = f.read()

P2 Module-level JWKS cache lacks concurrency safety

_jwks_cache is a plain module-level dict with a non-atomic read-check-write pattern in _fetch_jwks. In a high-concurrency async environment, multiple coroutines can simultaneously observe a stale/missing cache entry and all fire HTTP requests to the same JWKS URI, causing redundant IdP requests. Under heavy load this can inadvertently trigger rate limits on the IdP.

Consider protecting the refresh with a per-URI asyncio.Lock:

_jwks_cache: Dict[str, tuple] = {}
_jwks_cache_locks: Dict[str, asyncio.Lock] = {}

async def _fetch_jwks(jwks_uri: str) -> List[Dict[str, Any]]:
    now = time.time()
    cached = _jwks_cache.get(jwks_uri)
    if cached and now - cached[1] < _JWKS_CACHE_TTL:
        return cached[0]

    if jwks_uri not in _jwks_cache_locks:
        _jwks_cache_locks[jwks_uri] = asyncio.Lock()
    async with _jwks_cache_locks[jwks_uri]:
        # Re-check after acquiring lock
        cached = _jwks_cache.get(jwks_uri)
        if cached and now - cached[1] < _JWKS_CACHE_TTL:
            return cached[0]
        # ... fetch and store

Comment on lines 2282 to +2291
  # For OpenAPI servers, call the tool handler directly instead of via MCP client
  if mcp_server.spec_path:
      verbose_logger.debug(
-         f"Calling OpenAPI tool {name} directly via HTTP handler"
+         "Calling OpenAPI tool %s directly via HTTP handler", name
      )
      if hook_result.get("extra_headers"):
          verbose_logger.warning(
              "pre_mcp_call hook returned extra_headers for OpenAPI-backed "
              "MCP server '%s' — header injection is not supported for "
              "OpenAPI servers; headers will be ignored. Use SSE/HTTP "

P2 JWT silently dropped for OpenAPI-backed MCP servers

When MCPJWTSigner is active and a tool call targets an OpenAPI-backed MCP server, the signed JWT in hook_result["extra_headers"] is silently discarded after the warning log. The caller receives no indication that JWT authentication was not forwarded to the upstream.

This creates a silent security gap: operators may believe JWT enforcement is active for all MCP servers, but calls to OpenAPI-backed servers bypass it entirely. The warning message is easy to miss in logs.

Consider either raising an HTTPException to fail fast if the operator expects JWT enforcement, or escalating the log level to error so the gap is visible in monitoring:

verbose_logger.error(
    "pre_mcp_call hook returned extra_headers for OpenAPI-backed "
    "MCP server '%s' — header injection is NOT supported; the "
    "JWT will NOT be forwarded. Switch to SSE/HTTP transport or "
    "disable MCPJWTSigner for this server.",
    server_name,
)

Comment on lines +54 to +128
"name": "Anthropic",
"description": "Claude Opus 4.6, Sonnet 4.6, Haiku 4.5",
"env_key": "ANTHROPIC_API_KEY",
"key_hint": "sk-ant-...",
"test_model": "claude-haiku-4-5-20251001",
"models": ["claude-opus-4-6", "claude-sonnet-4-6", "claude-haiku-4-5-20251001"],
},
{
"id": "gemini",
"name": "Google Gemini",
"description": "Gemini 2.0 Flash, Gemini 2.5 Pro",
"env_key": "GEMINI_API_KEY",
"key_hint": "AIza...",
"test_model": "gemini/gemini-2.0-flash",
"models": ["gemini/gemini-2.0-flash", "gemini/gemini-2.5-pro"],
},
{
"id": "azure",
"name": "Azure OpenAI",
"description": "GPT-4o via Azure",
"env_key": "AZURE_API_KEY",
"key_hint": "your-azure-key",
"test_model": None, # needs deployment name — skip validation
"models": [],
"needs_api_base": True,
"api_base_hint": "https://<resource>.openai.azure.com/",
"api_version": "2024-07-01-preview",
},
{
"id": "bedrock",
"name": "AWS Bedrock",
"description": "Claude 3.5, Llama 3 via AWS",
"env_key": "AWS_ACCESS_KEY_ID",
"key_hint": "AKIA...",
"test_model": None, # multi-key auth — skip validation
"models": ["bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0"],
"extra_keys": ["AWS_SECRET_ACCESS_KEY", "AWS_REGION_NAME"],
"extra_hints": ["your-secret-key", "us-east-1"],
},
{
"id": "ollama",
"name": "Ollama",
"description": "Local models (llama3.2, mistral, etc.)",
"env_key": None,
"key_hint": None,
"test_model": None, # local — no remote validation
"models": ["ollama/llama3.2", "ollama/mistral"],
"api_base": "http://localhost:11434",
},
]


# ---------------------------------------------------------------------------
# ANSI colour helpers
# ---------------------------------------------------------------------------

_ANSI_RE = re.compile(r"\033\[[^m]*m")

_ORANGE = "\033[38;2;215;119;87m"
_DIM = "\033[2m"
_BOLD = "\033[1m"
_GREEN = "\033[38;2;78;186;101m"
_BLUE = "\033[38;2;177;185;249m"
_GREY = "\033[38;2;153;153;153m"
_RESET = "\033[0m"
_CHECK = "✔"
_CROSS = "✘"

_CURSOR_HIDE = "\033[?25l"
_CURSOR_SHOW = "\033[?25h"
_MOVE_UP = "\033[{}A"


def _supports_color() -> bool:
return sys.stdout.isatty() and os.environ.get("NO_COLOR") is None

P2 Hardcoded provider env-key names and model lists

The PROVIDERS list hardcodes both provider env-key names (e.g. OPENAI_API_KEY, ANTHROPIC_API_KEY) and model lists (e.g. ["gpt-4o", "gpt-4o-mini"]). The CLAUDE.md section added in this same PR explicitly states:

Do not hardcode provider env-key names or model lists that already exist in the codebase.

These env-key names already exist in the respective provider implementations and the model lists exist in model_prices_and_context_window.json. When new popular models are released, two places must be updated.

The test_model field pattern is a good start — but the models list could be driven by querying get_model_info for each provider's top models rather than duplicating them inline.

Rule Used: CLAUDE.md (source)
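
A sketch of driving the list from the pricing map instead; the inline dict stands in for the real model_prices_and_context_window.json, whose entries do carry a `litellm_provider` field (the `models_for` helper and `limit` parameter are illustrative):

```python
# Stand-in for json.load(open("model_prices_and_context_window.json")).
model_prices = {
    "gpt-4o": {"litellm_provider": "openai", "mode": "chat"},
    "gpt-4o-mini": {"litellm_provider": "openai", "mode": "chat"},
    "claude-haiku-4-5-20251001": {"litellm_provider": "anthropic", "mode": "chat"},
}


def models_for(provider: str, limit: int = 5) -> list[str]:
    # Filter chat models by provider rather than duplicating names inline,
    # so new releases only need updating in one place.
    return sorted(
        name
        for name, info in model_prices.items()
        if info.get("litellm_provider") == provider and info.get("mode") == "chat"
    )[:limit]


print(models_for("openai"))
```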

)
return api_key

print(f" {_CROSS} {bold(provider['name'])} {grey('— invalid API key')}")

P2 Invalid key silently written to generated config

When check_valid_key reports the key is invalid and the user declines to re-enter, _validate_and_report returns the original (invalid) key unchanged. That key then flows back through _collect_keys into env_vars and is written verbatim into the environment_variables block of litellm_config.yaml.

When the proxy later starts and all requests to that provider fail with authentication errors, nothing in the config or logs indicates this was flagged during setup. At a minimum, _collect_keys should be told that validation failed so it can skip that provider's models in the generated config, or _validate_and_report should return an is_valid flag.

# Current (silent failure):
env_vars[p["env_key"]] = SetupWizard._validate_and_report(p, key)

Comment on lines +236 to +240
config_path = Path(os.getcwd()) / "litellm_config.yaml"
try:
    config_path.write_text(
        SetupWizard._build_config(providers, env_vars, master_key)
    )

P2 Silent overwrite of existing config file

config_path.write_text(...) overwrites litellm_config.yaml without checking whether the file already exists. A user who runs litellm --setup a second time after customising their config will silently lose all those edits with no backup, diff, or confirmation prompt.

Consider adding an existence check before writing:

if config_path.exists():
    answer = _styled_input(
        f"  {orange('Warning:')} {config_path.name} already exists. Overwrite? [y/N] "
    )
    if answer.lower() != "y":
        print(f"\n  {grey('Aborted. Existing config preserved.')}\n")
        return

Comment on lines +548 to +555
lines += [
    f"  - model_name: {display}",
    "    litellm_params:",
    f"      model: {model}",
]
if p["env_key"] and p["env_key"] in env_copy:
    lines.append(f"      api_key: os.environ/{p['env_key']}")
if p.get("api_base"):

P2 Unquoted model_name and model fields in generated YAML

model_name and model values are written as bare YAML scalars without quoting. The Bedrock model bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0 produces model_name: anthropic.claude-3-5-sonnet-20241022-v2:0 — a colon-containing unquoted scalar. While YAML parsers handle v2:0 (no space after :) correctly today, any future model whose display name contains : (colon-space) or starts with special YAML characters would silently break the generated config.

Quote both fields defensively:

lines += [
    f'  - model_name: "{display}"',
    "    litellm_params:",
    f'      model: "{model}"',
]

cursoragent and others added 2 commits March 18, 2026 01:40
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
…ibility

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
Comment on lines +523 to +538
if not self.required_claims:
    return

from fastapi import HTTPException

missing = [c for c in self.required_claims if not (jwt_claims or {}).get(c)]
if missing:
    raise HTTPException(
        status_code=403,
        detail={
            "error": (
                f"MCPJWTSigner: incoming token is missing required claims: "
                f"{missing}. Configure the IdP to include these claims."
            )
        },
    )

P2 Inconsistent key-presence check between required_claims and optional_claims

_validate_required_claims uses not (jwt_claims or {}).get(c) (a truthiness check), while _passthrough_optional_claims (line ~666) correctly uses claim in jwt_claims (a key-presence check). This inconsistency means two different semantics apply to the same incoming JWT claims depending on which feature processes them:

  • optional_claims: a claim with value False is passed through correctly
  • required_claims: the same claim with value False fails validation with HTTP 403

The fix should use key-presence consistently:

Suggested change
if not self.required_claims:
    return
from fastapi import HTTPException
missing = [c for c in self.required_claims if not (jwt_claims or {}).get(c)]
if missing:
    raise HTTPException(
        status_code=403,
        detail={
            "error": (
                f"MCPJWTSigner: incoming token is missing required claims: "
                f"{missing}. Configure the IdP to include these claims."
            )
        },
    )
missing = [c for c in self.required_claims if c not in (jwt_claims or {})]

Comment on lines +504 to +508
result: Dict[str, Any] = resp.json()
if not result.get("active", False):
    raise jwt.exceptions.ExpiredSignatureError(  # type: ignore[attr-defined]
        "MCPJWTSigner: incoming token is inactive (introspection returned active=false)"
    )

P2 Semantically incorrect exception for inactive/revoked token

jwt.exceptions.ExpiredSignatureError means the token's exp claim is in the past. Using it to signal that an introspection endpoint returned active=false conflates two distinct situations — a token may be revoked, suspended, or otherwise deactivated long before its exp.

When this is caught upstream in async_pre_call_hook and re-raised as HTTP 401, the error message will claim "expired signature" even for tokens that have been explicitly revoked, making production debugging unnecessarily difficult.

Consider raising a more semantically accurate exception:

raise jwt.exceptions.InvalidTokenError(
    "MCPJWTSigner: incoming token is inactive (introspection returned active=false)"
)

Comment on lines 2282 to +2291
  # For OpenAPI servers, call the tool handler directly instead of via MCP client
  if mcp_server.spec_path:
      verbose_logger.debug(
-         f"Calling OpenAPI tool {name} directly via HTTP handler"
+         "Calling OpenAPI tool %s directly via HTTP handler", name
      )
      if hook_result.get("extra_headers"):
          verbose_logger.warning(
              "pre_mcp_call hook returned extra_headers for OpenAPI-backed "
              "MCP server '%s' — header injection is not supported for "
              "OpenAPI servers; headers will be ignored. Use SSE/HTTP "

P2 Hook JWT silently discarded for OpenAPI-backed MCP servers

When MCPJWTSigner is active and a tool call targets an OpenAPI-backed MCP server, the signed JWT in hook_result["extra_headers"] is silently discarded after the warning log. The caller receives no indication that JWT authentication was not forwarded upstream.

This creates a silent security gap: an operator who configures MCPJWTSigner believing it enforces zero-trust for all MCP servers will find that OpenAPI-backed servers bypass it entirely. The WARNING log is easy to miss in production.

Consider escalating to ERROR level or raising an HTTPException to fail closed when JWT enforcement is expected:

verbose_logger.error(
    "pre_mcp_call hook returned extra_headers for OpenAPI-backed "
    "MCP server '%s' — JWT will NOT be forwarded. Use SSE/HTTP "
    "transport or disable MCPJWTSigner for this server.",
    server_name,
)

Comment on lines +562 to +567
value: Optional[str] = None

if source.startswith("token:"):
claim_name = source[len("token:") :]
raw = (jwt_claims or {}).get(claim_name)
value = str(raw) if raw else None

P2 Falsy claim values silently skipped in end-user identity resolution

_resolve_end_user_identity uses value = str(raw) if raw else None. This skips any falsy claim value — 0, False, "", [] — treating it as absent and falling through to the next source in the chain (or the final hash fallback), producing an incorrect sub claim in the signed JWT.

This is a distinct issue from the required_claims truthiness bug. Note how _passthrough_optional_claims (line ~666) correctly uses claim in jwt_claims for key-presence. The same pattern should be used here:

Suggested change
 value: Optional[str] = None

 if source.startswith("token:"):
     claim_name = source[len("token:") :]
     raw = (jwt_claims or {}).get(claim_name)
-    value = str(raw) if raw else None
+    value = str(raw) if (raw is not None) else None
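A key-presence version of the resolution chain can be sketched as follows (function and source names are illustrative, not the actual _resolve_end_user_identity implementation):

```python
from typing import Any, Dict, List, Optional

def resolve_identity(sources: List[str], jwt_claims: Optional[Dict[str, Any]]) -> Optional[str]:
    # Walk the configured sources in order; key presence (not truthiness)
    # decides whether a token claim resolves.
    for source in sources:
        if source.startswith("token:"):
            claim_name = source[len("token:"):]
            if jwt_claims is not None and claim_name in jwt_claims:
                return str(jwt_claims[claim_name])
    return None  # caller falls back to its hash-based sub

print(resolve_identity(["token:employee_id", "token:sub"], {"employee_id": 0, "sub": "u-1"}))
# prints "0": the falsy-but-present claim resolves instead of falling through
```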

…el lookup)

Replace deprecated gpt-3.5-turbo-1106 with gpt-4.1-mini + mock_response in
test_router_region_pre_call_check, following the same pattern used in commit
717d37c for test_router_context_window_check_pre_call_check_out_group.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

from fastapi import HTTPException

missing = [c for c in self.required_claims if not (jwt_claims or {}).get(c)]

P2 Falsy claim values incorrectly treated as missing in _validate_required_claims

not (jwt_claims or {}).get(c) uses a truthiness check, meaning any falsy but present claim value (False, 0, "", []) is treated as absent. A JWT with {"email_verified": false} or {"retry_count": 0} would incorrectly raise HTTP 403, even though the claim is present.

Contrast this with _passthrough_optional_claims on line 666, which correctly uses the claim in jwt_claims key-presence pattern.

Suggested change
-missing = [c for c in self.required_claims if not (jwt_claims or {}).get(c)]
+missing = [c for c in self.required_claims if c not in (jwt_claims or {})]
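The difference between the two checks can be seen in a few lines (claim names and values are illustrative):

```python
required = ["sub", "email_verified", "retry_count"]
claims = {"sub": "user-123", "email_verified": False, "retry_count": 0}

# Truthiness check: present-but-falsy claims are reported as missing
missing_truthy = [c for c in required if not claims.get(c)]

# Key-presence check: only genuinely absent claims are reported
missing_absent = [c for c in required if c not in claims]

print(missing_truthy)  # ['email_verified', 'retry_count']
print(missing_absent)  # []
```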


Comment on lines +22 to +27
mode = litellm_params.mode
if mode != "pre_mcp_call":
raise ValueError(
f"MCPJWTSigner guardrail '{guardrail_name}' has mode='{mode}' but must use "
"mode='pre_mcp_call'. JWT injection only fires for MCP tool calls."
)

P2 mode validation may break with valid non-string mode values

The initialize_guardrail function checks if mode != "pre_mcp_call" and raises. However, looking at the LitellmParams type, mode could be a Literal enum or None. If a user accidentally omits mode from the config, mode will be None and the error message will say mode='None' instead of something helpful like "mode is required". Consider explicitly handling the None case to provide a clearer error message for misconfigured guardrails.
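A sketch of the suggested None handling (function name and messages are illustrative, not the PR's actual initialize_guardrail):

```python
from typing import Optional

def validate_mode(guardrail_name: str, mode: Optional[str]) -> None:
    # Surface a missing mode explicitly instead of reporting mode='None'
    if mode is None:
        raise ValueError(
            f"MCPJWTSigner guardrail '{guardrail_name}': mode is required; "
            "set mode='pre_mcp_call' in the guardrail config."
        )
    if mode != "pre_mcp_call":
        raise ValueError(
            f"MCPJWTSigner guardrail '{guardrail_name}' has mode='{mode}' but must "
            "use mode='pre_mcp_call'. JWT injection only fires for MCP tool calls."
        )

try:
    validate_mode("my-signer", None)
except ValueError as exc:
    print(exc)  # says mode is required, not the confusing mode='None'
```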


Comment on lines 685 to 690
do_standard_jwt_auth = True
if jwt_handler.litellm_jwtauth.virtual_key_claim_field is not None:
# Decode JWT to get claims without running full auth_builder
jwt_claims: Optional[dict]
if jwt_handler.litellm_jwtauth.oidc_userinfo_enabled:
jwt_claims = await jwt_handler.get_oidc_userinfo(token=api_key)

P2 jwt_claims variable potentially uninitialized before first use

The type annotation jwt_claims: Optional[dict] is added at line 688 before the conditional block that assigns it, but the variable is not given a default value. If the if jwt_handler.litellm_jwtauth.virtual_key_claim_field is not None: branch is entered but neither the oidc_userinfo_enabled nor the else path runs (hypothetical future changes), jwt_claims would be referenced unbound.

More critically, on line 731, jwt_claims = result.get("jwt_claims", None) is placed in the standard JWT auth path but not in all code paths. If the virtual-key fast path (do_standard_jwt_auth = False) is taken without entering the claim extraction block, jwt_claims may not be defined when reached at line 760.

Consider initializing jwt_claims = None immediately at the point of declaration to make the intent explicit and safe.
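The hazard is easy to reproduce in isolation: a bare annotation never binds the name, so any path that skips the assignment hits an UnboundLocalError. A minimal sketch:

```python
from typing import Optional

def build(claim_field: Optional[str]) -> Optional[dict]:
    jwt_claims: Optional[dict]  # annotation only, no value bound
    if claim_field is not None:
        jwt_claims = {"sub": claim_field}
    return jwt_claims  # UnboundLocalError when claim_field is None

def build_safe(claim_field: Optional[str]) -> Optional[dict]:
    jwt_claims: Optional[dict] = None  # bound at the point of declaration
    if claim_field is not None:
        jwt_claims = {"sub": claim_field}
    return jwt_claims

try:
    build(None)
except UnboundLocalError:
    print("jwt_claims was never bound")

print(build_safe(None))  # None
```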

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
Comment on lines +808 to +819
except Exception as exc:
verbose_proxy_logger.error(
"MCPJWTSigner: incoming token verification failed: %s", exc
)
from fastapi import HTTPException

raise HTTPException(
status_code=401,
detail={
"error": (
f"MCPJWTSigner: incoming token verification failed: {exc}"
)

P1 Broad except Exception masks infrastructure failures as HTTP 401

The catch-all except Exception converts every error — including network timeouts, DNS failures, HTTP 503 from the JWKS endpoint, and JSON parse errors — into an HTTP 401 response. From a client's perspective this looks identical to "your token is invalid", making infrastructure failures completely invisible and extremely hard to diagnose in production.

For example: if the IdP's JWKS endpoint returns a 503 or is temporarily unreachable, every MCP tool call is rejected with 401 incoming token verification failed: Connection error — which looks like a broken token to the caller.

The fix is to narrow the catch to JWT-specific errors and re-raise unexpected exceptions as 502/503:

import httpx

try:
    if is_jwt:
        jwt_claims = await self._verify_incoming_jwt(raw_token)
    elif self.token_introspection_endpoint:
        jwt_claims = await self._introspect_opaque_token(raw_token)
    else:
        verbose_proxy_logger.warning(...)
except jwt.PyJWTError as exc:
    # Auth failure — the token itself is invalid/expired/rejected.
    verbose_proxy_logger.error("MCPJWTSigner: token invalid: %s", exc)
    from fastapi import HTTPException
    raise HTTPException(status_code=401, detail={"error": f"MCPJWTSigner: {exc}"})
except (httpx.HTTPError, OSError, ValueError) as exc:
    # Infrastructure failure — JWKS endpoint unreachable, network error, etc.
    verbose_proxy_logger.error("MCPJWTSigner: JWKS/introspection unreachable: %s", exc)
    from fastapi import HTTPException
    raise HTTPException(status_code=503, detail={"error": f"MCPJWTSigner: upstream auth service unavailable: {exc}"})

…ndition

The _verify_langfuse_call helper only inspected the last mock call
(mock_post.call_args), but the Langfuse SDK may split trace-create and
generation-create events across separate HTTP flush cycles. This caused
an IndexError when the last call's batch contained only one event type.

Fix: iterate over mock_post.call_args_list to collect batch items from
ALL calls. Also add a safety assertion after filtering by trace_id and
mark all langfuse e2e tests with @pytest.mark.flaky(retries=3) as an
extra safety net for any residual timing issues.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
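The aggregation described above can be sketched with a plain MagicMock (the payload shape is illustrative):

```python
from unittest.mock import MagicMock

mock_post = MagicMock()
# The SDK may flush trace-create and generation-create in separate HTTP calls:
mock_post(json={"batch": [{"type": "trace-create", "id": "t1"}]})
mock_post(json={"batch": [{"type": "generation-create", "id": "g1"}]})

# Fragile: only the last call's batch is visible
last_batch = mock_post.call_args.kwargs["json"]["batch"]

# Robust: collect batch items from ALL calls
all_items = [
    item
    for call in mock_post.call_args_list
    for item in call.kwargs["json"]["batch"]
]

print(len(last_batch), len(all_items))  # 1 2
```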
…changes

- Apply Black 26.x formatting to litellm_logging.py (parenthesized style)
- Update test_input_types_match_spec to follow $ref to InteractionsInput schema
  (Google updated their OpenAPI spec to use $ref instead of inline oneOf)
- Update test_content_schema_uses_discriminator to handle discriminator without
  explicit mapping (Google removed the mapping key from Content discriminator)

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>


base_model = _litellm_params.get("base_model", None)
model_info = self.get_router_model_info(
deployment=deployment, received_model_name=model
)

P2 Bug fix: outer model variable shadowed by deployment-specific model name

The rename from model to _deployment_model is a real correctness fix. The original code overwrote the outer model variable (the request's model group name) with a per-deployment base_model or litellm_params["model"] string. On the next loop iteration, model would hold the previous deployment's model string instead of the original group name, potentially causing incorrect context-window error messages and skipping valid deployments.

This change is correct and important — just noting it explicitly since the inline comment on the same line is long and may be missed.
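The failure mode can be reproduced in isolation (deployment data is illustrative):

```python
deployments = [
    {"litellm_params": {"model": "azure/gpt-4o-eu"}},
    {"litellm_params": {"model": "azure/gpt-4o-us"}},
]

def seen_group_names_buggy(model: str) -> list:
    seen = []
    for d in deployments:
        seen.append(model)  # expects the request's model group name
        model = d["litellm_params"]["model"]  # bug: overwrites the outer name
    return seen

def seen_group_names_fixed(model: str) -> list:
    seen = []
    for d in deployments:
        seen.append(model)
        _deployment_model = d["litellm_params"]["model"]  # separate name, no shadowing
    return seen

print(seen_group_names_buggy("gpt-4o"))  # ['gpt-4o', 'azure/gpt-4o-eu']
print(seen_group_names_fixed("gpt-4o"))  # ['gpt-4o', 'gpt-4o']
```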

The file was correctly formatted for Black 23.12.1 (the version pinned
in pyproject.toml). The previous commit applied Black 26.x formatting
which was incompatible with the CI's Black version.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

Comment on lines +547 to +550
display = f"azure-{raw_display}" if p["id"] == "azure" else raw_display
lines += [
f" - model_name: {display}",
" litellm_params:",

P2 Unquoted model_name and model fields in generated YAML

model_name and model values are written as bare YAML scalars without quoting. The Bedrock model bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0 produces model_name: anthropic.claude-3-5-sonnet-20241022-v2:0 — a colon-containing unquoted scalar. While YAML parsers handle v2:0 correctly today (there is no space after the colon), any future model whose display name contains ": " (a colon followed by a space) or starts with a special YAML character ({, [, >, |) would silently produce a broken config.

_yaml_escape already exists in this file. Quote both fields defensively:

Suggested change
 display = f"azure-{raw_display}" if p["id"] == "azure" else raw_display
 lines += [
-    f"  - model_name: {display}",
-    "    litellm_params:",
+    f'  - model_name: "{_yaml_escape(display)}"',
+    "    litellm_params:",
+    f'    model: "{_yaml_escape(model)}"',
 ]
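A hypothetical sketch of the defensive quoting (the PR's actual _yaml_escape is not shown in this diff; the escaper below is an assumption for illustration only):

```python
def yaml_escape(value: str) -> str:
    # escape backslashes and double quotes so the value is safe inside "..."
    return value.replace("\\", "\\\\").replace('"', '\\"')

def model_name_line(display: str) -> str:
    # hypothetical helper mirroring the suggested quoted YAML line
    return f'  - model_name: "{yaml_escape(display)}"'

# A display name containing ": " breaks as a bare scalar, but is safe once quoted:
print(model_name_line("claude: v2 edition"))
```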

The Langfuse SDK may send the same event (e.g., trace-create) in
multiple flush cycles, causing duplicates when we aggregate from all
mock calls. After filtering by trace_id, deduplicate by keeping only
the first event of each type, then sort to ensure trace-create is at
index 0 and generation-create at index 1.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
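The dedup-and-order step described above reads roughly like this (the event shape is illustrative):

```python
events = [
    {"type": "generation-create", "trace_id": "t1"},
    {"type": "trace-create", "trace_id": "t1"},
    {"type": "trace-create", "trace_id": "t1"},  # duplicate from a second flush
]

# Keep only the first event of each type
first_by_type = {}
for event in events:
    first_by_type.setdefault(event["type"], event)

# Sort so trace-create lands at index 0 and generation-create at index 1
order = {"trace-create": 0, "generation-create": 1}
deduped = sorted(first_by_type.values(), key=lambda e: order[e["type"]])

print([e["type"] for e in deduped])  # ['trace-create', 'generation-create']
```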
@yuneng-jiang yuneng-jiang left a comment

lgtm

@ishaan-jaff ishaan-jaff merged commit 8e61b32 into main Mar 18, 2026
31 of 74 checks passed
Comment on lines +694 to +698
response = {
**response,
"jwks_uri": f"{request_base_url}/.well-known/jwks.json",
"id_token_signing_alg_values_supported": ["RS256"],
}

P2 OIDC discovery issuer not updated with signer.issuer

The augmented discovery document adds jwks_uri and id_token_signing_alg_values_supported, but inherits "issuer": request_base_url from the base response. When MCPJWTSigner is configured with a custom issuer (via the issuer: config param or the MCP_JWT_ISSUER env var), the discovery document's issuer field will differ from the iss claim in signed JWTs.

Per RFC 8414 / OIDC Core, relying parties (MCP servers) are expected to verify that a JWT's iss exactly matches the issuer field in the discovery document. A mismatch will cause all tokens to be rejected.

For example, if a user sets issuer: "https://my-litellm.example.com" in the guardrail config but the proxy is running at https://proxy.example.com, the discovery doc will say "issuer": "https://proxy.example.com" while every JWT will have "iss": "https://my-litellm.example.com".

The fix is to include signer.issuer in the augmented response:

response = {
    **response,
    "issuer": signer.issuer,
    "jwks_uri": f"{request_base_url}/.well-known/jwks.json",
    "id_token_signing_alg_values_supported": ["RS256"],
}
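The relying-party check that fails on a mismatch can be sketched as follows (URLs are illustrative):

```python
def issuer_matches(discovery_doc: dict, claims: dict) -> bool:
    # RFC 8414 / OIDC Core: the JWT's iss must exactly equal the advertised issuer
    return discovery_doc.get("issuer") == claims.get("iss")

discovery = {"issuer": "https://proxy.example.com"}      # inherited from request_base_url
jwt_claims = {"iss": "https://my-litellm.example.com"}   # emitted by the signer

print(issuer_matches(discovery, jwt_claims))  # False: every token is rejected

# With the fix, the discovery document carries signer.issuer:
discovery_fixed = {**discovery, "issuer": "https://my-litellm.example.com"}
print(issuer_matches(discovery_fixed, jwt_claims))  # True
```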
