[Staging] - Ishaan March 17th #23903

Merged
ishaan-jaff merged 17 commits into main from litellm_ishaan_march_17
Mar 18, 2026
Conversation

@ishaan-jaff
Contributor

[Staging] - Ishaan March 17th

Relevant issues

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have added testing in the tests/test_litellm/ directory. Adding at least 1 test is a hard requirement - see details
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible, it only solves 1 specific problem
  • I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

Delays in PR merge?

If you're seeing a delay in your PR being merged, ping the LiteLLM Team on Slack (#pr-review).

CI (LiteLLM team)

CI status guideline:

  • 50-55 passing tests: main is stable with minor issues.
  • 45-49 passing tests: acceptable, but needs attention.
  • <= 40 passing tests: unstable; be careful with your merges and assess the risk.
  • Branch creation CI run
    Link:

  • CI run for the last commit
    Link:

  • Merge / cherry-pick CI run
    Links:

Type

🆕 New Feature
🐛 Bug Fix
🧹 Refactoring
📖 Documentation
🚄 Infrastructure
✅ Test

Changes

Add three grok-4.20-beta-0309 model variants from xAI:
- grok-4.20-multi-agent-beta-0309 (reasoning + multi-agent)
- grok-4.20-beta-0309-reasoning (reasoning)
- grok-4.20-beta-0309-non-reasoning

Pricing (from https://docs.x.ai/docs/models):
- Input: $2.00/1M tokens ($0.20/1M cached)
- Output: $6.00/1M tokens
- Context: 2M tokens

All variants support vision, function calling, tool choice, and web search.
Closes LIT-2171
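The pricing above can be illustrated with a small cost helper. This is a hypothetical sketch, not LiteLLM's cost calculator; `grok_420_cost` and its signature are made up for illustration, using only the per-token rates listed in this PR:

```python
# Hypothetical helper illustrating the listed grok-4.20 pricing:
# $2.00/1M input tokens, $0.20/1M cached input tokens, $6.00/1M output tokens.
def grok_420_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Return total USD cost; cached input tokens are billed at the discounted rate."""
    billable_input = input_tokens - cached_tokens
    return (
        billable_input * 2e-06    # $2.00 per 1M input tokens
        + cached_tokens * 2e-07   # $0.20 per 1M cached input tokens
        + output_tokens * 6e-06   # $6.00 per 1M output tokens
    )

# 1M uncached input + 1M output -> $2.00 + $6.00
print(round(grok_420_cost(1_000_000, 1_000_000), 2))  # 8.0
```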
@vercel

vercel bot commented Mar 17, 2026

The latest updates on your projects.

Project: litellm · Deployment: Ready · Actions: Preview, Comment · Updated (UTC): Mar 18, 2026 10:14pm


@codspeed-hq
Contributor

codspeed-hq bot commented Mar 17, 2026

Merging this PR will not alter performance

✅ 16 untouched benchmarks


Comparing litellm_ishaan_march_17 (8c23f9f) with main (cbb4c2c)


@greptile-apps
Contributor

greptile-apps bot commented Mar 17, 2026

Greptile Summary

This staging PR introduces several distinct features and fixes across 61 files: the MCPJWTSigner zero-trust guardrail (signs outbound MCP tool calls with a LiteLLM-issued RS256 JWT), an interactive setup wizard (litellm --setup), supporting JWKS/OIDC discovery endpoints, jwt_claims propagation through the auth stack, a correctness fix for a model-variable shadowing bug in the router, and a large volume of cosmetic Black reformatting.

Key changes:

  • MCPJWTSigner guardrail (mcp_jwt_signer.py, ~891 lines): Signs outbound MCP requests after optionally verifying the incoming Bearer token against an upstream IdP. Supports configurable claim operations, two-token model, end-user identity mapping, and required/optional claim validation. Multiple bugs were flagged in previous review rounds (falsy-claim truthiness checks, wrong exception type for revoked tokens, broad except Exception masking infra failures, FastAPI imports in guardrail code, JWKS cache concurrency).
  • OIDC discovery augmentation (discoverable_endpoints.py): Adds /.well-known/jwks.json endpoint and augments /.well-known/openid-configuration when MCPJWTSigner is active. A new issue was found: the augmented response does not update the issuer field to reflect signer.issuer; if a custom issuer is configured, MCP servers performing issuer validation will reject all tokens because the JWT iss won't match the discovery doc issuer.
  • Setup wizard (setup_wizard.py, scripts/install.sh): Arrow-key interactive provider selection, API key validation, and YAML config generation. Previously flagged issues (silent overwrite, hardcoded model lists, unquoted YAML scalars, invalid-key propagation) remain open.
  • Router bug fix (router.py): _deployment_model variable prevents the outer model group name from being clobbered per loop iteration — a real correctness fix.
  • Auth propagation (user_api_key_auth.py, _types.py): jwt_claims is now threaded from JWT auth into UserAPIKeyAuth for downstream guardrail use; jwt_claims: Optional[dict] annotation-without-assignment (previously flagged) is a latent uninitialized-variable risk.
  • Cosmetic reformatting: The bulk of the line-count delta (litellm_logging.py, responses/main.py, etc.) is Black reformatting with no logic changes.

Confidence Score: 2/5

  • Not safe to merge until the MCPJWTSigner bugs and OIDC issuer mismatch are resolved; the setup wizard issues are lower severity but should also be addressed.
  • The PR introduces a large new security-critical feature (MCPJWTSigner) that has multiple confirmed bugs spanning previous review rounds: falsy claims treated as absent in _validate_required_claims and _resolve_end_user_identity, semantically wrong exception type for revoked tokens, a broad except Exception hiding infrastructure failures as HTTP 401, FastAPI imports in guardrail code, and a JWKS cache without concurrency protection. A new issue was found in this round: the OIDC discovery document does not reflect signer.issuer, which will cause token validation failures for any deployment using a custom issuer. The router bug fix and auth jwt_claims threading are good changes. The cosmetic reformatting is safe. The overall score is constrained by the unresolved security guardrail issues.
  • litellm/proxy/guardrails/guardrail_hooks/mcp_jwt_signer/mcp_jwt_signer.py (multiple logic bugs), litellm/proxy/_experimental/mcp_server/discoverable_endpoints.py (OIDC issuer mismatch), litellm/setup_wizard.py (silent config overwrite, unquoted YAML, invalid key propagation)
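The OIDC issuer-mismatch fix flagged above can be sketched as follows. This is a hedged illustration, not the actual code in discoverable_endpoints.py; the `Signer` stand-in class and the function name are assumptions:

```python
from dataclasses import dataclass


@dataclass
class Signer:
    # Stand-in for the MCPJWTSigner configuration (illustrative only)
    issuer: str
    jwks_uri: str


def augment_oidc_discovery(upstream_doc: dict, signer: Signer) -> dict:
    """Copy the discovery doc and align BOTH jwks_uri and issuer with the signer."""
    doc = dict(upstream_doc)        # copy, don't mutate the cached upstream doc
    doc["jwks_uri"] = signer.jwks_uri
    doc["issuer"] = signer.issuer   # the missing piece: without it, MCP servers
                                    # validating `iss` against the discovery doc
                                    # reject every LiteLLM-issued token
    return doc


signer = Signer(
    issuer="https://litellm.example.com",
    jwks_uri="https://litellm.example.com/.well-known/jwks.json",
)
doc = augment_oidc_discovery({"issuer": "https://idp.example.com"}, signer)
```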

Important Files Changed

Filename Overview
litellm/proxy/guardrails/guardrail_hooks/mcp_jwt_signer/mcp_jwt_signer.py New MCPJWTSigner guardrail with several known issues: falsy claim values incorrectly rejected in _validate_required_claims (truthiness check vs. key-presence), wrong exception type (ExpiredSignatureError) for inactive/revoked tokens from introspection, falsy values skipped in _resolve_end_user_identity, broad except Exception masking infrastructure failures as HTTP 401, FastAPI imports inside the guardrail, and module-level JWKS cache lacking concurrency safety — all flagged in previous review threads.
litellm/proxy/_experimental/mcp_server/discoverable_endpoints.py OIDC discovery augmentation adds jwks_uri but doesn't update the issuer field to reflect signer.issuer; when a custom issuer is configured, the discovery doc's issuer will mismatch the JWT's iss claim, causing MCP servers to reject all tokens.
litellm/setup_wizard.py New interactive setup wizard; issues previously flagged include hardcoded model/env-key lists, silent overwrite of existing config, invalid keys silently written to config, and unquoted model_name/model YAML fields. Core wizard flow and fallback terminal handling are well-structured.
litellm/proxy/_experimental/mcp_server/mcp_server_manager.py pre_call_tool_check now returns a hook_result dict with arguments and extra_headers; JWT from hook is correctly forwarded to SSE/HTTP MCP servers but silently discarded for OpenAPI-backed servers (previously flagged). Bearer token extraction from raw headers is correct.
litellm/proxy/guardrails/guardrail_hooks/mcp_jwt_signer/__init__.py Clean guardrail initializer with proper registry entries; mode=None produces an unclean error message (previously flagged). Signer is registered as a logging callback correctly.
litellm/router.py Correct bug fix: introduces _deployment_model to prevent the outer model group name from being overwritten by a per-deployment base model on each loop iteration, fixing incorrect context-window error messages.
litellm/proxy/auth/user_api_key_auth.py Adds jwt_claims propagation through JWT auth paths; jwt_claims: Optional[dict] annotation without initialization (previously flagged) could be unbound in edge cases. The standard JWT auth path correctly assigns jwt_claims from result.get("jwt_claims", None).
scripts/install.sh New POSIX-compatible install script with correct Python version detection, pip/uv/venv handling, and colored output. Uses set -eu correctly for sh compatibility. Resolves the previously flagged missing scripts/install.sh referenced by documentation.
litellm/proxy/proxy_cli.py Adds --setup flag that invokes run_setup_wizard() and returns early; correctly resolves previously flagged missing CLI implementation.
litellm/litellm_core_utils/litellm_logging.py 288 lines changed, all purely cosmetic Black-style reformatting (parenthesis placement for long assignments). No logic changes.

Sequence Diagram

sequenceDiagram
    participant Client
    participant LiteLLMProxy
    participant MCPJWTSigner
    participant IdP
    participant MCPServer

    Client->>LiteLLMProxy: MCP tool call (Bearer token or API key)
    LiteLLMProxy->>MCPJWTSigner: async_pre_call_hook(call_type="call_mcp_tool")

    alt FR-5: Verify + re-sign configured
        MCPJWTSigner->>IdP: Fetch OIDC discovery / JWKS
        IdP-->>MCPJWTSigner: JWKS keys (cached 1h)
        MCPJWTSigner->>MCPJWTSigner: Verify incoming JWT (RS256)
        MCPJWTSigner->>MCPJWTSigner: Validate required_claims (FR-15)
    end

    MCPJWTSigner->>MCPJWTSigner: Resolve end-user identity (FR-12)
    MCPJWTSigner->>MCPJWTSigner: Build outbound claims + scope (FR-10, FR-13)
    MCPJWTSigner->>MCPJWTSigner: Sign JWT with LiteLLM RSA key (RS256)

    opt FR-14: Two-token model
        MCPJWTSigner->>MCPJWTSigner: Sign channel token (separate aud/TTL)
    end

    MCPJWTSigner-->>LiteLLMProxy: extra_headers["Authorization"] = Bearer <signed-JWT>
    LiteLLMProxy->>MCPServer: Tool call + Authorization: Bearer <signed-JWT>

    MCPServer->>LiteLLMProxy: GET /.well-known/jwks.json
    LiteLLMProxy-->>MCPServer: RSA public key (JWKS)
    MCPServer->>MCPServer: Verify JWT signature + claims
    MCPServer-->>LiteLLMProxy: Tool result
    LiteLLMProxy-->>Client: Tool result

Last reviewed commit: "Merge branch 'main' ..."

Comment on lines +32357 to +32403
"xai/grok-4.20-multi-agent-beta-0309": {
"cache_read_input_token_cost": 2e-07,
"input_cost_per_token": 2e-06,
"litellm_provider": "xai",
"max_input_tokens": 2000000,
"max_output_tokens": 2000000,
"max_tokens": 2000000,
"mode": "chat",
"output_cost_per_token": 6e-06,
"source": "https://docs.x.ai/docs/models",
"supports_function_calling": true,
"supports_reasoning": true,
"supports_tool_choice": true,
"supports_vision": true,
"supports_web_search": true
},
"xai/grok-4.20-beta-0309-reasoning": {
"cache_read_input_token_cost": 2e-07,
"input_cost_per_token": 2e-06,
"litellm_provider": "xai",
"max_input_tokens": 2000000,
"max_output_tokens": 2000000,
"max_tokens": 2000000,
"mode": "chat",
"output_cost_per_token": 6e-06,
"source": "https://docs.x.ai/docs/models",
"supports_function_calling": true,
"supports_reasoning": true,
"supports_tool_choice": true,
"supports_vision": true,
"supports_web_search": true
},
"xai/grok-4.20-beta-0309-non-reasoning": {
"cache_read_input_token_cost": 2e-07,
"input_cost_per_token": 2e-06,
"litellm_provider": "xai",
"max_input_tokens": 2000000,
"max_output_tokens": 2000000,
"max_tokens": 2000000,
"mode": "chat",
"output_cost_per_token": 6e-06,
"source": "https://docs.x.ai/docs/models",
"supports_function_calling": true,
"supports_tool_choice": true,
"supports_vision": true,
"supports_web_search": true
},
Contributor


P2 Inconsistent max_tokens value type and generic source URL

Two minor inconsistencies compared to the surrounding xAI entries:

  1. Integer vs float: All nearby xAI entries use floats (2000000.0) for max_input_tokens, max_output_tokens, and max_tokens. The three new entries use bare integers (2000000). While JSON parsing in Python treats them identically at runtime, this is inconsistent with the file's established conventions.

  2. Generic source URL: All other recently-added xAI models link to model-specific documentation pages (e.g., https://docs.x.ai/docs/models/grok-4-1-fast-non-reasoning). The new entries use the generic https://docs.x.ai/docs/models. A model-specific URL improves traceability for future pricing verification.

Suggested change
"xai/grok-4.20-multi-agent-beta-0309": {
"cache_read_input_token_cost": 2e-07,
"input_cost_per_token": 2e-06,
"litellm_provider": "xai",
"max_input_tokens": 2000000,
"max_output_tokens": 2000000,
"max_tokens": 2000000,
"mode": "chat",
"output_cost_per_token": 6e-06,
"source": "https://docs.x.ai/docs/models",
"supports_function_calling": true,
"supports_reasoning": true,
"supports_tool_choice": true,
"supports_vision": true,
"supports_web_search": true
},
"xai/grok-4.20-beta-0309-reasoning": {
"cache_read_input_token_cost": 2e-07,
"input_cost_per_token": 2e-06,
"litellm_provider": "xai",
"max_input_tokens": 2000000,
"max_output_tokens": 2000000,
"max_tokens": 2000000,
"mode": "chat",
"output_cost_per_token": 6e-06,
"source": "https://docs.x.ai/docs/models",
"supports_function_calling": true,
"supports_reasoning": true,
"supports_tool_choice": true,
"supports_vision": true,
"supports_web_search": true
},
"xai/grok-4.20-beta-0309-non-reasoning": {
"cache_read_input_token_cost": 2e-07,
"input_cost_per_token": 2e-06,
"litellm_provider": "xai",
"max_input_tokens": 2000000,
"max_output_tokens": 2000000,
"max_tokens": 2000000,
"mode": "chat",
"output_cost_per_token": 6e-06,
"source": "https://docs.x.ai/docs/models",
"supports_function_calling": true,
"supports_tool_choice": true,
"supports_vision": true,
"supports_web_search": true
},
"xai/grok-4.20-multi-agent-beta-0309": {
"cache_read_input_token_cost": 2e-07,
"input_cost_per_token": 2e-06,
"litellm_provider": "xai",
"max_input_tokens": 2000000.0,
"max_output_tokens": 2000000.0,
"max_tokens": 2000000.0,
"mode": "chat",
"output_cost_per_token": 6e-06,
"source": "https://docs.x.ai/docs/models",
"supports_function_calling": true,
"supports_reasoning": true,
"supports_tool_choice": true,
"supports_vision": true,
"supports_web_search": true
},
"xai/grok-4.20-beta-0309-reasoning": {
"cache_read_input_token_cost": 2e-07,
"input_cost_per_token": 2e-06,
"litellm_provider": "xai",
"max_input_tokens": 2000000.0,
"max_output_tokens": 2000000.0,
"max_tokens": 2000000.0,
"mode": "chat",
"output_cost_per_token": 6e-06,
"source": "https://docs.x.ai/docs/models",
"supports_function_calling": true,
"supports_reasoning": true,
"supports_tool_choice": true,
"supports_vision": true,
"supports_web_search": true
},
"xai/grok-4.20-beta-0309-non-reasoning": {
"cache_read_input_token_cost": 2e-07,
"input_cost_per_token": 2e-06,
"litellm_provider": "xai",
"max_input_tokens": 2000000.0,
"max_output_tokens": 2000000.0,
"max_tokens": 2000000.0,
"mode": "chat",
"output_cost_per_token": 6e-06,
"source": "https://docs.x.ai/docs/models",
"supports_function_calling": true,
"supports_tool_choice": true,
"supports_vision": true,
"supports_web_search": true
},

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Comment on lines +395 to +409
def test_grok_4_20_beta_reasoning_cost_calculation(self):
    """Test cost calculation for grok-4.20-beta-0309-reasoning model."""
    usage = Usage(prompt_tokens=100, completion_tokens=200, total_tokens=300)

    prompt_cost, completion_cost = cost_per_token(
        model="grok-4.20-beta-0309-reasoning", usage=usage
    )

    # Input: 100 tokens * $2e-6 = $0.0002
    # Output: 200 tokens * $6e-6 = $0.0012
    expected_prompt_cost = 100 * 2e-6
    expected_completion_cost = 200 * 6e-6

    assert math.isclose(prompt_cost, expected_prompt_cost, rel_tol=1e-10)
    assert math.isclose(completion_cost, expected_completion_cost, rel_tol=1e-10)
Contributor


P2 Missing reasoning token test for reasoning model

grok-4.20-beta-0309-reasoning is a reasoning model ("supports_reasoning": true in the JSON), yet the test only validates basic token pricing with no reasoning tokens in completion_tokens_details. The XAI cost calculator has a key billing rule: reasoning tokens are added to completion tokens for billing purposes. Without a test case that includes reasoning_tokens, the XAI-specific billing logic isn't exercised for this new model.

Consider adding a variant that mirrors test_reasoning_tokens_cost_calculation — pass a CompletionTokensDetailsWrapper with reasoning_tokens set, and assert that the expected completion cost is (completion_tokens + reasoning_tokens) * output_cost_per_token.
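The billing rule in question can be sketched with the arithmetic inlined as a plain function. This is an illustrative sketch only; the real test would go through `cost_per_token` with a `CompletionTokensDetailsWrapper`, and `xai_completion_cost` is a made-up name for the rule described above:

```python
import math

# xAI billing rule under test: reasoning tokens are billed as additional
# completion tokens at the output rate.
OUTPUT_COST_PER_TOKEN = 6e-06  # grok-4.20 output price from this PR


def xai_completion_cost(completion_tokens: int, reasoning_tokens: int) -> float:
    """Completion cost with reasoning tokens added to the billable output count."""
    return (completion_tokens + reasoning_tokens) * OUTPUT_COST_PER_TOKEN


# 200 visible completion tokens + 300 reasoning tokens -> 500 billable tokens
expected = (200 + 300) * OUTPUT_COST_PER_TOKEN
assert math.isclose(xai_completion_cost(200, 300), expected, rel_tol=1e-10)
```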

* docs: add Quick Install section for litellm --setup wizard

* docs: clarify setup wizard is for local/beginner use
Comment on lines +22 to +24

This detects your OS, installs `litellm[proxy]`, and drops you straight into the setup wizard.

Contributor


P2 Referenced scripts/install.sh does not exist

The install command instructs users to run:

curl -fsSL https://raw.githubusercontent.com/BerriAI/litellm/main/scripts/install.sh | sh

However, scripts/install.sh does not exist anywhere in this repository. A user following this documentation will get a curl error (the -f flag causes a silent failure on HTTP 404), resulting in nothing being installed with no useful error message to diagnose the problem.

This entire "Quick Install" section should only be merged after the referenced scripts/install.sh is created and committed.

Comment on lines +30 to +55
Welcome to LiteLLM

Choose your LLM providers
○ 1. OpenAI GPT-4o, GPT-4o-mini, o1
○ 2. Anthropic Claude Opus, Sonnet, Haiku
○ 3. Azure OpenAI GPT-4o via Azure
○ 4. Google Gemini Gemini 2.0 Flash, 1.5 Pro
○ 5. AWS Bedrock Claude, Llama via AWS
○ 6. Ollama Local models

❯ Provider(s): 1,2

❯ OpenAI API key: sk-...
❯ Anthropic API key: sk-ant-...

❯ Port [4000]:
❯ Master key [auto-generate]:

✔ Config saved → ./litellm_config.yaml

❯ Start the proxy now? (Y/n):

The wizard walks you through:
1. Pick your LLM providers (OpenAI, Anthropic, Azure, Bedrock, Gemini, Ollama)
2. Enter API keys for each provider
Contributor


P2 litellm --setup wizard is not implemented

The docs present a detailed interactive wizard invoked via litellm --setup, but this flag/command does not exist in the litellm CLI (litellm/proxy/proxy_cli.py). Searching the entire codebase finds zero implementations of a --setup argument or an interactive provider wizard.

Users who install the package and then run litellm --setup will get an unrecognized argument error. This documentation should not be published until the wizard is actually implemented.

* feat(setup): add interactive setup wizard + install.sh

Adds `litellm --setup` — a Claude Code-style TUI onboarding wizard that
guides users through provider selection, API key entry, and proxy config
generation, then optionally starts the proxy immediately.

- litellm/setup_wizard.py: wizard with ASCII art, numbered provider menu
  (OpenAI, Anthropic, Azure, Gemini, Bedrock, Ollama), API key prompts,
  port/master-key config, and litellm_config.yaml generation
- litellm/proxy/proxy_cli.py: adds --setup flag that invokes the wizard
- scripts/install.sh: curl-installable script (detect OS/Python, pip
  install litellm[proxy], launch wizard)

Usage:
  curl -fsSL https://raw.githubusercontent.com/BerriAI/litellm/main/scripts/install.sh | sh
  litellm --setup

* fix(install.sh): remove orange color, add LITELLM_BRANCH env var for branch installs

* fix(install.sh): install from git branch so --setup is available for QA

* fix(install.sh): remove stale LITELLM_BRANCH reference that caused unbound variable error

* fix(install.sh): force-reinstall from git to bypass cached PyPI version

* fix(install.sh): show pip progress bar during install

* fix(install.sh): always launch wizard via $PYTHON_BIN -m litellm, not PATH binary

* fix(install.sh): use litellm.proxy.proxy_cli module (no __main__.py exists)

* fix(install.sh): suppress RuntimeWarning from module invocation

* fix(install.sh): use Python bin-dir litellm binary to avoid CWD sys.path shadowing

* fix(install.sh): use sysconfig.get_path('scripts') to find pip-installed litellm binary

* fix(install.sh): redirect stdin from /dev/tty on exec so wizard gets terminal, not exhausted pipe

* fix(install.sh): warn about git clone duration, drop --no-cache-dir so re-runs are faster

* feat(setup_wizard): arrow-key selector, updated model names

* fix(setup_wizard): use sysconfig binary to start proxy, not python -m litellm

* feat(setup_wizard): credential validation after key entry + clear next-steps after proxy start

* style(install.sh): show git clone warning in blue

* refactor(setup_wizard): class with static methods, use check_valid_key from litellm.utils

* address greptile review: fix yaml escaping, port validation, display name collisions, tests

- setup_wizard.py: add _yaml_escape() for safe YAML embedding of API keys
- setup_wizard.py: add _styled_input() with readline ANSI ignore markers
- setup_wizard.py: change DIVIDER to _divider() fn to avoid import-time color capture
- setup_wizard.py: validate port range 1-65535, initialize before loop
- setup_wizard.py: qualify azure display names (azure-gpt-4o) to avoid collision with openai
- setup_wizard.py: work on env_copy in _build_config to avoid mutating caller's dict
- setup_wizard.py: skip model_list entries for providers with no credentials
- setup_wizard.py: prompt for azure deployment name
- setup_wizard.py: wrap os.execlp in try/except with friendly fallback
- setup_wizard.py: wrap config write in try/except OSError
- setup_wizard.py: fix _validate_and_report to use two print lines (no \r overwrite)
- setup_wizard.py: add .gitignore tip next to key storage notice
- setup_wizard.py: fix run_setup_wizard() return type annotation to None
- scripts/install.sh: drop pipefail (not supported by dash on Ubuntu when invoked as sh)
- scripts/install.sh: use litellm[proxy] from PyPI (not hardcoded dev branch)
- scripts/install.sh: guard /dev/tty read with -r check for Docker/CI compat
- scripts/install.sh: remove --force-reinstall to avoid downgrading dependencies
- tests/test_litellm/test_setup_wizard.py: 13 unit tests for _build_config and _yaml_escape

* style: black format setup_wizard.py

* fix: address remaining greptile issues - Windows compat, YAML quoting, credential flow

- guard termios/tty imports with try/except ImportError for Windows compat
- quote master_key as YAML double-quoted scalar (same as env vars)
- remove unused port param from _build_config signature
- _validate_and_report now returns the final key so re-entered creds are stored
- add test for master_key YAML quoting

* fix: add --port to suggested command, guard /dev/tty exec in install.sh

* fix: quote api_base in YAML, skip azure if no deployment, only redraw on state change

* fix: address greptile review comments

- _yaml_escape: add control character escaping (\n, \r, \t)
- test: fix tautological assertion in test_build_config_azure_no_deployment_skipped
- test: add tests for control character escaping in _yaml_escape
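A minimal sketch of the `_yaml_escape` helper described in the commits above: emit a YAML double-quoted scalar with backslashes, quotes, and the flagged control characters escaped. The actual implementation in setup_wizard.py may cover more cases; this is an assumption-laden illustration:

```python
def yaml_escape(value: str) -> str:
    """Return `value` as a YAML double-quoted scalar, escaping unsafe characters."""
    escaped = (
        value.replace("\\", "\\\\")  # backslash first, so later escapes survive
             .replace('"', '\\"')
             .replace("\n", "\\n")
             .replace("\r", "\\r")
             .replace("\t", "\\t")
    )
    return f'"{escaped}"'


# A key containing quotes and a newline becomes a single safe scalar:
print(yaml_escape('sk-"abc"\n'))  # "sk-\"abc\"\n"
```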
Comment on lines +254 to +315
            version = importlib.metadata.version("litellm")
        except Exception:
            version = "unknown"
        print()
        print(orange(LITELLM_ASCII.rstrip("\n")))
        print(f" {orange('Welcome')} to {bold('LiteLLM')} {grey('v' + version)}")
        print()
        print(_divider())
        print()

    # ── provider selector ───────────────────────────────────────────────────

    @staticmethod
    def _select_providers() -> List[Dict]:
        """Arrow-key multi-select. Falls back to number input if /dev/tty unavailable."""
        if not _HAS_RAW_TERMINAL:
            return SetupWizard._select_fallback()
        try:
            return SetupWizard._select_interactive()
        except OSError:
            return SetupWizard._select_fallback()

    @staticmethod
    def _read_key() -> str:
        """Read one keypress from /dev/tty in raw mode."""
        assert (
            termios is not None and tty is not None
        )  # only called when _HAS_RAW_TERMINAL
        with open("/dev/tty", "rb") as tty_fh:
            fd = tty_fh.fileno()
            old = termios.tcgetattr(fd)
            try:
                tty.setraw(fd)
                ch = tty_fh.read(1)
                if ch == b"\x1b":
                    ch2 = tty_fh.read(1)
                    if ch2 == b"[":
                        ch3 = tty_fh.read(1)
                        return "\x1b[" + ch3.decode("utf-8", errors="replace")
                    return "\x1b" + ch2.decode("utf-8", errors="replace")
                return ch.decode("utf-8", errors="replace")
            finally:
                termios.tcsetattr(fd, termios.TCSADRAIN, old)

    @staticmethod
    def _render_selector(cursor: int, selected: Set[int], first_render: bool) -> int:
        """Draw or redraw the provider list. Returns the number of lines printed."""
        lines = [
            f"\n {bold('Add your first model')}\n",
            grey(" ↑↓ to navigate · Space to select · Enter to confirm") + "\n",
            "\n",
        ]
        for i, p in enumerate(PROVIDERS):
            arrow = blue("❯") if i == cursor else " "
            bullet = green("◉") if i in selected else grey("○")
            name_str = bold(p["name"]) if i == cursor else p["name"]
            lines.append(f" {arrow} {bullet} {name_str} {grey(p['description'])}\n")
        lines.append("\n")

        content = "".join(lines)
        if not first_render and _supports_color():
            sys.stdout.write(_MOVE_UP.format(content.count("\n")))
Contributor


P2 Hardcoded env-key names and model lists contradict CLAUDE.md guidance

The PROVIDERS list hardcodes both provider env-key names (e.g. OPENAI_API_KEY, ANTHROPIC_API_KEY) and model lists (e.g. ["gpt-4o", "gpt-4o-mini"]). The CLAUDE.md section added in this same PR explicitly states:

Do not hardcode provider env-key names or model lists that already exist in the codebase.

These env-key names already exist in the respective provider implementations (e.g. litellm.utils, individual provider configs) and the model lists exist in model_prices_and_context_window.json. This creates a maintenance burden: when new popular models are released, two places must be updated (the JSON and the wizard's PROVIDERS list).

The test_model field pattern is a good start — but the models list could be driven by querying get_model_info for each provider's top models rather than duplicating them inline.

Rule Used: CLAUDE.md (source)
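The reviewer's suggestion to drive the menu from the pricing map rather than a hardcoded PROVIDERS list can be sketched as follows. The inline `PRICES` dict stands in for model_prices_and_context_window.json, and `models_for_provider` is a hypothetical helper, not an existing LiteLLM function:

```python
# Stand-in for model_prices_and_context_window.json (illustrative subset).
PRICES = {
    "gpt-4o": {"litellm_provider": "openai", "mode": "chat"},
    "gpt-4o-mini": {"litellm_provider": "openai", "mode": "chat"},
    "claude-3-5-sonnet-latest": {"litellm_provider": "anthropic", "mode": "chat"},
}


def models_for_provider(provider: str, prices: dict) -> list:
    """Return chat models registered for a provider, sorted for a stable menu."""
    return sorted(
        name
        for name, info in prices.items()
        if info.get("litellm_provider") == provider and info.get("mode") == "chat"
    )


print(models_for_provider("openai", PRICES))  # ['gpt-4o', 'gpt-4o-mini']
```

With this approach, a newly released model only needs to be added to the JSON; the wizard's menu picks it up automatically.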

Comment on lines +236 to +244
        config_path = Path(os.getcwd()) / "litellm_config.yaml"
        try:
            config_path.write_text(
                SetupWizard._build_config(providers, env_vars, master_key)
            )
        except OSError as exc:
            print(f"\n {bold(_CROSS + ' Could not write config:')} {exc}")
            print(" Try running from a directory you have write access to.\n")
            return
Contributor


P1 Silent overwrite of existing config

config_path.write_text(...) overwrites litellm_config.yaml without checking whether the file already exists. A user who runs litellm --setup a second time after having manually customised their config will silently lose all those edits. There is no backup, no diff, and no confirmation prompt.

Consider adding an existence check before writing:

Suggested change
        config_path = Path(os.getcwd()) / "litellm_config.yaml"
        try:
            config_path.write_text(
                SetupWizard._build_config(providers, env_vars, master_key)
            )
        except OSError as exc:
            print(f"\n {bold(_CROSS + ' Could not write config:')} {exc}")
            print(" Try running from a directory you have write access to.\n")
            return
        config_path = Path(os.getcwd()) / "litellm_config.yaml"
        if config_path.exists():
            overwrite = _styled_input(
                f" {blue('❯')} {bold(str(config_path))} already exists. Overwrite? {grey('(y/N)')}: "
            )
            if overwrite.lower() != "y":
                print(f"\n {grey('Setup cancelled — existing config was not changed.')}\n")
                return
        try:
            config_path.write_text(
                SetupWizard._build_config(providers, env_vars, master_key)
            )

Comment on lines +480 to +485
print(f" {_CROSS} {bold(provider['name'])} {grey('— invalid API key')}")
if (
_styled_input(f" {blue('❯')} Re-enter key? {grey('(y/N)')}: ").lower()
!= "y"
):
return api_key
Contributor


P1 Invalid key silently written to config

When check_valid_key reports that the key is invalid and the user answers "N" to the re-entry prompt, _validate_and_report returns the original (invalid) key unchanged. That key then flows back through _collect_keys into env_vars and is ultimately written verbatim into the environment_variables block of litellm_config.yaml.

When the user later starts the proxy and all requests to that provider fail with authentication errors, there is nothing in the config or any log to indicate this was flagged during setup. At a minimum, the caller (_collect_keys) should be told that validation failed so it can skip that provider's models in the generated config, or the return value should carry an is_valid flag.

The current callsite:

env_vars[p["env_key"]] = SetupWizard._validate_and_report(p, key)

silently uses whatever is returned, including a key already known to be wrong.
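The suggested fix can be sketched as returning a `(key, is_valid)` pair so the caller can skip providers whose keys failed validation. All names here are illustrative, not the actual setup_wizard.py API; the validator is injected as a plain callable to keep the sketch self-contained:

```python
def validate_and_report(key: str, is_key_valid) -> tuple:
    """Return the key plus a validity flag instead of silently passing bad keys on."""
    return key, bool(is_key_valid(key))


def collect_keys(entered: dict, is_key_valid) -> dict:
    """Only keys that validated successfully reach the generated config."""
    env_vars = {}
    for env_key, key in entered.items():
        key, ok = validate_and_report(key, is_key_valid)
        if ok:
            env_vars[env_key] = key
        # invalid providers are skipped rather than written verbatim to YAML
    return env_vars


good = collect_keys(
    {"OPENAI_API_KEY": "sk-good", "ANTHROPIC_API_KEY": "bad"},
    lambda k: k.startswith("sk-"),  # stand-in for check_valid_key
)
print(good)  # {'OPENAI_API_KEY': 'sk-good'}
```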

Comment on lines +547 to +549
        lines += [
            f"  - model_name: {display}",
            "    litellm_params:",
Contributor


P2 Unquoted model_name and model fields in generated YAML

model_name and model values are written as bare YAML scalars without quoting:

lines += [
    f"  - model_name: {display}",
    "    litellm_params:",
    f"      model: {model}",
]

Current hardcoded names happen to be safe, but the Bedrock model (bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0) produces model_name: anthropic.claude-3-5-sonnet-20241022-v2:0 — a colon-containing unquoted scalar. YAML parsers handle v2:0 (no space after :) correctly today, but any future model whose display name contains : (colon-space) or starts with {, [, >, |, &, *, # would silently break the generated config. Quoting both fields defensively avoids this class of failure:

lines += [
    f'  - model_name: "{display}"',
    "    litellm_params:",
    f'      model: "{model}"',
]

…P auth (#23897)

* Allow pre_mcp_call guardrail hooks to mutate outbound MCP headers

* Enhance MCPServerManager to support hook-modified arguments and extra headers. Update tests to validate argument mutation and header injection behavior, including warnings for OpenAPI-backed servers when headers are present.

* Refactor MCPServerManager to raise HTTPException for extra headers in OpenAPI-backed servers. Update tests to reflect this change, ensuring proper exception handling instead of logging warnings.

* feat(guardrails): add MCPJWTSigner built-in guardrail for zero trust MCP auth

Signs outbound MCP tool calls with a LiteLLM-issued RS256 JWT so MCP servers
can trust a single signing authority instead of every upstream IdP.

Enable in config.yaml:
  guardrails:
    - guardrail_name: mcp-jwt-signer
      litellm_params:
        guardrail: mcp_jwt_signer
        mode: pre_mcp_call
        default_on: true

JWT carries sub (user_id), act.sub (team_id, RFC 8693), tool-level scope, iss,
aud, iat/exp/nbf. RSA-2048 keypair auto-generated at startup unless
MCP_JWT_SIGNING_KEY env var is set.

Adds /.well-known/jwks.json endpoint and jwks_uri to /.well-known/openid-configuration
so MCP servers can verify LiteLLM-issued tokens via OIDC discovery.
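
For concreteness, a claim set following the commit description might look like the sketch below. All values are illustrative; the `scope` string format in particular is an assumption, not taken from the PR:

```python
import json
import time

now = int(time.time())

# Illustrative claim set: sub carries the LiteLLM user, act.sub carries
# the team per RFC 8693, and scope is limited to the tool being invoked.
claims = {
    "iss": "https://litellm.example.com",   # assumed issuer URL
    "aud": "mcp-server",                     # assumed audience value
    "sub": "user_123",
    "act": {"sub": "team_456"},
    "scope": "mcp:tools/call:get_weather",   # assumed scope format
    "iat": now,
    "nbf": now,
    "exp": now + 300,                        # short-lived token
}
print(json.dumps(claims, indent=2))
```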

* Update MCPServerManager to raise HTTPException with status code 400 for extra headers in OpenAPI-backed servers. Adjust tests to verify the correct status code and exception message.

* fix: address P1 issues in MCPJWTSigner

- OpenAPI servers: warn + skip header injection instead of 500
- JWKS Cache-Control: 5min for auto-generated keys, 1h for persistent
- sub claim: fallback to apikey:{token_hash} for anonymous callers
- ttl_seconds: validate > 0 at init time

* docs: add MCP zero trust auth guide with architecture diagram

* docs: add FastMCP JWT verification guide to zero trust doc

* fix: address remaining Greptile review issues (round 2)

- mcp_server_manager: warn when hook Authorization overwrites existing header
- __init__: remove _mcp_jwt_signer_instance from __all__ (private internal)
- discoverable_endpoints: copy dict instead of mutating in-place on OIDC augmentation
- test docstring: reflect warn-and-continue behavior for OpenAPI servers
- test: update scope assertions for least-privilege (no mcp:tools/list on tool-call JWTs)

* fix: address Greptile round 3 feedback

- initialize_guardrail: validate mode='pre_mcp_call' at init time — misconfigured
  mode silently bypasses JWT injection, which is a zero-trust bypass
- _build_claims: remove duplicate inline 'import re' (module-level import already present)
- _types.py: add TODO comment explaining jwt_claims is forward-compat plumbing
  for a follow-up PR that will forward upstream IdP claims into outbound MCP JWTs

* feat(mcp_jwt_signer): add verify+re-sign, claim ops, two-token model, configurable scopes

Addresses all missing pieces from the scoping doc review:

FR-5 (Verify + re-sign): MCPJWTSigner now accepts access_token_discovery_uri
and token_introspection_endpoint.  When set, the incoming Bearer token is
extracted from raw_headers (threaded through pre_call_tool_check), verified
against the IdP's JWKS (JWT) or introspected (opaque), and only re-signed if
valid.  Falls back to user_api_key_dict.jwt_claims for LiteLLM JWT-auth mode.

FR-12 (Configurable end-user identity mapping): end_user_claim_sources
ordered list drives sub resolution — sources: token:<claim>, litellm:user_id,
litellm:email, litellm:end_user_id, litellm:team_id.

FR-13 (Claim operations): add_claims (insert-if-absent), set_claims (always
override), remove_claims (delete) applied in that order.

FR-14 (Two-token model): channel_token_audience + channel_token_ttl issue a
second JWT injected as x-mcp-channel-token: Bearer <token>.

FR-15 (Incoming claim validation): required_claims raises HTTP 403 when any
listed claim is absent; optional_claims passes listed claims from verified
token into the outbound JWT.

FR-9 (Debug headers): debug_headers: true emits x-litellm-mcp-debug with kid,
sub, iss, exp, scope.

FR-10 (Configurable scopes): allowed_scopes replaces auto-generation.  Also
fixed: tool-call JWTs no longer grant mcp:tools/list (overpermission).

P1 fixes:
- proxy/utils.py: _convert_mcp_hook_response_to_kwargs merges rather than
  replaces extra_headers, preserving headers from prior guardrails.
- mcp_server_manager.py: warns when hook injects Authorization alongside a
  server-configured authentication_token (previously silent).
- mcp_server_manager.py: pre_call_tool_check now accepts raw_headers and
  extracts incoming_bearer_token so FR-5 verification has the raw token.
- proxy/utils.py: remove stray inline import inspect inside loop (pre-existing
  lint error, now cleaned up).

Tests: 43 passing (28 new tests covering all FR flags + P1 fixes).

* feat(mcp_jwt_signer): add verify+re-sign, claim ops, two-token model, configurable scopes (core)

Remaining files from the FR implementation:

mcp_jwt_signer.py — full rewrite with all new params:
  FR-5:  access_token_discovery_uri, token_introspection_endpoint,
         verify_issuer, verify_audience + _verify_incoming_jwt(),
         _introspect_opaque_token()
  FR-12: end_user_claim_sources ordered resolution chain
  FR-13: add_claims, set_claims, remove_claims
  FR-14: channel_token_audience, channel_token_ttl → x-mcp-channel-token
  FR-15: required_claims (raises 403), optional_claims (passthrough)
  FR-9:  debug_headers → x-litellm-mcp-debug
  FR-10: allowed_scopes; tool-call JWTs no longer over-grant tools/list

mcp_server_manager.py:
  - pre_call_tool_check gains raw_headers param to extract incoming_bearer_token
  - Silent Authorization override warning fixed: now fires when server has
    authentication_token AND hook injects Authorization

tests/test_mcp_jwt_signer.py:
  28 new tests covering all FR flags + P1 fixes (43 total, all passing)

* fix(mcp_jwt_signer): address pre-landing review issues

- Remove stale TODO comment on UserAPIKeyAuth.jwt_claims — the field is
  already populated and consumed by MCPJWTSigner in the same PR
- Fix _get_oidc_discovery to only cache the OIDC discovery doc when
  jwks_uri is present; a malformed/empty doc now retries on the next
  request instead of being permanently cached until proxy restart
- Add FR-5 test coverage for _fetch_jwks (cache hit/miss),
  _get_oidc_discovery (cache/no-cache on bad doc), _verify_incoming_jwt
  (valid token, expired token), _introspect_opaque_token (active,
  inactive, no endpoint), and the end-to-end 401 hook path — 53 tests
  total, all passing

* docs(mcp_zero_trust): rewrite as use-case guide covering all new JWT signer features

Add scenario-driven sections for each new config area:
- Verify+re-sign with Okta/Azure AD (access_token_discovery_uri,
  end_user_claim_sources, token_introspection_endpoint)
- Enforcing caller attributes with required_claims / optional_claims
- Adding metadata via add_claims / set_claims / remove_claims
- Two-token model for AWS Bedrock AgentCore Gateway
  (channel_token_audience / channel_token_ttl)
- Controlling scopes with allowed_scopes
- Debugging JWT rejections with debug_headers

Update JWT claims table to reflect configurable sub (end_user_claim_sources)

* fix(mcp_jwt_signer): wire all config.yaml params through initialize_guardrail

The factory was only passing issuer/audience/ttl_seconds to MCPJWTSigner.
All FR-5/9/10/12/13/14/15 params (access_token_discovery_uri,
end_user_claim_sources, add/set/remove_claims, channel_token_audience,
required/optional_claims, debug_headers, allowed_scopes, etc.) were
silently dropped, making every advertised advanced feature non-functional
when loaded from config.yaml.

Add regression test that asserts every param is wired through correctly.

* docs(mcp_zero_trust): add hero image

* docs(mcp_zero_trust): apply Linear-style edits

- Lead with the problem (unsigned direct calls bypass access controls)
- Shorter statement section headers instead of question-form headers
- Move diagram/OIDC discovery block after the reader is bought in
- Add 'read further only if you need to' callout after basic setup
- Two-token section now opens from the user problem not product jargon
- Add concrete 403 error response example in required_claims section
- Debug section opens from the symptom (MCP server returning 401)
- Lowercase claims reference header for consistency

* fix(mcp_jwt_signer): fix algorithm confusion attack + add OIDC discovery 24h TTL

- Remove alg from unverified JWT header; use signing_jwk.algorithm_name from JWKS key instead.
  Reading alg from attacker-controlled headers enables alg:none / HS256 confusion attacks.
- Add _oidc_discovery_fetched_at timestamp and _OIDC_DISCOVERY_TTL = 86400 (24h).
  Without a TTL the cached discovery doc never refreshes, so IdP key rotation is invisible.

---------

Co-authored-by: Noah Nistler <60981020+noahnistler@users.noreply.github.com>
Comment on lines +526 to +536
missing = [c for c in self.required_claims if not (jwt_claims or {}).get(c)]
if missing:
    raise HTTPException(
        status_code=403,
        detail={
            "error": (
                f"MCPJWTSigner: incoming token is missing required claims: "
                f"{missing}. Configure the IdP to include these claims."
            )
        },
    )

P1 Falsy-value claims incorrectly reported as missing

The check not (jwt_claims or {}).get(c) treats falsy claim values (False, 0, "", []) as absent — meaning a JWT containing {"email_verified": false} or {"retry_count": 0} would fail required_claims validation and block the request with HTTP 403, even though the claim is present.

The correct check tests for key absence, not value truthiness:

Suggested change
missing = [c for c in self.required_claims if not (jwt_claims or {}).get(c)]
if missing:
    raise HTTPException(
        status_code=403,
        detail={
            "error": (
                f"MCPJWTSigner: incoming token is missing required claims: "
                f"{missing}. Configure the IdP to include these claims."
            )
        },
    )
missing = [c for c in self.required_claims if c not in (jwt_claims or {})]
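
The difference is easy to demonstrate. This standalone sketch contrasts the two checks against a claim set with present-but-falsy values:

```python
jwt_claims = {"email_verified": False, "retry_count": 0, "sub": "user_123"}
required = ["email_verified", "retry_count", "aud"]

# Truthiness check: present-but-falsy claims are wrongly flagged as missing.
missing_truthy = [c for c in required if not jwt_claims.get(c)]

# Key-presence check: only the genuinely absent claim is flagged.
missing_present = [c for c in required if c not in jwt_claims]

print(missing_truthy)   # ['email_verified', 'retry_count', 'aud']
print(missing_present)  # ['aud']
```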

Comment on lines +503 to +507
if not result.get("active", False):
    raise jwt.exceptions.ExpiredSignatureError(  # type: ignore[attr-defined]
        "MCPJWTSigner: incoming token is inactive (introspection returned active=false)"
    )
return result

P1 Wrong exception type for inactive token

jwt.exceptions.ExpiredSignatureError semantically means the token's exp claim is in the past. Using it to signal that an introspection endpoint returned active=false is misleading — the two conditions are distinct (e.g., a token could be revoked long before it expires).

When this is caught in async_pre_call_hook and re-raised as an HTTP 401, the error message will claim "expired signature" even when the real issue is token revocation or suspension, making production debugging much harder.

Use a more semantically accurate exception:

Suggested change
if not result.get("active", False):
    raise jwt.exceptions.ExpiredSignatureError(  # type: ignore[attr-defined]
        "MCPJWTSigner: incoming token is inactive (introspection returned active=false)"
    )
return result
raise jwt.exceptions.InvalidTokenError(
    "MCPJWTSigner: incoming token is inactive (introspection returned active=false)"
)

Comment on lines 2282 to +2293
  # For OpenAPI servers, call the tool handler directly instead of via MCP client
  if mcp_server.spec_path:
      verbose_logger.debug(
-         f"Calling OpenAPI tool {name} directly via HTTP handler"
+         "Calling OpenAPI tool %s directly via HTTP handler", name
      )
      if hook_result.get("extra_headers"):
          verbose_logger.warning(
              "pre_mcp_call hook returned extra_headers for OpenAPI-backed "
              "MCP server '%s' — header injection is not supported for "
              "OpenAPI servers; headers will be ignored. Use SSE/HTTP "
              "transport to enable hook header injection.",
              server_name,

P1 JWT silently dropped for OpenAPI-backed servers

When MCPJWTSigner is active and a tool call goes to an OpenAPI-backed MCP server, the signed JWT in hook_result["extra_headers"] is silently discarded after the warning log. The caller receives no indication that JWT authentication was not applied to the upstream call.

This creates a silent security gap: an operator may believe JWT enforcement is active, but calls to OpenAPI-backed servers bypass it entirely. The warning in the server log is easy to miss.

Consider either:

  1. Raising an HTTPException (or GuardrailRaisedException) to fail fast if MCPJWTSigner injects an Authorization header that cannot be forwarded, or
  2. Documenting this limitation prominently in the config YAML doc comment so operators are warned at setup time.

At minimum, the warning message should be escalated to error level so it stands out in logs.

Comment on lines +92 to +97
# Module-level singleton for the JWKS discovery endpoint to access.
_mcp_jwt_signer_instance: Optional["MCPJWTSigner"] = None

# Simple in-memory JWKS cache: keyed by JWKS URI → (keys_list, fetched_at).
_jwks_cache: Dict[str, tuple] = {}
_JWKS_CACHE_TTL = 3600 # 1 hour

P2 Module-level JWKS cache is not concurrency-safe

_jwks_cache is a plain module-level dict with a non-atomic read-check-write pattern in _fetch_jwks. In a high-concurrency async environment multiple coroutines can simultaneously observe a stale/missing cache entry and all proceed to fire HTTP requests to the same JWKS URI. This results in redundant IdP requests rather than incorrect behavior — but it is worth noting, especially for deployments under heavy load.

Consider using asyncio.Lock() per URI or an asyncio.Semaphore to serialise JWKS refreshes:

_jwks_cache: Dict[str, tuple] = {}
_jwks_cache_locks: Dict[str, "asyncio.Lock"] = {}

Then acquire the lock per jwks_uri before checking-and-refreshing the cache.

…ty CVEs, router bug, batch resolution

Fix 1: Run Black formatter on 35 files
Fix 2: Fix MyPy type errors:
  - setup_wizard.py: add type annotation for 'selected' set variable
  - user_api_key_auth.py: remove redundant type annotation on jwt_claims reassignment
Fix 3: Fix spend accuracy test burst 2 polling to wait for expected total
  spend instead of just 'any increase' from burst 2
Fix 4: Bump Next.js 16.1.6 -> 16.1.7 to fix CVE-2026-27978, CVE-2026-27979,
  CVE-2026-27980, CVE-2026-29057
Fix 5: Fix router _pre_call_checks model variable being overwritten inside
  loop, causing wrong model lookups on subsequent deployments. Use local
  _deployment_model variable instead.
Fix 6: Add missing resolve_output_file_ids_to_unified call in batch retrieve
  non-terminal-to-terminal path (matching the terminal path behavior)

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ ishaan-jaff
❌ cursoragent

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

from fastapi import HTTPException

missing = [c for c in self.required_claims if not (jwt_claims or {}).get(c)]

P2 Falsy-value claims silently treated as absent

not (jwt_claims or {}).get(c) treats any falsy value (False, 0, "", []) as absent. A JWT containing {"email_verified": false} or {"retry_count": 0} would incorrectly fail required_claims validation and block the request with HTTP 403, even though the claim is present in the token.

The correct check should test for key absence, not value truthiness:

Suggested change
missing = [c for c in self.required_claims if not (jwt_claims or {}).get(c)]
missing = [c for c in self.required_claims if c not in (jwt_claims or {})]

Comment on lines +505 to +509
if not result.get("active", False):
    raise jwt.exceptions.ExpiredSignatureError(  # type: ignore[attr-defined]
        "MCPJWTSigner: incoming token is inactive (introspection returned active=false)"
    )
return result

P2 Wrong exception type for inactive/revoked token

jwt.exceptions.ExpiredSignatureError semantically means the token's exp claim is in the past. Using it to signal that an introspection endpoint returned active=false conflates two distinct conditions — a token can be revoked long before it expires.

When caught upstream in async_pre_call_hook and re-raised as HTTP 401, the error message will say "expired signature" even for revoked or suspended tokens, making production debugging much harder.

Use a more semantically accurate exception:

Suggested change
if not result.get("active", False):
    raise jwt.exceptions.ExpiredSignatureError(  # type: ignore[attr-defined]
        "MCPJWTSigner: incoming token is inactive (introspection returned active=false)"
    )
return result
if not result.get("active", False):
    raise jwt.exceptions.InvalidTokenError(
        "MCPJWTSigner: incoming token is inactive (introspection returned active=false)"
    )

Comment on lines +530 to +542
        raise HTTPException(
            status_code=403,
            detail={
                "error": (
                    f"MCPJWTSigner: incoming token is missing required claims: "
                    f"{missing}. Configure the IdP to include these claims."
                )
            },
        )

    # ------------------------------------------------------------------
    # FR-12: End-user identity mapping
    # ------------------------------------------------------------------

P2 FastAPI import inside the litellm/ SDK directory

from fastapi import HTTPException is imported in _validate_required_claims (line ~533) and async_pre_call_hook (line ~595). Per the project rule, FastAPI is a proxy-only dependency and must not be imported in SDK files. While this file lives under litellm/proxy/guardrails/, the import pattern used (lazy inside a method) is still fragile and would fail if the guardrail is ever invoked in a non-proxy SDK context.

Consider raising a litellm.exceptions.AuthenticationError or a plain ValueError/PermissionError that the proxy layer can then translate into an HTTPException. This also decouples the guardrail from FastAPI's request lifecycle.

Rule Used: What: Do not allow fastapi imports on files outsid... (source)

Comment on lines +92 to +115
# Module-level singleton for the JWKS discovery endpoint to access.
_mcp_jwt_signer_instance: Optional["MCPJWTSigner"] = None

# Simple in-memory JWKS cache: keyed by JWKS URI → (keys_list, fetched_at).
_jwks_cache: Dict[str, tuple] = {}
_JWKS_CACHE_TTL = 3600  # 1 hour


def get_mcp_jwt_signer() -> Optional["MCPJWTSigner"]:
    """Return the active MCPJWTSigner singleton, or None if not initialized."""
    return _mcp_jwt_signer_instance


def _load_private_key_from_env(env_var: str) -> RSAPrivateKey:
    """Load an RSA private key from an env var (PEM string or file:// path)."""
    key_material = os.environ.get(env_var, "")
    if not key_material:
        raise ValueError(
            f"MCPJWTSigner: environment variable '{env_var}' is set but empty."
        )
    if key_material.startswith("file://"):
        path = key_material[len("file://") :]
        with open(path, "rb") as f:
            key_bytes = f.read()

P2 Module-level JWKS cache lacks concurrency safety

_jwks_cache is a plain module-level dict with a non-atomic read-check-write pattern in _fetch_jwks. In a high-concurrency async environment, multiple coroutines can simultaneously observe a stale/missing cache entry and all fire HTTP requests to the same JWKS URI, causing redundant IdP requests. Under heavy load this can inadvertently trigger rate limits on the IdP.

Consider protecting the refresh with a per-URI asyncio.Lock:

_jwks_cache: Dict[str, tuple] = {}
_jwks_cache_locks: Dict[str, asyncio.Lock] = {}

async def _fetch_jwks(jwks_uri: str) -> List[Dict[str, Any]]:
    now = time.time()
    cached = _jwks_cache.get(jwks_uri)
    if cached and now - cached[1] < _JWKS_CACHE_TTL:
        return cached[0]

    if jwks_uri not in _jwks_cache_locks:
        _jwks_cache_locks[jwks_uri] = asyncio.Lock()
    async with _jwks_cache_locks[jwks_uri]:
        # Re-check after acquiring lock
        cached = _jwks_cache.get(jwks_uri)
        if cached and now - cached[1] < _JWKS_CACHE_TTL:
            return cached[0]
        # ... fetch and store

Comment on lines 2282 to +2291
  # For OpenAPI servers, call the tool handler directly instead of via MCP client
  if mcp_server.spec_path:
      verbose_logger.debug(
-         f"Calling OpenAPI tool {name} directly via HTTP handler"
+         "Calling OpenAPI tool %s directly via HTTP handler", name
      )
      if hook_result.get("extra_headers"):
          verbose_logger.warning(
              "pre_mcp_call hook returned extra_headers for OpenAPI-backed "
              "MCP server '%s' — header injection is not supported for "
              "OpenAPI servers; headers will be ignored. Use SSE/HTTP "

P2 JWT silently dropped for OpenAPI-backed MCP servers

When MCPJWTSigner is active and a tool call targets an OpenAPI-backed MCP server, the signed JWT in hook_result["extra_headers"] is silently discarded after the warning log. The caller receives no indication that JWT authentication was not forwarded to the upstream.

This creates a silent security gap: operators may believe JWT enforcement is active for all MCP servers, but calls to OpenAPI-backed servers bypass it entirely. The warning message is easy to miss in logs.

Consider either raising an HTTPException to fail fast if the operator expects JWT enforcement, or escalating the log level to error so the gap is visible in monitoring:

verbose_logger.error(
    "pre_mcp_call hook returned extra_headers for OpenAPI-backed "
    "MCP server '%s' — header injection is NOT supported; the "
    "JWT will NOT be forwarded. Switch to SSE/HTTP transport or "
    "disable MCPJWTSigner for this server.",
    server_name,
)

Comment on lines +54 to +128
"name": "Anthropic",
"description": "Claude Opus 4.6, Sonnet 4.6, Haiku 4.5",
"env_key": "ANTHROPIC_API_KEY",
"key_hint": "sk-ant-...",
"test_model": "claude-haiku-4-5-20251001",
"models": ["claude-opus-4-6", "claude-sonnet-4-6", "claude-haiku-4-5-20251001"],
},
{
"id": "gemini",
"name": "Google Gemini",
"description": "Gemini 2.0 Flash, Gemini 2.5 Pro",
"env_key": "GEMINI_API_KEY",
"key_hint": "AIza...",
"test_model": "gemini/gemini-2.0-flash",
"models": ["gemini/gemini-2.0-flash", "gemini/gemini-2.5-pro"],
},
{
"id": "azure",
"name": "Azure OpenAI",
"description": "GPT-4o via Azure",
"env_key": "AZURE_API_KEY",
"key_hint": "your-azure-key",
"test_model": None, # needs deployment name — skip validation
"models": [],
"needs_api_base": True,
"api_base_hint": "https://<resource>.openai.azure.com/",
"api_version": "2024-07-01-preview",
},
{
"id": "bedrock",
"name": "AWS Bedrock",
"description": "Claude 3.5, Llama 3 via AWS",
"env_key": "AWS_ACCESS_KEY_ID",
"key_hint": "AKIA...",
"test_model": None, # multi-key auth — skip validation
"models": ["bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0"],
"extra_keys": ["AWS_SECRET_ACCESS_KEY", "AWS_REGION_NAME"],
"extra_hints": ["your-secret-key", "us-east-1"],
},
{
"id": "ollama",
"name": "Ollama",
"description": "Local models (llama3.2, mistral, etc.)",
"env_key": None,
"key_hint": None,
"test_model": None, # local — no remote validation
"models": ["ollama/llama3.2", "ollama/mistral"],
"api_base": "http://localhost:11434",
},
]


# ---------------------------------------------------------------------------
# ANSI colour helpers
# ---------------------------------------------------------------------------

_ANSI_RE = re.compile(r"\033\[[^m]*m")

_ORANGE = "\033[38;2;215;119;87m"
_DIM = "\033[2m"
_BOLD = "\033[1m"
_GREEN = "\033[38;2;78;186;101m"
_BLUE = "\033[38;2;177;185;249m"
_GREY = "\033[38;2;153;153;153m"
_RESET = "\033[0m"
_CHECK = "✔"
_CROSS = "✘"

_CURSOR_HIDE = "\033[?25l"
_CURSOR_SHOW = "\033[?25h"
_MOVE_UP = "\033[{}A"


def _supports_color() -> bool:
return sys.stdout.isatty() and os.environ.get("NO_COLOR") is None

P2 Hardcoded provider env-key names and model lists

The PROVIDERS list hardcodes both provider env-key names (e.g. OPENAI_API_KEY, ANTHROPIC_API_KEY) and model lists (e.g. ["gpt-4o", "gpt-4o-mini"]). The CLAUDE.md section added in this same PR explicitly states:

Do not hardcode provider env-key names or model lists that already exist in the codebase.

These env-key names already exist in the respective provider implementations and the model lists exist in model_prices_and_context_window.json. When new popular models are released, two places must be updated.

The test_model field pattern is a good start — but the models list could be driven by querying get_model_info for each provider's top models rather than duplicating them inline.

Rule Used: CLAUDE.md (source)
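
A sketch of driving the list from the pricing map instead; the inline dict stands in for the real model_prices_and_context_window.json, whose entries do carry a `litellm_provider` field (the `models_for` helper and `limit` parameter are illustrative):

```python
# Stand-in for json.load(open("model_prices_and_context_window.json")).
model_prices = {
    "gpt-4o": {"litellm_provider": "openai", "mode": "chat"},
    "gpt-4o-mini": {"litellm_provider": "openai", "mode": "chat"},
    "claude-haiku-4-5-20251001": {"litellm_provider": "anthropic", "mode": "chat"},
}


def models_for(provider: str, limit: int = 5) -> list[str]:
    # Filter chat models by provider rather than duplicating names inline,
    # so new releases only need updating in one place.
    return sorted(
        name
        for name, info in model_prices.items()
        if info.get("litellm_provider") == provider and info.get("mode") == "chat"
    )[:limit]


print(models_for("openai"))
```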

)
return api_key

print(f" {_CROSS} {bold(provider['name'])} {grey('— invalid API key')}")

P2 Invalid key silently written to generated config

When check_valid_key reports the key is invalid and the user declines to re-enter, _validate_and_report returns the original (invalid) key unchanged. That key then flows back through _collect_keys into env_vars and is written verbatim into the environment_variables block of litellm_config.yaml.

When the proxy later starts and all requests to that provider fail with authentication errors, nothing in the config or logs indicates this was flagged during setup. At a minimum, _collect_keys should be told that validation failed so it can skip that provider's models in the generated config, or _validate_and_report should return an is_valid flag.

# Current (silent failure):
env_vars[p["env_key"]] = SetupWizard._validate_and_report(p, key)

Comment on lines +236 to +240
config_path = Path(os.getcwd()) / "litellm_config.yaml"
try:
    config_path.write_text(
        SetupWizard._build_config(providers, env_vars, master_key)
    )

P2 Silent overwrite of existing config file

config_path.write_text(...) overwrites litellm_config.yaml without checking whether the file already exists. A user who runs litellm --setup a second time after customising their config will silently lose all those edits with no backup, diff, or confirmation prompt.

Consider adding an existence check before writing:

if config_path.exists():
    answer = _styled_input(
        f"  {orange('Warning:')} {config_path.name} already exists. Overwrite? [y/N] "
    )
    if answer.lower() != "y":
        print(f"\n  {grey('Aborted. Existing config preserved.')}\n")
        return

Comment on lines +548 to +555
lines += [
    f"  - model_name: {display}",
    "    litellm_params:",
    f"      model: {model}",
]
if p["env_key"] and p["env_key"] in env_copy:
    lines.append(f"      api_key: os.environ/{p['env_key']}")
if p.get("api_base"):

P2 Unquoted model_name and model fields in generated YAML

model_name and model values are written as bare YAML scalars without quoting. The Bedrock model bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0 produces model_name: anthropic.claude-3-5-sonnet-20241022-v2:0 — a colon-containing unquoted scalar. While YAML parsers handle v2:0 (no space after :) correctly today, any future model whose display name contains : (colon-space) or starts with special YAML characters would silently break the generated config.

Quote both fields defensively:

lines += [
    f'  - model_name: "{display}"',
    "    litellm_params:",
    f'      model: "{model}"',
]

cursoragent and others added 2 commits March 18, 2026 01:40
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
…ibility

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
Comment on lines +523 to +538
if not self.required_claims:
    return

from fastapi import HTTPException

missing = [c for c in self.required_claims if not (jwt_claims or {}).get(c)]
if missing:
    raise HTTPException(
        status_code=403,
        detail={
            "error": (
                f"MCPJWTSigner: incoming token is missing required claims: "
                f"{missing}. Configure the IdP to include these claims."
            )
        },
    )

P2 Inconsistent key-presence check between required_claims and optional_claims

_validate_required_claims uses not (jwt_claims or {}).get(c) (a truthiness check), while _passthrough_optional_claims (line ~666) correctly uses claim in jwt_claims (a key-presence check). This inconsistency means two different semantics apply to the same incoming JWT claims depending on which feature processes them:

  • optional_claims: a claim with value False is passed through correctly
  • required_claims: the same claim with value False fails validation with HTTP 403

The fix should use key-presence consistently:

Suggested change
if not self.required_claims:
    return
from fastapi import HTTPException
missing = [c for c in self.required_claims if not (jwt_claims or {}).get(c)]
if missing:
    raise HTTPException(
        status_code=403,
        detail={
            "error": (
                f"MCPJWTSigner: incoming token is missing required claims: "
                f"{missing}. Configure the IdP to include these claims."
            )
        },
    )
missing = [c for c in self.required_claims if c not in (jwt_claims or {})]

Comment on lines +504 to +508
result: Dict[str, Any] = resp.json()
if not result.get("active", False):
    raise jwt.exceptions.ExpiredSignatureError(  # type: ignore[attr-defined]
        "MCPJWTSigner: incoming token is inactive (introspection returned active=false)"
    )

P2 Semantically incorrect exception for inactive/revoked token

jwt.exceptions.ExpiredSignatureError means the token's exp claim is in the past. Using it to signal that an introspection endpoint returned active=false conflates two distinct situations — a token may be revoked, suspended, or otherwise deactivated long before its exp.

When this is caught upstream in async_pre_call_hook and re-raised as HTTP 401, the error message will claim "expired signature" even for tokens that have been explicitly revoked, making production debugging unnecessarily difficult.

Consider raising a more semantically accurate exception:

raise jwt.exceptions.InvalidTokenError(
    "MCPJWTSigner: incoming token is inactive (introspection returned active=false)"
)

Comment on lines 2282 to +2291
  # For OpenAPI servers, call the tool handler directly instead of via MCP client
  if mcp_server.spec_path:
      verbose_logger.debug(
-         f"Calling OpenAPI tool {name} directly via HTTP handler"
+         "Calling OpenAPI tool %s directly via HTTP handler", name
      )
      if hook_result.get("extra_headers"):
          verbose_logger.warning(
              "pre_mcp_call hook returned extra_headers for OpenAPI-backed "
              "MCP server '%s' — header injection is not supported for "
              "OpenAPI servers; headers will be ignored. Use SSE/HTTP "

P2 Hook JWT silently discarded for OpenAPI-backed MCP servers

When MCPJWTSigner is active and a tool call targets an OpenAPI-backed MCP server, the signed JWT in hook_result["extra_headers"] is silently discarded after the warning log. The caller receives no indication that JWT authentication was not forwarded upstream.

This creates a silent security gap: an operator who configures MCPJWTSigner believing it enforces zero-trust for all MCP servers will find that OpenAPI-backed servers bypass it entirely. The WARNING log is easy to miss in production.

Consider escalating to ERROR level or raising an HTTPException to fail closed when JWT enforcement is expected:

verbose_logger.error(
    "pre_mcp_call hook returned extra_headers for OpenAPI-backed "
    "MCP server '%s' — JWT will NOT be forwarded. Use SSE/HTTP "
    "transport or disable MCPJWTSigner for this server.",
    server_name,
)

Comment on lines +562 to +567
value: Optional[str] = None

if source.startswith("token:"):
claim_name = source[len("token:") :]
raw = (jwt_claims or {}).get(claim_name)
value = str(raw) if raw else None

P2 Falsy claim values silently skipped in end-user identity resolution

_resolve_end_user_identity uses value = str(raw) if raw else None. This skips any falsy claim value — 0, False, "", [] — treating it as absent and falling through to the next source in the chain (or the final hash fallback), producing an incorrect sub claim in the signed JWT.

This is a distinct issue from the required_claims truthiness bug. Note how _passthrough_optional_claims (line ~666) correctly uses claim in jwt_claims for key-presence. The same pattern should be used here:

Suggested change
 value: Optional[str] = None

 if source.startswith("token:"):
     claim_name = source[len("token:") :]
     raw = (jwt_claims or {}).get(claim_name)
-    value = str(raw) if raw else None
+    value = str(raw) if (raw is not None) else None
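A key-presence version of the resolution chain can be sketched as follows (function and source names are illustrative, not the actual _resolve_end_user_identity implementation):

```python
from typing import Any, Dict, List, Optional

def resolve_identity(sources: List[str], jwt_claims: Optional[Dict[str, Any]]) -> Optional[str]:
    # Walk the configured sources in order; key presence (not truthiness)
    # decides whether a token claim resolves.
    for source in sources:
        if source.startswith("token:"):
            claim_name = source[len("token:"):]
            if jwt_claims is not None and claim_name in jwt_claims:
                return str(jwt_claims[claim_name])
    return None  # caller falls back to its hash-based sub

print(resolve_identity(["token:employee_id", "token:sub"], {"employee_id": 0, "sub": "u-1"}))
# prints "0": the falsy-but-present claim resolves instead of falling through
```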

…el lookup)

Replace deprecated gpt-3.5-turbo-1106 with gpt-4.1-mini + mock_response in
test_router_region_pre_call_check, following the same pattern used in commit
717d37c for test_router_context_window_check_pre_call_check_out_group.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

from fastapi import HTTPException

missing = [c for c in self.required_claims if not (jwt_claims or {}).get(c)]

P2 Falsy claim values incorrectly treated as missing in _validate_required_claims

not (jwt_claims or {}).get(c) uses a truthiness check, meaning any falsy but present claim value (False, 0, "", []) is treated as absent. A JWT with {"email_verified": false} or {"retry_count": 0} would incorrectly raise HTTP 403, even though the claim is present.

Contrast this with _passthrough_optional_claims on line 666, which correctly uses the claim in jwt_claims key-presence pattern.

Suggested change
-missing = [c for c in self.required_claims if not (jwt_claims or {}).get(c)]
+missing = [c for c in self.required_claims if c not in (jwt_claims or {})]
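The difference between the two checks can be seen in a few lines (claim names and values are illustrative):

```python
required = ["sub", "email_verified", "retry_count"]
claims = {"sub": "user-123", "email_verified": False, "retry_count": 0}

# Truthiness check: present-but-falsy claims are reported as missing
missing_truthy = [c for c in required if not claims.get(c)]

# Key-presence check: only genuinely absent claims are reported
missing_absent = [c for c in required if c not in claims]

print(missing_truthy)  # ['email_verified', 'retry_count']
print(missing_absent)  # []
```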


Comment on lines +22 to +27
mode = litellm_params.mode
if mode != "pre_mcp_call":
raise ValueError(
f"MCPJWTSigner guardrail '{guardrail_name}' has mode='{mode}' but must use "
"mode='pre_mcp_call'. JWT injection only fires for MCP tool calls."
)

P2 mode validation may break with valid non-string mode values

The initialize_guardrail function checks if mode != "pre_mcp_call" and raises. However, looking at the LitellmParams type, mode could be a Literal enum or None. If a user accidentally omits mode from the config, mode will be None and the error message will say mode='None' instead of something helpful like "mode is required". Consider explicitly handling the None case to provide a clearer error message for misconfigured guardrails.
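A sketch of the suggested None handling (function name and messages are illustrative, not the PR's actual initialize_guardrail):

```python
from typing import Optional

def validate_mode(guardrail_name: str, mode: Optional[str]) -> None:
    # Surface a missing mode explicitly instead of reporting mode='None'
    if mode is None:
        raise ValueError(
            f"MCPJWTSigner guardrail '{guardrail_name}': mode is required; "
            "set mode='pre_mcp_call' in the guardrail config."
        )
    if mode != "pre_mcp_call":
        raise ValueError(
            f"MCPJWTSigner guardrail '{guardrail_name}' has mode='{mode}' but must "
            "use mode='pre_mcp_call'. JWT injection only fires for MCP tool calls."
        )

try:
    validate_mode("my-signer", None)
except ValueError as exc:
    print(exc)  # says mode is required, not the confusing mode='None'
```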


Comment on lines 685 to 690
do_standard_jwt_auth = True
if jwt_handler.litellm_jwtauth.virtual_key_claim_field is not None:
# Decode JWT to get claims without running full auth_builder
jwt_claims: Optional[dict]
if jwt_handler.litellm_jwtauth.oidc_userinfo_enabled:
jwt_claims = await jwt_handler.get_oidc_userinfo(token=api_key)

P2 jwt_claims variable potentially uninitialized before first use

The type annotation jwt_claims: Optional[dict] is added at line 688 before the conditional block that assigns it, but the variable is not given a default value. If the if jwt_handler.litellm_jwtauth.virtual_key_claim_field is not None: branch is entered but neither the oidc_userinfo_enabled nor the else path runs (hypothetical future changes), jwt_claims would be referenced unbound.

More critically, on line 731, jwt_claims = result.get("jwt_claims", None) is placed in the standard JWT auth path but not in all code paths. If the virtual-key fast path (do_standard_jwt_auth = False) is taken without entering the claim extraction block, jwt_claims may not be defined when reached at line 760.

Consider initializing jwt_claims = None immediately at the point of declaration to make the intent explicit and safe.
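The hazard is easy to reproduce in isolation: a bare annotation never binds the name, so any path that skips the assignment hits an UnboundLocalError. A minimal sketch:

```python
from typing import Optional

def build(claim_field: Optional[str]) -> Optional[dict]:
    jwt_claims: Optional[dict]  # annotation only, no value bound
    if claim_field is not None:
        jwt_claims = {"sub": claim_field}
    return jwt_claims  # UnboundLocalError when claim_field is None

def build_safe(claim_field: Optional[str]) -> Optional[dict]:
    jwt_claims: Optional[dict] = None  # bound at the point of declaration
    if claim_field is not None:
        jwt_claims = {"sub": claim_field}
    return jwt_claims

try:
    build(None)
except UnboundLocalError:
    print("jwt_claims was never bound")

print(build_safe(None))  # None
```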

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
Comment on lines +808 to +819
except Exception as exc:
verbose_proxy_logger.error(
"MCPJWTSigner: incoming token verification failed: %s", exc
)
from fastapi import HTTPException

raise HTTPException(
status_code=401,
detail={
"error": (
f"MCPJWTSigner: incoming token verification failed: {exc}"
)

P1 Broad except Exception masks infrastructure failures as HTTP 401

The catch-all except Exception converts every error — including network timeouts, DNS failures, HTTP 503 from the JWKS endpoint, and JSON parse errors — into an HTTP 401 response. From a client's perspective this looks identical to "your token is invalid", making infrastructure failures completely invisible and extremely hard to diagnose in production.

For example: if the IdP's JWKS endpoint returns a 503 or is temporarily unreachable, every MCP tool call is rejected with 401 incoming token verification failed: Connection error — which looks like a broken token to the caller.

The fix is to narrow the catch to JWT-specific errors and re-raise unexpected exceptions as 502/503:

import httpx

try:
    if is_jwt:
        jwt_claims = await self._verify_incoming_jwt(raw_token)
    elif self.token_introspection_endpoint:
        jwt_claims = await self._introspect_opaque_token(raw_token)
    else:
        verbose_proxy_logger.warning(...)
except jwt.PyJWTError as exc:
    # Auth failure — the token itself is invalid/expired/rejected.
    verbose_proxy_logger.error("MCPJWTSigner: token invalid: %s", exc)
    from fastapi import HTTPException
    raise HTTPException(status_code=401, detail={"error": f"MCPJWTSigner: {exc}"})
except (httpx.HTTPError, OSError, ValueError) as exc:
    # Infrastructure failure — JWKS endpoint unreachable, network error, etc.
    verbose_proxy_logger.error("MCPJWTSigner: JWKS/introspection unreachable: %s", exc)
    from fastapi import HTTPException
    raise HTTPException(status_code=503, detail={"error": f"MCPJWTSigner: upstream auth service unavailable: {exc}"})

…ndition

The _verify_langfuse_call helper only inspected the last mock call
(mock_post.call_args), but the Langfuse SDK may split trace-create and
generation-create events across separate HTTP flush cycles. This caused
an IndexError when the last call's batch contained only one event type.

Fix: iterate over mock_post.call_args_list to collect batch items from
ALL calls. Also add a safety assertion after filtering by trace_id and
mark all langfuse e2e tests with @pytest.mark.flaky(retries=3) as an
extra safety net for any residual timing issues.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
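The aggregation described above can be sketched with a plain MagicMock (the payload shape is illustrative):

```python
from unittest.mock import MagicMock

mock_post = MagicMock()
# The SDK may flush trace-create and generation-create in separate HTTP calls:
mock_post(json={"batch": [{"type": "trace-create", "id": "t1"}]})
mock_post(json={"batch": [{"type": "generation-create", "id": "g1"}]})

# Fragile: only the last call's batch is visible
last_batch = mock_post.call_args.kwargs["json"]["batch"]

# Robust: collect batch items from ALL calls
all_items = [
    item
    for call in mock_post.call_args_list
    for item in call.kwargs["json"]["batch"]
]

print(len(last_batch), len(all_items))  # 1 2
```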
…changes

- Apply Black 26.x formatting to litellm_logging.py (parenthesized style)
- Update test_input_types_match_spec to follow $ref to InteractionsInput schema
  (Google updated their OpenAPI spec to use $ref instead of inline oneOf)
- Update test_content_schema_uses_discriminator to handle discriminator without
  explicit mapping (Google removed the mapping key from Content discriminator)

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>


base_model = _litellm_params.get("base_model", None)
model_info = self.get_router_model_info(
deployment=deployment, received_model_name=model
)

P2 Bug fix: outer model variable shadowed by deployment-specific model name

The rename from model to _deployment_model is a real correctness fix. The original code overwrote the outer model variable (the request's model group name) with a per-deployment base_model or litellm_params["model"] string. On the next loop iteration, model would hold the previous deployment's model string instead of the original group name, potentially causing incorrect context-window error messages and skipping valid deployments.

This change is correct and important — just noting it explicitly since the inline comment on the same line is long and may be missed.
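The failure mode can be reproduced in isolation (deployment data is illustrative):

```python
deployments = [
    {"litellm_params": {"model": "azure/gpt-4o-eu"}},
    {"litellm_params": {"model": "azure/gpt-4o-us"}},
]

def seen_group_names_buggy(model: str) -> list:
    seen = []
    for d in deployments:
        seen.append(model)  # expects the request's model group name
        model = d["litellm_params"]["model"]  # bug: overwrites the outer name
    return seen

def seen_group_names_fixed(model: str) -> list:
    seen = []
    for d in deployments:
        seen.append(model)
        _deployment_model = d["litellm_params"]["model"]  # separate name, no shadowing
    return seen

print(seen_group_names_buggy("gpt-4o"))  # ['gpt-4o', 'azure/gpt-4o-eu']
print(seen_group_names_fixed("gpt-4o"))  # ['gpt-4o', 'gpt-4o']
```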

The file was correctly formatted for Black 23.12.1 (the version pinned
in pyproject.toml). The previous commit applied Black 26.x formatting
which was incompatible with the CI's Black version.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

Comment on lines +547 to +550
display = f"azure-{raw_display}" if p["id"] == "azure" else raw_display
lines += [
f" - model_name: {display}",
" litellm_params:",

P2 Unquoted model_name and model fields in generated YAML

model_name and model values are written as bare YAML scalars without quoting. The Bedrock model bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0 produces model_name: anthropic.claude-3-5-sonnet-20241022-v2:0 — a colon-containing unquoted scalar. While YAML parsers handle v2:0 correctly today (there is no space after the colon), any future model whose display name contains ": " (a colon followed by a space) or starts with a special YAML character ({, [, >, |) would silently produce a broken config.

_yaml_escape already exists in this file. Quote both fields defensively:

Suggested change
 display = f"azure-{raw_display}" if p["id"] == "azure" else raw_display
 lines += [
-    f"  - model_name: {display}",
-    "    litellm_params:",
+    f'  - model_name: "{_yaml_escape(display)}"',
+    "    litellm_params:",
+    f'    model: "{_yaml_escape(model)}"',
 ]
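A hypothetical sketch of the defensive quoting (the PR's actual _yaml_escape is not shown in this diff; the escaper below is an assumption for illustration only):

```python
def yaml_escape(value: str) -> str:
    # escape backslashes and double quotes so the value is safe inside "..."
    return value.replace("\\", "\\\\").replace('"', '\\"')

def model_name_line(display: str) -> str:
    # hypothetical helper mirroring the suggested quoted YAML line
    return f'  - model_name: "{yaml_escape(display)}"'

# A display name containing ": " breaks as a bare scalar, but is safe once quoted:
print(model_name_line("claude: v2 edition"))
```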

The Langfuse SDK may send the same event (e.g., trace-create) in
multiple flush cycles, causing duplicates when we aggregate from all
mock calls. After filtering by trace_id, deduplicate by keeping only
the first event of each type, then sort to ensure trace-create is at
index 0 and generation-create at index 1.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
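The dedup-and-order step described above reads roughly like this (the event shape is illustrative):

```python
events = [
    {"type": "generation-create", "trace_id": "t1"},
    {"type": "trace-create", "trace_id": "t1"},
    {"type": "trace-create", "trace_id": "t1"},  # duplicate from a second flush
]

# Keep only the first event of each type
first_by_type = {}
for event in events:
    first_by_type.setdefault(event["type"], event)

# Sort so trace-create lands at index 0 and generation-create at index 1
order = {"trace-create": 0, "generation-create": 1}
deduped = sorted(first_by_type.values(), key=lambda e: order[e["type"]])

print([e["type"] for e in deduped])  # ['trace-create', 'generation-create']
```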
@yuneng-jiang yuneng-jiang left a comment

lgtm

@ishaan-jaff ishaan-jaff merged commit 8e61b32 into main Mar 18, 2026
31 of 74 checks passed
Comment on lines +694 to +698
response = {
**response,
"jwks_uri": f"{request_base_url}/.well-known/jwks.json",
"id_token_signing_alg_values_supported": ["RS256"],
}

P2 OIDC discovery issuer not updated with signer.issuer

The augmented discovery document adds jwks_uri and id_token_signing_alg_values_supported, but inherits "issuer": request_base_url from the base response. When MCPJWTSigner is configured with a custom issuer (via the issuer: config param or the MCP_JWT_ISSUER env var), the discovery document's issuer field will differ from the iss claim in signed JWTs.

Per RFC 8414 / OIDC Core, relying parties (MCP servers) are expected to verify that a JWT's iss exactly matches the issuer field in the discovery document. A mismatch will cause all tokens to be rejected.

For example, if a user sets issuer: "https://my-litellm.example.com" in the guardrail config but the proxy is running at https://proxy.example.com, the discovery doc will say "issuer": "https://proxy.example.com" while every JWT will have "iss": "https://my-litellm.example.com".

The fix is to include signer.issuer in the augmented response:

response = {
    **response,
    "issuer": signer.issuer,
    "jwks_uri": f"{request_base_url}/.well-known/jwks.json",
    "id_token_signing_alg_values_supported": ["RS256"],
}
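The relying-party check that fails on a mismatch can be sketched as follows (URLs are illustrative):

```python
def issuer_matches(discovery_doc: dict, claims: dict) -> bool:
    # RFC 8414 / OIDC Core: the JWT's iss must exactly equal the advertised issuer
    return discovery_doc.get("issuer") == claims.get("iss")

discovery = {"issuer": "https://proxy.example.com"}      # inherited from request_base_url
jwt_claims = {"iss": "https://my-litellm.example.com"}   # emitted by the signer

print(issuer_matches(discovery, jwt_claims))  # False: every token is rejected

# With the fix, the discovery document carries signer.issuer:
discovery_fixed = {**discovery, "issuer": "https://my-litellm.example.com"}
print(issuer_matches(discovery_fixed, jwt_claims))  # True
```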
