fix: set oauth2_flow when building MCPServer in _execute_with_mcp_client #23468
Conversation
Greptile Summary

This PR fixes a bug in `_execute_with_mcp_client`: it built `MCPServer` without setting `oauth2_flow`. Key changes:
Confidence Score: 4/5
| Filename | Overview |
|---|---|
| litellm/proxy/_experimental/mcp_server/rest_endpoints.py | Core logic fix: _execute_with_mcp_client now correctly propagates oauth2_flow to MCPServer, with explicit field preference and a safety guard for missing token_url. Minor concern: explicit client_credentials override is silently downgraded with no log when token_url is absent. |
| litellm/proxy/_types.py | Adds oauth2_flow: Optional[Literal["client_credentials", "authorization_code"]] = None to NewMCPServerRequest, correctly matching the type used in MCPServer. Addresses the previously missing field that prevented explicit M2M opt-in. |
| tests/test_litellm/proxy/_experimental/mcp_server/test_rest_endpoints.py | Removes @pytest.mark.skip from two previously broken tests. Both use monkeypatch to avoid real network calls and correctly assert the fixed behaviour: M2M credentials forwarded to MCPServer and incoming Authorization header dropped for M2M servers. |
Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[_execute_with_mcp_client called] --> B[Extract client_id, client_secret, scopes]
    B --> C{request.oauth2_flow set?}
    C -- Yes --> D[Use request.oauth2_flow]
    C -- No --> E{client_id AND client_secret AND token_url?}
    E -- Yes --> F[_oauth2_flow = 'client_credentials']
    E -- No --> G[_oauth2_flow = None]
    D --> H{Guard: oauth2_flow == 'client_credentials' AND token_url missing?}
    F --> H
    G --> I[Build MCPServer with _oauth2_flow]
    H -- Yes --> J[Reset _oauth2_flow = None - silent downgrade]
    H -- No --> I
    J --> I
    I --> K{server_model.has_client_credentials?}
    K -- True --> L[Drop incoming oauth2_headers - M2M flow]
    K -- False --> M[Forward incoming oauth2_headers]
    L --> N[Create MCP client and run operation]
    M --> N
```
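The decision flow above can be sketched as a standalone helper. This is a hypothetical sketch: `resolve_oauth2_flow` is not a function in the codebase, but the branching mirrors the chart, including the silent-downgrade guard.

```python
from typing import Optional


def resolve_oauth2_flow(
    requested_flow: Optional[str],
    client_id: Optional[str],
    client_secret: Optional[str],
    token_url: Optional[str],
) -> Optional[str]:
    """Mirror the flowchart: prefer the explicit request field, else infer
    client_credentials only when all M2M fields are present."""
    if requested_flow is not None:
        flow = requested_flow
    elif client_id and client_secret and token_url:
        flow = "client_credentials"
    else:
        flow = None
    # Safety guard: client_credentials is unusable without a token_url, so the
    # flow is silently downgraded (the review suggests logging a warning here).
    if flow == "client_credentials" and not token_url:
        flow = None
    return flow
```

Note the asymmetry the review flags: an explicit `client_credentials` request with no `token_url` is dropped without any feedback to the caller.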
Comments Outside Diff (1)
litellm/proxy/_experimental/mcp_server/rest_endpoints.py, lines 915-916 (link): Silent override of explicit `oauth2_flow` without a warning

When a caller explicitly sets `oauth2_flow="client_credentials"` (via the newly added `NewMCPServerRequest.oauth2_flow` field) but omits `token_url`, the guard silently resets `_oauth2_flow` to `None`. The incoming `Authorization` header is then forwarded as-is, which is correct behaviour — but the caller gets no feedback that their explicitly requested M2M flow was ignored.

Because the whole function body is wrapped in `except BaseException`, even a `ValueError` would be swallowed and surface only as the generic "Failed to connect to MCP server" message. A `verbose_logger.warning` call here would at least produce a diagnosable log entry:

```python
if _oauth2_flow == "client_credentials" and not request.token_url:
    verbose_logger.warning(
        "_execute_with_mcp_client: oauth2_flow='client_credentials' requested but "
        "token_url is not set — falling back to forwarding the incoming auth header."
    )
    _oauth2_flow = None
```
Last reviewed commit: d58b0a9
Ensure final finish_reason chunks retain non-OpenAI attributes from original provider chunks, including the holding_chunk flush path where delta is non-empty. Add regression tests for both final-chunk branches. Made-with: Cursor
Force-pushed from 8adc5f6 to cd7b258
21 jobs were using xlarge (8 vCPU, 16GB) despite running trivial or low-parallelism workloads. Downgrades 6 trivial container-check/UI jobs to medium (2 vCPU) and 15 lightweight test jobs to large (4 vCPU). Only the 5 high-parallelism mapped test jobs (-n 8/16) remain on xlarge. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Change no_output_timeout from 120m/60m to 15m across all test jobs
- Fix multi-line YAML formatting for no_output_timeout entries
- Reduce Playwright per-test timeout from 4min to 3min
- Add 15s actionTimeout and 30s navigationTimeout to Playwright config

Hanging tests now fail in 15min instead of 2hrs, cutting wasted CI time.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Switch from machine VM to docker executor (cimg/node:20.19)
- Add npm cache with restore_cache/save_cache
- Use npm ci instead of npm install (stop deleting node_modules/package-lock.json)
- Enable parallel test execution with --pool forks --maxForks=3
- Bump resource_class to medium+ for 3 vCPUs
- Remove unnecessary requires: ui_build dependency

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ui_unit_tests was gated behind ui_build but not required for publishing. This ensures UI unit tests must pass before PyPI publish. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Switch ui_build from machine VM to docker executor (cimg/node:20.19)
- Add npm cache (restore_cache/save_cache) keyed on package-lock.json
- Use npm ci instead of rm -rf node_modules && npm install
- Gate ui_unit_tests behind ui_build (don't run tests if build fails)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…atabase test_bad_database_url only validates DB error handling — it doesn't need the UI. Dockerfile.non_root does a full npm install + next build which is unnecessary overhead for this test. Dockerfile.database skips the UI build (build_admin_ui.sh is a no-op for OSS) making the Docker build much faster. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add restore_cache/save_cache for .next/cache to eliminate the "No build cache found" warning and speed up subsequent Next.js builds. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
build_and_test, helm_chart_testing, and test_bad_database_url each rebuilt Dockerfile.database from scratch (~5-10 min each) despite build_docker_database_image already building and persisting it to workspace. Now all three load the pre-built image via workspace, eliminating 3 redundant Docker builds. Also removes orphaned test_nonroot_image job definition (not referenced in workflow). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…olation

- Router testing: add CircleCI parallelism=4 with timing-based test splitting
- Guardrails testing: add pytest-xdist -n 4, suppress DEBUG logs with LITELLM_LOG=WARNING
- Rewrite conftest.py in both test dirs for xdist compatibility (save/restore pattern)
- Fix module-level Router instances in test_router_fallback_handlers, test_router_custom_routing, test_acooldowns_router

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…is state

- test_async_fallbacks_streaming: replace deprecated gpt-3.5-turbo fallback with gpt-4o-mini, fix use of module-level kwargs variable
- test_ausage_based_routing_fallbacks: remove Redis dependency to prevent shared state across parallel CI containers (test already uses mock_response)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…al API The test was failing because it depended on real API calls to deprecated models. Now uses mock_response to validate streaming through the router without external dependencies. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…allback leak

- test_hanging_request_azure: mock httpx.AsyncClient.send to simulate slow response instead of racing real network latency against a 10ms timeout. The old non-existent deployment (gpt-4o-new-test) returned 404 faster than the timeout, causing NotFoundError instead of APITimeoutError.
- test_completion_together_ai_llama: update model from deprecated Meta-Llama-3.1-8B-Instruct-Turbo to Llama-3.2-3B-Instruct-Turbo (Together AI removed the old model from serverless).
- conftest.py: clear litellm.callbacks list before each test to prevent proxy hooks (SkillsInjectionHook, VirtualKeyModelMaxBudgetLimiter) from leaking across tests via Router initialization.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
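The timeout pattern from the first bullet reduces to a minimal, litellm-free sketch: `slow_send` stands in for the patched `httpx.AsyncClient.send`, and the point is that a deterministically slow call guarantees the timeout path fires instead of racing real network latency.

```python
import asyncio


async def slow_send() -> str:
    # Stands in for a patched httpx.AsyncClient.send that sleeps
    # well past the deadline, so the timeout always wins the race.
    await asyncio.sleep(0.5)
    return "response"


async def call_with_timeout() -> str:
    # Stands in for the router call under a 10ms timeout.
    try:
        return await asyncio.wait_for(slow_send(), timeout=0.01)
    except asyncio.TimeoutError:
        return "timeout"
```

With the old non-existent deployment, a fast 404 could arrive before the deadline, so the test sometimes saw NotFoundError; the mock removes that nondeterminism.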
The test relied on a global side effect (customLogger initialization in litellm_logging.py) from prior tests' success callbacks to dispatch failure callbacks. When tests run in parallel by file, no prior test initializes customLogger, so the Router's deployment_callback_on_failure was never invoked and cooldowns were never set. Rewrite to directly call deployment_callback_on_failure with a proper RateLimitError containing retry-after headers, testing the cooldown logic without depending on the logging callback chain. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
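A minimal sketch of the rewrite's idea, using hypothetical stand-ins rather than litellm's real `Router` and `RateLimitError`: call the failure callback directly with an error carrying a retry-after header, and assert the cooldown is recorded without any logging-callback chain involved.

```python
class RateLimitError(Exception):
    # Stand-in for litellm's RateLimitError; carries retry-after headers.
    def __init__(self, retry_after: int):
        super().__init__("rate limited")
        self.headers = {"retry-after": str(retry_after)}


class MiniRouter:
    # Stand-in for the Router's cooldown bookkeeping.
    def __init__(self):
        self.cooldowns = {}  # deployment_id -> cooldown duration in seconds

    def deployment_callback_on_failure(self, deployment_id: str, exc: Exception) -> None:
        retry_after = getattr(exc, "headers", {}).get("retry-after")
        if retry_after is not None:
            # Respect the provider's retry-after header for the cooldown window.
            self.cooldowns[deployment_id] = int(retry_after)


router = MiniRouter()
router.deployment_callback_on_failure("azure/gpt-4o-dep-1", RateLimitError(retry_after=7))
```

Calling the callback directly removes the hidden dependency on a previously initialized customLogger, which is exactly what broke under parallel-by-file execution.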
Previously, stream_chunk_builder only took annotations from the first chunk that contained them, losing any annotations from later chunks. This is a problem because providers like Gemini/Vertex AI send grounding metadata (converted to annotations) in the final streaming chunk, while other providers may spread annotations across multiple chunks.

Changes:
- Collect and merge annotations from ALL annotation-bearing chunks instead of only using the first one
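The merge behavior can be sketched with a simplified dict-based chunk shape (the real chunks are richer objects; `collect_annotations` is an illustrative helper, not the actual stream_chunk_builder code):

```python
def collect_annotations(chunks: list[dict]) -> list[dict]:
    """Merge annotations from every annotation-bearing chunk, in order,
    instead of stopping at the first chunk that has any."""
    merged: list[dict] = []
    for chunk in chunks:
        annotations = chunk.get("delta", {}).get("annotations") or []
        merged.extend(annotations)
    return merged


chunks = [
    {"delta": {"content": "Hello"}},
    {"delta": {"content": " world", "annotations": [{"url": "a"}]}},
    # e.g. a Gemini/Vertex final chunk carrying grounding metadata
    {"delta": {"content": "", "annotations": [{"url": "b"}]}},
]
```

Under the old first-chunk-only behavior, the `{"url": "b"}` annotation in the final chunk would have been dropped.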
)

* added the header mapping feature
* added tests
* final cleanup
* final cleanup
* added missing test and logic
* fixed header sending bug
* Update litellm/proxy/auth/auth_utils.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* added back init file in responses + fixed test_auth_utils.py int local_testing

---------

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
test_post_call_rule_streaming in test_rules.py sets litellm.post_call_rules but never cleans up. Since pytest_collection_modifyitems sorts tests by name across modules, the leaked rule causes failures in test_streaming.py, test_register_model.py, and test_sagemaker.py. Add pre_call_rules and post_call_rules to the isolate_litellm_state fixture's save/restore and clear lists. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
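The save/restore pattern added to the fixture can be sketched as a plain context manager. The namespace below is a stand-in for the real `litellm` module globals; the actual fixture wraps each test the same way.

```python
from contextlib import contextmanager
from types import SimpleNamespace

# Stand-in for litellm's module-level state; the real fixture targets
# litellm.pre_call_rules and litellm.post_call_rules (among others).
fake_litellm = SimpleNamespace(pre_call_rules=[], post_call_rules=[])


@contextmanager
def isolate_rules(mod):
    """Save module-level rule lists, clear them for the test body,
    and restore them afterwards so no test can leak rules."""
    saved_pre = list(mod.pre_call_rules)
    saved_post = list(mod.post_call_rules)
    mod.pre_call_rules.clear()
    mod.post_call_rules.clear()
    try:
        yield
    finally:
        mod.pre_call_rules[:] = saved_pre
        mod.post_call_rules[:] = saved_post
```

Because restoration happens in `finally`, a rule appended inside the test body (like the one `test_post_call_rule_streaming` leaked) is discarded even if the test fails.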
…atch Fix: Vertex ai Batch Output File Download Fails with 500
docs(blog): add WebRTC blog post link
…header_order Refactor: Filtering beta header after transformation
…al-streaming-attributes fix(streaming): preserve custom attributes on final stream chunk
… issues

- Remove duplicate DecodedCharacterId TypedDict from litellm/types/videos/main.py
- Remove dead LITELLM_MANAGED_VIDEO_CHARACTER_COMPLETE_STR constant from litellm/types/utils.py
- Add FastAPI Form validation for name field in video_create_character endpoint

Made-with: Cursor
…n video handlers

Add response.raise_for_status() before transform_*_response() calls in all eight video character/edit/extension handler methods (sync and async):
- video_create_character_handler / async_video_create_character_handler
- video_get_character_handler / async_video_get_character_handler
- video_edit_handler / async_video_edit_handler
- video_extension_handler / async_video_extension_handler

Without these checks, httpx does not raise on 4xx/5xx responses, so provider errors (e.g., 401 Unauthorized) pass directly to Pydantic model constructors, causing ValidationError instead of meaningful HTTP errors. The raise_for_status() ensures the exception handler receives proper HTTPStatusError for translation into actionable messages.

Made-with: Cursor
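The effect of the added call can be illustrated with a minimal stand-in for `httpx.Response` (the real handlers call `raise_for_status()` on actual httpx responses; the classes below exist only for this sketch):

```python
class HTTPStatusError(Exception):
    """Stand-in for httpx.HTTPStatusError."""


class Response:
    """Stand-in for httpx.Response: holds a status code and a JSON payload."""

    def __init__(self, status_code: int, payload: dict):
        self.status_code = status_code
        self._payload = payload

    def raise_for_status(self) -> None:
        # Like httpx, raise on 4xx/5xx instead of returning the error body.
        if self.status_code >= 400:
            raise HTTPStatusError(f"HTTP {self.status_code}")

    def json(self) -> dict:
        return self._payload


def transform_video_response(response: Response) -> dict:
    # The fix: surface provider errors BEFORE parsing the body into a model.
    response.raise_for_status()
    return response.json()
```

Without the `raise_for_status()` line, a 401 error body would flow into the Pydantic model constructor and surface as an opaque ValidationError rather than an actionable HTTP error.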
…r in router-first routing

Add avideo_create_character and avideo_get_character to the list of video endpoints that use router-first routing when a model is provided (either from decoded IDs or target_model_names). Previously only avideo_edit and avideo_extension were in the router-first block. This ensures both character endpoints benefit from multi-deployment load balancing and model resolution, making them consistent with the other video operations.

This allows:
- avideo_create_character: Router picks among multiple deployments when target_model_names is set
- avideo_get_character: Router assists with multi-model environments for consistency

Made-with: Cursor
- Clear examples for SDK and proxy usage
- Feature highlights: router support, encoding, error handling
- Best practices for character uploads and prompting
- Available from LiteLLM v1.83.0+
- Troubleshooting guide for common issues

Made-with: Cursor
- Add curl examples for avideo_edit and avideo_extension APIs
- Explain how LiteLLM encodes/decodes managed character IDs
- Show metadata included in character IDs (provider, model_id)
- Detail transparent router-first routing benefits

Made-with: Cursor
…tion test

Add avideo_create_character, avideo_get_character, avideo_edit, and avideo_extension to the skip condition since Azure video calls don't use initialize_azure_sdk_client. Tests now properly skip with expected behavior instead of failing:
- test_ensure_initialize_azure_sdk_client_always_used[avideo_create_character] ✓
- test_ensure_initialize_azure_sdk_client_always_used[avideo_get_character] ✓
- test_ensure_initialize_azure_sdk_client_always_used[avideo_edit] ✓
- test_ensure_initialize_azure_sdk_client_always_used[avideo_extension] ✓

Made-with: Cursor
…sion methods

Convert all 8 new video methods from @abstractmethod to concrete implementations that raise NotImplementedError. This prevents breaking external third-party BaseVideoConfig subclasses at import time. Methods affected:
- transform_video_create_character_request/response
- transform_video_get_character_request/response
- transform_video_edit_request/response
- transform_video_extension_request/response

External integrators can now upgrade without instantiation errors; NotImplementedError is only raised when operations are actually called on unsupported providers. This restores backward compatibility with the project's policy.

Made-with: Cursor
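The backward-compatibility pattern can be sketched as follows (class and method names are illustrative, modeled on the description above, not copied from the codebase):

```python
from abc import ABC


class BaseVideoConfig(ABC):
    # Concrete method raising NotImplementedError, NOT @abstractmethod:
    # existing subclasses that don't override it still instantiate fine,
    # and the error only fires if the operation is actually invoked.
    def transform_video_edit_request(self, request: dict) -> dict:
        raise NotImplementedError("video edit not supported by this provider")


class LegacyProviderConfig(BaseVideoConfig):
    # Predates the new video methods; with @abstractmethod this class
    # would fail at instantiation time after upgrading.
    pass


cfg = LegacyProviderConfig()  # instantiation succeeds
```

The trade-off is deferred failure: an unsupported operation errors at call time rather than at class definition time, which is the behavior the project's compatibility policy prefers.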
…r-endpoint-fixes [Feat] Add create character endpoints and other new videos Endpoints
…14_2026 Litellm oss staging 03 14 2026
@greptileai please review this pr

@greptileai review this draft
Merged e4c8f95 into BerriAI:litellm_internal_dev_03_16_2026
Summary
- `MCPServer.has_client_credentials` requires `oauth2_flow == "client_credentials"` to be set explicitly
- `_execute_with_mcp_client` was building `MCPServer` without setting `oauth2_flow`, so `has_client_credentials` always returned `False`
- As a result, the incoming `Authorization` header was forwarded instead of being dropped
- Fix: set `oauth2_flow="client_credentials"` when `client_id` and `client_secret` are present

Test plan

- `test_m2m_credentials_forwarded_to_server_model` — `server.has_client_credentials` now returns `True`
- `test_m2m_drops_incoming_oauth2_headers` — incoming `Authorization` header is now dropped for M2M servers
- `make test-unit` passes
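A minimal sketch of why the explicit field matters. `MCPServer` here is a dataclass stand-in, not the real Pydantic model; `has_client_credentials` mirrors the requirement stated in the summary.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MCPServer:
    # Stand-in for the real model: M2M is only active when the flow is
    # explicitly client_credentials AND both credentials are present.
    client_id: Optional[str] = None
    client_secret: Optional[str] = None
    oauth2_flow: Optional[str] = None

    @property
    def has_client_credentials(self) -> bool:
        return (
            self.oauth2_flow == "client_credentials"
            and self.client_id is not None
            and self.client_secret is not None
        )


# Before the fix: credentials present but flow never set, so M2M was ignored
# and the incoming Authorization header was forwarded.
before = MCPServer(client_id="id", client_secret="secret")
# After the fix: flow set explicitly, so the header is dropped for M2M.
after = MCPServer(client_id="id", client_secret="secret",
                  oauth2_flow="client_credentials")
```

This is the behavior the two un-skipped tests pin down: `after.has_client_credentials` must be `True`, and the M2M path must drop the caller's `Authorization` header.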