
fix: set oauth2_flow when building MCPServer in _execute_with_mcp_client#23468

Merged
yuneng-jiang merged 108 commits into BerriAI:litellm_internal_dev_03_16_2026 from joereyna:fix/mcp-rest-m2m-oauth2-flow
Mar 16, 2026

Conversation

@joereyna
Collaborator

Summary

  • MCPServer.has_client_credentials requires oauth2_flow == "client_credentials" to be set explicitly
  • _execute_with_mcp_client was building MCPServer without setting oauth2_flow, so has_client_credentials always returned False
  • This caused two bugs: M2M tokens were never auto-fetched, and the incoming Authorization header was forwarded instead of being dropped
  • Fix: set oauth2_flow="client_credentials" when client_id and client_secret are present
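The fix described above can be sketched as follows. This is an illustrative, self-contained model of the behaviour — not the actual litellm source; the field names mirror the PR description (`oauth2_flow`, `has_client_credentials`, `token_url`), but the helper `build_server` is hypothetical.

```python
# Hypothetical sketch of the fix; mirrors the PR description, not litellm source.
from dataclasses import dataclass
from typing import Optional


@dataclass
class MCPServer:
    client_id: Optional[str] = None
    client_secret: Optional[str] = None
    token_url: Optional[str] = None
    oauth2_flow: Optional[str] = None

    @property
    def has_client_credentials(self) -> bool:
        # Requires the flow to be set explicitly -- the root cause of the bug:
        # building MCPServer without oauth2_flow made this always False.
        return (
            self.oauth2_flow == "client_credentials"
            and self.client_id is not None
            and self.client_secret is not None
        )


def build_server(client_id, client_secret, token_url, explicit_flow=None):
    # Prefer the caller-supplied flow, else auto-detect client_credentials
    # when all three M2M ingredients are present.
    flow = explicit_flow
    if flow is None and client_id and client_secret and token_url:
        flow = "client_credentials"
    # Safety guard: without a token_url there is nothing to fetch a token
    # from, so fall back to forwarding the incoming auth header.
    if flow == "client_credentials" and not token_url:
        flow = None
    return MCPServer(client_id, client_secret, token_url, flow)
```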

Test plan

  • test_m2m_credentials_forwarded_to_server_model — server.has_client_credentials now returns True
  • test_m2m_drops_incoming_oauth2_headers — incoming Authorization header is now dropped for M2M servers
  • make test-unit passes

@vercel

vercel bot commented Mar 12, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
litellm Ready Ready Preview, Comment Mar 16, 2026 10:38pm


@greptile-apps
Contributor

greptile-apps bot commented Mar 12, 2026

Greptile Summary

This PR fixes a bug in _execute_with_mcp_client where MCPServer was built without an oauth2_flow value, causing has_client_credentials to always return False. This blocked M2M token auto-fetching and incorrectly forwarded the incoming Authorization header to downstream MCP servers.

Key changes:

  • Adds oauth2_flow: Optional[Literal["client_credentials", "authorization_code"]] to NewMCPServerRequest in _types.py, correctly matching the MCPServer field type and allowing callers to opt in explicitly.
  • Updates _execute_with_mcp_client to prefer the caller-supplied request.oauth2_flow, falling back to auto-detection (client_credentials when client_id, client_secret, and token_url are all present).
  • Adds a safety guard: if oauth2_flow resolves to "client_credentials" but token_url is absent, it resets to None to avoid dropping the incoming auth header with nothing to replace it.
  • Un-skips two regression tests (test_m2m_credentials_forwarded_to_server_model, test_m2m_drops_incoming_oauth2_headers) that now correctly exercise the fixed code paths.
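The downstream effect of `has_client_credentials` on header handling can be sketched like this. It is an illustrative stand-in for the logic the summary describes (drop the caller's Authorization header only for M2M servers); the function name `resolve_headers` and the `m2m_token` parameter are hypothetical.

```python
# Illustrative sketch (not litellm source) of the M2M header decision:
# drop the incoming Authorization header only when the server is
# configured for the client_credentials flow.
from typing import Dict, Optional


def resolve_headers(
    incoming_headers: Dict[str, str],
    has_client_credentials: bool,
    m2m_token: Optional[str],
) -> Dict[str, str]:
    headers = dict(incoming_headers)
    if has_client_credentials:
        # M2M flow: the proxy fetches its own token, so the caller's
        # Authorization header must not be forwarded downstream.
        headers.pop("Authorization", None)
        if m2m_token:
            headers["Authorization"] = f"Bearer {m2m_token}"
    return headers
```

Before the fix, `has_client_credentials` was always False, so the `False` branch (forwarding the caller's header) was always taken.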

Confidence Score: 4/5

  • Safe to merge — the fix correctly restores M2M OAuth behaviour without breaking other flows, and is covered by the previously-skipped tests.
  • The logic is sound: explicit oauth2_flow takes priority, the heuristic still requires token_url, and the guard prevents auth header loss. The one minor gap is that the guard silently ignores an explicitly set client_credentials flow without any log entry, making it hard to debug if a caller omits token_url. No breaking changes to existing code paths.
  • No files require special attention beyond the minor logging gap in rest_endpoints.py at the token_url guard.

Important Files Changed

Filename Overview
litellm/proxy/_experimental/mcp_server/rest_endpoints.py Core logic fix: _execute_with_mcp_client now correctly propagates oauth2_flow to MCPServer, with explicit field preference and a safety guard for missing token_url. Minor concern: explicit client_credentials override is silently downgraded with no log when token_url is absent.
litellm/proxy/_types.py Adds oauth2_flow: Optional[Literal["client_credentials", "authorization_code"]] = None to NewMCPServerRequest, correctly matching the type used in MCPServer. Addresses the previously missing field that prevented explicit M2M opt-in.
tests/test_litellm/proxy/_experimental/mcp_server/test_rest_endpoints.py Removes @pytest.mark.skip from two previously broken tests. Both use monkeypatch to avoid real network calls and correctly assert the fixed behaviour: M2M credentials forwarded to MCPServer and incoming Authorization header dropped for M2M servers.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[_execute_with_mcp_client called] --> B[Extract client_id, client_secret, scopes]
    B --> C{request.oauth2_flow set?}
    C -- Yes --> D[Use request.oauth2_flow]
    C -- No --> E{client_id AND client_secret AND token_url?}
    E -- Yes --> F[_oauth2_flow = 'client_credentials']
    E -- No --> G[_oauth2_flow = None]
    D --> H{Guard: oauth2_flow == 'client_credentials' AND token_url missing?}
    F --> H
    G --> I[Build MCPServer with _oauth2_flow]
    H -- Yes --> J[Reset _oauth2_flow = None - silent downgrade]
    H -- No --> I
    J --> I
    I --> K{server_model.has_client_credentials?}
    K -- True --> L[Drop incoming oauth2_headers - M2M flow]
    K -- False --> M[Forward incoming oauth2_headers]
    L --> N[Create MCP client and run operation]
    M --> N

Comments Outside Diff (1)

  1. litellm/proxy/_experimental/mcp_server/rest_endpoints.py, line 915-916 (link)

    Silent override of explicit oauth2_flow without a warning

    When a caller explicitly sets oauth2_flow="client_credentials" (via the newly-added NewMCPServerRequest.oauth2_flow field) but omits token_url, the guard silently resets _oauth2_flow to None. The incoming Authorization header is then forwarded as-is, which is correct behaviour — but the caller gets no feedback that their explicitly requested M2M flow was ignored.

    Because the whole function body is wrapped in except BaseException, even a ValueError would be swallowed and surface only as the generic "Failed to connect to MCP server" message. A verbose_logger.warning call here would at least produce a diagnosable log entry:

    if _oauth2_flow == "client_credentials" and not request.token_url:
        verbose_logger.warning(
            "_execute_with_mcp_client: oauth2_flow='client_credentials' requested but "
            "token_url is not set — falling back to forwarding the incoming auth header."
        )
        _oauth2_flow = None

Last reviewed commit: d58b0a9

Ensure final finish_reason chunks retain non-OpenAI attributes from original provider chunks, including the holding_chunk flush path where delta is non-empty. Add regression tests for both final-chunk branches.

Made-with: Cursor
yuneng-jiang and others added 16 commits March 14, 2026 17:15
21 jobs were using xlarge (8 vCPU, 16GB) despite running trivial or
low-parallelism workloads. Downgrades 6 trivial container-check/UI jobs
to medium (2 vCPU) and 15 lightweight test jobs to large (4 vCPU).
Only the 5 high-parallelism mapped test jobs (-n 8/16) remain on xlarge.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Change no_output_timeout from 120m/60m to 15m across all test jobs
- Fix multi-line YAML formatting for no_output_timeout entries
- Reduce Playwright per-test timeout from 4min to 3min
- Add 15s actionTimeout and 30s navigationTimeout to Playwright config

Hanging tests now fail in 15min instead of 2hrs, cutting wasted CI time.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Switch from machine VM to docker executor (cimg/node:20.19)
- Add npm cache with restore_cache/save_cache
- Use npm ci instead of npm install (stop deleting node_modules/package-lock.json)
- Enable parallel test execution with --pool forks --maxForks=3
- Bump resource_class to medium+ for 3 vCPUs
- Remove unnecessary requires: ui_build dependency

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ui_unit_tests was gated behind ui_build but not required for publishing.
This ensures UI unit tests must pass before PyPI publish.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Switch ui_build from machine VM to docker executor (cimg/node:20.19)
- Add npm cache (restore_cache/save_cache) keyed on package-lock.json
- Use npm ci instead of rm -rf node_modules && npm install
- Gate ui_unit_tests behind ui_build (don't run tests if build fails)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…atabase

test_bad_database_url only validates DB error handling — it doesn't need
the UI. Dockerfile.non_root does a full npm install + next build which is
unnecessary overhead for this test. Dockerfile.database skips the UI build
(build_admin_ui.sh is a no-op for OSS) making the Docker build much faster.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add restore_cache/save_cache for .next/cache to eliminate the
"No build cache found" warning and speed up subsequent Next.js builds.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
build_and_test, helm_chart_testing, and test_bad_database_url each rebuilt
Dockerfile.database from scratch (~5-10 min each) despite build_docker_database_image
already building and persisting it to workspace. Now all three load the
pre-built image via workspace, eliminating 3 redundant Docker builds.

Also removes orphaned test_nonroot_image job definition (not referenced in workflow).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…olation

- Router testing: add CircleCI parallelism=4 with timing-based test splitting
- Guardrails testing: add pytest-xdist -n 4, suppress DEBUG logs with LITELLM_LOG=WARNING
- Rewrite conftest.py in both test dirs for xdist compatibility (save/restore pattern)
- Fix module-level Router instances in test_router_fallback_handlers, test_router_custom_routing, test_acooldowns_router

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…is state

- test_async_fallbacks_streaming: replace deprecated gpt-3.5-turbo fallback
  with gpt-4o-mini, fix use of module-level kwargs variable
- test_ausage_based_routing_fallbacks: remove Redis dependency to prevent
  shared state across parallel CI containers (test already uses mock_response)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…al API

The test was failing because it depended on real API calls to deprecated
models. Now uses mock_response to validate streaming through the router
without external dependencies.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…allback leak

- test_hanging_request_azure: mock httpx.AsyncClient.send to simulate slow
  response instead of racing real network latency against a 10ms timeout.
  The old non-existent deployment (gpt-4o-new-test) returned 404 faster
  than the timeout, causing NotFoundError instead of APITimeoutError.
- test_completion_together_ai_llama: update model from deprecated
  Meta-Llama-3.1-8B-Instruct-Turbo to Llama-3.2-3B-Instruct-Turbo
  (Together AI removed the old model from serverless).
- conftest.py: clear litellm.callbacks list before each test to prevent
  proxy hooks (SkillsInjectionHook, VirtualKeyModelMaxBudgetLimiter)
  from leaking across tests via Router initialization.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The test relied on a global side effect (customLogger initialization in
litellm_logging.py) from prior tests' success callbacks to dispatch
failure callbacks. When tests run in parallel by file, no prior test
initializes customLogger, so the Router's deployment_callback_on_failure
was never invoked and cooldowns were never set.

Rewrite to directly call deployment_callback_on_failure with a proper
RateLimitError containing retry-after headers, testing the cooldown
logic without depending on the logging callback chain.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Previously, stream_chunk_builder only took annotations from the first
chunk that contained them, losing any annotations from later chunks.

This is a problem because providers like Gemini/Vertex AI send grounding
metadata (converted to annotations) in the final streaming chunk, while
other providers may spread annotations across multiple chunks.

Changes:
- Collect and merge annotations from ALL annotation-bearing chunks
  instead of only using the first one
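The annotation-merging change described in the commit above can be sketched as a simple fold over the stream. This is a hedged illustration, not the stream_chunk_builder implementation; the chunk shape (a dict with an optional "annotations" list) is an assumption.

```python
# Hedged sketch of the described change: collect annotations from EVERY
# chunk that carries them instead of only the first, so grounding metadata
# arriving in the final chunk (e.g. from Gemini/Vertex AI) is not lost.
from typing import Dict, List


def merge_annotations(chunks: List[Dict]) -> List[Dict]:
    merged: List[Dict] = []
    for chunk in chunks:
        merged.extend(chunk.get("annotations") or [])
    return merged
```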

* added the header mapping feature

* added tests

* final cleanup

* final cleanup

* added missing test and logic

* fixed header sending bug

* Update litellm/proxy/auth/auth_utils.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* added back init file in responses + fixed test_auth_utils.py in local_testing

---------

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
test_post_call_rule_streaming in test_rules.py sets
litellm.post_call_rules but never cleans up. Since
pytest_collection_modifyitems sorts tests by name across modules,
the leaked rule causes failures in test_streaming.py,
test_register_model.py, and test_sagemaker.py.

Add pre_call_rules and post_call_rules to the isolate_litellm_state
fixture's save/restore and clear lists.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
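The save/restore pattern the commit above adds to the isolate_litellm_state fixture looks roughly like this. The sketch is generic: `state` stands in for the litellm module's global rule lists, and the `isolated` context manager is an illustrative name, not the actual fixture.

```python
# Illustrative save/restore pattern for leaky module-level state, as used
# by test-isolation fixtures: snapshot the lists, clear them for the test,
# and restore the originals afterwards even if the test fails.
import contextlib
from typing import Dict, List


@contextlib.contextmanager
def isolated(state: Dict[str, list], keys: List[str]):
    saved = {k: list(state[k]) for k in keys}  # snapshot originals
    for k in keys:
        state[k].clear()  # each test starts with empty rule lists
    try:
        yield state
    finally:
        for k in keys:
            state[k][:] = saved[k]  # restore in place, preserving identity
```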
Sameerlite and others added 19 commits March 16, 2026 17:56
…atch

Fix: Vertex ai Batch Output File Download Fails with 500
…header_order

Refactor: Filtering beta header after transformation
…al-streaming-attributes

fix(streaming): preserve custom attributes on final stream chunk
… issues

- Remove duplicate DecodedCharacterId TypedDict from litellm/types/videos/main.py
- Remove dead LITELLM_MANAGED_VIDEO_CHARACTER_COMPLETE_STR constant from litellm/types/utils.py
- Add FastAPI Form validation for name field in video_create_character endpoint

Made-with: Cursor
…n video handlers

Add response.raise_for_status() before transform_*_response() calls in all eight
video character/edit/extension handler methods (sync and async):

- video_create_character_handler / async_video_create_character_handler
- video_get_character_handler / async_video_get_character_handler
- video_edit_handler / async_video_edit_handler
- video_extension_handler / async_video_extension_handler

Without these checks, httpx does not raise on 4xx/5xx responses, so provider
errors (e.g., 401 Unauthorized) pass directly to Pydantic model constructors,
causing ValidationError instead of meaningful HTTP errors. The raise_for_status()
ensures the exception handler receives proper HTTPStatusError for translation into
actionable messages.

Made-with: Cursor
…r in router-first routing

Add avideo_create_character and avideo_get_character to the list of video endpoints
that use router-first routing when a model is provided (either from decoded IDs or
target_model_names).

Previously only avideo_edit and avideo_extension were in the router-first block.
This ensures both character endpoints benefit from multi-deployment load balancing
and model resolution, making them consistent with the other video operations.

This allows:
- avideo_create_character: Router picks among multiple deployments when target_model_names is set
- avideo_get_character: Router assists with multi-model environments for consistency

Made-with: Cursor
- Clear examples for SDK and proxy usage
- Feature highlights: router support, encoding, error handling
- Best practices for character uploads and prompting
- Available from LiteLLM v1.83.0+
- Troubleshooting guide for common issues

Made-with: Cursor
- Add curl examples for avideo_edit and avideo_extension APIs
- Explain how LiteLLM encodes/decodes managed character IDs
- Show metadata included in character IDs (provider, model_id)
- Detail transparent router-first routing benefits

Made-with: Cursor
…tion test

Add avideo_create_character, avideo_get_character, avideo_edit, and avideo_extension
to the skip condition since Azure video calls don't use initialize_azure_sdk_client.

Tests now properly skip with expected behavior instead of failing:
- test_ensure_initialize_azure_sdk_client_always_used[avideo_create_character] ✓
- test_ensure_initialize_azure_sdk_client_always_used[avideo_get_character] ✓
- test_ensure_initialize_azure_sdk_client_always_used[avideo_edit] ✓
- test_ensure_initialize_azure_sdk_client_always_used[avideo_extension] ✓

Made-with: Cursor
…sion methods

Convert all 8 new video methods from @AbstractMethod to concrete implementations
that raise NotImplementedError. This prevents breaking external third-party
BaseVideoConfig subclasses at import time.

Methods affected:
- transform_video_create_character_request/response
- transform_video_get_character_request/response
- transform_video_edit_request/response
- transform_video_extension_request/response

External integrators can now upgrade without instantiation errors; NotImplementedError
is only raised when operations are actually called on unsupported providers.

This restores backward compatibility with the project's policy.

Made-with: Cursor
…r-endpoint-fixes

[Feat] Add create character endpoints and other new videos Endpoints
@codspeed-hq
Contributor

codspeed-hq bot commented Mar 16, 2026

Merging this PR will not alter performance

✅ 16 untouched benchmarks


Comparing joereyna:fix/mcp-rest-m2m-oauth2-flow (d58b0a9) with main (278c9ba)

Open in CodSpeed

@joereyna
Collaborator Author

joereyna commented Mar 16, 2026

@greptileai please review this pr

@joereyna
Collaborator Author

@greptileai review this draft

@yuneng-jiang yuneng-jiang changed the base branch from main to litellm_internal_dev_03_16_2026 March 16, 2026 22:55
@yuneng-jiang yuneng-jiang merged commit e4c8f95 into BerriAI:litellm_internal_dev_03_16_2026 Mar 16, 2026
37 of 39 checks passed
