
fix: set oauth2_flow when building MCPServer in _execute_with_mcp_client#23468

Merged
yuneng-jiang merged 108 commits into BerriAI:litellm_internal_dev_03_16_2026 from joereyna:fix/mcp-rest-m2m-oauth2-flow
Mar 16, 2026

Conversation

@joereyna
Collaborator

Summary

  • MCPServer.has_client_credentials requires oauth2_flow == "client_credentials" to be set explicitly
  • _execute_with_mcp_client was building MCPServer without setting oauth2_flow, so has_client_credentials always returned False
  • This caused two bugs: M2M tokens were never auto-fetched, and the incoming Authorization header was forwarded instead of being dropped
  • Fix: set oauth2_flow="client_credentials" when client_id and client_secret are present
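The fix described above can be sketched as follows. This is an illustrative, self-contained model of the behaviour — not the actual litellm source; the field names mirror the PR description (`oauth2_flow`, `has_client_credentials`, `token_url`), but the helper `build_server` is hypothetical.

```python
# Hypothetical sketch of the fix; mirrors the PR description, not litellm source.
from dataclasses import dataclass
from typing import Optional


@dataclass
class MCPServer:
    client_id: Optional[str] = None
    client_secret: Optional[str] = None
    token_url: Optional[str] = None
    oauth2_flow: Optional[str] = None

    @property
    def has_client_credentials(self) -> bool:
        # Requires the flow to be set explicitly -- the root cause of the bug:
        # building MCPServer without oauth2_flow made this always False.
        return (
            self.oauth2_flow == "client_credentials"
            and self.client_id is not None
            and self.client_secret is not None
        )


def build_server(client_id, client_secret, token_url, explicit_flow=None):
    # Prefer the caller-supplied flow, else auto-detect client_credentials
    # when all three M2M ingredients are present.
    flow = explicit_flow
    if flow is None and client_id and client_secret and token_url:
        flow = "client_credentials"
    # Safety guard: without a token_url there is nothing to fetch a token
    # from, so fall back to forwarding the incoming auth header.
    if flow == "client_credentials" and not token_url:
        flow = None
    return MCPServer(client_id, client_secret, token_url, flow)
```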

Test plan

  • test_m2m_credentials_forwarded_to_server_model — server.has_client_credentials now returns True
  • test_m2m_drops_incoming_oauth2_headers — incoming Authorization header is now dropped for M2M servers
  • make test-unit passes

@vercel

vercel bot commented Mar 12, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
litellm Ready Ready Preview, Comment Mar 16, 2026 10:38pm


@greptile-apps
Contributor

greptile-apps bot commented Mar 12, 2026

Greptile Summary

This PR fixes a bug in _execute_with_mcp_client where MCPServer was built without an oauth2_flow value, causing has_client_credentials to always return False. This blocked M2M token auto-fetching and incorrectly forwarded the incoming Authorization header to downstream MCP servers.

Key changes:

  • Adds oauth2_flow: Optional[Literal["client_credentials", "authorization_code"]] to NewMCPServerRequest in _types.py, correctly matching the MCPServer field type and allowing callers to opt in explicitly.
  • Updates _execute_with_mcp_client to prefer the caller-supplied request.oauth2_flow, falling back to auto-detection (client_credentials when client_id, client_secret, and token_url are all present).
  • Adds a safety guard: if oauth2_flow resolves to "client_credentials" but token_url is absent, it resets to None to avoid dropping the incoming auth header with nothing to replace it.
  • Un-skips two regression tests (test_m2m_credentials_forwarded_to_server_model, test_m2m_drops_incoming_oauth2_headers) that now correctly exercise the fixed code paths.
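The downstream effect of `has_client_credentials` on header handling can be sketched like this. It is an illustrative stand-in for the logic the summary describes (drop the caller's Authorization header only for M2M servers); the function name `resolve_headers` and the `m2m_token` parameter are hypothetical.

```python
# Illustrative sketch (not litellm source) of the M2M header decision:
# drop the incoming Authorization header only when the server is
# configured for the client_credentials flow.
from typing import Dict, Optional


def resolve_headers(
    incoming_headers: Dict[str, str],
    has_client_credentials: bool,
    m2m_token: Optional[str],
) -> Dict[str, str]:
    headers = dict(incoming_headers)
    if has_client_credentials:
        # M2M flow: the proxy fetches its own token, so the caller's
        # Authorization header must not be forwarded downstream.
        headers.pop("Authorization", None)
        if m2m_token:
            headers["Authorization"] = f"Bearer {m2m_token}"
    return headers
```

Before the fix, `has_client_credentials` was always False, so the `False` branch (forwarding the caller's header) was always taken.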

Confidence Score: 4/5

  • Safe to merge — the fix correctly restores M2M OAuth behaviour without breaking other flows, and is covered by the previously-skipped tests.
  • The logic is sound: explicit oauth2_flow takes priority, the heuristic still requires token_url, and the guard prevents auth header loss. The one minor gap is that the guard silently ignores an explicitly set client_credentials flow without any log entry, making it hard to debug if a caller omits token_url. No breaking changes to existing code paths.
  • No files require special attention beyond the minor logging gap in rest_endpoints.py at the token_url guard.

Important Files Changed

Filename Overview
litellm/proxy/_experimental/mcp_server/rest_endpoints.py Core logic fix: _execute_with_mcp_client now correctly propagates oauth2_flow to MCPServer, with explicit field preference and a safety guard for missing token_url. Minor concern: explicit client_credentials override is silently downgraded with no log when token_url is absent.
litellm/proxy/_types.py Adds oauth2_flow: Optional[Literal["client_credentials", "authorization_code"]] = None to NewMCPServerRequest, correctly matching the type used in MCPServer. Addresses the previously missing field that prevented explicit M2M opt-in.
tests/test_litellm/proxy/_experimental/mcp_server/test_rest_endpoints.py Removes @pytest.mark.skip from two previously broken tests. Both use monkeypatch to avoid real network calls and correctly assert the fixed behaviour: M2M credentials forwarded to MCPServer and incoming Authorization header dropped for M2M servers.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[_execute_with_mcp_client called] --> B[Extract client_id, client_secret, scopes]
    B --> C{request.oauth2_flow set?}
    C -- Yes --> D[Use request.oauth2_flow]
    C -- No --> E{client_id AND client_secret AND token_url?}
    E -- Yes --> F[_oauth2_flow = 'client_credentials']
    E -- No --> G[_oauth2_flow = None]
    D --> H{Guard: oauth2_flow == 'client_credentials' AND token_url missing?}
    F --> H
    G --> I[Build MCPServer with _oauth2_flow]
    H -- Yes --> J[Reset _oauth2_flow = None - silent downgrade]
    H -- No --> I
    J --> I
    I --> K{server_model.has_client_credentials?}
    K -- True --> L[Drop incoming oauth2_headers - M2M flow]
    K -- False --> M[Forward incoming oauth2_headers]
    L --> N[Create MCP client and run operation]
    M --> N

Comments Outside Diff (1)

  1. litellm/proxy/_experimental/mcp_server/rest_endpoints.py, line 915-916 (link)

    Silent override of explicit oauth2_flow without a warning

    When a caller explicitly sets oauth2_flow="client_credentials" (via the newly-added NewMCPServerRequest.oauth2_flow field) but omits token_url, the guard silently resets _oauth2_flow to None. The incoming Authorization header is then forwarded as-is, which is correct behaviour — but the caller gets no feedback that their explicitly requested M2M flow was ignored.

    Because the whole function body is wrapped in except BaseException, even a ValueError would be swallowed and surface only as the generic "Failed to connect to MCP server" message. A verbose_logger.warning call here would at least produce a diagnosable log entry:

    if _oauth2_flow == "client_credentials" and not request.token_url:
        verbose_logger.warning(
            "_execute_with_mcp_client: oauth2_flow='client_credentials' requested but "
            "token_url is not set — falling back to forwarding the incoming auth header."
        )
        _oauth2_flow = None

Last reviewed commit: d58b0a9

Ensure final finish_reason chunks retain non-OpenAI attributes from original provider chunks, including the holding_chunk flush path where delta is non-empty. Add regression tests for both final-chunk branches.

Made-with: Cursor
yuneng-jiang and others added 16 commits March 14, 2026 17:15
21 jobs were using xlarge (8 vCPU, 16GB) despite running trivial or
low-parallelism workloads. Downgrades 6 trivial container-check/UI jobs
to medium (2 vCPU) and 15 lightweight test jobs to large (4 vCPU).
Only the 5 high-parallelism mapped test jobs (-n 8/16) remain on xlarge.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Change no_output_timeout from 120m/60m to 15m across all test jobs
- Fix multi-line YAML formatting for no_output_timeout entries
- Reduce Playwright per-test timeout from 4min to 3min
- Add 15s actionTimeout and 30s navigationTimeout to Playwright config

Hanging tests now fail in 15min instead of 2hrs, cutting wasted CI time.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Switch from machine VM to docker executor (cimg/node:20.19)
- Add npm cache with restore_cache/save_cache
- Use npm ci instead of npm install (stop deleting node_modules/package-lock.json)
- Enable parallel test execution with --pool forks --maxForks=3
- Bump resource_class to medium+ for 3 vCPUs
- Remove unnecessary requires: ui_build dependency

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ui_unit_tests was gated behind ui_build but not required for publishing.
This ensures UI unit tests must pass before PyPI publish.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Switch ui_build from machine VM to docker executor (cimg/node:20.19)
- Add npm cache (restore_cache/save_cache) keyed on package-lock.json
- Use npm ci instead of rm -rf node_modules && npm install
- Gate ui_unit_tests behind ui_build (don't run tests if build fails)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…atabase

test_bad_database_url only validates DB error handling — it doesn't need
the UI. Dockerfile.non_root does a full npm install + next build which is
unnecessary overhead for this test. Dockerfile.database skips the UI build
(build_admin_ui.sh is a no-op for OSS) making the Docker build much faster.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add restore_cache/save_cache for .next/cache to eliminate the
"No build cache found" warning and speed up subsequent Next.js builds.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
build_and_test, helm_chart_testing, and test_bad_database_url each rebuilt
Dockerfile.database from scratch (~5-10 min each) despite build_docker_database_image
already building and persisting it to workspace. Now all three load the
pre-built image via workspace, eliminating 3 redundant Docker builds.

Also removes orphaned test_nonroot_image job definition (not referenced in workflow).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…olation

- Router testing: add CircleCI parallelism=4 with timing-based test splitting
- Guardrails testing: add pytest-xdist -n 4, suppress DEBUG logs with LITELLM_LOG=WARNING
- Rewrite conftest.py in both test dirs for xdist compatibility (save/restore pattern)
- Fix module-level Router instances in test_router_fallback_handlers, test_router_custom_routing, test_acooldowns_router

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…is state

- test_async_fallbacks_streaming: replace deprecated gpt-3.5-turbo fallback
  with gpt-4o-mini, fix use of module-level kwargs variable
- test_ausage_based_routing_fallbacks: remove Redis dependency to prevent
  shared state across parallel CI containers (test already uses mock_response)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…al API

The test was failing because it depended on real API calls to deprecated
models. Now uses mock_response to validate streaming through the router
without external dependencies.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…allback leak

- test_hanging_request_azure: mock httpx.AsyncClient.send to simulate slow
  response instead of racing real network latency against a 10ms timeout.
  The old non-existent deployment (gpt-4o-new-test) returned 404 faster
  than the timeout, causing NotFoundError instead of APITimeoutError.
- test_completion_together_ai_llama: update model from deprecated
  Meta-Llama-3.1-8B-Instruct-Turbo to Llama-3.2-3B-Instruct-Turbo
  (Together AI removed the old model from serverless).
- conftest.py: clear litellm.callbacks list before each test to prevent
  proxy hooks (SkillsInjectionHook, VirtualKeyModelMaxBudgetLimiter)
  from leaking across tests via Router initialization.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The test relied on a global side effect (customLogger initialization in
litellm_logging.py) from prior tests' success callbacks to dispatch
failure callbacks. When tests run in parallel by file, no prior test
initializes customLogger, so the Router's deployment_callback_on_failure
was never invoked and cooldowns were never set.

Rewrite to directly call deployment_callback_on_failure with a proper
RateLimitError containing retry-after headers, testing the cooldown
logic without depending on the logging callback chain.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Previously, stream_chunk_builder only took annotations from the first
chunk that contained them, losing any annotations from later chunks.

This is a problem because providers like Gemini/Vertex AI send grounding
metadata (converted to annotations) in the final streaming chunk, while
other providers may spread annotations across multiple chunks.

Changes:
- Collect and merge annotations from ALL annotation-bearing chunks
  instead of only using the first one
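The annotation-merging change described in the commit above can be sketched as a simple fold over the stream. This is a hedged illustration, not the stream_chunk_builder implementation; the chunk shape (a dict with an optional "annotations" list) is an assumption.

```python
# Hedged sketch of the described change: collect annotations from EVERY
# chunk that carries them instead of only the first, so grounding metadata
# arriving in the final chunk (e.g. from Gemini/Vertex AI) is not lost.
from typing import Dict, List


def merge_annotations(chunks: List[Dict]) -> List[Dict]:
    merged: List[Dict] = []
    for chunk in chunks:
        merged.extend(chunk.get("annotations") or [])
    return merged
```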

* added the header mapping feature

* added tests

* final cleanup

* final cleanup

* added missing test and logic

* fixed header sending bug

* Update litellm/proxy/auth/auth_utils.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* added back init file in responses + fixed test_auth_utils.py in local_testing

---------

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
test_post_call_rule_streaming in test_rules.py sets
litellm.post_call_rules but never cleans up. Since
pytest_collection_modifyitems sorts tests by name across modules,
the leaked rule causes failures in test_streaming.py,
test_register_model.py, and test_sagemaker.py.

Add pre_call_rules and post_call_rules to the isolate_litellm_state
fixture's save/restore and clear lists.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
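The save/restore pattern the commit above adds to the isolate_litellm_state fixture looks roughly like this. The sketch is generic: `state` stands in for the litellm module's global rule lists, and the `isolated` context manager is an illustrative name, not the actual fixture.

```python
# Illustrative save/restore pattern for leaky module-level state, as used
# by test-isolation fixtures: snapshot the lists, clear them for the test,
# and restore the originals afterwards even if the test fails.
import contextlib
from typing import Dict, List


@contextlib.contextmanager
def isolated(state: Dict[str, list], keys: List[str]):
    saved = {k: list(state[k]) for k in keys}  # snapshot originals
    for k in keys:
        state[k].clear()  # each test starts with empty rule lists
    try:
        yield state
    finally:
        for k in keys:
            state[k][:] = saved[k]  # restore in place, preserving identity
```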
Sameerlite and others added 19 commits March 16, 2026 17:56
…atch

Fix: Vertex ai Batch Output File Download Fails with 500
…header_order

Refactor: Filtering beta header after transformation
…al-streaming-attributes

fix(streaming): preserve custom attributes on final stream chunk
… issues

- Remove duplicate DecodedCharacterId TypedDict from litellm/types/videos/main.py
- Remove dead LITELLM_MANAGED_VIDEO_CHARACTER_COMPLETE_STR constant from litellm/types/utils.py
- Add FastAPI Form validation for name field in video_create_character endpoint

Made-with: Cursor
…n video handlers

Add response.raise_for_status() before transform_*_response() calls in all eight
video character/edit/extension handler methods (sync and async):

- video_create_character_handler / async_video_create_character_handler
- video_get_character_handler / async_video_get_character_handler
- video_edit_handler / async_video_edit_handler
- video_extension_handler / async_video_extension_handler

Without these checks, httpx does not raise on 4xx/5xx responses, so provider
errors (e.g., 401 Unauthorized) pass directly to Pydantic model constructors,
causing ValidationError instead of meaningful HTTP errors. The raise_for_status()
ensures the exception handler receives proper HTTPStatusError for translation into
actionable messages.

Made-with: Cursor
…r in router-first routing

Add avideo_create_character and avideo_get_character to the list of video endpoints
that use router-first routing when a model is provided (either from decoded IDs or
target_model_names).

Previously only avideo_edit and avideo_extension were in the router-first block.
This ensures both character endpoints benefit from multi-deployment load balancing
and model resolution, making them consistent with the other video operations.

This allows:
- avideo_create_character: Router picks among multiple deployments when target_model_names is set
- avideo_get_character: Router assists with multi-model environments for consistency

Made-with: Cursor
- Clear examples for SDK and proxy usage
- Feature highlights: router support, encoding, error handling
- Best practices for character uploads and prompting
- Available from LiteLLM v1.83.0+
- Troubleshooting guide for common issues

Made-with: Cursor
- Add curl examples for avideo_edit and avideo_extension APIs
- Explain how LiteLLM encodes/decodes managed character IDs
- Show metadata included in character IDs (provider, model_id)
- Detail transparent router-first routing benefits

Made-with: Cursor
…tion test

Add avideo_create_character, avideo_get_character, avideo_edit, and avideo_extension
to the skip condition since Azure video calls don't use initialize_azure_sdk_client.

Tests now properly skip with expected behavior instead of failing:
- test_ensure_initialize_azure_sdk_client_always_used[avideo_create_character] ✓
- test_ensure_initialize_azure_sdk_client_always_used[avideo_get_character] ✓
- test_ensure_initialize_azure_sdk_client_always_used[avideo_edit] ✓
- test_ensure_initialize_azure_sdk_client_always_used[avideo_extension] ✓

Made-with: Cursor
…sion methods

Convert all 8 new video methods from @AbstractMethod to concrete implementations
that raise NotImplementedError. This prevents breaking external third-party
BaseVideoConfig subclasses at import time.

Methods affected:
- transform_video_create_character_request/response
- transform_video_get_character_request/response
- transform_video_edit_request/response
- transform_video_extension_request/response

External integrators can now upgrade without instantiation errors; NotImplementedError
is only raised when operations are actually called on unsupported providers.

This restores backward compatibility with the project's policy.

Made-with: Cursor
…r-endpoint-fixes

[Feat] Add create character endpoints and other new videos Endpoints
@codspeed-hq
Contributor

codspeed-hq bot commented Mar 16, 2026

Merging this PR will not alter performance

✅ 16 untouched benchmarks


Comparing joereyna:fix/mcp-rest-m2m-oauth2-flow (d58b0a9) with main (278c9ba)

Open in CodSpeed

@joereyna
Collaborator Author

joereyna commented Mar 16, 2026

@greptileai please review this pr

@joereyna
Collaborator Author

@greptileai review this draft

@yuneng-jiang yuneng-jiang changed the base branch from main to litellm_internal_dev_03_16_2026 March 16, 2026 22:55
@yuneng-jiang yuneng-jiang merged commit e4c8f95 into BerriAI:litellm_internal_dev_03_16_2026 Mar 16, 2026
37 of 39 checks passed
