[Feat] enable progress notifications for MCP tool calls #19809
Merged
ishaan-jaff merged 3 commits into BerriAI:main on Jan 26, 2026
Conversation
Member
fix merge conflicts pls @houdataali

Contributor, Author
Done
Force-pushed from f7cd506 to 6c211bc
krrishdholakia added a commit that referenced this pull request on Feb 5, 2026
[Squashed release-commit message elided: it lists dozens of unrelated changes from the Jan 26 to Feb 5 release cycle. The entry relevant to this PR reads: "[Feat] enable progress notifications for MCP tool calls (#19809) * enable progress notifications for MCP tool calls * adjust mcp test".]
pdecat reviewed on Feb 16, 2026
```diff
- json_response=True,  # Use JSON responses instead of SSE by default
- stateless=True,
+ json_response=False,  # enables SSE streaming
+ stateless=False,  # enables session state
```
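The behavioral difference behind this diff can be sketched as follows. This is a simplified model of how the mcp library's stateful transport enforces session IDs, not its actual code; the function name is illustrative:

```python
# Simplified model of MCP transport session enforcement.
# In stateful mode (stateless=False), every request after "initialize"
# must carry an mcp-session-id header; stateless mode skips the check.

def accept_request(method: str, headers: dict, stateless: bool) -> bool:
    """Return True if the transport would accept this request."""
    if stateless or method == "initialize":
        return True
    return "mcp-session-id" in headers

# A bare curl-style call (no session header) succeeds only in stateless mode:
assert accept_request("tools/call", {}, stateless=True)
assert not accept_request("tools/call", {}, stateless=False)
```

This is why the change broke MCP Inspector and plain curl clients: they never send the `mcp-session-id` header that stateful mode requires.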
Hi @houdataali, this change is causing issues, was it mandatory?
PTAL #20242 (comment)
🙏
michelligabriele added a commit to michelligabriele/litellm that referenced this pull request on Feb 16, 2026
PR BerriAI#19809 changed stateless=True to stateless=False to enable progress notifications for MCP tool calls. This caused the mcp library to enforce mcp-session-id headers on all non-initialize requests, breaking MCP Inspector, curl, and any client without automatic session management. Revert to stateless=True to restore compatibility with all MCP clients. The progress notification code already handles missing sessions gracefully (defensive checks + try/except), so no other changes are needed. Fixes BerriAI#20242
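The commit's claim that the progress notification code "handles missing sessions gracefully (defensive checks + try/except)" can be illustrated with a sketch. The helper and fake session below are hypothetical, not the actual LiteLLM code:

```python
# Hypothetical sketch of defensively sending a progress notification:
# if no session or progress token is available, skip silently rather
# than let a progress failure break the tool call.

def send_progress(session, progress_token, progress: float) -> bool:
    """Attempt to notify; return True only if a notification was sent."""
    if session is None or progress_token is None:
        return False  # defensive check: nothing to notify
    try:
        session.send_progress_notification(progress_token, progress)
        return True
    except Exception:
        return False  # never propagate progress errors to the caller

class FakeSession:
    """Stand-in for an MCP server session, used only for illustration."""
    def __init__(self):
        self.sent = []
    def send_progress_notification(self, token, progress):
        self.sent.append((token, progress))

s = FakeSession()
assert send_progress(None, "tok", 0.5) is False  # missing session
assert send_progress(s, None, 0.5) is False      # missing token
assert send_progress(s, "tok", 0.5) is True
assert s.sent == [("tok", 0.5)]
```

Because every failure path degrades to a no-op, reverting to `stateless=True` does not require any changes to the notification code itself.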
krrishdholakia pushed a commit that referenced this pull request on Feb 16, 2026
PR #19809 changed stateless=True to stateless=False to enable progress notifications for MCP tool calls. This caused the mcp library to enforce mcp-session-id headers on all non-initialize requests, breaking MCP Inspector, curl, and any client without automatic session management. Revert to stateless=True to restore compatibility with all MCP clients. The progress notification code already handles missing sessions gracefully (defensive checks + try/except), so no other changes are needed. Fixes #20242
krrishdholakia added a commit that referenced this pull request on Feb 16, 2026
* fix: SSO PKCE support fails in multi-pod Kubernetes deployments * fix: virutal key grace period from env/UI * fix: refactor, race condition handle, fstring sql injection * fix: add async call to avoid server pauses * Update tests/test_litellm/proxy/management_endpoints/test_ui_sso.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * fix: add await in tests * add modify test to perform async run * Update tests/test_litellm/proxy/management_endpoints/test_ui_sso.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * Update tests/test_litellm/proxy/management_endpoints/test_ui_sso.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * fix grace period with better error handling on frontend and as per best practices * Update tests/test_litellm/proxy/management_endpoints/test_ui_sso.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * fix: as per request changes * Update litellm/proxy/utils.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * Fix errors when callbacks are invoked for file delete operations: * Fix errors when callbacks are invoked for file operations * Fix: pass deployment credentials to afile_retrieve in managed_files post-call hook * Fix: bypass managed files access check in batch polling by calling afile_content directly * Update tests/test_litellm/proxy/management_endpoints/test_ui_sso.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * fix: afile_retrieve returns unified ID for batch output files * fix: batch retrieve returns unified input_file_id * fix(chatgpt): drop unsupported responses params for Codex Co-authored-by: Cursor <cursoragent@cursor.com> * 
test(chatgpt): ensure Codex request filters unsupported params Co-authored-by: Cursor <cursoragent@cursor.com> * Fix deleted managed files returning 403 instead of 404 * Add comments * Update litellm/proxy/utils.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * fix: thread deployment model_info through batch cost calculation batch_cost_calculator only checked the global cost map, ignoring deployment-level custom pricing (input_cost_per_token_batches etc.). Add optional model_info param through the batch cost chain and pass it from CheckBatchCost. * fix(deps): add pytest-postgresql for db schema migration tests The test_db_schema_migration.py test requires pytest-postgresql but it was missing from dependencies, causing import errors: ModuleNotFoundError: No module named 'pytest_postgresql' Added pytest-postgresql ^6.0.0 to dev dependencies to fix test collection errors in proxy_unit_tests. This is a pre-existing issue, not related to PR #21277. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * fix(test): replace caplog with custom handler for parallel execution The cost calculation log level tests were failing when run with pytest-xdist parallel execution because caplog doesn't work reliably across worker processes. This causes "ValueError: I/O operation on closed file" errors. Solution: Replace caplog fixture with a custom LogRecordHandler that directly attaches to the logger. This approach works correctly in parallel execution because each worker process has its own handler instance. Fixes test failures in PR #21277 when running with --dist=loadscope. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * fix(test): correct async mock for video generation logging test The test was failing with AuthenticationError because the mock wasn't intercepting the actual HTTP handler calls. This caused real API calls with no API key, resulting in 401 errors. 
Root cause: The test was patching the wrong target using string path 'litellm.videos.main.base_llm_http_handler' instead of using patch.object on the actual handler instance. Additionally, it was mocking the sync method instead of async_video_generation_handler. Solution: Use patch.object with side_effect pattern on the correct async handler method, following the same pattern used in test_video_generation_async(). Fixes test failure in PR #21277 when running with --dist=loadscope. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * fix(test): add cleanup fixture and no_parallel mark for MCP tests Two MCP server tests were failing when run with pytest-xdist parallel execution (--dist=loadscope): - test_mcp_routing_with_conflicting_alias_and_group_name - test_oauth2_headers_passed_to_mcp_client Both tests showed assertion failures where mocks weren't being called (0 times instead of expected 1 time). Root cause: These tests rely on global_mcp_server_manager singleton state and complex async mocking that doesn't work reliably with parallel execution. Each worker process can have different state and patches may not apply correctly. Solution: 1. Added autouse fixture to clean up global_mcp_server_manager registry before and after each test for better isolation 2. Added @pytest.mark.no_parallel to these specific tests to ensure they run sequentially, avoiding parallel execution issues This approach maintains test reliability while allowing other tests in the file to still benefit from parallelization. Fixes test failures exposed by PR #21277. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * Regenerate poetry.lock with Poetry 2.3.2 Updated lock file to use Poetry 2.3.2 (matching main branch standard). This addresses Greptile feedback about Poetry version mismatch. 
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * Remove unused pytest import and add trailing newline - Removed unused pytest import (caplog fixture was removed) - Added missing trailing newline at end of file Addresses Greptile feedback (minor style issues). Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * Remove redundant import inside test method The module litellm.videos.main is already imported at the top of the file (line 21), so the import inside the test method is redundant. Addresses Greptile feedback (minor style issue). Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * Fix converse anthropic usage object according to v1/messages specs * Add routing based on if reasoning is supported or not * add fireworks_ai/accounts/fireworks/models/kimi-k2p5 in model map * Removed stray .md file * fix(bedrock): clamp thinking.budget_tokens to minimum 1024 Bedrock rejects thinking.budget_tokens values below 1024 with a 400 error. This adds automatic clamping in the LiteLLM transformation layer so callers (e.g. router with reasoning_effort="low") don't need to know about the provider-specific minimum. Fixes #21297 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: improve Langfuse test isolation to prevent flaky failures (#21093) The test was creating fresh mocks but not fully isolating from setUp state, causing intermittent CI failures with 'Expected generation to be called once. Called 0 times.' Instead of creating fresh mocks, properly reset the existing setUp mocks to ensure clean state while maintaining proper mock chain configuration. * feat(s3): add support for virtual-hosted-style URLs (#21094) Add s3_use_virtual_hosted_style parameter to support AWS S3 virtual-hosted-style URL format (bucket.endpoint/key) alongside the existing path-style format (endpoint/bucket/key). This enables compatibility with S3-compatible services like MinIO and aligns with AWS S3 official terminology. 
* Addressed greptile comments to extract common helpers and return 404 * Allow effort="max" for Claude Opus 4.6 (#21112) * fix(aiohttp): prevent closing shared ClientSession in AiohttpTransport (#21117) When a shared ClientSession is passed to LiteLLMAiohttpTransport, calling aclose() on the transport would close the shared session, breaking other clients still using it. Add owns_session parameter (default True for backwards compatibility) to AiohttpTransport and LiteLLMAiohttpTransport. When a shared session is provided in http_handler.py, owns_session=False is set to prevent the transport from closing a session it does not own. This aligns AiohttpTransport with the ownership pattern already used in AiohttpHandler (aiohttp_handler.py). * perf(spend): avoid duplicate daily agent transaction computation (#21187) * fix: proxy/batches_endpoints/endpoints.py:309:11: PLR0915 Too many statements (54 > 50) * fix mypy * Add doc for OpenAI Agents SDK with LiteLLM * Add doc for OpenAI Agents SDK with LiteLLM * Update docs/my-website/sidebars.js Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * fix mypy * Update tests/test_litellm/proxy/_experimental/mcp_server/test_mcp_server.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * Add blog fffor Managing Anthropic Beta Headers * Add blog fffor Managing Anthropic Beta Headers * correct the time * Fix: Exclude tool params for models without function calling support (#21125) (#21244) * Fix tool params reported as supported for models without function calling (#21125) JSON-configured providers (e.g. PublicAI) inherited all OpenAI params including tools, tool_choice, function_call, and functions — even for models that don't support function calling. This caused an inconsistency where get_supported_openai_params included "tools" but supports_function_calling returned False. 
The fix checks supports_function_calling in the dynamic config's get_supported_openai_params and removes tool-related params when the model doesn't support it. Follows the same pattern used by OVHCloud and Fireworks AI providers. * Style: move verbose_logger to module-level import, remove redundant try/except Address review feedback from Greptile bot: - Move verbose_logger import to top-level (matches project convention) - Remove redundant try/except around supports_function_calling() since it already handles exceptions internally via _supports_factory() * fix(index.md): cleanup str * fix(proxy): handle missing DATABASE_URL in append_query_params (#21239) * fix: handle missing database url in append_query_params * Update litellm/proxy/proxy_cli.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> --------- Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * fix(mcp): revert StreamableHTTPSessionManager to stateless mode (#21323) PR #19809 changed stateless=True to stateless=False to enable progress notifications for MCP tool calls. This caused the mcp library to enforce mcp-session-id headers on all non-initialize requests, breaking MCP Inspector, curl, and any client without automatic session management. Revert to stateless=True to restore compatibility with all MCP clients. The progress notification code already handles missing sessions gracefully (defensive checks + try/except), so no other changes are needed. 
Fixes #20242 * UI - Content Filters, help edit/view categories and 1-click add categories + go to next page (#21223) * feat(ui/): allow viewing content filter categories on guardrail info * fix(add_guardrail_form.tsx): add validation check to prevent adding empty content filter guardrails * feat(ui/): improve ux around adding new content filter categories easy to skip adding a category, so make it a 1-click thing * Fix OCI Grok output pricing (#21329) * fix(proxy): fix master key rotation Prisma validation errors _rotate_master_key() used jsonify_object() which converts Python dicts to JSON strings. Prisma's Python client rejects strings for Json-typed fields — it requires prisma.Json() wrappers or native dicts. This affected three code paths: - Model table (create_many): litellm_params and model_info converted to strings, plus created_at/updated_at were None (non-nullable DateTime) - Config table (update): param_value converted to string - Credentials table (update): credential_values/credential_info converted to strings Fix: replace jsonify_object() with model_dump(exclude_none=True) + prisma.Json() wrappers for all Json fields. Wrap model delete+insert in a Prisma transaction for atomicity. Add try/except around MCP server rotation to prevent non-critical failures from blocking the entire rotation. 
--------- Co-authored-by: Harshit Jain <harshitjain0562@gmail.com> Co-authored-by: Harshit Jain <48647625+Harshit28j@users.noreply.github.com> Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Co-authored-by: Ephrim Stanley <ephrim.stanley@point72.com> Co-authored-by: Jay Prajapati <79649559+jayy-77@users.noreply.github.com> Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: Julio Quinteros Pro <jquinter@gmail.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com> Co-authored-by: Sameer Kankute <sameer@berri.ai> Co-authored-by: mjkam <mjkam@naver.com> Co-authored-by: Fly <48186978+tuzkiyoung@users.noreply.github.com> Co-authored-by: Kristoffer Arlind <13228507+KristofferArlind@users.noreply.github.com> Co-authored-by: Constantine <Runixer@gmail.com> Co-authored-by: Emerson Gomes <emerson.gomes@thalesgroup.com> Co-authored-by: Atharva Jaiswal <92455570+AtharvaJaiswal005@users.noreply.github.com> Co-authored-by: Krrish Dholakia <krrishdholakia@gmail.com> Co-authored-by: Vincent Koc <vincentkoc@ieee.org> Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
michelligabriele added a commit to michelligabriele/litellm that referenced this pull request on Feb 24, 2026
…iAI#21323) PR BerriAI#19809 changed stateless=True to stateless=False to enable progress notifications for MCP tool calls. This caused the mcp library to enforce mcp-session-id headers on all non-initialize requests, breaking MCP Inspector, curl, and any client without automatic session management. Revert to stateless=True to restore compatibility with all MCP clients. The progress notification code already handles missing sessions gracefully (defensive checks + try/except), so no other changes are needed. Fixes BerriAI#20242
michelligabriele added a commit to michelligabriele/litellm that referenced this pull request on Feb 24, 2026
Adds TestProxyMcpStatelessBehavior to test_proxy_mcp_e2e.py with a test that verifies two independent MCP clients can connect, initialize, and call tools without sharing session state. This catches the regression from PR BerriAI#19809 where stateless=False broke clients that don't manage mcp-session-id headers. Regression test for BerriAI#20242
krrishdholakia pushed a commit that referenced this pull request on Feb 24, 2026
…) (#22030) PR #19809 changed stateless=True to stateless=False to enable progress notifications for MCP tool calls. This caused the mcp library to enforce mcp-session-id headers on all non-initialize requests, breaking MCP Inspector, curl, and any client without automatic session management. Revert to stateless=True to restore compatibility with all MCP clients. The progress notification code already handles missing sessions gracefully (defensive checks + try/except), so no other changes are needed. Fixes #20242
ishaan-jaff pushed a commit that referenced this pull request on Feb 26, 2026
Adds TestProxyMcpStatelessBehavior to test_proxy_mcp_e2e.py with a test that verifies two independent MCP clients can connect, initialize, and call tools without sharing session state. This catches the regression from PR #19809 where stateless=False broke clients that don't manage mcp-session-id headers. Regression test for #20242
Sameerlite pushed a commit that referenced this pull request on Mar 3, 2026
Adds TestProxyMcpStatelessBehavior to test_proxy_mcp_e2e.py with a test that verifies two independent MCP clients can connect, initialize, and call tools without sharing session state. This catches the regression from PR #19809 where stateless=False broke clients that don't manage mcp-session-id headers. Regression test for #20242




Relevant issues
When MCP servers were used via LiteLLM, the MCP progress utility did not work: progress notifications sent by external MCP servers never reached LiteLLM or the Host clients.
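For context on what was being dropped: in MCP, a client opts into progress updates by attaching a `progressToken` to the request's `_meta` field, and the server emits `notifications/progress` messages carrying that token. A minimal illustration of the JSON-RPC shapes, built as plain dicts rather than real SDK objects:

```python
# Illustrative JSON-RPC message shapes for MCP progress reporting
# (hand-built dicts; the real SDK constructs these via typed models).

def make_tool_call(request_id: int, tool: str, progress_token: str) -> dict:
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": tool,
            "arguments": {},
            # The token the server echoes back in progress notifications.
            "_meta": {"progressToken": progress_token},
        },
    }

def make_progress(progress_token: str, progress: float, total: float) -> dict:
    return {
        "jsonrpc": "2.0",
        "method": "notifications/progress",
        "params": {
            "progressToken": progress_token,
            "progress": progress,
            "total": total,
        },
    }

req = make_tool_call(1, "search", "tok-1")
note = make_progress("tok-1", 2, 10)
# The Host matches notifications to in-flight requests by token:
assert note["params"]["progressToken"] == req["params"]["_meta"]["progressToken"]
```

If the proxy never forwards the Host's token to the external server, the server has nothing to echo back, and the Host sees no progress at all.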
Root Causes
- The Host's `progressToken` was not forwarded (not sent to external servers)
- The proxy returned buffered JSON responses (`json_response=True` + `stateless=True`) instead of streaming

Screenshots
Before Fix
After Fix
Pre-Submission checklist
Please complete all items before asking a LiteLLM maintainer to review your PR
- I have added at least 1 test in the `tests/litellm/` directory (adding at least 1 test is a hard requirement - see details)
- I have run `make test-unit`

CI (LiteLLM team)
Branch creation CI run
Link:
CI run for the last commit
Link:
Merge / cherry-pick CI run
Links:
Type
🐛 Bug Fix
Changes
- `client.py` - added `progress_callback` to receive progress updates from external MCP servers
- `server.py` - capture the Host's `progressToken` and forward it via `host_progress_callback`
- `mcp_server_manager.py` - pass `host_progress_callback` through the call chain
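The callback chain described above can be sketched in miniature. This is a toy asyncio model of the data flow, not LiteLLM's actual code; the function names (`call_tool_with_progress`, `host_progress_callback`) are illustrative:

```python
import asyncio

# Toy model of the progress-forwarding chain: the proxy's MCP client
# receives progress from the external server and forwards each update
# to the Host via a callback supplied by the server layer.

async def external_server_tool(progress_callback):
    # The external MCP server reports progress while working.
    for step in range(1, 4):
        await progress_callback(progress=step, total=3)
    return "done"

async def call_tool_with_progress(host_progress_callback):
    # client.py role: receive progress from the external server and
    # forward each update to the Host's callback unchanged.
    async def on_progress(progress, total):
        await host_progress_callback(progress, total)
    return await external_server_tool(on_progress)

async def main():
    seen = []
    # server.py role: the Host-side callback that re-emits progress
    # notifications to the original caller.
    async def host_progress_callback(progress, total):
        seen.append((progress, total))
    result = await call_tool_with_progress(host_progress_callback)
    return result, seen

result, seen = asyncio.run(main())
assert result == "done"
assert seen == [(1, 3), (2, 3), (3, 3)]
```

The key design point is that each layer only passes the callback through; no layer buffers updates, so progress reaches the Host as it happens.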