fix(test): update reasoning_effort test to expect dict format #21271
Merged
Update test expectations to match the current code behavior where
reasoning_effort is transformed from a string to a dict with
'effort' and 'summary' fields.
The transformation happens in:
litellm/llms/anthropic/experimental_pass_through/adapters/handler.py:72-74
When reasoning_effort is a string like "minimal", it's converted to:
{"effort": "minimal", "summary": "detailed"}
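A minimal sketch of this conversion, assuming a standalone helper (the real logic lives inline in handler.py lines 72-74; the function name here is hypothetical):

```python
from typing import Union


def route_reasoning_effort(reasoning_effort: Union[str, dict]) -> dict:
    # Hypothetical helper mirroring handler.py lines 72-74: a bare string
    # is wrapped into the dict shape, with summary defaulting to "detailed".
    if isinstance(reasoning_effort, str):
        return {"effort": reasoning_effort, "summary": "detailed"}
    # Already a dict: pass through unchanged.
    return reasoning_effort
```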
The test was expecting just the string "minimal", causing it to fail.
Test now passes ✅
Related: test was failing on PR #21217, but NOT caused by PR #21217
(which only modifies test_anthropic_structured_output.py). This is a
pre-existing broken test that also fails on main branch.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Contributor
Greptile Summary: Fixes a broken test assertion in test_openai_model_with_thinking_converts_to_reasoning_effort.
Confidence Score: 5/5
| Filename | Overview |
|---|---|
| tests/test_litellm/llms/anthropic/experimental_pass_through/messages/test_anthropic_experimental_pass_through_messages_handler.py | Updates test assertion for reasoning_effort from expecting a string to expecting the dict format ({"effort": "minimal", "summary": "detailed"}) that the full pipeline actually produces. Correct fix for a pre-existing broken test. |
Flowchart
```mermaid
flowchart TD
    A["anthropic_messages_handler\n(model: openai/gpt-5.2, thinking: {type: enabled, budget_tokens: 1024})"] --> B["_prepare_completion_kwargs"]
    B --> C["translate_anthropic_to_openai\n(sets reasoning_effort = 'minimal')"]
    C --> D["_route_openai_thinking_to_responses_api_if_needed\n(transforms to dict for OpenAI models)"]
    D --> E["reasoning_effort = {effort: 'minimal', summary: 'detailed'}"]
    E --> F["litellm.completion(**completion_kwargs)"]
```
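The flow above can be sketched end to end. The function bodies below are illustrative assumptions (only the names come from the flowchart), not the actual litellm implementation:

```python
def translate_anthropic_to_openai(thinking: dict) -> dict:
    # Assumption: a small thinking budget maps to reasoning_effort="minimal".
    effort = "minimal" if thinking.get("budget_tokens", 0) <= 1024 else "high"
    return {"model": "openai/gpt-5.2", "reasoning_effort": effort}


def route_openai_thinking_to_responses_api(kwargs: dict) -> dict:
    # Assumption: for OpenAI models, upgrade the string to the dict format
    # expected by the Responses API.
    effort = kwargs.get("reasoning_effort")
    if isinstance(effort, str) and kwargs["model"].startswith("openai/"):
        kwargs["reasoning_effort"] = {"effort": effort, "summary": "detailed"}
    return kwargs


completion_kwargs = route_openai_thinking_to_responses_api(
    translate_anthropic_to_openai({"type": "enabled", "budget_tokens": 1024})
)
# completion_kwargs["reasoning_effort"] now holds the dict format the test expects
```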
Last reviewed commit: 0812323
jquinter added a commit that referenced this pull request · Feb 15, 2026
…ution

Implements three key improvements to reduce test flakiness from parallel execution:

1. **Split Vertex AI tests into separate group** (workers: 1)
   - Vertex AI tests often have environment variable pollution issues
   - Running serially prevents cross-test interference with GOOGLE_APPLICATION_CREDENTIALS
   - Isolates authentication-related test failures
2. **Reduce workers for other LLM tests** (4 -> 2)
   - Decreases chance of race conditions and state conflicts
   - Still parallel but with less contention
3. **Add --dist=loadscope to pytest-xdist**
   - Keeps tests from the same file together on one worker
   - Reduces interference between unrelated test modules
   - Data shows 70% pass rate WITH loadscope vs 40% WITHOUT
   - Better test isolation while maintaining parallelism

Note: loadscope exposes one tokenizer cache issue in core-utils which will be fixed in a separate PR. The tradeoff is worth it (7/10 pass vs 4/10 without).

These changes address the root causes of intermittent test failures in PRs #21268, #21271, #21272, #21273, #21275, #21276:
- Environment variable pollution (GOOGLE_APPLICATION_CREDENTIALS, VERTEXAI_PROJECT)
- Global state conflicts (litellm.known_tokenizer_config)
- Async mock timing issues with parallel execution

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
jquinter added a commit that referenced this pull request · Feb 18, 2026 (same commit message as above)
Summary
Fixes a broken test that was expecting reasoning_effort to be a string, but the code now returns a dict format.

Problem
Test test_openai_model_with_thinking_converts_to_reasoning_effort was failing with an assertion error. The test also fails on the main branch ❌: this is a pre-existing broken test, not a regression.
Root Cause
Code in litellm/llms/anthropic/experimental_pass_through/adapters/handler.py (lines 72-74) transforms reasoning_effort from a string to a dict. The test was never updated to expect this dict format.
Solution
Updated the test expectation from the string "minimal" to the dict {"effort": "minimal", "summary": "detailed"}.
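The updated expectation can be sketched as follows (the surrounding test structure here is an assumption, not the literal test code):

```python
def check_reasoning_effort(completion_kwargs: dict) -> None:
    # Assumed shape of the updated assertion in
    # test_openai_model_with_thinking_converts_to_reasoning_effort:
    # expect the dict the pipeline produces, not the raw string "minimal".
    assert completion_kwargs["reasoning_effort"] == {
        "effort": "minimal",
        "summary": "detailed",
    }


check_reasoning_effort(
    {"reasoning_effort": {"effort": "minimal", "summary": "detailed"}}
)
```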
Testing
pytest tests/.../test_anthropic_experimental_pass_through_messages_handler.py::test_openai_model_with_thinking_converts_to_reasoning_effort -v
======================== 1 passed in 0.14s ========================

Related
This test failure was reported on PR #21217, but NOT caused by PR #21217 (which only modifies test_anthropic_structured_output.py). This is a pre-existing bug that also fails on the main branch.

🤖 Generated with Claude Code