
fix(test): update reasoning_effort test to expect dict format#21271

Merged
jquinter merged 1 commit into main from
fix/anthropic-pass-through-reasoning-effort-test
Feb 15, 2026
Conversation

@jquinter
Contributor

Summary

Fixes a broken test that expected reasoning_effort to be a string; the code now returns a dict format.

Problem

Test test_openai_model_with_thinking_converts_to_reasoning_effort was failing with:

AssertionError: reasoning_effort should be 'minimal' for budget_tokens=1024, 
got {'effort': 'minimal', 'summary': 'detailed'}

The test also fails on the main branch ❌, so this is a pre-existing broken test, not a regression.

Root Cause

Code in litellm/llms/anthropic/experimental_pass_through/adapters/handler.py (lines 72-74) transforms reasoning_effort from a string to a dict:

completion_kwargs["reasoning_effort"] = {
    "effort": reasoning_effort,  # "minimal"
    "summary": "detailed",
}

The test was never updated to expect this dict format.
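The transformation above can be modeled as a standalone function. This is an illustrative sketch of the behavior described in the PR, not the actual handler code; the function name and signature are assumptions.

```python
# Simplified model of the string-to-dict transformation described above.
# The function name and the "detailed" summary default mirror the PR
# description; this sketch is illustrative, not the actual handler API.
def route_reasoning_effort(completion_kwargs: dict) -> dict:
    reasoning_effort = completion_kwargs.get("reasoning_effort")
    if isinstance(reasoning_effort, str):
        # e.g. "minimal" -> {"effort": "minimal", "summary": "detailed"}
        completion_kwargs["reasoning_effort"] = {
            "effort": reasoning_effort,
            "summary": "detailed",
        }
    return completion_kwargs

kwargs = route_reasoning_effort({"reasoning_effort": "minimal"})
print(kwargs["reasoning_effort"])  # {'effort': 'minimal', 'summary': 'detailed'}
```

Because the dict replaces the string in place, any assertion that compares against the raw string will fail exactly as shown in the error message above.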

Solution

Updated test expectations from:

assert call_kwargs["reasoning_effort"] == "minimal"

To:

expected_reasoning_effort = {"effort": "minimal", "summary": "detailed"}
assert call_kwargs["reasoning_effort"] == expected_reasoning_effort
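For context, the updated assertion style can be sketched against a mocked completion call. The mock target and setup here are illustrative, assuming the test captures call_kwargs from a patched completion function:

```python
from unittest.mock import MagicMock

# Illustrative sketch: capture the kwargs passed to a mocked completion
# function and assert on the dict-format reasoning_effort, as the
# updated test does. The model name is taken from the PR's flowchart.
mock_completion = MagicMock()
mock_completion(
    model="openai/gpt-5.2",
    reasoning_effort={"effort": "minimal", "summary": "detailed"},
)

call_kwargs = mock_completion.call_args.kwargs
expected_reasoning_effort = {"effort": "minimal", "summary": "detailed"}
assert call_kwargs["reasoning_effort"] == expected_reasoning_effort
```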

Testing

pytest tests/.../test_anthropic_experimental_pass_through_messages_handler.py::test_openai_model_with_thinking_converts_to_reasoning_effort -v
======================== 1 passed in 0.14s ========================

Related

This test failure was reported on PR #21217, but NOT caused by PR #21217 (which only modifies test_anthropic_structured_output.py). This is a pre-existing bug that also fails on the main branch.

🤖 Generated with Claude Code

Update test expectations to match the current code behavior where
reasoning_effort is transformed from a string to a dict with
'effort' and 'summary' fields.

The transformation happens in:
litellm/llms/anthropic/experimental_pass_through/adapters/handler.py:72-74

When reasoning_effort is a string like "minimal", it's converted to:
{"effort": "minimal", "summary": "detailed"}

The test was expecting just the string "minimal", causing it to fail.

Test now passes ✅

Related: test was failing on PR #21217, but NOT caused by PR #21217
(which only modifies test_anthropic_structured_output.py). This is a
pre-existing broken test that also fails on main branch.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@vercel
vercel bot commented Feb 15, 2026

litellm deployment: Ready (Preview), updated Feb 15, 2026 10:09pm

@greptile-apps
Contributor

greptile-apps bot commented Feb 15, 2026

Greptile Summary

Fixes a broken test assertion in test_openai_model_with_thinking_converts_to_reasoning_effort where the expected value for reasoning_effort was a plain string ("minimal"), but the actual code pipeline transforms it into a dict ({"effort": "minimal", "summary": "detailed"}).

  • The test exercises the full pipeline: translate_anthropic_to_openai first sets reasoning_effort as a string, then _route_openai_thinking_to_responses_api_if_needed transforms it into a dict with effort and summary fields for OpenAI models routed through the Responses API.
  • The test assertion is updated to match the actual dict output. The companion unit test (test_non_claude_model_converts_thinking_to_reasoning_effort) correctly continues to expect a string since it tests translate_thinking_for_model in isolation.
  • The PR includes evidence of the fix: passing test output is provided in the description.
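The two-stage pipeline described above can be modeled roughly as follows. The function names are simplified stand-ins for the actual handler internals, and the budget-to-effort threshold is an illustrative guess, not confirmed behavior:

```python
# Stage 1: the Anthropic->OpenAI translation maps a thinking budget to a
# string effort level (the 1024-token threshold here is an assumption).
def translate_thinking(thinking: dict) -> str:
    budget = thinking.get("budget_tokens", 0)
    return "minimal" if budget <= 1024 else "low"

# Stage 2: routing to the Responses API wraps the string in a dict,
# matching the handler transformation quoted in the PR description.
def route_to_responses_api(effort: str) -> dict:
    return {"effort": effort, "summary": "detailed"}

effort = translate_thinking({"type": "enabled", "budget_tokens": 1024})
result = route_to_responses_api(effort)
assert result == {"effort": "minimal", "summary": "detailed"}
```

This also shows why the companion unit test keeps its string expectation: it exercises only stage 1 in isolation, while the failing test exercised both stages.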

Confidence Score: 5/5

  • This PR is safe to merge — it only fixes a test assertion to match existing production behavior.
  • The change is a minimal, correct test fix. It aligns the test expectation with the actual code behavior in the handler pipeline. No production code is modified, and the fix has been verified with a passing test run.
  • No files require special attention.

Important Files Changed

Filename: tests/test_litellm/llms/anthropic/experimental_pass_through/messages/test_anthropic_experimental_pass_through_messages_handler.py
Overview: Updates the test assertion for reasoning_effort from expecting a string to expecting the dict format ({"effort": "minimal", "summary": "detailed"}) that the full pipeline actually produces. A correct fix for a pre-existing broken test.

Flowchart

flowchart TD
    A["anthropic_messages_handler\n(model: openai/gpt-5.2, thinking: {type: enabled, budget_tokens: 1024})"] --> B["_prepare_completion_kwargs"]
    B --> C["translate_anthropic_to_openai\n(sets reasoning_effort = 'minimal')"]
    C --> D["_route_openai_thinking_to_responses_api_if_needed\n(transforms to dict for OpenAI models)"]
    D --> E["reasoning_effort = {effort: 'minimal', summary: 'detailed'}"]
    E --> F["litellm.completion(**completion_kwargs)"]

Last reviewed commit: 0812323


@greptile-apps bot left a comment


1 file reviewed, no comments


@jquinter jquinter merged commit f20dd25 into main Feb 15, 2026
17 of 23 checks passed
jquinter added a commit that referenced this pull request Feb 15, 2026
…ution

Implements three key improvements to reduce test flakiness from parallel execution:

1. **Split Vertex AI tests into separate group** (workers: 1)
   - Vertex AI tests often have environment variable pollution issues
   - Running serially prevents cross-test interference with GOOGLE_APPLICATION_CREDENTIALS
   - Isolates authentication-related test failures

2. **Reduce workers for other LLM tests** (4 -> 2)
   - Decreases chance of race conditions and state conflicts
   - Still parallel but with less contention

3. **Add --dist=loadscope to pytest-xdist**
   - Keeps tests from the same file together on one worker
   - Reduces interference between unrelated test modules
   - Data shows 70% pass rate WITH loadscope vs 40% WITHOUT
   - Better test isolation while maintaining parallelism

Note: loadscope exposes one tokenizer cache issue in core-utils which will be
fixed in a separate PR. The tradeoff is worth it (7/10 pass vs 4/10 without).

These changes address the root causes of intermittent test failures in:
PRs #21268, #21271, #21272, #21273, #21275, #21276:
- Environment variable pollution (GOOGLE_APPLICATION_CREDENTIALS, VERTEXAI_PROJECT)
- Global state conflicts (litellm.known_tokenizer_config)
- Async mock timing issues with parallel execution

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
jquinter added a commit that referenced this pull request Feb 18, 2026
…ution

(Commit message identical to the Feb 15 commit above.)