docs: comprehensive test flakiness solutions guide by jquinter · Pull Request #21391 · BerriAI/litellm

jquinter · 2026-02-17T17:45:12Z

Problem

CI tests are flaky - passing locally but failing in CI:

http_handler tests: isinstance() failures (FIXED in fix(tests): resolve test isolation issue in http_handler tests #21388)
Vertex AI tests: credential/mock timing issues
MCP server tests: async mock call count failures

Solution

Add comprehensive documentation with actionable solutions:

📖 test-flakiness-guide.md (Developer Guide)

Complete reference for test authors covering:

Better Test Isolation
- Using @pytest.mark.no_parallel for async mocks
- Fixing module import issues with local imports
- When to use sequential vs parallel execution
Robust Async Mock Setup
- 3 patterns for reliable async mocking
- Patch early, execute late strategy
- Using AsyncMock and side_effect properly
Retry Logic
- Using pytest-rerunfailures
- Custom retry decorators
- CI-specific retry strategies
Test Fixtures & Cleanup
- Preventing state pollution between tests
- Module-level cleanup patterns
Quick Reference
- Problem → Solution lookup table
- Real examples from codebase
- Debugging commands

🔧 ci-test-improvements.md (Implementation Plan)

Phased rollout plan with concrete next steps:

Phase 1: Quick Wins (1 day)

Install pytest-rerunfailures plugin
Mark known flaky tests
Add test utilities module

Phase 2: CI Improvements (2-3 days)

GitHub Actions retry logic
Separate stable vs flaky test runs
Makefile targets for common scenarios

Phase 3: Enforcement (1 week)

Pre-commit hooks for test quality
Update CLAUDE.md guidelines
Review all async tests

Includes:

pytest-rerunfailures setup
GitHub Actions yaml examples
Makefile targets (test-fast, test-flaky, test-repeat)
Pre-commit hook for enforcing markers
Test utilities module with decorators
Success metrics and monitoring

Impact

✅ Developers have clear guidance on writing reliable tests
✅ Reduces CI false failures
✅ Prevents new flaky tests from being introduced
✅ Provides tools for diagnosing flakiness locally

Next Steps

After this documentation is merged, implement Phase 1:

poetry add --group dev pytest-rerunfailures
Create tests/test_utils.py with retry decorators
Mark existing flaky tests with @pytest.mark.flaky
Add Makefile targets for easy testing

Issue discovered in PR fix(bedrock): filter internal json_tool_call when mixed with real tools #21107 (Bedrock fix)
Test isolation fix in PR fix(tests): resolve test isolation issue in http_handler tests #21388
Lint fix in PR fix(lint): suppress PLR0915 too many statements in route_request #21390
All three showed same pattern: tests pass locally, fail in CI

Examples

Before (flaky):

from module import AsyncHTTPHandler  # Stale after reload

@pytest.mark.asyncio
async def test_client():
    client = get_client()
    assert isinstance(client, AsyncHTTPHandler)  # ❌ Flaky!

After (reliable):

@pytest.mark.asyncio
@pytest.mark.no_parallel  # Sequential execution
async def test_client():
    from module import AsyncHTTPHandler  # Fresh import
    client = get_client()
    assert isinstance(client, AsyncHTTPHandler)  # ✅ Reliable!

Fix isinstance() checks failing due to module reload in conftest.py. The conftest.py fixture reloads the litellm module between test modules, which causes class references imported at module-level to become stale. When AsyncHTTPHandler is imported at the top of the file and then litellm is reloaded by the fixture, the isinstance() check fails because the returned instance is of the NEW AsyncHTTPHandler class while the test is checking against the OLD class reference. Solution: Import AsyncHTTPHandler locally within each test function that uses isinstance() checks. This ensures we get the fresh class reference after the module reload. Fixed tests: - test_session_reuse_integration - test_get_async_httpx_client_with_shared_session - test_get_async_httpx_client_without_shared_session This resolves intermittent CI failures where parallel test execution triggers the module reload behavior. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Add two detailed guides for addressing CI test flakiness: 1. test-flakiness-guide.md - Developer guide with: - How to use @pytest.mark.no_parallel for async mocks - Patterns for robust async mock setup - Retry logic strategies - Module reload issues and fixes - Quick reference and checklist 2. ci-test-improvements.md - Implementation plan with: - Priority phased rollout (Quick wins → CI → Enforcement) - pytest-rerunfailures plugin setup - GitHub Actions improvements for retries - Makefile targets for testing - Pre-commit hooks for test quality - Test utilities module with decorators - Success metrics and monitoring These guides provide actionable solutions for the CI test failures observed in PRs #21107, #21388, and #21390. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

vercel · 2026-02-17T17:45:17Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
litellm	Ready	Preview, Comment	Feb 17, 2026 5:46pm

greptile-apps · 2026-02-17T17:48:35Z

Greptile Summary

This PR adds two documentation files (test-flakiness-guide.md and ci-test-improvements.md) and fixes isinstance() flakiness in test_http_handler.py by using local imports aliased as AsyncHTTPHandlerReload to avoid stale class references after importlib.reload(litellm) in conftest.

Test fix: Three test functions (test_get_async_httpx_client_with_shared_session, test_get_async_httpx_client_without_shared_session, test_session_reuse_integration) now import AsyncHTTPHandler locally inside the test body, resolving isinstance() failures caused by module reload creating new class objects
Documentation: Comprehensive guides covering test isolation patterns, async mock strategies, retry logic, and a phased CI improvement plan — though the recommended local import pattern directly contradicts CLAUDE.md's style guideline against inline imports
Pre-commit hook proposal: The check_test_markers.py script in the implementation plan has a false-positive issue where file-level mock detection causes all async tests in a file to be flagged, not just those that actually use mocks

Confidence Score: 4/5

This PR is safe to merge — the code changes are a targeted test fix, and the documentation is additive with no production code impact.
The actual code change is a minimal, well-targeted fix for isinstance() flakiness in test files. The two documentation files are additive and don't affect runtime behavior. The only concern is the style guide contradiction (local imports vs CLAUDE.md guidelines) and a logic bug in the proposed pre-commit hook script, but the pre-commit hook is documentation-only and not yet implemented.
docs/ci-test-improvements.md contains a proposed pre-commit hook with false-positive detection logic that should be corrected before implementation.

Important Files Changed

Filename	Overview
docs/test-flakiness-guide.md	New documentation guide for addressing test flakiness. Contains accurate references to pyproject.toml settings and practical advice. Contradicts CLAUDE.md style guide on inline imports.
docs/ci-test-improvements.md	Implementation plan for CI test improvements with phased rollout. Contains code examples for pytest-rerunfailures, GitHub Actions, Makefile targets, and pre-commit hooks. All proposed, not yet implemented.
tests/test_litellm/llms/custom_httpx/test_http_handler.py	Fixes isinstance() flakiness by using local imports aliased as AsyncHTTPHandlerReload. Module-level AsyncHTTPHandler import remains for non-isinstance usage. Whitespace cleanup included.

Flowchart

flowchart TD
    A["conftest.py: importlib.reload(litellm)"] --> B["Module-level AsyncHTTPHandler reference becomes stale"]
    B --> C{"isinstance(obj, AsyncHTTPHandler)?"}
    C -->|"Stale reference"| D["❌ False — different class object"]
    C -->|"Fresh local import"| E["✅ True — same class object"]
    
    F["test_http_handler.py line 18:\nfrom ... import AsyncHTTPHandler"] --> B
    
    G["PR Fix: Local import in test body\nfrom ... import AsyncHTTPHandler as AsyncHTTPHandlerReload"] --> E

_{Last reviewed commit: feb01b9}

greptile-apps

_{3 files reviewed, 2 comments}

_{Edit Code Review Agent Settings | Greptile}

greptile-apps · 2026-02-17T17:48:38Z

docs/test-flakiness-guide.md

+
+For `isinstance()` checks that fail after module reload:
+
+```python
+# ❌ BAD: Module-level import becomes stale after reload
+from module import MyClass
+


Contradicts CLAUDE.md style guide

This guide recommends using local imports inside test functions, but CLAUDE.md explicitly states: "Avoid imports within methods — place all imports at the top of the file (module-level). Inline imports inside functions/methods make dependencies harder to trace and hurt readability. The only exception is avoiding circular imports where absolutely necessary."

Since this is a legitimate exception (module reload invalidates module-level references), consider updating CLAUDE.md to add test files as another accepted exception, or at least reference this conflict in this guide so developers aren't confused by the contradictory advice.

Context Used: Context from dashboard - CLAUDE.md (source)

_{Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!}

greptile-apps · 2026-02-17T17:48:39Z

docs/ci-test-improvements.md

+        name: Check async tests have proper markers
+        entry: python scripts/check_test_markers.py
+        language: python
+        files: ^tests/.*test.*\.py$
+```
+
+`scripts/check_test_markers.py`:
+


Pre-commit hook has false positives

The check_test_markers.py script uses a very broad heuristic: has_mock = 'patch' in content or 'Mock' in content or 'mock_' in node.name.lower(). This checks the entire file content for any occurrence of "patch" or "Mock", meaning if any test in the file uses mocks, every async test in that file will be flagged — even those that don't use mocks at all. The check should be scoped to each individual test function's decorators and body, not the entire file content.

giulio-leone · 2026-03-02T14:37:45Z

Automated patch bundle from next-100 unresolved backlog expansion.\nGenerated due limited direct branch-write access; please apply/cherry-pick minimal edits below.\n\n## PR #21391 — fix/test-flakiness-improvements (2 unresolved)

Unresolved thread summary

T1 docs/test-flakiness-guide.md:43 — Contradicts CLAUDE.md style guide
T2 docs/ci-test-improvements.md:128 — Pre-commit hook has false positives

Minimal patch proposals

T1 docs/test-flakiness-guide.md:43
- Edit steps:
  1. Adjust test inputs/assertions to validate the reviewer-reported behavior and prevent regressions.
  2. Keep the test deterministic and verify it fails before / passes after the patch intent.
T2 docs/ci-test-improvements.md:128
- Edit steps:
  1. Update documentation wording/table structure so docs match current API behavior.
  2. Ensure markdown structure remains valid (anchors, columns, and code examples).

jquinter and others added 2 commits February 17, 2026 14:27

jquinter marked this pull request as draft February 17, 2026 17:46

vercel bot deployed to Preview February 17, 2026 17:46 View deployment

greptile-apps bot reviewed Feb 17, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

docs: comprehensive test flakiness solutions guide#21391

docs: comprehensive test flakiness solutions guide#21391
jquinter wants to merge 2 commits intomainfrom
fix/test-flakiness-improvements

jquinter commented Feb 17, 2026

Uh oh!

vercel bot commented Feb 17, 2026 •

edited

Loading

Uh oh!

greptile-apps bot commented Feb 17, 2026

Important Files Changed

Uh oh!

greptile-apps bot left a comment

Uh oh!

greptile-apps bot Feb 17, 2026

Uh oh!

greptile-apps bot Feb 17, 2026

Uh oh!

giulio-leone commented Mar 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

jquinter commented Feb 17, 2026

Problem

Solution

📖 test-flakiness-guide.md (Developer Guide)

🔧 ci-test-improvements.md (Implementation Plan)

Impact

Next Steps

Related

Examples

Uh oh!

vercel bot commented Feb 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

greptile-apps bot commented Feb 17, 2026

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Flowchart

Uh oh!

greptile-apps bot left a comment

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Feb 17, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Feb 17, 2026

Choose a reason for hiding this comment

Uh oh!

giulio-leone commented Mar 2, 2026

Unresolved thread summary

Minimal patch proposals

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

vercel bot commented Feb 17, 2026 •

edited

Loading