Skip to content

docs: comprehensive test flakiness solutions guide#21391

Draft
jquinter wants to merge 2 commits intomainfrom
fix/test-flakiness-improvements
Draft

docs: comprehensive test flakiness solutions guide#21391
jquinter wants to merge 2 commits intomainfrom
fix/test-flakiness-improvements

Conversation

@jquinter
Copy link
Contributor

Problem

CI tests are flaky - passing locally but failing in CI:

Solution

Add comprehensive documentation with actionable solutions:

📖 test-flakiness-guide.md (Developer Guide)

Complete reference for test authors covering:

  1. Better Test Isolation

    • Using @pytest.mark.no_parallel for async mocks
    • Fixing module import issues with local imports
    • When to use sequential vs parallel execution
  2. Robust Async Mock Setup

    • 3 patterns for reliable async mocking
    • Patch early, execute late strategy
    • Using AsyncMock and side_effect properly
  3. Retry Logic

    • Using pytest-rerunfailures
    • Custom retry decorators
    • CI-specific retry strategies
  4. Test Fixtures & Cleanup

    • Preventing state pollution between tests
    • Module-level cleanup patterns
  5. Quick Reference

    • Problem → Solution lookup table
    • Real examples from codebase
    • Debugging commands

🔧 ci-test-improvements.md (Implementation Plan)

Phased rollout plan with concrete next steps:

Phase 1: Quick Wins (1 day)

  • Install pytest-rerunfailures plugin
  • Mark known flaky tests
  • Add test utilities module

Phase 2: CI Improvements (2-3 days)

  • GitHub Actions retry logic
  • Separate stable vs flaky test runs
  • Makefile targets for common scenarios

Phase 3: Enforcement (1 week)

  • Pre-commit hooks for test quality
  • Update CLAUDE.md guidelines
  • Review all async tests

Includes:

  • pytest-rerunfailures setup
  • GitHub Actions yaml examples
  • Makefile targets (test-fast, test-flaky, test-repeat)
  • Pre-commit hook for enforcing markers
  • Test utilities module with decorators
  • Success metrics and monitoring

Impact

  • ✅ Developers have clear guidance on writing reliable tests
  • ✅ Reduces CI false failures
  • ✅ Prevents new flaky tests from being introduced
  • ✅ Provides tools for diagnosing flakiness locally

Next Steps

After this documentation is merged, implement Phase 1:

  1. poetry add --group dev pytest-rerunfailures
  2. Create tests/test_utils.py with retry decorators
  3. Mark existing flaky tests with @pytest.mark.flaky
  4. Add Makefile targets for easy testing

Related

Examples

Before (flaky):

from module import AsyncHTTPHandler  # Stale after reload

@pytest.mark.asyncio
async def test_client():
    client = get_client()
    assert isinstance(client, AsyncHTTPHandler)  # ❌ Flaky!

After (reliable):

@pytest.mark.asyncio
@pytest.mark.no_parallel  # Sequential execution
async def test_client():
    from module import AsyncHTTPHandler  # Fresh import
    client = get_client()
    assert isinstance(client, AsyncHTTPHandler)  # ✅ Reliable!

jquinter and others added 2 commits February 17, 2026 14:27
Fix isinstance() checks failing due to module reload in conftest.py.

The conftest.py fixture reloads the litellm module between test modules,
which causes class references imported at module-level to become stale.
When AsyncHTTPHandler is imported at the top of the file and then litellm
is reloaded by the fixture, the isinstance() check fails because the
returned instance is of the NEW AsyncHTTPHandler class while the test
is checking against the OLD class reference.

Solution: Import AsyncHTTPHandler locally within each test function that
uses isinstance() checks. This ensures we get the fresh class reference
after the module reload.

Fixed tests:
- test_session_reuse_integration
- test_get_async_httpx_client_with_shared_session
- test_get_async_httpx_client_without_shared_session

This resolves intermittent CI failures where parallel test execution
triggers the module reload behavior.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add two detailed guides for addressing CI test flakiness:

1. test-flakiness-guide.md - Developer guide with:
   - How to use @pytest.mark.no_parallel for async mocks
   - Patterns for robust async mock setup
   - Retry logic strategies
   - Module reload issues and fixes
   - Quick reference and checklist

2. ci-test-improvements.md - Implementation plan with:
   - Priority phased rollout (Quick wins → CI → Enforcement)
   - pytest-rerunfailures plugin setup
   - GitHub Actions improvements for retries
   - Makefile targets for testing
   - Pre-commit hooks for test quality
   - Test utilities module with decorators
   - Success metrics and monitoring

These guides provide actionable solutions for the CI test failures
observed in PRs #21107, #21388, and #21390.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@vercel
Copy link

vercel bot commented Feb 17, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
litellm Ready Ready Preview, Comment Feb 17, 2026 5:46pm

Request Review

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Feb 17, 2026

Greptile Summary

This PR adds two documentation files (test-flakiness-guide.md and ci-test-improvements.md) and fixes isinstance() flakiness in test_http_handler.py by using local imports aliased as AsyncHTTPHandlerReload to avoid stale class references after importlib.reload(litellm) in conftest.

  • Test fix: Three test functions (test_get_async_httpx_client_with_shared_session, test_get_async_httpx_client_without_shared_session, test_session_reuse_integration) now import AsyncHTTPHandler locally inside the test body, resolving isinstance() failures caused by module reload creating new class objects
  • Documentation: Comprehensive guides covering test isolation patterns, async mock strategies, retry logic, and a phased CI improvement plan — though the recommended local import pattern directly contradicts CLAUDE.md's style guideline against inline imports
  • Pre-commit hook proposal: The check_test_markers.py script in the implementation plan has a false-positive issue where file-level mock detection causes all async tests in a file to be flagged, not just those that actually use mocks

Confidence Score: 4/5

  • This PR is safe to merge — the code changes are a targeted test fix, and the documentation is additive with no production code impact.
  • The actual code change is a minimal, well-targeted fix for isinstance() flakiness in test files. The two documentation files are additive and don't affect runtime behavior. The only concern is the style guide contradiction (local imports vs CLAUDE.md guidelines) and a logic bug in the proposed pre-commit hook script, but the pre-commit hook is documentation-only and not yet implemented.
  • docs/ci-test-improvements.md contains a proposed pre-commit hook with false-positive detection logic that should be corrected before implementation.

Important Files Changed

Filename Overview
docs/test-flakiness-guide.md New documentation guide for addressing test flakiness. Contains accurate references to pyproject.toml settings and practical advice. Contradicts CLAUDE.md style guide on inline imports.
docs/ci-test-improvements.md Implementation plan for CI test improvements with phased rollout. Contains code examples for pytest-rerunfailures, GitHub Actions, Makefile targets, and pre-commit hooks. All proposed, not yet implemented.
tests/test_litellm/llms/custom_httpx/test_http_handler.py Fixes isinstance() flakiness by using local imports aliased as AsyncHTTPHandlerReload. Module-level AsyncHTTPHandler import remains for non-isinstance usage. Whitespace cleanup included.

Flowchart

flowchart TD
    A["conftest.py: importlib.reload(litellm)"] --> B["Module-level AsyncHTTPHandler reference becomes stale"]
    B --> C{"isinstance(obj, AsyncHTTPHandler)?"}
    C -->|"Stale reference"| D["❌ False — different class object"]
    C -->|"Fresh local import"| E["✅ True — same class object"]
    
    F["test_http_handler.py line 18:\nfrom ... import AsyncHTTPHandler"] --> B
    
    G["PR Fix: Local import in test body\nfrom ... import AsyncHTTPHandler as AsyncHTTPHandlerReload"] --> E
Loading

Last reviewed commit: feb01b9

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 files reviewed, 2 comments

Edit Code Review Agent Settings | Greptile

Comment on lines +37 to +43

For `isinstance()` checks that fail after module reload:

```python
# ❌ BAD: Module-level import becomes stale after reload
from module import MyClass

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Contradicts CLAUDE.md style guide

This guide recommends using local imports inside test functions, but CLAUDE.md explicitly states: "Avoid imports within methods — place all imports at the top of the file (module-level). Inline imports inside functions/methods make dependencies harder to trace and hurt readability. The only exception is avoiding circular imports where absolutely necessary."

Since this is a legitimate exception (module reload invalidates module-level references), consider updating CLAUDE.md to add test files as another accepted exception, or at least reference this conflict in this guide so developers aren't confused by the contradictory advice.

Context Used: Context from dashboard - CLAUDE.md (source)

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Comment on lines +121 to +128
name: Check async tests have proper markers
entry: python scripts/check_test_markers.py
language: python
files: ^tests/.*test.*\.py$
```

`scripts/check_test_markers.py`:

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pre-commit hook has false positives

The check_test_markers.py script uses a very broad heuristic: has_mock = 'patch' in content or 'Mock' in content or 'mock_' in node.name.lower(). This checks the entire file content for any occurrence of "patch" or "Mock", meaning if any test in the file uses mocks, every async test in that file will be flagged — even those that don't use mocks at all. The check should be scoped to each individual test function's decorators and body, not the entire file content.

@giulio-leone
Copy link

Automated patch bundle from next-100 unresolved backlog expansion.\nGenerated due limited direct branch-write access; please apply/cherry-pick minimal edits below.\n\n## PR #21391fix/test-flakiness-improvements (2 unresolved)

Unresolved thread summary

  • T1 docs/test-flakiness-guide.md:43 — Contradicts CLAUDE.md style guide
  • T2 docs/ci-test-improvements.md:128 — Pre-commit hook has false positives

Minimal patch proposals

  • T1 docs/test-flakiness-guide.md:43
    • Edit steps:
      1. Adjust test inputs/assertions to validate the reviewer-reported behavior and prevent regressions.
      2. Keep the test deterministic and verify it fails before / passes after the patch intent.
  • T2 docs/ci-test-improvements.md:128
    • Edit steps:
      1. Update documentation wording/table structure so docs match current API behavior.
      2. Ensure markdown structure remains valid (anchors, columns, and code examples).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants