Fix: Resolve flakiness in three integration tests by AlexsanderHamir · Pull Request #17594 · BerriAI/litellm

AlexsanderHamir · 2025-12-06T15:56:57Z

Title

Fix: Resolve flakiness in three integration tests

Relevant issues

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

I have added testing in the tests/litellm/ directory. Adding at least 1 test is a hard requirement - see details
I have added a screenshot of my new test passing locally
My PR passes all unit tests on make test-unit
My PR's scope is as isolated as possible, it only solves 1 specific problem

Type

🐛 Bug Fix

Changes

test_no_duplicate_spend_logs (test_litellm/responses/test_no_duplicate_spend_logs.py)

Problem: The test used await asyncio.sleep(1) to wait for async logging to complete, which created a race condition. The async logging worker processes queued tasks in the background, and a fixed sleep does not guarantee that the queue has drained.

Fix: Replaced the sleep with GLOBAL_LOGGING_WORKER.flush(), which waits for the logging queue to empty, so all async logging tasks complete before the assertions run.
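The difference between sleeping and flushing can be illustrated with a minimal sketch. SketchLoggingWorker, enqueue, and run_sketch are hypothetical names used only for illustration; this is not LiteLLM's GLOBAL_LOGGING_WORKER, whose flush() is merely assumed here to wait until queued work has been processed.

```python
import asyncio


class SketchLoggingWorker:
    """Hypothetical stand-in for a background logging worker (not LiteLLM's class)."""

    def __init__(self) -> None:
        self._queue: asyncio.Queue = asyncio.Queue()
        self._task = None
        self.processed = []

    def start(self) -> None:
        # Consume queued records in the background.
        self._task = asyncio.create_task(self._run())

    async def _run(self) -> None:
        while True:
            record, delay = await self._queue.get()
            try:
                await asyncio.sleep(delay)  # simulate a slow log write
                self.processed.append(record)
            finally:
                self._queue.task_done()

    def enqueue(self, record: str, delay: float = 0.0) -> None:
        self._queue.put_nowait((record, delay))

    async def flush(self) -> None:
        # Returns only after every queued record has been marked done,
        # however long the individual writes take.
        await self._queue.join()

    async def stop(self) -> None:
        if self._task is not None:
            self._task.cancel()
            try:
                await self._task
            except asyncio.CancelledError:
                pass


async def run_sketch() -> list:
    worker = SketchLoggingWorker()
    worker.start()
    worker.enqueue("spend-log-1", delay=0.05)
    worker.enqueue("spend-log-2", delay=0.05)
    # A fixed asyncio.sleep() here would only be correct if it happened to
    # exceed the total processing time; flush() has no such dependency.
    await worker.flush()
    processed = list(worker.processed)
    await worker.stop()
    return processed
```

Calling asyncio.run(run_sketch()) returns both records regardless of how slow the simulated writes are, because the wait is tied to queue completion rather than to wall-clock time.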
test_log_langfuse_v2_handles_null_usage_values (test_litellm/integrations/test_langfuse.py)

Problem: The test called datetime.datetime.now() twice, once for start_time and once for end_time, so the interval between the two values varied from run to run, especially in CI environments with variable execution speed.

Fix: Use fixed timestamps instead of datetime.now() so that start_time and end_time are identical on every run, removing this source of timing-related variation.
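A minimal sketch of the fixed-timestamp approach follows. The specific dates, the one-second gap, and the build_log_kwargs helper are assumptions made for illustration; they are not taken from the actual test.

```python
import datetime

# Assumed values: any fixed pair works, as long as it never changes between runs.
FIXED_START = datetime.datetime(2025, 1, 1, 12, 0, 0)
FIXED_END = FIXED_START + datetime.timedelta(seconds=1)


def build_log_kwargs(start_time: datetime.datetime, end_time: datetime.datetime) -> dict:
    """Hypothetical helper that derives a latency from two timestamps."""
    return {
        "start_time": start_time,
        "end_time": end_time,
        "latency_s": (end_time - start_time).total_seconds(),
    }


# With fixed inputs the derived latency is the same on every machine and run.
kwargs = build_log_kwargs(FIXED_START, FIXED_END)
```

With two calls to datetime.datetime.now(), the derived latency would depend on how long the interpreter took between them; with constants it is always exactly one second.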
test_watsonx_gpt_oss_prompt_transformation (test_litellm/llms/watsonx/test_watsonx.py)

Problem: The test accessed mock_post.call_args directly without checking that it was set. call_args is None if the mock was never called, for example when an exception occurs before the POST request. Because the test catches exceptions and continues, this was a potential failure point.

Fix: Added explicit assertions and switched to call_args_list[0] for safer access:
- Assert that call_args_list has at least one call
- Assert that call_args is not None
- Assert that the 'data' key exists in kwargs

This makes the test fail with a clear error message rather than an intermittent AttributeError.
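The three assertions above can be sketched with unittest.mock as follows. The extract_post_data helper, the URL, and the payload are hypothetical and only illustrate the pattern; the real test's mock setup may differ.

```python
from unittest.mock import MagicMock


def extract_post_data(mock_post: MagicMock):
    """Hypothetical helper: read the 'data' kwarg of the first recorded call."""
    # Fail loudly if the mocked POST was never invoked.
    assert len(mock_post.call_args_list) >= 1, "expected at least one POST call"
    call_args = mock_post.call_args_list[0]
    assert call_args is not None, "call_args should not be None"
    # Fail loudly if the call did not pass a 'data' keyword argument.
    assert "data" in call_args.kwargs, "expected 'data' in POST kwargs"
    return call_args.kwargs["data"]


# Usage with an assumed URL and payload.
mock_post = MagicMock()
mock_post("https://example.invalid/generate", data='{"input": "hi"}')
payload = extract_post_data(mock_post)
```

If the mock was never called, the first assertion raises an AssertionError with a descriptive message instead of an attribute lookup failing on None later in the test.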

All fixes maintain the original test intent while making them deterministic and reliable in CI environments.


vercel · 2025-12-06T15:57:03Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Preview	Comments	Updated (UTC)
litellm	Building	Preview	Comment	Dec 6, 2025 3:57pm

Fixes test_log_langfuse_v2_handles_null_usage_values flaky test failure by properly cleaning up sys.modules['langfuse'] in tearDown.

Changes:
- Store original langfuse module in setUp before mocking
- Restore original or remove mock in tearDown to prevent state pollution
- Remove invalid print_verbose parameter from log_event_on_langfuse

Root Cause: The tearDown method was not cleaning up sys.modules['langfuse'] after each test, causing mock state to leak between tests. This caused intermittent failures in CI, especially when tests run in parallel or in different orders.

Impact: This test has a long history of flakiness with multiple attempted fixes (#20475, #17599, #17594, #17591, #17588). The missing sys.modules cleanup was the underlying issue causing continued failures despite those patches.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>


AlexsanderHamir merged commit 1e89aa3 into main Dec 6, 2025
6 of 43 checks passed

vercel bot deployed to Preview December 6, 2025 15:57 View deployment

jquinter mentioned this pull request Feb 14, 2026

Fix: Langfuse test isolation to prevent flaky failures #21214

Merged

AlexsanderHamir merged 1 commit into main from litellm_ci_fix007
