ignore flaky test#8278
Merged
yihau merged 1 commit intoanza-xyz:masterfrom Oct 1, 2025
Merged
Conversation
lijunwangs
approved these changes
Oct 1, 2025
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #8278 +/- ##
=========================================
- Coverage 83.2% 83.1% -0.1%
=========================================
Files 836 836
Lines 366600 366592 -8
=========================================
- Hits 305201 304982 -219
- Misses 61399 61610 +211 🚀 New features to boost your workflow:
|
AvhiMaz
added a commit
to AvhiMaz/agave
that referenced
this pull request
Nov 12, 2025
Remove #[ignore] attribute and fix timing to eliminate race condition in test_scheduler_drop_idle, which was disabled in PR anza-xyz#8278. The test verifies the scheduler pool's cleaner thread correctly removes idle schedulers while preserving recently-pooled ones. Root cause: The original test used a 100ms idle threshold with 1000ms sleep, but timing was still unreliable due to system variations. The race occurred when old_scheduler and new_scheduler had unclear age differences. Fix: - Use explicit 300ms idle threshold for this test (instead of 100ms) - Sleep 350ms before returning new_scheduler (provides 50ms+ safety margin) - This guarantees old_scheduler is idle while new_scheduler is not Result: - old_scheduler: 350ms old (> 300ms threshold) → definitely idle → removed - new_scheduler: ~0ms old (< 300ms threshold) → definitely not idle → kept - Test is now deterministic and matches expected checkpoint sequence Fixes anza-xyz#8279 Signed-off-by: AvhiMaz <avhimazumder5@outlook.com>
github-merge-queue Bot
pushed a commit
that referenced
this pull request
Nov 19, 2025
…9042) * fix: re-enable test_scheduler_drop_idle by resolving race condition Remove #[ignore] attribute and fix timing to eliminate race condition in test_scheduler_drop_idle, which was disabled in PR #8278. The test verifies the scheduler pool's cleaner thread correctly removes idle schedulers while preserving recently-pooled ones. Root cause: The original test used a 100ms idle threshold with 1000ms sleep, but timing was still unreliable due to system variations. The race occurred when old_scheduler and new_scheduler had unclear age differences. Fix: - Use explicit 300ms idle threshold for this test (instead of 100ms) - Sleep 350ms before returning new_scheduler (provides 50ms+ safety margin) - This guarantees old_scheduler is idle while new_scheduler is not Result: - old_scheduler: 350ms old (> 300ms threshold) → definitely idle → removed - new_scheduler: ~0ms old (< 300ms threshold) → definitely not idle → kept - Test is now deterministic and matches expected checkpoint sequence Fixes #8279 Signed-off-by: AvhiMaz <avhimazumder5@outlook.com> * Address all reviewer feedback on test_scheduler_drop_idle - Move SHORTENED_MAX_POOLING_DURATION constant to test_scheduler_drop_stale where it's used - Convert test_max_pooling_duration from let to const TEST_MAX_POOLING_DURATION following conventions - Create TEST_WAIT_FOR_IDLE constant (500ms) with explicit safety margin tied to TEST_MAX_POOLING_DURATION - Increase safety margin from 50ms to 200ms for CI reliability - Add detailed comments explaining 300ms pooling duration tradeoff (speed vs race condition window) - Update assertion comment with explicit timing guarantees: old_scheduler ~500ms old (exceeds 300ms threshold), new_scheduler ~50ms old (below threshold) Signed-off-by: AvhiMaz <avhimazumder5@outlook.com> * refactor: tie test duration constants together and improve comment accuracy Signed-off-by: AvhiMaz <avhimazumder5@outlook.com> * fix: cargo fmt Signed-off-by: AvhiMaz <avhimazumder5@outlook.com> --------- Signed-off-by: AvhiMaz <avhimazumder5@outlook.com>
rustopian
pushed a commit
to rustopian/agave
that referenced
this pull request
Nov 20, 2025
…nza-xyz#9042) * fix: re-enable test_scheduler_drop_idle by resolving race condition Remove #[ignore] attribute and fix timing to eliminate race condition in test_scheduler_drop_idle, which was disabled in PR anza-xyz#8278. The test verifies the scheduler pool's cleaner thread correctly removes idle schedulers while preserving recently-pooled ones. Root cause: The original test used a 100ms idle threshold with 1000ms sleep, but timing was still unreliable due to system variations. The race occurred when old_scheduler and new_scheduler had unclear age differences. Fix: - Use explicit 300ms idle threshold for this test (instead of 100ms) - Sleep 350ms before returning new_scheduler (provides 50ms+ safety margin) - This guarantees old_scheduler is idle while new_scheduler is not Result: - old_scheduler: 350ms old (> 300ms threshold) → definitely idle → removed - new_scheduler: ~0ms old (< 300ms threshold) → definitely not idle → kept - Test is now deterministic and matches expected checkpoint sequence Fixes anza-xyz#8279 Signed-off-by: AvhiMaz <avhimazumder5@outlook.com> * Address all reviewer feedback on test_scheduler_drop_idle - Move SHORTENED_MAX_POOLING_DURATION constant to test_scheduler_drop_stale where it's used - Convert test_max_pooling_duration from let to const TEST_MAX_POOLING_DURATION following conventions - Create TEST_WAIT_FOR_IDLE constant (500ms) with explicit safety margin tied to TEST_MAX_POOLING_DURATION - Increase safety margin from 50ms to 200ms for CI reliability - Add detailed comments explaining 300ms pooling duration tradeoff (speed vs race condition window) - Update assertion comment with explicit timing guarantees: old_scheduler ~500ms old (exceeds 300ms threshold), new_scheduler ~50ms old (below threshold) Signed-off-by: AvhiMaz <avhimazumder5@outlook.com> * refactor: tie test duration constants together and improve comment accuracy Signed-off-by: AvhiMaz <avhimazumder5@outlook.com> * fix: cargo fmt Signed-off-by: AvhiMaz <avhimazumder5@outlook.com> --------- Signed-off-by: AvhiMaz <avhimazumder5@outlook.com>
AvhiMaz
added a commit
to AvhiMaz/agave
that referenced
this pull request
Nov 28, 2025
…nza-xyz#9042) * fix: re-enable test_scheduler_drop_idle by resolving race condition Remove #[ignore] attribute and fix timing to eliminate race condition in test_scheduler_drop_idle, which was disabled in PR anza-xyz#8278. The test verifies the scheduler pool's cleaner thread correctly removes idle schedulers while preserving recently-pooled ones. Root cause: The original test used a 100ms idle threshold with 1000ms sleep, but timing was still unreliable due to system variations. The race occurred when old_scheduler and new_scheduler had unclear age differences. Fix: - Use explicit 300ms idle threshold for this test (instead of 100ms) - Sleep 350ms before returning new_scheduler (provides 50ms+ safety margin) - This guarantees old_scheduler is idle while new_scheduler is not Result: - old_scheduler: 350ms old (> 300ms threshold) → definitely idle → removed - new_scheduler: ~0ms old (< 300ms threshold) → definitely not idle → kept - Test is now deterministic and matches expected checkpoint sequence Fixes anza-xyz#8279 Signed-off-by: AvhiMaz <avhimazumder5@outlook.com> * Address all reviewer feedback on test_scheduler_drop_idle - Move SHORTENED_MAX_POOLING_DURATION constant to test_scheduler_drop_stale where it's used - Convert test_max_pooling_duration from let to const TEST_MAX_POOLING_DURATION following conventions - Create TEST_WAIT_FOR_IDLE constant (500ms) with explicit safety margin tied to TEST_MAX_POOLING_DURATION - Increase safety margin from 50ms to 200ms for CI reliability - Add detailed comments explaining 300ms pooling duration tradeoff (speed vs race condition window) - Update assertion comment with explicit timing guarantees: old_scheduler ~500ms old (exceeds 300ms threshold), new_scheduler ~50ms old (below threshold) Signed-off-by: AvhiMaz <avhimazumder5@outlook.com> * refactor: tie test duration constants together and improve comment accuracy Signed-off-by: AvhiMaz <avhimazumder5@outlook.com> * fix: cargo fmt Signed-off-by: AvhiMaz <avhimazumder5@outlook.com> --------- Signed-off-by: AvhiMaz <avhimazumder5@outlook.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
test_scheduler_drop_idleseems to be flaky and hit this assert:See https://buildkite.com/anza/agave/builds/31204#01999eda-ad53-4c13-8e73-77882137f119 for an example.
This is causing CI to fail and/or retry and take a long time and slowing down merging into master
Summary of Changes
ignore this flaky test