Fix SearchableSnapshotsPersistentCacheIntegTests.testCacheSurviveRestart#66578
tlrx wants to merge 1 commit into elastic:master
Pinging @elastic/es-distributed (Team:Distributed)
original-brownbear left a comment
This PR looks like it will likely fix/improve the issue but I wonder if we shouldn't fix this in production code instead. Maybe my understanding of this is wrong, but when trying to reproduce the issue it turned out that:
    private void runIfShardMarkedAsEvictedInCache(ShardEviction shardEviction, Runnable runnable) {
        try (Releasable ignored = shardsEvictionLock.acquire(shardEviction)) {
            boolean success = false;
            try {
                if (evictedShards.remove(shardEviction)) {
                    runnable.run();
                }
                success = true;
            } finally {
                assert success : "shard eviction should be successful: " + shardEviction;
                if (success == false) {
                    final boolean added = evictedShards.add(shardEviction);
                    assert added : shardEviction;
                }
            }
        }
    }
trips the assertion on success, which leaves a listener waiting indefinitely (it trips because the CacheService is already shut down). If we ensure that the evictions go through before we kill the CacheService, we fix the test, but the assertion that's causing all of this still doesn't hold.
Shouldn't we either drop that assertion or, better though more involved, fix the shutdown of CacheService so that the assertion no longer trips?
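To make the first option concrete, here is a minimal sketch of what dropping the assertion could look like. Everything in it is hypothetical: `EvictionSketch`, the `String` shard key and the boolean return value are stand-ins, not the real CacheService API, and the per-shard lock from the snippet above is left out for brevity. It only illustrates that a failed eviction can be re-marked as pending and reported to the caller instead of tripping an assert.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for illustration only; not the real CacheService.
final class EvictionSketch {

    // Shards that are marked as evicted but whose cache files are not yet removed.
    private final Set<String> evictedShards = ConcurrentHashMap.newKeySet();

    void markAsEvicted(String shard) {
        evictedShards.add(shard);
    }

    boolean isPending(String shard) {
        return evictedShards.contains(shard);
    }

    /**
     * Runs the eviction if the shard is marked as evicted. Returns true only if
     * the runnable completed. On failure the shard is re-marked as pending and
     * the failure is reported through the return value, with no assertion.
     * Assumption: the real code would hold a per-shard lock around this.
     */
    boolean runIfMarkedAsEvicted(String shard, Runnable runnable) {
        if (evictedShards.remove(shard) == false) {
            return false; // nothing to do for this shard
        }
        boolean success = false;
        try {
            runnable.run();
            success = true;
        } catch (RuntimeException e) {
            // e.g. the service was already stopped; handled in finally
        } finally {
            if (success == false) {
                evictedShards.add(shard); // keep the shard pending for a later attempt
            }
        }
        return success;
    }
}
```

With this shape a caller waiting on a listener can be notified of the failure rather than waiting forever, but it does not address the underlying shutdown ordering, which is the second option.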
@original-brownbear Agreed. Henning already suggested improving runIfShardMarkedAsEvictedInCache and I have ideas for that. I'll open a PR.
I've opened #67160 which should fix the production code.
Previous attempt in #66354 wasn't enough to fix this test. The eviction of cache files is executed using a loop in the generic thread pool so it's possible that
succeed but not all cache files are evicted yet, and the last assertion of the test failed.
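For readers unfamiliar with this kind of race, the sketch below shows the general pattern of waiting for asynchronous work before asserting. It is not the change made in this PR: `AwaitEvictionSketch`, `waitUntil` and the counter standing in for cache files are all hypothetical, and a single-thread executor plays the role of the generic thread pool. In Elasticsearch tests, `assertBusy` from the test framework serves a similar polling purpose.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

// Hypothetical illustration; names and structure are assumptions.
final class AwaitEvictionSketch {

    /** Polls the condition until it holds or the timeout elapses. */
    static boolean waitUntil(BooleanSupplier condition, long timeoutMillis) {
        final long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (condition.getAsBoolean() == false) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    /** Evicts the given number of "files" in a background loop and returns how many remain after waiting. */
    static int evictAllAndCountRemaining(int files) {
        final AtomicInteger remaining = new AtomicInteger(files);
        final ExecutorService generic = Executors.newSingleThreadExecutor();
        try {
            // Eviction runs in a loop on another thread, so it may not be done
            // by the time the submitting thread continues.
            generic.execute(() -> {
                while (remaining.get() > 0) {
                    remaining.decrementAndGet();
                }
            });
            // Checking remaining.get() right here could race with the loop;
            // wait for it to reach zero first.
            waitUntil(() -> remaining.get() == 0, 5_000);
            return remaining.get();
        } finally {
            generic.shutdownNow();
        }
    }
}
```

A test built this way asserts only after the background eviction has had a bounded amount of time to finish, which avoids the intermittent failure described above at the cost of a polling delay.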
Closes #66278