Description
Is your feature request related to a problem? Please describe.
Test retries were added as a tactical fix for non-deterministic test failures. However, retrying everything by default has allowed us to introduce new flaky tests without noticing. Retries should remain a tactical fix for existing flaky tests; newly added tests should never rely on them, or at least not by default.
Describe the solution you'd like
Identify flaky tests from the last month or so using the flaky test finder script or something similar, then explicitly retry only those tests in the test retry configuration. Specifically, this could look like:
// test retry configuration
subprojects {
  apply plugin: "org.gradle.test-retry"
  tasks.withType(Test).configureEach {
    retry {
      if (BuildParams.isCi()) {
        maxRetries = 3
        maxFailures = 10
      }
      failOnPassedAfterRetry = false
      classRetry {
        includeClasses.add("org.opensearch.action.admin.cluster.node.tasks.ResourceAwareTasksTests")
        includeClasses.add("org.opensearch.action.admin.indices.create.ShrinkIndexIT")
        includeClasses.add("org.opensearch.backwards.MixedClusterClientYamlTestSuiteIT")
        includeClasses.add("org.opensearch.cluster.ClusterHealthIT")
        includeClasses.add("org.opensearch.cluster.allocation.AwarenessAllocationIT")
        includeClasses.add("org.opensearch.cluster.allocation.ClusterRerouteIT")
        includeClasses.add("org.opensearch.cluster.metadata.IndexGraveyardTests")
        includeClasses.add("org.opensearch.cluster.routing.MovePrimaryFirstTests")
        includeClasses.add("org.opensearch.cluster.service.MasterServiceTests")
        includeClasses.add("org.opensearch.gateway.RecoveryFromGatewayIT")
        includeClasses.add("org.opensearch.http.SearchRestCancellationIT")
        includeClasses.add("org.opensearch.index.IndexServiceTests")
        includeClasses.add("org.opensearch.index.IndexSettingsTests")
        includeClasses.add("org.opensearch.index.SegmentReplicationPressureIT")
        includeClasses.add("org.opensearch.index.reindex.BulkByScrollResponseTests")
        includeClasses.add("org.opensearch.index.shard.RemoteStoreRefreshListenerTests")
        includeClasses.add("org.opensearch.index.translog.RemoteFSTranslogTests")
        includeClasses.add("org.opensearch.indices.replication.RemoteStoreReplicationSourceTests")
        includeClasses.add("org.opensearch.indices.replication.SegmentReplicationIT")
        includeClasses.add("org.opensearch.indices.replication.SegmentReplicationRelocationIT")
        includeClasses.add("org.opensearch.indices.replication.SegmentReplicationTargetServiceTests")
        includeClasses.add("org.opensearch.monitor.fs.FsHealthServiceTests")
        includeClasses.add("org.opensearch.remotestore.CreateRemoteIndexIT")
        includeClasses.add("org.opensearch.remotestore.CreateRemoteIndexTranslogDisabledIT")
        includeClasses.add("org.opensearch.remotestore.RemoteStoreIT")
        includeClasses.add("org.opensearch.remotestore.RemoteStoreStatsIT")
        includeClasses.add("org.opensearch.remotestore.SegmentReplicationRemoteStoreIT")
        includeClasses.add("org.opensearch.remotestore.SegmentReplicationUsingRemoteStoreIT")
        includeClasses.add("org.opensearch.remotestore.multipart.RemoteStoreMultipartIT")
        includeClasses.add("org.opensearch.repositories.azure.AzureBlobContainerRetriesTests")
        includeClasses.add("org.opensearch.search.ConcurrentSegmentSearchTimeoutIT")
        includeClasses.add("org.opensearch.search.SearchTimeoutIT")
        includeClasses.add("org.opensearch.search.SearchWeightedRoutingIT")
        includeClasses.add("org.opensearch.search.pit.DeletePitMultiNodeIT")
        includeClasses.add("org.opensearch.smoketest.SmokeTestMultiNodeClientYamlTestSuiteIT")
        includeClasses.add("org.opensearch.snapshots.DedicatedClusterSnapshotRestoreIT")
        includeClasses.add("org.opensearch.snapshots.SnapshotStatusApisIT")
      }
    }
  }
}
(The above list of tests was generated by taking all unique class names reported by the flaky test finder script using these parameters: ruby flaky-test-finder.rb -s 20000 -e 20597)
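One possible variation, offered as a rough sketch rather than a tested change: keep the class names in a plain text file so the list can be regenerated from the flaky test finder output without editing the build script. The file path `gradle/flaky-tests.txt` is hypothetical and does not exist in the repository today. The sketch also uses the plugin's `filter` block instead of `classRetry`; as far as I understand the test-retry plugin documentation, `filter` is what limits which tests are eligible for retry, while `classRetry` controls whether a failing class is retried as a whole unit, so it may be worth confirming which one matches the intent here. The CI guard from the snippet above is left out to keep the example short.

```groovy
// Hypothetical allow-list file: one fully qualified test class name per line.
// Blank lines and lines starting with '#' are ignored.
def flakyClasses = rootProject.file("gradle/flaky-tests.txt")
  .readLines()
  .collect { it.trim() }
  .findAll { it && !it.startsWith("#") }

subprojects {
  apply plugin: "org.gradle.test-retry"
  tasks.withType(Test).configureEach {
    retry {
      // Same limits as the proposal above; the CI-only guard is omitted for brevity.
      maxRetries = 3
      maxFailures = 10
      failOnPassedAfterRetry = false
      filter {
        // Only classes named in the allow-list are eligible for retry;
        // a failure in any other (for example, newly added) test fails the build.
        includeClasses.addAll(flakyClasses)
      }
    }
  }
}
```

With this layout, removing a test from the retry list once it has been fixed is a one-line change to the text file, and the diff of that file doubles as a record of which tests are still known to be flaky.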