Retry only known-flaky tests #8798

@andrross

Is your feature request related to a problem? Please describe.
Test retries were added as a tactical fix for non-deterministic test failures. However, retrying every test by default has also allowed us to introduce new flaky tests. Retries should remain a tactical fix for existing flaky tests; we should never rely on them for newly added tests, or at least not by default.

Describe the solution you'd like
Identify flaky tests from the last month or so using the flaky test finder script or something similar, then explicitly retry only those tests in the test retry configuration. Specifically, this could look like:

// test retry configuration
subprojects {
  apply plugin: "org.gradle.test-retry"
  tasks.withType(Test).configureEach {
    retry {
      if (BuildParams.isCi()) {
        maxRetries = 3
        maxFailures = 10
      }
      failOnPassedAfterRetry = false
      // Only classes matched by this filter are eligible for retry
      filter {
        includeClasses.add("org.opensearch.action.admin.cluster.node.tasks.ResourceAwareTasksTests")
        includeClasses.add("org.opensearch.action.admin.indices.create.ShrinkIndexIT")
        includeClasses.add("org.opensearch.backwards.MixedClusterClientYamlTestSuiteIT")
        includeClasses.add("org.opensearch.cluster.ClusterHealthIT")
        includeClasses.add("org.opensearch.cluster.allocation.AwarenessAllocationIT")
        includeClasses.add("org.opensearch.cluster.allocation.ClusterRerouteIT")
        includeClasses.add("org.opensearch.cluster.metadata.IndexGraveyardTests")
        includeClasses.add("org.opensearch.cluster.routing.MovePrimaryFirstTests")
        includeClasses.add("org.opensearch.cluster.service.MasterServiceTests")
        includeClasses.add("org.opensearch.gateway.RecoveryFromGatewayIT")
        includeClasses.add("org.opensearch.http.SearchRestCancellationIT")
        includeClasses.add("org.opensearch.index.IndexServiceTests")
        includeClasses.add("org.opensearch.index.IndexSettingsTests")
        includeClasses.add("org.opensearch.index.SegmentReplicationPressureIT")
        includeClasses.add("org.opensearch.index.reindex.BulkByScrollResponseTests")
        includeClasses.add("org.opensearch.index.shard.RemoteStoreRefreshListenerTests")
        includeClasses.add("org.opensearch.index.translog.RemoteFSTranslogTests")
        includeClasses.add("org.opensearch.indices.replication.RemoteStoreReplicationSourceTests")
        includeClasses.add("org.opensearch.indices.replication.SegmentReplicationIT")
        includeClasses.add("org.opensearch.indices.replication.SegmentReplicationRelocationIT")
        includeClasses.add("org.opensearch.indices.replication.SegmentReplicationTargetServiceTests")
        includeClasses.add("org.opensearch.monitor.fs.FsHealthServiceTests")
        includeClasses.add("org.opensearch.remotestore.CreateRemoteIndexIT")
        includeClasses.add("org.opensearch.remotestore.CreateRemoteIndexTranslogDisabledIT")
        includeClasses.add("org.opensearch.remotestore.RemoteStoreIT")
        includeClasses.add("org.opensearch.remotestore.RemoteStoreStatsIT")
        includeClasses.add("org.opensearch.remotestore.SegmentReplicationRemoteStoreIT")
        includeClasses.add("org.opensearch.remotestore.SegmentReplicationUsingRemoteStoreIT")
        includeClasses.add("org.opensearch.remotestore.multipart.RemoteStoreMultipartIT")
        includeClasses.add("org.opensearch.repositories.azure.AzureBlobContainerRetriesTests")
        includeClasses.add("org.opensearch.search.ConcurrentSegmentSearchTimeoutIT")
        includeClasses.add("org.opensearch.search.SearchTimeoutIT")
        includeClasses.add("org.opensearch.search.SearchWeightedRoutingIT")
        includeClasses.add("org.opensearch.search.pit.DeletePitMultiNodeIT")
        includeClasses.add("org.opensearch.smoketest.SmokeTestMultiNodeClientYamlTestSuiteIT")
        includeClasses.add("org.opensearch.snapshots.DedicatedClusterSnapshotRestoreIT")
        includeClasses.add("org.opensearch.snapshots.SnapshotStatusApisIT")
      }
    }
  }
}

(The above list of tests was generated by taking all unique class names reported by the flaky test finder script using these parameters: ruby flaky-test-finder.rb -s 20000 -e 20597)
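
For illustration, the "take all unique class names" step could be sketched roughly as below. This is only a sketch under assumptions: `include_class_lines` is a hypothetical helper, not part of flaky-test-finder.rb, and it assumes the class names have already been extracted into plain text with one fully qualified class name per line (the script's real output format may differ and need extra parsing first).

```ruby
# Hypothetical helper: turn a list of reported test class names into
# includeClasses lines for the retry configuration.
# Assumption: report_text holds one fully qualified class name per line.
def include_class_lines(report_text, indent: 8)
  class_names = report_text.lines
                           .map(&:strip)      # drop surrounding whitespace
                           .reject(&:empty?)  # skip blank lines
                           .uniq              # keep each class once
                           .sort              # stable, diff-friendly order
  class_names.map { |name| "#{' ' * indent}includeClasses.add(\"#{name}\")" }
end

# Small made-up input with one duplicate entry.
sample = <<~REPORT
  org.opensearch.search.SearchTimeoutIT
  org.opensearch.cluster.ClusterHealthIT
  org.opensearch.search.SearchTimeoutIT
REPORT

puts include_class_lines(sample)
```

Sorting the names keeps the generated list stable between runs, so regenerating it for a new build range produces a small, reviewable diff.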

Labels

:test (Adding or fixing a test), Build Libraries & Interfaces, enhancement (Enhancement or improvement to existing feature or request), flaky-test (Random test failure that succeeds on second run)