Fix failing tests on `feature/desired-balance-allocator` branch

The following tests are failing and need to be fixed before `DesiredBalanceShardsAllocator` can be merged to master.

- [x] DesiredBalanceServiceTests.* (https://github.com/elastic/elasticsearch/pull/86435)
- [x] DesiredBalanceReconcilerTests.* (https://github.com/elastic/elasticsearch/pull/86432)
- [x] ClusterAllocationExplainIT. testAllocationFilteringOnIndexCreation
- [x] ClusterHealthIT. testHealthOnIndexCreation
- [x] CorruptedFileIT. testCorruptionOnNetworkLayer (Replica shard is not started and remains in error state)
- [x] CorruptedFileIT. testReplicaCorruption
- [x] FilteringAllocationIT. testDecommissionNodeNoReplicas
- [x] IndexFoldersDeletionListenerIT. testListenersInvokedWhenIndexHasLeftOverShard (small probability of getting stuck after `logger.debug("--> creating a new index [{}]", indexName);`)
- [x] IndexRecoveryIT. testCancelNewShardRecoveryAndUsesExistingShardCopy
- [x] IndexRecoveryIT. testDoNotInfinitelyWaitForMapping (timed out waiting for green state: `ALLOCATION_FAILED, failed shard on node [y0Jt0-QrSu29efTbYU8AdQ]: failed to create index, failure org.elasticsearch.index.mapper.MapperParsingException: simulate mapping parsing error`)
- [x] IndexRecoveryIT. testCancelRecoveryWithAutoExpandReplicas (gets stuck after creating the [0-all] index in a cluster with a single master and no data nodes)
- [x] RareClusterStateIT. testDeleteCreateInOneBulk (consistently times out when creating an index with a 0s timeout)
- [x] RecoveryFromGatewayIT. testSingleNodeNoFlush
- [x] ReplicaShardAllocatorIT. testDoNotCancelRecoveryForBrokenNode (timed out waiting for green state: `ALLOCATION_FAILED, failed recovery, failure org.elasticsearch.indices.recovery.RecoveryFailedException`)
- [x] ReplicaShardAllocatorIT. testPreferCopyCanPerformNoopRecovery
- [x] ReplicaShardAllocatorIT. testPreferCopyWithHighestMatchingOperations
- [x] ReplicaShardAllocatorIT. testPeerRecoveryForClosedIndices (<5% probability)
- [x] ReplicaShardAllocatorSyncIdIT. testPreferCopyCanPerformNoopRecovery
- [x] SimpleIndexStateIT. testFastCloseAfterCreateContinuesCreateAfterOpen (~50% failure rate with `Expected: <RED> but: was <YELLOW>` when creating index that could not be allocated)
- [x] TransportSearchFailuresIT. testFailedSearchWithWrongQuery (~1% probability of timing out on `logger.info("Done Cluster Health, status {}", clusterHealth.getStatus());`; appears more likely, ~5%, with `-Dtests.seed=F9E8E5F50A9C9B21`)
- [x] UpdateShardAllocationSettingsIT. testUpdateSameHostSetting
- [x] ClusterRerouteIT. testDelayWithALargeAmountOfShards (may rarely time out: shard balance does not converge for 250 shards over 3 data nodes with ~5% probability). Related to: #88384
- [x] GetGlobalCheckpointsActionIT. testWaitOnIndexCreated (repeatedly failing)
- [x] GetGlobalCheckpointsActionIT#testWaitOnPrimaryShardThrottled (`cluster.routing.allocation.node_initial_primaries_recoveries=0` prevents balance from converging)
- [x] org.elasticsearch.datastreams.DataStreamMigrationIT. testBasicMigration (times out when executing the migration; the listener is not called in the else branch)
- [x] NodeShutdownShardsIT. testNodeReplacementOnlyAllowsShardsFromReplacedNode
- [x] test {yaml=indices.split/30_copy_settings/Copy settings during split index}
- [x] test {yaml=indices.shrink/30_copy_settings/Copy settings during shrink index}
- [x] TransformAuditorIT.testAliasCreatedforBWCIndexes

### org.elasticsearch.action.admin.indices.shrink.TransportResizeAction

- [x] ShrinkIndexIT. testCreateShrinkIndexToN (`[NO(initial allocation of the shrunken index is only allowed on nodes [_id:"hg09_hMfS3uDUfv93xggmA"] that hold a copy of every shard in the index)]`)
- [x] ShrinkIndexIT. testShrinkThenSplitWithFailedNode (`NO(initial allocation of the shrunken index is only allowed on nodes [_id:"eh7a8csCQzOwCePaFlw9xA"] that hold a copy of every shard in the index)`)
- [x] SplitIndexIT. testCreateSplitIndexToN (`NO(source primary is allocated on another node)`)
- [x] SplitIndexIT. testSplitFromOneToN (`NO(source primary is allocated on another node)`)
- [x] SplitIndexIT. testSplitIndexPrimaryTerm (`NO(source primary is allocated on another node)`)
- [x] PartitionedRoutingIT. testShrinking

### HasFrozenCacheAllocationDecider

- [x] various searchable snapshot test failures due to throttling when `xpack.searchable.snapshot.shared_cache.size` is not yet reported
- [x] FrozenExistenceDeciderIT. testZeroToOne fails for the same reason

### MoveAllocationCommand usage

- [x] ClusterRerouteIT. testClusterRerouteWithBlocks (uses `MoveAllocationCommand`)
- [x] IndexPrimaryRelocationIT. testPrimaryRelocationWhileIndexing (uses `MoveAllocationCommand`)
- [x] IndexRecoveryIT. testRerouteRecovery
- [x] IndicesStoreIntegrationIT. testIndexCleanup (~10% chance of getting stuck when running individually)
- [x] RelocationIT. testRelocationWhileIndexingRandom (MoveAllocationCommand)
- [x] RelocationIT. testRelocationWhileRefreshing (MoveAllocationCommand)

### not retrying shard allocation after an error

### setWaitForNoRelocatingShards(true) should wait for desired balance to converge

- [x] AwarenessAllocationIT. testAwarenessZonesIncrementalNodes (a health request with `setWaitForNoRelocatingShards(true)` does not wait for a pending desired balance computation; ~10% chance)

### Snapshot related tests

- [x] AbortedRestoreIT. testAbortedRestoreAlsoAbortFileRestores
- [x] BlobStoreIncrementalityIT. testIncrementalBehaviorOnPrimaryFailover (20% chance of failure with `timed out waiting for green state`)
- [x] FsBlobStoreRepositoryIntegTests. testSnapshotAndRestore
- [x] IndicesOptionsIntegrationIT. testWildcardBehaviourSnapshotRestore
- [x] MetadataLoadingDuringSnapshotRestoreIT. testWhenMetadataAreLoaded
- [x] ConcurrentSnapshotsIT. testConcurrentRestoreDeleteAndClone
- [x] CorruptedBlobStoreRepositoryIT. *
- [x] DataStreamsSnapshotsIT. *
- [x] DedicatedClusterSnapshotRestoreIT. *
- [x] DiskThresholdDeciderIT. testRestoreSnapshotAllocationDoesNotExceedWatermark
- [x] RestoreSnapshotIT. *
- [x] SharedClusterSnapshotRestoreIT.testUnrestorableIndexDuringRestore (this test gets stuck when running individually)
- [x] SnapshotCustomPluginStateIT. testIncludeGlobalState
- [x] SnapshotStressTestsIT. testRandomActivities
- [x] SystemDataStreamSnapshotIT. *
- [x] SystemIndicesSnapshotIT. *

4308 integration tests passed.

### ESAllocationTestCase related unit test failures when using desired balance allocator

- [x] [TrackFailedAllocationNodesTests » testTrackFailedNodes](https://gradle-enterprise.elastic.co/s/xddpcxmjy42a2/tests/:server:test/org.elasticsearch.cluster.routing.allocation.TrackFailedAllocationNodesTests/testTrackFailedNodes) (retries allocation on different nodes)
- [x] [ClusterRebalanceRoutingTests » testClusterPrimariesActive1](https://gradle-enterprise.elastic.co/s/xddpcxmjy42a2/tests/:server:test/org.elasticsearch.cluster.routing.allocation.ClusterRebalanceRoutingTests/testClusterPrimariesActive1)
- [x] [ClusterRebalanceRoutingTests » testAlways](https://gradle-enterprise.elastic.co/s/xddpcxmjy42a2/tests/:server:test/org.elasticsearch.cluster.routing.allocation.ClusterRebalanceRoutingTests/testAlways)
- [x] [FailedShardsRoutingTests » testFailAllReplicasInitializingOnPrimaryFail](https://gradle-enterprise.elastic.co/s/xddpcxmjy42a2/tests/:server:test/org.elasticsearch.cluster.routing.allocation.FailedShardsRoutingTests/testFailAllReplicasInitializingOnPrimaryFail) (retries allocation on different node)
- [x] [FailedShardsRoutingTests » testRebalanceFailure](https://gradle-enterprise.elastic.co/s/xddpcxmjy42a2/tests/:server:test/org.elasticsearch.cluster.routing.allocation.FailedShardsRoutingTests/testRebalanceFailure)
- [x] [FailedShardsRoutingTests » testFailAllReplicasInitializingOnPrimaryFailWhil...](https://gradle-enterprise.elastic.co/s/xddpcxmjy42a2/tests/:server:test/org.elasticsearch.cluster.routing.allocation.FailedShardsRoutingTests/testFailAllReplicasInitializingOnPrimaryFailWhileHavingAReplicaToElect) (retries allocation on different node)
- [x] [PrimaryElectionRoutingTests » testRemovingInitializingReplicasIfPrimariesFails](https://gradle-enterprise.elastic.co/s/xddpcxmjy42a2/tests/:server:test/org.elasticsearch.cluster.routing.allocation.PrimaryElectionRoutingTests/testRemovingInitializingReplicasIfPrimariesFails)
- [x] [InSyncAllocationIdTests » testInSyncAllocationIdsUpdated](https://gradle-enterprise.elastic.co/s/xddpcxmjy42a2/tests/:server:test/org.elasticsearch.cluster.routing.allocation.InSyncAllocationIdTests/testInSyncAllocationIdsUpdated)
- [x] [IndexBalanceTests » testBalanceIncrementallyStartNodes](https://gradle-enterprise.elastic.co/s/xddpcxmjy42a2/tests/:server:test/org.elasticsearch.cluster.routing.allocation.IndexBalanceTests/testBalanceIncrementallyStartNodes) (unnecessary rebalancing)
- [x] [EnableAllocationTests » testEnableClusterBalance](https://gradle-enterprise.elastic.co/s/xddpcxmjy42a2/tests/:server:test/org.elasticsearch.cluster.routing.allocation.decider.EnableAllocationTests/testEnableClusterBalance) (using different nodes to allocate)
- [x] [RoutingIteratorTests » testNodeSelectorRouting](https://gradle-enterprise.elastic.co/s/xddpcxmjy42a2/tests/:server:test/org.elasticsearch.cluster.structure.RoutingIteratorTests/testNodeSelectorRouting) (different order of iterating)
- [x] [RoutingNodesIntegrityTests » testBalanceIncrementallyStartNodes](https://gradle-enterprise.elastic.co/s/xddpcxmjy42a2/tests/:server:test/org.elasticsearch.cluster.routing.allocation.RoutingNodesIntegrityTests/testBalanceIncrementallyStartNodes) (unnecessary rebalancing)
- [x] [RoutingNodesIntegrityTests » testBalanceAllNodesStartedAddIndex](https://gradle-enterprise.elastic.co/s/xddpcxmjy42a2/tests/:server:test/org.elasticsearch.cluster.routing.allocation.RoutingNodesIntegrityTests/testBalanceAllNodesStartedAddIndex) (using different nodes to allocate)

Fix failing tests on `feature/desired-balance-allocator` branch #86429