Closed
Labels
- :Distributed Coordination/Allocation: All issues relating to the decision making around placing a shard (both master logic & on the nodes)
- Team:Distributed (Obsolete): Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination.
Description
The following tests are failing and need to be fixed before DesiredBalanceShardsAllocator can be merged to master.
- DesiredBalanceServiceTests.* (Fix DesiredBalanceServiceTests #86435)
- DesiredBalanceReconcilerTests.* (Fix DesiredBalanceReconcilerTests#testFailsNewPrimariesIfNoDataNodes #86432)
- ClusterAllocationExplainIT.testAllocationFilteringOnIndexCreation
- ClusterHealthIT.testHealthOnIndexCreation
- CorruptedFileIT.testCorruptionOnNetworkLayer (replica shard is not started and remains in an error state)
- CorruptedFileIT.testReplicaCorruption
- FilteringAllocationIT.testDecommissionNodeNoReplicas
- IndexFoldersDeletionListenerIT.testListenersInvokedWhenIndexHasLeftOverShard (small probability of getting stuck after `logger.debug("--> creating a new index [{}]", indexName);`)
- IndexRecoveryIT.testCancelNewShardRecoveryAndUsesExistingShardCopy
- IndexRecoveryIT.testDoNotInfinitelyWaitForMapping (timed out waiting for green state: `ALLOCATION_FAILED, failed shard on node [y0Jt0-QrSu29efTbYU8AdQ]: failed to create index, failure org.elasticsearch.index.mapper.MapperParsingException: simulate mapping parsing error`)
- IndexRecoveryIT.testCancelRecoveryWithAutoExpandReplicas (gets stuck after creating the [0-all] index in a cluster with a single master and no data nodes)
- RareClusterStateIT.testDeleteCreateInOneBulk (consistently times out when creating an index with a 0s timeout)
- RecoveryFromGatewayIT.testSingleNodeNoFlush
- ReplicaShardAllocatorIT.testDoNotCancelRecoveryForBrokenNode (timed out waiting for green state: `ALLOCATION_FAILED, failed recovery, failure org.elasticsearch.indices.recovery.RecoveryFailedException`)
- ReplicaShardAllocatorIT.testPreferCopyCanPerformNoopRecovery
- ReplicaShardAllocatorIT.testPreferCopyWithHighestMatchingOperations
- ReplicaShardAllocatorIT.testPeerRecoveryForClosedIndices (<5% probability)
- ReplicaShardAllocatorSyncIdIT.testPreferCopyCanPerformNoopRecovery
- SimpleIndexStateIT.testFastCloseAfterCreateContinuesCreateAfterOpen (~50% failure rate with `Expected: <RED> but: was <YELLOW>` when creating an index that could not be allocated)
- TransportSearchFailuresIT.testFailedSearchWithWrongQuery (~1% probability of timing out on `logger.info("Done Cluster Health, status {}", clusterHealth.getStatus());`; it looks closer to ~5% with `-Dtests.seed=F9E8E5F50A9C9B21`)
- UpdateShardAllocationSettingsIT.testUpdateSameHostSetting
- ClusterRerouteIT.testDelayWithALargeAmountOfShards (might rarely time out: with ~5% probability the shard balance does not converge for 250 shards over 3 data nodes). Related to: BalancedShardsAllocator rebalancing might move shards but not improve the balance #88384
- GetGlobalCheckpointsActionIT.testWaitOnIndexCreated (repeatedly failing)
- GetGlobalCheckpointsActionIT#testWaitOnPrimaryShardThrottled (`cluster.routing.allocation.node_initial_primaries_recoveries=0` prevents the balance from converging)
- org.elasticsearch.datastreams.DataStreamMigrationIT.testBasicMigration (times out when executing the migration; the listener is not called in the else branch)
- NodeShutdownShardsIT.testNodeReplacementOnlyAllowsShardsFromReplacedNode
- test {yaml=indices.split/30_copy_settings/Copy settings during split index}
- test {yaml=indices.shrink/30_copy_settings/Copy settings during shrink index}
- TransformAuditorIT.testAliasCreatedforBWCIndexes
org.elasticsearch.action.admin.indices.shrink.TransportResizeAction
- ShrinkIndexIT.testCreateShrinkIndexToN (`[NO(initial allocation of the shrunken index is only allowed on nodes [_id:"hg09_hMfS3uDUfv93xggmA"] that hold a copy of every shard in the index)]`)
- ShrinkIndexIT.testShrinkThenSplitWithFailedNode (`NO(initial allocation of the shrunken index is only allowed on nodes [_id:"eh7a8csCQzOwCePaFlw9xA"] that hold a copy of every shard in the index)`)
- SplitIndexIT.testCreateSplitIndexToN (`NO(source primary is allocated on another node)`)
- SplitIndexIT.testSplitFromOneToN (`NO(source primary is allocated on another node)`)
- SplitIndexIT.testSplitIndexPrimaryTerm (`NO(source primary is allocated on another node)`)
- PartitionedRoutingIT.testShrinking
HasFrozenCacheAllocationDecider
- Various searchable snapshot test failures due to throttling when `xpack.searchable.snapshot.shared_cache.size` is not yet reported
- FrozenExistenceDeciderIT.testZeroToOne fails for the same reason
MoveAllocationCommand usage
- ClusterRerouteIT.testClusterRerouteWithBlocks (uses `MoveAllocationCommand`)
- IndexPrimaryRelocationIT.testPrimaryRelocationWhileIndexing (uses `MoveAllocationCommand`)
- IndexRecoveryIT.testRerouteRecovery
- IndicesStoreIntegrationIT.testIndexCleanup (~10% chance of getting stuck when run individually)
- RelocationIT.testRelocationWhileIndexingRandom (uses `MoveAllocationCommand`)
- RelocationIT.testRelocationWhileRefreshing (uses `MoveAllocationCommand`)
Not retrying shard allocation after an error
`setWaitForNoRelocatingShards(true)` should wait for the desired balance to converge
- AwarenessAllocationIT.testAwarenessZonesIncrementalNodes (~10% chance: the health check with `setWaitForNoRelocatingShards(true)` does not wait for a pending desired balance computation)
Snapshot-related tests
- AbortedRestoreIT.testAbortedRestoreAlsoAbortFileRestores
- BlobStoreIncrementalityIT.testIncrementalBehaviorOnPrimaryFailover (20% chance of failure with `timed out waiting for green state`)
- FsBlobStoreRepositoryIntegTests.testSnapshotAndRestore
- IndicesOptionsIntegrationIT.testWildcardBehaviourSnapshotRestore
- MetadataLoadingDuringSnapshotRestoreIT.testWhenMetadataAreLoaded
- ConcurrentSnapshotsIT.testConcurrentRestoreDeleteAndClone
- CorruptedBlobStoreRepositoryIT.*
- DataStreamsSnapshotsIT.*
- DedicatedClusterSnapshotRestoreIT.*
- DiskThresholdDeciderIT.testRestoreSnapshotAllocationDoesNotExceedWatermark
- RestoreSnapshotIT.*
- SharedClusterSnapshotRestoreIT.testUnrestorableIndexDuringRestore (this test gets stuck when run individually)
- SnapshotCustomPluginStateIT.testIncludeGlobalState
- SnapshotStressTestsIT.testRandomActivities
- SystemDataStreamSnapshotIT.*
- SystemIndicesSnapshotIT.*
4308 integration tests passed.
ESAllocationTestCase-related unit test failures when using the desired balance allocator
- TrackFailedAllocationNodesTests » testTrackFailedNodes (retries allocation on different nodes)
- ClusterRebalanceRoutingTests » testClusterPrimariesActive1
- ClusterRebalanceRoutingTests » testAlways
- FailedShardsRoutingTests » testFailAllReplicasInitializingOnPrimaryFail (retries allocation on different node)
- FailedShardsRoutingTests » testRebalanceFailure
- FailedShardsRoutingTests » testFailAllReplicasInitializingOnPrimaryFailWhil... (retries allocation on different node)
- PrimaryElectionRoutingTests » testRemovingInitializingReplicasIfPrimariesFails
- InSyncAllocationIdTests » testInSyncAllocationIdsUpdated
- IndexBalanceTests » testBalanceIncrementallyStartNodes (unnecessary rebalancing)
- EnableAllocationTests » testEnableClusterBalance (using different nodes to allocate)
- RoutingIteratorTests » testNodeSelectorRouting (different iteration order)
- RoutingNodesIntegrityTests » testBalanceIncrementallyStartNodes (unnecessary rebalancing)
- RoutingNodesIntegrityTests » testBalanceAllNodesStartedAddIndex (using different nodes to allocate)