SharedClusterSnapshotRestoreIT#testSnapshotCanceledOnRemovedShard can fail with a timeout #37888

@jtibshirani

I haven't been able to reproduce the failure locally. This may be related to #37005, but the failure mode appears different.


Link to the build: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+release-tests/379/

Command to reproduce:

./gradlew :server:integTest \
  -Dtests.seed=6A71FF1FC78E423D \
  -Dtests.class=org.elasticsearch.snapshots.SharedClusterSnapshotRestoreIT \
  -Dtests.method="testSnapshotCanceledOnRemovedShard" \
  -Dtests.security.manager=true \
  -Dbuild.snapshot=false \
  -Dtests.jvm.argline="-Dbuild.snapshot=false" \
  -Dtests.locale=sk \
  -Dtests.timezone=AGT \
  -Dcompiler.java=11 \
  -Druntime.java=8

Relevant excerpt from the logs:

  1> [2019-01-25T13:47:04,288][WARN ][o.e.i.c.IndicesClusterStateService] [node_s0] [[test-idx][0]] marking and sending shard failed due to [master marked shard as active, but shard has not been created, mark shard as failed]
  1> [2019-01-25T13:47:04,289][WARN ][o.e.c.r.a.AllocationService] [node_s0] failing shard [failed shard, shard [test-idx][0], node[hlnLSh4HQB2zjelilc-Cpw], [P], s[STARTED], a[id=zpjFxvKvTKSdteVl-WwdjQ], message [master marked shard as active, but shard has not been created, mark shard as failed], failure [Unknown], markAsStale [true]]
  1> [2019-01-25T13:47:04,290][INFO ][o.e.c.r.a.AllocationService] [node_s0] Cluster health status changed from [YELLOW] to [RED] (reason: [shards failed [[test-idx][0]] ...]).
  1> [2019-01-25T13:47:04,377][INFO ][o.e.c.r.a.AllocationService] [node_s0] Cluster health status changed from [RED] to [YELLOW] (reason: [shards started [[test-idx][0]] ...]).
  1> [2019-01-25T13:47:04,383][INFO ][o.e.s.SharedClusterSnapshotRestoreIT] [testSnapshotCanceledOnRemovedShard] [SharedClusterSnapshotRestoreIT#testSnapshotCanceledOnRemovedShard]: cleaning up after test
  1> [2019-01-25T13:47:04,398][INFO ][o.e.c.m.MetaDataDeleteIndexService] [node_s0] [test-idx/FzaAc9xuTISiciuyn9EyAA] deleting index
  1> [2019-01-25T13:47:04,421][INFO ][o.e.c.m.MetaDataIndexTemplateService] [node_s0] removing template [random_index_template]
  1> [2019-01-25T13:47:04,442][INFO ][o.e.r.RepositoriesService] [node_s0] delete repository [test-repo]
  1> [2019-01-25T13:47:04,459][INFO ][o.e.s.SharedClusterSnapshotRestoreIT] [testSnapshotCanceledOnRemovedShard] [SharedClusterSnapshotRestoreIT#testSnapshotCanceledOnRemovedShard]: cleaned up after test
  1> [2019-01-25T13:47:04,459][INFO ][o.e.s.SharedClusterSnapshotRestoreIT] [testSnapshotCanceledOnRemovedShard] after test
FAILURE 14.0s J6 | SharedClusterSnapshotRestoreIT.testSnapshotCanceledOnRemovedShard <<< FAILURES!
   > Throwable #1: java.lang.AssertionError: Timeout!!!
   > 	at __randomizedtesting.SeedInfo.seed([6A71FF1FC78E423D:D288D05BCBF00CD1]:0)
   > 	at org.elasticsearch.snapshots.AbstractSnapshotIntegTestCase.waitForCompletion(AbstractSnapshotIntegTestCase.java:134)
   > 	at org.elasticsearch.snapshots.SharedClusterSnapshotRestoreIT.testSnapshotCanceledOnRemovedShard(SharedClusterSnapshotRestoreIT.java:3231)

Full build logs: snapshot-failure.txt
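
For context on the failure mode: the trace ends in a `waitForCompletion` helper that raises `AssertionError: Timeout!!!`, which suggests a wait loop gave up before the snapshot reached a completed state. Below is a minimal sketch of that kind of poll-with-deadline helper. It is not the Elasticsearch implementation; the class name, the `BooleanSupplier` condition, and the timeout and poll-interval parameters are all assumptions made for illustration.

```java
import java.util.function.BooleanSupplier;

// Hypothetical illustration only; not code from AbstractSnapshotIntegTestCase.
public class WaitForCompletionSketch {

    /**
     * Polls isDone until it returns true or timeoutMillis has elapsed.
     * On expiry it throws AssertionError("Timeout!!!"), the same message
     * that appears in the stack trace above.
     */
    public static void waitForCompletion(BooleanSupplier isDone, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        // Compare via subtraction so the check stays correct if nanoTime wraps.
        while (deadline - System.nanoTime() > 0) {
            if (isDone.getAsBoolean()) {
                return;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and fail rather than keep polling.
                Thread.currentThread().interrupt();
                throw new AssertionError("Interrupted while waiting");
            }
        }
        throw new AssertionError("Timeout!!!");
    }

    public static void main(String[] args) {
        // A condition that becomes true after roughly 50 ms: the wait returns normally.
        long start = System.nanoTime();
        waitForCompletion(() -> System.nanoTime() - start > 50_000_000L, 1_000, 10);
        System.out.println("completed");

        // A condition that never becomes true: the wait fails with the timeout error.
        try {
            waitForCompletion(() -> false, 100, 10);
        } catch (AssertionError e) {
            System.out.println("failed: " + e.getMessage());
        }
    }
}
```

With a helper of this shape, a snapshot that never leaves an in-progress state surfaces only as a generic timeout, so the cause has to be found in the cluster logs rather than in the assertion message.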
