[CI] BWC failures for :qa:full-cluster-restart testOperationBasedRecovery #51274

@williamrandolph

Example build failure

Jenkins build: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+default-distro+bwc/BWC_VERSION=7.2.1,nodes=centos-7&&immutable/502/
Build scan: https://gradle-enterprise.elastic.co/s/o257sqrvlr37k

Reproduction line

When I tried to reproduce this locally, I got an unrelated error. Perhaps I'm not set up correctly for these integration tests.

REPRODUCE WITH: ./gradlew ':qa:full-cluster-restart:v7.2.1#upgradedClusterTest' --tests "org.elasticsearch.upgrades.FullClusterRestartIT.testOperationBasedRecovery" \
  -Dtests.seed=E537B7AF25FA7DE1 \
  -Dtests.security.manager=true \
  -Dtests.locale=es-AR \
  -Dtests.timezone=America/Buenos_Aires \
  -Dtests.distribution=default \
  -Dcompiler.java=13

REPRODUCE WITH: ./gradlew ':x-pack:qa:full-cluster-restart:v7.2.1#upgradedClusterTest' --tests "org.elasticsearch.xpack.restart.CoreFullClusterRestartIT.testOperationBasedRecovery" \
  -Dtests.seed=E537B7AF25FA7DE1 \
  -Dtests.security.manager=true \
  -Dtests.locale=zh-TW \
  -Dtests.timezone=MIT \
  -Dtests.distribution=default \
  -Dcompiler.java=13

Example relevant log:

java.lang.AssertionError: 
Expected: an empty collection
     but: <[{name=_0.cfe, length_in_bytes=405, reused=false, recovered_in_bytes=405}, {name=_0.si, length_in_bytes=383, reused=false, recovered_in_bytes=383}, {name=_0_2_Lucene80_0.dvm, length_in_bytes=160, reused=false, recovered_in_bytes=160}, {name=_2.si, length_in_bytes=383, reused=false, recovered_in_bytes=383}, {name=_0.cfs, length_in_bytes=4542, reused=false, recovered_in_bytes=4542}, {name=_2.cfe, length_in_bytes=405, reused=false, recovered_in_bytes=405}, {name=_0_2.fnm, length_in_bytes=906, reused=false, recovered_in_bytes=906}, {name=_2.cfs, length_in_bytes=2637, reused=false, recovered_in_bytes=2637}, {name=_0_2_Lucene80_0.dvd, length_in_bytes=97, reused=false, recovered_in_bytes=97}, {name=segments_5, length_in_bytes=443, reused=false, recovered_in_bytes=443}]>
	at __randomizedtesting.SeedInfo.seed([E537B7AF25FA7DE1:F8DBBB8A237C8796]:0)
	at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
	at org.junit.Assert.assertThat(Assert.java:956)
	at org.junit.Assert.assertThat(Assert.java:923)
	at org.elasticsearch.test.rest.ESRestTestCase.assertNoFileBasedRecovery(ESRestTestCase.java:1137)
	at org.elasticsearch.upgrades.FullClusterRestartIT.testOperationBasedRecovery(FullClusterRestartIT.java:1295)
	[…]
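
For readers unfamiliar with this assertion: the message above lists per-file recovery details (`name`, `length_in_bytes`, `reused`, `recovered_in_bytes`), and the check expected that list to be empty, meaning no files should have been copied during recovery. The following is only a rough Python sketch of that kind of check, not the actual `ESRestTestCase.assertNoFileBasedRecovery` code. The response layout (`shards`, `type`, `index.files.details`), the function name, and the index name `test-index` are assumptions made for illustration; the two file entries are taken from the log above.

```python
from typing import Any, Dict, List


def files_listed_in_peer_recoveries(
    recovery_response: Dict[str, Any], index_name: str
) -> List[Dict[str, Any]]:
    """Collect file details reported for peer recoveries of one index.

    Assumes a response shaped like {index: {"shards": [{"type": ...,
    "index": {"files": {"details": [...]}}}]}}. This layout is an
    assumption for the sketch, not a verified contract.
    """
    listed: List[Dict[str, Any]] = []
    shards = recovery_response.get(index_name, {}).get("shards", [])
    for shard in shards:
        # Only peer recoveries are of interest; skip store or snapshot recoveries.
        if str(shard.get("type", "")).upper() != "PEER":
            continue
        details = shard.get("index", {}).get("files", {}).get("details", [])
        listed.extend(details)
    return listed


# Hypothetical response; the two file entries mirror the failure message above.
sample = {
    "test-index": {
        "shards": [
            {
                "type": "PEER",
                "primary": False,
                "index": {"files": {"details": [
                    {"name": "_0.cfe", "length_in_bytes": 405,
                     "reused": False, "recovered_in_bytes": 405},
                    {"name": "segments_5", "length_in_bytes": 443,
                     "reused": False, "recovered_in_bytes": 443},
                ]}},
            },
            {
                "type": "EXISTING_STORE",
                "primary": True,
                "index": {"files": {"details": []}},
            },
        ]
    }
}

leftover = files_listed_in_peer_recoveries(sample, "test-index")
print([f["name"] for f in leftover])  # prints ['_0.cfe', 'segments_5']
```

A non-empty result corresponds to the failure reported here: the recovery copied segment files instead of replaying operations only.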

Frequency

This failure began to appear yesterday and has recurred across several cycles of scheduled BWC tests. We have 42 failures so far, according to build tests. The first failures came just after #51189 was merged. That PR is titled "Use Lucene index in peer recovery and resync" and touched the failing FullClusterRestartIT class, so it is a likely place to start looking.

Metadata

Labels

:Distributed Indexing/Recovery: Anything around constructing a new shard, either from a local or a remote source.
>test-failure: Triaged test failures from CI
