@original-brownbear
Contributor

@original-brownbear commented Feb 2, 2019


* Marked as >non-issue because this never made it into a release
* The response type here is not empty and was always wrong, but this only became visible once 0a604e3 was introduced
   * As a result of 0a604e3 we started actually handling the response of this request and logging/handling exceptions. Before that, we quietly dropped the ClassCastException here by using the empty response handler
* Closes #38226
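
To illustrate the failure mode described above, here is a minimal, self-contained sketch of how a handler that ignores failures can hide a response type mismatch. Everything in it (`ResponseHandler`, `EmptyResponse`, `StatusResponse`, `deliver`) is a hypothetical stand-in invented for illustration, not the Elasticsearch transport API, and the real code path may differ in detail.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class SwallowedClassCastSketch {

    // Hypothetical stand-ins for two response types; these are not Elasticsearch classes.
    static final class EmptyResponse {}
    static final class StatusResponse {}

    // Hypothetical handler: declares the response type it expects plus two callbacks.
    interface ResponseHandler<T> {
        Class<T> responseType();
        void onResponse(T response);
        void onFailure(Exception e);
    }

    static <T> ResponseHandler<T> handler(Class<T> type, Consumer<T> ok, Consumer<Exception> failed) {
        return new ResponseHandler<T>() {
            @Override public Class<T> responseType() { return type; }
            @Override public void onResponse(T response) { ok.accept(response); }
            @Override public void onFailure(Exception e) { failed.accept(e); }
        };
    }

    // Casts the raw response to the handler's declared type; a mismatch is reported via onFailure.
    static <T> void deliver(Object raw, ResponseHandler<T> handler) {
        final T typed;
        try {
            typed = handler.responseType().cast(raw);
        } catch (ClassCastException e) {
            handler.onFailure(e);
            return;
        }
        handler.onResponse(typed);
    }

    static List<String> run() {
        List<String> events = new ArrayList<>();
        Object raw = new StatusResponse();
        // Wrong expected type, failures ignored: the mismatch leaves no trace at all.
        deliver(raw, handler(EmptyResponse.class, r -> events.add("silent: ok"), e -> { }));
        // Wrong expected type, failures recorded: the mismatch becomes visible.
        deliver(raw, handler(EmptyResponse.class, r -> events.add("logging: ok"),
            e -> events.add("logging: " + e.getClass().getSimpleName())));
        // Correct expected type: the response is handled normally.
        deliver(raw, handler(StatusResponse.class, r -> events.add("fixed: ok"),
            e -> events.add("fixed: " + e.getClass().getSimpleName())));
        return events;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

In this sketch the first handler produces no event, which mirrors why the wrong response type went unnoticed until failures were actually handled.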
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

Member

@dnhatn left a comment

LGTM. Thanks @original-brownbear.

@original-brownbear
Contributor Author

@dnhatn thanks for the quick review. Bad news though:
it seems that error was just a symptom (well, it's a bug too), and the test failed again in https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+pull-request-1/7125/consoleText

Looking into that now. If I can't fix it quickly, I'd suggest that I re-mute the tests and at least merge this fix to cut back on the noise? It seems like this could also randomly hit other tests that haven't failed yet.

@original-brownbear
Contributor Author

Fixed in 7b26929

The issue was that the changed assertBusy block died on a regular exception (rather than an AssertionError) when concurrently loading the repository data. I guess we could eventually look into making the behaviour of the FS repository nicer here. The failure:

ERROR   0.48s J3 | SharedClusterSnapshotRestoreIT.testAbortedSnapshotDuringInitDoesNotStart <<< FAILURES!
   > Throwable #1: UncategorizedExecutionException[Failed execution]; nested: NoSuchFileException[/var/lib/jenkins/workspace/elastic+elasticsearch+pull-request-1/server/build/testrun/integTest/J3/temp/org.elasticsearch.snapshots.SharedClusterSnapshotRestoreIT_DF78BC25D4400645-001/tempDir-002/repos/sYzVZglzXR/indices/qCbBefHbTamRgzueAMrRvg/meta-saB1MnWJSuaATpK4KqAAVQ.dat];
   > 	at __randomizedtesting.SeedInfo.seed([DF78BC25D4400645:A7A575D3E8E33D4B]:0)
   > 	at org.elasticsearch.common.util.concurrent.FutureUtils.rethrowExecutionException(FutureUtils.java:97)
   > 	at org.elasticsearch.common.util.concurrent.FutureUtils.get(FutureUtils.java:62)
   > 	at org.elasticsearch.action.support.AdapterActionFuture.actionGet(AdapterActionFuture.java:34)
   > 	at org.elasticsearch.action.ActionRequestBuilder.get(ActionRequestBuilder.java:52)
   > 	at org.elasticsearch.snapshots.SharedClusterSnapshotRestoreIT.lambda$testAbortedSnapshotDuringInitDoesNotStart$24(SharedClusterSnapshotRestoreIT.java:3686)
   > 	at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:846)
   > 	at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:832)
   > 	at org.elasticsearch.snapshots.SharedClusterSnapshotRestoreIT.testAbortedSnapshotDuringInitDoesNotStart(SharedClusterSnapshotRestoreIT.java:3685)
   > 	at java.lang.Thread.run(Thread.java:748)
   > Caused by: java.nio.file.NoSuchFileException: /var/lib/jenkins/workspace/elastic+elasticsearch+pull-request-1/server/build/testrun/integTest/J3/temp/org.elasticsearch.snapshots.SharedClusterSnapshotRestoreIT_DF78BC25D4400645-001/tempDir-002/repos/sYzVZglzXR/indices/qCbBefHbTamRgzueAMrRvg/meta-saB1MnWJSuaATpK4KqAAVQ.dat
   > 	at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
   > 	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
   > 	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
   > 	at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214)
   > 	at java.nio.file.Files.newByteChannel(Files.java:361)
   > 	at java.nio.file.Files.newByteChannel(Files.java:407)
   > 	at java.nio.file.spi.FileSystemProvider.newInputStream(FileSystemProvider.java:384)
   > 	at org.apache.lucene.mockfile.FilterFileSystemProvider.newInputStream(FilterFileSystemProvider.java:192)
   > 	at org.apache.lucene.mockfile.FilterFileSystemProvider.newInputStream(FilterFileSystemProvider.java:192)
   > 	at org.apache.lucene.mockfile.FilterFileSystemProvider.newInputStream(FilterFileSystemProvider.java:192)
   > 	at org.apache.lucene.mockfile.HandleTrackingFS.newInputStream(HandleTrackingFS.java:92)
   > 	at org.apache.lucene.mockfile.FilterFileSystemProvider.newInputStream(FilterFileSystemProvider.java:192)
   > 	at org.apache.lucene.mockfile.HandleTrackingFS.newInputStream(HandleTrackingFS.java:92)
   > 	at java.nio.file.Files.newInputStream(Files.java:152)
   > 	at org.elasticsearch.common.blobstore.fs.FsBlobContainer.readBlob(FsBlobContainer.java:120)
   > 	at org.elasticsearch.snapshots.mockstore.BlobContainerWrapper.readBlob(BlobContainerWrapper.java:48)
   > 	at org.elasticsearch.snapshots.mockstore.MockRepository$MockBlobStore$MockBlobContainer.readBlob(MockRepository.java:318)
   > 	at org.elasticsearch.repositories.blobstore.ChecksumBlobStoreFormat.readBlob(ChecksumBlobStoreFormat.java:101)
   > 	at org.elasticsearch.repositories.blobstore.BlobStoreFormat.read(BlobStoreFormat.java:90)
   > 	at org.elasticsearch.repositories.blobstore.BlobStoreRepository.getSnapshotIndexMetaData(BlobStoreRepository.java:576)
   > 	at org.elasticsearch.snapshots.SnapshotsService.snapshotShards(SnapshotsService.java:635)
   > 	at org.elasticsearch.action.admin.cluster.snapshots.status.TransportSnapshotsStatusAction.buildResponse(TransportSnapshotsStatusAction.java:236)
   > 	at org.elasticsearch.action.admin.cluster.snapshots.status.TransportSnapshotsStatusAction.masterOperation(TransportSnapshotsStatusAction.java:99)
   > 	at org.elasticsearch.action.admin.cluster.snapshots.status.TransportSnapshotsStatusAction.masterOperation(TransportSnapshotsStatusAction.java:60)
   > 	at org.elasticsearch.action.support.master.TransportMasterNodeAction.masterOperation(TransportMasterNodeAction.java:127)
   > 	at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$2.doRun(TransportMasterNodeAction.java:208)
   > 	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:751)
   > 	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
   > 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   > 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
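
The behaviour behind this failure can be sketched with a simplified retry loop. This is not the real `ESTestCase.assertBusy` (which works with timeouts and waits between attempts); `retryOnAssertionError`, `CheckedRunnable`, and `flaky` are hypothetical names used only here, and the file name is a placeholder. The sketch assumes, as the discussion suggests, that only an `AssertionError` triggers a retry while any other exception escapes immediately.

```java
import java.nio.file.NoSuchFileException;
import java.util.concurrent.atomic.AtomicInteger;

final class AssertBusySketch {

    // Hypothetical functional interface for a block that may throw a checked exception.
    @FunctionalInterface
    interface CheckedRunnable {
        void run() throws Exception;
    }

    // Simplified retry loop: only AssertionError causes a retry, anything else propagates at once.
    static void retryOnAssertionError(CheckedRunnable block, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        AssertionError last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                block.run();
                return;
            } catch (AssertionError e) {
                last = e;
                Thread.sleep(1); // short pause before the next attempt
            }
        }
        throw last;
    }

    // A block that throws NoSuchFileException for the first `failures` calls, then succeeds.
    static CheckedRunnable flaky(AtomicInteger calls, int failures) {
        return () -> {
            if (calls.incrementAndGet() <= failures) {
                throw new NoSuchFileException("placeholder.dat");
            }
        };
    }

    // Without wrapping, the first NoSuchFileException aborts the loop after a single call.
    static int attemptsUnwrapped() {
        AtomicInteger calls = new AtomicInteger();
        try {
            retryOnAssertionError(flaky(calls, 2), 5);
        } catch (Exception e) {
            // expected in this sketch: the exception escaped on the first attempt
        }
        return calls.get();
    }

    // With wrapping, the exception is rethrown as AssertionError, so the loop keeps retrying.
    static int attemptsWrapped() {
        AtomicInteger calls = new AtomicInteger();
        CheckedRunnable inner = flaky(calls, 2);
        try {
            retryOnAssertionError(() -> {
                try {
                    inner.run();
                } catch (Exception e) {
                    throw new AssertionError(e);
                }
            }, 5);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println("unwrapped attempts: " + attemptsUnwrapped());
        System.out.println("wrapped attempts: " + attemptsWrapped());
    }
}
```

In this sketch the unwrapped block is called once before the exception escapes, while the wrapped block is called until it succeeds on the third attempt.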

@original-brownbear
Contributor Author

@dnhatn can you take another look please? :)

@original-brownbear
Contributor Author

Jenkins run elasticsearch-ci/2

@original-brownbear added the >test label and removed the >non-issue label Feb 2, 2019
Member

@dnhatn left a comment

Thanks for an extra iteration @original-brownbear.

client.admin().cluster().prepareSnapshotStatus("repository").setSnapshots("snap").get();
assertThat(status.getSnapshots().iterator().next().getState(), equalTo(State.ABORTED));
} catch (Exception e) {
throw new AssertionError(e);
Member

Maybe add a comment saying that we rethrow as an AssertionError here to force assertBusy to retry.

Contributor Author

Sure, I added it :)

Contributor

@DaveCTurner Feb 3, 2019

@original-brownbear this assertBusy failed on a local ./gradlew check - here is the log from the test:
testAbortedSnapshotDuringInitDoesNotStart-failure.log.gz.

Contributor Author

@original-brownbear Feb 3, 2019

@DaveCTurner thanks! I think I have an idea where this is coming from still ... Will fix later/tomorrow :)

Labels

:Distributed Coordination/Snapshot/Restore (anything directly related to the `_snapshot/*` APIs), >test (issues or PRs that are addressing/adding tests), v6.7.0, v7.0.0-beta1

5 participants