Fix ActionListener.map exception handling #50886

henningandersen · 2020-01-11T16:21:36Z

map would call listener.onFailure for exceptions from
listener.onResponse, but this means we could double trigger some
listeners which is generally unexpected. Instead, we should assume that
a listener's onResponse (and onFailure) implementation is responsible
for its own exception handling.

This affects many APIs across the code base. This is the first of a series
of PRs changing exception handling to adhere to the principle that an
ActionListener implementation is responsible for its own exception handling.
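
To make the difference concrete, here is a minimal sketch of the old and the new `map` behavior. It assumes a simplified `Listener` interface standing in for `ActionListener`; the names `mapOld`, `mapNew` and `run` are hypothetical and exist only for this illustration. The real change also adds an assertion around the delegate call, which is left out here so the sketch behaves the same with and without `-ea`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

final class MapExceptionDemo {

    /** Simplified stand-in for ActionListener (an assumption, not the real interface). */
    interface Listener<T> {
        void onResponse(T response);
        void onFailure(Exception e);
    }

    /** Old behavior: an exception thrown by delegate.onResponse is routed to delegate.onFailure. */
    static <T, R> Listener<T> mapOld(Listener<R> delegate, Function<T, R> fn) {
        return new Listener<T>() {
            @Override public void onResponse(T response) {
                try {
                    delegate.onResponse(fn.apply(response));
                } catch (Exception e) {
                    delegate.onFailure(e); // also fires when onResponse itself threw
                }
            }
            @Override public void onFailure(Exception e) {
                delegate.onFailure(e);
            }
        };
    }

    /** New behavior: only failures of the mapping function are routed to onFailure. */
    static <T, R> Listener<T> mapNew(Listener<R> delegate, Function<T, R> fn) {
        return new Listener<T>() {
            @Override public void onResponse(T response) {
                R mapped;
                try {
                    mapped = fn.apply(response);
                } catch (Exception e) {
                    delegate.onFailure(e);
                    return;
                }
                delegate.onResponse(mapped); // the delegate handles its own exceptions
            }
            @Override public void onFailure(Exception e) {
                delegate.onFailure(e);
            }
        };
    }

    /** Records every callback; onResponse throws to simulate a listener with a bug. */
    static List<String> run(boolean useOld) {
        List<String> events = new ArrayList<>();
        Listener<Integer> buggy = new Listener<Integer>() {
            @Override public void onResponse(Integer response) {
                events.add("onResponse");
                throw new IllegalStateException("bug in listener");
            }
            @Override public void onFailure(Exception e) {
                events.add("onFailure");
            }
        };
        Listener<String> mapped;
        if (useOld) {
            mapped = mapOld(buggy, String::length);
        } else {
            mapped = mapNew(buggy, String::length);
        }
        try {
            mapped.onResponse("abc");
        } catch (RuntimeException e) {
            events.add("propagated");
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println("old: " + run(true));  // old: [onResponse, onFailure]
        System.out.println("new: " + run(false)); // new: [onResponse, propagated]
    }
}
```

With the old wrapper the same listener is notified twice; with the new one the exception escapes to the caller instead.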

The demo test here illustrates the current inconsistencies in how exceptions thrown by listeners are handled.

elasticmachine · 2020-01-11T16:21:38Z

Pinging @elastic/es-core-infra (:Core/Infra/Core)

elasticmachine · 2020-01-11T16:21:40Z

Pinging @elastic/es-distributed (:Distributed/Distributed)

pugnascotia

Changes look good.

How do you feel about adding some JavaDoc to the new test cases, to explain what the expectations are?

original-brownbear · 2020-01-13T10:18:35Z

server/src/main/java/org/elasticsearch/action/ActionListener.java

+                delegate.onFailure(e);
+                return;
+            }
+            delegate.onResponse(mapped);


I see the point in this change, but note that I originally added the .map shortcut to dry up a bunch of ActionListener.wrap(..., listener::onFailure) spots.
I think we'd basically have to audit every spot where we use .map now and make sure that the listener/delegate will actually handle its own onResponse failures (from a quick read over the spots where we use map, this may already hold true).

Maybe we should assert this and do something like:

try { delegate.onResponse(mapped); } catch (Exception e) { assert false: e; throw e; }

I added the assert. Local CI seems happy about it.

henningandersen · 2020-01-13T15:16:20Z

Thanks @pugnascotia , I added javadoc here: 198651b

The DFS action relied on map notifying onFailure (sort of, at least this way it is bwc). But there seems to be no reason it cannot simply use the ChannelActionListener, so I changed it to use that.

henningandersen · 2020-01-17T13:42:34Z

Failure already reported here.

henningandersen · 2020-01-17T15:58:47Z

Removed WIP on this, since I think this can go in alone.

ywelsch

I have not gone through all callers of this method to check whether this is safe, but left a few small comments here.

ywelsch · 2020-01-17T16:53:18Z

server/src/main/java/org/elasticsearch/action/ActionListener.java

+            try {
+                delegate.onResponse(mapped);
+            } catch (RuntimeException e) {
+                assert false : new AssertionError("map: listener.onResponse failed", e);


why assert here but not when calling delegate.onFailure(e);?

Thanks, added that in ca31964 and tests seem unaffected.

ywelsch · 2020-01-17T16:54:48Z

server/src/main/java/org/elasticsearch/action/search/SearchTransportService.java

-            });
+            (request, channel, task) ->
+                searchService.executeDfsPhase(request, (SearchShardTask) task,
+                    new ChannelActionListener<>(channel, DFS_ACTION_NAME, request))


ChannelActionListener also has this weird double-sending logic.

Yes, but the code used to propagate the exception out. This seemed to be a left-over from when the ChannelActionListener was introduced. The old map exception handling would ensure onFailure was called on exception. This means this change is effectively a no-op now (with the caveat that onFailure will be called on exceptions, as for all other ChannelActionListener usages).

Note that DirectTransportChannel will not bubble out exceptions from invoking the TransportResponseHandler. So it seems likely that the exceptions that do bubble out are primarily related to sending the response over the wire, in which case invoking onFailure might be desirable (for instance, an NPE on serialization).

So I would like to keep using ChannelActionListener here and then deal with ChannelActionListener in a follow-up. WDYT?

server/src/test/java/org/elasticsearch/action/admin/indices/create/CreateIndexIT.java

original-brownbear

LGTM thanks Henning :)

henningandersen · 2020-01-22T17:52:42Z

@elasticmachine test this please

henningandersen · 2020-01-23T12:59:22Z

Build failure looks unrelated, reported it here: #51347 .
@elasticmachine run elasticsearch-ci/2

henningandersen · 2020-01-24T08:00:06Z

One more test round:
@elasticmachine test this please

henningandersen · 2020-01-29T15:00:50Z

I have not gone through all callers of this method to check whether this is safe

I went through all calls in production (not test) code. Most are "clearly" OK, falling into one of these categories:

- Clearly not able to throw except in severe bug cases like NPE.
- Inner listener will not throw since it is either ActionListener.wrap or ChannelActionListener - though there is an addendum to that statement below.
- Guarded by an assertions-enabled check (i.e. not active in production).
- A choice is made to use map or not, so clearly the intention was not to get the exception behavior (e.g. TransportBulkAction.BulkRequestModifier.wrapActionListenerIfNeeded).
- Ultimately caught and onFailure called (transport service, security, ActionRunnable, StepListener/ListenableFuture).
- Used with NotifyOnceListener inside only.
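
The last category is safe because a notify-once listener ignores every notification after the first. A minimal sketch of that idea follows, assuming a simplified `Listener` interface; the `notifyOnce` helper is hypothetical and is not Elasticsearch's `NotifyOnceListener` class itself.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

final class NotifyOnceDemo {

    /** Simplified stand-in for ActionListener (an assumption, not the real interface). */
    interface Listener<T> {
        void onResponse(T response);
        void onFailure(Exception e);
    }

    /** Hypothetical guard: forwards only the first notification to the delegate. */
    static <T> Listener<T> notifyOnce(Listener<T> delegate) {
        AtomicBoolean notified = new AtomicBoolean(false);
        return new Listener<T>() {
            @Override public void onResponse(T response) {
                if (notified.compareAndSet(false, true)) {
                    delegate.onResponse(response);
                }
            }
            @Override public void onFailure(Exception e) {
                if (notified.compareAndSet(false, true)) {
                    delegate.onFailure(e);
                }
            }
        };
    }

    /** Notifies the guarded listener twice and returns how many calls reached the delegate. */
    static int deliveries() {
        AtomicInteger count = new AtomicInteger();
        Listener<String> counting = new Listener<String>() {
            @Override public void onResponse(String response) {
                count.incrementAndGet();
            }
            @Override public void onFailure(Exception e) {
                count.incrementAndGet();
            }
        };
        Listener<String> once = notifyOnce(counting);
        once.onResponse("ok");
        once.onFailure(new IllegalStateException("late failure")); // dropped by the guard
        return count.get();
    }

    public static void main(String[] args) {
        System.out.println(deliveries()); // 1
    }
}
```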

I did not follow these usages to the end:

PersistentTasksService: this affects all methods here. I checked a subset of the usages only:

startPersistentTask:
- CCR/resume follow ends in CCR ResponseHandler which does counting that will disregard an additional onFailure call anyway.
- Rollup and transform use wrap to create the listener
sendCompletionRequest: only has trace logging listeners.
updatePersistentTaskState: has a variety of listeners sent to it. Many use wrap, but some are complex and hard to chase to the end (so I did not).

The small addendum mentioned previously is that I did not consider the effect of having for instance map(map(wrap(onResponse, onFailure))) where onResponse and onFailure are functions that both throw. In that case, each additional map wrapping would previously invoke onFailure once more, such that onFailure in this case would be called 3 times.
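
The nested case can be traced with a small sketch. It assumes a simplified `Listener` interface and hypothetical `wrap` and `map` helpers that mimic the behavior described in this thread; the `oldBehavior` flag only exists to compare the two variants side by side.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;
import java.util.function.Function;

final class NestedMapDemo {

    /** Simplified stand-in for ActionListener (an assumption, not the real interface). */
    interface Listener<T> {
        void onResponse(T response);
        void onFailure(Exception e);
    }

    /** Sketch of wrap: an exception from the response handler is passed to the failure handler. */
    static <T> Listener<T> wrap(Consumer<T> responseHandler, Consumer<Exception> failureHandler) {
        return new Listener<T>() {
            @Override public void onResponse(T response) {
                try {
                    responseHandler.accept(response);
                } catch (Exception e) {
                    onFailure(e);
                }
            }
            @Override public void onFailure(Exception e) {
                failureHandler.accept(e);
            }
        };
    }

    /** Sketch of map; oldBehavior selects whether delegate.onResponse failures go to onFailure. */
    static <T, R> Listener<T> map(Listener<R> delegate, Function<T, R> fn, boolean oldBehavior) {
        return new Listener<T>() {
            @Override public void onResponse(T response) {
                R mapped;
                try {
                    mapped = fn.apply(response);
                } catch (Exception e) {
                    delegate.onFailure(e);
                    return;
                }
                if (oldBehavior) {
                    try {
                        delegate.onResponse(mapped);
                    } catch (Exception e) {
                        delegate.onFailure(e); // re-notifies a delegate that already failed
                    }
                } else {
                    delegate.onResponse(mapped);
                }
            }
            @Override public void onFailure(Exception e) {
                delegate.onFailure(e);
            }
        };
    }

    /** Builds map(map(wrap(...))) with two throwing handlers and counts onFailure calls. */
    static int countOnFailure(boolean oldBehavior) {
        AtomicInteger calls = new AtomicInteger();
        Consumer<String> responseHandler = r -> {
            throw new IllegalStateException("onResponse failed");
        };
        Consumer<Exception> failureHandler = e -> {
            calls.incrementAndGet();
            throw new IllegalStateException("onFailure failed");
        };
        Listener<String> base = wrap(responseHandler, failureHandler);
        Listener<String> once = map(base, Function.identity(), oldBehavior);
        Listener<String> twice = map(once, Function.identity(), oldBehavior);
        try {
            twice.onResponse("x");
        } catch (RuntimeException e) {
            // the last exception escapes to the caller in both variants
        }
        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println("old: " + countOnFailure(true));  // old: 3
        System.out.println("new: " + countOnFailure(false)); // new: 1
    }
}
```

In the old variant, `wrap` calls the failure handler once and each `map` layer calls it once more; in the new variant only the `wrap` call remains.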

ActionListener.map would call listener.onFailure for exceptions from listener.onResponse, but this means we could double trigger some listeners which is generally unexpected. Instead, we should assume that a listener's onResponse (and onFailure) implementation is responsible for its own exception handling.

This reverts commit 92ee0f7.

Datafeeds being closed while starting could result in an NPE. This was handled like any other failure, masking the NPE. However, this conflicts with the changes in elastic#50886. Related to elastic#50886 and elastic#51302

henningandersen · 2020-01-30T11:53:00Z

Note that this PR was merged, then reverted. After #51646 was merged, the commit was cherry-picked (identical, no changes) to reintroduce it.

ActionListener.completeWith would catch exceptions from listener.onResponse and deliver them to listener.onFailure, essentially double notifying the listener. Instead we now assert that listeners do not throw when using ActionListener.completeWith. Relates elastic#50886
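
A minimal sketch of the adjusted `completeWith` semantics, under the same simplifications as elsewhere in this thread: `Listener` is a stand-in interface and `Callable` replaces the supplier type, both assumptions made for this illustration. The assertion mentioned above is omitted so the sketch does not depend on `-ea`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

final class CompleteWithDemo {

    /** Simplified stand-in for ActionListener (an assumption, not the real interface). */
    interface Listener<T> {
        void onResponse(T response);
        void onFailure(Exception e);
    }

    /** Sketch: only a failing supplier reaches onFailure; onResponse failures are not redirected. */
    static <T> void completeWith(Listener<T> listener, Callable<T> supplier) {
        T response;
        try {
            response = supplier.call();
        } catch (Exception e) {
            listener.onFailure(e);
            return;
        }
        listener.onResponse(response);
    }

    /** Records callbacks for a listener whose onResponse always throws. */
    static List<String> run(boolean supplierFails) {
        List<String> events = new ArrayList<>();
        Listener<String> throwing = new Listener<String>() {
            @Override public void onResponse(String response) {
                events.add("onResponse");
                throw new IllegalStateException("bug in listener");
            }
            @Override public void onFailure(Exception e) {
                events.add("onFailure");
            }
        };
        Callable<String> supplier = () -> {
            if (supplierFails) {
                throw new java.io.IOException("supplier failed");
            }
            return "value";
        };
        try {
            completeWith(throwing, supplier);
        } catch (RuntimeException e) {
            events.add("propagated");
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println(run(true));  // [onFailure]
        System.out.println(run(false)); // [onResponse, propagated]
    }
}
```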

henningandersen added >bug, WIP, :Core/Infra/Core, >breaking-java, :Distributed Indexing/Distributed, v8.0.0, v7.6.0 labels Jan 11, 2020

Fix checkstyle

3087981

pugnascotia approved these changes Jan 13, 2020

original-brownbear reviewed Jan 13, 2020

Add javadoc

198651b

Fix SearchTransportService DFS action

62eaedc

henningandersen mentioned this pull request Jan 15, 2020

Block too many concurrent mapping updates #51038

Merged

polyfractal added v7.7.0 and removed v7.6.0 labels Jan 15, 2020

henningandersen added 4 commits January 16, 2020 16:32

Try assert

15e4199

Adapt to asserting on failure.

2b24d57

Remove test that cannot live with assertion error.

0dae4f4

Merge remote-tracking branch 'origin/master' into fix_map_exception_h…

b62a2b7

…andling

Merge remote-tracking branch 'origin/master' into fix_map_exception_h…

65ba493

…andling

henningandersen removed the WIP label Jan 17, 2020

henningandersen requested review from original-brownbear and ywelsch January 17, 2020 16:03

ywelsch reviewed Jan 17, 2020

henningandersen added 2 commits January 22, 2020 12:45

Merge remote-tracking branch 'origin/master' into fix_map_exception_h…

084790e

…andling

Armins improved javadoc

534c2bc

original-brownbear approved these changes Jan 22, 2020

henningandersen merged commit 92ee0f7 into elastic:master Jan 29, 2020

henningandersen added the backport pending label Jan 29, 2020

henningandersen added a commit that referenced this pull request Jan 29, 2020

Revert "Fix ActionListener.map exception handling (#50886)"

96a0a2f

This reverts commit 92ee0f7.

henningandersen mentioned this pull request Jan 29, 2020

[ML] Fix possible race condition starting datafeed #51646

Merged

henningandersen mentioned this pull request Jan 30, 2020

[TEST] Fix ActionListener.map exception handling (#50886) #51659

Closed

henningandersen removed the backport pending label Jan 30, 2020

henningandersen mentioned this pull request Jan 31, 2020

Fix completeWith exception handling #51734

Merged

This was referenced Apr 1, 2020

7.7.0 meta ticket elastic/elasticsearch-net#4525

Closed

7.7.0 meta ticket (Part 3) elastic/elasticsearch-net#4534

Closed

original-brownbear mentioned this pull request Mar 15, 2021

Cleanup more ActionListener Delegation Spots #69662

Merged

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021

7 participants
