Skip to content

Conversation

@DaveCTurner
Copy link
Contributor

@DaveCTurner DaveCTurner commented Aug 3, 2020

Today when a node fails to properly deserialize a transport message with
a parent task we log the following relatively uninformative message:

java.lang.IllegalStateException: Message not fully read (response) for requestId [9999], handler [org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler/org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler/org.elasticsearch.transport.TransportService$6@abcdefgh], error [false]; resetting

In particular, the wrapping of the listener in the TransportService
obscures all clues as to the source of the problem, e.g. the action name
or the identity of the underlying listener. This commit exposes the
inner listener to the logs.

Also if the listener is wrapped with ContextPreservingActionListener
then its identity is similarly hidden. This commit also exposes the
wrapped listener in this case.

Relates #38939

Today when a node fails to properly deserialize a transport message with
a parent task we log the following relatively uninformative message:

    java.lang.IllegalStateException: Message not fully read (response) for requestId [9999], handler [org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler/org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler/org.elasticsearch.transport.TransportService$6@abcdefgh], error [false]; resetting

In particular, the wrapping of the listener in the `TransportService`
obscures all clues as to the source of the problem, e.g. the action name
or the identity of the underlying listener. This commit exposes the
inner listener to the logs.

Also if the listener is wrapped with `ContextPreservingActionListener`
then its identity is similarly hidden. This commit also exposes the
wrapped listener in this case.

Relates elastic#38939
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (:Distributed/Network)

@elasticmachine elasticmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Aug 3, 2020
Copy link
Contributor

@original-brownbear original-brownbear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM that should make some BwC test failures much easier to track down, thanks :)

@DaveCTurner
Copy link
Contributor Author

I was also wondering about always wrapping the listener in the transport service regardless of whether it has a parent task or not, simply to add the action name to this log message. Today we only wrap it if it's a child task, so we're relying on top-level requests having a useful toString() too. WDYT?

@original-brownbear
Copy link
Contributor

Today we only wrap it if it's a child task, so we're relying on top-level requests having a useful toString() too.

I would expect to get a useful toString out of things via the class itself and the fact that org.elasticsearch.action.ActionListenerResponseHandler has a proper toString most of the times now. Maybe that's good enough? (not the biggest fan of adding yet another layer of listener wrapping just to improve logging to be honest).

@DaveCTurner DaveCTurner merged commit 85a30c0 into elastic:master Aug 3, 2020
@DaveCTurner DaveCTurner deleted the 2020-08-03-improve-deserialization-failure-logging branch August 3, 2020 10:39
DaveCTurner added a commit that referenced this pull request Aug 3, 2020
Today when a node fails to properly deserialize a transport message with
a parent task we log the following relatively uninformative message:

    java.lang.IllegalStateException: Message not fully read (response) for requestId [9999], handler [org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler/org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler/org.elasticsearch.transport.TransportService$6@abcdefgh], error [false]; resetting

In particular, the wrapping of the listener in the `TransportService`
obscures all clues as to the source of the problem, e.g. the action name
or the identity of the underlying listener. This commit exposes the
inner listener to the logs.

Also if the listener is wrapped with `ContextPreservingActionListener`
then its identity is similarly hidden. This commit also exposes the
wrapped listener in this case.

Relates #38939
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Distributed Coordination/Network Http and internode communication implementations >enhancement Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. v7.10.0 v8.0.0-alpha1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants