XDS Connection closure smart logging#14616
mandarjog wants to merge 3 commits into envoyproxy:main
bd2d53d to 49c88ac
XDS connection closures are logged as warnings only for repeated failures for certain types of status codes. Signed-off-by: Mandar U Jog <mjog@google.com>
49c88ac to 98e6f69
htuch
left a comment
Thanks; @mandarjog can you add tests for all these new behaviors? There's a lot of different branch conditions, so we should have some coverage of the logic here.
/wait
  stream_ = async_client_->start(service_method_, *this, Http::AsyncClient::StreamOptions());
  if (stream_ == nullptr) {
-   ENVOY_LOG(warn, "Unable to establish new stream");
+   ENVOY_LOG(debug, "Unable to establish new grpc config stream");
Shouldn't this be a warning? There's a real failure occurring.
The real failure, with its error message, is recorded in the callback. I don't think there is a case where something is recorded here but not in the callback. This message is also a source of noise, since it does not state the reason.
source/common/config/grpc_stream.h
Outdated
  void onReceiveMessage(ResponseProtoPtr<ResponseProto>&& message) override {
    // Reset here so that it starts with fresh backoff interval on next disconnect.
    backoff_strategy_->reset();
+   unsetFailure();
Why is this needed? Isn't it sufficient to unset on successful connect?
It is the comment on the next line that led me to believe I needed a reset here, too.
But I think it makes sense to remove it. ack.
|
A release note may be nice, as this will impact everyone's logs |
howardjohn
left a comment
I am not sure masking errors to debug level is the right approach.
Moving status code 0 to debug level makes sense to me, but I think we want to see errors?
Signed-off-by: Mandar U Jog <mjog@google.com>
|
I wasn't sure from the code: do we log the first error? For example, in the above ^, do we log failure 3 or failure 0? In my opinion, logging the first failure is ideal: if we log the first failure and successful connections, then we know that everything between failure 0 and "connected" is more failures, even if they are not explicitly logged. If we log only later failures, we may miss when the issues first start. |
|
@htuch can you point me to an example of faking time? |
|
If xDS reconnects within the allotted time, we do not log anything (there is no "connection successful" message yet). At debug level, we log every event. If the connection does not recover within the allotted time, we print the original error and state how long this condition has been going on. |
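The behavior described above could be sketched roughly as follows. This is only an illustration, not the code in this PR: `CloseLogPolicy`, `Decision`, the grace period parameter, and the message text are all hypothetical names chosen for the example. The clock is injected as a function so that a test can fake time without sleeping.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <string>
#include <utility>

// Hypothetical helper (not Envoy code): decides at which level a stream
// closure should be logged, based on how long the stream has been failing.
class CloseLogPolicy {
public:
  using Clock = std::chrono::steady_clock;
  using NowFn = std::function<Clock::time_point()>;

  enum class Level { Debug, Warn };
  struct Decision {
    Level level;
    std::string message;
  };

  // grace: how long failures stay at debug level before a warning is emitted.
  // now: injected time source, so tests can advance time manually.
  CloseLogPolicy(std::chrono::milliseconds grace, NowFn now)
      : grace_(grace), now_(std::move(now)) {
    assert(now_ != nullptr);
  }

  // Called on every stream closure with the error text from the close callback.
  Decision onClose(const std::string& error) {
    const Clock::time_point now = now_();
    if (!failing_) {
      // Remember the first failure so it can be reported later.
      failing_ = true;
      first_failure_ = now;
      first_error_ = error;
    }
    const auto down =
        std::chrono::duration_cast<std::chrono::milliseconds>(now - first_failure_);
    if (down < grace_) {
      return {Level::Debug, "stream closed: " + error};
    }
    // Grace period exceeded: report the original error and the elapsed time.
    return {Level::Warn, "stream closed for " + std::to_string(down.count()) +
                             "ms, original error: " + first_error_};
  }

  // Called when the stream is re-established; clears the failure state.
  void onConnected() {
    failing_ = false;
    first_error_.clear();
  }

private:
  const std::chrono::milliseconds grace_;
  NowFn now_;
  bool failing_ = false;
  Clock::time_point first_failure_{};
  std::string first_error_;
};
```

With this shape, a failure that recovers inside the grace period never rises above debug level, while a persistent one yields a warning that carries the first error seen.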
|
@mandarjog you want a fake time source for the dispatcher, see the |
Signed-off-by: Mandar U Jog <mjog@google.com>
|
precheck deps succeeds locally |
|
I am thinking of adding an INFO message on successful connection, which will also state how long it was disconnected. |
|
Sounds reasonable, this is a pretty important diagnostic that applies to many use cases. We can always drop log level if folks find it too spammy. |
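A minimal sketch of the reconnect message discussed above, assuming the stream records when it first went down. `DowntimeTracker` and the message wording are hypothetical and not part of this PR; the caller passes in the current time, so a fake time source can drive it in tests.

```cpp
#include <cassert>
#include <chrono>
#include <string>

// Hypothetical helper (not Envoy code): remembers when the stream first went
// down and produces the text for an INFO log line on reconnect.
class DowntimeTracker {
public:
  using TimePoint = std::chrono::steady_clock::time_point;

  // Record the start of an outage; repeated failures keep the first timestamp.
  void onDisconnected(TimePoint now) {
    if (!down_) {
      down_ = true;
      down_since_ = now;
    }
  }

  // Returns the INFO message for a reconnect, or an empty string if the
  // stream was not previously marked as disconnected.
  std::string onConnected(TimePoint now) {
    if (!down_) {
      return "";
    }
    assert(now >= down_since_);
    down_ = false;
    const auto secs =
        std::chrono::duration_cast<std::chrono::seconds>(now - down_since_);
    return "xDS stream connected after being disconnected for " +
           std::to_string(secs.count()) + "s";
  }

private:
  bool down_ = false;
  TimePoint down_since_{};
};
```

Returning the text instead of logging directly keeps the log level a caller decision, so it can be lowered later if the message proves too noisy.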
|
This pull request has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions! |
|
This pull request has been automatically closed because it has not had activity in the last 37 days. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions! |
XDS connection closures are logged as warnings only for
repeated failures for certain types of status codes.
Fixes #14591
Signed-off-by: Mandar U Jog <mjog@google.com>
Will follow up with tests if this approach is ok.