Status error presentation with details #12564
Remote Execution Status messages embedded in ExecuteResponses are extremely capable vehicles for conveying the nature of an error and informing a user of further steps to take to remediate it. This change expands the presentation of these response Statuses and brings all of the error details to light by default, instead of requiring --verbose_failures to investigate any details of a remote execution problem.

The interpretation of precondition failures to highlight retriable responses has been expanded to ignore benign details that might be included in a response.

SpawnResult error message composition has been simplified substantially, without any special behavior for 'Remote' errors, and with the removal of a duplicate message printout introduced in the wake of successive @janakr and @olaola changes. Failure messages are now implied to be present in all spawn result failure reporting exactly once, and the failureMessage of a SpawnResult is implied to be the parameter to getDetailMessage.

An example error presentation is as follows (including the modifications to SpawnResult's output formatting):

```
ERROR: /home/werkt/dev/test/BUILD:22:10: Linking test failed: (Exit 34): Remote Execution Failure:
Failed Precondition: Action 4223ab2cc114385110714243a0b4a88cc743f2169b5be7d4d438a6bbba4f529f/142 is invalid
Resource Info: type.googleapis.com/google.longrunning.Operation: name='shard/operations/9335fef2-184b-4d26-9a6f-2f27cebe7527', owner='tool_invocation_id:4b4bf7b1-fadd-44fd-99be-a234e7c26fc4,correlated_invocation_id:dc88325a-9317-48c0-9013-b3bb8b7a458f'
Precondition Failure: (MISSING) bazel-out/k8-fastbuild/bin/test: 7872: An output could not be uploaded because it exceeded the maximum size of an entry
Target //:test failed to build
```
When a retryInfo is supplied, it should circumvent any other conditions which would prevent retriability. Its delay will inform the subsequent backoff delay supplied, assuming it is not beyond the retry count.
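A minimal sketch of the rule described above, for illustration only. It does not use Bazel's actual classes: `RetryDecisionSketch`, `nextDelay`, and all of its parameters are hypothetical names, and the server-supplied delay is modeled as a plain `Optional<Duration>` rather than a `RetryInfo` proto.

```java
import java.time.Duration;
import java.util.Optional;

/** Hypothetical stand-in for the retry decision; not Bazel's ExecuteRetrier. */
final class RetryDecisionSketch {
  /**
   * Returns the delay to wait before the next attempt, or empty if no retry
   * should be made.
   */
  static Optional<Duration> nextDelay(
      Optional<Duration> serverRetryDelay, // delay from a RetryInfo, if one was supplied
      boolean otherwiseRetriable, // result of all other retriability checks
      int attempt, // number of attempts already made
      int maxAttempts, // retry count limit
      Duration defaultBackoff) {
    // The retry count is the one condition a RetryInfo does not override.
    if (attempt >= maxAttempts) {
      return Optional.empty();
    }
    // A server-supplied RetryInfo wins over every other condition and
    // determines the delay.
    if (serverRetryDelay.isPresent()) {
      return serverRetryDelay;
    }
    // Otherwise fall back to the usual checks and backoff.
    return otherwiseRetriable ? Optional.of(defaultBackoff) : Optional.empty();
  }
}
```

With this shape, a response that would otherwise be rejected as non-retriable is still retried when the server supplies a delay, as long as attempts remain.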
71e3903 to 551bc7a
Thanks!
import io.grpc.Status.Code;
import io.grpc.protobuf.StatusProto;

class ExecuteRetrier extends RemoteRetrier {
nit: please add javadoc
Hope a single line is sufficient here, didn't see much else on the other classes I looked at.
for (Any detail : status.getDetailsList()) {
  if (detail.is(RetryInfo.class)) {
    try {
      retryInfo = detail.unpack(RetryInfo.class);
break here?
I want the retryInfo to have a deterministic last-specified behavior. Not that I want to see multiple from the service, but if it does, the last one in the list should be effective.
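To illustrate the last-specified behavior, here is a small sketch under stated assumptions: it uses a hypothetical `RetryInfo` stand-in class and a plain `List<Object>` in place of the protobuf `Any` details list, so it only models the loop shape, not the real unpacking.

```java
import java.time.Duration;
import java.util.List;

/** Hypothetical illustration of "last RetryInfo wins"; not the PR's code. */
final class LastRetryInfoSketch {
  /** Stand-in for google.rpc.RetryInfo; only the delay is modeled. */
  static final class RetryInfo {
    final Duration retryDelay;

    RetryInfo(Duration retryDelay) {
      this.retryDelay = retryDelay;
    }
  }

  /** Returns the last RetryInfo found in the details, or null if there is none. */
  static RetryInfo lastRetryInfo(List<Object> details) {
    RetryInfo retryInfo = null;
    for (Object detail : details) {
      if (detail instanceof RetryInfo) {
        // Deliberately no break: a later entry overrides an earlier one,
        // so the result does not depend on where scanning happens to stop.
        retryInfo = (RetryInfo) detail;
      }
    }
    return retryInfo;
  }
}
```

Adding a `break` would instead make the first entry effective; either choice is deterministic, and this sketch follows the last-one-wins preference stated in the comment.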
for (Any detail : status.getDetailsList()) {
  if (detail.is(RetryInfo.class)) {
    // server says we can retry, regardless of other details
    fullyRetriable = true;
break here?
Early return here - nothing after it is effective, and coming up with whether the precondition failure is effective is meaningless. Thanks!
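The early return can be sketched roughly as follows. This is not the PR's code: `RetryInfo` and `PreconditionFailure` are hypothetical stand-ins for the protobuf messages, the details list is a plain `List<Object>`, and the rule that a precondition failure is retriable only when every violation is of type `MISSING` is an assumption made for the example.

```java
import java.util.List;

/** Hypothetical sketch of the retriability check with an early return. */
final class PreconditionRetrySketch {
  /** Stand-in for google.rpc.RetryInfo. */
  static final class RetryInfo {}

  /** Stand-in for google.rpc.PreconditionFailure; only violation types are modeled. */
  static final class PreconditionFailure {
    final List<String> violationTypes;

    PreconditionFailure(List<String> violationTypes) {
      this.violationTypes = violationTypes;
    }
  }

  static boolean isRetriable(List<Object> details) {
    boolean sawPreconditionFailure = false;
    boolean allMissing = true;
    for (Object detail : details) {
      if (detail instanceof RetryInfo) {
        // Server says we can retry: nothing after this can change the answer.
        return true;
      }
      if (detail instanceof PreconditionFailure) {
        sawPreconditionFailure = true;
        for (String type : ((PreconditionFailure) detail).violationTypes) {
          if (!"MISSING".equals(type)) {
            allMissing = false; // assumed rule: only MISSING violations are retriable
          }
        }
      }
      // Any other detail type is treated as benign and skipped.
    }
    return sawPreconditionFailure && allMissing;
  }
}
```

Returning as soon as a `RetryInfo` is seen avoids computing a precondition verdict that would be discarded anyway.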
Closes #12564. PiperOrigin-RevId: 3449738
Remote Execution Status messages embedded in ExecuteResponses are
extremely capable vehicles for conveying the nature of an error, and
informing a user of further steps to take to remediate it. This change
expands the presentation of these response Statuses and brings all of
the error details to light by default, instead of requiring
--verbose_failures to investigate any details of a remote execution
problem.
The interpretation of precondition failures to highlight retriable
responses has been expanded to ignore benign details that might be
included in a response.
SpawnResult error message composition has been simplified substantially,
without any special behavior for 'Remote' errors, and with the removal of a
duplicate message printout introduced in the wake of successive @janakr
and @olaola changes. Failure messages are now implied to be present in
all spawn result failure reporting exactly once, and the failureMessage
of a SpawnResult is implied to be the parameter to getDetailMessage.
An example error presentation is as follows (including the modifications
to SpawnResult's output formatting):