Status error presentation with details #12564

Closed
wants to merge 3 commits into from

@werkt werkt commented Nov 26, 2020

Remote Execution Status messages embedded in ExecuteResponses are
extremely capable vehicles for conveying the nature of an error and
informing a user of further steps to take to remediate it. This change
expands the presentation of these response Statuses and brings all of
the error details to light by default, instead of requiring
--verbose_failures to investigate any details of a remote execution
problem.

The interpretation of precondition failures to highlight retriable
responses has been expanded to ignore benign details that might be
included in a response.

SpawnResult error message composition has been simplified substantially:
there is no longer any special behavior for 'Remote' errors, and a
duplicate message printout incurred in the wake of successive @janakr
and @olaola changes has been removed. Failure messages are now implied
to be present in all spawn result failure reporting exactly once, and
the failureMessage of a SpawnResult is implied to be the parameter to
getDetailMessage.

An example error presentation is as follows (including the modifications
to SpawnResult's output formatting):

```
ERROR: /home/werkt/dev/test/BUILD:22:10: Linking test failed: (Exit 34): Remote Execution Failure:
Failed Precondition: Action 4223ab2cc114385110714243a0b4a88cc743f2169b5be7d4d438a6bbba4f529f/142 is invalid
  Resource Info: type.googleapis.com/google.longrunning.Operation: name='shard/operations/9335fef2-184b-4d26-9a6f-2f27cebe7527', owner='tool_invocation_id:4b4bf7b1-fadd-44fd-99be-a234e7c26fc4,correlated_invocation_id:dc88325a-9317-48c0-9013-b3bb8b7a458f'
  Precondition Failure:
    (MISSING) bazel-out/k8-fastbuild/bin/test: 7872: An output could not be uploaded because it exceeded the maximum size of an entry
Target //:test failed to build
```
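
The layout of the example above (a headline, then one indented line per detail, with violations nested one level deeper) can be sketched roughly as follows. This is an illustration only, not the code in this PR: `StatusPresentationSketch`, `Violation`, and `render` are hypothetical names, plain Java objects stand in for the protobuf messages, and the `(TYPE) subject: description` line format is inferred from the sample output.

```java
import java.util.Collections;
import java.util.List;

/** Sketch only: plain Java stand-ins, not the protobuf messages the change handles. */
final class StatusPresentationSketch {

  /** Hypothetical stand-in for one precondition violation. */
  static final class Violation {
    final String type;
    final String subject;
    final String description;

    Violation(String type, String subject, String description) {
      this.type = type;
      this.subject = subject;
      this.description = description;
    }
  }

  /** Renders a headline, then each detail on its own indented line. */
  static String render(
      String codeName, String message, String resourceInfo, List<Violation> violations) {
    StringBuilder out = new StringBuilder();
    out.append(codeName).append(": ").append(message).append('\n');
    if (resourceInfo != null) {
      out.append("  Resource Info: ").append(resourceInfo).append('\n');
    }
    if (!violations.isEmpty()) {
      out.append("  Precondition Failure:\n");
      for (Violation v : violations) {
        // Line format assumed from the example: "(TYPE) subject: description".
        out.append("    (").append(v.type).append(") ")
            .append(v.subject).append(": ").append(v.description).append('\n');
      }
    }
    return out.toString();
  }

  public static void main(String[] args) {
    List<Violation> violations =
        Collections.singletonList(
            new Violation("MISSING", "bazel-out/bin/test", "entry too large"));
    System.out.print(render("Failed Precondition", "Action is invalid", null, violations));
  }
}
```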

When a retryInfo is supplied, it should circumvent any other conditions
which would prevent retriability. Its delay will inform the subsequent
backoff delay supplied, assuming it is not beyond the retry count.
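
One plausible reading of the paragraph above, sketched under stated assumptions: a server-supplied delay replaces the locally computed backoff delay, but only while attempts remain. `RetryDelaySketch`, `nextDelayMillis`, and the `NO_RETRY` sentinel are hypothetical names for illustration and are not taken from this PR.

```java
import java.util.OptionalLong;

/** Sketch only: how a server-supplied delay might feed into the next backoff delay. */
final class RetryDelaySketch {

  /** Hypothetical sentinel meaning "do not retry". */
  static final long NO_RETRY = -1;

  /**
   * Returns the delay before the next attempt, or NO_RETRY once the retry
   * count is exhausted. A server-supplied delay, when present, is assumed to
   * take precedence over the locally computed backoff delay.
   */
  static long nextDelayMillis(
      int attemptsSoFar, int maxAttempts, long backoffMillis, OptionalLong serverDelayMillis) {
    if (attemptsSoFar >= maxAttempts) {
      // Beyond the retry count: the server hint does not extend the budget.
      return NO_RETRY;
    }
    return serverDelayMillis.orElse(backoffMillis);
  }

  public static void main(String[] args) {
    System.out.println(nextDelayMillis(0, 3, 100, OptionalLong.of(2500)));
    System.out.println(nextDelayMillis(0, 3, 100, OptionalLong.empty()));
    System.out.println(nextDelayMillis(3, 3, 100, OptionalLong.of(2500)));
  }
}
```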
@werkt werkt force-pushed the remote-error-details branch from 71e3903 to 551bc7a Compare November 29, 2020 17:24
@aiuto aiuto added the team-Remote-Exec Issues and PRs for the Execution (Remote) team label Nov 30, 2020
@coeuvre coeuvre left a comment

Thanks!

```java
import io.grpc.Status.Code;
import io.grpc.protobuf.StatusProto;

class ExecuteRetrier extends RemoteRetrier {
```

@coeuvre (Member):

nit: please add javadoc

@werkt (Contributor, Author):

Hope a single line is sufficient here, didn't see much else on the other classes I looked at.

```java
for (Any detail : status.getDetailsList()) {
  if (detail.is(RetryInfo.class)) {
    try {
      retryInfo = detail.unpack(RetryInfo.class);
```

@coeuvre (Member):

break here?

@werkt (Contributor, Author):

I want the retryInfo to have a deterministic last-specified behavior. Not that I want to see multiple from the service, but if it does, the last one in the list should be effective.
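
The "last one wins" behavior described in this reply can be illustrated with a small stand-alone sketch. It uses plain Java objects rather than protobuf `Any` details, and `LastRetryInfoSketch` and its nested `RetryInfo` class are hypothetical stand-ins, not the types used in the PR.

```java
import java.util.Arrays;
import java.util.List;

/** Sketch only: selecting the last retry hint from a list of status details. */
final class LastRetryInfoSketch {

  /** Hypothetical stand-in for a retry hint carrying a delay. */
  static final class RetryInfo {
    final long delayMillis;

    RetryInfo(long delayMillis) {
      this.delayMillis = delayMillis;
    }
  }

  /** Returns the last RetryInfo in the list, or null if there is none. */
  static RetryInfo lastRetryInfo(List<Object> details) {
    RetryInfo retryInfo = null;
    for (Object detail : details) {
      if (detail instanceof RetryInfo) {
        // Deliberately no break: a later entry overrides an earlier one.
        retryInfo = (RetryInfo) detail;
      }
    }
    return retryInfo;
  }

  public static void main(String[] args) {
    List<Object> details =
        Arrays.<Object>asList(new RetryInfo(100), "some other detail", new RetryInfo(900));
    System.out.println(lastRetryInfo(details).delayMillis);
  }
}
```

Breaking on the first match would be marginally cheaper, but it would make the first entry effective rather than the last.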

```java
for (Any detail : status.getDetailsList()) {
  if (detail.is(RetryInfo.class)) {
    // server says we can retry, regardless of other details
    fullyRetriable = true;
```

@coeuvre (Member):

break here?

@werkt (Contributor, Author):

Early return here - nothing after it is effective, and coming up with whether the precondition failure is effective is meaningless. Thanks!
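
A rough sketch of the early-return shape discussed in this thread, combined with the PR description's point about ignoring benign details. Several things here are assumptions rather than facts from the PR: the class and method names are hypothetical, plain Java objects stand in for protobuf details, and the rule that a precondition failure is retriable only when every violation is of type `MISSING` (the type shown in the example error) is a guess at the intended policy.

```java
import java.util.Arrays;
import java.util.List;

/** Sketch only: deciding retriability from a list of status details. */
final class RetriableDetailsSketch {

  /** Hypothetical stand-in for a retry hint from the server. */
  static final class RetryInfo {}

  /** Hypothetical stand-in for a precondition failure and its violation types. */
  static final class PreconditionFailure {
    final List<String> violationTypes;

    PreconditionFailure(List<String> violationTypes) {
      this.violationTypes = violationTypes;
    }
  }

  static boolean isRetriable(List<Object> details) {
    boolean sawMissing = false;
    boolean sawOtherViolation = false;
    for (Object detail : details) {
      if (detail instanceof RetryInfo) {
        // Server says we can retry, regardless of other details: return early.
        return true;
      }
      if (detail instanceof PreconditionFailure) {
        for (String type : ((PreconditionFailure) detail).violationTypes) {
          if ("MISSING".equals(type)) {
            sawMissing = true;
          } else {
            sawOtherViolation = true;
          }
        }
      }
      // Any other detail type is treated as benign and ignored.
    }
    // Assumed policy: retriable only if all violations seen were MISSING.
    return sawMissing && !sawOtherViolation;
  }

  public static void main(String[] args) {
    PreconditionFailure missing = new PreconditionFailure(Arrays.asList("MISSING"));
    System.out.println(isRetriable(Arrays.<Object>asList(missing, "resource info")));
  }
}
```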

@bazel-io bazel-io closed this in 5122617 Dec 1, 2020
@werkt werkt deleted the remote-error-details branch December 1, 2020 14:21
philwo pushed a commit that referenced this pull request Mar 15, 2021

Closes #12564.

PiperOrigin-RevId: 3449738