REv2: RPCs failing with DEADLINE_EXCEEDED no longer return a useful error message #12898
Labels
P2
team-Remote-Exec
type: bug
Description of the problem / feature request:
When running a build with Bazel 4.0.0 against Buildbarn, one might see the following cryptic error message, without any further details:
The java.log contains the following:
Investigating the gRPC Prometheus metrics on the Buildbarn side, I can see that this is caused by one or more RPCs failing with DEADLINE_EXCEEDED. I can confirm that reducing --remote_timeout to an extremely low value (e.g., 3 seconds) makes these errors more probable. Increasing this flag to an extremely high value (e.g., 3600 seconds) makes the errors go away.
Feature requests: what underlying problem are you trying to solve with this feature?
Make Bazel print a useful error when builds fail like this.
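To illustrate the kind of message that would help here, the sketch below formats a failure so that it names the failing RPC, the status code, and the timeout in effect. This is a hypothetical Python sketch, not Bazel's code: the helper `describe_rpc_failure`, its parameters, and the message wording are assumptions made for this example. Only the DEADLINE_EXCEEDED status and the --remote_timeout flag come from this report.

```python
def describe_rpc_failure(method: str, status: str, timeout_seconds: int) -> str:
    """Build a readable message for a failed remote RPC (hypothetical helper)."""
    message = f"Remote RPC {method} failed with status {status}"
    if status == "DEADLINE_EXCEEDED":
        # Point the user at the flag that controls the deadline.
        message += (
            f" after {timeout_seconds}s; consider raising --remote_timeout"
            " or checking the load on the remote cluster"
        )
    return message


if __name__ == "__main__":
    # "Execute" is used purely as an example RPC name.
    print(describe_rpc_failure("Execute", "DEADLINE_EXCEEDED", 3))
```

The point of the sketch is only that the status code and the relevant flag appear in the user-facing output, rather than being visible solely in server-side metrics.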
Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.
Run a build against a REv2 cluster, while passing in --remote_timeout=${low}.
What operating system are you running Bazel on?
Linux
What's the output of bazel info release?
Have you found anything relevant by searching the web?
No
Any other information, logs, or outputs that you want to share?
No