[Core] Fix ray::Status <--> gRPC status interplay. #14278

Open
clarkzinzow opened this issue Feb 23, 2021 · 2 comments
Labels
core Issues that should be addressed in Ray Core enhancement Request for new feature and/or capability P2 Important issue, but not time-critical RFC RFC issues

@clarkzinzow
Contributor

clarkzinzow commented Feb 23, 2021

There isn’t a clean ray::Status <--> gRPC status conversion for all ray::Statuses, yet we're pretending that there is by recasting every server-side ray::Status as an IOError client-side. This is very confusing to Ray devs: for example, a Status::ObjectNotFound returned by the server arrives on the client as a generic Status::IOError. We should fix this conversion so that application code can properly exchange and interpret application-level errors, while maintaining support for transport-level gRPC errors.
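
To make the problem concrete, here is a minimal sketch of the lossy round trip described above. All types are hypothetical stand-ins rather than Ray's or gRPC's real definitions, and the server-side choice of UNKNOWN is an assumption made for illustration; only the client-side recast to IOError comes from the description above.

```cpp
#include <string>

// Hypothetical stand-in for ray::Status (not Ray's actual definition).
enum class StatusCode { OK, IOError, ObjectNotFound };
struct Status {
  StatusCode code;
  std::string message;
};

// Hypothetical stand-in for a gRPC status (not grpc::Status itself).
enum class GrpcCode { OK, UNKNOWN };
struct GrpcStatus {
  GrpcCode code;
  std::string message;
};

// Server side: every non-OK status is flattened to a single gRPC code
// (UNKNOWN is assumed here), so only the message survives.
GrpcStatus ToGrpcLossy(const Status &status) {
  if (status.code == StatusCode::OK) {
    return {GrpcCode::OK, ""};
  }
  return {GrpcCode::UNKNOWN, status.message};
}

// Client side: every non-OK gRPC status is recast as IOError, because
// the original application-level code is no longer available.
Status FromGrpcLossy(const GrpcStatus &grpc_status) {
  if (grpc_status.code == GrpcCode::OK) {
    return {StatusCode::OK, ""};
  }
  return {StatusCode::IOError, grpc_status.message};
}
```

With these stand-ins, a server-side `{StatusCode::ObjectNotFound, "object missing"}` comes back from `FromGrpcLossy(ToGrpcLossy(...))` with the same message but with code `StatusCode::IOError`, so the client cannot tell it apart from a genuine I/O failure.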

Options

Two options are immediately apparent:

  1. Include application-level errors such as ray::Status in our proto payloads. Transport-level errors would still be handled via the gRPC status, but application-level errors that don’t map to gRPC status codes would be defined in the reply proto, alongside the normal payload. How aggressively we should map a subset of ray::Statuses to gRPC statuses requires some thought. For example, should an "object not found" error be an application-level Status::ObjectNotFound error, or should it be mapped to the NOT_FOUND gRPC status code at the transport level? The best-practice consensus is to avoid defining "specific resource has X state" codes when a generic "resource has X state" code exists, i.e. we should use the NOT_FOUND gRPC status code where possible.
  2. Embrace the richer gRPC error model, which would keep transport-level and application-level errors consolidated in the gRPC status. The application-level ray::Status-esque errors would go into the error details. We would still do a best-effort mapping of ray::Status codes to gRPC status codes, falling back to gRPC status code UNKNOWN when a good mapping doesn't exist. Support for the richer error model exists for our core language (C++), our current frontend (worker) languages (Python, Java, C++), and potential future core languages (Go, Rust), but not yet for grpc-web or Node.js. I think that this support will suffice for our needs, especially given that the richer error model should be opt-in for each RPC.
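
As a rough illustration of option (2), the sketch below does a best-effort code mapping with an UNKNOWN fallback and carries the original application-level code in a details field. Everything here is a hypothetical stand-in: `RayCode`, `GrpcCode`, `RayOnlyError`, and the helper names are invented for the example, and the details are encoded as a plain integer string, whereas the real richer error model would carry a serialized status proto in the gRPC status details.

```cpp
#include <string>

// Hypothetical stand-ins; RayOnlyError represents any code that has
// no natural gRPC counterpart.
enum class RayCode { OK, IOError, ObjectNotFound, RayOnlyError };
enum class GrpcCode { OK, UNKNOWN, NOT_FOUND };

struct RayStatus {
  RayCode code;
  std::string message;
};
struct GrpcStatus {
  GrpcCode code;
  std::string message;
  std::string details;  // stands in for the serialized error details
};

// Best-effort mapping: use a generic gRPC code where one fits,
// otherwise fall back to UNKNOWN.
GrpcCode BestEffortGrpcCode(RayCode code) {
  switch (code) {
    case RayCode::OK:
      return GrpcCode::OK;
    case RayCode::ObjectNotFound:
      return GrpcCode::NOT_FOUND;
    default:
      return GrpcCode::UNKNOWN;
  }
}

// Server side: keep the exact application-level code in the details.
GrpcStatus ToGrpc(const RayStatus &status) {
  if (status.code == RayCode::OK) {
    return {GrpcCode::OK, "", ""};
  }
  return {BestEffortGrpcCode(status.code), status.message,
          std::to_string(static_cast<int>(status.code))};
}

// Client side: recover the exact code when details are present; a
// status without details is treated as a transport-level failure.
RayStatus FromGrpc(const GrpcStatus &grpc_status) {
  if (grpc_status.code == GrpcCode::OK) {
    return {RayCode::OK, ""};
  }
  if (!grpc_status.details.empty()) {
    return {static_cast<RayCode>(std::stoi(grpc_status.details)),
            grpc_status.message};
  }
  return {RayCode::IOError, grpc_status.message};
}
```

In this sketch an ObjectNotFound status travels as NOT_FOUND and is restored exactly on the client, a code with no good mapping travels as UNKNOWN but is still restored from the details, and an error without details (for example, one produced by the transport itself) still surfaces as IOError. Option (1) would instead put a field like the details above into each reply proto and leave the gRPC status purely transport-level.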

I believe that option (2), the richer error model, is the best approach.

@rkooo567 rkooo567 added this to the Core Bugs milestone Feb 23, 2021
@rkooo567 rkooo567 added the enhancement Request for new feature and/or capability label Feb 23, 2021
@rkooo567
Contributor

Added this to the Core Bugs milestone so we don't lose track of it. Feel free to remove it if you think it is not appropriate @ericl

@ericl ericl added P2 Important issue, but not time-critical RFC RFC issues labels Feb 23, 2021
@ericl
Contributor

ericl commented Feb 23, 2021

cc @raulchen for thoughts

@rkooo567 rkooo567 added the core Issues that should be addressed in Ray Core label Dec 9, 2022
3 participants