Description
Specification
Now that @matrixai/errors is ready, we can use it to incorporate error chaining into PK. This will involve a series of steps:
Chaining
In all places where we catch an error and then throw a new error in its place, we need to include the original error as the cause of the new error, e.g.
} catch (e) {
  throw new SomePolykeyError(e.message, { cause: e });
}
Note that the cause should always be an exception/error. If its type is not specified, it defaults to unknown. When an error is thrown, the top-level error should contain the full instances of the errors in its cause chain, as one big nested object. This data structure can be serialised and deserialised recursively: every error in the chain has a single cause, and that cause is contained as a property within it.
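A minimal sketch of the catch-and-wrap pattern, assuming a hypothetical ErrorChainSketch base class in place of the real AbstractError<T> from @matrixai/errors (whose constructor options are not reproduced here). readFromDB, getVault and causeChain are illustrative names only; causeChain shows how the nested cause properties form a single chain.

```typescript
// Hypothetical base class standing in for AbstractError<T> from @matrixai/errors
class ErrorChainSketch<T = unknown> extends Error {
  // `declare` only narrows the type of `cause`; it emits no field definition
  declare cause: T;
  constructor(message: string = '', options: { cause?: T } = {}) {
    super(message);
    this.name = new.target.name;
    this.cause = options.cause as T;
  }
}

// Hypothetical PK error that wraps a lower-level failure
class ErrorDBReadSketch extends ErrorChainSketch<Error> {}

// Hypothetical lower-level operation that always fails
function readFromDB(): never {
  throw new Error('LevelDB read failed');
}

function getVault(): void {
  try {
    readFromDB();
  } catch (e) {
    const original = e as Error;
    // Wrap rather than replace: the original error becomes the `cause`
    throw new ErrorDBReadSketch(original.message, { cause: original });
  }
}

// Walk from the top-level error down to the root cause
function causeChain(e: unknown): unknown[] {
  const chain: unknown[] = [];
  let current: unknown = e;
  while (current !== undefined && current !== null) {
    chain.push(current);
    current =
      current instanceof Error
        ? (current as Error & { cause?: unknown }).cause
        : undefined;
  }
  return chain;
}
```

Because every wrapper keeps exactly one cause, walking the chain is a simple loop rather than a tree traversal.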
It is important to realise that the replacer will remove any object entry whose returned value is undefined; if the entry is an array element, it is replaced with null instead. Because our cause may be undefined, the cause property may not exist during deserialisation. Any usage of fromJSON must be aware that the cause property may not exist.
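The JSON.stringify behaviour described above can be checked directly; the same rule applies whether the undefined is the original value or is returned by a replacer. The fromJSONSketch helper below is a hypothetical illustration of tolerating a missing cause key, not the fromJSON from @matrixai/errors:

```typescript
// Property whose value is undefined: the key is omitted entirely
const withUndefinedCause = JSON.stringify({ message: 'oops', cause: undefined });

// Array element that is undefined: replaced by null
const inArray = JSON.stringify([undefined, 1]);

// A replacer returning undefined drops the entry in the same way
const withoutStack = JSON.stringify(
  { message: 'oops', stack: 'at ...' },
  (key, value) => (key === 'stack' ? undefined : value),
);

// Hypothetical serialised shape; `cause` is optional because it may be absent
type SerialisedErrorJSON = { message: string; cause?: SerialisedErrorJSON };

// Hypothetical reviving helper, not the @matrixai/errors fromJSON
function fromJSONSketch(json: unknown): Error {
  if (
    typeof json !== 'object' ||
    json === null ||
    typeof (json as SerialisedErrorJSON).message !== 'string'
  ) {
    throw new TypeError('Invalid serialised error');
  }
  const { message, cause } = json as SerialisedErrorJSON;
  const error = new Error(message) as Error & { cause?: unknown };
  // Only recurse when the `cause` key survived serialisation
  if (cause !== undefined) {
    error.cause = fromJSONSketch(cause);
  }
  return error;
}
```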
Client Server Error Architecture
toError: ServiceError -> ErrorPolykeyRemote                        fromError: Error -> ServerStatusResponse
                                                                   ┌─────────────────────────────┐
┌───────────────────────────────┐                                  │ NodeId 1 - Server           │
│ Client                        │ Request                          │                             │
│                               │                                  │ ┌─────────────────────────┐ │
│ Receives toError ServiceError ├──────────────────────────────────┼─► Service Handlers        │ │
│ Throws ErrorPolykeyRemote     │                                  │ │                         │ │
│ cause: ErrorX | ErrorUnknown  ◄──────────────────────────────────┼─┤ Sends fromError ErrorX  │ │
│ data: { nodeID: 1, ... }      │                                  │ └─────────────────────────┘ │
└───────────────────────────────┘ Response with error              │                             │
                                  JSON string in metadata          └─────────────────────────────┘
                                  error: { type: ErrorX, ... }
It's important to realise:
- Only the client constructs ErrorPolykeyRemote.
- The ErrorPolykeyRemote should contain nodeId: NodeId, host, port and other connection information in its own data, which can aid debugging by showing which server responded with this error.
- The ErrorPolykeyRemote should also have information about which call triggered it, and perhaps even request information; it can hold a lot of information, which we can add later.
  - Additional client/connection metadata should be set in the data property of ErrorPolykeyRemote.
  - Give data a more specific type like data: ClientMetadata & POJO so that it forces ClientMetadata to be available.
  - We can use type ClientMetadata = { nodeId: NodeId; host: Host; port: Port }. Additional information, like what the call was, can be provided later.
  - The ClientMetadata type should be in src/types.ts to avoid import loops, since it has to use the NodeId type from the nodes domain and Host and Port from the network domain.
  - This keeps the type together with ErrorPolykeyRemote at the top level of src.
- The server only serialises ErrorX, where ErrorX may be any kind of error. The server must filter the information it sends to the client to ensure it is not leaking sensitive information. This can include the stack, since the stack is rarely required by the client.
- Even when the server sends the error back to the client, it must also log out this error, but only if the error is considered a "server error". That is, if it is a "client error" (like 4xx errors in HTTP), then it should not be logged out. Client errors include:
  - Authentication
  - Validation
  - Precondition
  - Concurrency Conflict
  - Anything else corresponding to HTTP 4xx
    - Example: missing object/resource
- If the client encounters unknown data during deserialisation, it only returns ErrorUnknown at the root of the JSON data. Any other unknown data should be returned as-is.
- This means all remote errors are always ErrorPolykeyRemote, but the cause may be ErrorUnknown or ErrorX, where ErrorX may contain anything further down the chain (as long as the runtime schema checks during fromJSON calls pass).
- The GRPC metadata key for the serialised error JSON should just be "error". In the future, if we switch to using JSON RPC, this error JSON will probably be encoded as part of the JSON RPC protocol; see https://www.jsonrpc.org/specification#error_object and https://eth.wiki/json-rpc/json-rpc-error-codes-improvement-proposal
- GRPC itself only supports "HTTP2" kinds of errors, but these codes aren't really being used: all errors from the remote node are considered one kind of error, that being UNKNOWN. We use that to represent an "application" error. This means we don't use most of GRPC/HTTP2's standard error codes, as our "application layer errors" sit on top. This is because GRPC/HTTP2's standard error codes were designed more for the HTTP layer, although some of them are still used internally by GRPC.
- This is the same philosophy taken by JSON RPC over HTTP (http://www.simple-is-better.org/json-rpc/transport_http.html), where success or error at the application layer is still a 200 response code at the HTTP layer, and HTTP errors are specific to the HTTP layer:
  - 200 OK for responses (both for Response and Error objects)
  - 204 No Response / 202 Accepted for empty responses, i.e. as response to a Notification
  - 307 Temporary Redirect / 308 Permanent Redirect for HTTP-redirections (note that the request may not be automatically retransmitted)
  - 405 Method Not Allowed
    - if HTTP GET is not supported: for all HTTP GET requests
    - if HTTP GET is supported: if the client tries to call a non-safe/non-indempotent method via HTTP GET
  - 415 Unsupported Media Type if the Content-Type is not application/json
  - others for HTTP-errors or HTTP-features outside of the scope of JSON-RPC
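A minimal sketch of the ClientMetadata type and of how the client might wrap a decoded cause in an ErrorPolykeyRemote. NodeId, Host and Port are plain aliases here, whereas PK's real types come from the nodes and network domains; ErrorPolykeyBaseSketch, ErrorUnknownSketch, ErrorPolykeyRemoteSketch and wrapRemoteError are hypothetical names used only for illustration.

```typescript
// Plain aliases standing in for PK's NodeId (nodes) and Host/Port (network)
type NodeId = string;
type Host = string;
type Port = number;
type POJO = { [key: string]: unknown };

// Proposed to live in src/types.ts to avoid import loops
type ClientMetadata = { nodeId: NodeId; host: Host; port: Port };

// Hypothetical stand-in for ErrorPolykey
class ErrorPolykeyBaseSketch extends Error {
  constructor(message: string = '') {
    super(message);
    this.name = new.target.name;
  }
}

// Hypothetical stand-in for ErrorUnknown: keeps the unrecognised JSON in `data`
class ErrorUnknownSketch extends ErrorPolykeyBaseSketch {
  public data: POJO;
  constructor(message: string, data: POJO) {
    super(message);
    this.data = data;
  }
}

// Only the client constructs this; `data` must include ClientMetadata and may
// carry extra fields (e.g. which call was made)
class ErrorPolykeyRemoteSketch<T extends ErrorPolykeyBaseSketch> extends ErrorPolykeyBaseSketch {
  declare cause: T;
  public data: ClientMetadata & POJO;
  constructor(message: string, data: ClientMetadata & POJO, cause: T) {
    super(message);
    this.data = data;
    this.cause = cause;
  }
}

// Hypothetical client-side step: `decoded` is the revived cause, or undefined
// when the root of the JSON data was not a recognised error
function wrapRemoteError(
  decoded: ErrorPolykeyBaseSketch | undefined,
  rawJSON: unknown,
  metadata: ClientMetadata & POJO,
): ErrorPolykeyRemoteSketch<ErrorPolykeyBaseSketch> {
  const cause =
    decoded ?? new ErrorUnknownSketch('Unknown remote error', { json: rawJSON });
  return new ErrorPolykeyRemoteSketch(
    `Remote error from ${metadata.host}:${metadata.port}`,
    metadata,
    cause,
  );
}
```

The intersection ClientMetadata & POJO makes nodeId, host and port mandatory while still allowing arbitrary extra keys to be added later.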
JSON serialisation/deserialisation
Our gRPC toError and fromError utilities will need to be modified to be able to serialise and deserialise the error chain when sending errors between clients/agents or agents/agents.
- These can utilise the replacer/reviver parameters of JSON.stringify/JSON.parse. These functions are not recursive themselves and process objects in fragments, with the traversal orchestrated by stringify/parse.
- The replacer needs to serialise errors into a standardised structure { type: 'ClassName', data: { ... } } and also filter out sensitive data (the stack) when errors are being sent to an agent (as opposed to a client). It works top to bottom, creating a structure first and filtering out unwanted fields afterwards.
- The reviver needs to convert this structure back into its original data type. It works bottom to top, converting fields to their correct types and then finally constructing the finished data type.
  - If the received data can't be deserialised into a known error type, it should be converted to an ErrorPolykeyUnknown, with the unknown data in the data field.
- We also no longer need to separate errors in the gRPC metadata into name/message/data, since this is now just one data structure; this can just be one error field.
- We need to filter out sensitive information when errors are sent from the service handlers; this includes the stack and other special data. This is decided on a case-by-case basis for specific exception classes: the replacer will need to execute a filter that checks against specific exception classes and filters them out. Filter rules should be acquired from src/client and src/agent.
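The replacer/reviver approach above can be sketched as follows. All class names, the errorRegistry map, makeReplacer, serialiseError and deserialiseError are hypothetical; the single includeStack flag stands in for the per-service filter rules that would come from src/client and src/agent.

```typescript
// Standardised wire structure for one error in the chain
type SerialisedError = {
  type: string;
  data: { message: string; stack?: string; cause?: unknown };
};

// Hypothetical PK error base; the real classes come from @matrixai/errors
class ErrorPKSketch extends Error {
  declare cause: unknown;
  constructor(message: string = '', options: { cause?: unknown } = {}) {
    super(message);
    this.name = new.target.name;
    this.cause = options.cause;
  }
}
class ErrorVaultsVaultUndefinedSketch extends ErrorPKSketch {}

// Hypothetical ErrorPolykeyUnknown: holds unrecognised data as-is
class ErrorPolykeyUnknownSketch extends ErrorPKSketch {
  public data: unknown;
  constructor(data: unknown) {
    super('Unknown error received');
    this.data = data;
  }
}

type ErrorCtor = new (message: string, options: { cause?: unknown }) => ErrorPKSketch;

// Only registered classes may be constructed by the reviver
const errorRegistry = new Map<string, ErrorCtor>([
  ['ErrorPKSketch', ErrorPKSketch],
  ['ErrorVaultsVaultUndefinedSketch', ErrorVaultsVaultUndefinedSketch],
]);

// Replacer: runs top to bottom; JSON.stringify calls it again on `data.cause`
function makeReplacer(includeStack: boolean) {
  return (_key: string, value: unknown): unknown => {
    if (!(value instanceof Error)) return value;
    const data: SerialisedError['data'] = { message: value.message };
    if (includeStack) data.stack = value.stack;
    const cause = (value as Error & { cause?: unknown }).cause;
    if (cause !== undefined) data.cause = cause;
    const serialised: SerialisedError = { type: value.name, data };
    return serialised;
  };
}

function isSerialisedError(value: unknown): value is SerialisedError {
  if (typeof value !== 'object' || value === null) return false;
  const { type, data } = value as { type?: unknown; data?: unknown };
  return (
    typeof type === 'string' &&
    typeof data === 'object' &&
    data !== null &&
    typeof (data as { message?: unknown }).message === 'string'
  );
}

// Reviver: runs bottom to top, so `data.cause` has already been revived
function reviver(key: string, value: unknown): unknown {
  if (isSerialisedError(value)) {
    const ErrorClass = errorRegistry.get(value.type);
    if (ErrorClass !== undefined) {
      const error = new ErrorClass(value.data.message, { cause: value.data.cause });
      if (value.data.stack !== undefined) error.stack = value.data.stack;
      return error;
    }
  }
  // Only the root becomes ErrorPolykeyUnknown; nested unknown data stays as-is
  if (key === '' && !(value instanceof ErrorPKSketch)) {
    return new ErrorPolykeyUnknownSketch(value);
  }
  return value;
}

function serialiseError(e: Error, includeStack: boolean): string {
  return JSON.stringify(e, makeReplacer(includeStack));
}

function deserialiseError(json: string): Error {
  return JSON.parse(json, reviver) as Error;
}
```

Using a Map as the registry (rather than a plain object) avoids accidentally resolving type names such as "constructor" through the prototype chain.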
Logging of Errors at the Service Handlers
- Loggers should be added to the service handlers in order to log out errors, since these errors do not shut down the agent process.
- Not all errors should be reported, though: client errors such as validation, authentication and other precondition errors should not be.
- This logger is separate from the logger used for gRPC internals.
- How do we determine whether an error is a client error or a server error? This is decided on a case-by-case basis. Unlike HTTP, we do not have 4xx and 5xx codes emitted at the beginning; all exceptions assume programmatic usage, so there are fewer client errors than server errors. The filter should just check whether the error is part of the "client errors" set, and if so, not emit it. Create a set of "client errors"; this set can be defined in both src/client and src/server (code duplication is fine here), since each service can define its own set of what is considered a "client error".
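A minimal sketch of the "client errors" filter in a service handler, assuming hypothetical error classes and a LoggerLike interface with an error method standing in for the logger PK actually uses. An array of classes is used instead of a literal Set so that instanceof also matches subclasses of each listed class.

```typescript
// Hypothetical base so that `name` reflects the subclass
class ErrorSketch extends Error {
  constructor(message: string = '') {
    super(message);
    this.name = new.target.name;
  }
}
class ErrorValidationSketch extends ErrorSketch {}
class ErrorAuthenticationSketch extends ErrorSketch {}
class ErrorDatabaseSketch extends ErrorSketch {}

// Minimal logger interface assumed for this sketch
interface LoggerLike {
  error(message: string): void;
}

type ErrorClass = new (...args: any[]) => Error;

// Each service (client vs agent) may define its own "client errors"
const clientServiceClientErrors: ReadonlyArray<ErrorClass> = [
  ErrorValidationSketch,
  ErrorAuthenticationSketch,
];

function isClientError(e: unknown, clientErrors: ReadonlyArray<ErrorClass>): boolean {
  return clientErrors.some((C) => e instanceof C);
}

// Called by a service handler before the error is sent back to the caller
function logIfServerError(
  logger: LoggerLike,
  e: unknown,
  clientErrors: ReadonlyArray<ErrorClass>,
): void {
  if (isClientError(e, clientErrors)) return;
  logger.error(e instanceof Error ? `${e.name}: ${e.message}` : String(e));
}
```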
Reporting the Error to the user at the Root Exception Handler
We have 2 root exception handlers, one at the client and one at the agent. This section addresses the client side; however, unless stated otherwise, the same applies to the agent side.
On the client side, this is done in src/bin/polykey.ts.
When an exception is received, we must interpret 3 things:
- The cause chain
- The exitCode
- The metadata in data
According to issue #323, the --format json option doesn't currently affect STDERR. This needs to change now because the errors being reported are quite complex, and during testing our expectation utilities should be parsing JSON to make tests easier to write.
Therefore binUtils.outputFormatter will need to ensure that the error type produces human-formatted output, while the json type can now be the JSON output for exceptions. This is doable now thanks to the toJSON method inherited from AbstractError.
Furthermore, to acquire the options passed into the command, use rootCommand.opts(); this method was added to PolykeyCommand to enable us to acquire the options.
const opts = rootCommand.opts();
if (opts.format === 'json') {
  // Machine-readable: the whole error chain as JSON
  process.stderr.write(
    binUtils.outputFormatter({
      type: 'json',
      data: e.toJSON(),
    }),
  );
} else {
  // Human-readable: use the `error` format
  process.stderr.write(
    binUtils.outputFormatter({
      type: 'error',
      // ...
    }),
  );
}
The desired human format should be something like:
ErrorName: description - message
  K\tV
  K\tV
  cause ErrorName: description - message
    K\tV
    K\tV
Note the usage of \t for separation while spaces are used for indentation. We can change this format later.
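A minimal sketch of rendering the human format for a cause chain. It assumes each error exposes a description string and a data POJO (the ChainedError shape is an assumption, not the AbstractError type), and it assumes two spaces per nesting level; formatErrorHuman is a hypothetical name.

```typescript
// Assumed shape: PK errors expose a per-class description and a data POJO
type ChainedError = Error & {
  description?: string;
  data?: { [key: string]: unknown };
  cause?: unknown;
};

// Hypothetical formatter for the human `error` format: keys and values are
// separated by a tab, nesting uses two spaces per level (an assumption)
function formatErrorHuman(e: unknown, indent: string = '', prefix: string = ''): string {
  if (!(e instanceof Error)) {
    // Non-error causes are printed as-is
    return `${indent}${prefix}${String(e)}\n`;
  }
  const err = e as ChainedError;
  let out = `${indent}${prefix}${err.name}: ${err.description ?? ''} - ${err.message}\n`;
  for (const [key, value] of Object.entries(err.data ?? {})) {
    const rendered = typeof value === 'string' ? value : JSON.stringify(value);
    out += `${indent}  ${key}\t${rendered}\n`;
  }
  if (err.cause !== undefined) {
    // The cause line sits at the same depth as the parent's key/value lines
    out += formatErrorHuman(err.cause, `${indent}  `, 'cause ');
  }
  return out;
}
```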
The exitCode is currently a bit ambiguous; it was originally intended to mean that if this exception was the last exception caught by the process, this code should be used as the process exit code.
It would be better to let the process decide the exit code based on the family of exceptions, but since we have built our exit codes this way, we need to continue using them like this.
However, now that we have a cause chain, we have to decide how to deal with the exit code when we receive ErrorPolykeyRemote, because of this scenario:
C -> A1 -> A2
We must differentiate exceptions that originate from the client, the first agent or the second agent. If an exception comes from A2, we may see an ErrorPolykeyRemote wrapping another ErrorPolykeyRemote.
So, for now, the following policy can be adopted:
- Find the first cause property that is not an ErrorPolykeyRemote, and use its exitCode.
- Otherwise, use the exitCode of the first exception.
This only works because ErrorPolykeyRemote's cause type is limited to ErrorPolykey; it cannot be undefined or anything else.
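One reading of this policy as a minimal sketch: skip over nested remote wrappers (C -> A1 -> A2) until the first non-remote error, and use its exitCode; a top-level error that is not remote simply uses its own. ErrorPolykeyExitSketch, ErrorRemoteExitSketch and resolveExitCode are hypothetical names, and the exit codes in the usage are arbitrary.

```typescript
// Hypothetical stand-in: every PK error carries an exitCode
class ErrorPolykeyExitSketch extends Error {
  public exitCode: number;
  constructor(message: string, exitCode: number) {
    super(message);
    this.name = new.target.name;
    this.exitCode = exitCode;
  }
}

// The remote wrapper's cause is always another PK error, never undefined
class ErrorRemoteExitSketch extends ErrorPolykeyExitSketch {
  declare cause: ErrorPolykeyExitSketch;
  constructor(message: string, cause: ErrorPolykeyExitSketch, exitCode: number) {
    super(message, exitCode);
    this.cause = cause;
  }
}

// Unwrap remote errors until the originating (non-remote) error is found
function resolveExitCode(e: ErrorPolykeyExitSketch): number {
  let current: ErrorPolykeyExitSketch = e;
  while (current instanceof ErrorRemoteExitSketch) {
    current = current.cause;
  }
  return current.exitCode;
}
```

The loop is guaranteed to terminate on a non-remote error only because the remote wrapper's cause is typed as a PK error, which is exactly the restriction noted above.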
Additional context
- MatrixAI/js-db@5172d01
- https://github.com/mathew-kurian/TraceError.js - can use this stack style for reporting
- Integrate Structured Logging js-logger#3 - Needs design integration with js-logger so these 2 libraries would work together
- General Data Validation - Boundary IO locations should be doing data validation and marshalling (and the decommissioning of GenericIdTypes for all IDs) #321 (comment) - discussion on error handling between local node and remote node for agent & client service, realising the need for additional metadata indicating where the exception comes from, as well as handling non-ErrorPolykey errors
- CLI and Client & Agent Service test splitting #311 (comment) - discussion of how error chaining will make debugging of exceptions that originate from outside the domain being tested easier
- Release error chaining and formalise custom errors js-errors#1 (comment) - Initial ideas on error serialisation and deserialisation in this thread
Tasks
- 1. Convert all errors from CustomError to the new AbstractError<T> from @matrixai/errors
- 2. Give all errors a static description and errCode
- 3. In all places where one error is caught and a different one is thrown, include the original error as the cause
- 4. Modify the validation utils to use the new error chaining
- 5. Modify the gRPC fromError and toError functions to properly handle chained errors
- 6. Include a timestamp when any error is thrown
- 7. Add error logging to service handlers
- 8. Update all packages relying on js-errors to 1.1.0 to use fromJSON and toJSON
- 9. PK should adapt its replacer and reviver functionality and bring in "rules" from both src/client and src/agent. We can use DI in the client service vs the agent service to change the serialisation/deserialisation.
- 10. Logging should also use separate rules, one for the client service and one for the agent service.
- 11. Handle the error chain at the root exception handler for src/bin/polykey.ts and src/bin/polykey-agent.ts
- 12. Ensure that AbstractError.fromJSON can handle a non-existent cause property