
Integrate Error Chaining in a js-errors or @matrixai/errors package #304

@CMCDragonkai


Specification

Now that @matrixai/errors is ready, we can use it to incorporate error chaining into PK. This will involve a series of steps:

Chaining

In all places where we catch an error and then throw a new error in its place, we need to include the original error as the cause of the new error, e.g.

} catch (e) {
  throw new SomePolykeyError(e.message, { cause: e });
}

Note that the cause should always be an exception/error. If not specified it defaults to unknown. When an error is thrown, the top-level error should contain the full instances of the errors in its cause chain, as one big nested object. This data structure can be serialised and deserialised recursively, where every error in the chain has a single cause, and that cause is contained as a property within it.

It is important to realise that JSON.stringify removes any entry for which the replacer returns undefined, unless the entry is an array element, in which case the value is replaced with null. Because our cause may be undefined, the cause property may not exist during deserialisation. Any usage of fromJSON must be aware that the cause property may not exist.
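A minimal sketch of the chaining and serialisation behaviour described above. The names ErrorExample, ErrorJSON, errorToJSON, errorFromJSON and isErrorJSON are hypothetical and only for illustration; the real classes and JSON shape in @matrixai/errors may differ.

```typescript
// Hypothetical serialised shape, for illustration only.
type ErrorJSON = {
  type: string;
  data: { message: string; cause?: unknown };
};

// Hypothetical error class that carries an optional cause.
class ErrorExample extends Error {
  public cause: unknown;
  constructor(message: string, options: { cause?: unknown } = {}) {
    super(message);
    this.name = 'ErrorExample';
    this.cause = options.cause;
  }
}

// Recursively convert an error and its cause chain into one nested object.
function errorToJSON(e: Error & { cause?: unknown }): ErrorJSON {
  const cause = e.cause;
  return {
    type: e.name,
    data: {
      message: e.message,
      cause: cause instanceof Error ? errorToJSON(cause) : cause,
    },
  };
}

// Runtime check that a value looks like a serialised error.
function isErrorJSON(value: unknown): value is ErrorJSON {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  if (typeof v.type !== 'string') return false;
  if (typeof v.data !== 'object' || v.data === null) return false;
  return typeof (v.data as Record<string, unknown>).message === 'string';
}

// Rebuild the chain; the cause key may be missing entirely.
function errorFromJSON(json: ErrorJSON): ErrorExample {
  const raw = json.data.cause; // undefined when the key is absent
  const cause = isErrorJSON(raw) ? errorFromJSON(raw) : raw;
  return new ErrorExample(json.data.message, { cause });
}

const leaf = new ErrorExample('disk full');
const top = new ErrorExample('write failed', { cause: leaf });
const wire = JSON.stringify(errorToJSON(top));
const parsed = JSON.parse(wire) as ErrorJSON;
// The innermost error had an undefined cause, so the key was dropped.
const hasCauseKey = 'cause' in (parsed.data.cause as ErrorJSON).data;
// Inside an array, undefined is replaced with null instead.
const arrayCase = JSON.stringify([undefined]);
const revived = errorFromJSON(parsed);
```

Here hasCauseKey is false and arrayCase is the string [null], which is why any fromJSON implementation has to treat a missing cause as a normal case.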

Client Server Error Architecture

toError: ServiceError -> ErrorPolykeyRemote                         fromError: Error -> ServerStatusResponse

                                                                         ┌─────────────────────────────┐
      ┌───────────────────────────────┐                                  │ NodeId 1 - Server           │
      │ Client                        │             Request              │                             │
      │                               │                                  │ ┌─────────────────────────┐ │
      │ Receives toError ServiceError ├──────────────────────────────────┼─► Service Handlers        │ │
      │ Throws ErrorPolykeyRemote     │                                  │ │                         │ │
      │ cause: ErrorX | ErrorUnknown  ◄──────────────────────────────────┼─┤ Sends fromError ErrorX  │ │
      │ data: { nodeID: 1, ... }      │                                  │ └─────────────────────────┘ │
      └───────────────────────────────┘      Response with error         │                             │
                                             JSON string in metadata     └─────────────────────────────┘
                                          error: { type: ErrorX, ... }

It's important to realise:

  • Only the client constructs ErrorPolykeyRemote
  • The ErrorPolykeyRemote should contain nodeId: NodeId, host, port and other connection information in its own data, which helps identify the server that responded with this error when debugging
  • The ErrorPolykeyRemote should also have information about which call triggered it, and perhaps even request information. It can carry a lot of information, which we can add later.
    • Additional client/connection metadata should be set in the data property of ErrorPolykeyRemote
    • Give data a more specific type like data: ClientMetadata & POJO so that it forces that ClientMetadata must be available
    • We can use type ClientMetadata = { nodeId: NodeId; host: Host; port: Port }. And additional information can be provided later. Like what the call was.
    • The ClientMetadata type should be in src/types.ts to avoid import loops, since it has to use the NodeId type from nodes and Host and Port from the network domain.
    • This keeps the type along with ErrorPolykeyRemote together at the top level src.
  • The server only serialises ErrorX, where ErrorX may be any kind of error. The server must filter the information it sends to the client to ensure it is not leaking sensitive information. The filtered information can include the stack, since the stack is rarely required by the client.
  • Even when the server sends the error back to the client, it must also log this error, but only if the error is considered a "server error". That is, if it is a "client error", comparable to 4xx errors in HTTP, then it should not be logged. Client errors include:
    • Authentication
    • Validation
    • Precondition
    • Concurrency Conflict
    • Anything else coming from HTTP 4xx
    • Example: missing object/resource
  • If the client encounters unknown data during deserialisation, it only returns ErrorUnknown at the root of the JSON data. Any other unknown data should be returned as-is.
  • This means all remote errors are always ErrorPolykeyRemote, but the cause may be ErrorUnknown or ErrorX, where ErrorX may contain anything after that (as long as the runtime schema checking during fromJSON calls works)
  • The GRPC metadata key for the serialised error JSON should just be error, in the future if we switch to using JSON RPC, this error JSON will probably be encoded as part of the JSON RPC protocol - see https://www.jsonrpc.org/specification#error_object and https://eth.wiki/json-rpc/json-rpc-error-codes-improvement-proposal
  • GRPC itself only supports "HTTP2" kind of errors, but these codes aren't really being used. All errors from the remote node are considered one kind of HTTP error, that being UNKNOWN. We use that to represent an "application" error. This basically means we don't use most of GRPC/HTTP2's standard error codes, as our "application layer errors" are on top. This is because GRPC/HTTP2's standard error codes were designed more for the HTTP layer, although some of the HTTP error codes are still used internally by GRPC.
  • This is the same philosophy taken by JSON RPC over HTTP: http://www.simple-is-better.org/json-rpc/transport_http.html, where success or error at the application layer is still 200 response code at the HTTP layer. HTTP errors are specific to the HTTP layer.
    200 OK: for responses (both for Response and Error objects)
    204 No Response / 202 Accepted: for empty responses, i.e. as response to a Notification
    307 Temporary Redirect / 308 Permanent Redirect: for HTTP-redirections (note that the request may not be automatically retransmitted)
    405 Method Not Allowed:
      if HTTP GET is not supported: for all HTTP GET requests
      if HTTP GET is supported: if the client tries to call a non-safe/non-idempotent method via HTTP GET
    415 Unsupported Media Type: if the Content-Type is not application/json
    others: for HTTP-errors or HTTP-features outside of the scope of JSON-RPC
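
The ClientMetadata points in the list above could be sketched roughly as follows. NodeId, Host, Port and POJO are placeholder aliases here (in PK they would come from the nodes and network domains), and the constructor signature of ErrorPolykeyRemote is an assumption, not the actual one.

```typescript
// Placeholder aliases; the real types live in the nodes and network domains.
type NodeId = string;
type Host = string;
type Port = number;
type POJO = { [key: string]: any };

// Connection metadata that must always be present on a remote error.
type ClientMetadata = { nodeId: NodeId; host: Host; port: Port };

// Hypothetical shape of the remote error wrapper built by the client.
class ErrorPolykeyRemote extends Error {
  public data: ClientMetadata & POJO;
  public cause: Error;
  constructor(metadata: ClientMetadata & POJO, message: string, cause: Error) {
    super(message);
    this.name = 'ErrorPolykeyRemote';
    this.data = metadata;
    this.cause = cause;
  }
}

// Extra keys such as `command` are allowed by POJO, while leaving out
// nodeId, host or port would be rejected by the type checker.
const remote = new ErrorPolykeyRemote(
  { nodeId: 'node-1', host: '127.0.0.1', port: 1314, command: 'vaultsList' },
  'remote call failed',
  new Error('vault not found'),
);
```

Intersecting ClientMetadata with POJO keeps the data open for later additions (such as what the call was) while still forcing the connection fields to be supplied.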
    

JSON serialisation/deserialisation

Our gRPC toError and fromError utilities will need to be modified to be able to serialise and deserialise the error chain when sending errors between clients/agents or agents/agents.

  • These can utilise JSON's replacer/reviver properties of stringify/parse - this is non-recursive and processes objects in fragments, with everything being orchestrated by stringify/parse
  • The replacer needs to serialise errors into a standardised structure { type: 'ClassName', data: { ... } } and also filter out sensitive data (the stack) when errors are being sent to an agent (as opposed to a client) - it works top to bottom, creating a structure first and filtering out unwanted fields afterwards
  • The reviver needs to convert this structure back into its original data type - it works bottom to top, converting fields to their correct types and then finally constructing the finished data type
    • If the received data can't be deserialised into a known error type it should be converted to an ErrorPolykeyUnknown, with the unknown data in the data field
  • We also no longer need to separate errors in the gRPC metadata into name/message/data and this is now just one data structure - this can just be one error field
  • We need to filter out sensitive information when errors are sent from the service handlers. This includes the stack and other special data. This would be done on a case-by-case basis for specific exception classes. That is, the replacer will need to execute a filter by checking against specific exception classes and filtering them out. Filter rules should be acquired from src/client and src/agent.
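
A rough sketch of the replacer/reviver approach from the list above, using only the standard JSON.stringify and JSON.parse hooks. ErrorX, ErrorUnknown, registry and sensitiveKeys are hypothetical stand-ins, and for brevity this version converts unrecognised error types to ErrorUnknown at any depth and filters by key name rather than by exception class.

```typescript
// Hypothetical error classes for illustration.
class ErrorX extends Error {
  public cause: unknown;
  constructor(message: string, cause?: unknown) {
    super(message);
    this.name = 'ErrorX';
    this.cause = cause;
  }
}

class ErrorUnknown extends Error {
  public data: unknown;
  constructor(data: unknown) {
    super('unknown error data');
    this.name = 'ErrorUnknown';
    this.data = data;
  }
}

type ErrorCtor = new (message: string, cause?: unknown) => Error;

// Known error types the reviver is allowed to construct.
const registry = new Map<string, ErrorCtor>([['ErrorX', ErrorX]]);
// Keys stripped before the error leaves the service handler.
const sensitiveKeys = new Set<string>(['stack']);

// Top to bottom: build { type, data } first, then filter keys as
// stringify descends into the structure that was just returned.
function replacer(key: string, value: unknown): unknown {
  if (sensitiveKeys.has(key)) return undefined;
  if (value instanceof Error) {
    return {
      type: value.name,
      data: {
        message: value.message,
        stack: value.stack,
        cause: (value as Error & { cause?: unknown }).cause,
      },
    };
  }
  return value;
}

// Bottom to top: inner causes are already revived when the outer
// { type, data } object reaches this function.
function reviver(_key: string, value: unknown): unknown {
  if (typeof value !== 'object' || value === null) return value;
  const v = value as Record<string, unknown>;
  if (typeof v.type !== 'string') return value;
  if (typeof v.data !== 'object' || v.data === null) return value;
  const data = v.data as Record<string, unknown>;
  if (typeof data.message !== 'string') return value;
  const ctor = registry.get(v.type);
  if (ctor === undefined) return new ErrorUnknown(value);
  return new ctor(data.message, data.cause);
}

const wire = JSON.stringify(new ErrorX('outer', new ErrorX('inner')), replacer);
const revived = JSON.parse(wire, reviver) as ErrorX;
const unknown = JSON.parse(
  '{"type":"ErrorNope","data":{"message":"?"}}',
  reviver,
) as Error;
```

Because the whole chain travels as one JSON string, it fits in a single error metadata field instead of separate name/message/data entries.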

Logging of Errors at the Service Handlers

  • Loggers should be added to the service handlers in order to log out errors, since these do not shut down the agent process
    • Not all errors should be logged though: validation/authentication/other precondition errors are "client errors" and should be excluded
    • This logger is separate from the logger used for grpc internals
    • How do we determine if an error is for the client or for the server? This is decided on a case-by-case basis. Unlike HTTP, we do not have 4xx and 5xx codes emitted at the beginning, and all exceptions assume programmatic usage. Therefore there are fewer client errors than server errors. The filter should just check whether the error is part of the "client errors" set, and if so, not emit it. Create a set of "client errors". This set can be defined in both src/client and src/agent (code duplication is fine here), since each service can define its own set of what is considered a "client error".
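
One possible shape for the "client errors" filter described above, as a sketch only. The error classes, the Logger type and the function names are all hypothetical; each service would supply its own list.

```typescript
// Hypothetical error classes standing in for real PK errors.
class ErrorAuthentication extends Error {}
class ErrorValidation extends Error {}
class ErrorDatabase extends Error {}

type ErrorClass = new (...args: any[]) => Error;

// Each service (client service, agent service) could define its own list.
const clientErrors: ReadonlyArray<ErrorClass> = [
  ErrorAuthentication,
  ErrorValidation,
];

function isClientError(e: unknown): boolean {
  return clientErrors.some((errorClass) => e instanceof errorClass);
}

// Minimal logger interface; the real logger is assumed to be injected.
type Logger = { error: (msg: string) => void };

// Logs only errors that are not in the client error list.
// Returns true when the error was logged.
function logIfServerError(logger: Logger, e: unknown): boolean {
  if (isClientError(e)) return false;
  logger.error(e instanceof Error ? e.message : String(e));
  return true;
}

const lines: string[] = [];
const logger: Logger = { error: (msg) => lines.push(msg) };
const loggedClient = logIfServerError(logger, new ErrorValidation('bad input'));
const loggedServer = logIfServerError(logger, new ErrorDatabase('connection lost'));
```

In this run only the ErrorDatabase instance reaches the logger, while the validation error is still returned to the caller without being logged.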

Reporting the Error to the user at the Root Exception Handler

We have 2 root exception handlers, one at the client and one at the agent. This addresses the client side, however unless things are different, the same applies to the agent side.

On the client side, this is done in src/bin/polykey.ts.

When an exception is received, we must interpret 3 things:

  • The cause chain
  • The exitCode
  • The metadata in data

According to issue #323, the --format json option doesn't currently affect STDERR. This needs to be done now because the errors being reported are quite complex, and during testing, our expectation utilities should be parsing JSON to make it easier to test.

Therefore binUtils.outputFormatter will need to ensure that the error type produces human-readable output, while the json type can now be used for the JSON output of exceptions. This is doable now due to the toJSON utility function inherited from AbstractError.

Furthermore, in order to acquire the options passed into the command, use rootCommand.opts(). This method was added to PolykeyCommand to enable us to acquire the options.

const opts = rootCommand.opts();
if (opts.format === 'json') {
  // Machine-readable output: serialise the whole error chain as JSON
  process.stderr.write(
    binUtils.outputFormatter({
      type: 'json',
      data: e.toJSON(),
    }),
  );
} else {
  // Human-readable output: use the `error` format
  process.stderr.write(
    binUtils.outputFormatter({
      type: 'error',
      // ...
    }),
  );
}

The desired format for human format should be something like:

ErrorName: description - message
  K\tV
  K\tV
  cause ErrorName: description - message
    K\tV
    K\tV

Note the usage of \t for separation while spaces are used for indentation. We can change this format later.
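
The human format above could be produced by a small recursive function, sketched here under assumptions. ReportedError and formatError are hypothetical names and the field layout is illustrative, not the actual binUtils.outputFormatter implementation.

```typescript
// Hypothetical plain view of an error in the chain.
type ReportedError = {
  name: string;
  description: string;
  message: string;
  data: Record<string, unknown>;
  cause?: ReportedError;
};

// Two spaces per nesting level; key and value separated by a tab.
function formatError(e: ReportedError, depth: number = 0): string {
  const pad = (n: number): string => '  '.repeat(n);
  const prefix = depth > 0 ? 'cause ' : '';
  const lines = [
    `${pad(depth)}${prefix}${e.name}: ${e.description} - ${e.message}`,
  ];
  for (const key of Object.keys(e.data)) {
    lines.push(`${pad(depth + 1)}${key}\t${String(e.data[key])}`);
  }
  if (e.cause !== undefined) {
    lines.push(formatError(e.cause, depth + 1));
  }
  return lines.join('\n');
}

const report = formatError({
  name: 'ErrorPolykeyRemote',
  description: 'Remote error',
  message: 'call failed',
  data: { nodeId: 'n1', port: 1314 },
  cause: {
    name: 'ErrorX',
    description: 'Example',
    message: 'boom',
    data: { code: 1 },
  },
});
```

Each cause is rendered one level deeper than its parent, so the nesting of the printed report mirrors the cause chain.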

The exitCode is currently a bit ambiguous. It was originally intended to mean that if this exception was the last exception caught by the process, this code should be used for the process exit code.

It seems more ideal to allow the process to decide what the exit code should be based on the family of exceptions. But since we have built our exit code this way, we need to continue to use it like this.

However now that we have a cause chain, we have to decide how to deal with the exit code when we are getting ErrorPolykeyRemote. Because of this scenario:

C -> A1 -> A2

We must differentiate exceptions that originate from the client, or the first agent or the second agent. If an exception comes from A2, we may see ErrorPolykeyRemote wrapping another ErrorPolykeyRemote.

So right now a policy can be made:

  1. Find the first cause property that is not ErrorPolykeyRemote, and use that exitCode
  2. Otherwise get the exitCode of the first exception.

This only works because ErrorPolykeyRemote's cause type is limited to ErrorPolykey, it cannot be undefined or anything else.
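
A sketch of that exit code policy, under one reading of it: starting at the root, skip consecutive ErrorPolykeyRemote wrappers and use the exitCode of the first error that is not one, falling back to the root's own exitCode. The class definitions and the numeric codes here are simplified, hypothetical stand-ins.

```typescript
// Simplified stand-ins for the PK error classes.
class ErrorPolykey extends Error {
  public exitCode: number;
  public cause: unknown;
  constructor(message: string, exitCode: number, cause?: unknown) {
    super(message);
    this.exitCode = exitCode;
    this.cause = cause;
  }
}

class ErrorPolykeyRemote extends ErrorPolykey {
  constructor(message: string, cause: ErrorPolykey) {
    // The wrapper's own exit code is an arbitrary placeholder.
    super(message, 1, cause);
  }
}

function resolveExitCode(e: ErrorPolykey): number {
  let current: unknown = e;
  // Unwrap remote wrappers, e.g. C -> A1 -> A2 yields two layers.
  while (current instanceof ErrorPolykeyRemote) {
    current = current.cause;
  }
  // Fall back to the root exception if the chain ends unexpectedly.
  return current instanceof ErrorPolykey ? current.exitCode : e.exitCode;
}

// An error raised on A2, relayed by A1, and received by the client.
const fromA2 = new ErrorPolykeyRemote(
  'relayed by A1',
  new ErrorPolykeyRemote('raised on A2', new ErrorPolykey('not found', 66)),
);
const remoteCode = resolveExitCode(fromA2);
const localCode = resolveExitCode(new ErrorPolykey('bad usage', 64));
```

With these inputs the nested remote error resolves to the innermost error's code (66), and a purely local error keeps its own code (64).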

Additional context

Tasks

  • 1. Convert all errors from CustomError to new AbstractError<T> from @matrixai/errors
  • 2. Give all errors a static description and errCode
  • 3. In all places where one error is caught and a different one is thrown, include the original error as the cause
  • 4. Modify the validation utils to use new error chaining
  • 5. Modify gRPC fromError and toError functions to properly handle chained errors
  • 6. Include a timestamp when any error is thrown
  • 7. Add error logging to service handlers
  • 8. Update all packages relying on js-errors to 1.1.0 to use the fromJSON and toJSON
  • 9. PK should adapt its replacer and reviver functionality and bring in "rules" from both src/client and src/agent. Can use DI in the client service vs the agent service to change the serialisation/deserialisation.
  • 10. Logging should also use separate rules, one for client service, one for agent service.
  • 11. Handle the error chain at the root exception handler for src/bin/polykey.ts and src/bin/polykey-agent.ts
  • 12. Ensure that AbstractError.fromJSON can handle non-existent cause property

Labels

development (Standard development), epic (Big issue with multiple subissues)
