Integrate Error Chaining in a js-errors or @matrixai/errors package #304

### Specification

Now that `@matrixai/errors` is ready, we can use it to incorporate error chaining into PK. This will involve a series of steps:

#### Chaining

In all places where we catch an error and then throw a new error in its place, we need to include the original error as the `cause` of the new error, e.g.
```ts
try {
  // ... an operation that may throw
} catch (e) {
  throw new SomePolykeyError(e.message, { cause: e });
}
```

Note that the `cause` should always be an exception/error. If the type of the `cause` is not specified, it defaults to `unknown`. When an error is thrown, the top-level error should contain the full instances of the errors in its cause chain, as one big nested object. This data structure can be serialised and deserialised recursively, since every error in the chain has a single cause, and that cause is contained as a property within it.
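
A minimal sketch of what such a chain looks like at runtime. It assumes a stand-in `ErrorPolykey` class rather than the real `AbstractError`, and the error names and the `causeChain` helper are hypothetical, used only to show that each level holds the full instance of the error beneath it.

```ts
// Stand-in for a Polykey error class (assumption: the real one extends AbstractError)
class ErrorPolykey extends Error {
  public cause: unknown;
  constructor(message = '', options: { cause?: unknown } = {}) {
    super(message);
    this.name = new.target.name;
    this.cause = options.cause;
  }
}

// Hypothetical domain errors used only for illustration
class ErrorDatabaseRead extends ErrorPolykey {}
class ErrorVaultRead extends ErrorPolykey {}

// Follow the single `cause` of each error from the top-level error downwards
function causeChain(e: unknown): unknown[] {
  const chain: unknown[] = [];
  let current: unknown = e;
  // Stop at the end of the chain, or if a cycle would repeat an error
  while (current !== undefined && !chain.includes(current)) {
    chain.push(current);
    current =
      current instanceof Error
        ? (current as { cause?: unknown }).cause
        : undefined;
  }
  return chain;
}

const osError = new Error('ENOENT: no such file or directory');
const dbError = new ErrorDatabaseRead(osError.message, { cause: osError });
const vaultError = new ErrorVaultRead(dbError.message, { cause: dbError });

// names: ['ErrorVaultRead', 'ErrorDatabaseRead', 'Error']
const names = causeChain(vaultError).map((e) => (e as Error).name);
```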

It is important to realise that when the `replacer` returns `undefined` for an object property, `JSON.stringify` omits that property entirely; only array elements are replaced with `null` instead. Because our `cause` may be `undefined`, it may be absent from the serialised data, so any usage of `fromJSON` must be aware that the `cause` property may not exist.
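
The following sketch demonstrates this standard `JSON.stringify`/`JSON.parse` behaviour. The `stack` filter in the first replacer is only an illustration of the kind of filtering described below, not Polykey's actual rule.

```ts
// Replacer returning `undefined` for an object property: the key is omitted entirely
const fromObject = JSON.stringify(
  { message: 'oops', stack: 'Error: oops' },
  (key, value) => (key === 'stack' ? undefined : value),
);
// fromObject === '{"message":"oops"}'

// Replacer returning `undefined` for an array element: it becomes `null`
const fromArray = JSON.stringify(['a', 'b'], (key, value) =>
  key === '1' ? undefined : value,
);
// fromArray === '["a",null]'

// A `cause` that is `undefined` is dropped the same way, even without a replacer
const noCause = JSON.stringify({ message: 'oops', cause: undefined });
// noCause === '{"message":"oops"}'

// So on the deserialising side, `cause` must be treated as optional
const parsed = JSON.parse(noCause) as { message: string; cause?: unknown };
const hasCause = Object.prototype.hasOwnProperty.call(parsed, 'cause');
// hasCause === false
```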

#### Client Server Error Architecture

```
toError: ServiceError -> ErrorPolykeyRemote                         fromError: Error -> ServerStatusResponse

                                                                         ┌─────────────────────────────┐
      ┌───────────────────────────────┐                                  │ NodeId 1 - Server           │
      │ Client                        │             Request              │                             │
      │                               │                                  │ ┌─────────────────────────┐ │
      │ Receives toError ServiceError ├──────────────────────────────────┼─► Service Handlers        │ │
      │ Throws ErrorPolykeyRemote     │                                  │ │                         │ │
      │ cause: ErrorX | ErrorUnknown  ◄──────────────────────────────────┼─┤ Sends fromError ErrorX  │ │
      │ data: { nodeID: 1, ... }      │                                  │ └─────────────────────────┘ │
      └───────────────────────────────┘      Response with error         │                             │
                                             JSON string in metadata     └─────────────────────────────┘
                                          error: { type: ErrorX, ... }
```

It's important to realise:

* Only the client constructs `ErrorPolykeyRemote`
* The `ErrorPolykeyRemote` should contain `nodeId: NodeId`, `host`, `port` and other connection information in its own `data`, to help identify which server responded with this error
* The `ErrorPolykeyRemote` should also record which call triggered it, and possibly request information; more information can be added later.
  - Additional client/connection metadata should be set in the `data` property of `ErrorPolykeyRemote`
  - Give `data` a more specific type like `data: ClientMetadata & POJO` so that `ClientMetadata` is guaranteed to be present
  - We can use `type ClientMetadata = { nodeId: NodeId; host: Host; port: Port }`. Additional information, such as which call was made, can be added later.
  - The `ClientMetadata` type should be in `src/types.ts` to avoid import loops, since it uses the `NodeId` type from the `nodes` domain and `Host` and `Port` from the `network` domain.
  - This keeps the type along with `ErrorPolykeyRemote` together at the top level `src`.
* The server only serialises `ErrorX`, where `ErrorX` may be any kind of error. The server must filter the information it sends to the client to ensure it does not leak sensitive information. This includes the `stack`, which the client rarely needs.
* Even when the server sends the error back to the client, it must also log the error, but only if it is considered a "server error". If it is a "client error", analogous to 4xx errors in HTTP, it should not be logged. Client errors include:
  - Authentication
  - Validation
  - Precondition
  - Concurrency Conflict
  - Anything else coming from HTTP 4xx
  - Missing object/resource (e.g. HTTP 404)
* If the client encounters unknown data during deserialisation, only the root of the JSON data is converted to `ErrorUnknown`. Any other unknown data nested inside it is returned as-is.
* This means all remote errors are **always** `ErrorPolykeyRemote`, but the `cause` may be `ErrorUnknown` or `ErrorX`, where `ErrorX` may contain anything further down its chain (as long as the runtime schema checks in the `fromJSON` calls pass).
* The gRPC metadata key for the serialised error JSON should just be `error`. In the future, if we switch to JSON RPC, this error JSON will probably be encoded as part of the JSON RPC protocol; see https://www.jsonrpc.org/specification#error_object and https://eth.wiki/json-rpc/json-rpc-error-codes-improvement-proposal
* gRPC itself only supports "HTTP2"-style status codes, and we don't really use them: every error from the remote node is reported with the single status code `UNKNOWN`, which we use to represent an "application" error. Our application-layer errors sit on top of gRPC, because gRPC/HTTP2's standard error codes were designed for the HTTP layer, although gRPC still uses some of these codes internally.
* This is the same philosophy taken by JSON RPC over HTTP (http://www.simple-is-better.org/json-rpc/transport_http.html), where both success and error at the application layer use the 200 status code at the HTTP layer; HTTP error codes are reserved for the HTTP layer itself.
  ```
  200 OK
  for responses (both for Response and Error objects)
  204 No Response / 202 Accepted
  for empty responses, i.e. as response to a Notification
  307 Temporary Redirect / 308 Permanent Redirect
  for HTTP-redirections (note that the request may not be automatically retransmitted)
  405 Method Not Allowed
  if HTTP GET is not supported: for all HTTP GET requests
  if HTTP GET is supported: if the client tries to call a non-safe/non-indempotent method via HTTP GET
  415 Unsupported Media Type
  if the Content-Type is not application/json
  others
  for HTTP-errors or HTTP-features outside of the scope of JSON-RPC
  ```
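
A minimal sketch of how `ClientMetadata` and the narrowed `data` type could fit together. The branded `NodeId`/`Host`/`Port` definitions, the stand-in `ErrorPolykey` class, the constructor signature of `ErrorPolykeyRemote` and the `call` field are all assumptions for illustration, not the actual Polykey definitions.

```ts
type POJO = { [key: string]: unknown };

// Stand-ins for the branded types from the `nodes` and `network` domains (assumption)
type NodeId = string & { readonly __brand: 'NodeId' };
type Host = string & { readonly __brand: 'Host' };
type Port = number & { readonly __brand: 'Port' };

// Would live in `src/types.ts` to avoid import loops
type ClientMetadata = { nodeId: NodeId; host: Host; port: Port };

class ErrorPolykey extends Error {
  public data: POJO;
  public cause: unknown;
  constructor(message = '', options: { data?: POJO; cause?: unknown } = {}) {
    super(message);
    this.name = new.target.name;
    this.data = options.data ?? {};
    this.cause = options.cause;
  }
}

class ErrorPolykeyRemote extends ErrorPolykey {
  // Narrowed so the connection metadata is always present
  public data: ClientMetadata & POJO;
  constructor(
    metadata: ClientMetadata & POJO,
    message: string,
    options: { cause: ErrorPolykey },
  ) {
    super(message, { data: metadata, cause: options.cause });
    this.data = metadata;
  }
}

// Only the client constructs this; omitting nodeId, host or port fails to compile
const remoteError = new ErrorPolykeyRemote(
  {
    nodeId: 'v0abc' as NodeId,
    host: '127.0.0.1' as Host,
    port: 1314 as Port,
    call: 'vaultsList', // hypothetical extra metadata
  },
  'Remote error from vaultsList',
  { cause: new ErrorPolykey('Vault does not exist') },
);
```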

#### JSON serialisation/deserialisation

Our gRPC `toError` and `fromError` utilities will need to be modified to be able to serialise and deserialise the error chain when sending errors between clients/agents or agents/agents.

- These can use the `replacer` and `reviver` parameters of `JSON.stringify` and `JSON.parse`. These callbacks are non-recursive: they are called on one key/value pair at a time, while `stringify`/`parse` orchestrate the traversal.
- The replacer needs to serialise errors into a standardised structure `{ type: 'ClassName', data: { ... } }` and also filter out sensitive data (the stack) when errors are being sent to an agent (as opposed to a client). It works top to bottom, creating the structure first and filtering out unwanted fields afterwards.
- The reviver needs to convert this structure back into its original data type. It works bottom to top, converting nested fields to their correct types first and then constructing the finished data type.
  - If the received data can't be deserialised into a known error type it should be converted to an `ErrorPolykeyUnknown`, with the unknown data in the `data` field
- We no longer need to split errors in the gRPC metadata into `name`/`message`/`data`; the error is now a single data structure, so it can be sent as a single `error` field.
- Sensitive information must be filtered out when errors are sent from the service handlers. This includes the `stack` and other special data, and is decided case by case for specific exception classes: the `replacer` will need to check against specific exception classes and filter out their sensitive fields. Filter rules should be acquired from `src/client` and `src/agent`.
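
A minimal sketch of the replacer/reviver pair, assuming a small hand-written error hierarchy and a `Map` registry of known error types. The `makeReplacer` factory, its `includeStack` flag (standing in for the per-service filter rules) and the `{ json: value }` shape used for unknown data are hypothetical, not the `@matrixai/errors` `toJSON`/`fromJSON` API.

```ts
type POJO = { [key: string]: unknown };

// Hypothetical minimal hierarchy standing in for the real Polykey errors
class ErrorPolykey extends Error {
  public data: POJO;
  public cause: unknown;
  constructor(message = '', options: { data?: POJO; cause?: unknown } = {}) {
    super(message);
    this.name = new.target.name;
    this.data = options.data ?? {};
    this.cause = options.cause;
  }
}
class ErrorPolykeyUnknown extends ErrorPolykey {}
class ErrorNodeNotFound extends ErrorPolykey {}

// Only these types may be reconstructed by the reviver
const registry = new Map<string, typeof ErrorPolykey>([
  ['ErrorPolykey', ErrorPolykey],
  ['ErrorPolykeyUnknown', ErrorPolykeyUnknown],
  ['ErrorNodeNotFound', ErrorNodeNotFound],
]);

type SerialisedError = {
  type: string;
  data: { message: string; cause?: unknown; data?: unknown; stack?: unknown };
};

// Runtime schema check for the `{ type, data }` structure
function isSerialisedError(value: unknown): value is SerialisedError {
  if (typeof value !== 'object' || value === null) return false;
  const { type, data } = value as { type?: unknown; data?: unknown };
  return (
    typeof type === 'string' &&
    typeof data === 'object' &&
    data !== null &&
    typeof (data as { message?: unknown }).message === 'string'
  );
}

// `includeStack` is a hypothetical rule injected differently by each service
function makeReplacer(includeStack: boolean) {
  return (_key: string, value: unknown): unknown => {
    if (!(value instanceof Error)) return value;
    // Top to bottom: build the shell; stringify then visits `data` and its `cause`
    const data: POJO = {
      message: value.message,
      cause: (value as { cause?: unknown }).cause,
    };
    if (value instanceof ErrorPolykey) data.data = value.data;
    if (includeStack) data.stack = value.stack;
    return { type: value.name, data };
  };
}

function reviver(key: string, value: unknown): unknown {
  if (isSerialisedError(value)) {
    const ErrorClass = registry.get(value.type);
    if (ErrorClass !== undefined) {
      // Bottom to top: `cause` has already been revived at this point
      const { message, cause, data, stack } = value.data;
      const e = new ErrorClass(message, {
        data: typeof data === 'object' && data !== null ? (data as POJO) : {},
        cause,
      });
      if (typeof stack === 'string') e.stack = stack;
      return e;
    }
  }
  // Unknown data becomes ErrorPolykeyUnknown only at the root; nested data is kept as-is
  if (key === '') {
    return new ErrorPolykeyUnknown('Unknown remote error', { data: { json: value } });
  }
  return value;
}
```

For example, `JSON.parse(JSON.stringify(err, makeReplacer(false)), reviver)` round-trips a known error chain without its stack, while a payload whose root `type` is not in the registry comes back as `ErrorPolykeyUnknown`.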

#### Logging of Errors at the Service Handlers

- Loggers should be added to the service handlers in order to log errors, since these errors do not shut down the agent process
  - Not all errors should be logged: validation, authentication and other precondition errors are client errors and are excluded
  - This logger is separate from the logger used for grpc internals
  - How do we determine whether an error is a client error or a server error? This is decided case by case. Unlike HTTP, we do not emit 4xx and 5xx codes up front, and all exceptions assume programmatic usage, so there are fewer client errors than server errors. The filter should therefore just check membership in a set of "client errors" and skip logging if the error is in it. This set can be defined in both `src/client` and `src/server` (code duplication is fine here), since each service can define its own set of what is considered a "client error".
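
A minimal sketch of the per-service "client errors" filter. All error class names, the `ServiceLogger` interface and `logServiceError` are hypothetical; a plain array is used instead of a `Set` so that `instanceof` also matches subclasses of each listed class.

```ts
class ErrorPolykey extends Error {
  constructor(message = '') {
    super(message);
    this.name = new.target.name;
  }
}

// Hypothetical error classes for illustration
class ErrorValidation extends ErrorPolykey {}
class ErrorAuthentication extends ErrorPolykey {}
class ErrorNodeNotFound extends ErrorPolykey {}
class ErrorDatabaseCorrupted extends ErrorPolykey {}

type ErrorClass = new (...args: any[]) => Error;

// The client service's own "client errors"; the agent service would define its own list
const clientServiceClientErrors: ReadonlyArray<ErrorClass> = [
  ErrorValidation,
  ErrorAuthentication,
  ErrorNodeNotFound, // missing resource, like HTTP 404
];

// Minimal logger interface (assumption, not the js-logger API)
interface ServiceLogger {
  error(message: string): void;
}

function logServiceError(
  e: unknown,
  clientErrors: ReadonlyArray<ErrorClass>,
  logger: ServiceLogger,
): void {
  // Client errors are still sent back to the caller, but are not logged
  if (clientErrors.some((C) => e instanceof C)) return;
  logger.error(e instanceof Error ? `${e.name}: ${e.message}` : String(e));
}
```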

#### Reporting the Error to the user at the Root Exception Handler

We have 2 root exception handlers, one at the client and one at the agent. This section addresses the client side; unless noted otherwise, the same applies to the agent side.

On the client side, this is done in `src/bin/polykey.ts`.

When an exception is received, we must interpret 3 things:

* The `cause` chain
* The `exitCode`
* The metadata in `data`

According to issue #323, the `--format json` option doesn't currently affect STDERR. This needs to be addressed now because the errors being reported are quite complex, and during testing our expectation utilities should parse JSON to make assertions easier.

Therefore `binUtils.outputFormatter` will need to ensure that the `error` type produces human-readable output, while the `json` type can now produce JSON output for exceptions. This is now doable thanks to the `toJSON` method inherited from `AbstractError`.

Furthermore, to acquire the options passed into the command, use `rootCommand.opts()`; this method was added to `PolykeyCommand` to enable this.

```ts
const opts = rootCommand.opts();
if (opts.format === 'json') {
  process.stderr.write(
    binUtils.outputFormatter({
      type: 'json',
      data: e.toJSON(),
    }),
  );
} else {
  // Use the human-readable `error` format
  process.stderr.write(
    binUtils.outputFormatter({
      type: 'error',
      // ...
    }),
  );
}
```

The desired human-readable format should be something like:

```
ErrorName: description - message
  K\tV
  K\tV
  cause ErrorName: description - message
    K\tV
    K\tV
```
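
A minimal sketch of a renderer for this format, assuming each error class carries a static `description` and an instance `data` object. The stand-in classes, their descriptions and the `formatError` function are hypothetical and are not the real `binUtils.outputFormatter`.

```ts
type POJO = { [key: string]: unknown };

class ErrorPolykey extends Error {
  static description = 'Polykey error';
  public data: POJO;
  public cause: unknown;
  constructor(message = '', options: { data?: POJO; cause?: unknown } = {}) {
    super(message);
    this.name = new.target.name;
    this.data = options.data ?? {};
    this.cause = options.cause;
  }
}
// Hypothetical subclasses with their own static descriptions
class ErrorNodeNotFound extends ErrorPolykey {
  static description = 'Node could not be found';
}
class ErrorPolykeyRemote extends ErrorPolykey {
  static description = 'Remote error from RPC call';
}

function formatError(e: unknown, indent = 0, prefix = ''): string {
  const pad = '  '.repeat(indent);
  if (!(e instanceof ErrorPolykey)) {
    // Foreign errors (or non-errors) have no description or data
    const text = e instanceof Error ? `${e.name}: ${e.message}` : String(e);
    return `${pad}${prefix}${text}\n`;
  }
  const { description } = e.constructor as typeof ErrorPolykey;
  let out = `${pad}${prefix}${e.name}: ${description} - ${e.message}\n`;
  for (const [k, v] of Object.entries(e.data)) {
    // `\t` separates key and value; spaces are only used for indentation
    out += `${pad}  ${k}\t${typeof v === 'string' ? v : JSON.stringify(v)}\n`;
  }
  if (e.cause !== undefined) {
    // Each cause is nested one level deeper and prefixed with `cause `
    out += formatError(e.cause, indent + 1, 'cause ');
  }
  return out;
}
```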

Note the usage of `\t` for separation while spaces are used for indentation. We can change this format later.

The `exitCode` is currently a bit ambiguous; it was originally intended to mean that if this exception was the last exception caught by the process, this code should be used as the process exit code.

Ideally, the process would decide the exit code based on the family of exceptions. But since we have built our exit codes this way, we need to continue using them like this.

However, now that we have a `cause` chain, we have to decide how to derive the exit code when we receive an `ErrorPolykeyRemote`, because of this scenario:

```
C -> A1 -> A2
```

We must differentiate exceptions that originate from the client, or the first agent or the second agent. If an exception comes from A2, we may see `ErrorPolykeyRemote` wrapping another `ErrorPolykeyRemote`.

So for now, the following policy applies:

1. Find the first `cause` property that is not `ErrorPolykeyRemote`, and use that `exitCode`
2. Otherwise get the `exitCode` of the first exception.

This only works because `ErrorPolykeyRemote`'s cause type is limited to `ErrorPolykey`, it cannot be undefined or anything else.
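
A minimal sketch of this policy for the `C -> A1 -> A2` scenario. The stand-in classes, the exit code values (1, 69, 70) and the `resolveExitCode` helper are hypothetical; the real errors carry their own `exitCode`.

```ts
class ErrorPolykey extends Error {
  // Hypothetical default; the real classes carry their own `exitCode`
  public exitCode: number = 1;
  public cause: unknown;
  constructor(message = '', options: { cause?: unknown } = {}) {
    super(message);
    this.name = new.target.name;
    this.cause = options.cause;
  }
}
class ErrorNodeNotFound extends ErrorPolykey {
  public exitCode = 69; // hypothetical value
}
class ErrorPolykeyRemote extends ErrorPolykey {
  public exitCode = 70; // hypothetical value
  // The cause of a remote error is always an ErrorPolykey
  constructor(message: string, options: { cause: ErrorPolykey }) {
    super(message, options);
  }
}

function resolveExitCode(e: ErrorPolykey): number {
  let current: unknown = e;
  // 1. Skip every ErrorPolykeyRemote wrapper to reach the originating error
  while (current instanceof ErrorPolykeyRemote) {
    current = current.cause;
  }
  // 2. Otherwise fall back to the top-level exception's exitCode
  return current instanceof ErrorPolykey ? current.exitCode : e.exitCode;
}

// C receives an error from A1, which relayed an error from A2
const fromA2 = new ErrorPolykeyRemote('A1 relayed an error', {
  cause: new ErrorPolykeyRemote('A2 responded with an error', {
    cause: new ErrorNodeNotFound('node missing'),
  }),
});
```

Here `resolveExitCode(fromA2)` yields the `ErrorNodeNotFound` exit code rather than the code of either `ErrorPolykeyRemote` wrapper.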

### Additional context

* https://github.com/MatrixAI/js-db/commit/5172d01e694e7f0f58023fb41c76844da0a6284f
* https://github.com/mathew-kurian/TraceError.js - can use this stack style for reporting
* https://github.com/MatrixAI/js-logger/issues/3 - Needs design integration with js-logger so these 2 libraries would work together
* https://github.com/MatrixAI/js-polykey/pull/321#issuecomment-1028563774 - discussion on error handling between local node and remote node for agent & client service, realising the need for additional metadata indicating where the exception comes from, as well as handling non-ErrorPolykey errors
* https://github.com/MatrixAI/js-polykey/pull/311#issuecomment-1049484685 - discussion of how error chaining will make debugging of exceptions that originate from outside the domain being tested easier
* https://github.com/MatrixAI/js-errors/pull/1#issuecomment-1086832650 - Initial ideas on error serialisation and deserialisation in this thread

### Tasks

- [x] 1. Convert all errors from `CustomError` to new `AbstractError<T>` from `@matrixai/errors`
- [x] 2. Give all errors a static description and `errCode`
- [x] 3. In all places where one error is caught and a different one is thrown, include the original error as the `cause`
- [x] 4. Modify the validation utils to use new error chaining
- [x] 5. Modify gRPC `fromError` and `toError` functions to properly handle chained errors
- [x] 6. Include a timestamp when any error is thrown
- [x] 7. Add error logging to service handlers
- [x] 8. Update all packages relying on `js-errors` to `1.1.0` to use the `fromJSON` and `toJSON`
- [x] 9. PK should adapt its replacer and reviver functionality and bring in "rules" from both `src/client` and `src/agent`. Can use DI in the client service vs the agent service to change the serialisation/deserialisation.
- [x] 10. Logging should also use separate rules, one for client service, one for agent service.
- [x] 11. Handle the error chain at the root exception handler for `src/bin/polykey.ts` and `src/bin/polykey-agent.ts`
- [x] 12. Ensure that `AbstractError.fromJSON` can handle non-existent `cause` property
