Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

meshtls: log errors parsing client certs #2467

Merged
merged 7 commits into from
Sep 27, 2023
Merged

Conversation

hawkw
Copy link
Contributor

@hawkw hawkw commented Sep 11, 2023

Depends on #2465

Currently, if errors occur while parsing a client identity from a TLS certificate, the client_identity function in linkerd-meshtls-rustls will simply discard the error and return None. This means that we cannot easily determine why a connection has no client identity --- there may have been no client cert, but we may also have failed to parse a client cert that was present.

In order to make debugging these issues a little easier, I've changed this function to log any errors returned by rustls-webpki while parsing client certs.

This commit changes the `linkerd-meshtls-rustls` crate to use the
upstream `rustls-webpki` crate, maintained by Rustls, rather than our
fork of `briansmith/webpki` from GitHub. Since `rustls-webpki` includes
the change which was the initial motivation for the `linkerd/webpki`
fork (rustls/webpki#42), we can now depend on upstream.
This picks up the upstream fix for rustls/webpki#167
Currently, if errors occur while parsing a client identity from a TLS
certificate, the `client_identity` function in `linkerd-meshtls-rustls`
will simply discard the error and return `None`. This means that we
cannot easily determine *why* a connection has no client identity ---
there may have been no client cert, but we may also have failed to parse
a client cert that was present.

In order to make debugging these issues a little easier, I've changed
this function to log any errors returned by `rustls-webpki` while
parsing client certs.
@hawkw hawkw requested a review from a team as a code owner September 11, 2023 17:34
Copy link
Member

@olix0r olix0r left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my understanding is that the libraries already have some onerous logging behavior in these cases. Is that true? Can you test what happens when this behavior is triggered?

Base automatically changed from eliza/rustls-webpki to main September 11, 2023 18:03
@hawkw
Copy link
Contributor Author

hawkw commented Sep 11, 2023

my understanding is that the libraries already have some onerous logging behavior in these cases. Is that true? Can you test what happens when this behavior is triggered?

I believe you're thinking of rustls itself, which logs a bunch of stuff from the actual TLS implementation. rustls-webpki does not log anything (and we can confirm this by checking its dependency list for the presence of log or tracing: https://github.com/rustls/webpki/blob/64b97f32efa7ebe87d3e536144b28b04fc6550a5/Cargo.toml#L71-L74).

These errors would occur if a TLS handshake completed successfully, but we couldn't extract a valid DNS name from the client cert in that handshake. In that case, rustls itself wouldn't have logged anything, because the TLS handshake completed successfully.

@olix0r olix0r merged commit e92f325 into main Sep 27, 2023
10 checks passed
@olix0r olix0r deleted the eliza/webpki-error-logs branch September 27, 2023 18:24
olix0r added a commit to linkerd/linkerd2 that referenced this pull request Oct 19, 2023
328826caa updated the balancer's discovery channel to prevent backing up
into the discovery stream by dropping the discovery stream. This results
in balancers becoming permanently stale (should they ever be used
again).

This change modifies the discovery stream so that these errors are fatal
for the balancer. These errors are recorded distinctly by the error counters.

To fix this, we replace the `DiscoverNew` module with a
`discover::NewServices` module that wraps the buffering layer. The
buffer now only holds target metadata, and services are only built as
the entry is dequeued from channel.

This has the (positive) side-effect that the proxy's stack_create_total
metric will not be incremented before the balancer actually uses an
endpoint stack. Previously, this metric would be incremented for all
queued endpoint updates.

We also now log at INFO the address of all additions and removals from a
balancer. This should dramatically improve diagnostics in stale endpoint
situations.

---

* build(deps): bump DavidAnson/markdownlint-cli2-action (linkerd/linkerd2-proxy#2460)
* build(deps): bump tj-actions/changed-files from 36.2.1 to 39.0.2 (linkerd/linkerd2-proxy#2468)
* build(deps): bump EmbarkStudios/cargo-deny-action from 1.5.0 to 1.5.4 (linkerd/linkerd2-proxy#2448)
* meshtls: log errors parsing client certs (linkerd/linkerd2-proxy#2467)
* build(deps): bump actions/checkout from 3.5.0 to 4.1.0 (linkerd/linkerd2-proxy#2474)
* build(deps): bump tj-actions/changed-files from 39.0.2 to 39.2.0 (linkerd/linkerd2-proxy#2475)
* build(deps): bump EmbarkStudios/cargo-deny-action from 1.5.4 to 1.5.5 (linkerd/linkerd2-proxy#2478)
* build(deps): bump DavidAnson/markdownlint-cli2-action (linkerd/linkerd2-proxy#2476)
* build(deps): bump actions/upload-artifact from 3.1.2 to 3.1.3 (linkerd/linkerd2-proxy#2479)
* Render grpc_status metric label as number (linkerd/linkerd2-proxy#2480)
* balance: Log and fail stuck discovery streams. (linkerd/linkerd2-proxy#2484)
* build(deps): update `rustix` to v0.36.16/v0.37.7 (linkerd/linkerd2-proxy#2488)
* balance: Fail the discovery stream on queue backup (linkerd/linkerd2-proxy#2486)

Signed-off-by: Oliver Gould <[email protected]>
hawkw pushed a commit to linkerd/linkerd2 that referenced this pull request Oct 19, 2023
328826caa updated the balancer's discovery channel to prevent backing up
into the discovery stream by dropping the discovery stream. This results
in balancers becoming permanently stale (should they ever be used
again).

This change modifies the discovery stream so that these errors are fatal
for the balancer. These errors are recorded distinctly by the error counters.

To fix this, we replace the `DiscoverNew` module with a
`discover::NewServices` module that wraps the buffering layer. The
buffer now only holds target metadata, and services are only built as
the entry is dequeued from channel.

This has the (positive) side-effect that the proxy's stack_create_total
metric will not be incremented before the balancer actually uses an
endpoint stack. Previously, this metric would be incremented for all
queued endpoint updates.

We also now log at INFO the address of all additions and removals from a
balancer. This should dramatically improve diagnostics in stale endpoint
situations.

---

* build(deps): bump DavidAnson/markdownlint-cli2-action (linkerd/linkerd2-proxy#2460)
* build(deps): bump tj-actions/changed-files from 36.2.1 to 39.0.2 (linkerd/linkerd2-proxy#2468)
* build(deps): bump EmbarkStudios/cargo-deny-action from 1.5.0 to 1.5.4 (linkerd/linkerd2-proxy#2448)
* meshtls: log errors parsing client certs (linkerd/linkerd2-proxy#2467)
* build(deps): bump actions/checkout from 3.5.0 to 4.1.0 (linkerd/linkerd2-proxy#2474)
* build(deps): bump tj-actions/changed-files from 39.0.2 to 39.2.0 (linkerd/linkerd2-proxy#2475)
* build(deps): bump EmbarkStudios/cargo-deny-action from 1.5.4 to 1.5.5 (linkerd/linkerd2-proxy#2478)
* build(deps): bump DavidAnson/markdownlint-cli2-action (linkerd/linkerd2-proxy#2476)
* build(deps): bump actions/upload-artifact from 3.1.2 to 3.1.3 (linkerd/linkerd2-proxy#2479)
* Render grpc_status metric label as number (linkerd/linkerd2-proxy#2480)
* balance: Log and fail stuck discovery streams. (linkerd/linkerd2-proxy#2484)
* build(deps): update `rustix` to v0.36.16/v0.37.7 (linkerd/linkerd2-proxy#2488)
* balance: Fail the discovery stream on queue backup (linkerd/linkerd2-proxy#2486)

Signed-off-by: Oliver Gould <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants