Skip to content

tls: add OCSP stapling support with configurable stapling policy#12685

Merged
zuercher merged 39 commits intoenvoyproxy:masterfrom
daniel-goldstein:ocsp-stapling
Sep 24, 2020
Merged

tls: add OCSP stapling support with configurable stapling policy#12685
zuercher merged 39 commits intoenvoyproxy:masterfrom
daniel-goldstein:ocsp-stapling

Conversation

@daniel-goldstein
Copy link
Contributor

Signed-off-by: Daniel Goldstein danielgold95@gmail.com

Commit Message: add OCSP stapling support with configurable stapling policy. A pre-fetched OCSP response can be configured with its corresponding certificate via the new ocsp_staple field in the TlsCertificate message. The new ocsp_staple_policy field on DownstreamTlsContext determines whether an OCSP response is required and whether to continue using the TLS certificate for new connections once its OCSP response expires. The ocsp_staple_policy defaults to SKIP_STAPLING_IF_EXPIRED, which allows the operator to omit ocsp_staples from the configuration and will only use OCSP responses that are present and valid. This should therefore not break any existing configurations.

Additional Description: This PR completes* the goals set out in the OCSP stapling proposal and builds on #12307 which introduced the parsing utilities for validating OCSP responses. Counters were added to count client connections that request an OCSP staple as well as how many connections were configured with a staple, had the staple omitted or failed because of not meeting the OCSP policy.

* The design doc specifies a counter for the number of connections that received an OCSP response, but the ocsp_staple_responses counter reports the number of connections configured to deliver a response if the client requests it. This is because certificate selection (and OCSP configuration) occurs at a point during the handshake before the OCSP staple request extension is processed so we do not know at that point whether BoringSSL will ultimately use the OCSP response. This is elaborated on in the code where ocsp_staple_requests is incremented. A workaround can be implemented where additional state is passed to the BoringSSL app data, but changing the app data seemed out of the scope of this PR.

Risk Level: Medium - touches some core functionality of certificate selection but does not alter any existing behavior

Testing: Thorough unit tests both at the context_config and context level. Additional test cases were added to the ssl integration test to ensure handshake succeeds with RSA and ECDSA certificates with and without OCSP responses and with the default and stapling required OCSP policies.

Docs Changes: Added OCSP Stapling subsection in the SSL section of the architecture overview.
Release Notes: Added

Runtime flags: adds runtime options to override certain requirements on OCSP responses, specified in the new OCSP Stapling subsection of the docs.

Signed-off-by: Daniel Goldstein <danielgold95@gmail.com>
Signed-off-by: Daniel Goldstein <danielgold95@gmail.com>
@repokitteh-read-only
Copy link

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to api/envoy/.
CC @envoyproxy/api-watchers: FYI only for changes made to api/envoy/.

🐱

Caused by: #12685 was opened by daniel-goldstein.

see: more, trace.

Signed-off-by: Daniel Goldstein <danielgold95@gmail.com>
Signed-off-by: Daniel Goldstein <danielgold95@gmail.com>
Signed-off-by: Daniel Goldstein <danielgold95@gmail.com>
Signed-off-by: Daniel Goldstein <danielgold95@gmail.com>
Signed-off-by: Daniel Goldstein <danielgold95@gmail.com>
@zuercher
Copy link
Member

/retest

@repokitteh-read-only
Copy link

Retrying Azure Pipelines, to retry CircleCI checks, use /retest-circle.
Cannot retry non-completed check: envoy-presubmit (Linux-x64 fuzz_coverage), please wait.

🐱

Caused by: a #12685 (comment) was created by @zuercher.

see: more, trace.

@daniel-goldstein
Copy link
Contributor Author

/retest

@repokitteh-read-only
Copy link

Retrying Azure Pipelines, to retry CircleCI checks, use /retest-circle.
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #12685 (comment) was created by @daniel-goldstein.

see: more, trace.

Signed-off-by: Daniel Goldstein <danielgold95@gmail.com>
Signed-off-by: Daniel Goldstein <danielgold95@gmail.com>
Signed-off-by: Daniel Goldstein <danielgold95@gmail.com>
Signed-off-by: Daniel Goldstein <danielgold95@gmail.com>
Signed-off-by: Daniel Goldstein <danielgold95@gmail.com>
Signed-off-by: Daniel Goldstein <danielgold95@gmail.com>
Signed-off-by: Daniel Goldstein <danielgold95@gmail.com>
Signed-off-by: Daniel Goldstein <danielgold95@gmail.com>
@zuercher
Copy link
Member

I'll be taking this PR over. Thanks for the great work, @daniel-goldstein!

std::string staple = Config::DataSource::read(source, true, api);
if (source.specifier_case() ==
envoy::config::core::v3::DataSource::SpecifierCase::kInlineString) {
staple = Base64::decode(staple);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a bunch of other fields for tls cert related stuff (private key, password, etc) that use a DataSource but don't have a base64 decode for the string case. Why is this one different? Maybe add comment?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for the other data types, boringSSL will accept with a PEM-encoded DER blob (so base64 with -----BEGIN FOO----- and ------END FOO----- labels) or binary DER.

So the inline_string can contain that PEM-encoded blob without issue and boringSSL silently handles the base64 decoding (and maybe validates the label, I'm not sure) for the password, cert, etc. There isn't a PEM format for OCSP responses that I can find, and boringSSL definitely expects a binary-encoded DER. It literally does nothing with that binary blob except send it to clients that ask for stapling.

For inline_bytes the protobuf yaml/json decoding expects base64. For inline_string I tried hex-encoding the bytes and passing them in inline_string, but the yaml decoder (or protobuf) mangles the bytes. I think it's trying to make a valid UTF-8 string out of my bytes. So we require base64 for inline_string. I'll update the proto docs to note this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case probably just document OCSP shouldn't be specified via inline_string but use inline_bytes instead. Any reason to support base64 encoded inline_string for a new field?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think operators might wish to pass around OCSP responses in base64 format even though there isn't an official PEM encoding. This gives them a way to do that. The distinction is murky in YAML/JSON since inline_bytes has to be a base64-string there, but from a control plane server there is a distinction.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why a control plane couldn't fill inline_bytes but a base-64 encoded inline-string?

"OCSP response must be present under OcspStapleAction::Staple");
auto& resp_bytes = selected_ctx->ocsp_response_->rawBytes();
RELEASE_ASSERT(
SSL_set_ocsp_response(ssl_client_hello->ssl, resp_bytes.data(), resp_bytes.size()), "");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it safe to do this even if the client didn't request a staple? Should this be done in the other callback you added above?

Should OCSP policy affect cert selection if the client didn't request a staple? (That's hard to implement though, and probably not worth changing; I'm just thinking out loud).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's safe to do because boringSSL will only transmit the stapled response if the client requests it (but it's not possible to know at this point whether the client will request it or not).

The OCSP policy only prevents selection of certs where we would fail the connection on the OCSP policy. In that case we try later certs (though in practice I think it's only exactly 1 or 2 certs).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment, saying that this will only be attached if the client asks for it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went ahead and reworked this to not require the client to request OCSP stapling. The reason being that providing a cert with the must-staple extension upgrades the policy to MUST_STAPLE, but it's not guaranteed that all clients will support OCSP. Seems like they probably shouldn't be denied access because of it.

@ggreenway
Copy link
Member

/wait

@zuercher
Copy link
Member

@lizan ready for review again

@lizan
Copy link
Member

lizan commented Sep 17, 2020

@zuercher ping on base64-encoded inline_string, it is against why we introduced DataSource, (which is intended to make string a UTF-8 string only and inline_bytes for binaries). For examples inlined binary HTTP body in static response will be in inline_bytes, I don't think we want to introduce a confusing examples here without concrete reason behind it.

@htuch
Copy link
Member

htuch commented Sep 17, 2020

Yeah, I agree with @lizan on the base64 encoding, ideally we use inline_bytes. The canonical representation of proto3 bytes in JSON/YAML is base64 I think.

@zuercher
Copy link
Member

Ok. I'll drop support for inline_string for OCSP and merge master.

Signed-off-by: Stephan Zuercher <zuercher@gmail.com>
Signed-off-by: Stephan Zuercher <zuercher@gmail.com>
Signed-off-by: Stephan Zuercher <zuercher@gmail.com>
Signed-off-by: Stephan Zuercher <zuercher@gmail.com>
@zuercher
Copy link
Member

Ready for review again.

Copy link
Member

@lizan lizan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just two nit

@lizan lizan added the waiting label Sep 21, 2020
@htuch
Copy link
Member

htuch commented Sep 21, 2020

/lgtm api

Signed-off-by: Stephan Zuercher <zuercher@gmail.com>
Signed-off-by: Stephan Zuercher <zuercher@gmail.com>
@zuercher
Copy link
Member

ping @lizan

@zuercher zuercher merged commit cdd3a83 into envoyproxy:master Sep 24, 2020
switch (ocsp_staple_action) {
case OcspStapleAction::Staple: {
// We avoid setting the OCSP response if the client didn't request it, but doing so is safe.
RELEASE_ASSERT(selected_ctx->ocsp_response_,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi! I was just doing a pass on this PR and had a quick question to understand this better -- why is this a RELEASE_ASSERT?
My read initially was that the ocsp_response_ must be non-null since for OcspStapleAction::Staple we must have set the response (or otherwise thrown an error before if it didn't match its cert). My question is, is this a program invariant?

I was reading this in regards to envoy style:

Per above it's acceptable to turn failures into crash semantics via RELEASE_ASSERT(condition) or PANIC(message) if there is no other sensible behavior, e.g. in OOM (memory/FD) scenarios. Only RELEASE_ASSERT(condition) should be used to validate conditions that might be imposed by the external environment. ASSERT(condition) should be used to document (and check in debug-only builds) program invariants.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants