Skip to content

quiche: make cert verification check extended key usage of the cert#18309

Merged
yanavlasov merged 11 commits intoenvoyproxy:mainfrom
danzh2010:certverify
Oct 8, 2021
Merged

quiche: make cert verification check extended key usage of the cert#18309
yanavlasov merged 11 commits intoenvoyproxy:mainfrom
danzh2010:certverify

Conversation

@danzh2010
Copy link
Copy Markdown
Contributor

@danzh2010 danzh2010 commented Sep 29, 2021

Restrict the set of certificates Envoy, as a TLS client, accepts from the peer to only those certificates that contain extendedKeyUsage: serverAuth. The change only affects QUIC.

Risk Level: low, QUIC support is in alpha
Testing: added unit tests in QUIC proof verifier

Signed-off-by: Dan Zhang <danzh@google.com>
Signed-off-by: Dan Zhang <danzh@google.com>
@danzh2010
Copy link
Copy Markdown
Contributor Author

/retest

@repokitteh-read-only
Copy link
Copy Markdown

Retrying Azure Pipelines:
Check envoy-presubmit isn't fully completed, but will still attempt retrying.
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #18309 (comment) was created by @danzh2010.

see: more, trace.

@lizan
Copy link
Copy Markdown
Member

lizan commented Sep 30, 2021

/assign-from @envoyproxy/first-pass-reviewers

@repokitteh-read-only
Copy link
Copy Markdown

@envoyproxy/first-pass-reviewers assignee is @wbpcode

🐱

Caused by: a #18309 (comment) was created by @lizan.

see: more, trace.

return false;
}
// Currently only EnvoyQuicProofVerifier, which is used by the client code, calls this method. So
// hard-code "ssl_server" for now.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that since this method is always called to verify a chain of certs presented by a server, it is fine to use ssl_server here. In other words I don't think we need to mention EnvoyQuicProofVerifier in the comment and instead can simply explain that this method is always call to verify server certs.

What do you think?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I rephrased the comment.

}

TEST_F(EnvoyQuicProofVerifierTest, VerifyCertChainFailureNonServerAuthEKU) {
root_ca_cert_ = R"(-----BEGIN CERTIFICATE-----
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have documentation somewhere that explains how these certs were generated?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Under test/config/integration/certs/ there are bunch of cert configs and certs.sh to generate various kinda cert. I just modified servercert.cfg a bit and run the script.

Commented about where this cert comes from.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect, thanks!

Signed-off-by: Dan Zhang <danzh@google.com>
Signed-off-by: Dan Zhang <danzh@google.com>
Copy link
Copy Markdown
Contributor

@RyanTheOptimist RyanTheOptimist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. Thanks, Dan!

}

TEST_F(EnvoyQuicProofVerifierTest, VerifyCertChainFailureNonServerAuthEKU) {
root_ca_cert_ = R"(-----BEGIN CERTIFICATE-----
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect, thanks!

Copy link
Copy Markdown
Member

@wbpcode wbpcode left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contributions and sorry for the delayed review comments. Add some minor comments to the PR.

In addition, can you add a little more detailed background info about why we need this PR? Although I probably understand the purpose of your PR. But if there is some clear description, it will be more convenient for others to view this part of the code in the future.

Comment on lines +1176 to +1177
error_details = "Failed to verify certificate chain: X509_STORE_CTX_set_default";
return false;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a case to test this if branch?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH, it's not clear to me how X509_STORE_CTX_set_default() would return 0 with "ssl_server". Probably this is impossible if the name is hard-coded to "ssl_server". But I'm not a boring SSL expert.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if it is impossible with the hard-coded parameter, I would still prefer to handle the case where it returns 0 so that future BoringSSL change doesn't break Envoy in a unexpected way.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Get it. But it seems that without this test, the test coverage is not up to the requirements. We can let the maintainer decide whether we can reduce the coverage requirement of this module.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added a few irrelevant tls unit tests to raise coverage.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, this is impossible with a hard-coded parameter. To synthetically generate code coverage, you'd have to allow the name ("ssl_server") to be injectable by the test, and then supply an invalid/unregistered name.

}

TEST_F(EnvoyQuicProofVerifierTest, VerifyCertChainFailureNonServerAuthEKU) {
// Override the CA cert with test/config/integration/certs/ca.pem
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spelling error: ca.pem -> cacert.pem
and
minor suggestion: May be Override the CA cert with cert copied from test/config/integration/certs/cacert.pem. is more clear.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines +224 to +225
// This is a cert same as test/config/integration/certs/servercert.pem but with extKeyUsage:
// clientAuth.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think here should be a more accurate description of the process of generating this cert.

May be: This is a cert generated with the test/config/integration/certs/certs.sh. And the config that used to generate this cert is same as test/config/integration/certs/servercert.cfg but with 'extKeyUsage: clientAuth'.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Signed-off-by: Dan Zhang <danzh@google.com>
Signed-off-by: Dan Zhang <danzh@google.com>
@wbpcode
Copy link
Copy Markdown
Member

wbpcode commented Oct 2, 2021

Thanks, it's look good overall for me. However, it seems that because the newly added code is not covered by the test, the coverage does not meet the requirements.

@wbpcode Can we add a case to test this if branch?
@danzh2010 TBH, it's not clear to me how X509_STORE_CTX_set_default() would return 0 with "ssl_server". Probably this is impossible if the name is hard-coded to "ssl_server". But I'm not a boring SSL expert.
@danzh2010 Even if it is impossible with the hard-coded parameter, I would still prefer to handle the case where it returns 0 so that future BoringSSL change doesn't break Envoy in a unexpected way.
@wbpcode Get it. But it seems that without this test, the test coverage is not up to the requirements. We can let the maintainer decide whether we can reduce the coverage requirement of this module.

@envoyproxy/envoy-maintainers

Signed-off-by: Dan Zhang <danzh@google.com>
Signed-off-by: Dan Zhang <danzh@google.com>
Signed-off-by: Dan Zhang <danzh@google.com>
Signed-off-by: Dan Zhang <danzh@google.com>
@danzh2010
Copy link
Copy Markdown
Contributor Author

/retest

@repokitteh-read-only
Copy link
Copy Markdown

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #18309 (comment) was created by @danzh2010.

see: more, trace.

@wbpcode
Copy link
Copy Markdown
Member

wbpcode commented Oct 6, 2021

/retest

@repokitteh-read-only
Copy link
Copy Markdown

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #18309 (comment) was created by @wbpcode.

see: more, trace.

Comment on lines -89 to -91
bool isThreadSafe() const {
return callbacks_ != nullptr && callbacks_->connection().dispatcher().isThreadSafe();
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we removed this method?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is only called in one place in ASSERT() which is not ignored in coverage tests. In order to raise the coverage, I inlined it instead.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. But I personally don't think this is good choice. 🤣

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just add some other tests to raise up cov or just edit the cov requirements?

Copy link
Copy Markdown
Contributor Author

@danzh2010 danzh2010 Oct 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

c296145 already added some unit tests which is irrelevant to this PR. And now the coverage CI is happy.
As to inlining isThreadSafe() it makes much sense to me as it was a private method called only in one place which is an ASSERT().

Copy link
Copy Markdown
Member

@wbpcode wbpcode Oct 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know it's OK to inline this method. But besides improving cov, what other useful effects does this removing have? 🤔

Copy link
Copy Markdown
Member

@wbpcode wbpcode Oct 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, maybe we shouldn't make this kind of modification that has nothing to do with this PR or feature itself just in order to improve cov. Of course, this is just a personal thought.

Anyway, I have already approve this PR for the maintainer to do further review. Thanks very much for you contributions. 😃

wbpcode
wbpcode previously approved these changes Oct 7, 2021
Copy link
Copy Markdown
Member

@wbpcode wbpcode left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I will give an approve to push the PR to the next step.

It is only called in one place in ASSERT() which is not ignored in coverage tests. In order to raise the coverage, I inlined it instead.

I personally think this is not a good choice but keep it for the further review of @envoyproxy/senior-maintainers

@danzh2010
Copy link
Copy Markdown
Contributor Author

/assign @alyssawilk

if (!X509_STORE_CTX_set_default(ctx.get(), "ssl_server")) {
error_details = "Failed to verify certificate chain: X509_STORE_CTX_set_default";
return false;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: Should you also be inheriting the params from the SSL_CTX? I don't see this file manipulating them in any way, but it's an edge case I wanted to call out. Specifically, the following elements of the SSL_CTX are not carried over, so if they were ever used to configure the SSL_CTX, they wouldn't be propagated to the Quiche path:

  • SSL{_CTX}_set_verify_depth
  • SSL{_CTX}_set1_param
  • SSL{_CTX}_set_purpose
  • SSL{_CTX}_set_trust

Because this code already grabs the SSL_CTX_set_cert_store path (line 1169), perhaps it makes sense to do so here as well?

Suggested change
}
}
if (!X509_VERIFY_PARAM_set1(X509_STORE_CTX_get0_param(ctx.get()),
SSL_CTX_get0_param(ssl_ctx))) {
error_details = "Failed to verify certificate chain: X509_VERIFY_PARAM_set1";
return false;
}

This will copy the X509_VERIFY_PARAM config from the SSL_CTX into the X509_STORE_CTX's verify param as appropriate, and matches the SSL/TLS layer implementation.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To address the testing/coverage question: This is also an "impossible failure" scenario; the only way for this to fail is for SSL_CTX_get0_param to return an invalid parameter (since it will not return nullptr)

Because X509_VERIFY_PARAM is an opaque struct, whose setters validate the incoming parameters, to get an invalid parameter into the SSL_CTX_get0_param, you must have first used one of the X509_VERIFY_PARAM_set functions, which would have similarly validated the parameters then. So this leaves the only possible room for failure being a malloc failure, which presumably will blow things up spectacularly anyways out of necessity :)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Envoy doesn't config those aspects of SSL_CTX objects AFAIK. But sure that we can add X509_VERIFY_PARAM_set1() just in case.

Q please, with X509_VERIFY_PARAM_set1(), do I still need to call X509_STORE_CTX_set_default()?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. The combination - of setting purpose, and then inheriting from the SSL_CTX, more or less matches how BoringSSL/OpenSSL's verifier works (source (ignoring RFC 6125 name matching, which I believe you separately implement)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines +1176 to +1177
error_details = "Failed to verify certificate chain: X509_STORE_CTX_set_default";
return false;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, this is impossible with a hard-coded parameter. To synthetically generate code coverage, you'd have to allow the name ("ssl_server") to be injectable by the test, and then supply an invalid/unregistered name.

Signed-off-by: Dan Zhang <danzh@google.com>
@danzh2010
Copy link
Copy Markdown
Contributor Author

/assign @yanavlasov

@danzh2010
Copy link
Copy Markdown
Contributor Author

/retest

@repokitteh-read-only
Copy link
Copy Markdown

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #18309 (comment) was created by @danzh2010.

see: more, trace.

@danzh2010
Copy link
Copy Markdown
Contributor Author

@sleevi @yanavlasov PTAL

Copy link
Copy Markdown

@sleevi sleevi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM from a Boring/OpenSSL perspective.

@yanavlasov yanavlasov merged commit 351c0ca into envoyproxy:main Oct 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants