[release/1.0.0] HttpClientHandler tests failed with SSL Connect error on Debian and OSX #17550
Comments
I cannot repro this locally. The test connects to a remote, public, service (https://revoked.grc.com/). It seems likely that this service was simply having temporary issues when these tests ran. If this ends up being a common occurrence, we should find a way to make these tests more robust (use a local server, or something). But for now, I don't think there's anything to do here. |
Yeah, we've been trying to move off of these remote servers to local loopback servers, but it's been a bit slow-going. In this case, we needed a server capable of HTTP/2. |
This is reproing consistently on release debian runs, reopening this. @ericeil FYI |
We can't have all of our tests use loopback servers. We still need end-to-end coverage. And many times, bugs will not be found when network I/O goes through the loopback adapter. So, we need to consider this before changing more tests to loopback. What we need, instead, is an Xunit attribute with [Retries=n] so that the test can be retried if needed due to network congestion. |
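To make the retry idea above concrete, here is a minimal sketch in Python rather than as an actual Xunit attribute. The names `retry`, `times`, `delay_seconds`, and `exceptions` are hypothetical and chosen for illustration only; nothing like this exists in the test suite as far as this thread shows.

```python
import functools
import time


def retry(times, delay_seconds=0.0, exceptions=(Exception,)):
    """Re-run the wrapped callable until it succeeds, at most `times` attempts in total.

    Hypothetical helper: a stand-in for the proposed [Retries=n] attribute.
    """
    if times < 1:
        raise ValueError("times must be at least 1")

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, times + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == times:
                        raise  # out of attempts: surface the last failure
                    time.sleep(delay_seconds)  # brief pause before the next attempt

        return wrapper

    return decorator
```

A test that talks to a remote endpoint could then be decorated with something like `@retry(times=3, exceptions=(ConnectionError,))`. A wrapper like this only smooths over short-lived failures; it does nothing for a server that stays down for the whole run.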
cc: @CIPop |
All? No, I'm not suggesting "all". But more? Yes. For example, there's no strong reason why the majority of the tests we have using the echo server endpoints for testing things like redirection need to go off to a machine in the cloud.
Retries don't help when servers out of our control go down for periods of time. That was one of the primary reliability issues with some of the certificate-related tests, where the target servers would go down for periods of time, issues that largely went away by moving the innerloop tests to use a loopback server where possible and moving the tests that went off-box to outerloop. That said, I don't actually think that's the case here. I think the version of curl installed on those machines is having trouble with HTTP/2 + HTTPS. |
This looks like a machine configuration issue. I ran:
via the Jenkins Script Console on one of the Debian VMs, and I get this:
Note the line at the end about being "unable to get local issuer certificate". @bartonjs, am I understanding correctly that this means the needed certificate for verification isn't installed on the machine? |
It's like the OpenSSL library is broken on this distro, and not including the provided intermediate certs in the chainwalk.
Note that it has a 0 (the server cert), 1 (the low intermediate), and 2 (the high intermediate). Akamai follows the rules and didn't send 3 (the root). I saved 0 to akamai.cer, 1 to akamai.1.cer, and 2 to akamai.2.cer.
Okay, so it doesn't have a problem with the intermediates... is the chain legal?
So.... I don't know why OpenSSL is refusing to honor the input chain presented on the TLS connection... because that's why the chain is given in the first place... |
Okay, I have an answer for Debian. Ready for crazy? Here we go: TLS says you're supposed to send your server identity certificate and all intermediates (but not the root). In this case, what we get on the connection is:
(For the uninitiated: s=subject, i=issuer... notice how the s on one line is equal to the i on the preceding). libssl takes these certificates and shoves them into the "untrusted" bucket of an X509_STORE_CTX (aka chain builder). "untrusted" doesn't mean "bad", it means "extra context from before I start doing hard work"... the "untrusted" part means "not a source of trust".
libcrypto takes this bucket and starts to build a chain. It builds, well, precisely this chain. It then asks "does the last thing sign itself?", and the answer is "no", so it decides more work is required. Now that more work is required, it says "okay, hash the name of the issuer of the topmost certificate, and look for a file by that name in a provided store directory" (/etc/ssl/certs is the only one provided). In this case, it happens to care about
Sure enough, that file isn't found. Why? Because Debian removed all RSA-1024 certificates from the CA bundle, and GTE CyberTrust Root was an RSA-1024 cert. What's bad about this is that "Baltimore CyberTrust Root" is... well... also a root. That represents a cross-signed certificate authority. But since OpenSSL already found the Baltimore cert that was provided by Akamai, OpenSSL won't look into the store any further.
I must have accidentally ended up on a non-Debian machine when doing my previous testing, because the GTE CyberTrust Root cert was most definitely there (well, the long form was, the hashed name symlink might not have been); SSH'd into a Debian 8(.4?) machine, I get:
akamai.0.cer fails because its issuer (akamai.1.cer) isn't well known. Fixes:
Turns out, we're not the first to have noticed this: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=812708 So, I think we'll want to put in release notes somewhere that (in case they weren't already painfully aware) Debian and TLS chain building aren't really in a happy place right now. The people on that bugreport thread mostly seem to be borrowing Ubuntu's ca-certificates contents to get back online. |
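The chain-building behaviour described in this comment can be modelled roughly as below. This is a toy Python sketch, not OpenSSL code: certificates are reduced to (subject, issuer) name pairs, no signatures are checked, the trust store is a plain dict, and the server and low-intermediate names are placeholders. The `trusted_first` switch is a hypothetical contrast added for illustration; it is not something tested in this thread.

```python
from collections import namedtuple

# A certificate reduced to the two names the chain walk cares about.
Cert = namedtuple("Cert", ["subject", "issuer"])


def build_chain(leaf, untrusted, trust_store, trusted_first=False):
    """Return (trusted, [subjects]) for a simplified chain walk.

    untrusted:   certs sent on the TLS connection (the "untrusted" bucket)
    trust_store: dict of subject -> self-signed root Cert (models /etc/ssl/certs)
    """
    extras = {c.subject: c for c in untrusted if c is not leaf}
    chain = [leaf]

    # Step 1: extend the chain from the untrusted bucket until something
    # signs itself or no further issuer is available.
    while chain[-1].subject != chain[-1].issuer:
        issuer = chain[-1].issuer
        if trusted_first and issuer in trust_store:
            # Hypothetical alternative: prefer a trusted root as soon as one fits.
            chain.append(trust_store[issuer])
            return True, [c.subject for c in chain]
        candidate = extras.pop(issuer, None)  # pop() also prevents loops
        if candidate is None:
            break
        chain.append(candidate)

    # Step 2: only now consult the store, and only for the topmost issuer.
    top = chain[-1]
    if top.subject == top.issuer:
        trusted = top.subject in trust_store
    elif top.issuer in trust_store:
        chain.append(trust_store[top.issuer])
        trusted = True
    else:
        trusted = False  # "unable to get local issuer certificate"
    return trusted, [c.subject for c in chain]


# Placeholder names for certs 0 and 1; cert 2 is the cross-signed Baltimore cert.
server = Cert("server (placeholder)", "low intermediate (placeholder)")
low = Cert("low intermediate (placeholder)", "Baltimore CyberTrust Root")
cross = Cert("Baltimore CyberTrust Root", "GTE CyberTrust Global Root")

# Store modelled on the Debian situation: Baltimore root present, GTE root removed.
debian_store = {
    "Baltimore CyberTrust Root": Cert("Baltimore CyberTrust Root", "Baltimore CyberTrust Root"),
}

legacy_result = build_chain(server, [server, low, cross], debian_store)
trusted_first_result = build_chain(server, [server, low, cross], debian_store, trusted_first=True)
```

In this model the default walk fails because it commits to the cross-signed Baltimore certificate from the connection and then looks only for the GTE root, while the trusted-first variant stops at the self-signed Baltimore root already in the store.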
Thank you for the detailed investigation @bartonjs. Marking this as release notes so we record and reference the known Debian issue |
@bartonjs can you also prepare a PR to disable the 1 failing test for debian in order to get green badges in the /release/1.0.0 branch? |
@joshfree I'm on it. |
Release note contents:
Debian users may experience unexpected failure when using SSL/TLS
When new Root Certificate Authorities are being created, it is not uncommon for the new CA public key to be "cross certified" by an existing Certificate Authority to bootstrap the trust relationship into existing environments. For example, the "Baltimore CyberTrust Root" CA was cross-certified by the existing "GTE CyberTrust Global Root" CA. This process is usually considered to make clients more accepting.
Metadata from the server can cause OpenSSL 1.0.1 to consider the cross-certified certificate chain without considering the direct-root chain. Combined with Debian's removal of some older trusted Root Certificate Authorities in the
Microsoft has no specific guidance to offer users affected by this configuration state. This is currently tracked as bug 812488 in the Debian bug system (https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=812488). |
In both master and release, hence tagging this issue for RTM, pending investigation.
Test runs here
http://dotnet-ci.cloudapp.net/job/dotnet_corefx/job/release_1.0.0/job/debian8.4_debug/1/
http://dotnet-ci.cloudapp.net/job/dotnet_corefx/job/release_1.0.0/job/debian8.4_release/1/
http://dotnet-ci.cloudapp.net/job/dotnet_corefx/job/release_1.0.0/job/osx_debug/78/
http://dotnet-ci.cloudapp.net/job/dotnet_corefx/job/master/job/debian8.4_release/24/testReport/