ci: flaky well-known endpoints test #3999

Closed
Tracked by #3246
jmayclin opened this issue May 10, 2023 · 5 comments

Comments

@jmayclin
Contributor

Problem:

The well-known endpoints test occasionally fails to negotiate a TLS connection with www.amazon.com: the s2nc command times out after 5 seconds.

More errors for the well-known endpoint Amazon on TLS 1.0:

__________ test_well_known_endpoints[None-S2N-www.amazon.com-TLS1.0] ___________
Command '['s2nc', '--non-blocking', '-e', '-T', '-f', '../pems/trust-store/ca-bundle.trust.crt', '-c', 'test_all_tls12', '--enter-fips-mode', 'www.amazon.com', '443']' timed out after 5 seconds
s2nc --non-blocking -e -T -f ../pems/trust-store/ca-bundle.trust.crt -c test_all_tls12 --enter-fips-mode www.amazon.com 443

log link

Solution:

Uncertain at the moment. I'm going to keep pasting failures here as I see them until I can hopefully establish a pattern of behavior, e.g. does it always fail on the same TLS version?

Requirements / Acceptance Criteria:

This test must be reliable enough that when it fails, my first thought is "oh no, I have broken something with TLS negotiation" and not "I need to restart the test".

@jmayclin
Contributor Author

TLS 1.2 failure with amazon.com

logs

FAILED test_well_known_endpoints.py::test_well_known_endpoints[KMS-PQ-TLS-1-0-2019-06-S2N-www.amazon.com-TLS1.2]
!!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!! xdist.dsession.Interrupted: stopping after 1 failures !!!!!!!!!!!!!
============== 1 failed, 75 passed, 10 rerun in 62.03s (0:01:02) ===============
py39: exit 2 (62.25 seconds) /codebuild/output/src275075414/src/github.com/aws/s2n-tls/tests/integrationv2> pytest -x -n=2 --maxfail=1 --reruns=2 --cache-clear -rpfsq -o log_cli=true --log-cli-level=INFO --provider-version=openssl-1.0.2 --provider-criterion=off --fips-mode=0 --no-pq=0 /codebuild/output/src275075414/src/github.com/aws/s2n-tls/tests/integrationv2/test_well_known_endpoints.py pid=23849
  py39: FAIL code 2 (62.68=setup[0.43]+cmd[62.25] seconds)
  evaluation failed :( (62.74 seconds)
AssertionError: Command '['s2nc', '--non-blocking', '-e', '-T', '-f', '../pems/trust-store/ca-bundle.crt', '-c', 'KMS-PQ-TLS-1-0-2019-06', 'www.amazon.com', '443']' timed out after 5 seconds s2nc --non-blocking -e -T -f ../pems/trust-store/ca-bundle.crt -c KMS-PQ-TLS-1-0-2019-06 www.amazon.com 443

@jmayclin
Contributor Author

TLS 1.1 failure with amazon.com

FAILED test_well_known_endpoints.py::test_well_known_endpoints[PQ-SIKE-TEST-TLS-1-0-2019-11-S2N-www.amazon.com-TLS1.1]
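
To look for a pattern across the failures pasted so far, the test IDs can be tallied by protocol version and endpoint. This is only a rough sketch: the ID layout `[<policy>-S2N-<endpoint>-<protocol>]` is inferred from the three IDs in this issue, and `tally_failures` is a hypothetical helper, not something in the test suite.

```python
import re
from collections import Counter

# Test IDs copied from the failures reported in this issue.
FAILED_IDS = [
    "test_well_known_endpoints[None-S2N-www.amazon.com-TLS1.0]",
    "test_well_known_endpoints[KMS-PQ-TLS-1-0-2019-06-S2N-www.amazon.com-TLS1.2]",
    "test_well_known_endpoints[PQ-SIKE-TEST-TLS-1-0-2019-11-S2N-www.amazon.com-TLS1.1]",
]

# Assumed layout of the parametrized ID: [<policy>-S2N-<endpoint>-<protocol>]
ID_PATTERN = re.compile(
    r"\[(?P<policy>.+)-S2N-(?P<endpoint>[^\]]+)-(?P<protocol>TLS\d\.\d)\]$"
)


def tally_failures(test_ids):
    """Count failures per protocol version and per endpoint."""
    by_protocol = Counter()
    by_endpoint = Counter()
    for test_id in test_ids:
        match = ID_PATTERN.search(test_id)
        if match is None:
            continue  # skip IDs that do not follow the assumed layout
        by_protocol[match["protocol"]] += 1
        by_endpoint[match["endpoint"]] += 1
    return by_protocol, by_endpoint


if __name__ == "__main__":
    protocols, endpoints = tally_failures(FAILED_IDS)
    print(dict(protocols))
    print(dict(endpoints))
```

With only these three samples, every failure is on www.amazon.com and each is on a different TLS version, so the protocol version does not look like the common factor yet.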

@maddeleine
Contributor

There's an interesting idea in issue #756: we could run both s_client and s2nc, and fail the test only if s2nc fails while s_client succeeds. If both fail, we conclude there's something wrong with the endpoint and not us.
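
A minimal sketch of that cross-check, assuming both clients are on `PATH`. The s2nc arguments mirror the command in the failure logs above, and the `openssl s_client` arguments are the standard `-connect`/`-servername` options. The helper names (`handshake_succeeds`, `classify`, `check_endpoint`) are hypothetical, and how the real integration harness drives s2nc may differ.

```python
import subprocess

TIMEOUT_SECONDS = 5  # same timeout that appears in the failure logs


def s2nc_command(host, port, policy, ca_bundle="../pems/trust-store/ca-bundle.crt"):
    # Mirrors the s2nc invocation shown in the failure logs.
    return ["s2nc", "--non-blocking", "-e", "-T", "-f", ca_bundle,
            "-c", policy, host, str(port)]


def s_client_command(host, port):
    # Reference client: a plain OpenSSL handshake with SNI.
    return ["openssl", "s_client", "-connect", f"{host}:{port}",
            "-servername", host]


def handshake_succeeds(cmd, timeout=TIMEOUT_SECONDS):
    """Return True if cmd exits with status 0 before the timeout."""
    try:
        result = subprocess.run(cmd, stdin=subprocess.DEVNULL,
                                capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


def classify(s2nc_ok, reference_ok):
    """Map the two client results onto a test verdict."""
    if s2nc_ok:
        return "pass"
    if reference_ok:
        return "fail"  # only s2nc failed, so suspect s2n-tls
    return "endpoint-unavailable"  # both failed, so suspect the endpoint


def check_endpoint(host, port, policy, run=handshake_succeeds):
    s2nc_ok = run(s2nc_command(host, port, policy))
    # Only consult the reference client when s2nc failed.
    reference_ok = False if s2nc_ok else run(s_client_command(host, port))
    return classify(s2nc_ok, reference_ok)
```

In a pytest test, the `"endpoint-unavailable"` verdict could map to `pytest.skip(...)` and `"fail"` to an assertion failure. The `run` parameter exists so the decision logic can be exercised without network access.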

@jmayclin
Contributor Author

Certainly seems worth a try.

Although the easier solution is to just remove www.amazon.com from our list and replace it with some other well-known endpoint.

I'd also love for this to be considered high priority: now that the nix job is part of the CI checks, the test has to pass in two jobs instead of one, so its overall failure rate is worse.
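
To put a rough number on that: if each job passed independently with probability p, a commit would need both to pass, which happens with probability p². The sketch below uses a made-up 90% per-job pass rate purely for illustration; the real rate is not measured in this issue, and independence between the jobs is an assumption.

```python
def all_jobs_pass_probability(p_single, jobs):
    """Probability that every one of `jobs` independent runs passes."""
    return p_single ** jobs


# Hypothetical per-job pass rate, for illustration only.
P_SINGLE = 0.90

one_job = all_jobs_pass_probability(P_SINGLE, 1)
two_jobs = all_jobs_pass_probability(P_SINGLE, 2)
print(f"one job: {one_job:.2f}, two jobs: {two_jobs:.2f}")
```

Under those assumptions the chance of a clean CI run drops from 0.90 to roughly 0.81, so a test that was already annoying with one job gets noticeably worse with two.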

@jmayclin
Contributor Author

Resolved in #4884
