Attempts to fix flakiness in session_ticket_test #3913

maddeleine · 2023-03-30T20:21:39Z

Resolved issues:

N/A

Description of changes:

There are two types of flakiness seen with this test. One is that we expect the server to issue a new session ticket during the handshake and occasionally it does not. The other is that sometimes the ticket key used for encrypting the session ticket is not the expected ticket key.
We suspect that the cause of the first issue is that, in order to issue a session ticket, the server needs to check whether the ticket key is available. A key is available if its intro time is earlier than the current time. Wall clocks don't guarantee monotonicity, so this check might not pass if the test executes quickly and the key was added close to when it is actually used. To fix this issue I changed the key intro time so that the key is introduced in the past, which means that by the time the key is used, key_intro_time < now.
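A minimal sketch of the timing argument above, assuming the availability check is the strict `key_intro_time < now` comparison described. The names `key_is_usable` and `backdated_intro_time` and the margin value are hypothetical and are not s2n APIs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the server's availability check:
 * a key is usable only once its intro time is strictly in the past. */
bool key_is_usable(uint64_t key_intro_time, uint64_t now)
{
    return key_intro_time < now;
}

/* Hypothetical helper: back-date the intro time by `margin` so the check
 * still passes if the clock has not advanced, or has stepped back by less
 * than `margin`, between adding the key and using it. */
uint64_t backdated_intro_time(uint64_t now, uint64_t margin)
{
    return (now > margin) ? now - margin : 0;
}
```

With an intro time equal to the current time, `key_is_usable` returns false until the clock moves forward, which is the suspected source of the missing ticket. Back-dating removes that dependency on the clock advancing.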
The second issue is caused by the fact that selecting a key is a random weighted selection, so we expect it to actually select the wrong key sometimes. To fix this I put the test in a loop and accepted some rate of failure.
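The loop-with-a-failure-budget idea can be sketched roughly as follows. This is not the code from the PR: `passes_with_allowed_failures`, `fails_n_times`, and `demo_passes` are hypothetical names, and the exact point at which the budget is exhausted is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback type: one run of the check, true on success. */
typedef bool (*attempt_fn)(void *ctx);

/* Repeats `attempt` until it succeeds. Returns false once the number of
 * failed attempts exceeds `allowed_failures` (assumed budget semantics). */
bool passes_with_allowed_failures(attempt_fn attempt, void *ctx, size_t allowed_failures)
{
    size_t failures = 0;
    while (!attempt(ctx)) {
        failures++;
        if (failures > allowed_failures) {
            return false;
        }
    }
    return true;
}

/* Demo stub standing in for "the handshake used the expected key":
 * fails while *remaining_failures is non-zero, then succeeds. */
bool fails_n_times(void *ctx)
{
    size_t *remaining_failures = ctx;
    if (*remaining_failures > 0) {
        (*remaining_failures)--;
        return false;
    }
    return true;
}

/* Demo wrapper: does a check that fails `failures` times stay within budget? */
bool demo_passes(size_t failures, size_t allowed_failures)
{
    return passes_with_allowed_failures(fails_n_times, &failures, allowed_failures);
}
```

The budget bounds the loop so it cannot spin indefinitely, at the cost of tolerating a small number of wrong selections.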

It's very hard to verify that these fixes work since flakiness by nature is hard to replicate. I suspect we may need to iterate on this solution a bit if it gets merged and failures are still observed. But it's a start towards fixing the issue.

Also added an issue to beef up testing around the key selection algorithm: #3922

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

tests/unit/s2n_session_ticket_test.c

lrstewart · 2023-04-06T17:50:37Z

+         * We expect to choose the wrong key 0.02% of the time. This value is drawn from the weight of the expected key, 
+         * which does not change per test run. Therefore, the probability that the test chooses the wrong key
+         * allowed_failures times is 0.0002 ^ 10, which is extremely unlikely to occur.


0.02% of the time is 1/5000. My understanding is that we're seeing this test fail more often than that.

Letting it fail up to 10 times doesn't seem like the right solution. The logic would definitely be wrong if the test failed more often than it succeeded, but that would pass this test. Like, if we're choosing the wrong key 9/10 times, that is very clearly an error.

Probably should mention that we're not seeing this test fail more often than expected. There was a flaw in my testing methodology when I got those results. I can change this to be 0.0002^2, that is still a really small probability, I just wasn't sure what number to choose.
Technically it is possible to choose the wrong key 9/10 times, it's just very improbable.
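For reference, the arithmetic behind both positions, assuming each attempt picks the wrong key independently with probability p and the test fails only when n attempts in a row pick the wrong key (n is a placeholder for however the failure budget is counted):

```latex
\begin{align*}
P(\text{test fails}) &= p^{n} \\
p = 0.0002,\ n = 10: \quad & 0.0002^{10} = 1.024 \times 10^{-37} \\
p = 0.0002,\ n = 2:  \quad & 0.0002^{2} = 4 \times 10^{-8} \\
p = 0.9,\ n = 10:    \quad & 0.9^{10} \approx 0.349
\end{align*}
```

The last line is the reviewer's concern in numbers: if a bug made the selection wrong 9 times out of 10, a budget of 10 would still let the test pass roughly 65% of the time.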

I think it's important to identify what is being tested here. If I'm following the assertions correctly, the test wants to show that when the expected key is selected, it behaves in a certain way. The original test even went as far as to assume that it would never pick another one.

Now if we're wanting to test the distribution of it picking another one, we should have a more specialized test for that, as highlighted in #3922; ideally one that doesn't use self-talk but calls the API directly and has a large sample size to eliminate flakiness.
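As a rough illustration of what a direct, large-sample distribution test could look like, the sketch below samples a weighted selection many times with a fixed seed and compares the observed frequency against the weight. Everything here is hypothetical: `pick_weighted` is a generic cumulative-weight selection and not s2n's key selection code, the weights are assumed to sum to 1, and the SplitMix64 generator is swapped in only to make the run deterministic.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* SplitMix64 pseudo-random generator, used so that the run is repeatable. */
uint64_t splitmix64(uint64_t *state)
{
    uint64_t z = (*state += 0x9E3779B97F4A7C15ULL);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

/* Uniform double in [0, 1) built from the top 53 bits of the generator. */
double next_unit(uint64_t *state)
{
    return (double) (splitmix64(state) >> 11) / 9007199254740992.0; /* 2^53 */
}

/* Generic weighted pick: returns the first index whose cumulative weight
 * exceeds r. Assumes the weights sum to 1 and r is in [0, 1). */
size_t pick_weighted(const double *weights, size_t count, double r)
{
    double cumulative = 0.0;
    for (size_t i = 0; i < count; i++) {
        cumulative += weights[i];
        if (r < cumulative) {
            return i;
        }
    }
    return count - 1; /* guard against floating-point rounding */
}

/* Samples the selection `samples` times and checks that the observed
 * frequency of `index` is within `tolerance` of its weight. */
bool frequency_matches(const double *weights, size_t count, size_t index,
        size_t samples, double tolerance, uint64_t seed)
{
    uint64_t state = seed;
    size_t hits = 0;
    for (size_t i = 0; i < samples; i++) {
        if (pick_weighted(weights, count, next_unit(&state)) == index) {
            hits++;
        }
    }
    double diff = (double) hits / (double) samples - weights[index];
    return (diff < 0 ? -diff : diff) <= tolerance;
}
```

A fixed seed and a tolerance several standard deviations wide keep such a test from becoming a new source of flakiness, while a large sample size still catches a selection that is badly skewed.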

The limit of 10 is mostly just an arbitrary number to be able to say that it's very unlikely this will fail in this way again but not have it spin indefinitely.

Talked online and lowered allowed failures to 1. Also updated the comment.

github-actions bot added the s2n-core team label Mar 30, 2023

maddeleine requested review from lrstewart and goatgoose March 30, 2023 22:32

lrstewart reviewed Mar 31, 2023

maddeleine added 8 commits April 5, 2023 14:50

Fixing flakiness

bde327b

clang fixes

afd98b0

Increase key count

18fabaf

Messing with iterations

ca1a08f

Added const vals

63da7a4

Removed random double generation

305db4c

PR feedback

89b4ba1

Removed flakiness from test

8b8e109

maddeleine force-pushed the flaky_test branch from 4c2c699 to 8b8e109 Compare April 5, 2023 21:50

maddeleine mentioned this pull request Apr 5, 2023

Algorithm for choosing ticket key needs better unit tests #3922

Open

maddeleine requested a review from lrstewart April 5, 2023 22:02

clang fixes

8e2d821

maddeleine requested a review from camshaft April 5, 2023 22:16

lrstewart reviewed Apr 6, 2023

camshaft approved these changes Apr 6, 2023

Lowering allowed_failures

c6f8250

maddeleine requested a review from lrstewart April 6, 2023 21:39

lrstewart approved these changes Apr 11, 2023

Merge branch 'main' into flaky_test

9d58f86

maddeleine enabled auto-merge (squash) April 11, 2023 18:25

maddeleine merged commit 49097f4 into aws:main Apr 11, 2023

maddeleine deleted the flaky_test branch April 11, 2023 19:47

maddeleine mentioned this pull request May 2, 2023

Fix Flaky s2n_session_ticket_test #1563

Closed
