test: simulated time block on sleep#10551
Conversation
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
…s in FakeUpstream Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
…which seems to work for the cache test. Signed-off-by: Joshua Marantz <jmarantz@google.com>
|
failure in asan on IpVersions/ProxyFilterIntegrationTest.UpstreamTls/IPv4 in //test/extensions/filters/http/dynamic_forward_proxy:proxy_filter_integration_test |
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
…p on 0ms real time for now Signed-off-by: Joshua Marantz <jmarantz@google.com>
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s), but failed to run 2 pipeline(s). |
Signed-off-by: Joshua Marantz <jmarantz@google.com>
mattklein123
left a comment
There was a problem hiding this comment.
Thanks this is awesome. A few small comments and I will definitely like @alyssawilk to do a review also if possible. Also, can you try to do a big --runs_per_test run over all the tests that use simulated time to make sure we don't have any flakes? It should be all the tests you modified I guess.
/wait
| * Advances time forward by the specified duration. Timers may be triggered on | ||
| * their threads, but unlike advanceTimeWait(), this method does not block | ||
| * waiting for them to complete. |
There was a problem hiding this comment.
Can you add some text to each function as to why a test writer would want to use each one? It may not be immediately clear.
There was a problem hiding this comment.
done: added:
* This function should be used in multi-threaded tests, where other
* threads are running dispatcher loops. Integration tests should usually
* use this variant.
and
* This function should be used in single-threaded tests, in scenarios where
* after time is advanced, the main test thread will run a dispatcher
* loop. Unit tests will often use this variant.
| // This real-time polling delay should not be necessary. But without it, | ||
| // /test/extensions/filters/http/cache:cache_filter_integration_test fails | ||
| // about 40% of the time. |
Signed-off-by: Joshua Marantz <jmarantz@google.com>
|
Running all tests that referenced SimulatedTime with tsan and runs_per_test=100. Will check back tomorrow. |
Signed-off-by: Joshua Marantz <jmarantz@google.com>
|
All the tests that mention SimulatedTime (which covers more than what was modified in this PR) passed with tsan and --runs_per_test=100. There are still races (where there is still real-time there are races) I believe but they didn't come out in that run. |
Signed-off-by: Joshua Marantz <jmarantz@google.com>
|
@alyssawilk can you take another look? thanks. |
alyssawilk
left a comment
There was a problem hiding this comment.
I'm mostly trusting Matt on what's called where, so only two remaining nits. The comments are a huge help, thanks.
As said on slack I'd love to see more than 100 runs since flakes generally show up 1:1000 or less.
As a random aside, the only_one_thread naming makes me sad because the function which is supposed to be called by multi-threaded tests checks "only one thread" and I had to spelunk in order to figure out why that was Ok.
| const MonotonicTime end_time = monotonicTime() + duration; | ||
|
|
||
| while (true) { | ||
| bool cont = true; |
There was a problem hiding this comment.
Renamed to timeout_not_reached
There was a problem hiding this comment.
did you forget to upload? I'm not seeing it?
There was a problem hiding this comment.
Nope; I thought I'd try intentionally not uploading until you were fine with my proposed changes, to avoid wasted CI churn :)
| if (alarms_.empty()) { | ||
| // If no alarms are pending, sleep till the end time. | ||
| setMonotonicTimeAndUnlock(end_time); | ||
| mutex.unlock(); |
There was a problem hiding this comment.
can you comment on the unusual lock pattern?
There was a problem hiding this comment.
Added comment:
// This function runs with the caller-provided mutex held. We need to
// hold this->mutex_ while accessing the timer-queue and blocking on
// callbacks completing. To avoid potential deadlock we must drop
// the caller's mutex before taking ours. We also must care to avoid
// break/continue/return/throw during this non-RAII lock operation.
+1 to change this and make it more clear. I had to look it up also. |
|
I made some renaming comment changes per suggestion. PTAL just at the comments; won't push till we finish iterating on the wording. I also successfully ran 1000 iters of tsan on 67 tests that mentioned SimulatedTime. That took 19 hours. I have not done anything about the |
alyssawilk
left a comment
There was a problem hiding this comment.
Yay on tests, thanks for the overnight run!
Also thanks for picking up the rename - I'd totally prefer a follow-up so you're good there.
| const MonotonicTime end_time = monotonicTime() + duration; | ||
|
|
||
| while (true) { | ||
| bool cont = true; |
There was a problem hiding this comment.
did you forget to upload? I'm not seeing it?
|
oh bleh, you even said, but I only saw the last email. yeah SGTM. |
|
Also fwiw this is all I had to say - headed afk to see if I can rest my headache away but I'm fine with Matt merging if he signs off else I'll TAL when I'm feeling better |
Signed-off-by: Joshua Marantz <jmarantz@google.com>
Signed-off-by: Joshua Marantz <jmarantz@google.com>
|
@mattklein123 you want to take a final look (maybe just at the local-var name-change and comment?) |
Make unit tests to use advanceTimeWait in favor of sleep (see envoyproxy/envoy#10551) Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
Make unit tests to use advanceTimeWait in favor of sleep (see envoyproxy/envoy#10551) Signed-off-by: Otto van der Schaaf <oschaaf@we-amp.com>
Description: SimulatedTimeSystem::sleep() would advance time to trigger firing of timers, but does block on the threads executing those timers completing. That is the behavior does not match real-time sleep(), making it hard to retarget existing integration tests from real to simulated time.
However, there were some unit tests that are not multi-threaded, and used simulated time sleep in fashion where the timer callbacks would run on the same thread, and these need to retain the existing behavior. For these, a new API advanceTime() is now added, which enables callbacks but does not block waiting for them to complete.
Note that this PR does not yet solve all issues with sim-time in integration tests (#10568), as the infrastructure still relies on real-time sleeps in ways that are not fully understood. But it is a step in that direction.
Risk Level: low -- this is is a test/...-only PR.
Testing: /test/... in normal mode, and with tsan.
Docs Changes: n/a
Release Notes: n/a
Fixes: #10567