-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
dispatcher: Run zero-delay timeout timers on the next iteration of the event loop #11823
dispatcher: Run zero-delay timeout timers on the next iteration of the event loop #11823
Conversation
…d of the current iteration. Guarded by runtime feature "envoy.reloadable_features.activate_timers_next_event_loop" Signed-off-by: Antonio Vicente <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks this is awesome (especially the comments about libevent). The prod changes LGTM. @jmarantz can you have review the simtime changes and tests?
Signed-off-by: Antonio Vicente <[email protected]>
merge master when you get a chance; I will assume the CI failures are due to flakes outside this PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
awesome work. left some discussion topics, unless you think this effort closes the topic.
/wait |
…rs_next_time Signed-off-by: Antonio Vicente <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
merge master when you get a chance; I will assume the CI failures are due to flakes outside this PR.
Done.
…rs_next_time Signed-off-by: Antonio Vicente <[email protected]>
…rs_next_time Signed-off-by: Antonio Vicente <[email protected]>
…rs_next_time Signed-off-by: Antonio Vicente <[email protected]>
Signed-off-by: Antonio Vicente <[email protected]>
@jmarantz PTAL when you have a chance. Thanks. |
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically looks great, modulo a tiny nit and a question about windows vs non-windows.
/wait |
Maybe play 'master merge roulette' to see if you get lucky? IDK. FWIW the tsan failure was a segv in ServerTest.LogToFile:
Super annoying. No idea why that would segv, and there was no stack trace. |
…rs_next_time Signed-off-by: Antonio Vicente <[email protected]>
In your arm64-release test this failure happens and looks plausibly related
|
This is a determistic failure. I was able to repo locally in coverage mode with asan. Seems related to use of runtime features on timers created too early in the process. It is triggered by use of "-l trace" when running in coverage mode. [ RUN ] IpVersions/ServerInstanceImplTest.LogToFile/IPv4 |
I think that the two possible ways to fix this issue with use of runtime features too early in the process are:
I'm open to suggetions. |
Can we do this conditionally? i.e. check if logger is created or not before logging there. |
Logging happens inside the call to Runtime::runtimeFeatureEnabled. I have the impression that some loggers are created but loggers backed by files end up creating timers during the creation process and run into problems. One possible option would be to check for Runtime::LoaderSingleton::getExisting() before calling Runtime::runtimeFeatureEnabled in the timer class, but that doesn't feel great either. |
Signed-off-by: Antonio Vicente <[email protected]>
…rs_next_time Signed-off-by: Antonio Vicente <[email protected]>
Signed-off-by: Antonio Vicente <[email protected]>
Simulated time relinquishes the lock while activating timers. That allows other threads like the watchdog to insert timers that appear to be in the past. Invariant check makes //test/server:server_test fail about 0.5% of the time under ASAN. Signed-off-by: Antonio Vicente <[email protected]>
Signed-off-by: Antonio Vicente <[email protected]>
Signed-off-by: Antonio Vicente <[email protected]>
…rs_next_time Signed-off-by: Antonio Vicente <[email protected]>
Signed-off-by: Antonio Vicente <[email protected]>
Signed-off-by: Antonio Vicente <[email protected]>
Latest asan failure due to timeout in //test/common/network:udp_listener_impl_test is not reproducible over 2500 local runs. |
@antoniovicente sorry for the churn. Can you merge main one more time? I think tidy should be fixed now. /wait |
…rs_next_time Signed-off-by: Antonio Vicente <[email protected]>
Known flake. I will look at that today. @jmarantz are you OK with this change? We can force merge. |
Commit Message: dispatcher: Run 0-delay timeout timers on the next iteration of the event loop instead of the current iteration.
Additional Description: Processing 0-delay timers in the same loop they are generated can result in long timer callback chains which could starve other operations in the event loop or even result in infinite processing loops. Cases that required same-iteration scheduling behavior for 0-delay timers were refactored to use SchedulableCallback::scheduleCallbackCurrentIteration in #11663, so behavior changes due to this change should be relatively minor.
Risk Level: high, changes to event loop scheduling behavior of timers
Testing: unit
Docs Changes: n/a
Release Notes: n/a
Runtime guard: envoy.reloadable_features.activate_timers_next_event_loop