Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dispatcher: Run zero-delay timeout timers on the next iteration of the event loop #11823

Merged

Conversation

antoniovicente
Copy link
Contributor

@antoniovicente antoniovicente commented Jun 30, 2020

Commit Message: dispatcher: Run 0-delay timeout timers on the next iteration of the event loop instead of the current iteration.
Additional Description: Processing 0-delay timers in the same loop they are generated can result in long timer callback chains which could starve other operations in the event loop or even result in infinite processing loops. Cases that required same-iteration scheduling behavior for 0-delay timers were refactored to use SchedulableCallback::scheduleCallbackCurrentIteration in #11663, so behavior changes due to this change should be relatively minor.
Risk Level: high, changes to event loop scheduling behavior of timers
Testing: unit
Docs Changes: n/a
Release Notes: n/a
Runtime guard: envoy.reloadable_features.activate_timers_next_event_loop

…d of the current iteration.

Guarded by runtime feature "envoy.reloadable_features.activate_timers_next_event_loop"

Signed-off-by: Antonio Vicente <[email protected]>
@antoniovicente antoniovicente changed the title dispatcher: Run 0ms timeout timers on the next iteration of the event loop dispatcher: Run zero-delay timeout timers on the next iteration of the event loop Jun 30, 2020
@mattklein123 mattklein123 self-assigned this Jun 30, 2020
Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks this is awesome (especially the comments about libevent). The prod changes LGTM. @jmarantz can you have review the simtime changes and tests?

Signed-off-by: Antonio Vicente <[email protected]>
@jmarantz
Copy link
Contributor

jmarantz commented Jul 2, 2020

merge master when you get a chance; I will assume the CI failures are due to flakes outside this PR.

jmarantz
jmarantz previously approved these changes Jul 2, 2020
Copy link
Contributor

@jmarantz jmarantz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

awesome work. left some discussion topics, unless you think this effort closes the topic.

source/common/event/libevent_scheduler.h Show resolved Hide resolved
source/common/event/libevent_scheduler.h Show resolved Hide resolved
source/common/event/libevent_scheduler.h Show resolved Hide resolved
@jmarantz
Copy link
Contributor

jmarantz commented Jul 2, 2020

/wait

Copy link
Contributor Author

@antoniovicente antoniovicente left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

merge master when you get a chance; I will assume the CI failures are due to flakes outside this PR.

Done.

source/common/event/libevent_scheduler.h Show resolved Hide resolved
source/common/event/libevent_scheduler.h Show resolved Hide resolved
source/common/event/libevent_scheduler.h Show resolved Hide resolved
@antoniovicente
Copy link
Contributor Author

/wait

@jmarantz PTAL when you have a chance. Thanks.

@jmarantz
Copy link
Contributor

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Contributor

@jmarantz jmarantz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically looks great, modulo a tiny nit and a question about windows vs non-windows.

test/test_common/simulated_time_system_test.cc Outdated Show resolved Hide resolved
@jmarantz
Copy link
Contributor

/wait

@jmarantz
Copy link
Contributor

Maybe play 'master merge roulette' to see if you get lucky? IDK. FWIW the tsan failure was a segv in ServerTest.LogToFile:

2020-07-18T16:31:54.3830193Z [ RUN      ] IpVersions/ServerInstanceImplTest.LogToFile/IPv4
2020-07-18T16:31:54.3830955Z external/bazel_tools/tools/test/collect_coverage.sh: line 131: 16607 Segmentation fault      (core dumped) "$@"

Super annoying. No idea why that would segv, and there was no stack trace.

jmarantz
jmarantz previously approved these changes Jul 23, 2020
@jmarantz
Copy link
Contributor

jmarantz commented Jul 23, 2020

In your arm64-release test this failure happens and looks plausibly related

2020-07-23T16:55:24.5113964Z [ RUN      ] TimeSystemType/GuardDogDeathTest.MultiKillThresholdDeathTest/1
2020-07-23T16:55:24.5114490Z TestRandomGenerator running with seed 656857712
2020-07-23T16:55:24.5115292Z -- Test timed out at 2020-07-23 16:55:24 UTC --

@antoniovicente
Copy link
Contributor Author

antoniovicente commented Jul 23, 2020

Maybe play 'master merge roulette' to see if you get lucky? IDK. FWIW the tsan failure was a segv in ServerTest.LogToFile:

2020-07-18T16:31:54.3830193Z [ RUN      ] IpVersions/ServerInstanceImplTest.LogToFile/IPv4
2020-07-18T16:31:54.3830955Z external/bazel_tools/tools/test/collect_coverage.sh: line 131: 16607 Segmentation fault      (core dumped) "$@"

Super annoying. No idea why that would segv, and there was no stack trace.

cc @lizan

This segfault on IpVersions/ServerInstanceImplTest.LogToFile/IPv4 failure has happened at least twice in CI on this PR so it may be a real issue. I don't know how to repo, not having a crash stack makes debugging difficult.

This is a determistic failure. I was able to repo locally in coverage mode with asan. Seems related to use of runtime features on timers created too early in the process. It is triggered by use of "-l trace" when running in coverage mode.

[ RUN ] IpVersions/ServerInstanceImplTest.LogToFile/IPv4
source/common/common/logger_delegates.cc:21:14: runtime error: member call on misaligned address 0xbebebebebebebebe for type 'Envoy::AccessLog::AccessLogFile', which requires 8 byte alignment
0xbebebebebebebebe: note: pointer points here

#0 0x2121692f in Envoy::Logger::FileSinkDelegate::log(std::__1::basic_string_view<char, std::1::char_traits >) ??:?
#1 0x261ccabc in Envoy::Logger::DelegatingLogSink::log(spdlog::details::log_msg const&) ??:?
#2 0x16b75c4a in spdlog::logger::sink_it
(spdlog::details::log_msg const&) ??:?
#3 0x16dd6079 in spdlog::logger::log_it
(spdlog::details::log_msg const&, bool, bool) ??:?
#4 0x17d916a2 in void spdlog::logger::log<std::__1::basic_string_view<char, std::__1::char_traits > >(spdlog::source_loc, spdlog::level::level_enum, fmt::v6::basic_string_view, std::__1::basic_string_view<char, std::__1::char_traits > const&) ??:?
#5 0x21057a03 in Envoy::Runtime::runtimeFeatureEnabled(std::__1::basic_string_view<char, std::__1::char_traits >) ??:?
#6 0x1fe22052 in Envoy::Event::TimerImpl::TimerImpl(Envoy::CSmartPtr<event_base, &event_base_free>&, std::__1::function<void ()>, Envoy::Event::Dispatcher&) ??:?
#7 0x1fe1bc8a in std::__1::__unique_ifEnvoy::Event::TimerImpl::__unique_single std::__1::make_unique<Envoy::Event::TimerImpl, Envoy::CSmartPtr<event_base, &event_base_free>&, std::__1::function<void ()> const&, Envoy::Event::Dispatcher&>(Envoy::CSmartPtr<event_base, &event_base_free>&, std::__1::function<void ()> const&, Envoy::Event::Dispatcher&) libevent_scheduler.cc:?
#8 0x1fe175f8 in Envoy::Event::LibeventScheduler::createTimer(std::__1::function<void ()> const&, Envoy::Event::Dispatcher&) ??:?
#9 0x1fe155c3 in Envoy::Event::(anonymous namespace)::RealScheduler::createTimer(std::__1::function<void ()> const&, Envoy::Event::Dispatcher&) real_time_system.cc:?
#10 0x1f5486c9 in Envoy::Event::DispatcherImpl::createTimerInternal(std::__1::function<void ()>) ??:?
#11 0x1f5483ab in Envoy::Event::DispatcherImpl::createTimer(std::__1::function<void ()>) ??:?
#12 0x18f389bf in Envoy::AccessLog::AccessLogFileImpl::AccessLogFileImpl(std::__1::unique_ptr<Envoy::Filesystem::File, std::__1::default_deleteEnvoy::Filesystem::File >&&, Envoy::Event::Dispatcher&, Envoy::Thread::BasicLockable&, Envoy::AccessLogFileStats&, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000l> >, Envoy::Thread::ThreadFactory&) ??:?
#13 0x18f62a33 in std::__1::__compressed_pair_elem<Envoy::AccessLog::AccessLogFileImpl, 1, false>::__compressed_pair_elem<std::__1::unique_ptr<Envoy::Filesystem::File, std::__1::default_deleteEnvoy::Filesystem::File >&&, Envoy::Event::Dispatcher&, Envoy::Thread::BasicLockable&, Envoy::AccessLogFileStats&, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000l> > const&, Envoy::Thread::ThreadFactory&, 0ul, 1ul, 2ul, 3ul, 4ul, 5ul>(std::__1::piecewise_construct_t, std::__1::tuple<std::__1::unique_ptr<Envoy::Filesystem::File, std::__1::default_deleteEnvoy::Filesystem::File >&&, Envoy::Event::Dispatcher&, Envoy::Thread::BasicLockable&, Envoy::AccessLogFileStats&, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000l> > const&, Envoy::Thread::ThreadFactory&>, std::__1::__tuple_indices<0ul, 1ul, 2ul, 3ul, 4ul, 5ul>) ??:?
#14 0x18f60852 in std::__1::__compressed_pair<std::__1::allocatorEnvoy::AccessLog::AccessLogFileImpl, Envoy::AccessLog::AccessLogFileImpl>::__compressed_pair<std::__1::allocatorEnvoy::AccessLog::AccessLogFileImpl&, std::__1::unique_ptr<Envoy::Filesystem::File, std::__1::default_deleteEnvoy::Filesystem::File >&&, Envoy::Event::Dispatcher&, Envoy::Thread::BasicLockable&, Envoy::AccessLogFileStats&, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000l> > const&, Envoy::Thread::ThreadFactory&>(std::__1::piecewise_construct_t, std::__1::tuple<std::__1::allocatorEnvoy::AccessLog::AccessLogFileImpl&>, std::__1::tuple<std::__1::unique_ptr<Envoy::Filesystem::File, std::__1::default_deleteEnvoy::Filesystem::File >&&, Envoy::Event::Dispatcher&, Envoy::Thread::BasicLockable&, Envoy::AccessLogFileStats&, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000l> > const&, Envoy::Thread::ThreadFactory&>) ??:?
#15 0x18f5f0bc in std::__1::__shared_ptr_emplace<Envoy::AccessLog::AccessLogFileImpl, std::__1::allocatorEnvoy::AccessLog::AccessLogFileImpl >::__shared_ptr_emplace<std::__1::unique_ptr<Envoy::Filesystem::File, std::__1::default_deleteEnvoy::Filesystem::File >, Envoy::Event::Dispatcher&, Envoy::Thread::BasicLockable&, Envoy::AccessLogFileStats&, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000l> > const&, Envoy::Thread::ThreadFactory&>(std::__1::allocatorEnvoy::AccessLog::AccessLogFileImpl, std::__1::unique_ptr<Envoy::Filesystem::File, std::__1::default_deleteEnvoy::Filesystem::File >&&, Envoy::Event::Dispatcher&, Envoy::Thread::BasicLockable&, Envoy::AccessLogFileStats&, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000l> > const&, Envoy::Thread::ThreadFactory&) ??:?
#16 0x18f4f0b5 in ZNSt3__111make_sharedIN5Envoy9AccessLog17AccessLogFileImplEJNS_10unique_ptrINS1_10Filesystem4FileENS_14default_deleteIS6_EEEERNS1_5Event10DispatcherERNS1_6Thread13BasicLockableERNS1_18AccessLogFileStatsERKNS_6chrono8durationIxNS_5ratioILl1ELl1000EEEEERNSD_13ThreadFactoryEEEENS_9enable_ifIXntsr8is_arrayIT_EE5valueENS_10shared_ptrISS_EEE4typeEDpOT0 access_log_manager_impl.cc:?
#17 0x18f37fdc in Envoy::AccessLog::AccessLogManagerImpl::createAccessLog(std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator > const&) ??:?
#18 0x212165af in Envoy::Logger::FileSinkDelegate::FileSinkDelegate(std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator > const&, Envoy::AccessLog::AccessLogManager&, std::__1::shared_ptrEnvoy::Logger::DelegatingLogSink) ??:?
#19 0x1859c8e8 in std::__1::__unique_ifEnvoy::Logger::FileSinkDelegate::__unique_single std::__1::make_unique<Envoy::Logger::FileSinkDelegate, std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator > const&, Envoy::AccessLog::AccessLogManagerImpl&, std::__1::shared_ptrEnvoy::Logger::DelegatingLogSink >(std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator > const&, Envoy::AccessLog::AccessLogManagerImpl&, std::__1::shared_ptrEnvoy::Logger::DelegatingLogSink&&) server.cc:?
#20 0x184c18d5 in Envoy::Server::InstanceImpl::InstanceImpl(Envoy::Init::Manager&, Envoy::Server::Options const&, Envoy::Event::TimeSystem&, std::__1::shared_ptr<Envoy::Network::Address::Instance const>, Envoy::ListenerHooks&, Envoy::Server::HotRestart&, Envoy::Stats::StoreRoot&, Envoy::Thread::BasicLockable&, Envoy::Server::ComponentFactory&, std::__1::unique_ptr<Envoy::Random::RandomGenerator, std::__1::default_deleteEnvoy::Random::RandomGenerator >&&, Envoy::ThreadLocal::Instance&, Envoy::Thread::ThreadFactory&, Envoy::Filesystem::Instance&, std::__1::unique_ptr<Envoy::ProcessContext, std::__1::default_deleteEnvoy::ProcessContext >) ??:?
#21 0x16f32984 in std::__1::__unique_ifEnvoy::Server::InstanceImpl::__unique_single std::__1::make_unique<Envoy::Server::InstanceImpl, Envoy::Init::Manager&, testing::NiceMockEnvoy::Server::MockOptions&, Envoy::Event::GlobalTimeSystem&, std::__1::shared_ptrEnvoy::Network::Address::Ipv4Instance, Envoy::DefaultListenerHooks&, testing::NiceMockEnvoy::Server::MockHotRestart&, Envoy::Stats::TestIsolatedStoreImpl&, Envoy::Thread::MutexBasicLockable&, Envoy::Server::TestComponentFactory&, std::__1::unique_ptr<testing::NiceMockEnvoy::Random::MockRandomGenerator, std::__1::default_delete<testing::NiceMockEnvoy::Random::MockRandomGenerator > >, Envoy::ThreadLocal::InstanceImpl&, Envoy::Thread::ThreadFactory&, Envoy::Filesystem::Instance&, std::__1::unique_ptr<Envoy::ProcessContextImpl, std::__1::default_deleteEnvoy::ProcessContextImpl > >(Envoy::Init::Manager&, testing::NiceMockEnvoy::Server::MockOptions&, Envoy::Event::GlobalTimeSystem&, std::__1::shared_ptrEnvoy::Network::Address::Ipv4Instance&&, Envoy::DefaultListenerHooks&, testing::NiceMockEnvoy::Server::MockHotRestart&, Envoy::Stats::TestIsolatedStoreImpl&, Envoy::Thread::MutexBasicLockable&, Envoy::Server::TestComponentFactory&, std::__1::unique_ptr<testing::NiceMockEnvoy::Random::MockRandomGenerator, std::__1::default_delete<testing::NiceMockEnvoy::Random::MockRandomGenerator > >&&, Envoy::ThreadLocal::InstanceImpl&, Envoy::Thread::ThreadFactory&, Envoy::Filesystem::Instance&, std::__1::unique_ptr<Envoy::ProcessContextImpl, std::__1::default_deleteEnvoy::ProcessContextImpl >&&) server_test.cc:?
#22 0x169feb65 in Envoy::Server::(anonymous namespace)::ServerInstanceImplTestBase::initialize(std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator > const&, bool) server_test.cc:?
#23 0x16a40152 in Envoy::Server::(anonymous namespace)::ServerInstanceImplTestBase::initialize(std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator > const&) server_test.cc:?
#24 0x16ae6dd7 in Envoy::Server::(anonymous namespace)::ServerInstanceImplTest_LogToFile_Test::TestBody() server_test.cc:?
#25 0x264f4ee0 in void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::)(), char const) ??:?
#26 0x264a3dc7 in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::)(), char const) ??:?
#27 0x2645c33f in testing::Test::Run() ??:?
#28 0x2645e7df in testing::TestInfo::Run() ??:?
#29 0x264602c0 in testing::TestSuite::Run() ??:?
#30 0x26489a44 in testing::internal::UnitTestImpl::RunAllTests() ??:?
#31 0x2650ba60 in bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::)(), char const) ??:?
#32 0x264ad6ac in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::)(), char const) ??:?
#33 0x2648846b in testing::UnitTest::Run() ??:?
#34 0x1f2ceff5 in RUN_ALL_TESTS() ??:?
#35 0x1f2cccf9 in Envoy::TestRunner::RunTests(int, char**) ??:?
#36 0x1f2c5de8 in main ??:?
#37 0x7fe619924e0a in __libc_start_main /build/glibc-M65Gwz/glibc-2.30/csu/../csu/libc-start.c:308
#38 0x16905fa9 in _start ??:?

@antoniovicente
Copy link
Contributor Author

Maybe play 'master merge roulette' to see if you get lucky? IDK. FWIW the tsan failure was a segv in ServerTest.LogToFile:

2020-07-18T16:31:54.3830193Z [ RUN      ] IpVersions/ServerInstanceImplTest.LogToFile/IPv4
2020-07-18T16:31:54.3830955Z external/bazel_tools/tools/test/collect_coverage.sh: line 131: 16607 Segmentation fault      (core dumped) "$@"

Super annoying. No idea why that would segv, and there was no stack trace.

cc @lizan
This segfault on IpVersions/ServerInstanceImplTest.LogToFile/IPv4 failure has happened at least twice in CI on this PR so it may be a real issue. I don't know how to repo, not having a crash stack makes debugging difficult.

This is a determistic failure. I was able to repo locally in coverage mode with asan. Seems related to use of runtime features on timers created too early in the process. It is triggered by use of "-l trace" when running in coverage mode.

[ RUN ] IpVersions/ServerInstanceImplTest.LogToFile/IPv4
source/common/common/logger_delegates.cc:21:14: runtime error: member call on misaligned address 0xbebebebebebebebe for type 'Envoy::AccessLog::AccessLogFile', which requires 8 byte alignment
0xbebebebebebebebe: note: pointer points here

#0 0x2121692f in Envoy::Logger::FileSinkDelegate::log(std::__1::basic_string_view<char, std::1::char_traits >) ??:? #1 0x261ccabc in Envoy::Logger::DelegatingLogSink::log(spdlog::details::log_msg const&) ??:? #2 0x16b75c4a in spdlog::logger::sink_it(spdlog::details::log_msg const&) ??:? #3 0x16dd6079 in spdlog::logger::log_it(spdlog::details::log_msg const&, bool, bool) ??:?
#4 0x17d916a2 in void spdlog::logger::log<std::__1::basic_string_view<char, std::__1::char_traits > >(spdlog::source_loc, spdlog::level::level_enum, fmt::v6::basic_string_view, std::__1::basic_string_view<char, std::__1::char_traits > const&) ??:?
#5 0x21057a03 in Envoy::Runtime::runtimeFeatureEnabled(std::__1::basic_string_view<char, std::__1::char_traits >) ??:?
#6 0x1fe22052 in Envoy::Event::TimerImpl::TimerImpl(Envoy::CSmartPtr<event_base, &event_base_free>&, std::__1::function<void ()>, Envoy::Event::Dispatcher&) ??:?
#7 0x1fe1bc8a in std::__1::__unique_ifEnvoy::Event::TimerImpl::__unique_single std::__1::make_unique<Envoy::Event::TimerImpl, Envoy::CSmartPtr<event_base, &event_base_free>&, std::__1::function<void ()> const&, Envoy::Event::Dispatcher&>(Envoy::CSmartPtr<event_base, &event_base_free>&, std::__1::function<void ()> const&, Envoy::Event::Dispatcher&) libevent_scheduler.cc:?
#8 0x1fe175f8 in Envoy::Event::LibeventScheduler::createTimer(std::__1::function<void ()> const&, Envoy::Event::Dispatcher&) ??:?
#9 0x1fe155c3 in Envoy::Event::(anonymous namespace)::RealScheduler::createTimer(std::__1::function<void ()> const&, Envoy::Event::Dispatcher&) real_time_system.cc:?
#10 0x1f5486c9 in Envoy::Event::DispatcherImpl::createTimerInternal(std::__1::function<void ()>) ??:?
#11 0x1f5483ab in Envoy::Event::DispatcherImpl::createTimer(std::__1::function<void ()>) ??:?
#12 0x18f389bf in Envoy::AccessLog::AccessLogFileImpl::AccessLogFileImpl(std::__1::unique_ptr<Envoy::Filesystem::File, std::__1::default_deleteEnvoy::Filesystem::File >&&, Envoy::Event::Dispatcher&, Envoy::Thread::BasicLockable&, Envoy::AccessLogFileStats&, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000l> >, Envoy::Thread::ThreadFactory&) ??:?
#13 0x18f62a33 in std::__1::__compressed_pair_elem<Envoy::AccessLog::AccessLogFileImpl, 1, false>::__compressed_pair_elem<std::__1::unique_ptr<Envoy::Filesystem::File, std::__1::default_deleteEnvoy::Filesystem::File >&&, Envoy::Event::Dispatcher&, Envoy::Thread::BasicLockable&, Envoy::AccessLogFileStats&, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000l> > const&, Envoy::Thread::ThreadFactory&, 0ul, 1ul, 2ul, 3ul, 4ul, 5ul>(std::__1::piecewise_construct_t, std::__1::tuple<std::__1::unique_ptr<Envoy::Filesystem::File, std::__1::default_deleteEnvoy::Filesystem::File >&&, Envoy::Event::Dispatcher&, Envoy::Thread::BasicLockable&, Envoy::AccessLogFileStats&, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000l> > const&, Envoy::Thread::ThreadFactory&>, std::__1::__tuple_indices<0ul, 1ul, 2ul, 3ul, 4ul, 5ul>) ??:?
#14 0x18f60852 in std::__1::__compressed_pair<std::__1::allocatorEnvoy::AccessLog::AccessLogFileImpl, Envoy::AccessLog::AccessLogFileImpl>::__compressed_pair<std::__1::allocatorEnvoy::AccessLog::AccessLogFileImpl&, std::__1::unique_ptr<Envoy::Filesystem::File, std::__1::default_deleteEnvoy::Filesystem::File >&&, Envoy::Event::Dispatcher&, Envoy::Thread::BasicLockable&, Envoy::AccessLogFileStats&, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000l> > const&, Envoy::Thread::ThreadFactory&>(std::__1::piecewise_construct_t, std::__1::tuplestd::__1::allocatorEnvoy::AccessLog::AccessLogFileImpl&, std::__1::tuple<std::__1::unique_ptr<Envoy::Filesystem::File, std::__1::default_deleteEnvoy::Filesystem::File >&&, Envoy::Event::Dispatcher&, Envoy::Thread::BasicLockable&, Envoy::AccessLogFileStats&, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000l> > const&, Envoy::Thread::ThreadFactory&>) ??:?
#15 0x18f5f0bc in std::__1::__shared_ptr_emplace<Envoy::AccessLog::AccessLogFileImpl, std::__1::allocatorEnvoy::AccessLog::AccessLogFileImpl >::__shared_ptr_emplace<std::__1::unique_ptr<Envoy::Filesystem::File, std::__1::default_deleteEnvoy::Filesystem::File >, Envoy::Event::Dispatcher&, Envoy::Thread::BasicLockable&, Envoy::AccessLogFileStats&, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000l> > const&, Envoy::Thread::ThreadFactory&>(std::__1::allocatorEnvoy::AccessLog::AccessLogFileImpl, std::__1::unique_ptr<Envoy::Filesystem::File, std::__1::default_deleteEnvoy::Filesystem::File >&&, Envoy::Event::Dispatcher&, Envoy::Thread::BasicLockable&, Envoy::AccessLogFileStats&, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000l> > const&, Envoy::Thread::ThreadFactory&) ??:?
#16 0x18f4f0b5 in ZNSt3__111make_sharedIN5Envoy9AccessLog17AccessLogFileImplEJNS_10unique_ptrINS1_10Filesystem4FileENS_14default_deleteIS6_EEEERNS1_5Event10DispatcherERNS1_6Thread13BasicLockableERNS1_18AccessLogFileStatsERKNS_6chrono8durationIxNS_5ratioILl1ELl1000EEEEERNSD_13ThreadFactoryEEEENS_9enable_ifIXntsr8is_arrayIT_EE5valueENS_10shared_ptrISS_EEE4typeEDpOT0 access_log_manager_impl.cc:?
#17 0x18f37fdc in Envoy::AccessLog::AccessLogManagerImpl::createAccessLog(std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator > const&) ??:?
#18 0x212165af in Envoy::Logger::FileSinkDelegate::FileSinkDelegate(std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator > const&, Envoy::AccessLog::AccessLogManager&, std::__1::shared_ptrEnvoy::Logger::DelegatingLogSink) ??:?
#19 0x1859c8e8 in std::__1::__unique_ifEnvoy::Logger::FileSinkDelegate::__unique_single std::__1::make_unique<Envoy::Logger::FileSinkDelegate, std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator > const&, Envoy::AccessLog::AccessLogManagerImpl&, std::__1::shared_ptrEnvoy::Logger::DelegatingLogSink >(std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator > const&, Envoy::AccessLog::AccessLogManagerImpl&, std::__1::shared_ptrEnvoy::Logger::DelegatingLogSink&&) server.cc:?
#20 0x184c18d5 in Envoy::Server::InstanceImpl::InstanceImpl(Envoy::Init::Manager&, Envoy::Server::Options const&, Envoy::Event::TimeSystem&, std::__1::shared_ptr<Envoy::Network::Address::Instance const>, Envoy::ListenerHooks&, Envoy::Server::HotRestart&, Envoy::Stats::StoreRoot&, Envoy::Thread::BasicLockable&, Envoy::Server::ComponentFactory&, std::__1::unique_ptr<Envoy::Random::RandomGenerator, std::__1::default_deleteEnvoy::Random::RandomGenerator >&&, Envoy::ThreadLocal::Instance&, Envoy::Thread::ThreadFactory&, Envoy::Filesystem::Instance&, std::__1::unique_ptr<Envoy::ProcessContext, std::__1::default_deleteEnvoy::ProcessContext >) ??:?
#21 0x16f32984 in std::__1::__unique_ifEnvoy::Server::InstanceImpl::__unique_single std::__1::make_unique<Envoy::Server::InstanceImpl, Envoy::Init::Manager&, testing::NiceMockEnvoy::Server::MockOptions&, Envoy::Event::GlobalTimeSystem&, std::__1::shared_ptrEnvoy::Network::Address::Ipv4Instance, Envoy::DefaultListenerHooks&, testing::NiceMockEnvoy::Server::MockHotRestart&, Envoy::Stats::TestIsolatedStoreImpl&, Envoy::Thread::MutexBasicLockable&, Envoy::Server::TestComponentFactory&, std::__1::unique_ptr<testing::NiceMockEnvoy::Random::MockRandomGenerator, std::__1::default_delete<testing::NiceMockEnvoy::Random::MockRandomGenerator > >, Envoy::ThreadLocal::InstanceImpl&, Envoy::Thread::ThreadFactory&, Envoy::Filesystem::Instance&, std::__1::unique_ptr<Envoy::ProcessContextImpl, std::__1::default_deleteEnvoy::ProcessContextImpl > >(Envoy::Init::Manager&, testing::NiceMockEnvoy::Server::MockOptions&, Envoy::Event::GlobalTimeSystem&, std::__1::shared_ptrEnvoy::Network::Address::Ipv4Instance&&, Envoy::DefaultListenerHooks&, testing::NiceMockEnvoy::Server::MockHotRestart&, Envoy::Stats::TestIsolatedStoreImpl&, Envoy::Thread::MutexBasicLockable&, Envoy::Server::TestComponentFactory&, std::__1::unique_ptr<testing::NiceMockEnvoy::Random::MockRandomGenerator, std::__1::default_delete<testing::NiceMockEnvoy::Random::MockRandomGenerator > >&&, Envoy::ThreadLocal::InstanceImpl&, Envoy::Thread::ThreadFactory&, Envoy::Filesystem::Instance&, std::__1::unique_ptr<Envoy::ProcessContextImpl, std::__1::default_deleteEnvoy::ProcessContextImpl >&&) server_test.cc:?
#22 0x169feb65 in Envoy::Server::(anonymous namespace)::ServerInstanceImplTestBase::initialize(std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator > const&, bool) server_test.cc:?
#23 0x16a40152 in Envoy::Server::(anonymous namespace)::ServerInstanceImplTestBase::initialize(std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator > const&) server_test.cc:?
#24 0x16ae6dd7 in Envoy::Server::(anonymous namespace)::ServerInstanceImplTest_LogToFile_Test::TestBody() server_test.cc:?
#25 0x264f4ee0 in void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::)(), char const) ??:?
#26 0x264a3dc7 in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::)(), char const) ??:?
#27 0x2645c33f in testing::Test::Run() ??:?
#28 0x2645e7df in testing::TestInfo::Run() ??:?
#29 0x264602c0 in testing::TestSuite::Run() ??:?
#30 0x26489a44 in testing::internal::UnitTestImpl::RunAllTests() ??:?
#31 0x2650ba60 in bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::)(), char const) ??:?
#32 0x264ad6ac in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::)(), char const) ??:?
#33 0x2648846b in testing::UnitTest::Run() ??:?
#34 0x1f2ceff5 in RUN_ALL_TESTS() ??:?
#35 0x1f2cccf9 in Envoy::TestRunner::RunTests(int, char**) ??:?
#36 0x1f2c5de8 in main ??:?
#37 0x7fe619924e0a in __libc_start_main /build/glibc-M65Gwz/glibc-2.30/csu/../csu/libc-start.c:308
#38 0x16905fa9 in _start ??:?

I think that the two possible ways to fix this issue with use of runtime features too early in the process are:

  1. Remove the log in Envoy::Runtime::runtimeFeatureEnabled Runtime::LoaderSingleton::getExisting() retuns nullptr
  2. Do not add runtime feature guards to this change.

I'm open to suggetions.

@lizan
Copy link
Member

lizan commented Jul 25, 2020

Remove the log in Envoy::Runtime::runtimeFeatureEnabled Runtime::LoaderSingleton::getExisting() retuns nullptr

Can we do this conditionally? i.e. check if logger is created or not before logging there.

@antoniovicente
Copy link
Contributor Author

Remove the log in Envoy::Runtime::runtimeFeatureEnabled Runtime::LoaderSingleton::getExisting() retuns nullptr

Can we do this conditionally? i.e. check if logger is created or not before logging there.

Logging happens inside the call to Runtime::runtimeFeatureEnabled. I have the impression that some loggers are created but loggers backed by files end up creating timers during the creation process and run into problems.

One possible option would be to check for Runtime::LoaderSingleton::getExisting() before calling Runtime::runtimeFeatureEnabled in the timer class, but that doesn't feel great either.

Simulated time relinquishes the lock while activating timers.  That allows other threads like the watchdog to insert timers that appear to be in the past.
Invariant check makes //test/server:server_test fail about 0.5% of the time under ASAN.

Signed-off-by: Antonio Vicente <[email protected]>
Signed-off-by: Antonio Vicente <[email protected]>
Signed-off-by: Antonio Vicente <[email protected]>
Signed-off-by: Antonio Vicente <[email protected]>
@antoniovicente
Copy link
Contributor Author

Latest asan failure due to timeout in //test/common/network:udp_listener_impl_test is not reproducible over 2500 local runs.
forcing rerun of CI.

@mattklein123
Copy link
Member

@antoniovicente sorry for the churn. Can you merge main one more time? I think tidy should be fixed now.

/wait

@mattklein123
Copy link
Member

Known flake. I will look at that today. @jmarantz are you OK with this change? We can force merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants