Skip to content

[native] Fix crash in httpclient when accessing null eventbase#23258

Merged
amitkdutta merged 1 commit intoprestodb:masterfrom
amitkdutta:eventbase_null_check
Jul 19, 2024
Merged

[native] Fix crash in httpclient when accessing null eventbase#23258
amitkdutta merged 1 commit intoprestodb:masterfrom
amitkdutta:eventbase_null_check

Conversation

@amitkdutta
Copy link
Contributor

@amitkdutta amitkdutta commented Jul 19, 2024

Description

Fixing a crash when eventbase is null and accessed in httpclient

Motivation and Context

We are observing following crash sporadically when a worker is shutting down:

(unknown) at ../signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:-1
folly::EventBase::runInEventBaseThread(folly::Function<void ()>) at fbcode/third-party-buck/platform010/build/libgcc/include/c++/trunk/bits/atomic_base.h:636

facebook::presto::http::HttpClient::sendRequest(proxygen::HTTPMessage const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, long) at ./fbcode/github/presto-trunk/presto-native-execution/presto_cpp/main/http/HttpClient.cpp:536

facebook::presto::http::RequestBuilder::send(facebook::presto::http::HttpClient*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, long) at fbcode/github/presto-trunk/presto-native-execution/presto_cpp/main/http/HttpClient.h:248

facebook::presto::PrestoExchangeSource::doAbortResults(long) at ./fbcode/github/presto-trunk/presto-native-execution/presto_cpp/main/PrestoExchangeSource.cpp:481
facebook::presto::PrestoExchangeSource::handleAbortResponse(folly::Try<std::unique_ptr<facebook::presto::http::HttpResponse, std::default_delete<facebook::presto::http::HttpResponse> > >) at ./fbcode/github/presto-trunk/presto-native-execution/presto_cpp/main/PrestoExchangeSource.cpp:509

_ZN5folly6detail8function5call_IZNS_7futures6detail4CoreISt10unique_ptrIN8facebook6presto4http12HttpResponseESt14default_deleteISA_EEE11setCallbackIZNS4_10FutureBaseISD_E18thenImplementationIZNOS_6FutureISD_E7thenTryIZNS8_20PrestoExchangeSource14doAbortResultsElE3$_5EENSJ_INS4_17tryCallableResultISD_T_E10value_typeEEEOSP_EUlONS_8Executor9KeepAliveISU_EEONS_3TryISD_EEE_NS4_25tryExecutorCallableResultISD_S11_EEEENSt9enable_ifIXntsrNT0_13ReturnsFutureE5valueENSJ_INS15_10value_typeEEEE4typeEST_S15_NS4_18InlineContinuationEEUlSX_S10_E_EEvST_OSt10shared_ptrINS_14RequestContextEES1A_EUlRNS4_8CoreBaseESX_PNS_17exception_wrapperEE_Lb1ELb0EvJS1H_SX_S1J_EEET2_DpT3_RNS1_4DataE at ./fbcode/github/presto-trunk/presto-native-execution/presto_cpp/main/PrestoExchangeSource.cpp:485
void folly::detail::function::call_<folly::Executor::KeepAlive<folly::Executor>::add<folly::futures::detail::CoreBase::doCallback(folly::Executor::KeepAlive<folly::Executor>&&, folly::futures::detail::State)::$_0>(folly::futures::detail::CoreBase::doCallback(folly::Executor::KeepAlive<folly::Executor>&&, folly::futures::detail::State)::$_0&&) &&::{lambda()#1}, true, false, void>(, folly::detail::function::Data&) at fbcode/folly/Function.h:370
folly::ThreadPoolExecutor::runTask(std::shared_ptr<folly::ThreadPoolExecutor::Thread> const&, folly::ThreadPoolExecutor::Task&&) at fbcode/folly/Function.h:370
folly::CPUThreadPoolExecutor::threadRun(std::shared_ptr<folly::ThreadPoolExecutor::Thread>) at ./fbcode/folly/executors/CPUThreadPoolExecutor.cpp:350
void folly::detail::function::call_<std::_Bind<void (folly::ThreadPoolExecutor::*(folly::ThreadPoolExecutor*, std::shared_ptr<folly::ThreadPoolExecutor::Thread>))(std::shared_ptr<folly::ThreadPoolExecutor::Thread>)>, true, false, void>(, folly::detail::function::Data&) at fbcode/third-party-buck/platform010/build/libgcc/include/c++/trunk/bits/invoke.h:74
execute_native_thread_routine at /home/engshare/third-party2/libgcc/11.x/src/gcc-11.x/x86_64-facebook-linux/libstdc++-v3/src/c++11/../../../.././libstdc++-v3/src/c++11/thread.cc:82
start_thread at /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/nptl/pthread_create.c:434
__clone3 at /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

The reason is eventbase_ is a raw pointer and it can be null in this scenario. Adding null check before accessing it.

Impact

Shutdown crash is not observed anymore.

Test Plan

Existing tests and ran in a cluster setup to see worker shutdown is going fine.

@amitkdutta amitkdutta requested a review from a team as a code owner July 19, 2024 01:56
Copy link
Contributor

@aditi-pandit aditi-pandit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @amitkdutta

@amitkdutta amitkdutta merged commit 2fdbc69 into prestodb:master Jul 19, 2024
};

if (delayMs > 0) {
if (delayMs > 0 && eventBase_ != nullptr) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

else condition uses eventBase_ as well. We should make sure life time of event base be equal or longer than client.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah you are spot on! Let me fix it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants