Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce the lock granularity of TaskWorkerPool #3571

Merged
merged 6 commits into from
Feb 24, 2022
Merged

Reduce the lock granularity of TaskWorkerPool #3571

merged 6 commits into from
Feb 24, 2022

Conversation

Astralidea
Copy link
Contributor

@Astralidea Astralidea commented Feb 21, 2022

What type of PR is this:

  • bug
  • feature
  • enhancement
  • others

Which issues of this PR fixes :

Fixes #3539

Problem Summary(Required) :

_s_task_signatures_lock will lock all task of be.
When FE send lot of DROP task other type of task can not response to FE. will cause many strange problem.
image

there are many thread to submit_tasks

#0  0x00007f73d0b0e54d in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f73d0b09e9b in _L_lock_883 () from /lib64/libpthread.so.0
#2  0x00007f73d0b09d68 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00000000041878a8 in __gthread_mutex_lock (__mutex=0xffa5540 <starrocks::TaskWorkerPool::_s_task_signatures_lock>) at /usr/include/c++/10.3.0/x86_64-pc-linux-gnu/bits/gthr-default.h:749
#4  0x000000000418809c in std::mutex::lock (this=0xffa5540 <starrocks::TaskWorkerPool::_s_task_signatures_lock>) at /usr/include/c++/10.3.0/bits/std_mutex.h:100
#5  0x000000000418f8f9 in std::lock_guard<std::mutex>::lock_guard (this=0x7f72b45b0dd0, __m=...) at /usr/include/c++/10.3.0/bits/std_mutex.h:159
#6  0x0000000005fc7d7b in starrocks::TaskWorkerPool::submit_tasks (this=0x611000560d00, tasks=0x7f72b45b1b30) at /root/starrocks/be/src/agent/task_worker_pool.cpp:211
#7  0x0000000005fafc8c in starrocks::AgentServer::submit_tasks (this=0x610000177540, agent_result=..., tasks=...) at /root/starrocks/be/src/agent/agent_server.cpp:232
#8  0x000000000556a412 in starrocks::BackendService::submit_tasks (this=0x60300708c7f0, return_value=..., tasks=...) at /root/starrocks/be/src/service/backend_service.h:80
#9  0x000000000585bee4 in starrocks::BackendServiceProcessor::process_submit_tasks (this=0x6080005a8fa0, seqid=1, iprot=0x6060006aeda0, oprot=0x6060006aeaa0, callContext=0x603004348510) at /root/starrocks/gensrc/build/gen_cpp/BackendService.cpp:4958
#10 0x000000000585709d in starrocks::BackendServiceProcessor::dispatchCall (this=0x6080005a8fa0, iprot=0x6060006aeda0, oprot=0x6060006aeaa0, fname=..., seqid=1, callContext=0x603004348510) at /root/starrocks/gensrc/build/gen_cpp/BackendService.cpp:4715
#11 0x00000000041a5c7a in apache::thrift::TDispatchProcessor::process (this=0x6080005a8fa0, in=..., out=..., connectionContext=0x603004348510) at /var/local/thirdparty/installed/include/thrift/TDispatchProcessor.h:121
#12 0x0000000009c0c921 in apache::thrift::server::TConnectedClient::run (this=0x60b0008452c0) at /usr/include/c++/10.3.0/ext/atomicity.h:100
#13 0x0000000009c053b3 in apache::thrift::server::TThreadedServer::TConnectedClientRunner::run (this=0x6060006aeb70) at src/thrift/server/TThreadedServer.cpp:145
#14 0x0000000009c0794a in apache::thrift::concurrency::Thread::threadMain (thread=...) at /usr/include/c++/10.3.0/ext/atomicity.h:100
#15 0x0000000009bf0730 in std::__invoke_impl<void, void (*)(std::shared_ptr<apache::thrift::concurrency::Thread>), std::shared_ptr<apache::thrift::concurrency::Thread> > (__f=<optimized out>) at /usr/include/c++/10.3.0/thread:215
#16 std::__invoke<void (*)(std::shared_ptr<apache::thrift::concurrency::Thread>), std::shared_ptr<apache::thrift::concurrency::Thread> > (__fn=<optimized out>) at /usr/include/c++/10.3.0/bits/invoke.h:95
#17 std::thread::_Invoker<std::tuple<void (*)(std::shared_ptr<apache::thrift::concurrency::Thread>), std::shared_ptr<apache::thrift::concurrency::Thread> > >::_M_invoke<0ul, 1ul> (this=<optimized out>) at /usr/include/c++/10.3.0/thread:264
#18 std::thread::_Invoker<std::tuple<void (*)(std::shared_ptr<apache::thrift::concurrency::Thread>), std::shared_ptr<apache::thrift::concurrency::Thread> > >::operator() (this=<optimized out>) at /usr/include/c++/10.3.0/thread:271
#19 std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)(std::shared_ptr<apache::thrift::concurrency::Thread>), std::shared_ptr<apache::thrift::concurrency::Thread> > > >::_M_run (this=<optimized out>) at /usr/include/c++/10.3.0/thread:215
#20 0x000000000b708910 in std::execute_native_thread_routine (__p=0x603000b7f220) at ../../../.././libstdc++-v3/src/c++11/thread.cc:80
#21 0x00007f73d0b07ea5 in start_thread () from /lib64/libpthread.so.0

there are many thread to drop tablet

#0  0x00007f73d0b0e54d in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f73d0b09e9b in _L_lock_883 () from /lib64/libpthread.so.0
#2  0x00007f73d0b09d68 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00000000041878a8 in __gthread_mutex_lock (__mutex=0xffa5540 <starrocks::TaskWorkerPool::_s_task_signatures_lock>) at /usr/include/c++/10.3.0/x86_64-pc-linux-gnu/bits/gthr-default.h:749
#4  0x000000000418809c in std::mutex::lock (this=0xffa5540 <starrocks::TaskWorkerPool::_s_task_signatures_lock>) at /usr/include/c++/10.3.0/bits/std_mutex.h:100
#5  0x000000000418f8f9 in std::lock_guard<std::mutex>::lock_guard (this=0x7f7313a66340, __m=...) at /usr/include/c++/10.3.0/bits/std_mutex.h:159
#6  0x0000000005fc8d62 in starrocks::TaskWorkerPool::_remove_task_info (this=0x611000560d00, task_type=starrocks::TTaskType::DROP, signature=56066938) at /root/starrocks/be/src/agent/task_worker_pool.cpp:259
#7  0x0000000005fcb302 in starrocks::TaskWorkerPool::_drop_tablet_worker_thread_callback (arg_this=0x611000560d00) at /root/starrocks/be/src/agent/task_worker_pool.cpp:426
#8  0x0000000005ff43f6 in std::__invoke_impl<void*, void* (*)(void*), starrocks::TaskWorkerPool*> (__f=@0x6030013db4f0: 0x5fca95c <starrocks::TaskWorkerPool::_drop_tablet_worker_thread_callback(void*)>) at /usr/include/c++/10.3.0/bits/invoke.h:60
#9  0x0000000005ff431e in std::__invoke<void* (*)(void*), starrocks::TaskWorkerPool*> (__fn=@0x6030013db4f0: 0x5fca95c <starrocks::TaskWorkerPool::_drop_tablet_worker_thread_callback(void*)>) at /usr/include/c++/10.3.0/bits/invoke.h:95
#10 0x0000000005ff428f in std::thread::_Invoker<std::tuple<void* (*)(void*), starrocks::TaskWorkerPool*> >::_M_invoke<0ul, 1ul> (this=0x6030013db4e8) at /usr/include/c++/10.3.0/thread:264
#11 0x0000000005ff424a in std::thread::_Invoker<std::tuple<void* (*)(void*), starrocks::TaskWorkerPool*> >::operator() (this=0x6030013db4e8) at /usr/include/c++/10.3.0/thread:271
#12 0x0000000005ff4134 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void* (*)(void*), starrocks::TaskWorkerPool*> > >::_M_run (this=0x6030013db4e0) at /usr/include/c++/10.3.0/thread:215
#13 0x000000000b708910 in std::execute_native_thread_routine (__p=0x6030013db4e0) at ../../../.././libstdc++-v3/src/c++11/thread.cc:80
#14 0x00007f73d0b07ea5 in start_thread () from /lib64/libpthread.so.0
#15 0x00007f73cfca28dd in clone () from /lib64/libc.so.6

and one thread to report

#0  0x00007f73d0b0e54d in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f73d0b09e9b in _L_lock_883 () from /lib64/libpthread.so.0
#2  0x00007f73d0b09d68 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00000000041878a8 in __gthread_mutex_lock (__mutex=0xffa5540 <starrocks::TaskWorkerPool::_s_task_signatures_lock>) at /usr/include/c++/10.3.0/x86_64-pc-linux-gnu/bits/gthr-default.h:749
#4  0x000000000418809c in std::mutex::lock (this=0xffa5540 <starrocks::TaskWorkerPool::_s_task_signatures_lock>) at /usr/include/c++/10.3.0/bits/std_mutex.h:100
#5  0x000000000418f8f9 in std::lock_guard<std::mutex>::lock_guard (this=0x7f72fb5d4e10, __m=...) at /usr/include/c++/10.3.0/bits/std_mutex.h:159
#6  0x0000000005fd824b in starrocks::TaskWorkerPool::_report_task_worker_thread_callback (arg_this=0x611000250fc0) at /root/starrocks/be/src/agent/task_worker_pool.cpp:1263
#7  0x0000000005ff43f6 in std::__invoke_impl<void*, void* (*)(void*), starrocks::TaskWorkerPool*> (__f=@0x6030013d9ba0: 0x5fd80e8 <starrocks::TaskWorkerPool::_report_task_worker_thread_callback(void*)>) at /usr/include/c++/10.3.0/bits/invoke.h:60
#8  0x0000000005ff431e in std::__invoke<void* (*)(void*), starrocks::TaskWorkerPool*> (__fn=@0x6030013d9ba0: 0x5fd80e8 <starrocks::TaskWorkerPool::_report_task_worker_thread_callback(void*)>) at /usr/include/c++/10.3.0/bits/invoke.h:95
#9  0x0000000005ff428f in std::thread::_Invoker<std::tuple<void* (*)(void*), starrocks::TaskWorkerPool*> >::_M_invoke<0ul, 1ul> (this=0x6030013d9b98) at /usr/include/c++/10.3.0/thread:264
#10 0x0000000005ff424a in std::thread::_Invoker<std::tuple<void* (*)(void*), starrocks::TaskWorkerPool*> >::operator() (this=0x6030013d9b98) at /usr/include/c++/10.3.0/thread:271
#11 0x0000000005ff4134 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void* (*)(void*), starrocks::TaskWorkerPool*> > >::_M_run (this=0x6030013d9b90) at /usr/include/c++/10.3.0/thread:215
#12 0x000000000b708910 in std::execute_native_thread_routine (__p=0x6030013d9b90) at ../../../.././libstdc++-v3/src/c++11/thread.cc:80
#13 0x00007f73d0b07ea5 in start_thread () from /lib64/libpthr

@Astralidea Astralidea changed the title Optimize task_worker_pool lock granularity. Optimize task_worker_pool lock granularity Feb 21, 2022
@@ -146,7 +146,7 @@ class TaskWorkerPool {
static FrontendServiceClientCache _master_service_client_cache;
static std::atomic_ulong _s_report_version;

static std::mutex _s_task_signatures_lock;
static std::mutex _s_task_signatures_lock[TTaskType::type::TASK_TYPE_COUNT];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should manage and define a mutex for current active task types. TASK_TYPE_COUNT include may deprecated type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This only cause create more unused object, but I am not find an easy way to solve this problem.
I'm more afraid that this will be ignored when a new type is added later.

be/src/agent/task_worker_pool.cpp Outdated Show resolved Hide resolved
@chaoyli chaoyli changed the title Optimize task_worker_pool lock granularity Reduce the lock granularity of TaskWorkerPool Feb 21, 2022
chaoyli
chaoyli previously approved these changes Feb 21, 2022
gensrc/thrift/Types.thrift Outdated Show resolved Hide resolved
Copy link
Contributor

@sduzh sduzh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation is not thread-safe and the change in _report_task_worker_thread_callback is incorrect.

sduzh
sduzh previously approved these changes Feb 23, 2022
@chaoyli chaoyli merged commit a6748a6 into StarRocks:main Feb 24, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants