[BugFix] Fix heap-use-after-free of ThreadPool bvar #42351
be/src/util/threadpool.cpp
Outdated
_total_executed_tasks << 1;
_total_pending_time_ns << start_time.GetDeltaSince(task.submit_time).ToNanoseconds();
_total_execute_time_ns << finish_time.GetDeltaSince(start_time).ToNanoseconds();
_total_executed_tasks.fetch_add(1);
_total_executed_tasks.fetch_add(1);
_total_executed_tasks.fetch_add(1, std::memory_order_relaxed);
be/src/util/threadpool.cpp
Outdated
_total_pending_time_ns << start_time.GetDeltaSince(task.submit_time).ToNanoseconds();
_total_execute_time_ns << finish_time.GetDeltaSince(start_time).ToNanoseconds();
_total_executed_tasks.fetch_add(1);
_total_pending_time_ns.fetch_add(start_time.GetDeltaSince(task.submit_time).ToNanoseconds());
ditto
be/src/util/threadpool.cpp
Outdated
_total_execute_time_ns << finish_time.GetDeltaSince(start_time).ToNanoseconds();
_total_executed_tasks.fetch_add(1);
_total_pending_time_ns.fetch_add(start_time.GetDeltaSince(task.submit_time).ToNanoseconds());
_total_execute_time_ns.fetch_add(finish_time.GetDeltaSince(start_time).ToNanoseconds());
ditto
be/src/util/threadpool.h
Outdated
@@ -249,14 +248,11 @@ class ThreadPool {

int max_threads() const { return _max_threads.load(std::memory_order_acquire); }

// Use bvar as the counter, and should not be called frequently.
int64_t total_executed_tasks() const { return _total_executed_tasks.get_value(); }
int64_t total_executed_tasks() const { return _total_executed_tasks; }
int64_t total_executed_tasks() const { return _total_executed_tasks; }
int64_t total_executed_tasks() const { return _total_executed_tasks.load(std::memory_order_relaxed); }
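The review suggestions so far (relaxed `fetch_add` on the write path, relaxed `load` in the getters) can be sketched together as follows. This is only an illustration under assumptions: `TaskStats` and `run_workers` are hypothetical names that do not appear in the PR, and the sketch assumes the counters are plain `std::atomic<int64_t>` members that are used only as statistics and never to order other memory accesses.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical stand-in for the three statistics members of ThreadPool.
class TaskStats {
public:
    // Called by a worker after it finishes one task. Relaxed ordering is
    // enough here because the counters do not publish any other data.
    void record(int64_t pending_ns, int64_t execute_ns) {
        _total_executed_tasks.fetch_add(1, std::memory_order_relaxed);
        _total_pending_time_ns.fetch_add(pending_ns, std::memory_order_relaxed);
        _total_execute_time_ns.fetch_add(execute_ns, std::memory_order_relaxed);
    }

    int64_t total_executed_tasks() const { return _total_executed_tasks.load(std::memory_order_relaxed); }
    int64_t total_pending_time_ns() const { return _total_pending_time_ns.load(std::memory_order_relaxed); }
    int64_t total_execute_time_ns() const { return _total_execute_time_ns.load(std::memory_order_relaxed); }

private:
    std::atomic<int64_t> _total_executed_tasks{0};
    std::atomic<int64_t> _total_pending_time_ns{0};
    std::atomic<int64_t> _total_execute_time_ns{0};
};

// Hypothetical driver: every worker records `tasks_per_thread` tasks with
// fixed timings, all workers are joined, and the task count is returned.
int64_t run_workers(TaskStats& stats, int num_threads, int tasks_per_thread) {
    std::vector<std::thread> workers;
    for (int i = 0; i < num_threads; ++i) {
        workers.emplace_back([&stats, tasks_per_thread] {
            for (int j = 0; j < tasks_per_thread; ++j) {
                stats.record(/*pending_ns=*/10, /*execute_ns=*/20);
            }
        });
    }
    for (auto& w : workers) {
        w.join();  // join() makes every increment visible to the caller
    }
    return stats.total_executed_tasks();
}
```

The getters may return a slightly stale value while workers are still running, which is usually acceptable for metrics; once the workers have been joined the totals are exact.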
be/src/util/threadpool.h
Outdated

// Use bvar as the counter, and should not be called frequently.
int64_t total_pending_time_ns() const { return _total_pending_time_ns.get_value(); }
int64_t total_pending_time_ns() const { return _total_pending_time_ns; }
ditto
be/src/util/threadpool.h
Outdated

// Use bvar as the counter, and should not be called frequently.
int64_t total_execute_time_ns() const { return _total_execute_time_ns.get_value(); }
int64_t total_execute_time_ns() const { return _total_execute_time_ns; }
ditto
be/src/util/threadpool.h
Outdated
@@ -249,14 +248,11 @@ class ThreadPool {

int max_threads() const { return _max_threads.load(std::memory_order_acquire); }

// Use bvar as the counter, and should not be called frequently.
int64_t total_executed_tasks() const { return _total_executed_tasks.get_value(); }
int64_t total_executed_tasks() const { return _total_executed_tasks.load(); }
int64_t total_executed_tasks() const { return _total_executed_tasks.load(); }
int64_t total_executed_tasks() const { return _total_executed_tasks.load(std::memory_order_relaxed); }
be/src/util/threadpool.h
Outdated

// Use bvar as the counter, and should not be called frequently.
int64_t total_pending_time_ns() const { return _total_pending_time_ns.get_value(); }
int64_t total_pending_time_ns() const { return _total_pending_time_ns.load(); }
ditto
be/src/util/threadpool.h
Outdated

// Use bvar as the counter, and should not be called frequently.
int64_t total_execute_time_ns() const { return _total_execute_time_ns.get_value(); }
int64_t total_execute_time_ns() const { return _total_execute_time_ns.load(); }
ditto
Is the performance impact of using std::atomic acceptable? Has there been any comparison done with CoreLocalCounter?
a26ac03 to 20c300b
Signed-off-by: PengFei Li <[email protected]>
@sduzh Thanks for your advice. I add
As a result, I think using
[FE Incremental Coverage Report] ✅ pass : 0 / 0 (0%)
[BE Incremental Coverage Report] ✅ pass : 6 / 6 (100.00%)
) (#42682) Signed-off-by: PengFei Li <[email protected]>
) (#42683) Signed-off-by: PengFei Li <[email protected]>
Why I'm doing:
Fix #42319.
The problem was introduced by #40171. That PR added bvar members to ThreadPool, and each thread updates the bvars. bvar uses thread-local storage to improve performance, and does some cleanup work (delete_thread_exit_helper) in __nptl_deallocate_tsd before the pthread exits. That cleanup accesses the bvar members.
But ThreadPool does not join its threads before destructing the pool itself, so the bvar members may be destroyed before delete_thread_exit_helper finishes, which leads to heap-use-after-free.
What I'm doing:
There are two ways to fix the problem
ThreadPool::dispatch_thread
Option 1 is more elegant but more complicated: the thread pool is dynamic, so it needs a way to join threads (joining the threads only in the destructor is not enough). This PR currently uses option 2 to fix the problem. According to the test, updating a bvar takes about 10 ns regardless of the number of threads, and updating a CoreLocalCounter takes about 20 ns, also regardless of the number of threads. I think it's acceptable to use CoreLocalCounter in this scenario.
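To illustrate why a counter without thread-exit hooks avoids the problem, here is a minimal sketch of a sharded counter. It is not the StarRocks CoreLocalCounter, whose implementation is not shown in this PR; `ShardedCounter`, `kShards`, and `count_from_threads` are hypothetical names, and picking a shard by hashing the thread id is an assumption made only to keep the example self-contained.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical sharded counter. All state lives inside the object itself:
// nothing is registered in thread-local storage, so no callback touches the
// counter when a worker thread exits.
class ShardedCounter {
public:
    void add(int64_t delta) {
        // Assumption: spread threads over shards by hashing the thread id.
        size_t idx = std::hash<std::thread::id>{}(std::this_thread::get_id()) % kShards;
        _shards[idx].value.fetch_add(delta, std::memory_order_relaxed);
    }

    // Sums all shards; intended for infrequent reads such as metrics export.
    int64_t get() const {
        int64_t sum = 0;
        for (const auto& shard : _shards) {
            sum += shard.value.load(std::memory_order_relaxed);
        }
        return sum;
    }

private:
    static constexpr size_t kShards = 16;
    // One cache line per shard to reduce false sharing between threads.
    struct alignas(64) Shard {
        std::atomic<int64_t> value{0};
    };
    std::array<Shard, kShards> _shards{};
};

// Hypothetical driver: each thread adds 1 `adds_per_thread` times, the
// threads are joined, and the summed value is returned.
int64_t count_from_threads(int num_threads, int adds_per_thread) {
    ShardedCounter counter;
    std::vector<std::thread> workers;
    for (int i = 0; i < num_threads; ++i) {
        workers.emplace_back([&counter, adds_per_thread] {
            for (int j = 0; j < adds_per_thread; ++j) {
                counter.add(1);
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return counter.get();
}
```

Because a thread holds no reference to the counter once `add` returns, a worker that finishes its exit sequence after the owning object has been destroyed has nothing left to access. The trade-off is that two threads can land on the same shard, so each update is an atomic read-modify-write rather than a purely thread-local write.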
What type of PR is this:
Does this PR entail a change in behavior?
If yes, please specify the type of change:
Checklist:
Bugfix cherry-pick branch check: