Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BugFix] Fix heap-use-after-free of ThreadPool bvar #42351

Merged
merged 4 commits into from
Mar 12, 2024

Conversation

banmoy
Copy link
Contributor

@banmoy banmoy commented Mar 8, 2024

Why I'm doing:

Fix #42319.
The problem is introduced by #40171. The pr adds bvar members in ThreadPool, and each thread will update the bvars. bvar uses thread-local to improve performance, and will do some work (delete_thread_exit_helper) in __nptl_deallocate_tsd before the pthread exits. It will access bvar members here.

      #5 0x2075252f in bvar::detail::AgentGroup<bvar::detail::AgentCombiner<long, long, bvar::detail::AddTo<long> >::Agent>::_destroy_tls_blocks() (/root/starrocks/be/ut_build_ASAN/test/starrocks_test+0x2075252f)
      #6 0x25a5c0a2 in butil::detail::delete_thread_exit_helper(void*) (/root/starrocks/be/ut_build_ASAN/test/starrocks_test+0x25a5c0a2)
      #7 0x7fe5fdb57ca1 in __nptl_deallocate_tsd (/lib64/libpthread.so.0+0x7ca1)
      #8 0x7fe5fdb57eb2 in start_thread (/lib64/libpthread.so.0+0x7eb2)

But ThreadPool does not join the thread before destructing the pool itself, so it's possible that the bvar members are destroyed before delete_thread_exit_helper finishes which will lead to heap-use-after-free.

What I'm doing:

There are two ways to fix the problem

  1. the threadpool joins the thread to ensure the thread totally exits, and then destructs the pool itself
  2. replace bvar with CoreLocalCounter to ensure the thread will not access threadpool members out of ThreadPool::dispatch_thread

The 1 is more elegant, but more complicated because the threadpool is dynamic, and need a way to join threads (only join the threads in the destructor is not enough). Currently use 2 to fix the problem. According to the test, updating bvar takes about 10ns constantly as the number of threads increase, and CoreLocalCounter takes about 20ns constantly. I think it's acceptable to use CoreLocalCounter in this scenario.

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This pr needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function
  • This is a backport pr

Bugfix cherry-pick branch check:

  • I have checked the version labels which the pr will be auto-backported to the target branch
    • 3.2
    • 3.1
    • 3.0
    • 2.5

@mergify mergify bot assigned banmoy Mar 8, 2024
@banmoy banmoy marked this pull request as ready for review March 8, 2024 15:13
@banmoy banmoy requested a review from a team as a code owner March 8, 2024 15:13
@banmoy banmoy changed the title [BugFix] Fix bvar heap-use-after-free in ThreadPool [BugFix] Fix heap-use-after-free of ThreadPool bvar Mar 8, 2024
_total_executed_tasks << 1;
_total_pending_time_ns << start_time.GetDeltaSince(task.submit_time).ToNanoseconds();
_total_execute_time_ns << finish_time.GetDeltaSince(start_time).ToNanoseconds();
_total_executed_tasks.fetch_add(1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
_total_executed_tasks.fetch_add(1);
_total_executed_tasks.fetch_add(1, std::memory_order_relaxed);

_total_pending_time_ns << start_time.GetDeltaSince(task.submit_time).ToNanoseconds();
_total_execute_time_ns << finish_time.GetDeltaSince(start_time).ToNanoseconds();
_total_executed_tasks.fetch_add(1);
_total_pending_time_ns.fetch_add(start_time.GetDeltaSince(task.submit_time).ToNanoseconds());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

_total_execute_time_ns << finish_time.GetDeltaSince(start_time).ToNanoseconds();
_total_executed_tasks.fetch_add(1);
_total_pending_time_ns.fetch_add(start_time.GetDeltaSince(task.submit_time).ToNanoseconds());
_total_execute_time_ns.fetch_add(finish_time.GetDeltaSince(start_time).ToNanoseconds());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@@ -249,14 +248,11 @@ class ThreadPool {

int max_threads() const { return _max_threads.load(std::memory_order_acquire); }

// Use bvar as the counter, and should not be called frequently.
int64_t total_executed_tasks() const { return _total_executed_tasks.get_value(); }
int64_t total_executed_tasks() const { return _total_executed_tasks; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
int64_t total_executed_tasks() const { return _total_executed_tasks; }
int64_t total_executed_tasks() const { return _total_executed_tasks.load(std::memory_order_relaxed); }


// Use bvar as the counter, and should not be called frequently.
int64_t total_pending_time_ns() const { return _total_pending_time_ns.get_value(); }
int64_t total_pending_time_ns() const { return _total_pending_time_ns; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto


// Use bvar as the counter, and should not be called frequently.
int64_t total_execute_time_ns() const { return _total_execute_time_ns.get_value(); }
int64_t total_execute_time_ns() const { return _total_execute_time_ns; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@@ -249,14 +248,11 @@ class ThreadPool {

int max_threads() const { return _max_threads.load(std::memory_order_acquire); }

// Use bvar as the counter, and should not be called frequently.
int64_t total_executed_tasks() const { return _total_executed_tasks.get_value(); }
int64_t total_executed_tasks() const { return _total_executed_tasks.load(); }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
int64_t total_executed_tasks() const { return _total_executed_tasks.load(); }
int64_t total_executed_tasks() const { return _total_executed_tasks.load(std::memory_order_relaxed); }


// Use bvar as the counter, and should not be called frequently.
int64_t total_pending_time_ns() const { return _total_pending_time_ns.get_value(); }
int64_t total_pending_time_ns() const { return _total_pending_time_ns.load(); }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto


// Use bvar as the counter, and should not be called frequently.
int64_t total_execute_time_ns() const { return _total_execute_time_ns.get_value(); }
int64_t total_execute_time_ns() const { return _total_execute_time_ns.load(); }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@sduzh
Copy link
Contributor

sduzh commented Mar 11, 2024

Is the performance impact of using std::atomic acceptable? Has there been any comparison done with CoreLocalCounter?

@banmoy banmoy marked this pull request as draft March 11, 2024 09:58
@banmoy banmoy force-pushed the fix_asan branch 2 times, most recently from a26ac03 to 20c300b Compare March 11, 2024 15:27
@banmoy
Copy link
Contributor Author

banmoy commented Mar 11, 2024

@sduzh Thanks for your advice. I add CoreLocalCounterTest.test_perf to compare the update performance among bvar, CoreLocalCounter and atomic with different threads. The results are as following

  • the performance of both bvar and CoreLocalCounter remains low as the number of threads increases, but the performance of atomic decreases as the number of threads increases
  • the cost of bvar is about 10ns, and the cost of CoreLocalCounter is about 20ns

As a result, I think using CoreLocalCounter to replace bvar is acceptable although it has ~10ns overhead than bvar

#thread 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
bvar 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 10 10 10 10 10 10 10 10
CoreLocalCounter 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19
atomic 6 11 34 54 68 81 90 109 113 149 153 173 181 201 211 250 234 255 270 291 291 313 369 335

@banmoy banmoy marked this pull request as ready for review March 11, 2024 16:30
Copy link

[FE Incremental Coverage Report]

pass : 0 / 0 (0%)

@sduzh sduzh requested review from kevincai and imay March 12, 2024 03:05
@sduzh sduzh enabled auto-merge (squash) March 12, 2024 03:05
Copy link

[BE Incremental Coverage Report]

pass : 6 / 6 (100.00%)

file detail

path covered_line new_line coverage not_covered_line_detail
🔵 be/src/util/threadpool.h 3 3 100.00% []
🔵 be/src/util/threadpool.cpp 3 3 100.00% []

@sduzh sduzh merged commit edb5a63 into StarRocks:main Mar 12, 2024
47 checks passed
banmoy added a commit to banmoy/starrocks that referenced this pull request Mar 15, 2024
banmoy added a commit to banmoy/starrocks that referenced this pull request Mar 15, 2024
banmoy added a commit to banmoy/starrocks that referenced this pull request Mar 15, 2024
banmoy added a commit to banmoy/starrocks that referenced this pull request Mar 15, 2024
banmoy added a commit to banmoy/starrocks that referenced this pull request Mar 15, 2024
banmoy added a commit to banmoy/starrocks that referenced this pull request Mar 15, 2024
banmoy added a commit to banmoy/starrocks that referenced this pull request Mar 15, 2024
banmoy added a commit to banmoy/starrocks that referenced this pull request Mar 15, 2024
wyb pushed a commit that referenced this pull request Mar 15, 2024
wyb pushed a commit that referenced this pull request Mar 15, 2024
@before-Sunrise before-Sunrise mentioned this pull request Apr 16, 2024
24 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

heap-use-after-free
3 participants