Cache wasm stacks in thread-local storage to avoid a mutex. #3922

Closed
rlane wants to merge 5 commits

Conversation

rlane
Contributor

@rlane rlane commented May 26, 2023

I noticed that when running many concurrent WASM function calls on a 48-core VM, the mutex protecting the stack freelist was contended. Replacing it with thread-local storage reduced CPU usage by 75% in a benchmark (20% on my 14-core laptop) and didn't affect single-threaded performance.
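A minimal sketch of the idea, with a hypothetical `WasmStack` type standing in for wasmer's actual per-call stack (names and sizes are illustrative, not the PR's code):

```rust
use std::cell::RefCell;

// Hypothetical stand-in for the per-call stack type; not wasmer's real type.
struct WasmStack {
    bytes: Vec<u8>,
}

impl WasmStack {
    fn new(size: usize) -> Self {
        WasmStack { bytes: vec![0u8; size] }
    }
}

// Before: a process-wide freelist behind a Mutex, contended when many
// threads enter and leave wasm calls concurrently.
//
//   static STACK_FREELIST: Mutex<Vec<WasmStack>> = Mutex::new(Vec::new());
//
// After: each thread keeps its own cache, so the hot path never touches
// shared state and never blocks.
thread_local! {
    static STACK_CACHE: RefCell<Vec<WasmStack>> = RefCell::new(Vec::new());
}

fn acquire_stack(size: usize) -> WasmStack {
    STACK_CACHE.with(|cache| {
        cache
            .borrow_mut()
            .pop()
            .unwrap_or_else(|| WasmStack::new(size))
    })
}

fn release_stack(stack: WasmStack) {
    STACK_CACHE.with(|cache| cache.borrow_mut().push(stack));
}
```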

@rlane rlane requested a review from syrusakbary as a code owner May 26, 2023 22:43
Member

@syrusakbary syrusakbary left a comment

This PR looks great to me. Waiting for @ptitSeb or @theduke feedback as well

Contributor

@ptitSeb ptitSeb left a comment

It looks good to me. Thread local instead of a mutex, why not!

@rlane
Contributor Author

rlane commented Jun 7, 2023

Anything I can help with to get this merged?

@theduke
Contributor

theduke commented Jun 7, 2023

@rlane we haven't forgotten about this, we've just been discussing internally whether this is the right way to go for all situations, or whether we want an additional layered cache tier, with a thread-local cache as the first layer and a shared cache behind it.

Or whether we need a feature flag to toggle this, because there are scenarios where keeping a stack around in each thread could be suboptimal for memory consumption, like a large thread pool whose threads only run occasionally.

We are a bit swamped this week, we'll get back to this next week at the latest.

Would we be fine with just putting the thread-local cache behind a feature flag to punt on the larger discussion for now, @syrusakbary / @ptitSeb?
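A layered variant along these lines might look like the following (a hypothetical sketch, not the design that was eventually merged; the one-stack-per-thread bound and the tier names are assumptions):

```rust
use std::cell::RefCell;
use std::sync::Mutex;

struct WasmStack { /* elided */ }

// Second tier: a shared, lock-protected cache that threads spill into, so a
// large, mostly idle thread pool doesn't pin one stack per thread forever.
static SHARED_CACHE: Mutex<Vec<WasmStack>> = Mutex::new(Vec::new());

thread_local! {
    // First tier: at most one stack per thread, taken and returned without locking.
    static LOCAL_CACHE: RefCell<Option<WasmStack>> = RefCell::new(None);
}

fn acquire_cached_stack() -> Option<WasmStack> {
    // Fast path: thread-local slot, no locking.
    if let Some(stack) = LOCAL_CACHE.with(|c| c.borrow_mut().take()) {
        return Some(stack);
    }
    // Slow path: the shared lock is only reached on a local miss.
    SHARED_CACHE.lock().unwrap().pop()
}

fn release_stack(stack: WasmStack) {
    // Keep one stack in the local slot; any overflow goes to the shared tier.
    if let Some(evicted) = LOCAL_CACHE.with(|c| c.borrow_mut().replace(stack)) {
        SHARED_CACHE.lock().unwrap().push(evicted);
    }
}
```

Gating the thread-local tier behind a cargo feature, as suggested above, would let deployments with large, mostly idle thread pools opt out of the per-thread memory cost.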

@Arshia001
Member

@rlane #4196 was just merged with an alternate implementation. Closing this issue.

@Arshia001 Arshia001 closed this Sep 1, 2023
@rlane
Contributor Author

rlane commented Oct 30, 2023

The crossbeam alternative has some performance issues on large machines (224 cores). Here's what I see in perf top:

  61.24%  tournament         [.] crossbeam_queue::seg_queue::SegQueue<T>::push
   9.50%  tournament         [.] crossbeam_queue::seg_queue::SegQueue<T>::pop
... rest of the profile
