-
Notifications
You must be signed in to change notification settings - Fork 824
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cache wasm stacks in thread-local storage to avoid a mutex. #3922
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks good to me. Thread local instead of a mutex, why not!
Anything I can help with to get this merged? |
@rlane we haven't forgotten about this, we've just been discussing internally if this is the right way to go for all situations, or if we want an additional layered cache tier , with a thread local as the first layer and an additional shared cache. Or if we need a feature flag to toggle this, because there are scenarios were keeping a stack around in each thread could be suboptimal for memory consumption, like in a scenario with a large thread pool where threads only run occasionally. We are a bit swamped this week, we'll get back to this next week at the latest. Would be we be fine with just putting the thread local behind a feature flag to punt on the larger discussion for now, @syrusakbary / @ptitSeb ? |
The crossbeam alternative has some performance issues on large machines (224 cores). Here's what I see in
|
I noticed that when running many concurrent WASM function calls on a 48-core VM the mutex protecting the stack freelist was contended. Replacing it with thread-local storage reduced CPU usage by 75% in a benchmark (20% on my 14-core laptop) and didn't affect single-threaded performance.