-
Notifications
You must be signed in to change notification settings - Fork 5.3k
A few local optimizations for ThreadPool #122333
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Tagging subscribers to this area: @mangod9 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR implements local optimizations to the ThreadPool implementation to improve performance while maintaining the same algorithm. The changes focus on reducing memory write operations, relaxing fence strengths, and refining the worker thread request mechanism to mitigate the Thundering Herd problem.
Key Changes:
- Replaced multiple-worker request system with single outstanding request flag to reduce contention
- Optimized memory barriers by replacing Volatile.Write with regular writes where safe
- Moved thread request acknowledgment earlier in worker lifecycle to improve responsiveness
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| WindowsThreadPool.cs | Moved CacheLineSeparated struct here and added EnsureWorkerRequested wrapper method |
| ThreadPoolWorkQueue.cs | Removed local CacheLineSeparated struct, optimized LocalPush writes, simplified worker request logic |
| ThreadPool.Windows.cs | Updated YieldFromDispatchLoop signature and renamed RequestWorkerThread to EnsureWorkerRequested |
| ThreadPool.Wasi.cs | Updated method signatures to match new API |
| ThreadPool.Unix.cs | Updated method signatures and added NotifyDispatchProgress call |
| ThreadPool.Browser.cs | Updated YieldFromDispatchLoop signature |
| ThreadPool.Browser.Threads.cs | Updated YieldFromDispatchLoop signature |
| PortableThreadPool.cs | Changed request tracking from counter to boolean flag, simplified worker request logic |
| PortableThreadPool.WorkerThread.cs | Simplified worker loop to use flag instead of counter, improved comments |
| PortableThreadPool.ThreadCounts.cs | Added DecrementProcessingWork helper method |
| PortableThreadPool.GateThread.cs | Updated to use new flag field name |
| PortableThreadPool.Blocking.cs | Updated to use new flag field name |
src/libraries/System.Private.CoreLib/src/System/Threading/ThreadPool.Windows.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Threading/PortableThreadPool.WorkerThread.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Threading/PortableThreadPool.cs
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Threading/WindowsThreadPool.cs
Show resolved
Hide resolved
|
Found one TE scenario that is badly regressed by this change, for a wrong reason. Since we seem to rely on spurious wakes of workers, faster Dispatch ends up with a lot more spurious wakes, which consume all the freed CPU cycles and more. Now we really need to look at getting rid of spurious wakes. (I think the small subset of spurious wakes is inevitable, while the rest is a waste) But I am afraid the spurious wakes are masking dispatching latency that is there due to expensive queue sweeps, so likely the next+1 step will be reducing the cost of the queue sweep... Looks like improving threadpool can't be done incrementally. Not in small steps anyways. |
A follow up to #121887
The PR is mostly a collection of small opportunities which would not be good candidates to include in backportable fix.
A few changes that relax fence strengths, reduce number of memory writes per workitem, etc...
Hopefully this will negate some of the impact from #121887
Overall, the algorithm did not change, so we should not see drastic changes.
We should look at more impactful changes separately.