Skip to content

Conversation

@VSadov
Copy link
Member

@VSadov VSadov commented Dec 9, 2025

A follow up to #121887

The PR is mostly a collection of small opportunities which would not be good candidates to include in backportable fix.

A few changes that relax fence strengths, reduce number of memory writes per workitem, etc...
Hopefully this will negate some of the impact from #121887

Overall, the algorithm did not change, so we should not see drastic changes.
We should look at more impactful changes separately.

Copilot AI review requested due to automatic review settings December 9, 2025 04:21
@VSadov VSadov requested a review from stephentoub December 9, 2025 04:21
@dotnet-policy-service
Copy link
Contributor

Tagging subscribers to this area: @mangod9
See info in area-owners.md if you want to be subscribed.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements local optimizations to the ThreadPool implementation to improve performance while maintaining the same algorithm. The changes focus on reducing memory write operations, relaxing fence strengths, and refining the worker thread request mechanism to mitigate the Thundering Herd problem.

Key Changes:

  • Replaced multiple-worker request system with single outstanding request flag to reduce contention
  • Optimized memory barriers by replacing Volatile.Write with regular writes where safe
  • Moved thread request acknowledgment earlier in worker lifecycle to improve responsiveness

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
WindowsThreadPool.cs Moved CacheLineSeparated struct here and added EnsureWorkerRequested wrapper method
ThreadPoolWorkQueue.cs Removed local CacheLineSeparated struct, optimized LocalPush writes, simplified worker request logic
ThreadPool.Windows.cs Updated YieldFromDispatchLoop signature and renamed RequestWorkerThread to EnsureWorkerRequested
ThreadPool.Wasi.cs Updated method signatures to match new API
ThreadPool.Unix.cs Updated method signatures and added NotifyDispatchProgress call
ThreadPool.Browser.cs Updated YieldFromDispatchLoop signature
ThreadPool.Browser.Threads.cs Updated YieldFromDispatchLoop signature
PortableThreadPool.cs Changed request tracking from counter to boolean flag, simplified worker request logic
PortableThreadPool.WorkerThread.cs Simplified worker loop to use flag instead of counter, improved comments
PortableThreadPool.ThreadCounts.cs Added DecrementProcessingWork helper method
PortableThreadPool.GateThread.cs Updated to use new flag field name
PortableThreadPool.Blocking.cs Updated to use new flag field name

@VSadov
Copy link
Member Author

VSadov commented Dec 19, 2025

Found one TE scenario that is badly regressed by this change, for a wrong reason. Since we seem to rely on spurious wakes of workers, faster Dispatch ends up with a lot more spurious wakes, which consume all the freed CPU cycles and more.

Now we really need to look at getting rid of spurious wakes. (I think the small subset of spurious wakes is inevitable, while the rest is a waste)

But I am afraid the spurious wakes are masking dispatching latency that is there due to expensive queue sweeps, so likely the next+1 step will be reducing the cost of the queue sweep...

Looks like improving threadpool can't be done incrementally. Not in small steps anyways.

@VSadov VSadov closed this Dec 19, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants