-
Notifications
You must be signed in to change notification settings - Fork 10k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Partition some pools by core #40476
Partition some pools by core #40476
Conversation
Should probably measure the two different pool changes independently and then together. |
@BrennanConroy The Sockets change is only measurable when the memory block one is fixed, otherwise it's hidden. I can run one benchmark with memory / memory + socket at least to show it matters, but will do add the extended benchmarks only using the two changes, otherwise it's too many combinations. |
I see, makes sense 👍 |
How does the benchmark results look like? |
@sebastienros - should we have the YARP benchmarks running on this hardware? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would love the see the JSON Platform results using these changes (it's the best benchmark for testing max throughput)
@@ -46,6 +46,15 @@ internal sealed class PinnedBlockMemoryPool : MemoryPool<byte> | |||
/// </summary> | |||
private const int AnySize = -1; | |||
|
|||
public PinnedBlockMemoryPool() | |||
{ | |||
_queues = new ConcurrentQueue<MemoryPoolBlock>[Environment.ProcessorCount]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
similarly, to https://github.com/dotnet/runtime/blob/07e87bc4cb0358f57e7116e047f7d2017b049cf9/src/libraries/System.Private.CoreLib/src/System/Buffers/TlsOverPerCoreLockedStacksArrayPool.cs#L301 I think it would be good to have some upper limit
|
||
var partition = Thread.GetCurrentProcessorId() % _queues.Length; | ||
|
||
if (_queues[partition].TryDequeue(out var block)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if Thread.GetCurrentProcessorId()
changes between Rent
and Return
operations and the pool gets starved?
should it stop on first failure or try to dequeue from another partition?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It likely would be better to assign a pooled item with a partition and return it to the same partition it was taken from. I had seen this issue before and recently again in testing with thread pool changes along with this change, where there is still a fair bit of allocation even with this change. The issue is that currently, some threads rent buffers and in many cases other threads return them, so the threads renting buffers will eventually run out and will have to allocate. Returning the buffers (or pooled items similarly for SocketSenderPool
) to the same queue that it came from would maintain some balance between the caches, based on what I've seen it solves the allocation issues. The issue I saw with this change may not exist or be visible currently, but it's looming and I don't think there's much reason to not fix it.
@@ -10,19 +10,28 @@ internal class SocketSenderPool : IDisposable | |||
{ | |||
private const int MaxQueueSize = 1024; // REVIEW: Is this good enough? | |||
|
|||
private readonly ConcurrentQueue<SocketSender> _queue = new(); | |||
private readonly ConcurrentQueue<SocketSender>[] _queues; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would it be possible to just disable the SocketSenders
pooling for machines with large amount of cpu cores?
@sebastienros, any update on this PR? Would like to have this for .NET 7 for further experiments. |
Will come back to it very soon, as soon as I am done with output caching (#41037) |
Replaced by #42237 |
Benchmarks to follow