Partition some pools by core #40476

Closed

sebastienros
Member

Benchmarks to follow

@BrennanConroy
Member

Should probably measure the two different pool changes independently and then together.

@sebastienros
Member Author

@BrennanConroy The Sockets change is only measurable when the memory block one is fixed, otherwise it's hidden. I can run one benchmark comparing memory vs. memory + sockets to at least show that it matters, but I will run the extended benchmarks only with both changes applied; otherwise there are too many combinations.

@BrennanConroy
Member

> The Sockets change is only measurable when the memory block one is fixed, otherwise it's hidden.

I see, makes sense 👍

@kunalspathak
Member

What do the benchmark results look like?

@samsp-msft
Member

@sebastienros - should we have the YARP benchmarks running on this hardware?

Member

@adamsitnik adamsitnik left a comment

I would love to see the JSON Platform results with these changes (it's the best benchmark for testing max throughput).

@@ -46,6 +46,15 @@ internal sealed class PinnedBlockMemoryPool : MemoryPool<byte>
    /// </summary>
    private const int AnySize = -1;

    public PinnedBlockMemoryPool()
    {
        _queues = new ConcurrentQueue<MemoryPoolBlock>[Environment.ProcessorCount];

var partition = Thread.GetCurrentProcessorId() % _queues.Length;

if (_queues[partition].TryDequeue(out var block))

What if Thread.GetCurrentProcessorId() changes between Rent and Return operations and the pool gets starved?

Should it stop on the first failure, or try to dequeue from another partition?

https://github.com/dotnet/runtime/blob/07e87bc4cb0358f57e7116e047f7d2017b049cf9/src/libraries/System.Private.CoreLib/src/System/Buffers/TlsOverPerCoreLockedStacksArrayPool.cs#L338
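
To make the question above concrete, here is a minimal sketch of a per-core partitioned pool whose rent path probes the other partitions before allocating. It is not the code from this PR or from the linked runtime file: `PartitionedPool<T>`, its constructor parameters, and the probing order are all hypothetical names and choices made for illustration.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical illustration type; not the PR's PinnedBlockMemoryPool.
public sealed class PartitionedPool<T> where T : class
{
    private readonly ConcurrentQueue<T>[] _queues;
    private readonly Func<T> _factory;

    public PartitionedPool(Func<T> factory, int partitionCount)
    {
        if (partitionCount <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(partitionCount));
        }

        _factory = factory ?? throw new ArgumentNullException(nameof(factory));
        _queues = new ConcurrentQueue<T>[partitionCount];
        for (var i = 0; i < _queues.Length; i++)
        {
            _queues[i] = new ConcurrentQueue<T>();
        }
    }

    // The unsigned arithmetic guards against a negative remainder.
    private int CurrentPartition() =>
        (int)((uint)Thread.GetCurrentProcessorId() % (uint)_queues.Length);

    public T Rent()
    {
        // Start at the current core's partition, then probe the others
        // instead of giving up on the first empty queue.
        var start = CurrentPartition();
        for (var i = 0; i < _queues.Length; i++)
        {
            var index = (start + i) % _queues.Length;
            if (_queues[index].TryDequeue(out var item))
            {
                return item;
            }
        }

        // Every partition was empty: allocate a new item.
        return _factory();
    }

    public void Return(T item)
    {
        // Items go back to whichever partition the returning thread maps to.
        _queues[CurrentPartition()].Enqueue(item);
    }
}
```

With this shape, a thread that migrates between Rent and Return can still find pooled items, at the cost of touching other cores' queues when its own partition is empty.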

Member

@kouvel kouvel May 5, 2022

It would likely be better to associate each pooled item with a partition and return it to the same partition it was taken from. I had seen this issue before, and saw it again recently when testing thread pool changes together with this change: there is still a fair bit of allocation even with this change. The issue is that currently some threads rent buffers and, in many cases, other threads return them, so the threads renting buffers eventually run out and have to allocate. Returning the buffers (or, similarly, the pooled items for SocketSenderPool) to the same queue they came from would maintain some balance between the caches; based on what I've seen, it solves the allocation issues. The issue I saw with this change may not exist or be visible currently, but it's looming, and I don't think there's much reason not to fix it.
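
A rough sketch of that idea, assuming each pooled item simply records the index of the partition that created it. `HomedBuffer`, `HomePartitionPool`, `CountIn`, and the 4096-byte buffer size are hypothetical and exist only for this illustration; they are not types from this PR or from Kestrel.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical pooled item that remembers which partition created it.
public sealed class HomedBuffer
{
    public HomedBuffer(int homePartition, int size)
    {
        HomePartition = homePartition;
        Data = new byte[size];
    }

    public int HomePartition { get; }
    public byte[] Data { get; }
}

// Hypothetical pool; not the PR's PinnedBlockMemoryPool or SocketSenderPool.
public sealed class HomePartitionPool
{
    private const int BufferSize = 4096; // assumed size, for illustration only
    private readonly ConcurrentQueue<HomedBuffer>[] _queues;

    public HomePartitionPool(int partitionCount)
    {
        if (partitionCount <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(partitionCount));
        }

        _queues = new ConcurrentQueue<HomedBuffer>[partitionCount];
        for (var i = 0; i < _queues.Length; i++)
        {
            _queues[i] = new ConcurrentQueue<HomedBuffer>();
        }
    }

    public HomedBuffer Rent()
    {
        // Pick the partition for the core the renting thread is running on.
        var partition = (int)((uint)Thread.GetCurrentProcessorId() % (uint)_queues.Length);
        if (_queues[partition].TryDequeue(out var buffer))
        {
            return buffer;
        }

        // New items are tagged with the partition that had to allocate them.
        return new HomedBuffer(partition, BufferSize);
    }

    public void Return(HomedBuffer buffer)
    {
        // Ignore the returning thread's core: the item goes back home,
        // so the renting side's queue is refilled.
        _queues[buffer.HomePartition].Enqueue(buffer);
    }

    // Exposed only so the balance between partitions can be inspected.
    public int CountIn(int partition) => _queues[partition].Count;
}
```

The trade-off in this sketch is that Return may enqueue to a queue that belongs to another core, which reintroduces some cross-core traffic in exchange for keeping the renting partitions stocked.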

@@ -10,19 +10,28 @@ internal class SocketSenderPool : IDisposable
{
    private const int MaxQueueSize = 1024; // REVIEW: Is this good enough?

    private readonly ConcurrentQueue<SocketSender> _queue = new();
    private readonly ConcurrentQueue<SocketSender>[] _queues;

Would it be possible to just disable SocketSender pooling on machines with a large number of CPU cores?

@kouvel
Member

kouvel commented Jun 7, 2022

@sebastienros, any update on this PR? Would like to have this for .NET 7 for further experiments.

@sebastienros
Member Author

Will come back to it very soon, as soon as I am done with output caching (#41037)

@sebastienros
Member Author

Replaced by #42237

@sebastienros sebastienros deleted the sebros/queue branch June 17, 2022 22:07
@amcasey amcasey added area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions and removed area-runtime labels Jun 6, 2023