use ArrayPool instead of having a private buffer cache #45690

Merged
adamsitnik merged 18 commits into dotnet:main from the NtProcessInfoHelperRemoveCache branch
Aug 19, 2021

Conversation

adamsitnik
Member

@adamsitnik adamsitnik commented Dec 7, 2020

Contributes to #45315

using System.Threading.Tasks;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Hosting;
using Microsoft.Extensions.Configuration;

namespace Template
{
    public class Startup
    {
        public Startup(IConfiguration configuration) => Configuration = configuration;

        public IConfiguration Configuration { get; }

        // This method gets called by the runtime. Use this method to configure the HTTP request pipeline.
        public void Configure(IApplicationBuilder app, IWebHostEnvironment env)
        {
            app.UseRouting();

            app.UseEndpoints(routeBuilder =>
            {
                routeBuilder.Map("GetProcesses", context =>
                {
                    foreach (var process in System.Diagnostics.Process.GetProcesses())
                    {
                        process.Dispose();
                    }

                    return Task.CompletedTask;
                });
            });
        }
    }
}

Citrine (28 cores):

| load | before | after |
|------|-------:|------:|
| CPU Usage (%) | 3 | 4 |
| Cores usage (%) | 85 | 102 |
| Working Set (MB) | 48 | 48 |
| Build Time (ms) | 5,057 | 4,123 |
| Start Time (ms) | 0 | 0 |
| Published Size (KB) | 76,401 | 76,401 |
| .NET Core SDK Version | 3.1.404 | 3.1.404 |
| First Request (ms) | 101 | 106 |
| Requests/sec | 14,392 | 16,706 |
| Requests | 216,986 | 251,968 |
| Mean latency (ms) | 35.53 | 30.61 |
| Max latency (ms) | 301.49 | 310.73 |
| Bad responses | 0 | 0 |
| Socket errors | 0 | 0 |
| Read throughput (MB/s) | 1.26 | 1.47 |
| Latency 50th (ms) | 31.94 | 29.67 |
| Latency 75th (ms) | 40.05 | 31.19 |
| Latency 90th (ms) | 49.31 | 36.68 |
| Latency 99th (ms) | 69.83 | 48.34 |

Perf (12 cores):

| load | before | after |
|------|-------:|------:|
| CPU Usage (%) | 10 | 12 |
| Cores usage (%) | 118 | 142 |
| Working Set (MB) | 49 | 49 |
| Build Time (ms) | 5,877 | 5,898 |
| Start Time (ms) | 0 | 0 |
| Published Size (KB) | 76,401 | 76,401 |
| First Request (ms) | 103 | 106 |
| Requests/sec | 13,499 | 17,852 |
| Requests | 203,683 | 269,349 |
| Mean latency (ms) | 38.06 | 28.65 |
| Max latency (ms) | 240.17 | 168.62 |
| Bad responses | 0 | 0 |
| Socket errors | 0 | 0 |
| Read throughput (MB/s) | 1.18 | 1.57 |
| Latency 50th (ms) | 38.57 | 26.50 |
| Latency 75th (ms) | 46.44 | 36.76 |
| Latency 90th (ms) | 57.80 | 40.75 |
| Latency 99th (ms) | 84.34 | 57.51 |

No difference for micro-benchmarks (this was expected):

|             Method |           Toolchain |     Mean | Ratio |   Gen 0 |   Gen 1 | Gen 2 | Allocated |
|------------------- |-------------------- |---------:|------:|--------:|--------:|------:|----------:|
|       GetProcesses |  \after\CoreRun.exe | 4.948 ms |  1.00 | 78.4314 | 19.6078 |     - |    621 KB |
|       GetProcesses | \before\CoreRun.exe | 4.916 ms |  1.00 | 80.0000 | 20.0000 |     - |    619 KB |
|                    |                     |          |       |         |         |       |           |
| GetProcessesByName |  \after\CoreRun.exe | 4.912 ms |  1.00 | 80.0000 | 20.0000 |     - |    619 KB |
| GetProcessesByName | \before\CoreRun.exe | 4.903 ms |  1.00 | 80.0000 | 20.0000 |     - |    619 KB |

@ghost

ghost commented Dec 7, 2020

Tagging subscribers to this area: @eiriktsarpalis
See info in area-owners.md if you want to be subscribed.

Issue Details

Contributes to #45315

Author: adamsitnik
Assignees: -
Labels:

area-System.Diagnostics.Process, tenet-performance

Milestone: 6.0.0

@adamsitnik adamsitnik closed this Mar 25, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Apr 24, 2021
@jkotas
Member

jkotas commented Aug 13, 2021

@adamsitnik Now that ArrayPool was fixed to pool unlimited size arrays, I think this PR can be resurrected and updated to use ArrayPool unconditionally.
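
As a rough illustration of what using the shared pool unconditionally can look like (this is not the PR's actual code), the sketch below rents a buffer from `ArrayPool<byte>.Shared`, retries with a larger size when the callee reports that the buffer was too small, and always returns the array to the pool. `PooledBufferSketch`, `TryFill`, and `QueryWithPooledBuffer` are hypothetical names; the delegate stands in for a native query that reports the size it needs.

```csharp
using System;
using System.Buffers;

public static class PooledBufferSketch
{
    // Hypothetical stand-in for a native query: returns true when the
    // destination was large enough, and always reports the size it needs.
    public delegate bool TryFill(Span<byte> destination, out int requiredSize);

    public static int QueryWithPooledBuffer(TryFill tryFill, int initialSize)
    {
        int size = initialSize;
        while (true)
        {
            // Rent may hand back an array larger than requested.
            byte[] buffer = ArrayPool<byte>.Shared.Rent(size);
            try
            {
                if (tryFill(buffer, out int requiredSize))
                {
                    return requiredSize;
                }

                // Too small: grow to the reported size, or double, whichever is larger.
                size = Math.Max(requiredSize, checked(size * 2));
            }
            finally
            {
                // Always hand the array back so later calls can reuse it.
                ArrayPool<byte>.Shared.Return(buffer);
            }
        }
    }
}
```

The rented array stays in the pool after `Return`, so repeated calls can avoid fresh allocations at the cost of the pool holding on to the memory.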

@adamsitnik adamsitnik reopened this Aug 16, 2021
@adamsitnik adamsitnik requested a review from jkotas August 16, 2021 09:19
@adamsitnik
Member Author

Now that ArrayPool was fixed to pool unlimited size arrays, I think this PR can be resurrected and updated to use ArrayPool unconditionally.

@jkotas done, PTAL

@adamsitnik adamsitnik modified the milestones: 6.0.0, 7.0.0 Aug 16, 2021
@adamsitnik adamsitnik force-pushed the NtProcessInfoHelperRemoveCache branch from 5e56129 to fc4378f Compare August 17, 2021 18:23
@adamsitnik adamsitnik requested a review from jkotas August 17, 2021 18:27
@jkotas
Member

jkotas commented Aug 17, 2021

It looks good to me. I think it would be useful to verify that there is an advantage in using the array pool instead of a simpler NativeMemory.Alloc in this situation - #45690 (comment).

@adamsitnik
Member Author

I think it would be useful to verify that there is an advantage in using the array pool instead of a simpler NativeMemory.Alloc

I did some measurements using modified benchmarks from the performance repo:

[Benchmark]
public void GetProcessesByName()
{
    foreach (var process in Process.GetProcessesByName(_nonExistingName))
    {
        process.Dispose();
    }
}


[Benchmark(OperationsPerInvoke = 10 * 24)]
public void GetProcesses_Parallel()
{
    Parallel.For(0, 24, _ => // my PC has 24 cores
    {
        for (int i = 0; i < 10; i++)
        {
            foreach (var process in Process.GetProcesses())
            {
                process.Dispose();
            }
        }
    });
}

[Benchmark(OperationsPerInvoke = 10 * 24)]
public void GetProcessesByName_Parallel()
{
    Parallel.For(0, 24, _ =>
    {
        for (int i = 0; i < 10; i++)
        {
            foreach (var process in Process.GetProcessesByName(_nonExistingName))
            {
                process.Dispose();
            }
        }
    });
}

The results were as follows:

BenchmarkDotNet=v0.13.0.1559-nightly, OS=Windows 10.0.19043.1165 (21H1/May2021Update)
AMD Ryzen Threadripper PRO 3945WX 12-Cores, 1 CPU, 24 logical and 12 physical cores
.NET SDK=6.0.100-rc.1.21417.19

| Method | Toolchain | Mean | Ratio | Gen 0 | Gen 1 | Gen 2 | Allocated |
|--------|-----------|-----:|------:|------:|------:|------:|----------:|
| GetProcesses | \alignedAlloc\corerun.exe | 4.791 ms | 1.05 | - | - | - | 477 KB |
| GetProcesses | \arrayPool\corerun.exe | 4.508 ms | 1.00 | 55.5556 | 18.5185 | - | 475 KB |
| GetProcessesByName | \alignedAlloc\corerun.exe | 4.658 ms | 1.05 | 46.8750 | 15.6250 | - | 475 KB |
| GetProcessesByName | \arrayPool\corerun.exe | 4.456 ms | 1.00 | 46.8750 | 15.6250 | - | 473 KB |
| GetProcesses_Parallel | \alignedAlloc\corerun.exe | 1.154 ms | 0.66 | 58.3333 | 16.6667 | 4.1667 | 476 KB |
| GetProcesses_Parallel | \arrayPool\corerun.exe | 1.763 ms | 1.00 | 62.5000 | 20.8333 | 4.1667 | 484 KB |
| GetProcessesByName_Parallel | \alignedAlloc\corerun.exe | 1.098 ms | 0.60 | 58.3333 | 16.6667 | 4.1667 | 475 KB |
| GetProcessesByName_Parallel | \arrayPool\corerun.exe | 1.843 ms | 1.00 | 62.5000 | 16.6667 | 4.1667 | 484 KB |

AlignedAlloc seems to perform worse for single-threaded usage (about 5% slower), but better for parallel usage (34-40% less time per operation).
With AlignedAlloc the code is also simpler.
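
For readers mapping the Ratio column to these percentages, the arithmetic can be sketched as below. It assumes Ratio is the candidate's time relative to the arrayPool baseline (1.00); `RatioSketch` and `PercentChange` are hypothetical names used only for illustration.

```csharp
using System;

public static class RatioSketch
{
    // Converts a ratio (candidate time / baseline time) into the percent
    // change in time relative to the baseline. Negative means less time.
    public static decimal PercentChange(decimal ratio) => (ratio - 1m) * 100m;
}
```

With the ratios reported above, `PercentChange(1.05m)` gives 5 (single-threaded, slower), while `PercentChange(0.66m)` and `PercentChange(0.60m)` give -34 and -40 (parallel, faster).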

@stephentoub @jkotas I don't have a strong opinion here. What are your thoughts on this?

@jkotas
Member

jkotas commented Aug 18, 2021

AlignedAlloc

Nit: It can be just NativeMemory.Alloc. It guarantees sufficient alignment. AlignedAlloc is unnecessary.

What are your thoughts on this?

I would lean towards using NativeMemory.Alloc. I think it will give a lower high-memory watermark for some common usage patterns of this API. I think minimizing working set is more important than saving cycles for this API.

I do not have a strong opinion on this either. Thank you for collecting the numbers!
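
To make the working-set argument concrete, here is a minimal sketch (not the merged code) of a grow-and-retry query backed by `NativeMemory.Alloc` and `NativeMemory.Free`, which are available from .NET 6. `NativeBufferSketch`, `TryFill`, and `QueryWithNativeBuffer` are hypothetical names; the delegate stands in for a native call that reports the size it needs. The code requires `AllowUnsafeBlocks` to compile.

```csharp
using System;
using System.Runtime.InteropServices;

public static unsafe class NativeBufferSketch
{
    // Hypothetical stand-in for a native query: returns true when the
    // destination was large enough, and always reports the size it needs.
    public delegate bool TryFill(Span<byte> destination, out int requiredSize);

    public static int QueryWithNativeBuffer(TryFill tryFill, int initialSize)
    {
        int size = initialSize; // assumed to be positive
        while (true)
        {
            // Unmanaged allocation: invisible to the GC, not pooled.
            void* buffer = NativeMemory.Alloc((nuint)size);
            try
            {
                if (tryFill(new Span<byte>(buffer, size), out int requiredSize))
                {
                    return requiredSize;
                }

                // Too small: grow to the reported size, or double, whichever is larger.
                size = Math.Max(requiredSize, checked(size * 2));
            }
            finally
            {
                // Released immediately, so no buffer outlives the call.
                NativeMemory.Free(buffer);
            }
        }
    }
}
```

Because the buffer is freed before the method returns, nothing is retained between calls, which is the property behind the lower high-memory watermark mentioned above.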

@adamsitnik
Member Author

NativeMemory.Alloc

I've switched to NativeMemory.Alloc and confirmed that its perf characteristics don't look worse than AlignedAlloc's.

| Method | Toolchain | Mean | Ratio | Gen 0 | Gen 1 | Gen 2 | Allocated |
|--------|-----------|-----:|------:|------:|------:|------:|----------:|
| GetProcesses | \alignedAlloc\corerun.exe | 2,991.6 us | 1.08 | - | - | - | 405 KB |
| GetProcesses | \alloc\corerun.exe | 2,903.3 us | 1.04 | - | - | - | 397 KB |
| GetProcesses | \arrayPool\corerun.exe | 2,776.7 us | 1.00 | 47.0588 | 11.7647 | - | 395 KB |
| GetProcessesByName | \alignedAlloc\corerun.exe | 2,933.2 us | 1.06 | 41.6667 | 10.4167 | - | 403 KB |
| GetProcessesByName | \alloc\corerun.exe | 2,805.3 us | 1.01 | 37.5000 | 12.5000 | - | 395 KB |
| GetProcessesByName | \arrayPool\corerun.exe | 2,776.5 us | 1.00 | 41.6667 | 10.4167 | - | 394 KB |
| GetProcesses_Parallel | \alignedAlloc\corerun.exe | 1,049.0 us | 0.70 | 54.1667 | 12.5000 | 4.1667 | 404 KB |
| GetProcesses_Parallel | \alloc\corerun.exe | 963.5 us | 0.64 | 45.8333 | 12.5000 | 4.1667 | 397 KB |
| GetProcesses_Parallel | \arrayPool\corerun.exe | 1,521.2 us | 1.00 | 50.0000 | 12.5000 | 4.1667 | 395 KB |
| GetProcessesByName_Parallel | \alignedAlloc\corerun.exe | 982.3 us | 0.64 | 50.0000 | 8.3333 | 4.1667 | 402 KB |
| GetProcessesByName_Parallel | \alloc\corerun.exe | 1,035.7 us | 0.68 | 54.1667 | 20.8333 | 4.1667 | 400 KB |
| GetProcessesByName_Parallel | \arrayPool\corerun.exe | 1,544.2 us | 1.00 | 50.0000 | 12.5000 | 4.1667 | 393 KB |

@jkotas thank you for a lot of good hints and great discussion!

Member

@jkotas jkotas left a comment

Thanks

@adamsitnik adamsitnik merged commit 17e92db into dotnet:main Aug 19, 2021
@adamsitnik adamsitnik deleted the NtProcessInfoHelperRemoveCache branch August 19, 2021 17:24