Add support for Windows IO completions to the portable thread pool #64834

kouvel · 2022-02-04T22:57:11Z

Added an implementation for BindHandle
Polling for IO completions is done in batches on separate threads similarly to what is done on Unixes
Added a high-priority work item queue to have IO completion work items run at higher priority
Removed the time-sensitive work item queue, used the high-priority queue instead

Test fixes on Mono

Fixes FileSystemWatch EnableRaisingEvents not supported on MonoVM #63185
Fixes [mono] Portable threadpool does not support async IO eg ThreadPoolBoundHandle on Windows #34582

ghost · 2022-02-04T22:57:20Z

Tagging subscribers to this area: @mangod9
See info in area-owners.md if you want to be subscribed.

Issue Details

Added an implementation for BindHandle
Polling for IO completions is done in batches on separate threads similarly to what is done on Unixes
Added a high-priority work item queue to have IO completion work items run at higher priority
Removed the time-sensitive work item queue, used the high-priority queue instead

Author:	kouvel
Assignees:	kouvel
Labels:	`area-System.Threading`
Milestone:	7.0.0

kouvel · 2022-02-04T23:06:13Z

ASP.NET perf on citrine-win:

	Before	After	Diff
PlaintextPlatform	10,346,490	10,627,688	2.7%
JsonPlatform	840,775	973,058	15.7%
FortunesPlatform	387,290	428,023	10.5%
Plaintext	5,201,727	5,197,202	-0.1%
Json	824,142	919,967	11.6%
Fortunes	277,752	303,691	9.3%

src/coreclr/nativeaot/System.Private.CoreLib/src/System/Threading/ThreadPool.Windows.cs

src/libraries/System.Private.CoreLib/src/System/Threading/PortableThreadPool.IO.Windows.cs

marek-safar · 2022-02-07T08:53:19Z

@lateralusX please review

lateralusX · 2022-02-07T10:54:17Z

@kouvel Should we enable the runtime tests currently disabled on Mono as part of this PR. There are a handful of tests currently labeled with #34582 as active issue due to missing features, I guess the change in this PR should resolve that issue. If so, we would get some additional coverage on MonoVM as part of doing this change.

There is also an open issue tied to the same missing functionality, #63185, and that should probably be validate by

runtime/src/libraries/System.IO.FileSystem.Watcher/tests/FileSystemWatcher.unit.cs

Line 17 in 8f87ac7

public class FileSystemWatcherTests : FileSystemWatcherTest

, if we enable that test on MonoVM.

I will build your branch and validate some of these things manual as part of reviewing the MonoVM side of changes done in this PR.

lateralusX · 2022-02-07T16:47:29Z

Manual verification of #63185 using this PR running on MonoVM+Windows, works.

kouvel · 2022-02-07T19:58:08Z

@lateralusX, there seem to be a number of tests in various places disabled on Mono for the same reason. I have enabled the tests mentioned in #34582 for now, there are likely more to enable and I can try to do so in a separate commit but I'm not sure I'll find them all.

kouvel · 2022-02-07T20:06:55Z

Oh I see that there are a number of other disabled tests referring to #34582, I can start with those

lateralusX · 2022-02-08T11:02:43Z

Running through the rest of the enabled tests locally and all seems to pass, great work!

If we would like to run the Windows library test suite on Mono as part of this PR (for validation), we would need to enable it on PR runs (currently not run on PR's), flip this to get the test suite running on this PR:

runtime/eng/pipelines/runtime.yml

Line 1165 in a24364a

# - windows_x64

There is one test failing in the drawning test suite on Windows, that is currently expected and will be fixed in a different PR where I fix and enable other failing tests on Windows.

src/mono/System.Private.CoreLib/src/System/Threading/ThreadPool.Browser.Mono.cs

src/coreclr/System.Private.CoreLib/src/System/Threading/ThreadPool.CoreCLR.Windows.cs

stephentoub · 2022-02-08T17:57:01Z

at least, I assume that works.

only once #64720 is merged.

But this case is using a ThrowHelper, which we generally haven't replaced.

kouvel · 2022-02-11T21:59:17Z

Rebased to fix conflicts

…ns on Windows - Depends on dotnet/runtime#64834 - Refactored the enumeration of the global queue and reused it to also enumerate the high-priority work item queue - Omitted info that is specific to the native thread pool implementation, when the portable thread pool is being used for IO completions

davidfowl · 2022-02-13T04:48:53Z

This makes me so happy 😄

src/libraries/System.Private.CoreLib/src/System/Threading/PortableThreadPool.IO.Windows.cs

- Renamed config var for number of poller threads to indicate that they are IO pollers - Removed an unused property - Reordered IO completion callback arguments for consistency with before and other code paths - A `FileSystemWatcher` test failed twice, unable to repro locally, tried offloading background work to a separate thread

This reverts commit 9a73d8e6fbac6d3bb9d70d6aab8b8b0de5b3b6c3.

…ng IO completion callbacks

kouvel · 2022-03-04T18:47:12Z

Looks like the remaining CI failures are unrelated failures also happening elsewhere.

…ns on Windows (#2870) - Depends on dotnet/runtime#64834 - Refactored the enumeration of the global queue and reused it to also enumerate the high-priority work item queue - Omitted info that is specific to the native thread pool implementation, when the portable thread pool is being used for IO completions

They were disabled against dotnet#34582 which was fixed by dotnet#64834

They were disabled against #34582 which was fixed by #64834

mconnew · 2022-08-22T20:19:32Z

I just learnt about this change/PR. On Windows, in the previous IO thread pool implementation, threads which were bound to an IOCP have a different behavior than normal threads. In the Threads and Concurrency section of the win32 documentation it explains about limiting the number of runnable threads, and .NET would limit it to the number of CPU cores. This has the effect that work executing on an IO thread could not be pre-emptively context switched by another one (within the IO thread pool). Posting the work to a regular thread which has higher priority than a regular thread but without this runnable concurrency limit is going to reduce performance in some scenarios. Unless that thread pool is hard limited in the number of threads in it (and not modifiable with ThreadPool.SetMaxThreads), there's going to be more context switching which introduces more overhead. But if you do that and you execute work on one of the threads which blocks, you don't have any extra threads which can be woken up to complete more work so that's not an option (and you deadlock if the blocking thread is waiting on some other work that's waiting).
I suspect this change will see some improvements for asp.net core, but that's because it doesn't take advantage of the IOCP capability. I would expect any library which is purposefully utilizing IO threads to parallelize work to see a regression with the cost of context switching. If a library (such as asp.net core) wishes to offload work to a high priority regular thread pool, it should do so itself. I think it's a bad idea to effectively remove IO threads from the runtime and offload the work to a non IOCP thread pool automatically.
We've been performance testing CoreWCF and comparing Linux with Windows. We have special code to utilize IO threads on Windows an in efficient manner. On Linux, when perf testing with all that code disabled (just let the regular thread pool handle the work) we see measurably worse performance on Linux than Windows. The "known wisdom" is that Linux is faster than Windows for networking, but when correctly utilizing IOCP, Windows can have better throughput than Linux.

kouvel · 2022-08-24T16:28:59Z

@mconnew, fair point. It may be possible to limit the IOCP concurrency for worker threads too on Windows since they also use an IOCP, perhaps an option could be exposed through config, it may be interesting to try although there may be some downsides to using it in some cases.

The IOCP concurrency limit applies only so long as threads don't block for any reason. If for instance there is any lock contention that causes one thread to wait (even in a spin-lock with Sleep(1)), another thread will be released by the IOCP, and after the lock becomes available there would be one extra thread running. So any sort of frequent lock contention or slightly blocking operation could cause more threads than the concurrency limit to be active despite the concurrency limit, until the extra threads wait for more IO completions.

Extra context switching can also occur between IO threads and worker threads when both are being used, which is not uncommon. Reusing one pool for both types of work can reduce the amount of context switching in those typical cases.

There is another difference between Windows and Linux in how IO completions are processed. Before this change on Windows, IO completions are processed in parallel to worker threads. After this change on Windows, IO completions are queued to the worker thread pool at higher priority. On Linux, IO completions are queued as normal work items to the worker thread pool, and a long work queue could artificially delay IO completions from being processed. There are some other differences in how wait completions are processed.

I'd like to understand what kind of effect this change would have in CoreWCF's performance. I'm also curious which differences are causing it to perform slower on Linux. Can you point me to the performance tests and instructions for running them?

davidfowl · 2022-08-24T16:38:39Z

After this change on Windows, IO completions are queued to the worker thread pool at higher priority. On Linux, IO completions are queued as normal work items to the worker thread pool, and a long work queue could artificially delay IO completions from being processed.

@kouvel feels like we should make this change on linux as well. Another benefit to moving the polling logic into the runtime.

kouvel · 2022-08-24T16:48:55Z

@kouvel feels like we should make this change on linux as well. Another benefit to moving the polling logic into the runtime.

I think it would be good to do at least for quasi-correctness, if not for perf. Doing so may need some APIs such as #61273 or similar (or at least an internal way to get to those APIs).

davidfowl · 2022-08-25T23:13:54Z

Should re-open that issue for .NET 8?

kouvel · 2022-08-26T13:39:40Z

Should re-open that issue for .NET 8?

Maybe the first option in #61273 (comment) could be used for starters.

davidfowl · 2022-08-26T16:36:25Z

Option 1b would be to expose ThreadPool.UnsafeQueueHighPriorityWorkItemInternal so the sockets assembly and use that instead. That wouldn't require moving the polling loop, but it doesn't get us any closer to making those new public APIs.

mconnew · 2022-08-26T21:08:28Z

@kouvel, I could be wrong on this, but my understanding was that when a thread goes into an alertable state (eg the sleep(1) you mentioned), it yields the CPU and the kernel will activate a dormant thread. But that thread which went alertable won't get CPU time again until another thread goes alertable and yields up the CPU? Is this not the case? Does it only limit releasing a thread waiting for a completion from the IOCP if there are more than the configured concurrency limit executing?

The model I would like to see is to have an instantiable ThreadPool class. This class would have the option on Windows to create an IOCP for the instance, as well as options to specify OS thread priority for threads created by the pool. It would have a helper method to bind to a Socket which on Windows would either bind the socket to the ThreadPool instances IOCP or bind to the ambient IO thread pool but have the completion be dispatched to the ThreadPool instance (similar to the new IOCP model) if IOCP hasn't been configured. On Linux it would set up a singleton polling loop in the ThreadPool instance and dispatch ready notifications to the ThreadPool work queue.

The ThreadPool would have public helper properties which can return a SynchronizationContext and a TaskScheduler to make it easy to run code in the ThreadPool using methods like TaskFactory.StartNew. It would also have its own QueueUserWorkItem method to easily dispatch delegates to it.

I think this model would satisfy being able to keep the original behavior that WCF/CoreWCF uses to do pseudo cooperative multitasking, make it really easy to have all IO operations initiated and completed in the pool, and it would enable the current new IO thread pool model which improves performance for some scenarios, and it doesn't require compromising.

I think we're getting to the point where having two implicit built-in thread pools with hard coded behavior is becoming a limiting factor.

davidfowl · 2022-08-27T00:26:19Z

@mconnew You should file another issue and write up an API proposal. It sounds like you basically want a way to spin up your own poller for network IO (I wouldn't call this a threadpool).

kouvel · 2022-08-27T20:30:17Z

I could be wrong on this, but my understanding was that when a thread goes into an alertable state (eg the sleep(1) you mentioned), it yields the CPU and the kernel will activate a dormant thread. But that thread which went alertable won't get CPU time again until another thread goes alertable and yields up the CPU? Is this not the case?

It looks like after the sleep expires, that thread is scheduled normally, even if that would exceed the configured concurrency limit. I didn't see a difference in behavior between alertable and non-alertable waits, both seem to allow more threads than the concurrency limit to run simultaneously. Waiting on GetQueuedCompletionStatus does appear to block the extra threads based on the concurrency limit.

The model I would like to see is to have an instantiable ThreadPool class.

That may be reasonable if the underlying thread resources are shared to a large degree. I would be concerned about having too many thread pools that amount to having too many threads underneath. There may be other advantages to having instantiable thread pools, such as for configuring an instance for how those work items run.

I think this model would satisfy being able to keep the original behavior that WCF/CoreWCF uses to do pseudo cooperative multitasking, make it really easy to have all IO operations initiated and completed in the pool, and it would enable the current new IO thread pool model which improves performance for some scenarios, and it doesn't require compromising.

I'd like to understand this more. Is there a performance drop with this change, or between .NET 6 and .NET 7? If so, I'd like to understand better what is causing it and see some perf profiles. If it really is a separate pool of IO completion threads that is helping in the scenario, there may be options to consider, and it would help to understand the reason for the difference to determine which options may be fruitful.

kouvel added the area-System.Threading label Feb 4, 2022

kouvel added this to the 7.0.0 milestone Feb 4, 2022

kouvel self-assigned this Feb 4, 2022

kouvel requested review from marek-safar and MichalStrehovsky as code owners February 4, 2022 22:57

kouvel requested a review from mangod9 February 4, 2022 23:08

teo-tsirpanis reviewed Feb 4, 2022

View reviewed changes

src/coreclr/nativeaot/System.Private.CoreLib/src/System/Threading/ThreadPool.Windows.cs Outdated Show resolved Hide resolved

teo-tsirpanis reviewed Feb 4, 2022

View reviewed changes

src/libraries/System.Private.CoreLib/src/System/Threading/PortableThreadPool.IO.Windows.cs Outdated Show resolved Hide resolved

kouvel force-pushed the TpIoWin branch from ca806c5 to 8478801 Compare February 7, 2022 19:53

lateralusX reviewed Feb 8, 2022

View reviewed changes

src/mono/System.Private.CoreLib/src/System/Threading/ThreadPool.Browser.Mono.cs Show resolved Hide resolved

kouvel mentioned this pull request Feb 8, 2022

Tracking issue for threading work items #52701

Open

17 tasks

danmoseley reviewed Feb 8, 2022

View reviewed changes

src/coreclr/System.Private.CoreLib/src/System/Threading/ThreadPool.CoreCLR.Windows.cs Show resolved Hide resolved

ViktorHofer mentioned this pull request Feb 8, 2022

Don't use Targets* helper properties in libs #64500

Merged

kouvel force-pushed the TpIoWin branch from 37329fa to 0644528 Compare February 11, 2022 21:58

kouvel requested a review from janvorli February 11, 2022 21:59

kouvel mentioned this pull request Feb 13, 2022

Update SOS for when the portable thread pool is used for IO completions on Windows dotnet/diagnostics#2870

Merged

kouvel force-pushed the TpIoWin branch from 2f898d6 to c7cf417 Compare February 13, 2022 04:27

davidfowl reviewed Feb 13, 2022

View reviewed changes

src/libraries/System.Private.CoreLib/src/System/Threading/PortableThreadPool.IO.Windows.cs Outdated Show resolved Hide resolved

kouvel added 8 commits March 3, 2022 14:05

Temporarily enable libraries tests against Mono for this PR

0a2dc42

Try to fix a time-sensitive test that is occasionally failing

64cca54

Note SOS usage of a variable

251de07

Address feedback

165842a

Revert "Temporarily enable libraries tests against Mono for this PR"

81f18f0

This reverts commit 9a73d8e6fbac6d3bb9d70d6aab8b8b0de5b3b6c3.

Fix build after rebase

14893de

Add checks for some DOTNET vars from Net.Sockets and implement inlini…

a310f12

…ng IO completion callbacks

kouvel force-pushed the TpIoWin branch from 1efbddf to a310f12 Compare March 3, 2022 22:05

runfoapp bot mentioned this pull request Mar 4, 2022

System.IO.Tests work item failing with SIGKILL #65791

Closed

kouvel merged commit dd3b61d into dotnet:main Mar 4, 2022

kouvel deleted the TpIoWin branch March 4, 2022 18:47

kunalspathak mentioned this pull request Mar 8, 2022

Regressions in System.Net.Http.Tests.SocketsHttpHandlerPerfTest #66343

Closed

DrewScoggins mentioned this pull request Mar 10, 2022

Regressions in System.IO.Tests.Perf_RandomAccess #66465

Closed

akoeplinger added a commit to akoeplinger/runtime that referenced this pull request Mar 11, 2022

Reenable HostApplicationBuilderTests disabled on Mono/Windows

6a1e2f0

They were disabled against dotnet#34582 which was fixed by dotnet#64834

akoeplinger mentioned this pull request Mar 11, 2022

Reenable HostApplicationBuilderTests disabled on Mono/Windows #66521

Merged

akoeplinger added a commit that referenced this pull request Mar 12, 2022

Reenable HostApplicationBuilderTests disabled on Mono/Windows (#66521)

48e431b

They were disabled against #34582 which was fixed by #64834

ghost locked as resolved and limited conversation to collaborators Apr 3, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add support for Windows IO completions to the portable thread pool #64834

Add support for Windows IO completions to the portable thread pool #64834

kouvel commented Feb 4, 2022 •

edited

Loading

ghost commented Feb 4, 2022

kouvel commented Feb 4, 2022

marek-safar commented Feb 7, 2022

lateralusX commented Feb 7, 2022 •

edited

Loading

lateralusX commented Feb 7, 2022 •

edited

Loading

kouvel commented Feb 7, 2022

kouvel commented Feb 7, 2022

lateralusX commented Feb 8, 2022 •

edited

Loading

stephentoub commented Feb 8, 2022 •

edited

Loading

kouvel commented Feb 11, 2022

davidfowl commented Feb 13, 2022

kouvel commented Mar 4, 2022

mconnew commented Aug 22, 2022

kouvel commented Aug 24, 2022

davidfowl commented Aug 24, 2022

kouvel commented Aug 24, 2022

davidfowl commented Aug 25, 2022

kouvel commented Aug 26, 2022

davidfowl commented Aug 26, 2022 •

edited

Loading

mconnew commented Aug 26, 2022

davidfowl commented Aug 27, 2022

kouvel commented Aug 27, 2022 •

edited

Loading

Add support for Windows IO completions to the portable thread pool #64834

Add support for Windows IO completions to the portable thread pool #64834

Conversation

kouvel commented Feb 4, 2022 • edited Loading

ghost commented Feb 4, 2022

kouvel commented Feb 4, 2022

marek-safar commented Feb 7, 2022

lateralusX commented Feb 7, 2022 • edited Loading

lateralusX commented Feb 7, 2022 • edited Loading

kouvel commented Feb 7, 2022

kouvel commented Feb 7, 2022

lateralusX commented Feb 8, 2022 • edited Loading

stephentoub commented Feb 8, 2022 • edited Loading

kouvel commented Feb 11, 2022

davidfowl commented Feb 13, 2022

kouvel commented Mar 4, 2022

mconnew commented Aug 22, 2022

kouvel commented Aug 24, 2022

davidfowl commented Aug 24, 2022

kouvel commented Aug 24, 2022

davidfowl commented Aug 25, 2022

kouvel commented Aug 26, 2022

davidfowl commented Aug 26, 2022 • edited Loading

mconnew commented Aug 26, 2022

davidfowl commented Aug 27, 2022

kouvel commented Aug 27, 2022 • edited Loading

kouvel commented Feb 4, 2022 •

edited

Loading

lateralusX commented Feb 7, 2022 •

edited

Loading

lateralusX commented Feb 7, 2022 •

edited

Loading

lateralusX commented Feb 8, 2022 •

edited

Loading

stephentoub commented Feb 8, 2022 •

edited

Loading

davidfowl commented Aug 26, 2022 •

edited

Loading

kouvel commented Aug 27, 2022 •

edited

Loading