-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for Windows IO completions to the portable thread pool #64834
Conversation
Tagging subscribers to this area: @mangod9 Issue Details
|
ASP.NET perf on citrine-win:
|
src/coreclr/nativeaot/System.Private.CoreLib/src/System/Threading/ThreadPool.Windows.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Threading/PortableThreadPool.IO.Windows.cs
Outdated
Show resolved
Hide resolved
@lateralusX please review |
@kouvel Should we enable the runtime tests currently disabled on Mono as part of this PR. There are a handful of tests currently labeled with #34582 as active issue due to missing features, I guess the change in this PR should resolve that issue. If so, we would get some additional coverage on MonoVM as part of doing this change. There is also an open issue tied to the same missing functionality, #63185, and that should probably be validate by runtime/src/libraries/System.IO.FileSystem.Watcher/tests/FileSystemWatcher.unit.cs Line 17 in 8f87ac7
I will build your branch and validate some of these things manual as part of reviewing the MonoVM side of changes done in this PR. |
Manual verification of #63185 using this PR running on MonoVM+Windows, works. |
@lateralusX, there seem to be a number of tests in various places disabled on Mono for the same reason. I have enabled the tests mentioned in #34582 for now, there are likely more to enable and I can try to do so in a separate commit but I'm not sure I'll find them all. |
Oh I see that there are a number of other disabled tests referring to #34582, I can start with those |
Running through the rest of the enabled tests locally and all seems to pass, great work! If we would like to run the Windows library test suite on Mono as part of this PR (for validation), we would need to enable it on PR runs (currently not run on PR's), flip this to get the test suite running on this PR: runtime/eng/pipelines/runtime.yml Line 1165 in a24364a
There is one test failing in the drawning test suite on Windows, that is currently expected and will be fixed in a different PR where I fix and enable other failing tests on Windows. |
src/mono/System.Private.CoreLib/src/System/Threading/ThreadPool.Browser.Mono.cs
Show resolved
Hide resolved
src/coreclr/System.Private.CoreLib/src/System/Threading/ThreadPool.CoreCLR.Windows.cs
Show resolved
Hide resolved
only once #64720 is merged. But this case is using a ThrowHelper, which we generally haven't replaced. |
Rebased to fix conflicts |
…ns on Windows - Depends on dotnet/runtime#64834 - Refactored the enumeration of the global queue and reused it to also enumerate the high-priority work item queue - Omitted info that is specific to the native thread pool implementation, when the portable thread pool is being used for IO completions
This makes me so happy 😄 |
src/libraries/System.Private.CoreLib/src/System/Threading/PortableThreadPool.IO.Windows.cs
Outdated
Show resolved
Hide resolved
- Renamed config var for number of poller threads to indicate that they are IO pollers - Removed an unused property - Reordered IO completion callback arguments for consistency with before and other code paths - A `FileSystemWatcher` test failed twice, unable to repro locally, tried offloading background work to a separate thread
This reverts commit 9a73d8e6fbac6d3bb9d70d6aab8b8b0de5b3b6c3.
…ng IO completion callbacks
Looks like the remaining CI failures are unrelated failures also happening elsewhere. |
…ns on Windows (#2870) - Depends on dotnet/runtime#64834 - Refactored the enumeration of the global queue and reused it to also enumerate the high-priority work item queue - Omitted info that is specific to the native thread pool implementation, when the portable thread pool is being used for IO completions
They were disabled against dotnet#34582 which was fixed by dotnet#64834
I just learnt about this change/PR. On Windows, in the previous IO thread pool implementation, threads which were bound to an IOCP have a different behavior than normal threads. In the Threads and Concurrency section of the win32 documentation it explains about limiting the number of runnable threads, and .NET would limit it to the number of CPU cores. This has the effect that work executing on an IO thread could not be pre-emptively context switched by another one (within the IO thread pool). Posting the work to a regular thread which has higher priority than a regular thread but without this runnable concurrency limit is going to reduce performance in some scenarios. Unless that thread pool is hard limited in the number of threads in it (and not modifiable with ThreadPool.SetMaxThreads), there's going to be more context switching which introduces more overhead. But if you do that and you execute work on one of the threads which blocks, you don't have any extra threads which can be woken up to complete more work so that's not an option (and you deadlock if the blocking thread is waiting on some other work that's waiting). |
@mconnew, fair point. It may be possible to limit the IOCP concurrency for worker threads too on Windows since they also use an IOCP, perhaps an option could be exposed through config, it may be interesting to try although there may be some downsides to using it in some cases. The IOCP concurrency limit applies only so long as threads don't block for any reason. If for instance there is any lock contention that causes one thread to wait (even in a spin-lock with Extra context switching can also occur between IO threads and worker threads when both are being used, which is not uncommon. Reusing one pool for both types of work can reduce the amount of context switching in those typical cases. There is another difference between Windows and Linux in how IO completions are processed. Before this change on Windows, IO completions are processed in parallel to worker threads. After this change on Windows, IO completions are queued to the worker thread pool at higher priority. On Linux, IO completions are queued as normal work items to the worker thread pool, and a long work queue could artificially delay IO completions from being processed. There are some other differences in how wait completions are processed. I'd like to understand what kind of effect this change would have in CoreWCF's performance. I'm also curious which differences are causing it to perform slower on Linux. Can you point me to the performance tests and instructions for running them? |
@kouvel feels like we should make this change on linux as well. Another benefit to moving the polling logic into the runtime. |
I think it would be good to do at least for quasi-correctness, if not for perf. Doing so may need some APIs such as #61273 or similar (or at least an internal way to get to those APIs). |
Should re-open that issue for .NET 8? |
Maybe the first option in #61273 (comment) could be used for starters. |
Option 1b would be to expose |
@kouvel, I could be wrong on this, but my understanding was that when a thread goes into an alertable state (eg the sleep(1) you mentioned), it yields the CPU and the kernel will activate a dormant thread. But that thread which went alertable won't get CPU time again until another thread goes alertable and yields up the CPU? Is this not the case? Does it only limit releasing a thread waiting for a completion from the IOCP if there are more than the configured concurrency limit executing? The model I would like to see is to have an instantiable ThreadPool class. This class would have the option on Windows to create an IOCP for the instance, as well as options to specify OS thread priority for threads created by the pool. It would have a helper method to bind to a Socket which on Windows would either bind the socket to the ThreadPool instances IOCP or bind to the ambient IO thread pool but have the completion be dispatched to the ThreadPool instance (similar to the new IOCP model) if IOCP hasn't been configured. On Linux it would set up a singleton polling loop in the ThreadPool instance and dispatch ready notifications to the ThreadPool work queue. The ThreadPool would have public helper properties which can return a SynchronizationContext and a TaskScheduler to make it easy to run code in the ThreadPool using methods like TaskFactory.StartNew. It would also have its own QueueUserWorkItem method to easily dispatch delegates to it. I think this model would satisfy being able to keep the original behavior that WCF/CoreWCF uses to do pseudo cooperative multitasking, make it really easy to have all IO operations initiated and completed in the pool, and it would enable the current new IO thread pool model which improves performance for some scenarios, and it doesn't require compromising. I think we're getting to the point where having two implicit built-in thread pools with hard coded behavior is becoming a limiting factor. |
@mconnew You should file another issue and write up an API proposal. It sounds like you basically want a way to spin up your own poller for network IO (I wouldn't call this a threadpool). |
It looks like after the sleep expires, that thread is scheduled normally, even if that would exceed the configured concurrency limit. I didn't see a difference in behavior between alertable and non-alertable waits, both seem to allow more threads than the concurrency limit to run simultaneously. Waiting on
That may be reasonable if the underlying thread resources are shared to a large degree. I would be concerned about having too many thread pools that amount to having too many threads underneath. There may be other advantages to having instantiable thread pools, such as for configuring an instance for how those work items run.
I'd like to understand this more. Is there a performance drop with this change, or between .NET 6 and .NET 7? If so, I'd like to understand better what is causing it and see some perf profiles. If it really is a separate pool of IO completion threads that is helping in the scenario, there may be options to consider, and it would help to understand the reason for the difference to determine which options may be fruitful. |
Test fixes on Mono