Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Account for availability of multiple processor groups on Windows 11+ #68639

Merged

Conversation

AntonLapounov
Copy link
Member

@AntonLapounov AntonLapounov commented Apr 28, 2022

On Windows 11+ and Windows Server 2022+, a process is no longer restricted to a single processor group by default. If more than one processor group is available to the process (a non-affinitized process on Windows 11+), default to using multiple processor groups; otherwise, default to using a single processor group. This default behavior may be overridden by the DOTNET_GCCpuGroup and DOTNET_Thread_UseAllCpuGroupsconfiguration configuration values.

Fix comments according to dotnet/coreclr#3896 (comment).

Fixes #67180.

@ghost
Copy link

ghost commented Apr 28, 2022

Tagging subscribers to this area: @mangod9
See info in area-owners.md if you want to be subscribed.

Issue Details
  • Starting with Windows 11 and Windows Server 2022, a process is no longer restricted to a single processor group by default. Use all processor groups if the process is not affinitized to a single one (as if both DOTNET_GCCpuGroup and DOTNET_Thread_UseAllCpuGroups were set).
  • Remove the unused end field from CPU_Group_Info structures. Should we need it, it may be trivially calculated as begin + nr_active - 1.
  • Get rid of the InitCPUGroupInfoRange function.
  • Fix conditions near the GetLogicalProcessorInformationEx calls.

Fixes #67180.

Author: AntonLapounov
Assignees: -
Labels:

area-System.Threading

Milestone: -

@stephentoub
Copy link
Member

What should we be doing around APIs like:
https://docs.microsoft.com/en-us/dotnet/api/system.diagnostics.process.processoraffinity?view=net-6.0
https://docs.microsoft.com/en-us/dotnet/api/system.diagnostics.processthread.processoraffinity?view=net-6.0
?

While these have been incomplete since machines could have more than 64 processors, they've at least been consistent with the runtime defaults, and would only end up being inconsistent if you specified these environment variables to opt in to a larger number of processors. With this change, it seems these APIs will now be problematic if, for example, code is roundtripping the value, e.g. read the affinity, temporarily change it, set it back to what it was before.

@AntonLapounov
Copy link
Member Author

@stephentoub That is an interesting question; however, I have doubts regarding claimed consistency at present. For instance, in some runs on an 80 core machine (with no environment variables set) I see the following output:

// 16
Console.WriteLine(Environment.ProcessorCount);
// 0xffffffffffffffff
Console.WriteLine($"0x{Process.GetCurrentProcess().ProcessorAffinity:x}");

Please note that this change does not opt in to use a larger number of processors. Instead, it reflects the changes introduced by Windows 11. For compatibility reasons regarding old affinity APIs, Windows 11 assigns a "primary processor group" to all processes and threads, which should reduce the number of broken apps. Of course, there is no guarantee.

@stephentoub
Copy link
Member

stephentoub commented Apr 28, 2022

however, I have doubts regarding claimed consistency at present

Returning 0xffffffffffffffff when the ProcessorCount returns 16 seems like a bug.

But regardless my concern isn't if ProcessorAffinity returns a bitset representing a larger number of cores than is available, but rather ProcessorAffinity only allowing you to specify up to 64. If ProcessorCount actually returned 80 in your example, and by default the process was affinitized to all 80 cores, then someone doing IntPtr old = ProcessorAffinity; ProcessorAffinity = old; would presumably suddenly find their affinity lowered from 80 to 64?

At a minimum, we should validate that ProcessorAffinity and friends are consistent with themselves, e.g. if the getter is returning the value for a single processor group, then setting it should affect the same processor group, and leave other processor group affinities untouched, in which case my example wouldn't be problematic.

@kouvel
Copy link
Member

kouvel commented Apr 28, 2022

GetProcessAffinityMask and SetProcessAffinityMask are similarly problematic, they're likely not compatible with multiple processor groups. May just need to switch away from using APIs that only support one processor group. At least the default behavior would be reasonable.

@AntonLapounov
Copy link
Member Author

if the getter is returning the value for a single processor group, then setting it should affect the same processor group, and leave other processor group affinities untouched

I do not think the old affinity API that we use allows that.

someone doing IntPtr old = ProcessorAffinity; ProcessorAffinity = old;

How frequently is this pattern used in real apps?

@stephentoub
Copy link
Member

stephentoub commented Apr 28, 2022

How frequently is this pattern used in real apps?

I don't think it's super common (setting affinity itself already isn't super common), but a quick search resulted in a few examples that might be impacted if we get this wrong:

To be clear, I don't think this impacts your change. But we should follow up to make sure that either ProcessorAffinity already does the right thing (e.g. roundtripping it will always reflect the same processor group), or we use a different API under the covers that will, or we deprecate these members and replace them with ones not based on a 64-core limitation.

@AntonLapounov
Copy link
Member Author

So normally apps use ProcessorAffinity's setter to limit the number of processors the process may be executed on, e.g.,

var self = Process.GetCurrentProcess();
self.ProcessorAffinity = new IntPtr((1L << limitCoresTo) - 1);

Then implementing your suggestion:

if the getter is returning the value for a single processor group, then setting it should affect the same processor group, and leave other processor group affinities untouched

would actually break those apps. For example, instead of limiting the process to 2 cores, it would limit the process to 2+16 cores. Am I missing something?

@stephentoub
Copy link
Member

stephentoub commented Apr 29, 2022

For example, instead of limiting the process to 2 cores, it would limit the process to 2+16 cores.

Are you sure? Today Process.ProcessorAffinity on Windows just calls SetProcessAffinityMask, which is documented as:

On a system with more than 64 processors, the SetProcessAffinityMask function can be used to set the process affinity mask only for processes with threads in a single processor group.

That suggests it would already have the behavior of limiting it to 2+16 cores, on what I assume is your example system of an 80 core machine. Either that, or throwing an exception. I don't have an 80 core system on which to test.

@AntonLapounov
Copy link
Member Author

SetProcessAffinityMask effectively affinitizes the process to a single processor group and the given set of processors within that group. Otherwise, it would break many apps as you described. There is one caveat though:

Starting with Windows 11 and Windows Server 2022, on a system with more than 64 processors, process and thread affinities span all processors in the system, across all processor groups, by default. Instead of always failing in case the calling process contains threads in more than one processor group, the SetProcessAffinityMask function fails (returning zero with ERROR_INVALID_PARAMETER last error code) if the process had explicitly set the affinity of one or more of its threads outside of the process' primary group.

That means if the CLR runtime starts assigning thread affinities (controlled by Thread_AssignCpuGroups), a subsequent SetProcessAffinityMask call may fail.

@stephentoub
Copy link
Member

SetProcessAffinityMask effectively affinitizes the process to a single processor group and the given set of processors within that group.

Ok, in that case we're back to the issue I described, of roundtripping the value as in various of those examples downgrading how many cores the process will actually be able to use.

@AntonLapounov
Copy link
Member Author

The roundtripping issue cannot be solved using existing managed API, such as ProcessorAffinity's setter. New CPU set-based API may be needed instead. I think the scenario of temporary downgrading the affinity of the process and then demanding all processor groups is quite exotic though.

@stephentoub
Copy link
Member

I think the scenario of temporary downgrading the affinity of the process and then demanding all processor groups is quite exotic though.

It's not just temporarily downgrading. It's reading the mask to know what's currently affinitized, tweaking the value, and writing it back. Most of the links above fit that pattern, and I found those in just a few minutes of searching.

@AntonLapounov
Copy link
Member Author

Oh, I was thinking about restoring the original affinity. Tweaking the affinity will continue to work with the caveat I mentioned. There are many issues in those code links though. For instance, the first link has the following code, which will not work as intended if Environment.ProcessorCount is greater than 31:

        private static IntPtr FixAffinity(IntPtr processorAffinity)
        {
            int cpuMask = (1 << Environment.ProcessorCount) - 1;

@stephentoub
Copy link
Member

Tweaking the affinity will continue to work with the caveat I mentioned.

Can you elaborate?

If setting Processor.ProcessAffinity results in scoping down to just a single processor group, then someone doing something like:
https://github.com/NiclasOlofsson/MiNET/blob/df0d4ffb8c008654271e0c94187cada0abf6a30b/src/MiNET/MiNET.ServiceKiller/Emulator.cs#L81-L82
will be unwittingly reducing the number of cores their process can target, yes?

@AntonLapounov
Copy link
Member Author

Yes, this affinitizes the process to a single processor group:

currentProcess.ProcessorAffinity = currentProcess.ProcessorAffinity;

Not sure what we could do here without requesting code changes. I think treating the full mask "magically" is not an option.

@AntonLapounov AntonLapounov changed the title Use all processor groups if the process is not affinitized to a single one Fix ProcessorCount calculation on Windows 11 May 2, 2022
@AntonLapounov
Copy link
Member Author

@Maoni0 @kouvel @mangod9
According to the discussion with @stephentoub above, these changes may break some apps that use affinity API on Windows 11/Server 2022 machines with multiple processor groups. That happens because setting both m_enableGCCPUGroups and m_threadUseAllCpuGroups to TRUE enables assigning threads to processor groups by default (m_threadAssignCpuGroups). That is required on Windows 7–10 in order to use multiple processor groups. On Windows 11 that is not required as the OS already uses all processor groups by default. When the process explicitly assigns its thread to a non-primary processor group, the affinity API starts behaving differently. In particular, setting Process.ProcessorAffinity fails with Win32Exception (87): The parameter is incorrect.

To minimize the impact, we may enable assigning threads to processor groups only when both the GCCpuGroup and Thread_UseAllCpuGroups switches are set (that is the current behavior without this PR). If at least one of the switches is not set and we have more than one processor group available to the process, we may set both m_enableGCCPUGroups and m_threadUseAllCpuGroups to TRUE, but keep m_threadAssignCpuGroups set to FALSE. That way affinity API would still work and the number of available processors would still be reported correctly. However, the performance of some apps on Windows 11/Server 2022 machines with multiple processor groups may be worse compared to the scenario with explicitly assigned thread affinities. Those apps would need to set both the GCCpuGroup and Thread_UseAllCpuGroups switches to enable assigning thread affinities (as on Windows 7–10).

Do you think we should minimize the impact and avoid the business of explicitly setting thread affinities unless requested as described above?

@kouvel
Copy link
Member

kouvel commented May 5, 2022

setting Process.ProcessorAffinity fails with Win32Exception (87): The parameter is incorrect.

Why does that happen?

m_threadAssignCpuGroups

On an OS that schedules threads across processor groups (or some other similar abstraction), I would expect this setting to be disabled because the OS knows much better where to schedule threads. Should this setting be disabled by default on Win11 when there are multiple CPU groups (similarly to Linux)?

However, the performance of some apps on Windows 11/Server 2022 machines with multiple processor groups may be worse compared to the scenario with explicitly assigned thread affinities. Those apps would need to set both the GCCpuGroup and Thread_UseAllCpuGroups switches to enable assigning thread affinities (as on Windows 7–10).

There are cases where the thread-spreading that the CLR does is somewhat effective, but it's not great and it generally [edit] is more effective with server GC due to explicitly assigned thread affinities for GC threads, where the scheduler appears to otherwise schedule threads across CPU groups in round-robin fashion (not sure if that has been fixed). I don't know if there is a generic solution, but a possibility may be to change the defaults for those boolean config switches to 2, which means the default behavior (on Win10, that would be as before, on Win11, that would be the new behavior), 0 would mean it does not use multiple CPU groups and only uses the assigned CPU group on Win10 and Win11, and 1 would mean it would use all CPU groups on both OSes. That perhaps would make the config possibilities available to Win10 and Win11 equally, with decent defaults.

@stephentoub
Copy link
Member

setting Process.ProcessorAffinity fails with Win32Exception (87): The parameter is incorrect

Is there another way to implement the method that avoids that, e.g. a newer function it could call?

@AntonLapounov
Copy link
Member Author

Why does that happen?

That error is reported by SetProcessAffinityMask:

On a system with more than 64 processors, if the calling process contains threads in more than one processor group, the last error code is ERROR_INVALID_PARAMETER.

On an OS that schedules threads across processor groups (or some other similar abstraction), I would expect this setting to be disabled because the OS knows much better where to schedule threads. Should this setting be disabled by default on Win11 when there are multiple CPU groups (similarly to Linux)?

@kunalspathak has made some performance measurements on an Ampere machine. The performance was better when thread affinities were assigned by the CLR runtime. That machine has a single NUMA node and we have not done extensive testing though.

a possibility may be to change the defaults for those boolean config switches to 2

The Thread_UseAllCpuGroups switch is documented as true/false, I do not think we can change that.

@AntonLapounov
Copy link
Member Author

Is there another way to implement the method that avoids that, e.g. a newer function it could call?

I think there is no good way other than iterate over all threads in the process and reset their affinities.

@kouvel
Copy link
Member

kouvel commented May 6, 2022

On a system with more than 64 processors, if the calling process contains threads in more than one processor group, the last error code is ERROR_INVALID_PARAMETER.

It sounds like SetProcessAffinityMask may work for a while and then start to fail dynamically at the point when the process actually has threads that are affinitized to multiple CPU groups. Or it would fail when server GC is being used since GC threads are affinitized. Is that correct?

To minimize the impact, we may enable assigning threads to processor groups only when both the GCCpuGroup and Thread_UseAllCpuGroups switches are set (that is the current behavior without this PR). If at least one of the switches is not set and we have more than one processor group available to the process, we may set both m_enableGCCPUGroups and m_threadUseAllCpuGroups to TRUE, but keep m_threadAssignCpuGroups set to FALSE. That way affinity API would still work and the number of available processors would still be reported correctly.

I figure it would only help to avoid the affinity API issue when not using server GC because GC threads are also affinitized. I don't think it's worth doing the above just to avoid the affinity API error. Also due to the scheduling issue when there are multiple CPU groups and server GC is enabled, it would probably make sense to have m_threadAssignCpuGroups be true when multiple CPU groups are being used, which was the default behavior before.

As for the SetProcessAffinityMask failure, would it be reasonable to ignore the failure silently for error-wise compatibility rather than behavior-wise compatibility?

@AntonLapounov
Copy link
Member Author

Is that correct?

Yes.

I figure it would only help to avoid the affinity API issue when not using server GC because GC threads are also affinitized.

Correct.

As for the SetProcessAffinityMask failure, would it be reasonable to ignore the failure silently for error-wise compatibility rather than behavior-wise compatibility?

I am afraid that may lead to surprising and non-deterministic behavior in scenarios where an app attempts to restrict its affinity.

Regarding your idea of having three values for the switch, I think another possible option is to keep Thread_UseAllCpuGroups as true/false, but change its default value for Windows 11.

@kouvel
Copy link
Member

kouvel commented May 9, 2022

I am afraid that may lead to surprising and non-deterministic behavior in scenarios where an app attempts to restrict its affinity.

It's also a bit awkward that the affinity can be restricted or expanded dynamically, and to honor it the CPU group info would need to be updated, and thread affinities and thread group affinities reassigned. For the way the affinity APIs work currently, it seems like it would only make sense when using multiple CPU groups is disabled. So perhaps another option for now is to have a way to disable using multiple CPU groups and expect that they be disabled in those scenarios. Then affinity changes when using multiple CPU groups could throw an exception.

Also wonder if the CPU set APIs could be used to change affinities instead. Not sure how they would interoperate with thread affinities and thread group affinities though.

@AntonLapounov
Copy link
Member Author

@stephentoub @kouvel Thank you for your very valuable feedback! Do you think we need a meeting to discuss this further? Note that we have wanted to start using all processors on Windows by default for a long time (#13465). Windows 11 makes that easier for us by no longer constraining processes to a single processor group. The behavior of some apps may change on Windows 11 with or without this PR. As @kouvel noticed, even without changes, setting process’s affinity in the middle of the process’s lifetime may not work as intended. For instance, we neither update the Environment.ProcessorCount value nor reset server GC thread. With this PR, setting process’s affinity may fail if the process has already placed threads in multiple processor groups. The runtime cannot predict if the process will attempt to alter its affinity during its lifetime and it seems the only way to keep compatibility here is to discard performance advantage of using all available processors and assigning thread affinities.

As a workaround, users may force running a given process on a single processor group with start /affinity. If that is not enough, we may consider changing the default values of DOTNET_GCCpuGroup and DOTNET_Thread_UseAllCpuGroups or introduce yet another switch, complicating things further.

I would like to have opportunity to receive user feedback on this change, so please let me know if I should proceed with this PR or/and schedule a meeting to discuss possible alternatives.

@stephentoub
Copy link
Member

stephentoub commented May 11, 2022

It seems there's two modes of failure here:

  • Someone sets Process.ProcessorAffinity and it doesn't set the affinity the way that was intended
  • Someone sets Process.ProcessorAffinity and it throws an exception where it didn't before, likely crashing the process given the way it's typically used

It's the latter mode that concerns me most (although the silently-doing-something-different-than-intended aspect of the first also concerns me), in particular because folks are only going to face this on certain hardware.

Why don't we have a short meeting to discuss. I don't want to block progress here, and it'd be great if we could come up with a mitigation. If we proceed with setting the affinity by default such that any call to Process.ProcessorAffinity is going to throw on some hardware, we need to change the implementation of ProcessorAffinity in some way.

@kouvel
Copy link
Member

kouvel commented May 11, 2022

Another option may be to not affinitize GC threads and not set group affinity for other CLR-created threads by default on Win11, though that would probably need more extensive testing. With threads not affinitized explicitly, the scheduling issue would likely go away, and SetProcessorAffinityMask would affinitize the process to its primary processor group. That would probably be the closest to current behavior on Linux. A short meeting to discuss would be great.

@AntonLapounov
Copy link
Member Author

Another option may be to not affinitize GC threads and not set group affinity for other CLR-created threads by default on Win11, though that would probably need more extensive testing.

Wouldn't that negatively affect server GC performance? It would be unfortunate to sacrifice performance of most apps for the sake of a rarely used feature.

@kouvel
Copy link
Member

kouvel commented May 12, 2022

Wouldn't that negatively affect server GC performance?

I'm skeptical. I don't know the reason why server GC threads are affinitized. Maybe it has something to do with pre-Win11 scheduler behavior. Maybe that reason doesn't hold anymore with the Win11 scheduler changes. AFAIK GC threads are not affinitized on Linux, maybe that wasn't possible, or maybe it hasn't been an issue either. [Edit] I was mistaken, GC threads are affinitized on Linux too, so maybe there's a good reason for that aside from scheduler behavior.

@kunalspathak
Copy link
Member

@AntonLapounov - any update on this PR? Is there anything left?

@AntonLapounov
Copy link
Member Author

I have split off the clean-up part and merged. The rest cannot be merged without breaking existing apps that set the process affinity mask on Windows. Setting affinities for threads would break existing apps, not setting them would degrade performance. Another possible option may be to use CPU Sets API to set 'soft' affinities for threads, which I am still investigating.

@AntonLapounov
Copy link
Member Author

Closing this PR as we will likely use a different approach.

@kunalspathak
Copy link
Member

Closing this PR as we will likely use a different approach.

Do you think we will be doing anything in .NET 7?

@ghost ghost locked as resolved and limited conversation to collaborators Jul 6, 2022
@AntonLapounov AntonLapounov reopened this Aug 2, 2022
@AntonLapounov AntonLapounov force-pushed the UseAllProcessorGroupsIfNotAffinitized branch from 9ce0750 to 4cbd7a1 Compare August 2, 2022 00:06
@AntonLapounov AntonLapounov force-pushed the UseAllProcessorGroupsIfNotAffinitized branch from 23ed07f to c8dcb34 Compare August 10, 2022 06:08
@AntonLapounov AntonLapounov changed the title Fix ProcessorCount calculation on Windows 11 Account for availability of multiple processor groups on Windows 11+ Aug 10, 2022
@AntonLapounov AntonLapounov merged commit b908ecf into dotnet:main Aug 11, 2022
@AntonLapounov AntonLapounov deleted the UseAllProcessorGroupsIfNotAffinitized branch August 11, 2022 02:31
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Arm64: Environment.ProcessorCount returns wrong value on higher core machine
6 participants