Fix Linux FP exception when NUMA nodes greater than 1 #22861
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -34202,7 +34202,10 @@ HRESULT GCHeap::Initialize() |
| | | #ifdef MULTIPLE_HEAPS |
| | | nhp_from_config = static_cast<uint32_t>(GCConfig::GetHeapCount()); |
| | | |
| | | - uint32_t nhp_from_process = GCToOSInterface::GetCurrentProcessCpuCount(); |
| | | + // GetCurrentProcessCpuCount only returns up to 64 procs. |
| | | + uint32_t nhp_from_process = GCToOSInterface::CanEnableGCCPUGroups() ? |
| | | + GCToOSInterface::GetTotalProcessorCount(): |
| | | + GCToOSInterface::GetCurrentProcessCpuCount(); |
| | | |
| | | if (nhp_from_config) |
| | | { |
| | | @@ -34233,6 +34236,20 @@ HRESULT GCHeap::Initialize() |
| | | { |
| | | pmask &= smask; |
| | | |
| | | + #ifdef FEATURE_PAL |
| | | + // GetCurrentProcessAffinityMask can return pmask=0 and smask=0 on |
| | | + // systems with more than 1 NUMA node. The pmask decides the |
| | | + // number of GC heaps to be used and the processors they are |
| | | + // affinitized with. So pmask is now set to reflect that 64 |
| | | + // processors are available to begin with. The actual processors in |
| | | + // the system may be lower and are taken into account before |
| | | + // finalizing the number of heaps. |
| | | + if (!pmask) |
| | | + { |
| | | + pmask = SIZE_T_MAX; |
| | | + } |
| | | + #endif // FEATURE_PAL |
| | | |
| | | if (gc_thread_affinity_mask) |
| | | { |
| | | pmask &= gc_thread_affinity_mask; |
| | | @@ -34249,6 +34266,11 @@ HRESULT GCHeap::Initialize() |
| | | } |
| | | |
| | | nhp = min (nhp, set_bits_in_pmask); |
| | | |
| | | + #ifdef FEATURE_PAL |
| | | + // Limit the GC heaps to the number of processors available in the system. |
| | | + nhp = min (nhp, GCToOSInterface::GetTotalProcessorCount()); |
| | | + #endif // FEATURE_PAL |
| | | } |
| | | else |
| | | { |
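The heap-count logic in the diff above can be sketched in isolation as follows. This is a simplified illustration under stated assumptions, not the runtime's code: `count_set_bits` and `pick_heap_count` are hypothetical names, masks are assumed to be 64 bits wide, and the `gc_thread_affinity_mask` step is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper: counts the set bits in an affinity mask.
static uint32_t count_set_bits(uint64_t mask)
{
    uint32_t count = 0;
    while (mask != 0)
    {
        mask &= (mask - 1); // clear the lowest set bit
        ++count;
    }
    return count;
}

// Hypothetical sketch of the clamping shown in the diff.
// requested_heaps:  heap count from config or from the CPU count (nhp)
// process_mask:     process affinity mask (may be 0)
// system_mask:      system affinity mask (may be 0)
// total_processors: number of processors in the system
static uint32_t pick_heap_count(uint32_t requested_heaps,
                                uint64_t process_mask,
                                uint64_t system_mask,
                                uint32_t total_processors)
{
    assert(total_processors > 0);

    uint64_t pmask = process_mask & system_mask;
    if (pmask == 0)
    {
        // Fallback from the diff: start from "all 64 processors available".
        pmask = UINT64_MAX;
    }

    uint32_t nhp = std::min(requested_heaps, count_set_bits(pmask));

    // Never use more heaps than there are processors in the system.
    return std::min(nhp, total_processors);
}
```

Without the fallback, a zero `pmask` has no set bits, so the first `min` in this sketch would yield a heap count of 0. With it, `pick_heap_count(96, 0, 0, 96)` yields 64, and the final clamp keeps the result at or below the real processor count on smaller machines.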
do we know why this is?
I've implemented that based on the MSDN doc, which says:
So I think it can happen on Windows too.
this is on the path where `if (!(GCToOSInterface::CanEnableGCCPUGroups()))` so we are saying there's < 64 procs.
On my 96-core machine, this API returns a pmask set to 48 cores on Windows & 0 on Linux. For an 8-core machine, we see a pmask set to 8 cores on both Windows/Linux.
I understand, but I was trying to say that even if there are > 64 processors available to this process, we can still get to this code path if COMPlus_GCCpuGroup=0.
Actually, I am starting to doubt I understand what the phrase "If the calling process contains threads in multiple groups" in the MSDN doc means. I have read it as "if the current process has affinity mask set to multiple groups" and that's how I have implemented it in the PAL.
you are right - if you have GCCpuGroup set to 0 there could be more than 64 procs available to this process while CanEnableGCCPUGroups is FALSE.
I'm looking at the cpu group code in util.cpp. the policy it uses is a little odd to me - it enables cpu groups by default instead of checking whether one of the configs wants cpu groups and enabling them only in that case. but by default processes do not use more than one cpu group worth of processors. what's the policy on linux? if you have > 64 procs do processes use all procs by default?
there is a discrepancy between windows and linux regardless; we should unify the behavior.
Linux has a very different way of reporting / setting affinity and there is no special handling for more or less than 64 processors. It doesn't have any groups. These are all Windows specific constructs.
There is `sched_getaffinity` / `sched_setaffinity` that use a `cpu_set_t` which can be manipulated as described here: https://linux.die.net/man/3/cpu_set. It is implemented as a bitset that can hold as many bits as needed for all the processors in the system.
Then there is a function `numa_node_to_cpus` that fills in a `cpu_set_t` with all processors belonging to the requested numa node index, where the numa node index can be a value from 0 to `numa_max_node() - 1`.
And finally `numa_num_possible_cpus()` that returns the number of cpus enabled by the kernel (there is a kernel option that allows you to limit that number at boot time if needed).
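For illustration, the affinity query described above might look like the sketch below. It assumes Linux with glibc and uses only `sched_getaffinity` and the `CPU_*` macros. The helper names `count_affinitized_cpus` and `can_run_on` are hypothetical. It also uses the fixed-size `cpu_set_t`, which holds `CPU_SETSIZE` bits, so machines with more CPUs than that would need the dynamically sized `CPU_ALLOC` variants instead.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE // needed for the CPU_* macros on glibc
#endif
#include <sched.h>
#include <cassert>

// Hypothetical helper: returns the number of CPUs the calling thread
// is allowed to run on, or -1 if the query fails.
static int count_affinitized_cpus()
{
    cpu_set_t set;
    CPU_ZERO(&set);
    // pid 0 means "the calling thread".
    if (sched_getaffinity(0, sizeof(set), &set) != 0)
    {
        return -1;
    }
    return CPU_COUNT(&set);
}

// Hypothetical helper: true if the calling thread may run on `cpu`.
static bool can_run_on(int cpu)
{
    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0)
    {
        return false;
    }
    return CPU_ISSET(cpu, &set) != 0;
}
```

Unlike the Windows affinity mask, nothing here is tied to a 64-bit word or to processor groups: the set is simply a bitset indexed by CPU number.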
So my code in PAL takes these values and transforms them into the Windows style, artificially creating groups so that the processors in a group belong to a single NUMA node. E.g. on my box, I have two NUMA nodes, each containing 4 CPUs, so I create two groups with 4 processors each.
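The mapping from NUMA nodes to Windows-style groups can be sketched roughly as below. This is not the PAL implementation: `CpuGroup` and `groups_from_numa_nodes` are hypothetical names, the input is assumed to be just the CPU count of each node, and nodes with more than 64 CPUs are not handled.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical Windows-style processor group: a count plus a 64-bit mask
// whose low `processor_count` bits are set.
struct CpuGroup
{
    uint32_t processor_count;
    uint64_t mask;
};

// Hypothetical sketch: create one group per NUMA node, so that all
// processors in a group belong to a single node.
static std::vector<CpuGroup> groups_from_numa_nodes(
    const std::vector<uint32_t>& cpus_per_node)
{
    std::vector<CpuGroup> groups;
    for (uint32_t count : cpus_per_node)
    {
        // A group mask is 64 bits wide, so this sketch rejects larger nodes.
        assert(count <= 64);
        uint64_t mask = (count == 64) ? UINT64_MAX
                                      : ((uint64_t{1} << count) - 1);
        groups.push_back({count, mask});
    }
    return groups;
}
```

For the box described above, `groups_from_numa_nodes({4, 4})` yields two groups, each with 4 processors and a mask of `0xF`.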