Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
24 changes: 23 additions & 1 deletion src/gc/gc.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -34202,7 +34202,10 @@ HRESULT GCHeap::Initialize()
#ifdef MULTIPLE_HEAPS
nhp_from_config = static_cast<uint32_t>(GCConfig::GetHeapCount());

uint32_t nhp_from_process = GCToOSInterface::GetCurrentProcessCpuCount();
// GetCurrentProcessCpuCount only returns up to 64 procs.
uint32_t nhp_from_process = GCToOSInterface::CanEnableGCCPUGroups() ?
GCToOSInterface::GetTotalProcessorCount():
GCToOSInterface::GetCurrentProcessCpuCount();

if (nhp_from_config)
{
Expand Down Expand Up @@ -34233,6 +34236,20 @@ HRESULT GCHeap::Initialize()
{
pmask &= smask;

#ifdef FEATURE_PAL
// GetCurrentProcessAffinityMask can return pmask=0 and smask=0 on
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we know why this is?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've implemented that based on the MSDN doc, which says:

On a system with more than 64 processors, if the threads of the calling process are in a single processor group, the function sets the variables pointed to by lpProcessAffinityMask and lpSystemAffinityMask to the process affinity mask and the processor mask of active logical processors for that group. If the calling process contains threads in multiple groups, the function returns zero for both affinity masks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I think it can happen on Windows too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is on the path where if (!(GCToOSInterface::CanEnableGCCPUGroups())) so we are saying there's < 64 procs.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On my 96-core machine, this API returns a pmask set to 48 cores on Windows & 0 on Linux. For an 8-core machine, we see a pmask set to 8 cores on both Windows/Linux.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand, but I was trying to say that even if there are > 64 processors available to this process, we can still get to this code path if the COMPlus_GCCpuGroup=0.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I am starting to doubt I understand what the If the calling process contains threads in multiple groups in the MSDN doc means. I have read it as "if the current process has affinity mask set to multiple groups" and that's how I have implemented it in the PAL.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are right - if you have GCCpuGroup set to 0 it would be more than 64 procs available to this process while CanEnableGCCCpuGroups is FALSE.

I'm looking at the cpu group code in util.cpp. the policy it uses is a little odd to me - it enables cpu groups by default instead of checking to see if one of the configs wants to enable cpu groups and then enable it if that's the case. but by default processes do not use more than one cpu group worth of processors. what's the policy on linux? if you have > 64 procs do processes use all procs by default?

there is a discrepancy on windows and linux regardless, we should unify the behavior.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you have > 64 procs do processes use all procs by default?

Linux has a very different way of reporting / setting affinity and there is no special handling for more or less than 64 processors. It doesn't have any groups. These are all Windows specific constructs.

There is sched_getaffinity / sched_setaffinity that use a cpu_set_t which can be manipulated as described here: https://linux.die.net/man/3/cpu_set. It is implemented as a bitset that can hold as many bits as needed for all the processors in the system.
Then there is a function numa_node_to_cpus that fills in a cpu_set_t with all processors belonging to the requested numa node index where the numa node index can be a value from 0 to numa_max_node() - 1 And finally numa_num_possible_cpus() that returns the number of cpus enabled by the kernel (there is a kernel option that allows you to limit that number at boot time if needed).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So my code in PAL takes these values and transforms them into the Windows style, artificially creating groups so that processors in a group belong to single NUMA node. So e.g. on my box, I have two NUMA nodes each containing 4 CPUs. So I create two groups with 4 processors each.

// systems with more than 1 NUMA node. The pmask decides the
// number of GC heaps to be used and the processors they are
// affinitized with. So pmask is now set to reflect that 64
// processors are available to begin with. The actual processors in
// the system may be lower and are taken into account before
// finalizing the number of heaps.
if (!pmask)
{
pmask = SIZE_T_MAX;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

have you tested this on 32-bit? if so I am surprised you didn't get a compiler warning here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just tried building for x86 and did not see any compiler warnings.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I tried this with the compiler explorer, it displayed a warning on x86:
[x86-64 clang 7.0.0 #1] warning: implicit conversion from 'unsigned long long' to 'uintptr_t' (aka 'unsigned int') changes value from 18446744073709551615 to 4294967295 [-Wconstant-conversion]

@vkvenkat have you actually tried the x86 on Unix?

Copy link
Author

@vkvenkat vkvenkat Feb 26, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't see any warnings when I built for x86 on Ubuntu, but theoretically there should have been one. So I will use the BIT64 & BIT32 macros to fix this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vkvenkat you can use UINTPTR_MAX.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushed this change & squashed commits.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@janvorli Some OSX CI builds are failing: Undefined symbols for architecture x86_64: "_GetCurrentProcessorNumberEx", referenced from CPUGroupInfo::CalculateCurrentProcessorNumber() in libutilcodestaticnohost.a(util.cpp.o). Do we need to revert to using GetProcAddress calls with exports in mscorwks_unixexports.src rather than direct API calls to fix this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should work. Let me give it a quick try on my local mac.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I've found the problem. The libutilcodestaticnohost were missing coreclrpal library. This fixes the issue:

diff --git a/src/utilcode/staticnohost/CMakeLists.txt b/src/utilcode/staticnohost/CMakeLists.txt
index eea4d60785..e66a5de40d 100644
--- a/src/utilcode/staticnohost/CMakeLists.txt
+++ b/src/utilcode/staticnohost/CMakeLists.txt
@@ -8,5 +8,5 @@ endif(WIN32)
 add_library_clr(utilcodestaticnohost STATIC ${UTILCODE_STATICNOHOST_SOURCES})

 if(CLR_CMAKE_PLATFORM_UNIX)
-  target_link_libraries(utilcodestaticnohost  nativeresourcestring)
+  target_link_libraries(utilcodestaticnohost  nativeresourcestring coreclrpal)
 endif(CLR_CMAKE_PLATFORM_UNIX)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this fixes it for OSX. But I ran into this issue on Linux: ../../pal/src/libcoreclrpal.a(process.cpp.o): In function CreateProcessW: coreclr/src/pal/src/thread/process.cpp:530: multiple definition of CreateProcessW
CMakeFiles/mscordbi.dir/__/mscordac/palredefines.S.o:(.text+0x14f): first defined here

Using a nested check for CLR_CMAKE_PLATFORM_DARWIN fixes this.

}
#endif // FEATURE_PAL

if (gc_thread_affinity_mask)
{
pmask &= gc_thread_affinity_mask;
Expand All @@ -34249,6 +34266,11 @@ HRESULT GCHeap::Initialize()
}

nhp = min (nhp, set_bits_in_pmask);

#ifdef FEATURE_PAL
// Limit the GC heaps to the number of processors available in the system.
nhp = min (nhp, GCToOSInterface::GetTotalProcessorCount());
#endif // FEATURE_PAL
}
else
{
Expand Down
50 changes: 1 addition & 49 deletions src/inc/utilcode.h
Original file line number Diff line number Diff line change
Expand Up @@ -1355,31 +1355,10 @@ class NumaNodeInfo
static void InitNumaNodeInfo();

#if !defined(FEATURE_REDHAWK)
private: // apis types

//GetNumaHighestNodeNumber()
typedef BOOL
(WINAPI *PGNHNN)(PULONG);
//VirtualAllocExNuma()
typedef LPVOID
(WINAPI *PVAExN)(HANDLE,LPVOID,SIZE_T,DWORD,DWORD,DWORD);

// api pfns and members
static PGNHNN m_pGetNumaHighestNodeNumber;
static PVAExN m_pVirtualAllocExNuma;

public: // functions

static LPVOID VirtualAllocExNuma(HANDLE hProc, LPVOID lpAddr, SIZE_T size,
DWORD allocType, DWORD prot, DWORD node);

private:
//GetNumaProcessorNodeEx()
typedef BOOL
(WINAPI *PGNPNEx)(PPROCESSOR_NUMBER, PUSHORT);
static PGNPNEx m_pGetNumaProcessorNodeEx;

public:
static BOOL GetNumaProcessorNodeEx(PPROCESSOR_NUMBER proc_no, PUSHORT node_no);
#endif
};
Expand Down Expand Up @@ -1407,7 +1386,6 @@ class CPUGroupInfo
static CPU_Group_Info *m_CPUGroupInfoArray;
static bool s_hadSingleProcessorAtStartup;

static BOOL InitCPUGroupInfoAPI();
static BOOL InitCPUGroupInfoArray();
static BOOL InitCPUGroupInfoRange();
static void InitCPUGroupInfo();
Expand All @@ -1424,34 +1402,8 @@ class CPUGroupInfo
//static void PopulateCPUUsageArray(void * infoBuffer, ULONG infoSize);

#if !defined(FEATURE_REDHAWK)
private:
//GetLogicalProcessorInforomationEx()
typedef BOOL
(WINAPI *PGLPIEx)(DWORD, SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX *, PDWORD);
//SetThreadGroupAffinity()
typedef BOOL
(WINAPI *PSTGA)(HANDLE, GROUP_AFFINITY *, GROUP_AFFINITY *);
//GetThreadGroupAffinity()
typedef BOOL
(WINAPI *PGTGA)(HANDLE, GROUP_AFFINITY *);
//GetCurrentProcessorNumberEx()
typedef void
(WINAPI *PGCPNEx)(PROCESSOR_NUMBER *);
//GetSystemTimes()
typedef BOOL
(WINAPI *PGST)(FILETIME *, FILETIME *, FILETIME *);
//NtQuerySystemInformationEx()
//typedef int
//(WINAPI *PNTQSIEx)(SYSTEM_INFORMATION_CLASS, PULONG, ULONG, PVOID, ULONG, PULONG);
static PGLPIEx m_pGetLogicalProcessorInformationEx;
static PSTGA m_pSetThreadGroupAffinity;
static PGTGA m_pGetThreadGroupAffinity;
static PGCPNEx m_pGetCurrentProcessorNumberEx;
static PGST m_pGetSystemTimes;
//static PNTQSIEx m_pNtQuerySystemInformationEx;

public:
static BOOL GetLogicalProcessorInformationEx(DWORD relationship,
static BOOL GetLogicalProcessorInformationEx(LOGICAL_PROCESSOR_RELATIONSHIP relationship,
SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX *slpiex, PDWORD count);
static BOOL SetThreadGroupAffinity(HANDLE h,
GROUP_AFFINITY *groupAffinity, GROUP_AFFINITY *previousGroupAffinity);
Expand Down
3 changes: 3 additions & 0 deletions src/utilcode/staticnohost/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -9,4 +9,7 @@ add_library_clr(utilcodestaticnohost STATIC ${UTILCODE_STATICNOHOST_SOURCES})

if(CLR_CMAKE_PLATFORM_UNIX)
target_link_libraries(utilcodestaticnohost nativeresourcestring)
if(CLR_CMAKE_PLATFORM_DARWIN)
target_link_libraries(utilcodestaticnohost coreclrpal)
endif(CLR_CMAKE_PLATFORM_DARWIN)
endif(CLR_CMAKE_PLATFORM_UNIX)
95 changes: 15 additions & 80 deletions src/utilcode/util.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -743,19 +743,16 @@ BYTE * ClrVirtualAllocWithinRange(const BYTE *pMinAddr,
// NumaNodeInfo
//******************************************************************************
#if !defined(FEATURE_REDHAWK)
/*static*/ NumaNodeInfo::PGNHNN NumaNodeInfo::m_pGetNumaHighestNodeNumber = NULL;
/*static*/ NumaNodeInfo::PVAExN NumaNodeInfo::m_pVirtualAllocExNuma = NULL;

/*static*/ LPVOID NumaNodeInfo::VirtualAllocExNuma(HANDLE hProc, LPVOID lpAddr, SIZE_T dwSize,
DWORD allocType, DWORD prot, DWORD node)
{
return (*m_pVirtualAllocExNuma)(hProc, lpAddr, dwSize, allocType, prot, node);
return ::VirtualAllocExNuma(hProc, lpAddr, dwSize, allocType, prot, node);
}
/*static*/ NumaNodeInfo::PGNPNEx NumaNodeInfo::m_pGetNumaProcessorNodeEx = NULL;

/*static*/ BOOL NumaNodeInfo::GetNumaProcessorNodeEx(PPROCESSOR_NUMBER proc_no, PUSHORT node_no)
{
return (*m_pGetNumaProcessorNodeEx)(proc_no, node_no);
return ::GetNumaProcessorNodeEx(proc_no, node_no);
}
#endif

Expand All @@ -778,20 +775,8 @@ BYTE * ClrVirtualAllocWithinRange(const BYTE *pMinAddr,
if (hMod == NULL)
return FALSE;

m_pGetNumaHighestNodeNumber = (PGNHNN) GetProcAddress(hMod, "GetNumaHighestNodeNumber");
if (m_pGetNumaHighestNodeNumber == NULL)
return FALSE;

// fail to get the highest numa node number
if (!m_pGetNumaHighestNodeNumber(&highest) || (highest == 0))
return FALSE;

m_pGetNumaProcessorNodeEx = (PGNPNEx) GetProcAddress(hMod, "GetNumaProcessorNodeEx");
if (m_pGetNumaProcessorNodeEx == NULL)
return FALSE;

m_pVirtualAllocExNuma = (PVAExN) GetProcAddress(hMod, "VirtualAllocExNuma");
if (m_pVirtualAllocExNuma == NULL)
if (!::GetNumaHighestNodeNumber(&highest) || (highest == 0))
return FALSE;

return TRUE;
Expand All @@ -814,37 +799,37 @@ BYTE * ClrVirtualAllocWithinRange(const BYTE *pMinAddr,
// NumaNodeInfo
//******************************************************************************
#if !defined(FEATURE_REDHAWK)
/*static*/ CPUGroupInfo::PGLPIEx CPUGroupInfo::m_pGetLogicalProcessorInformationEx = NULL;
/*static*/ CPUGroupInfo::PSTGA CPUGroupInfo::m_pSetThreadGroupAffinity = NULL;
/*static*/ CPUGroupInfo::PGTGA CPUGroupInfo::m_pGetThreadGroupAffinity = NULL;
/*static*/ CPUGroupInfo::PGCPNEx CPUGroupInfo::m_pGetCurrentProcessorNumberEx = NULL;
/*static*/ CPUGroupInfo::PGST CPUGroupInfo::m_pGetSystemTimes = NULL;
/*static*/ //CPUGroupInfo::PNTQSIEx CPUGroupInfo::m_pNtQuerySystemInformationEx = NULL;

/*static*/ BOOL CPUGroupInfo::GetLogicalProcessorInformationEx(DWORD relationship,
/*static*/ BOOL CPUGroupInfo::GetLogicalProcessorInformationEx(LOGICAL_PROCESSOR_RELATIONSHIP relationship,
SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX *slpiex, PDWORD count)
{
LIMITED_METHOD_CONTRACT;
return (*m_pGetLogicalProcessorInformationEx)(relationship, slpiex, count);
return ::GetLogicalProcessorInformationEx(relationship, slpiex, count);
}

/*static*/ BOOL CPUGroupInfo::SetThreadGroupAffinity(HANDLE h,
GROUP_AFFINITY *groupAffinity, GROUP_AFFINITY *previousGroupAffinity)
{
LIMITED_METHOD_CONTRACT;
return (*m_pSetThreadGroupAffinity)(h, groupAffinity, previousGroupAffinity);
return ::SetThreadGroupAffinity(h, groupAffinity, previousGroupAffinity);
}

/*static*/ BOOL CPUGroupInfo::GetThreadGroupAffinity(HANDLE h, GROUP_AFFINITY *groupAffinity)
{
LIMITED_METHOD_CONTRACT;
return (*m_pGetThreadGroupAffinity)(h, groupAffinity);
return ::GetThreadGroupAffinity(h, groupAffinity);
}

/*static*/ BOOL CPUGroupInfo::GetSystemTimes(FILETIME *idleTime, FILETIME *kernelTime, FILETIME *userTime)
{
LIMITED_METHOD_CONTRACT;
return (*m_pGetSystemTimes)(idleTime, kernelTime, userTime);

#ifndef FEATURE_PAL
return ::GetSystemTimes(idleTime, kernelTime, userTime);
#else
return FALSE;
#endif
}
#endif

Expand All @@ -857,53 +842,6 @@ BYTE * ClrVirtualAllocWithinRange(const BYTE *pMinAddr,
/*static*/ LONG CPUGroupInfo::m_initialization = 0;
/*static*/ bool CPUGroupInfo::s_hadSingleProcessorAtStartup = false;

// Check and setup function pointers for >64 LP Support
/*static*/ BOOL CPUGroupInfo::InitCPUGroupInfoAPI()
{
CONTRACTL
{
NOTHROW;
GC_NOTRIGGER;
}
CONTRACTL_END;

#if !defined(FEATURE_REDHAWK) && (defined(_TARGET_AMD64_) || defined(_TARGET_ARM64_))
#ifndef FEATURE_PAL
HMODULE hMod = GetModuleHandleW(WINDOWS_KERNEL32_DLLNAME_W);
#else
HMODULE hMod = GetCLRModule();
#endif
if (hMod == NULL)
return FALSE;

m_pGetLogicalProcessorInformationEx = (PGLPIEx)GetProcAddress(hMod, "GetLogicalProcessorInformationEx");
if (m_pGetLogicalProcessorInformationEx == NULL)
return FALSE;

m_pSetThreadGroupAffinity = (PSTGA)GetProcAddress(hMod, "SetThreadGroupAffinity");
if (m_pSetThreadGroupAffinity == NULL)
return FALSE;

m_pGetThreadGroupAffinity = (PGTGA)GetProcAddress(hMod, "GetThreadGroupAffinity");
if (m_pGetThreadGroupAffinity == NULL)
return FALSE;

m_pGetCurrentProcessorNumberEx = (PGCPNEx)GetProcAddress(hMod, "GetCurrentProcessorNumberEx");
if (m_pGetCurrentProcessorNumberEx == NULL)
return FALSE;

#ifndef FEATURE_PAL
m_pGetSystemTimes = (PGST)GetProcAddress(hMod, "GetSystemTimes");
if (m_pGetSystemTimes == NULL)
return FALSE;
#endif

return TRUE;
#else
return FALSE;
#endif
}

#if !defined(FEATURE_REDHAWK) && (defined(_TARGET_AMD64_) || defined(_TARGET_ARM64_))
// Calculate greatest common divisor
DWORD GCD(DWORD u, DWORD v)
Expand Down Expand Up @@ -955,7 +893,7 @@ DWORD LCM(DWORD u, DWORD v)
return FALSE;

pSLPIEx = (SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX *)bBuffer;
if (!m_pGetLogicalProcessorInformationEx(RelationGroup, pSLPIEx, &cbSLPIEx))
if (!::GetLogicalProcessorInformationEx(RelationGroup, pSLPIEx, &cbSLPIEx))
{
delete[] bBuffer;
return FALSE;
Expand Down Expand Up @@ -1042,9 +980,6 @@ DWORD LCM(DWORD u, DWORD v)
if (!enableGCCPUGroups)
return;

if (!InitCPUGroupInfoAPI())
return;

if (!InitCPUGroupInfoArray())
return;

Expand Down Expand Up @@ -1167,7 +1102,7 @@ DWORD LCM(DWORD u, DWORD v)
proc_no.Group=0;
proc_no.Number=0;
proc_no.Reserved=0;
(*m_pGetCurrentProcessorNumberEx)(&proc_no);
::GetCurrentProcessorNumberEx(&proc_no);

DWORD fullNumber = 0;
for (WORD i = 0; i < proc_no.Group; i++)
Expand Down