Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow overriding processor count via configuration setting #52492

Merged
merged 8 commits into from
May 11, 2021

Conversation

AntonLapounov
Copy link
Member

@AntonLapounov AntonLapounov commented May 8, 2021

Introduce the DOTNET_PROCESSOR_COUNT configuration settings that allows to override the number of processors available for the process. Move the GetCurrentProcessCpuCount function from GCToOSInterface to GCToEEInterface to reduce code duplication. Fixes #48094.

@ghost
Copy link

ghost commented May 8, 2021

Tagging subscribers to this area: @tannergooding
See info in area-owners.md if you want to be subscribed.

Issue Details

Introduce the DOTNET_PROCESSOR_COUNT configuration settings that allows to override the number of processors available for the process. Fixes #48094.

Author: AntonLapounov
Assignees: -
Labels:

area-System.Runtime

Milestone: -

// - GetCurrentProcessCpuCount() on Unixes tries to take into account cgroups CPU quota limits where applicable
processorCount = GetCurrentProcessCpuCount();
}
processorCount = GetCurrentProcessCpuCount();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here and elsewhere the CanEnableThreadUseAllCpuGroups check is moved inside GetCurrentProcessCpuCount().

#ifndef TARGET_WINDOWS
// Limit the GC heaps to the number of processors available in the system.
nhp = min (nhp, GCToOSInterface::GetTotalProcessorCount());
#endif // !TARGET_WINDOWS
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This clipping was needed by the previous version of the code. The number of GC heaps is still limited by MAX_SUPPORTED_CPUS (always) and process_affinity_set->Count() (if affinitizing).

{
m_enableGCCPUGroups = TRUE;
m_threadUseAllCpuGroups = CLRConfig::GetConfigValue(CLRConfig::EXTERNAL_Thread_UseAllCpuGroups) != 0;
m_threadAssignCpuGroups = CLRConfig::GetConfigValue(CLRConfig::EXTERNAL_Thread_AssignCpuGroups) != 0;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No functional changes in this function. Simplifying the code.

if (IsInitialized())
return;

if (InterlockedCompareExchange(&m_initialization, 1, 0) == 0)
{
InitCPUGroupInfo();
m_initialization = -1;
VolatileStore(&m_initialization, -1L);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading/writing m_initialization without any barriers seemed like a potential race condition, so I am changing the code to use VolatileLoad/VolatileStore. There should be no performance-critical code path calling this function.

return m_threadUseAllCpuGroups;
}

/*static*/ BOOL CPUGroupInfo::CanAssignCpuGroupsToThreads()
{
LIMITED_METHOD_CONTRACT;
_ASSERTE(m_enableGCCPUGroups || !m_threadAssignCpuGroups);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some places in the old code used

if (CPUGroupInfo::CanEnableGCCPUGroups() && CPUGroupInfo::CanEnableThreadUseAllCpuGroups())

while other places used

if (CPUGroupInfo::CanEnableThreadUseAllCpuGroups())

Those conditions are equivalent: we set the second flag only when the first flag is set. For that reason I simplified the code everywhere and added this assert.

return m_threadAssignCpuGroups;
}
#endif // HOST_WINDOWS

extern SYSTEM_INFO g_SystemInfo;

int GetTotalProcessorCount()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function has been refactored from GCToOSInterface::GetTotalProcessorCount so we can use it in vm\appdomain.cpp.

// For server GC this value indicates the number of GC heaps used in circular order to allocate sized
// ref handles. It must not exceed the array size allocated by the handle table (see getNumberOfSlots
// in objecthandle.cpp). We might want to use GetNumberOfHeaps if it were accessible here.
m_iNumberOfProcessors = min(GetCurrentProcessCpuCount(), GetTotalProcessorCount());
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note the comment in getNumberOfSlots:

int getNumberOfSlots()
{
WRAPPER_NO_CONTRACT;
// when Ref_Initialize called, IGCHeap::GetNumberOfHeaps() is still 0, so use #procs as a workaround
// it is legal since even if later #heaps < #procs we create handles by thread home heap
// and just have extra unused slots in HandleTableBuckets, which does not take a lot of space
if (!IsServerHeap())
return 1;
return GCToOSInterface::GetTotalProcessorCount();
}

@AntonLapounov
Copy link
Member Author

I am puzzled why I am hitting this linker error on macOS:

[ 93%] Linking CXX shared library libmscordbi.dylib
Undefined symbols for architecture x86_64:
  "_PAL_GetTotalCpuCount", referenced from:
      GetTotalProcessorCount() in libutilcodestaticnohost.a(util.cpp.o)
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)

The util.cpp file uses PAL_GetLogicalCpuCountFromOS and I added usage of PAL_GetTotalCpuCount, which is defined in the same src\coreclr\pal\src\misc\sysinfo.cpp file. Somehow the linker is OK with the former, but not with the latter. I do not have a macOS device and lab build logs seem useless. On Ubuntu it links just fine and I do not see any macOS-specific #ifdefs in the code. Is there a way to debug this?

@janvorli
Copy link
Member

I am puzzled why I am hitting this linker error on macOS:

The reason is that mscordbi doesn't use PAL, but relies on symbols exported by mscordaccore instead. So you'll need to add the PAL_GetTotalCpuCount to the exported symbols of mscordaccore (in https://github.com/dotnet/runtime/blob/main/src/coreclr/dlls/mscordac/mscordac_unixexports.src)

@jkotas jkotas requested a review from kouvel May 10, 2021 15:54
@jkotas
Copy link
Member

jkotas commented May 10, 2021

cc @dotnet/gc for the GC related changes.

src/coreclr/gc/gcconfig.h Outdated Show resolved Hide resolved
@AntonLapounov
Copy link
Member Author

Any additional feedback?

@kouvel
Copy link
Member

kouvel commented May 11, 2021

I'll take a look as well

@mangod9
Copy link
Member

mangod9 commented May 11, 2021

Do you plan to add some tests to validate this, or will that be a separate PR?

@AntonLapounov
Copy link
Member Author

There will be another PR to respect the job CPU limit. I can include a test that will test both settings.

Copy link
Member

@janvorli janvorli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you!

Copy link
Member

@kouvel kouvel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@AntonLapounov AntonLapounov merged commit 28ec201 into dotnet:main May 11, 2021
@AntonLapounov AntonLapounov deleted the OverrideProcessorCount branch May 11, 2021 22:32
@AntonLapounov
Copy link
Member Author

Thank you all for the feedback and great suggestion that allowed to reduce code duplication.

@karelz karelz added this to the 6.0.0 milestone May 20, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Jun 19, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Enable Environment.ProcessorCount to return total machine processors in containers
7 participants