-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable Environment.ProcessorCount to return total machine processors in containers #48094
Comments
I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label. |
There are lots of places influenced directly or indirectly by the result of ProcessorCount. Is changing it globally for all consumers going to have the desired effect and something someone can reason about? As you said, different subsystems benefit differently from the behaviors. Or is the idea it's just another knob someone can try to see if it helps performance in aggregate? |
Great question, @stephentoub. The goal of this is to enable experimentation. For now, you'd get the performance in aggregate, as you call out. Assuming we get adequate and compelling information, we may offer something different. |
To be clear, I'm fine with us adding the environment variable. Just trying to understand what that "adequate and compelling information" information would look like. Given that this is a global switch, what would cause us to think we needed to offer something different? Certainly I can imagine the simple case of "I flipped the switch and my perf got better", and we get enough of that signal to decide to change the default (though we already hit issues with that, as you call out). What would feedback look like that would cause us to consider something more fine-grained? Someone flips the switch, gets slightly better perf, takes traces and shares them with us, and that influences our understanding? Something else? |
I think there is near zero chance we'll change the default on this property. The experience of doing that last time was bad for everyone involved, and nothing has changed to suggest a different outcome (other than that people would be less tolerant of trying it a second time). I know you probably know that. I'm just wanting to send a clear message to anyone reading this that changing the default isn't on the table.
That's it. If we get interesting information, then we can potentially offer additional customization under a different configuration or something else. We're about to start our container performance testing, soon. I have a doc on that that I intend to publish. As part of that, we should dust off our investigation from #622. We produced a lot of useful data (mostly trapped in email) that suggests that there is "a there, there" with "max procs" enabled. We just didn't get to the point of productizing that, for a variety of reasons (almost entirely schedule problems). This would all lead to exposing one or more different APIs or configuration or something else. @jkotas was directly involved in those investigations. I'm curious if he has a similar view, that the investigation data suggested that there is a different performance profile enabled by max procs (or at least higher procs than the default) that would be beneficial for some apps. ?? In fairness, I need to re-read those mails to ensure that my memory of that investigation is sound. |
Agree. We know that there are likely some workloads where this setting can be used to optimize performance. This setting will allow us to find out more data about it and unblock the workloads where the default policy works particularly poorly. This setting can be used as a mitigation for #47427 breaking change.
Yes, we have seen some of that signal e.g. in #11933 (comment) where the container had to be given artificially high quota to make the runtime allow sufficient level of parallelism. Both Go and Java have equivalent configuration option. It suggests that it is something generally useful. |
The proposal text implies we want to override the value returned by Does the "actual processor count" in the proposal text refer to the total number of processors available disregarding affinity, cgroup, and other limitations? (Otherwise, we will not be able to mitigate #47427 breaking change.)
|
The intention of this proposal is to use the configured count everywhere the runtime uses processor count. The split between native and managed implementation is accidental and it is changing over time. As you have pointed out, it would not make sense to say that the configuration should be applicable to parts that happen to be implemented in the managed code at the moment.
I agree that limiting it like this is unnecessary restriction. It makes sense to respect any reasonable positive value, say up to 0xFFFF hex.
Depends what we want to be consistent with. The intentionally designed .NET Core |
Thank you, that makes more sense. The proposal text has to be clarified accordingly.
Very few settings in clrconfigvalues.h, besides profiler ones, follow ALL_CAPS style. If there is no global consistency, it might be a good idea to keep at least local consistency. All mentioned DOTNET_ALL_CAPS settings are defined elsewhere. |
And besides From customer point of view, it does not matter which file is used to define the configuration setting in the implementation. We should optimize for consistency from customer point of view. |
Then it makes sense to review officially documented run-time configuration settings: 4 configuration settings follow CORECLR_ALL_CAPS styleCORECLR_ENABLE_PROFILINGCORECLR_PROFILER CORECLR_PROFILER_PATH CORECLR_PROFILER_PATH_64 5 configuration settings follow DOTNET_NAMESPACE_ALL_CAPS styleDOTNET_SYSTEM_GLOBALIZATION_INVARIANTDOTNET_SYSTEM_GLOBALIZATION_USENLS DOTNET_SYSTEM_NET_HTTP_SOCKETSHTTPHANDLER_HTTP2SUPPORT DOTNET_SYSTEM_NET_HTTP_USESOCKETSHTTPHANDLER DOTNET_SYSTEM_NET_HTTP_SOCKETSHTTPHANDLER_ALLOWLATIN1HEADERS In this category underscores come from 29 configuration settings follow COMPlus_PascalCase styleCOMPlus_TieredCompilationCOMPlus_TC_QuickJit COMPlus_TC_QuickJitForLoops COMPlus_ReadyToRun COMPlus_EnableDiagnostics COMPlus_PerfMapEnabled COMPlus_PerfMapIgnoreSignal COMPlus_gcServer COMPlus_gcConcurrent COMPlus_GCHeapCount COMPlus_GCHeapAffinitizeMask COMPlus_GCHeapAffinitizeRanges COMPlus_GCCpuGroup COMPlus_GCNoAffinitize COMPlus_GCHeapHardLimit COMPlus_GCHeapHardLimitPercent COMPLUS_GCHeapHardLimitSOH COMPLUS_GCHeapHardLimitLOH COMPLUS_GCHeapHardLimitPOH COMPLUS_GCHeapHardLimitSOHPercent COMPLUS_GCHeapHardLimitLOHPercent COMPLUS_GCHeapHardLimitPOHPercent COMPlus_GCHighMemPercent COMPlus_GCRetainVM COMPlus_GCLargePages COMPlus_gcAllowVeryLargeObjects COMPlus_GCLOHThreshold COMPlus_GCName COMPlus_Thread_UseAllCpuGroups Arguably, adding |
See #13691 (comment) |
I added a new "Notes" section and added a comment about affecting the native runtime. We could use |
See #53149 for a demonstration of this change. |
There has been significant discussion and feedback about the value of
Environment.ProcessorCount
presented in containers. We know that there are scenarios that are underserved by the behavior of this API. We haven't exposed a new API because we haven't developed enough insight on what exactly it would do (as simple as it may seem).This discussion relates to scenarios where the
--cpus
or related arguments are used withdocker run
. There are other ways of controlling CPU usage with docker like CPU affinity that are not related to this proposal.Environment.ProcessorCount
returns the rounded up decimal value presented via--cpus
. For example--cpus 0.4
will result in a return value of 1.Environment.ProcessorCount
returns the value specified by an environment variable.In terms of the the opt-in mode, for now I'm proposing we expose an environment variable:
DOTNET_PROCESSOR_COUNT
The value is converted to an integer and returned as the new value of
Environment.ProcessorCount
. It must be at > 0 and <= the actual processor count. If those requirements are not met, the actual processor count is returned.Notes:
Environment.ProcessorCount
for managed code, and any analogues to it in the native runtime code.Assuming we collect more information on this behavior providing significant benefit, we may provide a new API that provides this information. If you like this idea, please help motivate it with usage information.
We previously (extensively) researched offering the unclipped behavior as the default value for this property. The challenge is that we found that some subsystems always work best when honoring the clipped behavior and other subsystems provide neutral to (potentially much) better behavior when honoring the unclipped behavior. It's this distinction that is the challenge. We have to decide which value each subsystem should honor, and whether we need to expose more configuration options to further tune behavior. We need more findings in the real world to help us on this journey. Provided we have compelling data, multiple avenues are open on this topic.
Related issues:
Environment.ProcessorCount
on Windows takes processor affinity into accountEnvironment.ProcessorCount
returnsThe text was updated successfully, but these errors were encountered: