-
Notifications
You must be signed in to change notification settings - Fork 252
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
"Too many open files" error on Linux (on s390x) #12410
Labels
Area:HttpCommunication
Functionality:Restore
Priority:3
Issues under consideration. With enough upvotes, will be reconsidered to be added to the backlog.
Type:Bug
Comments
And these builds use Mono runtime, not CoreCLR. |
heng-liu
added
Functionality:Restore
Area:HttpCommunication
Triage:NeedsTriageDiscussion
and removed
Triage:Untriaged
labels
Feb 9, 2023
nkolev92
added
Priority:3
Issues under consideration. With enough upvotes, will be reconsidered to be added to the backlog.
Pipeline:Icebox
and removed
Triage:NeedsTriageDiscussion
labels
Feb 9, 2023
We'd be happy to review and consider a change in this area. |
uweigand
added a commit
to uweigand/NuGet.Client
that referenced
this issue
Feb 10, 2023
Avoid "too many open files" error in some Linux configuations. Guard _findPackagesByIdResource.DoesPackageExistAsync under _throttle. Addresses NuGet/Home#12410.
8 tasks
Thanks, I've now opened a PR. |
uweigand
added a commit
to uweigand/NuGet.Client
that referenced
this issue
Feb 21, 2023
Avoid "too many open files" error in some Linux configuations. Guard _findPackagesByIdResource.DoesPackageExistAsync under _throttle. Addresses NuGet/Home#12410.
uweigand
added a commit
to uweigand/NuGet.Client
that referenced
this issue
Mar 2, 2023
Add environment variable to override OS-specific limit of concurrent connections in SourceRepositoryDependencyProvider.cs. Also guard _findPackagesByIdResource.DoesPackageExistAsync under the concurrent connection _throttle. Addresses NuGet/Home#12410.
zivkan
pushed a commit
to uweigand/NuGet.Client
that referenced
this issue
Mar 3, 2023
Add environment variable to override OS-specific limit of concurrent connections in SourceRepositoryDependencyProvider.cs. Also guard _findPackagesByIdResource.DoesPackageExistAsync under the concurrent connection _throttle. Addresses NuGet/Home#12410.
This was referenced Mar 7, 2023
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
Area:HttpCommunication
Functionality:Restore
Priority:3
Issues under consideration. With enough upvotes, will be reconsidered to be added to the backlog.
Type:Bug
NuGet Product Used
dotnet.exe
Product Version
dotnet 6.0, 7.0
Worked before?
n/a
Impact
It's more difficult to complete my work
Repro Steps & Context
The original use case where we ran into this problem was running Red Hat's regression test suite:
https://github.com/redhat-developer/dotnet-regular-tests
against the Red Hat dotnet RPMs on linux-s390x.
After some investigation, I was able to create a much simpler test case:
dotnet new xunit
hello world.nuget.config
that pulls in a large number of repo sources, e.g. along these lines: https://github.com/dotnet/installer/blob/release/6.0.1xx/NuGet.config~/.nuget/packages
or settingNUGET_PACKAGES
to an empty directory.dotnet restore
This results in many many errors along the lines of:
Note that the test runner framework employed by
dotnet-regular-tests
implicitly performs steps 2) and 3) above; 2) is done to allow testing dotnet prereleases that may pull in dependencies not yet onnuget.org
, and 3) is done to ensure reproducible results.[ Note also that we're only seeing this on the s390x (IBM Z) architecture, not on Intel. I wasn't able to figure out any platform-specific code difference that might responsible for that - it seems to simply come down to different I/O timing behavior. ]
Initial investigation with
strace
confirms that the error message is correct: at the time of failure, we do indeed have 1024 open file descriptors, which is the default limit on Linux. (While it is possible to increase that limit, this is not done by default, since file handles larger than 1024 cannot be processed viaselect
due to the fixed size of thefd_set
data type, and therefore some applications may no longer work correctly if that limit is extended.)Those 1024 open file descriptors for the most part appear to originate from about 500 simultaneous connections to various nuget sources, where each such connection uses two file descriptors - one for the TCP socket, and one for a local lock file. Those connections are (nearly) all about collecting package metadata, i.e.
index.json
files from many versions of many packages across many nuget sources.In some cases, I was able to collect an exception backtrace, and it looks mostly similar to this:
Now, looking at the nuget source code, this doesn't appear to be the result of any obvious bug, it's more of a design issue - in this scenario, nuget really does have to check all these nuget sources for different versions of the dependent packages, and it is indeed better to run those checks in parallel. That said, it still would seem preferable to at least limit concurrency to the extent that we don't exceed default system limits.
I also noticed that there are in fact two mechanisms in place that appear to have the intent to limit concurrency just so, but they don't work in this particular use case.
The first is a limit imposed by
./src/NuGet.Core/NuGet.Protocol/HttpSource/HttpSource.cs
by thethrottle
argument to theHttpSource
constructor: https://github.com/NuGet/NuGet.Client/blob/419ab86c016778ee3aeab40ea6fee879c09c7365/src/NuGet.Core/NuGet.Protocol/HttpSource/HttpSource.cs#L41However, the default for this throttle appears to be to provide no limitation at all - it gets used only when either the
--disable-parallel
option is given todotnet restore
, or if there is amaxHttpRequestsPerSource
statement innuget.config
.This actually works (sometimes) for my use case:
--disable-parallel
makes it succeed reliably, but takes are very long time (5 minutes to restore the hello world example); while settingmaxHttpRequestsPerSource
to a low value like 16 makes the restore succeed sometimes (after about 30 seconds), but still fails in other cases, apparently depending on random timing effects.The second mechanism is an extra throttle in
src/NuGet.Core/NuGet.Commands/RestoreCommand/SourceRepositoryDependencyProvider.cs
which was apparently added to address a problem with MacOS only supporting 256 open files by default: https://github.com/NuGet/NuGet.Client/blob/419ab86c016778ee3aeab40ea6fee879c09c7365/src/NuGet.Core/NuGet.Commands/RestoreCommand/SourceRepositoryDependencyProvider.cs#L47But this doesn't work either - it's only enabled on MacOS, not Linux, and even if I do enable it, it still doesn't work reliably. The latter appears to be due to the fact the throttle isn't applied consistently: while most uses of the
_findPackagesByIdResource
callbacks are guarded by the throttle, there is one call to_findPackagesByIdResource.DoesPackageExistAsync
which isn't. This looks like an oversight to me ...If I add a throttle guard to that call, and enable the throttle on Linux,
dotnet restore
now succeeds reliably, and takes only about 18 seconds for a full restore from an empty nuget cache.I'd appreciate any comments or suggestions on whether this change looks reasonable and might be acceptable upstream, or whether this should be solved in a different way - happy to try other solutions! For reference, this is the patch I'm currently using - I'd be happy to submit a PR along those lines if you'd like me to.
Related issues for reference:
#2004
#2163
#8571
FYI @omajid @tmds @crummel
Resolution
In the meantime, the Mono runtime has been changed to raise the soft RLIMIT_NOFILE on Linux, making the original issue described here no longer appear. However, as a result of the ongoing discussion, the linked PR still implements two changes to the NuGet.client code base:
NUGET_CONCURRENCY_LIMIT
to override OS-specific limit of concurrent connections in SourceRepositoryDependencyProvider.cs. This allows manually limiting concurrency just in case, e.g. when running in an environment where the OS limit cannot be raised high enough. Conversely, it would also allow manually lifting the automatic limit imposed by NuGet on MacOS, in case it isn't actually needed in some environments._findPackagesByIdResource.DoesPackageExistAsync
under_throttle
. Note that all other_findPackagesByIdResource
callbacks were already guarded under the same_throttle
to limit concurrency, so the consensus was that this was an oversight. Adding this additional guard is required to have the concurrency limit work reliably in those cases where it is enabled, e.g. on MacOS or when using the new environment variable.Documentation update
The documentation should be updated to describe the new
NUGET_CONCURRENCY_LIMIT
variable.The text was updated successfully, but these errors were encountered: