dotnet restore takes too much time (docker dotnet/sdk:8.0) #13062
@mcanzerini I'm sorry the experience has gotten so much worse for you. You only provided a small snippet of the verbose logs, so I can't tell whether there are any HTTP communication issues. We had another customer report that their Nexus proxy doesn't handle nuget.org's vulnerability info resource; because nuget.org is blocked, all of NuGet's requests and retries for it time out, causing long delays. You appear to be using Artifactory, but I don't know how it works, or whether it has a similar vulnerability info resource issue. You could try disabling NuGet Audit to see if that resolves the slowdown, but without more detailed logs I can't say whether this is the problem or not, although the 100% CPU usage makes it sound unlikely.
How resource-limited are the CI agents? How many CPU cores and how much RAM? One new feature in .NET 8 is NuGet Audit, so you could try disabling it to see if that helps. Audit needs to load a vulnerability database. I'm not sure how big the in-memory size is, but the JSON file is 275KB. So, unless your CI agent is extremely RAM-limited, it doesn't seem likely to me that Audit is exhausting all RAM and causing the OS to do a lot of disk swapping.
I consider high CPU usage to be normal. Downloading packages is only a small part of what restore does, and almost everything else is CPU-bound. If you're able to use Otherwise, I assume your solution uses private packages on your Artifactory feed. Are you able to create a sample solution that only uses nuget.org and has the same performance problem? If you can zip up that sample and send it to us, that will make it much easier to investigate.
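To try the Audit suggestion for a single CI run without editing every project file, one option is to pass the property on the command line. This is a minimal sketch, assuming the SDK's NuGet version honours the `NuGetAudit` MSBuild property (NuGet 6.8 and later); `restore_without_audit` is a hypothetical helper name:

```shell
#!/bin/sh
# restore_without_audit (hypothetical helper): run `dotnet restore` with
# NuGet Audit turned off via the NuGetAudit MSBuild property.
# Extra arguments (project path, --source, -v diagnostic, ...) pass through.
restore_without_audit() {
  dotnet restore -p:NuGetAudit=false "$@"
}

# Example: restore_without_audit MySolution.sln -v diagnostic
```

A global `-p:` property applies to every project in the restore, so it is a quick way to compare timings with and without Audit. Setting `<NuGetAudit>false</NuGetAudit>` in a `PropertyGroup` of a project file or `Directory.Build.props` makes the same change persistent.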
Thanks for your response. Even though I did not see timeouts in the logs during the vulnerability audit (unlike the author of the issue you mentioned), I have disabled NuGet Audit in all my .csproj files with I will consider trying To my mind, as you guessed, this is not a memory issue. In the issue you mentioned, the author still has the problem without the vulnerability audit, so I assumed our problems are similar. Thanks for your time
We are seeing the same thing as well. In our nuget.config we just point to our internal nuget proxy.
@n00j Can you create a sample solution that only uses nuget.org as a package source, and reproduce the problem? Can you capture a trace with
I'll do this with .NET 6 (we only use LTS releases internally) and .NET 8.
.NET 6
.NET 8
The restore was run inside our isolated network, which only has access to our NuGet proxy server. I could not reproduce the slowness with a restore that had full access to the nuget.org upstream. Maybe .NET 8's NuGet restore performs some additional network requests that need access to nuget.org? edit: edit2:
edit3: edit4:
I also see requests to
@n00j thank you for the traces! They were really helpful; customers so often don't provide us with this kind of information to investigate, so I really appreciate that you captured and shared it.

It appears the issue is that NuGet enabled signed package verification on Linux by default in .NET 8, whereas previously it was off by default (opt-in). I wasn't aware of this change. I actually thought it was on by default from 6.0.4xx, but the docs say otherwise: https://learn.microsoft.com/en-us/dotnet/core/tools/nuget-signed-package-verification#linux

You can turn signed package verification off completely by setting the environment variable Both of these will reduce your supply chain security, so it's a risk decision that you, or your security team, need to make.

To keep all the signature verification, your network needs to allow outgoing requests to effectively any IP address. In practice it's a small number of IP addresses, but unfortunately I don't know a way to find the URLs.

Also, I'm not a security expert, but I don't understand why your company's firewall would be configured to drop outgoing traffic rather than "reject" it. If the firewall rejects, apps like NuGet at least get quick feedback that the TCP connection isn't going to work, rather than needing to time out. If you can influence this change, non-fatal errors like this one wouldn't have affected you (or your CI duration).
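For teams that accept the risk described above, one way to narrow it is to disable verification only for the restore step instead of exporting the variable for a whole CI job. A minimal sketch, assuming the variable name `DOTNET_NUGET_SIGNATURE_VERIFICATION` used by later comments in this thread and a POSIX shell; `restore_without_sigverify` is a hypothetical helper name:

```shell
#!/bin/sh
# restore_without_sigverify (hypothetical helper): run `dotnet restore` with
# NuGet's signed-package verification disabled for this one command.
# A prefix assignment on an external command only sets the variable in that
# command's environment, so later steps (build, test, publish) keep the default.
restore_without_sigverify() {
  DOTNET_NUGET_SIGNATURE_VERIFICATION=false dotnet restore "$@"
}

# Example: restore_without_sigverify src/App/App.csproj
```

In a Dockerfile, the same prefix can be written on the `RUN dotnet restore ...` line instead of using `ENV`, which would otherwise keep the setting for every later instruction and for containers started from the image.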
As mentioned in my previous message, it turns out that NuGet makes additional requests that the .NET 7 and .NET 6 SDKs don't make by default, but they aren't to nuget.org; they go to whatever URL the code signing certificates specify for their Certificate Revocation List (CRL).
These lock messages have been in NuGet for 6 years without any change in verbosity, so they should have appeared earlier too: https://github.com/NuGet/NuGet.Client/blame/a59e64507383b64bcfbe9bf63b34aca946ab0da9/src/NuGet.Core/NuGet.Packaging/PackageExtractor.cs#L399
Yes. nuget.org has used Verizon's CDN for many years (and, more recently, Windows Azure CDN as well), and Verizon's CDN appears to have been renamed Edgecast. If your company's firewall or proxy is trying to build an allow-list of hostnames, that will indeed be difficult. On my (WSL2) machine:

$ host api.nuget.org
api.nuget.org is an alias for nugetapiprod.trafficmanager.net.
nugetapiprod.trafficmanager.net is an alias for apiprod-mscdn.azureedge.net.
apiprod-mscdn.azureedge.net is an alias for apiprod-mscdn.afd.azureedge.net.
apiprod-mscdn.afd.azureedge.net is an alias for star-azureedge-prod.trafficmanager.net.
star-azureedge-prod.trafficmanager.net is an alias for shed.dual-low.part-0014.t-0009.t-msedge.net.
shed.dual-low.part-0014.t-0009.t-msedge.net is an alias for part-0014.t-0009.t-msedge.net.
part-0014.t-0009.t-msedge.net has address 13.107.246.42
part-0014.t-0009.t-msedge.net has address 13.107.213.42
part-0014.t-0009.t-msedge.net has IPv6 address 2620:1ec:bdf::42
part-0014.t-0009.t-msedge.net has IPv6 address 2620:1ec:46::42

So, my physical location uses Azure CDN for nuget.org rather than Edgecast/Verizon, but you can see that NuGet is ultimately going to make HTTP requests to part-0014.t-0009.t-msedge.net for nuget.org.
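As the `host` output shows, `api.nuget.org` resolves through a chain of CDN aliases, and which chain you get depends on location, so a hostname allow-list may need to cover every name in it. A minimal sketch for extracting those names and the final addresses, assuming the output format shown above (`X is an alias for Y.` and `X has [IPv6 ]address A`); `chain_names` and `chain_addresses` are hypothetical helper names, and the results only reflect the resolver's answer at the time and place you run it:

```shell
#!/bin/sh
# chain_names (hypothetical): read `host <name>` output on stdin and print
# every hostname in the CNAME chain once, in the order it appears.
chain_names() {
  awk '/ is an alias for / {
    target = $NF
    sub(/\.$/, "", target)            # host prints the target with a trailing dot
    if (!seen[$1]++) print $1         # the alias itself
    if (!seen[target]++) print target # the name it points to
  }'
}

# chain_addresses (hypothetical): print the IPv4/IPv6 addresses at the end
# of the chain, one per line.
chain_addresses() {
  awk '/ has (IPv6 )?address / { print $NF }'
}

# Usage: host api.nuget.org | chain_names
#        host api.nuget.org | chain_addresses
```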
Thank you! I believe this was it. Setting Hopefully this will also solve @mcanzerini's issue.
Yup, I understand. Unfortunately, our official build environment does not have external internet access, so we will need to accept this additional risk.
I'm not exactly sure why it's done this way; the firewall just seems to be a black hole. Thanks for the information, and I appreciate the very quick response!
I can confirm setting
For anyone who comes later, add this line to your Dockerfile:
<!-- musl detection notes -->
<!-- https://github.com/dotnet/runtime/blob/a50ba0669353893ca8ade8568b0a7d210b5a425f/src/mono/llvm/llvm-init.proj#L7 -->
<!-- https://github.com/dotnet/runtime/blob/a50ba0669353893ca8ade8568b0a7d210b5a425f/src/libraries/Common/tests/TestUtilities/System/PlatformDetection.Unix.cs#L78 -->
<!-- https://learn.microsoft.com/en-us/visualstudio/msbuild/exec-task?view=vs-2022 -->
<!-- dotnet restore slow in container -->
<!-- https://github.com/NuGet/Home/issues/13062#issuecomment-1845202196 -->
<!-- Check musl shared lib exists -->
<!-- if musl libs exist, check if musl is the loaded libc -->
<!-- glibc hosts could have musl cross libs installed, in the standard musl location -->
<!-- use ldd on a well known binary such as /bin/sh and grep for musl -->
<!-- note ldd may not be available on all musl targets -->
<!-- Main fallback behaviour is to default to glibc flavour, ensuring minimal impact on existing supported targets -->
## Rationale

pact-reference has introduced musl and arm64 based FFI libraries for Linux: pact-foundation/pact-reference#416

Tracking issue: pact-foundation/roadmap#30

## Issues Resolved

- fixes pact-foundation#498
- fixes pact-foundation#496
- fixes pact-foundation#500
- fixes pact-foundation#374
- fixes pact-foundation#387

## Backwards Compatibility

Linux glibc-based hosts take precedence: if any error occurs during musl detection, the glibc-based libraries are used. I do not anticipate breaking changes for users.

## Implementation notes

### .NET notes

- Docs
  - [Uses MSBuild Exec task](https://learn.microsoft.com/en-us/visualstudio/msbuild/exec-task?view=vs-2022)
- MSBuild Blog Posts
  - [Cross-Platform Build Events in .NET Core using MSBuild](https://jeremybytes.blogspot.com/2020/05/cross-platform-build-events-in-net-core.html)
  - [MSBuild 101: Using the exit code from a command](https://www.creepingcoder.com/2020/06/01/msbuild-101-using-the-exit-code-from-a-command/)
- Stack Overflow
  - [Set PropertyGroup property to Exec output](https://stackoverflow.com/questions/76583824/set-propertygroup-property-to-exec-output)
- .NET runtime musl detection code
  - https://github.com/dotnet/runtime/blob/a50ba0669353893ca8ade8568b0a7d210b5a425f/src/mono/llvm/llvm-init.proj#L7
  - https://github.com/dotnet/runtime/blob/a50ba0669353893ca8ade8568b0a7d210b5a425f/src/libraries/Common/tests/TestUtilities/System/PlatformDetection.Unix.cs#L78

### Conditions for execution

The musl libraries are selected only if all of the following hold:

- the host is Linux
- `/lib/ld-musl-(x86_64|aarch64).so.1` exists
- `ldd /bin/sh | grep musl` succeeds (the musl libc is loaded, rather than glibc)

Detection continues on error, reverting to the glibc-based libraries.

### Supported musl targets

Should work for multiple musl-based distros if:

- `/lib/ld-musl-(x86_64|aarch64).so.1` exists
- `ldd` is available (available by default in Alpine images)

Tested on Alpine ARM64 / AMD64.
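The musl checks described in this PR can be illustrated as a short POSIX shell sketch. This is not the MSBuild `Exec` implementation from the PR, only the same sequence of checks with the same glibc fallback; `detect_libc` is a hypothetical name:

```shell
#!/bin/sh
# detect_libc (hypothetical): print "musl" or "glibc" using the checks
# described above. Any check that fails falls back to "glibc".
detect_libc() {
  # 1. Only Linux hosts are candidates for musl.
  if [ "$(uname -s 2>/dev/null)" != "Linux" ]; then
    echo glibc
    return 0
  fi

  # 2. A musl dynamic loader must exist in the standard location.
  musl_loader=""
  for candidate in /lib/ld-musl-x86_64.so.1 /lib/ld-musl-aarch64.so.1; do
    if [ -e "$candidate" ]; then
      musl_loader=$candidate
    fi
  done
  if [ -z "$musl_loader" ]; then
    echo glibc
    return 0
  fi

  # 3. musl must be the libc actually loaded by a well-known binary, since
  #    glibc hosts can have musl cross libraries installed. If ldd is
  #    missing, this check fails and we also fall back to glibc.
  if command -v ldd >/dev/null 2>&1 && ldd /bin/sh 2>/dev/null | grep -q musl; then
    echo musl
  else
    echo glibc
  fi
}
```

On an Alpine image this should print `musl`; on Debian-based images, or wherever `ldd` is unavailable, it prints `glibc`, mirroring the "default to glibc" behaviour the PR describes.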
## Caveats

- [.NET does not run under QEMU](https://github.com/dotnet/core/blob/main/release-notes/8.0/supported-os.md#qemu), affecting the ability to test multiple architectures from a single system
- .NET restore can take a long time when running in containers.
  - [Workaround](https://github.com/NuGet/Home/issues/13062#issuecomment-1845202196): set `DOTNET_NUGET_SIGNATURE_VERIFICATION` to `false`

## Compatibility

### Operating System

Because the main Pact logic lives in a shared native library instead of C#, only certain OSs are supported:

| OS           | Arch        | Support           |
| ------------ | ----------- | ----------------- |
| Windows      | x86         | ❌ No             |
| Windows      | x64         | ✔️ Yes            |
| Linux (libc) | x86         | ❌ No             |
| Linux (libc) | x64         | ✔️ Yes            |
| Linux (musl) | x64         | ✔️ Yes (Tier 2)*  |
| Linux (libc) | ARM         | ✔️ Yes (Tier 3)*  |
| Linux (musl) | ARM         | ✔️ Yes (Tier 3)*  |
| OSX          | x64         | ✔️ Yes            |
| OSX          | ARM (M1/M2) | ✔️ Yes            |

#### Support

- Tier 1
  - Established
  - Full CI/CD support.
  - Users should not encounter issues
  - Users raising issues should provide full reproducible examples that run in CI
  - If using musl targets, users should attempt the same test on a libc target (such as Debian)
- Tier 2
  - Recently introduced
  - Full CI/CD support.
  - Users may encounter issues
  - Users raising issues should provide full reproducible examples that run in CI
  - If using musl targets, users should attempt the same test on a libc target (such as Debian)
- Tier 3
  - Recently introduced, no or limited CI/CD support.
  - Users may encounter issues
  - Users raising issues should provide full reproducible examples that maintainers can run locally
Hi, when I try to deploy MSBuild artifacts to Nexus 3, I get SSL certificate errors for NuGet commands. Any leads?
An error was encountered when fetching 'PUT https://nexusrepo-tools.apps.bld.cammis.medi-cal.ca.gov/repository/nuget-hosted/'. The request will now be retried.
I had the same problem; this solved it.
To get a build for a specific platform, you can use the `--platform` argument of `docker build`. To build an image for linux/amd64, use the following build command:
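A minimal sketch of such a command, wrapped in a hypothetical `build_amd64` helper; the image tag and build context are placeholders, not values from the original comment. On a non-amd64 host this relies on emulation, and the .NET 8 supported-OS notes state that .NET does not run under QEMU.

```shell
#!/bin/sh
# build_amd64 (hypothetical helper): build the Dockerfile in a context
# directory targeting the linux/amd64 platform.
# $1 = image tag, $2 = build context (defaults to the current directory).
build_amd64() {
  docker build --platform linux/amd64 -t "$1" "${2:-.}"
}

# Example: build_amd64 myapp:dev .
```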
In my case it is not a big win, from 3.5 minutes to 2.5. I also wonder whether it's really a good idea to turn off such a security feature 🤔
It does not help; it still takes over 100 seconds. We have already applied every optimization we could think of. How can the difference be so catastrophic? It takes 3-5 seconds outside the container.
Speeds up "dotnet restore". The effect is especially noticeable when running without the network sandbox.

Suggested by vimproved on IRC - thanks!

See also: NuGet/Home#13062

Signed-off-by: Maciej Barć <[email protected]>
Still happening for me - super slow
Hello, when I run a docker build via Azure Pipelines, it takes 30-40 minutes, while it takes 90 seconds locally.

RUN dotnet restore "src/PayLane.CardSuite.BackEnd.HttpApi.Host/PayLane.CardSuite.BackEnd.HttpApi.Host.csproj" --source "http://nexusnuget/repository/nuget.org-proxy/index.json"

Log:

Dockerfile:

ENV DOTNET_NUGET_SIGNATURE_VERIFICATION=false
USER app
FROM mcr.microsoft.com/dotnet/sdk:8.0 AS build
WORKDIR /src
RUN dotnet nuget locals all --clear
RUN dotnet restore "src/PayLane.CardSuite.BackEnd.HttpApi.Host/PayLane.CardSuite.BackEnd.HttpApi.Host.csproj" --source "http://nexusAdress/repository/nuget.org-proxy/index.json"
COPY . .
RUN dotnet publish "./PayLane.CardSuite.BackEnd.HttpApi.Host.csproj" -c Release -o /app/publish /p:UseAppHost=false
FROM base AS final
Bump. This just started happening for me as well. Only dotnet restore in GitLab CI is slow: it takes 20+ minutes to restore just 3 packages. Locally it runs in <1s.
Setting ENV DOTNET_NUGET_SIGNATURE_VERIFICATION=false reduces the build time from 4 hours to a few minutes. I'm still not happy with this: with .NET 8.0 this problem did not exist; it only started with 9.0.
This started yesterday for me on .NET 8.0, and I'm currently watching a 10+ minute build. I have --verbosity detailed enabled, and packages are downloading very slowly. Sometimes it runs in a few seconds inside Docker.
NuGet Product Used
dotnet.exe
Product Version
docker dotnet/sdk 8.0
Worked before?
docker dotnet/sdk 7.0
Impact
I'm unable to use this version
Repro Steps & Context
The Bug
With dotnet/sdk:8.0, `dotnet restore` in a container takes far too long (45 minutes to restore a project). This only happens in my GitLab CI, not locally. It is probably due to the limited resources available to the GitLab runner.
The ops team in charge of the runner told me that CPU usage was above 100% during the long restore.
With dotnet/sdk:7.0, the project is restored in 45 seconds.
Below are the logs with the `-v diagnostic` option added. The process is extremely long during these kinds of operations.
Verbose Logs