[Bug]: The process cannot access the file 'NuGet.Config' because it is being used by another process. #11607

Closed
mthalman opened this issue Feb 18, 2022 · 16 comments
Labels
Functionality:Restore Priority:1 High priority issues that must be resolved in the current sprint. Product:dotnet.exe RegressionFromPreviousRTM A regression from the last RTM. Example: worked in 6.2, doesn't work in 6.3 Type:Bug

Comments

@mthalman

mthalman commented Feb 18, 2022

NuGet Product Used

dotnet.exe

Product Version

6.0.200

Worked before?

6.0.102

Impact

I'm unable to use this version

Repro Steps & Context

When attempting to run dotnet restore on a multi-project solution, it results in the following error:

C:\Program Files\dotnet\sdk\6.0.200\NuGet.targets(564,5): error : The process cannot access the file 'C:\Users\ContainerUser\AppData\Roaming\NuGet\NuGet.Config' because it is being used by another process.

This was encountered in the sample image tests for the dotnet/dotnet-docker repo after targeting the 6.0.200 SDK. This is a regression from 6.0.102.

The repro is consistent but requires a highly specific environment.

I've only been able to repro this in a Docker container, specifically a Windows Server version 20H2 container running on a Windows Server version 20H2 host machine. I cannot repro it with a different host version. For example, my Win 10 21H2 machine cannot repro it even though I'm targeting a 20H2 container. I've been able to repro this on a Windows Server version 20H2 and Windows Server 2022 host machine. Using 20H2 has produced the most consistent repro in our .NET Docker builds. It requires container process isolation, the default for Windows Server, which explains why it can't repro on my Win 10 21H2 machine because that can only use Hyper-V isolation (see Windows container version compatibility). This is very specific to 20H2 for some reason. The dotnet/dotnet-docker repo has the same tests that are run for other supported Windows versions and they all work.

Also, cleaning of the Docker images is necessary between repro attempts. It doesn't make sense to me but, for whatever reason, not cleaning images after a previous attempt will cause the next attempt to succeed.

As for the repro solution, it's a multi-project solution consisting of 3 projects: a root project that references 2 child projects. If I remove one of the child projects, the issue doesn't repro.

Repro Steps

  1. Create a new Azure VM for Windows Server 20H2 that has container support enabled. I used [smalldisk] Windows Server, version 20H2 with Containers from the VM gallery which is Windows Server Core. I'm unsure if this would repro in a full Windows VM.
  2. Download and extract the repro solution: complexapp.zip. This project originally comes from https://github.com/dotnet/dotnet-docker/tree/main/samples/complexapp.
  3. cd complexapp
  4. Ensure that container images are cleaned up: docker system prune -af
  5. Build the Dockerfile by running docker build . from this directory. It should fail on the dotnet restore step.

Verbose Logs

The issue would not repro when using diagnostic verbosity (possibly timing related). But I was able to collect logs with detailed verbosity:
dotnet-restore.txt

@mthalman mthalman changed the title from "[Bug]: Regression in dotnet restore for" to "[Bug]: Regression in dotnet restore for Windows Server 20H2 container" Feb 18, 2022
@mthalman mthalman changed the title from "[Bug]: Regression in dotnet restore for Windows Server 20H2 container" to "[Bug]: Regression in dotnet restore for Windows Server container" Feb 18, 2022
@kartheekp-ms kartheekp-ms added Functionality:Restore RegressionDuringThisVersion A regression which broke since last RTM, and was fixed before the next RTM. Product:dotnet.exe and removed Triage:Untriaged labels Feb 23, 2022
@kartheekp-ms
Contributor

kartheekp-ms commented Feb 23, 2022

@mthalman - Were you able to reproduce this problem on the host machine? I am asking because of #9020 (comment).

I see that the repository mentioned in the issue description doesn't have a NuGet.Config file. The NuGet documentation suggests:

Add a nuget.config file in the root of your project repository. This is considered the best practice as it promotes repeatability and ensures that different users have the same NuGet configuration. You may need to configure clear elements to ensure no user or machine specific configuration is applied

Can you please try to reproduce this issue after adding a NuGet.Config to the repository?

I suspect this issue is related to the creation of the NuGet.Config file on a clean machine. We changed this behaviour in PR NuGet/NuGet.Client#4338, which was released in .NET SDK 6.0.200. I suspect this for the following two reasons.

  • The repro steps suggest that cleaning the Docker images is necessary between repro attempts. My understanding of this statement is that all the user profile data will be removed when the Docker image is pruned. Maybe an anti-virus program is locking the newly created file for a few milliseconds, causing this failure.
  • The error message from the log is The process cannot access the file 'C:\Users\ContainerUser\AppData\Roaming\NuGet\NuGet.Config' because it is being used by another process. The NuGet.Config file path referred is a user level configuration file which is created only when running NuGet commands for the first time on a clean machine.

@kartheekp-ms kartheekp-ms added the WaitingForCustomer Applied when a NuGet triage person needs more info from the OP label Feb 23, 2022
@mthalman
Author

Were you able to reproduce this problem on the host machine?

No

Can you please try to reproduce this issue after adding a NuGet.Config to the repository?

Yes, adding a NuGet.config file does make it work. I created the following NuGet.config file:

<?xml version="1.0" encoding="utf-8"?>
<configuration>
</configuration>

And I modified the Dockerfile to copy that NuGet.config file to the root of the container before running restore:

COPY NuGet.config /
RUN dotnet restore complexapp/complexapp.csproj

This allowed the restore to succeed.

Repro steps suggest that cleaning of the Docker images is necessary between repro attempts. My understanding of this statement is that all the user profile data will be removed when the docker image is pruned. Maybe an anti-virus program is locking the newly created file for a few milliseconds causing this failure.

Pruning images is done on the host machine. The container doesn't even exist at that point. The user profile exists only once the container starts. Again, it doesn't make sense to me why pruning images has an effect here, other than putting more load on the host machine when running docker build, which might expose a timing issue in NuGet's functionality inside the container. As for anti-virus, that doesn't exist in these containers.

NuGet/NuGet.Client#4338 certainly seems like a potential suspect considering it was introduced in 6.0.200. This issue shows a clear regression in this behavior between 6.0.102 and 6.0.200.

@ghost ghost added WaitingForClientTeam Customer replied, needs attention from client team. Do not apply this label manually. and removed WaitingForCustomer Applied when a NuGet triage person needs more info from the OP labels Feb 23, 2022
@nacitar

nacitar commented Feb 23, 2022

I too have run into this bug, and it's precisely as described here. During dotnet restore it tries to initialize the per-user config if it doesn't exist, and when running in parallel this results in multiple processes trying to WRITE to the file at once. Multiple readers are perfectly fine, and I'm guessing the internal code wasn't expecting writers, as there's no synchronization around the creation of this resource. Generating the config yourself in advance (the one in %APPDATA%) avoids the problem because then it's all readers and no writers.

This needs to be addressed, for sure. The only reason this has anything to do with containers is that developer machines are unlikely to have never run nuget/dotnet restore at least once, so outside of containers you're not very likely to run into this situation.
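The workaround described above (create the per-user config before any parallel restore starts) could be sketched as follows. This is a minimal illustration in Python, not NuGet's own code: the function name ensure_user_config is hypothetical, and the file content is an assumption about what a freshly generated default config looks like (a single nuget.org source), so adjust it to your own feeds.

```python
import os
from pathlib import Path

# Assumption: this mirrors the default file NuGet writes on first use
# (a single nuget.org source). Adjust it to your own feeds.
DEFAULT_CONFIG = """<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <packageSources>
    <add key="nuget.org" value="https://api.nuget.org/v3/index.json" protocolVersion="3" />
  </packageSources>
</configuration>
"""


def ensure_user_config(path: Path, content: str = DEFAULT_CONFIG) -> bool:
    """Create the config file at `path` if it is missing.

    Returns True if this call created the file, False if it already existed.
    """
    path.parent.mkdir(parents=True, exist_ok=True)
    try:
        # O_EXCL makes "create only if absent" a single atomic step, so two
        # callers can never both believe they created the file.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(content)
    return True
```

For the Windows container in this issue, the path would be Path(os.environ["APPDATA"]) / "NuGet" / "NuGet.Config", matching the error message. The sketch is meant to run once as a setup step before dotnet restore; it does not protect a reader from seeing the file between creation and the end of the write.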

@aortiz-msft aortiz-msft added this to the Sprint 2022-02 milestone Feb 24, 2022
@aortiz-msft aortiz-msft added Priority:1 High priority issues that must be resolved in the current sprint. Category:Around The World labels Feb 24, 2022
@donnie-msft donnie-msft removed the WaitingForClientTeam Customer replied, needs attention from client team. Do not apply this label manually. label Feb 25, 2022
@zivkan zivkan changed the title from "[Bug]: Regression in dotnet restore for Windows Server container" to "[Bug]: The process cannot access the file 'NuGet.Config' because it is being used by another process." Feb 25, 2022
@zivkan
Member

zivkan commented Feb 25, 2022

This issue was/will be resolved by: NuGet/NuGet.Client#4473

It will be available in the .NET SDK 6.0.300, VS/MSBuild 17.2, nuget.exe 6.2.0 when they're eventually released.

In order to backport the fix to .NET SDK 6.0.2xx, VS 17.2, and nuget.exe 6.1.x, we need enough 👍 upvotes on the first comment to justify it to management.

@johncrim

johncrim commented Feb 26, 2022

It's a little depressing that upvotes are the criteria for making a change, vs assessing the impact. E.g., this issue is affecting anyone using dotnet on GitHub Actions or Azure DevOps with default agents calling dotnet tool restore or dotnet restore.

Also, I suspect that if you're just reverting the recent change, NuGet is still susceptible to this race condition (just less susceptible). This is proven both by the fact that #7503 was opened in 2018, and the source of the race condition is clearly described in #7503 by @bergbria.

@zivkan
Member

zivkan commented Feb 28, 2022

It's a little depressing that upvotes are the criteria for making a change, vs assessing the impact.

A number of factors go into the consideration, including impact. But my attempt last week to get early pre-approval got pushback 😞

Also, I suspect that if you're just reverting the recent change, NuGet is still susceptible to this race condition (just less susceptible).

Kind of. The other issue was specifically for nugetorgadd.trk, which was removed a year ago. Additionally, it had 4 upvotes, at least one of which I'm very confident was added last week (although github lacks the tools to help us understand, hence why we need this new issue to start counting recent upvotes). Therefore probably 3 upvotes in the 3 years between when the issue was created and when that specific file was removed from our codebase.

Ignoring that, assuming that the creation of the nuget.config file has the same theoretical issue, we haven't been getting customer reports. Also, it's entirely possible that other codepath is idempotent, so maybe it isn't possible.🤷‍♂️

Given all the other work we have, I think we have better things to work on than investigating theoretical bugs.

@TheUnstoppable

TheUnstoppable commented Mar 1, 2022

I am also having this issue while building our multi-project solution on GitLab CI Shared Windows Runners. Building the solution on my personal machine is fine, but it fails at the dotnet restore command in the CI environment.

C:\Users\gitlab_runner\AppData\Local\Microsoft\dotnet\sdk\6.0.200\NuGet.targets(564,5): error : The process cannot access the file 'C:\Users\gitlab_runner\AppData\Roaming\NuGet\NuGet.Config' because it is being used by another process.

Note that we are not using Docker to build our projects, and our projects are not designed to run in Docker.
We're using the dotnet-install script to install .NET 6.0 into the environment, and this error started happening with .NET 6.0.200 (previous command line: ./dotnet-install.ps1 -Channel 6.0). We've currently switched to 6.0.102, which builds our projects fine (new command line: ./dotnet-install.ps1 -Channel 6.0 -Version 6.0.102).

@dasMulli

dasMulli commented Mar 3, 2022

Could you loop in https://developercommunity.visualstudio.com/t/ubuntu-latest-fails-dotnet-tool-restore-with-proce/1673019 ? (It is still in "needs more info" even though steps were provided.)

@zivkan zivkan added this to the Sprint 2022-03 milestone Mar 7, 2022
@zivkan zivkan added RegressionFromPreviousRTM A regression from the last RTM. Example: worked in 6.2, doesn't work in 6.3 and removed RegressionDuringThisVersion A regression which broke since last RTM, and was fixed before the next RTM. labels Mar 7, 2022
WonyoungChoi added a commit to Samsung/TizenFX that referenced this issue Mar 10, 2022
WonyoungChoi added a commit to Samsung/TizenFX that referenced this issue Mar 10, 2022
dongsug-song pushed a commit to dongsug-song/TizenFX that referenced this issue Mar 10, 2022
everLEEst pushed a commit to Samsung/TizenFX that referenced this issue Mar 10, 2022
@isaacabraham

I'm currently seeing this on an Azure Devops build agent on Linux when doing a dotnet restore, too.

@yatima1460

Can confirm it happens on ubuntu runners on GitHub Actions when dotnet restoring dotnet6 stuff

@youn123

youn123 commented Apr 26, 2022

Does anyone know if the fix for this is available in VS/MSBuild 17.2 Preview 3? Thanks!

@zivkan
Member

zivkan commented May 12, 2022

.NET SDK 6.0.300 has shipped now, so upgrading will resolve this issue.

@zivkan zivkan closed this as completed May 12, 2022
@specialforest

I may still have the issue with SDK v6.0.300:

C:\Program Files\dotnet\sdk\6.0.300\NuGet.targets(564,5): error : Unexpected failure reading NuGet.Config. Path: 'C:\Users\ContainerUser\AppData\Roaming\NuGet\NuGet.Config'. [C:\build\***.csproj]
C:\Program Files\dotnet\sdk\6.0.300\NuGet.targets(564,5): error :   The process cannot access the file 'C:\Users\ContainerUser\AppData\Roaming\NuGet\NuGet.Config' because it is being used by another process. [C:\build\***.csproj]
C:\Program Files\dotnet\sdk\6.0.300\NuGet.targets(564,5): error : Unexpected failure reading NuGet.Config. Path: 'C:\Users\ContainerUser\AppData\Roaming\NuGet\NuGet.Config'. [C:\build\***.csproj]
C:\Program Files\dotnet\sdk\6.0.300\NuGet.targets(564,5): error :   The process cannot access the file 'C:\Users\ContainerUser\AppData\Roaming\NuGet\NuGet.Config' because it is being used by another process. [C:\build\***.csproj]

Booksbaum added a commit to Booksbaum/FsAutoComplete that referenced this issue Jun 27, 2022
Should solve occasional test failures (`dotnet tool restore` fails)

See NuGet/Home#11607 (comment)
(via NuGet/Home#7503 (comment))
@64J0

64J0 commented Sep 23, 2022

Still facing this problem.

@jeffkl
Contributor

jeffkl commented Jan 23, 2024

To anyone experiencing this issue, please be sure that parallel build invocations use the same TEMP and APPDATA directories for each process. NuGet uses a cross-platform, cross-process synchronization system that assumes that if it can write to a particular file in TEMP, then it has exclusive access to a different file. If a build system changes TEMP to a different directory than other processes use but does not change APPDATA, then each NuGet process will be able to write to its own file in TEMP and will assume that it can write to the NuGet.config in the shared APPDATA location.
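The failure mode described above can be illustrated with a small Python sketch. It assumes a lock file whose path is derived from the TEMP directory plus a hash of the protected file's path; that layout and the names lock_file_for, try_acquire, and release are hypothetical and chosen for illustration, not taken from NuGet's implementation.

```python
import hashlib
import os
from pathlib import Path


def lock_file_for(target: Path, temp_dir: Path) -> Path:
    """Derive the lock file that guards `target` (hypothetical layout)."""
    digest = hashlib.sha256(str(target).encode("utf-8")).hexdigest()
    return temp_dir / "locks" / digest


def try_acquire(lock_file: Path) -> bool:
    """Take the lock by creating the lock file exclusively.

    Returns False if another holder already created it.
    """
    lock_file.parent.mkdir(parents=True, exist_ok=True)
    try:
        fd = os.open(lock_file, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    except FileExistsError:
        return False
    os.close(fd)
    return True


def release(lock_file: Path) -> None:
    """Give the lock back by deleting the lock file."""
    lock_file.unlink()
```

With a shared TEMP, a second try_acquire for the same target returns False until release is called, so writers take turns. With two different TEMP directories and the same target in APPDATA, the two lock files have different paths, both calls return True, and both processes believe they may write the same NuGet.config.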

@cataggar

cataggar commented Nov 18, 2024

I ran into this after upgrading to the 9.0.100 SDK, but after running dotnet restore --disable-parallel once, it is better.
