
extremely slow Network/Disk IO on Windows agent compared to Ubuntu/Mac #260

Open

jetersen opened this issue Jan 17, 2022 · 24 comments · May be fixed by #480
Labels
feature request New feature or request to improve the current logic

Comments

@jetersen

jetersen commented Jan 17, 2022

Description:

actions/runner-images#3577

Ubuntu agents have a slightly higher-IOPS disk configuration. We use the install-dotnet.ps1 script provided by the .NET team for installation. The DownloadFile and Extract-Dotnet-Package functions are slow. We will investigate how to improve the performance of those functions if we replace DownloadFile with WebClient and Extract-Dotnet-Package with 7zip.

DownloadFile and Extract-Dotnet-Package are awfully slow. Like 3x slower!

[Screenshot: SLOW]

Task version:
v1.9.0

Platform:

  • Ubuntu
  • macOS
  • Windows

Runner type:

  • Hosted
  • Self-hosted

Repro steps:
https://github.com/jetersen/dotnet.restore.slow.github.action

Expected behavior:
Faster downloads

Actual behavior:
SLOW downloads

@jetersen
Author

It can be fast:

[Screenshot: FAST]

@jetersen
Author

Perhaps consider not using the dotnet-install script, or is there an option to contribute a fix to the dotnet-install script?

@vsafonkin

Hi @jetersen, we will try to resolve this problem.

@PureKrome
Contributor

Hi Team - any news on this?

@e-korolevskii
Contributor

Hello @PureKrome,

So far no updates

@dsame

dsame commented Nov 2, 2023

The problem is no longer reproducible.

Based on multiple runs, the action does not take more than 15 seconds.
https://github.com/akv-demo/dotnet.restore.slow.github.action/actions/runs/6729109167

Most probably, the root cause was an infrastructure issue that has since been resolved.

Should the problem reoccur, the workaround is to avoid bulk copying to the OS drive, similar to the fix applied for the same problem in actions/setup-go: actions/setup-go#393

@jetersen did it help?

@jetersen
Author

jetersen commented Nov 2, 2023

@dsame I do not agree with the assessment that it is not reproducible 😓
Even with the cache available, Windows Server 2022 is still 20 seconds slower.
Creating the cache on Windows Server 2022 still takes 1 minute longer than on Ubuntu.

So it is definitely an improvement, but I feel like Windows can perform better.

[screenshots]

https://github.com/jetersen/dotnet.restore.slow.github.action/actions/runs/6736238225
https://github.com/jetersen/dotnet.restore.slow.github.action/actions/runs/6736262624

@jetersen
Author

jetersen commented Nov 2, 2023

I don't think it is fair to say "look, it is fixed" for actions/setup-dotnet when we are talking about a simple if check to see whether .NET 6 is already available on the actions runner image 😓

Testing with the .NET 8 preview shows Ubuntu at 7 seconds vs. 30+ seconds (sometimes a little less) on Windows for actions/setup-dotnet.

https://github.com/jetersen/dotnet.restore.slow.github.action/actions/runs/6736383956/job/18311632176

While this issue remains open (#141), this will definitely not improve 😢

@dsame

dsame commented Nov 3, 2023

dalyIsaac added a commit to dalyIsaac/Whim that referenced this issue Nov 3, 2023
Improved `commit` workflow job times from an average of 8m to:

- 6m 30s uncached
- 5m 30s cached

Times were improved by:

- Adding caching
- Installing packages to the `D:\` drive, as described in <actions/setup-dotnet#260 (comment)>
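The commit above pairs caching with installing packages to the D:\ drive. As a rough sketch of the caching half, assuming NuGet lock files are committed and the default global-packages location is used (the key name below is only an example), a step along these lines caches packages between runs:

```yaml
- name: Cache NuGet packages
  uses: actions/cache@v4
  with:
    # Default NuGet global-packages folder; adjust the path if NUGET_PACKAGES is overridden.
    path: ~/.nuget/packages
    key: nuget-${{ runner.os }}-${{ hashFiles('**/packages.lock.json') }}
    restore-keys: nuget-${{ runner.os }}-
```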
@jetersen
Author

jetersen commented Nov 3, 2023

@dalyIsaac interesting approach, does that really save that much 🤔

@dalyIsaac

I'm fairly happy with the gains I've seen, but admittedly I didn't conduct a very rigorous study.

| Sample | # jobs | Mean | Median | Sample std dev |
| --- | --- | --- | --- | --- |
| Installing on C:\ | 16 | 02:16 | 02:27 | 00:30 |
| Caching¹ on C:\ | 4 | 01:52 | 01:42 | 00:35 |
| Installing on D:\ | 12 | 01:37 | 01:34 | 00:24 |
| Caching on D:\ | 12 | 01:07 | 01:07 | 00:15 |

Footnotes

  1. Caching includes the actual caching and running dotnet restore. Cache sizes were about 700MB.

@dsame

dsame commented Nov 6, 2023

Hello @jetersen

The quick fix is to set the DOTNET_INSTALL_DIR environment variable to a path on the D: drive.

akv-demo/dotnet.restore.slow.github.action@45e801a#diff-b803fcb7f17ed9235f1e5cb1fcd2f5d3b2838429d4368ae4c57ce4436577f03fR15

This workaround is proven to solve the problem (https://github.com/akv-demo/dotnet.restore.slow.github.action/actions/runs/6768243557/job/18392290993) and can be used until a fix in the action is available.
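For reference, a minimal workflow sketch of that workaround (D:\dotnet is just an example path, and 8.0.x is only an example version; any location on the D: drive should work):

```yaml
jobs:
  build:
    runs-on: windows-latest
    env:
      # Install the SDK on the faster D: drive instead of C:.
      DOTNET_INSTALL_DIR: D:\dotnet
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-dotnet@v4
        with:
          dotnet-version: 8.0.x   # example version
      - run: dotnet build
```

setup-dotnet puts the installed SDK on PATH, so later steps do not need to reference the custom directory.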

@jetersen
Author

jetersen commented Nov 6, 2023

@dsame perhaps some of these fixes should be raised with @actions/runner-images? I assume we are hitting similar IO restrictions on the Windows images, as this affects all Windows-based hosted runners 🫠

@dsame

dsame commented Nov 7, 2023

Hello @jetersen, generally it is a good idea, but I doubt anyone on the actions team can solve the infrastructure problem, and most probably it will not be resolved within an acceptable timeframe.

@PureKrome
Contributor

but I doubt anyone on the actions team can solve the infrastructure problem

Why is this? Because these are two independent teams within GitHub? Even though the actions team could make some changes based on this thread (which would benefit all users by default), the infrastructure team would still need to make changes as well; but you're suggesting that this is a low priority, so they just go 'meh'?

@jetersen
Author

jetersen commented Nov 7, 2023

Created actions/runner-images#8755 in the hope that we can find a generic solution. I was hoping they could simply change the disk setup in the Windows Packer scripts 🤔

@dsame linked a pull request Nov 9, 2023 that will close this issue
@blackstars701

hi

@priyagupta108 assigned themselves and unassigned dsame Nov 27, 2024
@priyagupta108
Contributor

Hi @jetersen 👋,
As mentioned by the runner-images team in this issue comment, unfortunately, there is no simple fix to align the C: drive's performance with the D: drive due to inherent limitations, and these are not expected to be resolved in the near future.

However, a feasible workaround is to set the DOTNET_INSTALL_DIR environment variable to a path on the D: drive to enhance build performance. You can configure it like this:

env:
  DOTNET_INSTALL_DIR: D:\dotnet

Please feel free to reach out if you have any concerns or need additional assistance.
Thanks!

@Piedone

Piedone commented Dec 4, 2024

Thank you! I tested this. Note that this is only applicable to standard (4-core for public, 2-core for private repos) runners, not to larger ones, as those don't have a D drive.

Results are here: It seems to make things slower. What do you think?

@priyagupta108
Contributor

Hi @Piedone,
Thank you for sharing your observations. Could you share the reproduction link or the specific runs used for the performance comparison of the NuGetTest and root workflows? This information would help us better understand the performance results.

@Piedone

Piedone commented Dec 9, 2024

Sure, thank you! In my test, we're building the solution file in the root of our OSOCE project, as well as the solution in the NuGetTest folder. Runs:

You can disregard the failing builds; some parts of the workflow are flaky.

The following steps of the executed jobs are directly using the .NET SDK:

  • Set up .NET: runs setup-dotnet plus some configuration.
  • Build and Static Code Analysis: runs dotnet build and related things.
  • Tests: runs dotnet test for unit and UI tests.

@jetersen
Author

jetersen commented Dec 9, 2024

@priyagupta108 I tested with my repro:
[screenshot]

There is a definite improvement, but the setup-dotnet install is still incredibly slow compared to Linux or Mac.
So I still think there are improvements to be had in the usage of the dotnet-install script.
The difference is Ubuntu 6 seconds vs. Windows 16 seconds for the actions/setup-dotnet@v4 run step.
Which shows that the install script is doing something it shouldn't.

The .sh script relies on curl or wget. Why couldn't Windows rely on curl? GitHub Actions runners have curl installed, and in my experience it avoids the overhead that PowerShell's Invoke-WebRequest has when downloading files.
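To illustrate that overhead (this is not the action's actual code), a throwaway workflow step like the one below times both approaches on a hosted Windows runner; the URL is only a small sample file, so point it at something large such as an SDK archive for a meaningful comparison:

```yaml
- name: Compare download methods (illustration only)
  if: runner.os == 'Windows'
  shell: pwsh
  run: |
    # Small sample URL; substitute a large file (e.g. an SDK archive) for a realistic test.
    $url = 'https://dot.net/v1/dotnet-install.ps1'
    # Progress rendering alone can slow Invoke-WebRequest down noticeably.
    $ProgressPreference = 'SilentlyContinue'
    $iwr  = (Measure-Command { Invoke-WebRequest -Uri $url -OutFile iwr.tmp }).TotalSeconds
    $curl = (Measure-Command { curl.exe -sSL -o curl.tmp $url }).TotalSeconds
    Write-Host "Invoke-WebRequest: $iwr s"
    Write-Host "curl.exe:          $curl s"
```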

@priyagupta108
Contributor

Hi @Piedone,
Thank you for sharing the information. Changing the DOTNET_INSTALL_DIR to the D drive has indeed improved the performance for the setup-dotnet step. However, the performance of dotnet build and dotnet test commands may not be significantly impacted by the installation directory of the .NET SDK alone.
To potentially enhance performance further, consider setting the NUGET_PACKAGES environment variable to a directory on the D drive, which may allow faster access to cached packages.
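For reference, a minimal sketch of that NUGET_PACKAGES suggestion (D:\nuget is just an example path), in the same style as the DOTNET_INSTALL_DIR snippet above:

```yaml
env:
  # Keep the NuGet global-packages folder on the faster D: drive.
  NUGET_PACKAGES: D:\nuget
```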

Hi @jetersen,
Thank you for your insights! We will investigate this further and see if we can implement these changes to enhance performance. We will consider this as a potential feature request for future improvements. In the meantime, please feel free to provide any additional details or suggestions you may have.

@priyagupta108 added the 'feature request' label and removed the 'bug' label Dec 11, 2024
@priyagupta108 removed their assignment Dec 11, 2024
@Piedone

Piedone commented Dec 11, 2024

Thanks for the tip! This was very useful, since setting NUGET_PACKAGES reduced the workflow runtimes by 14-25%, see Lombiq/GitHub-Actions#402 (comment). I also did a test in a private repo with a .NET Framework build, and got a ~6% reduction there.

0xced added a commit to serilog-contrib/serilog-formatting-log4net that referenced this issue Dec 21, 2024