
Action has become very slow #543

Closed

ledermann opened this issue Nov 1, 2023 · 16 comments

@ledermann

Ensure the following before filing this issue

  • I verified it reproduces with the latest version with - uses: ruby/setup-ruby@v1 (see Versioning policy)

  • I tried to reproduce the issue locally by following the workflow steps (including all commands done by ruby/setup-ruby, except for Downloading Ruby & Extracting Ruby),
    and it did not reproduce locally (if it does reproduce locally, it's not a ruby/setup-ruby issue)

Are you running on a GitHub-hosted runner or a self-hosted runner?

GitHub-hosted runner

Link to the failed workflow job (must be a public workflow job, so the necessary information is available)

https://github.com/templatus/templatus-hotwire/actions/runs/6713970783/job/18246455669

Any other notes?

The action has become slow lately: setting up Ruby went from around 10 seconds to over 2 minutes.

Furthermore, there is a strange difference between the duration measured by GitHub and the time indicated by the action itself. See this example: the job itself says "9.52 seconds", but GitHub says "2m 21s".

Screenshot

The code I use is very simple:

- name: Set up Ruby
  uses: ruby/setup-ruby@v1
  with:
    bundler-cache: true
@MSP-Greg (Collaborator) commented Nov 1, 2023

I've noticed this also. If you turn on the timestamps or download the log, the following is shown:

2023-11-01T00:23:21.0213624Z Cache key: setup-ruby-bundler-cache-v6-ubuntu-22.04-x64-ruby-3.2.2-wd-/home/runner/work/templatus-hotwire/templatus-hotwire-with--without--only--Gemfile.lock-cf68edb4543c7c3c8470ee8a3b747071dac9a7f804eae170528713c041520cc8
2023-11-01T00:23:21.6767350Z Cache Size: ~54 MB (56493164 B)
2023-11-01T00:23:21.6876926Z [command]/usr/bin/tar -xf /home/runner/work/_temp/439c9c90-f5cc-42d3-9750-233b8e5abcfc/cache.tzst -P -C /home/runner/work/templatus-hotwire/templatus-hotwire --use-compress-program unzstd
2023-11-01T00:23:22.1771109Z Received 56493164 of 56493164 (100.0%), 53.8 MBs/sec
2023-11-01T00:23:22.2658245Z Cache restored successfully
2023-11-01T00:23:22.2744930Z Found cache for key: setup-ruby-bundler-cache-v6-ubuntu-22.04-x64-ruby-3.2.2-wd-/home/runner/work/templatus-hotwire/templatus-hotwire-with--without--only--Gemfile.lock-21ff1081c32103b8892cccf02164ddbb1185541cdcb08c073975b58f69c420dc
2023-11-01T00:23:22.2754773Z [command]/opt/hostedtoolcache/Ruby/3.2.2/x64/bin/bundle install --jobs 4
2023-11-01T00:23:25.6540433Z Fetching gem metadata from https://rubygems.org/.........
2023-11-01T00:23:25.9126946Z Fetching sidekiq 7.2.0
2023-11-01T00:23:26.0051303Z Installing sidekiq 7.2.0
2023-11-01T00:23:26.0760837Z Bundle complete! 35 Gemfile dependencies, 155 gems now installed.
2023-11-01T00:23:26.0763313Z Bundled gems are installed into `./vendor/bundle`
2023-11-01T00:23:26.1237137Z [command]/opt/hostedtoolcache/Ruby/3.2.2/x64/bin/bundle clean
2023-11-01T00:23:26.5386807Z Removing sidekiq (7.1.6)
2023-11-01T00:23:26.5480805Z Saving cache
2023-11-01T00:23:26.5691124Z [command]/usr/bin/tar --posix -cf cache.tzst --exclude cache.tzst -P -C /home/runner/work/templatus-hotwire/templatus-hotwire --files-from manifest.txt --use-compress-program zstdmt
2023-11-01T00:23:29.9885417Z Cache Size: ~54 MB (56581855 B)
2023-11-01T00:23:30.0962980Z Cache saved successfully
2023-11-01T00:23:30.1085596Z Took   9.52 seconds
2023-11-01T00:25:40.4096614Z ##[group]Run actions/[email protected]

So, the slowdown (about 2:10) occurs between the last log line of setup-ruby and the first log line of the next step in your Actions job, as the last two lines of the log above show. I haven't yet looked into what takes up that 2:10.
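As a sanity check, the size of that gap can be computed directly from the two timestamps in the log above. This is just a throwaway calculation, not part of the action; the Date values are copied from the log, truncated to milliseconds:

```javascript
// Gap between setup-ruby's last log line ("Took 9.52 seconds") and the
// first log line of the next step, using the timestamps from the log.
const lastSetupRubyLine = new Date('2023-11-01T00:23:30.108Z')
const nextStepFirstLine = new Date('2023-11-01T00:25:40.409Z')

// Subtracting two Date objects yields milliseconds.
const gapSeconds = (nextStepFirstLine - lastSetupRubyLine) / 1000
console.log(gapSeconds.toFixed(1)) // "130.3", i.e. the ~2:10 delay
```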

But some of the logs, here and elsewhere, do not consistently show the delays...

@dentarg commented Nov 1, 2023

The job itself says "9.52 seconds", but GitHub says "2m 21s"

the logs here and elsewhere do not consistently show the delays

Sounds to me like the slowness/delays are outside the control of this action?

I can see it myself too: 5 days ago it went from 30s to 2min+ (https://github.com/Starkast/wikimum/actions/workflows/ci.yml). But it is not consistent; there was one quick job after a slow job, and both jobs used 54a18e2, which also indicates that the slowness is not caused by something in this action.

Let's ping the GitHub people that helped in #494, @Steve-Glass @chkimes, maybe they can look into this.

@chkimes commented Nov 1, 2023

For those who are experiencing this, does it always appear to happen after restoring from the cache? I reviewed the timestamps, and it's interesting that there is a significant gap between the last output of one step and the first output of the next.

@chkimes commented Nov 1, 2023

Through sheer luck, I think you have pinged one of the few people who might understand what is going on. I was poking at the cache action a few months ago to fix the cache behavior when the connection to Azure Blob gets interrupted. One of the things I discovered while testing that is that the client code was leaking promises (I was not able to track this down).

I ended up working around that in the cache action itself here: https://github.com/actions/cache/blob/704facf57e6136b1bc63b828d79edcd491f0ee84/src/restoreImpl.ts#L102

setup-ruby, however, references the @actions/cache package directly, so it's not using the early-exit code: https://github.com/ruby/setup-ruby/blob/master/bundler.js#L5C7-L5C12

The root fix is probably to address this as a bugfix in the cache library. A quick mitigation could be to add a process.exit call here: https://github.com/ruby/setup-ruby/blob/master/index.js#L32
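As a sketch of that quick mitigation (this is hypothetical code, not the actual ruby/setup-ruby source; run() stands in for the action's real entry point):

```javascript
// Hypothetical sketch, not the actual ruby/setup-ruby code: exit the
// process explicitly once the action's work is done, so a leaked promise
// or a lingering keep-alive socket cannot keep the Node event loop
// (and therefore the job step) alive for minutes after completion.
let setupSucceeded = false

async function run() {
  // ... install Ruby, restore/save the bundler cache, etc. ...
  setupSucceeded = true
}

run()
  .then(() => process.exit(0)) // don't wait for the event loop to drain
  .catch((err) => {
    console.error(err)
    process.exit(1) // mark the step as failed
  })
```

The trade-off is that process.exit() skips any still-pending asynchronous work, so it must only run after everything the action needs (including the cache upload) has completed.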

@MSP-Greg (Collaborator) commented Nov 1, 2023

@chkimes

does it always appear to happen after restoring from the cache?

From a quick look at several jobs, it appears the issue is with saving the cache: all the jobs I looked at that took the normal amount of time also had exact cache-key hits on restore, so they weren't saving the cache. HTH...

@chkimes commented Nov 1, 2023

Ah, interesting. I don't know of anything specific that would cause a difference in behavior for save vs. restore; the issue could exist in either code path, since a leaked promise will keep the runtime from exiting regardless of where it was leaked. But if all instances are showing a cache save, then that's a strong sign.

Are there any examples of a save step that doesn't experience the delay? That is, do we have a 100% repro during cache saves?

@dentarg commented Nov 1, 2023

Are there any examples of a save step that don't experience the delay?

I think this is one https://github.com/Starkast/wikimum/actions/runs/6638880163/job/18036138791#step:4:124

@ledermann (Author)

I think this is one https://github.com/Starkast/wikimum/actions/runs/6638880163/job/18036138791#step:4:124

This example is from Oct 25, so it ran before setup-ruby was updated to Node 20 on Oct 26 in v1.159.0.

I see the delay only if the following two conditions are met:

  • v1.159.0 is used (Node 20)
  • Cache is saved

I made this test:

  1. Clear cache (using gh cache delete --all)
  2. Run last workflow again → Cache is saved → 2-min delay
  3. Run last workflow again → Cache is not saved, no delay
  4. Revert to v1.158.0, new workflow → Cache is not saved, no delay
  5. Clear cache again
  6. Run last workflow again → Cache is saved, no delay

By the way, there is a similar issue in setup-node after upgrading to Node 20, which also occurs only after saving the cache:
actions/setup-node#878

@ledermann (Author)

Based on my observations above, I created a repo with a minimal reproducible example:
https://github.com/ledermann/setup-ruby-delay-demo/actions

There you can see that saving the cache is slow with v1.159.0, but fast with v1.158.0.

@chkimes commented Nov 3, 2023

Thanks! This is great investigation - let me get the toolkit team involved to look at the cache package and how its behavior might have changed in Node 20.

@Bo98 commented Nov 5, 2023

This is likely due to nodejs/node#47228. The reason it regressed in Node 20 compared to Node 16 is that Node 19 changed the default of keepAlive from false to true.

The HTTP client code is wrapped in the @actions/http-client package, which controls its own default of keepAlive. However it doesn't override Node's default correctly, and there's already an open PR to fix this: actions/toolkit#1572. This will restore the Node 16 behaviour for anything using @actions/http-client.

I've tested that PR (see Bo98/toolkit-test@2fc3770) and can confirm it fixes the slowness: https://github.com/Bo98/toolkit-test/actions/runs/6758550452/job/18370288942 (compared to without that PR: https://github.com/Bo98/toolkit-test/actions/runs/6758456188/job/18370108410).

One of the things I discovered while testing that is that the client code was leaking promises (I was not able to track this down).

I ended up working around that in the cache action itself here: https://github.com/actions/cache/blob/704facf57e6136b1bc63b828d79edcd491f0ee84/src/restoreImpl.ts#L102

The reason you likely saw this is because the restore side of @actions/cache does force keepAlive on itself: https://github.com/actions/toolkit/blob/fe3e7ce9a7f995d29d1fcfd226a32bca407f9dc8/packages/cache/src/internal/downloadUtils.ts#L220, so that explains seeing the issue under Node 16 in that particular scenario.

julik added a commit to cheddar-me/pecorino that referenced this issue Nov 7, 2023
@eregon (Member) commented Nov 7, 2023

The root fix is probably to address this as a bugfix in the cache library. A quick mitigation could be to add a process.exit call here: https://github.com/ruby/setup-ruby/blob/master/index.js#L32

Would this solve the hanging keepAlive connections? If so, that sounds like an easy and safe enough fix to merge.

@eregon (Member) commented Nov 7, 2023

Yes, that helps, thank you all for the investigation!
It's unfortunate that the Node 20 update brought this bug in.

PR in #546

Slow build in https://github.com/ruby/setup-ruby/actions/runs/6788405695/job/18453344720 (2min 17s)

2023-11-07T17:54:28.4897445Z Took   3.80 seconds
2023-11-07T17:56:38.5288805Z ##[group]Run ruby -v

Fast build in https://github.com/ruby/setup-ruby/actions/runs/6788443622/job/18453477353?pr=546 (5s, as advertised :) )

2023-11-07T17:58:41.1757838Z Took   2.20 seconds
2023-11-07T17:58:41.1894986Z ##[group]Run ruby -v

@eregon (Member) commented Nov 7, 2023

@MSP-Greg You probably want to call process.exit() explicitly in https://github.com/ruby/setup-ruby-pkgs too, so this performance bug is fixed for setup-ruby-pkgs as well.
(The same goes for any other actions reusing this action's source code.)

@eregon (Member) commented Nov 7, 2023

One remaining unknown is that builds on #540 seem fast. Anyway, since we worked around the issue, it's all fine now.

@dentarg commented Nov 7, 2023

One left unknown is builds on #540 seem fast

No Saving cache happening?

kamatsuoka added a commit to kamatsuoka/setup-pdm that referenced this issue Mar 20, 2024
As with other actions like setup-node, I'm seeing 2-4 minute delays in post cache actions lately. Apparently this is because of a change in node behavior: ruby/setup-ruby#543 (comment)

The fix, as with other actions, is to explicitly exit so as not to wait for hanging promises.
frostming pushed a commit to pdm-project/setup-pdm that referenced this issue Mar 22, 2024
* Explicitly exit the process to not wait for hanging promises

As with other actions like setup-node, I'm seeing 2-4 minute delays in post cache actions lately. Apparently this is because of a change in node behavior: ruby/setup-ruby#543 (comment)

The fix, as with other actions, is to explicitly exit so as not to wait for hanging promises.

* transpiled
6 participants