
How to handle upgrade with ephemeral #1396

Closed
MichaelJJ opened this issue Oct 1, 2021 · 18 comments
Labels
bug Something isn't working

Comments

@MichaelJJ

To support ephemeral runners as docker containers, we created an init script which runs the following:

/config.sh <arguments>
/run.sh
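
For context, here is a minimal sketch of what that init script does (RUNNER_URL and RUNNER_TOKEN are placeholder environment variables, not our exact arguments):

#!/bin/bash
# Register this container as an ephemeral runner, then wait for a single job.
# RUNNER_URL and RUNNER_TOKEN are placeholders supplied to the container.
set -e
/config.sh --unattended --ephemeral \
    --url "${RUNNER_URL}" \
    --token "${RUNNER_TOKEN}" \
    --name "$(hostname)"
# run.sh picks up the registration written by config.sh and listens for a job
/run.sh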

We've noticed that if the actions runner version is old, the runner will self-update and then exit without actually running a job or de-registering. When there is no upgrade, the runner works correctly.

We are using the latest ubuntu docker image as a base.

Here is the log from the container:

# Authentication
√ Connected to GitHub
# Runner Registration
√ Runner successfully added
√ Runner connection is good
# Runner settings
√ Settings Saved.
√ Connected to GitHub
2021-10-01 20:50:54Z: Listening for Jobs
Runner update in progress, do not shutdown runner.
Downloading 2.283.2 runner
Waiting for current job finish running.
Generate and execute update script.
Runner will exit shortly for update, should be back online within 10 seconds.
√ Removed .credentials
√ Removed .runner

Is there a way to either skip the upgrade or have the runner process a job?

@MichaelJJ MichaelJJ added the bug Something isn't working label Oct 1, 2021
@TingluoHuang
Member

😢 @thboop I guess we have another condition about deleting the config file. 😢

@TingluoHuang
Member

@pje FYI...

@cgeers

cgeers commented Oct 1, 2021

I'm affected by this issue as well.

@TingluoHuang
Member

I think this is actually fixed by #1384

Before the fix, we blindly deleted the runner settings file when an ephemeral runner exited.
During auto-update, the old-version ephemeral runner is supposed to exit, then come back up as the newer version and pick up the job.

Since the settings file got deleted accidentally, the newer-version runner can't start back up again... 😢

@cgeers

cgeers commented Oct 2, 2021

@TingluoHuang #1384 seems relevant, but I think the crux of this problem is that the auto-update procedure creates a new process while the old one waits around for a few seconds and then exits. The old process does not appear to check whether the upgrade succeeded; it just waits a fixed time and exits. In a containerized runner, when the old process exits, all processes in the container are killed, whether or not the upgrade actually had time to complete. I see this routinely.

This changing PID throughout the upgrade procedure doesn't play well with containerized runners that don't also embed their own service manager (e.g. systemd). This is why I question whether ephemeral runners should be subject to auto-upgrade at all.

People are expecting the --ephemeral option to finally bring with it basic support for orchestrating containerized runners, and I'm not sure it does just yet. The design of the upgrade process seems to be a blocker.

Personally, I'd be fine with a --autoupdate=false option being available either with or without --ephemeral.

@MichaelJJ
Author

MichaelJJ commented Oct 2, 2021

The auto-upgrade also increases the time it takes for a runner to come online and be ready to run a job, depending on how long the download and install of the new runner takes.

@ViacheslavKudinov

I'm also interested in functionality to prevent the auto-upgrade.
Like some other people, we were affected by this release, and we believe that more control over the upgrade process is beneficial for enterprise organizations, where not being able to start new self-hosted runners in ephemeral mode is pretty critical.

@giorgiocerruti

We are affected too. We are running runners in an ECS cluster, and being able to disable the auto-update would be very handy.

fgalind1 added a commit to fgalind1/runner that referenced this issue Oct 28, 2021
When using infrastructure as code, containers and recipes, pinning
versions for reproducibility is a good practice. Usually docker
container images contain static tools and binaries, and the docker image
is tagged with a specific version.

Constructing a docker image of a runner with a specific runner version
that then self-updates doesn't seem natural; instead, the docker image
should use whatever version the binary was built/tagged with.

In addition, this concept doesn't play well when using ephemeral
runners and Kubernetes. First of all, we need to pay the price of
downloading/self-updating every single ephemeral pod for every single
job, which causes delays in execution. Secondly, this doesn't work well
and containers may get stuck.

Related issues that will be solved with this:
- actions#1396
- actions#246
- actions#485
- actions#422
- actions#442
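
As an illustration of the pinning pattern described above, an image build step might fetch a fixed release roughly like this (the version value is only an example):

# Pin the runner version explicitly instead of relying on self-update
RUNNER_VERSION=2.283.2
curl -O -L https://github.com/actions/runner/releases/download/v${RUNNER_VERSION}/actions-runner-linux-x64-${RUNNER_VERSION}.tar.gz
tar xzf actions-runner-linux-x64-${RUNNER_VERSION}.tar.gz
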
@jimrazmus

This also affects my team. We deploy the runners using idempotent Docker containers running on Nomad. We utilize a system job to ensure we have 1 runner executing per node along with the ephemeral/run-once runner option. When a job completes, the Nomad orchestrator handles the clean up and starts another fresh runner. Automatic upgrades conflict with our orchestration strategy. We end up in a loop where runners terminate after upgrading, which terminates the job, which leads to a new container launch, which starts the whole loop again.

Disabling automatic upgrades would be a welcome improvement.

@ethomson
Contributor

ethomson commented Dec 1, 2021

We're adding an option to allow self-hosted ephemeral runners to opt out of automatic updates so that you can manage updates yourself.

Some background: we consider the runner software and the hosted Actions software as a cohesive whole. Many times when we add a new feature to GitHub Actions, these changes need to be made both on the hosted service and in the runner - for example, when we added conditional steps to composite actions. This is why we've always required runner updates, so that we can be sure that the runner is compatible with the service version.

Obviously this is a painful requirement for many ephemeral users. So we'll add an opt-out mechanism for ephemeral, where the runner will not try to do a self-update. This flag will allow you to control when you update your runners.

Because the runner versions are so tightly coupled to the overall service, you'll be required to update within a month of a new runner version being released. After a month, your runners will no longer be able to connect to GitHub, so you will need to perform updates regularly. Immediately upon a new release, the runner will begin notifying you on stdout and stderr that an update is available. We'll also start adding annotations to workflow runs on outdated runners.

This is in development now and we plan to have it generally available in the new year.

@tyrken

tyrken commented Dec 1, 2021

Thanks @ethomson - how will this "update within a month" limit work with Github Enterprise Server installations, where the server (where I presume the Actions server-side code resides) may not be on the bleeding edge upgrading to your latest releases all the time? Will you document a minimum runner version in each GHE release notes?

@ethomson
Contributor

ethomson commented Dec 1, 2021

Thanks @ethomson - how will this "update within a month" limit work with Github Enterprise Server installations, where the server (where I presume the Actions server-side code resides) may not be on the bleeding edge upgrading to your latest releases all the time? Will you document a minimum runner version in each GHE release notes?

@tyrken We will, yes. We're still working on the details here but we'll have guidance - and I'm paraphrasing - "update your runner fleet first to version ". In a sense this is much easier on GHES since you control the upgrade of both pieces.

@tonywildey-valstro

tonywildey-valstro commented Dec 3, 2021

I think that the grace period should work if we can point our build scripts at the latest version (rather than a specific named one) and then automate a rebuild when the version changes. It would be nice if the check the runner does were exposed as an action, so we can easily do the same check.

e.g. create this release structure, as other apps do:

curl -O -L https://github.com/actions/runner/releases/download/latest/actions-runner-linux-x64.tar.gz
vs
curl -O -L https://github.com/actions/runner/releases/download/v2.285.0/actions-runner-linux-x64-2.285.0.tar.gz

This would allow us to automate builds without having to override build-args and without having to interrogate the release tag
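
For illustration, one way to resolve the latest tag today is the public releases API (just a sketch; it assumes jq is available and still requires interrogating the tag over the network):

LATEST=$(curl -s https://api.github.com/repos/actions/runner/releases/latest | jq -r '.tag_name')   # e.g. v2.285.0
VERSION=${LATEST#v}   # strip the leading "v" for the tarball name
curl -O -L https://github.com/actions/runner/releases/download/${LATEST}/actions-runner-linux-x64-${VERSION}.tar.gz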

@bryanmacfarlane
Member

@tonywildey-valstro agreed. In some discussions we asserted the runner build process should not only build a tar.gz but also publish a container. That would allow you to consume latest, or maybe we could even move a stable label that's guaranteed to be < 30 days old but not quite on the bleeding edge (a publish from an hour ago).

One caveat to something like that: you would not only want to use the ephemeral runner concept but would also likely want to use the yaml job containers feature. That means the runner container is orthogonal to your build/tools/app container with your stuff in it, and they can move independently (one reason why we created that yaml feature). If you couple the two, then you have to rebuild every time we build (within 30 days).

@tonywildey-valstro

@bryanmacfarlane thanks for the response, makes perfect sense.
Will take a look at the yaml job containers, as this is exactly the model we want.

@GregoireW

GregoireW commented Dec 9, 2021

@ethomson

If you work on an option to disable the auto-upgrade, please also add an option to do an upgrade if needed! (--upgrade-only, for instance)

It will be much simpler for me to have a script like:

./run.sh --upgrade-only
./config.sh .....
./run.sh 

Edit: or a name like --check-upgrade, but the idea is to get the recommended version (which may be different from the latest version).

@bsc-dev-ops

Can we add a feature to specify the auto-update time? Example: we could add a cron expression to run the auto-update every Sunday night PST or at some global time. This way we can still keep our runners up to date and also not disturb runs during business hours.
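
For example, in standard cron syntax, the kind of schedule meant here would look like this (illustration only, not an existing runner option):

# minute hour day-of-month month day-of-week
0 23 * * 0    # run the update check at 23:00 every Sunday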

@thboop
Collaborator

thboop commented Feb 1, 2022

We've shipped the ability to disable auto-upgrade; please see the changelog for more information.
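
For reference, registration with the update opt-out looks roughly like this (assuming the --disableupdate flag described in the changelog; <org>/<repo> and <token> are placeholders):

./config.sh --url https://github.com/<org>/<repo> --token <token> --ephemeral --disableupdate
./run.sh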
