How to handle upgrade with ephemeral #1396
Comments
😢 @thboop I guess we have another condition where we delete the config file. 😢
@pje FYI...
I'm affected by this issue as well.
I think this is actually fixed by #1384. Before the fix, we blindly deleted the runner settings file when an ephemeral runner exited. Since the settings file got deleted accidentally, the newer version of the runner can't start back up again... 😢
@TingluoHuang #1384 seems relevant, but I think the crux of this problem is that the auto-update procedure spawns a new process while the old one waits a fixed few seconds and then exits. The old process does not appear to check whether the upgrade succeeded; it just waits and exits. In a containerized runner, when this happens, all processes in the container are killed, whether or not the upgrade actually had time to complete. I see this routinely. The changing PID throughout the upgrade procedure doesn't play well with containerized runners that don't also embed their own service manager (systemd).

This is why I question whether ephemeral runners should be subject to auto-upgrade at all. People are expecting the --ephemeral option to finally bring with it basic support for orchestrating containerized runners, and I'm not sure it does just yet. The design of the upgrade process seems to be a blocker. Personally, I'd be fine with a --autoupdate=false option being available either with or without --ephemeral.
The auto-upgrade also affects how long it takes for a runner to be online and ready to run a job, depending on how long the download and install of the new runner take.
I'm also interested in functionality to prevent auto-upgrade.
We are affected too. We are running runners in an ECS cluster, and being able to disable the auto-update would be very handy.
When using infrastructure as code, containers, and recipes, pinning versions for reproducibility is good practice. Docker images usually contain static tools and binaries, and the image is tagged with a specific version. Building a runner image at a specific runner version that then self-updates doesn't seem natural; instead, the image should use whatever binary it was built and tagged with.

Additionally, this concept doesn't play well with ephemeral runners on Kubernetes. First, we have to pay the price of downloading and self-updating in every single ephemeral pod for every single job, which delays execution. Second, the update doesn't work well in containers, and containers may get stuck.

Related issues that would be solved by this:
- actions#1396
- actions#246
- actions#485
- actions#422
- actions#442
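To make the pinning concrete, here is a minimal sketch of a build step that downloads an explicit runner version instead of relying on self-update. The version number is only an example, and the URL shape follows the asset naming used on the actions/runner releases page; verify it against the release you actually target.

```shell
# Build the download URL for a pinned runner version (linux-x64 assumed).
runner_url() {
  local ver="$1"
  echo "https://github.com/actions/runner/releases/download/v${ver}/actions-runner-linux-x64-${ver}.tar.gz"
}

# In an image build, a recipe would then do something like (not run here):
#   curl -fsSL -o runner.tar.gz "$(runner_url 2.285.0)"
#   tar xzf runner.tar.gz -C /opt/actions-runner
runner_url "2.285.0"
```

Baking the version into the image this way keeps the image tag and the runner binary in lockstep, which is the reproducibility property the comment above is asking for.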
This also affects my team. We deploy the runners using idempotent Docker containers running on Nomad. We use a system job to ensure we have one runner executing per node, along with the ephemeral/run-once runner option. When a job completes, the Nomad orchestrator handles the cleanup and starts another fresh runner. Automatic upgrades conflict with our orchestration strategy: we end up in a loop where runners terminate after upgrading, which terminates the job, which leads to a new container launch, which starts the whole loop again. Disabling automatic upgrades would be a welcome improvement.
We're adding an option to allow self-hosted ephemeral runners to opt out of automatic updates so that you can manage updates yourself.

Some background: we consider the runner software and the hosted Actions software as a cohesive whole. Many times when we add a new feature to GitHub Actions, changes need to be made both on the hosted service and in the runner - for example, when we added conditional steps to composite actions. This is why we've always required runner updates: so that we can be sure the runner is compatible with the service version. Obviously this is a painful requirement for many ephemeral users.

So we'll add an opt-out mechanism for ephemeral, where the runner will not try to do a self-update. This flag will allow you to control when you update your runners. Because the runner versions are so tightly coupled to the overall service, you'll be required to update within a month of a new runner version being released. After a month, your runners will no longer be able to connect to GitHub, so you will need to perform updates regularly. Immediately upon a new release, the runner will begin notifying you on stdout and stderr when an update is available. We'll also start adding annotations to workflow runs on outdated runners.

This is in development now, and we plan to have it generally available in the new year.
Thanks @ethomson - how will this "update within a month" limit work with GitHub Enterprise Server installations, where the server (where I presume the Actions server-side code resides) may not be on the bleeding edge, upgrading to your latest releases all the time? Will you document a minimum runner version in each GHES version's release notes?
@tyrken We will, yes. We're still working on the details here, but we'll have guidance - and I'm paraphrasing - "update your runner fleet first to version ". In a sense this is much easier on GHES, since you control the upgrade of both pieces.
I think that the grace period should work if we can then point our build scripts at the latest version (rather than a specific named one) and automate a rebuild when the version changes. It would be nice if the check the runner does were exposed as an action so we can easily run it ourselves, e.g. by creating a release structure as other apps do:

```
curl -O -L https://github.com/actions/runner/releases/download/latest/actions-runner-linux-x64.tar.gz
```

This would allow us to automate builds without having to override build-args and without having to interrogate the release tag.
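In the meantime, a build script can get close to this by resolving the newest tag through GitHub's standard latest-release API endpoint instead of a fixed URL. A minimal sketch follows; the online call is shown as a comment (it needs network access and jq), while the testable part is just the tag-to-version strip a build script would perform on the result:

```shell
# Resolve the newest runner version at build time instead of pinning it.
# Online form (requires network and jq):
#   curl -s https://api.github.com/repos/actions/runner/releases/latest \
#     | jq -r .tag_name | sed 's/^v//'
# Offline: turn a tag such as "v2.285.0" into a bare version "2.285.0".
tag_to_version() {
  echo "${1#v}"   # strip a single leading "v", if present
}

tag_to_version "v2.285.0"   # prints 2.285.0
```

The bare version string can then be fed into build-args or the release-asset URL, so the rebuild automation never hard-codes a version.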
@tonywildey-valstro agreed. In some discussions we asserted that the runner build process should not only build a tar.gz but also publish a container, which would allow you to consume that directly. One caveat to something like that: you would not only want to use the ephemeral runner concept, but would also likely want to use the yaml job containers feature. That means the runner container is orthogonal to your build / tools / app container with your stuff in it, and they can move independently (one reason why we created that yaml feature). If you couple the two, then you have to build every time we build (within 30 days).
@bryanmacfarlane thanks for the response, makes perfect sense.
If you work on an option to disable the auto-upgrade, please also add an option to do an upgrade on demand! It will be much simpler for me to have a script like:

```
./run.sh --upgrade-only
./config.sh .....
./run.sh
```

edit: or a name like
Can we add a feature to specify the auto-update time? For example, we could add a cron expression to run the auto-update every Sunday night PST or at some global time. This way we can still keep our hosted runners up to date and also not disturb runs during business hours.
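For illustration, a scheduled update along those lines could live in a plain crontab entry today, assuming the update logic is wrapped in a script. Both the path and the script name below are hypothetical placeholders, not something the runner ships:

```shell
# Hypothetical crontab fragment: run an update check every Sunday at
# 23:00 server-local time, outside business hours. The script path is
# an assumption for illustration only.
# m  h  dom mon dow  command
  0  23  *   *   0   /opt/actions-runner/check-and-update.sh >> /var/log/runner-update.log 2>&1
```

This keeps the update window predictable without the runner needing to understand schedules itself.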
We've shipped the ability to disable auto-upgrade; please see the changelog for more information.
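For readers landing here later, opting out happens at registration time. The sketch below shows the rough shape; the exact flag spelling may vary by runner version, so check `./config.sh --help` and the changelog rather than copying this verbatim (ORG/REPO and RUNNER_TOKEN are placeholders):

```shell
# Sketch: register a self-hosted ephemeral runner with self-update
# disabled, then start it. Flag names should be verified against
# ./config.sh --help for your runner version.
./config.sh --url https://github.com/ORG/REPO \
            --token "$RUNNER_TOKEN" \
            --ephemeral \
            --disableupdate
./run.sh
```

With updates disabled, keeping the fleet inside the one-month compatibility window described above becomes the operator's responsibility.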
To support ephemeral runners as Docker containers, we created an init script which runs the following:
We've noticed that if the Actions runner version is old, the runner will self-update and then exit without actually running a job or deregistering. When there is no upgrade, the runner works correctly.
We are using the latest Ubuntu Docker image as a base.
Here is the log from the container:
Is there a way to either skip the upgrade or have the runner process a job?
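The init script itself did not survive the copy above. A minimal sketch of the pattern being described (register as ephemeral, run one job, exit) might look like the following; this is a hypothetical reconstruction, with REPO_URL and RUNNER_TOKEN assumed to be injected by the container orchestrator and /opt/actions-runner an assumed install path:

```shell
#!/usr/bin/env bash
# Hypothetical reconstruction of an ephemeral-runner init script, not
# the reporter's actual script. REPO_URL and RUNNER_TOKEN are assumed
# environment variables; the install path is an assumption.
set -euo pipefail
cd /opt/actions-runner
./config.sh --unattended --url "$REPO_URL" --token "$RUNNER_TOKEN" --ephemeral
# exec keeps the runner as PID 1 so container stop signals reach it
# directly; a self-update that replaces the process still kills the
# container, which is the failure mode reported above.
exec ./run.sh
```

Under this layout, an in-place self-update swaps the process out from under the container's PID 1, which matches the "update, then exit without running a job" behavior in the log.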