Skip to content

Protect CI cache save against cancellation#168310

Merged
edenhaus merged 2 commits into
home-assistant:devfrom
cdce8p:ci-cache-prevent-cancellation
Apr 29, 2026
Merged

Protect CI cache save against cancellation#168310
edenhaus merged 2 commits into
home-assistant:devfrom
cdce8p:ci-cache-prevent-cancellation

Conversation

@cdce8p
Copy link
Copy Markdown
Member

@cdce8p cdce8p commented Apr 15, 2026

Proposed change

If a workflow run is cancelled while a cache is being saved, it can happen that the cache key is marked as reserved while the actual cache upload fails. When the next run again tries to upload the cache, it silently fails with "Unable to reserve cache ... another job may be creating this cache.". Any subsequent jobs will then fail to restore the cache and the workflow fails.

E.g. this happened yesterday:

This PR separates the cache restore and save steps and adds always() conditionals to the cache save steps. This will cause them to run even if the workflow itself is being cancelled. In the case of real errors, the run would force exit 5min after that, though this is unlikely to be necessary.

This could increase the time until the last job is cancelled by <1min, though the conditions are designed to only trigger if the venv was fully created in the first place. The delay is recouped on the subsequent workflow run as it doesn't need to regenerate the venv.

https://docs.github.com/en/actions/reference/workflows-and-actions/workflow-cancellation
https://docs.github.com/en/actions/reference/workflows-and-actions/contexts#steps-context

Type of change

  • Dependency upgrade
  • Bugfix (non-breaking change which fixes an issue)
  • New integration (thank you!)
  • New feature (which adds functionality to an existing integration)
  • Deprecation (breaking change to happen in the future)
  • Breaking change (fix/feature causing existing functionality to break)
  • Code quality improvements to existing code or addition of tests

Additional information

  • This PR fixes or closes issue: fixes #
  • This PR is related to issue:
  • Link to documentation pull request:
  • Link to developer documentation pull request:
  • Link to frontend pull request:

Checklist

  • I understand the code I am submitting and can explain how it works.
  • The code change is tested and works locally.
  • Local tests pass. Your PR cannot be merged unless tests pass
  • There is no commented out code in this PR.
  • I have followed the development checklist
  • I have followed the perfect PR recommendations
  • The code has been formatted using Ruff (ruff format homeassistant tests)
  • Tests have been added to verify that the new code works.
  • Any generated code has been carefully reviewed for correctness and compliance with project standards.

If user exposed functionality or configuration variables are added/changed:

If the code communicates with devices, web services, or third-party tools:

  • The manifest file has all fields filled out correctly.
    Updated and included derived files by running: python3 -m script.hassfest.
  • New or updated dependencies have been added to requirements_all.txt.
    Updated by running python3 -m script.gen_requirements_all.
  • For the updated dependencies a diff between library versions and ideally a link to the changelog/release notes is added to the PR description.

To help with the load of incoming pull requests:

@cdce8p cdce8p requested a review from a team as a code owner April 15, 2026 19:08
Copilot AI review requested due to automatic review settings April 15, 2026 19:08
@home-assistant home-assistant Bot added cla-signed code-quality small-pr PRs with less than 30 lines. labels Apr 15, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Updates the CI workflow caching logic to ensure cache save operations still complete when a workflow run is cancelled, preventing “reserved but not uploaded” caches from breaking subsequent runs.

Changes:

  • Split combined actions/cache usage into explicit actions/cache/restore and actions/cache/save steps for the base venv and uv wheel cache.
  • Add always()-guarded cache save steps (with success gating where appropriate) so cache uploads can complete even during cancellation.
  • Add step ids needed for outcome-based conditions (e.g., install-os-deps, create-venv, cache-uv).

Comment thread .github/workflows/ci.yaml
Comment on lines +480 to +491
- name: Save uv wheel cache
if: |
(success() && steps.cache-venv.outputs.cache-hit != 'true')
|| (always()
&& steps.create-venv.outcome == 'success'
&& steps.cache-uv.outputs.cache-matched-key == '')
uses: actions/cache/save@668228422ae6a00e4ad889ee87cd7109ec5666a7 # v5.0.4
with:
path: ${{ env.UV_CACHE_DIR }}
key: >-
${{ runner.os }}-${{ runner.arch }}-${{ steps.python.outputs.python-version }}-${{
steps.generate-uv-key.outputs.key }}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm questioning if the described issue will be existing on the uv cache as this one include the current datetime including seconds... I expect that is not possible because is first canceling the current workflow before starting a new one

Copy link
Copy Markdown
Member Author

@cdce8p cdce8p Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not exactly, as you've guessed already.

What I've tried to achieve here is a minor optimization after a HA version bump where no caches will exist. If there are multiple PRs merged in short succession each and every one will again try to rebuild the venv without a cache. That's just unnecessary work. The change here resolves this as only the first "uv cache save" after a HA version bump will always succeed. For all other cases steps.cache-uv.outputs.cache-matched-key will be the restored cache key and thus only the default condition success() && steps.cache-venv.outputs.cache-hit != 'true' will be used, just like it is now.

Comment thread .github/workflows/ci.yaml
${{ runner.os }}-${{ runner.arch }}-${{ steps.python.outputs.python-version }}-${{
steps.generate-uv-key.outputs.key }}
- name: Save base Python virtual environment
if: always() && steps.create-venv.outcome == 'success'
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we somehow detect if the workflow gets cancelled during the execution of the step?
Because in my opinion we should not upload the cache if the workflow was cancelled before the upload step was started.
Example:
Current workflow is executing "Dump pip freeze" and another PR with a dependency is merged. So uploading the cache is useless as it is already invalid and the next workflow would create a new one anyways

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I fully understand what you actually want to achieve here. Sure we can detect if an earlier step was cancelled. I'm doing that here with steps.create-venv.outcome == 'success' to prevent cache saves if the job was cancelled before the venv was fully created. It is not possible to know if the next workflow run was triggered because of another dependency bump. It might as well be just another PR. So I'd suggest to still save the venv. At worst we loose 20s for the cache upload but if it's indeed needed for the next run anyway, it doesn't need to be recreated for that one.

@cdce8p
Copy link
Copy Markdown
Member Author

cdce8p commented Apr 21, 2026

@edenhaus Any update here?

Copy link
Copy Markdown
Member

@edenhaus edenhaus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @cdce8p 👍

@edenhaus edenhaus merged commit acd9dd2 into home-assistant:dev Apr 29, 2026
85 of 86 checks passed
@cdce8p cdce8p deleted the ci-cache-prevent-cancellation branch April 29, 2026 08:41
@github-actions github-actions Bot locked and limited conversation to collaborators Apr 30, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

cla-signed code-quality small-pr PRs with less than 30 lines.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants