Skip to content

Enable fail-fast for new JVM version matrix actions#24690

Closed
ZacBlanco wants to merge 1 commit intoprestodb:masterfrom
ZacBlanco:upstream-actions-fail-fast
Closed

Enable fail-fast for new JVM version matrix actions#24690
ZacBlanco wants to merge 1 commit intoprestodb:masterfrom
ZacBlanco:upstream-actions-fail-fast

Conversation

@ZacBlanco
Copy link
Copy Markdown
Contributor

Description

Enables fail-fast on matrix builds

Motivation and Context

Should save some CI resources on builds where items fail

Impact

N/A

Test Plan

N/A

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

== NO RELEASE NOTE ==

@prestodb-ci prestodb-ci added the from:IBM PR from IBM label Mar 7, 2025
@ZacBlanco
Copy link
Copy Markdown
Contributor Author

@czentgr One concern I am noticing here regarding the fail-fast is that with flaky-tests we might be more susceptible to test flakiness causing build failures

@czentgr
Copy link
Copy Markdown
Contributor

czentgr commented Mar 11, 2025

@ZacBlanco You mentioned you monitored the pipeline for a bit now. Do we have that many flaky tests to cause a problem in the fail-fast mode vs actual issues?

@ZacBlanco
Copy link
Copy Markdown
Contributor Author

ZacBlanco commented Mar 12, 2025

So from my anec-data in this PR, is that I had to re-run the "failed" action group 4 times before the checks all passed. I suspect without fail-fast it would have been only 1 or 2.

The issue lies in that if you have both 8 & 17 tests running, and one fails, the other is instantly failed. So if either 8 or 17 is flaky, you have to re-run both, and hope that neither have flaky tests. When fail-fast is disabled, then both actions can run to completion. Any action without a failure doesn't get re-run when trying to "Re run failed jobs" from the github UI.

I understand that fail-fast can potentially save resources on deterministic failures, but given the frequency of flaky tests, I am of the opinion it is actually better to actually leave fail-fast disabled to decrease the number of re-runs required on average

@czentgr
Copy link
Copy Markdown
Contributor

czentgr commented Mar 13, 2025

@ZacBlanco I understand the reasoning. Obviously, my assumption was that we don't have that many flaky tests but do recall there was an effort to clean these up. I also investigated one and could not understand how it was failing occasionally. On the other hand we can't take them out of testing because the flaky error is known (issue created) but also if passing shows no regression.

I don't really have a problem with using the resources. I suppose we can make this change dependent on having stable tests. By that time we might have reduced the number jobs per java version where it becomes moot.

We are running on the "free" github runners so the other concern was queuing if we keep too many resources for longer than necessary.

@ZacBlanco ZacBlanco closed this Mar 13, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

from:IBM PR from IBM

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants