workflows/tests: enable use of a `CI-force-dependents-tests` label #82220
carlocab wants to merge 1 commit into Homebrew:master

Conversation
Currently, CI skips dependents if the formula tests fail for any reason. This can be undesirable in PRs that require a large number of revision bumps (e.g. `icu4c`, `boost`, or `libffi`), or when failures are due to transient network issues or other issues that can be more easily addressed in separate PRs (e.g. some `livecheck` failures). This should allow us to make use of a label that will force CI to continue attempting to test dependents even when the tests of the modified formula fail.
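A minimal sketch of how such a label check might look in a workflow. The label name is the one proposed in this PR; the job and output names below are illustrative assumptions, not the actual contents of the repository's workflow files. The `always()`, `needs.<job>.result`, and `contains(github.event.pull_request.labels.*.name, …)` expressions are standard GitHub Actions syntax:

```yaml
# Sketch only: job names are hypothetical.
jobs:
  test-dependents:
    needs: test-formula
    # Run even when the formula tests failed, if the forcing label is present.
    if: >-
      always() &&
      (needs.test-formula.result == 'success' ||
       contains(github.event.pull_request.labels.*.name, 'CI-force-dependents-tests'))
```

A maintainer could then opt a PR into this behaviour with, e.g., `gh pr edit <pr-number> --add-label CI-force-dependents-tests`.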
I think we should be fixing these in the relevant PRs. I agree with not fast-failing on dependents being flaky, but if you're revision bumping in a PR anyway: that seems like a very good time to fix flaky tests/livechecks. Alternatively, these should just get removed if they are flaky. The only exception I can see is flaky e.g. homepage/GitHub audits, in which case we should either 1) handle these a bit differently in

As a result, I'm afraid, I think I'm 👎🏻 on this change. It encourages us to keep ignoring flakes rather than fixing them when they are going to be tested by CI anyway.
I think 1 is the best option here. I know casks have some audits as warnings, and I think that could be the solution.
If we can't connect to upstream in this case I think we'll just want to disable homepage audits for specific formulae/domains instead of better supporting audit failures.

It was just an example really. My point was that all failures should be fixed, but maybe not all failures should immediately abort the build. You can bottle something just fine if their homepage is having an outage.
MikeMcQuaid
left a comment
Comment so this doesn't show up as still needing my review 😅
You technically can but: we should stop doing this. Temporary outages of this sort seem really rare (and easy enough to just add to the audit allowlist JSON in the same PR if need be).

(compared to stuff that fails consistently)
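For reference, tap audit exceptions live as JSON lists; a hypothetical allowlist entry for this case might look like the following. The filename and exact schema here are assumptions for illustration, not the tap's actual audit-exception files:

```json
[
  "some-formula-with-flaky-homepage",
  "another-formula"
]
```

The idea being that a one-line addition to such a list in the same PR is cheaper than re-running CI around a transient homepage outage.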
I am not sure. The way I see that going is:
Note it's not just temporary outages that are a problem -- sometimes some tests in large PRs just fail because they get rate-limited. This makes these failures rather complicated to fix in large PRs.

It also complicates fixing issues, because skipping dependent tests because of a failure (whether spurious or real) in a modified formula means that you don't get to address these dependent failures on your next PR run. Instead, you have to wait for a second or a third run to even get around to fixing dependents -- assuming you're not hit by a spurious failure that tends to plague large PRs. This can be days or even weeks later for CI runs that take a long time as they do for

I'd really rather try fixing all the failures in one go, rather than have to do it in several stages, where I fix failures in modified formulae first, then run CI to see if there are dependent failures, and then run CI again to fix those.
I disagree -- I think I see temporary stuff much more often than actual problems, particularly in PRs that modify several formulae.
This is fine with me. The current version is:
These sorts of tests should just be removed IMO. Flaky tests are worse than no tests.
I see this as the opposite. A PR doesn't get to progress to dependent testing (which takes a long, long time) until all the formulae modified in the PR pass their basic tests. What we could consider is making these fail even faster by checking e.g
If you're always fixing every failure in every PR and you literally never merge a PR that's red: I'd agree. Otherwise, I'm afraid I do not.
Sorry, by "temporary" here I meant "the homepage is temporarily down for a few hours" rather than "this test is temporarily failing because it is a bit flaky".
If we can split out the homepage/livecheck/meta audit I'd be fine with the current status. And then I think we could attempt requiring green CI on x86 macOS, since the flaky tests would be in a separate metadata CI run.

This works for me too.

Thanks all ❤️
One thing I realised after trying it -- our current CI setup (and the proposed separation of some network-dependent tests to the tap-syntax job) prevents the proper testing of pre-releases. See #84363. Is there anything else we can do to enable this, or do we just never want to test pre-releases that get caught by our audit? (Not all of them do.)

@carlocab I'm not sure I understand this. Can you elaborate a bit more here?
Sure. If a tag on GitHub is labelled as a pre-release, a CI run for a PR that updates to that tag will fail at

Currently, this failure means that the dependent tests will be skipped. If we move

In both of these cases, trying to test a pre-release would not work. In the status quo, dependents are not tested, and this is often the hard part of pre-release testing. Under the proposed changes to our CI setup, no testing of the formula is done at all.

Testing pre-releases is something we do on occasion -- typically to prepare for the actual release of a new version, hence the

I'm hoping there would still be some way to test pre-releases in CI because it is occasionally useful. It saved a little time in the previous Go version bump PR, for example. Or, if we just never want to be able to do this, it would be helpful to clarify this too.
Gotcha. I think having something like
Yea, seems reasonable. It also corresponds nicely to a label we already have. I'll put it on my to-do list... |
- `brew install --build-from-source <formula>`, where `<formula>` is the name of the formula you're submitting?
- `brew test <formula>`, where `<formula>` is the name of the formula you're submitting?
- `brew audit --strict <formula>` (after doing `brew install --build-from-source <formula>`)? If this is a new formula, does it pass `brew audit --new <formula>`?
Not sure if this does what I intend yet, but if this change makes sense I can test it out to check.