
Run CI every night #1334

Merged: 4 commits merged from the ci_nightly branch into getsentry:master on Mar 14, 2022

Conversation

@aminvakil (Collaborator)

I think this helps us catch errors sooner than waiting for a PR, since this repository doesn't get many changes and there can be many days without a CI run. And who doesn't like more testing? :)

We may want to change the time though.
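
(For reference, a nightly schedule in GitHub Actions is just an extra `schedule` trigger on the workflow. The sketch below is illustrative only; the workflow name and cron time are placeholders, not necessarily what this PR adds.)

```yaml
# Illustrative sketch only: the workflow name and cron time are placeholders.
name: test
on:
  pull_request:
  push:
    branches: [master]
  schedule:
    # Run once a day at 00:00 UTC. GitHub may delay scheduled runs
    # during periods of high load.
    - cron: "0 0 * * *"
```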

@BYK (Member) commented Feb 20, 2022

This repo already gets run many times every day in the Sentry, Snuba, and Relay repos, so I don't think scheduled runs are necessary.

@aminvakil (Collaborator, Author)

I hadn't seen this error (timeout at 360m) until now, for example. Maybe this can help us understand problems sooner?
And there aren't any issues (at least public ones) about this error; does this mean other repositories using self-hosted haven't faced it?

@BYK (Member) commented Feb 21, 2022

> I hadn't seen this error (timeout at 360m) until now, for example. Maybe this can help us understand problems sooner?

I tend to disagree, but not strongly. Also, I'm no longer the main maintainer of the repo, so I'll leave the final decision to @chadwhitacre 🙂 It's just that I don't find this particularly useful myself.

> And there aren't any issues (at least public ones) about this error; does this mean other repositories using self-hosted haven't faced it?

They run things in GCB (https://github.com/getsentry/sentry/blob/037ce428a4f60f39b891f410be0904187b05f2b8/docker/cloudbuild.yaml#L46-L68), so that might be the reason. I'd rather put effort into porting those to GitHub Actions than into this kind of "band-aid" solution. That's a much bigger and riskier ask, though.

@aminvakil (Collaborator, Author)

> And there aren't any issues (at least public ones) about this error; does this mean other repositories using self-hosted haven't faced it?

> They run things in GCB (https://github.com/getsentry/sentry/blob/037ce428a4f60f39b891f410be0904187b05f2b8/docker/cloudbuild.yaml#L46-L68), so that might be the reason. I'd rather put effort into porting those to GitHub Actions than into this kind of "band-aid" solution. That's a much bigger and riskier ask, though.

I can try opening a PR later that migrates the main sentry repository's CI to GHA; that would really help us understand and fix problems sooner. I also understand that right now we'd have to put in double the effort to figure out whether an error comes from GCB/GHA differences or is a real bug. But meanwhile, from my point of view, since I can't get very involved in the other sentry repositories to see whether running CI against self-hosted results in an error, I would rather see new problems surface here than wait for other sentry repositories to fail and then open an issue here about it.

> I tend to disagree, but not strongly. Also, I'm no longer the main maintainer of the repo, so I'll leave the final decision to @chadwhitacre 🙂 It's just that I don't find this particularly useful myself.

@BYK I highly value your input and I'd appreciate it if you read the above and let me know what you think.

@BYK (Member) commented Feb 22, 2022

> But meanwhile, from my point of view, since I can't get very involved in the other sentry repositories to see whether running CI against self-hosted results in an error, I would rather see new problems surface here than wait for other sentry repositories to fail and then open an issue here about it.

Once we get rid of the GHA/GCB discrepancies (even now, with the added GCB run, they are mitigated quite a lot), "waiting on other repos to surface potential problems" stops being a problem (from my perspective), for the following reasons:

1. If CI is green here, it should not break any other repo.
2. If we do have time-related consistency issues (which unfortunately we do, due to less-than-strict version pinning), then other repos, especially getsentry/sentry, get so many PRs/commits that they would catch these well before any nightly run would.

One thing that would add more value (even more than the dreaded GCB/GHA move) would be to simulate a release in a nightly run. Now that would definitely add value and make us much more ready for releases. This can start with something very basic like running craft config (we should do this for PRs already anyway 😅) and then running a craft release dry run, or triggering the release job with a dry-run parameter or something.
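
(A rough sketch of what such a nightly dry run could look like, purely as an illustration. It assumes a release.yml workflow that accepts a dry_run input; both the file name and the input are assumptions, not something confirmed in this thread.)

```yaml
# Hypothetical sketch: release.yml and its dry_run input are assumptions.
name: nightly-release-dry-run
on:
  schedule:
    - cron: "0 3 * * *"
jobs:
  dry-run:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      actions: write  # lets GITHUB_TOKEN dispatch another workflow
    steps:
      - uses: actions/checkout@v3
      # Trigger the (assumed) release workflow in dry-run mode via the
      # GitHub CLI, which is preinstalled on GitHub-hosted runners.
      - run: gh workflow run release.yml -f dry_run=true
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```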

> I can try opening a PR later that migrates the main sentry repository's CI to GHA; that would really help us understand and fix problems sooner. I also understand that right now we'd have to put in double the effort to figure out whether an error comes from GCB/GHA differences or is a real bug.

I would refrain from that as an external contributor, since the images built in GCB push directly to Sentry accounts and are then used from there. A move like this would involve some credentials: either they'd use GitHub's Docker registry, ghcr.io, or they'd set up OIDC with Google Cloud to be able to push images from GHA to Google Cloud. The value proposition is reducing duplication and CI costs, but I'd say the monetary amount would be peanuts compared to other expenses.
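
(For context, the OIDC route described above is typically wired up with the google-github-actions/auth action. The sketch below is only an illustration; every identifier in it, project, identity pool, provider, service account, and image path, is a placeholder rather than a real Sentry resource.)

```yaml
# Illustrative sketch of OIDC-based image pushes from GHA to Google Cloud.
# All identifiers below are placeholders.
name: push-image
on:
  push:
    branches: [master]
jobs:
  push-image:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      id-token: write  # required for the OIDC token exchange
    steps:
      - uses: actions/checkout@v3
      - uses: google-github-actions/auth@v1
        with:
          workload_identity_provider: projects/123456789/locations/global/workloadIdentityPools/github/providers/github
          service_account: ci-pusher@example-project.iam.gserviceaccount.com
      - uses: google-github-actions/setup-gcloud@v1
      - run: gcloud auth configure-docker us.gcr.io
      - run: |
          docker build -t us.gcr.io/example-project/sentry-self-hosted:nightly .
          docker push us.gcr.io/example-project/sentry-self-hosted:nightly
```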

@chadwhitacre (Member)

#1341 for the timeout.

I'm okay bringing this in until CI settles down. I can see it being helpful to have a daily signal of whether CI is healthy, since we don't necessarily have PRs every day in self-hosted. It can be a good barometer for us, something like "14 days without flakiness", to know when to declare victory; then we can turn this cron off again.

Now to get CI to green on this PR so we can merge ... :P

@aminvakil (Collaborator, Author) commented Feb 23, 2022

> Once we get rid of the GHA/GCB discrepancies (even now, with the added GCB run, they are mitigated quite a lot), "waiting on other repos to surface potential problems" stops being a problem (from my perspective), for the following reasons:
>
> 1. If CI is green here, it should not break any other repo.
> 2. If we do have time-related consistency issues (which unfortunately we do, due to less-than-strict version pinning), then other repos, especially getsentry/sentry, get so many PRs/commits that they would catch these well before any nightly run would.

I guess we're good with this until we get rid of the GHA/GCB discrepancies, then. Since I'm not sure when that will be fixed, I'd say let's have better CI in this repository until then.

> One thing that would add more value (even more than the dreaded GCB/GHA move) would be to simulate a release in a nightly run. Now that would definitely add value and make us much more ready for releases. This can start with something very basic like running craft config (we should do this for PRs already anyway 😅) and then running a craft release dry run, or triggering the release job with a dry-run parameter or something.

#1342

> I would refrain from that as an external contributor, since the images built in GCB push directly to Sentry accounts and are then used from there. A move like this would involve some credentials: either they'd use GitHub's Docker registry, ghcr.io, or they'd set up OIDC with Google Cloud to be able to push images from GHA to Google Cloud. The value proposition is reducing duplication and CI costs, but I'd say the monetary amount would be peanuts compared to other expenses.

Fair enough.

@chadwhitacre (Member)

@aminvakil Can you rebase to pick up #1375? 🙏

@aminvakil (Collaborator, Author)

All checks have passed 🎉

I'm sure it will bring us more headaches, since it will fail randomly every night with the current CI flakes, but I hope it at least lets us notice sooner when something like #1341 happens, tear our CI down completely, and debug it more easily.

@chadwhitacre merged commit 48dd5eb into getsentry:master on Mar 14, 2022
@aminvakil deleted the ci_nightly branch on Mar 18, 2022
@github-actions bot locked and limited conversation to collaborators on Apr 3, 2022