MCO-1230: Retry build and push operations multiple times #4469
Skipping CI for Draft Pull Request.
@cheesesashimi: This pull request references MCO-1230 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.17.0" version, but no target version was set.
/test e2e-gcp-op-techpreview
yuqi-zhang
left a comment
/lgtm
The code logically makes sense. Curious: would we ever want to retry at the pod level, i.e. if a builder pod fails, retry with a new builder pod? That said, I do think this way is a bit better, since the different operations (build and push) can be retried separately.
I want to do that eventually. I opened MCO-1231 to consider using Kubernetes Jobs instead of bare pods like we're doing here.
Also, I just wanted to point out that the I think I found it. It's another one of those weird Bash footguns. I think I eventually want to do away with Bash as the entrypoint for this and have a Golang binary that does all of the setup, retries, etc. It would be nice if I could do something like this instead: https://github.com/containers/buildah/blob/main/docs/tutorials/04-include-in-your-build-tool.md
4aa53f8 to a0c1d84
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: cheesesashimi, yuqi-zhang
@cheesesashimi: The following test failed, say
[ART PR BUILD NOTIFIER] Distgit: ose-machine-config-operator
- What I did
Occasionally, an image build and/or push operation will fail due to a transient network condition. To make this process more robust, we should retry these operations multiple times. This PR implements a simplified approach where the build and push operations themselves are wrapped in a retry function. A key limitation of this approach is that it does not account for situations where the build pod is evicted or rescheduled onto a different node. For that, we may want to investigate using a Kubernetes Job, which provides additional resilience around evictions and rescheduling.
- How to verify it
- Description for the changelog
Image builds and pushes should be retried
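
For readers who have not seen the diff, the approach described under "What I did" can be sketched roughly as below. This is a minimal illustration, not the code from this PR: the function signature, the attempt count, and the delay are all assumptions, and `build_image` / `push_image` in the usage comment are hypothetical placeholders for whatever the build pod actually runs.

```shell
#!/usr/bin/env bash

# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND until it succeeds or MAX_ATTEMPTS
# attempts have been made. Returns 0 on success, otherwise the exit
# status of the last failed attempt.
retry() {
  local max_attempts="$1"
  local delay="$2"
  shift 2

  local attempt=1
  local rc=0
  while true; do
    # Running the command as an `if` condition keeps a failure from
    # aborting the script when the caller has `set -e` enabled.
    if "$@"; then
      return 0
    else
      rc=$?
    fi

    if (( attempt >= max_attempts )); then
      echo "retry: giving up after ${attempt} attempts: $*" >&2
      return "$rc"
    fi

    echo "retry: attempt ${attempt}/${max_attempts} failed (exit ${rc}); retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$(( attempt + 1 ))
  done
}

# Usage (build_image and push_image are placeholders, not real commands):
#   retry 3 5 build_image
#   retry 3 5 push_image
```

Wrapping the build and the push separately means a failed push does not force a rebuild, which matches the point raised in review. As the description notes, a wrapper like this cannot help if the pod itself is evicted or rescheduled.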