Skip to content

Conversation

@Prashanth684
Copy link
Contributor

This is a simple sanity check to see if release image architecture matches the bootstrap node's architecture. This is useful in cases of user error where an incorrect release image is specified and we error right away rather than downloading the associated images and failing in say crio when downloading the incorrect pause image.

@praveenkumar
Copy link
Contributor

/test e2e-crc

@Prashanth684
Copy link
Contributor Author

/retest

Copy link
Contributor

@staebler staebler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are very close to getting the boot image included in the release image (see #4760). Will these changes still be helpful at that point?

we error right away rather than downloading the associated images and failing in
say crio when downloading the incorrect pause image.

The user would still have to wait the entire 20 minutes for the installer to time out waiting for the bootstrap control plane to be reachable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the ppc64le and s390x architectures not included here because they do not need any mangling?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correct. x86_64 and aarch64 map to amd64 and arm64 resp. it is a cause of confusion in many places.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A failing release-image.service won't prevent other services from starting. So is the purpose of this just to have a failing unit in the bootstrap gather?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

essential services like crio and kubelet won't start if release-image.service throws an error. so in that sense, we have caught this error early and let the user know that there is an arch mismatch in the specified release image.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A release-image.service that fails will not prevent crio.service or kubelet.service from starting.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah..right ..yeah. but it does prevent crio-configure.service from starting which in turn disables bootkube.service when there is an error, so in that sense , this error is visible right away after running journalctl -b -f -u release-image.service -u bootkube.service

Copy link
Contributor

@staebler staebler Mar 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, my point is that it does not prevent any of those services from starting.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, my point is that it does not prevent any of those services from starting.

Ah, my bad. You are correct. Sorry for the noise. I looked at crio.service and kubelet.service but not bootkube.service.

@Prashanth684
Copy link
Contributor Author

Prashanth684 commented Mar 22, 2021

We are very close to getting the boot image included in the release image (see #4760). Will these changes still be helpful at that point?

we error right away rather than downloading the associated images and failing in
say crio when downloading the incorrect pause image.

The user would still have to wait the entire 20 minutes for the installer to time out waiting for the bootstrap control plane to be reachable.

that is true unfortunately. This cannot be done outside of the bootstrap process early on when the installer runs as we would need to download the release image to check the arch metadata. these changes will help a little in that the error will be obvious on looking at the bootstrap logs rather than seeing a crio error and trying to figure out what the issue could be.

@Prashanth684
Copy link
Contributor Author

We are very close to getting the boot image included in the release image (see #4760). Will these changes still be helpful at that point?

we error right away rather than downloading the associated images and failing in
say crio when downloading the incorrect pause image.

The user would still have to wait the entire 20 minutes for the installer to time out waiting for the bootstrap control plane to be reachable.

Answering the first part of the question which i missed - yes these changes will be useful irrespective of the boot image changes, because this sanity checks the Openshift release image (which can be overridden through the OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE and which is done in local testing and multi-arch CI as well) to see if it matches the target architecture.

…itecture

This is a simple sanity check to see if release image architecture matches the bootstrap node's architecture. This is useful in cases of
user error where an incorrect release image is specified and we error right away rather than downloading the associated images and failing in
say crio when downloading the incorrect pause image.
@Prashanth684
Copy link
Contributor Author

rebased to latest

@Prashanth684
Copy link
Contributor Author

/retest

Copy link
Contributor

@staebler staebler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With #4751 as well, then the architecture mismatch will be displayed to the user in the terminal when the installation fails.

/lgtm

@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: staebler

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added lgtm Indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Mar 23, 2021
@staebler
Copy link
Contributor

These changes have no effect on upgrades.
/override ci/prow/e2e-aws-upgrade

@openshift-ci-robot
Copy link
Contributor

@staebler: Overrode contexts on behalf of staebler: ci/prow/e2e-aws-upgrade

Details

In response to this:

These changes have no effect on upgrades.
/override ci/prow/e2e-aws-upgrade

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@staebler
Copy link
Contributor

/hold for libvirt and openstack tests to at least get a passing release image download
/test e2e-openstack
/test e2e-libvirt

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 23, 2021
@Prashanth684
Copy link
Contributor Author

/retest

@staebler
Copy link
Contributor

The libvirt job has a successful install.
The bootstrap machine is failing to fetch its ignition config in the openstack job. That is prevalent in other runs of the job and is not related to the changes in this PR.
/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 24, 2021
@staebler
Copy link
Contributor

/override ci/prow/e2e-aws-upgrade
/skip

@openshift-ci-robot
Copy link
Contributor

@staebler: Overrode contexts on behalf of staebler: ci/prow/e2e-aws-upgrade

Details

In response to this:

/override ci/prow/e2e-aws-upgrade
/skip

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

1 similar comment
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@LorbusChris
Copy link
Contributor

@staebler could you override e2e-aws-upgrade again? It doesn't seem to have worked the first time

@staebler
Copy link
Contributor

/override ci/prow/e2e-aws-upgrade
/skip

@openshift-ci-robot
Copy link
Contributor

@staebler: Overrode contexts on behalf of staebler: ci/prow/e2e-aws-upgrade

Details

In response to this:

/override ci/prow/e2e-aws-upgrade
/skip

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Mar 25, 2021

@Prashanth684: The following tests failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
ci/prow/e2e-libvirt 175738e link /test e2e-libvirt
ci/prow/e2e-aws-workers-rhel7 175738e link /test e2e-aws-workers-rhel7

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@Prashanth684
Copy link
Contributor Author

/retest

@staebler
Copy link
Contributor

/override ci/prow/e2e-aws-upgrade
/skip

@openshift-ci-robot
Copy link
Contributor

@staebler: Overrode contexts on behalf of staebler: ci/prow/e2e-aws-upgrade

Details

In response to this:

/override ci/prow/e2e-aws-upgrade
/skip

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-merge-robot openshift-merge-robot merged commit 6363f3a into openshift:master Mar 26, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants