Skip to content

Conversation

@cheesesashimi
Copy link
Member

@cheesesashimi cheesesashimi commented Jul 11, 2024

- What I did

This PR does the following:

  • Consolidates, repackages, and refactors how we parse the data from both the machine-config-osimageurl and the machine-config-operator-images ConfigMaps for easier code reuse.
  • Adds Buildah to the MCO image along with a build user.
  • Modifies BuildController so that instead of using a hardcoded image pullspec for the builder container, the image-build container will now use the same image pullspec as the rest of the MCO components instead. It's source of truth for this value will be the machine-config-operator-images ConfigMap.
  • Cleaned up some of the constants that were used in BuildController and pkg/controller/common/constants.go that are no longer needed, as well as renamed a few of the OCL-specific constants for the annotations and labels used by OCL objects.
  • Adds functions to generate label selectors for the aforementioned labels to make querying easier.
  • Removes the TestMachineOSBuilder e2e test for the following reason: Although the test has been passing for quite some time, ever since the OCL API landed, it has been passing for the incorrect reason. Before the OCL API landed, it was passing because we were looking for a specific ConfigMap to be present and asserting that something would either happen or not happen as a result. In its current form, we assert that nothing should happen. At this point however, we've removed the code that expects a certain ConfigMap to be present. However, this test was never updated to reflect that. With the other changes in this PR, we broke this test by preventing it from compiling. Regardless of that, the test itself does not add much value, so it should be removed.
  • Refactors the OCL E2E tests to allow the ImageStream to be created in a different namespace. This enables us to ensure that the correct image pull secret lands on the node as well as ensures that the image pull secret that is bound to the service account is refreshed before each test since we tear down the namespace and its contents upon successful completion of the test.
  • Adds labels and annotations to objects created by the OCL E2E test suite to facilitate easier cleanup.

- How to verify it

There are a couple of ways to verify this:

  1. The MCO image should now contain the buildah binary.
  2. The e2e-gcp-op-techpreview test suite should continue to pass as-is, though I've added some additional assertions.
  3. When running the e2e-gcp-op-techpreview test suite, the image pullspec for the image-build container in the machine-os-builder pod should be the same as the rest of the MCO components.
  4. All of the objects created by the modified e2e test will have an annotation and label whose key is machineconfiguration.openshift.io/used-by-e2e-test. The annotation will have the name of the test currently running.

- Description for the changelog
Lifecycle Buildah with MCO

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 11, 2024
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jul 11, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 11, 2024
@cheesesashimi cheesesashimi force-pushed the zzlotnik/buildah-lifecycle branch from b5601bd to bf946d4 Compare July 18, 2024 19:43
@cheesesashimi cheesesashimi changed the title lifecycle buildah with MCO MCO-703: lifecycle buildah with MCO Jul 18, 2024
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jul 18, 2024
@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Jul 18, 2024

@cheesesashimi: This pull request references MCO-703 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.17.0" version, but no target version was set.

Details

In response to this:

- What I did

This attempts to lifecycle Buildah with the MCO. Right now, this probably will not work as intended since we still need to set the /etc/subuid and /etc/subgid files.

- How to verify it

- Description for the changelog

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@cheesesashimi
Copy link
Member Author

/test e2e-gcp-op-techpreview

@cheesesashimi cheesesashimi changed the title MCO-703: lifecycle buildah with MCO MCO-703: Lifecycle Buildah with MCO Jul 19, 2024
@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Jul 19, 2024

@cheesesashimi: This pull request references MCO-703 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.17.0" version, but no target version was set.

Details

In response to this:

- What I did

I added Buildah to the MCO image and made a few modifications so that instead of using a hardcoded image pullspec for Buildah which periodically changes at random, the image-build container in the machine-os-builder pod will use the MCO image instead. It's source of truth for this value will be the machine-config-operator-images ConfigMap. Because of that, I consolidated, repackaged, and refactored how we parse the data within both the machine-config-osimageurl and the machine-config-operator-images ConfigMaps for easier code reuse. I also opportunistically cleaned up some of the constants that were used in BuildController and pkg/controller/common/constants.go that are no longer needed.

- How to verify it

There are a couple of things to verify:

  1. The MCO image should now contain the buildah binary.
  2. When running the e2e-gcp-op-techpreview test suite, the image pullspec for the image-build container in the machine-os-builder pod should be the same as the rest of the MCO components.

- Description for the changelog
Lifecycle Buildah with MCO

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Jul 19, 2024

@cheesesashimi: This pull request references MCO-703 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.17.0" version, but no target version was set.

Details

In response to this:

- What I did

I added Buildah to the MCO image and made a few modifications so that instead of using a hardcoded image pullspec for Buildah which periodically changes at random, the image-build container in the machine-os-builder pod will use the MCO image instead. It's source of truth for this value will be the machine-config-operator-images ConfigMap. Because of that, I consolidated, repackaged, and refactored how we parse the data within both the machine-config-osimageurl and the machine-config-operator-images ConfigMaps for easier code reuse. I also opportunistically cleaned up some of the constants that were used in BuildController and pkg/controller/common/constants.go that are no longer needed.

- How to verify it

There are a couple of ways to verify this:

  1. The MCO image should now contain the buildah binary.
  2. The e2e-gcp-op-techpreview test suite should continue to pass as-is.
  3. When running the e2e-gcp-op-techpreview test suite, the image pullspec for the image-build container in the machine-os-builder pod should be the same as the rest of the MCO components.

- Description for the changelog
Lifecycle Buildah with MCO

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@cheesesashimi cheesesashimi force-pushed the zzlotnik/buildah-lifecycle branch 2 times, most recently from f51bc91 to 2923dd3 Compare July 19, 2024 22:01
@cheesesashimi cheesesashimi marked this pull request as ready for review July 19, 2024 22:05
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 19, 2024
@openshift-ci openshift-ci bot requested review from djoshy and yuqi-zhang July 19, 2024 22:05
@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Jul 19, 2024

@cheesesashimi: This pull request references MCO-703 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.17.0" version, but no target version was set.

Details

In response to this:

- What I did

I added Buildah to the MCO image and made a few modifications so that instead of using a hardcoded image pullspec for Buildah which periodically changes at random, the image-build container in the machine-os-builder pod will use the MCO image instead. It's source of truth for this value will be the machine-config-operator-images ConfigMap. Because of that, I consolidated, repackaged, and refactored how we parse the data within both the machine-config-osimageurl and the machine-config-operator-images ConfigMaps for easier code reuse. I also opportunistically cleaned up some of the constants that were used in BuildController and pkg/controller/common/constants.go that are no longer needed, as well as renamed a few of the OCL-specific constants for the annotations and labels used by OCL objects. The end goal is to make it much more maintainable.

- How to verify it

There are a couple of ways to verify this:

  1. The MCO image should now contain the buildah binary.
  2. The e2e-gcp-op-techpreview test suite should continue to pass as-is.
  3. When running the e2e-gcp-op-techpreview test suite, the image pullspec for the image-build container in the machine-os-builder pod should be the same as the rest of the MCO components.

- Description for the changelog
Lifecycle Buildah with MCO

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@cheesesashimi cheesesashimi force-pushed the zzlotnik/buildah-lifecycle branch from 9eb3226 to f359fa7 Compare July 22, 2024 15:22
@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Jul 22, 2024

@cheesesashimi: This pull request references MCO-703 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.17.0" version, but no target version was set.

Details

In response to this:

- What I did

I added Buildah to the MCO image and made a few modifications so that instead of using a hardcoded image pullspec for Buildah which periodically changes at random, the image-build container in the machine-os-builder pod will use the MCO image instead. It's source of truth for this value will be the machine-config-operator-images ConfigMap. Because of that, I consolidated, repackaged, and refactored how we parse the data within both the machine-config-osimageurl and the machine-config-operator-images ConfigMaps for easier code reuse. I also opportunistically cleaned up some of the constants that were used in BuildController and pkg/controller/common/constants.go that are no longer needed, as well as renamed a few of the OCL-specific constants for the annotations and labels used by OCL objects. The end goal is to make it much more maintainable.

This PR also removes the TestMachineOSBuilder e2e test. The rationale its removal is:

The TestMachineOSBuilder test is no longer necessary in my opinion. Although the test has been passing for quite some time, ever since the OCL API landed, it has been passing for the incorrect reason. Before the OCL API landed, it was passing because we were looking for a specific ConfigMap to be present and asserting that something would either happen or not happen as a result. In its current form, we assert that nothing should happen.

At this point however, we've removed the code that expects a certain ConfigMap to be present. However, this test was never updated to reflect that. With the other changes in this PR, we broke this test by preventing it from compiling. Regardless of that, the test itself does not add much value, so it should be removed.

- How to verify it

There are a couple of ways to verify this:

  1. The MCO image should now contain the buildah binary.
  2. The e2e-gcp-op-techpreview test suite should continue to pass as-is.
  3. When running the e2e-gcp-op-techpreview test suite, the image pullspec for the image-build container in the machine-os-builder pod should be the same as the rest of the MCO components.

- Description for the changelog
Lifecycle Buildah with MCO

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@cheesesashimi cheesesashimi force-pushed the zzlotnik/buildah-lifecycle branch 2 times, most recently from 7bf4fa3 to de6d80e Compare July 22, 2024 15:31
@cheesesashimi
Copy link
Member Author

/retest-required

@cheesesashimi cheesesashimi force-pushed the zzlotnik/buildah-lifecycle branch 2 times, most recently from be43d9b to cf50fbe Compare July 25, 2024 15:54
@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Jul 25, 2024

@cheesesashimi: This pull request references MCO-703 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.17.0" version, but no target version was set.

Details

In response to this:

- What I did

I added Buildah to the MCO image and made a few modifications so that instead of using a hardcoded image pullspec for Buildah which periodically changes at random, the image-build container in the machine-os-builder pod will use the MCO image instead. It's source of truth for this value will be the machine-config-operator-images ConfigMap. Because of that, I consolidated, repackaged, and refactored how we parse the data within both the machine-config-osimageurl and the machine-config-operator-images ConfigMaps for easier code reuse. I also opportunistically cleaned up some of the constants that were used in BuildController and pkg/controller/common/constants.go that are no longer needed, as well as renamed a few of the OCL-specific constants for the annotations and labels used by OCL objects. The end goal is to make it much more maintainable.

This PR also removes the TestMachineOSBuilder e2e test. The rationale for its removal is:

The TestMachineOSBuilder test is no longer necessary in my opinion. Although the test has been passing for quite some time, ever since the OCL API landed, it has been passing for the incorrect reason. Before the OCL API landed, it was passing because we were looking for a specific ConfigMap to be present and asserting that something would either happen or not happen as a result. In its current form, we assert that nothing should happen.

At this point however, we've removed the code that expects a certain ConfigMap to be present. However, this test was never updated to reflect that. With the other changes in this PR, we broke this test by preventing it from compiling. Regardless of that, the test itself does not add much value, so it should be removed.

- How to verify it

There are a couple of ways to verify this:

  1. The MCO image should now contain the buildah binary.
  2. The e2e-gcp-op-techpreview test suite should continue to pass as-is.
  3. When running the e2e-gcp-op-techpreview test suite, the image pullspec for the image-build container in the machine-os-builder pod should be the same as the rest of the MCO components.

- Description for the changelog
Lifecycle Buildah with MCO

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@cheesesashimi cheesesashimi force-pushed the zzlotnik/buildah-lifecycle branch from cf50fbe to bfd62b7 Compare July 25, 2024 16:18
@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Jul 25, 2024

@cheesesashimi: This pull request references MCO-703 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.17.0" version, but no target version was set.

Details

In response to this:

- What I did

This PR does the following:

  • Consolidates, repackages, and refactors how we parse the data from both the machine-config-osimageurl and the machine-config-operator-images ConfigMaps for easier code reuse.
  • Adds Buildah to the MCO image along with a build user.
  • Modifies BuildController so that instead of using a hardcoded image pullspec for the builder container, the image-build container will now use the same image pullspec as the rest of the MCO components instead. It's source of truth for this value will be the machine-config-operator-images ConfigMap.
  • Cleaned up some of the constants that were used in BuildController and pkg/controller/common/constants.go that are no longer needed, as well as renamed a few of the OCL-specific constants for the annotations and labels used by OCL objects.
  • Adds functions to generate label selectors for the aforementioned labels to make querying easier.
  • Removes the TestMachineOSBuilder e2e test for the following reason: Although the test has been passing for quite some time, ever since the OCL API landed, it has been passing for the incorrect reason. Before the OCL API landed, it was passing because we were looking for a specific ConfigMap to be present and asserting that something would either happen or not happen as a result. In its current form, we assert that nothing should happen. At this point however, we've removed the code that expects a certain ConfigMap to be present. However, this test was never updated to reflect that. With the other changes in this PR, we broke this test by preventing it from compiling. Regardless of that, the test itself does not add much value, so it should be removed.
  • Refactors the OCL E2E tests to allow the ImageStream to be created in a different namespace. This enables us to ensure that the correct image pull secret lands on the node as well as ensures that the image pull secret that is bound to the service account is refreshed before each test since we tear down the namespace and its contents upon successful completion of the test.
  • Adds labels and annotations to objects created by the OCL E2E test suite to facilitate easier cleanup.

- How to verify it

There are a couple of ways to verify this:

  1. The MCO image should now contain the buildah binary.
  2. The e2e-gcp-op-techpreview test suite should continue to pass as-is, though I've added some additional assertions.
  3. When running the e2e-gcp-op-techpreview test suite, the image pullspec for the image-build container in the machine-os-builder pod should be the same as the rest of the MCO components.
  4. All of the objects created by the modified e2e test will have an annotation and label whose key is machineconfiguration.openshift.io/used-by-e2e-test. The annotation will have the name of the test currently running.

- Description for the changelog
Lifecycle Buildah with MCO

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@cheesesashimi cheesesashimi force-pushed the zzlotnik/buildah-lifecycle branch 2 times, most recently from 7b5a49c to e381bfb Compare July 25, 2024 17:55
@cheesesashimi cheesesashimi force-pushed the zzlotnik/buildah-lifecycle branch from edf1ef4 to 3001fe4 Compare July 29, 2024 15:41
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 29, 2024
@yuqi-zhang
Copy link
Contributor

/test e2e-gcp-op-techpreview

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 30, 2024
This moves parsing for the machine-config-operator-images and
machine-config-osimageurl into a centralized location to make working
with these ConfigMaps easier across all components of the MCO that need
to use them. This also introduces a new struct, OSImageURLConfig, which
has the contents of the machine-config-osimageurl ConfigMap in it.
This adds Buildah into the MCO container image and wires up
BuildController to use the MCO container for performing builds instead
of the hard-coded Buildah image. Doing this allows OCP to lifecycle
Buildah with the current OCP version instead of randomly getting a new
version at an inopportune time.
The TestMachineOSBuilder test is no longer necessary in my opinion.
Although the test has been passing for quite some time, ever since the OCL
API landed, it has been passing for the incorrect reason. Before the OCL
API landed, it was passing because we were looking for a specific
ConfigMap to be present and asserting that something would either happen
or not happen as a result. In its current form, we assert that nothing
should happen.

At this point however, we've removed the code that expects a certain
ConfigMap to be present. However, this test was never updated to reflect
that. With the other changes in this PR, we broke this test by
preventing it from compiling. Regardless of that, the test itself does
not add much value, so it should be removed.
This cleans up some of the label constants that are used by OCL build
objects and refactors how the e2e tests use those labels both for
post-test cleanup as well as post-test data collection in the event of
failure.

This way, we have consistent labels for all of the objects created by
the OCL build process as well as a way to easily query for them.
The digestfile ConfigMap that is created as part of the build process
should also make use of the new labels for all ephemeral build objects
so that it can be more readily identified.
@cheesesashimi cheesesashimi force-pushed the zzlotnik/buildah-lifecycle branch from 3001fe4 to 8c81f41 Compare July 30, 2024 15:10
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 30, 2024
For OCL, it is useful to create the ImageStream in a separate
namespace in order to simulate the situation where an image registry
absolutely requires private keys for pushing and pulling.

For now, it uses the lower-case test name as the namespace name to
ensure that there are no collisions as namespaces and their objects are
created and deleted.

This has the beneficial side-effect of ensuring that the image pull
secret that is used is always refreshed to avoid test failures by having
the image pull secret rotated before the image push and pull operations
complete.

This adds the current test name as an annotation as well
as a label indicating that it is used by an e2e test to all objects
created by the test to allow for easier debugging via must-gathers, etc.
This also allows label selector queries for cleanup purposes.

This also renames all of the On Cluster Build references to On Cluster
Layering, to correspond with the finalized name of this initiative.
@cheesesashimi cheesesashimi force-pushed the zzlotnik/buildah-lifecycle branch from 8c81f41 to 9b501d9 Compare July 30, 2024 18:33
@cheesesashimi
Copy link
Member Author

/retest-required

@cheesesashimi
Copy link
Member Author

/test e2e-gcp-op-techpreview

@yuqi-zhang
Copy link
Contributor

/test e2e-gcp-op-techpreview

Took a look at the latest failure and can confirm it was due to openshift/cluster-update-keys#58 . I see passing runs now, so hopefully the fix has been propagated

@yuqi-zhang
Copy link
Contributor

We got past the install failure but

TestOnClusterBuildsCustomPodBuilder

does fail. Is that flaky?

@cheesesashimi
Copy link
Member Author

That seems like a strange failure since I've seen that one (and other similar tests) pass before. I'll re-run it to determine if there are any flakes.

@cheesesashimi
Copy link
Member Author

/test e2e-gcp-op-techpreview

Copy link
Contributor

@yuqi-zhang yuqi-zhang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

Given that the original card did not have QE label attached (and this mostly is covered by CI), I think we don't need to hold for pre-merge QE. Feel free to hold if that's not the case!

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Aug 7, 2024
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Aug 7, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cheesesashimi, yuqi-zhang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:
  • OWNERS [cheesesashimi,yuqi-zhang]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot
Copy link
Contributor

/retest-required

Remaining retests: 0 against base HEAD 217f607 and 2 for PR HEAD 9b501d9 in total

@cheesesashimi
Copy link
Member Author

The most recent failures here look like a transient infrastructure issue.

/retest-required

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Aug 8, 2024

@cheesesashimi: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-vsphere-ovn-zones 9b501d9 link false /test e2e-vsphere-ovn-zones

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-ci-robot
Copy link
Contributor

/retest-required

Remaining retests: 0 against base HEAD 9c7a000 and 1 for PR HEAD 9b501d9 in total

@openshift-merge-bot openshift-merge-bot bot merged commit 7b2a9ae into openshift:master Aug 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants