Skip to content

Conversation

@Prashanth684
Copy link
Contributor

@Prashanth684 Prashanth684 commented Sep 6, 2022

In order to support provisioning of arm64 bootimages, these need to be
created using the image gallery resource as they are not supported
through managed images. For this purpose, it would be good to unify the
implementation of x86 and arm64 to use image galleries as Microsoft have
said that it will be the supported way of creating bootimages going forward
for new instance types of new architectures.

This implementation would create one image gallery with two images: a
gen1 and a gen2 image which we would then pick appropriately based on
the VM generation.

xref: https://issues.redhat.com/browse/ARMOCP-16

@openshift-ci openshift-ci bot requested review from jhixson74 and m1kola September 6, 2022 16:24
@Prashanth684
Copy link
Contributor Author

/cc @patrickdillon @jstuever

@Prashanth684
Copy link
Contributor Author

/test e2e-azure

@patrickdillon
Copy link
Contributor

/cc @r4f4

@openshift-ci openshift-ci bot requested a review from r4f4 September 6, 2022 22:19
Copy link
Contributor

@patrickdillon patrickdillon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general this looks good. I can see that you're working through some errors in CI. The issue is not obvious, so feel free to provide an update or reach out if you want to discuss what's going on there.

I left a comment about setting the version. I think this is preferable, but I would not say i is a hard requirement. We can discuss this idea further. The first suggestion, in terms of piping the release through the codebase may be preferable to parsing the url, but may also be a bit more hairy to implement.

@Prashanth684
Copy link
Contributor Author

In general this looks good. I can see that you're working through some errors in CI. The issue is not obvious, so feel free to provide an update or reach out if you want to discuss what's going on there.

I left a comment about setting the version. I think this is preferable, but I would not say i is a hard requirement. We can discuss this idea further. The first suggestion, in terms of piping the release through the codebase may be preferable to parsing the url, but may also be a bit more hairy to implement.

yes the CI results concern me a bit..the current run seems to be progressing fine but the failure was essentially:

time="2022-09-06T18:12:04Z" level=error msg="Error: creating/updating Shared Image: (Image Name \"ci-op-2h3vjhpr-2dc90-7tk4z\" / Gallery Name \"gallery_ci_op_2h3vjhpr_2dc90_7tk4z\" / Resource Group \"ci-op-2h3vjhpr-2dc90-7tk4z-rg\"): compute.GalleryImagesClient#CreateOrUpdate: Failure sending request: StatusCode=404 -- Original Error: Code=\"ParentResourceNotFound\" Message=\"Can not perform requested operation on nested resource. Parent resource 'gallery_ci_op_2h3vjhpr_2dc90_7tk4z' not found.\""

so the flow here is image gallery -> image definition(inside the gallery) -> image version (inside the image definition).

the error above indicates that the image gallery could not be found, but in the previous few lines i see:

time="2022-09-06T18:08:55Z" level=debug msg="azurerm_shared_image_gallery.sig: Creation complete after 34s [id=/subscriptions/d38f1e38-4bed-438e-b227-833f997adf6a/resourceGroups/ci-op-2h3vjhpr-2dc90-7tk4z-rg/providers/Microsoft.Compute/galleries/gallery_ci_op_2h3vjhpr_2dc90_7tk4z]"
time="2022-09-06T18:08:55Z" level=debug msg="azurerm_shared_image.clustergen2: Creating..."
time="2022-09-06T18:08:55Z" level=debug msg="azurerm_shared_image.cluster: Creating..."

so from a timeline perspective it seems ok. I'm still wondering if we need these resources to each have a depends_on argument just to eliminate the possibility of a race?

@patrickdillon
Copy link
Contributor

so the flow here is image gallery -> image definition(inside the gallery) -> image version (inside the image definition).

the error above indicates that the image gallery could not be found, but in the previous few lines i see:

time="2022-09-06T18:08:55Z" level=debug msg="azurerm_shared_image_gallery.sig: Creation complete after 34s [id=/subscriptions/d38f1e38-4bed-438e-b227-833f997adf6a/resourceGroups/ci-op-2h3vjhpr-2dc90-7tk4z-rg/providers/Microsoft.Compute/galleries/gallery_ci_op_2h3vjhpr_2dc90_7tk4z]"
time="2022-09-06T18:08:55Z" level=debug msg="azurerm_shared_image.clustergen2: Creating..."
time="2022-09-06T18:08:55Z" level=debug msg="azurerm_shared_image.cluster: Creating..."

so from a timeline perspective it seems ok. I'm still wondering if we need these resources to each have a depends_on argument just to eliminate the possibility of a race?

This error is an example of a common class of errors we see in the upstream terraform providers. The terraform providers should handle these dependencies, but frequently there are bugs. We've had two similar bugs reported in the past week.

Adding a depends_on may be a sufficient work around for this. It would also be good to open an issue in the upstream provider hashicorp/terraform-provider-azurerm. If the depends_on workaround relieves the symptoms, you could also open an installer bug for us to track the upstream work.

Digging into the upstream code: the creation of the image gallery is here:
https://github.com/hashicorp/terraform-provider-azurerm/blob/main/internal/services/compute/shared_image_gallery_resource.go#L66

and the image itself is here:
https://github.com/hashicorp/terraform-provider-azurerm/blob/main/internal/services/compute/shared_image_resource.go#L245

@Prashanth684
Copy link
Contributor Author

/retest-required

@Prashanth684
Copy link
Contributor Author

/test e2e-azure-shared-vpc

@Prashanth684
Copy link
Contributor Author

/test e2e-azure-shared-vpc
/test e2e-azurestack

@Prashanth684
Copy link
Contributor Author

/test e2e-azure-shared-vpc

@Prashanth684 Prashanth684 force-pushed the azure-imagegallery branch 2 times, most recently from f72486c to a333b21 Compare September 12, 2022 23:09
@Prashanth684
Copy link
Contributor Author

/test e2e-azure-ovn-resourcegroup

@Prashanth684 Prashanth684 force-pushed the azure-imagegallery branch 3 times, most recently from 0a8560a to 553d13e Compare September 14, 2022 17:28
@openshift-ci-robot
Copy link
Contributor

/retest-required

Remaining retests: 0 against base HEAD 3379601 and 2 for PR HEAD 7994d1c in total

@openshift-ci-robot
Copy link
Contributor

/retest-required

Remaining retests: 0 against base HEAD 53727e6 and 1 for PR HEAD 7994d1c in total

@patrickdillon
Copy link
Contributor

/override ci/prow/e2e-aws-ovn
ci/prow/e2e-gcp-ovn
ci/prow/e2e-vsphere-ovn

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Oct 6, 2022

@patrickdillon: Overrode contexts on behalf of patrickdillon: ci/prow/e2e-aws-ovn

Details

In response to this:

/override ci/prow/e2e-aws-ovn
ci/prow/e2e-gcp-ovn
ci/prow/e2e-vsphere-ovn

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot
Copy link
Contributor

/retest-required

Remaining retests: 0 against base HEAD c100594 and 0 for PR HEAD 7994d1c in total

@r4f4
Copy link
Contributor

r4f4 commented Oct 7, 2022

/override ci/prow/e2e-vsphere-ovn

@r4f4
Copy link
Contributor

r4f4 commented Oct 7, 2022

/override ci/prow/e2e-gcp-ovn

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Oct 7, 2022

@r4f4: Overrode contexts on behalf of r4f4: ci/prow/e2e-vsphere-ovn

Details

In response to this:

/override ci/prow/e2e-vsphere-ovn

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Oct 7, 2022

@r4f4: Overrode contexts on behalf of r4f4: ci/prow/e2e-gcp-ovn

Details

In response to this:

/override ci/prow/e2e-gcp-ovn

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@r4f4
Copy link
Contributor

r4f4 commented Oct 7, 2022

/override ci/prow/e2e-azure-ovn

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Oct 7, 2022

@r4f4: Overrode contexts on behalf of r4f4: ci/prow/e2e-azure-ovn

Details

In response to this:

/override ci/prow/e2e-azure-ovn

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@r4f4
Copy link
Contributor

r4f4 commented Oct 7, 2022

/override ci/prow/e2e-gcp-ovn

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Oct 7, 2022

@r4f4: Overrode contexts on behalf of r4f4: ci/prow/e2e-gcp-ovn

Details

In response to this:

/override ci/prow/e2e-gcp-ovn

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot
Copy link
Contributor

/hold

Revision 7994d1c was retested 3 times: holding

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 7, 2022
@r4f4
Copy link
Contributor

r4f4 commented Oct 7, 2022

/skip

@r4f4
Copy link
Contributor

r4f4 commented Oct 7, 2022

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 7, 2022
@r4f4
Copy link
Contributor

r4f4 commented Oct 7, 2022

/skip

@r4f4
Copy link
Contributor

r4f4 commented Oct 7, 2022

/override ci/prow/e2e-aws-ovn ci/prow/okd-scos-images ci/prow/okd-scos-unit ci/prow/okd-scos-verify-codegen ci/prow/e2e-agent-compact

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Oct 7, 2022

@r4f4: Overrode contexts on behalf of r4f4: ci/prow/e2e-agent-compact, ci/prow/e2e-aws-ovn, ci/prow/okd-scos-images, ci/prow/okd-scos-unit, ci/prow/okd-scos-verify-codegen

Details

In response to this:

/override ci/prow/e2e-aws-ovn ci/prow/okd-scos-images ci/prow/okd-scos-unit ci/prow/okd-scos-verify-codegen ci/prow/e2e-agent-compact

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-merge-robot openshift-merge-robot merged commit 7b739cd into openshift:master Oct 7, 2022
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Oct 7, 2022

@Prashanth684: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-ibmcloud 0b3e91e88a5b8b34c399f20dd21b51a9f776b06f link false /test e2e-ibmcloud
ci/prow/e2e-vsphere 0b3e91e88a5b8b34c399f20dd21b51a9f776b06f link true /test e2e-vsphere
ci/prow/e2e-azure-ovn-shared-vpc 7994d1c link false /test e2e-azure-ovn-shared-vpc
ci/prow/e2e-openstack 7994d1c link false /test e2e-openstack
ci/prow/e2e-azurestack 7994d1c link false /test e2e-azurestack
ci/prow/okd-e2e-aws-upgrade 7994d1c link false /test okd-e2e-aws-upgrade
ci/prow/e2e-libvirt 7994d1c link false /test e2e-libvirt
ci/prow/okd-e2e-aws-ovn 7994d1c link false /test okd-e2e-aws-ovn
ci/prow/e2e-agent-mce 7994d1c link false /test e2e-agent-mce
ci/prow/e2e-metal-ipi 7994d1c link false /test e2e-metal-ipi
ci/prow/e2e-ibmcloud-ovn 7994d1c link false /test e2e-ibmcloud-ovn
ci/prow/e2e-azure-ovn-resourcegroup 7994d1c link false /test e2e-azure-ovn-resourcegroup

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

Prashanth684 added a commit to Prashanth684/release that referenced this pull request Oct 11, 2022
…resscontroller

With the bootimage creation for Azure changing from managed images to
image gallery with this installer PR: openshift/installer#6304,
it is better to infer the image id path from existing machines as it has a version and
gallery name which are not easy to derive.
openshift-merge-robot pushed a commit to openshift/release that referenced this pull request Oct 14, 2022
…resscontroller (#33067)

With the bootimage creation for Azure changing from managed images to
image gallery with this installer PR: openshift/installer#6304,
it is better to infer the image id path from existing machines as it has a version and
gallery name which are not easy to derive.
dlom referenced this pull request in openshift/hive Dec 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants