Bug 2042655: Alibaba hairpin #2919
Copy the files used to work around load balancer hairpin limitations in Azure so that they can be used in Alibaba Cloud.
|
@staebler: This pull request references Bugzilla bug 2042655, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker. 3 validation(s) were run on this bug
No GitHub users were found matching the public email listed for the QA contact in Bugzilla ([email protected]), skipping review request. |
|
Not sure why the Alibaba approvers didn't get auto-assigned here, so manually doing so: |
…o wait kube-apiserver rolls out on AlibabaCloud (openshift#5535)" This reverts commit 6e2d76b. With openshift/machine-config-operator#2919, it is no longer necessary to delay the teardown of the bootstrap control plane. The cluster will no longer get into an unusable state when there is only a single kube-apiserver pod running.
d66cea9 to
c4ab8ba
Compare
|
I have done two installs where the master nodes are all able to reach the ready state. This is with the reversion of the teardown delay added to the installer. The cluster install succeeds. I see the routes service running on the masters. |
templates/master/00-master/alibabacloud/files/opt-libexec-openshift-alibabacloud-routes-sh.yaml
|
@staebler This strategy seems like a solid approach. My only concern is maintaining this service file once Alibaba's LB feature fixes this problem. Is there a path to migration or removal of these rules? What is the downside for future maintenance? |
The service can continue to run even when the load balancer does support hairpinning. The service will detect that communication to the API is working and not add the iptables routes. However, we will need to maintain it indefinitely to continue to support 4.10 users that cannot do hairpinning. This is true even after Alibaba cloud makes the load balancer changes, as the configuration of the load balancers created in 4.10 will still not allow hairpinning. |
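The "detect, then skip the rules" behaviour described in this comment can be pictured with a minimal Python sketch. It is not the service's actual code: the comment does not say how reachability is detected, so the plain TCP connect used here is an assumption, and `api_reachable` and `should_add_redirect` are hypothetical names.

```python
import socket


def api_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout.

    Assumption: a successful TCP connect stands in for "communication to the
    API is working"; the real service may use a different check.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def should_add_redirect(lb_reachable: bool) -> bool:
    """The redirect rules are only needed when the load-balancer path is broken."""
    return not lb_reachable
```

With a load balancer that supports hairpinning, `api_reachable` would return True and `should_add_redirect` would return False, so no iptables rules are added and the service stays harmless.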
… of azure Update the hairpin files copied from Azure so that they reference Alibaba cloud instead. Add to the README for apiserver-watcher to describe the behavior on Alibaba Cloud.
c4ab8ba to
da4c446
Compare
|
FYI I tried building with the 2 PRs, and got 4 successful installations. LGTM, thanks!
The build is from https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-launch-gcp-modern/1484374603642966016.
./openshift-install 4.10.0-0.ci.test-2022-01-21-042054-ci-ln-2sq07qb-latest
The QE flexy-install job https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-common/job/Flexy-install/69306/ (SUCCESS, region: us-east-1)
The QE flexy-install job https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-common/job/Flexy-install/69309/ (SUCCESS, region: cn-beijing)
The QE flexy-install job https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-common/job/Flexy-install/69310/ (SUCCESS, region: cn-hangzhou)
The QE flexy-install job https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-common/job/Flexy-install/69321/ (SUCCESS, region: ap-southeast-1)
The QE flexy-install job https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-common/job/Flexy-install/69330/ (SUCCESS, region: cn-hongkong) |
|
/lgtm |
|
/assign @kikisdeliveryservice |
|
Successful installation on us-east-1. |
|
/approve |
kikisdeliveryservice
left a comment
Thanks for the excellent and thorough run through @jianli-wei
|
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: kikisdeliveryservice, kwoodson, mtulio, staebler |
@@ -0,0 +1,11 @@
name: openshift-alibabacloud-routes.service
enabled: false
I know this is copied over from the azure/gcp code but could you remind me why this is disabled by default? Or does it not matter since the path service triggers it?
|
@staebler: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
|
/retest-required Please review the full test history for this PR and help us cut down flakes. |
|
@staebler: Some pull requests linked via external trackers have merged: The following pull requests linked via external trackers have not merged:
These pull request must merge or be unlinked from the Bugzilla bug in order for it to move to the next state. Once unlinked, request a bug refresh with Bugzilla bug 2042655 has not been moved to the MODIFIED state. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
* azure: Check HyperVGenerations for instance type If an instance type that does not support HyperVGeneration version 1 then terraform returns an error mentioning there's support only for V1. Adding a check during install config to check for the versions supported by the instance type provided.
* Ensure removal of placement-groups during cluster destroy on AWS
* Adjust the startup order of httpd container Run the httpd container after the coreos-downloader completes to ensure that the kernel parameters can be added correctly. Signed-off-by: Zhou Hao <[email protected]>
* Add IP outputs for IBM terraform instances Add the IP addresses for IBM bootstrap and master nodes to allow collecting of logs from those nodes.
* Revert "Bug 2035757: cluster-bootstrap/alibaba: set tear-down-delay to wait kube-apiserver rolls out on AlibabaCloud (openshift#5535)" This reverts commit 6e2d76b. With openshift/machine-config-operator#2919, it is no longer necessary to delay the teardown of the bootstrap control plane. The cluster will no longer get into an unusable state when there is only a single kube-apiserver pod running.
* baremetal: networkConfig field now accepts yaml instead of string value The current patch allows the user to specify the content of the install-config networkConfig field directly as a yaml object. Content validation (for a generic yaml) is now carried on by the install config asset
* remove unused kube terraform provider
* vendor: update openshift/api to include some alibaba infra changes
* Update openshift/api to 6e0b1eb97188.
* Update kube modules to v0.23.0.
* Update controller-runtime to v0.11.0.
* Remove unused terraform-provider-kubernetes.
* hack: use go 1.17 for verifying codegen The hack/verify-codegen.sh script was using an image that included go 1.16. However, the updated k8s.io/json module calls the `(reflect.StructField) IsExported` function, which is new in go 1.17. Consequently, the script needs to be updated to use an image that include go 1.17 rather than 1.16.
* Bump Fedora CoreOS to 35.20220116.2.0
* Alibaba: fix system disk category of bootstrap Remove hard coding, support users can specify cloud_efficiency in regions that do not support cloud_essd disk category Signed-off-by: sunhui <[email protected]>
* Alibaba: fix creating public record being skipped If the user chooses a base domain for which there is no zone, creating the A record in the zone is simply skipped rather than raising an error. Signed-off-by: sunhui <[email protected]>
* Alibaba: fix VSwitch subnets overlap Fix the overlapping problem of the VSwitch subnet of the Nat gateway with the master node VSwitch subnets Signed-off-by: sunhui <[email protected]>
* remove unsupported options
* Add proxy for ironic-agent.service Avoid the issue that ironic agent image cannot be downloaded due to network proxy. Signed-off-by: Zhou Hao <[email protected]>
* Revert "remove unsupported options" This reverts commit 2684f8d.
* remove unsupported options for existing resources
* Alibaba: fix resource creation for existing network When users use an existing network, no longer create Nat gateways and EIPs Signed-off-by: sunhui <[email protected]>
* gen'd install configs yaml
* update alibaba for provider spec api changes This change updates the alibaba provider spec usage related to the vswitch, security groups, and resource group. The API for the provider spec is changing to use a discriminated union to capture the various methods for finding resources (by id, name, or tags). It also updates several machine api references to note the bifurcated nature of the api version between v1beta1 and v1.
* update vendor for latest Aliababa API changes This change is to update the vendor references to support the Alibaba resrouce reference updates to the API.
* remove validation related to unsupported options
* update validation for unsupported options
* openstack: Fix invalid-https-certificate detection Fix the reference to an unbound variable; avoid incrementing the invalid certificate counter in a subshell.
* Alibaba: fix support region list Remove unsupport region Nanjing and Dubai. Signed-off-by: sunhui <[email protected]>
* Bug 2043297: bump RHCOS 4.10 bootimage metadata These changes will update the RHCOS 4.10 bootimage metadata in the installer. This change includes fixes for the following BZs:
  Bug 2008521 - gcp-hostname service should correct invalid search entries in resolv.conf
  Bug 2043296 - Ignition fails when reusing existing statically-keyed LUKS volume
  Bug 2043721 - Installer bootstrap hosts using outdated kubelet containing bugs
  This change will also introduce artifacts for Aliyun, AWS GovCloud regions, and Nutanix.
  Changes generated with:
  $ cosa shell
  [coreos-assembler]$ plume cosa2stream --target data/data/coreos/rhcos.json --distro rhcos --no-signatures \
  --url https://rhcos-redirector.apps.art.xq1c.p1.openshiftapps.com/art/storage/releases aarch64=410.84.202201251203-0 \
  ppc64le=410.84.202201251004-0 s390x=410.84.202201251002-0 x86_64=410.84.202201251210-0
  Verification Steps:
  Install a new 4.10 cluster
  oc debug node/<node name> -- chroot /host rpm-ostree status
  Verify that the deployment version matches the version from this PR that matches the architecture you are testing on. (i.e. x86_64 should have version 410.84.202201251210-0)
* Bug 2045916: IBMCloud: Stop defaulting to dedicated storage profile Move off the dedicated storage machine profile, as it has shown to be less reliable for provisioning on IBM Cloud. Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2045916
* Alibaba: fix destroy not exist security group The destroyer should not error when it attempts to delete a security group that does not exist. Signed-off-by: sunhui <[email protected]>
* Alibaba: fix endpoint error in some regions Update sdk and terraform provier version, and add some endpoints of ECS service to fix endpoint error. Signed-off-by: sunhui <[email protected]>
* Alibaba: update vendor
* Revert "update validation for unsupported options" This reverts commit e5d628d.
* Revert "remove validation related to unsupported options" This reverts commit 20f8626.
* Alibaba: support internal publish strategy Support internal publish strategy for platform Alibaba Cloud Signed-off-by: sunhui <[email protected]>
* Alibaba: fix installer index panic Add NAT gateway validation to check the region whether support NAT gateway Signed-off-by: sunhui <[email protected]>
* remove validation for unsupported options
* Alibaba: fix destory exist private zone Should not destroy pre-configured alicloud DNS private zone Signed-off-by: sunhui <[email protected]>
* Alibaba: fix validation of resource group ID Fix resource group ID validation errors caused by pagination issues Signed-off-by: sunhui <[email protected]>
* update custom image ostype
* Bug 2047258: Read GovCloud from RHCOS stream AMIs for GovCloud regions have been added to the RHCOS stream. Remove validation requiring users to provide an AMI.
* Remove Caleb Boylan from core installer reviewers
Co-authored-by: rna-afk <[email protected]>
Co-authored-by: Joel Speed <[email protected]>
Co-authored-by: Zhou Hao <[email protected]>
Co-authored-by: Christopher J Schaefer <[email protected]>
Co-authored-by: staebler <[email protected]>
Co-authored-by: Andrea Fasano <[email protected]>
Co-authored-by: OpenShift Merge Robot <[email protected]>
Co-authored-by: Vadim Rutkovsky <[email protected]>
Co-authored-by: sunhui <[email protected]>
Co-authored-by: Jeff Nowicki <[email protected]>
Co-authored-by: Michael McCune <[email protected]>
Co-authored-by: Pierre Prinetti <[email protected]>
Co-authored-by: Huijing Hei <[email protected]>
Co-authored-by: patrickdillon <[email protected]>
Co-authored-by: Kiran Thyagaraja <[email protected]>
* azure: Check HyperVGenerations for instance type If an instance type that does not support HyperVGeneration version 1 then terraform returns an error mentioning there's support only for V1. Adding a check during install config to check for the versions supported by the instance type provided.
* Ensure removal of placement-groups during cluster destroy on AWS
* Adjust the startup order of httpd container Run the httpd container after the coreos-downloader completes to ensure that the kernel parameters can be added correctly. Signed-off-by: Zhou Hao <[email protected]>
* Add IP outputs for IBM terraform instances Add the IP addresses for IBM bootstrap and master nodes to allow collecting of logs from those nodes.
* Revert "Bug 2035757: cluster-bootstrap/alibaba: set tear-down-delay to wait kube-apiserver rolls out on AlibabaCloud (openshift#5535)" This reverts commit 6e2d76b. With openshift/machine-config-operator#2919, it is no longer necessary to delay the teardown of the bootstrap control plane. The cluster will no longer get into an unusable state when there is only a single kube-apiserver pod running.
* baremetal: networkConfig field now accepts yaml instead of string value The current patch allows the user to specify the content of the install-config networkConfig field directly as a yaml object. Content validation (for a generic yaml) is now carried on by the install config asset
* remove unused kube terraform provider
* vendor: update openshift/api to include some alibaba infra changes
* Update openshift/api to 6e0b1eb97188.
* Update kube modules to v0.23.0.
* Update controller-runtime to v0.11.0.
* Remove unused terraform-provider-kubernetes.
* hack: use go 1.17 for verifying codegen The hack/verify-codegen.sh script was using an image that included go 1.16. However, the updated k8s.io/json module calls the `(reflect.StructField) IsExported` function, which is new in go 1.17. Consequently, the script needs to be updated to use an image that include go 1.17 rather than 1.16.
* Bump Fedora CoreOS to 35.20220116.2.0
* Alibaba: fix system disk category of bootstrap Remove hard coding, support users can specify cloud_efficiency in regions that do not support cloud_essd disk category Signed-off-by: sunhui <[email protected]>
* Alibaba: fix creating public record being skipped If the user chooses a base domain for which there is no zone, creating the A record in the zone is simply skipped rather than raising an error. Signed-off-by: sunhui <[email protected]>
* Alibaba: fix VSwitch subnets overlap Fix the overlapping problem of the VSwitch subnet of the Nat gateway with the master node VSwitch subnets Signed-off-by: sunhui <[email protected]>
* remove unsupported options
* Add proxy for ironic-agent.service Avoid the issue that ironic agent image cannot be downloaded due to network proxy. Signed-off-by: Zhou Hao <[email protected]>
* Revert "remove unsupported options" This reverts commit 2684f8d.
* Azure Stack: Add UPI Instructions for internal CA Many Azure Stack environments use internal CAs. In these cases special steps are needed for a UPI install.
* remove unsupported options for existing resources
* Alibaba: fix resource creation for existing network When users use an existing network, no longer create Nat gateways and EIPs Signed-off-by: sunhui <[email protected]>
* gen'd install configs yaml
* update alibaba for provider spec api changes This change updates the alibaba provider spec usage related to the vswitch, security groups, and resource group. The API for the provider spec is changing to use a discriminated union to capture the various methods for finding resources (by id, name, or tags). It also updates several machine api references to note the bifurcated nature of the api version between v1beta1 and v1.
* update vendor for latest Aliababa API changes This change is to update the vendor references to support the Alibaba resrouce reference updates to the API.
* remove validation related to unsupported options
* update validation for unsupported options
* openstack: Fix invalid-https-certificate detection Fix the reference to an unbound variable; avoid incrementing the invalid certificate counter in a subshell.
* Alibaba: fix support region list Remove unsupport region Nanjing and Dubai. Signed-off-by: sunhui <[email protected]>
* Bug 2043297: bump RHCOS 4.10 bootimage metadata These changes will update the RHCOS 4.10 bootimage metadata in the installer. This change includes fixes for the following BZs:
  Bug 2008521 - gcp-hostname service should correct invalid search entries in resolv.conf
  Bug 2043296 - Ignition fails when reusing existing statically-keyed LUKS volume
  Bug 2043721 - Installer bootstrap hosts using outdated kubelet containing bugs
  This change will also introduce artifacts for Aliyun, AWS GovCloud regions, and Nutanix.
  Changes generated with:
  $ cosa shell
  [coreos-assembler]$ plume cosa2stream --target data/data/coreos/rhcos.json --distro rhcos --no-signatures \
  --url https://rhcos-redirector.apps.art.xq1c.p1.openshiftapps.com/art/storage/releases aarch64=410.84.202201251203-0 \
  ppc64le=410.84.202201251004-0 s390x=410.84.202201251002-0 x86_64=410.84.202201251210-0
  Verification Steps:
  Install a new 4.10 cluster
  oc debug node/<node name> -- chroot /host rpm-ostree status
  Verify that the deployment version matches the version from this PR that matches the architecture you are testing on. (i.e. x86_64 should have version 410.84.202201251210-0)
* Bug 2045916: IBMCloud: Stop defaulting to dedicated storage profile Move off the dedicated storage machine profile, as it has shown to be less reliable for provisioning on IBM Cloud. Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2045916
* Alibaba: fix destroy not exist security group The destroyer should not error when it attempts to delete a security group that does not exist. Signed-off-by: sunhui <[email protected]>
* Alibaba: fix endpoint error in some regions Update sdk and terraform provier version, and add some endpoints of ECS service to fix endpoint error. Signed-off-by: sunhui <[email protected]>
* Alibaba: update vendor
* Revert "update validation for unsupported options" This reverts commit e5d628d.
* Revert "remove validation related to unsupported options" This reverts commit 20f8626.
* Alibaba: support internal publish strategy Support internal publish strategy for platform Alibaba Cloud Signed-off-by: sunhui <[email protected]>
* Alibaba: fix installer index panic Add NAT gateway validation to check the region whether support NAT gateway Signed-off-by: sunhui <[email protected]>
* remove validation for unsupported options
* Alibaba: fix destory exist private zone Should not destroy pre-configured alicloud DNS private zone Signed-off-by: sunhui <[email protected]>
* Alibaba: fix validation of resource group ID Fix resource group ID validation errors caused by pagination issues Signed-off-by: sunhui <[email protected]>
* update custom image ostype
* Bug 2047258: Read GovCloud from RHCOS stream AMIs for GovCloud regions have been added to the RHCOS stream. Remove validation requiring users to provide an AMI.
* Remove Caleb Boylan from core installer reviewers
* aws: Remove non-public AWS regions from list of regions When creating the install-config, the installer displays regions of all partitions of AWS. Certain regions also need extra information for the validation to work and should not be taken as input since we only ask for the bare minimum amount of information to create the install config. The best approach here would be to only display all the public regions of AWS and allow for other regions after the install-config is created to allow for the user to add the extra information.
* openstack: Don't shortcut cloud scraping if quota is unavailable This results in an incorrect failure to validate network capabilities because network extensions weren't loaded.
Co-authored-by: rna-afk <[email protected]>
Co-authored-by: Joel Speed <[email protected]>
Co-authored-by: Zhou Hao <[email protected]>
Co-authored-by: Christopher J Schaefer <[email protected]>
Co-authored-by: staebler <[email protected]>
Co-authored-by: Andrea Fasano <[email protected]>
Co-authored-by: OpenShift Merge Robot <[email protected]>
Co-authored-by: Vadim Rutkovsky <[email protected]>
Co-authored-by: sunhui <[email protected]>
Co-authored-by: Jeff Nowicki <[email protected]>
Co-authored-by: patrickdillon <[email protected]>
Co-authored-by: Michael McCune <[email protected]>
Co-authored-by: Pierre Prinetti <[email protected]>
Co-authored-by: Huijing Hei <[email protected]>
Co-authored-by: Kiran Thyagaraja <[email protected]>
Co-authored-by: Matthew Booth <[email protected]>
In the vein of #2011, this PR adds an alibabacloud routes script that fixes hairpinning.
Background: Alibaba cloud hosts cannot hairpin back to themselves over a load balancer. Thus, we need to redirect traffic to the apiserver vip to ourselves via iptables. However, we should only do this when our local apiserver is running.
The apiserver-watcher drops a $VIP.up or a $VIP.down file, depending on the state of the apiserver. We then add or remove the iptables rules that short-circuit the load balancer.
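The marker-file mechanism can be illustrated with a small Python sketch. This is not the routes script from this PR (which is a shell script): `plan_iptables` and the chain name `demo-vips` are hypothetical, and the exact rule shape (a `REDIRECT` rule in the `nat` table keyed on the VIP as destination) is an assumption about how such a short-circuit is typically written. The function only builds the command lines and does not run them.

```python
from typing import Iterable, List


def plan_iptables(marker_files: Iterable[str], chain: str = "demo-vips") -> List[List[str]]:
    """Map marker file names such as "10.0.0.5.up" to iptables command lines.

    "<VIP>.up"   -> append a rule redirecting traffic for the VIP to this host.
    "<VIP>.down" -> delete that rule so traffic goes through the load balancer.
    Any other file name is ignored.
    """
    commands = []
    for name in sorted(marker_files):
        # Split on the last dot only, since the VIP itself contains dots.
        vip, _, state = name.rpartition(".")
        if state == "up":
            action = "-A"
        elif state == "down":
            action = "-D"
        else:
            continue
        commands.append(
            ["iptables", "-t", "nat", action, chain, "--dst", vip, "-j", "REDIRECT"]
        )
    return commands
```

For example, `plan_iptables(["10.0.0.5.up"])` yields a single append command for `10.0.0.5`. A real script would also need to avoid adding the same rule twice, for instance by checking with `iptables -C` first; that step is left out of the sketch.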
Like Azure, we don't need to do this for external traffic, only local clients.
How to verify it
Install on alibabacloud, ensure connections to the internal API load balancer are reliable - both when the local apiserver process is running and stopped.
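One way to put a number on "reliable" is to open many connections and count the failures. The Python sketch below is only an illustration: `count_failed_connections` is a hypothetical helper, and a bare TCP connect is an assumed stand-in for a real API request.

```python
import socket
import time


def count_failed_connections(
    host: str,
    port: int,
    attempts: int = 10,
    delay: float = 0.0,
    timeout: float = 2.0,
) -> int:
    """Open `attempts` TCP connections to host:port and return how many failed."""
    failures = 0
    for _ in range(attempts):
        try:
            with socket.create_connection((host, port), timeout=timeout):
                pass  # Connection opened; close it again straight away.
        except OSError:
            failures += 1
        time.sleep(delay)
    return failures
```

Run from a master node against the internal API load balancer address (the host name and port depend on the cluster and are not given here), the count should stay at zero both while the local apiserver is running and while it is stopped.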
Description for the changelog
Masters on Alibaba Cloud can now reliably connect to the apiserver service without encountering hairpin issues.