Bug 2035757: cluster-bootstrap/alibaba: set tear-down-delay to wait kube-apiserver rolls out on AlibabaCloud #5535
@mtulio: This pull request references Bugzilla bug 2035757, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker. 3 validation(s) were run on this bug
Here is my analysis of the steps that cause the bug:

1. One of the masters creates a kube-apiserver pod.
2. The cluster-bootstrap on the bootstrap node sees that all of the kube-apiserver pods are ready (even though only one has been created so far).
3. The control plane on the bootstrap node is torn down.
4. The node for the master running that kube-apiserver stops reporting heartbeats, since it cannot reach the API server via api-int.
5. The kube-apiserver pod stops behaving correctly, since its token is revoked.
6. Nobody can access the API server any more.

The solution proposed in this PR is to delay the time between steps (2) and (3), so that the kube-apiservers on the other nodes have a chance to start while the temporary control plane on the bootstrap node is still running. This is not a long-term solution, because it does not address issues that may arise during upgrades as new kube-apiserver revisions are rolled out: if there is ever a time when only one kube-apiserver pod is running, the cluster will get stuck with a non-responding API server.

BZ https://bugzilla.redhat.com/show_bug.cgi?id=2035757
@staebler I just changed it to use the bootstrap template. PTAL? Tests: for others (
staebler left a comment:
Looks good. Just a nit about code structure.
Co-authored-by: Matthew Staebler <[email protected]>
As we have no CI for this component yet, here are the results from a cluster installed with this version:

```
DEBUG Time elapsed per stage:
DEBUG cluster: 2m35s
DEBUG bootstrap: 1m17s
DEBUG Bootstrap Complete: 21m47s
DEBUG API: 2m31s
DEBUG Cluster Operators: 20m34s
INFO Time elapsed: 46m22s
```

```
NAME                    STATUS   ROLES    AGE   VERSION
mrbkas-f7vb6-master-0   Ready    master   56m   v1.23.0+60f5a1c
mrbkas-f7vb6-master-1   Ready    master   54m   v1.23.0+60f5a1c
mrbkas-f7vb6-master-2   Ready    master   55m   v1.23.0+60f5a1c
```

```
kube-apiserver-mrbkas-f7vb6-master-0   5/5   Running   0   20m
kube-apiserver-mrbkas-f7vb6-master-1   5/5   Running   0   12m
kube-apiserver-mrbkas-f7vb6-master-2   5/5   Running   0   14m
```

cc @kwoodson regarding CI PR:
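The kube-apiserver pod listing in the results above can also be checked mechanically, for example to confirm that every master ended up with a fully ready kube-apiserver. The helper below is a hypothetical sketch: it assumes the whitespace-separated column order shown (NAME, READY, STATUS, RESTARTS, AGE) and that the relevant pod names start with `kube-apiserver-`; any other pod whose name shares that prefix would also be counted.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// readyAPIServers counts kube-apiserver pods that are Running with all
// containers ready (READY column "n/n"). Header lines and other pods are
// skipped because their first column lacks the "kube-apiserver-" prefix.
func readyAPIServers(listing string) int {
	n := 0
	sc := bufio.NewScanner(strings.NewReader(listing))
	for sc.Scan() {
		f := strings.Fields(sc.Text())
		if len(f) < 3 || !strings.HasPrefix(f[0], "kube-apiserver-") {
			continue
		}
		ready := strings.SplitN(f[1], "/", 2)
		if len(ready) == 2 && ready[0] == ready[1] && f[2] == "Running" {
			n++
		}
	}
	return n
}

func main() {
	listing := `kube-apiserver-mrbkas-f7vb6-master-0 5/5 Running 0 20m
kube-apiserver-mrbkas-f7vb6-master-1 5/5 Running 0 12m
kube-apiserver-mrbkas-f7vb6-master-2 5/5 Running 0 14m`
	fmt.Println(readyAPIServers(listing)) // 3
}
```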
@staebler Not sure if the
/override ci/prow/e2e-aws-upgrade
@mtulio: mtulio unauthorized: /override is restricted to Repo administrators, approvers in top level OWNERS file.
@staebler @patrickdillon please skip it again.
@mtulio: All pull requests linked via external trackers have merged: Bugzilla bug 2035757 has been moved to the MODIFIED state.
…o wait kube-apiserver rolls out on AlibabaCloud (openshift#5535)" This reverts commit 6e2d76b. With openshift/machine-config-operator#2919, it is no longer necessary to delay the teardown of the bootstrap control plane. The cluster will no longer get into an unusable state when there is only a single kube-apiserver pod running.
* azure: Check HyperVGenerations for instance type If an instance type that does not support HyperVGeneration version 1 then terraform returns an error mentioning there's support only for V1. Adding a check during install config to check for the versions supported by the instance type provided. * Ensure removal of placement-groups during cluster destroy on AWS * Adjust the startup order of httpd container Run the httpd container after the coreos-downloader completes to ensure that the kernel parameters can be added correctly. Signed-off-by: Zhou Hao <[email protected]> * Add IP outputs for IBM terraform instances Add the IP addresses for IBM bootstrap and master nodes to allow collecting of logs from those nodes. * Revert "Bug 2035757: cluster-bootstrap/alibaba: set tear-down-delay to wait kube-apiserver rolls out on AlibabaCloud (openshift#5535)" This reverts commit 6e2d76b. With openshift/machine-config-operator#2919, it is no longer necessary to delay the teardown of the bootstrap control plane. The cluster will no longer get into an unusable state when there is only a single kube-apiserver pod running. * baremetal: networkConfig field now accepts yaml instead of string value The current patch allows the user to specify the content of the install-config networkConfig field directly as a yaml object. Content validation (for a generic yaml) is now carried on by the install config asset * remove unused kube terraform provider * vendor: update openshift/api to include some alibaba infra changes * Update openshift/api to 6e0b1eb97188. * Update kube modules to v0.23.0. * Update controller-runtime to v0.11.0. * Remove unused terraform-provider-kubernetes. * hack: use go 1.17 for verifying codegen The hack/verify-codegen.sh script was using an image that included go 1.16. However, the updated k8s.io/json module calls the `(reflect.StructField) IsExported` function, which is new in go 1.17. 
Consequently, the script needs to be updated to use an image that include go 1.17 rather than 1.16. * Bump Fedora CoreOS to 35.20220116.2.0 * Alibaba: fix system disk category of bootstrap Remove hard coding, support users can specify cloud_efficiency in regions that do not support cloud_essd disk category Signed-off-by: sunhui <[email protected]> * Alibaba: fix creating public record being skipped If the user chooses a base domain for which there is no zone, creating the A record in the zone is simply skipped rather than raising an error. Signed-off-by: sunhui <[email protected]> * Alibaba: fix VSwitch subnets overlap Fix the overlapping problem of the VSwitch subnet of the Nat gateway with the master node VSwitch subnets Signed-off-by: sunhui <[email protected]> * remove unsupported options * Add proxy for ironic-agent.service Avoid the issue that ironic agent image cannot be downloaded due to network proxy. Signed-off-by: Zhou Hao <[email protected]> * Revert "remove unsupported options" This reverts commit 2684f8d. * remove unsupported options for existing resources * Alibaba: fix resource creation for existing network When users use an existing network, no longer create Nat gateways and EIPs Signed-off-by: sunhui <[email protected]> * gen'd install configs yaml * update alibaba for provider spec api changes This change updates the alibaba provider spec usage related to the vswitch, security groups, and resource group. The API for the provider spec is changing to use a discriminated union to capture the various methods for finding resources (by id, name, or tags). It also updates several machine api references to note the bifurcated nature of the api version between v1beta1 and v1. * update vendor for latest Aliababa API changes This change is to update the vendor references to support the Alibaba resrouce reference updates to the API. 
* remove validation related to unsupported options * update validation for unsupported options * openstack: Fix invalid-https-certificate detection Fix the reference to an unbound variable; avoid incrementing the invalid certificate counter in a subshell. * Alibaba: fix support region list Remove unsupport region Nanjing and Dubai. Signed-off-by: sunhui <[email protected]> * Bug 2043297: bump RHCOS 4.10 bootimage metadata These changes will update the RHCOS 4.10 bootimage metadata in the installer. This change includes fixes for the following BZs: Bug 2008521 - gcp-hostname service should correct invalid search entries in resolv.conf Bug 2043296 - Ignition fails when reusing existing statically-keyed LUKS volume Bug 2043721 - Installer bootstrap hosts using outdated kubelet containing bugs This change will also introduce artifacts for for Aliyun, AWS GovCloud regions, and Nutanix. Changes generated with: $ cosa shell [coreos-assembler]$ plume cosa2stream --target data/data/coreos/rhcos.json --distro rhcos --no-signatures \ --url https://rhcos-redirector.apps.art.xq1c.p1.openshiftapps.com/art/storage/releases aarch64=410.84.202201251203-0 \ ppc64le=410.84.202201251004-0 s390x=410.84.202201251002-0 x86_64=410.84.202201251210-0 Verification Steps: Install a new 4.10 cluster oc debug node/<node name> -- chroot /host rpm-ostree status Verify that the deployment version matches the version from this PR that matches the architecture you are testing on. (i.e. x86_64 should have version 410.84.202201251210-0) * Bug 2045916: IBMCloud: Stop defaulting to dedicated storage profile Move off the dedicated storage machine profile, as it has shown to be less reliable for provisioning on IBM Cloud. Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2045916 * Alibaba: fix destroy not exist security group The destroyer should not error when it attempts to delete a security group that does not exist. 
Signed-off-by: sunhui <[email protected]> * Alibaba: fix endpoint error in some regions Update sdk and terraform provier version, and add some endpoints of ECS service to fix endpoint error. Signed-off-by: sunhui <[email protected]> * Alibaba: update vendor * Revert "update validation for unsupported options" This reverts commit e5d628d. * Revert "remove validation related to unsupported options" This reverts commit 20f8626. * Alibaba: support internal publish strategy Support internal publish strategy for platform Alibaba Cloud Signed-off-by: sunhui <[email protected]> å * Alibaba: fix installer index panic Add NAT gateway validation to check the region whether support NAT gateway Signed-off-by: sunhui <[email protected]> * remove validation for unsupported options * Alibaba: fix destory exist private zone Should not destroy pre-configured alicloud DNS private zone Signed-off-by: sunhui <[email protected]> * Alibaba: fix validation of resource group ID Fix resource group ID validation errors caused by pagination issues Signed-off-by: sunhui <[email protected]> * update custom image ostype * Bug 2047258: Read GovCloud from RHCOS stream AMIs for GovCloud regions have been added to the RHCOS stream. Remove validation requiring users to provide an AMI. 
* Remove Caleb Boylan from core installer reviewers Co-authored-by: rna-afk <[email protected]> Co-authored-by: Joel Speed <[email protected]> Co-authored-by: Zhou Hao <[email protected]> Co-authored-by: Christopher J Schaefer <[email protected]> Co-authored-by: staebler <[email protected]> Co-authored-by: Andrea Fasano <[email protected]> Co-authored-by: OpenShift Merge Robot <[email protected]> Co-authored-by: Vadim Rutkovsky <[email protected]> Co-authored-by: sunhui <[email protected]> Co-authored-by: Jeff Nowicki <[email protected]> Co-authored-by: Michael McCune <[email protected]> Co-authored-by: Pierre Prinetti <[email protected]> Co-authored-by: Huijing Hei <[email protected]> Co-authored-by: patrickdillon <[email protected]> Co-authored-by: Kiran Thyagaraja <[email protected]>
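The Go 1.17 requirement in the `hack: use go 1.17 for verifying codegen` commit above comes from a standard-library addition: `reflect.StructField.IsExported` first appeared in Go 1.17, so any code that calls it fails to compile with a Go 1.16 toolchain. The minimal example below (not taken from k8s.io/json; the `example` type and `exportedFields` helper are illustrative) uses that method:

```go
package main

import (
	"fmt"
	"reflect"
)

// example has one exported and one unexported field.
type example struct {
	Exported   int
	unexported int
}

// exportedFields lists the exported field names of a struct value using
// reflect.StructField.IsExported, which requires Go 1.17 or later.
func exportedFields(v interface{}) []string {
	t := reflect.TypeOf(v)
	var names []string
	for i := 0; i < t.NumField(); i++ {
		if f := t.Field(i); f.IsExported() {
			names = append(names, f.Name)
		}
	}
	return names
}

func main() {
	fmt.Println(exportedFields(example{})) // [Exported]
}
```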
* azure: Check HyperVGenerations for instance type If an instance type that does not support HyperVGeneration version 1 then terraform returns an error mentioning there's support only for V1. Adding a check during install config to check for the versions supported by the instance type provided. * Ensure removal of placement-groups during cluster destroy on AWS * Adjust the startup order of httpd container Run the httpd container after the coreos-downloader completes to ensure that the kernel parameters can be added correctly. Signed-off-by: Zhou Hao <[email protected]> * Add IP outputs for IBM terraform instances Add the IP addresses for IBM bootstrap and master nodes to allow collecting of logs from those nodes. * Revert "Bug 2035757: cluster-bootstrap/alibaba: set tear-down-delay to wait kube-apiserver rolls out on AlibabaCloud (openshift#5535)" This reverts commit 6e2d76b. With openshift/machine-config-operator#2919, it is no longer necessary to delay the teardown of the bootstrap control plane. The cluster will no longer get into an unusable state when there is only a single kube-apiserver pod running. * baremetal: networkConfig field now accepts yaml instead of string value The current patch allows the user to specify the content of the install-config networkConfig field directly as a yaml object. Content validation (for a generic yaml) is now carried on by the install config asset * remove unused kube terraform provider * vendor: update openshift/api to include some alibaba infra changes * Update openshift/api to 6e0b1eb97188. * Update kube modules to v0.23.0. * Update controller-runtime to v0.11.0. * Remove unused terraform-provider-kubernetes. * hack: use go 1.17 for verifying codegen The hack/verify-codegen.sh script was using an image that included go 1.16. However, the updated k8s.io/json module calls the `(reflect.StructField) IsExported` function, which is new in go 1.17. 
Consequently, the script needs to be updated to use an image that include go 1.17 rather than 1.16. * Bump Fedora CoreOS to 35.20220116.2.0 * Alibaba: fix system disk category of bootstrap Remove hard coding, support users can specify cloud_efficiency in regions that do not support cloud_essd disk category Signed-off-by: sunhui <[email protected]> * Alibaba: fix creating public record being skipped If the user chooses a base domain for which there is no zone, creating the A record in the zone is simply skipped rather than raising an error. Signed-off-by: sunhui <[email protected]> * Alibaba: fix VSwitch subnets overlap Fix the overlapping problem of the VSwitch subnet of the Nat gateway with the master node VSwitch subnets Signed-off-by: sunhui <[email protected]> * remove unsupported options * Add proxy for ironic-agent.service Avoid the issue that ironic agent image cannot be downloaded due to network proxy. Signed-off-by: Zhou Hao <[email protected]> * Revert "remove unsupported options" This reverts commit 2684f8d. * Azure Stack: Add UPI Instructions for internal CA Many Azure Stack environments use internal CAs. In these cases special steps are needed for a UPI install. * remove unsupported options for existing resources * Alibaba: fix resource creation for existing network When users use an existing network, no longer create Nat gateways and EIPs Signed-off-by: sunhui <[email protected]> * gen'd install configs yaml * update alibaba for provider spec api changes This change updates the alibaba provider spec usage related to the vswitch, security groups, and resource group. The API for the provider spec is changing to use a discriminated union to capture the various methods for finding resources (by id, name, or tags). It also updates several machine api references to note the bifurcated nature of the api version between v1beta1 and v1. 
* update vendor for latest Aliababa API changes This change is to update the vendor references to support the Alibaba resrouce reference updates to the API. * remove validation related to unsupported options * update validation for unsupported options * openstack: Fix invalid-https-certificate detection Fix the reference to an unbound variable; avoid incrementing the invalid certificate counter in a subshell. * Alibaba: fix support region list Remove unsupport region Nanjing and Dubai. Signed-off-by: sunhui <[email protected]> * Bug 2043297: bump RHCOS 4.10 bootimage metadata These changes will update the RHCOS 4.10 bootimage metadata in the installer. This change includes fixes for the following BZs: Bug 2008521 - gcp-hostname service should correct invalid search entries in resolv.conf Bug 2043296 - Ignition fails when reusing existing statically-keyed LUKS volume Bug 2043721 - Installer bootstrap hosts using outdated kubelet containing bugs This change will also introduce artifacts for for Aliyun, AWS GovCloud regions, and Nutanix. Changes generated with: $ cosa shell [coreos-assembler]$ plume cosa2stream --target data/data/coreos/rhcos.json --distro rhcos --no-signatures \ --url https://rhcos-redirector.apps.art.xq1c.p1.openshiftapps.com/art/storage/releases aarch64=410.84.202201251203-0 \ ppc64le=410.84.202201251004-0 s390x=410.84.202201251002-0 x86_64=410.84.202201251210-0 Verification Steps: Install a new 4.10 cluster oc debug node/<node name> -- chroot /host rpm-ostree status Verify that the deployment version matches the version from this PR that matches the architecture you are testing on. (i.e. x86_64 should have version 410.84.202201251210-0) * Bug 2045916: IBMCloud: Stop defaulting to dedicated storage profile Move off the dedicated storage machine profile, as it has shown to be less reliable for provisioning on IBM Cloud. 
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2045916 * Alibaba: fix destroy not exist security group The destroyer should not error when it attempts to delete a security group that does not exist. Signed-off-by: sunhui <[email protected]> * Alibaba: fix endpoint error in some regions Update sdk and terraform provier version, and add some endpoints of ECS service to fix endpoint error. Signed-off-by: sunhui <[email protected]> * Alibaba: update vendor * Revert "update validation for unsupported options" This reverts commit e5d628d. * Revert "remove validation related to unsupported options" This reverts commit 20f8626. * Alibaba: support internal publish strategy Support internal publish strategy for platform Alibaba Cloud Signed-off-by: sunhui <[email protected]> å * Alibaba: fix installer index panic Add NAT gateway validation to check the region whether support NAT gateway Signed-off-by: sunhui <[email protected]> * remove validation for unsupported options * Alibaba: fix destory exist private zone Should not destroy pre-configured alicloud DNS private zone Signed-off-by: sunhui <[email protected]> * Alibaba: fix validation of resource group ID Fix resource group ID validation errors caused by pagination issues Signed-off-by: sunhui <[email protected]> * update custom image ostype * Bug 2047258: Read GovCloud from RHCOS stream AMIs for GovCloud regions have been added to the RHCOS stream. Remove validation requiring users to provide an AMI. * Remove Caleb Boylan from core installer reviewers * aws: Remove non-public AWS regions from list of regions When creating the install-config, the installer displays regions of all partitions of AWS. Certain regions also need extra information for the validation to work and should not be taken as input since we only ask for the bare minimum amount of information to create the install config. 
The best approach here would be to only display all the public regions of AWS and allow for other regions after the install-config is created to allow for the user to add the extra information. * openstack: Don't shortcut cloud scraping if quota is unavailable This results in an incorrect failure to validate network capabilities because network extensions weren't loaded. Co-authored-by: rna-afk <[email protected]> Co-authored-by: Joel Speed <[email protected]> Co-authored-by: Zhou Hao <[email protected]> Co-authored-by: Christopher J Schaefer <[email protected]> Co-authored-by: staebler <[email protected]> Co-authored-by: Andrea Fasano <[email protected]> Co-authored-by: OpenShift Merge Robot <[email protected]> Co-authored-by: Vadim Rutkovsky <[email protected]> Co-authored-by: sunhui <[email protected]> Co-authored-by: Jeff Nowicki <[email protected]> Co-authored-by: patrickdillon <[email protected]> Co-authored-by: Michael McCune <[email protected]> Co-authored-by: Pierre Prinetti <[email protected]> Co-authored-by: Huijing Hei <[email protected]> Co-authored-by: Kiran Thyagaraja <[email protected]> Co-authored-by: Matthew Booth <[email protected]>
Setting the `--tear-down-delay` flag of `cluster-bootstrap` to `10m`, so that it waits before tearing down and gives two kube-apiserver pods time to become available before the bootstrap finishes, on AlibabaCloud only. The default value will be set to `0`.

This is caused by bootkube (`cluster-bootstrap`/`kube-apiserver`) finishing prematurely when the kube-apiserver has been scheduled on only one master, combined with the SLB limitation that an SLB instance cannot be accessed by its own backend servers [1].

[1] https://www.alibabacloud.com/help/en/doc-detail/55206.htm

This is an investigation of the bug https://bugzilla.redhat.com/show_bug.cgi?id=2035757