Skip to content

Comments

AlibabaCloud: adding alibaba cloud platform#926

Merged
openshift-merge-robot merged 2 commits intoopenshift:masterfrom
kwoodson:alibaba_test
Dec 15, 2021
Merged

AlibabaCloud: adding alibaba cloud platform#926
openshift-merge-robot merged 2 commits intoopenshift:masterfrom
kwoodson:alibaba_test

Conversation

@kwoodson
Copy link

@kwoodson kwoodson commented Oct 5, 2021

The initial attempt at creating the necessary bits for Alibaba.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 5, 2021
@kwoodson kwoodson marked this pull request as draft October 5, 2021 21:19
@kwoodson
Copy link
Author

kwoodson commented Oct 8, 2021

@elmiko @JoelSpeed
In order to make the machinesets properly create their instances I had to provide the following additional values:

          vpcId: vpc-0xie7clcw7syyi01yx07v
          securityGroupId: sg-0xi3y8ibemuq94p7s58f
          vSwitchId: vsw-0xidp4dro02ck884pfpbc
          vpcId: vpc-0xie7clcw7syyi01yx07v
          ioOptimized: true
          ramRoleName: test-fx2g5-role-worker
          keyPairName: kwoodson

Some of these values will be easy to populate based upon the or hard code for ioOptimized (might be that we do not need this one). The question I have is that for some of these values we need to query them or the terraform generation will need to pass them along to the manifests upon creation. How does this process work? I will attempt to see what other clouds have done to solve this same problem.

The other question I have is the credentialsSecret.

          credentialsSecret:
            name: alibabacloud-credentials

If the installer no longer populates the credentials, who/where does that secret come from? I am manually creating it today and the nodes successfully create but without them the API calls fail.

Machines are created successfully.

test-fx2g5-worker-us-east-1a-72q7n   Running   ecs.g6.large   us-east-1   us-east-1a   116m
test-fx2g5-worker-us-east-1b-hpggv   Running   ecs.g6.large   us-east-1   us-east-1b   114m
test-fx2g5-worker-us-east-1b-wt289   Running   ecs.g6.large   us-east-1   us-east-1b   114m

@JoelSpeed
Copy link
Contributor

How does this process work? I will attempt to see what other clouds have done to solve this same problem.

Normally, this kind of cluster specific information is injected into the machinesets by the installer. How this work differs slightly for each platform IIRC. On most platforms I think we don't rely on the explicit ID but instead the names and then lookup the entity based on the name, which does make things slightly easier. Probably best to talk to the installer team for this one as they'll have a better idea about what values can and can't be injected when the machineset manifests are generated

@openshift-ci openshift-ci bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 25, 2021
@openshift-ci openshift-ci bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 25, 2021
@openshift-ci openshift-ci bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 27, 2021
"fmt"
"testing"

alibaba "github.com/AliyunContainerService/cluster-api-provider-alibabacloud/pkg/apis/alibabacloudprovider/v1beta1"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would have to be imported from openshift, ideally, openshift/api to avoid dependencies on individual provider repos

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other providers have a mix of configuration. Some (gcp, vsphere) use their openshift respositories and others use their upstream sigs.k8s.io/cluster-api-provider- (aws, azure). I followed the 2nd one and then replaced this repository in the go.mod as the others do.

I see that some have moved here https://github.com/openshift/api/tree/master/machine/v1beta1

What does the process look like moving them into openshift/api? What happens to the openshift/cluster-api-provider-alibaba? What would you like me to do for this one?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have rebased and see that the work has moved to openshift/api. I will open a PR there and fix this when that merges.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will wait until this PR merges:
openshift/api#1045

Comment on lines +120 to +151
case configv1.AlibabaCloudPlatformType:
return images.ClusterAPIControllerAlibaba, nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does Alibaba support spot instances? If not, this doesn't need to be here

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spot instances are known as "pay-as-you-go" and are a supported type. I am not sure our intention on using them or if this is required. I'll defer to your opinion @JoelSpeed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's already supported within the machine provider spec for alibaba, have they implemented a termination handler? Ref https://github.com/openshift/enhancements/blob/master/enhancements/machine-api/spot-instances.md#termination-handler-design

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have implemented a termination hanlder. I will submit a PR later.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kwoodson @JoelSpeed Tomorrow I will submit a new PR to openshift/cluster-api-provider-alibaba

Comment on lines 90 to 89
// defaultAlibabaOSDiskOSType = "Linux"
// defaultAlibabaOSDiskStorageType = "Premium_LRS"
// AlibabaMaxDiskSizeGB = 32768
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Planning to also default these?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will investigate these.

@openshift-ci openshift-ci bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 28, 2021
"runtime"
"strings"

alibaba "github.com/AliyunContainerService/cluster-api-provider-alibabacloud/pkg/apis/alibabacloudprovider/v1beta1"
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will remove.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have opened up a PR here openshift/api#1045 and will remove once it merges.

@openshift-ci openshift-ci bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 30, 2021
@kwoodson kwoodson changed the title [WIP] AlibabaCloud: adding alibaba cloud platform AlibabaCloud: adding alibaba cloud platform Nov 12, 2021
@openshift-ci openshift-ci bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 12, 2021
@kwoodson kwoodson marked this pull request as ready for review November 12, 2021 20:41
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 12, 2021
@kwoodson kwoodson force-pushed the alibaba_test branch 5 times, most recently from 084c167 to 9926df1 Compare November 12, 2021 21:20
@kwoodson
Copy link
Author

/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 12, 2021
@kwoodson
Copy link
Author

@JoelSpeed I have removed the webhooks code as per our discussion. This is still awaiting github.com/openshift/api/pull/1045 in order to keep in line with the api refactor.

@kwoodson
Copy link
Author

/retest

2 similar comments
@kwoodson
Copy link
Author

/retest

@kwoodson
Copy link
Author

/retest

@kwoodson
Copy link
Author

Libvirt test is failing because of a missing secret.

/bin/bash: line 9: /tmp/secret/openshift_gcp_compute_zone: No such file or directory

@elmiko
Copy link
Contributor

elmiko commented Dec 13, 2021

should be fine @kwoodson , the only tests which haven't passed are non-required. if @JoelSpeed is ok to approve this we should be able to merge tomorrow.

@JoelSpeed
Copy link
Contributor

We still can't merge this, https://issues.redhat.com/browse/ART-3535 is still in progress, AFAIK, waiting on ART

@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Dec 14, 2021
@kwoodson
Copy link
Author

A rebase was required to fix go.mod dependencies. I have updated and redeployed a cluster with the latest changes:

NAME                                 STATUS   ROLES    AGE    VERSION
test-px2zh-master-0                  Ready    master   126m   v1.22.1+6859754
test-px2zh-master-1                  Ready    master   124m   v1.22.1+6859754
test-px2zh-master-2                  Ready    master   126m   v1.22.1+6859754
test-px2zh-worker-us-east-1a-555jl   Ready    worker   115m   v1.22.1+6859754
test-px2zh-worker-us-east-1b-2wph4   Ready    worker   114m   v1.22.1+6859754
test-px2zh-worker-us-east-1b-zlqvw   Ready    worker   111m   v1.22.1+6859754
NAME                                 PHASE     TYPE            REGION	ZONE         AGE
test-px2zh-master-0                  Running   ecs.g6.xlarge   us-east-1   us-east-1b   127m
test-px2zh-master-1                  Running   ecs.g6.xlarge   us-east-1   us-east-1a   127m
test-px2zh-master-2                  Running   ecs.g6.xlarge   us-east-1   us-east-1b   127m
test-px2zh-worker-us-east-1a-555jl   Running   ecs.g6.large    us-east-1   us-east-1a   121m
test-px2zh-worker-us-east-1b-2wph4   Running   ecs.g6.large    us-east-1   us-east-1b   121m
test-px2zh-worker-us-east-1b-zlqvw   Running   ecs.g6.large    us-east-1   us-east-1b   121m

@kwoodson
Copy link
Author

Looking into why the unit tests failed.

@kwoodson
Copy link
Author

@JoelSpeed @elmiko Please TAL. I believe the failures (e2e-aws) are failing all CI jobs in openshift. This PR is not the cause. I reviewed the unit test failures but was unsure as to why they are failing.

@JoelSpeed
Copy link
Contributor

/test unit

@JoelSpeed
Copy link
Contributor

/approve
/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 15, 2021
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Dec 15, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: JoelSpeed

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 15, 2021
Copy link
Contributor

@elmiko elmiko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for all the hard work @kwoodson !
/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Dec 15, 2021
@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

2 similar comments
@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@JoelSpeed
Copy link
Contributor

/retest-required

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

3 similar comments
@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Dec 15, 2021

@kwoodson: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-vsphere 4f8d902 link false /test e2e-vsphere
ci/prow/e2e-metal-ipi-ovn-dualstack 4f8d902 link false /test e2e-metal-ipi-ovn-dualstack
ci/prow/e2e-gcp-operator 4f8d902 link false /test e2e-gcp-operator
ci/prow/e2e-vsphere-upgrade 4f8d902 link false /test e2e-vsphere-upgrade
ci/prow/e2e-openstack 4f8d902 link false /test e2e-openstack
ci/prow/e2e-aws-disruptive 4f8d902 link false /test e2e-aws-disruptive

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-merge-robot openshift-merge-robot merged commit 045db50 into openshift:master Dec 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants