Skip to content

Conversation

@kwoodson
Copy link
Contributor

@kwoodson kwoodson commented Nov 4, 2021

The work included in this pull request is the following:

  • Setting hostname to machine.GetName()
  • Setting instancename to machine.GetName()
  • Including the machine.Getname() to the list of network addresses

This is required in order for the CSR process to match the subject name to the machine name.

This also aligns us setting the Alibaba Private Zone DNS synchronization so that hostnames resolve and align with the CSR.

cc @menglingwei

@kwoodson kwoodson requested review from JoelSpeed and elmiko November 4, 2021 01:33
@kwoodson kwoodson marked this pull request as draft November 4, 2021 01:33
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 4, 2021
@kwoodson
Copy link
Contributor Author

kwoodson commented Nov 4, 2021

@menglingwei I based these commits on top of your work. Please review and feel free to comment.

https://github.com/openshift/cluster-api-provider-alibaba/pull/9/files#r740749797
@menglingwei Responding to your comment ^^, I had to remove the resourceGroupID from the security group query. When I include it the security groups come back empty. When I remove it, the correct security groups are found. We can discuss this on a call if you would like.

@menglingwei @bd233 @dongchen126 Is it possible to set the private zone to auto synchronize the DNS names during the provisioner?

@elmiko @JoelSpeed Is it possible to get the CSR process to work without the DNS synchronization? I do not see a method of turning this on during the terraform provisioning. We'll see what Alibaba provides.

@menglingwei
Copy link
Contributor

@menglingwei I based these commits on top of your work. Please review and feel free to comment.

https://github.com/openshift/cluster-api-provider-alibaba/pull/9/files#r740749797 @menglingwei Responding to your comment ^^, I had to remove the resourceGroupID from the security group query. When I include it the security groups come back empty. When I remove it, the correct security groups are found. We can discuss this on a call if you would like.

The parameter resourceGroupID is optional. So you can remove it.

@menglingwei @bd233 @dongchen126 Is it possible to set the private zone to auto synchronize the DNS names during the provisioner?

@kwoodson Which DNS names need to be resolved? Is there a list? Which component would be better than adding this feature? installer or cluster-api-provider?

@elmiko @JoelSpeed Is it possible to get the CSR process to work without the DNS synchronization? I do not see a method of turning this on during the terraform provisioning. We'll see what Alibaba provides.

@menglingwei
Copy link
Contributor

@menglingwei I based these commits on top of your work. Please review and feel free to comment.
https://github.com/openshift/cluster-api-provider-alibaba/pull/9/files#r740749797 @menglingwei Responding to your comment ^^, I had to remove the resourceGroupID from the security group query. When I include it the security groups come back empty. When I remove it, the correct security groups are found. We can discuss this on a call if you would like.

The parameter resourceGroupID is optional. So you can remove it.

@menglingwei @bd233 @dongchen126 Is it possible to set the private zone to auto synchronize the DNS names during the provisioner?

@kwoodson Which DNS names need to be resolved? Is there a list? Which component would be better than adding this feature? installer or cluster-api-provider?

@elmiko @JoelSpeed Is it possible to get the CSR process to work without the DNS synchronization? I do not see a method of turning this on during the terraform provisioning. We'll see what Alibaba provides.

@kwoodson The last time you tested querying a list of security groups,you said that if you set resourceGroupID, you cannot return the result,but if you removed resourceGroupID, the result is correct. So i have a question about your securityGroups. Whether your security group does not have a resourceGroup?

Here are our api documentions.

CreateResourceGroup : https://www.alibabacloud.com/help/doc-detail/158865.htm?spm=a2c63.p38356.879954.3.7c2e4b3ehm0arG#doc-api-ResourceManager-CreateResourceGroup

CreateSecurityGroup: https://www.alibabacloud.com/help/doc-detail/25553.htm

@JoelSpeed
Copy link
Contributor

@elmiko @JoelSpeed Is it possible to get the CSR process to work without the DNS synchronization? I do not see a method of turning this on during the terraform provisioning. We'll see what Alibaba provides.

DNS is a requirement for machines within OCP. We need to have this turned on. If this isn't supported by terraform, perhaps we need to extend the terraform to allow it to be turned on? DNS is a part of the security process that we use for the CSRs and is also used for Kube API to Kubelet communication, so it must be present for a working OCP cluster.

@kwoodson kwoodson force-pushed the alibaba_providerfix branch from 55eac5e to f495159 Compare November 4, 2021 16:37
@kwoodson kwoodson force-pushed the alibaba_providerfix branch from f495159 to 91d4951 Compare November 4, 2021 19:35
@kwoodson
Copy link
Contributor Author

kwoodson commented Nov 4, 2021

DNS is a requirement for machines within OCP. We need to have this turned on. If this isn't supported by terraform, perhaps we need to extend the terraform to allow it to be turned on? DNS is a part of the security process that we use for the CSRs and is also used for Kube API to Kubelet communication, so it must be present for a working OCP cluster.

I have asked Alibaba to look into the DNS requirement.

On a side note, @elmiko and I were able to get this working without using the automatic DNS synchronization. We were able to plumb this provider today by removing an unexpected tag from the DescribeInstances query. This enables the us to reliably search and match instances. This proved successful and we were able to install and provision a cluster.

Nodes:

NAME                                 STATUS   ROLES    AGE   VERSION
test-kqrvc-master-0                  Ready    master   90m   v1.22.1+1b2affc
test-kqrvc-master-1                  Ready    master   92m   v1.22.1+1b2affc
test-kqrvc-master-2                  Ready    master   90m   v1.22.1+1b2affc
test-kqrvc-worker-us-east-1a-w64zp   Ready    worker   31m   v1.22.1+1b2affc
test-kqrvc-worker-us-east-1b-4995f   Ready    worker   30m   v1.22.1+1b2affc
test-kqrvc-worker-us-east-1b-5rfpr   Ready    worker   26m   v1.22.1+1b2affc

Machines:

NAMESPACE               NAME                                 PHASE     TYPE            REGION      ZONE         AGE
openshift-machine-api   test-kqrvc-master-0                  Running   ecs.g6.xlarge   us-east-1   us-east-1b   93m
openshift-machine-api   test-kqrvc-master-1                  Running   ecs.g6.xlarge   us-east-1   us-east-1a   93m
openshift-machine-api   test-kqrvc-master-2                  Running   ecs.g6.xlarge   us-east-1   us-east-1b   93m
openshift-machine-api   test-kqrvc-worker-us-east-1a-w64zp   Running   ecs.g6.large    us-east-1   us-east-1a   87m
openshift-machine-api   test-kqrvc-worker-us-east-1b-4995f   Running   ecs.g6.large    us-east-1   us-east-1b   87m
openshift-machine-api   test-kqrvc-worker-us-east-1b-5rfpr   Running   ecs.g6.large    us-east-1   us-east-1b   87m

I am going to remove the draft status.

One note:
We are awaiting a fix in github.com/openshift/installer/pull/5348 which places the resourceGroupID on the security groups.

@kwoodson kwoodson marked this pull request as ready for review November 4, 2021 19:46
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 4, 2021
@kwoodson
Copy link
Contributor Author

kwoodson commented Nov 8, 2021

cc @JoelSpeed @elmiko

Comment on lines +380 to +382
if machineProviderConfig.SecurityGroupID != "" {
securityGroupIDs = append(securityGroupIDs, machineProviderConfig.SecurityGroupID)
} else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is weird, I would expect the following in yaml to suffice, having a separate field for a dedicated SG seems counterintuative, also, it may be surprising to a user that using this field breaks using the list field

securityGroups:
- id: blah

Comment on lines +454 to +456
if machineProviderConfig.VSwitchID != "" {
vSwitchID = machineProviderConfig.VSwitchID
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above, would expect a single way to specify this

})
klog.Errorf("error describing instances: %v", err)
return "", fmt.Errorf("error describing instances: %v", err)
if machineProviderConfig.VSwitch != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we should clean this up a bit and unindent by reversing some of the logic, and maybe move some into a separate smaller function, eg.

if machineProviderConfig.VSwitch == nil {
  return "", errors.New("no vswitch configuration provided")
}

if machineProviderConfig.VSwitch.ID != "" {
  return machineProviderConfig.VSwitch.ID, nil
}

if machineProviderConfig.VSwitch.Tags != nil {
  vSwitchID, err := getVSwitchIDFromTags(machine, machineProviderConfig.VSwitch.Tags, client)
  if err != nil {
     return "", fmt.Errorf("could not get vSwitchID from tags: %v", err)
  }
  return vSwitchID, nil
}

return "", errors.New("no vSwitch found from configuration")

Could also apply the same sort of clean ups to the security group configuration checks

// SecurityGroups is an array of references to security groups which to assign the instance. The valid values of N vary based on the
// maximum number of security groups to which an instance can belong. For more information, see the "Security group limits" section in Limits.
// https://www.alibabacloud.com/help/doc-detail/101348.htm?spm=a2c63.p38356.879954.48.78f0199aX3dfIE
SecurityGroups []ResourceTagReference `json:"securityGroups,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should remove SecurityGroupID so there's only one way to configuration security groups

//VSwitch is a reference to the vswitch to use for this instance
//This parameter is required when you create an instance of the VPC type.
//You can call the DescribeVSwitches operation to query the created vSwitches.
VSwitch *ResourceTagReference `json:"vSwitch,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should remove VSwitchID so there's only one way to configuration the vSwitch

ID string `json:"id,omitempty"`

// Tags is a set of tags used to identify a resource
Tags []Tag `json:"tags,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does alibaba have a way they call this when finding instances by tags? Do they use a --tags or maybe --filter as AWS does? Just want to make sure we name this in a way that it's going to be obvious to users

Comment on lines +19 to +20
alibabacloudproviderv1 "github.com/AliyunContainerService/cluster-api-provider-alibabacloud/pkg/apis/alibabacloudprovider/v1beta1"
alibabacloudClient "github.com/AliyunContainerService/cluster-api-provider-alibabacloud/pkg/client"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should these be from the openshift fork now?

alibabacloudproviderv1 "github.com/AliyunContainerService/cluster-api-provider-alibabacloud/pkg/apis/alibabacloudprovider/v1beta1"
alibabacloudClient "github.com/AliyunContainerService/cluster-api-provider-alibabacloud/pkg/client"
"github.com/aliyun/alibaba-cloud-sdk-go/services/ecs"
machinev1 "github.com/openshift/machine-api-operator/pkg/apis/machine/v1beta1"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use openshift/api/machine/v1beta1 instead

@kwoodson
Copy link
Contributor Author

kwoodson commented Nov 9, 2021

@JoelSpeed I have opened PR #12 to address your concerns. I believe that this PR is ready to merge based upon our earlier discussion.

WDYT?

Copy link
Contributor

@elmiko elmiko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm adding both approval labels to this PR, we have agreed that @JoelSpeed 's comments will be addressed on #12, and i have talked with @kwoodson. getting this merged will unblock the installer work we are doing as well as the debugging of this controller.

/approve
/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Nov 10, 2021
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Nov 10, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: elmiko, kwoodson

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 10, 2021
@openshift-merge-robot openshift-merge-robot merged commit be5557a into openshift:main Nov 10, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants