
Conversation

@patrickdillon (Contributor):

Create an enhancement to outline work for installing and supporting OpenShift on Azure Stack Hub.

cc @openshift/openshift-team-installer
cc @staebler

/label priority/important-soon

@openshift-ci-robot:

@patrickdillon: The label(s) /label priority/important-soon cannot be applied. These labels are supported: platform/aws, platform/azure, platform/baremetal, platform/google, platform/libvirt, platform/openstack, ga, tide/merge-method-merge, tide/merge-method-rebase, tide/merge-method-squash, px-approved, docs-approved, qe-approved, downstream-change-needed


In response to this:

> Create an enhancement to outline work for installing and supporting OpenShift on Azure Stack Hub.
>
> cc @openshift/openshift-team-installer
> cc @staebler
>
> /label priority/important-soon

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.


The Installer will need to construct this json configuration file from user input and include the file as part of the cloud-provider-config configmap. The file will also need to be present on nodes for the kubelet. `AZURE_ENVIRONMENT_FILEPATH` will need to be set programmatically for ASH on the kubelet, presumably through the kubelet's systemd unit [(see open questions)](#open-questions).

Operators will set the `AZURE_ENVIRONMENT_FILEPATH` and mount the endpoints JSON file from the cloud-provider-config configmap. All operators using the Azure SDK will need to do this.
Contributor:

Is there a way for operators to configure the Azure client directly with the custom endpoints rather than relying on a JSON file? It would be cumbersome for an operator with a static manifest to do this with mounts, given that the mounted ConfigMap must be copied over to the namespace. The mounting option is feasible for controllers that the operator starts, but not easily for the operators themselves.

Contributor:

Following up on this: you can definitely set the endpoint programmatically when creating the client. For example, see the following:
https://github.com/staebler/cluster-ingress-operator/blob/3bb8ed560892ae648806e6133d18803da9ec341b/pkg/dns/azure/client/client.go#L88
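
For reference, a minimal sketch of that pattern with go-autorest and a track-1 SDK client, assuming the environment is loaded from the ASH endpoints file; the package path, API version, and helper name here are illustrative, not the exact code behind the link:

```go
// Sketch only: the general track-1 Azure SDK pattern of wiring a client to
// custom Azure Stack Hub endpoints instead of the public-cloud defaults.
// The dns package/API version and function name are illustrative assumptions.
package azclient

import (
	"github.com/Azure/azure-sdk-for-go/services/dns/mgmt/2018-05-01/dns"
	"github.com/Azure/go-autorest/autorest"
	"github.com/Azure/go-autorest/autorest/adal"
	"github.com/Azure/go-autorest/autorest/azure"
)

// newZonesClient builds a DNS zones client against ASH endpoints loaded from
// the JSON environment file rather than the built-in cloud definitions.
func newZonesClient(endpointsFile, tenantID, clientID, clientSecret, subscriptionID string) (dns.ZonesClient, error) {
	env, err := azure.EnvironmentFromFile(endpointsFile)
	if err != nil {
		return dns.ZonesClient{}, err
	}

	// Authenticate against the stamp's Active Directory endpoint and ARM audience.
	oauthCfg, err := adal.NewOAuthConfig(env.ActiveDirectoryEndpoint, tenantID)
	if err != nil {
		return dns.ZonesClient{}, err
	}
	token, err := adal.NewServicePrincipalToken(*oauthCfg, clientID, clientSecret, env.TokenAudience)
	if err != nil {
		return dns.ZonesClient{}, err
	}

	// Point the client at the ASH Resource Manager endpoint instead of the default BaseURI.
	client := dns.NewZonesClientWithBaseURI(env.ResourceManagerEndpoint, subscriptionID)
	client.Authorizer = autorest.NewBearerAuthorizer(token)
	return client, nil
}
```

The same `New*WithBaseURI` constructors are generated for the other track-1 management clients, so the pattern is not specific to DNS.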

@patrickdillon (Contributor, Author):

@staebler I have updated this and I think these revisions make more sense.

ASH endpoints are user-provided and, therefore, the Azure SDK treats ASH endpoints differently from the already-known endpoints of other Azure environments. When the cloud environment is set to
`AZURESTACKCLOUD`, the SDK expects the environment variable `AZURE_ENVIRONMENT_FILEPATH` to point to a [JSON configuration file](https://kubernetes-sigs.github.io/cloud-provider-azure/install/configs/#azure-stack-configuration), which is [typically located at `/etc/kubernetes/azurestackcloud.json`](https://github.com/kubernetes-sigs/cloud-provider-azure/issues/151).

The Installer will need to construct this json configuration file from user input and include the file as part of the cloud-provider-config configmap. The file will also need to be present on nodes for the kubelet. `AZURE_ENVIRONMENT_FILEPATH` will need to be set programmatically for ASH on the kubelet, presumably through the kubelet's systemd unit [(see open questions)](#open-questions).
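
For illustration, a rough sketch (not the installer's actual code) of how such a file could be built from user-provided endpoints using go-autorest's `Environment` type, whose JSON tags match what `AZURE_ENVIRONMENT_FILEPATH` consumers expect; the field selection and helper name are assumptions:

```go
// Sketch of the installer-side piece: render the AZURESTACKCLOUD environment file
// from user-provided endpoints so it can be placed in the cloud-provider-config
// ConfigMap and written to /etc/kubernetes/azurestackcloud.json on nodes.
// Field selection, defaults, and the helper name are assumptions.
package azurestack

import (
	"encoding/json"

	"github.com/Azure/go-autorest/autorest/azure"
)

// endpointsJSON produces a go-autorest Environment that the Azure SDK can load
// via AZURE_ENVIRONMENT_FILEPATH when the cloud name is AZURESTACKCLOUD.
func endpointsJSON(armEndpoint, adEndpoint, storageSuffix, keyVaultSuffix string) ([]byte, error) {
	env := azure.Environment{
		Name:                    "AzureStackCloud",
		ResourceManagerEndpoint: armEndpoint,   // e.g. https://management.<region>.<stamp-fqdn>
		ActiveDirectoryEndpoint: adEndpoint,    // AAD or AD FS endpoint for the stamp
		StorageEndpointSuffix:   storageSuffix, // e.g. <region>.<stamp-fqdn>
		KeyVaultDNSSuffix:       keyVaultSuffix,
		TokenAudience:           armEndpoint, // simplification; the real audience comes from the ARM metadata endpoint
	}
	return json.MarshalIndent(env, "", "  ")
}
```

The kubelet and operators would then read this same payload, either from `/etc/kubernetes/azurestackcloud.json` on the node or from a mounted copy of the ConfigMap.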
Contributor:

Where will this user input come from? Is the expectation that they will have already set up an `AZURE_ENVIRONMENT_FILEPATH` on the install machine, given that the installer will presumably need to know the endpoints for its own access to the Azure API?

@patrickdillon (Contributor, Author):

I think the update should clarify this.

### Open Questions

1. We need to explore how to pass the JSON endpoints file to the Kubelet:
   1. Can the endpoints file be simply passed from the installer through ignition? Does this [need to be set through the kubelet unit's `EnvironmentFile`s](https://github.com/openshift/machine-config-operator/blob/master/templates/master/01-master-kubelet/_base/units/kubelet.service.yaml#L19)? If so, which & how (ignition or MCO)?
1. Should ASH produce a cluster DNS manifest or follow the baremetal approach and not create one?
1. Suggestions for determining all operators that need to adapt for the ASH config.

Contributor:

Could this be MachineConfigs that the installer adds for master and workers?

Does the bootstrap machine need this as well? I am assuming not since you have been able to run kubelet successfully on the bootstrap machine with the Azure platform.
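
To make the ignition/MachineConfig option above concrete, a hypothetical sketch of the two pieces such a config would carry: the endpoints file path and a kubelet systemd drop-in exporting `AZURE_ENVIRONMENT_FILEPATH`. Paths and names are assumptions, not settled design:

```go
// Hypothetical sketch of what a MachineConfig (or bootstrap ignition) could lay down
// so the kubelet resolves AZURESTACKCLOUD endpoints: the endpoints file itself plus a
// systemd drop-in exporting AZURE_ENVIRONMENT_FILEPATH. Paths and names are assumptions.
package azurestack

const (
	// Path used by the cloud-provider-azure docs for the Azure Stack endpoints file.
	azureStackCloudFile = "/etc/kubernetes/azurestackcloud.json"

	// Drop-in for kubelet.service (e.g. /etc/systemd/system/kubelet.service.d/20-azurestack.conf),
	// which avoids editing the base unit or its EnvironmentFile list shipped by the MCO templates.
	kubeletAzureStackDropIn = `[Service]
Environment="AZURE_ENVIRONMENT_FILEPATH=` + azureStackCloudFile + `"
`
)
```

Whether this content lands via ignition at install time or via a MachineConfig owned by the MCO is exactly the open question above.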

@patrickdillon force-pushed the azurestack branch 2 times, most recently from 2cea816 to 8f1b8b4 (March 31, 2021).
@patrickdillon (Contributor, Author):

/retest

@patrickdillon (Contributor, Author):

@gnufied brought up CSI support for Azure Stack in openshift/kubernetes#643 (comment).

We do not have a good background on storage, so I'm glad you reached out. Some quick googling led me to this: https://github.com/kubernetes-sigs/azuredisk-csi-driver

Can we discuss what more needs to be done to add storage support? We can move it over to JIRA too, which might be more appropriate.

- requires different rhcos image
- limited instance metadata service (IMDS) for VMs
- does not support private DNS zones
- limited subset of Azure infrastructure available (ergo different Terraform provider)
Contributor:

One other aspect that may fall under this category of limited subset of infrastructure is the limited set of API versions available. We need to be careful that in the future we are cognizant of API version selection and are specific about which API versions an Azure Stack deployment must support in order to run OpenShift.

[cloud provider configmap](https://github.com/openshift/installer/blob/master/pkg/asset/manifests/cloudproviderconfig.go#L126-L141):

```go
cloudProviderEndpointsKey = "endpoints"
```
Contributor:

Suggested change: `cloudProviderEndpointsKey = "endpoints"` → `cloudProviderEndpointsKey = "endpoints.json"`
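
For context, a hedged sketch of how the endpoints payload might sit next to the existing `config` key in the cloud-provider-config ConfigMap; the function, namespace handling, and surrounding asset code are illustrative assumptions rather than the installer's actual implementation:

```go
// Hedged sketch: add the serialized endpoints file to the cloud-provider-config
// ConfigMap next to the existing "config" key so in-cluster consumers can mount it.
// The function and surrounding asset wiring are illustrative assumptions.
package manifests

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

const cloudProviderEndpointsKey = "endpoints" // or "endpoints.json" per the suggestion above

func cloudProviderConfigMap(cloudConf, endpointsJSON string) *corev1.ConfigMap {
	cm := &corev1.ConfigMap{
		TypeMeta:   metav1.TypeMeta{APIVersion: "v1", Kind: "ConfigMap"},
		ObjectMeta: metav1.ObjectMeta{Namespace: "openshift-config", Name: "cloud-provider-config"},
		Data: map[string]string{
			"config": cloudConf, // existing cloud.conf payload
		},
	}
	if endpointsJSON != "" { // only Azure Stack Hub needs the extra key
		cm.Data[cloudProviderEndpointsKey] = endpointsJSON
	}
	return cm
}
```

Consumers would still need to mount this key and point `AZURE_ENVIRONMENT_FILEPATH` at the mounted path, as discussed in the thread above.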

Comment on lines 173 to 175
This provider lacks the ability to create a service principal for the resource group and assign a contributor role, which is required by the Ingress controller.

These actions are achievable through the CLI; if necessary, these commands could be run as a Terraform post hook.
Contributor:

Is this conflating the service principal used by the ingress operator with the user-assigned identity attached to the VMs, which is used by the kube-controller-manager? For public Azure, the terraform creates a user-assigned identity but does not create any service principals. Azure Stack does not support user-assigned identities, even through the CLI.

@patrickdillon (Contributor, Author), Apr 3, 2021:

My statement in the enhancement is confused. The issue is, as you say, the lack of user-assigned identities (not service principals), but it looks like the "main" identity is scoped to the resource group, so I don't think it's the VM identities you mention. Here is the code from IPI. The UPI doc states:

> Grant the Contributor role to the Azure identity so that the Ingress Operator can create a public IP and its load balancer.

I'm not sure I have a firm grasp on the problem. From reading the docs, it seems we need to create this identity to give operators access to create resources in the resource group. The machine API operator would probably need this as well, but perhaps it is not mentioned because these are UPI docs.

Really, what I am missing is why/how this isn't handled by the CCO. If we're using passthrough, I assume this isn't a problem at all, because those creds clearly have permissions to create resources in the resource group.

Let me know what you think and I'll update the doc accordingly.

Contributor:

This is handled by the CCO. All of the in-cluster operators use credentials that are granted by the CCO (assuming mint mode). The UPI docs are a bit misleading when they say that the Contributor role is needed by the Ingress operator. The Ingress operator is creating loadbalancer Services, but it is the kube-controller-manager that is creating the cloud resources for those Services.

In public Azure, the kube-controller-manager uses the user-assigned identity assigned to the VM. Azure Stack does not support user-assigned identities. Presumably, we could allow Azure Stack to create system-assigned identities for each VM. The installer would then need to assign the Contributor role to each system-assigned identity after each VM was created. The kubelet also uses the managed identity assigned to the VM, so the managed identity is needed for worker machines too. However, the installer does not create the worker VMs, so it cannot assign the Contributor role to the system-assigned identities for the worker VMs. As far as I am aware, the machine-api does not have a feature for assigning roles to the system-assigned identities of the VMs that the machine-api creates.

A possible solution is to add that feature to the machine-api. Another solution is to supply a Service Principal in the cloud config that the kubelet and kube-controller-manager would use when accessing the Azure API. The latter solution is what I think you are using currently in your UPI work. There may be problems with utilizing that solution long-term due to how the user would replace that Service Principal. I don't think that simply changing the Service Principal in the cloud-provider-config ConfigMap would cause the replacement Service Principal to roll out to all of the kubelets and kube-controller-managers.
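
To make the two credential modes being compared concrete, a hedged sketch of the relevant fields in the Azure cloud provider config (`azure.json`/`cloud.conf`) that the kubelet and kube-controller-manager read; values are placeholders and the exact ASH configuration is not settled here:

```go
// Hedged sketch of the credential-related knobs in the Azure cloud provider config
// read by the kubelet and kube-controller-manager. Values are placeholders.
package azurestack

// Public Azure today: rely on the managed identity attached to the VM.
const cloudConfManagedIdentity = `{
  "cloud": "AzurePublicCloud",
  "useManagedIdentityExtension": true
}`

// Azure Stack Hub workaround discussed above: embed an explicit service principal,
// accepting that rotating it means rolling new config out to every node.
const cloudConfServicePrincipal = `{
  "cloud": "AZURESTACKCLOUD",
  "useManagedIdentityExtension": false,
  "aadClientId": "<service-principal-client-id>",
  "aadClientSecret": "<service-principal-client-secret>"
}`
```

As noted above, the service-principal route raises the rotation question, since changing the secret in the cloud-provider-config ConfigMap does not automatically reach every kubelet and kube-controller-manager.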

@patrickdillon (Contributor, Author):

I get it now. The piece I was missing is that the main identity is assigned to the VMs in TF and the machinesets. Then you explained the rest; thank you for that!

If we want to manage the roles after creation, then VM extensions may provide this:
https://registry.terraform.io/providers/hashicorp/azurestack/latest/docs/resources/virtual_machine_extension

But it seems the MAO may be more involved. Yes, the UPI solution writes the cloud provider config with the service principal ID & secret. So I guess the question is: does updating the cloud-provider-config configmap in-cluster cause a new machine config to be written? And does a new machine config cause a reboot, in which case the new creds would be picked up by the kubelet?

Create an enhancement to outline work for installing and supporting OpenShift on Azure Stack Hub.
@patrickdillon (Contributor, Author):

@staebler This has been revised. PTAL

@staebler (Contributor) left a comment:

/lgtm

@openshift-ci-robot added the `lgtm` label (indicates that a PR is ready to be merged) on Apr 6, 2021.
@openshift-ci-robot:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: staebler

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of the required OWNERS files.

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot added the `approved` label (indicates a PR has been approved by an approver from all required OWNERS files) on Apr 6, 2021.
@openshift-merge-robot merged commit 5da7a7d into openshift:master on Apr 6, 2021.