dns: refactor DNS handling #121
Conversation
pkg/dns/aws/dns.go (outdated)
| //
| // 1. A public zone shared by all clusters with a domain name equal to the cluster base domain
| // 2. A private zone for the cluster with a domain name equal to the cluster base domain
| // Manager provides AWS DNS record management. In this implementation, calling
pkg/dns/dns.go (outdated)
| // Type is the DNS record type.
| Type RecordType
|
| // Alias are the options for an ALIAS record.
pkg/manifests/manifests_test.go (outdated)
| "time" | ||
|
|
||
| ingressv1alpha1 "github.com/openshift/cluster-ingress-operator/pkg/apis/ingress/v1alpha1" | ||
| "github.com/openshift/cluster-ingress-operator/pkg/manifests" |
This is an import cycle (self-import); can just delete it.
pkg/manifests/manifests_test.go (outdated)
| if _, err := f.RouterServiceCloud(ci); err != nil {
| 	t.Errorf("invalid RouterServiceCloud: %v", err)
| }
| manifests.LoadBalancerService()
Delete the `manifests.` prefix (this is in the manifests package).
| // TODO: This will need revisited when we stop defaulting .spec.ingressDomain
| // and .spec.highAvailability as both can be nil but used with an effective
| // system-provided default reported on status.
| if ci.Spec.HighAvailability.Type != ingressv1alpha1.CloudClusterIngressHA || ci.Spec.IngressDomain == nil {
We do indeed want eventually to replace ci.Spec.HighAvailability.Type with an effective value, but as long as we are using ci.Spec.HighAvailability, we should still check whether it is nil.
| // desired if the high availability type is Cloud. An LB service will declare an
| // owner reference to the given deployment.
| func desiredLoadBalancerService(ci *ingressv1alpha1.ClusterIngress, infra *configv1.Infrastructure, deployment *appsv1.Deployment) (*corev1.Service, error) {
| if ci.Spec.HighAvailability.Type != ingressv1alpha1.CloudClusterIngressHA {
Need to check whether ci.Spec.HighAvailability is nil.
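A minimal sketch of the guarded check being requested, using trimmed stand-in types rather than the real ingressv1alpha1 definitions:

```go
package main

import "fmt"

// Trimmed stand-ins for the operator types referenced in the diff; the real
// definitions live in pkg/apis/ingress/v1alpha1.
type HAType string

const CloudClusterIngressHA HAType = "Cloud"

type HighAvailability struct {
	Type HAType
}

type ClusterIngressSpec struct {
	HighAvailability *HighAvailability // optional, may be nil
	IngressDomain    *string           // optional, may be nil
}

type ClusterIngress struct {
	Spec ClusterIngressSpec
}

// wantsLoadBalancer guards against a nil HighAvailability before reading Type,
// which is the check the review comment asks for.
func wantsLoadBalancer(ci *ClusterIngress) bool {
	ha := ci.Spec.HighAvailability
	return ha != nil && ha.Type == CloudClusterIngressHA
}

func main() {
	ci := &ClusterIngress{}            // HighAvailability left nil
	fmt.Println(wantsLoadBalancer(ci)) // false, no panic
}
```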
|
| // ensureLoadBalancerService creates an LB service if one is desired but absent.
| // Always returns the current LB service if one exists (whether it already
| // existed or was created during the course of the function).
This means ensureRouterForIngress will ensureDNS an existing LB service even if none is desired, right? Do we want to do that?
If one exists at all, then ensureDNS will try and create any DNS for it. If later on the clusteringress is deleted, we'll tear it all down.
If the clusteringress was deleted we won't even be in this code path in the first place.
Not sure there's an error here yet. However, I think this decomposition lets us at least start reasoning about the scenarios...
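A rough sketch of the behavior under discussion: the function always returns any existing LB service (so the DNS step can act on it), but only creates one when it is desired. The client interface and errNotFound sentinel are placeholders, not the operator's actual code:

```go
package controller

import "errors"

// Service is a stand-in for *corev1.Service.
type Service struct{ Name string }

var errNotFound = errors.New("not found")

// client abstracts the small slice of the Kubernetes client used here.
type client interface {
	GetLBService() (*Service, error) // returns errNotFound if absent
	CreateLBService(*Service) error
}

// ensureLoadBalancerService creates an LB service only if one is desired and
// absent, but always returns the current service if one exists, even when
// none is desired, so that later steps (like DNS) can observe it and the
// deletion path can eventually tear it down.
func ensureLoadBalancerService(c client, desired *Service) (*Service, error) {
	current, err := c.GetLBService()
	switch {
	case err == nil:
		return current, nil // exists; return it whether or not it is desired
	case !errors.Is(err, errNotFound):
		return nil, err
	}
	if desired == nil {
		return nil, nil // nothing exists and nothing is wanted
	}
	if err := c.CreateLBService(desired); err != nil {
		return nil, err
	}
	return c.GetLBService()
}
```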
Should Edit: Never mind, we use
I thought the intent of the DNS API was to allow us to specifically address a unique zone, regardless of its scope in the underlying platform (e.g. public/private, which is an implementation detail). Put another way, two zones can't have the same ID (or tag set as a proxy for ID) but different scopes. @abhinavdahiya, do you agree with that?
| // Set up the DNS manager.
| - dnsManager, err := createDNSManager(kubeClient, operatorNamespace, infraConfig, dnsConfig, clusterVersionConfig)
| + dnsManager, err := createDNSManager(kubeClient, operatorNamespace, infraConfig, dnsConfig)
We can also delete the get at line 80, and the RBAC rule in 00-cluster-role.yaml.
The scenario I have in mind is that you have a public zone and a private zone and specify tags for each that select both. Presumably that is an invalid configuration, right? The API doesn't make that obvious.
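To make the ambiguity concrete, here is a hypothetical sketch (the ZoneSelector and zone types are invented for illustration) of a tag-based lookup that has to fail when the tags select more than one zone:

```go
package dns

import "fmt"

// ZoneSelector is a hypothetical stand-in for the zone reference being
// discussed: either an explicit ID, or a tag set used as a proxy for an ID.
type ZoneSelector struct {
	ID   string
	Tags map[string]string
}

// zone is a hypothetical provider-side zone record.
type zone struct {
	id   string
	tags map[string]string
}

// matches reports whether every selector tag is present on the zone.
func matches(z zone, sel ZoneSelector) bool {
	for k, v := range sel.Tags {
		if z.tags[k] != v {
			return false
		}
	}
	return true
}

// resolveZoneID illustrates the problem raised in the review: if the tag
// selector matches more than one zone (e.g. a public and a private zone with
// the same tags), there is no unique answer and the configuration is invalid.
func resolveZoneID(zones []zone, sel ZoneSelector) (string, error) {
	if sel.ID != "" {
		return sel.ID, nil
	}
	var found []string
	for _, z := range zones {
		if matches(z, sel) {
			found = append(found, z.id)
		}
	}
	switch len(found) {
	case 1:
		return found[0], nil
	case 0:
		return "", fmt.Errorf("no zone matches tags %v", sel.Tags)
	default:
		return "", fmt.Errorf("tags %v are ambiguous: matched zones %v", sel.Tags, found)
	}
}
```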
Force-pushed from 5bf06fd to de31dcb:

* clean up constraint/override stanzas
* pin to kubernetes 1.12.5 (to match OKD)
* pin to controller-runtime v0.1.9 (the latest compatible with kubernetes 1.12.5)
* pin to aws v1.15.72
* upgrade openshift/api to pick up new config types
Force-pushed from de31dcb to f4d5db3.
@pravisankar I threw in the ServiceMonitor change last minute to help with flakes since I was already fixing dependencies and doing refactoring around the area.
| MetricsClusterRoleBinding = "assets/router/metrics/cluster-role-binding.yaml"
| MetricsRole = "assets/router/metrics/role.yaml"
| MetricsRoleBinding = "assets/router/metrics/role-binding.yaml"
| MetricsServiceMonitorAsset = "assets/router/metrics/service-monitor.yaml"
Looks like MetricsServiceMonitorAsset can be deleted now.
Woe:
/retest
@Miciah was that before my service monitor patch? Such an error should now lead to retries.
/retest
It was after.
/lgtm
/retest
Router failure in the last failed run was due to the cluster ID mismatch bug (openshift/installer#762).
/retest
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: ironcladlou, Miciah, pravisankar.
Depends on openshift/installer#1233.
Blocks openshift/installer#1169.
Controller changes
These changes are primarily in support of cluster DNS management API changes, but also include the beginnings of a larger refactor of the controller into a more discrete desired/current/apply-changes model. The general pattern being promoted is:

* _Desired_: compute the desired state of a resource.
* _Current_: get the current state of a resource.
* _Ensure_: given the desired and current state of a resource in our managed graph, compute and apply any necessary changes.
* _Finalize_: clean up any external resources which depend on the current resource.
This change is meant to be a minimal incremental refactor to help us start reining in complexity, with the goal of being able to support more sophisticated change management. Hopefully with enough such refactors new useful patterns will emerge, and in the meantime there is benefit in testability and legibility in this sort of decomposition.
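A minimal sketch of the shape this pattern takes at reconcile time; the managedResource interface and clusterIngress type are illustrative stand-ins, since the real controller uses per-resource functions rather than a generic interface:

```go
package controller

import "time"

// clusterIngress is a trimmed, hypothetical stand-in for the operand type;
// only the field needed to illustrate the flow is included.
type clusterIngress struct {
	deletionTimestamp *time.Time
}

// managedResource abstracts one resource kind (LB service, ServiceMonitor,
// DNS record, ...) purely for illustration.
type managedResource interface {
	desired() (interface{}, error)
	current() (interface{}, error)
	ensure(desired, current interface{}) error
	finalize() error
}

// reconcile sketches how the four steps compose: finalization runs when the
// clusteringress is being deleted; otherwise desired/current/ensure runs for
// each resource in the managed graph.
func reconcile(ci *clusterIngress, resources []managedResource) error {
	for _, r := range resources {
		if ci.deletionTimestamp != nil {
			if err := r.finalize(); err != nil { // clean up external dependencies
				return err
			}
			continue
		}
		want, err := r.desired() // compute desired state from operator config
		if err != nil {
			return err
		}
		have, err := r.current() // fetch current state from the cluster
		if err != nil {
			return err
		}
		if err := r.ensure(want, have); err != nil { // apply any needed changes
			return err
		}
	}
	return nil
}
```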
Note about `manifests.go`: another important goal is to eliminate the `ManifestFactory` type and replace it with stateless functions which simply load and return manifest content — any additional processing of a manifest to compute desired state should be moved into controller files. This is to contain sprawl.

Note about resource naming: when trying to get the current state of a resource, instead of computing the desired state just to get the name of the resource to fetch, start extracting functions to compute names of resources which can take operator config into account (e.g. operator/operand namespaces) and which can be reused for both current and desired computations.
pkg/operator/controller/controller.go
Extract LB service management into a separate file, and add LB service finalization to the ingress deletion logic.
Extract `ServiceMonitor` management into a separate file. Eliminate the dependency on the prometheus project and switch to using `Unstructured` for managing servicemonitor resources; this fixes a race condition where the `ServiceMonitor` may not exist yet when we construct our client, which uses a caching discovery client that won't ever refresh and pick up the type later.
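A rough sketch of managing a ServiceMonitor through `Unstructured` with the controller-runtime client; this is illustrative rather than the operator's actual code, and assumes the monitoring.coreos.com/v1 ServiceMonitor kind:

```go
package controller

import (
	"context"

	"k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/apimachinery/pkg/types"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// serviceMonitorGVK lets us address ServiceMonitor resources without linking
// the prometheus-operator types into the binary.
var serviceMonitorGVK = schema.GroupVersionKind{
	Group:   "monitoring.coreos.com",
	Version: "v1",
	Kind:    "ServiceMonitor",
}

// ensureServiceMonitor fetches the ServiceMonitor as an Unstructured object
// and creates it if it does not exist. No typed prometheus-operator client is
// involved; the GVK above is all the type information that is needed.
func ensureServiceMonitor(ctx context.Context, cl client.Client, desired *unstructured.Unstructured) error {
	current := &unstructured.Unstructured{}
	current.SetGroupVersionKind(serviceMonitorGVK)
	key := types.NamespacedName{Namespace: desired.GetNamespace(), Name: desired.GetName()}
	if err := cl.Get(ctx, key, current); err != nil {
		if errors.IsNotFound(err) {
			return cl.Create(ctx, desired)
		}
		return err
	}
	return nil // already exists; leave it alone in this sketch
}
```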
pkg/operator/controller/controller_lb.go
This file contains new and extracted pieces from the main controller file which support dealing with the LB service.
pkg/operator/controller/controller_dns.go
This file contains new functions to handle applying DNS records associated with an LB service. Note that dealing with current resources isn't yet supported; for now, always try to apply the desired state and rely on the DNS manager to manage no-ops efficiently.
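A sketch of that flow with stand-in types (the real Record and manager types live in pkg/dns); the wildcard record name and the zone handling here are assumptions for illustration:

```go
package controller

import "fmt"

// Minimal stand-ins for the types involved; the real Record type lives in
// pkg/dns and the LB service is a *corev1.Service.
type dnsRecord struct {
	Zone   string // zone ID or a tag-based selector in the real types
	Name   string
	Target string // LB hostname for an ALIAS/CNAME-style record
}

type dnsManager interface {
	Ensure(record *dnsRecord) error
}

// ensureDNS sketches the flow described for controller_dns.go: compute the
// desired records for the LB service's hostname and hand every one of them to
// the manager. No "current" state is consulted, so the manager is expected to
// treat re-application of an unchanged record as a cheap no-op.
func ensureDNS(mgr dnsManager, lbHostname, ingressDomain string, zones []string) error {
	var desired []*dnsRecord
	for _, zone := range zones {
		desired = append(desired, &dnsRecord{
			Zone:   zone,
			Name:   "*." + ingressDomain,
			Target: lbHostname,
		})
	}
	for _, rec := range desired {
		if err := mgr.Ensure(rec); err != nil {
			return fmt.Errorf("failed to ensure DNS record %v: %v", rec, err)
		}
	}
	return nil
}
```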
pkg/manifests/manifests.go
Remove the `RouterServiceCloud()` function and replace it with a stateless `LoadBalancerService()` function which only loads the contents of the manifest. Move all the logic for computing desired state into the controller. Do the same for `MetricsServiceMonitor()`.
DNS manager changes
pkg/dns/dns.go
Add new types to represent DNS records (taking us further down the slippery slope of reinventing external-dns). Use the new types with expanded DNS interface methods for `Ensure()` and `Delete()` to allow more sophisticated DNS management.
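A hedged sketch of what such types and the expanded interface could look like; only `Type RecordType` and the `Alias` options appear in the quoted diff, so the remaining field names here are assumptions:

```go
package dns

// RecordType distinguishes the kinds of records the manager knows how to
// apply (the quoted diff shows at least an ALIAS type).
type RecordType string

const ALIASRecord RecordType = "ALIAS"

// Zone identifies the hosted zone a record belongs to, either directly by ID
// or indirectly by a set of tags the provider can resolve to an ID.
type Zone struct {
	ID   string
	Tags map[string]string
}

// Alias holds the options for an ALIAS record.
type Alias struct {
	Name   string // DNS name of the record
	Target string // target the alias points at, e.g. an ELB hostname
}

// Record is the unit of work handed to a Manager; it carries its own zone
// information so the manager needs no zone discovery of its own.
type Record struct {
	Zone  Zone
	Type  RecordType
	Alias *Alias
}

// Manager is the expanded interface described in the PR: Ensure applies a
// record (idempotently) and Delete removes it.
type Manager interface {
	Ensure(record *Record) error
	Delete(record *Record) error
}
```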
pkg/dns/aws/dns.go
Refactor away all assumptions about public and private zones and installer topology. Instead, each call to `Ensure()` or `Delete()` works with a single `Record` as input and uses the zone information encoded in the `Record`. This eliminates the need for any zone discovery. When a record has a zone addressed by tags rather than ID, cache the ID we find for the zone.
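A sketch of the tag-to-ID caching idea with the provider call left abstract (`ZoneIDByTags` is a placeholder, not an AWS SDK API):

```go
package aws

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// zoneLookup abstracts whatever provider call resolves a tag set to a hosted
// zone ID (the real implementation talks to AWS); it is left abstract here.
type zoneLookup interface {
	ZoneIDByTags(tags map[string]string) (string, error)
}

// zoneCache remembers tag-to-ID resolutions so each tagged zone is looked up
// at most once, as described for pkg/dns/aws/dns.go.
type zoneCache struct {
	mu     sync.Mutex
	ids    map[string]string
	lookup zoneLookup
}

func newZoneCache(lookup zoneLookup) *zoneCache {
	return &zoneCache{ids: map[string]string{}, lookup: lookup}
}

// zoneID returns the explicit ID if the record's zone carries one; otherwise
// it resolves the tags through the provider and caches the result.
func (c *zoneCache) zoneID(id string, tags map[string]string) (string, error) {
	if id != "" {
		return id, nil
	}
	key := tagKey(tags)
	c.mu.Lock()
	defer c.mu.Unlock()
	if cached, ok := c.ids[key]; ok {
		return cached, nil
	}
	resolved, err := c.lookup.ZoneIDByTags(tags)
	if err != nil {
		return "", fmt.Errorf("failed to resolve zone by tags %v: %v", tags, err)
	}
	c.ids[key] = resolved
	return resolved, nil
}

// tagKey builds a deterministic cache key from a tag map.
func tagKey(tags map[string]string) string {
	parts := make([]string, 0, len(tags))
	for k, v := range tags {
		parts = append(parts, k+"="+v)
	}
	sort.Strings(parts)
	return strings.Join(parts, ",")
}
```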
cmd/cluster-ingress-operator/main.go
Unwire ingress and cluster version config from the DNS manager, replacing their usage with DNS config. This allows for a much simpler and more reliable DNS management implementation.
Misc
hack/uninstall.sh
Remove only managed components by default rather than the entire operator infra; this is better when trying to run a local operator process against a remote cluster.