Skip to content

Conversation

@serbrech
Copy link
Contributor

@serbrech serbrech commented May 7, 2019

The azure dns zones do not support ALIAS records.
We use A records instead.

@openshift-ci-robot openshift-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label May 7, 2019
@ironcladlou
Copy link
Contributor

/hold

Thanks for taking a stab at this! Before you invest much more time in the patch, you should know we're actively working to replace the internal DNS manager with an external-dns controller implementation.

Do you have any reason to suspect we'll encounter problems with the external-dns Azure provider?

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 7, 2019
@ironcladlou
Copy link
Contributor

I've been reviewing this PR, and it's looking really good. Thank you!

In the background I'm working on figuring out how we can validate this outside of CI. Right now it's looking like we'd want to try and accept this when master re-opens for the 4.2 release, even in advance of our external-dns plans.

@serbrech
Copy link
Contributor Author

serbrech commented May 7, 2019

Thanks for the quick feedback!
There are a couple things I need to validate before I'd remove [WIP], and I didn't run it on a cluster yet.

Q: Why is there a cache for AWS? is that something we'd need to do as well?

I think it makes sense to contribute to external-dns as it's doing pretty much the same thing.
We've used it successfully up to a very high scale, it's solid.

From browsing their code, it finds the Azure DNSZone by resource group.
In the openshift installer, we require the public dns zone to be existing in the subscription prior to deployment, so it is typically in a different RG.
That means it would require some changes, either in external-dns, or in openshift installer.

Meanwhile, this PR, once completed, should be enough to move forward until you make the switch.

@ironcladlou
Copy link
Contributor

@serbrech

Q: Why is there a cache for AWS? is that something we'd need to do as well?

It's a cheap/quick local optimization to drastically reduce chattiness with the Route53 API. I won't be surprised if something similar is needed for Azure in practice — will think on it more.

I think it makes sense to contribute to external-dns as it's doing pretty much the same thing.
We've used it successfully up to a very high scale, it's solid.

Thanks for the feedback. I'm encouraged to hear that!

From browsing their code, it finds the Azure DNSZone by resource group.
In the openshift installer, we require the public dns zone to be existing in the subscription prior to deployment, so it is typically in a different RG.
That means it would require some changes, either in external-dns, or in openshift installer.

Not entirely sure I understand. Are proposing some change to the public zone type in the OpenShift cluster config API? (That's currently the only DNS configuration recognized by the ingress operator.)

@serbrech
Copy link
Contributor Author

serbrech commented May 8, 2019

Not entirely sure I understand. Are proposing some change to the public zone type in the OpenShift cluster config API? (That's currently the only DNS configuration recognized by the ingress operator.)

There is 2 dns zones. private, and public.
the ingress operator is inserting records in both

That is not supported by external-dns.

@danehans
Copy link
Contributor

That is not supported by external-dns.

The ExternalDNS faq covers this limitation in more detail.

@serbrech serbrech force-pushed the azure-provider branch 3 times, most recently from a6f222b to 9020881 Compare May 15, 2019 14:40
@openshift-ci-robot openshift-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 15, 2019
@serbrech
Copy link
Contributor Author

serbrech commented May 19, 2019

logs after applying the azure credentialrequest, with the updated azure credential operator :

% KUBECONFIG=~/tmp/foo5/auth/kubeconfig RELEASE_VERSION=4.2.0-0.okd-2019-05-18-050412 IMAGE=registry.svc.ci.openshift.org/origin/4.2-2019-05-18-050412@sha256:d147b0db308cd33aae85c0147cbc7c0883148e8f197f24e579a6a78d16500e6b WATCH_NAMESPACE=openshift-ingress-operator ./ingress-operator
2019-05-19T10:39:26.364-0700    INFO    operator        log/log.go:26   started zapr logger
2019-05-19T10:39:32.239-0700    INFO    operator.entrypoint     ingress-operator/main.go:154    using azure creds from secret   {"namespace": "openshift-ingress-operator", "name": "cloud-credentials"}
2019-05-19T10:39:48.279-0700    INFO    operator.init.controller-runtime.controller     controller/controller.go:121    Starting EventSource    {"controller": "operator-controller", "source": "kind source: /, Kind="}
2019-05-19T10:39:52.971-0700    INFO    operator.init.controller-runtime.controller     controller/controller.go:121    Starting EventSource    {"controller": "operator-controller", "source": "informer source: 0xc0000eea20"}
2019-05-19T10:39:52.972-0700    INFO    operator.init.controller-runtime.controller     controller/controller.go:121    Starting EventSource    {"controller": "operator-controller", "source": "informer source: 0xc0000eeab0"}
2019-05-19T10:39:52.972-0700    INFO    operator.init.controller-runtime.controller     controller/controller.go:121    Starting EventSource    {"controller": "certificate-controller", "source": "kind source: /, Kind="}
2019-05-19T10:39:52.972-0700    INFO    operator.init.controller-runtime.controller     controller/controller.go:121    Starting EventSource    {"controller": "certificate-publisher-controller", "source": "informer source: 0xc0000eee10"}
2019-05-19T10:39:52.972-0700    INFO    operator.init.controller-runtime.controller     controller/controller.go:121    Starting EventSource    {"controller": "certificate-publisher-controller", "source": "kind source: /, Kind="}
2019-05-19T10:39:52.972-0700    INFO    operator.init   operator/operator.go:187        waiting for cache to sync
2019-05-19T10:39:53.138-0700    INFO    operator.init   operator/operator.go:126        queueing ingress        {"name": "default", "related": "/api/v1/namespaces/openshift-ingress/services/router-internal-default"}
2019-05-19T10:39:53.138-0700    INFO    operator.init   operator/operator.go:126        queueing ingress        {"name": "default", "related": "/api/v1/namespaces/openshift-ingress/services/router-default"}
2019-05-19T10:39:53.216-0700    INFO    operator.init   operator/operator.go:126        queueing ingress        {"name": "default", "related": "/apis/apps/v1/namespaces/openshift-ingress/deployments/router-default"}
2019-05-19T10:39:54.773-0700    INFO    operator.init   operator/operator.go:191        cache synced
2019-05-19T10:39:54.875-0700    INFO    operator.init.controller-runtime.controller     controller/controller.go:134    Starting Controller     {"controller": "certificate-publisher-controller"}
2019-05-19T10:39:54.875-0700    INFO    operator.init.controller-runtime.controller     controller/controller.go:134    Starting Controller     {"controller": "operator-controller"}
2019-05-19T10:39:54.875-0700    INFO    operator.init.controller-runtime.controller     controller/controller.go:134    Starting Controller     {"controller": "certificate-controller"}
2019-05-19T10:39:54.975-0700    INFO    operator.init.controller-runtime.controller     controller/controller.go:154    Starting workers        {"controller": "certificate-publisher-controller", "worker count": 1}
2019-05-19T10:39:54.976-0700    INFO    operator.init.controller-runtime.controller     controller/controller.go:154    Starting workers        {"controller": "operator-controller", "worker count": 1}
2019-05-19T10:39:54.976-0700    INFO    operator.init.controller-runtime.controller     controller/controller.go:154    Starting workers        {"controller": "certificate-controller", "worker count": 1}
2019-05-19T10:39:54.981-0700    INFO    operator.certificate-publisher-controller       certificate-publisher/controller.go:174 Reconciling     {"request": "openshift-ingress-operator/default"}
2019-05-19T10:39:54.981-0700    INFO    operator.controller     controller/controller.go:100    reconciling     {"request": "openshift-ingress-operator/default"}
2019-05-19T10:39:55.063-0700    DEBUG   operator.init.controller-runtime.controller     controller/controller.go:236    Successfully Reconciled {"controller": "certificate-publisher-controller", "request": "openshift-ingress-operator/default"}
2019-05-19T10:39:55.962-0700    DEBUG   operator.init.controller-runtime.controller     controller/controller.go:236    Successfully Reconciled {"controller": "certificate-controller", "request": "openshift-ingress-operator/default"}
2019-05-19T10:40:02.678-0700    ERROR   operator.init.controller-runtime.controller     controller/controller.go:217    Reconciler error        {"controller": "operator-controller", "request": "openshift-ingress-operator/default", "error": "failed to ensure ingresscontroller: [failed to ensure DNS for default: no load balancer is assigned to service openshift-ingress/router-default, failed to integrate metrics with openshift-monitoring for ingresscontroller default: failed to ensure servicemonitor for default: no matches for kind \"ServiceMonitor\" in version \"monitoring.coreos.com/v1\"]", "errorCauses": [{"error": "failed to ensure ingresscontroller: [failed to ensure DNS for default: no load balancer is assigned to service openshift-ingress/router-default, failed to integrate metrics with openshift-monitoring for ingresscontroller default: failed to ensure servicemonitor for default: no matches for kind \"ServiceMonitor\" in version \"monitoring.coreos.com/v1\"]"}]}
2019-05-19T10:40:03.679-0700    INFO    operator.controller     controller/controller.go:100    reconciling     {"request": "openshift-ingress-operator/default"}

serbrech added 2 commits May 23, 2019 13:50
vendoring azure-sdk-for-go
the azure sdk requires fsnotify to be overriden
ref: golang/dep#1799
The azure dns zones do not support ALIAS records.
We use A record instead.
@openshift-ci-robot openshift-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 23, 2019
@serbrech
Copy link
Contributor Author

rebased on master

@serbrech serbrech changed the title [WIP] pkg/dns/azure: implement support for azure pkg/dns/azure: implement support for azure May 23, 2019
ingress := service.Status.LoadBalancer.Ingress
if len(ingress) > 0 && len(ingress[0].Hostname) > 0 {
records, err := desiredDNSRecords(ci, ingress[0].Hostname, dnsConfig)
records, err := desiredDNSAliasRecords(ci, ingress[0].Hostname, dnsConfig)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this needs a similar treatment for .IP handling to avoid leaking A records

@ironcladlou
Copy link
Contributor

I have some additional necessary changes in serbrech#1 which will need incorporated here.

@ironcladlou
Copy link
Contributor

/approve

Will lift the hold once serbrech#1 merges.

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 29, 2019
* Refactor desired DNS functions

Refactor DNS handling to consolidate how the desired set of DNS records is
computed. This ensures that the same function can be used for both setting up
and tearing down DNS.
Add some unit testing around the desired DNS function.

* Add DNS unit tests
* dns/azure: fix delete handling
@ironcladlou
Copy link
Contributor

@serbrech many, many thanks for this one. I'm going to get it merged so we can iterate more quickly going forward.

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label May 30, 2019
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ironcladlou, serbrech

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ironcladlou
Copy link
Contributor

/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 30, 2019
@openshift-merge-robot openshift-merge-robot merged commit 71acd40 into openshift:master May 30, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants