OCPBUGS-19552: Fixed DNS issues in OKD/FCOS due to split dns in systemd-resolved #7516
|
@JM1: This pull request references Jira Issue OCPBUGS-19552, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug.
Requesting review from QA contact:
The bug has been updated to refer to the pull request using the external bug tracker. |
|
great find and commit msg 👍 /lgtm |
|
Force-pushed from c4861fb to b6cc933
|
Last force-push fixes the commit message only. |
|
/lgtm |
…md-resolved
OKD/FCOS uses FCOS without OKD/OCP code as bootimages for IPI, SNO and Agent-based Installer. FCOS uses systemd-resolved for handling DNS queries.
During installation, the bootstrap node or rendezvous host executes a bootkube.sh script which queries the cluster api.* and api-int.* endpoints to detect when the in-cluster control plane has come up. The DNS name resolution of these endpoints is supposed to be handled by CoreDNS, which listens on 0.0.0.0:53 on the same node and is registered as 127.0.0.1 in /etc/resolv.conf.
Some tools such as curl, which are called via bootkube.sh [0], do not query /etc/resolv.conf but use Name Service Switch (NSS) [1], which delegates queries to systemd-resolved. systemd-resolved uses split DNS [2] to route DNS queries to specific nameservers depending on the DNS domain settings of the network interfaces.
When the bootstrap node or rendezvous host is booted with FCOS, it retrieves its network configuration via DHCP. The domain record received via DHCP (which is part of the OKD cluster domain) is associated with the network interface that handled the DHCP exchange. This causes systemd-resolved to send all DNS queries for the OKD cluster domain to the network DNS server, but never to CoreDNS.
OKD and OCP do not require the network DNS server to be able to resolve the api-int.* endpoint on bare-metal servers for HA deployments with IPI or Agent-based Installer (for SNO it is required). When api-int.* cannot be resolved by the network DNS servers, services such as bootkube.sh on the rendezvous host (from Agent-based Installer) cannot query the in-cluster control plane and the installation stalls (because systemd-resolved will not use CoreDNS due to split DNS).
With this change, the cluster domain is associated with the DNS server at 127.0.0.1, which is CoreDNS. systemd-resolved will then properly query CoreDNS when resolving api-int.* and Agent-based Installer can finish successfully.
[0] https://github.com/openshift/installer/blob/master/data/data/bootstrap/files/usr/local/bin/bootstrap-verify-api-server-urls.sh
[1] https://www.mankier.com/5/nsswitch.conf
[2] https://fedoramagazine.org/systemd-resolved-introduction-to-split-dns/
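The split DNS behaviour described in this commit message can be illustrated with a small Python model. It is a simplified sketch of the documented systemd-resolved routing rule (the scope whose routing domain matches the name with the most labels wins, ties go to all matching scopes, and unmatched names go to default-route scopes), not code from this PR. The `Scope` class, the `route` function, the interface name `ens3`, the cluster domain `mycluster.example.com` and the network DNS address `192.0.2.53` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scope:
    """One DNS scope (a network link or the global config), heavily simplified."""
    name: str
    servers: tuple
    domains: tuple = ()
    default_route: bool = False


def labels(domain):
    """Split a DNS name into lower-case labels, ignoring a trailing dot."""
    return [part for part in domain.lower().strip(".").split(".") if part]


def match_len(name, domain):
    """Number of labels in `domain` if `name` equals it or is a subdomain, else 0."""
    n, d = labels(name), labels(domain)
    if d and len(n) >= len(d) and n[-len(d):] == d:
        return len(d)
    return 0


def route(name, scopes):
    """Return the sorted list of DNS servers a query for `name` is sent to."""
    best = max((match_len(name, d) for s in scopes for d in s.domains), default=0)
    if best > 0:
        # Best match wins; scopes that tie on the best match all get the query.
        chosen = [s for s in scopes
                  if any(match_len(name, d) == best for d in s.domains)]
    else:
        # No routing domain matched: fall back to default-route scopes.
        chosen = [s for s in scopes if s.default_route]
    return sorted({srv for s in chosen for srv in s.servers})


CLUSTER = "mycluster.example.com"  # hypothetical <cluster_domain>

# Before the change: only the DHCP-configured interface carries the cluster domain.
before = [Scope("ens3", ("192.0.2.53",), (CLUSTER,), default_route=True)]

# After the change: the cluster domain is also associated with 127.0.0.1 (CoreDNS).
after = before + [Scope("coredns", ("127.0.0.1",), (CLUSTER,))]

print(route("api-int." + CLUSTER, before))  # network DNS server only
print(route("api-int." + CLUSTER, after))   # CoreDNS is now queried as well
```

In this model the query for api-int.* never reaches 127.0.0.1 before the change, which corresponds to the stalled installation described above.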
Force-pushed from b6cc933 to ed594dc
|
FYI Last patch is rebased on top of master and resolves merge conflicts. |
|
/lgtm |
|
/retest-required |
|
/approve |
|
Note: the CI still has some issues (not related to this patch), and they are affecting #7505 as well |
|
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: andfasano |
|
|
/hold cancel |
|
/retest-required |
|
/hold Revision ed594dc was retested 3 times: holding |
|
|
Note: will require #7484 for a proper agent testing |
|
/retest-required |
|
/lgtm Makes sense. This shouldn't affect the non-okd jobs since ocp doesn't use systemd-resolved for resolution. |
Thanks @cybertron for the feedback |
|
@JM1: The following tests failed, say
|
/retest-required |
|
/hold cancel |
|
@JM1: Jira Issue OCPBUGS-19552: All pull requests linked via external trackers have merged: Jira Issue OCPBUGS-19552 has been moved to the MODIFIED state. |
OCP requires DNS records api.<cluster_domain> and *.apps.<cluster_domain> to be externally resolvable (<cluster_domain> is <cluster_name>.<base_domain>). For SNO this list also includes DNS record api-int.<cluster_domain>. However, OCP does not enforce ownership of all subdomains of <cluster_domain>. For example, it is allowed to host a disconnected image registry at <registry_hostname>.<cluster_domain>, and OCP shall be able to resolve it using the user-supplied external DNS resolver.
PR openshift#7516 changed the systemd-resolved config of the bootstrap node / rendezvous host to associate the complete <cluster_domain> with the DNS server at 127.0.0.1 where CoreDNS is supposed to be listening.
When a disconnected image registry is used for cluster installation, the registry is hosted at <registry_hostname>.<cluster_domain> and the bootstrap node / rendezvous host does not retrieve its domain from the DHCP server, then the registry's DNS name cannot be resolved. That is because the disconnected registry must be reachable in order to pull the CoreDNS image. The split DNS mechanism of systemd-resolved would cause it to send DNS requests for <registry_hostname>.<cluster_domain> to 127.0.0.1, where CoreDNS is expected to be running but is not.
When a bootstrap node / rendezvous host retrieves its domain <cluster_domain> from a DHCP server (e.g. dnsmasq's '--domain' option), then systemd-resolved would associate <cluster_domain> not only with 127.0.0.1 but also with the physical network interface, causing DNS requests for <registry_hostname>.<cluster_domain> to be sent out to 127.0.0.1 as well as the external DNS resolver.
This patch mitigates the DNS issue for other network setups. It changes the systemd-resolved config to forward DNS requests to CoreDNS only for domains which are resolvable by CoreDNS:
* api.<cluster_domain>
* api-int.<cluster_domain>
* apps.<cluster_domain>
DNS requests for <registry_hostname>.<cluster_domain> and other subdomains of <cluster_domain> will be sent out to the external DNS resolver.
Fixes openshift#7516
(cherry picked from commit 5380ad9)
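The effect of narrowing the routing domains can be illustrated with a small Python model. It is a simplified sketch of the documented systemd-resolved rule (the routing domain matching with the most labels wins, ties go to all matching scopes, unmatched names go to default-route scopes) and not the actual configuration shipped by the patch. The `Scope` class, the `route` function, the interface name `ens3`, the names `mycluster.example.com` and `registry`, and the external resolver address `192.0.2.53` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scope:
    """One DNS scope (a network link or the global config), heavily simplified."""
    name: str
    servers: tuple
    domains: tuple = ()
    default_route: bool = False


def labels(domain):
    """Split a DNS name into lower-case labels, ignoring a trailing dot."""
    return [part for part in domain.lower().strip(".").split(".") if part]


def match_len(name, domain):
    """Number of labels in `domain` if `name` equals it or is a subdomain, else 0."""
    n, d = labels(name), labels(domain)
    if d and len(n) >= len(d) and n[-len(d):] == d:
        return len(d)
    return 0


def route(name, scopes):
    """Return the sorted list of DNS servers a query for `name` is sent to."""
    best = max((match_len(name, d) for s in scopes for d in s.domains), default=0)
    if best > 0:
        chosen = [s for s in scopes
                  if any(match_len(name, d) == best for d in s.domains)]
    else:
        chosen = [s for s in scopes if s.default_route]
    return sorted({srv for s in chosen for srv in s.servers})


CLUSTER = "mycluster.example.com"  # hypothetical <cluster_domain>
REGISTRY = "registry." + CLUSTER   # hypothetical disconnected registry

# Interface without a DHCP-provided domain, pointing at the external resolver.
link = Scope("ens3", ("192.0.2.53",), (), default_route=True)

# Whole cluster domain routed to CoreDNS (the situation this patch addresses).
broad = [link, Scope("coredns", ("127.0.0.1",), (CLUSTER,))]

# Only the names CoreDNS can answer are routed to 127.0.0.1.
narrow = [link, Scope("coredns", ("127.0.0.1",),
                      ("api." + CLUSTER, "api-int." + CLUSTER, "apps." + CLUSTER))]

print(route(REGISTRY, broad))                # only 127.0.0.1: registry unresolvable
print(route(REGISTRY, narrow))               # external resolver
print(route("api-int." + CLUSTER, narrow))   # CoreDNS
print(route("console.apps." + CLUSTER, narrow))  # CoreDNS (subdomain of apps.*)
```

Under these assumptions, the registry lookup leaves the node again once only the three listed names are routed to 127.0.0.1, while api-int.* is still answered by CoreDNS.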
cc @vrutkovs @andfasano @LorbusChris