Move api-int record from coredns to /etc/hosts #2236

cybertron · 2020-11-17T20:04:20Z

This is to enable more flexibility about when the networking services
are deployed. With the api-int record in /etc/hosts we don't need
coredns to be running for a node to connect to the cluster.

openshift-ci-robot · 2020-11-17T20:04:40Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: cybertron
To complete the pull request process, please assign runcom after the PR has been reviewed.
You can assign the PR to them by writing /assign @runcom in a comment when ready.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

cybertron · 2020-11-17T20:05:59Z

/test e2e-openstack
/test e2e-ovirt
/test e2e-vsphere
/cc @mandre @rgolangh @jcpowermac

cgwalters · 2020-11-17T20:06:53Z

templates/common/on-prem/files/etc-hosts.yaml

@@ -0,0 +1,7 @@
+mode: 0644
+path: "/etc/hosts"


People may have legitimately added custom content in here before, I don't think we can unilaterally own this file - particularly on upgrades.

I approve the general idea (this also seems related to #2190 right?).

It's too bad there's not a simple /etc/hosts.d but I guess everyone who wants this ends up using a local resolver.
So that's one option...we could switch to on-host dnsmasq or resolved.

Or we could try to hack in some sort of Ansible-like "lineinfile" support to the MCO just for this.
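As an illustration of the "lineinfile" idea, here is a minimal Python sketch. It is not MCO code: the function name `ensure_hosts_entry`, the `# managed-api-int` marker comment, and the example addresses are all assumptions made up for this example. The point is that every line the function does not own is preserved, and the single managed line is added or refreshed.

```python
def ensure_hosts_entry(content: str, ip: str, hostname: str,
                       marker: str = "# managed-api-int") -> str:
    """Return hosts-file text containing exactly one managed line.

    Lines without the marker (a hypothetical convention for this sketch)
    are kept verbatim, so custom entries survive. Any older managed line
    is dropped and the current one is appended at the end.
    """
    wanted = f"{ip} {hostname} {marker}"
    kept = [line for line in content.splitlines()
            if not line.rstrip().endswith(marker)]
    kept.append(wanted)
    return "\n".join(kept) + "\n"
```

Calling the function a second time with the same arguments returns the same text, so it could be run repeatedly without accumulating duplicate entries.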

This is primarily targeted at enabling openshift/enhancements#524

The problem with using dnsmasq for this is that we already run a local DNS server for the other records. It's possible we could make it work, but it gets tricky and makes the DNS flow even more complicated. Right now we configure a local server that forwards to the previously configured servers in resolv.conf. This would add another dnsmasq somewhere in that flow. I guess if there's no other way to do it then we'll have to eat the complexity, but it's not ideal either.

I also looked at using the append option in Ignition and had that working on its own. It's a weird fit for the MCO, though, and I couldn't get it to work there, so I switched to just templating the whole file. It's another thing I could investigate further.

OK I see. Hum...do we have any offhand guesses around whether my concern about customers injecting /etc/hosts via Ignition is a real concern? It may not happen. If it does though, what would we tell them for configuring static resolution?

Perhaps we can implement our own "merge /etc/hosts" logic by having code in the MCO that, on startup, takes the current content, appends this data to it, writes the result to e.g. /run/hosts, and then does mount --bind /run/hosts /etc/hosts or so.
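The merge-then-bind-mount idea can be sketched as follows. This is a hedged illustration in Python, not how the MCO implements anything: `build_merged_hosts` and its parameters are hypothetical names, and the mount command is only returned, never executed, because running it needs root.

```python
from pathlib import Path
from typing import List


def build_merged_hosts(source: Path, target: Path,
                       extra_lines: List[str]) -> List[str]:
    """Write source content plus extra_lines to target.

    Returns the bind-mount command that would overlay target on source.
    The source file itself is never modified.
    """
    current = source.read_text()
    # Make sure the appended entries start on their own line.
    if current and not current.endswith("\n"):
        current += "\n"
    target.write_text(current + "".join(line + "\n" for line in extra_lines))
    return ["mount", "--bind", str(target), str(source)]
```

One caveat with this approach: once the bind mount is active, reading /etc/hosts returns the merged file, so a second run would need to read the original from somewhere else or filter out the entries it added earlier.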

That said it feels like what we really want to achieve here is switch over the host's resolver to coredns but only if coredns is active and working? Basically if coredns is up, update /etc/resolv.conf to use 127.0.0.1:53. It's a bit like where we ended up with #2011 - we have a systemd unit which monitors a pod, and if the pod is down takes action.
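A rough Python sketch of the "use coredns only while it is up" idea follows. Everything here is an assumption for illustration: the function names are hypothetical, the liveness probe is a plain TCP connect to port 53 (a real check would more likely issue an actual DNS query), and the upstream list stands in for the servers previously configured in resolv.conf.

```python
import socket
from typing import Iterable


def coredns_is_up(host: str = "127.0.0.1", port: int = 53,
                  timeout: float = 1.0) -> bool:
    """Best-effort probe: can a TCP connection to the DNS port be opened?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def render_resolv_conf(upstreams: Iterable[str], local_up: bool) -> str:
    """Point at the local resolver when it is up, else at the upstreams."""
    servers = ["127.0.0.1"] if local_up else list(upstreams)
    return "".join(f"nameserver {server}\n" for server in servers)
```

A monitoring unit could call `render_resolv_conf(upstreams, coredns_is_up())` periodically and rewrite /etc/resolv.conf only when the rendered text changes.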

> People may have legitimately added custom content in here before, I don't think we can unilaterally own this file - particularly on upgrades.

The DNS operator adds an entry for the cluster image registry to /etc/hosts so that the container runtime can pull from the registry: https://github.com/openshift/cluster-dns-operator/blob/0d46f0303675473ce5d022904ddaff23132bac97/assets/dns/daemonset.yaml#L90-L146

Would the MCO overwrite /etc/hosts only on upgrades, or every time the DNS operator modifies it? (The DNS operator checks every minute for the entry that it adds to /etc/hosts and re-adds it if it is missing or outdated.)

> It's too bad there's not a simple /etc/hosts.d but I guess everyone who wants this ends up using a local resolver.

Yeah, something similar to /etc/NetworkManager/dnsmasq.d/. An /etc/hosts.d/ or similar would be useful to the DNS operator.

> That said it feels like what we really want to achieve here is switch over the host's resolver to coredns but only if coredns is active and working? Basically if coredns is up, update /etc/resolv.conf to use 127.0.0.1:53. It's a bit like where we ended up with #2011 - we have a systemd unit which monitors a pod, and if the pod is down takes action.

That would be terrific: it would obviate the need for the DNS operator to update /etc/hosts and would solve several related longstanding problems.

Oops, I missed these comments. We did discuss the possibility of dynamically switching between dnsmasq and coredns for local resolution. It's still a bit complicated, but certainly an option.

I just pushed an alternate approach that uses a simple service to do the appending. It's fairly primitive, but it does what we need and shouldn't mess with any other customizations to /etc/hosts: #2258

jcpowermac · 2020-11-17T20:28:03Z

cc: @patrickdillon

dougsland · 2020-11-17T21:03:55Z

looks good to me (from ovirt) but need to check the @cgwalters comment.

openshift-merge-robot · 2020-11-18T00:38:51Z

@cybertron: The following tests failed, say /retest to rerun all failed tests:

Test name	Commit	Details	Rerun command
ci/prow/e2e-ovirt	`e248aae`	link	`/test e2e-ovirt`
ci/prow/e2e-gcp-op	`e248aae`	link	`/test e2e-gcp-op`
ci/prow/e2e-vsphere	`e248aae`	link	`/test e2e-vsphere`
ci/prow/e2e-aws-workers-rhel7	`e248aae`	link	`/test e2e-aws-workers-rhel7`
ci/prow/e2e-ovn-step-registry	`e248aae`	link	`/test e2e-ovn-step-registry`
ci/prow/okd-e2e-aws	`e248aae`	link	`/test okd-e2e-aws`
ci/prow/e2e-openstack	`e248aae`	link	`/test e2e-openstack`

yboaron · 2020-11-18T08:34:43Z

templates/common/on-prem/files/etc-hosts.yaml

+path: "/etc/hosts"
+contents:
+  inline: |
+      127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4


I guess we should use $cluster_name and $domain_name instead of 'ostest.test.metalkube.org', right?

Whoops, yeah.
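To make the suggestion concrete, here is a small Python sketch of rendering the api-int line from variables instead of a hard-coded name. The `$cluster_name` and `$domain_name` placeholders follow the wording of the comment above; this is not the MCO's actual template syntax, and `$api_int_ip`, the function name, and the example address are assumptions for illustration.

```python
from string import Template

# Hypothetical template for the single api-int line in /etc/hosts.
HOSTS_ENTRY = Template("$api_int_ip api-int.$cluster_name.$domain_name\n")


def render_api_int_entry(api_int_ip: str, cluster_name: str,
                         domain_name: str) -> str:
    """Fill in the api-int hosts entry for one cluster."""
    return HOSTS_ENTRY.substitute(
        api_int_ip=api_int_ip,
        cluster_name=cluster_name,
        domain_name=domain_name,
    )
```

For example, assuming the hard-coded name splits into cluster name `ostest` and domain `test.metalkube.org`, `render_api_int_entry("192.0.2.5", "ostest", "test.metalkube.org")` yields a line for `api-int.ostest.test.metalkube.org`.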

kikisdeliveryservice · 2020-12-02T19:06:53Z

This is superseded by #2258, no?

celebdor · 2020-12-04T18:05:39Z

/close

openshift-ci-robot · 2020-12-04T18:05:55Z

@celebdor: Closed this PR.

Move api-int record from coredns to /etc/hosts

e248aae

openshift-ci-robot requested review from kikisdeliveryservice and yuqi-zhang November 17, 2020 20:04

openshift-ci-robot requested review from jcpowermac, mandre and rgolangh November 17, 2020 20:06

cgwalters requested changes Nov 17, 2020

yboaron reviewed Nov 18, 2020

cgwalters mentioned this pull request Dec 3, 2020

4.5 -> 4.6, api-int resolution issues due to nsswitch change in Fedora 33 okd-project/okd#401

Closed

openshift-ci-robot closed this Dec 4, 2020

cybertron mentioned this pull request Jan 12, 2021

PoC of /etc/hosts.d functionality #2334

Closed

First review thread, in order: cgwalters (Nov 17, 2020), cybertron (Nov 17, 2020), cgwalters (Nov 19, 2020, edited), Miciah (Nov 20, 2020, edited), cybertron (Nov 25, 2020).

Second review thread, in order: yboaron (Nov 18, 2020), cybertron (Nov 18, 2020).

10 participants
