@cybertron
Member

This is to enable more flexibility about when the networking services
are deployed. With the api-int record in /etc/hosts we don't need
coredns to be running for a node to connect to the cluster.

@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: cybertron
To complete the pull request process, please assign runcom after the PR has been reviewed.
You can assign the PR to them by writing /assign @runcom in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@cybertron
Member Author

/test e2e-openstack
/test e2e-ovirt
/test e2e-vsphere
/cc @mandre @rgolangh @jcpowermac

@@ -0,0 +1,7 @@
mode: 0644
path: "/etc/hosts"
Member

People may have legitimately added custom content in here before, I don't think we can unilaterally own this file - particularly on upgrades.

I approve the general idea (this also seems related to #2190 right?).

It's too bad there's not a simple /etc/hosts.d but I guess everyone who wants this ends up using a local resolver.
So that's one option...we could switch to on-host dnsmasq or resolved.

Or we could try to hack in some sort of Ansible-like "lineinfile" support to the MCO just for this.

Member Author

This is primarily targeted at enabling openshift/enhancements#524

The problem with using dnsmasq for this is that we already run a local DNS server for the other records. It's possible we could make it work, but it gets tricky and makes the DNS flow even more complicated. Right now we configure a local server that forwards to the previously configured servers in resolv.conf. This would add another dnsmasq somewhere in that flow. I guess if there's no other way to do it then we'll have to eat the complexity, but it's not ideal either.

I also looked at using the append option in Ignition and had that working. It's a weird fit for the MCO, though, and I couldn't get it to work there, so I switched to just templating the whole file. It's another thing I could investigate further.

Member

@cgwalters cgwalters Nov 19, 2020

OK I see. Hum...do we have any offhand guesses around whether my concern about customers injecting /etc/hosts via Ignition is a real concern? It may not happen. If it does though, what would we tell them for configuring static resolution?

Perhaps we can implement our own "merge /etc/hosts" logic by having code in the MCO that on startup takes the current content, appends this data to it into e.g. /run/hosts and then does mount --bind /run/hosts /etc/hosts or so.
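
A minimal sketch of that merge idea, written in Python purely for illustration (the MCO itself is not Python). The names `merge_hosts`, `write_merged`, and the marker comment are hypothetical, and the bind mount appears only as a comment because it needs root:

```python
from __future__ import annotations

# Hypothetical tag used to recognise lines written by an earlier run.
MARKER = "# managed-by-mco (hypothetical marker)"


def merge_hosts(current: str, managed_records: list[str]) -> str:
    """Return the current hosts content with the managed records appended.

    Lines carrying MARKER from an earlier run are dropped first, so running
    the merge repeatedly does not accumulate duplicates.
    """
    kept = [line for line in current.splitlines() if not line.endswith(MARKER)]
    tagged = [f"{record} {MARKER}" for record in managed_records]
    return "\n".join(kept + tagged) + "\n"


def write_merged(src: str = "/etc/hosts", dst: str = "/run/hosts",
                 records: tuple[str, ...] = ()) -> None:
    """Write the merged file to dst; the caller would then bind-mount it."""
    with open(src, encoding="utf-8") as f:
        merged = merge_hosts(f.read(), list(records))
    with open(dst, "w", encoding="utf-8") as f:
        f.write(merged)
    # Next step (needs root, not performed here):
    #   mount --bind /run/hosts /etc/hosts
```

Stripping previously tagged lines matters because, once the bind mount is in place, reading /etc/hosts returns the merged file rather than the original.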

That said it feels like what we really want to achieve here is switch over the host's resolver to coredns but only if coredns is active and working? Basically if coredns is up, update /etc/resolv.conf to use 127.0.0.1:53. It's a bit like where we ended up with #2011 - we have a systemd unit which monitors a pod, and if the pod is down takes action.
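
A rough sketch of that switch-over, again in illustrative Python with hypothetical function names. It assumes a TCP connect to port 53 is an acceptable liveness probe, which is weaker than actually resolving a name. resolv.conf has no port syntax, so the local entry is written as a plain `nameserver 127.0.0.1` with port 53 implied:

```python
from __future__ import annotations

import socket

LOCAL = "nameserver 127.0.0.1"


def local_dns_reachable(host: str = "127.0.0.1", port: int = 53,
                        timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port.

    This is only a liveness probe; it does not prove that queries resolve.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def render_resolv_conf(upstream: str, use_local: bool) -> str:
    """Return resolv.conf text with the local resolver first when use_local is set.

    Any existing 127.0.0.1 entry is removed first, so the function can be
    applied repeatedly and can also undo an earlier switch.
    """
    lines = [line for line in upstream.splitlines() if line.strip() != LOCAL]
    if use_local:
        lines.insert(0, LOCAL)
    return "\n".join(lines) + "\n"
```

A monitoring unit could call `render_resolv_conf(text, local_dns_reachable())` on a timer and rewrite the file only when the result differs from what is on disk.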


@Miciah Miciah Nov 20, 2020

> People may have legitimately added custom content in here before, I don't think we can unilaterally own this file - particularly on upgrades.

The DNS operator adds an entry for the cluster image registry to /etc/hosts so that the container runtime can pull from the registry: https://github.com/openshift/cluster-dns-operator/blob/0d46f0303675473ce5d022904ddaff23132bac97/assets/dns/daemonset.yaml#L90-L146

Would MCO overwrite /etc/hosts only on upgrades, every time the DNS operator modified it (the DNS operator checks minutely for the entry that it adds to /etc/hosts and adds it if it is missing or outdated), or what?
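
For context, the periodic check described above amounts to an idempotent reconcile. The sketch below is illustrative Python, not the DNS operator's actual implementation (that lives in the linked daemonset.yaml); `reconcile_entry` and the sample addresses are hypothetical:

```python
from __future__ import annotations


def reconcile_entry(hosts_text: str, ip: str, name: str) -> tuple[str, bool]:
    """Ensure exactly one line maps name to ip; return (new_text, changed).

    Lines that do not mention name are left untouched. Lines that do mention
    it are treated as owned by this function and replaced when outdated.
    """
    wanted = f"{ip} {name}"
    out = []
    found = False
    for line in hosts_text.splitlines():
        # Ignore trailing comments when looking at the hostname fields.
        fields = line.split("#", 1)[0].split()
        if name in fields[1:]:
            if fields == [ip, name] and not found:
                found = True
                out.append(line)
            # Otherwise the entry is outdated or a duplicate: drop it.
            continue
        out.append(line)
    if not found:
        out.append(wanted)
    new_text = "\n".join(out) + "\n"
    return new_text, new_text != hosts_text
```

If the MCO rewrote the whole file, a loop like this would report `changed` on its next pass and restore the entry, so the two writers would keep overwriting each other.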

> It's too bad there's not a simple /etc/hosts.d but I guess everyone who wants this ends up using a local resolver.

Yeah, something similar to /etc/NetworkManager/dnsmasq.d/. A /etc/hosts.d/ or similar would be useful to the DNS operator.

> That said it feels like what we really want to achieve here is switch over the host's resolver to coredns but only if coredns is active and working? Basically if coredns is up, update /etc/resolv.conf to use 127.0.0.1:53. It's a bit like where we ended up with #2011 - we have a systemd unit which monitors a pod, and if the pod is down takes action.

That would be terrific; it would obviate the need for the DNS operator to update /etc/hosts and would solve several related longstanding problems.

Member Author

Oops, I missed these comments. We did discuss the possibility of dynamically switching between dnsmasq and coredns for local resolution. It's still a bit complicated, but certainly an option.

I just pushed an alternate approach that uses a simple service to do the appending. It's fairly primitive, but it does what we need and shouldn't mess with any other customizations to /etc/hosts: #2258

@jcpowermac
Contributor

cc: @patrickdillon

@dougsland
Contributor

Looks good to me (from oVirt), but we need to check @cgwalters's comment.

@openshift-merge-robot
Contributor

@cybertron: The following tests failed, say /retest to rerun all failed tests:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| ci/prow/e2e-ovirt | e248aae | link | /test e2e-ovirt |
| ci/prow/e2e-gcp-op | e248aae | link | /test e2e-gcp-op |
| ci/prow/e2e-vsphere | e248aae | link | /test e2e-vsphere |
| ci/prow/e2e-aws-workers-rhel7 | e248aae | link | /test e2e-aws-workers-rhel7 |
| ci/prow/e2e-ovn-step-registry | e248aae | link | /test e2e-ovn-step-registry |
| ci/prow/okd-e2e-aws | e248aae | link | /test okd-e2e-aws |
| ci/prow/e2e-openstack | e248aae | link | /test e2e-openstack |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

path: "/etc/hosts"
contents:
  inline: |
    127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
Contributor

I guess we should use $cluster_name and $domain_name instead of 'ostest.test.metalkube.org', right?

Member Author

Whoops, yeah.
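
To illustrate the substitution being asked for, here is a small Python sketch using `string.Template`, whose `$name` placeholders happen to match the reviewer's notation. The real MCO templates are not Python, and the `api-int.<cluster>.<domain>` record layout, the `$api_int_ip` placeholder, and the sample address are assumptions made for the example:

```python
from string import Template

# Hypothetical template: the localhost line comes from the diff above; the
# api-int line and its placeholders are assumed for illustration.
HOSTS_TEMPLATE = Template(
    "127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4\n"
    "$api_int_ip api-int.$cluster_name.$domain_name\n"
)


def render_hosts(cluster_name: str, domain_name: str, api_int_ip: str) -> str:
    """Fill in the per-cluster values instead of a hardcoded hostname."""
    return HOSTS_TEMPLATE.substitute(
        cluster_name=cluster_name,
        domain_name=domain_name,
        api_int_ip=api_int_ip,
    )
```

With `cluster_name="ostest"` and `domain_name="test.metalkube.org"` the second line reproduces the hardcoded hostname from the diff, while any other cluster gets its own values. `substitute` raises `KeyError` if a placeholder is left unfilled, which surfaces a missing value early.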

@kikisdeliveryservice
Contributor

This is superseded by #2258, no?

@celebdor
Contributor

celebdor commented Dec 4, 2020

/close

@openshift-ci-robot
Contributor

@celebdor: Closed this PR.

In response to this:

> /close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

10 participants