
@cgwalters

We want to support switching cluster network types "day 2".
This may involve a MachineConfig rollout, so using the
host network will avoid potential deadlocks.
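
For context, the change described here boils down to setting hostNetwork: true on the pod template. A minimal sketch, assuming a Deployment-style manifest (abbreviated and illustrative, not the repo's actual file):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: machine-config-controller        # illustrative name
  namespace: openshift-machine-config-operator
spec:
  selector:
    matchLabels:
      k8s-app: machine-config-controller
  template:
    metadata:
      labels:
        k8s-app: machine-config-controller
    spec:
      hostNetwork: true                    # run in the node's network namespace,
                                           # independent of the SDN being rolled out
      dnsPolicy: ClusterFirstWithHostNet   # commonly paired with hostNetwork so
                                           # cluster DNS names still resolve
      containers:
      - name: machine-config-controller
        image: machine-config-operator     # placeholder image reference
```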

@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cgwalters


Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot added the approved label Oct 30, 2020
@cgwalters

It's not 100% clear to me that we don't also need to change the MCO with this. I think the only potential consequence if the MCO is temporarily unable to use the network is that the clusteroperator status won't update; otherwise, once a control plane rollout is done and the node hosting the MCO is drained, if the MCO lands on an updated node then status should start working again.

But maybe it would just overall be easier to understand if we made all containers in the MCO hostNetwork: true and hence fully severed any dependency on the SDN.

@sinnykumari

But maybe it would just overall be easier to understand if we made all containers in the MCO hostNetwork: true and hence fully severed any dependency on the SDN.

+1

I am curious why the MCO pods didn't opt for host networking from the beginning.

@kikisdeliveryservice

But maybe it would just overall be easier to understand if we made all containers in the MCO hostNetwork: true and hence fully severed any dependency on the SDN.

+1 As someone who would be confused later... I think this is the way to go unless there are good reasons not to?

@pliurh commented Nov 2, 2020

But maybe it would just overall be easier to understand if we made all containers in the MCO hostNetwork: true and hence fully severed any dependency on the SDN.

+1

We want to support switching cluster network types "day 2".
This may involve a MachineConfig rollout, so using the
host network will avoid potential deadlocks.

The only thing that the MCO should need is access to the apiserver;
we don't talk to any in-cluster services.
@cgwalters changed the title from "mcc: Use hostNetwork: true" to "Use hostNetwork: true for operator and controller" Nov 2, 2020
@cgwalters

Updated to make the operator use hostNetwork: true too.

@openshift-merge-robot

@cgwalters: The following test failed, say /retest to rerun all failed tests:

ci/prow/e2e-aws-serial (commit 1d7be29): rerun with /test e2e-aws-serial


@pliurh commented Nov 17, 2020

@cgwalters If we want to decouple the MCO from the clusterNetwork, using hostNetwork may not be sufficient. Currently, the MCO uses the in-cluster kube client configuration, which talks to the apiserver service VIP, 172.30.0.1. Even on the host network, the route to this VIP is handled by the clusterNetwork provider. So if we want the MCO to have no dependency on the clusterNetwork, we should also use api-int.<cluster-domain> to communicate with the apiserver.
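
One way to illustrate that point: client-go's in-cluster configuration builds the apiserver URL from the KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT environment variables, so a pod spec can point it at api-int instead of the service VIP. A hypothetical sketch, not the change this PR makes; it also assumes the pod's service-account CA bundle validates the api-int serving certificate:

```yaml
# Hypothetical excerpt of a pod template. rest.InClusterConfig() reads these
# two variables to build the apiserver URL, so overriding them steers traffic
# away from the 172.30.0.1 service VIP.
spec:
  template:
    spec:
      hostNetwork: true
      containers:
      - name: machine-config-operator       # illustrative container name
        env:
        - name: KUBERNETES_SERVICE_HOST
          value: api-int.<cluster-domain>   # placeholder; the cluster's internal API DNS name
        - name: KUBERNETES_SERVICE_PORT
          value: "6443"                     # the port api-int serves on
```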

@crawford

Is this part of a larger epic or enhancement?

@darkmuggle

/retest

The SDN Team is asking that we include this in 4.8. AFAIK, this looks fine.
@skinny or @kikisdeliveryservice: to give the +1, would one of you feel comfortable with an LGTM?

@sinnykumari

With my limited cluster networking knowledge I can't think of any issue; this PR looks good to me as it is.
Do we have CI coverage somewhere that exercises the day-2 networking switch?

@kikisdeliveryservice

Is this part of a larger epic or enhancement?

@pliurh could you answer the above?

@pliurh commented Jan 22, 2021

Here is the story for 4.8: https://issues.redhat.com/browse/SDN-1546

Currently, we don't have CI for the SDN migration, because during the migration a manual reboot step is required, which I haven't figured out how to do in openshift-ci. We used to have a baremetal CI environment for that, but it was put to other uses. :(

My two cents: putting the SDN migration requirement aside, and considering the MCO's position, I think it still makes sense to decouple the MCO from the cluster network.

@cgwalters

Because during the migration a manual reboot step is required, which I haven't figured out how to do in openshift-ci.

Just cordon/drain the node and then use e.g. oc debug node/$node -- chroot /host reboot.

@danwinship

I am curious why the MCO pods didn't opt for host networking from the beginning.

+1 As someone who would be confused later... I think this is the way to go unless there are good reasons not to?

For pods that don't expose any services, the argument against being hostNetwork is mostly just that it grants more privilege (e.g., you can no longer restrict the pods with NetworkPolicy). In particular, pods that are both privileged and hostNetwork can do a lot of damage if they are compromised. (But neither the MCO nor the MCC is privileged.)
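
To make the NetworkPolicy point concrete: a pod-selector policy like the hypothetical one below can constrain the controller while it is on the cluster network, but once the pod runs with hostNetwork: true its traffic appears as node traffic and such policies no longer apply. The policy name and pod label are assumed for illustration:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: restrict-mcc-egress               # hypothetical name
  namespace: openshift-machine-config-operator
spec:
  podSelector:
    matchLabels:
      k8s-app: machine-config-controller  # assumed label
  policyTypes:
  - Egress
  egress: []   # no egress rules listed: all egress from matched pods is denied
```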

(If either pod exposes any ports [for metrics?] you will need to update https://github.com/openshift/enhancements/blob/master/enhancements/network/host-port-registry.md)

So if we want the MCO to have no dependency on the clusterNetwork, we should also use api-int.<cluster-domain> to communicate with the apiserver.

cf #2232

openshift-ci bot commented May 21, 2021

@cgwalters: The following test failed, say /retest to rerun all failed tests:

ci/prow/e2e-agnostic-upgrade (commit 1d7be29): rerun with /test e2e-agnostic-upgrade


@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-ci bot added the lifecycle/stale label Aug 20, 2021
@openshift-bot

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

openshift-ci bot added the lifecycle/rotten label and removed the lifecycle/stale label Sep 19, 2021
@openshift-bot

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

openshift-ci bot commented Oct 19, 2021

@cgwalters: PR needs rebase.


openshift-ci bot added the needs-rebase label Oct 19, 2021
openshift-ci bot closed this Oct 19, 2021
openshift-ci bot commented Oct 19, 2021

@openshift-bot: Closed this PR.


In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close
