
@cgwalters

We want to support switching cluster network types "day 2".
This may involve a MachineConfig rollout, so using the
host network will avoid potential deadlocks.
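
For context, the change described here boils down to setting hostNetwork: true on the pod template. A minimal sketch, assuming a Deployment-style manifest (abbreviated and illustrative, not the repo's actual file):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: machine-config-controller        # illustrative name
  namespace: openshift-machine-config-operator
spec:
  selector:
    matchLabels:
      k8s-app: machine-config-controller
  template:
    metadata:
      labels:
        k8s-app: machine-config-controller
    spec:
      hostNetwork: true                    # run in the node's network namespace,
                                           # independent of the SDN being rolled out
      dnsPolicy: ClusterFirstWithHostNet   # commonly paired with hostNetwork so
                                           # cluster DNS names still resolve
      containers:
      - name: machine-config-controller
        image: machine-config-operator     # placeholder image reference
```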

@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cgwalters


Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot added the approved label Oct 30, 2020
@cgwalters

It's not 100% clear to me that we don't also need to change the MCO with this. I think the only potential consequence if the MCO is temporarily unable to use the network is that the clusteroperator status won't update; otherwise, once a control plane rollout is done and the node hosting the MCO is drained, if the MCO lands on an updated node then status should start working again.

But maybe it would just overall be easier to understand if we made all containers in the MCO hostNetwork: true and hence fully severed any dependency on the SDN.

@sinnykumari

But maybe it would just overall be easier to understand if we made all containers in the MCO hostNetwork: true and hence fully severed any dependency on the SDN.

+1

I am curious why the MCO pods didn't opt for host networking from the beginning.

@kikisdeliveryservice

But maybe it would just overall be easier to understand if we made all containers in the MCO hostNetwork: true and hence fully severed any dependency on the SDN.

+1 As someone who would be confused later... I think this is the way to go unless there are good reasons not to?

@pliurh commented Nov 2, 2020

But maybe it would just overall be easier to understand if we made all containers in the MCO hostNetwork: true and hence fully severed any dependency on the SDN.

+1

We want to support switching cluster network types "day 2".
This may involve a MachineConfig rollout, so using the
host network will avoid potential deadlocks.

The only thing that the MCO should need is access to the apiserver;
we don't talk to any in-cluster services.
@cgwalters changed the title from "mcc: Use hostNetwork: true" to "Use hostNetwork: true for operator and controller" Nov 2, 2020
@cgwalters

Updated to make the operator use hostNetwork: true too.

@openshift-merge-robot

@cgwalters: The following test failed, say /retest to rerun all failed tests:

ci/prow/e2e-aws-serial (commit 1d7be29): rerun with /test e2e-aws-serial


@pliurh commented Nov 17, 2020

@cgwalters If we want to decouple the MCO from the clusterNetwork, using hostNetwork may not be sufficient. Currently, the MCO uses the in-cluster kube client configuration, which talks to the apiserver service VIP, 172.30.0.1. Even on the host network, the route to this VIP is handled by the clusterNetwork provider. So if we want the MCO to have no dependency on the clusterNetwork, we should also use api-int.<cluster-domain> to communicate with the apiserver.
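
One way to illustrate that point: client-go's in-cluster configuration builds the apiserver URL from the KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT environment variables, so a pod spec can point it at api-int instead of the service VIP. A hypothetical sketch, not the change this PR makes; it also assumes the pod's service-account CA bundle validates the api-int serving certificate:

```yaml
# Hypothetical excerpt of a pod template. rest.InClusterConfig() reads these
# two variables to build the apiserver URL, so overriding them steers traffic
# away from the 172.30.0.1 service VIP.
spec:
  template:
    spec:
      hostNetwork: true
      containers:
      - name: machine-config-operator       # illustrative container name
        env:
        - name: KUBERNETES_SERVICE_HOST
          value: api-int.<cluster-domain>   # placeholder; the cluster's internal API DNS name
        - name: KUBERNETES_SERVICE_PORT
          value: "6443"                     # the port api-int serves on
```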

@crawford

Is this part of a larger epic or enhancement?

@darkmuggle

/retest

The SDN Team is asking that we include this in 4.8. AFAIK, this looks fine.
@skinny or @kikisdeliveryservice: to give the +1, would one of you feel comfortable with an LGTM?

@sinnykumari

With my limited cluster networking knowledge I can't think of any issue; this PR looks good to me as it is.
Do we have CI coverage somewhere that exercises the day-2 networking switch?

@kikisdeliveryservice

Is this part of a larger epic or enhancement?

@pliurh could you answer the above?

@pliurh commented Jan 22, 2021

Here is the story for 4.8: https://issues.redhat.com/browse/SDN-1546

Currently, we don't have CI for the SDN migration, because during the migration a manual reboot step is required, which I haven't figured out how to do in openshift-ci. We used to have a baremetal CI environment for that, but it was put to other uses. :(

My two cents: putting the SDN migration requirement aside, and considering the MCO's position, I think it still makes sense to decouple the MCO from the cluster network.

@cgwalters

Because during the migration a manual reboot step is required, which I haven't figured out how to do in openshift-ci.

Just cordon/drain the node and then use e.g. oc debug node/$node -- chroot /host reboot.

@danwinship

I am curious why the MCO pods didn't opt for host networking from the beginning.

+1 As someone who would be confused later... I think this is the way to go unless there are good reasons not to?

For pods that don't expose any services, the argument against being hostNetwork is mostly just that it grants more privilege (e.g., you can no longer restrict the pods with NetworkPolicy). In particular, pods that are both privileged and hostNetwork can do a lot of damage if they are compromised. (But neither the MCO nor the MCC is privileged.)
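
To make the NetworkPolicy point concrete: a pod-selector policy like the hypothetical one below can constrain the controller while it is on the cluster network, but once the pod runs with hostNetwork: true its traffic appears as node traffic and such policies no longer apply. The policy name and pod label are assumed for illustration:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: restrict-mcc-egress               # hypothetical name
  namespace: openshift-machine-config-operator
spec:
  podSelector:
    matchLabels:
      k8s-app: machine-config-controller  # assumed label
  policyTypes:
  - Egress
  egress: []   # no egress rules listed: all egress from matched pods is denied
```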

(If either pod exposes any ports [for metrics?] you will need to update https://github.com/openshift/enhancements/blob/master/enhancements/network/host-port-registry.md)

So if we want the MCO to have no dependency on the clusterNetwork, we should also use api-int.<cluster-domain> to communicate with the apiserver.

cf #2232

openshift-ci bot commented May 21, 2021

@cgwalters: The following test failed, say /retest to rerun all failed tests:

ci/prow/e2e-agnostic-upgrade (commit 1d7be29): rerun with /test e2e-agnostic-upgrade


@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-ci bot added the lifecycle/stale label Aug 20, 2021
@openshift-bot

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

openshift-ci bot added the lifecycle/rotten label and removed the lifecycle/stale label Sep 19, 2021
@openshift-bot

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

openshift-ci bot commented Oct 19, 2021

@cgwalters: PR needs rebase.


openshift-ci bot added the needs-rebase label Oct 19, 2021
openshift-ci bot closed this Oct 19, 2021
openshift-ci bot commented Oct 19, 2021

@openshift-bot: Closed this PR.


In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close
