daemon: get the apiserver url from the kubelet's kubeconfig #2978

pliurh · 2022-03-02T08:00:20Z

In an ovnkube-based cluster, connectivity between the MCD pods and
kube-apiserver relies on the OpenFlow rules injected by ovnkube.
If for some reason the ovnkube-node pod cannot start after the
reboot that applies a new MC, the MCD will not be able to reach
the API server.

This PR makes the kubeclient in the MCD use the kube-apiserver URL
from the node's kubeconfig file instead, which removes the MCD's
dependency on the ovnkube-node pod.
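
To illustrate the idea, here is a minimal, stdlib-only sketch of pulling the API server host and port out of a kubeconfig. It is not the code in this PR: `serverFromKubeconfig` and `hostPort` are hypothetical helper names, the kubeconfig content and `api-int.example.com` are made up, and the naive line scan for `server:` stands in for a proper YAML parse.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// serverFromKubeconfig returns the value of the first "server:" key found in
// the kubeconfig text. Hypothetical helper: a naive line scan for
// illustration only, real code should parse the YAML properly.
func serverFromKubeconfig(kubeconfig string) (string, error) {
	sc := bufio.NewScanner(strings.NewReader(kubeconfig))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if !strings.HasPrefix(line, "server:") {
			continue
		}
		v := strings.TrimSpace(strings.TrimPrefix(line, "server:"))
		v = strings.Trim(v, `"'`) // tolerate quoted values
		if v != "" {
			return v, nil
		}
	}
	if err := sc.Err(); err != nil {
		return "", err
	}
	return "", errors.New("no server entry found in kubeconfig")
}

// hostPort splits a server URL into host and port, defaulting the port
// from the scheme when the URL does not carry one.
func hostPort(server string) (string, string, error) {
	u, err := url.Parse(server)
	if err != nil {
		return "", "", err
	}
	if u.Hostname() == "" {
		return "", "", fmt.Errorf("no host in server URL %q", server)
	}
	port := u.Port()
	if port == "" {
		if u.Scheme == "http" {
			port = "80"
		} else {
			port = "443"
		}
	}
	return u.Hostname(), port, nil
}

func main() {
	// Made-up kubeconfig fragment, assumed for this example.
	const sample = `clusters:
- cluster:
    server: https://api-int.example.com:6443
  name: local
`
	server, err := serverFromKubeconfig(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	host, port, err := hostPort(server)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(host, port)
}
```

The resulting host and port are what a client would need in order to talk to the API server directly instead of going through the service IP.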

pliurh · 2022-03-02T13:03:20Z

/retest

cgwalters · 2022-03-02T13:35:06Z

cmd/machine-config-daemon/start.go

One thing to be aware of is that setenv() is unsafe in the presence of threads that may be executing C code. https://internals.rust-lang.org/t/synchronized-ffi-access-to-posix-environment-variable-functions/15475 is a Rust thread on this, which links to e.g. https://sourceware.org/bugzilla/show_bug.cgi?id=15607

The Go runtime uses a middle ground trick of only calling the C setenv if cgo is in use which...I think will happen with us when linking to openssl at least.

So we should (at least eventually) fix the client API to allow overriding these things without setenv().

But...for now it's probably OK, I would think (hope) that we're not running any other active goroutines at this point.
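
As a rough sketch of what "overriding these things without setenv()" could look like: the caller passes the endpoint explicitly, and the environment is only read, never written. Everything here is hypothetical (`Endpoint`, `ResolveEndpoint` and the example host are made-up names, not a client-go API). The only fact assumed from client-go is that its in-cluster config reads `KUBERNETES_SERVICE_HOST` and `KUBERNETES_SERVICE_PORT`.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
)

// Endpoint identifies an API server by host and port (hypothetical type).
type Endpoint struct {
	Host string
	Port string
}

// ResolveEndpoint prefers an explicit override and otherwise falls back to
// the in-cluster environment variables. The lookup function is injected so
// that neither callers nor tests have to mutate the process environment.
func ResolveEndpoint(override *Endpoint, lookup func(string) (string, bool)) (Endpoint, error) {
	if override != nil && override.Host != "" && override.Port != "" {
		return *override, nil
	}
	host, okHost := lookup("KUBERNETES_SERVICE_HOST")
	port, okPort := lookup("KUBERNETES_SERVICE_PORT")
	if !okHost || !okPort || host == "" || port == "" {
		return Endpoint{}, errors.New("no override given and in-cluster variables are not set")
	}
	return Endpoint{Host: host, Port: port}, nil
}

// URL renders the endpoint as an https URL; net.JoinHostPort adds the
// brackets that IPv6 literals need.
func (e Endpoint) URL() string {
	return "https://" + net.JoinHostPort(e.Host, e.Port)
}

func main() {
	// Made-up override, e.g. a value taken from the node's kubeconfig.
	override := &Endpoint{Host: "api-int.example.com", Port: "6443"}
	ep, err := ResolveEndpoint(override, os.LookupEnv)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(ep.URL())
}
```

With this shape there is no write to the process environment at all, so the thread-safety concern around the C setenv does not arise.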

cgwalters · 2022-03-02T13:35:58Z

This is a bit related to #2190 too right?

/approve
of the general idea.

pliurh · 2022-03-02T13:46:44Z

> This is a bit related to #2190 too right?

Yes. It serves the same purpose: decoupling the MCO from the network provider.

kikisdeliveryservice

one comment

cmd/machine-config-daemon/start.go

kikisdeliveryservice · 2022-03-02T17:55:56Z

As this affects OVN, @trozet @jcaamano PTAL

kikisdeliveryservice · 2022-03-02T17:56:56Z

Also this sounds like a bug? Is there an existing bz?

> In ovnkube based cluster, the connectivity between MCD pods and kube-api-server relies on the openflow rules injected by ovnkube. If due to some reason, the ovnkube-node pod cannot start after the reboot of applying new MC. The MCD will not be able to reach the api-server. This PR let the kubeclient in MCD use the kube-api-server url of the node kubeconfig file instead. It eliminates the dependency on ovnkube-node pod from MCD.

pliurh · 2022-03-03T07:38:36Z

> Also this sounds like a bug? Is there an existing bz?

No. I found this issue in a cluster with DPU (a smart NIC), which we plan to support in a future release. I suppose it might also happen on a regular cluster.

pliurh · 2022-03-04T06:45:59Z

/retest

jcaamano · 2022-03-07T12:06:29Z

Looks good to me

pliurh · 2022-03-09T07:06:25Z

/retest

cgwalters · 2022-03-10T16:47:36Z

I was looking at logs from an unrelated PR and saw e.g.:

W0308 19:26:11.024920    1727 reflector.go:324] k8s.io/client-go/informers/factory.go:134: failed to list *v1.Node: Get "https://172.30.0.1:443/api/v1/nodes?limit=500&resourceVersion=0": dial tcp 172.30.0.1:443: i/o timeout
W0308 19:26:11.024930    1727 reflector.go:324] github.com/openshift/machine-config-operator/pkg/generated/informers/externalversions/factory.go:101: failed to list *v1.MachineConfig: Get "https://172.30.0.1:443/apis/machineconfiguration.openshift.io/v1/machineconfigs?limit=500&resourceVersion=0": dial tcp 172.30.0.1:443: i/o timeout

over there, and then realized that this PR will probably fix it, and indeed it seems to. We're no longer depending on the SDN pod for the MCD, which indeed seems like a huge reliability improvement.

/lgtm

openshift-ci · 2022-03-10T16:48:16Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cgwalters, pliurh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [cgwalters]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

cgwalters · 2022-03-10T16:48:55Z

/skip

cgwalters · 2022-03-10T17:18:52Z

OK let's give this one more spin of the wheel of fortune, but if that fails I think we should override.
/test e2e-agnostic-upgrade

openshift-bot · 2022-03-10T17:32:30Z

/retest-required

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2022-03-10T20:32:29Z

/retest-required

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2022-03-10T22:32:34Z

/retest-required

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2022-03-10T23:08:30Z

/retest-required

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2022-03-11T00:57:22Z

/retest-required

Please review the full test history for this PR and help us cut down flakes.

openshift-ci · 2022-03-11T03:06:50Z

@pliurh: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
ci/prow/e2e-aws-upgrade-single-node	`2b51f26`	link	false	`/test e2e-aws-upgrade-single-node`
ci/prow/e2e-aws-disruptive	`2b51f26`	link	false	`/test e2e-aws-disruptive`

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

openshift-bot · 2022-03-11T03:21:23Z

/retest-required

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2022-03-11T04:57:22Z

/retest-required

Please review the full test history for this PR and help us cut down flakes.

openshift-ci bot requested review from cgwalters and yuqi-zhang March 2, 2022 08:01

cgwalters reviewed Mar 2, 2022

openshift-ci bot added the approved label Mar 2, 2022

cgwalters mentioned this pull request Mar 2, 2022

Non-disruptive upgrades in DPU clusters openshift/enhancements#1005

Closed

kikisdeliveryservice reviewed Mar 2, 2022

cmd/machine-config-daemon/start.go (outdated)

pliurh force-pushed the kubeconfig branch from 270cd17 to a75ccc6 Compare March 3, 2022 07:25

pliurh force-pushed the kubeconfig branch from a75ccc6 to 2b51f26 Compare March 3, 2022 07:31

openshift-ci bot assigned cgwalters Mar 10, 2022

openshift-ci bot added the lgtm label Mar 10, 2022

openshift-merge-robot merged commit bbe78cf into openshift:master Mar 11, 2022

6 participants
