Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kubeadm: init kep for no cri socket annotation #3930

Closed
wants to merge 5 commits into from

Conversation

pacoxu
Copy link
Member

@pacoxu pacoxu commented Mar 30, 2023

  • One-line PR description: in theory we don't need the CRI socket annotation on the Node object that kubeadm does any more.
  • Other comments: an initial proposal

/cc @neolit123 @SataQiu @chendave @RA489

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Mar 30, 2023
@pacoxu pacoxu marked this pull request as draft March 30, 2023 08:11
@k8s-ci-robot k8s-ci-robot added kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. labels Mar 30, 2023
Copy link
Member

@neolit123 neolit123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can omit the keps/prod-readiness changes IMO

Copy link
Member

@neolit123 neolit123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it'a tricky change that we must make either way. the alpha annotation was a workaround for the lack of cri fields in the kubelet config, but now they exist.

CRI or CNI may require updating that component before the kubelet.
-->

## Production Readiness Review Questionnaire
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can omit this


- Update the CRI socket annotation on Node object to be the latest

## Proposal
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe use a feature gate?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I add two proposals here.

  1. respect a list of configuration in local kubelet configuration, and in v1.27, CRI socket is the only one
  2. introduce a /var/lib/kubelet/kubeadm-config.yaml to maintain node specific configuration

I add a feature gate for the proposal 2.


- Remove the use of CRI socket annotation on Node object
- For node customized kubelet configuration, it can be saved locally on disk with file path
`/var/lib/kubelet/kubeadm-config.yaml` and we will prioritize respecting the local
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's one of the ways of doing it. we need to be careful and make sure this host / instance config does not cause issues for us in the future.

perhaps we can avoid it, and just parse the cri value from the existing config.yaml on upgrade. like i proposed on the linked issue.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this may be safer for cri-socket only. Probably, I should use your proposal here as this KEP only covers the cri-socket.

I wondered if we need to add such a file for other node-customized kubelet configurations.

@pacoxu pacoxu marked this pull request as ready for review March 30, 2023 09:53
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 30, 2023
@pacoxu pacoxu changed the title kubeadm: init kep for no cri socket annotation [wip]kubeadm: init kep for no cri socket annotation Mar 30, 2023
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 30, 2023
Copy link
Member

@chendave chendave left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly some questions.

"init upload-config" phase in Kubeadm. This annotation is used to specify the CRI
socket endpoint used by the kubelet on each node for communication with the container
runtime. Instead of relying on this annotation, the proposal suggests using a global
kubelet configuration with a CRI socket specified, as well as providing the ability to
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the "global" kubelet config goes to a sub-struct of ClusterConfiguration? there is no such configmap when you bootstrap a cluster with kubeadm init...

Copy link
Member Author

@pacoxu pacoxu Apr 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the global kubelet config here, it means kubectl get cm -n kube-system kubelet-config. There is no global setting in ClusterConfiguration.

For init and join process

  • The cri-socket can be set in NodeRegistrationOptions.containerRuntimeEndpoint in InitConfiguration or JoinConfiguration.
  • Or you can override it with --cri-socket.

socket endpoint used by the kubelet on each node for communication with the container
runtime. Instead of relying on this annotation, the proposal suggests using a global
kubelet configuration with a CRI socket specified, as well as providing the ability to
override this configuration during kubeadm join using the --config flag. This would
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this global config is something embedded in clusterCfg instead of configmap, how do you override this config? each node maintains one for itself? or you want to maintain a map in such global config to track different config for each node?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this global config is something embedded in clusterCfg instead of configmap, how do you override this config?

For upgradation, the global config is in kubelet-config configmap and will not be override by clusterCfg.

For init/join, it will respect (1) the cri-socket flag, (2) the NodeRegistrationOptions.containerRuntimeEndpoint in InitConfiguration or JoinConfiguration.

In kubelet-config configmap, there can be a global one. There are two proposals here.

  • Proposal 1: If containerRuntimeEndpoint is in the node specific list, we will take it from node first and skip the global one. For this proposal, containerRuntimeEndpoint in global will not be used as the containerRuntimeEndpoint has default value and is always set during init or join.
  • Proposal 2: We will take the one from /var/lib/kubelet/kubeadm-config.yaml at first, if there is no containerRuntimeEndpoint in /var/lib/kubelet/kubeadm-config.yaml, the global one will take effect.

each node maintains one for itself?

containerRuntimeEndpoint in kubelet configuration after v1.27 has a default value and is always set during init or join.

--container-runtime-endpoint string The endpoint of container runtime service. Unix Domain Sockets are supported on Linux, while npipe and tcp endpoints are supported on Windows. Examples:'unix:///path/to/runtime.sock', 'npipe:////./pipe/runtime' (default "unix:///run/containerd/containerd.sock") (DEPRECATED: This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.)

The flag is like above.

or you want to maintain a map in such global config to track different config for each node?

No.

kubeadm also download the kubelet configuration from configmap and replace the `containerRuntimeEndpoint` and
`imageServiceEndpoint`(This maybe empty and I prefer to respect it as well) with the local configuration.

A node-specific kubelet configuration list should be maintained in kubeadm code.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you pls elaborate a little bit? How the node-specific config maintained in the kubeadm code?

Copy link
Member Author

@pacoxu pacoxu Apr 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean that we can maintain it in cmd/kubeadm/app/constants/constants.go or somewhere.

nodeSpecificKubeletConfigurations =  map[string]string{
	"containerRuntimeEndpoint": "the endpoint of container runtime service",
	"imageServiceEndpoint": "the endpoint of container image service",
	"providerID": "sets the unique ID of the instance that an external provider",
}

The list can be extended and I am not sure if the list should be modified by a user in a global config nodeSpecificKubeletConfiugrations, which means a new array in ClusterConfiguration. The default values can be containerRuntimeEndpoint and imageServiceEndpoint.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think we can take the patches approach - any locally stored config file can act as a patch over the config from kubelet-config cm. but we may only consider some fields from the kubeletconfiguration v1beta1. this becomes tricky when v1beta1 is replaced by e.g. v1

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any locally stored config file can act as a patch over the config from kubelet-config cm

this can also be tricky in some circumstances where we don't want to patch a field e.g. during upgrade, but hopefully it's not an issue

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we use the --patches feature here to add a default patch dir on node, this can be done in this way.

However, if users want to specify a new patch dir, he has to merge the default patch files or content to the new one. That would be a little weird.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i meant a generic patches approach where a local config file overrides the global config downloaded from the cm. not the --patches feature.

this local file must be stored somewhere. perhaps in the same dir as config.yaml, but called config-instance.yaml. IMO it has a number of tricky aspects that need to be covered in the design doc for init, join, upgrade.


### Proposal 1: respect a list of configuration in local kubelet configuration, and in v1.27, CRI socket is the only one

During `kubeadm ugprade`, kubeadm will read the local kubelet configuration in `/var/lib/kubelet/config.yaml`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the config file here support to config the containerRuntimeEndpoint? I tried to bootstrap a cluster with --cri-socket option, but this config is not persistented in the /var/lib/kubelet/config.yaml. If the endpoint is already supported to config in kubelet config file, then this is the logic we should update.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the config file here support to config the containerRuntimeEndpoint? I tried to bootstrap a cluster with --cri-socket option, but this config is not persistented in the /var/lib/kubelet/config.yaml. If the endpoint is already supported to config in kubelet config file, then this is the logic we should update.

containerRuntimeEndpoint is added in v1.27. In the past and also v1.27, we are using the --container-runtime-endpoint in /var/lib/kubelet/kubeadm-flags.env.

  • Before v1.27, it cannot be added in the kubeletConfiguration.
  • And also, we saved the cri-socket in the annotation, and that is why we don't have to read and parse /var/lib/kubelet/kubeadm-flags.env here.

If we start the process that not using cri-socket annotation in v1.28, we set it in the /var/lib/kubelet/config.yaml.
Then the flag in /var/lib/kubelet/kubeadm-flags.env can be cleaned. (This should be mentioned in this KEP. I will update it.)

@pacoxu pacoxu changed the title [wip]kubeadm: init kep for no cri socket annotation kubeadm: init kep for no cri socket annotation Apr 3, 2023
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 3, 2023
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: pacoxu
Once this PR has been reviewed and has the lgtm label, please assign vincepri for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Apr 3, 2023
@pacoxu pacoxu force-pushed the no-cri-socket-annotation branch 2 times, most recently from 25d2632 to 65f4668 Compare April 3, 2023 09:15
Copy link
Member

@neolit123 neolit123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pacoxu are there any blockers / concerns or can this KEP work proceed with an implementation POC PR?

if we have an alpha FG + tests we can establish if the change will work as expected. this feels like the safest route.

also, can we target .28?

i need to re-read the whole KEP again.

@pacoxu
Copy link
Member Author

pacoxu commented May 29, 2023

Of course, I will start to working on a demo PR this week or next. I think there is no blocker.
We could get this done in v1.28 release cycle.

@neolit123
Copy link
Member

Of course, I will start to working on a demo PR this week or next. I think there is no blocker.
We could get this done in v1.28 release cycle.

ok, i didn't see a feature gate in the kep. imo, we should use it as the user controlled switch. e.g. NodeLocalCRISocket
#3930 (comment)

@pacoxu
Copy link
Member Author

pacoxu commented Nov 9, 2023

https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/3983-drop-in-configuration
kubelet introduces a --config-dir for drop-in configuration.

We may alter to use it then. #3983 is beta in v1.29.

@neolit123
Copy link
Member

neolit123 commented Nov 9, 2023

https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/3983-drop-in-configuration kubelet introduces a --config-dir for drop-in configuration.

We may alter to use it then. #3983 is beta in v1.29.

i am generally -1 (but we can discuss it), since it implies a mechanism for systemd like drop-ins.
kubeadm on the other hand already uses patches for this node level config.

i believe i commented on that KEP, already.

@pacoxu
Copy link
Member Author

pacoxu commented Nov 9, 2023

i believe i commented on that KEP, already.

Yes. I must miss that 🤣

i am generally -1 (but we can discuss it), since it implies a mechanism for systemd like drop-ins. kubeadm on the other hand already uses patches for this node-level config.

I still cannot decide on it. Is the new drop-in a better solution than the current kubeadm way?

let's keep it as is until the direction is clear and beyond doubt.

@neolit123
Copy link
Member

i believe i commented on that KEP, already.

Yes. I must miss that 🤣

i am generally -1 (but we can discuss it), since it implies a mechanism for systemd like drop-ins. kubeadm on the other hand already uses patches for this node-level config.

I still cannot decide on it. Is the new drop-in a better solution than the current kubeadm way?

let's keep it as is until the direction is clear and beyond doubt.

here are my comments on the KEP PR
#4031 (review)

@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 7, 2024
@neolit123
Copy link
Member

@pacoxu you closed
#3929

what should we do with this PR?

@pacoxu
Copy link
Member Author

pacoxu commented Feb 7, 2024

i meant a generic patches approach where a local config file overrides the global config downloaded from the cm. not the --patches feature.

this local file must be stored somewhere. perhaps in the same dir as config.yaml, but called config-instance.yaml. IMO it has a number of tricky aspects that need to be covered in the design doc for init, join, upgrade.

Above is @neolit123 your proposal in comment here https://github.com/kubernetes/enhancements/pull/3930/files#r1177682829.

I prefer this solution.

BTW, /var/lib/kubelet/kubeadm-flags.env will only have --container-runtime-endpoint later. I would like to make this cri-socket-annotation and runtime endpoint configuration to config-instance.yaml under /var/lib/kubelet.

@neolit123
Copy link
Member

/close
#3929 was closed
let's revisit in the future.

kubernetes/kubernetes#122734 (comment)
cc @HirazawaUi

@k8s-ci-robot
Copy link
Contributor

@neolit123: Closed this PR.

In response to this:

/close
#3929 was closed
let's revisit in the future.

kubernetes/kubernetes#122734 (comment)
cc @HirazawaUi

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants