Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KEP-3983: Add support for a drop-in kubelet configuration directory #4031

Merged
merged 1 commit into from
Jun 15, 2023

Conversation

yuqi-zhang
Copy link
Contributor

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label May 24, 2023
@k8s-ci-robot k8s-ci-robot added the kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory label May 24, 2023
@k8s-ci-robot k8s-ci-robot added the sig/node Categorizes an issue or PR as relevant to SIG Node. label May 24, 2023
@k8s-ci-robot
Copy link
Contributor

Welcome @yuqi-zhang!

It looks like this is your first PR to kubernetes/enhancements 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/enhancements has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label May 24, 2023
@k8s-ci-robot
Copy link
Contributor

Hi @yuqi-zhang. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label May 24, 2023
@yuqi-zhang
Copy link
Contributor Author

cc @haircommander

@bart0sh
Copy link
Contributor

bart0sh commented May 25, 2023

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 25, 2023
@bart0sh
Copy link
Contributor

bart0sh commented May 25, 2023

/cc @SergeyKanzhelev @mrunalp

@ffromani
Copy link
Contributor

/cc



1. If no other configuration for the subfield(s) exist, append to the base configuration
2. If the subfield(s) exists in the base configuration at `/etc/kubernetes/kubelet.conf` file or another file in `/etc/kubernetes/kubelet.conf.d` with lesser alphanumeric ordering, overwrite it
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you planning to add a --config-dir or similar flag to specify the directory? Also, AFAIU there currently is no default kubelet config file path like /etc/kubernetes/kubelet.conf, are you planning to specify a default for that (changing the current default behavior)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say the behavior should mirror that of the configuration file, which looks like there's no flag for it, but there is a default

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oop, I seem to have missed the glaringly obvious --config option, so yeah there should be a --config-dir option I think

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is --config for specifying the kubelet config file but that's empty by default IIRC

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the discussion! So as far as I understand, there's:

  1. the kubelet.conf file generated with default configuration: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/kubelet-integration/
  2. the additional customizations you can make with a kubeletconfig file, passed via the flag: https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/

So in the base case with no additional specifications (which this enhancement wouldn't change) you would end up with the configurations in /etc/kubernetes/kubelet.conf

I can update the enhancement to allow specification of a configuration directory. One question I would have is, if you were to specify both config-dir and config flags, should those have a priority order?

Let me know if my assumptions are wrong anywhere, thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the --config and --config-dir options would be a drop-in replacement for the priority of the defaults. for instance, if --config-dir is specified, the order goes:
default config -> --config-dir files
if --config is specified, then
--config -> default config dir files

1. If no other configuration for the subfield(s) exist, append to the base configuration
2. If the subfield(s) exists in the base configuration at `/etc/kubernetes/kubelet.conf` file or another file in `/etc/kubernetes/kubelet.conf.d` with lesser alphanumeric ordering, overwrite it

* If the subfield(s) exist as a list, overwrite instead of attempting to merge. This makes it easier to delete items from lists defined in the base kubelet.conf or other drop-ins without having to modify other files.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are considered as subfields?

E.g. consider the following config snippet, what and how would the drop-in config able to modify?

authentication:
  anonymous:
    enabled: false
  webhook:
    enabled: true
  x509:
    clientCAFile: /etc/kubernetes/pki/ca.crt

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say a drop in should be able to modify any individual member:

authentication:
  x509
    clientCAFile: file

and not specify the rest.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i.e: any leaf of the configuration struct and the parent structures above should be modifiable

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to talk about merging the configs instead of overwriting then?

And in the case above all other possible sub-fields under x509 would be unmodified(?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose so, though merging implies we concatonate slices which is what this point is about. I am not so sure how to describe succinctly: merging structures but overwriting leaf fields?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, so, I think the proposed outcome would be, if you already had:

authentication:
  anonymous:
    enabled: false
  webhook:
    enabled: true
  x509:
    clientCAFile: /etc/kubernetes/pki/ca.crt

and then a dropin to specify

authentication:
  x509
    clientCAFile: /some/new/location

We would end up with

authentication:
  anonymous:
    enabled: false
  webhook:
    enabled: true
  x509:
    clientCAFile: some/new/location

and keep the other bits unchanged

Copy link
Member

@neolit123 neolit123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for the record, kubeadm implements a high level node specific kubelet configuration via patches:
https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/control-plane-flags/#kubelet

it uses the approach of a global kubeletconfiguration that is patched incrementally with a directory of alpha numeric patches per node.

if the feature of this kep lands and if kubeadm users configure --config-dir there will a precedence of:
global --config file, --patches files, --config-dir files

effectively, kubeadm users will not need this additional layer, but they can opt in, in case they prefer the inbound merging methods.

another major difference would be that the kubelet --config-dir can contain files managed by different processes like the kep mentions and potentially reconfiguring the kubelet more dynamically, while kubeadm does a patching mutation over a single --config file per kubeadm invocation.

@haircommander
Copy link
Contributor

for the record, kubeadm implements a high level node specific kubelet configuration via patches: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/control-plane-flags/#kubelet

it uses the approach of a global kubeletconfiguration that is patched incrementally with a directory of alpha numeric patches per node.

if the feature of this kep lands and if kubeadm users configure --config-dir there will a precedence of: global --config file, --patches files, --config-dir files

effectively, kubeadm users will not need this additional layer, but they can opt in, in case they prefer the inbound merging methods.

another major difference would be that the kubelet --config-dir can contain files managed by diffrent professes like the kep mentions and potentially reconfiguring the kubelet more dynamically, while kubeadm does a patching mutation over a single --config file per kubeadm invocation.

long term, what would you think about kubeadm moving from patching the one file to adding drop-in files? I could see it as a way to reduce the code kubeadm maintains and simplify the flow of configuration management

* Extend API machinery code to reconcile multiple configuration files.
* Define Kubernetes best-practices for configuration definitions, similarly to [API conventions](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md).

### Non-Goals
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it a goal or non-goal to dynamically reconfig a running kubelet if the contents of --config-dir change? that comes with some interesting complications.
the drop in pattern normally requires a service restart, afaik. maybe this should be stated in the kep (i may have missed it).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

non-goal I'd say, I think we're generally moving away from dynamic kubelet config

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We did consider some cases in sig-node for dynamically supporting changes but it can be done in a follow on KEP for explicit specific knobs separate from this.

@neolit123
Copy link
Member

long term, what would you think about kubeadm moving from patching the one file to adding drop-in files? I could see it as a way to reduce the code kubeadm maintains and simplify the flow of configuration management

the patching boiler plate in kubeadm is already in place for allowing node instance specific control plane component options. this is stated as a non goal for this kep.

effectively the cost of the kubeadm kubeletconfiguration patch target is fairly minimal, yet it's already widely used. i don't see any harm allowing this next to --config-dir, but users must know how to track where option foo is coming from.

i see --config-dir as the more flexible alternative for non kubeadm clusters.

@yuqi-zhang
Copy link
Contributor Author

Based on the discussion, made some changes to:

  1. explicitly set dynamic kubelet configuration as a non-goal
  2. change from a default /etc/kubernetes/kubelet.conf.d to a "--config-dir" flag

The changes were made as a separate commit for reading convenience, will squash if the changes are ok cc @haircommander

@@ -63,7 +63,7 @@ Items marked with (R) are required *prior to targeting to a milestone / release*

## Summary

Add support for a drop-in configuration directory for the Kubelet. This directory will be in `/etc/kubernetes/kubelet.conf.d` and configuration files will be processed in alphanumeric order. Establishment of conventions for configuration processing will be done, and further work can be done to add this support for other components.
Add support for a drop-in configuration directory for the Kubelet. This directory can be specified via a "--config-dir" flag, and configuration files will be processed in alphanumeric order. Establishment of conventions for configuration processing will be done, and further work can be done to add this support for other components.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: somewhere we should establish the default for the flag will be /etc/kubernetes/kubelet.conf.d

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or do we want to follow the same convention as for --config (empty by default) so that if --config-dir is not specified drop-in support is disabled?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah good point, then we should mention that

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or do we want to follow the same convention as for --config (empty by default) so that if --config-dir is not specified drop-in support is disabled?

+1 to align with --config defaults

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated, thanks!

@yuqi-zhang yuqi-zhang force-pushed the 3893-kubelet-dropin branch 2 times, most recently from dd75e42 to 14a2686 Compare June 2, 2023 19:09
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 15, 2023
@SergeyKanzhelev
Copy link
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 15, 2023
@mrunalp
Copy link
Contributor

mrunalp commented Jun 15, 2023

/approve

# of http://git.k8s.io/enhancements/OWNERS_ALIASES
kep-number: 3983
alpha:
approver: "@deads2k"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@johnbelamaric if you have the time.

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 15, 2023
@yuqi-zhang
Copy link
Contributor Author

Added @johnbelamaric as PRR approver, and snuck in the goals fix alongside


###### How can this feature be enabled / disabled in a live cluster?

There will be no feature gate. On a live cluster, admins can disable by removing the flag.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't really sufficient. We should have a feature gate. I know that sounds crazy. But requiring the user to enable a feature gate in addition to the flag forces them to acknowledge that they know they are using an alpha feature. In other words, it prevents accidental use of an alpha feature.

# of http://git.k8s.io/enhancements/OWNERS_ALIASES
kep-number: 3983
alpha:
approver: "@deads2k"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please assign to me


###### How can an operator determine if the feature is in use by workloads?

Kubelet version
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A better answer would be:

Workloads do not directly consume this feature, it is for cluster admins during kubelet configuration.

On the other hand, is there a metric or flag that would tell a cluster operator whether kubelet has this flag enabled anywhere? I believe @logicalhan added an automatic metric for feature gates, not sure if it works on each component but I think so. That would be sufficient (once you implement the feature gate).


###### How can someone using this feature know that it is working for their instance?

Kubelet version gathered from API server
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That doesn't tell anyone anything other than that the kubelet installed has the feature. It doesn't say if it is in use, or working. Is there a log entry or something that says where the config was read from?

@yuqi-zhang
Copy link
Contributor Author

yuqi-zhang commented Jun 15, 2023

Thanks for the feedback! Added a featuregate to the KEP on top of the CLI flag. Also assigned to you @johnbelamaric

Edit: oh whoops wrong file, sorry, updated

Copy link
Member

@johnbelamaric johnbelamaric left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few nits on PRR, I won't hold it back for these but would like to see them addressed.


###### How can this feature be enabled / disabled in a live cluster?

- [ ] Feature gate
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: s/[ ]/[x]/


###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?

Yes. Roll back and remove the `--config-dir` flag from the kubelet's CLI.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and the feature gate


###### What happens if we reenable the feature if it was previously rolled back?

This feature will be re-enabled via adding back the `--config-dir` flag to the CLI, as mentioned above.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and the feature gate


###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?

Not specifically
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say something like: "The feature does not persist any data and so the upgrade->downgrade->upgrade path is not special."


###### How can an operator determine if the feature is in use by workloads?

Workloads do not directly consume this feature, it is for cluster admins during kubelet configuration.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, administrators can check the kubelet feature flag metric xxxx to see if it is enabled on a node.

You'll have to look up what that is called...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're referring to https://kubernetes.io/docs/reference/instrumentation/metrics/#list-of-alpha-kubernetes-metrics ->


kubernetes_feature_enabled | ALPHA | Gauge | This metric records the data about the stage and enablement of a k8s feature.

The flag itself is in alpha, does that affect anything?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so. I think that is more for users to know the deprecation lifecycle of the metric. @logicalhan ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're going to promote this soon.

@yuqi-zhang
Copy link
Contributor Author

Updated all the points. Added a question above regarding the metric

Copy link
Member

@johnbelamaric johnbelamaric left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: johnbelamaric, mrunalp, yuqi-zhang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 15, 2023
@harche
Copy link
Contributor

harche commented Jun 15, 2023

/lgtm

1 similar comment
@haircommander
Copy link
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 15, 2023
@k8s-ci-robot k8s-ci-robot merged commit d5e8550 into kubernetes:master Jun 15, 2023
@k8s-ci-robot k8s-ci-robot added this to the v1.28 milestone Jun 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. sig/node Categorizes an issue or PR as relevant to SIG Node. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
Development

Successfully merging this pull request may close these issues.