diff --git a/keps/prod-readiness/sig-node/3893.yaml b/keps/prod-readiness/sig-node/3893.yaml new file mode 100644 index 000000000000..2b159eb36bbd --- /dev/null +++ b/keps/prod-readiness/sig-node/3893.yaml @@ -0,0 +1,6 @@ +# The KEP must have an approver from the +# "prod-readiness-approvers" group +# of http://git.k8s.io/enhancements/OWNERS_ALIASES +kep-number: 3893 +alpha: + approver: "@deads2k" diff --git a/keps/sig-node/3983-drop-in-configuration/README.md b/keps/sig-node/3983-drop-in-configuration/README.md new file mode 100644 index 000000000000..accc64fb65e7 --- /dev/null +++ b/keps/sig-node/3983-drop-in-configuration/README.md @@ -0,0 +1,315 @@ +# KEP-3983: Add support for a drop-in kubelet configuration directory + + +- [Release Signoff Checklist](#release-signoff-checklist) +- [Summary](#summary) +- [Motivation](#motivation) + - [Goals](#goals) + - [Non-Goals](#non-goals) +- [Proposal](#proposal) + - [User Stories](#user-stories) + - [Story 1](#story-1) + - [Story 2](#story-2) + - [Story 3](#story-3) + - [Risks and Mitigations](#risks-and-mitigations) +- [Design Details](#design-details) + - [Test Plan](#test-plan) + - [Prerequisite testing updates](#prerequisite-testing-updates) + - [Unit tests](#unit-tests) + - [Integration tests](#integration-tests) + - [e2e tests](#e2e-tests) + - [Graduation Criteria](#graduation-criteria) + - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy) + - [Version Skew Strategy](#version-skew-strategy) +- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire) + - [Feature Enablement and Rollback](#feature-enablement-and-rollback) + - [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning) + - [Monitoring Requirements](#monitoring-requirements) + - [Dependencies](#dependencies) + - [Scalability](#scalability) + - [Troubleshooting](#troubleshooting) +- [Implementation History](#implementation-history) +- [Drawbacks](#drawbacks) +- [Alternatives](#alternatives) + + +## Release Signoff Checklist + +Items marked with (R) are required *prior to targeting to a milestone / release*. + +- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR) +- [ ] (R) KEP approvers have approved the KEP status as `implementable` +- [ ] (R) Design details are appropriately documented +- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors) + - [ ] e2e Tests for all Beta API Operations (endpoints) + - [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md) + - [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free +- [ ] (R) Graduation criteria is in place + - [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md) +- [ ] (R) Production readiness review completed +- [ ] (R) Production readiness review approved +- [ ] "Implementation History" section is up-to-date for milestone +- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io] +- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes + + + +[kubernetes.io]: https://kubernetes.io/ +[kubernetes/enhancements]: https://git.k8s.io/enhancements +[kubernetes/kubernetes]: https://git.k8s.io/kubernetes +[kubernetes/website]: https://git.k8s.io/website + +## Summary + +Add support for a drop-in configuration directory for the Kubelet. This directory will be in `/etc/kubernetes/kubelet.conf.d` and configuration files will be processed in alphanumeric order. Establishment of conventions for configuration processing will be done, and further work can be done to add this support for other components. + + +## Motivation + +A common pattern for software configuration in linux is support for a drop-in configuration directory. The location of this directory is often based on a corresponding configuration file. For instance, `/etc/security/limits` can be overridden by files in `/etc/security/limits.d`. This pattern is useful for a number of reasons, though a large motivation here is to allow files to be owned by a single owner. If multiple processes are vying for changing the same file, then they could stamp over each other's changes and possibly race against each other, creating TOCTOU problems. + +Components in Kubernetes can similarly be configured by multiple entities and preventing races between them is cumbersome. There has been past work in the Kubelet to have a Dynamic Configuration, but resolving between multiple entities and a last known good state was also complicated. Since the Kubelet is the node agent, and is often distributed as a package on the host operating system along with the container runtime, configuring it similarly to other host processes seems clear. This paves the path for continuing the pattern of drop-in configuration for the Kubelet. + + +### Goals + +* Add support for a `/etc/kubernetes/kubelet.conf.d` directory, which will override the configuration for the Kubelet located at `/etc/kubernetes/kubelet.conf` +* Extend API machinery code to reconcile multiple configuration files. +* Define Kubernetes best-practices for configuration definitions, similarly to [API conventions](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md). + +### Non-Goals + +* Add support for drop-in configuration for Kubernetes components other than the Kubelet. + +## Proposal + +This proposal aims to add support for a drop-in configuration directory for the kubelet in `/etc/kubernetes/kubelet.conf.d`. Users are able to specify individually configurable kubelet config snippets in files, formatted in the same way as the existing kubelet.conf file. The kubelet will process the configuration provided in `/etc/kubernetes/kubelet.conf.d` in alphanumeric order: + + +1. If no other configuration for the subfield(s) exist, append to the base configuration +2. If the subfield(s) exists in the base configuration at `/etc/kubernetes/kubelet.conf` file or another file in `/etc/kubernetes/kubelet.conf.d` with lesser alphanumeric ordering, overwrite it + + * If the subfield(s) exist as a list, overwrite instead of attempting to merge. This makes it easier to delete items from lists defined in the base kubelet.conf or other drop-ins without having to modify other files. + + + If there are any issues with the drop-ins (e.g. formatting errors), the error will be reported in the same way as a misconfigured kubelet.conf file + +This drop-in directory is purely optional and if empty/does not exist, the base configuration is used and no behavior changes will be introduced. + + +### User Stories + + +#### Story 1 + +As a cluster admin, I would like to be able to easily customize the Kubelet configuration for different node types, while still sharing a base configuration. For instance, I would like to have customized system reserved allocations for the control plane and workers. + + +#### Story 2 + +As a Kubernetes distribution author, I would like to enable users to customize fields on the Kubelet while leaving a sensible and secure default in an easy way. + + +#### Story 3 + +As a cluster admin, I would like to have cgroup management and log size management in different files, so I can automate per-node management of those configurations performed via different components without cross-interference. + + +### Risks and Mitigations + +* Handling of zeroed fields + * It’s possible the configuration of the Kubelet does not handle not specified fields well. Special testing will need to be done for different types to define and ensure conformance of that behavior. +* Handling of lists + * Contention could be found with how lists should be handled (append or overwrite, also see proposal). Consensus should be found and testing performed. + +## Design Details + + +### Test Plan + +[X] I/we understand the owners of the involved components may require updates to existing tests to make this code solid enough prior to committing the changes necessary to implement this enhancement. + + +##### Prerequisite testing updates + + +##### Unit tests + +* Unit tests will be added, details to be added here + +##### Integration tests + +* : + +##### e2e tests + +* : + +### Graduation Criteria + + +### Upgrade / Downgrade Strategy + +Upgrades and downgrades are safe as far as Kubelet stability is concerned. It’s possible a vendor may ship vital pieces of configuration within a drop-in directory. If the Kubelet downgrades to a version that doesn’t support reading the drop-in directory, it will only be configured with the original `/etc/kubelet/kubelet.conf`. However, assuming that vendor left that file in a valid state, the Kubelet should not be at risk of crashing. Other than vital configuration being missed, there’s no risk to the stability of the Kubelet on upgrades and downgrades. + + +### Version Skew Strategy + +All behavior change is encapsulated within the Kubelet, so there is no version skew possible within core Kubernetes. It is possible third party tools may attempt to utilize the Kubelet’s drop-in directory before the Kubelet is upgraded to support it, which would cause silent failures. It is the responsibility of these third party tools to ensure the Kubelet is new enough to support this. + + +## Production Readiness Review Questionnaire + + +### Feature Enablement and Rollback + + +###### How can this feature be enabled / disabled in a live cluster? + +There will be no feature gate. On a live cluster, admins can disable by not using the drop-in configuration directory. + + +###### Does enabling the feature change any default behavior? + +Yes, upgrading to a version of the Kubelet with this feature will enable the Kubelet to be configured with the drop-in directory + + +###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)? + +No, only a rollback of the Kubelet can roll back the enablement. + + +###### What happens if we reenable the feature if it was previously rolled back? + +This will be a Kubelet upgrade, which will re-enable the feature as expected. + + +###### Are there any tests for feature enablement/disablement? + +None planned + + +### Rollout, Upgrade and Rollback Planning + + +###### How can a rollout or rollback fail? Can it impact already running workloads? + +The feature can cause the Kubelet to fail if the configuration in the drop-in directory is invalid. A rollback could fail if the original configuration also has an invalid configuration. This situation would cause workloads to not appear on that node. Neither of these cases are expected. + + +###### What specific metrics should inform a rollback? + +The Kubelet not starting, which will cause nodes to be NotReady. + + +###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested? + +Not specifically + + +###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.? + +No + + +### Monitoring Requirements + + +###### How can an operator determine if the feature is in use by workloads? + +Kubelet version + + +###### How can someone using this feature know that it is working for their instance? + +Kubelet version gathered from API server + + +###### What are the reasonable SLOs (Service Level Objectives) for the enhancement? + + +###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service? + + +###### Are there any missing metrics that would be useful to have to improve observability of this feature? + + +### Dependencies + + +###### Does this feature depend on any specific services running in the cluster? + +No + + +### Scalability + + +###### Will enabling / using this feature result in any new API calls? + +No + + +###### Will enabling / using this feature result in introducing new API types? + +No, though there may be changes to the Kubelet configuration required. + + +###### Will enabling / using this feature result in any new calls to the cloud provider? + +No + + +###### Will enabling / using this feature result in increasing size or count of the existing API objects? + +No, though metadata on the fields may need to be changed. + + +###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs? + +It will take slightly longer for the Kubelet to start, but it should not be noticeable unless there are very many (hundreds?) of configurations. + + +###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components? + +Likely negligible amounts of CPU. + + +###### Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)? + +Not likely + + +### Troubleshooting + + +###### How does this feature react if the API server and/or etcd is unavailable? + +This feature is enabled in Kubelet alone. + + +###### What are other known failure modes? + +Invalid configuration, same as exists today with `/etc/kubernetes/kubelet.conf` + + +###### What steps should be taken if SLOs are not being met to determine the problem? + +Fix the invalid configuration, or remove configurations. + + +## Implementation History + +- 2023-05-04: KEP initialized. + + +## Drawbacks + + +## Alternatives + +Reinstate the now deprecated Dynamic Kubelet Configuration + +Continue to rely on CLI flags or systemd drop-in files. diff --git a/keps/sig-node/3983-drop-in-configuration/kep.yaml b/keps/sig-node/3983-drop-in-configuration/kep.yaml new file mode 100644 index 000000000000..ab9015fb8220 --- /dev/null +++ b/keps/sig-node/3983-drop-in-configuration/kep.yaml @@ -0,0 +1,35 @@ +title: Add support for a kubelet drop-in configuration directory +kep-number: 3983 +authors: + - "@haircommander" + - "@yuqi-zhang" +owning-sig: sig-node +status: provisional +creation-date: 2023-05-04 +reviewers: + - TBD +approvers: + - TBD + +# The target maturity stage in the current dev cycle for this KEP. +stage: alpha + +# The most recent milestone for which work toward delivery of this KEP has been +# done. This can be the current (upcoming) milestone, if it is being actively +# worked on. +latest-milestone: "v1.28" + +# The milestone at which this feature was, or is targeted to be, at each stage. +milestone: + alpha: "v1.28" + beta: "v1.29" + stable: "v1.31" + +# The following PRR answers are required at alpha release +# List the feature gate name and the components for which it must be enabled +# Feature gate is not required for this KEP +disable-supported: true + +# The following PRR answers are required at beta release +metrics: + - "N/A"