diff --git a/keps/sig-node/1977-container-notifier/README.md b/keps/sig-node/1977-container-notifier/README.md
new file mode 100644
index 00000000000..15ba0b7ccf4
--- /dev/null
+++ b/keps/sig-node/1977-container-notifier/README.md
@@ -0,0 +1,770 @@
# KEP-1977: Container notifier

- [Release Signoff Checklist](#release-signoff-checklist)
- [Summary](#summary)
- [Motivation](#motivation)
  - [Goals](#goals)
    - [Phase 1](#phase-1)
    - [Phase 2](#phase-2)
    - [Phase 3](#phase-3)
  - [Non-Goals](#non-goals)
- [Proposal](#proposal)
  - [API Changes](#api-changes)
    - [Phase 1](#phase-1-1)
    - [Phase 2](#phase-2-1)
    - [Phase 3](#phase-3-1)
    - [Phase 1 API Changes](#phase-1-api-changes)
      - [Inline Pod Definition for ContainerNotifier](#inline-pod-definition-for-containernotifier)
      - [Notification API Object](#notification-api-object)
        - [Notification Status](#notification-status)
  - [Phase 2 API Additions](#phase-2-api-additions)
  - [Phase 3 API Additions](#phase-3-api-additions)
  - [Kubelet Impact in Phase 2 and Beyond](#kubelet-impact-in-phase-2-and-beyond)
    - [CRI Changes](#cri-changes)
- [Implementation Plan](#implementation-plan)
  - [Phase 1](#phase-1-2)
  - [Phase 2](#phase-2-2)
  - [Phase 3](#phase-3-2)
- [Example Workflows](#example-workflows)
  - [Example Workflow with Quiesce Hooks](#example-workflow-with-quiesce-hooks)
  - [Example Workflow with Sighup (Phase 2)](#example-workflow-with-sighup-phase-2)
    - [With Probe (Phase 3)](#with-probe-phase-3)
  - [Example Workflow to Change Log Verbosity (Phase 2)](#example-workflow-to-change-log-verbosity-phase-2)
    - [With Probe (Phase 3)](#with-probe-phase-3-1)
  - [Risks and Mitigations](#risks-and-mitigations)
- [Test Plan](#test-plan)
  - [Unit tests](#unit-tests)
  - [E2E tests](#e2e-tests)
- [Graduation Criteria](#graduation-criteria)
  - [Alpha Graduation](#alpha-graduation)
  - [Alpha -> Beta Graduation](#alpha---beta-graduation)
  - [Beta -> GA Graduation](#beta---ga-graduation)
- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
  - [Feature Enablement and Rollback](#feature-enablement-and-rollback)
  - [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning)
  - [Monitoring Requirements](#monitoring-requirements)
  - [Dependencies](#dependencies)
  - [Scalability](#scalability)
  - [Troubleshooting](#troubleshooting)
- [Implementation History](#implementation-history)
- [Drawbacks](#drawbacks)
- [Alternatives](#alternatives)
- [Infrastructure Needed (Optional)](#infrastructure-needed-optional)

## Release Signoff Checklist

Items marked with (R) are required *prior to targeting to a milestone / release*.
- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
- [ ] (R) KEP approvers have approved the KEP status as `implementable`
- [ ] (R) Design details are appropriately documented
- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
- [ ] (R) Graduation criteria is in place
- [ ] (R) Production readiness review completed
- [ ] Production readiness review approved
- [ ] "Implementation History" section is up-to-date for milestone
- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes

## Summary

This KEP proposes to add an inline pod definition for issuing commands or sending signals to containers, and an API object that requests triggering those commands/signals.

## Motivation

In order to protect Kubernetes stateful workloads, we want to take application-consistent snapshots and backups. To ensure application consistency, we need a mechanism to send a command to a pod to quiesce an application before taking a snapshot of its persistent volume(s) and to un-quiesce it afterwards. There are also other use cases that require the user to send a signal to a pod to trigger certain commands. See this [issue](https://github.com/kubernetes/kubernetes/issues/24957) for more information. The following are some of those use cases:

* Execute a command in a pod to quiesce an application before taking a snapshot and un-quiesce it after the snapshot is taken, to ensure application consistency (quiesce/freeze, unquiesce/unfreeze/thaw).
* Send a signal to a pod for configuration reloading.
* Send a signal to flush logs, change log verbosity, etc.
* Built-in notifications, i.e., notify pods running on a to-be-evicted node.

Since multiple use cases require a way to trigger a command to run in a pod, we propose a solution that is more general and more secure than the typical CRD approach. Having the kubelet execute the hooks instead of an external controller is more secure: an external controller is considerably easier to compromise than the kubelet, and a compromised controller would be able to arbitrarily exec into any pod.

### Goals

#### Phase 1

- Add an inline pod definition for issuing commands to containers.
- Add an API object to send a request to trigger those commands.
- Implement controller logic in a single *trusted* controller.

#### Phase 2

- Add support for sending signals to containers.
- Move the controller logic into the kubelet.

#### Phase 3

- Add a probe to verify the results of the commands/signals, if needed.

### Non-Goals

- Writing the scripts (e.g., quiesce scripts) that are triggered by the commands is the responsibility of the users who set up the pod definitions.

## Proposal

Define a new feature gate: `ContainerNotifier`.

In phase 1, this API proposal adds an inline pod definition for sending commands to containers and an API object to send a request to trigger those commands.

An external controller, such as the application snapshot controller, signals that a command needs to run by creating a Notification API object. A SINGLE *trusted* controller (the container notifier controller) will be implemented to watch these objects, run the requested command when an object is created, and update the object's status accordingly.
In phase 2, this API proposal adds an inline pod definition for container signals and allows the API object to send a request to trigger those signals. The logic in the container notifier controller will be moved into the kubelet. The kubelet watches the objects, runs the command when an object is created, and updates its status accordingly.

In phase 3, a Probe may be added, if needed, as an inline pod definition to verify whether the signal was delivered or whether the command ran and produced the desired outcome.

### API Changes

#### Phase 1

There are two parts to the API changes:
* Add an inline pod definition for ContainerNotifierAction - define a command to run in the container.
* Add a Notification API object - this is the request to trigger a container notifier.

#### Phase 2

* Add a signal to the inline pod definition:
  * In the inline pod definition for ContainerNotifierAction, allow a signal to be defined and sent to the container.

#### Phase 3

* If needed, add an inline pod definition for Probe - probes the container to verify that the ContainerNotifierAction has brought the container to the desired state.

#### Phase 1 API Changes

##### Inline Pod Definition for ContainerNotifier

Add []ContainerNotifier to the Container struct:

```
type Container struct {
	......
	// +optional
	Lifecycle *Lifecycle `json:"lifecycle,omitempty" protobuf:"bytes,12,opt,name=lifecycle"`

	// Notifiers can be triggered by Notification resources. What they mean is
	// up to the Pod owner to define, but we may choose to define some commonly
	// used names. Each notifier must have a unique name within the container
	// definition, but the same name may be used in different containers of the
	// same pod. When a Notification naming a container notifier is created
	// (and the pod to which the container belongs is selected by the Notification's
	// spec.podSelector), the notifier will run in the container.
	// +optional
	Notifiers []ContainerNotifier
	......
}

type ContainerNotifier struct {
	// Name must be unique within a container.
	// Names are label-style, with an optional prefix. Names without a prefix,
	// such as `quiesce`, are reserved for Kubernetes-defined "well-known" names.
	// Names with a prefix, such as `example.com/label-style`, are custom names.
	// This refers to a ContainerNotifierAction.
	// The container notifier controller (or the kubelet in phase 2) executes
	// the command defined in the ContainerNotifierAction.
	Name string

	// A ContainerNotifier must support idempotency. If the same command is run
	// in the same container multiple times, it should achieve the same results.
	// For example, a quiesce command is run on a database and the database is
	// quiesced. If the same quiesce command is run again, it should have no
	// additional impact because the database is already quiesced.
	Action *ContainerNotifierAction
}

// ContainerNotifierAction describes a command to run in the container.
type ContainerNotifierAction struct {
	// Handler describes how this notifier is delivered to the container.
	Handler ContainerNotifierHandler

	// Number of seconds after which the notifier times out.
	// Defaults to 1 second. Minimum value is 1.
	// +optional
	TimeoutSeconds int32
}

// ContainerNotifierHandler defines a specific action that should be taken.
type ContainerNotifierHandler struct {
	// Exec specifies the action to take.
	// +optional
	Exec *ExecAction `json:"exec,omitempty" protobuf:"bytes,1,opt,name=exec"`
}
```
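To make the shape of the Phase 1 API concrete, here is a minimal sketch of a container declaring `quiesce` and `unquiesce` notifiers, assuming the new types are added to the core `v1` package alongside `Container`. The image, script paths, and timeout values are illustrative only, not part of this proposal.

```
// Illustrative only: a MySQL container declaring quiesce/unquiesce notifiers.
// ContainerNotifier, ContainerNotifierAction, and ContainerNotifierHandler are
// the proposed Phase 1 types above; ExecAction is the existing core/v1 type.
container := v1.Container{
	Name:  "mysql",
	Image: "mysql:5.7",
	Notifiers: []v1.ContainerNotifier{
		{
			Name: "quiesce",
			Action: &v1.ContainerNotifierAction{
				Handler: v1.ContainerNotifierHandler{
					Exec: &v1.ExecAction{Command: []string{"/bin/sh", "-c", "/scripts/quiesce.sh"}},
				},
				TimeoutSeconds: 30,
			},
		},
		{
			Name: "unquiesce",
			Action: &v1.ContainerNotifierAction{
				Handler: v1.ContainerNotifierHandler{
					Exec: &v1.ExecAction{Command: []string{"/bin/sh", "-c", "/scripts/unquiesce.sh"}},
				},
				TimeoutSeconds: 30,
			},
		},
	},
}
```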
##### Notification API Object

The Notification object is a request for a ContainerNotifier. It will be an in-tree API object in the "node.k8s.io" API group so that the kubelet can monitor it and trigger the actions. An external controller or a user creates a Notification object to request a ContainerNotifier.

All containers in a selected pod that define a ContainerNotifier whose name matches the ContainerNotifierName will have their corresponding ContainerNotifierAction executed. There is no guarantee on the order of execution.

```
type Notification struct {
	metav1.TypeMeta
	// +optional
	metav1.ObjectMeta

	// Spec defines the behavior of a notification.
	// +optional
	Spec NotificationSpec

	// Status defines the current state of a notification.
	// +optional
	Status NotificationStatus
}

type NotificationSpec struct {
	// This contains the name of the ContainerNotifier to trigger in a container.
	// Names are label-style, with an optional prefix. Names without a prefix,
	// such as `quiesce`, are reserved for Kubernetes-defined "well-known" names.
	// Names with a prefix, such as `example.com/label-style`, are custom names.
	ContainerNotifierName string

	// PodSelector specifies a label query over a set of pods.
	// +optional
	PodSelector *metav1.LabelSelector
}
```
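For illustration, a hedged sketch of the Notification an external controller might create to trigger the `quiesce` notifier on all pods labeled `app: mysql`. The object name and labels are assumptions; `node` stands for a hypothetical client package for the proposed "node.k8s.io" types.

```
// Illustrative only: requesting the `quiesce` notifier on pods labeled app=mysql.
// Notification and NotificationSpec are the proposed node.k8s.io types above.
notification := node.Notification{
	ObjectMeta: metav1.ObjectMeta{Name: "quiesce-mysql-before-snapshot"},
	Spec: node.NotificationSpec{
		ContainerNotifierName: "quiesce",
		PodSelector: &metav1.LabelSelector{
			MatchLabels: map[string]string{"app": "mysql"},
		},
	},
}
```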
###### Notification Status

NotificationStatus represents the current state of a Notification. It holds a list of PodNotificationStatuses.

A PodNotificationStatus represents the current state of a Notification for a specific pod. It contains the ContainerNotificationStatuses of all the containers that have the corresponding ContainerNotifier specified. A PodNotificationStatus is created when a notifier starts to run in a pod.

A PodNotificationStatus is considered successful only if all of its ContainerNotificationStatuses are successful. A Notification is considered successful only if all of its PodNotificationStatuses are successful.

The container notifier controller (or the kubelet in phase 2) will NOT retry if any notifier fails to run successfully. The external controller can create another Notification to retry.

```
// NotificationStatus represents the current state of a Notification.
type NotificationStatus struct {
	// This is a list of PodNotificationStatus, with each status representing
	// information about how the Notification is executed in the containers
	// inside a pod.
	// +optional
	PodNotificationStatuses []PodNotificationStatus
}

// PodNotificationStatus represents the current state of a Notification for a
// specific pod.
type PodNotificationStatus struct {
	// This field is required.
	PodName string

	// This field is required.
	ContainerNotificationStatuses []ContainerNotificationStatus
}

// ContainerNotificationStatus represents the current state of a Notification
// for a specific container. Succeed is unset until the action completes: it is
// set to true on success and to false on failure. If an error occurs, the
// Error field is set as well.
type ContainerNotificationStatus struct {
	// This field is required.
	ContainerName string

	// If not set, the Action has not started.
	// If set, the Action started at the specified time.
	// +optional
	StartTimestamp *int64

	// Succeed is set to true when the action is executed in the container
	// successfully. It is set to false if the action cannot be executed
	// successfully before TimeoutSeconds in ContainerNotifierAction passes,
	// or if any error occurs.
	// +optional
	Succeed *bool

	// The last error encountered when executing the action.
	// The container notifier controller (or the kubelet in phase 2)
	// might update this field each time it retries the execution.
	// +optional
	Error *ActionError
}

type ActionError struct {
	// Type of the error.
	// This field is required.
	ErrorType ErrorType

	// Error message.
	// +optional
	Message *string

	// More detailed reason why the error happened.
	// +optional
	Reason *string

	// When the error occurred.
	// +optional
	Timestamp *int64
}

type ErrorType string

// More error types could be added, e.g., Forbidden, Unauthorized, etc.
const (
	// The Notification timed out.
	Timeout ErrorType = "Timeout"

	// The Notification failed with an error.
	Error ErrorType = "Error"
)
```

The external controller deletes the Notification API objects after it is done with them.
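As a hedged illustration of the fields above, this is the status a controller might report after the `quiesce` notifier ran successfully in one container of one selected pod. The pod name, container name, and timestamp are made up.

```
// Illustrative only: status after the notifier ran successfully in one pod.
startedAt := int64(1600000000) // Unix time when the action started
succeeded := true
status := node.NotificationStatus{
	PodNotificationStatuses: []node.PodNotificationStatus{
		{
			PodName: "mysql-0",
			ContainerNotificationStatuses: []node.ContainerNotificationStatus{
				{
					ContainerName:  "mysql",
					StartTimestamp: &startedAt,
					Succeed:        &succeeded,
					// Error is left nil because the action succeeded.
				},
			},
		},
	},
}
```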
### Phase 2 API Additions

In Phase 2, add a "Signal" field to ContainerNotifierHandler.

```
// ContainerNotifierHandler defines a specific action that should be taken or a
// signal that should be delivered.
// Only one of Exec and Signal should be set, not both.
type ContainerNotifierHandler struct {
	// Exec specifies the action to take.
	// +optional
	Exec *ExecAction `json:"exec,omitempty" protobuf:"bytes,1,opt,name=exec"`

	// Signal specifies a signal to send to the container.
	// +optional
	// TODO: define constants for signals? validate that signals are valid? Windows?
	Signal string
}
```

### Phase 3 API Additions

If needed, we may add a Probe to ContainerNotifier in Phase 3, to verify that the ContainerNotifierAction has actually delivered the signal or run the command.

```
type ContainerNotifier struct {
	// Name must be unique within a container.
	// Names are label-style, with an optional prefix. Names without a prefix,
	// such as `quiesce`, are reserved for Kubernetes-defined "well-known" names.
	// Names with a prefix, such as `example.com/label-style`, are custom names.
	// This refers to a ContainerNotifierAction and a Probe.
	// The kubelet executes the command defined in the ContainerNotifierAction,
	// and then calls the command defined in the Probe to verify whether the
	// action has resulted in the desired state.
	// For example, if the ContainerNotifierAction has a command to quiesce the
	// application, then the Probe has a command to verify that the application
	// is indeed quiesced. The kubelet will run the quiesce command and wait for
	// the quiesce probe to return success.
	Name string

	Action *ContainerNotifierAction

	// Added in Phase 3. The Probe type defined in the core API will be used here.
	Probe *Probe
}
```

### Kubelet Impact in Phase 2 and Beyond

When the logic moves from the container notifier controller into the kubelet, the kubelet is affected as follows.

The kubelet watches new Notification resources cluster-wide.

The kubelet must execute the Notification command/signal against containers. CRI changes are required to support signals.

The kubelet must update the Notification status (there are possible QPS issues when updating status for hundreds of pods).

The kubelet must have some method of measuring the success/failure of the Notification:
* For exec, it depends on the return value of the ExecInContainer method.
* For signals, it depends on the new CRI changes.

The kubelet will not retry if the ContainerNotifier call fails. It is up to the external controller that requested the ContainerNotifier to send another request to retry.

#### CRI Changes

To support signals directly, we need to make changes in CRI. We can add a new `Signal` Runtime interface method with an input parameter `signal string` and handle a specified set of signals, starting with `SIGHUP`. For example, if the input parameter `signal` is `SIGHUP`, the runtime delivers SIGHUP to the container, similar to `docker kill --signal=SIGHUP`.

```
// Runtime is the interface to execute the commands and provide the streams.
type Runtime interface {
	Exec(containerID string, cmd []string, in io.Reader, out, err io.WriteCloser, tty bool, resize <-chan remotecommand.TerminalSize) error
	Signal(containerID string, signal string, in io.Reader, out, err io.WriteCloser, tty bool, resize <-chan remotecommand.TerminalSize) error

	Attach(containerID string, in io.Reader, out, err io.WriteCloser, tty bool, resize <-chan remotecommand.TerminalSize) error
	PortForward(podSandboxID string, port int32, stream io.ReadWriteCloser) error
}
```

To support `SIGHUP` directly, we can either convert it to the command `kill -SIGHUP` and send it the same way as other commands, or translate it into a call to the docker client method [`ContainerKill`](https://github.com/moby/moby/blob/master/client/container_kill.go#L9).

We can add a `SignalKill` method in `pkg/kubelet/dockershim/libdocker/kube_docker_client.go` that calls the docker client's `ContainerKill`, similar to how the `CreateExec` method calls the docker client's `ContainerExecCreate`.
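As a rough sketch of this second option (not a committed design), a hypothetical `SignalKill` method on the dockershim client could look like the following. It assumes the `getTimeoutContext` and `contextError` helpers that `kube_docker_client.go` uses for its other methods, plus the docker client's `ContainerKill(ctx, containerID, signal)`.

```
// Illustrative sketch only: deliver a signal to a container by delegating to
// the docker client's ContainerKill, mirroring how CreateExec delegates to
// ContainerExecCreate.
func (d *kubeDockerClient) SignalKill(id string, signal string) error {
	ctx, cancel := d.getTimeoutContext()
	defer cancel()
	err := d.client.ContainerKill(ctx, id, signal) // e.g. signal = "SIGHUP"
	if ctxErr := contextError(ctx); ctxErr != nil {
		return ctxErr
	}
	return err
}
```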
## Implementation Plan

### Phase 1

The Phase 1 API definition will be added to Kubernetes. The implementation in this phase will not be in the kubelet. Instead, it will live in a single *trusted* controller which acts on Notifications via exec (with the goal of moving that logic into the kubelet in later phases). Only exec will be included in this phase; signals will not, as they involve changes to CRI.

The controller (the container notifier controller) will be located in a separate repo under kubernetes, sponsored by sig-node: `https://github.com/kubernetes/containernotifier`.

### Phase 2

* Make CRI changes to support signals.
* Move the container notifier controller logic into the kubelet.

### Phase 3

* Add Probe, if needed.

## Example Workflows

### Example Workflow with Quiesce Hooks

Suppose three commands need to run sequentially to quiesce a MySQL database before taking a snapshot: lockTables, flushDisk, and fsfreeze. After taking the snapshot, two commands run sequentially to unquiesce: fsUnfreeze and unlockTables. For simplicity, assume we only need to run fsfreeze for one volume and we only need to run each command in one container in one pod. These commands are defined in `Notifiers []ContainerNotifier` inside the Container. (A sketch of how an external controller might drive these steps follows the list below.)

1. lockTables

   The external controller creates a Notification object to request the lockTables ContainerNotifier.

   The container notifier controller (or the kubelet in phase 2) watches Notification objects and gets notified. It starts to run the lockTables command specified in the ContainerNotifier and updates the NotificationStatus, setting the StartTimestamp for the container in the pod.

   When the command finishes successfully, the container notifier controller (or the kubelet in phase 2) sets the Succeed field in the ContainerNotificationStatus to true. If the command fails or times out, it sets the Succeed field to false.

2. flushDisk

   If the lockTables command succeeds, the external controller proceeds to create a Notification object to request the flushDisk ContainerNotifier.

   The container notifier controller (or the kubelet in phase 2) starts to run the flushDisk command specified in the ContainerNotifier and updates the NotificationStatus, setting the StartTimestamp for the container in the pod.

   When the flushDisk command finishes successfully, the container notifier controller (or the kubelet in phase 2) sets the Succeed field in the ContainerNotificationStatus to true. If the command fails or times out, it sets the Succeed field to false.

   If the lockTables command fails, the external controller instead creates a Notification object to request the unlockTables ContainerNotifier. It does not proceed to the next Notification, and the snapshot creation is marked as failed.

3. fsfreeze

   If the flushDisk command succeeds, the external controller creates a Notification object to request the fsfreeze ContainerNotifier.

   The container notifier controller (or the kubelet in phase 2) starts to run the fsfreeze command specified in the ContainerNotifier and updates the NotificationStatus, setting the StartTimestamp for the container in the pod.

   When the fsfreeze command finishes successfully, the container notifier controller (or the kubelet in phase 2) sets the Succeed field in the ContainerNotificationStatus to true. If the command fails or times out, it sets the Succeed field to false.

4. Take snapshot

   If the fsfreeze command succeeds, the external controller proceeds to take a snapshot.

5. fsUnfreeze

   After taking the snapshot, the external controller creates a Notification object to request the fsUnfreeze ContainerNotifier. Even if the snapshot creation fails, the external controller still creates a Notification object to request fsUnfreeze: if fsfreeze was called, fsUnfreeze should always be called.

   The container notifier controller (or the kubelet in phase 2) starts to run the fsUnfreeze command specified in the ContainerNotifier and updates the NotificationStatus, setting the StartTimestamp for the container in the pod.

   When the fsUnfreeze command finishes successfully, the container notifier controller (or the kubelet in phase 2) sets the Succeed field in the ContainerNotificationStatus to true. If the command fails or times out, it sets the Succeed field to false.

6. unlockTables

   If the fsUnfreeze command succeeds, the external controller proceeds to create a Notification object to request the unlockTables ContainerNotifier. If lockTables was called, unlockTables should always be called.

   The container notifier controller (or the kubelet in phase 2) starts to run the unlockTables command specified in the ContainerNotifier and updates the NotificationStatus, setting the StartTimestamp for the container in the pod.

   When the unlockTables command finishes successfully, the container notifier controller (or the kubelet in phase 2) sets the Succeed field in the ContainerNotificationStatus to true. If the command fails or times out, it sets the Succeed field to false.

7. It is the external controller's responsibility to make sure unquiesce is always called after a quiesce command in the snapshot use case: fsUnfreeze is always called after fsfreeze, and unlockTables is always called after lockTables.

8. The external controller deletes all Notification objects it has created after the commands have completed.

9. The external controller is also responsible for handling retries, by creating another Notification object if a command fails.
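Below is a hedged sketch of how an external controller might drive one step of this workflow: create a Notification, poll its status, and always clean up. The client helpers (`createNotification`, `getNotification`, `deleteNotification`) are hypothetical; only the Notification types come from this proposal.

```
// Illustrative sketch only. Runs one notifier (e.g. "lockTables") against the
// pods matched by selector and blocks until every selected container reports
// Succeed, a container reports failure, or the poll times out.
func runNotifier(name string, selector *metav1.LabelSelector) error {
	n, err := createNotification(name, selector) // hypothetical helper
	if err != nil {
		return err
	}
	// The requesting controller always deletes the Notifications it created.
	defer deleteNotification(n)

	return wait.PollImmediate(time.Second, 2*time.Minute, func() (bool, error) {
		cur, err := getNotification(n.Name) // hypothetical helper
		if err != nil {
			return false, err
		}
		for _, ps := range cur.Status.PodNotificationStatuses {
			for _, cs := range ps.ContainerNotificationStatuses {
				if cs.Succeed == nil {
					return false, nil // action still running
				}
				if !*cs.Succeed {
					// No automatic retry here: the controller decides whether to
					// re-request (e.g. unlockTables after a lockTables failure).
					return false, fmt.Errorf("notifier %q failed in %s/%s", name, ps.PodName, cs.ContainerName)
				}
			}
		}
		// Wait until at least one pod has reported status.
		return len(cur.Status.PodNotificationStatuses) > 0, nil
	})
}
```

The quiesce sequence above then becomes `runNotifier` calls for lockTables, flushDisk, and fsfreeze, followed by the snapshot and the unquiesce steps, with the controller invoking the unquiesce notifiers even on failure.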
### Example Workflow with Sighup (Phase 2)

This example involves a signal, so it applies to phase 2 and beyond.

Suppose the user wants to send SIGHUP to a container in a Pod. This signal is defined in the ContainerNotifierAction inside the Container.

The external controller creates a Notification object to request the sighup ContainerNotifier. The kubelet watches the Notification object and gets notified. It sends the SIGHUP signal defined in the ContainerNotifierAction to the container. This is similar to `docker kill --signal=SIGHUP my_container`.

#### With Probe (Phase 3)

If Probe is implemented, the kubelet also sends a probe to check whether the container is still running. If the probe detects that the container has stopped, the kubelet sets the Succeed field in the ContainerNotificationStatus to true. If the container is still running, the kubelet retries the probe periodically until it times out; at that point it stops retrying and sets the Succeed field in the ContainerNotificationStatus to false.

### Example Workflow to Change Log Verbosity (Phase 2)

This example involves a signal, so it applies to phase 2 and beyond.

Suppose the user wants to change the log level to verbose in a container in a Pod. This signal is defined in the ContainerNotifierAction inside the Container.

The external controller creates a Notification object to request the ChangeLogLevel ContainerNotifier. The kubelet watches the Notification object and gets notified. It sends the signal defined in the ContainerNotifierAction to the container to change the log level to verbose.

#### With Probe (Phase 3)

If Probe is implemented, the kubelet also sends a probe to check the log level in the container.

If the probe detects that the log level inside the container is indeed verbose, the kubelet sets the Succeed field in the ContainerNotificationStatus to true. If the probe detects that the log level has not changed yet, the kubelet retries the probe periodically until it times out; at that point it stops retrying and sets the Succeed field in the ContainerNotificationStatus to false.

### Risks and Mitigations

## Test Plan

### Unit tests

* Unit tests for the container notifier controller will be added in phase 1.

### E2E tests

* E2E tests for creating a Notification object to request a ContainerNotifier when the feature flag is enabled.

## Graduation Criteria

### Alpha Graduation

* Feature flag is present.
* The container notifier controller is implemented in a sig-node sponsored repo.
* E2E tests are implemented.

### Alpha -> Beta Graduation

* At least one release has passed to allow for feedback from users.
* A blog post has been written and published on the Kubernetes blog.
### Beta -> GA Graduation

* Gather feedback from users and address it.

## Production Readiness Review Questionnaire

### Feature Enablement and Rollback

_This section must be completed when targeting alpha to a release._

* **How can this feature be enabled / disabled in a live cluster?**
  - [ ] Feature gate (also fill in values in `kep.yaml`)
    - Feature gate name:
    - Components depending on the feature gate:
  - [ ] Other
    - Describe the mechanism:
    - Will enabling / disabling the feature require downtime of the control
      plane?
    - Will enabling / disabling the feature require downtime or reprovisioning
      of a node? (Do not assume `Dynamic Kubelet Config` feature is enabled).

* **Does enabling the feature change any default behavior?**
  Any change of default behavior may be surprising to users or break existing
  automations, so be extremely careful here.

* **Can the feature be disabled once it has been enabled (i.e. can we roll back
  the enablement)?**
  Also set `disable-supported` to `true` or `false` in `kep.yaml`.
  Describe the consequences on existing workloads (e.g., if this is a runtime
  feature, can it break the existing applications?).

* **What happens if we reenable the feature if it was previously rolled back?**

* **Are there any tests for feature enablement/disablement?**
  The e2e framework does not currently support enabling or disabling feature
  gates. However, unit tests in each component dealing with managing data, created
  with and without the feature, are necessary. At the very least, think about
  conversion tests if API types are being modified.

### Rollout, Upgrade and Rollback Planning

_This section must be completed when targeting beta graduation to a release._

* **How can a rollout fail? Can it impact already running workloads?**
  Try to be as paranoid as possible - e.g., what if some components will restart
  mid-rollout?

* **What specific metrics should inform a rollback?**

* **Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?**
  Describe manual testing that was done and the outcomes.
  Longer term, we may want to require automated upgrade/rollback tests, but we
  are missing a bunch of machinery and tooling and can't do that now.

* **Is the rollout accompanied by any deprecations and/or removals of features, APIs,
  fields of API types, flags, etc.?**
  Even if applying deprecation policies, they may still surprise some users.

### Monitoring Requirements

_This section must be completed when targeting beta graduation to a release._

* **How can an operator determine if the feature is in use by workloads?**
  Ideally, this should be a metric. Operations against the Kubernetes API (e.g.,
  checking if there are objects with field X set) may be a last resort. Avoid
  logs or events for this purpose.

* **What are the SLIs (Service Level Indicators) an operator can use to determine
  the health of the service?**
  - [ ] Metrics
    - Metric name:
    - [Optional] Aggregation method:
    - Components exposing the metric:
  - [ ] Other (treat as last resort)
    - Details:

* **What are the reasonable SLOs (Service Level Objectives) for the above SLIs?**
  At a high level, this usually will be in the form of "high percentile of SLI
  per day <= X".
  It's impossible to provide comprehensive guidance, but at the very
  high level (needs more precise definitions) those may be things like:
  - per-day percentage of API calls finishing with 5XX errors <= 1%
  - 99% percentile over day of absolute value from (job creation time minus expected
    job creation time) for cron job <= 10%
  - 99.9% of /health requests per day finish with 200 code

* **Are there any missing metrics that would be useful to have to improve observability
  of this feature?**
  Describe the metrics themselves and the reasons why they weren't added (e.g., cost,
  implementation difficulties, etc.).

### Dependencies

_This section must be completed when targeting beta graduation to a release._

* **Does this feature depend on any specific services running in the cluster?**
  Think about both cluster-level services (e.g. metrics-server) as well
  as node-level agents (e.g. specific version of CRI). Focus on external or
  optional services that are needed. For example, if this feature depends on
  a cloud provider API, or upon an external software-defined storage or network
  control plane.

  For each of these, fill in the following—thinking about running existing user workloads
  and creating new ones, as well as about cluster-level services (e.g. DNS):
  - [Dependency name]
    - Usage description:
    - Impact of its outage on the feature:
    - Impact of its degraded performance or high-error rates on the feature:

### Scalability

_For alpha, this section is encouraged: reviewers should consider these questions
and attempt to answer them._

_For beta, this section is required: reviewers must answer these questions._

_For GA, this section is required: approvers should be able to confirm the
previous answers based on experience in the field._

* **Will enabling / using this feature result in any new API calls?**
  Describe them, providing:
  - API call type (e.g. PATCH pods)
  - estimated throughput
  - originating component(s) (e.g. Kubelet, Feature-X-controller)
  focusing mostly on:
  - components listing and/or watching resources they didn't before
  - API calls that may be triggered by changes of some Kubernetes resources
    (e.g. update of object X triggers new updates of object Y)
  - periodic API calls to reconcile state (e.g. periodic fetching state,
    heartbeats, leader election, etc.)

* **Will enabling / using this feature result in introducing new API types?**
  Describe them, providing:
  - API type
  - Supported number of objects per cluster
  - Supported number of objects per namespace (for namespace-scoped objects)

* **Will enabling / using this feature result in any new calls to the cloud
  provider?**

* **Will enabling / using this feature result in increasing size or count of
  the existing API objects?**
  Describe them, providing:
  - API type(s):
  - Estimated increase in size: (e.g., new annotation of size 32B)
  - Estimated amount of new objects: (e.g., new Object X for every existing Pod)

* **Will enabling / using this feature result in increasing time taken by any
  operations covered by [existing SLIs/SLOs]?**
  Think about adding additional work or introducing new steps in between
  (e.g. need to do X to start a container), etc. Please describe the details.

* **Will enabling / using this feature result in non-negligible increase of
  resource usage (CPU, RAM, disk, IO, ...)
  in any components?**
  Things to keep in mind include: additional in-memory state, additional
  non-trivial computations, excessive access to disks (including increased log
  volume), significant amount of data sent and/or received over network, etc.
  Think through this both in small and large cases, again with respect to the
  [supported limits].

### Troubleshooting

The Troubleshooting section currently serves the `Playbook` role. We may consider
splitting it into a dedicated `Playbook` document (potentially with some monitoring
details). For now, we leave it here.

_This section must be completed when targeting beta graduation to a release._

* **How does this feature react if the API server and/or etcd is unavailable?**

* **What are other known failure modes?**
  For each of them, fill in the following information by copying the below template:
  - [Failure mode brief description]
    - Detection: How can it be detected via metrics? Stated another way:
      how can an operator troubleshoot without logging into a master or worker node?
    - Mitigations: What can be done to stop the bleeding, especially for already
      running user workloads?
    - Diagnostics: What are the useful log messages and their required logging
      levels that could help debug the issue?
      Not required until feature graduated to beta.
    - Testing: Are there any tests for failure mode? If not, describe why.

* **What steps should be taken if SLOs are not being met to determine the problem?**

[supported limits]: https://git.k8s.io/community/sig-scalability/configs-and-limits/thresholds.md
[existing SLIs/SLOs]: https://git.k8s.io/community/sig-scalability/slos/slos.md#kubernetes-slisslos

## Implementation History

## Drawbacks

None.

## Alternatives

This was initially proposed as a CRD approach in the [ExecutionHook KEP](https://github.com/kubernetes/enhancements/blob/master/keps/sig-storage/20190120-execution-hook-design.md). However, that approach means an external controller would be responsible for running exec on a pod. Having the kubelet execute the hooks instead of an external controller is more secure: an external controller is considerably easier to compromise than the kubelet, and a compromised controller would be able to arbitrarily exec into any pod.

## Infrastructure Needed (Optional)

None.
diff --git a/keps/sig-node/1977-container-notifier/kep.yaml b/keps/sig-node/1977-container-notifier/kep.yaml
new file mode 100644
index 00000000000..a65cdf94fc2
--- /dev/null
+++ b/keps/sig-node/1977-container-notifier/kep.yaml
@@ -0,0 +1,50 @@
title: Container notifier
kep-number: 1977
authors:
  - "@xing-yang"
  - "@yuxiangqian"
owning-sig: sig-node
participating-sigs:
  - sig-storage
status: implementable
creation-date: 2020-09-20
reviewers:
  - "@thockin"
  - "@liggitt"
  - "@saad-ali"
  - "@sjenning"
approvers:
  - "@thockin"
  - "@liggitt"
  - "@saad-ali"
  - "@derekwaynecarr"
  - "@dchen1107"
prr-approvers:
  - johnbelamaric
see-also:
replaces:

# The target maturity stage in the current dev cycle for this KEP.
stage: alpha

# The most recent milestone for which work toward delivery of this KEP has been
# done. This can be the current (upcoming) milestone, if it is being actively
# worked on.
latest-milestone: "v1.20"

# The milestone at which this feature was, or is targeted to be, at each stage.
milestone:
  alpha: "v1.20"
  beta: "v1.21"
  stable: "v1.22"

# The following PRR answers are required at alpha release
# List the feature gate name and the components for which it must be enabled
feature-gates:
  - name: ContainerNotifier
    components:
      - kube-apiserver
disable-supported: true

# The following PRR answers are required at beta release
metrics: