From 9b811dc2c9ff56e74943421f6281d4f049436122 Mon Sep 17 00:00:00 2001
From: Janet Kuo
Date: Thu, 23 Aug 2018 15:09:23 -0700
Subject: [PATCH 1/3] TTL controller for cleaning up finished resources
---
 .../controllers/jobs-run-to-completion.md     | 45 +++++++++++
 .../workloads/controllers/ttlafterfinished.md | 80 +++++++++++++++++++
 .../feature-gates.md                          |  2 +
 3 files changed, 127 insertions(+)
 create mode 100644 content/en/docs/concepts/workloads/controllers/ttlafterfinished.md
diff --git a/content/en/docs/concepts/workloads/controllers/jobs-run-to-completion.md b/content/en/docs/concepts/workloads/controllers/jobs-run-to-completion.md
index 1b0c5a388b5fb..c8e9364d0ae9d 100644
--- a/content/en/docs/concepts/workloads/controllers/jobs-run-to-completion.md
+++ b/content/en/docs/concepts/workloads/controllers/jobs-run-to-completion.md
@@ -247,6 +247,51 @@ spec:
 Note that both the Job Spec and the [Pod Template Spec](https://kubernetes.io/docs/concepts/workloads/pods/init-containers/#detailed-behavior) within the Job have an `activeDeadlineSeconds` field. Ensure that you set this field at the proper level.
+## Clean Up Finished Jobs Automatically
+
+Finished Jobs are usually no longer needed in the system. Keeping them around in
+the system will put pressure on API server. If the Jobs are managed directly by
+a higher level controller, such as
+[CronJobs](/docs/concepts/workloads/controllers/cron-jobs/), the Jobs can be
+cleaned up by CronJobs based on specified cleanup policy.
+
+Another way to clean up finished Jobs (either `Complete` or `Failed`)
+automatically is to use a TTL mechanism provided by a
+[TTL controller](/docs/concepts/workloads/controllers/ttlafterfinished/) for
+finished resources, by specifying the `.spec.ttlSecondsAfterFinished` field of
+the Job.
+
+For example:
+
+```yaml
+apiVersion: batch/v1
+kind: Job
+metadata:
+  name: pi-with-ttl
+spec:
+spec:
+  ttlSecondsAfterFinished: 100
+  template:
+    spec:
+      containers:
+      - name: pi
+        image: perl
+        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
+      restartPolicy: Never
+```
+
+The Job `pi-with-ttl` will be eligible to be automatically deleted, `100`
+seconds after it finishes. Note that when the Job is deleted, its lifecycle
+guarantees, such as finalizers, will be honored.
+
+If the field is set to `0`, the Job will be eligible to be automatically deleted
+immediately after it finishes. If the field is unset, this Jobs won't be cleaned
+up by the TTL controller after it finishes.
+
+Note that this TTL mechanism is alpha, with feature gate `TTLAfterFinished`. For
+more information, see the documentation for
+[TTL controller](/docs/concepts/workloads/controllers/ttlafterfinished/) for
+finished resources.
 ## Job Patterns
diff --git a/content/en/docs/concepts/workloads/controllers/ttlafterfinished.md b/content/en/docs/concepts/workloads/controllers/ttlafterfinished.md
new file mode 100644
index 0000000000000..7a6fb767125eb
--- /dev/null
+++ b/content/en/docs/concepts/workloads/controllers/ttlafterfinished.md
@@ -0,0 +1,80 @@
+---
+reviewers:
+- janetkuo
+title: TTL Controller for Finished Resources
+content_template: templates/concept
+weight: ??
+---
+
+{{% capture overview %}}
+
+The TTL controller provides a TTL mechanism to limit the lifetime of resource
+objects that have finished execution. Currently, TTL controller only handles
+[Jobs](/docs/concepts/workloads/controllers/jobs-run-to-completion/) for
+now, and may be expanded to handle other resources that will finish execution,
+such as Pods and custom resources.
+
+Alpha Disclaimer: this feature is currently alpha, and can be enabled with
+[feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
+`TTLAfterFinished`.
+
+
+{{% /capture %}}
+
+
+{{< toc >}}
+
+
+{{% capture body %}}
+
+## TTL Controller
+
+The TTL controller only supports Jobs for now. You can use this feature to clean
+up finished Jobs (either `Complete` or `Failed`) automatically by specifying the
+`.spec.ttlSecondsAfterFinished` field of a Job,
+see [example](/docs/concepts/workloads/controllers/jobs-run-to-completion/#clean-up-finished-jobs-automatically).
+The TTL controller will assume that a resource is eligible to be cleaned up
+TTL seconds after the resource has finished, i.e. TTL has expired. When the
+resource is deleted, its lifecycle guarantees, such as finalizers, will be
+honored.
+
+The TTL seconds can be set at any time -- for example, you can specify it in the
+resource manifest, set it at resource creation time, or set it after the
+resource has finished. You can also use
+[mutating admission webhooks](https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/#admission-webhooks)
+to set this field dynamically.
+
+In the future, we plan to expand TTL controller to other resources that will
+finish execution, such as Pods and custom resources.
+
+## Caveat
+
+### Updating TTL Seconds
+
+Note that the TTL period, e.g. `.spec.ttlSecondsAfterFinished` field of Jobs,
+can be modified after the resource is created or has finished. However, once the
+Job becomes eligible to be deleted (i.e. the TTL has expired), the system won't
+guarantee that the Jobs will be kept, even if an update to extend the TTL
+returns a successful API response.
+
+### Time Skew
+
+Because TTL controller uses timestamps stored in the Kubernetes resources to
+determine whether the TTL has expired or not, this feature is sensitive to time
+skew in the cluster, which may cause TTL controller to clean up resource objects
+at the wrong time.
+
+In Kubernetes, it's required to run NTP on all nodes
+(see [#6159](https://github.com/kubernetes/kubernetes/issues/6159#issuecomment-93844058))
+to avoid time skew. Clocks aren't always correct, but the difference should be
+very small. Please be aware of this risk when setting a non-zero TTL.
+
+{{% /capture %}}
+
+{{% capture whatsnext %}}
+
+[Clean up Jobs automatically](/docs/concepts/workloads/controllers/jobs-run-to-completion/#clean-up-finished-jobs-automatically)
+
+[Design doc](https://github.com/kubernetes/community/blob/master/keps/sig-apps/0026-ttl-after-finish.md)
+
+{{% /capture %}}
diff --git a/content/en/docs/reference/command-line-tools-reference/feature-gates.md b/content/en/docs/reference/command-line-tools-reference/feature-gates.md
index a0edc06bbe413..b4f0118901fe1 100644
--- a/content/en/docs/reference/command-line-tools-reference/feature-gates.md
+++ b/content/en/docs/reference/command-line-tools-reference/feature-gates.md
@@ -107,6 +107,7 @@ different Kubernetes components.
 | `TokenRequest` | `True` | Beta | 1.12 | |
 | `TokenRequestProjection` | `false` | Alpha | 1.11 | 1.11 |
 | `TokenRequestProjection` | `True` | Beta | 1.12 | |
+| `TTLAfterFinished` | `false` | Alpha | 1.12 | |
 | `VolumeScheduling` | `false` | Alpha | 1.9 | 1.9 |
 | `VolumeScheduling` | `true` | Beta | 1.10 | |
 | `VolumeSubpathEnvExpansion` | `false` | Alpha | 1.11 | |
@@ -252,6 +253,7 @@ Each feature gate is designed for enabling/disabling a specific feature:
 - `TokenRequest`: Enable the `TokenRequest` endpoint on service account resources.
 - `TokenRequestProjection`: Enable the injection of service account tokens into a Pod through the [`projected` volume](/docs/concepts/storage/volumes/#projected).
+- `TTLAfterFinished`: Allow a [TTL controller](/docs/concepts/workloads/controllers/ttlafterfinished/) to clean up resources after they finish execution.
 - `VolumeScheduling`: Enable volume topology aware scheduling and make the PersistentVolumeClaim (PVC) binding aware of scheduling decisions. It also enables the usage of [`local`](/docs/concepts/storage/volumes/#local) volume
From 65fadf3618deb1f13376e9d901420357b0d4a8eb Mon Sep 17 00:00:00 2001
From: Janet Kuo
Date: Mon, 10 Sep 2018 17:19:45 -0700
Subject: [PATCH 2/3] Address comments
---
 .../controllers/jobs-run-to-completion.md     | 20 +++++++---
 .../workloads/controllers/ttlafterfinished.md | 40 ++++++++++++-------
 2 files changed, 39 insertions(+), 21 deletions(-)
diff --git a/content/en/docs/concepts/workloads/controllers/jobs-run-to-completion.md b/content/en/docs/concepts/workloads/controllers/jobs-run-to-completion.md
index c8e9364d0ae9d..39324b6b2800f 100644
--- a/content/en/docs/concepts/workloads/controllers/jobs-run-to-completion.md
+++ b/content/en/docs/concepts/workloads/controllers/jobs-run-to-completion.md
@@ -250,10 +250,14 @@ Note that both the Job Spec and the [Pod Template Spec](https://kubernetes.io/do
 ## Clean Up Finished Jobs Automatically
 Finished Jobs are usually no longer needed in the system. Keeping them around in
-the system will put pressure on API server. If the Jobs are managed directly by
-a higher level controller, such as
+the system will put pressure on the API server. If the Jobs are managed directly
+by a higher level controller, such as
 [CronJobs](/docs/concepts/workloads/controllers/cron-jobs/), the Jobs can be
-cleaned up by CronJobs based on specified cleanup policy.
+cleaned up by CronJobs based on the specified capacity-based cleanup policy.
+
+### TTL Mechanism for Finished Jobs
+
+{{< feature-state for_k8s_version="v1.12" state="alpha" >}}
 Another way to clean up finished Jobs (either `Complete` or `Failed`)
 automatically is to use a TTL mechanism provided by a
@@ -261,6 +265,11 @@ automatically is to use a TTL mechanism provided by a
 finished resources, by specifying the `.spec.ttlSecondsAfterFinished` field of
 the Job.
+When the TTL controller cleans up the Job, it will delete the Job cascadingly,
+i.e. delete its dependent objects, such as Pods, together with the Job. Note
+that when the Job is deleted, its lifecycle guarantees, such as finalizers, will
+be honored.
+
 For example:
 ```yaml
@@ -281,11 +290,10 @@ spec:
 ```
 The Job `pi-with-ttl` will be eligible to be automatically deleted, `100`
-seconds after it finishes. Note that when the Job is deleted, its lifecycle
-guarantees, such as finalizers, will be honored.
+seconds after it finishes.
 If the field is set to `0`, the Job will be eligible to be automatically deleted
-immediately after it finishes. If the field is unset, this Jobs won't be cleaned
+immediately after it finishes. If the field is unset, this Job won't be cleaned
 up by the TTL controller after it finishes.
 Note that this TTL mechanism is alpha, with feature gate `TTLAfterFinished`. For
diff --git a/content/en/docs/concepts/workloads/controllers/ttlafterfinished.md b/content/en/docs/concepts/workloads/controllers/ttlafterfinished.md
index 7a6fb767125eb..6a5b418a2c74e 100644
--- a/content/en/docs/concepts/workloads/controllers/ttlafterfinished.md
+++ b/content/en/docs/concepts/workloads/controllers/ttlafterfinished.md
@@ -3,13 +3,15 @@ reviewers:
 - janetkuo
 title: TTL Controller for Finished Resources
 content_template: templates/concept
-weight: ??
+weight: 65
 ---
 {{% capture overview %}}
+{{< feature-state for_k8s_version="v1.12" state="alpha" >}}
+
 The TTL controller provides a TTL mechanism to limit the lifetime of resource
-objects that have finished execution. Currently, TTL controller only handles
+objects that have finished execution. TTL controller only handles
 [Jobs](/docs/concepts/workloads/controllers/jobs-run-to-completion/) for
 now, and may be expanded to handle other resources that will finish execution,
 such as Pods and custom resources.
@@ -31,21 +33,29 @@ Alpha Disclaimer: this feature is currently alpha, and can be enabled with
 The TTL controller only supports Jobs for now. You can use this feature to clean
 up finished Jobs (either `Complete` or `Failed`) automatically by specifying the
-`.spec.ttlSecondsAfterFinished` field of a Job,
-see [example](/docs/concepts/workloads/controllers/jobs-run-to-completion/#clean-up-finished-jobs-automatically).
+`.spec.ttlSecondsAfterFinished` field of a Job, as in this
+[example](/docs/concepts/workloads/controllers/jobs-run-to-completion/#clean-up-finished-jobs-automatically).
 The TTL controller will assume that a resource is eligible to be cleaned up
 TTL seconds after the resource has finished, i.e. TTL has expired. When the
-resource is deleted, its lifecycle guarantees, such as finalizers, will be
-honored.
-
-The TTL seconds can be set at any time -- for example, you can specify it in the
-resource manifest, set it at resource creation time, or set it after the
-resource has finished. You can also use
-[mutating admission webhooks](https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/#admission-webhooks)
-to set this field dynamically.
-
-In the future, we plan to expand TTL controller to other resources that will
-finish execution, such as Pods and custom resources.
+TTL controller cleans up a resource, it will delete it cascadingly, i.e. delete
+its dependent objects together with it. Note that when the resource is deleted,
+its lifecycle guarantees, such as finalizers, will be honored.
+
+The TTL seconds can be set at any time. Here are some examples for setting the
+`.spec.ttlSecondsAfterFinished` field of a Job:
+
+* Specify this field in the resource manifest, so that a Job can be cleaned up
+  automatically some time after it finishes.
+* Set this field of existing, already finished resources, to adopt this new
+  feature.
+* Use a
+  [mutating admission webhook](/docs/reference/access-authn-authz/extensible-admission-controllers/#admission-webhooks)
+  to set this field dynamically at resource creation time. Cluster admins can
+  use this to enforce a TTL policy for finished resources.
+* Use a
+  [mutating admission webhook](/docs/reference/access-authn-authz/extensible-admission-controllers/#admission-webhooks)
+  to set this field dynamically after the resource has finished, and choose
+  different TTL values based on resource status, labels, etc.
 ## Caveat
From 76979c40f511c73edf3f8ac113f2d8eb1b3e2535 Mon Sep 17 00:00:00 2001
From: Zach Arnold
Date: Wed, 12 Sep 2018 09:46:21 -0700
Subject: [PATCH 3/3] Update ttlafterfinished.md
---
 .../concepts/workloads/controllers/ttlafterfinished.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/content/en/docs/concepts/workloads/controllers/ttlafterfinished.md b/content/en/docs/concepts/workloads/controllers/ttlafterfinished.md
index 6a5b418a2c74e..1f8e355ff8325 100644
--- a/content/en/docs/concepts/workloads/controllers/ttlafterfinished.md
+++ b/content/en/docs/concepts/workloads/controllers/ttlafterfinished.md
@@ -31,12 +31,12 @@ Alpha Disclaimer: this feature is currently alpha, and can be enabled with
 ## TTL Controller
-The TTL controller only supports Jobs for now. You can use this feature to clean
+The TTL controller only supports Jobs for now. A cluster operator can use this feature to clean
 up finished Jobs (either `Complete` or `Failed`) automatically by specifying the
 `.spec.ttlSecondsAfterFinished` field of a Job, as in this
 [example](/docs/concepts/workloads/controllers/jobs-run-to-completion/#clean-up-finished-jobs-automatically).
 The TTL controller will assume that a resource is eligible to be cleaned up
-TTL seconds after the resource has finished, i.e. TTL has expired. When the
+TTL seconds after the resource has finished, in other words, when the TTL has expired. When the
 TTL controller cleans up a resource, it will delete it cascadingly, i.e. delete
 its dependent objects together with it. Note that when the resource is deleted,
 its lifecycle guarantees, such as finalizers, will be honored.
@@ -50,7 +50,7 @@ The TTL seconds can be set at any time. Here are some examples for setting the
   feature.
 * Use a
   [mutating admission webhook](/docs/reference/access-authn-authz/extensible-admission-controllers/#admission-webhooks)
-  to set this field dynamically at resource creation time. Cluster admins can
+  to set this field dynamically at resource creation time. Cluster administrators can
   use this to enforce a TTL policy for finished resources.
 * Use a
   [mutating admission webhook](/docs/reference/access-authn-authz/extensible-admission-controllers/#admission-webhooks)
@@ -63,7 +63,7 @@ The TTL seconds can be set at any time. Here are some examples for setting the
 Note that the TTL period, e.g. `.spec.ttlSecondsAfterFinished` field of Jobs,
 can be modified after the resource is created or has finished. However, once the
-Job becomes eligible to be deleted (i.e. the TTL has expired), the system won't
+Job becomes eligible to be deleted (when the TTL has expired), the system won't
 guarantee that the Jobs will be kept, even if an update to extend the TTL
 returns a successful API response.
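
Note on the series: the eligibility rule these patches document (a finished Job may be deleted once `.spec.ttlSecondsAfterFinished` seconds have passed since it finished, immediately for `0`, and never by this mechanism when the field is unset) and the Time Skew caveat can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the TTL controller's actual implementation: the helper names `ttl_expiry` and `is_eligible_for_cleanup`, the timestamps, and the 15-second clock offset are all hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def ttl_expiry(finished_at: datetime, ttl_seconds: Optional[int]) -> Optional[datetime]:
    """Hypothetical helper: the time at which a finished Job's TTL expires.

    Returns None when the TTL field is unset, meaning the Job is never
    cleaned up by the TTL mechanism.
    """
    if ttl_seconds is None:
        return None
    return finished_at + timedelta(seconds=ttl_seconds)


def is_eligible_for_cleanup(
    finished_at: datetime, ttl_seconds: Optional[int], now: datetime
) -> bool:
    """Hypothetical helper: True once `now` has reached the TTL expiry."""
    expiry = ttl_expiry(finished_at, ttl_seconds)
    return expiry is not None and now >= expiry


# Illustrative values only.
finished = datetime(2018, 9, 12, 12, 0, 0, tzinfo=timezone.utc)
true_now = finished + timedelta(seconds=90)

# TTL of 100 seconds, checked 90 seconds after finishing: not yet eligible.
on_time = is_eligible_for_cleanup(finished, 100, true_now)

# Same moment, but read from a clock running 15 seconds fast: the TTL
# appears to have expired, so the Job would be cleaned up early.
skewed = is_eligible_for_cleanup(finished, 100, true_now + timedelta(seconds=15))

# TTL of 0: eligible as soon as the Job finishes. Unset TTL: never eligible.
immediate = is_eligible_for_cleanup(finished, 0, finished)
never = is_eligible_for_cleanup(finished, None, true_now)

print(on_time, skewed, immediate, never)  # prints: False True True False
```

The second call is the case the Time Skew section warns about: the TTL value is unchanged and only the clock reading differs, yet the outcome flips.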