TTL controller for cleaning up finished resources #10064

Merged
merged 3 commits into from
Sep 12, 2018
@@ -247,6 +247,59 @@ spec:

Note that both the Job Spec and the [Pod Template Spec](https://kubernetes.io/docs/concepts/workloads/pods/init-containers/#detailed-behavior) within the Job have an `activeDeadlineSeconds` field. Ensure that you set this field at the proper level.

## Clean Up Finished Jobs Automatically

Finished Jobs are usually no longer needed in the system. Keeping them around
puts pressure on the API server. If the Jobs are managed directly by a higher
level controller, such as
[CronJobs](/docs/concepts/workloads/controllers/cron-jobs/), the Jobs can be
cleaned up by CronJobs based on the specified capacity-based cleanup policy.

### TTL Mechanism for Finished Jobs

{{< feature-state for_k8s_version="v1.12" state="alpha" >}}

Another way to clean up finished Jobs (either `Complete` or `Failed`)
automatically is to use a TTL mechanism provided by a
[TTL controller](/docs/concepts/workloads/controllers/ttlafterfinished/) for
finished resources, by specifying the `.spec.ttlSecondsAfterFinished` field of
the Job.

When the TTL controller cleans up the Job, it deletes the Job in cascade, that
is, it deletes the Job's dependent objects, such as Pods, together with the Job.
Note that when the Job is deleted, its lifecycle guarantees, such as finalizers,
are honored.

For example:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: pi-with-ttl
spec:
  ttlSecondsAfterFinished: 100
  template:
    spec:
      containers:
      - name: pi
        image: perl
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never
```

The Job `pi-with-ttl` will be eligible to be automatically deleted `100`
seconds after it finishes.

If the field is set to `0`, the Job will be eligible to be automatically deleted
immediately after it finishes. If the field is unset, this Job won't be cleaned
up by the TTL controller after it finishes.
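
As a minimal sketch of the zero-TTL case described above, the manifest below
reuses the `pi` example and sets the field to `0`. The Job name
`pi-with-zero-ttl` is hypothetical and chosen only for illustration.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: pi-with-zero-ttl  # hypothetical name, for illustration only
spec:
  # 0 makes the Job eligible for deletion as soon as it finishes.
  # Leaving the field out entirely opts the Job out of TTL cleanup.
  ttlSecondsAfterFinished: 0
  template:
    spec:
      containers:
      - name: pi
        image: perl
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never
```

Because the Job's Pods are deleted together with the Job, a zero TTL is likely
to suit only Jobs whose output is collected elsewhere.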

Note that this TTL mechanism is alpha and is guarded by the feature gate
`TTLAfterFinished`. For more information, see the documentation for the
[TTL controller](/docs/concepts/workloads/controllers/ttlafterfinished/) for
finished resources.
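
How the gate is turned on depends on how the cluster is deployed. As one hedged
sketch, assuming the control plane components run as static Pods, the gate can
be passed with the `--feature-gates` flag. It is assumed here that both
kube-apiserver (which accepts the field) and kube-controller-manager (which
runs the controller) need the gate; the surrounding manifest layout is also an
assumption.

```yaml
# Fragment of a kube-controller-manager static Pod manifest (assumed layout;
# the file location and the other fields depend on how the cluster was set up).
# A matching flag would be added to the kube-apiserver manifest as well.
spec:
  containers:
  - name: kube-controller-manager
    command:
    - kube-controller-manager
    - --feature-gates=TTLAfterFinished=true
    # ...all other existing flags stay unchanged
```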

## Job Patterns

90 changes: 90 additions & 0 deletions content/en/docs/concepts/workloads/controllers/ttlafterfinished.md
@@ -0,0 +1,90 @@
---
reviewers:
- janetkuo
title: TTL Controller for Finished Resources
content_template: templates/concept
weight: 65
---

{{% capture overview %}}

{{< feature-state for_k8s_version="v1.12" state="alpha" >}}

The TTL controller provides a TTL mechanism to limit the lifetime of resource
objects that have finished execution. The TTL controller only handles
[Jobs](/docs/concepts/workloads/controllers/jobs-run-to-completion/) for
now, and may be expanded to handle other resources that will finish execution,
such as Pods and custom resources.

Alpha Disclaimer: this feature is currently alpha, and can be enabled with
[feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
`TTLAfterFinished`.


{{% /capture %}}


{{< toc >}}


{{% capture body %}}

## TTL Controller

The TTL controller only supports Jobs for now. You can use this feature to clean
up finished Jobs (either `Complete` or `Failed`) automatically by specifying the
`.spec.ttlSecondsAfterFinished` field of a Job, as in this
[example](/docs/concepts/workloads/controllers/jobs-run-to-completion/#clean-up-finished-jobs-automatically).
The TTL controller assumes that a resource is eligible to be cleaned up
TTL seconds after the resource has finished, that is, once the TTL has expired.
When the TTL controller cleans up a resource, it deletes it in cascade, that
is, it deletes its dependent objects together with it. Note that when the
resource is deleted, its lifecycle guarantees, such as finalizers, are honored.
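
To make the expiry arithmetic concrete, here is a small annotated fragment of a
finished Job. The timestamps are made up for illustration, and which status
timestamp the controller actually reads is an assumption; this page only says
that it uses timestamps stored in the resource.

```yaml
# Illustrative values only; the timestamps are invented for this example.
spec:
  ttlSecondsAfterFinished: 100
status:
  conditions:
  - type: Complete
    status: "True"
    lastTransitionTime: "2018-09-12T10:00:00Z"  # assumed finish time
# TTL expiry = finish time + TTL = 10:00:00Z + 100s = 10:01:40Z.
# From 10:01:40Z on, the Job is eligible for cascading deletion.
```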

The TTL seconds can be set at any time. Here are some examples for setting the
`.spec.ttlSecondsAfterFinished` field of a Job:

* Specify this field in the resource manifest, so that a Job can be cleaned up
  automatically some time after it finishes.
* Set this field on existing, already finished resources, to adopt this new
  feature.
* Use a
  [mutating admission webhook](/docs/reference/access-authn-authz/extensible-admission-controllers/#admission-webhooks)
  to set this field dynamically at resource creation time. Cluster admins can
  use this to enforce a TTL policy for finished resources.
* Use a
  [mutating admission webhook](/docs/reference/access-authn-authz/extensible-admission-controllers/#admission-webhooks)
  to set this field dynamically after the resource has finished, and choose
  different TTL values based on resource status, labels, etc.
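
For the webhook options in the list above, the registration might look roughly
like the sketch below. Every name in it (`job-ttl-defaulter`,
`job-ttl.example.com`, the `job-ttl-webhook` Service and its `/mutate` path) is
hypothetical, the API version is an assumption based on the release this page
targets, and the webhook server itself, which would return a patch adding
`.spec.ttlSecondsAfterFinished`, is not shown.

```yaml
apiVersion: admissionregistration.k8s.io/v1beta1  # assumed for the v1.12 era
kind: MutatingWebhookConfiguration
metadata:
  name: job-ttl-defaulter              # hypothetical
webhooks:
- name: job-ttl.example.com            # hypothetical
  clientConfig:
    service:
      namespace: default               # hypothetical
      name: job-ttl-webhook            # hypothetical Service in front of the webhook server
      path: /mutate                    # hypothetical
    caBundle: <base64-encoded CA certificate>
  rules:
  # Intercept Job creation so the server can set a TTL before the Job is stored.
  - operations: ["CREATE"]
    apiGroups: ["batch"]
    apiVersions: ["v1"]
    resources: ["jobs"]
  # Do not block Job creation if the webhook is unavailable.
  failurePolicy: Ignore
```

Choosing `failurePolicy: Ignore` trades strict enforcement for availability: Jobs
created while the webhook is down would simply have no TTL.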

## Caveat

### Updating TTL Seconds

Note that the TTL period, e.g. the `.spec.ttlSecondsAfterFinished` field of a
Job, can be modified after the resource is created or has finished. However,
once the Job becomes eligible to be deleted (i.e. the TTL has expired), the
system won't guarantee that the Job will be kept, even if an update to extend
the TTL returns a successful API response.
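
As a sketch of such an update, the patch below extends the TTL of an existing
Job to one hour. The file name `ttl-patch.yaml` and the value `3600` are
hypothetical; the Job name is taken from the example linked above.

```yaml
# ttl-patch.yaml (hypothetical file name)
# Could be applied with, for example:
#   kubectl patch job pi-with-ttl --patch "$(cat ttl-patch.yaml)"
spec:
  ttlSecondsAfterFinished: 3600
```

If the original TTL has already expired when the patch is applied, the Job may
still be deleted even though the patch succeeds.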

### Time Skew

Because the TTL controller uses timestamps stored in the Kubernetes resources to
determine whether the TTL has expired or not, this feature is sensitive to time
skew in the cluster, which may cause the TTL controller to clean up resource
objects at the wrong time.

In Kubernetes, it's required to run NTP on all nodes
(see [#6159](https://github.com/kubernetes/kubernetes/issues/6159#issuecomment-93844058))
to avoid time skew. Clocks aren't always correct, but the difference should be
very small. Please be aware of this risk when setting a non-zero TTL.

{{% /capture %}}

{{% capture whatsnext %}}

[Clean up Jobs automatically](/docs/concepts/workloads/controllers/jobs-run-to-completion/#clean-up-finished-jobs-automatically)

[Design doc](https://github.com/kubernetes/community/blob/master/keps/sig-apps/0026-ttl-after-finish.md)

{{% /capture %}}
@@ -107,6 +107,7 @@ different Kubernetes components.
| `TokenRequest` | `True` | Beta | 1.12 | |
| `TokenRequestProjection` | `false` | Alpha | 1.11 | 1.11 |
| `TokenRequestProjection` | `True` | Beta | 1.12 | |
| `TTLAfterFinished` | `false` | Alpha | 1.12 | |
| `VolumeScheduling` | `false` | Alpha | 1.9 | 1.9 |
| `VolumeScheduling` | `true` | Beta | 1.10 | |
| `VolumeSubpathEnvExpansion` | `false` | Alpha | 1.11 | |
@@ -252,6 +253,7 @@ Each feature gate is designed for enabling/disabling a specific feature:
- `TokenRequest`: Enable the `TokenRequest` endpoint on service account resources.
- `TokenRequestProjection`: Enable the injection of service account tokens into
a Pod through the [`projected` volume](/docs/concepts/storage/volumes/#projected).
- `TTLAfterFinished`: Allow a [TTL controller](/docs/concepts/workloads/controllers/ttlafterfinished/) to clean up resources after they finish execution.
- `VolumeScheduling`: Enable volume topology aware scheduling and make the
PersistentVolumeClaim (PVC) binding aware of scheduling decisions. It also
enables the usage of [`local`](/docs/concepts/storage/volumes/#local) volume