
Already API-evicted Pods do not get evicted by the kubelet eviction manager (memory pressure, ephemeral storage pressure) #122297

Closed
FullyScaled opened this issue Dec 13, 2023 · 9 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. sig/node Categorizes an issue or PR as relevant to SIG Node.

Comments

@FullyScaled

FullyScaled commented Dec 13, 2023

What happened?

When a pod has a long terminationGracePeriod and is evicted via the API (e.g. during downscaling), the eviction is initiated and respects that long terminationGracePeriod. This works fine.

If that pod then needs to be evicted due to memory pressure on the node after the API eviction has already been initiated, the kubelet no longer performs an eviction with its configured evictionMaxPodGracePeriod. This means the second eviction (e.g. due to memory pressure) cannot succeed at all when the pod has a large terminationGracePeriod and a prior, unrelated API eviction is still in progress.


What did you expect to happen?

An eviction triggered by the kubelet eviction manager due to memory pressure should still be issued with the configured evictionMaxPodGracePeriod, even if a prior API-based eviction with a large terminationGracePeriod is already in progress.

How can we reproduce it (as minimally and precisely as possible)?

  • Create a workload pod that ignores SIGTERM
  • Specify a large terminationGracePeriod for that pod and do not set any memory limits
  • Specify a small evictionMaxPodGracePeriod for the kubelet on the node
  • API-evict the pod (see the sketch after this list)
  • Make sure the kubelet wants to evict the pod (e.g. by allocating memory inside the pod until the node is under memory pressure)
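
For the API eviction step, one option is to POST an Eviction to the pod's eviction subresource through a local API proxy; the sketch below assumes a pod named eviction-pod in the default namespace (kubectl drain on the node would also issue API evictions):

    # Expose the API server locally (assumes kubectl access to the cluster)
    kubectl proxy --port=8001 &

    # POST an Eviction object to the pod's eviction subresource; this starts a
    # graceful termination that honours the pod's long terminationGracePeriodSeconds
    curl -s -X POST -H 'Content-Type: application/json' \
      http://localhost:8001/api/v1/namespaces/default/pods/eviction-pod/eviction \
      -d '{"apiVersion": "policy/v1", "kind": "Eviction", "metadata": {"name": "eviction-pod", "namespace": "default"}}'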

Anything else we need to know?

The bug was not present in k8s 1.25.x

Kubernetes version

1.26.7

Cloud provider

Azure

OS version

GardenLinux 934.10.0

Install tools

Container runtime (CRI) and version (if applicable)

containerd/1.6.20

Related plugins (CNI, CSI, ...) and versions (if applicable)

@FullyScaled added the kind/bug label Dec 13, 2023
@k8s-ci-robot added the needs-sig and needs-triage labels Dec 13, 2023
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@neolit123
Member

/sig node
for triage

@k8s-ci-robot added the sig/node label and removed the needs-sig label Dec 13, 2023
@kannon92
Contributor

Do you have a pod.yaml that can reproduce this?

@FullyScaled
Author

FullyScaled commented Dec 14, 2023

Yes, I was able to reproduce it with this minimal example. I had 16 GB (RAM) nodes in the cluster. If your nodes are smaller or larger, you have to adjust the memory allocation in order to trigger the memory-pressure eviction.

  • Kubelet settings:

    kubelet:
      evictionMaxPodGracePeriod: 20
      evictionSoft:
        memoryAvailable: 25%
      evictionSoftGracePeriod:
        memoryAvailable: 1m0s

  • Pod spec:

    apiVersion: v1
    kind: Pod
    metadata:
      name: eviction-pod
    spec:
      containers:
        - name: worker
          image: eu.gcr.io/gardener-project/gardener/ops-toolbelt:latest
          command:
            - bash
            - -c
            - while true; do echo $(date); sleep 5d; done
          resources:
            requests:
              memory: "2Gi"
              cpu: "0.5"
      terminationGracePeriodSeconds: 3600
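
For reference, a way to create the pod from the spec above (the filename eviction-pod.yaml is an assumption; any filename works):

    # Create the pod and wait until it is Ready before triggering the eviction
    kubectl apply -f eviction-pod.yaml -n default
    kubectl wait --for=condition=Ready pod/eviction-pod -n default --timeout=120s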

Steps to reproduce

  1. Create the pod
  2. API-evict the pod (e.g. via kubectl delete pod eviction-pod -n default)
  3. kubectl exec -it eviction-pod -n default -- bash
  4. cat <(head -c 10000m /dev/zero) <(sleep 10000) | tail (tail buffers the newline-free stream in memory, allocating roughly 10 GB inside the pod)
  5. Wait some time

Result: the pod will not get killed by the kubelet eviction manager. If you do exactly the same without API-evicting the pod in the second step, the kubelet eviction works.
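
To check whether the kubelet actually attempts a second eviction, the following commands may help; this is a rough sketch that assumes the pod and namespace from the example above, a systemd-managed kubelet, and shell access to the node:

    # Does the node report memory pressure?
    kubectl describe node <node-name> | grep -A1 MemoryPressure

    # Any eviction-related events recorded for the pod?
    kubectl get events -n default --field-selector involvedObject.name=eviction-pod

    # Was the pod's grace period ever shortened (e.g. to evictionMaxPodGracePeriod)?
    kubectl get pod eviction-pod -n default -o jsonpath='{.metadata.deletionGracePeriodSeconds}'

    # On the node: look for eviction manager activity in the kubelet logs
    journalctl -u kubelet | grep -iE 'eviction|killing container'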

@kannon92
Contributor

Potential duplicate of #118172.

@FullyScaled
Author

I am not sure this is really a duplicate. I was not able to observe any additional log lines such as "Killing container with a grace period ..." in the kubelet logs after the prior API eviction. So to me it looks like a different issue.

But I guess it would be easy for someone with deeper kubelet knowledge to reproduce this and see if there is really no second eviction attempt or just a second eviction event with the wrong grace period.

@FullyScaled
Author

I was able to reproduce this issue with k8s 1.27.x and 1.28.4.

@SergeyKanzhelev
Member

/assign

This indeed looks like a duplicate of the mentioned issue, and likely of #122222 as well.

I will close it as a dup for now, please comment if you do not agree

/close

@k8s-ci-robot
Contributor

@SergeyKanzhelev: Closing this issue.

In response to this:

/assign

This indeed looks like a duplicate of the mentioned issue, and likely of #122222 as well.

I will close it as a dup for now, please comment if you do not agree

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
