
Already API-evicted Pods do not get evicted by the kubelet eviction manager (memory pressure, ephemeral storage pressure) #122297

Closed
FullyScaled opened this issue Dec 13, 2023 · 9 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. sig/node Categorizes an issue or PR as relevant to SIG Node.

Comments

@FullyScaled

FullyScaled commented Dec 13, 2023

What happened?

When a pod has a long terminationGracePeriod and is evicted via the API (e.g. during downscaling), the eviction is initiated and respects that long terminationGracePeriod. This works fine.

If that pod then needs to be evicted due to memory pressure on the node after the API eviction has already been initiated, the kubelet no longer performs an eviction with its configured evictionMaxPodGracePeriod. This means the second eviction (e.g. due to memory pressure) cannot succeed at all when the pod has a large terminationGracePeriod and a prior, unrelated API eviction is still in progress.


What did you expect to happen?

An eviction triggered by the kubelet eviction manager due to memory pressure should still be issued with the configured evictionMaxPodGracePeriod, even if a prior API-based eviction with a large terminationGracePeriod is already in progress.

How can we reproduce it (as minimally and precisely as possible)?

  • Create a workload pod that ignores SIGTERM
  • Specify a large terminationGracePeriod for that pod and do not set any memory limits
  • Specify a small evictionMaxPodGracePeriod for the kubelet on the node
  • API-evict the pod (see the sketch after this list)
  • Make sure the kubelet wants to evict the pod (e.g. by allocating memory inside the pod until the node is under memory pressure)
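
For the API eviction step, one option is to POST an Eviction to the pod's eviction subresource through a local API proxy; the sketch below assumes a pod named eviction-pod in the default namespace (kubectl drain on the node would also issue API evictions):

    # Expose the API server locally (assumes kubectl access to the cluster)
    kubectl proxy --port=8001 &

    # POST an Eviction object to the pod's eviction subresource; this starts a
    # graceful termination that honours the pod's long terminationGracePeriodSeconds
    curl -s -X POST -H 'Content-Type: application/json' \
      http://localhost:8001/api/v1/namespaces/default/pods/eviction-pod/eviction \
      -d '{"apiVersion": "policy/v1", "kind": "Eviction", "metadata": {"name": "eviction-pod", "namespace": "default"}}'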

Anything else we need to know?

The bug was not present in k8s 1.25.x

Kubernetes version

1.26.7

Cloud provider

Azure

OS version

GardenLinux 934.10.0

Install tools

Container runtime (CRI) and version (if applicable)

containerd/1.6.20

Related plugins (CNI, CSI, ...) and versions (if applicable)

@FullyScaled added the kind/bug label Dec 13, 2023
@k8s-ci-robot added the needs-sig and needs-triage labels Dec 13, 2023
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@neolit123
Member

/sig node
for triage

@k8s-ci-robot added the sig/node label and removed the needs-sig label Dec 13, 2023
@kannon92
Contributor

Do you have a pod.yaml that can reproduce this?

@FullyScaled
Author

FullyScaled commented Dec 14, 2023

Yes, I was able to reproduce it with this minimal example. I had 16 GB (RAM) nodes in the cluster. If your nodes are smaller or larger, you have to adjust the memory allocation in order to trigger the memory-pressure eviction.

  • Kubelet settings:

    kubelet:
      evictionMaxPodGracePeriod: 20
      evictionSoft:
        memoryAvailable: 25%
      evictionSoftGracePeriod:
        memoryAvailable: 1m0s

  • Pod spec:

    apiVersion: v1
    kind: Pod
    metadata:
      name: eviction-pod
    spec:
      containers:
        - name: worker
          image: eu.gcr.io/gardener-project/gardener/ops-toolbelt:latest
          command:
            - bash
            - -c
            - while true; do echo $(date); sleep 5d; done
          resources:
            requests:
              memory: "2Gi"
              cpu: "0.5"
      terminationGracePeriodSeconds: 3600
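
For reference, a way to create the pod from the spec above (the filename eviction-pod.yaml is an assumption; any filename works):

    # Create the pod and wait until it is Ready before triggering the eviction
    kubectl apply -f eviction-pod.yaml -n default
    kubectl wait --for=condition=Ready pod/eviction-pod -n default --timeout=120s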

Steps to reproduce

  1. Create the pod
  2. API-evict the pod (e.g. via kubectl delete pod eviction-pod -n default)
  3. kubectl exec -it eviction-pod -n default -- bash
  4. cat <(head -c 10000m /dev/zero) <(sleep 10000) | tail (tail buffers the newline-free stream in memory, allocating roughly 10 GB inside the pod)
  5. Wait some time

Result: the pod will not get killed by the kubelet eviction manager. If you do exactly the same without API-evicting the pod in the second step, the kubelet eviction works.
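
To check whether the kubelet actually attempts a second eviction, the following commands may help; this is a rough sketch that assumes the pod and namespace from the example above, a systemd-managed kubelet, and shell access to the node:

    # Does the node report memory pressure?
    kubectl describe node <node-name> | grep -A1 MemoryPressure

    # Any eviction-related events recorded for the pod?
    kubectl get events -n default --field-selector involvedObject.name=eviction-pod

    # Was the pod's grace period ever shortened (e.g. to evictionMaxPodGracePeriod)?
    kubectl get pod eviction-pod -n default -o jsonpath='{.metadata.deletionGracePeriodSeconds}'

    # On the node: look for eviction manager activity in the kubelet logs
    journalctl -u kubelet | grep -iE 'eviction|killing container'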

@kannon92
Contributor

Potential duplicate of #118172.

@FullyScaled
Author

I am not sure this is really a duplicate. I was not able to observe any additional log lines such as "Killing container with a grace period ..." in the kubelet logs after the prior API eviction. So to me it looks like a different issue.

But I guess it would be easy for someone with deeper kubelet knowledge to reproduce this and see if there is really no second eviction attempt or just a second eviction event with the wrong grace period.

@FullyScaled
Author

I was able to reproduce this issue with k8s 1.27.x and 1.28.4.

@SergeyKanzhelev
Member

/assign

This indeed looks like a duplicate of the mentioned issue, and likely of #122222 as well.

I will close it as a dup for now, please comment if you do not agree

/close

@k8s-ci-robot
Contributor

@SergeyKanzhelev: Closing this issue.

In response to this:

/assign

This indeed looks like a duplicate of the mentioned issue, and likely of #122222 as well.

I will close it as a dup for now, please comment if you do not agree

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
