
kubelet parameter (eviction-max-pod-grace-period) does not work as expected per the official description #118172

Open
rainbowBPF2 opened this issue May 22, 2023 · 14 comments
Assignees
Labels
  • good first issue - Denotes an issue ready for a new contributor, according to the "help wanted" guidelines.
  • help wanted - Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines.
  • kind/bug - Categorizes issue or PR as related to a bug.
  • kind/documentation - Categorizes issue or PR as related to documentation.
  • priority/backlog - Higher priority than priority/awaiting-more-evidence.
  • priority/important-longterm - Important over the long term, but may not be staffed and/or may need multiple releases to complete.
  • sig/node - Categorizes an issue or PR as relevant to SIG Node.
  • triage/accepted - Indicates an issue or PR is ready to be actively worked on.

Comments

@rainbowBPF2

rainbowBPF2 commented May 22, 2023

What happened?

Hi folks, I'd like to report a confusing point about kubelet soft eviction:

As kubelet --help shows:
--eviction-max-pod-grace-period int32 Maximum allowed grace period (in seconds) to use when terminating pods in response to a soft eviction threshold being met. If negative, defer to pod specified value.

I assumed that if this parameter is set to a negative value, such as -1, soft eviction would use the pod-specified value (terminationGracePeriodSeconds).

But when I tried to evict a pod by creating node pressure (such as memory pressure), I found that -1 is always sent to the CRI runtime.
The pod container is then stopped immediately with SIGKILL, exit code 137.

In short, this parameter:
(1) set to a positive number: works as expected;
(2) not set, so the kubelet uses its default of 0: works as expected;
(3) set to a negative number: does not work as the kubelet help text describes.

Thanks for your response. Could this be a bug?
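
To make the observed behavior concrete, here is a minimal, self-contained Go sketch of the grace-period selection as I understand it from this report. It is an illustration only, not the actual kubelet source; names such as evictionGracePeriod are mine, and maxPodGracePeriodSeconds simply mirrors the configured flag value.

    package main

    import "fmt"

    // evictionGracePeriod illustrates the behavior described above (not kubelet
    // source): hard evictions terminate immediately, while soft evictions use the
    // configured eviction-max-pod-grace-period verbatim, including negative
    // values, which then reach the CRI StopContainer timeout unchanged.
    func evictionGracePeriod(isHardEviction bool, maxPodGracePeriodSeconds int64) int64 {
        if isHardEviction {
            return 0 // hard eviction: no grace period
        }
        return maxPodGracePeriodSeconds // soft eviction: no special handling of negative values
    }

    func main() {
        fmt.Println(evictionGracePeriod(false, 60)) // 60: positive value works as documented
        fmt.Println(evictionGracePeriod(false, 0))  // 0: immediate termination, as documented
        fmt.Println(evictionGracePeriod(false, -1)) // -1: forwarded to the runtime; observed as exit 137
    }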

What did you expect to happen?

If --eviction-max-pod-grace-period is set to a negative value, I expect the kubelet syncPod logic to send the pod's terminationGracePeriodSeconds as the grace period to the CRI runtime when stopping the container.
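
For comparison, a sketch of what the documented behavior would look like, again purely illustrative and not a proposed patch; podTerminationGracePeriodSeconds stands in for pod.Spec.TerminationGracePeriodSeconds.

    package main

    import "fmt"

    // documentedGracePeriod shows the behavior the flag description promises:
    // a negative eviction-max-pod-grace-period defers to the pod's own
    // terminationGracePeriodSeconds (illustration only).
    func documentedGracePeriod(maxPodGracePeriodSeconds, podTerminationGracePeriodSeconds int64) int64 {
        if maxPodGracePeriodSeconds < 0 {
            return podTerminationGracePeriodSeconds // defer to the pod-specified value
        }
        return maxPodGracePeriodSeconds
    }

    func main() {
        // A pod with terminationGracePeriodSeconds: 30 and the flag set to -1
        // would then be stopped with a 30-second grace period instead of SIGKILL.
        fmt.Println(documentedGracePeriod(-1, 30)) // prints 30
    }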

How can we reproduce it (as minimally and precisely as possible)?

(1) set this parameter to a negative number, such as -1;
(2) configure kubelet soft eviction (a configuration sketch follows this list);
(3) run a pod that requests a lot of memory, to trigger node memory pressure;
(4) check the CRI runtime log (for example, containerd);
(5) you will see the timeout parameter -1 sent for the pod's container, and the container exits with code 137 (SIGKILL).
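
For anyone reproducing this, a setup along the following lines should work. All concrete values here (the 1Gi threshold, the 30s soft grace period, the pod name, and the stress image/arguments) are placeholders I picked for illustration; adjust them to the node under test.

    # KubeletConfiguration fragment (goes into the kubelet's --config file)
    apiVersion: kubelet.config.k8s.io/v1beta1
    kind: KubeletConfiguration
    evictionSoft:
      memory.available: "1Gi"        # soft threshold the node should cross
    evictionSoftGracePeriod:
      memory.available: "30s"        # how long the signal must hold before evicting
    evictionMaxPodGracePeriod: -1    # the setting under test

    # Pod applied with kubectl to push the node over the soft threshold
    apiVersion: v1
    kind: Pod
    metadata:
      name: memory-hog
    spec:
      terminationGracePeriodSeconds: 30   # the value a negative flag is documented to defer to
      containers:
      - name: stress
        image: polinux/stress             # placeholder image; any memory stressor works
        command: ["stress"]
        args: ["--vm", "1", "--vm-bytes", "2G", "--vm-hang", "0"]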

Anything else we need to know?

I ran these tests on Tencent Cloud TKE.

I believe this is not related to the cloud service provider.

Kubernetes version

Client Version: version.Info{Major:"1", Minor:"16+", GitVersion:"v1.16.3-tke.34", GitCommit:"61cc96d2f7e9277e89e29b4b04d045f27d6e75df", GitTreeState:"clean", BuildDate:"2023-03-08T07:40:04Z", GoVersion:"go1.12.17", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"16+", GitVersion:"v1.16.3-tke.32", GitCommit:"8ca5807ae095365d68754a2ba20b10bbe5a8998c", GitTreeState:"clean", BuildDate:"2022-10-08T04:00:33Z", GoVersion:"go1.12.14", Compiler:"gc", Platform:"linux/amd64"}

Cloud provider

Tencent Cloud TKE

OS version

AME="Ubuntu" VERSION="18.04.4 LTS (Bionic Beaver)" ID=ubuntu ID_LIKE=debian PRETTY_NAME="Ubuntu 18.04.4 LTS" VERSION_ID="18.04" HOME_URL="https://www.ubuntu.com/" SUPPORT_URL="https://help.ubuntu.com/" BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/" PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy" VERSION_CODENAME=bionic UBUNTU_CODENAME=bionic

Install tools

Container runtime (CRI) and version (if applicable)

containerd --version
containerd github.com/containerd/containerd v1.4.3-tke.2 a11500cbfa0b4d2fc9b905e03c35f349ef5b1a9f

Related plugins (CNI, CSI, ...) and versions (if applicable)

@rainbowBPF2 rainbowBPF2 added the kind/bug Categorizes issue or PR as related to a bug. label May 22, 2023
@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 22, 2023
@tzneal
Contributor

tzneal commented May 22, 2023

/sig node

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels May 22, 2023
@SergeyKanzhelev
Member

/triage accepted

The question is whether we fix the code and make a backward-incompatible change, or fix the documentation. I'd vote for a doc change, which is less work, but it blocks an interesting scenario.

@rainbowBPF2 were you looking for the described behavior? Does your scenario require this?

/priority important-longterm

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 24, 2023
@rainbowBPF2
Author

rainbowBPF2 commented May 25, 2023

@SergeyKanzhelev

Thanks for your attention on this point.

  1. For our day-to-day workloads there are currently no bad effects, and the behavior does not seem strictly required, so a doc change is also an option.
  2. In the long run, though, it would be a great feature, because it would give us a more reasonable way to terminate pods under soft eviction and achieve graceful pod shutdown.

In short: not required, but very much hoped for. 😄

@fgksgf
Copy link

fgksgf commented May 25, 2023

/assign

@k8s-triage-robot

This issue has not been updated in over 1 year, and should be re-triaged.

You can:

  • Confirm that this issue is still relevant with /triage accepted (org members only)
  • Close this issue with /close

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted

@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. and removed triage/accepted Indicates an issue or PR is ready to be actively worked on. labels May 24, 2024
@kannon92 kannon92 moved this to Triaged in SIG Node Bugs Jul 22, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 22, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Sep 21, 2024
@haircommander haircommander moved this from Triaged to Triage in SIG Node Bugs Oct 23, 2024
@haircommander
Contributor

/triage accepted
/kind documentation
/priority backlog

@k8s-ci-robot k8s-ci-robot added the triage/accepted Indicates an issue or PR is ready to be actively worked on. label Oct 23, 2024
@k8s-ci-robot k8s-ci-robot added kind/documentation Categorizes issue or PR as related to documentation. priority/backlog Higher priority than priority/awaiting-more-evidence. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Oct 23, 2024
@haircommander
Contributor

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Oct 23, 2024
@SergeyKanzhelev
Member

/help
/good-first-issue

At this stage I doubt we will fix the inconsistency by changing the code; we will need to update the field documentation instead.

If you want to take this issue:

  1. List the steps you performed, with all command output, to reproduce this inconsistency.
  2. Send a PR to update the description of this field (a wording sketch follows below).
  3. If the description is defined in both places (the command-line flag and the kubelet config field), make sure both are checked and updated.
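
As a concrete starting point, the clarified field comment might look something like the sketch below. This is purely illustrative and not kubelet source: the real definitions live in the kubelet flag registration and in the KubeletConfiguration v1beta1 types, and the exact files, struct tags, and current wording should be verified against master before sending a PR.

    // Package sketch is an illustrative excerpt only; it is not kubelet source.
    package sketch

    // KubeletConfigurationExcerpt shows only the field whose description this
    // issue is about, with a possible clarified comment.
    type KubeletConfigurationExcerpt struct {
        // evictionMaxPodGracePeriod is the maximum allowed grace period (in
        // seconds) to use when terminating pods in response to a soft eviction
        // threshold being met. Note: as reported in this issue, a negative value
        // is currently passed to the container runtime unchanged rather than
        // deferring to the pod's terminationGracePeriodSeconds, so either this
        // wording or the behavior needs to change.
        EvictionMaxPodGracePeriod int32 `json:"evictionMaxPodGracePeriod,omitempty"`
    }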

@k8s-ci-robot
Contributor

@SergeyKanzhelev:
This request has been marked as suitable for new contributors.

Guidelines

Please ensure that the issue body includes answers to the following questions:

  • Why are we solving this issue?
  • To address this issue, are there any code changes? If there are code changes, what needs to be done in the code and what places can the assignee treat as reference points?
  • Does this issue have zero to low barrier of entry?
  • How can the assignee reach out to you for help?

For more details on the requirements of such an issue, please see here and ensure that they are met.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-good-first-issue command.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added good first issue Denotes an issue ready for a new contributor, according to the "help wanted" guidelines. help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. labels Oct 23, 2024
@ArpanSolanki29

/Assign

@anneheartrecord

/assign

@potato-boii

/assign
