@rphillips
Contributor

@rphillips rphillips commented Feb 11, 2020

Cherry pick of: #1450

The kubelet and CRI-O each use around 250-500 MB of memory on default installs. This PR bumps the system reservation to 1 GB to leave them some headroom and to preserve part of the kernel cache as well. Without the bump, memory pressure on the node can flush some of the kernel cache, forcing the kernel to re-read that data from disk. Cloud providers can throttle the resulting flood of IOPS, which results in a kernel pause.
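As a minimal sketch of what raising the reservation means for scheduling, the Go program below applies the standard Kubernetes node-allocatable formula (allocatable = capacity - kube-reserved - system-reserved - hard eviction threshold). The 16 GiB node, the 100Mi hard-eviction threshold, the zero kube-reserved, and the previous 500Mi system reservation are illustrative assumptions, not values taken from this PR's diff; only the 1 GB target comes from the description (modeled here as 1Gi).

```go
package main

import "fmt"

// Binary byte units, matching Kubernetes quantity suffixes such as "500Mi" and "1Gi".
const (
	Mi int64 = 1 << 20
	Gi int64 = 1 << 30
)

// allocatable applies the node-allocatable formula:
// capacity - kubeReserved - systemReserved - evictionHard.
func allocatable(capacity, kubeReserved, systemReserved, evictionHard int64) int64 {
	return capacity - kubeReserved - systemReserved - evictionHard
}

func main() {
	capacity := 16 * Gi      // hypothetical 16 GiB worker node
	evictionHard := 100 * Mi // assumed memory.available hard-eviction threshold

	// kube-reserved is set to 0 here purely to keep the arithmetic simple.
	before := allocatable(capacity, 0, 500*Mi, evictionHard) // assumed old system-reserved
	after := allocatable(capacity, 0, 1*Gi, evictionHard)    // new system-reserved (1Gi)

	fmt.Printf("allocatable before: %d MiB\n", before/Mi)
	fmt.Printf("allocatable after:  %d MiB\n", after/Mi)
	fmt.Printf("withheld from pods: %d MiB\n", (before-after)/Mi)
}
```

Under these assumptions, the node's allocatable memory drops from 15784 MiB to 15260 MiB, so pods lose 524 MiB that is instead left for the kubelet, CRI-O, and the kernel cache.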

- What I did

- How to verify it

- Description for the changelog

@openshift-ci-robot openshift-ci-robot added the bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. label Feb 11, 2020
@openshift-ci-robot
Contributor

@rphillips: This pull request references Bugzilla bug 1801824, which is invalid:

  • expected dependent Bugzilla bug 1800319 to be in one of the following states: VERIFIED, RELEASE_PENDING, CLOSED (ERRATA), but it is POST instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

[release-4.3] Bug 1801824: kubelet: add more kube reservation to protect node

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Feb 11, 2020
@rphillips rphillips requested a review from sjenning February 11, 2020 17:07
@kikisdeliveryservice
Contributor

/assign @sjenning @mrunalp

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 11, 2020
@rphillips
Contributor Author

/lgtm

@openshift-ci-robot
Contributor

@rphillips: you cannot LGTM your own PR.

In response to this:

/lgtm

@rphillips
Contributor Author

/hold

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 14, 2020
@rphillips rphillips closed this Feb 25, 2020
@rphillips rphillips reopened this Mar 6, 2020
@openshift-ci-robot
Contributor

@rphillips: This pull request references Bugzilla bug 1801824, which is invalid:

  • expected dependent Bugzilla bug 1800319 to be in one of the following states: VERIFIED, RELEASE_PENDING, CLOSED (ERRATA), but it is ON_QA instead
  • expected dependent Bugzilla bug 1802687 to be in one of the following states: VERIFIED, RELEASE_PENDING, CLOSED (ERRATA), but it is ON_QA instead
  • expected dependent Bugzilla bug 1806786 to be in one of the following states: VERIFIED, RELEASE_PENDING, CLOSED (ERRATA), but it is ON_QA instead
  • expected dependent Bugzilla bug 1808429 to be in one of the following states: VERIFIED, RELEASE_PENDING, CLOSED (ERRATA), but it is ON_QA instead
  • expected dependent Bugzilla bug 1810136 to be in one of the following states: VERIFIED, RELEASE_PENDING, CLOSED (ERRATA), but it is NEW instead
  • expected dependent Bugzilla bug 1808444 to be in one of the following states: VERIFIED, RELEASE_PENDING, CLOSED (ERRATA), but it is CLOSED (DUPLICATE) instead
  • expected dependent Bugzilla bug 1800319 to target the "4.4.0" release, but it targets "4.5.0" instead
  • expected dependent Bugzilla bug 1802687 to target the "4.4.0" release, but it targets "4.5.0" instead
  • expected dependent Bugzilla bug 1808429 to target the "4.4.0" release, but it targets "4.3.z" instead
  • expected dependent Bugzilla bug 1810136 to target the "4.4.0" release, but it targets "4.2.z" instead
  • expected dependent Bugzilla bug 1808444 to target the "4.4.0" release, but it targets "---" instead

In response to this:

[release-4.3] Bug 1801824: kubelet: add more kube reservation to protect node

@rphillips rphillips force-pushed the fixes/add_kube_reservation_4.3 branch from 1884020 to 23663a3 Compare March 6, 2020 18:12
@openshift-ci-robot openshift-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Mar 6, 2020
@rphillips
Contributor Author

/hold cancel
/bugzilla refresh

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 6, 2020
@openshift-ci-robot
Contributor

@rphillips: This pull request references Bugzilla bug 1801824, which is invalid:

  • expected dependent Bugzilla bug 1800319 to be in one of the following states: VERIFIED, RELEASE_PENDING, CLOSED (ERRATA), but it is ON_QA instead
  • expected dependent Bugzilla bug 1802687 to be in one of the following states: VERIFIED, RELEASE_PENDING, CLOSED (ERRATA), but it is ON_QA instead
  • expected dependent Bugzilla bug 1806786 to be in one of the following states: VERIFIED, RELEASE_PENDING, CLOSED (ERRATA), but it is ON_QA instead
  • expected dependent Bugzilla bug 1808429 to be in one of the following states: VERIFIED, RELEASE_PENDING, CLOSED (ERRATA), but it is ON_QA instead
  • expected dependent Bugzilla bug 1810136 to be in one of the following states: VERIFIED, RELEASE_PENDING, CLOSED (ERRATA), but it is NEW instead
  • expected dependent Bugzilla bug 1808444 to be in one of the following states: VERIFIED, RELEASE_PENDING, CLOSED (ERRATA), but it is CLOSED (DUPLICATE) instead
  • expected dependent Bugzilla bug 1800319 to target the "4.4.0" release, but it targets "4.5.0" instead
  • expected dependent Bugzilla bug 1802687 to target the "4.4.0" release, but it targets "4.5.0" instead
  • expected dependent Bugzilla bug 1808429 to target the "4.4.0" release, but it targets "4.3.z" instead
  • expected dependent Bugzilla bug 1810136 to target the "4.4.0" release, but it targets "4.2.z" instead
  • expected dependent Bugzilla bug 1808444 to target the "4.4.0" release, but it targets "---" instead

In response to this:

/hold cancel
/bugzilla refresh

@rphillips
Contributor Author

/bugzilla refresh

@openshift-ci-robot
Contributor

@rphillips: This pull request references Bugzilla bug 1801824, which is invalid:

  • expected dependent Bugzilla bug 1806786 to be in one of the following states: VERIFIED, RELEASE_PENDING, CLOSED (ERRATA), but it is ON_QA instead

In response to this:

/bugzilla refresh

@mrunalp
Member

mrunalp commented Mar 6, 2020

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Mar 6, 2020
@runcom
Member

runcom commented Mar 6, 2020

/retest

@rphillips
Contributor Author

/retest

@kikisdeliveryservice
Contributor

/skip

@eparis
Member

eparis commented Mar 9, 2020

/hold
I do not yet understand how this correctly fixes a problem. Nor do I think stealing 500M of schedulable RAM from every server is a nice thing to do in a z-stream.
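To put the per-server cost raised here in perspective, a minimal Go sketch can compute what share of a node's memory the extra reservation removes, and how that adds up across a fleet. It assumes the reservation grows from 500Mi to 1Gi (the "500M" discussed in this thread; the exact old value is an assumption), and the node sizes and the 100-node fleet are hypothetical.

```go
package main

import "fmt"

// Binary byte units, matching Kubernetes quantity suffixes.
const (
	Mi int64 = 1 << 20
	Gi int64 = 1 << 30
)

// extraReserved is the additional per-node reservation, assuming the old
// system-reserved memory was 500Mi and the new one is 1Gi (524Mi more).
const extraReserved = 1*Gi - 500*Mi

// shareOfNode returns the fraction of a node's memory capacity that the
// extra reservation takes away from schedulable pod memory.
func shareOfNode(capacity, extra int64) float64 {
	return float64(extra) / float64(capacity)
}

// fleetCost returns the total memory withheld from pods across a fleet of
// nodes that all receive the same extra reservation.
func fleetCost(nodes, extra int64) int64 {
	return nodes * extra
}

func main() {
	// Hypothetical node sizes: the relative cost is largest on small nodes.
	for _, gib := range []int64{8, 16, 32, 64} {
		fmt.Printf("%2d GiB node: %.1f%% of memory no longer schedulable\n",
			gib, 100*shareOfNode(gib*Gi, extraReserved))
	}
	// Hypothetical fleet size, only to show how the cost scales with node count.
	fmt.Printf("100 nodes: %d MiB withheld in total\n", fleetCost(100, extraReserved)/Mi)
}
```

The relative cost halves each time node size doubles (about 6.4% on an 8 GiB node versus 0.8% on a 64 GiB node), which is why a fixed reservation weighs most heavily on small clusters.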

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 9, 2020
@eparis
Member

eparis commented Mar 9, 2020

The first comment needs to explain why we are doing this and how it fixes what it fixes. If I look in a git log, I need to be convinced this was the right thing to do.

@openshift-ci-robot
Contributor

@rphillips: This pull request references Bugzilla bug 1801824, which is invalid:

  • expected dependent Bugzilla bug 1806786 to be in one of the following states: VERIFIED, RELEASE_PENDING, CLOSED (ERRATA), but it is ON_QA instead

In response to this:

[release-4.3] Bug 1801824: kubelet: add more kube reservation to protect node

@rphillips
Contributor Author

@eparis updated

@rphillips
Contributor Author

/bugzilla refresh

@openshift-ci-robot openshift-ci-robot added bugzilla/severity-high Referenced Bugzilla bug's severity is high for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. labels Apr 28, 2020
@openshift-ci-robot
Contributor

@rphillips: This pull request references Bugzilla bug 1801824, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

6 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.3.z) matches configured target release for branch (4.3.z)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)
  • dependent bug Bugzilla bug 1806786 is in the state VERIFIED, which is one of the valid states (VERIFIED, RELEASE_PENDING, CLOSED (ERRATA))
  • dependent Bugzilla bug 1806786 targets the "4.4.0" release, which is one of the valid target releases: 4.4.0, 4.4.z
  • bug has dependents

In response to this:

/bugzilla refresh

@openshift-ci-robot openshift-ci-robot removed the bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. label Apr 28, 2020
@rphillips
Contributor Author

/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 28, 2020
This reserves more system-reserved headroom for the kubelet, CRI-O, and the kernel to operate in safely.
@rphillips rphillips force-pushed the fixes/add_kube_reservation_4.3 branch from 23663a3 to ab2ee0c Compare April 29, 2020 14:15
@openshift-ci-robot openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label Apr 29, 2020
@haircommander
Member

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Apr 29, 2020
@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: haircommander, kikisdeliveryservice, mrunalp, rphillips

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [kikisdeliveryservice]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@rphillips rphillips changed the title [release-4.3] Bug 1801824: kubelet: add more kube reservation to protect node [release-4.3] Bug 1801824: kubelet: add more system reservation to protect node Apr 30, 2020
@sdodson sdodson added the cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. label May 1, 2020
@sdodson
Member

sdodson commented May 1, 2020

After a lengthy Slack conversation, @smarterclayton, @jim-minter, and @rphillips have made a strong case for this change, even if the full details of the telemetry examination didn't make it into the commit message. In the future, we should make sure that the kind of analysis showing that clawing back 500MiB of memory is safe makes it into the git commit.

cherry-pick-approved

@kikisdeliveryservice
Contributor

level=error msg="Error: Error applying IAM policy to project \"openshift-gce-devel-ci\": Too many conflicts.  Latest error: Error setting IAM policy for project \"openshift-gce-devel-ci\": googleapi: Error 409: There were concurrent policy changes. Please retry the whole read-modify-write with exponential backoff., aborted"
level=error
level=error msg="  on ../tmp/openshift-install-981833611/iam/main.tf line 6, in resource \"google_project_iam_member\" \"worker-compute-viewer\":"
level=error msg="   6: resource \"google_project_iam_member\" \"worker-compute-viewer\" {" 

/test e2e-gcp-upgrade

@openshift-bot
Copy link
Contributor

/retest

@openshift-merge-robot openshift-merge-robot merged commit 9ddb819 into openshift:release-4.3 May 1, 2020
@openshift-ci-robot
Contributor

@rphillips: All pull requests linked via external trackers have merged: openshift/machine-config-operator#1458. Bugzilla bug 1801824 has been moved to the MODIFIED state.

In response to this:

[release-4.3] Bug 1801824: kubelet: add more system reservation to protect node

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/severity-high Referenced Bugzilla bug's severity is high for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. lgtm Indicates that a PR is ready to be merged. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.
