Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support addon Deployments, make heapster a deployment with a nanny. #22893

Merged
merged 4 commits into from
Mar 22, 2016

Conversation

Q-Lee
Copy link
Contributor

@Q-Lee Q-Lee commented Mar 12, 2016

  1. Heapster uses a deployment here.
  2. Changed kube-addon-update.sh to include deployments (this seems like a transitional change while we're moving all addons to deployments).
  3. Adding a nanny to resize heapster with the number of nodes.

@k8s-github-robot
Copy link

Labelling this PR as size/L

@k8s-github-robot k8s-github-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Mar 12, 2016
@k8s-bot
Copy link

k8s-bot commented Mar 12, 2016

GCE e2e build/test failed for commit 5a3c51892cf210760ba4cbbc40d821e5e7c97b82.

Please reference the list of currently known flakes when examining this failure. If you request a re-test, you must reference the issue describing the flake.

@davidopp
Copy link
Member

Is this for 1.2?

cc/ @bgrant0607 @mikedanese

@bgrant0607
Copy link
Member

No, this can't go into 1.2. That ship has sailed.

@bgrant0607
Copy link
Member

Ref #7459, #9467

@mikedanese mikedanese assigned mikedanese and unassigned davidopp Mar 12, 2016
@Q-Lee
Copy link
Contributor Author

Q-Lee commented Mar 14, 2016

We're aiming for 1.2.1. Adding zml@ for context on kube-addon-update.sh.
cc @zmerlynn

@@ -475,6 +475,7 @@ function update-addons() {
local -r addon_path=$1
# be careful, reconcile-objects uses global variables
reconcile-objects ${addon_path} ReplicationController "-" &
reconcile-objects ${addon_path} Deployment "-" &
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment bellow this:

We don't expect names to be versioned for the following kinds

I would expect this to be true for deployments as well.

@Q-Lee
Copy link
Contributor Author

Q-Lee commented Mar 16, 2016

@bgrant0607 Is there anything in particular I need to consider for cluster upgrades if heapster is becoming a deployment?

@k8s-bot
Copy link

k8s-bot commented Mar 17, 2016

GCE e2e build/test failed for commit 82682f5d993c99f33e4d104675fbbb9cb864e1a2.

Please reference the list of currently known flakes when examining this failure. If you request a re-test, you must reference the issue describing the flake.

@bgrant0607
Copy link
Member

What's going to kill off the old RCs?

cc @fgrzadkowski @vishh

- image: gcr.io/google_containers/addon-resizer:1.0
name: heapster-nanny
resources:
limits:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you just specify limits, request should be set from limits.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#18216 added requests to all addons.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the config best practices is:

Don’t specify default values unnecessarily, in order to simplify and minimize configs, and to reduce error.

I don't really understand the argument behind #18216 but it seems to violate that best practice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then why don't we do this for other singletons (e.g., heapster itself)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because of #18216. Can you explain why this is important @gmarek? Here's the defaulting code https://github.com/kubernetes/kubernetes/blob/master/pkg/api/v1/defaults.go#L99-L116

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we were designing QoS we decided to have this defaulting as a backward compatibility feature. I didn't had impression that it's there to stay forever, so I think we should have requests explicit. Generally all those QoS stuff is really confusing as it is today. @piosz

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting: I didn't know about that default behavior. It actually conflicts with the addon-resizer, which explicitly sets requests=limits to keep its dependents in the guaranteed class. The addon-resizer also performs an update on any qualitative difference (e.g., requests being expected but not found). This would manifest as a single deployment update at startup.

It feels weird to add a flag to disregard requests, but we'll need to if we want to utilize this default behavior. But since literally all of our addons specify both requests and limits, why don't I cut an issue for 1.3 to rectify both the pod_nanny and all of our addon yamls with best practices?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is another proof that the current state is confusing. We should work on that at some point.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened #23229

@bgrant0607
Copy link
Member

cc @janetkuo @pwittrock

@@ -17,7 +17,8 @@ metadata:
spec:
replicas: 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this only has 1 replica, you might consider the Recreate update policy. The default is RollingUpdate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting idea. The code for the container in question lives in contrib/addon-resizer, where that would be an easy (1-line) change. I'll definitely consider it.

@fgrzadkowski
Copy link
Contributor

@Q-Lee Please cc people who owns a particular feature/component if you change its configuration etc. @mwielgus @piosz

fieldPath: metadata.namespace
command:
- /pod_nanny
- --cpu=100m
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mwielgus Please verify these values based on your perf tests. AFAIR we need more than 100m cpu in large clusters and 4MB of memory for each node.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 1.2 cut gives 4MB per node, but leaves CPU at 100m. I've changed this to use the same per-node value as static sizing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On 1000 node cluster we are using 50 millis on average with occasional spikes to 500 (every minute for a couple seconds).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is this define?

@Q-Lee
Copy link
Contributor Author

Q-Lee commented Mar 17, 2016

@fgrzadkowski how do I know which person/people own a particular configuration file?

@Q-Lee
Copy link
Contributor Author

Q-Lee commented Mar 17, 2016

"What's going to kill off the old RCs?"

Ideally I would think the addon-update system should delete the old replication controller when its configuration is no longer found. Thoughts?

@bgrant0607 @fgrzadkowski @vishh

@k8s-bot
Copy link

k8s-bot commented Mar 17, 2016

GCE e2e build/test failed for commit 1c794555072a5ddd5ef3a26fc5970e6f04ec1c26.

Please reference the list of currently known flakes when examining this failure. If you request a re-test, you must reference the issue describing the flake.

@vishh
Copy link
Contributor

vishh commented Mar 17, 2016

@Q-Lee:
+1 for #22893 (comment)

@k8s-bot
Copy link

k8s-bot commented Mar 22, 2016

GCE e2e build/test passed for commit 726bbc4bfce110601fc17bf568386e413b4bd3de.

@k8s-github-robot
Copy link

PR changed after LGTM, removing LGTM.

@k8s-github-robot k8s-github-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 22, 2016
@k8s-bot
Copy link

k8s-bot commented Mar 22, 2016

GCE e2e build/test passed for commit 81ba98a.

Q-Lee added a commit that referenced this pull request Mar 22, 2016
Support addon Deployments, make heapster a deployment with a nanny.
@Q-Lee Q-Lee merged commit 1f8773e into kubernetes:master Mar 22, 2016
@@ -481,6 +481,7 @@ function update-addons() {
# That's why we pass an empty string as the version separator.
# If the description differs on disk, the object should be recreated.
# This is not implemented in this version.
reconcile-objects ${addon_path} Deployment "" &
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this going to update Deployments? AFAIU name for an object will not change, so how addon-updater will realise it has to do an update? Has this been tested?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ping @Q-Lee
Are you sure this won't broke addons update procedure?

@k8s-cherrypick-bot
Copy link

Removing label cherrypick-candidate because no release milestone was set. This is an invalid state and thus this PR is not being considered for cherry-pick to any release branch. Please add an appropriate release milestone and then re-add the label.

@zmerlynn zmerlynn added this to the v1.2 milestone Apr 6, 2016
@zmerlynn zmerlynn added cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. release-note Denotes a PR that will be considered when it comes time to generate release notes. labels Apr 6, 2016
zmerlynn added a commit that referenced this pull request Apr 6, 2016
@k8s-cherrypick-bot
Copy link

Commit found in the "release-1.2" branch appears to be this PR. Removing the "cherrypick-candidate" label. If this is an error find help to get your PR picked.

@andyzheng0831
Copy link

@Q-Lee as I said in #23434 (comment), "Please notify me when you decide to cherry pick." Your change broke us in master branch and I fixed that. Now it was cherry picked in release-1.2 without telling us, so it breaks us again. Thanks to our Jenkins for catching the breakage. Hopefully I will still have time to cherry pick my fix into release-1.2 before 1.2.2 is cut.

shyamjvs pushed a commit to shyamjvs/kubernetes that referenced this pull request Dec 1, 2016
shouhong pushed a commit to shouhong/kubernetes that referenced this pull request Feb 14, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.