
Don't block scale down on pods owned by custom controllers #5387

Closed
x13n opened this issue Dec 27, 2022 · 5 comments · Fixed by #5507
Labels
area/cluster-autoscaler kind/feature Categorizes issue or PR as related to a new feature.

Comments

x13n (Member) commented Dec 27, 2022

Which component are you using?:

/label cluster-autoscaler

Is your feature request designed to solve a problem? If so describe the problem this feature should solve.:

As the k8s ecosystem grows, a hardcoded list currently used by CA is not going to work with custom controllers. In `GetPodsForDeletionOnNodeDrain`, only a few controllers are handled. We could extend CA to treat all ownerReferences with `controller: true` as replicated. In case of controllers that do not recreate evicted pods, existing annotations could be used (`cluster-autoscaler.kubernetes.io/safe-to-evict: false` or `cluster-autoscaler.kubernetes.io/daemonset-pod: true`) to prevent CA from evicting such pods.

Describe the solution you'd like.:

Just look at the `controller` field in owner references and treat all controller-owned pods as replicated by default.
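
A minimal sketch, in Go, of what such a default could look like; the function name and its placement are illustrative, not the actual drain code in `GetPodsForDeletionOnNodeDrain`:

```go
// Illustrative only: not the actual cluster-autoscaler drain logic.
package drain

import (
	apiv1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// safeToEvictKey is the existing annotation pods can use to opt out of eviction.
const safeToEvictKey = "cluster-autoscaler.kubernetes.io/safe-to-evict"

// isReplicated treats any pod with a controller ownerReference as replicated,
// unless the pod explicitly opted out via the safe-to-evict annotation.
func isReplicated(pod *apiv1.Pod) bool {
	if pod.Annotations[safeToEvictKey] == "false" {
		return false
	}
	// GetControllerOf returns the ownerReference with controller: true, if any.
	return metav1.GetControllerOf(pod) != nil
}
```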

Describe any alternative solutions you've considered.:

Add some extension mechanism (CRD?) allowing CA to know how to handle specific controllers. This might be useful in the long term, but will require much more work.
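
Purely for illustration, such an extension mechanism could be a small CRD whose spec tells CA whether a given controller recreates evicted pods; the `DrainPolicy` kind, API group and field names below are all hypothetical:

```go
// Hypothetical CRD types: neither this kind nor its fields exist today.
package v1alpha1

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// DrainPolicy would tell CA how to treat pods owned by a specific controller kind.
type DrainPolicy struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec DrainPolicySpec `json:"spec"`
}

type DrainPolicySpec struct {
	// OwnerAPIGroup and OwnerKind identify the controller this policy applies to.
	OwnerAPIGroup string `json:"ownerAPIGroup"`
	OwnerKind     string `json:"ownerKind"`
	// RecreatesEvictedPods indicates whether the controller recreates pods after
	// eviction, i.e. whether CA may treat its pods as replicated.
	RecreatesEvictedPods bool `json:"recreatesEvictedPods"`
}
```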

Additional context.:

x13n added the kind/feature label on Dec 27, 2022
k8s-ci-robot (Contributor) commented:

@x13n: The label(s) `/label cluster-autoscaler` cannot be applied. These labels are supported: api-review, tide/merge-method-merge, tide/merge-method-rebase, tide/merge-method-squash, team/katacoda, refactor. Is this label configured under `labels -> additional_labels` or `labels -> restricted_labels` in `plugin.yaml`?


Bryce-Soghigian (Member) commented:
@vadasambar Are you still planning an approach for this or can I pick it up?

vadasambar (Member) commented Jan 11, 2023

@vadasambar Are you still planning an approach for this or can I pick it up?

Thank you for noticing I was looking at this issue. I am new to the cluster-autoscaler code, but I have an understanding of the problem. I am trying to solidify my approach, which goes something like this:

  1. Add a map of Kind to Lister in ListerRegistry called GenericListerMap (the name can change).
  2. Every time we encounter a pod owner reference that doesn't reference built-in types like Deployment, StatefulSet etc. (for which we already have listers), add a new lister entry to GenericListerMap. I want to leverage the dynamiclister library here.

As of now, I am trying to figure out how to do step 2 in a nice way. If you have any feedback on the approach, I would love to hear it (I am not sure if my approach is the best way to solve the problem).
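
A rough sketch of this idea, assuming client-go's dynamic informer factory and the dynamiclister package; `GenericListerRegistry` and its methods are placeholders rather than existing cluster-autoscaler code, the map is keyed by GroupVersionResource (which dynamic informers need) instead of Kind, and starting the informer factory plus waiting for cache sync are omitted for brevity:

```go
// Placeholder sketch, not existing cluster-autoscaler code.
package kubernetes

import (
	"sync"
	"time"

	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/dynamic/dynamicinformer"
	"k8s.io/client-go/dynamic/dynamiclister"
)

// GenericListerRegistry lazily creates one dynamic lister per owner resource
// the first time a pod owned by that resource is encountered.
type GenericListerRegistry struct {
	mu      sync.Mutex
	factory dynamicinformer.DynamicSharedInformerFactory
	listers map[schema.GroupVersionResource]dynamiclister.Lister
}

func NewGenericListerRegistry(client dynamic.Interface) *GenericListerRegistry {
	return &GenericListerRegistry{
		factory: dynamicinformer.NewDynamicSharedInformerFactory(client, 10*time.Minute),
		listers: make(map[schema.GroupVersionResource]dynamiclister.Lister),
	}
}

// ListerFor returns a cached lister for the given resource, creating one on demand.
func (r *GenericListerRegistry) ListerFor(gvr schema.GroupVersionResource) dynamiclister.Lister {
	r.mu.Lock()
	defer r.mu.Unlock()
	if l, ok := r.listers[gvr]; ok {
		return l
	}
	informer := r.factory.ForResource(gvr).Informer()
	l := dynamiclister.New(informer.GetIndexer(), gvr)
	r.listers[gvr] = l
	return l
}
```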

As for the ticket, if this ticket needs to be worked on urgently (or you are stuck because of it), feel free to pick it up. I might continue looking at it in the background to see if my solution matches yours in the end (not necessarily for raising a PR but more for understanding).

P.S.: I don't think we will need listers at all. I might have misunderstood the problem. We just need to treat anything with owner reference as replicated.

x13n (Member, Author) commented Jan 13, 2023

Using a lister is an extra layer of safety - to ensure not just that the pod is owned, but also that the owner didn't disappear in the meantime.

vadasambar (Member) commented:
Using a lister is an extra layer of safety - to ensure not just that the pod is owned, but also that the owner didn't disappear in the meantime.

That makes sense. Thank you for the feedback!

P.S.: I don't think we will need listers at all. I might have misunderstood the problem. We just need to treat anything with owner reference as replicated.

This doesn't sound true anymore. Having a generic lister does have value. It makes sense to use listers and query for owners like our existing code does, but for a wider range of owner types (not limited to built-in types like Jobs, StatefulSets, Deployments etc.).
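
A sketch of that extra safety check, assuming the dynamiclister package: before treating a controller-owned pod as replicated, verify via a dynamic lister that the owner object still exists. `ownerExists` is a hypothetical helper, and the lister is assumed to be backed by a synced informer for the owner's resource:

```go
// Hypothetical helper, not existing cluster-autoscaler code.
package drain

import (
	apiv1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/dynamic/dynamiclister"
)

// ownerExists reports whether the pod's controller owner is still present.
func ownerExists(pod *apiv1.Pod, lister dynamiclister.Lister) (bool, error) {
	ref := metav1.GetControllerOf(pod)
	if ref == nil {
		return false, nil
	}
	// Namespaced owners live in the pod's namespace; cluster-scoped owners
	// would need a non-namespaced Get instead.
	_, err := lister.Namespace(pod.Namespace).Get(ref.Name)
	if apierrors.IsNotFound(err) {
		return false, nil
	}
	if err != nil {
		return false, err
	}
	return true, nil
}
```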
