Strimzi support for Horizontal Pod Autoscaling #3331
If you wanna autoscale the Kafka brokers - and I do not think it is necessarily a good idea - you would need to do it by changing the replica count in the Kafka custom resource. I added the scale subresource to some of our CRs to make it easier to autoscale them. But not to Kafka, since we didn't really see any use case there with @tombentley and @samuel-hawker who IIRC were involved in the discussion. But in theory we can reconsider it to make it easier if it helps.
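For reference, the scale subresource mentioned here is declared on the CRD itself, which is what lets the HPA and `kubectl scale` drive the resource's replica count. A minimal sketch of what that looks like (the CRD below is heavily abbreviated and the field paths are illustrative, not copied from Strimzi's actual CRDs):

```yaml
# Sketch: enabling the scale subresource on a CRD so that the HPA
# and `kubectl scale` can read/write .spec.replicas.
# Abbreviated for illustration; check the real Strimzi CRDs.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: kafkaconnects.kafka.strimzi.io
spec:
  group: kafka.strimzi.io
  names:
    kind: KafkaConnect
    plural: kafkaconnects
  scope: Namespaced
  versions:
    - name: v1beta2
      served: true
      storage: true
      subresources:
        status: {}
        scale:
          specReplicasPath: .spec.replicas       # where the HPA writes
          statusReplicasPath: .status.replicas   # where the HPA reads
          labelSelectorPath: .status.labelSelector
```

With this in place, `kubectl scale kafkaconnect my-connect --replicas=5` works, and an HPA can target the custom resource directly.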
Yes, if I recall we decided that auto-scaling was difficult due to having multiple things to scale independently (both Kafka and ZooKeeper), and how you would want to ratio that (if at all). As for use cases, I suspect for a production cluster you wouldn't want to autoscale your brokers; more likely your topics, distributing them to more brokers etc. Like Jakub said above, I am also interested in your use case. Could you provide some details on why autoscaling would be preferred to already having a larger cluster and scaling by topic size increases and partition distribution?
I do not think not scaling ZooKeeper is an issue in most cases. But I think the rest is still an issue - although with Cruise Control support it is a bit easier, I would personally still not use it. But having the scale subresource would make it easier for people to play with it, and maybe one day ...
@scholzj @samuel-hawker Even though this was closed by the original poster, I came across this issue and wanted to provide my use case, as this is exactly what I was trying to figure out how to do with my Strimzi cluster. I work for a mobile advertising company and our traffic is very time-dependent; we can have 5x as much traffic at 10pm as we do at 10am, all of which gets recorded through Kafka. Generally, though, the peak traffic window lasts around 6-8 hours, and we were looking for a way to automatically add and remove brokers based on traffic levels (or CPU usage) to save on costs. Currently we are looking at implementing this via traffic detection that's external to Strimzi, but naturally a native implementation would be much preferable. Thanks for your time!
Well, you need to consider the whole lifecycle ... adding or removing brokers is the easiest part of it:
I do not know how you use Kafka. But do you really have enough time to do the rebalances while scaling down and up during your time window? The rebalance can take a long time depending on how much data you have in the cluster. It could be that it works for you, but your window does not sound large enough for this. If you think that you really have everything that is needed and the only missing piece is the scale up/down, I think we can definitely look at adding it to Strimzi.
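For context, with Cruise Control enabled on the Kafka CR, the rebalance step after adding a broker can at least be requested declaratively. A rough sketch, assuming a newer Strimzi version that supports the `add-brokers` rebalance mode and a cluster named `kafka-cluster` (names and broker ids are illustrative):

```yaml
# Sketch: ask Cruise Control to move partition replicas onto a
# newly added broker after a scale-up. Requires spec.cruiseControl
# to be enabled on the Kafka CR.
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaRebalance
metadata:
  name: rebalance-after-scale-up
  labels:
    strimzi.io/cluster: kafka-cluster   # ties the rebalance to the cluster
spec:
  mode: add-brokers   # only move load onto the listed brokers
  brokers: [3]        # id of the broker added by the scale-up (illustrative)
```

Even with this, the rebalance itself still takes time proportional to the data volume, which is the concern raised above.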
Is your feature request related to a problem? Please describe.
I am in the process of trying to integrate Strimzi Kafka with the Horizontal Pod Autoscaling feature Kubernetes offers. I faced some issues with this but was eventually successful. I saw that the roadmap did not yet feature HPA support, but I still went ahead and tried to implement it.
Describe the solution you'd like
An eventual solution would be to integrate HPA out of the box, i.e. have an HPA section in the Kafka kind YAML where its properties can be set.
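A purely hypothetical sketch of what such a section might look like; the `autoscaling` field below does not exist in Strimzi and is only an illustration of the proposal:

```yaml
# Hypothetical sketch only - Strimzi's Kafka CR has no such field.
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: kafka-cluster
spec:
  kafka:
    replicas: 3
    autoscaling:                         # hypothetical field
      minReplicas: 3
      maxReplicas: 6
      targetCPUUtilizationPercentage: 60
```

The operator would then own both the HPA object and the follow-up work (certificates, rebalancing) that a raw StatefulSet-level HPA cannot do.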
Describe alternatives you've considered
I noticed that for Strimzi we need the resources section of both containers populated, i.e. spec.kafka.resources as well as spec.kafka.tlsSidecar.resources in the Kafka kind resource. Once this was set, I deployed the backing services, i.e. the Metrics Server and the HPA resource itself.
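The setup described above might look roughly like this. Resource values and thresholds are illustrative, and since this issue predates `autoscaling/v2`, the version actually used was likely `autoscaling/v2beta2`:

```yaml
# CPU requests must be set so the HPA can compute utilization.
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: kafka-cluster
spec:
  kafka:
    replicas: 3
    resources:
      requests:
        cpu: "1"
        memory: 4Gi
---
# HPA pointed directly at the StatefulSet the operator creates.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: kafka-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: kafka-cluster-kafka   # StatefulSet name follows <cluster>-kafka
  minReplicas: 3
  maxReplicas: 6
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60
```

Targeting the StatefulSet behind the operator's back is exactly what triggers the certificate problem described next: the operator never learns about the new replica.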
A major pain point is that when the HPA backs the StatefulSet, for example, and the threshold is breached, the HPA controller spins up a new pod from the referenced StatefulSet. But this causes an issue: the TLS sidecar of the new pod does not get certificates loaded into it. For example:
A metric reading of 116% against the 60% target would mean a new pod is spun up, which is kafka-cluster-kafka-3:

```
kafka-cluster-kafka-3   0/2   CrashLoopBackOff   1   28s
```
Logs ->
I assume this file is created by the Strimzi Kafka Operator, which still has the previous value for the replica count. Notice how kafka-cluster still has 3 replicas; it needs to change as well.
For now I mitigated this by writing my own operator, which updates the Kafka kind CR with the new replica count; the Strimzi operator then takes care of the rest and everything is great. For now the HPA cannot support custom CRs, and unless I start contributing to the HPA project, I do not think we will see any progress there. So the only customization that can be done is for Strimzi to work with the HPA by updating the Kafka CR with the new replica count every time an HPA change is seen.
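If the Kafka CRD ever gained the scale subresource discussed earlier in the thread, the HPA could target the Kafka CR directly instead of the StatefulSet, and the operator would reconcile the change itself (certificates included). A sketch of what that would look like; this does not work today, precisely because the Kafka CRD lacks the subresource:

```yaml
# Sketch: HPA targeting the Kafka CR itself. Requires the scale
# subresource on the Kafka CRD, which Strimzi does not provide.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: kafka-cr-hpa
spec:
  scaleTargetRef:
    apiVersion: kafka.strimzi.io/v1beta2
    kind: Kafka              # hypothetical scale target
    name: kafka-cluster
  minReplicas: 3
  maxReplicas: 6
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60
```

This would make the custom bridging operator described above unnecessary.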
Thanks