
An option for disabling Pod disruption budget #6996

Closed · m-yosefpor opened this issue Jun 28, 2022 · 14 comments

Comments

@m-yosefpor

Is your feature request related to a problem? Please describe.

Many infrastructures deny PDB creation for users, so the operator runs into issues. There should be an option to disable PodDisruptionBudget creation in Strimzi, to opt out of creating PDBs in such cases.

Describe the solution you'd like

There should be an option in the CRD (e.g. here: https://github.com/strimzi/strimzi-kafka-operator/blob/main/install/cluster-operator/040-Crd-kafka.yaml#L1450) to disable PodDisruptionBudget creation (podDisruptionBudget.disabled: true/false), so users can opt out of creating PDBs in such cases. The default value should be false to preserve the current behavior.
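
For illustration, such an option could sit next to the existing PodDisruptionBudget template in the Kafka custom resource. A minimal sketch, where the disabled field is hypothetical and not an existing Strimzi API:

apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    template:
      podDisruptionBudget:
        disabled: true   # hypothetical field: skip PDB creation for this cluster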

Describe alternatives you've considered

It could be disabled globally via a flag such as --disable-pdb-creation in Strimzi (not a good solution).

Additional context

@scholzj (Member) commented Jun 28, 2022

Why do they deny it? A PDB is quite essential for running an application like Kafka. I don't think you can have any availability guarantees without it, and it might impact reliability as well.

In any case, if you want to disable it, a global flag similar to the one for Network Policies is probably a better idea than something in the custom resources, as you would otherwise need to add it to every single custom resource.
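
For reference, the Network Policies toggle mentioned above is an environment variable on the Cluster Operator Deployment, so a PDB flag could follow the same pattern. A sketch of the existing flag (the value shown is illustrative):

env:
  - name: STRIMZI_NETWORK_POLICY_GENERATION
    value: "false"   # existing Cluster Operator flag that disables NetworkPolicy creation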

@scholzj (Member) commented Aug 18, 2022

Triaged on 18.8.2022: Having an environment variable to disable the PDBs globally would make sense (I don't think a flag in the CRD makes much sense here). But in any case, this should have a proposal similar to the Network Policies one.

@m-yosefpor (Author)

> Triaged on 18.8.2022: Having an environment variable to disable the PDBs globally would make sense (I don't think a flag in the CRD makes much sense here). But in any case, this should have a proposal similar to the Network Policies one.

@scholzj proposal added

@scholzj (Member) commented Mar 23, 2024

@m-yosefpor Do you plan to implement the proposal you wrote? Or should we try to get someone else to implement it?

@m-yosefpor (Author)

@scholzj Yes, I will submit a PR for the implementation.

@Alansyf commented May 28, 2024

Hi,
I have six pods: 3 brokers and 3 controllers.

NAME                      DESIRED REPLICAS   ROLES
oneapi-kafka-broker       3                  ["broker"]
oneapi-kafka-controller   3                  ["controller"]

The operator creates a PDB with minAvailable of five for me:

NAME                         MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
oneapi-kafka-cluster-kafka   5               N/A               1                     153m

This value is too restrictive for us: when we upgrade our Kubernetes nodes, this PDB blocks the nodes from restarting.
Has the global environment variable been implemented yet?

@Alansyf commented May 29, 2024

Never mind, I found the piece of code below; setting maxUnavailable helps.

return new PodDisruptionBudgetBuilder()
               .withNewMetadata()
                   .withName(name)
                   .withLabels(labels.withAdditionalLabels(TemplateUtils.labels(template)).toMap())
                   .withNamespace(namespace)
                   .withAnnotations(TemplateUtils.annotations(template))
                   .withOwnerReferences(ownerReference)
               .endMetadata()
               .withNewSpec()
                   .withMinAvailable(new IntOrString(Math.max(0, replicas - (template != null ? template.getMaxUnavailable() : DEFAULT_MAX_UNAVAILABLE))))
                   .withSelector(new LabelSelectorBuilder().withMatchLabels(labels.strimziSelectorLabels().toMap()).build())
               .endSpec()
               .build();
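
In this builder, minAvailable is computed as replicas minus maxUnavailable, and maxUnavailable defaults to 1, which is why six pods yield minAvailable: 5. The template field can be set in the Kafka custom resource; a sketch, with the cluster name as a placeholder:

apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    template:
      podDisruptionBudget:
        maxUnavailable: 2   # minAvailable becomes 6 - 2 = 4 for the six pods above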

@scholzj (Member) commented May 29, 2024

Keep in mind that increasing the number of nodes that can be down at the same time can cause your Kafka cluster to be unavailable.

@weisdd commented Jun 7, 2024

Disabling the PDB would be helpful for local development environments, where we care more about staying within resource constraints (e.g. 16 GB for everything from the OS to the dev environment and test workloads) than about availability.

@stklcode commented Aug 5, 2024

We recently hit a scenario similar to the one described above:

Old cluster: 3x ZooKeeper + 3x Kafka (on a 3-node K8s cluster with pod affinity)
=> PDB with minAvailable: 2 for both resources
New cluster: KRaft with 3x Controller + 3x Broker
=> PDB with minAvailable: 5

Issues after migration:

  • default minAvailable: 5 is too restrictive, so we cannot drain one node for maintenance (while preserving the affinity)
  • simply setting maxUnavailable: 2 resolves that issue, but then 2 nodes of the same pool (e.g. 2 brokers) may go down, which is not the desired result.

Our workaround:
Introduce custom PDB resources per node pool with minAvailable: 2 and selectors like

matchLabels:
  strimzi.io/cluster: <cluster-name>
  strimzi.io/kind: Kafka
  strimzi.io/pool-name: <nodepool-name>
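
Spelled out fully, one such per-pool PDB might look like this (all names are placeholders):

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-cluster-broker-pdb
  namespace: kafka
spec:
  minAvailable: 2
  selector:
    matchLabels:
      strimzi.io/cluster: my-cluster
      strimzi.io/kind: Kafka
      strimzi.io/pool-name: broker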

This makes the default PDB obsolete. Because its generation cannot be disabled (though maxUnavailable: 99999 or similar practically "disables" it), the second part of our workaround is setting spec.kafka.template.podDisruptionBudget.maxUnavailable: 2 (1 per node pool) to make the default budget fit the bill, with the final result:

NAME                 MIN AVAILABLE 
my-kafka-broker      2
my-kafka-controller  2
my-kafka             4              # the default generated PDB

Internally this is calculated in some Helm templates, so one can conveniently resize the pools and the budgets follow. Not very elegant, but it seems to do the job for some test clusters.

Default PDB generation per pool might be a handy feature for such cases. Disabling it completely ... I agree, that's probably not the best idea in most cases.


My preferred solution would be fine-grained control at the Kafka and KafkaNodePool level, maybe with defaults like this:

---
kind: Kafka
spec:
  kafka:
    template:
      podDisruptionBudget:
        create: true
        maxUnavailable: 1
---
kind: KafkaNodePool
spec:
  podDisruptionBudget:
    create: false
    maxUnavailable: 1

@scholzj (Member) commented Sep 16, 2024

@m-yosefpor Do you still plan to implement the proposal? Or should we try to get someone else to implement it?

@m-yosefpor (Author)

> Do you still plan to implement the proposal? Or should we try to get someone else to implement it?

Due to my current workload, I don’t have the bandwidth to implement the proposal right now. It might be better to look for someone else who can take it on. Thanks.

@scholzj (Member) commented Sep 18, 2024

@m-yosefpor No worries, I will try to find someone else to look into it.


The proposal is approved; it just needs the implementation: https://github.com/strimzi/proposals/blob/main/063-pdb-generation-environment-variable.md

The implementation should cover:

  • Adding the new environment variable in ClusterOperatorConfig (see the sketch after this list)
  • Adding the new environment variable to the documentation
  • Updating the Assembly Operators for ZooKeeper, Kafka, Connect, MM1, MM2, and Bridge to skip the PDB management when configured accordingly
  • Unit tests
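
A minimal sketch of how the finished flag could be enabled on the Cluster Operator Deployment, assuming the environment variable name from proposal 063 (check the merged implementation for the final name):

env:
  - name: STRIMZI_POD_DISRUPTION_BUDGET_GENERATION
    value: "false"   # assumed name per proposal 063; skips PDB creation when false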

@katheris katheris self-assigned this Sep 18, 2024
katheris added a commit to katheris/strimzi-kafka-operator that referenced this issue Sep 19, 2024
katheris added a commit to katheris/strimzi-kafka-operator that referenced this issue Sep 23, 2024
katheris added a commit to katheris/strimzi-kafka-operator that referenced this issue Sep 25, 2024
@scholzj
Copy link
Member

scholzj commented Sep 25, 2024

Done in #10614

@scholzj closed this as completed Sep 25, 2024