Add Label to Disable Katib Webhooks #2069
@andreyvelich Does that mean that if Experiments are deployed with that label, all webhooks (validator, defaulter, mutator) are skipped?
Effectively, this will create problems with Katib. I would not suggest this unless a non-Katib user complains about the performance.
@tenzen-y Yes, we can give users an option to disable all 3 webhooks by setting a label on the Katib Experiment (disables the validator and defaulter) and on the user's pods (disables the mutator).
Let's hold this issue for some time and ask users to provide feedback.
We can close this issue since @tenzen-y integrated cert-generator into the Katib Controller startup 🎉
/reopen
I found that this issue still remains, and the replicaset-controller occasionally faces it:
Warning FailedCreate 39s (x15 over 2m2s) replicaset-controller Error creating: Internal error occurred: failed calling webhook "mutator.pod.katib.kubeflow.org": failed to call webhook: Post "https://katib-controller.kubeflow.svc:443/mutate-pod?timeout=10s": dial tcp 10.100.157.27:443: connect: connection refused
The above error happens if Katib starts up in the following steps:
I think we can select one of the following options:
@andreyvelich @johnugeorge WDYT?
@tenzen-y: Reopened this issue.
I think
@tenzen-y On which Katib component did you see this warning?
@andreyvelich I see this issue on all the Katib components, such as the controller, db-manager...
I thought so too. However, the webhook is occasionally registered with the kube-apiserver before the certs are ready. It's a distributed system :(
@tenzen-y Did you see the errors even without Leader Election mode enabled?
@andreyvelich Yes, I see it on an installation without leader election mode. You can check the error in the following: With
In PR #2018 (comment), I proposed introducing a label to disable the Katib webhooks (validator, defaulter, mutator). For example:
katib.kubeflow.org/webhooks: disabled
Let's discuss whether that would be useful for users with large-scale environments.
Currently, if the user's namespace has the
katib.kubeflow.org/metrics-collector-injection: enabled
label, the Katib Mutation Webhook runs for every Pod in that namespace. That might increase latency in the Kubernetes API server. Some users might want to use Katib Experiments and run other pods in their namespaces without webhook execution.
What do you think @gaocegege @johnugeorge @tenzen-y @anencore94 @terrytangyuan ?
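To make the proposal concrete, here is a minimal sketch of the decision it implies for the mutator, assuming the proposed `katib.kubeflow.org/webhooks: disabled` label is set on the Pod. The function name `should_run_mutation` is hypothetical and is not part of Katib; only the two label keys come from this issue.

```python
from typing import Mapping

# Existing namespace label described in this issue.
INJECTION_LABEL = "katib.kubeflow.org/metrics-collector-injection"
# Label proposed in this issue; it is NOT an existing Katib feature.
WEBHOOKS_LABEL = "katib.kubeflow.org/webhooks"


def should_run_mutation(
    namespace_labels: Mapping[str, str],
    pod_labels: Mapping[str, str],
) -> bool:
    """Hypothetical check: should the mutation webhook process this Pod?"""
    # Current behaviour: the webhook only applies to opted-in namespaces.
    if namespace_labels.get(INJECTION_LABEL) != "enabled":
        return False
    # Proposed behaviour: a Pod can opt out with the new label.
    return pod_labels.get(WEBHOOKS_LABEL) != "disabled"


if __name__ == "__main__":
    ns = {INJECTION_LABEL: "enabled"}
    print(should_run_mutation(ns, {}))
    print(should_run_mutation(ns, {WEBHOOKS_LABEL: "disabled"}))
```

Note that a check inside the webhook handler would not by itself remove the API server round trip. To reduce latency, the same condition would probably have to be expressed in the webhook configuration's selectors (Kubernetes supports `namespaceSelector` and `objectSelector` on webhook configurations) so that the API server never calls the webhook for opted-out Pods.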
/kind discussion
Love this feature? Give it a 👍 We prioritize the features with the most 👍