It appears that when GKE has network policies enforced, but no network policy allows the API server to communicate with the webhook, admission of new Elastic resources times out. The goal of setting the failure policy to Ignore (see #1386) was to allow creation to progress even when the webhook is unavailable, but we may be hitting a combination of timeouts that complicates that. We should investigate and ensure that our webhook does indeed fail open in this environment.
It may also be worth updating the docs with an example network policy that allows the webhook to function in GKE when network policies are enforced.
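As a starting point for such a docs example, here is a minimal sketch of a policy that admits traffic to the webhook. The policy name, the `control-plane: elastic-operator` pod label, and port 9443 are assumptions, not taken from this issue, and should be checked against the installed operator manifests. The `elastic-system` namespace matches the webhook service URL in the error message quoted later in this thread.

```yaml
# Sketch only: allows ingress to the operator pods on the webhook port.
# ASSUMPTIONS: policy name, pod label, and port are illustrative.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-webhook-ingress        # hypothetical name
  namespace: elastic-system
spec:
  podSelector:
    matchLabels:
      control-plane: elastic-operator  # assumed operator pod label
  policyTypes:
    - Ingress
  ingress:
    # No "from" clause: traffic from any source is allowed on this port.
    # The API server usually sits outside the pod network, so a pod or
    # namespace selector would not match it.
    - ports:
        - protocol: TCP
          port: 9443                   # assumed webhook container port
```

Restricting the source with an `ipBlock` for the control plane CIDR would be tighter, but that range is cluster-specific and is left out of this sketch.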
Simply enabling network policy enforcement in GKE has no effect on the webhook because all pods are non-isolated by default. In order to interfere with the correct operation of the webhook, users have to make a conscious decision to block all traffic to/from the operator pods. I managed to do so by creating the following network policy:
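The policy attached to the original comment is not preserved in this copy. A policy with the described effect might look like the sketch below. The policy name and the `control-plane: elastic-operator` label are assumptions, not the author's actual manifest.

```yaml
# Sketch only: isolates the operator pods in both directions.
# ASSUMPTIONS: policy name and pod label are illustrative.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-all-operator-traffic    # hypothetical name
  namespace: elastic-system
spec:
  podSelector:
    matchLabels:
      control-plane: elastic-operator  # assumed operator pod label
  # Listing both policy types with no ingress or egress rules
  # blocks all traffic to and from the selected pods.
  policyTypes:
    - Ingress
    - Egress
```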
With the above policy in place, attempting to create an Elasticsearch resource times out with the following error message:
Error from server (Timeout): error when creating "es.yaml": Timeout: request did not complete within requested timeout 30s
As noted in the linked upstream issue, this is simply a case of the client timing out before the server does. After setting the webhook failure policy to Fail and increasing the client timeout (kubectl --request-timeout=1m apply -f es.yaml), the following error can be observed:
Error from server (InternalError): error when creating "es.yaml": Internal error occurred: failed calling webhook "elastic-es-validation-v1.k8s.elastic.co": Post https://elastic-webhook-server.elastic-system.svc:443/validate-elasticsearch-k8s-elastic-co-v1-elasticsearch?timeout=30s: context deadline exceeded
When the webhook failure policy is set to Ignore and kubectl is invoked with the increased client timeout, the resource is created after about 30 seconds, which is the server-side timeout for the request to the webhook. This is the intended behaviour, so our implementation appears to be working as expected.
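For reference, the two settings involved both live on the webhook registration. The fragment below is a sketch rather than the operator's shipped manifest. The webhook name, service name, namespace, and path come from the error message above. The configuration object name, API version, `sideEffects`, `admissionReviewVersions`, and `rules` are assumptions, and `caBundle` is omitted.

```yaml
# Sketch only: a fail-open webhook registration.
# ASSUMPTIONS: metadata.name, apiVersion, sideEffects,
# admissionReviewVersions, and rules are illustrative.
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: elastic-webhook.k8s.elastic.co   # hypothetical name
webhooks:
  - name: elastic-es-validation-v1.k8s.elastic.co
    failurePolicy: Ignore    # fail open: admit the request if the webhook cannot be reached
    timeoutSeconds: 30       # server-side wait, matching ?timeout=30s in the error above
    sideEffects: None
    admissionReviewVersions: ["v1"]
    clientConfig:
      service:
        name: elastic-webhook-server
        namespace: elastic-system
        path: /validate-elasticsearch-k8s-elastic-co-v1-elasticsearch
    rules:
      - apiGroups: ["elasticsearch.k8s.elastic.co"]
        apiVersions: ["v1"]
        operations: ["CREATE", "UPDATE"]
        resources: ["elasticsearches"]
```

Because the API server waits the full `timeoutSeconds` before applying the failure policy, a client whose own timeout is also 30s gives up first, which is consistent with the kubectl error reported above.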
From a report on the discuss forums:
https://discuss.elastic.co/t/error-from-server-notfound-services-quickstart-es-http-not-found/193658/15
Similar k/k issue:
kubernetes/kubernetes#71508