Add webhook network policy troubleshooting information #2524

Merged: 4 commits, Feb 6, 2020
4 changes: 3 additions & 1 deletion docs/troubleshooting.asciidoc
@@ -247,7 +247,9 @@ This can also be done for Kibana and APM Server.

On startup, the operator deploys an https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/[admission webhook] that points to the operator's service. If this service is inaccessible, you may see errors in your Kubernetes API server logs indicating that it cannot reach the service. Common causes are that the operator pods are failing to start, or that the control plane is isolated from the operator pod, for instance by network policies or by running the control plane externally, as in https://github.com/elastic/cloud-on-k8s/issues/896#issuecomment-507224945[issue #896] and https://github.com/elastic/cloud-on-k8s/issues/1369[issue #1369].

-You can also change the https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/#failure-policy[`failurePolicy`] of the webhook configuration to `Fail`, which will cause creations and updates to error out if there is an error contacting the webhook.
+For troubleshooting, you can change the https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/#failure-policy[`failurePolicy`] of the webhook configuration to `Fail`, which will cause creations and updates to error out if there is an error contacting the webhook.
+
+Refer to <<{p}-webhook-network-policies>> for more information about network policies that might be preventing communication between the Kubernetes API server and the webhook server.
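
As a minimal sketch of the `failurePolicy` change, the fragment below shows where the field sits in a `ValidatingWebhookConfiguration`. The `metadata.name` and webhook `name` values are placeholders, not the names the operator actually uses, and the other required webhook fields are omitted. Look up the real object in your cluster before editing it.

[source,yaml]
----
# Fragment only: clientConfig, rules and other required fields are omitted.
# apiVersion is an assumption; older clusters serve admissionregistration.k8s.io/v1beta1.
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: example-webhook-configuration      # placeholder name
webhooks:
- name: example-validation.k8s.elastic.co  # placeholder name
  # Fail: reject the request when the webhook cannot be reached.
  # Ignore: admit the request without validation in that case.
  failurePolicy: Fail
----

With `Fail` in place, a connectivity problem surfaces as an immediate error on create or update instead of a silently skipped validation, which makes it easier to spot while troubleshooting.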

=== Validation failures
If the validation webhook is preventing you from making changes because of the unknown fields validation, as in the example below, you can force the webhook to ignore it by removing the `kubectl.kubernetes.io/last-applied-configuration` annotation from your resource.
69 changes: 67 additions & 2 deletions docs/webhook.asciidoc
@@ -102,8 +102,73 @@ EOF

NOTE: This example assumes that you have installed the operator in the `elastic-system` namespace.

[float]
[id="{p}-webhook-network-policies"]
=== Network Policies

Webhooks require network connectivity between the Kubernetes API server and the operator. If the creation of an Elasticsearch resource times out with an error message similar to the following, then the Kubernetes API server might be unable to connect to the webhook to validate the manifest.

....
Error from server (Timeout): error when creating "elasticsearch.yaml": Timeout: request did not complete within requested timeout 30s
....

If you encounter the above error, try re-running the command with a higher request timeout as follows:

[source,sh,subs="attributes"]
----
kubectl --request-timeout=1m apply -f elasticsearch.yaml
----

Because the default link:https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/#failure-policy[`failurePolicy`] of the webhook is `Ignore`, the above command should succeed after about 30 seconds. This indicates that the API server cannot contact the webhook server and has forgone validation when creating the resource. One possible reason is that a link:https://kubernetes.io/docs/concepts/services-networking/network-policies/[network policy] is blocking incoming requests to the webhook server. Consult your system administrator to determine whether that is the case, and create an appropriate policy to allow communication between the Kubernetes API server and the webhook server. For example, the following network policy opens the webhook port to all sources:


[source,yaml,subs="attributes"]
----
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-webhook-access-from-any
  namespace: elastic-system
spec:
  podSelector:
    matchLabels:
      control-plane: elastic-operator
  ingress:
  - from: []
    ports:
    - port: 9443
----

You may want to restrict webhook access to just the Kubernetes API server. Currently this requires knowing the IP address of the API server, which you can obtain with the following command:

[source,sh,subs="attributes"]
----
kubectl cluster-info | grep master
----

Assuming that the API server IP address is `10.1.0.1`, the following policy restricts webhook access to just the API server.

[source,yaml,subs="attributes"]
----
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-webhook-access-from-apiserver
  namespace: elastic-system
spec:
  podSelector:
    matchLabels:
      control-plane: elastic-operator
  ingress:
  - from:
    - ipBlock:
        cidr: 10.1.0.1/32
    ports:
    - port: 9443
----
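
If the control plane runs several API server instances, each source address needs its own entry. The following sketch assumes three hypothetical addresses (`10.1.0.1`, `10.1.0.2` and `10.1.0.3`) and a hypothetical policy name. Entries listed under `from` are combined with a logical OR, so traffic from any of the listed addresses is allowed. Replace the addresses with the ones used in your cluster.

[source,yaml]
----
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-webhook-access-from-apiservers  # hypothetical name
  namespace: elastic-system
spec:
  podSelector:
    matchLabels:
      control-plane: elastic-operator
  ingress:
  - from:
    # Hypothetical API server addresses; one ipBlock per instance.
    - ipBlock:
        cidr: 10.1.0.1/32
    - ipBlock:
        cidr: 10.1.0.2/32
    - ipBlock:
        cidr: 10.1.0.3/32
    ports:
    - port: 9443
----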


[float]
=== Troubleshooting

-Webhooks require network connectivity between the Kubernetes API server and the operator.
-See <<{p}-webhook-troubleshooting,Webhook troubleshooting>> for more information about some known problems with some Kubernetes providers.
+See the <<{p}-webhook-troubleshooting,Webhook troubleshooting>> section of the <<{p}-troubleshooting,Troubleshooting guide>>.