Bug 1752725: Log into kibana console get 504 Gateway Time-out The server didn't respond in time. when http_proxy enabled
#255
Conversation
/retest

/test e2e-operator
Should that be `e.Meta`?
I tried `e.Meta`, and it failed as follows:

```
clusterlogging_controller.go:61:73: e.Meta undefined (type event.UpdateEvent has no field or method Meta)
```
There is no `e.Meta` on `UpdateEvent`, only `MetaOld` and `MetaNew`; see https://github.com/kubernetes-sigs/controller-runtime/blob/master/pkg/event/event.go#L34
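For context, a minimal sketch of how `MetaOld`/`MetaNew` are typically used in an update predicate with that vintage of controller-runtime (the function name and the generation check are illustrative, not taken from this PR):

```go
package clusterlogging

import (
	"sigs.k8s.io/controller-runtime/pkg/event"
	"sigs.k8s.io/controller-runtime/pkg/predicate"
)

// ignoreStatusOnlyUpdates reconciles only when the spec generation changes.
func ignoreStatusOnlyUpdates() predicate.Predicate {
	return predicate.Funcs{
		UpdateFunc: func(e event.UpdateEvent) bool {
			// UpdateEvent carries the object both before and after the
			// change, so predicates inspect MetaOld and MetaNew; there
			// is no single e.Meta on this event type.
			return e.MetaOld.GetGeneration() != e.MetaNew.GetGeneration()
		},
	}
}
```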
@nhosoi have you tested what the operator's processing looks like when you stack a proxy config change and a clusterlogging change?

Thanks for your reviews, @ewolinetz. Well, what I could test was adding noProxy and/or httpProxy to the cluster proxy and checking whether they were applied to the fluentd env vars, then removing them and checking again. httpsProxy and trustedCA are not tested (not sure how to do so...). To be honest, with or without this PR, there's no difference in my test results...
This patch really looks like it breaks the e2e tests... :( But it's not clear to me how adding a watch causes these failures...
@nhosoi a watch means that the operator looks for that object to change, and when it does, it sends a request through the reconcile loop. So what may end up happening is that we have multiple events that are being re-reconciled (so that we can periodically update our status). We may need to investigate a better way to update our status periodically without holding onto events ad infinitum.
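For reference, registering such a watch in an operator-sdk controller of that era looks roughly like this. This is a sketch, not the PR's actual diff; the `add` helper is the usual scaffolded shape, and the `Proxy` type comes from openshift/api:

```go
package clusterlogging

import (
	configv1 "github.com/openshift/api/config/v1"
	"sigs.k8s.io/controller-runtime/pkg/controller"
	"sigs.k8s.io/controller-runtime/pkg/handler"
	"sigs.k8s.io/controller-runtime/pkg/source"
)

// add wires a watch into the controller. Every create/update/delete of a
// watched object enqueues a reconcile.Request, which is why the same
// object can be re-reconciled repeatedly as long as events keep arriving.
func add(c controller.Controller) error {
	// Watch the cluster-scoped global proxy object.
	return c.Watch(
		&source.Kind{Type: &configv1.Proxy{}},
		&handler.EnqueueRequestForObject{},
	)
}
```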
/test e2e-operator
I think it explains what I'm observing and wondering... Without the newly added watches, the cluster proxy config was still consumed by the fluentd pod. Does that mean this cluster proxy event is reconciled by some other request and my addition is redundant??? I added 2 watches, one for the proxy configmap and the other for the proxy object itself. Let me disable them one by one and figure out what is causing this error...
/test e2e-operator
So one thing to note with doing this: I believe when we get to Reconcile, any proxy config change will push that event, so we may fail to get the clusterlogging instance.
I'm not sure if we can do an EnqueueRequestForOwner in this case to bypass that.
But in the case where we do get a proxy event change, we don't want to requeue it at the end of a successful run.
Ok - so how do other operators deal with this? It seems that other operators that need to respect proxy settings would have to deal with this (unless, once again, logging is the pioneer that gets the arrows...)
I'm learning from cluster-network-operator. That operator has a separate reconciler for each watched target(?). In this PR, I piggybacked the proxyconfig watches onto the clusterlogging reconciler... Do you think we have to have a separate reconciler as cluster-network-operator does???
https://github.com/openshift/cluster-network-operator/blob/master/pkg/controller/add_networkconfig.go
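The pattern in that file is the standard operator-sdk one: each controller ships an `Add` function, and a tiny registration file appends it to a shared list. A sketch of what the analogous file for a proxyconfig controller could look like here (the file path and package layout are assumptions, not taken from this PR):

```go
// pkg/controller/add_proxyconfig.go (hypothetical path)
package controller

import (
	"github.com/openshift/cluster-logging-operator/pkg/controller/proxyconfig"
)

func init() {
	// AddToManagerFuncs is the operator-sdk scaffolded slice (declared in
	// pkg/controller/controller.go) of functions that register each
	// controller with the shared manager; appending here gives the
	// proxyconfig watches their own reconcile loop.
	AddToManagerFuncs = append(AddToManagerFuncs, proxyconfig.Add)
}
```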
And by commenting out the cluster configmap watch, the e2e test passed.
> cluster configmap watch

Just to clarify and prevent confusion, this isn't a configmap -- it's a non-namespaced object of type config.
> Do you think we have to have a separate reconciler as cluster-network-operator does?

I think this is the path we want to take as well, yes.
Thanks! I'm trying it, and my first cut causes a strange problem. :) If I update the cluster proxy spec/status, it restarts all the pods, including elasticsearch and kibana... Obviously, there's a lot more to learn... But I'm glad I got something to pursue.
Force-pushed 0da021f to b5bbf8c
Hi @ewolinetz, I'm stuck... Could you please help me? Regarding your advice [0], I tried what I could think of (some attempts are left in the patch, commented out...), but it looks like all of my attempts were invalid, and updating the cluster proxy affects all pods managed by the cluster logging operator. Could you please give me some hints for "updating the proxy reconciler to just update the collector work"? [0]

What I'm observing (the following is a snippet of the debug prints [1] I put into the patch [2]): Reconcile for 'instance' is called about every 30 sec and updates the status of each pod. Of course, it does not restart the pods. When I update the cluster proxy's spec (in my testing, the noProxy value), then all the pods are restarted. It looks to me like this was derived from reconciling 'instance', since there's no k8shandler.Reconcile call in Reconcile in proxyconfig_controller.go... And one more thing I'm confused about: even without this PR/attempt - adding a Watch for the cluster proxy - changes made in the cluster proxy status are applied to the fluentd EnvVar. That is, it looks to me like we don't need the Watch for the cluster proxy, and this PR is introducing something redundant. But I may be wrong... [1] [2]
I think we should leave this function definition the same and instead have a separate call for the proxy config reconciler to just adjust the collector.
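A minimal sketch of the shape being suggested, under heavy assumptions: the function name, signature, and body below are illustrative, not code from this PR; only the idea of a collector-only entry point comes from the comment above.

```go
package k8shandler

import (
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// ReconcileCollector is a hypothetical, narrower entry point that a
// proxyconfig reconciler could call instead of the full Reconcile: it
// only refreshes the collector (fluentd) resources that actually consume
// the global proxy settings, leaving elasticsearch and kibana untouched.
func ReconcileCollector(requestClient client.Client, proxyName string) error {
	// 1. Read the cluster proxy and the trusted CA bundle configmap.
	// 2. Recompute the fluentd env vars and the CA-bundle hash annotation.
	// 3. Update only the fluentd daemonset if something changed.
	return nil
}
```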
On pkg/k8shandler/reconciler.go (outdated):
This feels very hacky.
Force-pushed 850178c to b39f488
/test e2e-operator
Force-pushed ea0c6e2 to b1b9d28
/retest
Force-pushed 74b4eeb to 0dbeaa3
@bparees were your requested changes addressed?
It looks like this is trying to determine whether the reconcile request was triggered by a change to a configmap that logging cares about, but I'm not sure how this works?
The above code is "Outdated".
Now we are checking whether request.Name is in ReconcileForGlobalProxyList, which is {"fluentd-trusted-ca-bundle", "kibana-trusted-ca-bundle"}:

```go
} else if utils.ContainsString(constants.ReconcileForGlobalProxyList, request.Name) {
```

https://github.com/openshift/cluster-logging-operator/pull/255/files#diff-993a3a5d79e9b1a8103e2bde12087a84R108
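For readers unfamiliar with the helper, `utils.ContainsString` is presumably the usual linear-scan membership check; a minimal version consistent with the call site above (the implementation is inferred from the call, not copied from this repo):

```go
package utils

// ContainsString reports whether s is present in slice.
// The signature is inferred from the call site above.
func ContainsString(slice []string, s string) bool {
	for _, item := range slice {
		if item == s {
			return true
		}
	}
	return false
}
```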
Is this watching all configmaps in the entire cluster? Does operatorSDK not give us a way to scope the watch to the logging namespace?
I assume your question is about the ConfigMap watch at line 66 (not the Proxy). Now the ConfigMap is watched only if it is in the openshift-logging namespace and its name is fluentd-trusted-ca-bundle or kibana-trusted-ca-bundle.
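A sketch of how such scoping is typically done with a controller-runtime predicate of that era; the helper names are illustrative, while the namespace and configmap names are quoted from the comment above:

```go
package proxyconfig

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/controller-runtime/pkg/event"
	"sigs.k8s.io/controller-runtime/pkg/predicate"
)

// isTrustedCABundle matches only the two trusted-CA-bundle configmaps
// in the openshift-logging namespace.
func isTrustedCABundle(meta metav1.Object) bool {
	if meta.GetNamespace() != "openshift-logging" {
		return false
	}
	name := meta.GetName()
	return name == "fluentd-trusted-ca-bundle" || name == "kibana-trusted-ca-bundle"
}

// trustedCABundlePredicate drops every event that is not about one of
// the watched configmaps, so the watch is effectively namespace- and
// name-scoped even though it is registered cluster-wide.
func trustedCABundlePredicate() predicate.Predicate {
	return predicate.Funcs{
		CreateFunc:  func(e event.CreateEvent) bool { return isTrustedCABundle(e.Meta) },
		UpdateFunc:  func(e event.UpdateEvent) bool { return isTrustedCABundle(e.MetaNew) },
		DeleteFunc:  func(e event.DeleteEvent) bool { return isTrustedCABundle(e.Meta) },
		GenericFunc: func(e event.GenericEvent) bool { return isTrustedCABundle(e.Meta) },
	}
}
```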
Force-pushed 364df02 to b411583
@ewolinetz, @bparees, thank you very much for your reviews.
/test e2e-operator
Please review. We need to merge this by tomorrow, or it won't happen for a couple of weeks. Let's try to get this merged ASAP.
@richm @nhosoi @ewolinetz my concerns around the event handling and deployment rollout triggering have been addressed... lgtm.
After https://github.com/openshift/cluster-logging-operator/pull/255/files#r348753401, I'll put a flag on this... @nhosoi
Thanks, @ewolinetz, @bparees!!
- Adding a proxyconfig controller to watch the cluster proxy and the trusted CA bundle configmaps in the openshift-logging namespace. These configmaps' names are KibanaTrustedCAName and FluentdTrustedCAName.
- Adding pkg/constants/constants.go to share the constant strings.
- Simplifying setting proxy environment variables in EnvVar.
- Adding trusted CA bundle configmap support. The configmaps are watched in the proxyconfig controller. The fluentd daemonset and kibana deployment hold the hash value of the CA certs in an annotation. The value is updated when the CA certs in the configmap are updated, which triggers a restart of the fluentd and kibana pods and an update of the mounted tls-ca-bundle.pem file. It overrides /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem with the certs auto-filled into the configmap, by mounting the configmap as a volume (see the first sketch after the bug list below).
- utils.EnvVarEqual: In EnvVarSourceEqual, replacing reflect.DeepEqual with a customized EnvVarResourceFieldSelectorEqual, since Divisor (type resource.Quantity) is not correctly compared by DeepEqual (see the second sketch below).
- Others: hack/common - keeping debug_print for future debugging.
This PR fixes the following bugs:

- Bug 1752725 - Log into kibana console gets `504 Gateway Time-out The server didn't respond in time.` when http_proxy is enabled.
- Bug 1766187 - Authentication "500 Internal Error".
- Bug 1768762 - Fluentd: "Could not communicate to Elasticsearch" when an http proxy is enabled in the cluster.
  Fix: Setting the elasticsearch FQDN in logStoreService and elasticsearchName. The FQDN belongs to the global proxy's noProxy list; by doing so, it bypasses the global proxy when communicating with the internal elasticsearch.
- Bug 1774837 - Too many `warning: The environment variable HTTP_PROXY is discouraged. Use http_proxy.` messages in the fluentd pod logs after enabling forwarding logs to a user-managed ES as insecure.
  Fix: In addition to HTTP_PROXY, HTTPS_PROXY and NO_PROXY, setting http_proxy, https_proxy and no_proxy as well.
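A minimal sketch of the restart-on-CA-change mechanism described above, under stated assumptions: the annotation key and helper names are illustrative, not copied from this PR; the "ca-bundle.crt" data key is the one OpenShift's trusted-CA injection uses. Stamping a digest of the bundle into the pod template means any change to the configmap changes the template, which makes the daemonset roll its pods and remount the updated tls-ca-bundle.pem.

```go
package k8shandler

import (
	"crypto/sha256"
	"encoding/hex"

	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
)

// calcTrustedCAHash digests the CA bundle stored in the configmap.
func calcTrustedCAHash(cm *corev1.ConfigMap) string {
	sum := sha256.Sum256([]byte(cm.Data["ca-bundle.crt"]))
	return hex.EncodeToString(sum[:])
}

// setTrustedCAHashAnnotation stamps the digest into the pod template so
// that a changed bundle produces a changed template and a pod rollout.
func setTrustedCAHashAnnotation(ds *appsv1.DaemonSet, cm *corev1.ConfigMap) {
	if ds.Spec.Template.Annotations == nil {
		ds.Spec.Template.Annotations = map[string]string{}
	}
	// Hypothetical annotation key, for illustration only.
	ds.Spec.Template.Annotations["logging.openshift.io/trusted-ca-hash"] = calcTrustedCAHash(cm)
}
```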
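And a sketch of why reflect.DeepEqual is the wrong tool for resource.Quantity, and what the replacement comparison can look like. The function name mirrors the one mentioned in the description above, but its body is an assumption: Quantity caches its string representation internally, so two semantically equal quantities can differ field-by-field, and Cmp is the semantic comparison the API machinery provides.

```go
package utils

import (
	corev1 "k8s.io/api/core/v1"
)

// EnvVarResourceFieldSelectorEqual compares two ResourceFieldSelectors
// semantically. Divisor is a resource.Quantity, whose internal cached
// string form makes reflect.DeepEqual unreliable; Cmp compares values.
func EnvVarResourceFieldSelectorEqual(lhs, rhs corev1.ResourceFieldSelector) bool {
	return lhs.ContainerName == rhs.ContainerName &&
		lhs.Resource == rhs.Resource &&
		lhs.Divisor.Cmp(rhs.Divisor) == 0
}
```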
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ewolinetz, nhosoi

The full list of commands accepted by this bot can be found here. The pull request process is described here.
@nhosoi: All pull requests linked via external trackers have merged. Bugzilla bug 1752725 has been moved to the MODIFIED state.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Bug 1752725: Log into kibana console get `504 Gateway Time-out The server didn't respond in time. ` when http_proxy enabled

clusterlogging_controller.go - Adding a watch for the cluster proxy. Borrowed the code from cluster-network-operator/pkg/controller/proxyconfig.