
Conversation

@gabemontero (Contributor) commented Mar 11, 2020

Per a recent discussion between @dmage, @adambkaplan, and myself: the image signature controller in the OCM reaches out to registries like registry.redhat.io without using the global proxy when one is set.

see https://github.com/openshift/cluster-version-operator/blob/611490ff448962c70ab29a90764a6c30d4ee87dc/lib/resourcebuilder/apps.go#L55

the bug originator manually set the proxy env vars on the OCM and saw things work

@bparees @mfojtik @soltysh FYI

@openshift-ci-robot (Contributor)

@gabemontero: This pull request references Bugzilla bug 1805168, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.5.0) matches configured target release for branch (4.5.0)
  • bug is in the state NEW, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST)

In response to this:

Bug 1805168: add inject-proxy env annotation to daemonset so CVO injects proxy env…

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Mar 11, 2020
@gabemontero (Author)

existing e2e's are green at least

/assign @bparees
/assign @adambkaplan

@bparees (Contributor) commented Mar 12, 2020

I don't think this is the right solution.

the CVO injection is only for operators. As in "I need the CVO to inject these env vars into my operators' deployment because the CVO owns my deployment and only it can touch it".

You (the OCM operator) own the OCM's deployment; you should be setting the proxy envs on the OCM's deployment and reconciling them as they change (i.e. the OCM operator needs to watch the proxy config and propagate that config to the OCM deployment)
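The reconciliation described here can be sketched as a pure function from observed proxy status to the env vars the operator should stamp onto the operand's pod template. This is a minimal sketch, not the operator's actual code: `ProxyStatus` and `EnvVar` below are simplified stand-ins for the `configv1.ProxyStatus` and `corev1.EnvVar` API types the real controller would use.

```go
package main

import "fmt"

// EnvVar and ProxyStatus are simplified stand-ins for corev1.EnvVar
// and configv1.ProxyStatus; the real operator uses the API types.
type EnvVar struct{ Name, Value string }
type ProxyStatus struct{ HTTPProxy, HTTPSProxy, NoProxy string }

// proxyEnvVars computes the env vars the operator should set on the
// operand's pod template from the observed global proxy status; empty
// fields produce no env var.
func proxyEnvVars(p ProxyStatus) []EnvVar {
	var envs []EnvVar
	if p.HTTPProxy != "" {
		envs = append(envs, EnvVar{"HTTP_PROXY", p.HTTPProxy})
	}
	if p.HTTPSProxy != "" {
		envs = append(envs, EnvVar{"HTTPS_PROXY", p.HTTPSProxy})
	}
	if p.NoProxy != "" {
		envs = append(envs, EnvVar{"NO_PROXY", p.NoProxy})
	}
	return envs
}

func main() {
	envs := proxyEnvVars(ProxyStatus{HTTPSProxy: "https://proxy.example.com:3128", NoProxy: ".cluster.local"})
	for _, e := range envs {
		fmt.Printf("%s=%s\n", e.Name, e.Value)
	}
}
```

The operator would recompute this on every proxy-config event and roll the operand out when the result differs from what is currently on the daemonset.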

@bparees (Contributor) commented Mar 12, 2020

/hold

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 12, 2020
@gabemontero (Author)

> I don't think this is the right solution.
>
> the CVO injection is only for operators. As in "I need the CVO to inject these env vars into my operators' deployment because the CVO owns my deployment and only it can touch it".
>
> You (the OCM operator) own the OCM's deployment, you should be setting the proxy envs on the OCM's deployment and reconciling them as they change (i.e. the OCM operator needs to watch the proxy config and propagate that config to the OCM deployment)

OK, a bigger change, but certainly not onerous. @adambkaplan's suggestion of the annotation was trivial enough that at least proposing it to a bigger audience for evaluation seemed fine.

There is also the possibility of the OCM, which is already watching proxy config, setting the envs on itself. While less expensive, that probably subverts our "what an operator does and what an operand does" conventions. But certainly voice any opinions there @bparees if you like.

@bparees (Contributor) commented Mar 12, 2020

> There is also the possibility of the OCM, which is already watching proxy config, setting the envs on itself. While less expensive, that probably subverts our "what an operator does and what an operand does" conventions. But certainly voice any opinions there @bparees if you like.

the resource needs to be owned/written to by a single writer. for the OCM daemonset resource, that's the OCM operator. Otherwise you risk two things fighting w/ each other about what the correct state of the resource should be.

@gabemontero (Author)

> There is also the possibility of the OCM, which is already watching proxy config, setting the envs on itself. While less expensive, that probably subverts our "what an operator does and what an operand does" conventions. But certainly voice any opinions there @bparees if you like.
>
> the resource needs to be owned/written to by a single writer. for the OCM daemonset resource, that's the OCM operator. Otherwise you risk two things fighting w/ each other about what the correct state of the resource should be.

Well I was actually thinking it would be a golang os.Setenv() call in the OCM process, and would not be manipulating the DaemonSet at all. It would be dynamically updating the envs the DaemonSet added to the underlying Pod.

But again, probably too off the beaten path ... I just mentioned it cause it popped into my head

I'll start down the path in this repo

@bparees (Contributor) commented Mar 12, 2020

> Well I was actually thinking it would be a golang os.Setenv() call in the OCM process, and would not be manipulating the DaemonSet at all. It would be dynamically updating the envs the DaemonSet added to the underlying Pod.

the drawbacks w/ that approach are:

  1. lack of debuggability (you can't just look at the resource to understand what configuration it is using)
  2. I believe the transports+static libs get setup early and read this information just once, so changing the env var will not, in many cases, affect the configuration they are using, so you'd need to restart the process to pick up the change anyway, which means the OCM would have to nuke itself, and then ensure(on restart) the envs got set before any lib+transport initialization

(2) is why we had to make the CVO able to inject proxy config on operator resources, because operators could not solve the problem themselves by dynamically setting their own env vars. You're basically in the same boat, one level deeper.

@bparees (Contributor) commented Mar 12, 2020

Note: i think i may have seen another devex PR recently that was adding the CVO proxy injection annotation. I didn't look at it closely... but if it was doing the same thing as this one (adding the injection annotation to an operand resource, not an operator resource) it should be reverted and redone per the approach i've outlined in the discussion here.

@gabemontero (Author)

> Note: i think i may have seen another devex PR recently that was adding the CVO proxy injection annotation. I didn't look at it closely... but if it was doing the same thing as this one (adding the injection annotation to an operand resource, not an operator resource) it should be reverted and redone per the approach i've outlined in the discussion here.

since @adambkaplan suggested the annotation to me, maybe he knows which PR you are referring to @bparees

otherwise, I'll refactor this PR to have OCM-O watch global proxy and update the OCM daemonset's env as you previously mentioned in the classic operator/operand mgmt pattern

@bparees (Contributor) commented Mar 12, 2020

btw if the operand has to consume the proxy it also needs to consume the proxy CAs, so there's more that needs to be done here.

the operand has to mount a configmap that's annotated for the proxy CA injection, and the operator has to watch that configmap and redeploy the operand if the configmap is updated w/ new CA content.

@gabemontero (Author)

> btw if the operand has to consume the proxy it also needs to consume the proxy CAs, so there's more that needs to be done here.
>
> the operand has to mount a configmap that's annotated for the proxy CA injection, and the operator has to watch that configmap and redeploy the operand if the configmap is updated w/ new CA content.

yep true as well ... so yeah the OCM-O is already creating a proxy injected CM for the CAs and is watching it as part of the build support ... the daemonset update would have to include mounting that CA in addition to the envs

And I suppose the OCM would have to be updated to copy the cert from the k8s mount point to the well known location under /etc just like the openshift/builder image does today

@bparees (Contributor) commented Mar 12, 2020

> And I suppose the OCM would have to be updated to copy the cert from the k8s mount point to the well known location under /etc just like the openshift/builder image does today

yes, or mount the CM there directly. (The CM should contain all the CAs you need, including system trusts, so it should be ok to just mount over top of the image's own CAs)
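The direct-mount approach can be sketched as follows. This is a hedged sketch, not the PR's actual change: `VolumeMount` is a simplified stand-in for the `corev1` type, the volume name is a hypothetical example, and the paths assume the OpenShift convention that the injected trust bundle lives under the `ca-bundle.crt` key of the ConfigMap and that the image is RHEL-based (system bundle at `/etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem`).

```go
package main

import "fmt"

// VolumeMount is a simplified stand-in for corev1.VolumeMount.
type VolumeMount struct {
	Name      string
	MountPath string
	SubPath   string
	ReadOnly  bool
}

// trustedCAMount returns a mount that overlays the injected CA bundle
// (the "ca-bundle.crt" key of the trust-bundle ConfigMap) on top of
// the image's own system trust bundle, so no copy step is needed.
func trustedCAMount(volumeName string) VolumeMount {
	return VolumeMount{
		Name:      volumeName,
		MountPath: "/etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem",
		SubPath:   "ca-bundle.crt",
		ReadOnly:  true,
	}
}

func main() {
	m := trustedCAMount("trusted-ca") // hypothetical volume name
	fmt.Printf("%s -> %s (key %s)\n", m.Name, m.MountPath, m.SubPath)
}
```

Because the injected bundle already includes the system trusts, mounting over the image's CAs is safe, which is the point being made here.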

@adambkaplan (Contributor)

> yes, or mount the CM there directly. (The CM should contain all the CAs you need, including system trusts, so it should be ok to just mount over top of the image's own CAs)

I concur that mounting directly is the way to go for OCM. Builds are kind of a special case because we have additionalTrustedCAs that are used only for pulling/pushing images. Same deal with the image registry.

@soltysh commented Mar 16, 2020

I think you need something similar to openshift/cluster-kube-controller-manager-operator#285 and openshift/cluster-kube-controller-manager-operator#325 to be able to consume and use the proxy as far as I can tell.

@gabemontero (Author)

> I think you need something similar to openshift/cluster-kube-controller-manager-operator#285 and openshift/cluster-kube-controller-manager-operator#325 to be able to consume and use the proxy as far as I can tell.

thanks for the pointers @soltysh

I've got some changes on my laptop that share some commonality with those PRs

I'm wrapping up some unit tests and should have a PR up before EOB today.

@gabemontero gabemontero changed the title Bug 1805168: add inject-proxy env annotation to daemonset so CVO injects proxy env… Bug 1805168: inject global proxy envs,ca into OCM daemonset Mar 16, 2020
@openshift-ci-robot (Contributor)

@gabemontero: This pull request references Bugzilla bug 1805168, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.5.0) matches configured target release for branch (4.5.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST)

In response to this:

Bug 1805168: inject global proxy envs,ca into OCM daemonset


@gabemontero (Author)

/hold cancel

PR switched over to a proxy watch, config observation, and injection of envs/ca into OCM daemonset via operator

ptal

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 16, 2020
@soltysh left a comment:

This lgtm

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 24, 2020
@adambkaplan (Contributor)

@gabemontero looks good to me, squash commits and we can make this official.

@gabemontero (Author)

10-4 @adambkaplan @soltysh thanks ..

@gabemontero (Author)

... and commits squashed/pushed @adambkaplan @soltysh

and thanks @bparees for all the help

@gabemontero (Author)

error: could not run steps: step e2e-aws failed: failed to acquire lease: status 503 Service Unavailable, status code 503

error: could not run steps: step e2e-aws-operator failed: failed to acquire lease: status 503 Service Unavailable, status code 503

error: could not run steps: step e2e-aws-upgrade failed: failed to acquire lease: status 503 Service Unavailable, status code 503

/retest

@gabemontero (Author)

green tests @adambkaplan @soltysh @bparees

can somebody lgtm ?

configobservation.Listers{
ImageConfigLister: configInformers.Config().V1().Images().Lister(),
BuildConfigLister: configInformers.Config().V1().Builds().Lister(),
ProxyLister: configInformers.Config().V1().Proxies().Lister(),
@sttts (Contributor) commented on these lines:

shouldn't we install event handlers to notice changes and update?

@gabemontero (Author) replied:

you are absolutely right @sttts

though this particular piece is a leftover from the confusion over when to use observe_config_controller.go vs. operator.go and its calls to the sync_*.go file

so I think we need
a) cleanup and full revert of the changes here in observe_config_controller.go
b) and install the event handlers as you say around here:

proxyLister proxyvclient1.ProxyLister,
kubeInformersForOpenshiftControllerManager informers.SharedInformerFactory,
operatorConfigClient operatorclientv1.OperatorV1Interface,
kubeClient kubernetes.Interface,
recorder events.Recorder,
) *OpenShiftControllerManagerOperator {
c := &OpenShiftControllerManagerOperator{
targetImagePullSpec: targetImagePullSpec,
operatorConfigClient: operatorConfigClient,
proxyLister: proxyLister,
kubeClient: kubeClient,
queue: workqueue.NewNamedRateLimitingQueue(workqueue.DefaultControllerRateLimiter(), "KubeApiserverOperator"),
rateLimiter: flowcontrol.NewTokenBucketRateLimiter(0.05 /*3 per minute*/, 4),
recorder: recorder,
}
operatorConfigInformer.Informer().AddEventHandler(c.eventHandler())
kubeInformersForOpenshiftControllerManager.Core().V1().ConfigMaps().Informer().AddEventHandler(c.eventHandler())
kubeInformersForOpenshiftControllerManager.Core().V1().ServiceAccounts().Informer().AddEventHandler(c.eventHandler())
kubeInformersForOpenshiftControllerManager.Core().V1().Services().Informer().AddEventHandler(c.eventHandler())
kubeInformersForOpenshiftControllerManager.Apps().V1().Deployments().Informer().AddEventHandler(c.eventHandler())

thanks ... getting started

return required, false, err
}
}
if err == nil {
Contributor review comment:

else {

}
if httpsProxySet && !hasHTTPSProxy {
newEnvs = append(newEnvs, corev1.EnvVar{Name: "HTTPS_PROXY", Value: proxyCfg.Status.HTTPSProxy})
}
@sttts (Contributor) commented:

How about this?

newEnvs := []corev1.EnvVar{}
for _, env := range c.Env {
	name := strings.TrimSpace(env.Name)
	switch name {
	case "HTTPS_PROXY":
		if len(proxyCfg.Status.HTTPSProxy) == 0 {
			continue
		}
		env.Value = proxyCfg.Status.HTTPSProxy
	case ...:
	case ....:
	}
	newEnvs = append(newEnvs, env)
}
forceRollout = forceRollout || !reflect.DeepEqual(newEnvs, c.Env)

@gabemontero (Author) replied:

I had to tweak it slightly to deal with an empty c.Env but I agree your form is better ... will be pushing new commit soon
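Folding in that tweak, the merge could look something like the sketch below. This is a hedged reconstruction, not the code that landed: `EnvVar` and `ProxyStatus` are simplified stand-ins for the API types, and initializing `newEnvs` as a nil slice avoids a reflect.DeepEqual false positive when both the container env and the proxy status are empty.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// Simplified stand-ins for corev1.EnvVar and configv1.ProxyStatus.
type EnvVar struct{ Name, Value string }
type ProxyStatus struct{ HTTPProxy, HTTPSProxy, NoProxy string }

// mergeProxyEnv reconciles a container's env list against the observed
// proxy status: stale proxy vars are updated or dropped, missing ones
// appended. It returns the new list and whether a rollout is needed.
func mergeProxyEnv(cur []EnvVar, p ProxyStatus) ([]EnvVar, bool) {
	want := map[string]string{
		"HTTP_PROXY":  p.HTTPProxy,
		"HTTPS_PROXY": p.HTTPSProxy,
		"NO_PROXY":    p.NoProxy,
	}
	// Keep newEnvs nil when empty so DeepEqual(nil, nil) holds and an
	// empty-vs-empty comparison does not force a spurious rollout.
	var newEnvs []EnvVar
	seen := map[string]bool{}
	for _, env := range cur {
		name := strings.TrimSpace(env.Name)
		if val, ok := want[name]; ok {
			seen[name] = true
			if val == "" {
				continue // proxy cleared: drop the var
			}
			env.Value = val
		}
		newEnvs = append(newEnvs, env)
	}
	// Handle the empty/missing c.Env case: append vars not yet present.
	for _, name := range []string{"HTTP_PROXY", "HTTPS_PROXY", "NO_PROXY"} {
		if !seen[name] && want[name] != "" {
			newEnvs = append(newEnvs, EnvVar{name, want[name]})
		}
	}
	return newEnvs, !reflect.DeepEqual(newEnvs, cur)
}

func main() {
	envs, changed := mergeProxyEnv(nil, ProxyStatus{HTTPSProxy: "https://p:3128"})
	fmt.Println(envs, changed)
}
```

The nil-slice detail matters later in this thread: a `[]EnvVar{}` literal compared against a nil `c.Env` is exactly the kind of DeepEqual false positive that forced a follow-up fix.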

@gabemontero (Author)

updates pushed @sttts thanks again

@gabemontero (Author)

broad failures across all suites waiting on expected SAs in test namespaces... as an example:

Mar 26 16:37:05.309: INFO: Creating project "e2e-test-s2i-build-quota-btfrh"
Mar 26 16:37:05.725: INFO: Waiting on permissions in project "e2e-test-s2i-build-quota-btfrh" ...
Mar 26 16:37:05.752: INFO: Waiting for ServiceAccount "default" to be provisioned...
Mar 26 16:39:03.083: INFO: Waiting for ServiceAccount "deployer" to be provisioned...
[AfterEach] [sig-builds][Feature:Builds] s2i build with a quota

The CVO has a nasty looking extra condition on it:

                   {
                        "lastTransitionTime": "2020-03-26T15:39:58Z",
                        "message": "Unable to retrieve available updates: currently installed version 0.0.1-2020-03-26-153053 not found in the \"stable-4.5\" channel",
                        "reason": "VersionNotFound",
                        "status": "False",
                        "type": "RetrievedUpdates"
                    }

punting

/test e2e-aws

@gabemontero (Author)

I see a lot of throttling notifications trying to access the api server in the OCM-O pod logs around the times of the e2e-aws-operator failures ... my assumption is that this messed with those e2e's, but am getting up to speed on those tests, looking around the artifacts some more before retesting that one.

@gabemontero (Author)

> I see a lot of throttling notifications trying to access the api server in the OCM-O pod logs around the times of the e2e-aws-operator failures ... my assumption is that this messed with those e2e's, but am getting up to speed on those tests, looking around the artifacts some more before retesting that one.

Yeah even the basic, very first test that just reads the OCM clusteroperator suffered a timeout, but if you look at the artifacts, that object showed everything was well, and was last updated before the test started.

/test e2e-aws-operator

@gabemontero (Author)

/retest

1 similar comment
@gabemontero (Author)

/retest

@gabemontero (Author)

/test e2e-aws

@gabemontero (Author)

pushed fix for e2e-aws-operator that should better account for a false positive we were getting with the reflect.DeepEqual change from the last iteration

@gabemontero (Author)

CI hiccup across most of the board

/retest

@gabemontero (Author)

green tests

bump - any of the previous reviewers ready to LGTM ?

@adambkaplan (Contributor) left a comment:

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Mar 31, 2020
@openshift-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: adambkaplan, gabemontero, soltysh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [adambkaplan,soltysh]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit 5f3cb00 into openshift:master Mar 31, 2020
@openshift-ci-robot (Contributor)

@gabemontero: All pull requests linked via external trackers have merged: openshift/cluster-openshift-controller-manager-operator#145. Bugzilla bug 1805168 has been moved to the MODIFIED state.


In response to this:

Bug 1805168: inject global proxy envs,ca into OCM daemonset


@gabemontero gabemontero deleted the ocm-proxy-envs branch March 31, 2020 18:15