-
Notifications
You must be signed in to change notification settings - Fork 111
CONSOLE-3322: Update cluster-authentication-operator not to go degraded without console #583
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| isConsoleCapabilityEnabled := false | ||
| for _, capability := range clusterVersionConfig.Status.Capabilities.EnabledCapabilities { | ||
| if capability == configv1.ClusterVersionCapabilityConsole { | ||
| isConsoleCapabilityEnabled = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: you could break here, but 🤷, iterating through the remainder of the list isn't expensive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there's a break in the next line now, so this thread can be marked resolved.
|
|
||
| consoleConfig, err := listers.ConsoleLister.Get("cluster") | ||
| if err != nil { | ||
| if errors.IsNotFound(err) && isConsoleCapabilityEnabled { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you want !isConsoleCapabilityEnabled instead of isConsoleCapabilityEnabled here, for "we can't find the cluster Console, but that's ok, because this is a console-less cluster".
I'm a bit unsure about errors.IsNotFound(err). Does that match the errors we get when requesting a resource whose backing CRD is missing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thats a fair point @wking ... Not entirely sure what should be the other then the IsNotFound error
| consoleConfig, err := listers.ConsoleLister.Get("cluster") | ||
| if err != nil { | ||
| if errors.IsNotFound(err) && isConsoleCapabilityEnabled { | ||
| return nil, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it worth return existingConfig, nil here? I'm not sure how much the caller assumes about existingConfig completion, but the actually-failing returns in surrounding lines include existingConfig, and there may be useful stuff in there, even in the "cluster is no-console" case.
|
@stlaz can we move this forward? it's critical we resolve the degraded condition before 4.12 ships so that disabling the Console doesn't result in a degraded auth operator. |
| if tt.clusterVersion != nil { | ||
| if err := clusterVersionIndexer.Add(&configv1.ClusterVersion{ | ||
| ObjectMeta: metav1.ObjectMeta{ | ||
| Name: "cluster", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This probably needs to move to version too.
wking
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: jhadvig, wking The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
/payload-job periodic-ci-openshift-release-master-ci-4.12-e2e-aws-sdn-no-capabilities |
|
@bparees: trigger 1 job(s) for the /payload-(job|aggregate) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/d29f6000-50bb-11ed-9541-8b5ab96ea429-0 |
|
That^ run didn't have Ben's new code from openshift/origin#27481: $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/openshift-cluster-authentication-operator-583-ci-4.12-e2e-aws-sdn-no-capabilities/1583204398492815360/artifacts/release/artifacts/release-payload-latest/image-references | jq -r '.spec.tags[] | select(.name == "tests").annotations["io.openshift.build.commit.id"]'
9da7e86531a7ed93591692be6acafe9ae3e452c5
$ git --no-pager log --oneline --first-parent -2 origin/master
77afd1ee7d (origin/release-4.13, origin/release-4.12, origin/master, origin/HEAD) Merge pull request #27481 from bparees/nocaps
9da7e86531 Merge pull request #27482 from ardaguclu/fix-flaky-cli-testsTrying again: /payload-job periodic-ci-openshift-release-master-ci-4.12-e2e-aws-sdn-no-capabilities |
|
@wking: trigger 1 job(s) for the /payload-(job|aggregate) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/c4785560-50fa-11ed-9924-3fbb0aff9287-0 |
|
I don't know what to make of this error, or why it would be tied to any particular capability being turned off: so: /payload-job periodic-ci-openshift-release-master-ci-4.12-e2e-aws-sdn-no-capabilities But that'll have to be our debugging starting point if this fails again. |
|
@bparees: trigger 1 job(s) for the /payload-(job|aggregate) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/9b56df70-5187-11ed-8a47-1066141d15d7-0 |
|
Yup, stdout for failed test-cases in the new run has lots of |
| isConsoleCapabilityEnabled := false | ||
| for _, capability := range clusterVersionConfig.Status.Capabilities.EnabledCapabilities { | ||
| if capability == configv1.ClusterVersionCapabilityConsole { | ||
| isConsoleCapabilityEnabled = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't the execution be stopped with a hardcoded value once the capability is not allowed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it should 👍
| IngressLister: configInformer.Config().V1().Ingresses().Lister(), | ||
|
|
||
| APIServerLister_: configInformer.Config().V1().APIServers().Lister(), | ||
| ConsoleLister: configInformer.Config().V1().Consoles().Lister(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the console type does not exist, the lister presence will fill our logs with error messages in a loop. We can't have that.
| }{ | ||
| { | ||
| name: "NoConsoleConfig", | ||
| name: "NoConsoleConfigConsoleCapabilityEnabled", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what happens when the capability is not enabled but someone creates the CRD and the object?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then I think the cluster would be in an inconsistent state since the the CRD would be there but quite a lot of other resrouces would be missing.
Also is that even a valid scenario? meaning that only a cluster admin can create CRDs, not mentioning that he would/should be the one that decides which set of capabilities should be enabled when provisioning the cluster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also is that even a valid scenario?
Unless you can prevent it, yes.
Add a unit test, please.
cluster admin can create CRDs
Cluster admin and likely anyone installing an operator.
|
edit: I see from the above comments that we can already test this, awesome 👍 |
|
/lgtm cancel |
|
@stlaz here's the must-gather from the most recent run: (doesn't include any changes since friday, not sure if you think that is likely to affect the failure) |
|
Comments addressed. PTAL |
pkg/controllers/configobservation/configobservercontroller/observe_config_controller.go
Outdated
Show resolved
Hide resolved
| }{ | ||
| { | ||
| name: "NoConsoleConfig", | ||
| name: "NoConsoleConfigConsoleCapabilityEnabled", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also is that even a valid scenario?
Unless you can prevent it, yes.
Add a unit test, please.
cluster admin can create CRDs
Cluster admin and likely anyone installing an operator.
| if !isConsoleCapabilityEnabled { | ||
| return existingConfig, errs | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wrap this whole piece with the condition block and remove the capability special case from the error handling
| return existingConfig, errs | ||
| } | ||
|
|
||
| consoleConfig, err := listers.ConsoleLister.Get("cluster") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nil exception panic here in case the capability is not enabled
…ed without console
|
/payload-job periodic-ci-openshift-release-master-ci-4.12-e2e-aws-sdn-no-capabilities |
|
@jhadvig: trigger 1 job(s) for the /payload-(job|aggregate) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/c7b40b60-550b-11ed-9b2e-b9f3e74a45b6-0 |
from latest run. |
|
/payload-job periodic-ci-openshift-release-master-ci-4.12-e2e-aws-sdn-no-capabilities let's see what happens with the latest changes |
|
@stlaz: trigger 1 job(s) for the /payload-(job|aggregate) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/60cd7300-5538-11ed-8f53-acf499b096a8-0 |
|
/payload-job periodic-ci-openshift-release-master-ci-4.12-e2e-aws-sdn-no-capabilities |
|
@jhadvig: trigger 1 job(s) for the /payload-(job|aggregate) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/57259260-563c-11ed-9cce-e3b4f93a9592-0 |
pkg/operator/starter.go
Outdated
| oauthInformers := oauthinformers.NewSharedInformerFactory(oauthClient, resync) | ||
|
|
||
| clusterVersionLister := operatorCtx.operatorConfigInformer.Config().V1().ClusterVersions().Lister() | ||
| clusterVersionConfig, err := clusterVersionLister.Get("version") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I strongly recommend you add some debugging that prints out these two values (the clusterversion object and the err object), until you sort out what is going on (and then you can remove it)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I bet my money that the issue is in the informer... 💸
|
/payload-job periodic-ci-openshift-release-master-ci-4.12-e2e-aws-sdn-no-capabilities |
|
@jhadvig: trigger 1 job(s) for the /payload-(job|aggregate) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/e07823b0-5647-11ed-8d79-857c7ed15d09-0 |
|
@jhadvig: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
|
Closing in favour of #587 |
Adding check if the Console capability is disabled in the ClusterVersion in case the console config is not present on the cluster. In this case cluster-authentication-operator should not get degraded.
/assign @stlaz