@jhadvig
Member

@jhadvig jhadvig commented Sep 27, 2022

Adds a check for whether the Console capability is disabled in the ClusterVersion when the console config is not present on the cluster. In that case, cluster-authentication-operator should not go Degraded.

/assign @stlaz

isConsoleCapabilityEnabled := false
for _, capability := range clusterVersionConfig.Status.Capabilities.EnabledCapabilities {
	if capability == configv1.ClusterVersionCapabilityConsole {
		isConsoleCapabilityEnabled = true
Member

nit: you could break here, but 🤷, iterating through the remainder of the list isn't expensive.

Member

there's a break in the next line now, so this thread can be marked resolved.
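
For reference, the loop under discussion can also be written as a small helper that returns as soon as it finds a match. This is only a minimal sketch: `Capability` and `CapabilityConsole` are local stand-ins for `configv1.ClusterVersionCapability` and `configv1.ClusterVersionCapabilityConsole`, and `isCapabilityEnabled` is a hypothetical name, not a function from this PR.

```go
package main

import "fmt"

// Capability is a local stand-in for configv1.ClusterVersionCapability.
type Capability string

// CapabilityConsole stands in for configv1.ClusterVersionCapabilityConsole.
const CapabilityConsole Capability = "Console"

// isCapabilityEnabled reports whether want appears in enabled.
// Returning inside the loop has the same effect as setting a flag and breaking.
func isCapabilityEnabled(enabled []Capability, want Capability) bool {
	for _, c := range enabled {
		if c == want {
			return true
		}
	}
	return false
}

func main() {
	enabled := []Capability{"Insights", CapabilityConsole, "Storage"}
	fmt.Println(isCapabilityEnabled(enabled, CapabilityConsole)) // true
	fmt.Println(isCapabilityEnabled(nil, CapabilityConsole))     // false
}
```

As the review notes, the early exit is a readability nit rather than a performance concern, since the capability list is short.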


consoleConfig, err := listers.ConsoleLister.Get("cluster")
if err != nil {
	if errors.IsNotFound(err) && isConsoleCapabilityEnabled {
Member

I think you want !isConsoleCapabilityEnabled instead of isConsoleCapabilityEnabled here, for "we can't find the cluster Console, but that's ok, because this is a console-less cluster".

I'm a bit unsure about errors.IsNotFound(err). Does that match the errors we get when requesting a resource whose backing CRD is missing?

Member Author

That's a fair point, @wking ... I'm not entirely sure what error we would get other than IsNotFound.

consoleConfig, err := listers.ConsoleLister.Get("cluster")
if err != nil {
	if errors.IsNotFound(err) && isConsoleCapabilityEnabled {
		return nil, nil
Member

Is it worth returning existingConfig, nil here? I'm not sure how much the caller assumes about the completeness of existingConfig, but the actually-failing returns in the surrounding lines include existingConfig, and there may be useful stuff in there, even in the "cluster is no-console" case.

@bparees
Contributor

bparees commented Oct 18, 2022

@stlaz can we move this forward? It's critical that we resolve the degraded condition before 4.12 ships, so that disabling the Console doesn't result in a degraded auth operator.

@jhadvig
Member Author

jhadvig commented Oct 19, 2022

@wking comments addressed.

@stlaz PTAL :)

if tt.clusterVersion != nil {
	if err := clusterVersionIndexer.Add(&configv1.ClusterVersion{
		ObjectMeta: metav1.ObjectMeta{
			Name: "cluster",
Member

This probably needs to change to "version" too.

@wking
Member

wking commented Oct 20, 2022

/lgtm

@openshift-ci openshift-ci bot added the lgtm label (indicates that a PR is ready to be merged) Oct 20, 2022
@openshift-ci
Contributor

openshift-ci bot commented Oct 20, 2022

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: jhadvig, wking
Once this PR has been reviewed and has the lgtm label, please ask for approval from stlaz by writing /assign @stlaz in a comment. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@bparees
Contributor

bparees commented Oct 20, 2022

/payload-job periodic-ci-openshift-release-master-ci-4.12-e2e-aws-sdn-no-capabilities

@openshift-ci
Contributor

openshift-ci bot commented Oct 20, 2022

@bparees: trigger 1 job(s) for the /payload-(job|aggregate) command

  • periodic-ci-openshift-release-master-ci-4.12-e2e-aws-sdn-no-capabilities

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/d29f6000-50bb-11ed-9541-8b5ab96ea429-0

@wking
Member

wking commented Oct 21, 2022

That^ run didn't have Ben's new code from openshift/origin#27481:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/openshift-cluster-authentication-operator-583-ci-4.12-e2e-aws-sdn-no-capabilities/1583204398492815360/artifacts/release/artifacts/release-payload-latest/image-references | jq -r '.spec.tags[] | select(.name == "tests").annotations["io.openshift.build.commit.id"]'
9da7e86531a7ed93591692be6acafe9ae3e452c5
$ git --no-pager log --oneline --first-parent -2 origin/master
77afd1ee7d (origin/release-4.13, origin/release-4.12, origin/master, origin/HEAD) Merge pull request #27481 from bparees/nocaps
9da7e86531 Merge pull request #27482 from ardaguclu/fix-flaky-cli-tests

Trying again:

/payload-job periodic-ci-openshift-release-master-ci-4.12-e2e-aws-sdn-no-capabilities

@openshift-ci
Contributor

openshift-ci bot commented Oct 21, 2022

@wking: trigger 1 job(s) for the /payload-(job|aggregate) command

  • periodic-ci-openshift-release-master-ci-4.12-e2e-aws-sdn-no-capabilities

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/c4785560-50fa-11ed-9924-3fbb0aff9287-0

@bparees
Contributor

bparees commented Oct 21, 2022

I don't know what to make of this error, or why it would be tied to any particular capability being turned off:

Oct 21 05:36:12.519: INFO: Waiting for the OAuth server route to be ready: x509: certificate signed by unknown authority

so:

/payload-job periodic-ci-openshift-release-master-ci-4.12-e2e-aws-sdn-no-capabilities

But that'll have to be our debugging starting point if this fails again.

@openshift-ci
Contributor

openshift-ci bot commented Oct 21, 2022

@bparees: trigger 1 job(s) for the /payload-(job|aggregate) command

  • periodic-ci-openshift-release-master-ci-4.12-e2e-aws-sdn-no-capabilities

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/9b56df70-5187-11ed-8a47-1066141d15d7-0

@wking
Member

wking commented Oct 22, 2022

Yup, stdout for failed test-cases in the new run has lots of Waiting for the OAuth server route to be ready: x509: certificate signed by unknown authority too :/

isConsoleCapabilityEnabled := false
for _, capability := range clusterVersionConfig.Status.Capabilities.EnabledCapabilities {
	if capability == configv1.ClusterVersionCapabilityConsole {
		isConsoleCapabilityEnabled = true
Contributor

Shouldn't execution stop and return a hardcoded value when the capability is not enabled?

Member Author

it should 👍

IngressLister: configInformer.Config().V1().Ingresses().Lister(),

APIServerLister_: configInformer.Config().V1().APIServers().Lister(),
ConsoleLister: configInformer.Config().V1().Consoles().Lister(),
Contributor

If the Console type does not exist, the mere presence of the lister will fill our logs with error messages in a loop. We can't have that.
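
One way to address this concern is to create the Console lister only when the capability is enabled, so that no informer is registered for a type the cluster does not serve. The sketch below models that idea with local types only: `ConsoleLister`, `cacheLister`, `Listers`, and `newListers` are hypothetical stand-ins, and the real wiring through `configInformer` is not shown.

```go
package main

import "fmt"

// ConsoleLister is a hypothetical stand-in for the generated Console lister.
type ConsoleLister interface {
	Get(name string) (string, error)
}

// cacheLister is a trivial fake that pretends to read from an informer cache.
type cacheLister struct{}

func (cacheLister) Get(name string) (string, error) { return "console/" + name, nil }

// Listers loosely mirrors the struct in the diff; ConsoleLister stays nil
// when the Console capability is disabled.
type Listers struct {
	ConsoleLister ConsoleLister
}

// newListers builds the listers and creates the Console lister only on request.
func newListers(consoleCapabilityEnabled bool) Listers {
	l := Listers{}
	if consoleCapabilityEnabled {
		l.ConsoleLister = cacheLister{}
	}
	return l
}

func main() {
	fmt.Println(newListers(true).ConsoleLister != nil)  // true
	fmt.Println(newListers(false).ConsoleLister != nil) // false
}
```

The trade-off in this shape is that every consumer of `ConsoleLister` has to tolerate a nil value.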

}{
	{
-		name: "NoConsoleConfig",
+		name: "NoConsoleConfigConsoleCapabilityEnabled",
Contributor

what happens when the capability is not enabled but someone creates the CRD and the object?

Member Author

Then I think the cluster would be in an inconsistent state, since the CRD would be there but quite a lot of other resources would be missing.
Also, is that even a valid scenario? Only a cluster admin can create CRDs, and the cluster admin would/should also be the one who decides which set of capabilities is enabled when provisioning the cluster.

Contributor

> Also is that even a valid scenario?

Unless you can prevent it, yes.

Add a unit test, please.

> cluster admin can create CRDs

Cluster admin and likely anyone installing an operator.
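
The requested coverage could take the form of a table over both inputs: whether the capability is enabled and whether a Console object exists. The sketch below is an assumption-laden model, not the PR's test. `lookupConsole` is a hypothetical function that ignores the object whenever the capability is disabled, and only the first case name comes from the diff; the other names and the example URL are invented for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a stand-in for a Kubernetes not-found error.
var errNotFound = errors.New("not found")

// lookupConsole models an early return on a disabled capability:
// in that case the Console object is never consulted, even if it exists.
func lookupConsole(capabilityEnabled bool, consoles map[string]string) (string, error) {
	if !capabilityEnabled {
		return "", nil
	}
	url, ok := consoles["cluster"]
	if !ok {
		return "", errNotFound
	}
	return url, nil
}

func main() {
	present := map[string]string{"cluster": "https://console.example.test"}
	cases := []struct {
		name    string
		enabled bool
		objects map[string]string
		wantURL string
		wantErr bool
	}{
		{"NoConsoleConfigConsoleCapabilityEnabled", true, nil, "", true},
		{"NoConsoleConfigConsoleCapabilityDisabled", false, nil, "", false},
		{"ConsoleConfigConsoleCapabilityEnabled", true, present, present["cluster"], false},
		{"ConsoleConfigConsoleCapabilityDisabled", false, present, "", false},
	}
	for _, tc := range cases {
		url, err := lookupConsole(tc.enabled, tc.objects)
		ok := url == tc.wantURL && (err != nil) == tc.wantErr
		fmt.Printf("%s: ok=%v\n", tc.name, ok)
	}
}
```

The last case is the one raised in this thread: the object exists although the capability is disabled. Whether ignoring it is the right behavior is a design decision for the operator, and the sketch simply makes that choice explicit.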

@stlaz
Contributor

stlaz commented Oct 24, 2022

Once all the comments are addressed, please provide a must-gather from a cluster that does not have the console capability enabled.

edit: I see from the above comments that we can already test this, awesome 👍

@stlaz
Contributor

stlaz commented Oct 24, 2022

/lgtm cancel

@openshift-ci openshift-ci bot removed the lgtm label (indicates that a PR is ready to be merged) Oct 24, 2022
@bparees
Contributor

bparees commented Oct 24, 2022

@stlaz here's the must-gather from the most recent run:
https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/openshift-cluster-authentication-operator-583-ci-4.12-e2e-aws-sdn-no-capabilities/1583571500747722752/artifacts/e2e-aws-sdn-no-capabilities/gather-must-gather/artifacts/

(It doesn't include any changes since Friday; not sure if you think that is likely to affect the failure.)

@jhadvig
Member Author

jhadvig commented Oct 24, 2022

Comments addressed. PTAL

if !isConsoleCapabilityEnabled {
	return existingConfig, errs
}

Contributor

wrap this whole piece with the condition block and remove the capability special case from the error handling

	return existingConfig, errs
}

consoleConfig, err := listers.ConsoleLister.Get("cluster")
Contributor

Nil-pointer panic here when the capability is not enabled.
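
The two review comments above (wrap the lookup in the capability condition, and avoid the nil dereference) point at the same shape: touch the lister only inside the capability check. A minimal sketch under stated assumptions follows. `ConsoleLister`, `staticLister`, `Config`, and `observeConsole` are hypothetical stand-ins, and the sketch assumes the lister is nil on console-less clusters.

```go
package main

import "fmt"

// ConsoleLister is a hypothetical stand-in for the generated lister interface.
type ConsoleLister interface {
	Get(name string) (string, error)
}

// staticLister is a fake lister that always returns the same URL.
type staticLister struct{ url string }

func (s staticLister) Get(name string) (string, error) { return s.url, nil }

// Config is a simplified stand-in for the observed configuration.
type Config struct{ ConsoleURL string }

// observeConsole uses the lister only inside the capability check, so a nil
// lister on a console-less cluster is never dereferenced.
func observeConsole(existing Config, lister ConsoleLister, capabilityEnabled bool) (Config, []error) {
	observed := existing
	if capabilityEnabled {
		url, err := lister.Get("cluster")
		if err != nil {
			// Keep the previous configuration when the lookup fails.
			return existing, []error{err}
		}
		observed.ConsoleURL = url
	}
	return observed, nil
}

func main() {
	existing := Config{ConsoleURL: "https://old.example.test"}

	// Capability disabled: the nil lister is never called.
	cfg, errs := observeConsole(existing, nil, false)
	fmt.Println(cfg.ConsoleURL, len(errs))

	// Capability enabled: the lister result replaces the existing value.
	cfg, errs = observeConsole(existing, staticLister{url: "https://new.example.test"}, true)
	fmt.Println(cfg.ConsoleURL, len(errs))
}
```

With this structure the error handling no longer needs a capability special case, because the not-found path can only be reached when the capability is enabled.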

@jhadvig
Member Author

jhadvig commented Oct 26, 2022

/payload-job periodic-ci-openshift-release-master-ci-4.12-e2e-aws-sdn-no-capabilities

@openshift-ci
Contributor

openshift-ci bot commented Oct 26, 2022

@jhadvig: trigger 1 job(s) for the /payload-(job|aggregate) command

  • periodic-ci-openshift-release-master-ci-4.12-e2e-aws-sdn-no-capabilities

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/c7b40b60-550b-11ed-9b2e-b9f3e74a45b6-0

@bparees
Contributor

bparees commented Oct 26, 2022

level=error msg=failed to initialize the cluster: Multiple errors are preventing progress:
level=error msg=* Cluster operator authentication is not available 

from latest run.

@stlaz
Contributor

stlaz commented Oct 26, 2022

/payload-job periodic-ci-openshift-release-master-ci-4.12-e2e-aws-sdn-no-capabilities

let's see what happens with the latest changes

@openshift-ci
Contributor

openshift-ci bot commented Oct 26, 2022

@stlaz: trigger 1 job(s) for the /payload-(job|aggregate) command

  • periodic-ci-openshift-release-master-ci-4.12-e2e-aws-sdn-no-capabilities

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/60cd7300-5538-11ed-8f53-acf499b096a8-0

@jhadvig
Member Author

jhadvig commented Oct 27, 2022

/payload-job periodic-ci-openshift-release-master-ci-4.12-e2e-aws-sdn-no-capabilities

@openshift-ci
Contributor

openshift-ci bot commented Oct 27, 2022

@jhadvig: trigger 1 job(s) for the /payload-(job|aggregate) command

  • periodic-ci-openshift-release-master-ci-4.12-e2e-aws-sdn-no-capabilities

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/57259260-563c-11ed-9cce-e3b4f93a9592-0

oauthInformers := oauthinformers.NewSharedInformerFactory(oauthClient, resync)

clusterVersionLister := operatorCtx.operatorConfigInformer.Config().V1().ClusterVersions().Lister()
clusterVersionConfig, err := clusterVersionLister.Get("version")
Contributor

I strongly recommend you add some debugging that prints out these two values (the clusterversion object and the err object), until you sort out what is going on (and then you can remove it)

Member Author

I bet my money that the issue is in the informer... 💸

@jhadvig
Member Author

jhadvig commented Oct 27, 2022

/payload-job periodic-ci-openshift-release-master-ci-4.12-e2e-aws-sdn-no-capabilities

@openshift-ci
Contributor

openshift-ci bot commented Oct 27, 2022

@jhadvig: trigger 1 job(s) for the /payload-(job|aggregate) command

  • periodic-ci-openshift-release-master-ci-4.12-e2e-aws-sdn-no-capabilities

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/e07823b0-5647-11ed-8d79-857c7ed15d09-0

@openshift-ci
Contributor

openshift-ci bot commented Oct 28, 2022

@jhadvig: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-console-login | c779d16 | link | true | /test e2e-console-login |
| ci/prow/e2e-agnostic | c779d16 | link | true | /test e2e-agnostic |
| ci/prow/e2e-agnostic-upgrade | c779d16 | link | true | /test e2e-agnostic-upgrade |
| ci/prow/e2e-agnostic-ipv6 | c779d16 | link | false | /test e2e-agnostic-ipv6 |
| ci/prow/e2e-aws-single-node | c779d16 | link | false | /test e2e-aws-single-node |
| ci/prow/e2e-operator | c779d16 | link | true | /test e2e-operator |


@jhadvig
Member Author

jhadvig commented Nov 11, 2022

Closing in favour of #587

@jhadvig jhadvig closed this Nov 11, 2022