Conversation

@alebedev87
Contributor

@alebedev87 alebedev87 commented Apr 9, 2024

Part of the implementation of Make Ingress optional for HyperShift EP. Note that this change targets only HyperShift managed deployments and should not impact standalone OpenShift installations.

This PR:

  • Implements alternative ingress fields from the console operator config API
  • Skips component route customizations if ingress capability is disabled
  • Uses NodePort type for console and downloads services if ingress capability is disabled
  • Adds document to describe how to implement alternative ingress on ROSA

@openshift-ci-robot
Contributor

openshift-ci-robot commented Apr 9, 2024

@alebedev87: This pull request references NE-1319 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Apr 9, 2024
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 9, 2024
@openshift-ci
Contributor

openshift-ci bot commented Apr 9, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 10, 2024
@alebedev87 alebedev87 force-pushed the tolerate-ingress-absence branch 2 times, most recently from cc41180 to 5e8103c Compare April 17, 2024 07:25
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 17, 2024
@alebedev87 alebedev87 changed the title NE-1319: disable health check if ingress capability is disabled NE-1319: disable route checks if ingress capability is disabled Apr 17, 2024
@alebedev87 alebedev87 force-pushed the tolerate-ingress-absence branch from 5e8103c to 8f3a66e Compare April 19, 2024 14:56
@openshift-ci-robot
Contributor

openshift-ci-robot commented Apr 19, 2024

@alebedev87: This pull request references NE-1319 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set.

Details

In response to this:

Part of the implementation of Make Ingress optional for HyperShift EP. This PR skips some checks which rely on admitted routes. Note that this change targets only HyperShift managed deployments and should not impact standalone OpenShift installations.

Manual test:

$ oc version
Client Version: 4.14.3
Kustomize Version: v5.0.1
Server Version: 4.16.0-0.nightly-2024-04-18-141003
Kubernetes Version: v1.29.3+5fa1806


# CVO scaled down to be able to disable Ingress and set console operator image
$ oc -n openshift-cluster-version get pods
No resources found in openshift-cluster-version namespace.

# Ingress operator and its operand are NOT running
$ oc -n openshift-ingress-operator get pods
No resources found in openshift-ingress-operator namespace.

$ oc -n openshift-ingress get pods
No resources found in openshift-ingress namespace.

# Console operator has a locally built image with the diff from the PR
$ oc -n openshift-console-operator get pods
NAME                                          READY   STATUS    RESTARTS   AGE
console-conversion-webhook-76598bc896-frhdh   1/1     Running   0          112m
console-operator-c467cd58f-kn69q              1/1     Running   0          4m29s

$ oc -n openshift-console-operator get pods console-operator-c467cd58f-kn69q -o yaml | yq .spec.containers[0].image
quay.io/alebedev/console-operator:4.19.162

# Note that the console operator remained Available while the authentication operator went unavailable (Available=False)
$ oc get co console authentication
NAME             VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
console          4.16.0-0.nightly-2024-04-18-141003   True        False         False      9m37s   
authentication   4.16.0-0.nightly-2024-04-18-141003   False       False         False      54s     OAuthServerRouteEndpointAccessibleControllerAvailable: Get "https://oauth-openshift.apps.ci-ln-v8xxvwt-76ef8.origin-ci-int-aws.dev.rhcloud.com/healthz": EOF

@alebedev87 alebedev87 marked this pull request as ready for review April 19, 2024 15:09
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 19, 2024
@openshift-ci openshift-ci bot requested review from jhadvig and spadgett April 19, 2024 15:09
@alebedev87
Contributor Author

/retest

Member

@jhadvig jhadvig left a comment

Thanks @alebedev87
This change keeps the operator from going Degraded when ingress is disabled, but that's basically because you are prematurely exiting all of the updated controllers. I doubt that the console will work in this state.
Have you tested it?


statusHandler := status.NewStatusHandler(c.operatorClient)

infrastructureConfig, err := c.infrastructureClient.Get(ctx, api.ConfigResourceName, metav1.GetOptions{})
Member

If the ingress can be enabled/disabled during the cluster's lifecycle, then we need to keep the fetching logic per controller, but we need to use listers rather than the client.
If the ingress can be re-enabled then please move the fetching logic to starter.go and pass the config to each of the controllers.

Contributor Author

On the standalone OpenShift the Ingress capability will always be enabled. On HyperShift: once enabled it cannot be disabled.

}

// Disable the route check for external control plane topology (hypershift) if the ingress capability is disabled.
if util.IsExternalControlPlaneWithIngressDisabled(infrastructureConfig, clusterVersionConfig) {
Member

By doing this you are basically preventing the operator from doing its job of syncing the console routes.

Contributor Author

@jhadvig: From what I see, both of the functions used by the route controller (SyncCustomRoute, SyncDefaultRoute) are related to the component route implementation (custom hostname or TLS). The component routes are not supported on HyperShift, so I think we are safe to skip them.

@csrwng: can you please confirm that the component routes are not supported on HyperShift?

Contributor

That is correct. If you've disabled ingress you are also indirectly disabling component routes, no?

Contributor Author

Right, I put my follow-up comment in the wrong conversation. Here it is:

Also, in the absence of the ingress operator the required RBAC won't be created for the components (including the console operator): Component routes chapter of the EP.
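
The thread does not show the body of IsExternalControlPlaneWithIngressDisabled, so the following is only a minimal sketch of what such a check might look like. The Infrastructure and ClusterVersion structs are hypothetical, trimmed-down stand-ins for the OpenShift config objects, and the nil handling is an assumption; the topology value "External" and the capability name "Ingress" follow the OpenShift config API.

```go
package main

import "fmt"

// Hypothetical stand-ins for the fields of the Infrastructure and
// ClusterVersion config objects that this check would read.
type Infrastructure struct {
	ControlPlaneTopology string // "External" on HyperShift-managed clusters
}

type ClusterVersion struct {
	EnabledCapabilities []string // capabilities reported as enabled
}

// isExternalControlPlaneWithIngressDisabled reports whether the cluster has
// an external control plane and the "Ingress" capability is not enabled.
// Missing config objects are treated as "not disabled" (an assumption).
func isExternalControlPlaneWithIngressDisabled(infra *Infrastructure, cv *ClusterVersion) bool {
	if infra == nil || cv == nil {
		return false
	}
	if infra.ControlPlaneTopology != "External" {
		return false
	}
	for _, c := range cv.EnabledCapabilities {
		if c == "Ingress" {
			return false
		}
	}
	return true
}

func main() {
	hcp := &Infrastructure{ControlPlaneTopology: "External"}
	standalone := &Infrastructure{ControlPlaneTopology: "HighlyAvailable"}
	noIngress := &ClusterVersion{EnabledCapabilities: []string{"Console"}}
	withIngress := &ClusterVersion{EnabledCapabilities: []string{"Console", "Ingress"}}

	fmt.Println(isExternalControlPlaneWithIngressDisabled(hcp, noIngress))        // true
	fmt.Println(isExternalControlPlaneWithIngressDisabled(hcp, withIngress))      // false
	fmt.Println(isExternalControlPlaneWithIngressDisabled(standalone, noIngress)) // false
}
```

Requiring both conditions matches the intent stated in the PR description: standalone clusters never take the skip path, whatever their capability set looks like.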


// Disable the oauth client update for external control plane topology (hypershift) if the ingress capability is disabled.
// HyperShift handles the updating of the oauth client for the console.
if util.IsExternalControlPlaneWithIngressDisabled(infrastructureConfig, clusterVersionConfig) {
Member

By doing this you are basically preventing the operator from doing its job of syncing the console oauthclients.

Contributor Author

@alebedev87 alebedev87 Apr 24, 2024

By "syncing the console oauthclients" you mean adding redirectURI? From what I understand this part will be handled by HyperShift as it takes care of some of the OAuthClients already. I suppose it's because the authentication server is not exposed via a route.

@csrwng: What do you think? If we don't disable the updates of OAuthClient the console operator will go degraded. However it's not yet clear what should be put as the redirectURI in case the ingress is disabled.

Contributor

Makes sense to disable updates of the OAuthClient

Contributor Author

The new API has been implemented to support an alternative ingress.

}

// Disable the client download check for external control plane topology (hypershift) if the ingress capability is disabled.
if controllersutil.IsExternalControlPlaneWithIngressDisabled(infrastructureConfig, clusterVersionConfig) {
Member

By doing this you are basically preventing the operator from doing its job of syncing the console CLI downloads custom resources.

Contributor Author

Right, oc-cli-downloads contains the links to download the oc client from the download route:

$ oc get consoleclidownloads oc-cli-downloads -o yaml | yq .spec.links[].href
https://downloads-openshift-console.apps.alebedev-0424.devcluster.openshift.com/amd64/linux/oc.tar
https://downloads-openshift-console.apps.alebedev-0424.devcluster.openshift.com/amd64/mac/oc.zip
https://downloads-openshift-console.apps.alebedev-0424.devcluster.openshift.com/amd64/windows/oc.zip
https://downloads-openshift-console.apps.alebedev-0424.devcluster.openshift.com/arm64/linux/oc.tar
https://downloads-openshift-console.apps.alebedev-0424.devcluster.openshift.com/arm64/mac/oc.zip
https://downloads-openshift-console.apps.alebedev-0424.devcluster.openshift.com/ppc64le/linux/oc.tar
https://downloads-openshift-console.apps.alebedev-0424.devcluster.openshift.com/s390x/linux/oc.tar
https://downloads-openshift-console.apps.alebedev-0424.devcluster.openshift.com/oc-license

So, the risk is to have the broken links in /command-line-tools path of the console UI.
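
The links listed above all follow one pattern, host plus architecture, operating system and archive name. As a rough illustration of why they break when the downloads route is not served, here is a minimal sketch that derives the same set of links from a host name. The target type, the downloadLinks function and the example host are hypothetical and are not taken from the operator's code.

```go
package main

import "fmt"

// target describes one oc client archive. The list mirrors the links
// shown in the comment above.
type target struct {
	arch, os, archive string
}

var targets = []target{
	{"amd64", "linux", "oc.tar"},
	{"amd64", "mac", "oc.zip"},
	{"amd64", "windows", "oc.zip"},
	{"arm64", "linux", "oc.tar"},
	{"arm64", "mac", "oc.zip"},
	{"ppc64le", "linux", "oc.tar"},
	{"s390x", "linux", "oc.tar"},
}

// downloadLinks builds the archive URLs plus the license URL for a host.
func downloadLinks(host string) []string {
	links := make([]string, 0, len(targets)+1)
	for _, t := range targets {
		links = append(links, fmt.Sprintf("https://%s/%s/%s/%s", host, t.arch, t.os, t.archive))
	}
	return append(links, fmt.Sprintf("https://%s/oc-license", host))
}

func main() {
	// Hypothetical host; with an unserved route nothing answers at these URLs.
	for _, l := range downloadLinks("downloads.example.com") {
		fmt.Println(l)
	}
}
```

Every link shares the host, so a single unreachable host breaks the whole list at once.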

Contributor Author

@csrwng : can we accept the risk I described above?

Contributor

Yes

Member

Then we should be also prematurely exiting the DownloadsDeployment controller and remove the deployment itself.

Contributor Author

The new API has been implemented to set up alternative ingress for the downloads.

@alebedev87
Contributor Author

alebedev87 commented Apr 23, 2024

I doubt that the console will be working in this state. Have you tested it ?

Yes. You are right, the console is not working out of the box because the routes are left unserved. I was playing with AWS Load Balancer Operator (ALBO) to find the right Ingress configuration to make the console work without routes. I managed to find it but it still has some gaps:

  • the console service should be of type NodePort to work with the AWS ALB. This can be treated as a gap in the console operator which we can address later (on demand).
  • the redirect to the oauth server (at the login time) still uses the oauth route despite the fact that I changed all OAuth clients to use the AWS ALBs provisioned by the aws-lb-controller. This part may be configurable though, I may just be missing some console knowledge.

@alebedev87
Contributor Author

/retest

@alebedev87 alebedev87 force-pushed the tolerate-ingress-absence branch from 8f3a66e to c206e4e Compare April 24, 2024 17:23
@alebedev87
Contributor Author

Rebased from master to fix the merge conflict.

@alebedev87
Contributor Author

/assign @jhadvig

@alebedev87
Contributor Author

/retest

@jhadvig
Member

jhadvig commented Apr 25, 2024

Yes. You are right, the console is not working out of the box because the routes are left unserved. I was playing with AWS Load Balancer Operator (ALBO) to find the right Ingress configuration to make the console work without routes. I managed to find it but it still has some gaps:

  • the console service should be of type NodePort to work with the AWS ALB. This can be treated as a gap in the console operator which we can address later (on demand).

So this case will need to be handled in the service controller, which will then update the service spec accordingly. If I understand correctly, the console itself won't be available through the service if the NodePort is not set? If that's the case, I think it needs to be addressed in this change.

  • the redirect to the oauth server (at the login time) still uses the oauth route despite the fact that I changed all OAuth clients to use the AWS ALBs provisioned by the aws-lb-controller. This part may be configurable though, I may just be missing some console knowledge.

Are these steps documented somewhere?

I feel like there are too many gaps to merge this change. What would be the benefit for a HyperShift user who disables the Ingress capability if the console won't be working? I feel that at minimum we should have a documented way to make the console work.
Also, this will need to be tested with QE before merging, and they will need to know the steps for how to:

  1. set up the env (HyperShift with disabled ingress capability)
    after which they will test the non-functional console, in which case they will assume that it's a bug. Therefore they will need reproduction steps for how to make the console work.

@spadgett
Member

I doubt that the console will be working in this state. Have you tested it ?

Yes. You are right, the console is not working out of the box because the routes are left unserved. I was playing with AWS Load Balancer Operator (ALBO) to find the right Ingress configuration to make the console work without routes. I managed to find it but it still has some gaps:

  • the console service should be of type NodePort to work with the AWS ALB. This can be treated as a gap in the console operator which we can address later (on demand).
  • the redirect to the oauth server (at the login time) still uses the oauth route despite the fact that I changed all OAuth clients to use the AWS ALBs provisioned by the aws-lb-controller. This part may be configurable though, I may just be missing some console knowledge.

Should we simply disable console for now if we know it won't work, even with manual load balancer configuration?

The console reads the OAuth URLs from the well-known endpoint. If the OAuth metadata can be updated in that document, console should discover the right URL.

https://docs.openshift.com/container-platform/4.15/authentication/configuring-internal-oauth.html#oauth-server-metadata_configuring-internal-oauth
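
For readers unfamiliar with that document: the OAuth server metadata is a JSON object with fields such as issuer, authorization_endpoint and token_endpoint (names per RFC 8414). The following is a minimal sketch of decoding it, not the console's actual discovery code; the oauthMetadata type, the parseOAuthMetadata function and the example.com URLs are illustrative assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// oauthMetadata holds the subset of the OAuth server metadata document
// that a client needs to start a login flow.
type oauthMetadata struct {
	Issuer                string `json:"issuer"`
	AuthorizationEndpoint string `json:"authorization_endpoint"`
	TokenEndpoint         string `json:"token_endpoint"`
}

// parseOAuthMetadata decodes the metadata document and rejects documents
// that lack the endpoints required for the login redirect.
func parseOAuthMetadata(doc []byte) (oauthMetadata, error) {
	var m oauthMetadata
	if err := json.Unmarshal(doc, &m); err != nil {
		return oauthMetadata{}, fmt.Errorf("decoding OAuth metadata: %w", err)
	}
	if m.AuthorizationEndpoint == "" || m.TokenEndpoint == "" {
		return oauthMetadata{}, fmt.Errorf("OAuth metadata is missing required endpoints")
	}
	return m, nil
}

func main() {
	// Hypothetical document in which the endpoints are not served by a route.
	sample := []byte(`{
		"issuer": "https://oauth.example.com",
		"authorization_endpoint": "https://oauth.example.com/oauth/authorize",
		"token_endpoint": "https://oauth.example.com/oauth/token"
	}`)
	m, err := parseOAuthMetadata(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(m.AuthorizationEndpoint)
}
```

A client that discovers its endpoints this way follows whatever host the metadata advertises, so it needs no route-specific configuration of its own.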

@alebedev87
Contributor Author

Should we simply disable console for now if we know it won't work, even with manual load balancer configuration?

The console is still an important part of HyperShift-based offerings like ROSA. The HyperShift engineers prefer the console to be running even without the default ingress (router). How to expose it is subject to discussion; one way to expose the console is described in the doc I added today.
However, the console operator depends deeply on the router being present on the cluster, and we need to think pragmatically about what we can do in the 4.16 release and what needs a follow-up.

The console reads the OAuth URLs from the well-known endpoint. If the OAuth metadata can be updated in that document, console should discover the right URL.

ROSA HCP clusters already have the auth URL which is not an OpenShift route. This makes the authentication from console work smoothly even without the router.

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 6, 2024
@alebedev87 alebedev87 changed the title NE-1319: disable route checks if ingress capability is disabled NE-1319: tolerate the absence of ingress capability on HyperShift clusters May 6, 2024
@alebedev87 alebedev87 force-pushed the tolerate-ingress-absence branch from beb12ae to 47090ae Compare June 5, 2024 21:01
@alebedev87
Contributor Author

Always apply NodePort services to avoid the chicken-and-egg problem with alternative ingress: the Ingress needs a NodePort service before it can provision the LB whose host we need to set in console.operator.spec.ingress.

@alebedev87
Contributor Author

/retest

@alebedev87 alebedev87 force-pushed the tolerate-ingress-absence branch from 47090ae to 35e503b Compare June 6, 2024 09:57
@alebedev87
Contributor Author

ConsoleURL removed from configmap config as it's not a customization but API.

@alebedev87 alebedev87 force-pushed the tolerate-ingress-absence branch from 35e503b to 24f4886 Compare June 6, 2024 14:49
@openshift-ci-robot
Contributor

@alebedev87: This pull request references Jira Issue OCPBUGS-33787, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.17.0) matches configured target version for branch (4.17.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @melvinjoseph86


@openshift-ci openshift-ci bot requested a review from melvinjoseph86 June 6, 2024 14:51
@alebedev87
Contributor Author

As discussed with @jhadvig: create NodePort service only for clusters with disabled ingress capability.
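
The agreed behavior can be sketched as a small decision function. This is only an illustration: serviceTypeFor is a hypothetical name, and the assumption that the services otherwise stay ClusterIP is not confirmed in this thread. The type names themselves are the standard Kubernetes Service types.

```go
package main

import "fmt"

// Service type names as defined by the Kubernetes Service API.
const (
	serviceTypeClusterIP = "ClusterIP"
	serviceTypeNodePort  = "NodePort"
)

// serviceTypeFor picks the Service type for the console and downloads
// services: NodePort only when the ingress capability is disabled, so that
// an external load balancer can reach the pods through the nodes.
func serviceTypeFor(ingressDisabled bool) string {
	if ingressDisabled {
		return serviceTypeNodePort
	}
	// Assumed default for clusters that still have the router.
	return serviceTypeClusterIP
}

func main() {
	fmt.Println(serviceTypeFor(true))  // NodePort
	fmt.Println(serviceTypeFor(false)) // ClusterIP
}
```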

…ft clusters

- Implement alternative ingress fields from the console operator config API
- Skip component route customizations if ingress capability is disabled
- Use NodePort type for console and downloads services if ingress capability is disabled
- Add document to describe how to implement alternative ingress on ROSA
@alebedev87 alebedev87 force-pushed the tolerate-ingress-absence branch from 24f4886 to 7c4d777 Compare June 6, 2024 15:56
@alebedev87
Contributor Author

docs/alb-ingress-rosa-hcp.md still had *-np services mentioned.

@alebedev87
Contributor Author

/retest

@alebedev87
Contributor Author

/retest

CI outage.

@alebedev87
Contributor Author

e2e-aws-console test is unstable and needs to be overridden like we did for #907.

@jhadvig
Member

jhadvig commented Jun 7, 2024

Thanks @alebedev87 for the PR. LGTM
/lgtm
/retest

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jun 7, 2024
@openshift-ci
Contributor

openshift-ci bot commented Jun 7, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alebedev87, jhadvig

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 7, 2024
@openshift-ci-robot
Contributor

/retest-required

Remaining retests: 0 against base HEAD d4bac3a and 2 for PR HEAD 7c4d777 in total

@jhadvig
Member

jhadvig commented Jun 10, 2024

Overriding the TC due to flaking e2e test
/override ci/prow/e2e-aws-console

@openshift-ci
Contributor

openshift-ci bot commented Jun 10, 2024

@jhadvig: Overrode contexts on behalf of jhadvig: ci/prow/e2e-aws-console

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-ci
Contributor

openshift-ci bot commented Jun 10, 2024

@alebedev87: all tests passed!

@openshift-merge-bot openshift-merge-bot bot merged commit c608dec into openshift:master Jun 10, 2024
@openshift-ci-robot
Contributor

@alebedev87: Jira Issue OCPBUGS-33787: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-33787 has been moved to the MODIFIED state.

@openshift-bot
Contributor

[ART PR BUILD NOTIFIER]

This PR has been included in build openshift-enterprise-console-operator-container-v4.17.0-202406101611.p0.gc608dec.assembly.stream.el9 for distgit openshift-enterprise-console-operator.
All builds following this will include this PR.

@openshift-merge-robot
Contributor

Fix included in accepted release 4.16.0-0.nightly-2024-07-04-015518

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/severity-important Referenced Jira bug's severity is important for the branch this PR is targeting. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.
