Skip to content

Conversation

@DavidHurta
Copy link
Contributor

@DavidHurta DavidHurta commented Jul 12, 2023

What this PR does / why we need it:

This pull request will enable the hosted Cluster Version Operator (CVO) to evaluate conditional updates on self-managed HyperShift deployed on an OpenShift management cluster.

Which issue(s) this PR fixes:
Implements #OTA-855

Checklist

  • Subject and description added to both, commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jul 12, 2023
@openshift-ci-robot
Copy link

openshift-ci-robot commented Jul 12, 2023

@Davoska: This pull request references OTA-855 which is a valid jira issue.

Details

In response to this:

What this PR does / why we need it:

This pull request will enable the Cluster Version Operator (CVO) to evaluate conditional updates of its hosted cluster. We need this to fully utilize the conditional updates in the HyperShift.

This pull request heavily relies on new changes from the openshift/cluster-version-operator#926 and thus tests may be failing before the PR is merged. However, in the meanwhile, the code can be reviewed as the PR is functional.

Which issue(s) this PR fixes:
Implements #OTA-855

Checklist

  • Subject and description added to both, commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 12, 2023
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jul 12, 2023

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci openshift-ci bot added do-not-merge/needs-area area/control-plane-operator Indicates the PR includes changes for the control plane operator - in an OCP release area/hypershift-operator Indicates the PR includes changes for the hypershift operator and API - outside an OCP release and removed do-not-merge/needs-area labels Jul 12, 2023
@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 17, 2023
@petr-muller
Copy link
Member

/cc

@openshift-ci openshift-ci bot requested a review from petr-muller July 17, 2023 17:27
@DavidHurta DavidHurta marked this pull request as ready for review July 18, 2023 14:59
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 18, 2023
@DavidHurta
Copy link
Contributor Author

Looking at the logs from the failing CI:

# github.com/openshift/hypershift/control-plane-operator/controllers/hostedcontrolplane
control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go:328:10: undefined: deleteIfExists
control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go:332:9: undefined: deleteIfExists
make: *** [Makefile:76: control-plane-operator] Error 2

#2778 modified the used code in this PR, I need to use the function that replaced the deleteIfExists function.

@DavidHurta DavidHurta force-pushed the ota-855-point-hosted-cvo-at-management-cluster branch from 031ab45 to b970908 Compare July 19, 2023 11:49
@DavidHurta
Copy link
Contributor Author

The currently failing CI is failing as expected because openshift/cluster-version-operator#926 needs to be merged before this pull request can be merged.

@DavidHurta
Copy link
Contributor Author

/hold
Modifying extensively.

@DavidHurta
Copy link
Contributor Author

I am converting the PR back to a draft for the moment so that we can discuss the design again.

@DavidHurta DavidHurta marked this pull request as draft August 14, 2023 12:55
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 14, 2023
@DavidHurta DavidHurta force-pushed the ota-855-point-hosted-cvo-at-management-cluster branch from b970908 to b517d50 Compare November 6, 2023 16:27
@openshift-ci openshift-ci bot added area/ci-tooling Indicates the PR includes changes for CI or tooling area/cli Indicates the PR includes changes for CLI labels Nov 6, 2023
@DavidHurta DavidHurta changed the title OTA-855: Add risk evaluation of conditional updates by the CVO OTA-855: Enable CVO to evaluate conditional updates on self-managed HyperShift deployed on OpenShift Nov 6, 2023
fmt.Sprintf("--listen=0.0.0.0:%d", port),
fmt.Sprintf("--serving-cert-file=%s", cpath(cvoVolumeServerCert().Name, corev1.TLSCertKey)),
fmt.Sprintf("--serving-key-file=%s", cpath(cvoVolumeServerCert().Name, corev1.TLSPrivateKeyKey)),
"--hypershift=true",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is fine, just thinking it through out loud in case other folks are thinking in the same direction. CVO support for this option is new in 4.15:

cluster-version-operator $ git --no-pager grep hypershift origin/release-4.15
origin/release-4.15:cmd/start.go:       cmd.PersistentFlags().BoolVar(&opts.InjectClusterIdIntoPromQL, "hypershift", opts.InjectClusterIdIntoPromQL, "This options indicates whether the CVO is running inside a hosted control plane.")
cluster-version-operator $ git --no-pager grep hypershift origin/release-4.14
...no hits...

But this HostedControlPlane operator is pinned to the release payload, so we should be able to set it like you have it without worrying about version skew vs. the CVO. Older payloads with CVOs that don't support this option will have older HostedControlPlane controllers that don't try to set it.

@DavidHurta DavidHurta force-pushed the ota-855-point-hosted-cvo-at-management-cluster branch from 6901f2b to 9fc1db1 Compare December 4, 2023 23:45
@DavidHurta
Copy link
Contributor Author

/hold Modifying per #2807 (comment).

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 5, 2023
@DavidHurta DavidHurta force-pushed the ota-855-point-hosted-cvo-at-management-cluster branch from 9fc1db1 to 0097f51 Compare December 5, 2023 16:24
@DavidHurta
Copy link
Contributor Author

/unhold

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 5, 2023
@DavidHurta
Copy link
Contributor Author

/retest

Copy link
Contributor

@csrwng csrwng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of comments, otherwise lgtm

… deployed on OpenShift

This commit will enable the Cluster Version Operator (CVO) to evaluate
conditional updates on self-managed HyperShift deployed on an OpenShift
management cluster.

For the CVO to evaluate conditional updates, it accesses a PromQL query
service holding the appropriate information. In the standalone
OpenShift, it is the thanos-querier service in the openshift-monitoring
namespace.

This commit will point the hosted CVO at the mentioned service when
specified via a command line flag as well. To use the thanos-querier
service, commit utilizes its tenancy port 9092. This port allows the
users to query metrics from a specified namespace (in this case, the
hosted CVO can query metrics from its hosted control plane namespace).
For the authorization, the hosted CVO is given a new Role with the
minimal needed permissions to be able to query its HCP metrics.

For the Control Plane Operator to be able to give this permission to the
CVO, it needs this permission as well. The same case applies to the
HyperShift Operator (HSO).

When installing HyperShift, the
`--enable-cvo-management-cluster-metrics-access` flag can be now passed
to modify specific API objects accordingly to allow the hosted CVO to
access the HCP metrics. The flag is not supported when
`--rhobs-monitoring` is set, as the hosted CVO is unable to access
metrics scraped by RHOBS objects at the moment. In the case, that the
flag `--enable-cvo-management-cluster-metrics-access` is set on a
non-OpenShift management cluster, the CVO will continue to report
`Recommended = Unknown` for conditional updates.

A new network policy is created to only allow egress communication to
the PromQL query service pod's port from the hosted CVO pods.
@DavidHurta DavidHurta force-pushed the ota-855-point-hosted-cvo-at-management-cluster branch from 0097f51 to d49603e Compare December 6, 2023 16:34
@DavidHurta DavidHurta requested a review from csrwng December 6, 2023 16:46
@csrwng
Copy link
Contributor

csrwng commented Dec 6, 2023

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Dec 6, 2023
@DavidHurta
Copy link
Contributor Author

/retest-required

All tests passed on the last run of ci/prow/e2e-kubevirt-aws-ovn https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_hypershift/2807/pull-ci-openshift-hypershift-main-e2e-kubevirt-aws-ovn/1732438664232833024. However, the job timed out.

INFO[2023-12-06T20:32:23Z] Running step e2e-kubevirt-aws-ovn-gather-must-gather. 
{"component":"entrypoint","file":"k8s.io/test-infra/prow/entrypoint/run.go:169","func":"k8s.io/test-infra/prow/entrypoint.Options.ExecuteProcess","level":"error","msg":"Process did not finish before 4h0m0s timeout","severity":"error","time":"2023-12-06T20:35:57Z"}

The previous run succeeded, and the new changes should not be responsible for the timeout. Rerunning the tests.

@openshift-ci-robot
Copy link

/retest-required

Remaining retests: 0 against base HEAD 46ac206 and 2 for PR HEAD d49603e in total

@LalatenduMohanty
Copy link
Member

/test ci/prow/e2e-kubevirt-aws-ovn

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Dec 7, 2023

@LalatenduMohanty: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

  • /test e2e-aws
  • /test e2e-kubevirt-aws-ovn
  • /test images
  • /test unit
  • /test verify

The following commands are available to trigger optional jobs:

  • /test e2e-aws-4-12
  • /test e2e-aws-4-13
  • /test e2e-aws-metrics
  • /test e2e-conformance
  • /test e2e-ibmcloud-iks
  • /test e2e-ibmcloud-roks

Use /test all to run the following jobs that were automatically triggered:

  • pull-ci-openshift-hypershift-main-e2e-aws
  • pull-ci-openshift-hypershift-main-e2e-kubevirt-aws-ovn
  • pull-ci-openshift-hypershift-main-images
  • pull-ci-openshift-hypershift-main-unit
  • pull-ci-openshift-hypershift-main-verify
Details

In response to this:

/test ci/prow/e2e-kubevirt-aws-ovn

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@LalatenduMohanty
Copy link
Member

/test e2e-kubevirt-aws-ovn

@LalatenduMohanty
Copy link
Member

The last e2e-kubevirt-aws-ovn test failed with below error which seems orthogonal to the PR

: Find all of the input images from ocp/4.15 and tag them into the stable stream expand_less 	34s
{  could not create snapshot imagestream ocp/4.15 for release latest: Timeout: request did not complete within requested timeout - context deadline exceeded}

@openshift-ci-robot
Copy link

/retest-required

Remaining retests: 0 against base HEAD 5b9c536 and 1 for PR HEAD d49603e in total

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Dec 7, 2023

@Davoska: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-merge-bot openshift-merge-bot bot merged commit 099a1f8 into openshift:main Dec 7, 2023
@openshift-bot
Copy link

[ART PR BUILD NOTIFIER]

This PR has been included in build ose-hypershift-container-v4.15.0-202312071813.p0.g099a1f8.assembly.stream for distgit hypershift.
All builds following this will include this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/ci-tooling Indicates the PR includes changes for CI or tooling area/cli Indicates the PR includes changes for CLI area/control-plane-operator Indicates the PR includes changes for the control plane operator - in an OCP release area/hypershift-operator Indicates the PR includes changes for the hypershift operator and API - outside an OCP release jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants