feat: Add ObservabilityUI plugins API #434

Merged
jan--f merged 4 commits into rhobs:main from jgbernalp:add-observability-ui-plugins
Apr 22, 2024

Conversation

@jgbernalp
Contributor

@jgbernalp jgbernalp commented Mar 12, 2024

This PR adds the Observability UI CRDs.

https://issues.redhat.com/browse/OU-357

Every ObservabilityUIPlugin creates the following resources:

  • Deployment to host a lightweight backend for the plugin assets
  • Service to expose the deployment to the console
  • Service account
  • ConsolePlugin CR so the console knows in which service the plugin lives
  • Optional: ClusterRole and ClusterRoleBinding for some plugins so their backend can watch configmaps in a namespace

It also patches the console to add the plugin so it is enabled in the web console.
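
The console patch described above amounts to adding the plugin name to `spec.plugins` of the `operator.openshift.io/v1` Console configuration named `cluster`. A minimal sketch with a controller-runtime client (an illustration of the idea, not the PR's actual implementation; the helper name is made up):

```go
package consolepatch

import (
	"context"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// enableConsolePlugin appends pluginName to spec.plugins of the cluster-scoped
// Console operator configuration, if it is not already listed there.
func enableConsolePlugin(ctx context.Context, c client.Client, pluginName string) error {
	consoleCfg := &unstructured.Unstructured{}
	consoleCfg.SetGroupVersionKind(schema.GroupVersionKind{
		Group: "operator.openshift.io", Version: "v1", Kind: "Console",
	})
	if err := c.Get(ctx, client.ObjectKey{Name: "cluster"}, consoleCfg); err != nil {
		return err
	}

	plugins, _, err := unstructured.NestedStringSlice(consoleCfg.Object, "spec", "plugins")
	if err != nil {
		return err
	}
	for _, p := range plugins {
		if p == pluginName {
			return nil // already enabled in the web console
		}
	}
	if err := unstructured.SetNestedStringSlice(consoleCfg.Object, append(plugins, pluginName), "spec", "plugins"); err != nil {
		return err
	}
	return c.Update(ctx, consoleCfg)
}
```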

@jgbernalp jgbernalp requested a review from a team as a code owner March 12, 2024 16:47
@jgbernalp jgbernalp requested review from danielmellado and jan--f and removed request for a team March 12, 2024 16:47
@openshift-ci

openshift-ci bot commented Mar 12, 2024

Hi @jgbernalp. Thanks for your PR.

I'm waiting for a rhobs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@jgbernalp jgbernalp force-pushed the add-observability-ui-plugins branch 2 times, most recently from f0c64c4 to b81e843 Compare March 13, 2024 14:31
@jgbernalp jgbernalp force-pushed the add-observability-ui-plugins branch from b81e843 to 6707faf Compare March 13, 2024 14:35
@jan--f
Collaborator

jan--f commented Mar 13, 2024

Generally this looks good; I might have requests on some details later.
One question we should discuss: currently obo is OpenShift agnostic and can be installed on vanilla k8s. We make use of that in development during e2e testing with a kind cluster. If at all possible I'd like to preserve that feature. IIUC the dependencies introduced by this PR are the ConsolePlugin kind and the serving cert annotations in the service.
Do you think there is a way to degrade gracefully if we're not on OCP, i.e. the ConsolePlugin is not present?
Does the Service require an HTTPS endpoint or could this be implemented differently? So far we leave any TLS setup to the user via server-side apply, though we have certainly considered encryption by default before.
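
As a side note on the serving cert dependency mentioned above: a small sketch, assuming the standard `service.beta.openshift.io/serving-cert-secret-name` annotation handled by service-ca, of how the plugin Service could be built so it stays plain on vanilla k8s (names, namespace, and port below are placeholders, not the PR's code):

```go
package resources

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

// newPluginService builds the Service in front of the plugin backend. Only on
// OpenShift does it carry the serving-cert annotation; elsewhere it is a plain Service.
func newPluginService(name, namespace string, onOpenShift bool) *corev1.Service {
	svc := &corev1.Service{
		ObjectMeta: metav1.ObjectMeta{
			Name:      name,
			Namespace: namespace,
		},
		Spec: corev1.ServiceSpec{
			Selector: map[string]string{"app.kubernetes.io/name": name},
			Ports: []corev1.ServicePort{{
				Name:       "backend",
				Port:       9443,
				TargetPort: intstr.FromInt(9443),
			}},
		},
	}
	if onOpenShift {
		// service-ca creates a TLS secret with this name and keeps it rotated.
		svc.Annotations = map[string]string{
			"service.beta.openshift.io/serving-cert-secret-name": name + "-cert",
		}
	}
	return svc
}
```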

@jgbernalp
Contributor Author

Generally this looks good; I might have requests on some details later. One question we should discuss: currently obo is OpenShift agnostic and can be installed on vanilla k8s. We make use of that in development during e2e testing with a kind cluster. If at all possible I'd like to preserve that feature. IIUC the dependencies introduced by this PR are the ConsolePlugin kind and the serving cert annotations in the service. Do you think there is a way to degrade gracefully if we're not on OCP, i.e. the ConsolePlugin is not present? Does the Service require an HTTPS endpoint or could this be implemented differently? So far we leave any TLS setup to the user via server-side apply, though we have certainly considered encryption by default before.

Thanks for the initial check. I agree, we should verify whether the cluster supports the ConsolePlugin kind and, if it's not present, skip the reconciliation process. IMO, since the service is only relevant within an OpenShift cluster where console plugins are available, skipping reconciliation also seems an adequate solution for the serving cert annotation.
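
A minimal sketch of that check, assuming the ConsolePlugin kind is served under `console.openshift.io/v1`: query the discovery API once and skip plugin reconciliation when the kind is absent (illustration only, not the operator's actual code):

```go
package capabilities

import (
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/client-go/discovery"
	"k8s.io/client-go/rest"
)

// ConsolePluginAvailable reports whether the cluster serves the ConsolePlugin kind,
// i.e. whether we are running on OpenShift with the console available.
func ConsolePluginAvailable(cfg *rest.Config) (bool, error) {
	dc, err := discovery.NewDiscoveryClientForConfig(cfg)
	if err != nil {
		return false, err
	}
	resources, err := dc.ServerResourcesForGroupVersion("console.openshift.io/v1")
	if err != nil {
		if apierrors.IsNotFound(err) {
			// The API group is not registered: vanilla Kubernetes (e.g. a kind cluster).
			return false, nil
		}
		return false, err
	}
	for _, r := range resources.APIResources {
		if r.Kind == "ConsolePlugin" {
			return true, nil
		}
	}
	return false, nil
}
```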

@danielmellado
Contributor

I was also about to suggest dropping OpenShift-specific names like groupAPI, but it just makes sense to fall back completely when on a non-OpenShift cluster. +1!

@periklis

Generally this looks good; I might have requests on some details later. One question we should discuss: currently obo is OpenShift agnostic and can be installed on vanilla k8s. We make use of that in development during e2e testing with a kind cluster. If at all possible I'd like to preserve that feature. IIUC the dependencies introduced by this PR are the ConsolePlugin kind and the serving cert annotations in the service. Do you think there is a way to degrade gracefully if we're not on OCP, i.e. the ConsolePlugin is not present? Does the Service require an HTTPS endpoint or could this be implemented differently? So far we leave any TLS setup to the user via server-side apply, though we have certainly considered encryption by default before.

Sharing my experience of how we have isolated OpenShift features in Loki Operator over the last two years:

  1. First things first, having an operator register CRDs in the client-side scheme, even if the CRDs do not exist on the API server side, is not harmful. It only becomes harmful if you start a controller that syncs/watches such resources.
  2. What worked well for us are two things. First, we publish three almost identical bundles (community, community-openshift, openshift). The bundles differ only in:
    • Container images: community and community-openshift reference Alpine-based images on Docker Hub. The openshift bundle is populated only at and by Red Hat, using RHEL-based images and other productization bits (annotations, CVP, etc.)
    • Feature gates: this used to be the controller-runtime project config and was later migrated into the operator code base, with one configuration per bundle (i.e. community, community-openshift, openshift). The feature gates include a separate OpenShift-only section that handles features like service-ca, CCO, etc. In addition, some of the generic feature gates are enabled together because we know they are supported out of the box on OpenShift only (e.g. ServiceMonitors, TLS endpoints in ServiceMonitors, restricted Pod Security, alerts).
  3. Controlling the feature gates per bundle gives us maximum control over how much we can automate on the target cluster. The community-openshift and openshift bundles basically go to the maximum because we can safely assume that Prometheus, service-ca, etc. exist on the platform. On the other hand, the community bundle is the bare minimum that one can install from here, and the user can adapt the feature gates at will. The only exception to the rule is the dependency on cert-manager (and this is only a recommendation) for the TLS certificates used by our webhooks.
  4. Based on feature gates we manage individual runtime characteristics:
    • Starting controllers: for example, the dashboards controller is started only on OpenShift.
    • Resource registration: based on the gates we register watches/owns handlers for each controller, e.g. we watch the cluster Proxy (used for container proxy configuration) or APIServer (used for TLS profiles) objects only on OpenShift.
    • Extra webhook validators: for example, only on OpenShift do we enable validation of alerting rules according to the official style guide.
    • Managed resources: on OpenShift we replace Ingress with Route and add support for CCO's CredentialsRequest, used in ARO and ROSA.
  5. Anything we put in manifests is pure k8s from the beginning, which means we can test it on kind. Anything OpenShift-specific is kept in dedicated subpackages and applied by patching the structs with mergo.
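
To make the pattern concrete, a minimal sketch of feature gates deciding which controllers get registered with the manager; the `FeatureGates` struct and its fields are illustrative, not the Loki Operator's or this PR's actual types:

```go
package main

import (
	"log"

	ctrl "sigs.k8s.io/controller-runtime"
)

// FeatureGates captures per-bundle capabilities of the target cluster.
type FeatureGates struct {
	OpenShift struct {
		Enabled        bool // running on OpenShift at all
		ConsolePlugins bool // ConsolePlugin API available
		ServingCertTLS bool // service-ca serving cert annotations usable
	}
}

func main() {
	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{})
	if err != nil {
		log.Fatal(err)
	}

	// In a real operator the gates would be loaded from a bundle-specific
	// ConfigMap or config file rather than hardcoded here.
	var gates FeatureGates
	gates.OpenShift.Enabled = true
	gates.OpenShift.ConsolePlugins = true

	// Only register OpenShift-specific controllers when the gates allow it,
	// so the operator still starts cleanly on a vanilla kind cluster.
	if gates.OpenShift.Enabled && gates.OpenShift.ConsolePlugins {
		// e.g. (&UIPluginReconciler{Gates: gates}).SetupWithManager(mgr)
	}

	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		log.Fatal(err)
	}
}
```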

@simonpasquier
Contributor

simonpasquier commented Mar 14, 2024

Reading quickly through the discussion, I think that the Loki approach is lean and flexible (thanks @periklis for the details!). FWIW the upstream Prometheus operator does something different: it introspects CRDs + RBAC permissions to adjust the behavior at runtime (e.g. conditional start of some controllers) but the situation isn't the same since we don't control the deployment model (can be custom installation, Helm, OLM, ...) and have to deal with users not updating everything when they upgrade to a new operator version.

One question would be regarding HyperShift: do you foresee (or have you experienced) a situation where the operator depends on a feature that exists in classic OCP but not in HyperShift?

@jgbernalp
Contributor Author

Thanks @periklis for the suggestion; I was already working on the feature gates based on the Loki Operator example. 7862c0d adds them.

@periklis

One question would be regarding HyperShift: do you foresee (or have you experienced) a situation where the operator depends on a feature that exists in classic OCP but not in HyperShift?

Not yet! But from experience with ARO/ROSA I can foresee the following workflow, assuming Service Delivery is managing the Loki Operator installation for customers:

  • Open the Loki Operator feature gates as a separate API (e.g. similar to openshift/api's FeatureGates or via this one) to manage them at runtime
  • Or provide a mechanism for HIVE/HyperShift/whatever to populate a separate ConfigMap for our feature gates.

In general, our support matrix for feature gate settings in Loki Operator has been static and in turn only weakly tested against users manipulating them before installing the operator. We suspect with quite some confidence that some combinations would break the operator entirely. Therefore we give them a static but tested ConfigMap upfront in the bundle, which is later treated as immutable by OLM.

@danielmellado
Contributor

/ok-to-test

// +required
// +kubebuilder:validation:Required
// +kubebuilder:validation:Enum=dashboards
Type UIPluginType `json:"type"`
Contributor

I was experimenting with ObservabilityUIPlugin creation in an OpenShift cluster and, even though the type is displayed as required (in the UI), I can create a plugin with an empty value.

Perhaps the // +kubebuilder:validation:Enum=dashboards should be applied to the UIPluginType type (example here)

Contributor

I think this is still problematic (enum required validation). Maybe the ObservabilityUIPluginSpec should not have omitempty for validation to happen and it should probably be // +kubebuilder:validation:Enum=Dashboards

Maybe we could also declare Dashboards as the default value, but I don't know what the future plans are.
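
For illustration, here is the suggested validation expressed with standard kubebuilder markers: the Enum on the `UIPluginType` type itself and the field kept required without `omitempty` so empty values are rejected at admission time. Names follow the snippets in this PR; the default marker is only an example, not a decision:

```go
package v1alpha1

// UIPluginType defines the type of the UI plugin.
// +kubebuilder:validation:Enum=Dashboards
type UIPluginType string

const (
	// TypeDashboards deploys the console dashboards plugin.
	TypeDashboards UIPluginType = "Dashboards"
)

type UIPluginSpec struct {
	// Type defines which UI plugin to deploy.
	// +required
	// +kubebuilder:validation:Required
	// +kubebuilder:default=Dashboards
	Type UIPluginType `json:"type"`
}
```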

Collaborator

@tremes wdyt about the current version here?


### Dashboards

The plugin will search for datasources as ConfigMaps in the `openshift-config-managed` namespace with the `console.openshift.io/dashboard-datasource: 'true'` label. The `openshift-config-managed` namespace is required; more details on how to create a datasource ConfigMap can be found in the [console-dashboards-plugin](https://github.com/openshift/console-dashboards-plugin/blob/main/docs/add-datasource.md)
Contributor

we might need to update this section if the plugin searches only in the COO namespace.

Contributor Author

The plugin will search only in the openshift-config-managed namespace. This can be adjusted in the future to become a configurable value, but for now it follows the same namespace as the console dashboards.
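
As a rough sketch of what that lookup amounts to (client-go, using the namespace and label from the docs excerpt above; not the console-dashboards-plugin's actual code):

```go
package main

import (
	"context"
	"fmt"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	clientset, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}

	// Matches the plugin's default --dashboards-namespace.
	const datasourcesNamespace = "openshift-config-managed"

	// List ConfigMaps labeled as dashboard datasources.
	cms, err := clientset.CoreV1().ConfigMaps(datasourcesNamespace).List(context.TODO(), metav1.ListOptions{
		LabelSelector: "console.openshift.io/dashboard-datasource=true",
	})
	if err != nil {
		log.Fatal(err)
	}
	for _, cm := range cms.Items {
		fmt.Println("datasource:", cm.Name)
	}
}
```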

case uiv1alpha1.TypeDashboards:
{
readerRoleName := plugin.Name + "-datasource-reader"
datasourcesNamespace := "openshift-config-managed"
Contributor

just for my own education, this is currently hardcoded by the plugin?

Contributor Author

This can be configured in the plugin using the --dashboards-namespace flag but its default is openshift-config-managed. At this point there is a single deployment template used for any plugin. This might change when we add support for other plugins that need specific options.

},
ObjectMeta: metav1.ObjectMeta{
Name: readerRoleName,
Namespace: datasourcesNamespace,
Contributor

if the role is limited to the COO namespace, shouldn't we adjust the --dashboards-namespace plugin flags accordingly?
https://github.com/openshift/console-dashboards-plugin/blob/6800afc1e9bd785e31bf0a7a4b7d282d60eb0b1a/cmd/plugin-backend.go#L17

// RBAC for managing observability ui plugin objects
//+kubebuilder:rbac:groups=apps,resources=deployments,verbs=list;watch;create;update;patch;delete
//+kubebuilder:rbac:groups=rbac.authorization.k8s.io,resources=roles;rolebindings,verbs=list;watch;create;update;delete;patch
//+kubebuilder:rbac:groups="",resources=serviceaccounts;services;configmaps,verbs=get;list;watch;create;update;patch;delete
Contributor

Do we really need create/update/patch/delete permissions?
It can be adjusted as a follow-up and in fact, we probably want to review/tidy up the operator permissions after the PR merges.

Contributor Author

I created a follow up task https://issues.redhat.com/browse/COO-121

resources:
- uiplugins
verbs:
- create
Contributor

same remark here regarding permissions

Contributor Author

I created a follow up task https://issues.redhat.com/browse/COO-121

@jgbernalp jgbernalp force-pushed the add-observability-ui-plugins branch from adece23 to 6e99265 Compare April 18, 2024 18:19
@simonpasquier simonpasquier dismissed periklis’s stale review April 19, 2024 13:20

IIUC the concern was about the UIPlugin resource being namespace-scoped; it's now cluster-scoped.

func (r Updater) Reconcile(ctx context.Context, c client.Client, scheme *runtime.Scheme) error {
- if r.resourceOwner.GetNamespace() == r.resource.GetNamespace() {
+ // If the resource owner is in the same namespace as the resource, or if the resource owner is cluster scoped set the owner reference.
+ if r.resourceOwner.GetNamespace() == r.resource.GetNamespace() || r.resourceOwner.GetNamespace() == "" {
Contributor

👍

@simonpasquier
Contributor

I didn't test/review in depth the latest version of the PR but you've got my virtual /lgtm :)

@danielmellado
Contributor

Besides the commit-lint nit, IMHO this is ready to merge. @jgbernalp mind addressing that? Tnx!

@jgbernalp jgbernalp force-pushed the add-observability-ui-plugins branch from 4ccc116 to 090b61e Compare April 22, 2024 08:11
@jgbernalp jgbernalp force-pushed the add-observability-ui-plugins branch from 090b61e to a7e6188 Compare April 22, 2024 08:26
@openshift-ci

openshift-ci bot commented Apr 22, 2024

@jgbernalp: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Details | Required | Rerun command
ci/prow/observability-operator-e2e | 090b61e | link | true | /test observability-operator-e2e

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@jan--f
Collaborator

jan--f commented Apr 22, 2024

Works as expected, follow-up tasks for some minor nits are created and tracked. Good to go, thanks everyone!
/lgtm

@jan--f
Collaborator

jan--f commented Apr 22, 2024

/approve

@openshift-ci

openshift-ci bot commented Apr 22, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jan--f, jgbernalp

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jan--f jan--f merged commit 92bae83 into rhobs:main Apr 22, 2024