CONSOLE-2833: Integrate multicluster POC operator changes#575

Closed
TheRealJon wants to merge 1 commit into openshift:master-multi-cluster-feature from TheRealJon:CONSOLE-2833

Conversation

@TheRealJon
Member

@TheRealJon TheRealJon commented Aug 17, 2021

  • Add new managed cluster controller
    • Creates a ConfigMap for each managed cluster's API server CA bundle.
    • Builds a new managed-clusters ConfigMap that will be consumed by the multicluster backend.
  • Update console deployment sync to create volumes/volume mounts for the new managed cluster ConfigMaps

We still need the work from @florkbr for the other part of this, which is to create the OAuthClients on spoke clusters and build the managed cluster OAuth configs.

@openshift-ci openshift-ci Bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Aug 17, 2021
@openshift-ci openshift-ci Bot requested review from jhadvig and spadgett August 17, 2021 16:41
@openshift-ci openshift-ci Bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 17, 2021
@TheRealJon
Member Author

/retest

@TheRealJon
Member Author

/retest

@TheRealJon TheRealJon force-pushed the CONSOLE-2833 branch 2 times, most recently from 493f7a6 to f750fd3 Compare August 19, 2021 17:12
Member

@jhadvig jhadvig left a comment


@TheRealJon thanks for the PR.
For the first review round I would advise undoing all the example and manifest changes that only add extra tabs, so the changes are more readable.

Comment thread pkg/console/operator/operator.go Outdated
Comment thread pkg/console/operator/sync_v400.go
Comment thread pkg/console/operator/sync_v400.go Outdated
Comment thread pkg/console/operator/sync_v400.go Outdated
Comment thread pkg/console/operator/sync_v400.go Outdated
Comment thread pkg/console/operator/sync_v400.go Outdated
Comment thread pkg/console/starter/starter.go Outdated
Comment thread pkg/console/operator/sync_v400.go Outdated
Comment thread pkg/console/subresource/consoleserver/types.go
Comment thread pkg/console/operator/sync_v400.go Outdated
Comment thread pkg/console/operator/sync_v400.go Outdated
Comment thread pkg/console/operator/sync_v400.go Outdated
Member

@jhadvig jhadvig left a comment


Thanks for putting this logic into a separate controller. A couple of comments regarding the code.

Comment thread pkg/console/controllers/managedclusters/controller.go Outdated
Comment thread pkg/console/operator/sync_v400.go Outdated
Comment thread pkg/console/subresource/deployment/deployment.go Outdated
Comment thread pkg/console/controllers/managedclusters/controller.go Outdated
Comment thread pkg/console/controllers/managedclusters/controller.go Outdated
Comment thread pkg/console/controllers/managedclusters/controller.go Outdated
Comment thread pkg/console/controllers/managedclusters/controller.go Outdated
Comment thread pkg/console/subresource/configmap/managed_clusters.go Outdated
Comment thread pkg/console/controllers/managedclusters/controller.go
Comment thread pkg/console/controllers/managedclusters/controller.go Outdated
@openshift-ci openshift-ci Bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 27, 2021
@TheRealJon TheRealJon force-pushed the CONSOLE-2833 branch 4 times, most recently from b02d6ad to bebe654 Compare August 27, 2021 18:16
@openshift-ci openshift-ci Bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 27, 2021
@TheRealJon
Member Author

/retest

@TheRealJon TheRealJon requested a review from jhadvig August 30, 2021 20:38
@TheRealJon
Member Author

/retest

2 similar comments
@TheRealJon
Member Author

/retest

@TheRealJon
Member Author

/retest

@TheRealJon TheRealJon changed the title [WIP] Multicluster console operator changes Multicluster console operator changes Sep 1, 2021
@TheRealJon
Member Author

/hold

@openshift-ci openshift-ci Bot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. and removed do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. labels Sep 1, 2021
@TheRealJon
Member Author

/retest

@openshift-ci openshift-ci Bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 7, 2021
@TheRealJon
Member Author

/retest

1 similar comment
@TheRealJon
Member Author

/retest

@TheRealJon TheRealJon changed the base branch from master to master-multi-cluster-feature September 23, 2021 13:48
@TheRealJon
Member Author

/hold cancel

@openshift-ci openshift-ci Bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 23, 2021
@florkbr
Contributor

florkbr commented Sep 27, 2021

Failing tests seem unrelated to this PR:

Agnostic upgrade (multiple)

[sig-network] pods should successfully create sandboxes by not timing out

aws-operator

Operator degraded (APIServerDeployment_UnavailablePod): APIServerDeploymentDegraded: 1 of 3 requested instances are unavailable for apiserver.openshift-oauth-apiserver ()

aws-single-node

 ClusterVersion: Installing "" for <unknown>: <unknown>
error getting cluster operators: Get "https://api.ci-op-8vydrbip-71ba7.origin-ci-int-aws.dev.rhcloud.com:6443/apis/config.openshift.io/v1/clusteroperators": dial tcp 52.35.183.89:6443: connect: connection refused
ClusterOperators:
	clusteroperators are missing

@florkbr
Contributor

florkbr commented Sep 27, 2021

/retest

@florkbr
Contributor

florkbr commented Sep 27, 2021

aws-operator failed tests:

=== RUN   TestEditManagedConsoleCLIDownloads
    managed_test.go:13: waiting for setup to reach settled state...
    util.go:97: patching DisplayName on the oc-cli-downloads ConsoleCLIDownloads custom resource
    util.go:104: polling for patched DisplayName on the oc-cli-downloads ConsoleCLIDownloads custom resource
    managed_test.go:70: error: timed out waiting for the condition
    managed_test.go:17: waiting for cleanup to reach settled state...
--- FAIL: TestEditManagedConsoleCLIDownloads (14.24s)

--- FAIL: TestMetricsEndpoint (5.41s)
--- FAIL: TestAddPlugins (48.34s)

@florkbr
Contributor

florkbr commented Sep 27, 2021

/retest

1 similar comment
@TheRealJon
Member Author

/retest

@TheRealJon
Member Author

TheRealJon commented Sep 30, 2021

Testing this PR

Prereqs

Testing

Most of the test conditions will require grepping the console operator logs for certain output. Here's an example command that can be used when searching for expected log output:

  oc logs deploy/console-operator -n openshift-console-operator | grep "expected log output"

Step 1: Spin up or use existing spoke cluster

Provision the spoke cluster before the hub cluster so you don't waste up-time on your cluster-bot cluster (see next step). You will need kubeadmin access to an existing (non-ACM-managed) cluster before step 2. This can be accomplished using cluster-bot, openshift-install, or an existing cluster.

Step 2: Spin up hub cluster

Use cluster-bot to spin up a second cluster (the hub cluster) with #575

launch openshift/console-operator#575

Once the cluster is up, oc login as kubeadmin.

Step 3: Increase console-operator log level on the hub cluster

Patch the console operator to increase log level:

oc patch console.operator.openshift.io cluster --patch '{"spec":{"logLevel":"Debug","operatorLogLevel":"Debug"}}' --type merge

Test Conditions

  • Grep the console operator logs for "managed clusters". Expect 0 matches.
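The grep checks used throughout these steps can be wrapped in a small helper. This is a hypothetical convenience function, not part of the PR; the deployment and namespace names come from the example command earlier in this test plan.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count how many operator log lines match a pattern.
# Reads log text on stdin so it can be fed from `oc logs` (or a file).
count_log_matches() {
  # grep -c prints the match count; `|| true` keeps the exit status 0
  # when there are no matches (grep exits 1 in that case).
  grep -c -- "$1" || true
}

# Usage against a live hub cluster (requires kubeadmin access):
#   oc logs deploy/console-operator -n openshift-console-operator \
#     | count_log_matches "managed clusters"
# For this step, the printed count should be 0.
```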

Step 4: Install ACM on the hub cluster

Multicluster console requires some unreleased ACM API changes, so we need to install ACM using a 2.4.0 image.

Be aware that 2.4.0 images might have some bugs since they are still in dev. If you run into some unexpected ACM behavior, you may need to uninstall and re-install ACM with a different tag.

Test Conditions

  • Check that the ManagedCluster CRD is installed on the cluster:
    oc get customresourcedefinition.apiextensions.k8s.io/managedclusters.cluster.open-cluster-management.io
  • Once the previous command returns successfully, grep the console operator logs for: console-operator is in a managed state: syncing managed clusters. Expect at least 1 match.
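Since the CRD installation and the operator's reaction to it are both asynchronous, a short polling loop saves re-running the check by hand. A sketch, assuming `oc` is logged in to the hub cluster; `wait_for_crd` is a hypothetical helper and the CRD name is the one from the test condition above.

```shell
#!/usr/bin/env bash
# Hypothetical polling helper: wait until a CRD exists or a timeout elapses.
wait_for_crd() {
  local crd="$1" timeout="${2:-120}" elapsed=0
  until oc get "customresourcedefinition.apiextensions.k8s.io/${crd}" \
      >/dev/null 2>&1; do
    sleep 5
    elapsed=$((elapsed + 5))
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "timed out waiting for CRD ${crd}" >&2
      return 1
    fi
  done
  echo "CRD ${crd} is present"
}

# Usage:
#   wait_for_crd managedclusters.cluster.open-cluster-management.io 300
```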

Step 5: Import spoke cluster

Once ACM is up and running on the hub cluster, either use the ACM console or the CLI to import the spoke cluster from step 1.

Name the imported cluster "multicluster-test".

Console instructions
CLI Instructions

The console instructions are for ACM 2.3, so they are slightly different on ACM 2.4. Make the following adjustments:

  • Instead of clicking "Add a cluster" then clicking "Import an existing cluster" you can just click "Import Cluster" on the "Infrastructure > Clusters" page (near the top).
  • Choose one of the 3 options for "Import mode" and fill out the form appropriately. The options in order from easiest to hardest are:
    • "Kubeconfig": Copy and paste the contents of your spoke cluster kubeconfig.
    • "Enter your server URL and API token for the existing cluster":
      • To get the server URL: oc login to the spoke cluster, then run oc whoami --show-server
      • To get the API token: oc login to the spoke cluster, then run oc whoami --show-token
    • "Run import commands manually": This is what the 2.3 instructions explain and is the most complicated.

Test Conditions

It may take a while for the cluster import to complete and for changes to propagate through the console operator, so you may have to run some of these commands multiple times to see the expected output.

Validation

As the import occurs, the console operator should be validating the state of ManagedCluster resources and will log validation errors before the import is complete.

When the ManagedCluster resource is first created, it will not have any spec.managedClusterClientConfigs items defined yet, which should cause output from the console operator.

  • Grep the console operator logs for "Skipping managed cluster multicluster-test, no client config found". Expect at least 1 match.

During the import process, spec.managedClusterClientConfigs will be added to the ManagedCluster at some point. The console operator validates the client config properties, and if any of the expected data is missing, errors will be logged. Note that these lines may not occur.

  • Grep the console operator logs for "Skipping managed cluster multicluster-test, client config CA bundle not found". Expect 0 or more matches.
  • Grep the console operator logs for "Skipping managed cluster multicluster-test, client config URL not found". Expect 0 or more matches.

The console operator checks for the existence of the managed-clusters ConfigMap and logs a line when it's not found.

  • Grep the console operator logs for "managed-clusters config map not found, continuing...". Expect at least 1 match.

Reconciliation

Once a valid client config is defined on the ManagedCluster resource, the console operator will begin reconciling the state on the cluster. Once everything is reconciled, two ConfigMaps should have been created in the openshift-console namespace, and the console deployment should have been updated.

  • CA Bundle ConfigMap exists and has the correct properties
oc get configmap/multicluster-test-managed-cluster-api-server-ca -n openshift-console -o yaml

Expected YAML (omitting actual CA data):

metadata:
  labels:
    app: console
    managed-cluster: multicluster-test
    managed-cluster-api-server-ca: ""
  name: multicluster-test-managed-cluster-api-server-ca
  namespace: openshift-console
data:
  ca.crt: |
    <CA_BUNDLE_DATA>

To be thorough, you can compare the CA bundle data in this ConfigMap to the data found in the ManagedCluster resource:
oc get managedcluster.cluster.open-cluster-management.io/multicluster-test -o json | jq -r '.spec.managedClusterClientConfigs[0].caBundle' | base64 --decode
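The comparison above can also be automated end to end. A hedged sketch: the resource names follow the pattern used in this test plan, and `compare_ca_bundles` is a hypothetical helper, not something the PR ships.

```shell
#!/usr/bin/env bash
# Hypothetical check: the ConfigMap's ca.crt should equal the decoded
# caBundle from the ManagedCluster's first client config.
compare_ca_bundles() {
  local cluster="$1" cm_ca mc_ca
  cm_ca=$(oc get "configmap/${cluster}-managed-cluster-api-server-ca" \
    -n openshift-console -o jsonpath='{.data.ca\.crt}')
  mc_ca=$(oc get "managedcluster.cluster.open-cluster-management.io/${cluster}" \
    -o jsonpath='{.spec.managedClusterClientConfigs[0].caBundle}' | base64 --decode)
  if [ -n "$cm_ca" ] && [ "$cm_ca" = "$mc_ca" ]; then
    echo "CA bundles match for ${cluster}"
  else
    echo "CA bundles differ for ${cluster}" >&2
    return 1
  fi
}

# Usage:
#   compare_ca_bundles multicluster-test
```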

  • managed-clusters ConfigMap exists and has the correct properties
oc get configmap/managed-clusters -n openshift-console -o yaml

Expected YAML (contents of the managed-clusters.yaml data key):

- name: multicluster-test
  apiServer:
    url: <spoke cluster API server url>
    caFile: /var/managed-cluster-certs/multicluster-test-managed-cluster-api-server-ca/ca-bundle.crt
  oauth:
    clientID: ""
    clientSecret: ""
    caFile: ""
  • console Deployment has been updated with volumes and volume mounts
oc get deploy/console -n openshift-console -o yaml

Expected YAML (only showing the properties we are looking for):

spec:
  template:
    spec:
      containers:
        - name: console
          volumeMounts:
            - name: managed-clusters
              readOnly: true
              mountPath: /var/managed-cluster-config
            - name: multicluster-test-managed-cluster-api-server-ca
              readOnly: true
              mountPath: /var/managed-cluster-certs/multicluster-test-managed-cluster-api-server-ca
      volumes:
        - name: managed-clusters
          configMap:
            name: managed-clusters
            defaultMode: 420
        - name: multicluster-test-managed-cluster-api-server-ca
          configMap:
            name: multicluster-test-managed-cluster-api-server-ca
            defaultMode: 420
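Rather than eyeballing the full deployment YAML, the volume wiring can be spot-checked with jsonpath. A minimal sketch; `check_console_volumes` is a hypothetical helper, and the volume names are the ones from the expected YAML above.

```shell
#!/usr/bin/env bash
# Hypothetical verification: confirm the expected volumes exist on the
# console deployment in openshift-console.
check_console_volumes() {
  local vols v rc=0
  vols=$(oc get deploy/console -n openshift-console \
    -o jsonpath='{.spec.template.spec.volumes[*].name}')
  for v in managed-clusters multicluster-test-managed-cluster-api-server-ca; do
    # Pad with spaces so we match whole volume names only.
    case " ${vols} " in
      *" ${v} "*) echo "volume ${v}: present" ;;
      *) echo "volume ${v}: MISSING" >&2; rc=1 ;;
    esac
  done
  return "$rc"
}
```

The volumeMounts can be checked the same way by swapping the jsonpath for `{.spec.template.spec.containers[?(@.name=="console")].volumeMounts[*].name}`.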

- Still need feature gate and OAuth work for multicluster to be fully implemented
- Add managed cluster controller that watches ACM ManagedCluster resources and builds a config file that can be consumed by the console backend
- Update console deployment sync with new volumes and volume mounts for managed cluster config and CA bundles.
- Add managed cluster and API server CA bundle ConfigMaps to subresources
- Update informer event filter utils
- Add github.com/open-cluster-management/api dependency
@openshift-ci
Contributor

openshift-ci Bot commented Oct 8, 2021

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: TheRealJon
To complete the pull request process, please assign bparees after the PR has been reviewed.
You can assign the PR to them by writing /assign @bparees in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci
Contributor

openshift-ci Bot commented Oct 8, 2021

@TheRealJon: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/verify 864b40d link true /test verify
ci/prow/unit 864b40d link true /test unit
ci/prow/e2e-aws-operator 864b40d link true /test e2e-aws-operator
ci/prow/e2e-aws-single-node 864b40d link false /test e2e-aws-single-node
ci/prow/e2e-agnostic-upgrade 864b40d link true /test e2e-agnostic-upgrade

Full PR test history. Your PR dashboard.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@yapei
Contributor

yapei commented Oct 12, 2021

Followed the test instructions and verified that importing one spoke cluster is working fine and as expected
Screen Shot 2021-10-12 at 10 49 29 AM

@openshift-ci openshift-ci Bot added the qe-approved Signifies that QE has signed off on this PR label Oct 12, 2021
@yapei
Copy link
Copy Markdown
Contributor

yapei commented Oct 12, 2021

/remove-label qe-approved

@openshift-ci openshift-ci Bot removed the qe-approved Signifies that QE has signed off on this PR label Oct 12, 2021
@yapei
Contributor

yapei commented Oct 12, 2021

  • When I try to import another spoke cluster, the previously imported cluster becomes Unknown
# oc get managedcluster.cluster.open-cluster-management.io
NAME                        HUB ACCEPTED   MANAGED CLUSTER URLS                                                       JOINED   AVAILABLE   AGE
local-cluster               true           https://api.ci-ln-0kxxyjk-f76d1.xxxxx:6443   True     True        32m
multicluster-test           true           https://api.qe-ui49-1012.qe.xxxxx:6443                  True     Unknown     30m
multicluster-test-another   true           https://api.qe-daily-1012.qe.xxxx:6443                 True     True        10m

# oc get cm -n openshift-console 
NAME                                                      DATA   AGE
managed-clusters                                          1      28m
multicluster-test-another-managed-cluster-api-server-ca   1      7m2s
multicluster-test-managed-cluster-api-server-ca           1      28m

# oc get cm managed-clusters -n openshift-console -o yaml 
apiVersion: v1
data:
  managed-clusters.yaml: |
    - name: multicluster-test
      apiServer:
        url: https://api.qe-ui49-1012.qe.xxxx:6443
        caFile: /var/managed-cluster-certs/multicluster-test-managed-cluster-api-server-ca/ca-bundle.crt
      oauth:
        clientID: ""
        clientSecret: ""
        caFile: ""
    - name: multicluster-test-another
      apiServer:
        url: https://api.qe-daily-1012.qe.xxxx:6443
        caFile: /var/managed-cluster-certs/multicluster-test-another-managed-cluster-api-server-ca/ca-bundle.crt
      oauth:
        clientID: ""
        clientSecret: ""
        caFile: ""
  • When a cluster is detached from ACM, its entry in cm/managed-clusters should be removed; however, it looks like one entry is still left even though managedcluster.cluster.open-cluster-management.io/multicluster-test-another has already been deleted.
    @TheRealJon could you help check? Shall we remove it as well? [managedcluster.cluster.open-cluster-management.io/multicluster-test was deleted and its entry in cm/managed-clusters was removed as well]
# oc get managedcluster.cluster.open-cluster-management.io
NAME            HUB ACCEPTED   MANAGED CLUSTER URLS                                                       JOINED   AVAILABLE   AGE
local-cluster   true           https://api.ci-ln-0kxxyjk-f76d1.origin-ci-int-gce.dev.openshift.com:6443   True     True        40m

# oc get cm managed-clusters -n openshift-console -o yaml 
apiVersion: v1
data:
  managed-clusters.yaml: |
    - name: multicluster-test-another
      apiServer:
        url: https://api.qe-daily-1012.qe.devcluster.openshift.com:6443
        caFile: /var/managed-cluster-certs/multicluster-test-another-managed-cluster-api-server-ca/ca-bundle.crt
      oauth:
        clientID: ""
        clientSecret: ""
        caFile: ""
kind: ConfigMap

Screen Shot 2021-10-12 at 11 24 52 AM

@TheRealJon
Member Author

Hi @yapei! Thanks for looking at this! I will take a look at the things you pointed out. For the first issue, with the managed cluster showing an Available status of "Unknown": was this accompanied by unexpected behavior in the console operator, or was it just the ACM status that was unexpected? For the second issue, I think I have an idea of what's causing it and it will need to be fixed, so thanks for catching this edge case!

@yapei
Contributor

yapei commented Oct 13, 2021

For the first issue with the managed cluster showing an Available status of "Unknown", was this accompanied by an unexpected behavior in the console operator, or just an issue with the ACM status being unexpected

@TheRealJon It is ACM that returns the 'Unknown' status (the CLI shows 'Unknown' as well), so I think it is not a console operator issue. Sorry for the confusion.

@TheRealJon TheRealJon changed the title Multicluster console operator changes CONSOLE-2833: Integrate multicluster POC operator changes Oct 14, 2021
@TheRealJon
Member Author

Closing in favor of #602, which includes these changes.

@TheRealJon TheRealJon closed this Nov 10, 2021
@TheRealJon TheRealJon deleted the CONSOLE-2833 branch February 4, 2022 19:28