CONSOLE-2833: Integrate multicluster POC operator changes#575

Closed
TheRealJon wants to merge 1 commit into openshift:master-multi-cluster-feature from TheRealJon:CONSOLE-2833

Conversation

@TheRealJon
Member

@TheRealJon TheRealJon commented Aug 17, 2021

  • Add new managed cluster controller
    • Creates a ConfigMap for each managed cluster's API server CA bundle.
    • Builds a new managed-clusters ConfigMap that will be consumed by the multicluster backend.
  • Update console deployment sync to create volumes/volume mounts for the new managed cluster ConfigMaps

We still need the work from @florkbr for the other part of this, which is to create the OAuthClients on spoke clusters and build the managed cluster OAuth configs.

@openshift-ci openshift-ci Bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Aug 17, 2021
@openshift-ci openshift-ci Bot requested review from jhadvig and spadgett August 17, 2021 16:41
@openshift-ci openshift-ci Bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 17, 2021
@TheRealJon
Member Author

/retest

@TheRealJon
Member Author

/retest

@TheRealJon TheRealJon force-pushed the CONSOLE-2833 branch 2 times, most recently from 493f7a6 to f750fd3 Compare August 19, 2021 17:12
Member

@jhadvig jhadvig left a comment


@TheRealJon thanks for the PR.
For the first review round I would advise undoing all the example and manifest changes that only add extra tabs, so the changes are more readable.

Comment thread pkg/console/operator/operator.go Outdated
Comment thread pkg/console/operator/sync_v400.go
Comment thread pkg/console/operator/sync_v400.go Outdated
Comment thread pkg/console/operator/sync_v400.go Outdated
Comment thread pkg/console/operator/sync_v400.go Outdated
Comment thread pkg/console/operator/sync_v400.go Outdated
Comment thread pkg/console/starter/starter.go Outdated
Comment thread pkg/console/operator/sync_v400.go Outdated
Comment thread pkg/console/subresource/consoleserver/types.go
Comment thread pkg/console/operator/sync_v400.go Outdated
Comment thread pkg/console/operator/sync_v400.go Outdated
Comment thread pkg/console/operator/sync_v400.go Outdated
Member

@jhadvig jhadvig left a comment


Thanks for putting this logic into a separate controller. A couple of comments regarding the code.

Comment thread pkg/console/controllers/managedclusters/controller.go Outdated
Comment thread pkg/console/operator/sync_v400.go Outdated
Comment thread pkg/console/subresource/deployment/deployment.go Outdated
Comment thread pkg/console/controllers/managedclusters/controller.go Outdated
Comment thread pkg/console/controllers/managedclusters/controller.go Outdated
Comment thread pkg/console/controllers/managedclusters/controller.go Outdated
Comment thread pkg/console/controllers/managedclusters/controller.go Outdated
Comment thread pkg/console/subresource/configmap/managed_clusters.go Outdated
Comment thread pkg/console/controllers/managedclusters/controller.go
Comment thread pkg/console/controllers/managedclusters/controller.go Outdated
@openshift-ci openshift-ci Bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 27, 2021
@TheRealJon TheRealJon force-pushed the CONSOLE-2833 branch 4 times, most recently from b02d6ad to bebe654 Compare August 27, 2021 18:16
@openshift-ci openshift-ci Bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 27, 2021
@TheRealJon
Member Author

/retest

@TheRealJon TheRealJon requested a review from jhadvig August 30, 2021 20:38
@TheRealJon
Member Author

/retest

2 similar comments
@TheRealJon
Member Author

/retest

@TheRealJon
Member Author

/retest

@TheRealJon TheRealJon changed the title [WIP] Multicluster console operator changes Multicluster console operator changes Sep 1, 2021
@TheRealJon
Member Author

/hold

@openshift-ci openshift-ci Bot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. and removed do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. labels Sep 1, 2021
@TheRealJon
Member Author

/retest

@openshift-ci openshift-ci Bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 7, 2021
@TheRealJon
Member Author

/retest

1 similar comment
@TheRealJon
Member Author

/retest

@TheRealJon TheRealJon changed the base branch from master to master-multi-cluster-feature September 23, 2021 13:48
@TheRealJon
Member Author

/hold cancel

@openshift-ci openshift-ci Bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 23, 2021
@florkbr
Contributor

florkbr commented Sep 27, 2021

Failing tests seem unrelated to this PR:

Agnostic upgrade (multiple)

[sig-network] pods should successfully create sandboxes by not timing out

aws-operator

Operator degraded (APIServerDeployment_UnavailablePod): APIServerDeploymentDegraded: 1 of 3 requested instances are unavailable for apiserver.openshift-oauth-apiserver ()

aws-single-node

 ClusterVersion: Installing "" for <unknown>: <unknown>
error getting cluster operators: Get "https://api.ci-op-8vydrbip-71ba7.origin-ci-int-aws.dev.rhcloud.com:6443/apis/config.openshift.io/v1/clusteroperators": dial tcp 52.35.183.89:6443: connect: connection refused
ClusterOperators:
	clusteroperators are missing

@florkbr
Contributor

florkbr commented Sep 27, 2021

/retest

@florkbr
Contributor

florkbr commented Sep 27, 2021

aws-operator failed tests:

=== RUN   TestEditManagedConsoleCLIDownloads
    managed_test.go:13: waiting for setup to reach settled state...
    util.go:97: patching DisplayName on the oc-cli-downloads ConsoleCLIDownloads custom resource
    util.go:104: polling for patched DisplayName on the oc-cli-downloads ConsoleCLIDownloads custom resource
    managed_test.go:70: error: timed out waiting for the condition
    managed_test.go:17: waiting for cleanup to reach settled state...
--- FAIL: TestEditManagedConsoleCLIDownloads (14.24s)

--- FAIL: TestMetricsEndpoint (5.41s)
--- FAIL: TestAddPlugins (48.34s)

@florkbr
Contributor

florkbr commented Sep 27, 2021

/retest

1 similar comment
@TheRealJon
Member Author

/retest

@TheRealJon
Member Author

TheRealJon commented Sep 30, 2021

Testing this PR

Prereqs

Testing

Most of the test conditions will require grepping the console operator logs for certain output. Here's an example command that can be used when searching for expected log output:

  oc logs deploy/console-operator -n openshift-console-operator | grep "expected log output"

Step 1: Spin up or use existing spoke cluster

Provision the spoke cluster before the hub cluster so you don't waste up-time on your cluster-bot cluster (see next step). You will need kubeadmin access to an existing (non-ACM-managed) cluster before step 2. This can be accomplished using cluster-bot, openshift-install, or an existing cluster.

Step 2: Spin up hub cluster

Use cluster-bot to spin up a second cluster (the hub cluster) with #575

launch openshift/console-operator#575

Once the cluster is up, oc login as kubeadmin.

Step 3: Increase console-operator log level on the hub cluster

Patch the console operator to increase log level:

oc patch console.operator.openshift.io cluster --patch '{"spec":{"logLevel":"Debug","operatorLogLevel":"Debug"}}' --type merge

Test Conditions

  • Grep the console operator logs for "managed clusters". Expect 0 matches.
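The grep checks used throughout these steps can be wrapped in a small helper. This is a hypothetical convenience function, not part of the PR; the deployment and namespace names come from the example command earlier in this test plan.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count how many operator log lines match a pattern.
# Reads log text on stdin so it can be fed from `oc logs` (or a file).
count_log_matches() {
  # grep -c prints the match count; `|| true` keeps the exit status 0
  # when there are no matches (grep exits 1 in that case).
  grep -c -- "$1" || true
}

# Usage against a live hub cluster (requires kubeadmin access):
#   oc logs deploy/console-operator -n openshift-console-operator \
#     | count_log_matches "managed clusters"
# For this step, the printed count should be 0.
```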

Step 4: Install ACM on the hub cluster

Multicluster console requires some unreleased ACM API changes, so we need to install ACM using a 2.4.0 image.

Be aware that 2.4.0 images might have some bugs since they are still in dev. If you run into some unexpected ACM behavior, you may need to uninstall and re-install ACM with a different tag.

Test Conditions

  • Check that the ManagedCluster CRD is installed on the cluster:
    oc get customresourcedefinition.apiextensions.k8s.io/managedclusters.cluster.open-cluster-management.io
  • Once the previous command returns successfully, grep the console operator logs for: console-operator is in a managed state: syncing managed clusters. Expect at least 1 match.
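Since the CRD installation and the operator's reaction to it are both asynchronous, a short polling loop saves re-running the check by hand. A sketch, assuming `oc` is logged in to the hub cluster; `wait_for_crd` is a hypothetical helper and the CRD name is the one from the test condition above.

```shell
#!/usr/bin/env bash
# Hypothetical polling helper: wait until a CRD exists or a timeout elapses.
wait_for_crd() {
  local crd="$1" timeout="${2:-120}" elapsed=0
  until oc get "customresourcedefinition.apiextensions.k8s.io/${crd}" \
      >/dev/null 2>&1; do
    sleep 5
    elapsed=$((elapsed + 5))
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "timed out waiting for CRD ${crd}" >&2
      return 1
    fi
  done
  echo "CRD ${crd} is present"
}

# Usage:
#   wait_for_crd managedclusters.cluster.open-cluster-management.io 300
```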

Step 5: Import spoke cluster

Once ACM is up and running on the hub cluster, either use the ACM console or the CLI to import the spoke cluster from step 1.

Name the imported cluster "multicluster-test".

Console instructions
CLI Instructions

The console instructions are for ACM 2.3, so they are slightly different on ACM 2.4. Make the following adjustments:

  • Instead of clicking "Add a cluster" then clicking "Import an existing cluster" you can just click "Import Cluster" on the "Infrastructure > Clusters" page (near the top).
  • Choose one of the 3 options for "Import mode" and fill out the form appropriately. The options in order from easiest to hardest are:
    • "Kubeconfig": Copy and paste the contents of your spoke cluster kubeconfig.
    • "Enter your server URL and API token for the existing cluster":
      • To get the server URL: oc login to the spoke cluster, then run oc whoami --show-server
      • To get the API token: oc login to the spoke cluster, then run oc whoami --show-token
    • "Run import commands manually": This is what the 2.3 instructions explain and is the most complicated.

Test Conditions

It may take a while for the cluster import to complete and for changes to propagate through the console operator, so you may have to run some of these commands multiple times to see the expected output.

Validation

As the import occurs, the console operator should be validating the state of ManagedCluster resources and will log validation errors before the import is complete.

When the ManagedCluster resource is first created, it will not have any spec.managedClusterClientConfigs items defined yet, which should cause output from the console operator.

  • Grep the console operator logs for "Skipping managed cluster multicluster-test, no client config found". Expect at least 1 match.

During the import process, spec.managedClusterClientConfigs will be added to the ManagedCluster at some point. The console operator validates the client config properties, and if any of the expected data is missing, errors will be logged. Note that these lines may not occur.

  • Grep the console operator logs for "Skipping managed cluster multicluster-test, client config CA bundle not found". Expect 0 or more matches.
  • Grep the console operator logs for "Skipping managed cluster multicluster-test, client config URL not found". Expect 0 or more matches.

The console operator checks for the existence of the managed-clusters ConfigMap and logs a line when it's not found.

  • Grep the console operator logs for "managed-clusters config map not found, continuing...". Expect at least 1 match.

Reconciliation

Once a valid client config is defined on the ManagedCluster resource, the console operator will begin reconciling the state on the cluster. Once everything is reconciled, two ConfigMaps should have been created in the openshift-console namespace, and the console deployment should have been updated.

  • CA Bundle ConfigMap exists and has the correct properties
oc get configmap/multicluster-test-managed-cluster-api-server-ca -n openshift-console -o yaml

Expected YAML (omitting actual CA data):

metadata:
  labels:
    app: console
    managed-cluster: multicluster-test
    managed-cluster-api-server-ca: ""
  name: multicluster-test-managed-cluster-api-server-ca
  namespace: openshift-console
data:
  ca.crt: |
    <CA_BUNDLE_DATA>

To be thorough, you can compare the CA bundle data in this ConfigMap to the data found in the ManagedCluster resource:
oc get managedcluster.cluster.open-cluster-management.io/multicluster-test -o json | jq -r '.spec.managedClusterClientConfigs[0].caBundle' | base64 --decode
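The comparison above can also be automated end to end. A hedged sketch: the resource names follow the pattern used in this test plan, and `compare_ca_bundles` is a hypothetical helper, not something the PR ships.

```shell
#!/usr/bin/env bash
# Hypothetical check: the ConfigMap's ca.crt should equal the decoded
# caBundle from the ManagedCluster's first client config.
compare_ca_bundles() {
  local cluster="$1" cm_ca mc_ca
  cm_ca=$(oc get "configmap/${cluster}-managed-cluster-api-server-ca" \
    -n openshift-console -o jsonpath='{.data.ca\.crt}')
  mc_ca=$(oc get "managedcluster.cluster.open-cluster-management.io/${cluster}" \
    -o jsonpath='{.spec.managedClusterClientConfigs[0].caBundle}' | base64 --decode)
  if [ -n "$cm_ca" ] && [ "$cm_ca" = "$mc_ca" ]; then
    echo "CA bundles match for ${cluster}"
  else
    echo "CA bundles differ for ${cluster}" >&2
    return 1
  fi
}

# Usage:
#   compare_ca_bundles multicluster-test
```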

  • managed-clusters ConfigMap exists and has the correct properties
oc get configmap/managed-clusters -n openshift-console -o yaml

Expected YAML (contents of the managed-clusters.yaml data key):

- name: multicluster-test
  apiServer:
    url: <spoke cluster API server url>
    caFile: /var/managed-cluster-certs/multicluster-test-managed-cluster-api-server-ca/ca-bundle.crt
  oauth:
    clientID: ""
    clientSecret: ""
    caFile: ""
  • console Deployment has been updated with volumes and volume mounts
oc get deploy/console -n openshift-console -o yaml

Expected YAML (only showing the properties we are looking for):

spec:
  template:
    spec:
      containers:
        - name: console
          volumeMounts:
            - name: managed-clusters
              readOnly: true
              mountPath: /var/managed-cluster-config
            - name: multicluster-test-managed-cluster-api-server-ca
              readOnly: true
              mountPath: /var/managed-cluster-certs/multicluster-test-managed-cluster-api-server-ca
      volumes:
        - name: managed-clusters
          configMap:
            name: managed-clusters
            defaultMode: 420
        - name: multicluster-test-managed-cluster-api-server-ca
          configMap:
            name: multicluster-test-managed-cluster-api-server-ca
            defaultMode: 420
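Rather than eyeballing the full deployment YAML, the volume wiring can be spot-checked with jsonpath. A minimal sketch; `check_console_volumes` is a hypothetical helper, and the volume names are the ones from the expected YAML above.

```shell
#!/usr/bin/env bash
# Hypothetical verification: confirm the expected volumes exist on the
# console deployment in openshift-console.
check_console_volumes() {
  local vols v rc=0
  vols=$(oc get deploy/console -n openshift-console \
    -o jsonpath='{.spec.template.spec.volumes[*].name}')
  for v in managed-clusters multicluster-test-managed-cluster-api-server-ca; do
    # Pad with spaces so we match whole volume names only.
    case " ${vols} " in
      *" ${v} "*) echo "volume ${v}: present" ;;
      *) echo "volume ${v}: MISSING" >&2; rc=1 ;;
    esac
  done
  return "$rc"
}
```

The volumeMounts can be checked the same way by swapping the jsonpath for `{.spec.template.spec.containers[?(@.name=="console")].volumeMounts[*].name}`.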

- Still need feature gate and OAuth work for multicluster to be fully implemented
- Add managed cluster controller that watches ACM ManagedCluster resources and builds a config file that can be consumed by the console backend
- Update console deployment sync with new volumes and volume mounts for managed cluster config and CA bundles.
- Add managed cluster and API server CA bundle ConfigMaps to subresources
- Update informer event filter utils
- Add github.com/open-cluster-management/api dependency
@openshift-ci
Contributor

openshift-ci Bot commented Oct 8, 2021

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: TheRealJon
To complete the pull request process, please assign bparees after the PR has been reviewed.
You can assign the PR to them by writing /assign @bparees in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci
Contributor

openshift-ci Bot commented Oct 8, 2021

@TheRealJon: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/verify 864b40d link true /test verify
ci/prow/unit 864b40d link true /test unit
ci/prow/e2e-aws-operator 864b40d link true /test e2e-aws-operator
ci/prow/e2e-aws-single-node 864b40d link false /test e2e-aws-single-node
ci/prow/e2e-agnostic-upgrade 864b40d link true /test e2e-agnostic-upgrade

Full PR test history. Your PR dashboard.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@yapei
Contributor

yapei commented Oct 12, 2021

Followed the test instructions and verified that importing one spoke cluster is working fine and as expected
Screen Shot 2021-10-12 at 10 49 29 AM

@openshift-ci openshift-ci Bot added the qe-approved Signifies that QE has signed off on this PR label Oct 12, 2021
@yapei
Copy link
Copy Markdown
Contributor

yapei commented Oct 12, 2021

/remove-label qe-approved

@openshift-ci openshift-ci Bot removed the qe-approved Signifies that QE has signed off on this PR label Oct 12, 2021
@yapei
Contributor

yapei commented Oct 12, 2021

  • When I try to import another spoke cluster, the previously imported cluster becomes Unknown
# oc get managedcluster.cluster.open-cluster-management.io
NAME                        HUB ACCEPTED   MANAGED CLUSTER URLS                                                       JOINED   AVAILABLE   AGE
local-cluster               true           https://api.ci-ln-0kxxyjk-f76d1.xxxxx:6443   True     True        32m
multicluster-test           true           https://api.qe-ui49-1012.qe.xxxxx:6443                  True     Unknown     30m
multicluster-test-another   true           https://api.qe-daily-1012.qe.xxxx:6443                 True     True        10m

# oc get cm -n openshift-console 
NAME                                                      DATA   AGE
managed-clusters                                          1      28m
multicluster-test-another-managed-cluster-api-server-ca   1      7m2s
multicluster-test-managed-cluster-api-server-ca           1      28m

# oc get cm managed-clusters -n openshift-console -o yaml 
apiVersion: v1
data:
  managed-clusters.yaml: |
    - name: multicluster-test
      apiServer:
        url: https://api.qe-ui49-1012.qe.xxxx:6443
        caFile: /var/managed-cluster-certs/multicluster-test-managed-cluster-api-server-ca/ca-bundle.crt
      oauth:
        clientID: ""
        clientSecret: ""
        caFile: ""
    - name: multicluster-test-another
      apiServer:
        url: https://api.qe-daily-1012.qe.xxxx:6443
        caFile: /var/managed-cluster-certs/multicluster-test-another-managed-cluster-api-server-ca/ca-bundle.crt
      oauth:
        clientID: ""
        clientSecret: ""
        caFile: ""
  • When a cluster is detached from ACM, its entry in cm/managed-clusters should be removed; however, it looks like one entry is still left even though managedcluster.cluster.open-cluster-management.io/multicluster-test-another has already been deleted.
    @TheRealJon could you help check? Shall we remove it as well? [managedcluster.cluster.open-cluster-management.io/multicluster-test was deleted and its entry in cm/managed-clusters was removed as well]
# oc get managedcluster.cluster.open-cluster-management.io
NAME            HUB ACCEPTED   MANAGED CLUSTER URLS                                                       JOINED   AVAILABLE   AGE
local-cluster   true           https://api.ci-ln-0kxxyjk-f76d1.origin-ci-int-gce.dev.openshift.com:6443   True     True        40m

# oc get cm managed-clusters -n openshift-console -o yaml 
apiVersion: v1
data:
  managed-clusters.yaml: |
    - name: multicluster-test-another
      apiServer:
        url: https://api.qe-daily-1012.qe.devcluster.openshift.com:6443
        caFile: /var/managed-cluster-certs/multicluster-test-another-managed-cluster-api-server-ca/ca-bundle.crt
      oauth:
        clientID: ""
        clientSecret: ""
        caFile: ""
kind: ConfigMap

Screen Shot 2021-10-12 at 11 24 52 AM

@TheRealJon
Member Author

Hi @yapei! Thanks for looking at this! I will take a look at the things you pointed out. For the first issue, with the managed cluster showing an Available status of "Unknown": was this accompanied by unexpected behavior in the console operator, or was it just the ACM status that was unexpected? For the second issue, I think I have an idea of what's causing it and it will need to be fixed, so thanks for catching this edge case!

@yapei
Contributor

yapei commented Oct 13, 2021

For the first issue with the managed cluster showing an Available status of "Unknown", was this accompanied by an unexpected behavior in the console operator, or just an issue with the ACM status being unexpected

@TheRealJon It is ACM that returns the 'Unknown' status (the CLI shows 'Unknown' as well), so I think it is not a console operator issue. Sorry for the confusion.

@TheRealJon TheRealJon changed the title Multicluster console operator changes CONSOLE-2833: Integrate multicluster POC operator changes Oct 14, 2021
@TheRealJon
Member Author

Closing in favor of #602, which includes these changes.

@TheRealJon TheRealJon closed this Nov 10, 2021
@TheRealJon TheRealJon deleted the CONSOLE-2833 branch February 4, 2022 19:28