@hexfusion
Contributor

@hexfusion hexfusion commented Jan 15, 2020

This PR adds the cluster-etcd-operator to the CVO operator payload and sets it to a Managed state.

We are also reverting the static-sync controller to use a generic operator informer and removing it from WaitForCacheSync. This is only a short-term precaution: in testing we observed a failure case where WaitForCacheSync appeared to block controller actions that are vital to the bootstrapping process.
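To illustrate the concern about WaitForCacheSync, here is a minimal, stdlib-only Go sketch of how waiting on an informer that never reports synced can keep a controller from starting. It is a simplified stand-in, not client-go and not the operator's actual code: `waitForCacheSync`, `runController`, `alwaysSynced`, `neverSynced`, and `demoTimeout` are all hypothetical names, and the PR does not confirm that this is the exact failure mode observed.

```go
package main

import (
	"fmt"
	"time"
)

// informerSynced reports whether a (hypothetical) informer's cache is populated.
type informerSynced func() bool

// demoTimeout is an arbitrary value chosen for this illustration.
const demoTimeout = 50 * time.Millisecond

// alwaysSynced and neverSynced model a healthy informer and one that
// cannot sync, for example because its resource is not served yet.
func alwaysSynced() bool { return true }
func neverSynced() bool  { return false }

// waitForCacheSync polls until every synced func returns true or stop is
// closed. It is a simplified stand-in for the client-go helper with a
// similar name, written only to show the blocking behavior.
func waitForCacheSync(stop <-chan struct{}, synced ...informerSynced) bool {
	ticker := time.NewTicker(10 * time.Millisecond)
	defer ticker.Stop()
	for {
		all := true
		for _, s := range synced {
			if !s() {
				all = false
				break
			}
		}
		if all {
			return true
		}
		select {
		case <-stop:
			return false
		case <-ticker.C:
		}
	}
}

// runController reports whether the controller would have started its
// work within the timeout, given the informers it waits on.
func runController(timeout time.Duration, synced ...informerSynced) bool {
	stop := make(chan struct{})
	timer := time.AfterFunc(timeout, func() { close(stop) })
	defer timer.Stop()
	return waitForCacheSync(stop, synced...)
}

func main() {
	// Waiting on an informer that never syncs blocks the controller.
	fmt.Println("waiting on stuck informer:", runController(demoTimeout, alwaysSynced, neverSynced))
	// Dropping it from the wait list lets the controller proceed.
	fmt.Println("stuck informer removed:", runController(demoTimeout, alwaysSynced))
}
```

In this sketch, removing the stuck informer from the wait list lets the controller start at the cost of possibly acting on an unpopulated cache, which matches the description of this change as a short-term precaution.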

@openshift-ci-robot added the do-not-merge/work-in-progress, size/S, and approved labels Jan 15, 2020
@alaypatel07
Contributor

/test e2e-aws

@hexfusion
Contributor Author

Azure seems fine to me. I think a slight delay in rolling out the cluster may be causing the timeout. This was a manual run I just did with this CI release image.

level=debug msg="Still waiting for the cluster to initialize: Working towards 0.0.1-2020-01-24-134140"
level=debug msg="Still waiting for the cluster to initialize: Working towards 0.0.1-2020-01-24-134140: downloading update"
level=debug msg="Still waiting for the cluster to initialize: Working towards 0.0.1-2020-01-24-134140: 1% complete"
level=debug msg="Still waiting for the cluster to initialize: Working towards 0.0.1-2020-01-24-134140: 8% complete"
level=debug msg="Still waiting for the cluster to initialize: Working towards 0.0.1-2020-01-24-134140: 13% complete"
level=debug msg="Still waiting for the cluster to initialize: Working towards 0.0.1-2020-01-24-134140: 64% complete"
level=debug msg="Still waiting for the cluster to initialize: Working towards 0.0.1-2020-01-24-134140: 78% complete"
level=debug msg="Still waiting for the cluster to initialize: Working towards 0.0.1-2020-01-24-134140: 83% complete"
level=debug msg="Still waiting for the cluster to initialize: Working towards 0.0.1-2020-01-24-134140: 84% complete"
level=debug msg="Still waiting for the cluster to initialize: Working towards 0.0.1-2020-01-24-134140: 97% complete"
level=debug msg="Still waiting for the cluster to initialize: Working towards 0.0.1-2020-01-24-134140: 98% complete"
level=debug msg="Still waiting for the cluster to initialize: Working towards 0.0.1-2020-01-24-134140: 98% complete, waiting on authentication, console, monitoring, openshift-apiserver, openshift-samples"
level=debug msg="Still waiting for the cluster to initialize: Working towards 0.0.1-2020-01-24-134140: 99% complete"
level=debug msg="Still waiting for the cluster to initialize: Working towards 0.0.1-2020-01-24-134140: 100% complete, waiting on authentication, console"
level=debug msg="Still waiting for the cluster to initialize: Working towards 0.0.1-2020-01-24-134140: 100% complete"
level=debug msg="Still waiting for the cluster to initialize: Working towards 0.0.1-2020-01-24-134140: 100% complete"
level=debug msg="Still waiting for the cluster to initialize: Some cluster operators are still updating: authentication, console"
level=debug msg="Still waiting for the cluster to initialize: Working towards 0.0.1-2020-01-24-134140: 100% complete"
level=debug msg="Still waiting for the cluster to initialize: Some cluster operators are still updating: authentication, console"
level=error msg="Cluster operator authentication Degraded is True with RouteHealth_FailedGet: RouteHealthDegraded: failed to GET route: dial tcp: lookup oauth-openshift.apps.sbatsche-ceo-53-1-2020-01-24.catchall.azure.devcluster.openshift.com on 172.30.0.10:53: no such host"
level=info msg="Cluster operator authentication Progressing is Unknown with NoData: "
level=info msg="Cluster operator authentication Available is Unknown with NoData: "
level=info msg="Cluster operator console Progressing is True with SyncLoopRefreshProgressingInProgress: SyncLoopRefreshProgressing: Working toward version 0.0.1-2020-01-24-134140"
level=info msg="Cluster operator console Available is False with DeploymentAvailableInsufficientReplicas: DeploymentAvailable: 0 pods available for console deployment"
level=info msg="Cluster operator insights Disabled is True with Disabled: Health reporting is disabled"
level=fatal msg="failed to initialize the cluster: Some cluster operators are still updating: authentication, console"
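When scanning installer output like the log above, the operators blocking initialization can be extracted from the "still updating" lines. The following is a rough Go sketch under the assumption that the message format is exactly as shown in this log; `pendingOperators` and `stillUpdatingMarker` are hypothetical names and not part of the installer.

```go
package main

import (
	"fmt"
	"strings"
)

// stillUpdatingMarker is the phrase that precedes the operator list in
// the log lines quoted above (assumed stable for this illustration).
const stillUpdatingMarker = "Some cluster operators are still updating: "

// pendingOperators returns the operator names listed after the marker,
// or nil when the line does not contain it.
func pendingOperators(line string) []string {
	i := strings.Index(line, stillUpdatingMarker)
	if i < 0 {
		return nil
	}
	// Drop surrounding whitespace and the closing quote of the msg="..." field.
	rest := strings.TrimSuffix(strings.TrimSpace(line[i+len(stillUpdatingMarker):]), `"`)
	var ops []string
	for _, op := range strings.Split(rest, ",") {
		if op = strings.TrimSpace(op); op != "" {
			ops = append(ops, op)
		}
	}
	return ops
}

func main() {
	line := `level=fatal msg="failed to initialize the cluster: Some cluster operators are still updating: authentication, console"`
	for _, op := range pendingOperators(line) {
		fmt.Println(op)
	}
}
```

For the final fatal line in this run, the sketch yields authentication and console, the two operators whose conditions are reported just before it.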

@hexfusion
Contributor Author

hexfusion commented Jan 24, 2020

GCP will need the MCO PR [1].

[1] openshift/machine-config-operator#1408

Update: merged

@hexfusion changed the title from "wip for testing" to "*: add cluster-etcd-operator to CVO operator payload" Jan 24, 2020
@openshift-ci-robot removed the do-not-merge/work-in-progress label Jan 24, 2020
…factory

This reverts commits

- 78dbdc9
- bb9b498

for

pkg/cmd/staticpodcontroller/staticpodcontroller.go

Signed-off-by: Sam Batschelet <sbatsche@redhat.com>
@hexfusion
Contributor Author

cc @smarterclayton @crawford

@hexfusion
Contributor Author

/hold

for further testing

@openshift-ci-robot added the do-not-merge/hold label Jan 24, 2020
@hexfusion
Contributor Author

hexfusion commented Jan 24, 2020

The AWS failure is a known issue.

ref: https://coreos.slack.com/archives/C999USB0D/p1579889171233100

/test e2e-aws

NOTE: this did not appear to be known.

@hexfusion
Contributor Author

/test e2e-azure

@hexfusion
Contributor Author

/test e2e-aws

@alaypatel07
Contributor

level=info msg="Cluster operator kube-scheduler Progressing is True with : Progressing: 3 nodes are at revision 4; 0 nodes have achieved new revision 5"
level=info msg="Cluster operator machine-config Available is False with : Cluster not available for 0.0.1-2020-01-24-192655"
level=error msg="Cluster operator machine-config Degraded is True with RequiredPoolsFailed: Failed to resync 0.0.1-2020-01-24-192655 because: timed out waiting for the condition during syncRequiredMachineConfigPools: error pool master is not ready, retrying. Status: (pool degraded: false total: 3, ready 2, updated: 3, unavailable: 1)"
level=info msg="Cluster operator monitoring Progressing is True with RollOutInProgress: Rolling out the stack."
level=error msg="Cluster operator monitoring Degraded is True with UpdatingnodeExporterFailed: Failed to rollout

/test e2e-azure

@alaypatel07
Contributor

staticsync will have to be refined, but for now, let's get this in!!

/lgtm

@openshift-ci-robot added the lgtm label Jan 25, 2020
@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alaypatel07, hexfusion

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [alaypatel07,hexfusion]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@hexfusion
Contributor Author

hexfusion commented Jan 25, 2020

/hold

waiting for cluster bot upgrade tests to complete

@hexfusion
Contributor Author

/test e2e-gcp-upgrade

@hexfusion
Contributor Author

Given the above, I am making the call to move forward.

/hold cancel

@openshift-ci-robot removed the do-not-merge/hold label Jan 25, 2020
@openshift-merge-robot merged commit aec838e into openshift:master Jan 25, 2020