Skip to content

Conversation

@smarterclayton
Copy link
Contributor

@smarterclayton smarterclayton commented Jan 11, 2019

Implement #77

  • Refactor payload syncing into its own loop, and the main loop tells it what to do and gathers status on where it is
    • Take subtle behavior of syncing and put it into the worker, not the sync loop (retries, timeouts, remembering status)
  • Writing CV status is much simpler and is based only on inputs (what the sync loop is doing + available updates + validation errors)
  • Add integration and more extensive core testing

@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 11, 2019
@openshift-ci-robot openshift-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 11, 2019
@openshift-ci-robot openshift-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 11, 2019
@openshift-ci-robot openshift-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jan 12, 2019
@smarterclayton smarterclayton force-pushed the background_sync branch 2 times, most recently from 5a8fa8e to df64a32 Compare January 13, 2019 22:01
@smarterclayton
Copy link
Contributor Author

smarterclayton commented Jan 13, 2019

I've got a working prototype of the new structure, although I'm still adding scenario unit tests.

See the comment on the commit, but basically:

  1. Create a SyncWorker background loop
  2. Have the Operator.sync() method tell the SyncWorker what the desired state is
  3. Take whatever the current SyncWorker state is and use that to craft the ClusterVersion status each sync() invocation.
  4. Whenever the SyncWorker records a change in the background, requeue the sync() loop but behind a rate limiter so we don't write too much

I think cvo_scenarios_test.go is a lot easier to read and touches all the main code paths for how the core controller works - still need to add a lot more but wanted to checkpoint from you before I moved further.

@smarterclayton
Copy link
Contributor Author

Added a single example integration test that spins up the CVO in a namespace and puts it through its paces.

@smarterclayton smarterclayton force-pushed the background_sync branch 2 times, most recently from 481bb47 to d171e6c Compare January 14, 2019 20:33
payload *updatePayload
}

// NewSyncWorker accepts a desired state via Update() and works to keep that state in sync. Once a particular
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: doesn't this belong on SyncWorker ?

@smarterclayton
Copy link
Contributor Author

smarterclayton commented Jan 14, 2019

There are now two integration tests:

  1. Test clean state -> reconcile -> new version -> reconcile
    1. Launch a CVO
    2. Verify it applies the initial payload
    3. Set desired update, verify the upgrade completes
    4. Verify reconcile doesn't update status
  2. Test clean state -> reconcile -> failing upgrade -> abort upgrade to previous version
    1. Launch a CVO
    2. Verify it applies the initial payload
    3. Set desired update to a failing payload
    4. Verify it halts and reports errors and the correct status
    5. Set desired update to previous version
    6. Ensure we reconcile back to previous version

@smarterclayton smarterclayton changed the title wip: sync: Do config syncing in the background sync: Do config syncing in the background Jan 14, 2019
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 14, 2019
@abhinavdahiya
Copy link
Contributor

@smarterclayton

Ques: looking at Update and run interactions, I might be missing something...
for a run that takes a long time, each Update is going to backoff the run call exponentially as this as we will try to read notify on next iteration of syncOnce loop ?

@abhinavdahiya
Copy link
Contributor

other wise the test refactror and sync_worker.go look 💯
looking at the cvo_integration_tests.go right now.

@abhinavdahiya
Copy link
Contributor

@smarterclayton
Ques: Is there a specific reason why cvo_integration_test.go is not in separate e2e package like this? That will force us to not use any internal structs... ?

@smarterclayton
Copy link
Contributor Author

The integration test uses the internal structs to get better testing. I think there would be a separate e2e test eventually that is more blackbox. But in this case I can verify things like resync and status info that isn't really available outside. I can also force predictable payloads (while an e2e test might verify the job part of a payload).

@abhinavdahiya
Copy link
Contributor

this change looks good to merge. is there anything else you are testing right now or should i drop a /lgtm ?

@smarterclayton
Copy link
Contributor Author

It's ready for the big lgtm

@smarterclayton
Copy link
Contributor Author

There's a lot of follow up tests I want to add

@abhinavdahiya
Copy link
Contributor

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jan 17, 2019
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: abhinavdahiya, smarterclayton

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:
  • OWNERS [abhinavdahiya,smarterclayton]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

"github.com/prometheus/client_golang/prometheus/promhttp"
"github.com/spf13/cobra"
"k8s.io/api/core/v1"
v1 "k8s.io/api/core/v1"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo

@abhinavdahiya
Copy link
Contributor

There's a lot of follow up tests I want to add

let me know if i can help with some 😇

@openshift-merge-robot openshift-merge-robot merged commit 949d6d5 into openshift:master Jan 17, 2019
wking added a commit to wking/cluster-version-operator that referenced this pull request Apr 21, 2020
As described in [1], bumping status.desired when a new
spec.desiredUpdate is set leads to trouble like:

  $ oc adm upgrade --to 4.3.13
  Updating to 4.3.13
  $ oc get -o json clusterversion version | jq -r '.status.conditions[] | .lastTransitionTime + " " + .type + " " + .status + " " + .message' | sort
  2020-04-20T20:57:12Z RetrievedUpdates True
  2020-04-20T21:23:40Z Available True Done applying 4.3.10
  2020-04-20T22:01:55Z Upgradeable False Cluster operator kube-apiserver cannot be upgraded: DefaultSecurityContextConstraintsUpgradeable: Default SecurityContextConstraints object(s) have mutated [privileged]
  2020-04-20T22:16:40Z Failing True Precondition "ClusterVersionUpgradeable" failed because of "DefaultSecurityContextConstraints_Mutated": Cluster operator kube-apiserver cannot be upgraded: DefaultSecurityContextConstraintsUpgradeable: Default SecurityContextConstraints object(s) have mutated [privileged]
  2020-04-20T22:16:40Z Progressing True Unable to apply 4.3.13: it may not be safe to apply this update
  $ oc get -o json clusterversion version | jq -r '.status.desired.version'
  4.3.13
  $ oc adm upgrade --to=4.3.13 --force
  info: Cluster is already at version 4.3.13
  $ oc adm upgrade --clear
  Cleared the update field, still at 4.3.13
  $ oc get -o json clusterversion version | jq -r '.status.desired.version'
  4.3.10

Where the "already at..." and "still at ..." messages are relying on
the status.desired semantics of [2]:

  > desired is the version that the cluster is reconciling towards.

Ideally the property would have been called 'current' or some such,
but it's too late to change it now.

When the user sets spec.desiredUpdate, they are making their intention
clear (and bumping history at this point is appropriate, because we
need *somewhere* to store the 'verified' history entry).  But to match
the "reconciling towards" semantics, this commit shifts the actual
status.desired bump so it happens right after the new CVO comes up
(with the new currentVersion).

This change reverses the earlier:

  Prefers the payload version over the operator's version...

comments from 961873d (sync: Do config syncing in the background,
2019-01-11, openshift#82).  It also sets the stage for a world in which [3] has
been fixed and the CVO continues to apply the current release's
manifests while vetting a new target's preconditions in parallel.

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1826115
[2]: https://github.com/openshift/api/blob/0f159fee64dbf711d40dac3fa2ec8b563a2aaca8/config/v1/types_cluster_version.go#L82-L87
[3]: https://bugzilla.redhat.com/show_bug.cgi?id=1822752
wking added a commit to wking/cluster-version-operator that referenced this pull request May 28, 2020
With this commit, I drop contextIsCancelled in favor of Context.Err().
From the docs [1]:

  If Done is not yet closed, Err returns nil.  If Done is closed, Err
  returns a non-nil error explaining why: Canceled if the context
  was canceled or DeadlineExceeded if the context's deadline
  passed.  After Err returns a non-nil error, successive calls to
  Err return the same error.

I dunno why we'd been checking Done() instead, but contextIsCancelled
dates back to 961873d (sync: Do config syncing in the background,
2019-01-11, openshift#82).

I've also generalized a number of *Cancel* helpers to be *Context* to
remind folks that Context.Err() can be DeadlineExceeded as well as
Canceled, and the CVO uses both WithCancel and WithTimeout.  The new
error messages will be either:

  update context deadline exceeded at 1 of 2

or:

  update context canceled at 1 of 2

Instead of always claiming:

  update was cancelled at 1 of 2

[1]: https://golang.org/pkg/context/#Context
wking added a commit to wking/cluster-version-operator that referenced this pull request May 28, 2020
One less level of nesting makes for slightly easier reading.  The
nested 'if changed' landed when sync_worker.go was born in 961873d
(sync: Do config syncing in the background, 2019-01-11, openshift#82).
wking added a commit to wking/cluster-version-operator that referenced this pull request May 28, 2020
The property landed with SyncWorker in 961873d (sync: Do config
syncing in the background, 2019-01-11, openshift#82), but has never been used.
openshift-cherrypick-robot pushed a commit to openshift-cherrypick-robot/cluster-version-operator that referenced this pull request Jun 3, 2020
With this commit, I drop contextIsCancelled in favor of Context.Err().
From the docs [1]:

  If Done is not yet closed, Err returns nil.  If Done is closed, Err
  returns a non-nil error explaining why: Canceled if the context
  was canceled or DeadlineExceeded if the context's deadline
  passed.  After Err returns a non-nil error, successive calls to
  Err return the same error.

I dunno why we'd been checking Done() instead, but contextIsCancelled
dates back to 961873d (sync: Do config syncing in the background,
2019-01-11, openshift#82).

I've also generalized a number of *Cancel* helpers to be *Context* to
remind folks that Context.Err() can be DeadlineExceeded as well as
Canceled, and the CVO uses both WithCancel and WithTimeout.  The new
error messages will be either:

  update context deadline exceeded at 1 of 2

or:

  update context canceled at 1 of 2

Instead of always claiming:

  update was cancelled at 1 of 2

[1]: https://golang.org/pkg/context/#Context
wking added a commit to wking/cluster-version-operator that referenced this pull request Jun 3, 2020
With this commit, I drop contextIsCancelled in favor of Context.Err().
From the docs [1]:

  If Done is not yet closed, Err returns nil.  If Done is closed, Err
  returns a non-nil error explaining why: Canceled if the context
  was canceled or DeadlineExceeded if the context's deadline
  passed.  After Err returns a non-nil error, successive calls to
  Err return the same error.

I dunno why we'd been checking Done() instead, but contextIsCancelled
dates back to 961873d (sync: Do config syncing in the background,
2019-01-11, openshift#82).

I've also generalized a number of *Cancel* helpers to be *Context* to
remind folks that Context.Err() can be DeadlineExceeded as well as
Canceled, and the CVO uses both WithCancel and WithTimeout.  The new
error messages will be either:

  update context deadline exceeded at 1 of 2

or:

  update context canceled at 1 of 2

Instead of always claiming:

  update was cancelled at 1 of 2

Cherry-picked from eea2092 (pkg/cvo/sync_worker: Generalize
CancelError to ContextError, 2020-05-28, openshift#378) and edited to resolve
context conflicts because release-4.4 lacks 2a469e3 (cvo: When
installing or upgrading, fast-fill cluster-operators, 2020-02-07, openshift#318).

[1]: https://golang.org/pkg/context/#Context
openshift-cherrypick-robot pushed a commit to openshift-cherrypick-robot/cluster-version-operator that referenced this pull request Jun 4, 2020
With this commit, I drop contextIsCancelled in favor of Context.Err().
From the docs [1]:

  If Done is not yet closed, Err returns nil.  If Done is closed, Err
  returns a non-nil error explaining why: Canceled if the context
  was canceled or DeadlineExceeded if the context's deadline
  passed.  After Err returns a non-nil error, successive calls to
  Err return the same error.

I dunno why we'd been checking Done() instead, but contextIsCancelled
dates back to 961873d (sync: Do config syncing in the background,
2019-01-11, openshift#82).

I've also generalized a number of *Cancel* helpers to be *Context* to
remind folks that Context.Err() can be DeadlineExceeded as well as
Canceled, and the CVO uses both WithCancel and WithTimeout.  The new
error messages will be either:

  update context deadline exceeded at 1 of 2

or:

  update context canceled at 1 of 2

Instead of always claiming:

  update was cancelled at 1 of 2

Cherry-picked from eea2092 (pkg/cvo/sync_worker: Generalize
CancelError to ContextError, 2020-05-28, openshift#378) and edited to resolve
context conflicts because release-4.4 lacks 2a469e3 (cvo: When
installing or upgrading, fast-fill cluster-operators, 2020-02-07, openshift#318).

[1]: https://golang.org/pkg/context/#Context
openshift-cherrypick-robot pushed a commit to openshift-cherrypick-robot/cluster-version-operator that referenced this pull request Jun 11, 2020
With this commit, I drop contextIsCancelled in favor of Context.Err().
From the docs [1]:

  If Done is not yet closed, Err returns nil.  If Done is closed, Err
  returns a non-nil error explaining why: Canceled if the context
  was canceled or DeadlineExceeded if the context's deadline
  passed.  After Err returns a non-nil error, successive calls to
  Err return the same error.

I dunno why we'd been checking Done() instead, but contextIsCancelled
dates back to 961873d (sync: Do config syncing in the background,
2019-01-11, openshift#82).

I've also generalized a number of *Cancel* helpers to be *Context* to
remind folks that Context.Err() can be DeadlineExceeded as well as
Canceled, and the CVO uses both WithCancel and WithTimeout.  The new
error messages will be either:

  update context deadline exceeded at 1 of 2

or:

  update context canceled at 1 of 2

Instead of always claiming:

  update was cancelled at 1 of 2

Cherry-picked from eea2092 (pkg/cvo/sync_worker: Generalize
CancelError to ContextError, 2020-05-28, openshift#378) and edited to resolve
context conflicts because release-4.4 lacks 2a469e3 (cvo: When
installing or upgrading, fast-fill cluster-operators, 2020-02-07, openshift#318).

[1]: https://golang.org/pkg/context/#Context
wking added a commit to wking/cluster-version-operator that referenced this pull request Oct 9, 2020
This line is descended from 961873d (sync: Do config syncing in the
background, 2019-01-11, openshift#82).  But we triggered it recently with
88c222c (install/0000_00_cluster-version-operator_03_deployment:
Bump to --v=5, 2020-08-30, openshift#448).  Drop the line, because it's noisy
spew in the log files, with all of the small "fraction-completed"
bumps:

  $ grep -c 'sync_worker.go:.*Status change' namespaces/openshift-cluster-version/pods/cluster-version-operator-*/cluster-version-operator/cluster-version-operator/logs/current.log6265
  $ grep 'sync_worker.go:.*Status change' namespaces/openshift-cluster-version/pods/cluster-version-operator-*/cluster-version-operator/cluster-version-operator/logs/current.log | tail -n42020-10-09T18:21:53.523398340Z I1009 18:21:53.522733       1 sync_worker.go:458] Status change cvo.SyncWorkerStatus{Generation:1, Step:"ApplyResources", Failure:error(nil), Fraction:0.9936, Completed:0, Reconciling:true, Initial:false, VersionHash:"NtNVAdtksjk=", LastProgress:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}, Actual:v1.Release{Version:"4.7.0-0.ci.test-2020-10-09-171926-ci-op-8r50wp9l", Image:"registry.build01.ci.openshift.org/ci-op-8r50wp9l/release@sha256:9c90e64d7ccec7475608daf3e7a21ed749d047a330203734501eb7bc95c65ac8", URL:"", Channels:[]string(nil)}, Verified:false}
  2020-10-09T18:21:53.538399862Z I1009 18:21:53.537709       1 sync_worker.go:458] Status change cvo.SyncWorkerStatus{Generation:1, Step:"ApplyResources", Failure:error(nil), Fraction:0.9952, Completed:0, Reconciling:true, Initial:false, VersionHash:"NtNVAdtksjk=", LastProgress:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}, Actual:v1.Release{Version:"4.7.0-0.ci.test-2020-10-09-171926-ci-op-8r50wp9l", Image:"registry.build01.ci.openshift.org/ci-op-8r50wp9l/release@sha256:9c90e64d7ccec7475608daf3e7a21ed749d047a330203734501eb7bc95c65ac8", URL:"", Channels:[]string(nil)}, Verified:false}
  2020-10-09T18:21:53.570915921Z I1009 18:21:53.570847       1 sync_worker.go:458] Status change cvo.SyncWorkerStatus{Generation:1, Step:"ApplyResources", Failure:error(nil), Fraction:0.9984, Completed:0, Reconciling:true, Initial:false, VersionHash:"NtNVAdtksjk=", LastProgress:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}, Actual:v1.Release{Version:"4.7.0-0.ci.test-2020-10-09-171926-ci-op-8r50wp9l", Image:"registry.build01.ci.openshift.org/ci-op-8r50wp9l/release@sha256:9c90e64d7ccec7475608daf3e7a21ed749d047a330203734501eb7bc95c65ac8", URL:"", Channels:[]string(nil)}, Verified:false}
  2020-10-09T18:21:53.585933980Z I1009 18:21:53.585786       1 sync_worker.go:458] Status change cvo.SyncWorkerStatus{Generation:1, Step:"", Failure:error(nil), Fraction:1, Completed:10, Reconciling:true, Initial:false, VersionHash:"NtNVAdtksjk=", LastProgress:time.Time{wall:0xbfd8487062ea4a14, ext:2681032283239, loc:(*time.Location)(0x26b0400)}, Actual:v1.Release{Version:"4.7.0-0.ci.test-2020-10-09-171926-ci-op-8r50wp9l", Image:"registry.build01.ci.openshift.org/ci-op-8r50wp9l/release@sha256:9c90e64d7ccec7475608daf3e7a21ed749d047a330203734501eb7bc95c65ac8", URL:"", Channels:[]string(nil)}, Verified:false}
wking added a commit to wking/cluster-version-operator that referenced this pull request Oct 9, 2020
This line is descended from 961873d (sync: Do config syncing in the
background, 2019-01-11, openshift#82).  But we triggered it recently with
88c222c (install/0000_00_cluster-version-operator_03_deployment:
Bump to --v=5, 2020-08-30, openshift#448).  Drop the line, because it's noisy
spew in the log files, with all of the small "fraction-completed"
bumps:

  $ grep -c 'sync_worker.go:.*Status change' namespaces/openshift-cluster-version/pods/cluster-version-operator-*/cluster-version-operator/cluster-version-operator/logs/current.log6265
  $ grep 'sync_worker.go:.*Status change' namespaces/openshift-cluster-version/pods/cluster-version-operator-*/cluster-version-operator/cluster-version-operator/logs/current.log | tail -n4
  2020-10-09T18:21:53.523398340Z I1009 18:21:53.522733       1 sync_worker.go:458] Status change cvo.SyncWorkerStatus{Generation:1, Step:"ApplyResources", Failure:error(nil), Fraction:0.9936, Completed:0, Reconciling:true, Initial:false, VersionHash:"NtNVAdtksjk=", LastProgress:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}, Actual:v1.Release{Version:"4.7.0-0.ci.test-2020-10-09-171926-ci-op-8r50wp9l", Image:"registry.build01.ci.openshift.org/ci-op-8r50wp9l/release@sha256:9c90e64d7ccec7475608daf3e7a21ed749d047a330203734501eb7bc95c65ac8", URL:"", Channels:[]string(nil)}, Verified:false}
  2020-10-09T18:21:53.538399862Z I1009 18:21:53.537709       1 sync_worker.go:458] Status change cvo.SyncWorkerStatus{Generation:1, Step:"ApplyResources", Failure:error(nil), Fraction:0.9952, Completed:0, Reconciling:true, Initial:false, VersionHash:"NtNVAdtksjk=", LastProgress:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}, Actual:v1.Release{Version:"4.7.0-0.ci.test-2020-10-09-171926-ci-op-8r50wp9l", Image:"registry.build01.ci.openshift.org/ci-op-8r50wp9l/release@sha256:9c90e64d7ccec7475608daf3e7a21ed749d047a330203734501eb7bc95c65ac8", URL:"", Channels:[]string(nil)}, Verified:false}
  2020-10-09T18:21:53.570915921Z I1009 18:21:53.570847       1 sync_worker.go:458] Status change cvo.SyncWorkerStatus{Generation:1, Step:"ApplyResources", Failure:error(nil), Fraction:0.9984, Completed:0, Reconciling:true, Initial:false, VersionHash:"NtNVAdtksjk=", LastProgress:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}, Actual:v1.Release{Version:"4.7.0-0.ci.test-2020-10-09-171926-ci-op-8r50wp9l", Image:"registry.build01.ci.openshift.org/ci-op-8r50wp9l/release@sha256:9c90e64d7ccec7475608daf3e7a21ed749d047a330203734501eb7bc95c65ac8", URL:"", Channels:[]string(nil)}, Verified:false}
  2020-10-09T18:21:53.585933980Z I1009 18:21:53.585786       1 sync_worker.go:458] Status change cvo.SyncWorkerStatus{Generation:1, Step:"", Failure:error(nil), Fraction:1, Completed:10, Reconciling:true, Initial:false, VersionHash:"NtNVAdtksjk=", LastProgress:time.Time{wall:0xbfd8487062ea4a14, ext:2681032283239, loc:(*time.Location)(0x26b0400)}, Actual:v1.Release{Version:"4.7.0-0.ci.test-2020-10-09-171926-ci-op-8r50wp9l", Image:"registry.build01.ci.openshift.org/ci-op-8r50wp9l/release@sha256:9c90e64d7ccec7475608daf3e7a21ed749d047a330203734501eb7bc95c65ac8", URL:"", Channels:[]string(nil)}, Verified:false}
wking added a commit to wking/cluster-version-operator that referenced this pull request Oct 9, 2020
And also the "Status report channel was full" lines.  These lines are
descended from 961873d (sync: Do config syncing in the background,
2019-01-11, openshift#82).  But we triggered it recently with 88c222c
(install/0000_00_cluster-version-operator_03_deployment: Bump to
--v=5, 2020-08-30, openshift#448).  Drop the line, because it's noisy spew in
the log files, with all of the small "fraction-completed" bumps:

  $ grep -c 'sync_worker.go:.*Status change' namespaces/openshift-cluster-version/pods/cluster-version-operator-*/cluster-version-operator/cluster-version-operator/logs/current.log
  6265
  $ grep -c 'sync_worker.go:.*Status report channel was full' namespaces/openshift-cluster-version/pods/cluster-version-operator-*/cluster-version-operator/cluster-version-operator/logs/current.log
  5586
  $ grep 'sync_worker.go:.*Status change' namespaces/openshift-cluster-version/pods/cluster-version-operator-*/cluster-version-operator/cluster-version-operator/logs/current.log | tail -n4
  2020-10-09T18:21:53.523398340Z I1009 18:21:53.522733       1 sync_worker.go:458] Status change cvo.SyncWorkerStatus{Generation:1, Step:"ApplyResources", Failure:error(nil), Fraction:0.9936, Completed:0, Reconciling:true, Initial:false, VersionHash:"NtNVAdtksjk=", LastProgress:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}, Actual:v1.Release{Version:"4.7.0-0.ci.test-2020-10-09-171926-ci-op-8r50wp9l", Image:"registry.build01.ci.openshift.org/ci-op-8r50wp9l/release@sha256:9c90e64d7ccec7475608daf3e7a21ed749d047a330203734501eb7bc95c65ac8", URL:"", Channels:[]string(nil)}, Verified:false}
  2020-10-09T18:21:53.538399862Z I1009 18:21:53.537709       1 sync_worker.go:458] Status change cvo.SyncWorkerStatus{Generation:1, Step:"ApplyResources", Failure:error(nil), Fraction:0.9952, Completed:0, Reconciling:true, Initial:false, VersionHash:"NtNVAdtksjk=", LastProgress:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}, Actual:v1.Release{Version:"4.7.0-0.ci.test-2020-10-09-171926-ci-op-8r50wp9l", Image:"registry.build01.ci.openshift.org/ci-op-8r50wp9l/release@sha256:9c90e64d7ccec7475608daf3e7a21ed749d047a330203734501eb7bc95c65ac8", URL:"", Channels:[]string(nil)}, Verified:false}
  2020-10-09T18:21:53.570915921Z I1009 18:21:53.570847       1 sync_worker.go:458] Status change cvo.SyncWorkerStatus{Generation:1, Step:"ApplyResources", Failure:error(nil), Fraction:0.9984, Completed:0, Reconciling:true, Initial:false, VersionHash:"NtNVAdtksjk=", LastProgress:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}, Actual:v1.Release{Version:"4.7.0-0.ci.test-2020-10-09-171926-ci-op-8r50wp9l", Image:"registry.build01.ci.openshift.org/ci-op-8r50wp9l/release@sha256:9c90e64d7ccec7475608daf3e7a21ed749d047a330203734501eb7bc95c65ac8", URL:"", Channels:[]string(nil)}, Verified:false}
  2020-10-09T18:21:53.585933980Z I1009 18:21:53.585786       1 sync_worker.go:458] Status change cvo.SyncWorkerStatus{Generation:1, Step:"", Failure:error(nil), Fraction:1, Completed:10, Reconciling:true, Initial:false, VersionHash:"NtNVAdtksjk=", LastProgress:time.Time{wall:0xbfd8487062ea4a14, ext:2681032283239, loc:(*time.Location)(0x26b0400)}, Actual:v1.Release{Version:"4.7.0-0.ci.test-2020-10-09-171926-ci-op-8r50wp9l", Image:"registry.build01.ci.openshift.org/ci-op-8r50wp9l/release@sha256:9c90e64d7ccec7475608daf3e7a21ed749d047a330203734501eb7bc95c65ac8", URL:"", Channels:[]string(nil)}, Verified:false}
wking added a commit to openshift-cherrypick-robot/cluster-version-operator that referenced this pull request Oct 9, 2020
And also the "Status report channel was full" lines.  These lines are
descended from 961873d (sync: Do config syncing in the background,
2019-01-11, openshift#82).  But we triggered it recently with 88c222c
(install/0000_00_cluster-version-operator_03_deployment: Bump to
--v=5, 2020-08-30, openshift#448).  Drop the line, because it's noisy spew in
the log files, with all of the small "fraction-completed" bumps:

  $ grep -c 'sync_worker.go:.*Status change' namespaces/openshift-cluster-version/pods/cluster-version-operator-*/cluster-version-operator/cluster-version-operator/logs/current.log
  6265
  $ grep -c 'sync_worker.go:.*Status report channel was full' namespaces/openshift-cluster-version/pods/cluster-version-operator-*/cluster-version-operator/cluster-version-operator/logs/current.log
  5586
  $ grep 'sync_worker.go:.*Status change' namespaces/openshift-cluster-version/pods/cluster-version-operator-*/cluster-version-operator/cluster-version-operator/logs/current.log | tail -n4
  2020-10-09T18:21:53.523398340Z I1009 18:21:53.522733       1 sync_worker.go:458] Status change cvo.SyncWorkerStatus{Generation:1, Step:"ApplyResources", Failure:error(nil), Fraction:0.9936, Completed:0, Reconciling:true, Initial:false, VersionHash:"NtNVAdtksjk=", LastProgress:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}, Actual:v1.Release{Version:"4.7.0-0.ci.test-2020-10-09-171926-ci-op-8r50wp9l", Image:"registry.build01.ci.openshift.org/ci-op-8r50wp9l/release@sha256:9c90e64d7ccec7475608daf3e7a21ed749d047a330203734501eb7bc95c65ac8", URL:"", Channels:[]string(nil)}, Verified:false}
  2020-10-09T18:21:53.538399862Z I1009 18:21:53.537709       1 sync_worker.go:458] Status change cvo.SyncWorkerStatus{Generation:1, Step:"ApplyResources", Failure:error(nil), Fraction:0.9952, Completed:0, Reconciling:true, Initial:false, VersionHash:"NtNVAdtksjk=", LastProgress:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}, Actual:v1.Release{Version:"4.7.0-0.ci.test-2020-10-09-171926-ci-op-8r50wp9l", Image:"registry.build01.ci.openshift.org/ci-op-8r50wp9l/release@sha256:9c90e64d7ccec7475608daf3e7a21ed749d047a330203734501eb7bc95c65ac8", URL:"", Channels:[]string(nil)}, Verified:false}
  2020-10-09T18:21:53.570915921Z I1009 18:21:53.570847       1 sync_worker.go:458] Status change cvo.SyncWorkerStatus{Generation:1, Step:"ApplyResources", Failure:error(nil), Fraction:0.9984, Completed:0, Reconciling:true, Initial:false, VersionHash:"NtNVAdtksjk=", LastProgress:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}, Actual:v1.Release{Version:"4.7.0-0.ci.test-2020-10-09-171926-ci-op-8r50wp9l", Image:"registry.build01.ci.openshift.org/ci-op-8r50wp9l/release@sha256:9c90e64d7ccec7475608daf3e7a21ed749d047a330203734501eb7bc95c65ac8", URL:"", Channels:[]string(nil)}, Verified:false}
  2020-10-09T18:21:53.585933980Z I1009 18:21:53.585786       1 sync_worker.go:458] Status change cvo.SyncWorkerStatus{Generation:1, Step:"", Failure:error(nil), Fraction:1, Completed:10, Reconciling:true, Initial:false, VersionHash:"NtNVAdtksjk=", LastProgress:time.Time{wall:0xbfd8487062ea4a14, ext:2681032283239, loc:(*time.Location)(0x26b0400)}, Actual:v1.Release{Version:"4.7.0-0.ci.test-2020-10-09-171926-ci-op-8r50wp9l", Image:"registry.build01.ci.openshift.org/ci-op-8r50wp9l/release@sha256:9c90e64d7ccec7475608daf3e7a21ed749d047a330203734501eb7bc95c65ac8", URL:"", Channels:[]string(nil)}, Verified:false}
wking added a commit to openshift-cherrypick-robot/cluster-version-operator that referenced this pull request Oct 9, 2020
And also the "Status report channel was full" lines.  These lines are
descended from 961873d (sync: Do config syncing in the background,
2019-01-11, openshift#82).  But we triggered it recently with 88c222c
(install/0000_00_cluster-version-operator_03_deployment: Bump to
--v=5, 2020-08-30, openshift#448).  Drop the line, because it's noisy spew in
the log files, with all of the small "fraction-completed" bumps:

  $ grep -c 'sync_worker.go:.*Status change' namespaces/openshift-cluster-version/pods/cluster-version-operator-*/cluster-version-operator/cluster-version-operator/logs/current.log
  6265
  $ grep -c 'sync_worker.go:.*Status report channel was full' namespaces/openshift-cluster-version/pods/cluster-version-operator-*/cluster-version-operator/cluster-version-operator/logs/current.log
  5586
  $ grep 'sync_worker.go:.*Status change' namespaces/openshift-cluster-version/pods/cluster-version-operator-*/cluster-version-operator/cluster-version-operator/logs/current.log | tail -n4
  2020-10-09T18:21:53.523398340Z I1009 18:21:53.522733       1 sync_worker.go:458] Status change cvo.SyncWorkerStatus{Generation:1, Step:"ApplyResources", Failure:error(nil), Fraction:0.9936, Completed:0, Reconciling:true, Initial:false, VersionHash:"NtNVAdtksjk=", LastProgress:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}, Actual:v1.Release{Version:"4.7.0-0.ci.test-2020-10-09-171926-ci-op-8r50wp9l", Image:"registry.build01.ci.openshift.org/ci-op-8r50wp9l/release@sha256:9c90e64d7ccec7475608daf3e7a21ed749d047a330203734501eb7bc95c65ac8", URL:"", Channels:[]string(nil)}, Verified:false}
  2020-10-09T18:21:53.538399862Z I1009 18:21:53.537709       1 sync_worker.go:458] Status change cvo.SyncWorkerStatus{Generation:1, Step:"ApplyResources", Failure:error(nil), Fraction:0.9952, Completed:0, Reconciling:true, Initial:false, VersionHash:"NtNVAdtksjk=", LastProgress:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}, Actual:v1.Release{Version:"4.7.0-0.ci.test-2020-10-09-171926-ci-op-8r50wp9l", Image:"registry.build01.ci.openshift.org/ci-op-8r50wp9l/release@sha256:9c90e64d7ccec7475608daf3e7a21ed749d047a330203734501eb7bc95c65ac8", URL:"", Channels:[]string(nil)}, Verified:false}
  2020-10-09T18:21:53.570915921Z I1009 18:21:53.570847       1 sync_worker.go:458] Status change cvo.SyncWorkerStatus{Generation:1, Step:"ApplyResources", Failure:error(nil), Fraction:0.9984, Completed:0, Reconciling:true, Initial:false, VersionHash:"NtNVAdtksjk=", LastProgress:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}, Actual:v1.Release{Version:"4.7.0-0.ci.test-2020-10-09-171926-ci-op-8r50wp9l", Image:"registry.build01.ci.openshift.org/ci-op-8r50wp9l/release@sha256:9c90e64d7ccec7475608daf3e7a21ed749d047a330203734501eb7bc95c65ac8", URL:"", Channels:[]string(nil)}, Verified:false}
  2020-10-09T18:21:53.585933980Z I1009 18:21:53.585786       1 sync_worker.go:458] Status change cvo.SyncWorkerStatus{Generation:1, Step:"", Failure:error(nil), Fraction:1, Completed:10, Reconciling:true, Initial:false, VersionHash:"NtNVAdtksjk=", LastProgress:time.Time{wall:0xbfd8487062ea4a14, ext:2681032283239, loc:(*time.Location)(0x26b0400)}, Actual:v1.Release{Version:"4.7.0-0.ci.test-2020-10-09-171926-ci-op-8r50wp9l", Image:"registry.build01.ci.openshift.org/ci-op-8r50wp9l/release@sha256:9c90e64d7ccec7475608daf3e7a21ed749d047a330203734501eb7bc95c65ac8", URL:"", Channels:[]string(nil)}, Verified:false}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants