-
Notifications
You must be signed in to change notification settings - Fork 213
NO-JIRA: task graph: test speedup and code cleaup #1101
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NO-JIRA: task graph: test speedup and code cleaup #1101
Conversation
Given correct synchronization I do not see any value in the `Sleep`, it just makes the test much slower: ```console $ go test --count 10 --run TestRunGraph/cancelation_without_task_errors ./pkg/payload/ ok github.com/openshift/cluster-version-operator/pkg/payload 10.041s $ git checkout ocpbugs-22442-test-run-graph-mid-task-cancellation-flake ... $ go test --count 10 --run TestRunGraph/cancellation_without_task_errors ./pkg/payload/ ok github.com/openshift/cluster-version-operator/pkg/payload 0.043s ```
We only start draining `workCh` when canceled (`ctx.Done()`). We never submit more work once canceled so reseting `submitted` record is useless
|
@petr-muller: This pull request explicitly references no jira issue. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
| case <-ctx.Done(): | ||
| select { | ||
| case runTask := <-workCh: // workers canceled, so remove any work from the queue ourselves | ||
| case <-workCh: // workers canceled, so remove any work from the queue ourselves |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT:
With submitted[runTask.index] = false removed, perhaps we can update the comment as well?
| case <-workCh: // workers canceled, so remove any work from the queue ourselves | |
| case <-workCh: // Reset the inflight counter as workers got canceled. We never submit more work once canceled and thus no need to reset the submitted records. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The important part is still the "remove any work from the queue" though - that needs to stay. The channel gets drained (eventually...).
I'm not convinced about the usefulness of mentioning the submitted records - it explains the absence of code, but that only makes sense attached to a commit/code change, does not make that much sense in the actual code? "We are not doing something because it does not need to be done"-like comment IMO makes sense only if that absence is actually surprising, which does not seem to be the case here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the "remove any work from the queue" though - that needs to stay.
I think i got it now.
"remove any work from the queue ourselves" corresponds to <-workCh (that is why they stay in the same line).
My mistake was I thought the comment was for the code below, currently having only inflight-- which i couldn't relate to the comment. Actually, inflight-- is the additional things we need to do when <-workCh takes place.
We are not doing something because it does not need to be done
I got the point.
like comment IMO makes sense only if that absence is actually surprising,
It would probably surprise the author of "submitted[runTask.index] = false". 😉 When we push a task, we do submitted[nextNode] = true, my intuitive feeling is that we would set it to false when it is removed from the queue before the task is completed. With a comment, it would be telling that we do not do it intentionally, instead of forgetting it. Maybe it is just me. Just a NIT anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would probably surprise the author of "submitted[runTask.index] = false"
😁
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: hongkailiu, petr-muller The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
| callbacks: map[string]callbackFn{ | ||
| "a": func(t *testing.T, name string, ctx context.Context, cancelFn func()) error { | ||
| cancelFn() | ||
| time.Sleep(time.Second) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd added this with the test-case back in eaa3d19 (#255). I didn't talk about the Sleep specifically in the commit message, but my guess is that what I was aiming for was "make sure b isn't running because the task-graph runner knows a failed" to avoid "b would have run, except the runner got lucky and raced closed before it got around to completing". It's been in place for a long time now though, and I'm fine dropping the "be-extra certain" sleep at this point in order to allow us to run tests more quickly.
/override ci/prow/e2e-agnostic-ovn /override ci/prow/e2e-agnostic-ovn-upgrade-out-of-change Neither of these seem to indicate a problem that could be caused by CVO |
|
@petr-muller: Overrode contexts on behalf of petr-muller: ci/prow/e2e-agnostic-ovn, ci/prow/e2e-agnostic-ovn-upgrade-out-of-change DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
@petr-muller: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
/label no-qe Still just test tweaks and refactor |
|
[ART PR BUILD NOTIFIER] Distgit: cluster-version-operator |
task graph test: do not wait after cancelation
Given correct synchronization I do not see any value in the
Sleep, it just makes the test much slower:task graph: simplify code
We only start draining
workChwhen canceled (ctx.Done()). We never submit more work once canceled so resetingsubmittedrecord is useless