@djoshy
Contributor

@djoshy djoshy commented May 14, 2024

This PR adds:

@openshift-ci-robot
Contributor

openshift-ci-robot commented May 14, 2024

@djoshy: This pull request references MCO-1152 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set.

Details

In response to this:

- What I did

- How to verify it

- Description for the changelog

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference label May 14, 2024
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress label May 14, 2024
@openshift-ci
Contributor

openshift-ci bot commented May 14, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci openshift-ci bot added the approved label May 14, 2024
@djoshy
Contributor Author

djoshy commented May 14, 2024

/test e2e-gcp-op-techpreview

@djoshy djoshy force-pushed the node-disrupt-e2e branch from 193a193 to bc86228 May 15, 2024 15:02
@djoshy
Contributor Author

djoshy commented May 15, 2024

/test e2e-gcp-op-techpreview

@djoshy djoshy force-pushed the node-disrupt-e2e branch from bc86228 to d7f367a May 15, 2024 15:16
@djoshy
Contributor Author

djoshy commented May 15, 2024

/test e2e-gcp-op-techpreview

@djoshy djoshy force-pushed the node-disrupt-e2e branch from d7f367a to a804831 May 15, 2024 17:24
@djoshy djoshy marked this pull request as ready for review May 15, 2024 17:24
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress label May 15, 2024
@openshift-ci openshift-ci bot requested review from cdoern and cheesesashimi May 15, 2024 17:26
@openshift-ci-robot
Contributor

openshift-ci-robot commented May 15, 2024

@djoshy: This pull request references MCO-1152 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set.

Details

In response to this:

This PR:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-merge-robot openshift-merge-robot added the needs-rebase label May 19, 2024
@djoshy djoshy force-pushed the node-disrupt-e2e branch from a804831 to 16a6d4f May 20, 2024 13:21
@openshift-merge-robot openshift-merge-robot removed the needs-rebase label May 20, 2024
@djoshy djoshy force-pushed the node-disrupt-e2e branch from 16a6d4f to aa5388d May 20, 2024 13:32
@openshift-ci-robot
Contributor

openshift-ci-robot commented May 20, 2024

@djoshy: This pull request references MCO-1152 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.17.0" version, but no target version was set.

Details

In response to this:

This PR adds:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@djoshy djoshy changed the title from "MCO-1152: Add e2e tests for NodeDisruptionPolicy" to "MCO-1152: MCO-1146: Add e2e tests for NodeDisruptionPolicy" May 20, 2024
@djoshy
Contributor Author

djoshy commented May 21, 2024

/retest-required

@djoshy djoshy force-pushed the node-disrupt-e2e branch from 53b5a3b to dc99777 May 21, 2024 16:37
@djoshy
Contributor Author

djoshy commented May 21, 2024

/test unit

@djoshy djoshy force-pushed the node-disrupt-e2e branch from dc99777 to 2cf9a1c May 22, 2024 18:04
@djoshy
Contributor Author

djoshy commented May 23, 2024

/retest-required

@djoshy
Contributor Author

djoshy commented May 23, 2024

/test e2e-gcp-op-techpreview

@djoshy
Contributor Author

djoshy commented May 30, 2024

/test all

Member

@cheesesashimi cheesesashimi left a comment

Overall this looks great! I just have a few small suggestions.

  require.Nil(t, err)

- assertLogsContain(t, cs, mcdPod, &node, logEntry)
+ helpers.AssertMCDLogsContain(t, cs, mcdPod, &node, logEntry)
Member

praise: Nice!

var ac *mcoac.NodeDisruptionPolicySpecFileApplyConfiguration
fileName := "/etc/test-" + string(action.Type)
if action.Type == opv1.ReloadSpecAction {
	ac = mcoac.NodeDisruptionPolicySpecFile().WithPath(fileName).WithActions(mcoac.NodeDisruptionPolicySpecAction().WithType(action.Type).WithReload(mcoac.ReloadService().WithServiceName(action.Reload.ServiceName)))
Member

suggestion (non-blocking): For better readability, split this up a bit:

reload := mcoac.ReloadService().WithServiceName(action.Reload.ServiceName)
actions := mcoac.NodeDisruptionPolicySpecAction().WithType(action.Type).WithReload(reload)
ac = mcoac.NodeDisruptionPolicySpecFile().WithPath(fileName).WithActions(actions)

If this API won't let you do that or you get bizarre errors by doing that, you can break this across multiple lines like this instead:

ac = mcoac.NodeDisruptionPolicySpecFile().WithPath(fileName).WithActions(
	mcoac.NodeDisruptionPolicySpecAction().WithType(action.Type).WithReload(
		mcoac.ReloadService().WithServiceName(action.Reload.ServiceName)))

Contributor Author

Will attempt to clean this up, thanks (:

Contributor Author

@djoshy djoshy May 31, 2024

I was able to break this down as suggested and created a new helper, which makes it a lot cleaner overall and reusable. Thanks so much for this suggestion!

}
}

func GetFunctionName(i interface{}) string {
Member

praise: Cool!
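
For readers without the diff at hand: the body of GetFunctionName is not shown in this thread. A helper with that signature is commonly written with reflect and runtime.FuncForPC, roughly as in the sketch below. The name functionName, the package-path trimming, and the placeholder test function are assumptions made for illustration, not the PR's actual code.

```go
package main

import (
	"fmt"
	"reflect"
	"runtime"
	"strings"
)

// functionName returns the short name of the function value i, or "" if i is
// not a function. This is an illustrative stand-in, not the PR's helper.
func functionName(i interface{}) string {
	v := reflect.ValueOf(i)
	if v.Kind() != reflect.Func {
		return ""
	}
	// FuncForPC yields a fully qualified name such as "main.testReloadAction".
	// Name() is safe to call even if FuncForPC returns nil.
	full := runtime.FuncForPC(v.Pointer()).Name()
	if idx := strings.LastIndex(full, "."); idx >= 0 {
		return full[idx+1:]
	}
	return full
}

// testReloadAction is a hypothetical test function used only for the demo.
func testReloadAction() {}

func main() {
	fmt.Println(functionName(testReloadAction))
}
```

Using the function's own name as the subtest name keeps the t.Run labels in sync with the code without maintaining a separate list of strings.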

// Ensure status.ObservedGeneration matches the last generation of MachineConfiguration
if mcop.Generation != mcop.Status.ObservedGeneration {
	klog.Errorf("calculating NodeDisruptionPolicies: NodeDisruptionPolicyStatus is not up to date.")
	err = fmt.Errorf("NodeDisruptionPolicyStatus is not up to date")
Member

suggestion: Should this error be returned? If not, add a comment explaining why.

Member

@cheesesashimi cheesesashimi May 30, 2024

thought: I just took a look at why this was written like this. Basically, we use the variable err both for the value returned by wait.PollUntilContextTimeout() and for the errors set inside the closure that wait.PollUntilContextTimeout() executes. It's a bit confusing whether we should stop polling as soon as we encounter an error, or keep going and deal with the error later. If we want to keep going and deal with the error later, using the Aggregate type coupled with an appropriate deduplication function could help readability.

(To be clear: I am not asking you to change this as part of this PR. It's just a thought I'm putting here for posterity.)
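
As a rough illustration of the aggregation idea in the comment above: the "Aggregate type" presumably refers to the one in k8s.io/apimachinery/pkg/util/errors. To stay dependency-free, the sketch below swaps in the standard library's errors.Join (Go 1.20+) with a hand-written deduplication step. The names joinDeduplicated and joinMessages and the sample messages are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// joinDeduplicated combines errs into a single error, dropping nil entries and
// any error whose message was already seen. First-seen order is preserved.
// It returns nil when nothing remains.
func joinDeduplicated(errs []error) error {
	seen := make(map[string]struct{}, len(errs))
	var unique []error
	for _, err := range errs {
		if err == nil {
			continue
		}
		msg := err.Error()
		if _, dup := seen[msg]; dup {
			continue
		}
		seen[msg] = struct{}{}
		unique = append(unique, err)
	}
	return errors.Join(unique...)
}

// joinMessages is a small demo helper: it turns messages into errors, joins
// them with deduplication, and returns the resulting text ("" if no error).
func joinMessages(msgs ...string) string {
	errs := make([]error, 0, len(msgs))
	for _, m := range msgs {
		errs = append(errs, errors.New(m))
	}
	if err := joinDeduplicated(errs); err != nil {
		return err.Error()
	}
	return ""
}

func main() {
	// A polling loop that hits the same failure repeatedly reports it once.
	fmt.Println(joinMessages("status is stale", "pool not ready", "status is stale"))
}
```

This trades the "last error wins" behaviour for a complete but deduplicated record of everything that went wrong during polling.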

Contributor Author

Sorry about the structure being confusing! My reasoning for using it was the following:

  • If we encounter an error, we don't want to keep going in the polling function. We want to try again after an interval of time.
  • If we time out in the polling function, we want the last error we encountered to be reported. Hence the use of the same variable inside and outside of the function.

I hope that clears up why I went with the last error vs aggregation. Any of the errors within the polling loop are fatal, with the earlier ones being slightly more fatal than the ones following.
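
The two points above can be sketched as follows. This is a minimal stand-in written against the standard library only; the PR itself uses wait.PollUntilContextTimeout(), whose closure is not reproduced here. The names pollKeepingLastError, simulate and errStatusStale, and the intervals, are assumptions for illustration.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// errStatusStale mirrors the message from the snippet under review.
var errStatusStale = errors.New("NodeDisruptionPolicyStatus is not up to date")

// pollKeepingLastError runs check immediately and then once per interval until
// it succeeds or the timeout expires. A failing check does not abort polling;
// it is retried on the next tick. On timeout, the last error seen is returned.
func pollKeepingLastError(ctx context.Context, interval, timeout time.Duration, check func() error) error {
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()

	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	var lastErr error
	for {
		if lastErr = check(); lastErr == nil {
			return nil
		}
		select {
		case <-ctx.Done():
			return fmt.Errorf("polling stopped (%v); last error: %w", ctx.Err(), lastErr)
		case <-ticker.C:
		}
	}
}

// simulate polls a check that fails until attempt number succeedOn
// (0 means it never succeeds) and reports how many attempts were made.
func simulate(succeedOn int) (int, error) {
	attempts := 0
	err := pollKeepingLastError(context.Background(), time.Millisecond, 200*time.Millisecond, func() error {
		attempts++
		if succeedOn > 0 && attempts >= succeedOn {
			return nil
		}
		return errStatusStale
	})
	return attempts, err
}

func main() {
	attempts, err := simulate(3)
	fmt.Println(attempts, err)

	_, err = simulate(0)
	fmt.Println(errors.Is(err, errStatusStale))
}
```

Keeping the last error in a variable that outlives the closure is what lets the timeout message say why the condition never became true, rather than only that the deadline passed.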

Member

Ahhh that makes sense, thanks for the clarification!

Contributor Author

I assigned a new variable so it is easier to read what I'm trying to do (:

{Type: opv1.RestartSpecAction, Restart: &opv1.RestartService{ServiceName: "crio.service"}},
{Type: opv1.ReloadSpecAction, Reload: &opv1.ReloadService{ServiceName: "crio.service"}}}

// Shuffle the three action sets so each testFunc is randomly assigned one of the above action sets.
Member

praise: Nice! I wish we could run our e2e suite with -shuffle=on. I think we have e2e tests that need to execute in a certain order, but I'm unsure how that flag affects subtests like this.
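
The shuffling described in the code comment under review can be sketched with math/rand's Shuffle. The test names and the string form of the action sets below are placeholders; the PR's real types are opv1 action structs, and its third action set is not shown in this thread.

```go
package main

import (
	"fmt"
	"math/rand"
)

// assignShuffled pairs each test name with one action set after shuffling a
// copy of the action sets, so the caller's slice is left untouched.
// It assumes len(actionSets) >= len(testNames).
func assignShuffled(rng *rand.Rand, testNames, actionSets []string) map[string]string {
	shuffled := append([]string(nil), actionSets...)
	rng.Shuffle(len(shuffled), func(i, j int) {
		shuffled[i], shuffled[j] = shuffled[j], shuffled[i]
	})
	assigned := make(map[string]string, len(testNames))
	for i, name := range testNames {
		assigned[name] = shuffled[i]
	}
	return assigned
}

// assignWithSeed wraps assignShuffled with a fixed seed, which helps when a
// particular pairing needs to be reproduced.
func assignWithSeed(seed int64, testNames, actionSets []string) map[string]string {
	return assignShuffled(rand.New(rand.NewSource(seed)), testNames, actionSets)
}

func main() {
	// Placeholder names; only the two crio.service actions appear in the thread.
	tests := []string{"testFileAction", "testUnitAction", "testSSHKeyAction"}
	actions := []string{"placeholder action", "Restart crio.service", "Reload crio.service"}
	for name, action := range assignWithSeed(42, tests, actions) {
		fmt.Printf("%s -> %s\n", name, action)
	}
}
```

Logging the seed alongside the assignment would make a failing combination reproducible, which is the usual concern with randomized test inputs.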

t.Run(helpers.GetFunctionName(testFunc), func(t *testing.T) {
	// Only parallelize if there are enough nodes to run the tests individually
	if len(nodes) >= len(testFuncs) {
		t.Parallel()
Member

praise: Nice! I want to be able to parallelize more in our e2e suite.
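
A self-contained sketch of the conditional parallelization in the snippet above, assuming each subtest needs exclusive use of one node. The driver runOnNodes, the predicate canParallelize, and the node and test types are hypothetical simplifications of the PR's code.

```go
package main

import (
	"fmt"
	"testing"
)

// canParallelize reports whether every subtest can be given its own node.
func canParallelize(nodeCount, testCount int) bool {
	return testCount > 0 && nodeCount >= testCount
}

// runOnNodes runs each test as a subtest. Subtests run in parallel only when
// there are enough nodes for each to have one; otherwise they run one after
// another and nodes are reused.
func runOnNodes(t *testing.T, nodes []string, tests map[string]func(t *testing.T, node string)) {
	if len(nodes) == 0 {
		t.Skip("no nodes available")
	}
	parallel := canParallelize(len(nodes), len(tests))
	i := 0
	for name, fn := range tests {
		fn := fn // capture per iteration for Go versions before 1.22
		node := nodes[i%len(nodes)]
		i++
		t.Run(name, func(t *testing.T) {
			if parallel {
				t.Parallel()
			}
			fn(t, node)
		})
	}
}

func main() {
	// runOnNodes needs a *testing.T, so only the predicate is exercised here.
	fmt.Println(canParallelize(3, 3), canParallelize(2, 3))
}
```

Deciding once, outside the subtests, keeps all subtests either parallel or sequential, so two of them never share a node while another runs concurrently.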

@djoshy djoshy force-pushed the node-disrupt-e2e branch from 2cf9a1c to 33b1a8f May 31, 2024 16:09
@djoshy
Contributor Author

djoshy commented Jun 3, 2024

/retest-required

@djoshy
Contributor Author

djoshy commented Jun 3, 2024

/retest-required

@djoshy
Contributor Author

djoshy commented Jun 3, 2024

/retest-required

@djoshy
Contributor Author

djoshy commented Jun 4, 2024

/retest-required

@djoshy djoshy force-pushed the node-disrupt-e2e branch 2 times, most recently from 313c89b to 2f6cc09 June 4, 2024 20:41
Contributor

@sinnykumari sinnykumari left a comment

Changes look good.
/approve
Will see if Zack has any final thoughts and would like to tag lgtm.

@djoshy
Contributor Author

djoshy commented Jun 6, 2024

/test all

@cheesesashimi
Member

/lgtm
/approve

@openshift-ci openshift-ci bot added the lgtm label Jun 7, 2024
@openshift-ci
Contributor

openshift-ci bot commented Jun 7, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cheesesashimi, djoshy, sinnykumari

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:
  • OWNERS [cheesesashimi,djoshy,sinnykumari]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@djoshy
Contributor Author

djoshy commented Jun 7, 2024

/hold

just in case QE wants to take a look

@openshift-ci openshift-ci bot added the do-not-merge/hold label Jun 7, 2024
@djoshy
Contributor Author

djoshy commented Jun 7, 2024

/test all

@openshift-ci
Contributor

openshift-ci bot commented Jun 7, 2024

@djoshy: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Details | Required | Rerun command
ci/prow/e2e-vsphere-ovn-upi | 23a8c8d | link | false | /test e2e-vsphere-ovn-upi

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@djoshy
Contributor Author

djoshy commented Jun 10, 2024

Discussed with @sergiordlr: as this PR is mainly e2e tests, pre-merge QE is not required. Unholding.

/unhold
/retest-required

@openshift-ci openshift-ci bot removed the do-not-merge/hold label Jun 10, 2024
@openshift-ci-robot
Contributor

/retest-required

Remaining retests: 0 against base HEAD 3506e3e and 2 for PR HEAD 23a8c8d in total

@openshift-merge-bot openshift-merge-bot bot merged commit e7a0f9a into openshift:master Jun 10, 2024
@openshift-bot
Contributor

[ART PR BUILD NOTIFIER]

This PR has been included in build ose-machine-config-operator-container-v4.17.0-202406102247.p0.ge7a0f9a.assembly.stream.el9 for distgit ose-machine-config-operator.
All builds following this will include this PR.

@djoshy djoshy deleted the node-disrupt-e2e branch June 27, 2024 13:22
Labels

approved, jira/valid-reference, lgtm

6 participants