RFC: e2e: better polling support #112013
Conversation
ginkgo.By can be used to record how long a callback runs. This particular feature is not used much (or at all?) in Kubernetes and we don't have tooling for analyzing the measurements. What would be more important is text output. framework.By provides that:

- reminders that a step is still running at regular intervals, configurable with --slow-step-threshold
- the total duration for slow steps

Example:

```
STEP: delete pods and claims 08/24/22 11:55:19.676
STEP: deleting *v1.Pod dra-5514/tester-1 08/24/22 11:55:19.679
STEP: deleting *v1.ResourceClaim dra-5514/tester-1-my-inline-claim 08/24/22 11:55:19.686
STEP still running: delete pods and claims 08/24/22 11:55:29.677
STEP still running: delete pods and claims 08/24/22 11:55:39.676
STEP still running: delete pods and claims 08/24/22 11:55:49.677
<additional output when the objects finally get removed>
STEP duration: delete pods and claims 33.253509791s
```
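For readers who want to see how such a wrapper could produce that output, here is a minimal sketch. This is not the code from this PR: the `bySketch` name and the `slowThreshold` parameter are made up for illustration; `framework.Logf` is the existing e2e logging helper.

```go
import (
	"time"

	"k8s.io/kubernetes/test/e2e/framework"
)

// bySketch logs a STEP line, emits "still running" reminders every
// slowThreshold while callback executes, and reports the total duration
// for slow steps once callback returns.
func bySketch(description string, slowThreshold time.Duration, callback func()) {
	framework.Logf("STEP: %s", description)
	start := time.Now()
	done := make(chan struct{})
	go func() {
		ticker := time.NewTicker(slowThreshold)
		defer ticker.Stop()
		for {
			select {
			case <-done:
				return
			case <-ticker.C:
				framework.Logf("STEP still running: %s", description)
			}
		}
	}()
	defer func() {
		close(done)
		if duration := time.Since(start); duration > slowThreshold {
			framework.Logf("STEP duration: %s %s", description, duration)
		}
	}()
	callback()
}
```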
gomega.Eventually with a function that takes a gomega.Gomega instance can be used to poll for certain conditions. The advantage over custom code is that gomega assertions can be used. However, gomega as it stands today does not report why the polling continues. This is annoying when running a test interactively: one has to wait for the timeout before additional information is shown in the failure message.

SucceedEventually is a replacement for gomega.Eventually which is tailored towards this use case and adds intermediate output. It's best combined with the By replacement.

Example with some artificial failure:

```
STEP: wait for plugin registration 08/24/22 14:17:23.488
...
STEP still running: wait for plugin registration 08/24/22 14:17:33.489
STEP still running: wait for plugin registration 08/24/22 14:17:43.489
STEP still running: wait for plugin registration 08/24/22 14:17:53.489
Aug 24 14:17:53.509: INFO: wait for plugin registration
    xxx
    Expected
        <bool>: false
    to equal
        <bool>: true
STEP still running: wait for plugin registration 08/24/22 14:18:03.489
STEP still running: wait for plugin registration 08/24/22 14:18:13.489
STEP still running: wait for plugin registration 08/24/22 14:18:23.489
Aug 24 14:18:23.489: FAIL: Timed out after 60.000s.
wait for plugin registration
Expected
    <string>: xxx
        Expected
            <bool>: false
        to equal
            <bool>: true
to be empty

Full Stack Trace
k8s.io/kubernetes/test/e2e/dra.(*Driver).SetUp.func7()
	/nvme/gopath/src/k8s.io/kubernetes/test/e2e/dra/deploy.go:195 +0xda
k8s.io/kubernetes/test/e2e/framework.By({0x702cfda, 0x1c}, 0xc0029d3df8, {0x0, 0x0, 0x6fbc0cb?})
	/nvme/gopath/src/k8s.io/kubernetes/test/e2e/framework/log.go:115 +0x5ea
k8s.io/kubernetes/test/e2e/dra.(*Driver).SetUp(0xc0006af700)
	/nvme/gopath/src/k8s.io/kubernetes/test/e2e/dra/deploy.go:194 +0xa67
k8s.io/kubernetes/test/e2e/dra.NewDriver.func1()
	/nvme/gopath/src/k8s.io/kubernetes/test/e2e/dra/deploy.go:67 +0x26
```
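The core idea behind such intermediate output can be sketched as follows. This is a simplified illustration, not the PR's implementation (and its signature differs from the SucceedEventually shown in the diff further down): the polling loop runs the callback against a Gomega instance whose failure handler records the message, so the latest reason can be logged between attempts via the framework's existing Logf/Failf helpers.

```go
import (
	"time"

	"github.com/onsi/gomega"
	"k8s.io/kubernetes/test/e2e/framework"
)

// pollFailure is a sentinel carried by panic so that a failed g.Expect
// aborts the current polling attempt instead of continuing past it.
type pollFailure struct{ message string }

// succeedEventuallySketch polls callback until all of its assertions pass,
// logging the most recent failure message between attempts so the reader
// can see why polling continues before the timeout expires.
func succeedEventuallySketch(callback func(g gomega.Gomega), timeout, interval time.Duration) {
	attempt := func() (failure string) {
		defer func() {
			if r := recover(); r != nil {
				if f, ok := r.(pollFailure); ok {
					failure = f.message
					return
				}
				panic(r) // not our sentinel, propagate
			}
		}()
		// The Gomega instance reports failures through this handler,
		// not the global one, which is why the callback must use g.Expect.
		g := gomega.NewGomega(func(message string, callerSkip ...int) {
			panic(pollFailure{message: message})
		})
		callback(g)
		return ""
	}

	deadline := time.Now().Add(timeout)
	for {
		msg := attempt()
		if msg == "" {
			return // all assertions passed
		}
		framework.Logf("polling continues: %s", msg)
		if time.Now().After(deadline) {
			framework.Failf("timed out after %s: %s", timeout, msg)
		}
		time.Sleep(interval)
	}
}
```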
@pohly: This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: pohly

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
Some tests are needed if this is to be merged.

/hold
/retest
```go
//
// To get timing information for a long running SucceedEventually, wrap it in
// By.
func SucceedEventually(callback func(g gomega.Gomega), extra []interface{}, intervals ...interface{}) {
```
I need to point out one potential pitfall (will also add to the comment): the callback must use g.Expect instead of gomega.Expect or our framework helper functions like framework.ExpectNoError. That's because the failure must be reported via the failure handler encapsulated inside the Gomega instance, not the global failure handler.
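To make the pitfall concrete, here is a hypothetical callback, assuming the usual gomega import; the pluginIsRegistered helper is made up for illustration and is not part of this PR:

```go
// Correct: assertions go through the g passed into the callback, so a failed
// check only ends this polling attempt and shows up as intermediate output.
SucceedEventually(func(g gomega.Gomega) {
	registered, err := pluginIsRegistered() // hypothetical helper
	g.Expect(err).NotTo(gomega.HaveOccurred())
	g.Expect(registered).To(gomega.BeTrue())
}, nil)

// Wrong: gomega.Expect or framework.ExpectNoError would report through the
// global failure handler and abort the whole spec on the first failed poll.
```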
The alternative is to support a func() error callback instead of, or in addition to, func(g gomega.Gomega). If we want to support both, callback would have to be of type interface{} with a runtime type assertion. Not sure whether generics would help.
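As a rough sketch of what accepting both callback shapes via interface{} could look like (purely illustrative, not code from this PR; assumes the fmt and gomega imports):

```go
// runCallbackSketch executes one polling attempt for either callback shape.
// A func() error reports problems via its return value; a func(gomega.Gomega)
// reports them through the failure handler wired into g.
func runCallbackSketch(callback interface{}, g gomega.Gomega) error {
	switch cb := callback.(type) {
	case func() error:
		return cb()
	case func(gomega.Gomega):
		cb(g)
		return nil
	default:
		return fmt.Errorf("unsupported callback type %T", callback)
	}
}
```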
/cc @aojea

It looks like @onsi is wary of supporting something like this in Gomega (onsi/gomega#574 (comment)) and I tend to agree, so I'd like to get some opinions about this PR and move it forward.
I don't have a lot of day-to-day experience with the e2e suite so y'all should take this input with a big grain of salt.

It seems like the main problem this is trying to solve is around the user experience when manually (i.e. on the console locally? in a specialized CI environment?) running a small subset of tests. Namely: can we get the user faster feedback about where the test is and what might be in the process of failing and why. The proposed solution is to emit information periodically whenever the spec is waiting/polling.

One potential downside to this is that it would make CI much more verbose - but it looks like the answer to that is to make the status update interval configurable. That's fair...

With that said, though, the
I'm not sure the addition to have
The additions to

Personally, when I'm debugging specs that can get stuck locally I try to lower the timeouts involved to get faster feedback. I'd be happy to add support to Gomega to make this dead simple to use. For example

If you'd nonetheless like to pursue the approach in this PR we could explore ways to make
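For reference on the "lower the timeouts" approach: Gomega already exposes knobs for this, globally or per call. A minimal sketch of that existing mechanism follows (not the new support alluded to above, whose details are elided; conditionMet is a hypothetical condition check):

```go
import (
	"time"

	"github.com/onsi/gomega"
)

func lowerTimeoutsSketch(conditionMet func() bool) {
	// Globally, e.g. while iterating on a single spec locally.
	gomega.SetDefaultEventuallyTimeout(10 * time.Second)
	gomega.SetDefaultEventuallyPollingInterval(250 * time.Millisecond)

	// Or per call, by passing the timeout and polling interval explicitly.
	gomega.Eventually(conditionMet, 10*time.Second, 250*time.Millisecond).Should(gomega.BeTrue())
}
```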
That only works if code is instrumented consistently, i.e. starts with a By. It can happen that I am looking at a "doing something" step, but in reality that operation is already completed and the test is stuck on "doing something else".
That won't help in Kubernetes because most polling is not done with Gomega.
Got it, makes sense. I wonder if an additional helpful thing could be for ginkgo to support receiving a signal and telling you where the current test is currently waiting. Could even imagine adding a flag where it will emit that information periodically if a test has been running for >N seconds.
As noted in onsi/ginkgo#1023 (comment), we need to keep printing BeforeEach/AfterEach in Kubernetes because we cannot assume that each node starts with a By or some other log output. Even with that, improving By as proposed here still makes sense for a long-running node because ginkgo cannot know where in the node the process currently is: it knows nothing about ginkgo.By, nor about gomega.Eventually. A signal that triggers "currently running BeforeEach/AfterEach/It XYZ" may be useful, but IMHO not as useful as the extended By from this PR. If we were to add that By enhancement to ginkgo, perhaps the extra output could be limited to runs where
I discussed this some more with Onsi and the conclusion is:

Both mean that this PR is not going to be needed.

/close
@pohly: Closed this PR.

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
What type of PR is this?
/kind cleanup
What this PR does / why we need it:
When running tests manually it is important to know when a test is stuck and why. The commits in this PR add replacements for some ginkgo and gomega functions that produce better output while a long-running operation is in progress.
Special notes for your reviewer:
I described the idea in onsi/gomega#574. We need to decide whether this can and/or should be solved in ginkgo/gomega or in Kubernetes.
The new functions get added to test/e2e/framework instead of, say, test/e2e/framework/ginkgowrapper because I see it as similar to framework.Logf and framework.Failf and because the intermediate output is done with framework.Logf. That would not be possible when in a sub-package because sub-packages must not depend on the core framework.
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: