e2e: apply timeout for CSI Storage Capacity test only to node #118200
Conversation
Applying it to the entire spec included cleaning up, which makes predicting the acceptable duration harder because it includes code not owned by the test itself. It's better to specify a timeout only for the test code itself.
This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the appropriate label.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: pohly

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files. Approvers can indicate their approval by writing
Thanks Patrick

yeah, that will be a nice small project for a new contributor: instrument the e2e framework to measure these things and be able to analyze trends for regressions. @onsi did you ever have requests for these kinds of metrics on the project?
LGTM label has been added.

Git tree hash: 646795e25facf27ce26c123e0d091f63cf6b7b43
```diff
 for _, t := range tests {
 	test := t
-	ginkgo.It(t.name, ginkgo.SpecTimeout(f.Timeouts.PodStart), func(ctx context.Context) {
+	ginkgo.It(t.name, ginkgo.NodeTimeout(f.Timeouts.PodStart), func(ctx context.Context) {
```
the NodeTimeout does not consider the ginkgo.DeferCleanup, right?
Yes, that's the difference.
hey @aojea - I think this would be a valuable project for something of the scale and scope of k8s e2e. I haven't gotten many requests around it, no - but I can paint a picture for where one might start today and what could be added to Ginkgo over time.

Today, Ginkgo's JSON report format includes a series of timing entries. I could easily imagine some independent code that consumes JSON reports over time and looks for patterns to identify trends. The biggest limitation right now would be how to match up nodes across different test runs. My intuition is that the combination of spec text (i.e. the concatenation of all the container texts and the It text) would work.

This would then allow you to track, statistically, what the median node runtimes are, whether any are drifting, and which commits correlate with the beginning of the drift. All of this could be done today, and the first iteration would not take very long to build at all. From there I could imagine a few enhancements:
I don't know how new ideas/projects germinate in k8s but I'd be happy to support an effort to explore this.
…0-upstream-release-1.27 Automated cherry pick of #118200: e2e: apply timeout for CSI Storage Capacity test only to node
What type of PR is this?
/kind bug
/kind failing-test
What this PR does / why we need it:
Applying it to the entire spec included cleaning up, which makes predicting the acceptable duration harder because it includes code not owned by the test itself. It's better to specify a timeout only for the test code itself.
Which issue(s) this PR fixes:
Fixes #118175
Special notes for your reviewer:
It's not clear why this started flaking. Somehow the namespace cleanup must have gotten slower.
Does this PR introduce a user-facing change?