-
Notifications
You must be signed in to change notification settings - Fork 380
test: deflake NodeClaim and presubmit tests #2240
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
engedaam
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
Pull Request Test Coverage Report for Build 15192600380Details
💛 - Coveralls |
|
/test ? |
|
@rschalo: The following commands are available to trigger optional jobs: In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/ok-to-test |
|
/retest |
|
/test pull-karpenter-test-1-27 |
jonathan-innis
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/approve
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: engedaam, jonathan-innis, rschalo The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
* test: Lower resource requests for NodeClaim test (kubernetes-sigs#2229) * perf: Don't deepcopy inside of watch handler functions (kubernetes-sigs#2232) * test: Add random name string for NodePool and NodeClass (kubernetes-sigs#2231) * test: Update E2E testing suite to be named Regression (kubernetes-sigs#2234) * refactor: convert validation to an interface (kubernetes-sigs#2220) * fix: allow non-churn empty nodes to be disrupted (kubernetes-sigs#2206) * perf: Only deep copy nodes during GetCandidates once (kubernetes-sigs#2233) * feat: add metrics for disruption candidate validation (kubernetes-sigs#2239) * perf: Only call .Available() once which prevents duplicate allocs (kubernetes-sigs#2241) * docs: update issue triage meeting schedule (kubernetes-sigs#2244) * test: deflake NodeClaim and presubmit tests (kubernetes-sigs#2240) * perf: Avoid deepcopy when get nodePool/cluster health (kubernetes-sigs#2247) * perf: Improve OrderByPrice performance (kubernetes-sigs#2250) * test: add validating admission policy for nodeclass status (kubernetes-sigs#2251) Co-authored-by: Jonathan Innis <[email protected]> --------- Co-authored-by: Amanuel Engeda <[email protected]> Co-authored-by: Jonathan Innis <[email protected]> Co-authored-by: Reed Schalo <[email protected]> Co-authored-by: DerekFrank <[email protected]> Co-authored-by: Jason Deal <[email protected]> Co-authored-by: Reed Schalo <[email protected]> Co-authored-by: Jonathan Innis <[email protected]>
* test: Lower resource requests for NodeClaim test (kubernetes-sigs#2229) * perf: Don't deepcopy inside of watch handler functions (kubernetes-sigs#2232) * test: Add random name string for NodePool and NodeClass (kubernetes-sigs#2231) * test: Update E2E testing suite to be named Regression (kubernetes-sigs#2234) * refactor: convert validation to an interface (kubernetes-sigs#2220) * fix: allow non-churn empty nodes to be disrupted (kubernetes-sigs#2206) * perf: Only deep copy nodes during GetCandidates once (kubernetes-sigs#2233) * feat: add metrics for disruption candidate validation (kubernetes-sigs#2239) * perf: Only call .Available() once which prevents duplicate allocs (kubernetes-sigs#2241) * docs: update issue triage meeting schedule (kubernetes-sigs#2244) * test: deflake NodeClaim and presubmit tests (kubernetes-sigs#2240) * perf: Avoid deepcopy when get nodePool/cluster health (kubernetes-sigs#2247) * perf: Improve OrderByPrice performance (kubernetes-sigs#2250) * test: add validating admission policy for nodeclass status (kubernetes-sigs#2251) Co-authored-by: Jonathan Innis <[email protected]> * feat: drain and volume detachment status conditions (kubernetes-sigs#1876) * fix: show the cron parse error to users to allow them to debug (kubernetes-sigs#2258) * perf: Don't deep-copy nodes and nodeclaims in our synced check (kubernetes-sigs#2260) * chore: Fix getting current script directory in install-kwok.sh (kubernetes-sigs#2262) * perf: Perform quick checks in node health first (kubernetes-sigs#2264) * chore: Update pod metrics when pod is completed (kubernetes-sigs#2259) * fix: Correctly build nodepool mapping for complex clusters (kubernetes-sigs#2263) * fix: fail open for missing nodeclaims in termination (kubernetes-sigs#2266) * perf: Limit GetInstanceTypes() calls per-NodeClaim (kubernetes-sigs#2271) * perf: Parallelize disruption execution actions (kubernetes-sigs#2270) * fix: Fix node owner reference update (kubernetes-sigs#2274) * perf: Be more resilient to deletion failures in disruption controller (kubernetes-sigs#2272) * chore(deps): bump the go-deps group with 2 updates (kubernetes-sigs#2277) Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * chore: Ensure we can stand up multiple partitions with kwok (kubernetes-sigs#2283) * chore: Inject resources into Kwok through a patch (kubernetes-sigs#2285) * chore: Update NodeClaim E2E test to only replace one status condition (kubernetes-sigs#2284) * chore: Avoid validating admission policy for clusters older then 1.30 (kubernetes-sigs#2289) * chore(deps): bump the go-deps group with 2 updates (kubernetes-sigs#2295) Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * chore: bump go version to 1.24.4 (kubernetes-sigs#2298) * chore: Only log that the command succeeded when it actually did (kubernetes-sigs#2302) * fix: Fix bug with MarkForDeletion before creating replacements (kubernetes-sigs#2300) * perf: Refactor the eviction queue to be multithreaded (kubernetes-sigs#2252) * docs: Add Bizfly Cloud provider (kubernetes-sigs#2303) * chore: Bump lifecycle cache expiration to one hour (kubernetes-sigs#2307) * chore: Use cluster state to check replacement NodeClaim existence (kubernetes-sigs#2308) * chore(deps): bump github.com/samber/lo from 1.50.0 to 1.51.0 in the go-deps group (kubernetes-sigs#2315) Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * chore: bump operatorpkg (kubernetes-sigs#2314) * chore(deps): bump the k8s-go-deps group across 1 directory with 4 updates (kubernetes-sigs#2317) Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * chore: Refactor Orchestration Queue and Handle Mark/Unmark Deletion in Queue (kubernetes-sigs#2305) * chore(deps): bump the k8s-go-deps group with 7 updates (kubernetes-sigs#2326) Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * perf: multithreaded orchestration queue (kubernetes-sigs#2293) * test: Add nodeclaim name when you have garbage collection (kubernetes-sigs#2333) * perf: Reduce multiple patch calls in instance termination (kubernetes-sigs#2324) * fix: add helm rbac for kwok-provider to update finalizers (kubernetes-sigs#2336) Signed-off-by: Max Cao <[email protected]> * feat: configure CRD status operator with larger histogram buckets (kubernetes-sigs#2328) * chore(deps): bump sigs.k8s.io/yaml from 1.4.0 to 1.5.0 in the k8s-go-deps group (kubernetes-sigs#2339) Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * chore(deps): bump github.com/docker/docker from 28.2.2+incompatible to 28.3.0+incompatible in the go-deps group (kubernetes-sigs#2340) Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * fix: Fix re-retrieving object on retry (kubernetes-sigs#2337) * fix: Fix overriding error with patch call (kubernetes-sigs#2338) * fix: add missing rlock to disruption queue (kubernetes-sigs#2348) * test: allow e2e tests to output junit report (kubernetes-sigs#2334) Signed-off-by: Max Cao <[email protected]> * docs: Add Oracle Cloud Infrastructure (OCI) provider (kubernetes-sigs#2342) * fix: no longer allow the same hostname to take multiple capacity (kubernetes-sigs#2356) * feat: support auto relaxing min values (kubernetes-sigs#2299) * fix: update provider ID to ensure that Cloud Provider tests pass (kubernetes-sigs#2363) * fix: remove unsupported capacity_type label from karpenter_nodeclaims… (kubernetes-sigs#2364) * fix: update deletionTimestamp on terminating pods when after nodeDeletionTimestamp (kubernetes-sigs#2316) Co-authored-by: Amanuel Engeda <[email protected]> * chore: promote ReservedCapacity feature gate to beta (kubernetes-sigs#2365) * fix: flakiness in expiration tests (kubernetes-sigs#2366) * test: Bump the termination time for the deletion timestamp (kubernetes-sigs#2367) * chore: cherry-pick kubernetes-sigs#2399 (kubernetes-sigs#2401) --------- Signed-off-by: dependabot[bot] <[email protected]> Signed-off-by: Max Cao <[email protected]> Co-authored-by: Amanuel Engeda <[email protected]> Co-authored-by: Jonathan Innis <[email protected]> Co-authored-by: Reed Schalo <[email protected]> Co-authored-by: DerekFrank <[email protected]> Co-authored-by: Jason Deal <[email protected]> Co-authored-by: Reed Schalo <[email protected]> Co-authored-by: Jonathan Innis <[email protected]> Co-authored-by: Todd Neal <[email protected]> Co-authored-by: Jigisha Patil <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Lê Minh Quân <[email protected]> Co-authored-by: Max Cao <[email protected]> Co-authored-by: Aidan Rowe <[email protected]> Co-authored-by: Daniel Lopes <[email protected]> Co-authored-by: Saurav Agarwalla <[email protected]> Co-authored-by: cosimomeli <[email protected]>
Fixes #N/A
Description
The presubmit test was calling
cluster.Nodes()which returns nodes based on a map. This adds a sort so the nodes are deterministically selected for churn.Deflakes registration by adding time since it can take longer than 1 minute to move from TTL to deletion completing:
Also fixes a case where the old test providerID failed the regex. Having no providerID accomplishes the desired garbage collection behavior.
How was this change tested?
make upstream-e2etests
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.