@rschalo
Contributor

@rschalo rschalo commented May 21, 2025

Fixes #N/A

Description
The presubmit test called cluster.Nodes(), which builds its result from a map, so the order of the returned nodes could differ between runs. This adds a sort so that the nodes selected for churn are chosen deterministically.

Deflakes the registration test by allowing more time, since it can take longer than 1 minute to go from the registration TTL firing to deletion completing (about 77 seconds in the logs below):

     {"level":"DEBUG","time":"2025-05-21T19:04:08.017Z","logger":"controller","caller":"lifecycle/liveness.go:69","message":"terminating due to registration ttl","commit":"91d8ecc","controller":"nodeclaim.lifecycle","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","NodeClaim":{"name":"fighterribbon-23-rfpvpgytno"},"namespace":"","name":"fighterribbon-23-rfpvpgytno","reconcileID":"ea4d5c90-7f46-4f6b-80d6-3e3d38602624","provider-id":"aws:///us-east-2b/i-0349b1aa449d49a2d","ttl":"15m0s"}
    {"level":"INFO","time":"2025-05-21T19:05:25.282Z","logger":"controller","caller":"lifecycle/controller.go:242","message":"deleted nodeclaim","commit":"91d8ecc","controller":"nodeclaim.lifecycle","controllerGroup":"karpenter.sh","controllerKind":"NodeClaim","NodeClaim":{"name":"fighterribbon-23-rfpvpgytno"},"namespace":"","name":"fighterribbon-23-rfpvpgytno","reconcileID":"9be9e594-65de-45de-a195-10b54561cf8b","provider-id":"aws:///us-east-2b/i-0349b1aa449d49a2d"}

Also fixes a case where the old test providerID failed the regex. Leaving the providerID unset still produces the desired garbage collection behavior.

How was this change tested?
make upstream-e2etests

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@k8s-ci-robot added the "cncf-cla: yes" label (Indicates the PR's author has signed the CNCF CLA.) on May 21, 2025
@k8s-ci-robot requested review from engedaam and tallaxes on May 21, 2025 23:28
@rschalo changed the title from "test: deflake tests" to "test: deflake nodeclaim tests" on May 21, 2025
@k8s-ci-robot added the "size/XS" label (Denotes a PR that changes 0-9 lines, ignoring generated files.) on May 21, 2025
Contributor

@engedaam engedaam left a comment

/lgtm

@k8s-ci-robot added the "lgtm" label ("Looks good to me", indicates that a PR is ready to be merged.) on May 21, 2025
@coveralls

coveralls commented May 21, 2025

Pull Request Test Coverage Report for Build 15192600380

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 3 unchanged lines in 1 file lost coverage.
  • Overall coverage decreased (-0.008%) to 81.975%

| Files with Coverage Reduction | New Missed Lines | % |
| --- | --- | --- |
| pkg/controllers/provisioning/scheduling/nodeclaim.go | 3 | 89.66% |

| Totals | Coverage Status |
| --- | --- |
| Change from base Build 15175060207 | -0.008% |
| Covered Lines | 10178 |
| Relevant Lines | 12416 |

💛 - Coveralls

@k8s-ci-robot removed the "lgtm" label ("Looks good to me", indicates that a PR is ready to be merged.) on May 22, 2025
@k8s-ci-robot added the "size/M" label (Denotes a PR that changes 30-99 lines, ignoring generated files.) and removed the "size/XS" label (Denotes a PR that changes 0-9 lines, ignoring generated files.) on May 22, 2025
@rschalo changed the title from "test: deflake nodeclaim tests" to "test: deflake NodeClaim and presubmit tests" on May 22, 2025
@rschalo
Contributor Author

rschalo commented May 22, 2025

/test ?

@k8s-ci-robot
Contributor

@rschalo: The following commands are available to trigger optional jobs:

/test pull-karpenter-test-1-26
/test pull-karpenter-test-1-27
/test pull-karpenter-test-1-28
/test pull-karpenter-test-1-29

In response to this:

/test ?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@rschalo
Contributor Author

rschalo commented May 22, 2025

/ok-to-test

@k8s-ci-robot added the "ok-to-test" label (Indicates a non-member PR verified by an org member that is safe to test.) on May 22, 2025
@rschalo
Contributor Author

rschalo commented May 22, 2025

/retest

@rschalo
Contributor Author

rschalo commented May 22, 2025

/test pull-karpenter-test-1-27

Member

@jonathan-innis jonathan-innis left a comment

/lgtm
/approve

@k8s-ci-robot added the "lgtm" label ("Looks good to me", indicates that a PR is ready to be merged.) on May 22, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: engedaam, jonathan-innis, rschalo

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the "approved" label (Indicates a PR has been approved by an approver from all required OWNERS files.) on May 22, 2025
@k8s-ci-robot merged commit 9aec722 into kubernetes-sigs:main on May 22, 2025
26 of 27 checks passed
@flavono123
Contributor

May I (or should I) retest the PRs I already submitted, which failed on "does not delete nodes with pod churn, deletes nodes without pod churn"?

#2090 #2054

Raj-Popat added a commit to acquia/karpenter that referenced this pull request Jun 11, 2025
* test: Lower resource requests for NodeClaim test (kubernetes-sigs#2229)

* perf: Don't deepcopy inside of watch handler functions (kubernetes-sigs#2232)

* test: Add random name string for NodePool and NodeClass (kubernetes-sigs#2231)

* test: Update E2E testing suite to be named Regression (kubernetes-sigs#2234)

* refactor: convert validation to an interface (kubernetes-sigs#2220)

* fix: allow non-churn empty nodes to be disrupted (kubernetes-sigs#2206)

* perf: Only deep copy nodes during GetCandidates once (kubernetes-sigs#2233)

* feat: add metrics for disruption candidate validation (kubernetes-sigs#2239)

* perf: Only call .Available() once which prevents duplicate allocs (kubernetes-sigs#2241)

* docs: update issue triage meeting schedule (kubernetes-sigs#2244)

* test: deflake NodeClaim and presubmit tests (kubernetes-sigs#2240)

* perf: Avoid deepcopy when get nodePool/cluster health (kubernetes-sigs#2247)

* perf: Improve OrderByPrice performance (kubernetes-sigs#2250)

* test: add validating admission policy for nodeclass status (kubernetes-sigs#2251)

Co-authored-by: Jonathan Innis <[email protected]>

---------

Co-authored-by: Amanuel Engeda <[email protected]>
Co-authored-by: Jonathan Innis <[email protected]>
Co-authored-by: Reed Schalo <[email protected]>
Co-authored-by: DerekFrank <[email protected]>
Co-authored-by: Jason Deal <[email protected]>
Co-authored-by: Reed Schalo <[email protected]>
Co-authored-by: Jonathan Innis <[email protected]>
harshad3339 added a commit to acquia/karpenter that referenced this pull request Jul 31, 2025
* test: Lower resource requests for NodeClaim test (kubernetes-sigs#2229)

* perf: Don't deepcopy inside of watch handler functions (kubernetes-sigs#2232)

* test: Add random name string for NodePool and NodeClass (kubernetes-sigs#2231)

* test: Update E2E testing suite to be named Regression (kubernetes-sigs#2234)

* refactor: convert validation to an interface (kubernetes-sigs#2220)

* fix: allow non-churn empty nodes to be disrupted (kubernetes-sigs#2206)

* perf: Only deep copy nodes during GetCandidates once (kubernetes-sigs#2233)

* feat: add metrics for disruption candidate validation (kubernetes-sigs#2239)

* perf: Only call .Available() once which prevents duplicate allocs (kubernetes-sigs#2241)

* docs: update issue triage meeting schedule (kubernetes-sigs#2244)

* test: deflake NodeClaim and presubmit tests (kubernetes-sigs#2240)

* perf: Avoid deepcopy when get nodePool/cluster health (kubernetes-sigs#2247)

* perf: Improve OrderByPrice performance (kubernetes-sigs#2250)

* test: add validating admission policy for nodeclass status (kubernetes-sigs#2251)

Co-authored-by: Jonathan Innis <[email protected]>

* feat: drain and volume detachment status conditions (kubernetes-sigs#1876)

* fix: show the cron parse error to users to allow them to debug (kubernetes-sigs#2258)

* perf: Don't deep-copy nodes and nodeclaims in our synced check (kubernetes-sigs#2260)

* chore: Fix getting current script directory in install-kwok.sh (kubernetes-sigs#2262)

* perf: Perform quick checks in node health first (kubernetes-sigs#2264)

* chore: Update pod metrics when pod is completed (kubernetes-sigs#2259)

* fix: Correctly build nodepool mapping for complex clusters (kubernetes-sigs#2263)

* fix: fail open for missing nodeclaims in termination (kubernetes-sigs#2266)

* perf: Limit GetInstanceTypes() calls per-NodeClaim (kubernetes-sigs#2271)

* perf: Parallelize disruption execution actions (kubernetes-sigs#2270)

* fix: Fix node owner reference update (kubernetes-sigs#2274)

* perf: Be more resilient to deletion failures in disruption controller (kubernetes-sigs#2272)

* chore(deps): bump the go-deps group with 2 updates (kubernetes-sigs#2277)

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* chore: Ensure we can stand up multiple partitions with kwok (kubernetes-sigs#2283)

* chore: Inject resources into Kwok through a patch (kubernetes-sigs#2285)

* chore: Update NodeClaim E2E test to only replace one status condition (kubernetes-sigs#2284)

* chore: Avoid validating admission policy for clusters older then 1.30 (kubernetes-sigs#2289)

* chore(deps): bump the go-deps group with 2 updates (kubernetes-sigs#2295)

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* chore: bump go version to 1.24.4 (kubernetes-sigs#2298)

* chore: Only log that the command succeeded when it actually did (kubernetes-sigs#2302)

* fix: Fix bug with MarkForDeletion before creating replacements (kubernetes-sigs#2300)

* perf: Refactor the eviction queue to be multithreaded (kubernetes-sigs#2252)

* docs: Add Bizfly Cloud provider (kubernetes-sigs#2303)

* chore: Bump lifecycle cache expiration to one hour (kubernetes-sigs#2307)

* chore: Use cluster state to check replacement NodeClaim existence (kubernetes-sigs#2308)

* chore(deps): bump github.com/samber/lo from 1.50.0 to 1.51.0 in the go-deps group (kubernetes-sigs#2315)

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* chore: bump operatorpkg (kubernetes-sigs#2314)

* chore(deps): bump the k8s-go-deps group across 1 directory with 4 updates (kubernetes-sigs#2317)

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* chore: Refactor Orchestration Queue and Handle Mark/Unmark Deletion in Queue (kubernetes-sigs#2305)

* chore(deps): bump the k8s-go-deps group with 7 updates (kubernetes-sigs#2326)

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* perf: multithreaded orchestration queue (kubernetes-sigs#2293)

* test: Add nodeclaim name when you have garbage collection (kubernetes-sigs#2333)

* perf: Reduce multiple patch calls in instance termination (kubernetes-sigs#2324)

* fix: add helm rbac for kwok-provider to update finalizers (kubernetes-sigs#2336)

Signed-off-by: Max Cao <[email protected]>

* feat: configure CRD status operator with larger histogram buckets (kubernetes-sigs#2328)

* chore(deps): bump sigs.k8s.io/yaml from 1.4.0 to 1.5.0 in the k8s-go-deps group (kubernetes-sigs#2339)

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* chore(deps): bump github.com/docker/docker from 28.2.2+incompatible to 28.3.0+incompatible in the go-deps group (kubernetes-sigs#2340)

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* fix: Fix re-retrieving object on retry (kubernetes-sigs#2337)

* fix: Fix overriding error with patch call (kubernetes-sigs#2338)

* fix: add missing rlock to disruption queue (kubernetes-sigs#2348)

* test: allow e2e tests to output junit report (kubernetes-sigs#2334)

Signed-off-by: Max Cao <[email protected]>

* docs: Add Oracle Cloud Infrastructure (OCI) provider  (kubernetes-sigs#2342)

* fix: no longer allow the same hostname to take multiple capacity (kubernetes-sigs#2356)

* feat: support auto relaxing min values (kubernetes-sigs#2299)

* fix: update provider ID to ensure that Cloud Provider tests pass (kubernetes-sigs#2363)

* fix: remove unsupported capacity_type label from karpenter_nodeclaims… (kubernetes-sigs#2364)

* fix: update deletionTimestamp on terminating pods when after nodeDeletionTimestamp (kubernetes-sigs#2316)

Co-authored-by: Amanuel Engeda <[email protected]>

* chore: promote ReservedCapacity feature gate to beta (kubernetes-sigs#2365)

* fix: flakiness in expiration tests (kubernetes-sigs#2366)

* test: Bump the termination time for the deletion timestamp (kubernetes-sigs#2367)

* chore: cherry-pick kubernetes-sigs#2399 (kubernetes-sigs#2401)

---------

Signed-off-by: dependabot[bot] <[email protected]>
Signed-off-by: Max Cao <[email protected]>
Co-authored-by: Amanuel Engeda <[email protected]>
Co-authored-by: Jonathan Innis <[email protected]>
Co-authored-by: Reed Schalo <[email protected]>
Co-authored-by: DerekFrank <[email protected]>
Co-authored-by: Jason Deal <[email protected]>
Co-authored-by: Reed Schalo <[email protected]>
Co-authored-by: Jonathan Innis <[email protected]>
Co-authored-by: Todd Neal <[email protected]>
Co-authored-by: Jigisha Patil <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Lê Minh Quân <[email protected]>
Co-authored-by: Max Cao <[email protected]>
Co-authored-by: Aidan Rowe <[email protected]>
Co-authored-by: Daniel Lopes <[email protected]>
Co-authored-by: Saurav Agarwalla <[email protected]>
Co-authored-by: cosimomeli <[email protected]>
jigisha620 pushed a commit to jigisha620/karpenter that referenced this pull request Sep 19, 2025

Labels

  • approved (Indicates a PR has been approved by an approver from all required OWNERS files.)
  • cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.)
  • lgtm ("Looks good to me", indicates that a PR is ready to be merged.)
  • ok-to-test (Indicates a non-member PR verified by an org member that is safe to test.)
  • size/M (Denotes a PR that changes 30-99 lines, ignoring generated files.)

6 participants