@rushmash91
Member

@rushmash91 rushmash91 commented Jun 2, 2025

Fixes: aws-controllers-k8s/community#2497, aws-controllers-k8s/community#2103

Description of changes:

During controller shutdown (SIGTERM), status updates failed because patchResourceStatus issued its patch with the original, already-cancelled context. As a result, a resource's final status was never written back to its manifest, leaving the resource in an inconsistent state.

Fix: always use context.WithoutCancel(ctx) for patch operations, so that cancellation does not propagate to the patch while context values are preserved. The 30s SIGTERM grace period serves as the effective timeout, so no additional timeout logic is needed that could interfere with the normal Kubernetes client timeout/retry strategy.

Changes:

Added utility functions:

- patchWithoutCancel()
- patchStatusWithoutCancel() 

Operations updated to use context.WithoutCancel(ctx):

Main Reconciler:

- patchResourceMetadataAndSpec()
- patchResourceStatus()

Adoption Reconciler:

- patchMetadataAndSpec()
- patchStatus()

Field Export Reconciler:

- patchMetadataAndSpec()
- patchStatus()
- writeToConfigMap()
- writeToSecret()

Example logs:

{"level":"debug","ts":"2025-06-05T09:52:16.723-0700","logger":"ackrt","msg":"< r.Sync","kind":"Policy","namespace":"default","name":"iiobalic90","account":"xx","role":"","region":"us-west-2","is_adopted":false,"generation":1,"error":"operation error IAM: GetPolicy, context canceled"}
{"level":"info","ts":"2025-06-05T09:52:16.723-0700","logger":"ackrt","msg":"created patch context with timeout","kind":"Policy","namespace":"default","name":"iiobalic90","account":"xx","role":"","region":"us-west-2","is_adopted":false,"generation":1,"timeout_seconds":10}
{"level":"debug","ts":"2025-06-05T09:52:16.723-0700","logger":"ackrt","msg":"> r.patchResourceStatus","kind":"Policy","namespace":"default","name":"iiobalic90","account":"xx","role":"","region":"us-west-2","is_adopted":false,"generation":1}

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@ack-prow ack-prow bot requested review from a-hilaly and knottnt June 2, 2025 16:59
@ack-prow ack-prow bot added the approved label Jun 2, 2025
@rushmash91 rushmash91 changed the title from "Fix graceful shutdown status patching by using independent context when original context is cancelled" to "Fix graceful shutdown status patching by using a graceful context when original context is cancelled" Jun 2, 2025
@rushmash91 rushmash91 changed the title from "Fix graceful shutdown status patching by using a graceful context when original context is cancelled" to "Fix controller shutdown status patching with a graceful context when original context is cancelled" Jun 2, 2025

@knottnt knottnt left a comment

Can we add a test for this behavior?

@rushmash91
Member Author

/retest

@rushmash91
Member Author

/test ec2-controller-test

@rushmash91 rushmash91 requested a review from michaelhtm June 5, 2025 17:00
@rushmash91 rushmash91 changed the title from "Fix controller shutdown status patching with a graceful context when original context is cancelled" to "Fix controller shutdown status patching with a fresh context for patches" Jun 5, 2025
@ack-prow

ack-prow bot commented Jun 5, 2025

@rushmash91: The /retest command does not accept any targets.
The following commands are available to trigger required jobs:

  • /test ec2-controller-test
  • /test ecr-controller-test
  • /test iam-controller-test
  • /test s3-controller-test
  • /test sagemaker-controller-test
  • /test unit-test

The following commands are available to trigger optional jobs:

  • /test verify-attribution

Use /test all to run all jobs.

In response to this:

/retest s3-controller-test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@rushmash91
Member Author

/test s3-controller-test

@rushmash91
Member Author

/test ec2-controller-test

@knottnt knottnt self-requested a review June 6, 2025 16:59
@rushmash91
Member Author

/hold

@ack-prow ack-prow bot added the do-not-merge/hold label Jun 10, 2025
@rushmash91 rushmash91 force-pushed the signals-play branch 2 times, most recently from 3dbcccc to 19c06a9 Compare June 10, 2025 19:02
Member

@a-hilaly a-hilaly left a comment

nice, thanks @rushmash91 !

@rushmash91
Member Author

/unhold

@ack-prow ack-prow bot removed the do-not-merge/hold label Jun 10, 2025
Member

@a-hilaly a-hilaly left a comment

Great job @rushmash91 !
/lgtm

@ack-prow ack-prow bot added the lgtm label Jun 10, 2025
@a-hilaly
Member

/retest

@ack-prow

ack-prow bot commented Jun 10, 2025

@rushmash91: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| verify-attribution | 21f3a8b | link | false | /test verify-attribution |

Member

@a-hilaly a-hilaly left a comment

/lgtm

@ack-prow

ack-prow bot commented Jun 11, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: a-hilaly, knottnt, rushmash91

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [a-hilaly,knottnt,rushmash91]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@rushmash91
Member Author

/test ec2-controller-test

@ack-prow ack-prow bot merged commit 4d837e7 into aws-controllers-k8s:main Jun 11, 2025
7 of 8 checks passed

Labels

approved lgtm Indicates that a PR is ready to be merged.

Development

Successfully merging this pull request may close these issues.

ACK iam-controller restarts/cancelled contexts can lead to EntityAlreadyExists errors.

4 participants