Skip to content

fix(target-allocator): add cert-manager Certificate and Issuer ownership to TargetAllocator controller#4372

Merged
swiatekm merged 11 commits into
open-telemetry:mainfrom
schahal:fix/target-allocator-certificate-ownership
Oct 20, 2025
Merged

fix(target-allocator): add cert-manager Certificate and Issuer ownership to TargetAllocator controller#4372
swiatekm merged 11 commits into
open-telemetry:mainfrom
schahal:fix/target-allocator-certificate-ownership

Conversation

@schahal

@schahal schahal commented Sep 19, 2025

Copy link
Copy Markdown
Contributor

Description:

The TargetAllocator seems to be missing ownership watches for cert-manager Certificate and Issuer resources [ref].

This, I believe, is causing certificate renewal issues where:

  1. cert-manager would update Certificate resources
  2. TargetAllocator controller would not be notified of these changes
  3. TargetAllocator pods would continue using stale certificate mounts
  4. Manual Secret deletion was required to force certificate pickup

This fix adds Certificate and Issuer ownership when cert-manager is available and TargetAllocator mTLS is enabled, ensuring the controller gets notified when certificates are updated and can trigger pod restarts as needed.

Fixes: Certificate renewal requiring manual Secret deletion

Link to tracking Issue(s):

Testing:

Relying on CI.

I confirmed locally that my Certificate is indeed controller-owned:

$ kubectl get certificate otel-collector-agent-ta-server-cert -o yaml | yq '.metadata.ownerReferences'
- apiVersion: opentelemetry.io/v1alpha1
  blockOwnerDeletion: true
  controller: true
  kind: TargetAllocator
  name: otel-collector-agent
  uid: d2b2007b-28cf-4032-8747-b7c1aab216a6

Documentation:

N/A (fix to existing functionality)

@schahal schahal requested a review from a team as a code owner September 19, 2025 19:27
@schahal

schahal commented Sep 19, 2025

Copy link
Copy Markdown
Contributor Author

Maintainers: this was a theory pinpointed by Claude and I vetted it against references and existing code, but need maintainers eye for sure.

@swiatekm

Copy link
Copy Markdown
Contributor

Thank you for the contribution! A simple way to test this in the CI would be to check ownership in this e2e test. Could you also add a changelog entry? make chlog-new will generate a yaml file that you need to fill in.

@schahal

schahal commented Sep 20, 2025

Copy link
Copy Markdown
Contributor Author

Thanks! Added! (I may not have permission to re-trigger the workflows)

@github-actions

github-actions Bot commented Sep 22, 2025

Copy link
Copy Markdown
Contributor

E2E Test Results

 34 files  ±0  221 suites  ±0   2h 3m 36s ⏱️ - 55m 55s
 88 tests ±0   88 ✅ ±0  0 💤 ±0  0 ❌ ±0 
225 runs  ±0  225 ✅ ±0  0 💤 ±0  0 ❌ ±0 

Results for commit e7414e0. ± Comparison against base commit 9a6b1e4.

♻️ This comment has been updated with latest results.

@swiatekm

swiatekm commented Sep 22, 2025

Copy link
Copy Markdown
Contributor

@schahal can you verify the ownership in the following e2e test? I'd like to have a persistent check that the ownership is set correctly.

@schahal

schahal commented Sep 22, 2025

Copy link
Copy Markdown
Contributor Author

@swiatekm I updated the branch for a test to hopefully check when/if cert renewal breaks in any new/unique way going forward (including ownership watcher missing).

(Triggering Certificate renewal by force-renewal annotation)

🤞

@swiatekm

Copy link
Copy Markdown
Contributor

@schahal looks like the e2e tests are failing, can you have a look?

@schahal

schahal commented Oct 7, 2025

Copy link
Copy Markdown
Contributor Author

@swiatekm possible to re-trigger the workflows? (i missed passing in the namespace the the kubectl patch cert portion of the test)

@swiatekm swiatekm left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@schahal it looks like there's still some kind of problem with the e2e tests. Can you have a look?

schahal and others added 11 commits October 9, 2025 16:31
…ator controller

The TargetAllocator controller was missing ownership watches for cert-manager
Certificate and Issuer resources. This caused certificate renewal issues where:

1. cert-manager would update Certificate resources and recreate Secrets
2. TargetAllocator controller would not be notified of these changes
3. TargetAllocator pods would continue using stale certificate mounts
4. Manual Secret deletion was required to force certificate pickup

This fix adds Certificate and Issuer ownership when cert-manager is available
and TargetAllocator mTLS is enabled, ensuring the controller gets notified
when certificates are updated and can trigger pod restarts as needed.

Fixes: Certificate renewal requiring manual Secret deletion
@schahal

schahal commented Oct 15, 2025

Copy link
Copy Markdown
Contributor Author

it looks like there's still some kind of problem with the e2e tests. Can you have a look?

Can you re-trigger the workflows, @swiatekm , please?

I have the tests passing locally:

$ make add-operator-arg OPERATOR_ARG='--feature-gates=operator.targetallocator.mtls' add-certmanager-permissions prepare-e2e KUBE_VERSION=1.33 VERSION=e2e
[...]
$ make e2e-ta-collector-mtls
[...]
    | 13:51:41 | targetallocator-collector-mtls | @chainsaw | DELETE    | OK    | v1/Namespace @ chainsaw-equipped-skylark
    | 13:52:24 | targetallocator-collector-mtls | @chainsaw | CLEANUP   | END   |
--- PASS: chainsaw (0.00s)
    --- PASS: chainsaw/ta-disabled (9.09s)
    --- PASS: chainsaw/ta-collector-mtls-scrapeconfig-node (57.22s)
    --- PASS: chainsaw/targetallocator-collector-mtls (155.25s)
PASS
Tests Summary...
- Passed  tests 3
- Failed  tests 0
- Skipped tests 0
Saving report...
Done.

@schahal schahal requested a review from swiatekm October 17, 2025 18:52

@swiatekm swiatekm left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the contribution!

@swiatekm swiatekm merged commit 890031a into open-telemetry:main Oct 20, 2025
51 of 52 checks passed
@schahal schahal deleted the fix/target-allocator-certificate-ownership branch December 1, 2025 21:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Renewed certificates are not reloaded by target allocator

3 participants