Skip to content

[Backport/v1.28] identity: reload CA root cert channel on file change#1801

Merged
istio-testing merged 7 commits intoistio:release-1.28from
jlojosnegros:backport/v1.28/root-ca-reload
Apr 7, 2026
Merged

[Backport/v1.28] identity: reload CA root cert channel on file change#1801
istio-testing merged 7 commits intoistio:release-1.28from
jlojosnegros:backport/v1.28/root-ca-reload

Conversation

@jlojosnegros
Copy link
Copy Markdown
Contributor

@jlojosnegros jlojosnegros commented Mar 20, 2026

Backport of #1775 to release-1.28

Everything is almost directly backported.
Note: some dependencies added.

Introduces a modification over the master PR, a boolean env var CA_CERT_WATCHER (default: true) that allows operators to disable the CA root cert file watcher at runtime.
When set to false, no watcher thread is started and the gRPC channel is never rebuilt on cert rotation — the startup-time cert is retained permanently.

The flag is only effective when ca_root_cert is a file path; Static and Default certs never start a watcher regardless.

Introduce RootCertManager (analogous to CrlManager) to watch
the CA root cert file for changes using a notify debouncer.

When the file changes, a dirty flag (AtomicBool) is set. On the next call
to CaClient::fetch_certificate(), the flag is checked and, if set, the
TLS gRPC channel is rebuilt using the updated cert before sending the CSR
request. If the rebuild fails, the flag is rearmed and the existing channel
is reused, so cert renewal continues working despite a transient error.

The gRPC channel is now wrapped in a RwLock inside CaClient to allow
lazy replacement without blocking concurrent reads. CaClient::new()
now accepts a RootCert directly instead of a pre-built cert provider,
so it can start the file watcher and retain the data needed to rebuild
the channel later.

Signed-off-by: Jose Luis Ojosnegros Manchón <jojosneg@redhat.com>
Add notify and notify-debouncer-full needed to watch the CA root file
parent folder and hot reload the cert.

Add tempfile dependency needed for testing purposes

NOTE: Used the versions in main for the three dependencies even if there
are newer ones

Signed-off-by: Jose Luis Ojosnegros Manchón <jojosneg@redhat.com>
@istio-testing
Copy link
Copy Markdown
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@istio-testing istio-testing added do-not-merge/work-in-progress Block merging of a PR because it isn't ready yet. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Mar 20, 2026
@fjglira
Copy link
Copy Markdown

fjglira commented Mar 20, 2026

Hey @jlojosnegros, the backports are automatically created from PR on the master branch by using the label cherry-pick/release-1.xx

Signed-off-by: Jose Luis Ojosnegros Manchón <jojosneg@redhat.com>
Add write_lock_wait_ms to the debug log emitted after a successful
root cert hot-reload, so contention on the RwLock is observable in
logs without requiring additional instrumentation.

Signed-off-by: Jose Luis Ojosnegros Manchón <jojosneg@redhat.com>
Introduces a boolean env var CA_CERT_WATCHER (default: true) that allows
operators to disable the CA root cert file watcher at runtime.
When set to false, no watcher thread is started and the gRPC channel is never
rebuilt on cert rotation — the startup-time cert is retained permanently.

The flag is only effective when ca_root_cert is a file path; Static and
Default certs never start a watcher regardless.

Signed-off-by: Jose Luis Ojosnegros Manchón <jojosneg@redhat.com>
@jlojosnegros jlojosnegros reopened this Mar 26, 2026
@istio-testing istio-testing added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Mar 26, 2026
@jlojosnegros
Copy link
Copy Markdown
Contributor Author

jlojosnegros commented Apr 7, 2026

Hi @fjglira
after this comment #1775 (comment)
I assume the backport should be done by hand and not using the usual labels.

Has the backport been approved?

@jlojosnegros jlojosnegros marked this pull request as ready for review April 7, 2026 12:09
@jlojosnegros jlojosnegros requested a review from a team as a code owner April 7, 2026 12:09
@istio-testing istio-testing removed the do-not-merge/work-in-progress Block merging of a PR because it isn't ready yet. label Apr 7, 2026
@fjglira
Copy link
Copy Markdown

fjglira commented Apr 7, 2026

@jlojosnegros: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
test_ztunnel_release-1.28 88289cc link true /test test
Details

@jlojosnegros can you please check the errors?

@fjglira
Copy link
Copy Markdown

fjglira commented Apr 7, 2026

@ilrudie, what do you think about the CA_CERT_WATCHER flag? And from my point of view, I think the default state should be false? I mean, if there are users with already a ca_root_cert as a file path, this will change the current behaviour?

Another question: should we add a release note in the istio/istio to document the change?

Note: @jlojosnegros after we agree all this, you will need to create the backport also for the 1.29 release branch

@ilrudie
Copy link
Copy Markdown
Contributor

ilrudie commented Apr 7, 2026

I already did a 1.29 cherry-pick here. We can flip that flag there too if folks want.

The flag was requested here in slack for 1.29 where I had brought up the topic of cherry-picking more generally.

@fjglira
Copy link
Copy Markdown

fjglira commented Apr 7, 2026

CA_CERT_WATCHER

Great, approving

@fjglira
Copy link
Copy Markdown

fjglira commented Apr 7, 2026

/retest

@ilrudie
Copy link
Copy Markdown
Contributor

ilrudie commented Apr 7, 2026

@fjglira, the failure is a lint/clippy one not a flake. It'll need a code fix

@jlojosnegros
Copy link
Copy Markdown
Contributor Author

Checking ...

clippy complains about the number of arguments in CaClient::new exceeding the max of 7 arguments ...

if we cannot use '#[allow]' to disable the warning here, we should think on grouping some params to reduce the number of arguments.

Maybe the boolean flags? Or something else with low impact in the code ...

WDYT @ilrudie could we disable the warning here ( as this is just a backport ) or should we try to group some arguments in a new type?

@ilrudie
Copy link
Copy Markdown
Contributor

ilrudie commented Apr 7, 2026

d10aa3e

I just suppressed the lint error. A refactor on a release branch to support a release-branch-only flag doesn't make sense to me.

CaClient::new has 8 parameters after adding enable_ca_cert_watcher
in the previous commit, exceeding clippy's default limit of 7.
suppress the lint rather than refactoring.

Signed-off-by: Jose Luis Ojosnegros Manchón <jojosneg@redhat.com>
Signed-off-by: Jose Luis Ojosnegros Manchón <jojosneg@redhat.com>
@istio-testing istio-testing removed the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Apr 7, 2026
@istio-testing istio-testing added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Apr 7, 2026
@istio-testing istio-testing merged commit 3677e2d into istio:release-1.28 Apr 7, 2026
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants