Skip to content

identity: reload CA root cert channel on file change#1775

Merged
istio-testing merged 10 commits intoistio:masterfrom
jlojosnegros:feat/root-ca-reload
Mar 26, 2026
Merged

identity: reload CA root cert channel on file change#1775
istio-testing merged 10 commits intoistio:masterfrom
jlojosnegros:feat/root-ca-reload

Conversation

@jlojosnegros
Copy link
Copy Markdown
Contributor

@jlojosnegros jlojosnegros commented Mar 10, 2026

Problem

ztunnel reads the CA root certificate once at startup and discards it. When an operator rotates the Istio CA (istio-ca-root-cert ConfigMap), the file on disk is updated by Kubernetes but ztunnel keeps using the stale cert to verify TLS with istiod. Every subsequent workload certificate renewal fails with UnknownIssuer.

Proposed Solution

Add a RootCertManager, modelled after the existing CrlManager. It holds a filesystem watcher (via notify-debouncer-full) that sets an atomic dirty flag whenever the root cert directory changes.

CaClient now retains the RootCert and a Option<Arc<RootCertManager>>. At the start of each fetch_certificate call, if the dirty flag is set, the TLS gRPC channel is rebuilt from the new cert before proceeding. On rebuild failure the dirty flag is re-armed so the next call retries.

Changes

File Change
src/tls/control.rs Add RootCertManager; expose control_plane_client_config as pub(crate)
src/identity/caclient.rs Retain connection params in CaClient; add rebuild_channel(); hot-reload check in fetch_certificate()
src/identity/manager.rs Adapt call site to new CaClient::new() signature
src/test_helpers/ca.rs Adapt CaServer::spawn() to new signature

No Cargo.toml changes needed: notify and notify-debouncer-full are already pulled in by CrlManager.

Solves: #1768

Signed-off-by: Jose Luis Ojosnegros Manchón <jojosneg@redhat.com>
Use the new RootCertManager to rebuild the TLS gRPC channel when a root
cert change is noted.

Need to add some fields to store the information to rebuild the channel.

Signed-off-by: Jose Luis Ojosnegros Manchón <jojosneg@redhat.com>
Signed-off-by: Jose Luis Ojosnegros Manchón <jojosneg@redhat.com>
Signed-off-by: Jose Luis Ojosnegros Manchón <jojosneg@redhat.com>
Signed-off-by: Jose Luis Ojosnegros Manchón <jojosneg@redhat.com>
Signed-off-by: Jose Luis Ojosnegros Manchón <jojosneg@redhat.com>
@istio-testing istio-testing added do-not-merge/work-in-progress Block merging of a PR because it isn't ready yet. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. needs-ok-to-test labels Mar 10, 2026
@istio-testing
Copy link
Copy Markdown
Contributor

Hi @jlojosnegros. Thanks for your PR.

I'm waiting for a istio member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Signed-off-by: Jose Luis Ojosnegros Manchón <jojosneg@redhat.com>
Signed-off-by: Jose Luis Ojosnegros Manchón <jojosneg@redhat.com>
@jlojosnegros jlojosnegros marked this pull request as ready for review March 12, 2026 10:49
@jlojosnegros jlojosnegros requested a review from a team as a code owner March 12, 2026 10:49
@istio-testing istio-testing removed the do-not-merge/work-in-progress Block merging of a PR because it isn't ready yet. label Mar 12, 2026
@ilrudie
Copy link
Copy Markdown
Contributor

ilrudie commented Mar 16, 2026

/ok-to-test

@istio-testing istio-testing added ok-to-test Set this label allow normal testing to take place for a PR not submitted by an Istio org member. and removed needs-ok-to-test labels Mar 16, 2026
@nmnellis
Copy link
Copy Markdown

Will you be able to backport this fix to 1.28?

@jlojosnegros
Copy link
Copy Markdown
Contributor Author

Will you be able to backport this fix to 1.28?

Yeah sur, no problem. Should be something like #1801

@ilrudie
Copy link
Copy Markdown
Contributor

ilrudie commented Mar 20, 2026

Will you be able to backport this fix to 1.28?

Yeah sur, no problem. Should be something like #1801

We'd typically only cherry-pick bug fixes. This is a grey area IMO. It could be seen as a bug of omission. At any rate, If we do want to consider this a bug we should use the cherry-pick methodology native to the project by labeling this issue for cherry-pick.

Copy link
Copy Markdown
Contributor

@ilrudie ilrudie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly this looks good. Nice work and thanks for the contribution.

I did want to talk about the locking a little bit to make sure I understand it the same way you do an that it's the right optimization but no need to immediately jump in and start changing that bit.

manager.mark_dirty();
}
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't requesting a change per se. Sort of reasoning "out loud" about the locking here.

After we take_dirty() the state returns to clean immediately and we enter the await to attempt rebuild_channel. During this time nothing holds the write lock and the state is clean so we may try to use an old client. It might be able to compound as contentious readers may continue to delay being able to take the write, all the while in a "clean" state. I think the advantage is that we do not queue multiple write/rebuilds, which seems like a nice property. In the happier state, the old client is still valid and we keep on running nicely but eventually wind up with a fresh client. In the worse state, the client isn't usable and we produce errors while potentially blocking the resolution by causing read contention on the lock.

Assuming we held the rwlock before marking clean, a bunch of reads could await the write lock to resolve. On success, they get the newly minted client (probably ideal?). On fail, they get the old client and what happens happens. The downside is we could have multiple writelock awaits simultaneously queue since multiple calls to fetch could see the dirty state while we await any held read locks to clear. This might be kind of helpful, since at least write locks to not lead to further read contentions, but we might need to check if the dirty was resolved by someone else before actually rebuilding.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @ilrudie ,
Thanks for the comments!!!
Sorry for the late response

After we take_dirty() the state returns to clean immediately and we enter the await to attempt rebuild_channel. During this time nothing holds the write lock and the state is clean so we may try to use an old client. It might be able to compound as contentious readers may continue to delay being able to take the write, all the while in a "clean" state

Yeah, you are right, but the contention will be quite limited I think.
If I remember correctly, in Linux the RW lock uses "write-preferring" policy by default, so once a writer ask for the lock it only has to wait for the readers that already has the lock to unlock it, but has preference over any new reader. Given this the contention will be limited to just those readers that asked for the lock before the writer and Read lock is only hold while doing the clone which is a very fast operation (~us), so the contention time should be really small

I think the advantage is that we do not queue multiple write/rebuilds, which seems like a nice property. In the happier state, the old client is still valid and we keep on running nicely but eventually wind up with a fresh client. In the worse state, the client isn't usable and we produce errors while potentially blocking the resolution by causing read contention on the lock.

You are right again. Old client will be used while rebuild is done and that will fail if the old cert is not valid just after rotation, but I was assuming a healthy time window between rotation and certification expiry.

Assuming we held the rwlock before marking clean, a bunch of reads could await the write lock to resolve. On success, they get the newly minted client (probably ideal?). On fail, they get the old client and what happens happens. The downside is we could have multiple writelock awaits simultaneously queue since multiple calls to fetch could see the dirty state while we await any held read locks to clear. This might be kind of helpful, since at least write locks to not lead to further read contentions, but we might need to check if the dirty was resolved by someone else before actually rebuilding.

Yeah, the problem with that is we cannot hold a RWlock through an await operation, so we should change std::async::RWLock to tokio::async::RWLock which is not trivial, and as you have already said, we will have to use a double-checked locking.

Working from this idea I was thinking ...
What if we use a "rebuild_lock" (tokio::sync::Mutex) to serialize channel builds, so we avoid queuing multiple builds, it will be something like:

if take_dirty() is true
    get rebuild_lock                                    // only one rebuild in queue
    if take_dirty() is true                             // double-check 
        new_client = rebuild_channel().await
        *self.client.write().unwrap() = new_client      // quick write lock
    endif
    unlock rebuild_lock
endif

This will:

  • "serialize" rebuilds, so we only have one at a time
  • avoid readers to be block while rebuilding channel, as write lock is only hold for the client swap
  • no need to change std::sync::RWLock

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the discussion. I agree we should have time to rotate out the client in most reasonable cases so this is probably a suitable implementation with a good balance of complexity.

It might be nice to add how long we waited for the write lock to the debug message. This way if we suspect contention, we already have some information that can be made available to troubleshoot. I don't think we need to immediately introduce more complexity though.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea.
I have added the elapsed time in write contention to the debug log.

@jlojosnegros
Copy link
Copy Markdown
Contributor Author

Will you be able to backport this fix to 1.28?

Yeah sur, no problem. Should be something like #1801

We'd typically only cherry-pick bug fixes. This is a grey area IMO. It could be seen as a bug of omission. At any rate, If we do want to consider this a bug we should use the cherry-pick methodology native to the project by labeling this issue for cherry-pick.

I have marked the PR as a draft because I was not sure about the process to "backport" this in the project. I assumed we would need to follow some formal process to do that, I just wanna see if the backport was easy to do.

Thanks for the rest of the comments. I will review them and will come back asap.

@ilrudie ilrudie added cherrypick/release-1.28 Set this label on a PR to auto-merge it to the release-1.28 branch cherrypick/release-1.29 Set this label on a PR to auto-merge it to the release-1.29 branch labels Mar 23, 2026
@ilrudie
Copy link
Copy Markdown
Contributor

ilrudie commented Mar 23, 2026

Will you be able to backport this fix to 1.28?

Yeah sur, no problem. Should be something like #1801

We'd typically only cherry-pick bug fixes. This is a grey area IMO. It could be seen as a bug of omission. At any rate, If we do want to consider this a bug we should use the cherry-pick methodology native to the project by labeling this issue for cherry-pick.

I have marked the PR as a draft because I was not sure about the process to "backport" this in the project. I assumed we would need to follow some formal process to do that, I just wanna see if the backport was easy to do.

Thanks for the rest of the comments. I will review them and will come back asap.

I started asking release managers in slack. I'm not sure if they will accept it or not but I labeled the PR for cherry-pick to 1.28 and 1.29 and we can sort that out in the release branches.

Signed-off-by: Jose Luis Ojosnegros Manchón <jojosneg@redhat.com>
@ilrudie ilrudie added the do-not-merge/hold Block automatic merging of a PR. label Mar 25, 2026
Copy link
Copy Markdown
Contributor

@ilrudie ilrudie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple minor comments. Thanks for the productive discussions.

@fjglira, I think this could merge very soon. It would be good to gather the needed opinions about cherry-picks as soon as we can.

manager.mark_dirty();
}
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the discussion. I agree we should have time to rotate out the client in most reasonable cases so this is probably a suitable implementation with a good balance of complexity.

It might be nice to add how long we waited for the write lock to the debug message. This way if we suspect contention, we already have some information that can be made available to troubleshoot. I don't think we need to immediately introduce more complexity though.

Add write_lock_wait_ms to the debug log emitted after a successful
root cert hot-reload, so contention on the RwLock is observable in
logs without requiring additional instrumentation.

Signed-off-by: Jose Luis Ojosnegros Manchón <jojosneg@redhat.com>
@fjglira fjglira removed cherrypick/release-1.28 Set this label on a PR to auto-merge it to the release-1.28 branch cherrypick/release-1.29 Set this label on a PR to auto-merge it to the release-1.29 branch labels Mar 25, 2026
@fjglira
Copy link
Copy Markdown

fjglira commented Mar 25, 2026

According to the discussion on slack we can manually cherry-pick this changes as a bug fix but let’s add a flag to allow user to turn off the watcher. On master we can keep the current behavior

Copy link
Copy Markdown
Contributor

@ilrudie ilrudie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, thanks for working on this!

@ilrudie ilrudie removed the do-not-merge/hold Block automatic merging of a PR. label Mar 26, 2026
@istio-testing istio-testing merged commit 642b1d7 into istio:master Mar 26, 2026
3 checks passed
istio-testing added a commit that referenced this pull request Mar 30, 2026
* remove git tag (#1559)

* update how io errors are being generated to fix clippy issues (#1564)

Signed-off-by: ilrudie <ian.rudie@solo.io>

* Automator: update common-files@master in istio/ztunnel@master (#1563)

* Automator: update common-files@master in istio/ztunnel@master (#1568)

* Allow dynamic configuration of thread count (#1566)

* Allow dynamic configuration of thread count

* fix flakes

* don't send to empty address (#1570)

* don't send to empty address

* add test

* Automator: update common-files@master in istio/ztunnel@master (#1571)

* Automator: update common-files@master in istio/ztunnel@master (#1572)

* remove invalid test cases from parsing of ZTUNNEL_WORKER_THREADS (#1576)

Signed-off-by: Ian Rudie <ian.rudie@solo.io>

* Automator: update common-files@master in istio/ztunnel@master (#1577)

* Automator: update common-files@master in istio/ztunnel@master (#1578)

* tls: add PQC compliance policy (#1561)

* tls: add PQC compliance policy

Signed-off-by: Jacek Ewertowski <jacek.ewertowski1@gmail.com>

* Add global lazy variable PQC_ENABLED

Signed-off-by: Jacek Ewertowski <jacek.ewertowski1@gmail.com>

* Add unused_imports and dead_code to PQC_ENABLED declaration

Signed-off-by: Jacek Ewertowski <jacek.ewertowski1@gmail.com>

---------

Signed-off-by: Jacek Ewertowski <jacek.ewertowski1@gmail.com>

* Automator: update common-files@master in istio/ztunnel@master (#1582)

* Improved Service Resolution (#1562)

* initial idea for improved resolution

Signed-off-by: ilrudie <ian.rudie@solo.io>

* handle preferred service namespace; unit testing

Signed-off-by: Ian Rudie <ian.rudie@solo.io>

---------

Signed-off-by: ilrudie <ian.rudie@solo.io>
Signed-off-by: Ian Rudie <ian.rudie@solo.io>

* Automator: update common-files@master in istio/ztunnel@master (#1583)

* Multinetwork/Support remote networks for services with waypoints (#1565)

* Multinetwork/Support remote networks for services with waypoints

Currently `build_request` when it sees a service with a waypoint
resolves the waypoint backend and routes request there using regular
HBONE.

In multi network scenario though the waypoint may have workload on a
remote network and to reach it we have to go through E/W gateway and use
double HBONE.

This change enables handling of services with waypoint on a remote
network.

Some of the assumptions that were used when I prepared this change:

1. We assume uniformity of configuration (e.g., if service X in local
   cluster has a waypoint, then service X in remote network also has a
   waypoint, if waypoint is service addressable, then it's using service
   to address waypoint both locally and on remote network)
2  Split-horizon representation of waypoint workloads, just like with
   any regular workloads and services (e.g., in the local cluster
   instead of an actual waypoint workload pointing to a pod on another
   network we will have a "proxy" representation that just has network
   gateway).

Both of those can be in hanled by the controlplane (e.g., controlplane
can generate split-horizon workloads and when configuration is
non-uniform, just filter out remote configs for remote networks), though
we don't yet have a complete implementation.

Signed-off-by: Mikhail Krinkin <mkrinkin@microsoft.com>

* Return an error instead of panicking

Signed-off-by: Mikhail Krinkin <mkrinkin@microsoft.com>

* Update comments in src/proxy/outbound.rs

Co-authored-by: Ian Rudie <ilrudie@gmail.com>

* Update comments in src/proxy/outbound.rs

Signed-off-by: Mikhail Krinkin <mkrinkin@microsoft.com>

* Add a debug assert to provide a bit more context to the error in tests

Signed-off-by: Mikhail Krinkin <mkrinkin@microsoft.com>

* Fix formatting

Signed-off-by: Mikhail Krinkin <mkrinkin@microsoft.com>

* Added a few debug logs to be able to trace when a workload on a remote network is picked

Signed-off-by: Mikhail Krinkin <mkrinkin@microsoft.com>

---------

Signed-off-by: Mikhail Krinkin <mkrinkin@microsoft.com>
Co-authored-by: Ian Rudie <ilrudie@gmail.com>

* increasing limit for open files (#1586)

* increasing limit for open files

* suggestion from PR

* adding comment

* Update src/main.rs

Co-authored-by: Daniel Hawton <daniel@hawton.org>

---------

Co-authored-by: Daniel Hawton <daniel@hawton.org>

* Buffer inner h2 streams (#1580)

* Buffer h2 streams

* Tests

* naming

* Review

simplify code

* Automator: update common-files@master in istio/ztunnel@master (#1589)

* Automator: update common-files@master in istio/ztunnel@master (#1595)

* Automator: update common-files@master in istio/ztunnel@master (#1596)

* Automator: update common-files@master in istio/ztunnel@master (#1597)

* Automator: update common-files@master in istio/ztunnel@master (#1598)

* adopt rcgen 14 (#1599)

* adopt rcgen 14

Signed-off-by: Ian Rudie <ian.rudie@solo.io>

* fmt

Signed-off-by: Ian Rudie <ian.rudie@solo.io>

* fmt fuzz

Signed-off-by: Ian Rudie <ian.rudie@solo.io>

---------

Signed-off-by: Ian Rudie <ian.rudie@solo.io>

* respect IPV6 setting for DNS server (#1601)

* Automator: update common-files@master in istio/ztunnel@master (#1604)

* chore - clippy cleanup (#1610)

* fix build.rs

Signed-off-by: Ian Rudie <ian.rudie@solo.io>

* clippy

Signed-off-by: Ian Rudie <ian.rudie@solo.io>

* remove swap

Signed-off-by: Ian Rudie <ian.rudie@solo.io>

---------

Signed-off-by: Ian Rudie <ian.rudie@solo.io>

* Automator: update common-files@master in istio/ztunnel@master (#1607)

* Update rust version and dependencies (#1614)

* Update rust version and dependencies

* fix mismatch

* drop criterion ver

* bump

* Automator: update common-files@master in istio/ztunnel@master (#1615)

* Automator: update common-files@master in istio/ztunnel@master (#1618)

* fix: upgrade pprof to version 0.15.0 to fix GHSA-2gh3-rmm4-6rq5/CVE-2025-53605. Creator of upstream patch to fix CVE issue: tikv/pprof-rs@3d4e696 (#1606)

Signed-off-by: Kyle Steere <kyle.steere@chainguard.dev>

* Automator: update common-files@master in istio/ztunnel@master (#1619)

* Automator: update common-files@master in istio/ztunnel@master (#1621)

* Update ztunnel profiling doc (#1620)

* Update ztunnel profiling doc

* reset the numbers to 1 since markdown auto-renders them in order

* Automator: update common-files@master in istio/ztunnel@master (#1622)

* Add open file metrics (#1626)

* Add open file metrics

* Lint

* use /dev/fd and subtract one from fd count

* import ordering

* Automator: update common-files@master in istio/ztunnel@master (#1628)

* initial impl for passthrough services (#1627)

* initial impl for passthrough services

Signed-off-by: Ian Rudie <ian.rudie@solo.io>

* make gen

Signed-off-by: Ian Rudie <ian.rudie@solo.io>

* tests

Signed-off-by: Ian Rudie <ian.rudie@solo.io>

---------

Signed-off-by: Ian Rudie <ian.rudie@solo.io>

* Automator: update common-files@master in istio/ztunnel@master (#1635)

* retain valid certs on fetch failures (#1567)

* retain valid certs on fetch failures

* better unit tests.
- fetches now records failed attempts as well.
- validate that valid certificate are retained across fetch attempts despite ca failures

* minor tweaks as per review comments.

* Automator: update common-files@master in istio/ztunnel@master (#1638)

* Automator: update common-files@master in istio/ztunnel@master (#1639)

* Automator: update common-files@master in istio/ztunnel@master (#1640)

* admin: allow symbols in jemalloc pprof (#1636)

Otherwise its pretty useless. Manually tested this makes heap profiles
work

* Set timeouts and keepalive (#1641)

* set timeouts

* Listener takes socket config

* Make certificate DER fields public (#1646)

* logs: disable spammy DNS logs (#1649)

These logs are crazy, like 5 lines per DNS query. We turned them off but
an update changed the log target.

Not sure what backports we need

* Automator: update common-files@master in istio/ztunnel@master (#1653)

* Automator: update common-files@master in istio/ztunnel@master (#1654)

* Automator: update common-files@master in istio/ztunnel@master (#1655)

* fix small typo (#1647)

Signed-off-by: Ian Rudie <ian.rudie@solo.io>

* Bumps deps (#1623)

* Bumps deps

* fixup

* fix

* fmt

* rfmt

* fmt and 1.90

* their way

* Automator: update common-files@master in istio/ztunnel@master (#1661)

* Automator: update common-files@master in istio/ztunnel@master (#1670)

* Bump cargo deps (#1669)

* removing CNAME record answer for wildcards (#1664)

* removing CNAME record answer for wildcards

* removing unnused trace

* fixing unit test

* removing unused fild/type

* make gen

* metrics: fix accept header negotiation (#1681)

Basically `get_all` does NOT unconditionally split a header that has
been sent on 1 line by the client. This breaks kube-prometheus-stack

* Automator: update common-files@master in istio/ztunnel@master (#1685)

* Set socket options outside trace! (#1689)

* Automator: update common-files@master in istio/ztunnel@master (#1696)

* Set keepalives on outbound connections (#1688)

* Set keepalives on outbound connections

* mut

* Update tonic/prost crates to 0.14.x (#1687)

The only relevant breaking change is that `prost::Message` is no longer
a supertrait of `fmt::Debug`, so that bound is added in the few places
that need it.

* Automator: update common-files@master in istio/ztunnel@master (#1700)

* Fix clippy lint errors (#1701)

* Fix clippy lint errors

Signed-off-by: Keith Mattix II <keithmattix@microsoft.com>

* Ordering

Signed-off-by: Keith Mattix II <keithmattix@microsoft.com>

---------

Signed-off-by: Keith Mattix II <keithmattix@microsoft.com>

* Automator: update common-files@master in istio/ztunnel@master (#1702)

* Automator: update common-files@master in istio/ztunnel@master (#1705)

* provide test to verify socket options set (#1690)

* metrics: fix dns histogram (#1706)

* canonical wds service (#1704)

* initial impl

Signed-off-by: Ian Rudie <ian.rudie@solo.io>

* improvements to the impl

Signed-off-by: Ian Rudie <ian.rudie@solo.io>

* implement preferred service namespace handling

Signed-off-by: Ian Rudie <ian.rudie@solo.io>

* lints

Signed-off-by: Ian Rudie <ian.rudie@solo.io>

* unit for canonical service

Signed-off-by: Ian Rudie <ian.rudie@solo.io>

* cleanup old commented reference code

Signed-off-by: Ian Rudie <ian.rudie@solo.io>

* comment about canonical's purpose

Signed-off-by: Ian Rudie <ian.rudie@solo.io>

* add issue link for deprecation of preferred_service_namespace

Signed-off-by: Ian Rudie <ian.rudie@solo.io>

* unit to assert that namespace-local Service definitions are preferred over canonical

Signed-off-by: Ian Rudie <ian.rudie@solo.io>

---------

Signed-off-by: Ian Rudie <ian.rudie@solo.io>

* dry run auth (#1659)

* dry run auth

* lint

* Expose client config builder for workload TLS (#1710)

Add WorkloadCertificate::client_config to build a rustls ClientConfig
using the existing verifier and root store, so callers can customize
SNI/ALPN without duplicating TLS config construction. Existing outbound
behavior remains unchanged.

* feat: implement crl support in ztunnel (#1660)

* feat: implements ca-crl support in zTunnel

Signed-off-by: nilekh <1626598+nilekhc@users.noreply.github.com>

* chore: drains connection only for revoked cert

Signed-off-by: nilekh <1626598+nilekhc@users.noreply.github.com>

* feat: revokes affected inbound connections only

Signed-off-by: nilekh <1626598+nilekhc@users.noreply.github.com>

* chore: refactors crl watcher

Signed-off-by: nilekh <1626598+nilekhc@users.noreply.github.com>

* chore: rejects new connections only

Signed-off-by: nilekh <1626598+nilekhc@users.noreply.github.com>

* chore: clr validation only at the HBONE layer

Signed-off-by: nilekh <1626598+nilekhc@users.noreply.github.com>

* chore: use rustls-webpki for CRL validation

Signed-off-by: nilekh <1626598+nilekhc@users.noreply.github.com>

* chore: validates CRL using webpki instead of custom implementation.

Signed-off-by: nilekh <1626598+nilekhc@users.noreply.github.com>

* chore: addresses review comments

Signed-off-by: nilekh <1626598+nilekhc@users.noreply.github.com>

* chore: implements CRL validation entirely with webpki's verify_for_usage method.

Signed-off-by: nilekh <1626598+nilekhc@users.noreply.github.com>

* chore: reverts verify_for_usage

Signed-off-by: nilekh <1626598+nilekhc@users.noreply.github.com>

* chore: removes comment

Signed-off-by: nilekh <1626598+nilekhc@users.noreply.github.com>

* chore: address review comments

Signed-off-by: nilekh <1626598+nilekhc@users.noreply.github.com>

* chore: fixes lock

Signed-off-by: nilekh <1626598+nilekhc@users.noreply.github.com>

* chore: uses Option rather than extra var

Signed-off-by: nilekh <1626598+nilekhc@users.noreply.github.com>

* chore: addresses review comments

Signed-off-by: nilekh <1626598+nilekhc@users.noreply.github.com>

* chore: fixes merge conflict

Signed-off-by: nilekh <1626598+nilekhc@users.noreply.github.com>

---------

Signed-off-by: nilekh <1626598+nilekhc@users.noreply.github.com>

* add support for TLSv1.2 (#1711)

by setting `TLS12_ENABLED` to `true`, ztunnel will negotiate TLSv1.2 or
1.3.

Fixes #1296 until we have FIPS-140-3 support in istiod.

* Automator: update common-files@master in istio/ztunnel@master (#1716)

* enhance ztunnel metrics (#1695)

* enhance ztunnel metrics

Signed-off-by: Lucas Copi <lucas.copi@solo.io>

* make gen

Signed-off-by: Lucas Copi <lucas.copi@solo.io>

* change socket labels, review fixes

Signed-off-by: Lucas Copi <lucas.copi@solo.io>

* use gauge metric for socket tracking

Signed-off-by: Lucas Copi <lucas.copi@solo.io>

* expand connection falure checking

Signed-off-by: Lucas Copi <lucas.copi@solo.io>

* fix cargo issues

Signed-off-by: Lucas Copi <lucas.copi@solo.io>

* make gen

Signed-off-by: Lucas Copi <lucas.copi@solo.io>

* remove unneeded direction label

Signed-off-by: Lucas Copi <lucas.copi@solo.io>

* remove unused enum

Signed-off-by: Lucas Copi <lucas.copi@solo.io>

* downcast error for metrics

Signed-off-by: Lucas Copi <lucas.copi@solo.io>

---------

Signed-off-by: Lucas Copi <lucas.copi@solo.io>

* Automator: update common-files@master in istio/ztunnel@master (#1721)

* Change log level from info to debug for "response received" (#1723)

* Embed test data files to enable running tests outside source tree (#1720)

* Embed test data files to enable running tests outside source tree

Several unit tests depend on external files via relative paths, causing
failures when the test binary is executed from a different directory
than the source tree (e.g. in CI environments that copy binaries to
isolated locations).

This change:
- Embed test data files at compile time using `include_str!`
- Adds `temp_file_with_content()` helper function for tests requiring
  file paths
- Uses `AuthSource::StaticToken` for CA client tests
- Moves tempfile from dev-dependencies to optional dependency activated
  by the testing feature

Affected tests:
- config::tests::config_from_proxyconfig
- identity::caclient::tests::{empty_chain, fetch_certificate,
  wrong_identity}
- state::workload::tests::local_client

Signed-off-by: Jose Luis Ojosnegros Manchón <jojosneg@redhat.com>

* small change `make gen`

Signed-off-by: Jose Luis Ojosnegros Manchón <jojosneg@redhat.com>

---------

Signed-off-by: Jose Luis Ojosnegros Manchón <jojosneg@redhat.com>

* Automator: update common-files@master in istio/ztunnel@master (#1726)

* Add x forwarded network header (#1728)

* Initial plan

* Add x-origin-source header to inner CONNECT requests in double HBONE

Co-authored-by: keithmattix <1531662+keithmattix@users.noreply.github.com>

* Add comment explaining single HBONE codepath and tests for x-origin-source header

Co-authored-by: keithmattix <1531662+keithmattix@users.noreply.github.com>

* Rename header from x-origin-source to x-istio-origin-source

Co-authored-by: keithmattix <1531662+keithmattix@users.noreply.github.com>

* Rename header to x-istio-origin-network and refocus test on double HBONE

Co-authored-by: keithmattix <1531662+keithmattix@users.noreply.github.com>

* Add test back

Signed-off-by: Keith Mattix II <keithmattix@microsoft.com>

* Use inbound x-istio-origin-network to know whether traffic originates from the gateway

Signed-off-by: Keith Mattix II <keithmattix@microsoft.com>

* fmt

Signed-off-by: Keith Mattix II <keithmattix@microsoft.com>

* Actually check value of origin header

Signed-off-by: Keith Mattix II <keithmattix@microsoft.com>

* Cargo fmt

Signed-off-by: Keith Mattix II <keithmattix@microsoft.com>

* Remove incorrect TODO

Signed-off-by: Keith Mattix II <keithmattix@microsoft.com>

* Update src/proxy.rs

Co-authored-by: Ian Rudie <ilrudie@gmail.com>

* Complete the rename

Signed-off-by: Keith Mattix II <keithmattix@microsoft.com>

* Find/replace was too ambitious

Signed-off-by: Keith Mattix II <keithmattix@microsoft.com>

* Fix inverted logic

Signed-off-by: Keith Mattix II <keithmattix@microsoft.com>

* fmt

Signed-off-by: Keith Mattix II <keithmattix@microsoft.com>

---------

Signed-off-by: Keith Mattix II <keithmattix@microsoft.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: keithmattix <1531662+keithmattix@users.noreply.github.com>
Co-authored-by: Ian Rudie <ilrudie@gmail.com>

* Automator: update common-files@master in istio/ztunnel@master (#1737)

* Ambient Multicluster Telemetry (#1734)

* Add addtl codeowners for experimental (#1732)

Signed-off-by: Keith Mattix II <keithmattix@microsoft.com>

* More baggage support (#1731)

* baggage

* Use baggage for cross-cluster

* fix unit tests

* Fix namespaced tests. Remove extra code.

* cleanup a bit

* Initial plan

* Add x-origin-source header to inner CONNECT requests in double HBONE

Co-authored-by: keithmattix <1531662+keithmattix@users.noreply.github.com>

* Add comment explaining single HBONE codepath and tests for x-origin-source header

Co-authored-by: keithmattix <1531662+keithmattix@users.noreply.github.com>

* Rename header from x-origin-source to x-istio-origin-source

Co-authored-by: keithmattix <1531662+keithmattix@users.noreply.github.com>

* Rename header to x-istio-origin-network and refocus test on double HBONE

Co-authored-by: keithmattix <1531662+keithmattix@users.noreply.github.com>

* Add test back

Signed-off-by: Keith Mattix II <keithmattix@microsoft.com>

* Use inbound x-istio-origin-network to know whether traffic originates from the gateway

Signed-off-by: Keith Mattix II <keithmattix@microsoft.com>

* fmt

Signed-off-by: Keith Mattix II <keithmattix@microsoft.com>

* lint

* Codeowners

* Fix rebase

---------

Signed-off-by: Keith Mattix II <keithmattix@microsoft.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: keithmattix <1531662+keithmattix@users.noreply.github.com>
Co-authored-by: Keith Mattix II <keithmattix@microsoft.com>

* Cargo fmt

Signed-off-by: Keith Mattix II <keithmattix@microsoft.com>

* Remove experimental codeowners

Signed-off-by: Keith Mattix II <keithmattix@microsoft.com>

* Address review comments (#1738)

* Address PR comments

Signed-off-by: Keith Mattix II <keithmattix@microsoft.com>

* Cargo fmt

Signed-off-by: Keith Mattix II <keithmattix@microsoft.com>

---------

Signed-off-by: Keith Mattix II <keithmattix@microsoft.com>
Co-authored-by: Steven Jin <sjinxuan@microsoft.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: keithmattix <1531662+keithmattix@users.noreply.github.com>
Co-authored-by: Krinkin, Mike <mkrinkin@microsoft.com>

* PQC: openssl crypto provider support (#1743)

* PQC: openssl crypto provider support

ztunnel must be both compiled and ran with openssl >= 3.5.0.

Signed-off-by: Zuzana Miklankova <zmiklank@redhat.com>

* allow unusual byte groupings for openssl version detection

Signed-off-by: Zuzana Miklankova <zmiklank@redhat.com>

* running 'make gen'

Signed-off-by: Zuzana Miklankova <zmiklank@redhat.com>

---------

Signed-off-by: Zuzana Miklankova <zmiklank@redhat.com>

* Automator: update common-files@master in istio/ztunnel@master (#1750)

* Automator: update common-files@master in istio/ztunnel@master (#1758)

* prioritize canonical services on inbound (#1746)

* prioritize canonical services on inbound

* delete MatchReason

* fix itertools thing

* continue not return

* drop continue for into_iter

* as deref

* gencheck

* dry-run no allow policies match (#1745)

* dry-run no allow policies match

Signed-off-by: Ian Rudie <ian.rudie@solo.io>

* authpol logging macro + env var to allow info log level if desired

Signed-off-by: Ian Rudie <ian.rudie@solo.io>

* move macro to better location

Signed-off-by: Ian Rudie <ian.rudie@solo.io>

* clean up

Signed-off-by: Ian Rudie <ian.rudie@solo.io>

* update tests

Signed-off-by: Ian Rudie <ian.rudie@solo.io>

* fmt

Signed-off-by: Ian Rudie <ian.rudie@solo.io>

---------

Signed-off-by: Ian Rudie <ian.rudie@solo.io>

* Automator: update common-files@master in istio/ztunnel@master (#1763)

* handle peer_addr() failure gracefully (#1764)

* proxy: handle peer_addr() failure gracefully instead of panicking

Replace .expect("must receive peer addr") with proper error handling in
outbound and socks5 proxy paths. A client can send RST immediately after
the TCP handshake completes, causing getpeername(2) to return ENOTCONN on
the already-queued socket. The previous .expect() converted this transient
OS-level error into a panic that killed the Tokio task.

- outbound: match on peer_addr(), log debug and return early on error
- socks5/handle_socks_connection: same pattern
- socks5/negotiate_socks_connection: use ? operator (From<io::Error> for
  SocksError already exists)

Signed-off-by: Jose Luis Ojosnegros Manchón <jojosneg@redhat.com>

* i socket: avoid panic in orig_dst_addr diagnostic log

peer_addr() and local_addr() inside the warn!() in orig_dst_addr() were
called with .unwrap(), which could panic if the socket became unavailable
before the log statement was reached. Replace with .map(...).unwrap_or_else()
so the warning logs "N.A." instead of crashing when the address cannot be
retrieved.

Signed-off-by: Jose Luis Ojosnegros Manchón <jojosneg@redhat.com>

---------

Signed-off-by: Jose Luis Ojosnegros Manchón <jojosneg@redhat.com>

* Automator: update common-files@master in istio/ztunnel@master (#1765)

* Automator: update common-files@master in istio/ztunnel@master (#1770)

* Bump the cargo group across 2 directories with 4 updates (#1771)

Bumps the cargo group with 2 updates in the / directory: [bytes](https://github.com/tokio-rs/bytes) and [time](https://github.com/time-rs/time).
Bumps the cargo group with 4 updates in the /fuzz directory: [bytes](https://github.com/tokio-rs/bytes), [tracing-subscriber](https://github.com/tokio-rs/tracing), [time](https://github.com/time-rs/time) and [crossbeam-channel](https://github.com/crossbeam-rs/crossbeam).


Updates `bytes` from 1.11.0 to 1.11.1
- [Release notes](https://github.com/tokio-rs/bytes/releases)
- [Changelog](https://github.com/tokio-rs/bytes/blob/master/CHANGELOG.md)
- [Commits](tokio-rs/bytes@v1.11.0...v1.11.1)

Updates `time` from 0.3.44 to 0.3.47
- [Release notes](https://github.com/time-rs/time/releases)
- [Changelog](https://github.com/time-rs/time/blob/main/CHANGELOG.md)
- [Commits](time-rs/time@v0.3.44...v0.3.47)

Updates `bytes` from 1.10.0 to 1.11.1
- [Release notes](https://github.com/tokio-rs/bytes/releases)
- [Changelog](https://github.com/tokio-rs/bytes/blob/master/CHANGELOG.md)
- [Commits](tokio-rs/bytes@v1.11.0...v1.11.1)

Updates `tracing-subscriber` from 0.3.19 to 0.3.22
- [Release notes](https://github.com/tokio-rs/tracing/releases)
- [Commits](tokio-rs/tracing@tracing-subscriber-0.3.19...tracing-subscriber-0.3.22)

Updates `time` from 0.3.37 to 0.3.47
- [Release notes](https://github.com/time-rs/time/releases)
- [Changelog](https://github.com/time-rs/time/blob/main/CHANGELOG.md)
- [Commits](time-rs/time@v0.3.44...v0.3.47)

Updates `crossbeam-channel` from 0.5.14 to 0.5.15
- [Release notes](https://github.com/crossbeam-rs/crossbeam/releases)
- [Changelog](https://github.com/crossbeam-rs/crossbeam/blob/master/CHANGELOG.md)
- [Commits](crossbeam-rs/crossbeam@crossbeam-channel-0.5.14...crossbeam-channel-0.5.15)

---
updated-dependencies:
- dependency-name: bytes
  dependency-version: 1.11.1
  dependency-type: direct:production
  dependency-group: cargo
- dependency-name: time
  dependency-version: 0.3.47
  dependency-type: direct:production
  dependency-group: cargo
- dependency-name: bytes
  dependency-version: 1.11.1
  dependency-type: indirect
  dependency-group: cargo
- dependency-name: tracing-subscriber
  dependency-version: 0.3.22
  dependency-type: indirect
  dependency-group: cargo
- dependency-name: time
  dependency-version: 0.3.47
  dependency-type: indirect
  dependency-group: cargo
- dependency-name: crossbeam-channel
  dependency-version: 0.5.15
  dependency-type: indirect
  dependency-group: cargo
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Automator: update common-files@master in istio/ztunnel@master (#1777)

* fix TLSv1.2 support by adding missing ciphersuites (#1779)

It turns out that Istio's SPIFFE certs use ECDSA certificates, which
only became a problem once I started testing against waypoint proxies.
This adds the missing CipherSuites (which are still FIPS-compliant of
course) to unblock Waypoint->ZTunnel communication.

* expand dependabot (#1778)

Signed-off-by: Daniel Hawton <daniel.hawton@solo.io>

* Dont log error on broken pipe (#1784)

* Automator: update common-files@master in istio/ztunnel@master (#1788)

* Automator: update common-files@master in istio/ztunnel@master (#1792)

* Automator: update common-files@master in istio/ztunnel@master (#1799)

* Automator: update common-files@master in istio/ztunnel@master (#1800)

* Automator: update common-files@master in istio/ztunnel@master (#1802)

* Automator: update common-files@master in istio/ztunnel@master (#1810)

* identity: reload CA root cert channel on file change (#1775)

* RootCertManager: Add new  CrlCertManager-like struct

Signed-off-by: Jose Luis Ojosnegros Manchón <jojosneg@redhat.com>

* caclient: rebuild channel when root cert changes

Use the new RootCertManager to rebuild the TLS gRPC channel when a root
cert change is noted.

Need to add some fields to store the information to rebuild the channel.

Signed-off-by: Jose Luis Ojosnegros Manchón <jojosneg@redhat.com>

* small adaptations

Signed-off-by: Jose Luis Ojosnegros Manchón <jojosneg@redhat.com>

* Some unit tests

Signed-off-by: Jose Luis Ojosnegros Manchón <jojosneg@redhat.com>

* solve some compilation problems

Signed-off-by: Jose Luis Ojosnegros Manchón <jojosneg@redhat.com>

* delete is_dirty as it is not used

Signed-off-by: Jose Luis Ojosnegros Manchón <jojosneg@redhat.com>

* some clippy adjustments

Signed-off-by: Jose Luis Ojosnegros Manchón <jojosneg@redhat.com>

* adding some comments

Signed-off-by: Jose Luis Ojosnegros Manchón <jojosneg@redhat.com>

* addressing comments

Signed-off-by: Jose Luis Ojosnegros Manchón <jojosneg@redhat.com>

* log write lock wait time after TLS channel rebuild

Add write_lock_wait_ms to the debug log emitted after a successful
root cert hot-reload, so contention on the RwLock is observable in
logs without requiring additional instrumentation.

Signed-off-by: Jose Luis Ojosnegros Manchón <jojosneg@redhat.com>

---------

Signed-off-by: Jose Luis Ojosnegros Manchón <jojosneg@redhat.com>

* update after merge master branch

Signed-off-by: zirain <zirain2009@gmail.com>

---------

Signed-off-by: ilrudie <ian.rudie@solo.io>
Signed-off-by: Ian Rudie <ian.rudie@solo.io>
Signed-off-by: Jacek Ewertowski <jacek.ewertowski1@gmail.com>
Signed-off-by: Mikhail Krinkin <mkrinkin@microsoft.com>
Signed-off-by: Kyle Steere <kyle.steere@chainguard.dev>
Signed-off-by: Keith Mattix II <keithmattix@microsoft.com>
Signed-off-by: nilekh <1626598+nilekhc@users.noreply.github.com>
Signed-off-by: Lucas Copi <lucas.copi@solo.io>
Signed-off-by: Jose Luis Ojosnegros Manchón <jojosneg@redhat.com>
Signed-off-by: Zuzana Miklankova <zmiklank@redhat.com>
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Daniel Hawton <daniel.hawton@solo.io>
Signed-off-by: zirain <zirain2009@gmail.com>
Co-authored-by: Ian Rudie <ian.rudie@solo.io>
Co-authored-by: Istio Automation <istio-testing-bot@google.com>
Co-authored-by: John Howard <john.howard@solo.io>
Co-authored-by: Steven Landow <steven@landow.dev>
Co-authored-by: Jacek Ewertowski <jacek.ewertowski1@gmail.com>
Co-authored-by: Krinkin, Mike <krinkin.m.u@gmail.com>
Co-authored-by: Ian Rudie <ilrudie@gmail.com>
Co-authored-by: Gustavo Meira <grnmeira@users.noreply.github.com>
Co-authored-by: Daniel Hawton <daniel@hawton.org>
Co-authored-by: Steven Jin <sjinxuan@microsoft.com>
Co-authored-by: Kyle Steere <kbsteere@users.noreply.github.com>
Co-authored-by: Arka Bhattacharya <21124287+find-arka@users.noreply.github.com>
Co-authored-by: deveshdama <87668846+deveshdama@users.noreply.github.com>
Co-authored-by: Mantas Matelis <me@mantasmatelis.com>
Co-authored-by: Tamir Duberstein <tamird@gmail.com>
Co-authored-by: Keith Mattix II <keithmattix@microsoft.com>
Co-authored-by: Mike Zappa <michael.zappa@gmail.com>
Co-authored-by: Nilekh Chaudhari <1626598+nilekhc@users.noreply.github.com>
Co-authored-by: Daniel Grimm <dgrimm@redhat.com>
Co-authored-by: lcopi <lucas.copi@solo.io>
Co-authored-by: Ram Vennam <ram.vennam@solo.io>
Co-authored-by: Jose Luis Ojosnegros <jojosneg@redhat.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: keithmattix <1531662+keithmattix@users.noreply.github.com>
Co-authored-by: Krinkin, Mike <mkrinkin@microsoft.com>
Co-authored-by: Zuzana Miklánková <zmiklank@redhat.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Daniel Hawton <daniel.hawton@solo.io>
ilrudie pushed a commit to ilrudie/istio-ztunnel that referenced this pull request Apr 2, 2026
* RootCertManager: Add new  CrlCertManager-like struct

Signed-off-by: Jose Luis Ojosnegros Manchón <jojosneg@redhat.com>

* caclient: rebuild channel when root cert changes

Use the new RootCertManager to rebuild the TLS gRPC channel when a root
cert change is noted.

Need to add some fields to store the information to rebuild the channel.

Signed-off-by: Jose Luis Ojosnegros Manchón <jojosneg@redhat.com>

* small adaptations

Signed-off-by: Jose Luis Ojosnegros Manchón <jojosneg@redhat.com>

* Some unit tests

Signed-off-by: Jose Luis Ojosnegros Manchón <jojosneg@redhat.com>

* solve some compilation problems

Signed-off-by: Jose Luis Ojosnegros Manchón <jojosneg@redhat.com>

* delete is_dirty as it is not used

Signed-off-by: Jose Luis Ojosnegros Manchón <jojosneg@redhat.com>

* some clippy adjustments

Signed-off-by: Jose Luis Ojosnegros Manchón <jojosneg@redhat.com>

* adding some comments

Signed-off-by: Jose Luis Ojosnegros Manchón <jojosneg@redhat.com>

* addressing comments

Signed-off-by: Jose Luis Ojosnegros Manchón <jojosneg@redhat.com>

* log write lock wait time after TLS channel rebuild

Add write_lock_wait_ms to the debug log emitted after a successful
root cert hot-reload, so contention on the RwLock is observable in
logs without requiring additional instrumentation.

Signed-off-by: Jose Luis Ojosnegros Manchón <jojosneg@redhat.com>

---------

Signed-off-by: Jose Luis Ojosnegros Manchón <jojosneg@redhat.com>
istio-testing pushed a commit that referenced this pull request Apr 2, 2026
* identity: reload CA root cert channel on file change (#1775)

* RootCertManager: Add new  CrlCertManager-like struct

Signed-off-by: Jose Luis Ojosnegros Manchón <jojosneg@redhat.com>

* caclient: rebuild channel when root cert changes

Use the new RootCertManager to rebuild the TLS gRPC channel when a root
cert change is noted.

Need to add some fields to store the information to rebuild the channel.

Signed-off-by: Jose Luis Ojosnegros Manchón <jojosneg@redhat.com>

* small adaptations

Signed-off-by: Jose Luis Ojosnegros Manchón <jojosneg@redhat.com>

* Some unit tests

Signed-off-by: Jose Luis Ojosnegros Manchón <jojosneg@redhat.com>

* solve some compilation problems

Signed-off-by: Jose Luis Ojosnegros Manchón <jojosneg@redhat.com>

* delete is_dirty as it is not used

Signed-off-by: Jose Luis Ojosnegros Manchón <jojosneg@redhat.com>

* some clippy adjustments

Signed-off-by: Jose Luis Ojosnegros Manchón <jojosneg@redhat.com>

* adding some comments

Signed-off-by: Jose Luis Ojosnegros Manchón <jojosneg@redhat.com>

* addressing comments

Signed-off-by: Jose Luis Ojosnegros Manchón <jojosneg@redhat.com>

* log write lock wait time after TLS channel rebuild

Add write_lock_wait_ms to the debug log emitted after a successful
root cert hot-reload, so contention on the RwLock is observable in
logs without requiring additional instrumentation.

Signed-off-by: Jose Luis Ojosnegros Manchón <jojosneg@redhat.com>

---------

Signed-off-by: Jose Luis Ojosnegros Manchón <jojosneg@redhat.com>

* minimal fixes from subsequent dependabot work

Signed-off-by: Ian Rudie <ian.rudie@solo.io>

* add CA_CERT_WATCHER flag to disable root cert hot-reload

Introduces a boolean env var CA_CERT_WATCHER (default: true) that allows
operators to disable the CA root cert file watcher at runtime.
When set to false, no watcher thread is started and the gRPC channel is never
rebuilt on cert rotation — the startup-time cert is retained permanently.

The flag is only effective when ca_root_cert is a file path; Static and
Default certs never start a watcher regardless.

Signed-off-by: Jose Luis Ojosnegros Manchón <jojosneg@redhat.com>

* suppress clippy error

Signed-off-by: Ian Rudie <ian.rudie@solo.io>

---------

Signed-off-by: Jose Luis Ojosnegros Manchón <jojosneg@redhat.com>
Signed-off-by: Ian Rudie <ian.rudie@solo.io>
Co-authored-by: Jose Luis Ojosnegros <jojosneg@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ok-to-test Set this label allow normal testing to take place for a PR not submitted by an Istio org member. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants