[IMPROVED] Speed up a mirror or source consumer's resync across leafnode connections. by derekcollison · Pull Request #6981 · nats-io/nats-server

derekcollison · 2025-06-15T22:12:15Z

When the consumer for a source or a mirror fails to be created, we back off creation attempts.
If the failures were due to a downed leafnode, meaning the mirror or source is across a leafnode connection, the resync could take more time than desired after the leafnode reconnects.

This improves resync time by hooking into the leafnode's reconnect logic (either via connect or async info).
Once we detect the reconnect, we search for streams that are leaders and are mirrors or sources, and do not have an active sync consumer.

If we find such a stream, we reset its consumer backoff and retry after just a small jitter delay.

Signed-off-by: Derek Collison <derek@nats.io>

neilalexander

LGTM

server/jetstream_leafnode_test.go

Includes the following:

- #6922
- #6931
- #6933
- #6934
- #6939
- #6938
- #6940
- #6941
- #6942
- #6943
- #6945
- #6944
- #6947
- #6948
- #6949
- #6956
- #6960
- #6961
- #6951
- #6965
- #6968
- #6981
- #6983
- #6984

Signed-off-by: Neil Twigg <neil@nats.io>

…are re-established. We previously improved this with PR #6981, but that was too rigid. It expected the LN to have JS enabled and to have the same domain. The test also simulated a long time for the link to be down and manually changed the state to not in progress (si.sip). For simpler setups this worked, but if LNs were daisy-chained and the GW leafnode either did not have JS enabled or had it enabled with a different domain, the speedup would fail. Now we are much broader about the conditions to retry. I did look into checking for $JS.<DOMAIN>.API.INFO, but this was brittle and depended on timing and on doing retries or backoffs. Will revisit in the future (we do have the ability to register a callback for interest in a subject, which could be utilized). For now this works well and is simple, and the cost of being "wrong" in very complicated setups is minimal. Signed-off-by: Derek Collison <derek@nats.io>

derekcollison added 2 commits June 15, 2025 17:47

Allow our testing proxy to be restarted to simulate network down events

db53602

Signed-off-by: Derek Collison <derek@nats.io>

derekcollison requested a review from a team as a code owner June 15, 2025 22:12

neilalexander approved these changes Jun 15, 2025

derekcollison merged commit 0e2e28e into main Jun 15, 2025
90 of 92 checks passed

derekcollison deleted the resync-faster branch June 15, 2025 23:16

neilalexander mentioned this pull request Jun 17, 2025

Cherry-picks for 2.11.5-RC.1 #6985

Merged

derekcollison mentioned this pull request Sep 6, 2025

[IMPROVED] Resync for Mirrors and Sources on LN reconnect in complex topologies #7265

Merged

derekcollison merged 2 commits into main from resync-faster
