
[FIXED] Gateway RS+/- blocks on account fetch #7449

Merged
neilalexander merged 1 commit into main from maurice/gw-acc-fetch on Oct 28, 2025


@MauriceVanVeen MauriceVanVeen commented Oct 20, 2025

The methods processGatewayRSub and processGatewayRUnsub can call into updateInterestForAccountOnGateway, which in turn calls s.LookupAccount. If the account is not known and an account resolver is configured, the account is fetched. Because this happens inside the gateway's read loop, the loop is blocked for the duration of the fetch: it cannot receive the response to the account fetch (when using a NATS resolver), and it cannot process anything else, such as PING/PONG, which can then lead to a stale connection.

The fix proposed by this PR is to not fetch the account inline in the gateway's read loop.

Signed-off-by: Maurice van Veen <github@mauricevanveen.com>

@MauriceVanVeen MauriceVanVeen changed the title [FIXED] Gateway R+/- blocks on account fetch [FIXED] Gateway RS+/- blocks on account fetch Oct 20, 2025
Signed-off-by: Maurice van Veen <github@mauricevanveen.com>
@MauriceVanVeen
Member Author

I originally included this comment in the PR description:

However, TestJetStreamSuperClusterMixedModeSwitchToInterestOnlyOperatorConfig fails with: Server C3-S1 - outbound gateway connection "C2": no account "A.." found in map. To resolve this, we could change the assert to checkGWInterestOnlyModeOrNotPresent with notPresentOk=true, if this is expected. I'm not sure whether that is a proper approach, though. Either way, I don't think the gateway's read loop should be blocked in any case.

Have updated the test to include notPresentOk=true.

Not fetching the account in the gateway's readLoop upon receiving RS+/- seems to be totally fine, since it's only relevant if a leaf node is connected to the server. If none is connected, we can safely skip the fetch.

I've also tested this condition: a server receives an RS+ over a gateway for an account it doesn't know, and does not fetch it. When a leaf node connection is later added to this server, the server does the account fetching itself, which runs in the client's readLoop as expected and does not block the gateway's readLoop. Messages are then still routed properly from the leaf node, through the gateway server that didn't fetch the account before, to a gateway server that has a subscription in that account.
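The sequence described above can be sketched as a lookup that takes an explicit "may fetch" flag. This is only an illustration under assumptions: registry, lookup, onGatewayRSub and onLeafConnect are hypothetical names, not the nats-server API, and the resolver is a plain function rather than a real account resolver.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errMissingAccount = errors.New("account missing")

// registry is a hypothetical stand-in for the server's account map plus an
// optional resolver that can fetch unknown accounts.
type registry struct {
	mu       sync.Mutex
	accounts map[string]struct{}
	fetches  int                     // number of resolver calls made
	resolver func(name string) error // nil means no resolver configured
}

// lookup returns nil if the account is known. An unknown account is only
// fetched when allowFetch is true; otherwise errMissingAccount is returned.
func (r *registry) lookup(name string, allowFetch bool) error {
	r.mu.Lock()
	_, ok := r.accounts[name]
	r.mu.Unlock()
	if ok {
		return nil
	}
	if !allowFetch || r.resolver == nil {
		return errMissingAccount
	}
	r.mu.Lock()
	r.fetches++
	r.mu.Unlock()
	// The potentially slow call is made without holding the lock.
	if err := r.resolver(name); err != nil {
		return err
	}
	r.mu.Lock()
	r.accounts[name] = struct{}{}
	r.mu.Unlock()
	return nil
}

// onGatewayRSub models the gateway read-loop path: it never fetches and
// reports whether interest could be recorded for the account.
func (r *registry) onGatewayRSub(account string) bool {
	return r.lookup(account, false) == nil
}

// onLeafConnect models a leaf node binding to an account from its own
// connection's read loop, where fetching is allowed.
func (r *registry) onLeafConnect(account string) error {
	return r.lookup(account, true)
}

func main() {
	r := &registry{
		accounts: map[string]struct{}{},
		resolver: func(string) error { return nil }, // pretend the fetch succeeds
	}
	fmt.Println("RS+ before leaf connect handled:", r.onGatewayRSub("A"), "fetches:", r.fetches)
	fmt.Println("leaf connect error:", r.onLeafConnect("A"), "fetches:", r.fetches)
	fmt.Println("RS+ after leaf connect handled:", r.onGatewayRSub("A"), "fetches:", r.fetches)
}
```

In this sketch the gateway path skips the unknown account without touching the resolver, and the only fetch is the one triggered by the leaf node connection.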

@MauriceVanVeen MauriceVanVeen marked this pull request as ready for review October 23, 2025 10:54
@MauriceVanVeen MauriceVanVeen requested a review from a team as a code owner October 23, 2025 10:54
@neilalexander neilalexander left a comment

LGTM

@neilalexander neilalexander merged commit 3375f58 into main Oct 28, 2025
89 of 92 checks passed
@neilalexander neilalexander deleted the maurice/gw-acc-fetch branch October 28, 2025 12:16
neilalexander added a commit that referenced this pull request Oct 30, 2025
Includes the following:

- #7435
- #7433
- #7436
- #7443
- #7440
- #7444
- #7452
- #7455
- #7458
- #7465
- #7466
- #7474
- #7469
- #7460
- #7449
- #7484
- #7479
- #7486
- #7495
- #7482
- #7496

Signed-off-by: Neil Twigg <neil@nats.io>