(2.14) Separate info requests into their own JS API queue#7898
(2.14) Separate info requests into their own JS API queue#7898neilalexander merged 1 commit intomainfrom
Conversation
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a758f7152a
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
this could affect the ordering I think, which is one of the readons why we rolled back the optimized consumer info path a while ago… |
|
Ordering is already not guaranteed, as we are pulling things off the IPQ in multiple goroutines anyway. |
a758f71 to
6758cd6
Compare
6758cd6 to
acd1069
Compare
Signed-off-by: Neil Twigg <neil@nats.io>
acd1069 to
67a0316
Compare
Tests demonstrate race conditions when consumer info is queried soon after consumer create in a clustered environment. These races are relevant to PR nats-io#7898 (separate info queue) analysis: - TestJetStreamClusterConsumerInfoDuringInflightCreate: Shows that when meta apply is paused on a follower, consumerAssignment() returns nil on that node while the consumer leader on another node responds. - TestJetStreamClusterConsumerInfoBeforeRaftGroupStarted: Attempts to catch the brief window where assignment exists but consumer Raft group hasn't started (node==nil path at jetstream_api.go:4834). - TestJetStreamClusterConsumerInfoScatterBeforeLeaderElected: Tests the window where consumer Raft group exists but no leader elected yet, allowing all members to respond (jetstream_api.go:4862). - TestJetStreamClusterConsumerInfoNotFoundDuringPipelinedCreate: Reliably reproduces 50/50 "consumer not found" errors when create and info are pipelined, because jsConsumerInfoRequest uses consumerAssignment() (only applied) rather than consumerAssignmentOrInflight(). - TestJetStreamClusterConsumerInfoPausedMetaApply: Pauses one follower's meta apply to show consumerAssignment() returns nil on that node while the consumer operates normally on the other two nodes. https://claude.ai/code/session_01Cv5a3WaLfJSU9vYsLHjUtD
We wouldn't run into similar issues, as the mentioned PR tried to short-circuit a 404 and then not sending the request on to other nodes. The issue was that that could prevent the consumer leader from getting the request and answering the consumer info. This PR doesn't functionally change behavior in that regard, even if handled with separate queues. An application doing a consumer create, getting a success response and then doing an info will succeed as expected. |
|
got it, thanks for the info 👍 |
| default: | ||
| select { | ||
| case <-infoqueue.ch: | ||
| processFromQueue(infoqueue) |
There was a problem hiding this comment.
Looks like info queue processing could starve under continuous load? Is this a concern here?
There was a problem hiding this comment.
It could, but would require all CPU cores to be locked on queue rather than infoqueue simultaneously.
Deprioritise calls to account info, stream info/list, consumer info/list etc, so that overuse of these API endpoints is less detrimental to a loaded system. The info queue can be drained separately to the request queue when overloaded.
Also adds a new
info_queue_limitoption to configure this separately, defaults to the same value asrequest_queue_limitif not specified.Signed-off-by: Neil Twigg neil@nats.io