Skip to content

(2.14) Separate info requests into their own JS API queue#7898

Merged
neilalexander merged 1 commit intomainfrom
neil/infoqueue
Mar 4, 2026
Merged

(2.14) Separate info requests into their own JS API queue#7898
neilalexander merged 1 commit intomainfrom
neil/infoqueue

Conversation

@neilalexander
Copy link
Copy Markdown
Member

Deprioritise calls to account info, stream info/list, consumer info/list etc, so that overuse of these API endpoints is less detrimental to a loaded system. The info queue can be drained separately to the request queue when overloaded.

Also adds a new info_queue_limit option to configure this separately, defaults to the same value as request_queue_limit if not specified.

Signed-off-by: Neil Twigg neil@nats.io

@neilalexander
Copy link
Copy Markdown
Member Author

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a758f7152a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@wallyqs
Copy link
Copy Markdown
Member

wallyqs commented Mar 3, 2026

this could affect the ordering I think, which is one of the readons why we rolled back the optimized consumer info path a while ago…

@neilalexander
Copy link
Copy Markdown
Member Author

Ordering is already not guaranteed, as we are pulling things off the IPQ in multiple goroutines anyway.

Signed-off-by: Neil Twigg <neil@nats.io>
Copy link
Copy Markdown
Member

@MauriceVanVeen MauriceVanVeen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@wallyqs
Copy link
Copy Markdown
Member

wallyqs commented Mar 4, 2026

We had reverted this #6176 due to unexpected results in the creation of consumer create then info (at #6250) are we running into similar issues here maybe?

wallyqs pushed a commit to wallyqs/nats-server that referenced this pull request Mar 4, 2026
Tests demonstrate race conditions when consumer info is queried soon after
consumer create in a clustered environment. These races are relevant to
PR nats-io#7898 (separate info queue) analysis:

- TestJetStreamClusterConsumerInfoDuringInflightCreate: Shows that when
  meta apply is paused on a follower, consumerAssignment() returns nil
  on that node while the consumer leader on another node responds.

- TestJetStreamClusterConsumerInfoBeforeRaftGroupStarted: Attempts to
  catch the brief window where assignment exists but consumer Raft group
  hasn't started (node==nil path at jetstream_api.go:4834).

- TestJetStreamClusterConsumerInfoScatterBeforeLeaderElected: Tests the
  window where consumer Raft group exists but no leader elected yet,
  allowing all members to respond (jetstream_api.go:4862).

- TestJetStreamClusterConsumerInfoNotFoundDuringPipelinedCreate: Reliably
  reproduces 50/50 "consumer not found" errors when create and info are
  pipelined, because jsConsumerInfoRequest uses consumerAssignment() (only
  applied) rather than consumerAssignmentOrInflight().

- TestJetStreamClusterConsumerInfoPausedMetaApply: Pauses one follower's
  meta apply to show consumerAssignment() returns nil on that node while
  the consumer operates normally on the other two nodes.

https://claude.ai/code/session_01Cv5a3WaLfJSU9vYsLHjUtD
@MauriceVanVeen
Copy link
Copy Markdown
Member

We had reverted this #6176 due to unexpected results in the creation of consumer create then info (at #6250) are we running into similar issues here maybe?

We wouldn't run into similar issues, as the mentioned PR tried to short-circuit a 404 and then not sending the request on to other nodes. The issue was that that could prevent the consumer leader from getting the request and answering the consumer info.

This PR doesn't functionally change behavior in that regard, even if handled with separate queues. An application doing a consumer create, getting a success response and then doing an info will succeed as expected.

@wallyqs
Copy link
Copy Markdown
Member

wallyqs commented Mar 4, 2026

got it, thanks for the info 👍

default:
select {
case <-infoqueue.ch:
processFromQueue(infoqueue)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like info queue processing could starve under continuous load? Is this a concern here?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could, but would require all CPU cores to be locked on queue rather than infoqueue simultaneously.

@neilalexander neilalexander marked this pull request as ready for review March 4, 2026 13:35
@neilalexander neilalexander requested a review from a team as a code owner March 4, 2026 13:35
@neilalexander neilalexander merged commit 9203670 into main Mar 4, 2026
48 checks passed
@neilalexander neilalexander deleted the neil/infoqueue branch March 4, 2026 16:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants