Skip to content

Parallelisation when enabling JetStream#7482

Merged
neilalexander merged 2 commits intomainfrom
neil/enablejetstream
Oct 30, 2025
Merged

Parallelisation when enabling JetStream#7482
neilalexander merged 2 commits intomainfrom
neil/enablejetstream

Conversation

@neilalexander
Copy link
Copy Markdown
Member

@neilalexander neilalexander commented Oct 28, 2025

This introduces parallelisation into the JetStream startup, which should hopefully reduce startup time on systems with many streams, large streams or with higher numbers of CPU cores available.

A second subtle change is introduced here, which is that consumers are started immediately after their parent stream, rather than collected and started afterwards. Same for updating the interest state. This was originally broken out into different steps to ensure that sourcing consumers would work right in very old versions, but this is no longer relevant as ephemeral sourcing consumers are now created/recreated automatically as needed in the background.

Signed-off-by: Neil Twigg neil@nats.io

@neilalexander neilalexander marked this pull request as ready for review October 28, 2025 16:42
@neilalexander neilalexander requested a review from a team as a code owner October 28, 2025 16:42
Copy link
Copy Markdown
Member

@MauriceVanVeen MauriceVanVeen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@alexbozhenko
Copy link
Copy Markdown
Member

alexbozhenko commented Oct 28, 2025

which should hopefully reduce startup time on systems with many streams

How was this tested? Do we have a confirmation that startup time actually improves with this?
Should we add a benchmark for the index rebuilds?

Signed-off-by: Neil Twigg <neil@nats.io>
@neilalexander
Copy link
Copy Markdown
Member Author

With GOMAXPROCS=1:

$ find ~/jetstream -type f -name "index.db" -exec truncate -s 0 {} + && GOMAXPROCS=1 go run . -c ~/server.conf
...
[87465] 2025/10/30 13:45:53.004933 [INF] -------------------------------------------
[87465] 2025/10/30 13:45:53.005451 [INF]   Starting restore for stream 'neil > stream_a'
[87465] 2025/10/30 13:45:53.005607 [WRN] Filestore [stream_a] Stream state too short (0 bytes)
[87465] 2025/10/30 13:45:53.005612 [WRN] Filestore [stream_a] Recovering stream state from index errored: corrupt state file
[87465] 2025/10/30 13:45:53.383557 [INF]   Restored 1,000,000 messages for stream 'neil > stream_a' in 378ms
[87465] 2025/10/30 13:45:53.383699 [INF]   Starting restore for stream 'neil > stream_b'
[87465] 2025/10/30 13:45:53.383930 [WRN] Filestore [stream_b] Stream state too short (0 bytes)
[87465] 2025/10/30 13:45:53.383935 [WRN] Filestore [stream_b] Recovering stream state from index errored: corrupt state file
[87465] 2025/10/30 13:45:53.757303 [INF]   Restored 1,000,000 messages for stream 'neil > stream_b' in 374ms
[87465] 2025/10/30 13:45:53.757439 [INF]   Starting restore for stream 'neil > stream_c'
[87465] 2025/10/30 13:45:53.757637 [WRN] Filestore [stream_c] Stream state too short (0 bytes)
[87465] 2025/10/30 13:45:53.757643 [WRN] Filestore [stream_c] Recovering stream state from index errored: corrupt state file
[87465] 2025/10/30 13:45:54.169766 [INF]   Restored 1,000,000 messages for stream 'neil > stream_c' in 412ms
[87465] 2025/10/30 13:45:54.169910 [INF]   Starting restore for stream 'neil > stream_d'
[87465] 2025/10/30 13:45:54.170106 [WRN] Filestore [stream_d] Stream state too short (0 bytes)
[87465] 2025/10/30 13:45:54.170110 [WRN] Filestore [stream_d] Recovering stream state from index errored: corrupt state file
[87465] 2025/10/30 13:45:54.506368 [INF]   Restored 1,000,000 messages for stream 'neil > stream_d' in 336ms
[87465] 2025/10/30 13:45:54.506515 [INF]   Starting restore for stream 'neil > stream_e'
[87465] 2025/10/30 13:45:54.506701 [WRN] Filestore [stream_e] Stream state too short (0 bytes)
[87465] 2025/10/30 13:45:54.506707 [WRN] Filestore [stream_e] Recovering stream state from index errored: corrupt state file
[87465] 2025/10/30 13:45:54.988585 [INF]   Restored 1,000,000 messages for stream 'neil > stream_e' in 482ms
[87465] 2025/10/30 13:45:54.988733 [INF]   Starting restore for stream 'neil > stream_f'
[87465] 2025/10/30 13:45:54.988934 [WRN] Filestore [stream_f] Stream state too short (0 bytes)
[87465] 2025/10/30 13:45:54.988938 [WRN] Filestore [stream_f] Recovering stream state from index errored: corrupt state file
[87465] 2025/10/30 13:45:55.327075 [INF]   Restored 1,000,000 messages for stream 'neil > stream_f' in 338ms
[87465] 2025/10/30 13:45:55.327209 [INF]   Starting restore for stream 'neil > stream_g'
[87465] 2025/10/30 13:45:55.327417 [WRN] Filestore [stream_g] Stream state too short (0 bytes)
[87465] 2025/10/30 13:45:55.327421 [WRN] Filestore [stream_g] Recovering stream state from index errored: corrupt state file
[87465] 2025/10/30 13:45:55.660941 [INF]   Restored 1,000,000 messages for stream 'neil > stream_g' in 334ms
[87465] 2025/10/30 13:45:55.661068 [INF]   Starting restore for stream 'neil > stream_h'
[87465] 2025/10/30 13:45:55.661253 [WRN] Filestore [stream_h] Stream state too short (0 bytes)
[87465] 2025/10/30 13:45:55.661256 [WRN] Filestore [stream_h] Recovering stream state from index errored: corrupt state file
[87465] 2025/10/30 13:45:56.012229 [INF]   Restored 1,000,000 messages for stream 'neil > stream_h' in 351ms
[87465] 2025/10/30 13:45:56.012378 [INF] Took 3.007594333s to start JetStream

With GOMAXPROCS=8:

$ find ~/jetstream -type f -name "index.db" -exec truncate -s 0 {} + && GOMAXPROCS=8 go run . -c ~/server.conf
...
[87480] 2025/10/30 13:46:23.210529 [INF]   API Level:       2
[87480] 2025/10/30 13:46:23.210530 [INF] -------------------------------------------
[87480] 2025/10/30 13:46:23.211061 [INF]   Starting restore for stream 'neil > stream_e'
[87480] 2025/10/30 13:46:23.211069 [INF]   Starting restore for stream 'neil > stream_h'
[87480] 2025/10/30 13:46:23.211080 [INF]   Starting restore for stream 'neil > stream_d'
[87480] 2025/10/30 13:46:23.211066 [INF]   Starting restore for stream 'neil > stream_f'
[87480] 2025/10/30 13:46:23.211116 [INF]   Starting restore for stream 'neil > stream_g'
[87480] 2025/10/30 13:46:23.211140 [INF]   Starting restore for stream 'neil > stream_b'
[87480] 2025/10/30 13:46:23.211205 [INF]   Starting restore for stream 'neil > stream_a'
[87480] 2025/10/30 13:46:23.211263 [INF]   Starting restore for stream 'neil > stream_c'
[87480] 2025/10/30 13:46:23.211393 [WRN] Filestore [stream_h] Stream state too short (0 bytes)
[87480] 2025/10/30 13:46:23.211401 [WRN] Filestore [stream_h] Recovering stream state from index errored: corrupt state file
[87480] 2025/10/30 13:46:23.211438 [WRN] Filestore [stream_d] Stream state too short (0 bytes)
[87480] 2025/10/30 13:46:23.211446 [WRN] Filestore [stream_d] Recovering stream state from index errored: corrupt state file
[87480] 2025/10/30 13:46:23.211586 [WRN] Filestore [stream_e] Stream state too short (0 bytes)
[87480] 2025/10/30 13:46:23.211597 [WRN] Filestore [stream_e] Recovering stream state from index errored: corrupt state file
[87480] 2025/10/30 13:46:23.211642 [WRN] Filestore [stream_g] Stream state too short (0 bytes)
[87480] 2025/10/30 13:46:23.211646 [WRN] Filestore [stream_g] Recovering stream state from index errored: corrupt state file
[87480] 2025/10/30 13:46:23.211699 [WRN] Filestore [stream_b] Stream state too short (0 bytes)
[87480] 2025/10/30 13:46:23.211710 [WRN] Filestore [stream_b] Recovering stream state from index errored: corrupt state file
[87480] 2025/10/30 13:46:23.211725 [WRN] Filestore [stream_c] Stream state too short (0 bytes)
[87480] 2025/10/30 13:46:23.211751 [WRN] Filestore [stream_a] Stream state too short (0 bytes)
[87480] 2025/10/30 13:46:23.211942 [WRN] Filestore [stream_a] Recovering stream state from index errored: corrupt state file
[87480] 2025/10/30 13:46:23.211947 [WRN] Filestore [stream_c] Recovering stream state from index errored: corrupt state file
[87480] 2025/10/30 13:46:23.211702 [WRN] Filestore [stream_f] Stream state too short (0 bytes)
[87480] 2025/10/30 13:46:23.211961 [WRN] Filestore [stream_f] Recovering stream state from index errored: corrupt state file
[87480] 2025/10/30 13:46:23.707344 [INF]   Restored 1,000,000 messages for stream 'neil > stream_e' in 496ms
[87480] 2025/10/30 13:46:23.736176 [INF]   Restored 1,000,000 messages for stream 'neil > stream_g' in 525ms
[87480] 2025/10/30 13:46:23.743631 [INF]   Restored 1,000,000 messages for stream 'neil > stream_d' in 533ms
[87480] 2025/10/30 13:46:23.749859 [INF]   Restored 1,000,000 messages for stream 'neil > stream_c' in 539ms
[87480] 2025/10/30 13:46:23.754698 [INF]   Restored 1,000,000 messages for stream 'neil > stream_b' in 544ms
[87480] 2025/10/30 13:46:23.768052 [INF]   Restored 1,000,000 messages for stream 'neil > stream_a' in 557ms
[87480] 2025/10/30 13:46:23.771218 [INF]   Restored 1,000,000 messages for stream 'neil > stream_h' in 560ms
[87480] 2025/10/30 13:46:23.771670 [INF]   Restored 1,000,000 messages for stream 'neil > stream_f' in 561ms
[87480] 2025/10/30 13:46:23.771732 [INF] Took 561.428667ms to start JetStream

Signed-off-by: Neil Twigg <neil@nats.io>
@neilalexander neilalexander merged commit 898f96c into main Oct 30, 2025
89 of 92 checks passed
@neilalexander neilalexander deleted the neil/enablejetstream branch October 30, 2025 15:42
neilalexander added a commit that referenced this pull request Oct 30, 2025
Includes the following:

- #7435
- #7433
- #7436
- #7443
- #7440
- #7444
- #7452
- #7455
- #7458
- #7465
- #7466
- #7474
- #7469
- #7460
- #7449
- #7484
- #7479
- #7486
- #7495
- #7482
- #7496

Signed-off-by: Neil Twigg <neil@nats.io>
neilalexander added a commit that referenced this pull request Nov 3, 2025
Due to the stream restore parallelization introduced in
#7482, we can panic on
`access time service not running`. Since `ats.Register()` can now be
called in parallel as well.

We can simply store an initial value, on startup it's no issue that this
timestamp is only slightly stale.

Signed-off-by: Maurice van Veen <github@mauricevanveen.com>
neilalexander added a commit that referenced this pull request Nov 5, 2025
Includes the following:

- #7416
- #7425
- #7486
- #7495
- #7482
- #7496
- #7499
- #7503
- #7508 (excluding weak
pointer/cache-related changes that apply only to 2.12.x)
- #7510
- #7509
- #7512
- #7516
- #7515

Signed-off-by: Neil Twigg <neil@nats.io>
neilalexander added a commit that referenced this pull request Nov 7, 2025
Continues from #7482 but this will give us a bit more headroom, as `dios`
are often less than the full core count and are often the limiting factor
anyway.

Signed-off-by: Neil Twigg <neil@nats.io>
neilalexander added a commit that referenced this pull request Nov 7, 2025
Continues from #7482 but this will give us a bit more headroom, as
`dios` are often less than the full core count and are often the
limiting factor anyway.

Signed-off-by: Neil Twigg <neil@nats.io>
neilalexander added a commit that referenced this pull request Nov 7, 2025
Continues from #7482 but this will give us a bit more headroom, as `dios`
are often less than the full core count and are often the limiting factor
anyway.

Signed-off-by: Neil Twigg <neil@nats.io>
neilalexander added a commit that referenced this pull request Nov 7, 2025
Continues from #7482 but this will give us a bit more headroom, as `dios`
are often less than the full core count and are often the limiting factor
anyway.

Signed-off-by: Neil Twigg <neil@nats.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants