Skip to content

[FIXED] Meta snapshot performance & missing consumers#7350

Merged
neilalexander merged 2 commits intomainfrom
maurice/offline-perf
Sep 25, 2025
Merged

[FIXED] Meta snapshot performance & missing consumers#7350
neilalexander merged 2 commits intomainfrom
maurice/offline-perf

Conversation

@MauriceVanVeen
Copy link
Copy Markdown
Member

@MauriceVanVeen MauriceVanVeen commented Sep 24, 2025

Due to the custom marshalling offline asset awareness made the meta snapshot take a longer time to compute, even if there are no offline assets. This PR improves that by performing the meta snapshot as before if there are no offline assets, and only have those offline assets go through the slower path.

Also fixes a bug where consumers would be lost when a new meta snapshot was created for an unsupported stream.

Signed-off-by: Maurice van Veen github@mauricevanveen.com

@MauriceVanVeen MauriceVanVeen requested a review from a team as a code owner September 24, 2025 10:32
@wallyqs
Copy link
Copy Markdown
Member

wallyqs commented Sep 24, 2025

can you take a look at adapting TestNoRaceJetStreamClusterLargeMetaSnapshotTiming to have a baseline here? Test with and without offline assets for example

@MauriceVanVeen
Copy link
Copy Markdown
Member Author

can you take a look at adapting TestNoRaceJetStreamClusterLargeMetaSnapshotTiming to have a baseline here? Test with and without offline assets for example

The test is highly dependent on hardware as to how long it runs (it generally runs slower on my machine than in the other examples). So IMO it would be better to have separate benchmarking, instead of doing this in a unit test here.

@wallyqs
Copy link
Copy Markdown
Member

wallyqs commented Sep 24, 2025

@MauriceVanVeen you can have a different version of the test with a smaller number of streams and assets as a quick test, then measure that the time processing without offline assets is faster than with offline assets, besides the test for benchmarking where we also need to consider different configurations (number of streams, consumers...)


type writeableConsumerAssignment struct {
consumerAssignment
*consumerAssignment
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*consumerAssignment is pointerised here but backingStreamAssignment isn't on the other side. Is that meant to be the case?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed this also offline, but this was indeed intentional to not need to use *ca in &writeableConsumerAssignment{consumerAssignment: ca}.
Have tested if there's a difference between not having this take a pointer, but there seems no significant difference.
(Actually the current approach seems faster, but hard to tell because the test varies greatly in performance numbers already on re-run, so it might just be my laptop running less hot then)

@neilalexander
Copy link
Copy Markdown
Member

I'm not convinced that a benchmark that just compares the supported vs unsupported path is going to be meaningful, we know that the unsupported path is slower, a test isn't going to tell us anything new about that.

I think it probably makes far more sense that we add a benchmark to the Gauge set that creates a bunch of assets and then measures the amount of time it takes to call JetStreamSnapshotMeta() however many times, both with and without unsupported state, so that we can follow the trends across releases. That is normally where we look first for meaningful changes in performance.

@MauriceVanVeen MauriceVanVeen changed the title [IMPROVED] Meta snapshot performance without offline assets [FIXED] Meta snapshot performance & missing consumers Sep 24, 2025
@MauriceVanVeen MauriceVanVeen marked this pull request as draft September 24, 2025 16:31
@MauriceVanVeen
Copy link
Copy Markdown
Member Author

Temporarily on draft, to do some additional testing for an edge case.

@wallyqs
Copy link
Copy Markdown
Member

wallyqs commented Sep 25, 2025

The gauge bench sounds good too though I think a simple benchmark in the repo should be good enough at least for coverage of the changes in the PR, does not have to be too complicated.

Signed-off-by: Maurice van Veen <github@mauricevanveen.com>
Signed-off-by: Maurice van Veen <github@mauricevanveen.com>
@MauriceVanVeen MauriceVanVeen marked this pull request as ready for review September 25, 2025 13:59
Copy link
Copy Markdown
Member

@neilalexander neilalexander left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@neilalexander neilalexander merged commit 03a6e5b into main Sep 25, 2025
90 of 92 checks passed
@neilalexander neilalexander deleted the maurice/offline-perf branch September 25, 2025 14:22
neilalexander added a commit that referenced this pull request Sep 26, 2025
Folow-up of #7350

Signed-off-by: Maurice van Veen <github@mauricevanveen.com>
neilalexander added a commit that referenced this pull request Sep 29, 2025
Includes the following:

- #7290
- #7295
- #7291
- #7287
- #7299
- #7300
- #7297
- #7303
- #7304
- #7305
- #7309
- #7307
- #7320
- #7337
- #7344
- #7345
- #7348
- #7349
- #7350
- #7357
- #7356
- #7358
- #7367
- #7293

Signed-off-by: Neil Twigg <neil@nats.io>
neilalexander added a commit that referenced this pull request Sep 29, 2025
Includes the following:

- #7337
- #7342
- #7344
- #7345
- #7347
- #7346
- #7348
- #7349
- #7350
- #7357
- #7356
- #7358
- #7359
- #7366
- #7367
- #7293
- #7368

Signed-off-by: Neil Twigg <neil@nats.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants