Skip to content

Improve consumer pending count tracking during stream contention#6297

Merged
derekcollison merged 1 commit intomainfrom
maurice/improve-pending-count
Dec 23, 2024
Merged

Improve consumer pending count tracking during stream contention#6297
derekcollison merged 1 commit intomainfrom
maurice/improve-pending-count

Conversation

@MauriceVanVeen
Copy link
Copy Markdown
Member

The drifting tests would occasionally fail due to the consumer pending count drifting. This was due to a race condition described on checkNumPending:

// Does some sanity checks to see if we should re-calculate.
// Since there is a race when decrementing when there is contention at the beginning of the stream.
// The race is a getNextMsg skips a deleted msg, and then the decStreamPending call fires.
// This does some quick sanity checks to see if we should re-calculate num pending.
// Lock should be held.
func (o *consumer) checkNumPending() uint64 {

This PR doesn't fix this race condition, but improves the tracking which improves test reliability. If the race condition happens we can still check if the deleted message is between o.asflr and the o.sseq that's skipped ahead. In which case we can still decrement the pending count (o.npc) if the message is not pending/delivered. This improves the reliability of the pending count tracking as long as the ack floor hasn't moved up yet.

Signed-off-by: Maurice van Veen github@mauricevanveen.com

Signed-off-by: Maurice van Veen <github@mauricevanveen.com>
@MauriceVanVeen MauriceVanVeen requested a review from a team as a code owner December 23, 2024 14:08
Copy link
Copy Markdown
Member

@derekcollison derekcollison left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@derekcollison derekcollison merged commit c543f53 into main Dec 23, 2024
@derekcollison derekcollison deleted the maurice/improve-pending-count branch December 23, 2024 17:32
// Either we have not reached the message yet, or we've hit the race condition
// when there is contention at the beginning of the stream. In which case we can
// only decrement if the ack floor is still low enough to be able to detect it.
if o.isFilteredMatch(subj) && sseq > o.asflr && (sseq >= o.sseq || !wasPending) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: given that o.isFilteredMatch(subj) can be more expensive than the other checks, I would have put it last:

if sseq > o.asflr && (sseq >= o.sseq || !wasPending) && o.isFilteredMatch(subj) {

derekcollison added a commit that referenced this pull request Dec 24, 2024
Makes the change suggested in
#6297 (comment)

Signed-off-by: Maurice van Veen <github@mauricevanveen.com>
derekcollison added a commit that referenced this pull request Jan 8, 2025
Reverts the change made in
#6297. The change did
(somewhat) improve the reliability of the drifting tests if the
`o.npc--` was done as a result of contention, but resulted in a
regression if `o.npc--` was done twice when a message was acked that did
not move the ack floor up for WorkQueue/Interest retention.

This PR fixes what would otherwise have been a regression.

We really should try to fix the race condition itself though (outside of
this PR). Without fixing it the pending count can be incorrect with no
way to be resolved unless all messages are consumed, or we'd need to
manually recalculate.

Signed-off-by: Maurice van Veen <github@mauricevanveen.com>
wallyqs added a commit that referenced this pull request Jan 9, 2025
#### Dependencies
- #6323
- #6324

####  Leafnode
- #6291

#### JetStream
- #6226
- #6235
- #6277
- #6279
- #6283
- #6289
- #6316
- #6317
- #6325
- #6326
- #6335
- #6338
- #6341
- #6344
- #6150
- #6351
- #6355

#### Tests
- #6278
- #6297
- #6300
- #6343
- #6329
- #6330
- #6331
- #6332
- #6334
- #6356
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants