[FIXED] Filestore lost tombstones #7384

Merged
neilalexander merged 3 commits into main from maurice/fs-lost-tombstones-compat on Oct 2, 2025

@MauriceVanVeen MauriceVanVeen commented Oct 1, 2025

This PR fixes multiple cases of lost tombstones and incorrect accounting, which could cause stream first/last sequences to roll back or deletes to be undone.

If a block contained tombstones and a message before those tombstones was erased, the tombstones would be lost. If the server was then shut down ungracefully, so that the index had to be rebuilt, the deleted messages would reappear.
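
To illustrate why losing tombstones brings deleted messages back, here is a minimal sketch of an index rebuild. It is not the filestore's actual code: the `record` type and `rebuildLive` function are hypothetical names, and the model assumes a rebuild simply replays every record in block order.

```go
package main

// record is a hypothetical on-disk entry: either a message or a tombstone
// marking an earlier sequence as deleted.
type record struct {
	seq       uint64
	tombstone bool
}

// rebuildLive replays all records in block order and returns the set of
// sequences that are still live once every tombstone has been applied.
func rebuildLive(blocks [][]record) map[uint64]bool {
	live := make(map[uint64]bool)
	for _, blk := range blocks {
		for _, r := range blk {
			if r.tombstone {
				// A tombstone undoes an earlier message.
				delete(live, r.seq)
			} else {
				live[r.seq] = true
			}
		}
	}
	return live
}
```

In this model, replaying the same message block without its tombstones yields every message as live again, which mirrors the reported symptom.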

If you write a batch of messages and then ack all of them, a tombstone is written for each message. Once these tombstones overflow into a new message block, the first block containing the messages is eventually removed. If the server then restarted ungracefully, it used the minimum known tombstone sequence and timestamp instead of the maximum. This rolled the sequences back, which could be one of the reasons for consumers having higher stream sequences than the stream.
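
The difference between taking the minimum and the maximum can be sketched as follows. This is an illustrative model only: `tombstone` and `lastFromTombstones` are hypothetical names, and it assumes the last sequence of a block holding only tombstones should come from the highest tombstone it contains.

```go
package main

// tombstone is a hypothetical record of a deleted sequence and the
// timestamp stored with it.
type tombstone struct {
	seq uint64
	ts  int64
}

// lastFromTombstones returns the tombstone with the highest sequence.
// Using the lowest one instead would make the recovered last sequence
// roll back. The boolean is false when there are no tombstones.
func lastFromTombstones(tombs []tombstone) (tombstone, bool) {
	if len(tombs) == 0 {
		return tombstone{}, false
	}
	last := tombs[0]
	for _, t := range tombs[1:] {
		if t.seq > last.seq {
			last = t
		}
	}
	return last, true
}
```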

Additionally, if multiple blocks contained these tombstones, a stream with no messages could come up with a correct last sequence but a first sequence that was too low, taken from an earlier block.
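
A rough sketch of recovering first/last sequences across several blocks is shown below. The `blockSummary` type and `recoverState` function are hypothetical, and the sketch assumes the convention that an empty stream reports a first sequence of last + 1.

```go
package main

// blockSummary is a hypothetical per-block summary gathered during recovery.
type blockSummary struct {
	liveFirst uint64 // lowest live message sequence, 0 if the block has none
	maxSeq    uint64 // highest sequence seen, message or tombstone
}

// recoverState derives the stream's first and last sequence from all blocks.
// The last sequence is the maximum over every block. The first sequence is
// the lowest live message; if no block has live messages it is last + 1
// (assumed convention for an empty stream), never a value from an earlier block.
func recoverState(blocks []blockSummary) (first, last uint64) {
	for _, b := range blocks {
		if b.maxSeq > last {
			last = b.maxSeq
		}
		if b.liveFirst != 0 && (first == 0 || b.liveFirst < first) {
			first = b.liveFirst
		}
	}
	if first == 0 {
		first = last + 1
	}
	return first, last
}
```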

This PR also resolves cases where this warning was logged: `Stream state encountered internal inconsistency on write`

Resolves #5412, #7241

Signed-off-by: Maurice van Veen <github@mauricevanveen.com>

@MauriceVanVeen MauriceVanVeen requested a review from a team as a code owner October 1, 2025 15:43
Signed-off-by: Maurice van Veen <github@mauricevanveen.com>
@MauriceVanVeen MauriceVanVeen force-pushed the maurice/fs-lost-tombstones-compat branch from bef7cd7 to 4b7a001 Compare October 1, 2025 17:28
Signed-off-by: Maurice van Veen <github@mauricevanveen.com>
Signed-off-by: Maurice van Veen <github@mauricevanveen.com>
@MauriceVanVeen MauriceVanVeen force-pushed the maurice/fs-lost-tombstones-compat branch from 4b7a001 to 86b3d3a Compare October 2, 2025 07:53
@MauriceVanVeen MauriceVanVeen linked an issue Oct 2, 2025 that may be closed by this pull request
@neilalexander neilalexander left a comment

LGTM

@neilalexander neilalexander merged commit 4b871fe into main Oct 2, 2025
90 of 92 checks passed
@neilalexander neilalexander deleted the maurice/fs-lost-tombstones-compat branch October 2, 2025 09:21
neilalexander added a commit that referenced this pull request Oct 3, 2025
Includes the following:

- #7374
- #7373
- #7377
- #7380
- #7382
- #7381
- #7364
- #7384
- #7385
- #7388
- #7386
- #7391
- #7242

Signed-off-by: Neil Twigg <neil@nats.io>
neilalexander added a commit that referenced this pull request Oct 28, 2025
Includes the following:

- #7380
- #7384
- #7385
- #7388
- #7395
- #7400
- #7399
- #7401
- #7402
- #7423
- #7424
- #7411
- #7428
- #7429
- #7431
- #7435
- #7433
- #7443
- #7455
- #7465
- #7466
- #7460
- #7484
- #7479

Signed-off-by: Neil Twigg <neil@nats.io>
neilalexander added a commit that referenced this pull request Nov 13, 2025
Given two message blocks, the first containing 100 messages and the
second containing 99 tombstones for the messages in the first block
(all except the first message/seq): when a new message was written into
the second block and then removed, the filestore would recognize this
block as empty and remove it. However, if the server was then restarted
ungracefully, all 99 removed messages would reappear, since their
tombstones were lost when the second block was removed.

This PR fixes this issue by recognizing when a block contains tombstones
for prior blocks. In this case it will not remove a block until it is
both empty and contains no tombstones for prior blocks.
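
The removal rule described above can be sketched as a simple predicate. This
is not the actual filestore code: `block`, `hasPriorTombstones` and
`canRemove` are hypothetical names, and the sketch assumes a tombstone
refers to a prior block when its sequence is below the block's own first
message sequence.

```go
package main

// block is a hypothetical, simplified view of a message block.
type block struct {
	firstSeq   uint64   // first message sequence written to this block
	liveMsgs   int      // number of messages not yet removed
	tombstones []uint64 // sequences this block holds tombstones for
}

// hasPriorTombstones reports whether the block holds a tombstone for a
// sequence that lives in an earlier block (assumed: seq < firstSeq).
func hasPriorTombstones(b block) bool {
	for _, seq := range b.tombstones {
		if seq < b.firstSeq {
			return true
		}
	}
	return false
}

// canRemove reports whether the block may be deleted: it must be empty
// and must not carry tombstones that earlier blocks still depend on.
func canRemove(b block) bool {
	return b.liveMsgs == 0 && !hasPriorTombstones(b)
}
```

For the scenario above, a block with `firstSeq` 101, no live messages, and
tombstones for sequences 2 through 100 would be kept rather than removed.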

Additionally, `fs.syncBlocks` can now remove tombstones not only when
they are below the first sequence, but also when a tombstone contained
in a block no longer references a message in another block. This means
that if a block contains a tombstone for a message in a block that has
been removed, that tombstone can be compacted as well.
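
As a rough illustration of this compaction rule (not the real
`fs.syncBlocks` logic), the sketch below keeps a tombstone only if it is at
or above the stream's first sequence and still falls inside an existing
block. `seqRange` and `compactTombstones` are hypothetical names.

```go
package main

// seqRange is a hypothetical inclusive sequence range covered by a block
// that still exists on disk.
type seqRange struct {
	first, last uint64
}

// compactTombstones returns the tombstones that must be kept. A tombstone
// is dropped when it is below the stream's first sequence, or when no
// remaining block covers the sequence it refers to.
func compactTombstones(tombs []uint64, streamFirst uint64, blocks []seqRange) []uint64 {
	var keep []uint64
	for _, seq := range tombs {
		if seq < streamFirst {
			continue // already below the first sequence
		}
		for _, b := range blocks {
			if seq >= b.first && seq <= b.last {
				keep = append(keep, seq) // still references an existing block
				break
			}
		}
	}
	return keep
}
```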

`fs.deleteBlocks()` has also been improved since multiple empty blocks
containing tombstones at various sequences would otherwise result in
many sequential delete ranges. Those are now intelligently collapsed
into one large delete range spanning all these blocks.
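
Collapsing sequential delete ranges can be sketched as a standard interval
merge. The `deleteRange` type and `collapse` function are hypothetical and
only model the idea; the actual `fs.deleteBlocks()` representation may
differ.

```go
package main

import "sort"

// deleteRange is a hypothetical run of deleted sequences:
// first, first+1, ..., first+num-1.
type deleteRange struct {
	first, num uint64
}

// collapse merges ranges that touch or overlap into as few ranges as
// possible. The input slice is not modified.
func collapse(ranges []deleteRange) []deleteRange {
	if len(ranges) == 0 {
		return nil
	}
	sorted := append([]deleteRange(nil), ranges...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].first < sorted[j].first })

	out := []deleteRange{sorted[0]}
	for _, r := range sorted[1:] {
		cur := &out[len(out)-1]
		end := cur.first + cur.num // first sequence after the current range
		if r.first <= end {
			// Adjacent or overlapping: extend the current range if needed.
			if rEnd := r.first + r.num; rEnd > end {
				cur.num = rEnd - cur.first
			}
			continue
		}
		out = append(out, r)
	}
	return out
}
```

With this model, ranges starting at 1, 11 and 16 that each end where the
next begins become a single range, while a range separated by a gap stays
on its own.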

This is reserved for 2.14 as it requires the compatibility changes from
the PR below, and could result in first sequences not moving up if a
downgrade were performed while empty blocks with tombstones existed
prior to the downgrade.

Follow-up of #7384

Signed-off-by: Maurice van Veen <github@mauricevanveen.com>