Merged
Conversation
Signed-off-by: Maurice van Veen <github@mauricevanveen.com>
bef7cd7 to
4b7a001
Compare
Signed-off-by: Maurice van Veen <github@mauricevanveen.com>
Signed-off-by: Maurice van Veen <github@mauricevanveen.com>
4b7a001 to
86b3d3a
Compare
This was referenced Oct 2, 2025
neilalexander
added a commit
that referenced
this pull request
Nov 13, 2025
Given two message blocks, the first containing 100 messages, the second containing 99 tombstones for the messages in the first block, except for the first message/seq. When a new message was written into the second block and removed. The filestore would recognize this block as empty and then remove it. However, if a server would be ungracefully restarted all the 99 removed messages would re-appear, since the tombstones for them were removed when removing the second block. This PR fixes this issue by recognizing when a block contains tombstones for prior blocks. In this case it will not remove a block until it is both empty and contains no tombstones for prior blocks. Additionally, `fs.syncBlocks` can now not only remove tombstones if they are below the first sequence, but also if a tombstone that's contained in a block is not actively referencing a message in another block. Which means that if a block contains a tombstone for a block that has been removed, then that tombstone can be compacted as well. `fs.deleteBlocks()` has also been improved since multiple empty blocks containing tombstones at various sequences would otherwise result in many sequential delete ranges. Those are now intelligently collapsed into one large delete range spanning all these blocks. This is reserved for 2.14 as it requires the compatibility of below PR, and could result in first sequences not moving up if a downgrade would be performed and there would be empty blocks with tombstones prior to the downgrade. Follow-up of #7384 Signed-off-by: Maurice van Veen <github@mauricevanveen.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This PR fixes multiple cases of lost tombstones and incorrect accounting, resulting in stream first/last sequences rolling back or deletes being undone.
If a block contained tombstones, and a message before those tombstones was erased, those tombstones would be lost. If a server would then be ungracefully shutdown such that the index needs to be rebuilt, it would have deleted messages re-appear.
If you'd write a bunch of messages and then ack all of those messages, tombstones will be written for all these messages. When these tombstones overflow into a new message block, then eventually the first block containing the messages will be removed. If the server then restarts ungracefully it uses the minimum known tombstone seq and timestamp instead of the maximum. This means the sequences roll back, which could be one of the reasons for consumers having higher stream sequences than the stream.
Additionally, if multiple blocks would contain these tombstones. Then the stream could have no messages, come up with a correct last sequence, but a too low first sequence from an earlier block.
This PR also resolves cases of this warning being logged:
Stream state encountered internal inconsistency on writeResolves #5412, #7241
Signed-off-by: Maurice van Veen github@mauricevanveen.com