
[eval] (2.14) [FIXED] Filestore lost tombstones#3

Open
Uzay-G wants to merge 1 commit into eval/base-pr-7387 from eval/upstream-pr-7387

Conversation

@Uzay-G Uzay-G commented Feb 21, 2026

Mirror of nats-io#7387 (MERGED) for Orpheus review evaluation.

Upstream: nats-io#7387


Original PR description:

Given two message blocks: the first contains 100 messages, and the second contains 99 tombstones for messages in the first block (all except the first message/sequence). When a new message was written into the second block and then removed, the filestore would recognize the block as empty and delete it. However, if the server was then ungracefully restarted, all 99 removed messages would reappear, since their tombstones were lost when the second block was removed.

This PR fixes this issue by recognizing when a block contains tombstones for prior blocks. In this case it will not remove a block until it is both empty and contains no tombstones for prior blocks.

Additionally, fs.syncBlocks can now remove tombstones not only when they are below the first sequence, but also when a tombstone no longer references a live message in another block. This means that if a block contains a tombstone for a block that has already been removed, that tombstone can be compacted as well.

fs.deleteBlocks() has also been improved: multiple empty blocks containing tombstones at various sequences would otherwise produce many sequential delete ranges. These are now collapsed into one large delete range spanning all of the blocks.

This is reserved for 2.14 because it requires the compatibility changes from the PR below: if a downgrade were performed while empty blocks with tombstones existed, first sequences could fail to move up.

Follow-up of nats-io#7384

Signed-off-by: Maurice van Veen <github@mauricevanveen.com>
@orpheus-by-fulcrum-dev-uzay-g

Review started.


@orpheus-by-fulcrum-dev-uzay-g orpheus-by-fulcrum-dev-uzay-g bot left a comment


All checks passed.

Expected behavior

  • When a message block is empty of real messages but contains tombstones for prior blocks, the block is NOT removed from disk
  • After an ungraceful restart (crash recovery without index.db), deleted messages remain deleted and do not reappear
  • EraseMsg performs the erase even on empty blocks that contain tombstones, preserving those tombstones
  • syncBlocks compacts tombstones that reference messages in blocks that no longer exist
  • deleteBlocks() collapses multiple sequential gaps from empty tombstone-only blocks into a single DeleteRange
  • Recovery paths (recoverMsgs, rebuildFirst) use forceRemoveMsgBlock for truly empty/corrupt blocks

What happens

  • ✅ Core fix works: with PR, deleted messages stay deleted after crash recovery (1 msg, 10 deleted); without PR on base branch, 9 deleted messages reappear (10 msgs, 1 deleted)
  • ✅ EraseMsg on empty block preserves tombstones: after erase and crash recovery, state is consistent (1 msg, 2 deleted); base branch loses tombstone (2 msgs, 1 deleted)
  • ✅ syncBlocks compacts irrelevant tombstones: after removing msg 1 (moving FirstSeq to 4), syncBlocks correctly removes now-irrelevant tombstones for seqs 2,3
  • ✅ All 3 new PR tests pass across all 6 permutations (None/AES-GCM/ChaCha20 x None/S2 compression)
  • ✅ Full TestFileStore* suite (291s) and TestNoRaceFilestoreBinaryStreamSnapshotEncodingLargeGaps pass with no regressions
Detailed evidence

Setup

export PATH=$PATH:/usr/local/go/bin:$HOME/go/bin
cd /home/agent/nats-server
git checkout eval/upstream-pr-7387
go build -o nats-server .

Demo program

A test file was written to exercise the filestore directly with BlockSize: 10 * 33 (~10 messages per block), reproducing the exact PR scenario.

Demo 1: Core scenario — tombstones preserved after crash recovery

Creates two blocks. Block 1 has 10 messages. Block 2 has 1 message (seq 11) plus tombstones for seqs 2-10. Deletes msg 11, making block 2 empty. Removes index.db. Recovers.

$ go test -v -run "TestDemoTombstonesPreservedAfterCrash" ./server/ -count=1

=== RUN   TestDemoTombstonesPreservedAfterCrash
Before crash: Msgs=1 FirstSeq=1 LastSeq=11 NumDeleted=10
Removed index.db to simulate crash recovery
After crash: Msgs=1 FirstSeq=1 LastSeq=11 NumDeleted=10
Message 1 still accessible: OK
Messages 2-11 still deleted: OK
PASS: Tombstones preserved through crash recovery
--- PASS: TestDemoTombstonesPreservedAfterCrash (0.00s)
PASS

Demo 2: EraseMsg preserves tombstones in empty block

Block 1: msgs 1,2. Block 2: msg 3 (secret). Deletes msg 2 (tombstone in block 2). Erases msg 3 (block 2 empty but has tombstone). Crash recovery.

$ go test -v -run "TestDemoEraseMsgPreservesTombstones" ./server/ -count=1

=== RUN   TestDemoEraseMsgPreservesTombstones
After erase: Msgs=1 FirstSeq=1 LastSeq=3 NumDeleted=2
After recovery: Msgs=1 FirstSeq=1 LastSeq=3 NumDeleted=2
Msg 2 still deleted after recovery: OK
PASS: EraseMsg preserves tombstones
--- PASS: TestDemoEraseMsgPreservesTombstones (0.01s)
PASS

Demo 3: syncBlocks compacts irrelevant tombstones

Block 1: msgs 1,2,3. Block 2: msg 4, tombstones for 2,3. Remove msg 1 (FirstSeq moves to 4). Trigger syncBlocks.

$ go test -v -run "TestDemoSyncBlocksCompaction" ./server/ -count=1

=== RUN   TestDemoSyncBlocksCompaction
Before: Msgs=2 FirstSeq=1 LastSeq=4 NumDeleted=2
After removing msg 1: Msgs=1 FirstSeq=4 LastSeq=4
After syncBlocks: Msgs=1 FirstSeq=4 LastSeq=4 NumDeleted=0
PASS: syncBlocks compaction works
--- PASS: TestDemoSyncBlocksCompaction (0.01s)
PASS

Demo 4: deleteBlocks gap collapsing

$ go test -v -run "TestDemoDeleteBlocksGapCollapsing" ./server/ -count=1

=== RUN   TestDemoDeleteBlocksGapCollapsing
Initial: Msgs=10 FirstSeq=1 LastSeq=10
After deletes: Msgs=2 FirstSeq=1 LastSeq=10 NumDeleted=8
Block count: 2
PASS: deleteBlocks handles multiple empty blocks with tombstones
--- PASS: TestDemoDeleteBlocksGapCollapsing (0.00s)
PASS

Regression verification on base branch

Same demos run on eval/base-pr-7387 (before the fix):

$ git checkout eval/base-pr-7387
$ go test -v -run "TestDemoTombstonesPreservedAfterCrash" ./server/ -count=1

=== RUN   TestDemoTombstonesPreservedAfterCrash
Before crash: Msgs=1 FirstSeq=1 LastSeq=11 NumDeleted=10
Removed index.db to simulate crash recovery
After crash: Msgs=10 FirstSeq=1 LastSeq=11 NumDeleted=1
    demo_tombstone_test.go:94: TOMBSTONE LOSS: expected 1 msgs, got 10 (deleted messages reappeared!)
--- FAIL: TestDemoTombstonesPreservedAfterCrash (0.00s)
FAIL
$ go test -v -run "TestDemoEraseMsgPreservesTombstones" ./server/ -count=1

=== RUN   TestDemoEraseMsgPreservesTombstones
After erase: Msgs=1 FirstSeq=1 LastSeq=3 NumDeleted=2
After recovery: Msgs=2 FirstSeq=1 LastSeq=3 NumDeleted=1
    demo_tombstone_test.go:182: FAIL: expected 1 msg, got 2
--- FAIL: TestDemoEraseMsgPreservesTombstones (0.00s)
FAIL

Confirms the bug: on the base branch, 9 deleted messages reappear after crash recovery (10 msgs instead of 1).

PR test suite

$ git checkout eval/upstream-pr-7387
$ go test -v -run "TestFileStoreEraseMsgDoesNotLoseTombstonesInEmptyBlock|TestFileStoreEmptyBlockContainsPriorTombstones|TestFileStoreCompactTombstonesBelowFirstSeq" ./server/ -count=1

--- PASS: TestFileStoreEraseMsgDoesNotLoseTombstonesInEmptyBlock (0.65s)
--- PASS: TestFileStoreEmptyBlockContainsPriorTombstones (0.65s)
--- PASS: TestFileStoreCompactTombstonesBelowFirstSeq (0.64s)
PASS

All 3 new tests pass across 6 permutations (None/AES-GCM/ChaCha20-Poly1305 × None/S2).

Full filestore test suite

$ go test -run "TestFileStore" ./server/ -count=1 -timeout=600s

ok  github.com/nats-io/nats-server/v2/server  291.106s

No regressions.

Modified norace test

$ go test -v -run "TestNoRaceFilestoreBinaryStreamSnapshotEncodingLargeGaps" ./server/ -count=1

--- PASS: TestNoRaceFilestoreBinaryStreamSnapshotEncodingLargeGaps (3.28s)
PASS

