Skip to content

fix: fr32: Correct multithreaded Unpad chunk boundary alignment and simplify unpadReader#13455

Merged
rjan90 merged 8 commits intomasterfrom
fix/new-unpad-reader
Dec 10, 2025
Merged

fix: fr32: Correct multithreaded Unpad chunk boundary alignment and simplify unpadReader#13455
rjan90 merged 8 commits intomasterfrom
fix/new-unpad-reader

Conversation

@magik6k
Copy link
Copy Markdown
Contributor

@magik6k magik6k commented Dec 9, 2025

Summary

This PR fixes a data corruption bug in the fr32 multithreaded Unpad() function and simplifies the unpadReader implementation. The bug caused data loss and corruption when processing data sizes that don't evenly divide into chunk-aligned thread work units.

2025-12-05-225323_1517x104_scrot

The Bug

The mt() function in fr32.go parallelizes Pad()/Unpad() operations across multiple threads. Each thread is assigned a range of bytes to process:

threadBytes := abi.PaddedPieceSize(padLen / int(threads))

The problem: threadBytes is not guaranteed to be a multiple of 128 bytes (one fr32 chunk).

For example, with padLen = 8455808 bytes (66061 chunks) and threads = 16:

  • threadBytes = 8455808 / 16 = 528488 bytes
  • 528488 / 128 = 4128.5not a whole number of chunks

The unpad() function processes only complete 128-byte chunks:

chunks := len(in) / 128  // = 4128 (integer division)
for chunk := 0; chunk < chunks; chunk++ { ... }

This means each thread only processes 4128 * 128 = 528384 bytes, leaving 104 bytes unprocessed at each thread boundary. With 16 threads, up to 1664 bytes could be silently lost.

When the bug occurs

The bug manifests when ALL of these conditions are met:

  1. Data size exceeds MTTresh (512KB) — triggers multithreaded path
  2. padLen / threads is not a multiple of 128 — creates misaligned thread boundaries
  3. Thread count doesn't evenly divide the chunk count — causes fractional chunks per thread

This is why existing tests (like TestRoundtrip16MRand with 16MiB data) passed — 16MiB divides evenly into chunk-aligned work units. The bug only appeared with sizes like ~8.5MiB that create fractional chunks per thread.

Symptoms observed

  • Zeros appearing in output (at 0x7fffe0 to 0x800000 in the original report)
  • Padded data appearing in unpadded output (padding bits 0x40, 0x80, 0xc0, 0x00 pattern visible)
  • Data shifted by a few bits after the corruption point
  • Corruption occurring at regular intervals (~512KB boundaries, matching MTTresh)

The Fix

1. Fix mt() chunk boundary alignment (fr32.go):

// Before: threadBytes could be fractional chunks
threadBytes := abi.PaddedPieceSize(padLen / int(threads))

// After: align to 128-byte chunk boundaries
chunksPerThread := (padLen / int(threads)) / 128
threadBytes := abi.PaddedPieceSize(chunksPerThread * 128)

// Last thread handles any remainder
if thread == int(threads)-1 {
    end = abi.PaddedPieceSize(padLen)
}

2. Simplify unpadReader (readers.go):

Rewrote using a simple bufio.Reader-style pattern:

  • Separate padbuf and unpadbuf buffers (eliminates any memory aliasing concerns)
  • Simple fill()Read() pattern
  • No complex stash/pool mechanisms that were prone to edge-case bugs

Testing

  • Added regression tests specifically for the boundary conditions
  • All existing fr32 tests continue to pass
  • New test TestUnpadReaderSizeMismatch_ExactByteLoss specifically verifies data integrity around 8MiB boundaries with non-power-of-2 aligned sizes

Impact

This bug could cause silent data corruption when:

  • Reading piece data through PieceProvider.ReadPiece()
  • Any code path using fr32.Unpad() with data sizes > 512KB that create misaligned thread boundaries

The corruption would manifest as incorrect bytes in the middle of the data stream, potentially causing:

  • Failed piece verification
  • Invalid CommP calculations
  • Corrupted unsealed sector data

Why this "worked"

I don't think anyone seriously used /piece retrievals and /ipfs retrievals usually fetched individual blocks which were much smaller than the ~8MB threshold where this happened. This is just a theory, this might have been causing an untold number of unexplained issues,

Copilot AI review requested due to automatic review settings December 9, 2025 13:13
@github-project-automation github-project-automation Bot moved this to 📌 Triage in FilOz Dec 9, 2025
Comment thread storage/sealer/fr32/readers.go
Comment thread storage/sealer/fr32/fr32.go
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a critical data corruption bug in the fr32 multithreaded Unpad() function that occurred when processing data sizes that don't evenly divide into chunk-aligned thread work units. The bug caused silent data loss at thread boundaries when (padded_size / thread_count) was not a multiple of 128 bytes (one fr32 chunk). The fix ensures proper chunk boundary alignment in multithreaded operations and simplifies the unpadReader implementation to prevent similar edge cases.

Key Changes

  • Fixed thread boundary alignment in mt() function to ensure each thread processes complete 128-byte chunks
  • Simplified unpadReader to use a straightforward buffer-fill-and-read pattern, eliminating complex stash mechanisms
  • Corrected piece_provider.go to pass the actual read size instead of full piece size to NewUnpadReaderBuf

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
storage/sealer/fr32/fr32.go Added chunk boundary alignment logic to ensure threadBytes is always a multiple of 128, with last thread handling remainder
storage/sealer/fr32/readers.go Complete rewrite of unpadReader using simpler fill-and-read pattern with separate padbuf/unpadbuf buffers
storage/sealer/piece_provider.go Changed to pass actual readPaddedSize to NewUnpadReaderBuf instead of full pieceSize
storage/sealer/fr32/readers_test.go Added comprehensive regression tests covering size mismatches, boundary conditions, and bit-shift corruption scenarios
storage/sealer/fr32/fr32_test.go Added tests for misaligned thread boundaries and various sizes above MTTresh threshold

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread storage/sealer/piece_provider.go Outdated
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Comment thread storage/sealer/fr32/fr32_test.go Outdated
Copy link
Copy Markdown
Contributor

@ZenGround0 ZenGround0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug fix and tests LGTM. We should probably push out a patch release with this ASAP

@github-project-automation github-project-automation Bot moved this from 📌 Triage to ✔️ Approved by reviewer in FilOz Dec 9, 2025
Comment thread storage/sealer/fr32/readers_test.go Outdated
@rjan90
Copy link
Copy Markdown
Contributor

rjan90 commented Dec 10, 2025

Merging now given that this has 3 approvals, and will prep it for a patch release as soon as possible

@rjan90 rjan90 enabled auto-merge (squash) December 10, 2025 14:35
@rjan90 rjan90 merged commit c3b13a6 into master Dec 10, 2025
98 checks passed
@rjan90 rjan90 deleted the fix/new-unpad-reader branch December 10, 2025 14:36
@github-project-automation github-project-automation Bot moved this from ✔️ Approved by reviewer to 🎉 Done in FilOz Dec 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: 🎉 Done

Development

Successfully merging this pull request may close these issues.

6 participants