fix: fr32: Correct multithreaded Unpad chunk boundary alignment and simplify unpadReader#13455
Pull request overview
This PR fixes a critical data corruption bug in the fr32 multithreaded Unpad() function that occurred when processing data sizes that don't evenly divide into chunk-aligned thread work units. The bug caused silent data loss at thread boundaries when (padded_size / thread_count) was not a multiple of 128 bytes (one fr32 chunk). The fix ensures proper chunk boundary alignment in multithreaded operations and simplifies the unpadReader implementation to prevent similar edge cases.
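The boundary arithmetic can be checked with a small sketch. This is an illustration only, not the lotus code: `unprocessedPerThread` and `chunkSize` are hypothetical names, and the sizes are the ones used as the example in the PR description (a padded length of 8455808 bytes split across 16 threads).

```go
package main

import "fmt"

// chunkSize is the size in bytes of one padded fr32 chunk.
const chunkSize = 128

// unprocessedPerThread (hypothetical helper) returns how many bytes of each
// thread's range fall outside whole 128-byte chunks when padLen is divided
// evenly by byte count rather than by chunk count.
func unprocessedPerThread(padLen, threads uint64) uint64 {
	threadBytes := padLen / threads
	return threadBytes % chunkSize
}

func main() {
	padLen, threads := uint64(8455808), uint64(16)
	rem := unprocessedPerThread(padLen, threads)
	// Naive bytes per thread, leftover bytes per thread, leftover in total.
	fmt.Println(padLen/threads, rem, rem*threads) // prints: 528488 104 1664
}
```

A result of zero means the naive split already lands on chunk boundaries, which is the case for a 16MiB input with 16 threads.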
Key Changes
- Fixed thread boundary alignment in `mt()` function to ensure each thread processes complete 128-byte chunks
- Simplified `unpadReader` to use a straightforward buffer-fill-and-read pattern, eliminating complex stash mechanisms
- Corrected `piece_provider.go` to pass the actual read size instead of full piece size to `NewUnpadReaderBuf`
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| storage/sealer/fr32/fr32.go | Added chunk boundary alignment logic to ensure threadBytes is always a multiple of 128, with last thread handling remainder |
| storage/sealer/fr32/readers.go | Complete rewrite of unpadReader using simpler fill-and-read pattern with separate padbuf/unpadbuf buffers |
| storage/sealer/piece_provider.go | Changed to pass actual readPaddedSize to NewUnpadReaderBuf instead of full pieceSize |
| storage/sealer/fr32/readers_test.go | Added comprehensive regression tests covering size mismatches, boundary conditions, and bit-shift corruption scenarios |
| storage/sealer/fr32/fr32_test.go | Added tests for misaligned thread boundaries and various sizes above MTTresh threshold |
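One way to picture the alignment described for `fr32.go` is to divide the work by chunk count instead of byte count. The sketch below is an assumption about the general approach, not the code merged in this PR: `span` and `alignedSpans` are hypothetical names, and it assumes `padLen` is already a multiple of 128.

```go
package main

import "fmt"

// chunkSize is the size in bytes of one padded fr32 chunk.
const chunkSize = 128

// span is a half-open byte range [Start, End) of the padded input.
type span struct{ Start, End uint64 }

// alignedSpans (hypothetical helper) splits padLen bytes across at most
// `threads` workers so that every boundary is a multiple of chunkSize.
// The last worker also takes the chunks left over by the integer division.
func alignedSpans(padLen, threads uint64) []span {
	chunks := padLen / chunkSize
	if chunks == 0 {
		return nil
	}
	if threads == 0 {
		threads = 1
	}
	if threads > chunks {
		threads = chunks // never hand a worker an empty range
	}
	threadBytes := (chunks / threads) * chunkSize
	spans := make([]span, 0, threads)
	for i := uint64(0); i < threads; i++ {
		start := i * threadBytes
		end := start + threadBytes
		if i == threads-1 {
			end = chunks * chunkSize // remainder goes to the last worker
		}
		spans = append(spans, span{Start: start, End: end})
	}
	return spans
}

func main() {
	spans := alignedSpans(8455808, 16)
	fmt.Println(len(spans), spans[0], spans[len(spans)-1])
}
```

With the example sizes, the first fifteen workers each get 4128 chunks and the last one gets 4141, so no bytes sit between two workers' ranges.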
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
ZenGround0 left a comment
Bug fix and tests LGTM. We should probably push out a patch release with this ASAP
Merging now given that this has 3 approvals, and will prep it for a patch release as soon as possible.
Summary
This PR fixes a data corruption bug in the fr32 multithreaded `Unpad()` function and simplifies the `unpadReader` implementation. The bug caused data loss and corruption when processing data sizes that don't evenly divide into chunk-aligned thread work units.
The Bug
The `mt()` function in `fr32.go` parallelizes `Pad()`/`Unpad()` operations across multiple threads. Each thread is assigned a range of bytes to process.
The problem: `threadBytes` is not guaranteed to be a multiple of 128 bytes (one fr32 chunk).
For example, with `padLen = 8455808` bytes (66061 chunks) and `threads = 16`:
- `threadBytes = 8455808 / 16 = 528488` bytes
- `528488 / 128 = 4128.8125`, which is not a whole number of chunks
The `unpad()` function processes only complete 128-byte chunks. This means each thread only processes `4128 * 128 = 528384` bytes, leaving 104 bytes unprocessed at each thread boundary. With 16 threads, up to 1664 bytes could be silently lost.
When the bug occurs
The bug manifests when ALL of these conditions are met:
- Data size exceeds `MTTresh` (512KB), which triggers the multithreaded path
- `padLen / threads` is not a multiple of 128, which creates misaligned thread boundaries
This is why existing tests (like `TestRoundtrip16MRand` with 16MiB data) passed: 16MiB divides evenly into chunk-aligned work units. The bug only appeared with sizes like ~8.5MiB that create fractional chunks per thread.
Symptoms observed
- [...] `0x7fffe0` to `0x800000` in the original report)
- [...] `0x40, 0x80, 0xc0, 0x00` pattern visible)
- [...] `MTTresh`)
The Fix
1. Fix `mt()` chunk boundary alignment (`fr32.go`)
2. Simplify `unpadReader` (`readers.go`)
Rewrote using a simple `bufio.Reader`-style pattern:
- Separate `padbuf` and `unpadbuf` buffers (eliminates any memory aliasing concerns)
- `fill()` → `Read()` pattern
Testing
`TestUnpadReaderSizeMismatch_ExactByteLoss` specifically verifies data integrity around 8MiB boundaries with non-power-of-2 aligned sizes.
Impact
This bug could cause silent data corruption in code paths involving:
- `PieceProvider.ReadPiece()`
- `fr32.Unpad()` with data sizes > 512KB that create misaligned thread boundaries
The corruption would manifest as incorrect bytes in the middle of the data stream.
Why this "worked"
I don't think anyone seriously used /piece retrievals, and /ipfs retrievals usually fetched individual blocks, which were much smaller than the ~8MB threshold where this happened. This is just a theory; the bug might have been causing an untold number of unexplained issues.