HDFS-16272. Fix int overflow in computing safe length during EC block recovery#3548
HDFS-16272. Fix int overflow in computing safe length during EC block recovery#3548sodonnel merged 4 commits intoapache:trunkfrom
Conversation
|
💔 -1 overall
This message was automatically generated. |
| // lastFullStripeIdx is the index of the last full stripe. | ||
| int lastFullStripeIdx = | ||
| (int) (cpy[cpy.length - dataBlkNum] / cellSize); | ||
| long lastFullStripeIdx = cpy[cpy.length - dataBlkNum] / cellSize; |
There was a problem hiding this comment.
I know this is existing code, but I'd like to understand what is happening here to review this.
This method receives an array of internal block lengths, so for 3-2 it will have 5 entries, 6-3 it will have 9 etc.
Then it sorts the lengths smallest to largest. Then it selects the one at position num_blocks - numDataUnits.
Why does it not just pick the first one, which would be the smallest, as the smallest data block in the group indicates the last full stripe.
Why is the safe length based on the full stripe, and not a potentially partial last stripe?
There was a problem hiding this comment.
The change you have made here makes sense to me, although I still don't fully understand how the method is used in practice. However I do worry where else a bug like this may exist.
I think there is a similar problem in offsetInBlkToOffsetInBG() in this same class. It only seems to be used in test code, but it would be good to fix it incase it is used in non-test code later.
There was a problem hiding this comment.
Many thanks for the review! @sodonnel
- My understanding to this why not pick the first one is that the EC background reconstruction procedure has the ability to compute the rest 2 blocks based on the 3 good blocks(take RS-3-2 policy for example).
- I did notice a todo in code comments of this method: "Include lastFullStripeIdx+1 stripe in safeLength, if there exists such a stripe (and it must be partial).", but I might not involve this since that work may take a while :(
cellIdxInBlk * cellSize * dataBlkNuminoffsetInBlkToOffsetInBGhas the same problem, I am glad to fix that too.
|
🎊 +1 overall
This message was automatically generated. |
|
💔 -1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
…ring EC block recovery apache#3548. (merge request !716) Squash merge branch 'THADOOP-403' into 'release-3.2.1-tq-0.2'
… recovery (apache#3548) (cherry picked from commit 5337beb) Change-Id: Id99bf64c1892cbb0a42c163c8b9968c32398664f (cherry picked from commit 7785bbd5de3faaae841969d324a65e480dcb324e)
Fix the int overflow problem.