HDFS-16509. Fix decommission UnsupportedOperationException #4077
Hexiaoqiao merged 2 commits into apache:trunk
💔 -1 overall
This message was automatically generated.

💔 -1 overall
This message was automatically generated.
tomscut left a comment:
LGTM.
If you decommission nodes, you could try `DatanodeAdminBackoffMonitor`; it performs better.
@tomscut Thanks for your review and advice!
Thanks @cndaimin for the great catch here. Would you mind adding a new unit test to cover this case? Thanks.
Hi @Hexiaoqiao, thanks for your review! I have added a test to this PR. Please take another look, thanks.
🎊 +1 overall
This message was automatically generated.

💔 -1 overall
This message was automatically generated.
Great catch! Is this the reason decommissioning sometimes never completes?
@jojochuang Yes. The exception is not handled properly, and it leaves decommissioning stuck.
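To illustrate how an unhandled exception can leave decommissioning stuck, here is a heavily simplified, hypothetical model: the `FakeMonitor` class, its `tick()` method, the negative-id convention for "block of a deleted file", and the retry loop are all assumptions for illustration, not Hadoop code. If every periodic check throws before reaching the point where the node is marked decommissioned, and the caller merely catches and logs the failure, each later run fails the same way and the node never leaves the in-progress state.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

public class StuckDecommissionSketch {

  enum State { DECOMMISSION_IN_PROGRESS, DECOMMISSIONED }

  // Hypothetical monitor: each tick scans the node's blocks, then marks it done.
  static final class FakeMonitor {
    State state = State.DECOMMISSION_IN_PROGRESS;
    private final List<Long> blocks;

    FakeMonitor(List<Long> blocks) {
      this.blocks = blocks;
    }

    void tick() {
      // Read-only view, so iterator.remove() throws UnsupportedOperationException.
      Iterator<Long> it = Collections.unmodifiableList(blocks).iterator();
      while (it.hasNext()) {
        long block = it.next();
        if (block < 0) { // negative id stands in for "block of a deleted file"
          it.remove();
        }
      }
      state = State.DECOMMISSIONED; // never reached while the bad block remains
    }
  }

  public static void main(String[] args) {
    FakeMonitor monitor = new FakeMonitor(new ArrayList<>(List.of(1L, -2L, 3L)));
    for (int run = 1; run <= 3; run++) {
      try {
        monitor.tick();
      } catch (RuntimeException e) {
        // A caller that only logs the failure retries the same work next time.
        System.out.println("run " + run + " failed: " + e.getClass().getSimpleName());
      }
    }
    System.out.println("final state: " + monitor.state);
  }
}
```

In this model, every run fails with `UnsupportedOperationException` and the final state stays `DECOMMISSION_IN_PROGRESS`, because the offending block is never pruned.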
🎊 +1 overall
This message was automatically generated.
Review thread on the new test code:

        waitNodeDecommissioned(dn);
      }

      private void waitNodeDecommissioned(DatanodeInfo node) {
Hi @cndaimin, I think we could invoke this method rather than add a new one. What do you think?
Yes, we can use this method; updated. Thanks a lot, @Hexiaoqiao!
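A wait-for-decommission helper in a test like the one discussed here typically polls a condition until it holds or a timeout expires, so a node that gets stuck makes the test fail instead of hang. The sketch below shows only that general polling pattern using the JDK; the `waitFor` name, its parameters, and the simulated state change are hypothetical and are not the actual Hadoop test utility or the helper used in this PR.

```java
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

public class WaitForSketch {

  // Hypothetical polling helper: re-checks `condition` every `intervalMs`
  // until it is true, or throws once `timeoutMs` has elapsed.
  static void waitFor(BooleanSupplier condition, long intervalMs, long timeoutMs)
      throws InterruptedException, TimeoutException {
    long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    while (!condition.getAsBoolean()) {
      if (System.nanoTime() - deadline >= 0) {
        throw new TimeoutException("condition not met within " + timeoutMs + " ms");
      }
      Thread.sleep(intervalMs);
    }
  }

  public static void main(String[] args) throws Exception {
    long start = System.currentTimeMillis();
    // Simulated node that reports "decommissioned" after roughly 200 ms.
    waitFor(() -> System.currentTimeMillis() - start >= 200, 50, 5_000);
    System.out.println("node decommissioned");

    try {
      waitFor(() -> false, 10, 100); // a stuck node never reaches the target state
    } catch (TimeoutException e) {
      System.out.println("timed out: " + e.getMessage());
    }
  }
}
```

Comparing `System.nanoTime()` values by subtraction rather than with `>` keeps the deadline check correct even if the timer value wraps around.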
🎊 +1 overall
This message was automatically generated.
Committed to trunk. Thanks @cndaimin for your contribution. Thanks @jojochuang and @tomscut for your reviews.
@Hexiaoqiao I think it would be better to backport this. Thanks @Hexiaoqiao @jojochuang @tomscut
…ontributed by daimin. (cherry picked from commit c65c383)
Cherry-picked cleanly and pushed to branch-3.3.
@jojochuang I have submitted a new PR, "Backport HDFS-16509 to branch branch-3.2", to resolve the conflicts. Please take a look, thanks!
…). Contributed by daimin.
We encountered an `UnsupportedOperationException: Remove unsupported` error while some datanodes were being decommissioned. The exception occurs because `datanode.getBlockIterator()` returns an `Iterator` that does not support `remove()`, yet `DatanodeAdminDefaultMonitor#processBlocksInternal` calls `it.remove()` when a block is not found, e.g., when the file containing the block has been deleted.
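The failure mode described above can be reproduced with a small, self-contained sketch. It assumes a simplified model: `ReadOnlyBlockIterator`, the two `processBlocks*` methods, the `knownBlocks` set, and the `canRemove` flag are hypothetical stand-ins, not the actual Hadoop classes. The iterator mirrors the described behavior by throwing `UnsupportedOperationException("Remove unsupported.")` from `remove()`. The guarded variant is one possible way to avoid the crash (only prune when the iterator is known to support removal, otherwise just skip the block); it is not necessarily the change made in this PR.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

public class DecommissionIteratorSketch {

  // Hypothetical stand-in for an iterator over a datanode's blocks that
  // does not support remove(), as described above.
  static final class ReadOnlyBlockIterator implements Iterator<Long> {
    private final Iterator<Long> delegate;

    ReadOnlyBlockIterator(List<Long> blocks) {
      this.delegate = blocks.iterator();
    }

    @Override
    public boolean hasNext() {
      return delegate.hasNext();
    }

    @Override
    public Long next() {
      return delegate.next();
    }

    @Override
    public void remove() {
      throw new UnsupportedOperationException("Remove unsupported.");
    }
  }

  // Buggy pattern: unconditionally removes unknown blocks through the iterator.
  static int processBlocksUnguarded(Iterator<Long> it, Set<Long> knownBlocks) {
    int processed = 0;
    while (it.hasNext()) {
      Long block = it.next();
      if (!knownBlocks.contains(block)) {
        it.remove(); // throws if the iterator is read-only
        continue;
      }
      processed++;
    }
    return processed;
  }

  // Guarded pattern: only prune when the caller knows remove() is supported.
  static int processBlocksGuarded(Iterator<Long> it, Set<Long> knownBlocks, boolean canRemove) {
    int processed = 0;
    while (it.hasNext()) {
      Long block = it.next();
      if (!knownBlocks.contains(block)) {
        if (canRemove) {
          it.remove();
        }
        continue; // skip the unknown block either way
      }
      processed++;
    }
    return processed;
  }

  public static void main(String[] args) {
    List<Long> blocks = List.of(1L, 2L, 3L);
    Set<Long> known = Set.of(1L, 3L); // block 2 belongs to a deleted file

    try {
      processBlocksUnguarded(new ReadOnlyBlockIterator(blocks), known);
    } catch (UnsupportedOperationException e) {
      System.out.println("unguarded: " + e.getMessage());
    }

    int ok = processBlocksGuarded(new ReadOnlyBlockIterator(blocks), known, false);
    System.out.println("guarded processed: " + ok);

    List<Long> mutable = new ArrayList<>(blocks);
    processBlocksGuarded(mutable.iterator(), known, true);
    System.out.println("pruned list: " + mutable);
  }
}
```

Running `main` prints `unguarded: Remove unsupported.`, then `guarded processed: 2`, then `pruned list: [1, 3]`. Catching `UnsupportedOperationException` around `remove()` would also avoid the crash, but deciding up front from the iterator's source keeps the control flow explicit.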