HDFS-16484. [SPS]: Fix an infinite loop bug in SPSPathIdProcessor thread#4032
HDFS-16484. [SPS]: Fix an infinite loop bug in SPSPathIdProcessor thread#4032tasanuma merged 1 commit intoapache:trunkfrom
Conversation
|
💔 -1 overall
This message was automatically generated. |
|
💔 -1 overall
This message was automatically generated. |
|
@ayushtkn Could you please take a look at this. Thanks a lot. |
|
@tasanuma Could you please take a look at this. Thanks a lot. |
There was a problem hiding this comment.
How about cathing and ignoring the exception here?
| ctxt.removeSPSHint(startINode); | |
| try { | |
| ctxt.removeSPSHint(startINode); | |
| } catch (FileNotFoundException e) { | |
| // ignore if the file doesn't already exist | |
| startINode = null; | |
| } |
There was a problem hiding this comment.
@liubingxing I still prefer to catch the FileNotFoundException here. What do you think?
There was a problem hiding this comment.
@tasanuma Thanks for your suggestion.
It is also a good way to solve this problem by catching the FileNotFoundException here.
I updated the code according to your suggestion.
There was a problem hiding this comment.
@liubingxing Thank you for updating the PR. I think we can keep the retry logic implemented before while catching the FileNotFoundException here.
There was a problem hiding this comment.
@liubingxing I still prefer to catch the FileNotFoundException here. What do you think?
Sorry @tasanuma, I misunderstand your meaning here. I updated the code, please take a look.
There was a problem hiding this comment.
Do we still need to implement the retry logic after catching the FileNotFoundException?
There was a problem hiding this comment.
@tasanuma Thanks for your reviews.
I'm not sure if there are any other exceptions may cause this thread work incorrectly.
And the retry logic may help to avoid this thread falling into an infinite loop if the 'startINode' fail to be set to null.
|
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
- Let's define the constant of the max retry count (
private static final int MAX_RETRY_COUNT = 3;) inSPSPathIdProcessor. - How about logging a message when skipping the inode?
| retryCount++; | |
| if (retryCount >= 3) { | |
| startINode = null; | |
| } | |
| retryCount++; | |
| if (retryCount >= MAX_RETRY_COUNT) { | |
| LOG.warn("Skipping this inode {} due to too many retries.", startINode); | |
| startINode = null; | |
| } |
- And I think it's better to move the retry logic to the end of the catch block.
There was a problem hiding this comment.
@tasanuma Thanks for your advices. I updated the code.
|
🎊 +1 overall
This message was automatically generated. |
|
💔 -1 overall
This message was automatically generated. |
|
@liubingxing Thanks for your contribution! |
|
@tasanuma Thanks for your review and merge. |
JIRA: HDFS-16484
Currently, we ran SPS in our cluster and found this log. The SPSPathIdProcessor thread enters an infinite loop and prints the same log all the time.

The reason is that #ctxt.getNextSPSPath() get a inodeId which path does not exist. The inodeId will not be set to null, causing the thread hold this inodeId forever.