HDFS-15865 Interrupt DataStreamer thread if no ack #2728
@karthikhw, the patch generally looks good to me. However, in what cases is nodes null?
@mukul1987 I have not found exactly when nodes becomes null, but it appears to happen when the client cannot reach a datanode in the middle of a write. It looks like the next retry then comes with null.
mukul1987
left a comment
+1, LGTM
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DataStreamer.java
    newScope("waitForAckedSeqno")) {
  LOG.debug("{} waiting for ack for: {}", this, seqno);
  int dnodes = nodes != null ? nodes.length : 3;
  int writeTimeout = dfsClient.getDatanodeWriteTimeout(dnodes);
This timeout is very long. For a 3-node pipeline, it will be 8 minutes + 3 * 5 seconds (for the extension).
I'm not sure I have a better suggestion for the timeout.
One question: I believe we saw this problem in a hung HiveServer2 (HS2) process. Do we know how this problem causes the entire HS2 instance to hang? I would have thought this issue would block the closing of a single file on HDFS, while other files open within the same client could still progress as normal.
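To make the arithmetic in the comment above concrete, here is a minimal sketch of how the wait time grows with pipeline size. It assumes only the figures quoted in this review (an 8-minute base write timeout plus a 5-second extension per datanode) and the fallback to 3 nodes from the diff; the class, method, and constant names are hypothetical and this is not the actual `getDatanodeWriteTimeout` implementation.

```java
import java.util.concurrent.TimeUnit;

public class AckWaitTimeoutSketch {
    // Assumed values, taken from the review comment rather than from Hadoop's source.
    static final long BASE_WRITE_TIMEOUT_MS = TimeUnit.MINUTES.toMillis(8);
    static final long EXTENSION_PER_NODE_MS = TimeUnit.SECONDS.toMillis(5);
    // Fallback pipeline size used by the diff when nodes is null.
    static final int DEFAULT_PIPELINE_SIZE = 3;

    // Hypothetical helper: mirrors the null check in the diff, then adds
    // one extension per datanode to the base timeout.
    static long ackWaitTimeoutMs(Object[] nodes) {
        int dnodes = nodes != null ? nodes.length : DEFAULT_PIPELINE_SIZE;
        return BASE_WRITE_TIMEOUT_MS + EXTENSION_PER_NODE_MS * dnodes;
    }

    public static void main(String[] args) {
        long threeNodes = ackWaitTimeoutMs(new Object[3]);
        long nullNodes = ackWaitTimeoutMs(null);
        System.out.printf("3-node pipeline: %d ms (%d min %d s)%n",
            threeNodes, threeNodes / 60_000, (threeNodes % 60_000) / 1_000);
        System.out.printf("null nodes (falls back to 3): %d ms%n", nullNodes);
    }
}
```

Under these assumptions a 3-node pipeline, or a null nodes array, waits 8 minutes 15 seconds before the wait is abandoned, which is the figure the concern about a long hang refers to.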
We did not do much troubleshooting on the Hive side, @sodonnel. It looks like the whole HS2 instance was hung; it did not accept any new connections.
Thanks. I guess we can go ahead with this change, but even with it, HS2 may well hang for 8+ minutes. It's hard to know for sure without knowing why this problem caused the whole instance to hang.
sodonnel
left a comment
Changes LGTM +1
💔 -1 overall
This message was automatically generated.
Thanks for the review @sodonnel, merging this.
(cherry picked from commit bd3da73)
(cherry picked from commit bd3da73) Change-Id: I6604bb34e01b2e13ee4a8aafc54dc5126850fd00