HDFS-16610. Make fsck read timeout configurable #4384
Do you want to restrict the config to milliseconds? We could keep only dfs.client.fsck.connect.timeout, similar to dfs.webhdfs.socket.connect-timeout, and allow time units.
This suggestion makes sense. I have changed it to use the same technique as with webhdfs.
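To illustrate what "allow time units" means for such a setting, here is a minimal, self-contained sketch of parsing a value like `60s` or `1500ms` into milliseconds. It is not the code from this patch: the class and method names are hypothetical, and treating a bare number as milliseconds is an assumption. In Hadoop itself this job is normally done by `Configuration.getTimeDuration`, which supports more suffixes than the three handled here.

```java
import java.util.Locale;
import java.util.concurrent.TimeUnit;

// Hypothetical helper for illustration only; not part of Hadoop or this patch.
final class TimeoutValueSketch {
  private TimeoutValueSketch() {}

  /** Parses values such as "60s", "1500ms", "2m", or a bare "60000" into milliseconds. */
  static long parseMillis(String raw) {
    String v = raw.trim().toLowerCase(Locale.ROOT);
    // Check "ms" before "s": every value ending in "ms" also ends in "s".
    if (v.endsWith("ms")) {
      return number(v, 2);
    }
    if (v.endsWith("s")) {
      return TimeUnit.SECONDS.toMillis(number(v, 1));
    }
    if (v.endsWith("m")) {
      return TimeUnit.MINUTES.toMillis(number(v, 1));
    }
    // Assumption: a value with no suffix is already in milliseconds.
    return Long.parseLong(v);
  }

  // Strips the unit suffix and parses the remaining digits.
  private static long number(String v, int suffixLength) {
    return Long.parseLong(v.substring(0, v.length() - suffixLength).trim());
  }
}
```

With a scheme like this, a single property can be written in whichever unit is most readable, so separate millisecond-only keys are unnecessary.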
ayushtkn
left a comment
+1, if the build has no complaints...
💔 -1 overall
This message was automatically generated.
The native client is failing to build, but I cannot see how this change could cause that. I wonder whether something else is going on, or whether another recent change has broken the build.
It may be due to HDFS-16604; try running the build again...
I rebased against the latest trunk and force-pushed. Let's see if we get a better build this time.
💔 -1 overall
This message was automatically generated.
Good build this time, so I will commit. Thanks all for the reviews.
(cherry picked from commit 34a973a) Conflicts: hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
Description of PR
In a cluster with a lot of small files, we encountered a case where fsck was very slow. I believe this is due to contention with the many other threads reading and writing data on the cluster.
Sometimes fsck does not report any progress for more than 60 seconds, and the client times out. Currently the connect and read timeouts are both hardcoded to 60 seconds. This change makes them configurable.
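For readers unfamiliar with the two timeouts, the sketch below shows how a connect timeout and a read timeout are applied to a plain `java.net.URLConnection`. It is only an illustration under stated assumptions: the class name, method name, and example URL are hypothetical, and the actual patch wires the values through Hadoop's own connection setup rather than this helper.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.URLConnection;

// Hypothetical helper for illustration only; not the code from this patch.
final class FsckConnectionSketch {
  private FsckConnectionSketch() {}

  /**
   * Prepares (but does not open) a connection with explicit timeouts.
   * connectTimeoutMs bounds how long establishing the TCP connection may take;
   * readTimeoutMs bounds how long a single read may block waiting for data.
   */
  static URLConnection prepare(String url, int connectTimeoutMs, int readTimeoutMs) {
    try {
      URLConnection conn = URI.create(url).toURL().openConnection();
      conn.setConnectTimeout(connectTimeoutMs);
      conn.setReadTimeout(readTimeoutMs);
      return conn;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

The read timeout is the one that matters for the slow-fsck case described above: the connection is already established, but the server sends nothing for longer than the limit, so the client gives up.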
How was this patch tested?
Tested manually by inserting a sleep into the fsck logic in the NameNode. I then adjusted the read timeout and confirmed that the client either timed out or completed, depending on the configured value.
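The same idea can be reproduced outside a cluster. The following sketch is not the test used for this patch; it stands in for the sleeping NameNode with a local socket that accepts connections but never replies, then checks that a short read timeout fires. All names are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.Proxy;
import java.net.ServerSocket;
import java.net.SocketTimeoutException;
import java.net.URL;
import java.net.URLConnection;

// Hypothetical stand-in for the manual test; not part of Hadoop.
final class SlowServerSketch {
  private SlowServerSketch() {}

  /** Returns true if reading from a server that never answers hits the read timeout. */
  static boolean readTimesOut(int readTimeoutMs) {
    // The OS completes the TCP handshake via the listen backlog, so the client
    // connects successfully even though accept() is never called.
    try (ServerSocket silent = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
      URL url = new URL("http", silent.getInetAddress().getHostAddress(),
          silent.getLocalPort(), "/fsck");
      URLConnection conn = url.openConnection(Proxy.NO_PROXY);
      conn.setConnectTimeout(1000);
      conn.setReadTimeout(readTimeoutMs);
      try (InputStream in = conn.getInputStream()) {
        in.read();
        return false; // the server answered, which this sketch never expects
      } catch (SocketTimeoutException e) {
        return true; // no response arrived within readTimeoutMs
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Calling `SlowServerSketch.readTimesOut(200)` should return `true` after roughly 200 ms, mirroring the "sleep longer than the timeout" half of the manual test.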