[SPARK-18119][SPARK-CORE] Namenode safemode check is only performed on one namenode which can stuck the startup of SparkHistory server#15648
minor: I think it isn't just for testing anymore, maybe we should remove this comment since we are updating this function anyways?
Should mention what the `true` is for (e.g. with a named param, or just a comment like /* isChecked - run on ActiveNN */).
…n one namenode which can stuck the startup of SparkHistory server This commit is contributed by Criteo SA under the Apache v2 licence.
|
Comments taken into account. |
|
@srowen I'm actually not 100% sure what should happen with hdfs HA -- lemme ask around. This looks right but worth checking that we're not covering up some other cluster issue. |
|
HDFS folks internally indicated this is likely the right change. Let's leave it for a bit before committing. |
|
CC @steveloughran who may also have good insight into whether this is the right change for HDFS HA. |
|
LGTM, as the javadocs say "If true check only for Active NNs status, else check first NN's status." But I don't know enough about HDFS HA to be It'll check the first NN, if that is on standby and stale reads are not allowed ( it'll log at error (HDFS-3477 proposes downgrading that), and throw an exception the url https://s.apache.org/sbnn-error. If someone sets Where my knowledge of HDFS-HA fails is what happens then: does the RPC client try another NN? Or does it just fail? Maybe @liuml07 could assist there. The method went in with Hadoop 2.0.3 alpha in HDFS-3507, so will be across the whole of the Hadoop 2.x line. The enum used did change in 2015, with HDFS-4015; adding |
|
Test build #3437 has finished for PR 15648 at commit
|
|
Merged to master/2.1 |
…n one namenode which can stuck the startup of SparkHistory server ## What changes were proposed in this pull request? Instead of using the setSafeMode method that check the first namenode used the one which permitts to check only for active NNs ## How was this patch tested? manual tests Please review https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark before opening a pull request. This commit is contributed by Criteo SA under the Apache v2 licence. Author: n.fraison <n.fraison@criteo.com> Closes #15648 from ashangit/SPARK-18119. (cherry picked from commit f42db0c) Signed-off-by: Sean Owen <sowen@cloudera.com>
|
Sorry for coming late. The change is very reasonable. Glad it's merged. Steve:
So in general,
By the way, the above comments make sense only if we're using the logical HDFS service name. |
…n one namenode which can stuck the startup of SparkHistory server ## What changes were proposed in this pull request? Instead of using the setSafeMode method that check the first namenode used the one which permitts to check only for active NNs ## How was this patch tested? manual tests Please review https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark before opening a pull request. This commit is contributed by Criteo SA under the Apache v2 licence. Author: n.fraison <n.fraison@criteo.com> Closes apache#15648 from ashangit/SPARK-18119.
What changes were proposed in this pull request?
Instead of using the setSafeMode method that checks only the first NameNode, use the overload that allows checking only the active NNs.
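To illustrate why the choice of NameNode matters, here is a minimal, self-contained sketch. It is a toy model, not the patch itself: `NameNode`, `SafeModeCheck`, and `SafeModeCheckDemo` are hypothetical names that do not exist in Hadoop or Spark. The `isChecked` flag only mimics the javadoc wording quoted in the review ("If true check only for Active NNs status, else check first NN's status"). The behaviour when no NameNode matches is an assumption made for the sketch.

```scala
// Toy model only: these types are hypothetical and are not Hadoop classes.
final case class NameNode(name: String, isActive: Boolean, inSafeMode: Boolean)

object SafeModeCheck {
  // Mimics the documented meaning of the boolean flag:
  //   isChecked = true  -> report the safe mode status of the active NameNode
  //   isChecked = false -> report the status of the first configured NameNode
  def inSafeMode(nameNodes: Seq[NameNode], isChecked: Boolean): Boolean = {
    val target =
      if (isChecked) nameNodes.find(_.isActive)
      else nameNodes.headOption
    // Assumption of this sketch: with no matching NameNode, report "not in safe mode".
    target.exists(_.inSafeMode)
  }
}

object SafeModeCheckDemo {
  // The first configured NameNode is a standby in safe mode; the active one is healthy.
  val cluster: Seq[NameNode] = Seq(
    NameNode("nn1", isActive = false, inSafeMode = true),
    NameNode("nn2", isActive = true, inSafeMode = false)
  )

  def main(args: Array[String]): Unit = {
    // Checking the first NameNode reports safe mode, so a caller polling it keeps waiting.
    println(SafeModeCheck.inSafeMode(cluster, isChecked = false))
    // Checking the active NameNode reports that the file system is usable.
    println(SafeModeCheck.inSafeMode(cluster, isChecked = true))
  }
}
```

In this model, a startup loop that polls with `isChecked = false` would keep waiting for as long as `nn1` stays in safe mode, even though `nn2` is serving requests. That is the situation the PR title describes.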
How was this patch tested?
manual tests
This commit is contributed by Criteo SA under the Apache v2 licence.