Conversation

@LiShuMing
Contributor

What changes were proposed in this pull request?

See SPARK-21660. This PR adds a simple strategy that validates the chosen disk is writable, to avoid choosing a read-only disk.

How was this patch tested?

How do we mock a corrupted disk?

Make the recovery path read-only:
sudo chmod -R 400 /var/log/hadoop-yarn/nodemanager/recovery-state/nm-aux-services/spark_shuffle

Before this PR, starting the NodeManager produced the exception below:

2017-08-10 16:30:08,112 INFO yarn.YarnShuffleService (YarnShuffleService.java:<init>(136)) - Initializing YARN shuffle service for Spark
2017-08-10 16:30:08,112 INFO containermanager.AuxServices (AuxServices.java:addService(72)) - Adding auxiliary service spark_shuffle, "spark_shuffle"
2017-08-10 16:30:08,218 ERROR util.LevelDBProvider (LevelDBProvider.java:initLevelDB(61)) - error opening leveldb file /var/log/hadoop-yarn/nodemanager/recovery-state/nm-aux-services/spark_shuffle/registeredExecutors.ldb. Creating new file, will not be able to recover state for existing applications
org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error: /var/log/hadoop-yarn/nodemanager/recovery-state/nm-aux-services/spark_shuffle/registeredExecutors.ldb/LOCK: Permission denied
at org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200)
at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218)
at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168)
at org.apache.spark.network.util.LevelDBProvider.initLevelDB(LevelDBProvider.java:48)
at org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.<init>(ExternalShuffleBlockResolver.java:116)
at org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.<init>(ExternalShuffleBlockResolver.java:94)
at org.apache.spark.network.shuffle.ExternalShuffleBlockHandler.<init>(ExternalShuffleBlockHandler.java:66)
at org.apache.spark.network.yarn.YarnShuffleService.serviceInit(YarnShuffleService.java:167)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceInit(AuxServices.java:143)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:245)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:261)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:495)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:543)
2017-08-10 16:30:08,220 WARN util.LevelDBProvider (LevelDBProvider.java:initLevelDB(71)) - error deleting /var/log/hadoop-yarn/nodemanager/recovery-state/nm-aux-services/spark_shuffle/registeredExecutors.ldb
2017-08-10 16:30:08,220 INFO service.AbstractService (AbstractService.java:noteFailure(272)) - Service spark_shuffle failed in state INITED; cause: java.io.IOException: Unable to create state store
java.io.IOException: Unable to create state store
at org.apache.spark.network.util.LevelDBProvider.initLevelDB(LevelDBProvider.java:77)
at org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.<init>(ExternalShuffleBlockResolver.java:116)
at org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.<init>(ExternalShuffleBlockResolver.java:94)
at org.apache.spark.network.shuffle.ExternalShuffleBlockHandler.<init>(ExternalShuffleBlockHandler.java:66)
at org.apache.spark.network.yarn.YarnShuffleService.serviceInit(YarnShuffleService.java:167)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceInit(AuxServices.java:143)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:245)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:261)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:495)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:543)
Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error: /var/log/hadoop-yarn/nodemanager/recovery-state/nm-aux-services/spark_shuffle/registeredExecutors.ldb/LOCK: Permission denied
at org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200)
at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218)
at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168)
at org.apache.spark.network.util.LevelDBProvider.initLevelDB(LevelDBProvider.java:75)
... 15 more

After this PR:

2017-08-10 16:36:49,101 INFO yarn.YarnShuffleService (YarnShuffleService.java:<init>(136)) - Initializing YARN shuffle service for Spark
2017-08-10 16:36:49,101 INFO containermanager.AuxServices (AuxServices.java:addService(72)) - Adding auxiliary service spark_shuffle, "spark_shuffle"
2017-08-10 16:36:49,102 INFO yarn.YarnShuffleService (YarnShuffleService.java:initRecoveryDb(359)) - Recovery path /var/log/hadoop-yarn/nodemanager/recovery-state/nm-aux-services/spark_shuffle ldb available: false.
2017-08-10 16:36:49,102 WARN yarn.YarnShuffleService (YarnShuffleService.java:initRecoveryDb(367)) - Recovery path /var/log/hadoop-yarn/nodemanager/recovery-state/nm-aux-services/spark_shuffle unavailable: set it to null
2017-08-10 16:36:49,180 INFO util.LevelDBProvider (LevelDBProvider.java:initLevelDB(51)) - Creating state database at /mnt/dfs/0/hadoop/yarn/local/registeredExecutors.ldb
2017-08-10 16:36:49,317 INFO util.LevelDBProvider$LevelDBLogger (LevelDBProvider.java:log(93)) - Delete type=3 #1
2017-08-10 16:36:49,548 INFO yarn.YarnShuffleService (YarnShuffleService.java:serviceInit(186)) - Started YARN shuffle service for Spark on port 7337. Authentication is not enabled. Registered executor file is /mnt/dfs/0/hadoop/yarn/local/registeredExecutors.ldb

@AmplabJenkins

Can one of the admins verify this patch?

@LiShuMing changed the title [SPARK-21660] [YARN] [Shuffle] Yarn ShuffleService failed to start when the chosen dir… [SPARK-21660][YARN][Shuffle] Yarn ShuffleService failed to start when the chosen dir… Aug 14, 2017
Contributor

@jerryshao left a comment

My thinking is that if work preserving is enabled (the recovery path is not null), then the user should guarantee the availability of this directory. I am not sure it is good to change to other directories (does YARN internally rely on it?).

Also, would you please add a unit test to verify your logic.


/**
 * Check whether the chosen DB file is available.
 */
Contributor

I'm not sure this is a thorough way to check disk health. In our internal case, we found that a disk was not mounted (due to a failure), and trying to write to the unmounted disk threw a permission-denied exception.

An unwritable disk is just one case of an unhealthy disk; maybe we should look at YARN's disk health check mechanism.
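
For illustration, a minimal write-probe sketch (not this PR's code; DiskProbe and isWritable are hypothetical names): since File.canWrite() can report stale results on a failed or unmounted disk, actually attempting to create a file is a more reliable signal, and it surfaces exactly the permission-denied case described above.

import java.io.File;
import java.io.IOException;

public class DiskProbe {
  // Treat a directory as usable only if we can actually create a file in it.
  // Permission-denied and I/O errors (e.g. from an unmounted disk) both
  // surface here as exceptions.
  static boolean isWritable(File dir) {
    try {
      File probe = File.createTempFile("probe", ".tmp", dir);
      probe.delete(); // clean up; successful creation is the signal we need
      return true;
    } catch (IOException | SecurityException e) {
      return false; // read-only, unmounted, or otherwise failed disk
    }
  }
}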

/**
 * Check whether the chosen DB file is available.
 */
protected Boolean checkFileAvailable(File file) {
Contributor

Use two-space indentation for the Java code.

}
}

// If the recovery path is unavailable, do not use it any more.
Contributor

I think the recovery path is set by the user or falls back to the YARN default; the user should make sure this directory is available, and YARN relies on it internally. It doesn't make sense to change to another disk if the recovery path is unavailable.

}
}
}

Contributor

If _recoveryPath is still null, I think we should throw an exception here, since none of the disks is good.
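
A minimal sketch of that suggestion (the exception type and message text are illustrative, not the PR's actual code):

// All candidate directories failed the availability check: fail fast rather
// than starting the shuffle service without a usable state store.
if (_recoveryPath == null) {
  throw new IOException("No usable disk found for the shuffle service recovery path");
}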

@LiShuMing
Contributor Author

@jerryshao Thanks for your replies! I will do the following:

  1. "it is good to change to other directories (is yarn internally relying on it)?"
    I think the recovery path(local variable) is only used in YarnShuffleService, principally not affects yarn environment. This PR cares the scene that we can find a better way to choose a useful disk for the recovery path when there are many disks that can choose.

  2. Check HDFS/YARN's disk health check mechanism to better define checkFileAvailable();

  3. Fix code format.

  4. Finally, throw an exception when _recoveryPath is still null.

@jerryshao
Contributor

@LiShuMing any update on this?

@LiShuMing
Contributor Author

Sorry, I have been busy recently; I will update it today...

@LiShuMing
Contributor Author

ping @jerryshao

I found a method in Hadoop to check a disk: https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/DiskChecker.java#L111
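
For reference, a minimal sketch of how that Hadoop utility could be applied here (RecoveryDirCheck and isUsable are hypothetical names):

import java.io.File;
import org.apache.hadoop.util.DiskChecker;
import org.apache.hadoop.util.DiskChecker.DiskErrorException;

public class RecoveryDirCheck {
  static boolean isUsable(File dir) {
    try {
      // checkDir creates the directory if missing and verifies it is a
      // directory that can be read, written, and listed.
      DiskChecker.checkDir(dir);
      return true;
    } catch (DiskErrorException e) {
      return false; // fall back to another candidate directory
    }
  }
}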

I added a unit test. Can you help me review my code?

@jerryshao
Contributor

jerryshao commented Aug 22, 2017

I have two questions about the fix:

  1. Is it a good idea to change the recovery path to another directory? The recovery path is configured by the user or figured out by YARN, so YARN may have assumptions about this path; if we change it to another one, will that introduce issues? Also, if the recovery path is not null, shouldn't the user guarantee its availability?
  2. What if the previously bad disk comes back to normal with orphan data? For example, dir1 fails with state V1, and based on this logic we choose another dir2 and the state changes to V2. Then, after a while, if dir1 comes back to normal, which dir are we choosing based on your current code?

CC @tgravescs to review.

@tgravescs
Contributor

The recovery path returned by YARN is supposed to be reliable, and if it isn't working then the NM itself shouldn't run. So in general you should just use it if you want Spark to be able to recover. If you don't have YARN recovery enabled, then there is no need for us to write the DBs at all, and I think we should change the code to not do that.

I think this JIRA is a duplicate of https://issues.apache.org/jira/browse/SPARK-17321

See my comments there.

@LiShuMing
Contributor Author

See #19032 for another approach to solving this problem; I will close this PR.

Thanks @jerryshao @tgravescs.

@LiShuMing closed this Aug 24, 2017