
Conversation

@Ngone51 (Member) commented Jun 6, 2020

What changes were proposed in this pull request?

This PR adds a check that the waiting application's set of allocated executors is empty before flagging it as a possibly hung application.

Why are the changes needed?

It's a bugfix. The warning means there are not enough resources for the application to launch even a single executor. But a job can still run successfully while this warning is shown, which means the application has in fact launched executors.
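For illustration, here is a minimal, self-contained sketch of the condition change (hypothetical simplified types, not the actual `Master.scala` code):

```
// Minimal sketch of the behavior change described above.
// WorkerInfo/AppInfo are hypothetical stand-ins, not Spark's real classes.
case class WorkerInfo(coresFree: Int)
case class AppInfo(id: String, executors: Map[Int, Int])

// Before the fix: warn whenever the single waiting app has no usable worker,
// even if it already launched executors in an earlier scheduling round.
def mayHangBefore(waitingApps: Seq[AppInfo], usableWorkers: Seq[WorkerInfo]): Boolean =
  waitingApps.length == 1 && usableWorkers.isEmpty

// After the fix: additionally require that the app has launched no executors.
def mayHangAfter(waitingApps: Seq[AppInfo], usableWorkers: Seq[WorkerInfo]): Boolean =
  waitingApps.length == 1 &&
    waitingApps.head.executors.isEmpty &&
    usableWorkers.isEmpty
```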

Does this PR introduce any user-facing change?

Yes. Before this PR, when using local cluster mode to start spark-shell, e.g. `./bin/spark-shell --master "local-cluster[2, 1, 1024]"`, the user would always see the warning:

```
20/06/06 22:21:02 WARN Utils: Your hostname, C02ZQ051LVDR resolves to a loopback address: 127.0.0.1; using 192.168.1.6 instead (on interface en0)
20/06/06 22:21:02 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
20/06/06 22:21:02 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
NOTE: SPARK_PREPEND_CLASSES is set, placing locally compiled Spark classes ahead of assembly.
NOTE: SPARK_PREPEND_CLASSES is set, placing locally compiled Spark classes ahead of assembly.
Spark context Web UI available at http://192.168.1.6:4040
Spark context available as 'sc' (master = local-cluster[2, 1, 1024], app id = app-20200606222107-0000).
Spark session available as 'spark'.
20/06/06 22:21:07 WARN Master: App app-20200606222107-0000 requires more resource than any of Workers could have.
20/06/06 22:21:07 WARN Master: App app-20200606222107-0000 requires more resource than any of Workers could have.
```

After this PR, the warning is gone.

How was this patch tested?

Tested manually.

@Ngone51 (Member, Author) commented Jun 6, 2020

ping @tgravescs @jiangxb1987 Please take a look, thanks!

@SparkQA commented Jun 6, 2020

Test build #123590 has finished for PR 28742 at commit 55d4730.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Review thread on the changed condition in `Master.scala` (two context lines, then the removed and added lines; the added line is truncated in the review view):

```
      .filter(canLaunchExecutor(_, app.desc))
      .sortBy(_.coresFree).reverse
-   if (waitingApps.length == 1 && usableWorkers.isEmpty) {
+   val appMayHang = waitingApps.length == 1 &&
```
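The added line is cut off above; going by the PR description, the full condition presumably reads along these lines (a hedged reconstruction, not copied from the commit):

```
val appMayHang = waitingApps.length == 1 &&
  waitingApps.head.executors.isEmpty && usableWorkers.isEmpty
if (appMayHang) {
  logWarning(s"App ${waitingApps.head.id} requires more resource than any of Workers could have.")
}
```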
Contributor: Why do you need to verify `waitingApps.length == 1`?

Member Author (@Ngone51): Hmm, I was also thinking about this question while fixing it. I think we can extend it to multiple apps; let me try it.

Contributor: I'm not sure how useful this warning message is; it only reflects the current status within one round of scheduling, and we can tolerate not launching a new executor on the currently available workers.

Member Author (@Ngone51): It originally comes from #25047 (comment). The warning is mainly used to detect a possible hang of an application.

Contributor: I don't like this check, but it seems to be the simplest solution for now. Let's keep it then.

Contributor: I think the idea of `== 1` is that there is only one active application; if there is more than one, other applications could be using resources, so you couldn't tell without a lot more checking. We could definitely make it a more comprehensive check in the future. I believe this was meant as a simple sanity check for people playing around who may have started the cluster without enough resources. Standalone mode does not tell you there aren't enough resources and will just hang forever.
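For what it's worth, the "more comprehensive check" floated above might look something like the following sketch (not part of this PR; `waitingApps`, `workers`, `canLaunchExecutor`, and `app.desc` are taken from the surrounding Master code):

```
// Hedged sketch only, not in this PR: flag every waiting app that has zero
// executors and for which no currently known worker could launch one.
val possiblyHungApps = waitingApps.filter { app =>
  app.executors.isEmpty && !workers.exists(canLaunchExecutor(_, app.desc))
}
possiblyHungApps.foreach { app =>
  logWarning(s"App ${app.id} requires more resource than any of Workers could have.")
}
```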

@jiangxb1987 (Contributor): LGTM

@jiangxb1987 (Contributor): retest this please

@SparkQA commented Jun 8, 2020

Test build #123643 has finished for PR 28742 at commit 55d4730.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun (Member) left a comment: +1, LGTM. Thank you all. Merged to master/3.0.

dongjoon-hyun pushed a commit that referenced this pull request Jun 9, 2020
… resource than any of Workers could have"


Closes #28742 from Ngone51/fix_warning.

Authored-by: yi.wu <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 38873d5)
Signed-off-by: Dongjoon Hyun <[email protected]>
@Ngone51 (Member, Author) commented Jun 10, 2020

thanks all!
