@tgravescs
Contributor

What changes were proposed in this pull request?

Add GPU and generic resource type allocation to the executors.

Note this is part of a bigger feature for GPU-aware scheduling and covers only how the executor finds its resources. The general flow:

  • Users ask for a certain set of resources, for instance a number of GPUs. Each cluster manager has a specific way to do this.
  • The cluster manager allocates a container or a set of resources (standalone mode).
  • When Spark launches the executor in that container, the executor either has to be told what resources it has or it has to auto-discover them.
  • The executor has to register with the driver and tell the driver the set of resources it has, so the scheduler can use that to schedule tasks that require a certain amount of each of those resources.

In this PR I added configs and arguments to the executor so that it can discover resources. The argument to the executor is intended for standalone mode or other cluster managers that don't have isolation, so that they can assign specific resources to specific executors when there are multiple executors on a node.

The discovery script is meant to be used in an isolated environment where the executor only sees the resources it should use.
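
The two paths described above (an explicit argument for non-isolated cluster managers, a discovery script for isolated environments) could be combined roughly as follows. This is only a sketch, not the PR's implementation: `resolve_resources` is a hypothetical helper, and letting the explicit argument take precedence over the discovery script is an assumption.

```shell
# Hypothetical helper (not from the PR): decide where the executor's
# resource description comes from.
#   $1: resource string passed explicitly by the cluster manager (may be empty)
#   $2: discovery command or script path (may be empty)
resolve_resources() {
  explicit=$1
  discovery=$2
  if [ -n "$explicit" ]; then
    # Non-isolated case: the cluster manager assigned specific addresses.
    printf '%s\n' "$explicit"
  elif [ -n "$discovery" ] && command -v "$discovery" >/dev/null 2>&1; then
    # Isolated case: the executor only sees its own devices, so ask the script.
    "$discovery"
  else
    echo "no resources given and no usable discovery script" >&2
    return 1
  fi
}
```

The function prints whatever resource description it resolves and returns non-zero when neither source is available, so a caller can fail fast before registering with the driver.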

Note that there will be follow-on PRs to add other parts, such as the scheduler. See the high-level epic JIRA: https://issues.apache.org/jira/browse/SPARK-24615

How was this patch tested?

Added unit tests and manually tested.


- case RegisterExecutor(executorId, executorRef, hostname, cores, logUrls, attributes) =>
+ case RegisterExecutor(executorId, executorRef, hostname, cores, logUrls,
+     attributes, resources) =>
Contributor Author

I realize this isn't used anywhere at this point. The follow-on JIRAs for the scheduler will use it; this seemed like a good point to split the functionality.

@tgravescs changed the title from "Gpu sched executor clean" to "[SPARK-27024] Executor interface for cluster managers to support GPU" Apr 17, 2019
@tgravescs changed the title from "[SPARK-27024] Executor interface for cluster managers to support GPU" to "[SPARK-27024] Executor interface for cluster managers to support GPU and other resources" Apr 17, 2019
@SparkQA

SparkQA commented Apr 17, 2019

Test build #104669 has finished for PR 24394 at commit 916991e.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

# The script will return a string in the format: count:unit:comma-separated list of the resource addresses
#

ADDRS=`nvidia-smi --query-gpu=index --format=csv,noheader | sed 'N;s/\n/,/'`
Contributor Author

I haven't referenced this script in any documentation yet. I think as part of this SPIP we should add some high-level descriptions of how it all flows: SPARK-27492
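
For reference, a discovery script emitting the `count:unit:addresses` format could look roughly like the sketch below. One detail worth noting: the `sed 'N;s/\n/,/'` expression in the quoted line joins lines two at a time, so with three or more GPUs the result would still contain a newline; the sketch uses `paste -s` instead, which joins every line. `format_resource` is a hypothetical name, and the unit is left as a caller-supplied placeholder because the thread does not say what value it should take.

```shell
# Sketch (hypothetical helper, not the PR's script): read one address per
# line on stdin and print count:unit:addr1,addr2,...
#   $1: unit string; a placeholder here, since the source does not define it
format_resource() {
  unit=$1
  # paste -s joins all input lines with the delimiter, however many there are.
  addrs=$(paste -sd, -)
  if [ -z "$addrs" ]; then
    count=0
  else
    # The number of comma-separated fields is the number of addresses.
    count=$(printf '%s\n' "$addrs" | awk -F, '{ print NF }')
  fi
  printf '%s:%s:%s\n' "$count" "$unit" "$addrs"
}

# With real hardware (requires nvidia-smi on the PATH):
#   nvidia-smi --query-gpu=index --format=csv,noheader | format_resource ""
```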

@SparkQA

SparkQA commented Apr 17, 2019

Test build #104674 has finished for PR 24394 at commit 6ff9953.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Apr 17, 2019

Test build #104678 has finished for PR 24394 at commit bee34a0.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tgravescs
Contributor Author

Looks like I need to build with Mesos.

| --executor-id <executorId>
| --hostname <hostname>
| --cores <cores>
| --resourceAddrs <rtype1=count:unit:addr1,addr2;rtype2=count:unit:r2addr1,r2addr2...>
Contributor Author

@tgravescs Apr 17, 2019

Thinking about this some more, I think I should make this JSON format so it's more extensible in the future and not as ugly on the command line.
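
To illustrate why the delimiter-based format is awkward, here is a rough sketch of what parsing the `--resourceAddrs` value involves: three nested separators (`;` between resource types, `=` and `:` within each entry, `,` between addresses). `parse_resource_addrs` is a hypothetical helper written for this illustration, not code from the PR, and it does no validation of malformed entries.

```shell
# Sketch (hypothetical helper): split
#   rtype1=count:unit:addr1,addr2;rtype2=count:unit:r2addr1,r2addr2
# into one line per resource type. Malformed entries are not detected.
parse_resource_addrs() {
  spec=$1
  while [ -n "$spec" ]; do
    # Take everything up to the first ';' as the current entry.
    entry=${spec%%";"*}
    case $spec in
      *";"*) spec=${spec#*";"} ;;
      *) spec= ;;
    esac
    [ -n "$entry" ] || continue
    rtype=${entry%%=*}
    rest=${entry#*=}
    # Peel off count and unit; whatever remains is the address list.
    count=${rest%%:*}
    rest=${rest#*:}
    unit=${rest%%:*}
    addrs=${rest#*:}
    printf 'type=%s count=%s unit=%s addrs=%s\n' "$rtype" "$count" "$unit" "$addrs"
  done
}
```

Every new field would need another separator and another peeling step, which is the extensibility concern a structured format such as JSON avoids.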

@tgravescs closed this Apr 17, 2019