
Conversation

@windkit (Contributor) commented Oct 17, 2017

What changes were proposed in this pull request?

To limit the resources a Spark job accepts from Mesos, we currently have only spark.cores.max, which caps CPU cores. With large-memory executors, however, a single job can still consume all of the cluster's memory.

This PR adds a spark.mem.max option for Mesos.
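For intuition, here is a minimal self-contained sketch of the intended behavior; all names (`MemCapSketch`, `maxMemMb`, `totalMemAcquiredMb`) are made up for illustration and are not the actual patch:

```scala
// Sketch: cap the total memory acquired across the cluster the same way
// spark.cores.max caps total cores. Names are illustrative only.
object MemCapSketch {
  val maxMemMb = 16384L          // stand-in for a parsed spark.mem.max (16 GB)
  var totalMemAcquiredMb = 0L    // running total across launched executors

  // Accept an offer only if the whole executor still fits under the cap.
  def canLaunch(executorMemMb: Long): Boolean =
    totalMemAcquiredMb + executorMemMb <= maxMemMb

  def launch(executorMemMb: Long): Unit =
    if (canLaunch(executorMemMb)) totalMemAcquiredMb += executorMemMb
}
```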

How was this patch tested?

Added a unit test.

Review comment (Contributor):

remove space

@skonto (Contributor) commented Oct 18, 2017

@windkit there is an open issue here: https://issues.apache.org/jira/browse/SPARK-22133
Could you please add documentation for the new property along with the old ones?

Review comment from @skonto (Contributor), Oct 18, 2017:

Can we defend against minimum values and fail fast? For example, the default executor memory is 1.4 GB. We could calculate the value returned by MesosSchedulerUtils.executorMemory up front; I don't think the values calculated in canLaunchTask ever change. The same applies to cpus.

Review comment (Contributor Author):

@skonto
For cpus, I think we can compare against minCoresPerExecutor.
For mem, we can call MesosSchedulerUtils.executorMemory to get the minimum requirement.

Then here we parse the option, check it against the minimum, and throw an exception if it is too small? A sketch follows below.
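Something along these lines, perhaps; the minimum would really come from MesosSchedulerUtils.executorMemory(sc), while the method name, parameters, and exception type below are all illustrative:

```scala
// Illustrative fail-fast validation for spark.mem.max. In real code the
// minimum would come from MesosSchedulerUtils.executorMemory(sc), which adds
// the Mesos memory overhead on top of spark.executor.memory.
def validateMemMax(memMaxMb: Long, minExecutorMemMb: Long): Unit = {
  require(
    memMaxMb >= minExecutorMemMb,
    s"spark.mem.max ($memMaxMb MB) must be at least the minimum executor " +
    s"memory requirement ($minExecutorMemMb MB), or no executor can ever launch.")
}
```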

Review comment from @skonto (Contributor), Oct 20, 2017:

An exception would be OK, I think; the idea is that if something is never going to work, we let the user know, especially the novice user. In general, a violated minimum would be a warning if we don't want an exception thrown.

Review comment (Contributor Author):

Should I add the check in this PR or a separate one?

Review comment:

Do we need to have a check similar to [snippet not shown] but for memory, so that we know we'll "land" on the maximum?
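For context, a sketch of the contrast being discussed; all names are made up, and this is not the actual backend code:

```scala
// Cores taken from an offer can be scaled down so the running total lands
// exactly on the cap (illustrative names only):
def coresToTake(offeredCores: Int, maxCores: Int, acquiredCores: Int): Int =
  math.min(offeredCores, maxCores - acquiredCores)

// Executor memory is fixed per executor, so a memory check would be
// all-or-nothing: launch only if the whole executor fits under the cap.
def memFits(executorMemMb: Long, maxMemMb: Long, acquiredMemMb: Long): Boolean =
  acquiredMemMb + executorMemMb <= maxMemMb
```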

@windkit (Contributor, Author) commented Oct 20, 2017

@skonto Sure, I can add those in. Can you point me to where the documentation source is?

@skonto (Contributor) commented Oct 20, 2017

@susanxhuynh please review.

@susanxhuynh (Contributor) commented:

@windkit Trying to understand the need for this config ... could you accomplish the same thing by setting spark.cores.max, spark.executor.cores, and spark.executor.memory? Could you give an example scenario where this is needed?

Review comment (Contributor):

Typo: 'park'

Review comment (Contributor):

Typo: 'park'

Review comment (Contributor):

Typo: 'park'

Review comment (Contributor):

Maybe add "across the cluster (not from each machine)". And add something about there being no maximum if this property is not set.

Review comment (Contributor):

My suggestion: "Duration for which unused resources are considered declined, when the maximum number of cores (spark.cores.max) has been reached."
@ArtRand Is this the documentation you had in mind in https://issues.apache.org/jira/browse/SPARK-22133? Is this enough information for a non-Mesos expert to set this?

@windkit (Contributor, Author) commented Nov 1, 2017

@susanxhuynh Thanks for reviewing.
I want to use both spark.mem.max and spark.cores.max to limit the resources one job can use within the cluster.
I am setting up a shared cluster for several users who are allowed to configure spark.executor.cores and spark.executor.memory according to their needs, so I need a limit on both CPU cores and memory; a sketch of such a setup follows below.
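For illustration, a hypothetical per-user configuration on such a shared cluster; the values are made up, and the byte-string format for spark.mem.max is assumed to match spark.executor.memory:

```scala
import org.apache.spark.SparkConf

// Hypothetical shared-cluster job configuration. spark.cores.max and the
// proposed spark.mem.max cap the job's total footprint across the cluster;
// users still choose their per-executor shape within those caps.
val conf = new SparkConf()
  .set("spark.cores.max", "40")        // at most 40 cores in total
  .set("spark.mem.max", "160g")        // at most 160 GB in total (this PR)
  .set("spark.executor.cores", "4")    // user-chosen executor shape
  .set("spark.executor.memory", "16g")
```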

asfgit pushed a commit that referenced this pull request Nov 8, 2017
## What changes were proposed in this pull request?
Documentation about Mesos Reject Offer Configurations

## Related PR
#19510 for `spark.mem.max`

Author: Li, YanKit | Wilson | RIT <[email protected]>

Closes #19555 from windkit/spark_22133.
@skonto (Contributor) commented May 19, 2018

@windkit Could you update the PR? @susanxhuynh I think this is fine to merge, are you ok?

@AmplabJenkins commented:
Can one of the admins verify this patch?

@HyukjinKwon (Member) commented:

gentle ping @windkit

asfgit closed this in 1a4fda8 on Jul 19, 2018
zifeif2 pushed a commit to zifeif2/spark that referenced this pull request Nov 22, 2025
Closes apache#17422
Closes apache#17619
Closes apache#18034
Closes apache#18229
Closes apache#18268
Closes apache#17973
Closes apache#18125
Closes apache#18918
Closes apache#19274
Closes apache#19456
Closes apache#19510
Closes apache#19420
Closes apache#20090
Closes apache#20177
Closes apache#20304
Closes apache#20319
Closes apache#20543
Closes apache#20437
Closes apache#21261
Closes apache#21726
Closes apache#14653
Closes apache#13143
Closes apache#17894
Closes apache#19758
Closes apache#12951
Closes apache#17092
Closes apache#21240
Closes apache#16910
Closes apache#12904
Closes apache#21731
Closes apache#21095

Added:
Closes apache#19233
Closes apache#20100
Closes apache#21453
Closes apache#21455
Closes apache#18477

Added:
Closes apache#21812
Closes apache#21787

Author: hyukjinkwon <[email protected]>

Closes apache#21781 from HyukjinKwon/closing-prs.