[SPARK-15917][CORE] Added support for number of executors in Standalone [WIP] #15405
Conversation
…can't be satisfied
add to whitelist
```scala
val numExecutorsLaunched = app.executors.size
// Check to see if we managed to launch the requested number of executors
if(numUsable != 0 && numExecutorsLaunched != app.executorLimit &&
    numExecutorsScheduled != app.executorLimit) {
```
How are `numExecutorsLaunched` and `numExecutorsScheduled` related to each other? Also, here we probably want an inequality check just in case.
Also, style: there needs to be a space after `if`.
Another thing is, how noisy is this? Do we log this if dynamic allocation is turned on (we shouldn't)?
`numExecutorsLaunched` corresponds to the actual number of executors that have been launched so far (literally those registered in the executors list of the `ApplicationInfo`), whereas `numExecutorsScheduled` corresponds to the number of executors that have been scheduled/allocated by `scheduleExecutorsOnWorkers`. The distinction is needed because `scheduleExecutorsOnWorkers` is called multiple times while the executors are being set up; without this condition we would repeatedly log the same message with incorrect information (such as "0 executors launched" even though the executors had already been launched).
Tell me if that doesn't make sense; I went through a lot of trial and error before arriving at this condition.
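For illustration, here is a minimal, self-contained sketch of the check being discussed (not the actual `Master.scala` code), written with the inequality form suggested in the review; all names are illustrative stand-ins for the fields in the diff:

```scala
// Sketch of the "couldn't launch the requested executors" warning condition.
object ExecutorWarningCheck {
  def shouldWarn(
      numUsableWorkers: Int,      // workers that still have usable resources
      numExecutorsLaunched: Int,  // executors already registered in ApplicationInfo
      numExecutorsScheduled: Int, // executors assigned by the current scheduling pass
      executorLimit: Int): Boolean = {
    // Warn only when there are usable workers, yet neither the executors already
    // launched nor the ones scheduled in this pass reach the requested limit.
    // Checking both counts avoids re-logging "0 executors launched" on later
    // scheduling passes, after the executors were in fact launched earlier.
    numUsableWorkers != 0 &&
      numExecutorsLaunched < executorLimit &&
      numExecutorsScheduled < executorLimit
  }

  def main(args: Array[String]): Unit = {
    // First scheduling pass: nothing launched yet and only 2 of 4 scheduled -> warn.
    println(shouldWarn(numUsableWorkers = 3, numExecutorsLaunched = 0,
      numExecutorsScheduled = 2, executorLimit = 4)) // true
    // Later pass: all 4 executors already launched -> no warning.
    println(shouldWarn(numUsableWorkers = 3, numExecutorsLaunched = 4,
      numExecutorsScheduled = 0, executorLimit = 4)) // false
  }
}
```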
Regarding the noise produced, it should be quite minimal. When it's not possible to launch the requested number of executors, just one warning is logged.
With dynamic allocation on, a message is logged when an initial number of executors is specified and can't be satisfied. I don't think that's much of a problem, as there isn't any warning for this currently, but I can add a check to suppress the warning when dynamic allocation is enabled if you prefer.
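If we do add that guard, a hypothetical sketch of it could look like the following (this is not part of the current patch; `dynamicAllocationEnabled` stands for the value of `spark.dynamicAllocation.enabled`, and the message text is only an example):

```scala
// Sketch of suppressing the under-provisioning warning under dynamic allocation.
object UnderProvisionWarning {
  def warningFor(
      dynamicAllocationEnabled: Boolean,
      launched: Int,
      requested: Int): Option[String] = {
    if (!dynamicAllocationEnabled && launched < requested) {
      Some(s"Only $launched executor(s) could be launched out of the $requested requested.")
    } else {
      None // suppressed under dynamic allocation, or when the request was satisfied
    }
  }

  def main(args: Array[String]): Unit = {
    println(warningFor(dynamicAllocationEnabled = false, launched = 2, requested = 4)) // Some(...)
    println(warningFor(dynamicAllocationEnabled = true, launched = 2, requested = 4))  // None
  }
}
```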
Thanks for working on this. It's great to see how small the patch turned out to be!
Test build #66675 has finished for PR 15405 at commit
Test build #66681 has finished for PR 15405 at commit
Test build #3323 has finished for PR 15405 at commit
Are you still working on this? @JonathanTaws
Hi Jiang,
I've put this on hold as I wasn't getting updates from the admins on the next steps for this. I'd definitely like to move forward with this and contribute it to the codebase, as I believe it's still relevant nowadays.
Let me know!
On 13 Jun 2017 at 04:54, "Jiang Xingbo" <[email protected]> wrote:
Are you still working on this? @JonathanTaws
I see this is WIP; when do you think it will be ready for review? Thanks!
My bad, I should have removed it. I'll check that it's working as expected this weekend and we can move forward on it!
On 14 Jun 2017 at 03:33, "Jiang Xingbo" <[email protected]> wrote:
I see this is WIP, when do you think it will be ready for review? Thanks!
ping @JonathanTaws Please let me know once this PR is ready for review, thanks!
@jiang Quite busy at the moment; I will take care of it as soon as possible. I'll ping you once it's done.
On 25 Jun 2017 at 16:33, "Jiang Xingbo" <[email protected]> wrote:
ping @JonathanTaws Please let me know once this PR is ready for review, thanks!
## What changes were proposed in this pull request?

This PR proposes to close stale PRs, mostly the same instances with apache#18017.

Closes apache#14085 - [SPARK-16408][SQL] SparkSQL Added file get Exception: is a directory …
Closes apache#14239 - [SPARK-16593] [CORE] [WIP] Provide a pre-fetch mechanism to accelerate shuffle stage.
Closes apache#14567 - [SPARK-16992][PYSPARK] Python Pep8 formatting and import reorganisation
Closes apache#14579 - [SPARK-16921][PYSPARK] RDD/DataFrame persist()/cache() should return Python context managers
Closes apache#14601 - [SPARK-13979][Core] Killed executor is re spawned without AWS key…
Closes apache#14830 - [SPARK-16992][PYSPARK][DOCS] import sort and autopep8 on Pyspark examples
Closes apache#14963 - [SPARK-16992][PYSPARK] Virtualenv for Pylint and pep8 in lint-python
Closes apache#15227 - [SPARK-17655][SQL]Remove unused variables declarations and definations in a WholeStageCodeGened stage
Closes apache#15240 - [SPARK-17556] [CORE] [SQL] Executor side broadcast for broadcast joins
Closes apache#15405 - [SPARK-15917][CORE] Added support for number of executors in Standalone [WIP]
Closes apache#16099 - [SPARK-18665][SQL] set statement state to "ERROR" after user cancel job
Closes apache#16445 - [SPARK-19043][SQL]Make SparkSQLSessionManager more configurable
Closes apache#16618 - [SPARK-14409][ML][WIP] Add RankingEvaluator
Closes apache#16766 - [SPARK-19426][SQL] Custom coalesce for Dataset
Closes apache#16832 - [SPARK-19490][SQL] ignore case sensitivity when filtering hive partition columns
Closes apache#17052 - [SPARK-19690][SS] Join a streaming DataFrame with a batch DataFrame which has an aggregation may not work
Closes apache#17267 - [SPARK-19926][PYSPARK] Make pyspark exception more user-friendly
Closes apache#17371 - [SPARK-19903][PYSPARK][SS] window operator miss the `watermark` metadata of time column
Closes apache#17401 - [SPARK-18364][YARN] Expose metrics for YarnShuffleService
Closes apache#17519 - [SPARK-15352][Doc] follow-up: add configuration docs for topology-aware block replication
Closes apache#17530 - [SPARK-5158] Access kerberized HDFS from Spark standalone
Closes apache#17854 - [SPARK-20564][Deploy] Reduce massive executor failures when executor count is large (>2000)
Closes apache#17979 - [SPARK-19320][MESOS][WIP]allow specifying a hard limit on number of gpus required in each spark executor when running on mesos
Closes apache#18127 - [SPARK-6628][SQL][Branch-2.1] Fix ClassCastException when executing sql statement 'insert into' on hbase table
Closes apache#18236 - [SPARK-21015] Check field name is not null and empty in GenericRowWit…
Closes apache#18269 - [SPARK-21056][SQL] Use at most one spark job to list files in InMemoryFileIndex
Closes apache#18328 - [SPARK-21121][SQL] Support changing storage level via the spark.sql.inMemoryColumnarStorage.level variable
Closes apache#18354 - [SPARK-18016][SQL][CATALYST][BRANCH-2.1] Code Generation: Constant Pool Limit - Class Splitting
Closes apache#18383 - [SPARK-21167][SS] Set kafka clientId while fetch messages
Closes apache#18414 - [SPARK-21169] [core] Make sure to update application status to RUNNING if executors are accepted and RUNNING after recovery
Closes apache#18432 - resolve com.esotericsoftware.kryo.KryoException
Closes apache#18490 - [SPARK-21269][Core][WIP] Fix FetchFailedException when enable maxReqSizeShuffleToMem and KryoSerializer
Closes apache#18585 - SPARK-21359
Closes apache#18609 - Spark SQL merge small files to big files Update InsertIntoHiveTable.scala

Added: Closes apache#18308 - [SPARK-21099][Spark Core] INFO Log Message Using Incorrect Executor I…
Closes apache#18599 - [SPARK-21372] spark writes one log file even I set the number of spark_rotate_log to 0
Closes apache#18619 - [SPARK-21397][BUILD]Maven shade plugin adding dependency-reduced-pom.xml to …
Closes apache#18667 - Fix the simpleString used in error messages
Closes apache#18782 - Branch 2.1

Added: Closes apache#17694 - [SPARK-12717][PYSPARK] Resolving race condition with pyspark broadcasts when using multiple threads

Added: Closes apache#16456 - [SPARK-18994] clean up the local directories for application in future by annother thread
Closes apache#18683 - [SPARK-21474][CORE] Make number of parallel fetches from a reducer configurable
Closes apache#18690 - [SPARK-21334][CORE] Add metrics reporting service to External Shuffle Server

Added: Closes apache#18827 - Merge pull request 1 from apache/master

## How was this patch tested?

N/A

Author: hyukjinkwon <[email protected]>

Closes apache#18780 from HyukjinKwon/close-prs.
## What changes were proposed in this pull request?

Currently, in standalone mode it is not possible to set the number of executors by using the `--num-executors` or `spark.executor.instances` property. Instead, as many executors as possible are spawned based on the available resources and the properties set. This patch corrects that to support the number-of-executors property.

Here's the new behavior:

- If the `executor.cores` property isn't set, we try to spawn one executor on each worker, taking all of the cores available (the default behavior), while the number of workers is less than the number of executors requested. If we can't launch the specified number of executors, a warning is logged.
- If the `executor.cores` property is set (the same logic applies to `executor.memory`):
  - if `executor.instances * executor.cores <= cores.max`, then `executor.instances` executors are spawned;
  - if `executor.instances * executor.cores > cores.max`, then as many executors as possible are spawned (basically the previous behavior when only `executor.cores` was set), but we also log a warning saying we couldn't spawn the requested number of executors.

In the case where `executor.memory` is set, all constraints are taken into account based on the number of cores and the memory assigned per worker (same logic as with the cores).
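To make the rules above concrete, here is a minimal, self-contained Scala sketch of the allocation decision (not the actual `Master` scheduling code); `executorInstances`, `coresPerExecutor`, `maxCores` and `numWorkers` are illustrative stand-ins for `spark.executor.instances`, `spark.executor.cores`, `spark.cores.max` and the number of usable workers:

```scala
// Sketch of how many executors can be granted versus how many were requested.
object NumExecutorsPlan {
  /** Returns (executors that can be granted, whether the request is fully satisfied). */
  def plan(
      executorInstances: Int,
      coresPerExecutor: Option[Int],
      maxCores: Int,
      numWorkers: Int): (Int, Boolean) = {
    coresPerExecutor match {
      case None =>
        // Without executor.cores, each executor takes a whole worker, so at most
        // one executor per worker can be launched.
        val granted = math.min(executorInstances, numWorkers)
        (granted, granted == executorInstances)
      case Some(cores) =>
        // With executor.cores set, the cap is how many executors fit within cores.max.
        val fitByCores = maxCores / cores
        val granted = math.min(executorInstances, fitByCores)
        (granted, granted == executorInstances)
    }
  }

  def main(args: Array[String]): Unit = {
    // 4 executors of 2 cores requested but only 6 cores allowed: 3 fit, a warning is logged.
    println(plan(executorInstances = 4, coresPerExecutor = Some(2), maxCores = 6, numWorkers = 5)) // (3,false)
    // No executor.cores: one executor per worker, so 4 workers cover the 4 requested.
    println(plan(executorInstances = 4, coresPerExecutor = None, maxCores = 16, numWorkers = 4))   // (4,true)
  }
}
```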
## How was this patch tested?

I tested this patch by running a simple Spark app in standalone mode, specifying the `--num-executors` or `spark.executor.instances` property and checking that the number of executors launched was coherent with the available resources and the requested number of executors.

I plan to test this patch further by adding tests in `MasterSuite` and running the usual `./dev/run-tests`.
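For reference, a minimal sketch of the kind of app used for that manual check, assuming the patch is applied and a standalone master is reachable at the placeholder URL `spark://master:7077`; the requested values (4 executors, 2 cores each, 8 max cores) are arbitrary:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object NumExecutorsManualTest {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("num-executors-standalone-test")
      .setMaster("spark://master:7077")            // placeholder standalone master URL
      .set("spark.executor.instances", "4")        // same effect as --num-executors 4
      .set("spark.executor.cores", "2")
      .set("spark.cores.max", "8")
    val sc = new SparkContext(conf)
    // Run a trivial job so executors actually register, then check the Master UI
    // or logs for how many executors were granted and whether a warning was logged.
    println(sc.parallelize(1 to 1000, 8).sum())
    sc.stop()
  }
}
```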