Mkim/1.3.1 palantir3 #14

mingyukim · 2015-07-10T16:35:43Z

No description provided.

…Dirs for automatic deletion.

As documented in createDirectory, the result of createDirectory is not registered for automatic removal. Currently there are 4 directories left in `/tmp` after just running `pyspark`.

Author: Milan Straka <[email protected]>

Closes apache#4759 from foxik/remove-tmp-dirs and squashes the following commits:

280450d [Milan Straka] Use createTempDir in getOrCreateLocalRootDirs...
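The distinction this commit message draws, between a directory that is merely created and one that is also registered for removal at exit, can be sketched in Python. This is an analogy only: Spark's own helpers are the Scala `createDirectory` and `createTempDir` named above, and the function names below are hypothetical.

```python
import atexit
import os
import shutil
import tempfile


def create_directory(root, prefix="spark-"):
    """Create a uniquely named directory under root; the caller must clean it up."""
    return tempfile.mkdtemp(prefix=prefix, dir=root)


def create_temp_dir(root=None, prefix="spark-"):
    """Create a directory and register it for removal when the interpreter exits."""
    path = create_directory(root or tempfile.gettempdir(), prefix)
    # ignore_errors=True keeps shutdown quiet if the directory is already gone.
    atexit.register(shutil.rmtree, path, ignore_errors=True)
    return path
```

A caller that uses only `create_directory` leaves the directory behind after exit, which is the kind of leftover the message reports in `/tmp`.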

…sing Spark job to fail

Added a check to handle container exit status for the preemption scenario, log an INFO message in such cases and move on. andrewor14

Author: Ashwin Shankar <[email protected]>

Closes apache#5993 from ashwinshankar77/SPARK-7451 and squashes the following commits:

90900cf [Ashwin Shankar] Fix log info message
cf8b6cf [Ashwin Shankar] Stop counting preemption of executors as failure
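A minimal sketch of the check described above, written in Python for illustration rather than in Spark's Scala. It assumes the preemption exit code is -102, the value of YARN's `ContainerExitStatus.PREEMPTED`; the function and logger names are hypothetical.

```python
import logging

logger = logging.getLogger("allocator-sketch")

# Assumed to mirror YARN's ContainerExitStatus.PREEMPTED.
PREEMPTED = -102


def count_executor_failures(exit_statuses):
    """Count non-zero container exits, skipping containers that were preempted."""
    failures = 0
    for status in exit_statuses:
        if status == PREEMPTED:
            # Preemption is the scheduler reclaiming resources, not a job error.
            logger.info("Container preempted (exit status %d); not counted as a failure", status)
            continue
        if status != 0:
            failures += 1
    return failures
```

With this rule, a job that loses several executors to preemption does not move closer to a failure threshold on that account alone.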

… negative number of executors

Author: Sandy Ryza <sandycloudera.com>

Closes apache#5704 from sryza/sandy-spark-6954 and squashes the following commits:

b7890fb [Sandy Ryza] Avoid ramping up to an existing number of executors
6eb516a [Sandy Ryza] SPARK-6954. ExecutorAllocationManager can end up requesting a negative number of executors

Author: Sandy Ryza <[email protected]>

Closes apache#5856 from sryza/sandy-backport-6954 and squashes the following commits:

1cb517a [Sandy Ryza] [SPARK-6954] [YARN] ExecutorAllocationManager can end up requesting a negative n...

mingyukim · 2015-07-10T16:36:13Z

@mccheah @punya for SA.

mingyukim · 2015-09-18T23:41:36Z

We ended up not doing this. Closing.

…gle batch

## What changes were proposed in this pull request?

This PR supports multiple Python UDFs within a single batch and also improves performance.

```python
>>> from pyspark.sql.types import IntegerType
>>> sqlContext.registerFunction("double", lambda x: x * 2, IntegerType())
>>> sqlContext.registerFunction("add", lambda x, y: x + y, IntegerType())
>>> sqlContext.sql("SELECT double(add(1, 2)), add(double(2), 1)").explain(True)
== Parsed Logical Plan ==
'Project [unresolvedalias('double('add(1, 2)), None),unresolvedalias('add('double(2), 1), None)]
+- OneRowRelation$

== Analyzed Logical Plan ==
double(add(1, 2)): int, add(double(2), 1): int
Project [double(add(1, 2))palantir#14,add(double(2), 1)palantir#15]
+- Project [double(add(1, 2))palantir#14,add(double(2), 1)palantir#15]
   +- Project [pythonUDF0#16 AS double(add(1, 2))palantir#14,pythonUDF0#18 AS add(double(2), 1)palantir#15]
      +- EvaluatePython [add(pythonUDF1#17, 1)], [pythonUDF0#18]
         +- EvaluatePython [double(add(1, 2)),double(2)], [pythonUDF0#16,pythonUDF1#17]
            +- OneRowRelation$

== Optimized Logical Plan ==
Project [pythonUDF0#16 AS double(add(1, 2))palantir#14,pythonUDF0#18 AS add(double(2), 1)palantir#15]
+- EvaluatePython [add(pythonUDF1#17, 1)], [pythonUDF0#18]
   +- EvaluatePython [double(add(1, 2)),double(2)], [pythonUDF0#16,pythonUDF1#17]
      +- OneRowRelation$

== Physical Plan ==
WholeStageCodegen
:  +- Project [pythonUDF0#16 AS double(add(1, 2))palantir#14,pythonUDF0#18 AS add(double(2), 1)palantir#15]
:     +- INPUT
+- !BatchPythonEvaluation [add(pythonUDF1#17, 1)], [pythonUDF0#16,pythonUDF1#17,pythonUDF0#18]
   +- !BatchPythonEvaluation [double(add(1, 2)),double(2)], [pythonUDF0#16,pythonUDF1#17]
      +- Scan OneRowRelation[]
```

## How was this patch tested?

Added new tests.

The following script was used to benchmark 1, 2 and 3 UDFs:

```
from pyspark.sql import functions as F
from pyspark.sql.types import LongType

df = sqlContext.range(1, 1 << 23, 1, 4)
double = F.udf(lambda x: x * 2, LongType())
# Run 1, 2 and 3 UDF calls per row.
print(df.select(double(df.id)).count())
print(df.select(double(df.id), double(df.id + 1)).count())
print(df.select(double(df.id), double(df.id + 1), double(df.id + 2)).count())
```

Here are the results:

N | Before | After | Speed-up
---- | ------ | ----- | --------
1 | 22 s | 7 s | 3.1X
2 | 38 s | 13 s | 2.9X
3 | 58 s | 16 s | 3.6X

This benchmark ran locally with 4 CPUs. For 3 UDFs, it launched 12 Python processes before this patch and 4 processes after it. After this patch, multiple UDFs also use less memory than before (less buffering).

Author: Davies Liu <[email protected]>

Closes apache#12057 from davies/multi_udfs.
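The speed-up column can be checked directly from the reported timings. The short sketch below recomputes it; the process counts at the end rest on an assumption not stated in the commit message, namely one Python worker per UDF per partition before the patch and one per partition after it, with the 4 partitions the benchmark script requests.

```python
# Timings in seconds copied from the benchmark table: N UDFs -> (before, after).
timings = {1: (22, 7), 2: (38, 13), 3: (58, 16)}

# Speed-up is the "before" time divided by the "after" time.
speedups = {n: before / after for n, (before, after) in timings.items()}

for n, ratio in speedups.items():
    print(f"{n} UDF(s): {ratio:.1f}X")

# Assumption: one Python worker per UDF per partition before the patch,
# and one worker per partition after it.
partitions, udfs = 4, 3
workers_before = partitions * udfs
workers_after = partitions
```

Under that assumption the counts come out to 12 and 4, matching the numbers quoted for the 3-UDF case.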

* Added service name as prefix to executor pods to be able to tell them apart from kubectl output
* Addressed comments
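As an illustration of the naming idea, the sketch below builds an executor pod name from a service name. The `-exec-<id>` suffix and the function name are hypothetical and are not taken from the commit; the only outside fact relied on is that Kubernetes pod names must be lowercase DNS subdomain names of at most 253 characters.

```python
import re

# Kubernetes object names are limited to 253 characters (DNS subdomain rule).
MAX_NAME_LENGTH = 253


def executor_pod_name(service_name, executor_id):
    """Build a hypothetical executor pod name prefixed with the service name."""
    # Lowercase and replace characters outside [a-z0-9-] so the prefix is a valid name part.
    prefix = re.sub(r"[^a-z0-9-]", "-", service_name.lower()).strip("-")
    suffix = f"-exec-{executor_id}"
    # Truncate the prefix, never the suffix, so executor ids stay distinguishable.
    return prefix[: MAX_NAME_LENGTH - len(suffix)].rstrip("-") + suffix
```

With a shared prefix like this, executors belonging to different services can be told apart by name when pods are listed.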

foxik and others added 4 commits July 10, 2015 09:27

Preparing 1.3.1-palantir3 release

da15a61

mingyukim closed this Sep 18, 2015

robert3005 deleted the mkim/1.3.1-palantir3 branch September 24, 2016 04:09

ash211 pushed a commit that referenced this pull request Feb 16, 2017

Added service name as prefix to executor pods (#14)

a24fe11

* Added service name as prefix to executor pods to be able to tell them apart from kubectl output
* Addressed comments

mccheah pushed a commit that referenced this pull request Apr 27, 2017

Added service name as prefix to executor pods (#14)

761b317

* Added service name as prefix to executor pods to be able to tell them apart from kubectl output
* Addressed comments

4 participants
