

@mingyukim

No description provided.

foxik and others added 4 commits July 10, 2015 09:27
…Dirs for automatic deletion.

As documented in createDirectory, the result of createDirectory is not registered for automatic removal. Currently there are 4 directories left in `/tmp` after just running `pyspark`.

Author: Milan Straka <[email protected]>

Closes apache#4759 from foxik/remove-tmp-dirs and squashes the following commits:

280450d [Milan Straka] Use createTempDir in getOrCreateLocalRootDirs...
…sing Spark job to fail

Added a check that handles the container exit status in the preemption scenario: in such cases it logs an INFO message and moves on.
andrewor14
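
A minimal sketch of the idea, in Python rather than the Scala of the actual patch. The constant `PREEMPTED = -102` is assumed to match YARN's `ContainerExitStatus.PREEMPTED`, and `count_executor_failures` is a hypothetical helper, not a Spark function.

```python
# Assumed to match YARN's ContainerExitStatus.PREEMPTED.
PREEMPTED = -102
SUCCESS = 0


def count_executor_failures(exit_statuses):
    """Count failed containers, skipping successes and preempted containers."""
    failures = 0
    for status in exit_statuses:
        if status == SUCCESS:
            continue
        if status == PREEMPTED:
            # Preemption is the scheduler's decision, not an application error.
            print("INFO: container preempted by the scheduler; not counted as a failure")
            continue
        failures += 1
    return failures
```

With this rule, a job whose executors are repeatedly preempted does not reach the failure threshold just because the scheduler reclaimed its containers.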

Author: Ashwin Shankar <[email protected]>

Closes apache#5993 from ashwinshankar77/SPARK-7451 and squashes the following commits:

90900cf [Ashwin Shankar] Fix log info message
cf8b6cf [Ashwin Shankar] Stop counting preemption of executors as failure
…negative number of executors

Author: Sandy Ryza <sandy@cloudera.com>

Closes apache#5704 from sryza/sandy-spark-6954 and squashes the following commits:

b7890fb [Sandy Ryza] Avoid ramping up to an existing number of executors
6eb516a [Sandy Ryza] SPARK-6954. ExecutorAllocationManager can end up requesting a negative number of executors
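
The commit titles suggest a guard of roughly the following shape. This is a hedged illustration only: `executors_to_request` and its parameters are hypothetical and do not reproduce the logic of `ExecutorAllocationManager`.

```python
def executors_to_request(target, existing, min_executors, max_executors):
    """Hypothetical helper: how many additional executors to ask for.

    The target is first bounded to [min_executors, max_executors]; the delta
    is then floored at zero so the request can never be negative.
    """
    bounded_target = max(min_executors, min(target, max_executors))
    return max(bounded_target - existing, 0)
```

For example, `executors_to_request(5, 8, 1, 10)` returns 0 rather than -3, because the existing executor count already exceeds the target.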

Author: Sandy Ryza <[email protected]>

Closes apache#5856 from sryza/sandy-backport-6954 and squashes the following commits:

1cb517a [Sandy Ryza] [SPARK-6954] [YARN] ExecutorAllocationManager can end up requesting a negative n...
@mingyukim
Author

@mccheah @punya for SA.

@mingyukim
Author

We ended up not doing this. Closing.

@mingyukim mingyukim closed this Sep 18, 2015
buckhx pushed a commit to buckhx/spark that referenced this pull request Apr 13, 2016
…gle batch

## What changes were proposed in this pull request?

This PR supports multiple Python UDFs within a single batch and also improves performance.

```python
>>> from pyspark.sql.types import IntegerType
>>> sqlContext.registerFunction("double", lambda x: x * 2, IntegerType())
>>> sqlContext.registerFunction("add", lambda x, y: x + y, IntegerType())
>>> sqlContext.sql("SELECT double(add(1, 2)), add(double(2), 1)").explain(True)
== Parsed Logical Plan ==
'Project [unresolvedalias('double('add(1, 2)), None),unresolvedalias('add('double(2), 1), None)]
+- OneRowRelation$

== Analyzed Logical Plan ==
double(add(1, 2)): int, add(double(2), 1): int
Project [double(add(1, 2))#14,add(double(2), 1)#15]
+- Project [double(add(1, 2))#14,add(double(2), 1)#15]
   +- Project [pythonUDF0#16 AS double(add(1, 2))#14,pythonUDF0#18 AS add(double(2), 1)#15]
      +- EvaluatePython [add(pythonUDF1#17, 1)], [pythonUDF0#18]
         +- EvaluatePython [double(add(1, 2)),double(2)], [pythonUDF0#16,pythonUDF1#17]
            +- OneRowRelation$

== Optimized Logical Plan ==
Project [pythonUDF0#16 AS double(add(1, 2))#14,pythonUDF0#18 AS add(double(2), 1)#15]
+- EvaluatePython [add(pythonUDF1#17, 1)], [pythonUDF0#18]
   +- EvaluatePython [double(add(1, 2)),double(2)], [pythonUDF0#16,pythonUDF1#17]
      +- OneRowRelation$

== Physical Plan ==
WholeStageCodegen
:  +- Project [pythonUDF0#16 AS double(add(1, 2))#14,pythonUDF0#18 AS add(double(2), 1)#15]
:     +- INPUT
+- !BatchPythonEvaluation [add(pythonUDF1#17, 1)], [pythonUDF0#16,pythonUDF1#17,pythonUDF0#18]
   +- !BatchPythonEvaluation [double(add(1, 2)),double(2)], [pythonUDF0#16,pythonUDF1#17]
      +- Scan OneRowRelation[]
```
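
In the plan above, each evaluation node lists more than one UDF call, so several UDFs are computed for the same batch of rows. The following pure-Python sketch illustrates that idea only. It is not Spark's implementation, and `evaluate_udfs_in_one_pass` and its `(func, arg_indices)` convention are hypothetical.

```python
def evaluate_udfs_in_one_pass(rows, udfs):
    """Apply every (func, arg_indices) pair to each row in a single pass.

    Returns one tuple per input row, with one result per UDF.
    """
    results = []
    for row in rows:
        results.append(tuple(func(*[row[i] for i in arg_indices])
                             for func, arg_indices in udfs))
    return results


def double(x):
    return x * 2


def add(x, y):
    return x + y


rows = [(1, 2), (3, 4)]
# One pass over the batch evaluates both UDFs for every row.
out = evaluate_udfs_in_one_pass(rows, [(double, [0]), (add, [0, 1])])
print(out)  # [(2, 3), (6, 7)]
```

The alternative, one pass (and one worker) per UDF, reads and buffers the same rows repeatedly, which is the overhead this PR reports reducing.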

## How was this patch tested?

Added new tests.

The following script was used to benchmark 1, 2, and 3 UDFs:
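```python
# Assumes a PySpark shell, where `sqlContext` is already defined.
from pyspark.sql import functions as F
from pyspark.sql.types import LongType

df = sqlContext.range(1, 1 << 23, 1, 4)  # 4 partitions
double = F.udf(lambda x: x * 2, LongType())
print(df.select(double(df.id)).count())
print(df.select(double(df.id), double(df.id + 1)).count())
print(df.select(double(df.id), double(df.id + 1), double(df.id + 2)).count())
```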
Here are the results:

N (UDFs) | Before | After | Speedup
---- | ---- | ---- | ----
1 | 22 s | 7 s | 3.1X
2 | 38 s | 13 s | 2.9X
3 | 58 s | 16 s | 3.6X

This benchmark ran locally with 4 CPUs. For 3 UDFs, it launched 12 Python processes before this patch and 4 processes after it. After this patch, multiple UDFs also use less memory than before (less buffering).
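
The speedup column and the process counts can be checked with a few lines of Python 3. The timings come from the table above. The reading that 12 processes means one Python worker per UDF per partition, against one per partition afterwards, is an assumption inferred from the numbers, not something the PR states.

```python
# N UDFs -> (seconds before, seconds after), taken from the results table.
timings = {1: (22, 7), 2: (38, 13), 3: (58, 16)}

speedups = {n: round(before / after, 1) for n, (before, after) in timings.items()}
print(speedups)  # {1: 3.1, 2: 2.9, 3: 3.6}

# Assumption: 4 partitions, one Python worker per UDF per partition before
# the patch, and one worker per partition after it.
partitions = 4
num_udfs = 3
workers_before = num_udfs * partitions
workers_after = partitions
print(workers_before, workers_after)  # 12 4
```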

Author: Davies Liu <[email protected]>

Closes apache#12057 from davies/multi_udfs.
@robert3005 robert3005 deleted the mkim/1.3.1-palantir3 branch September 24, 2016 04:09
ash211 pushed a commit that referenced this pull request Feb 16, 2017
* Added service name as prefix to executor pods to be able to tell them apart from kubectl output

* Addressed comments
mccheah pushed a commit that referenced this pull request Apr 27, 2017
* Added service name as prefix to executor pods to be able to tell them apart from kubectl output

* Addressed comments

4 participants