forked from apache/spark
Mkim/1.3.1 palantir3 #14
Closed
…Dirs for automatic deletion. As documented in createDirectory, the result of createDirectory is not registered for automatic removal. Currently there are 4 directories left in `/tmp` after just running `pyspark`. Author: Milan Straka <[email protected]> Closes apache#4759 from foxik/remove-tmp-dirs and squashes the following commits: 280450d [Milan Straka] Use createTempDir in getOrCreateLocalRootDirs...
…sing Spark job to fail Added a check to handle container exit status for the preemption scenario, log an INFO message in such cases and move on. andrewor14 Author: Ashwin Shankar <[email protected]> Closes apache#5993 from ashwinshankar77/SPARK-7451 and squashes the following commits: 90900cf [Ashwin Shankar] Fix log info message cf8b6cf [Ashwin Shankar] Stop counting preemption of executors as failure
… ne... ...gative n... ...umber of executors Author: Sandy Ryza <sandycloudera.com> Closes apache#5704 from sryza/sandy-spark-6954 and squashes the following commits: b7890fb [Sandy Ryza] Avoid ramping up to an existing number of executors 6eb516a [Sandy Ryza] SPARK-6954. ExecutorAllocationManager can end up requesting a negative number of executors Author: Sandy Ryza <[email protected]> Closes apache#5856 from sryza/sandy-backport-6954 and squashes the following commits: 1cb517a [Sandy Ryza] [SPARK-6954] [YARN] ExecutorAllocationManager can end up requesting a negative n...
We ended up not doing this. Closing.
buckhx pushed a commit to buckhx/spark that referenced this pull request on Apr 13, 2016
…gle batch
## What changes were proposed in this pull request?
This PR supports evaluating multiple Python UDFs within a single batch and also improves performance.
```python
>>> from pyspark.sql.types import IntegerType
>>> sqlContext.registerFunction("double", lambda x: x * 2, IntegerType())
>>> sqlContext.registerFunction("add", lambda x, y: x + y, IntegerType())
>>> sqlContext.sql("SELECT double(add(1, 2)), add(double(2), 1)").explain(True)
== Parsed Logical Plan ==
'Project [unresolvedalias('double('add(1, 2)), None),unresolvedalias('add('double(2), 1), None)]
+- OneRowRelation$
== Analyzed Logical Plan ==
double(add(1, 2)): int, add(double(2), 1): int
Project [double(add(1, 2))#14,add(double(2), 1)#15]
+- Project [double(add(1, 2))#14,add(double(2), 1)#15]
   +- Project [pythonUDF0#16 AS double(add(1, 2))#14,pythonUDF0#18 AS add(double(2), 1)#15]
      +- EvaluatePython [add(pythonUDF1#17, 1)], [pythonUDF0#18]
         +- EvaluatePython [double(add(1, 2)),double(2)], [pythonUDF0#16,pythonUDF1#17]
            +- OneRowRelation$
== Optimized Logical Plan ==
Project [pythonUDF0#16 AS double(add(1, 2))#14,pythonUDF0#18 AS add(double(2), 1)#15]
+- EvaluatePython [add(pythonUDF1#17, 1)], [pythonUDF0#18]
   +- EvaluatePython [double(add(1, 2)),double(2)], [pythonUDF0#16,pythonUDF1#17]
      +- OneRowRelation$
== Physical Plan ==
WholeStageCodegen
:  +- Project [pythonUDF0#16 AS double(add(1, 2))#14,pythonUDF0#18 AS add(double(2), 1)#15]
:     +- INPUT
+- !BatchPythonEvaluation [add(pythonUDF1#17, 1)], [pythonUDF0#16,pythonUDF1#17,pythonUDF0#18]
   +- !BatchPythonEvaluation [double(add(1, 2)),double(2)], [pythonUDF0#16,pythonUDF1#17]
      +- Scan OneRowRelation[]
```
## How was this patch tested?
Added new tests.
The following script was used to benchmark queries with 1, 2 and 3 UDFs:
```
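from pyspark.sql import functions as F
from pyspark.sql.types import LongType

# Range over [1, 2^23) split into 4 partitions
df = sqlContext.range(1, 1 << 23, 1, 4)
double = F.udf(lambda x: x * 2, LongType())
# Run the query with 1, 2 and 3 UDF calls per row
print(df.select(double(df.id)).count())
print(df.select(double(df.id), double(df.id + 1)).count())
print(df.select(double(df.id), double(df.id + 1), double(df.id + 2)).count())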
```
Here are the results:
N (UDFs) | Before | After | Speedup
---- |------------ | -------------|------
1 | 22 s | 7 s | 3.1X
2 | 38 s | 13 s | 2.9X
3 | 58 s | 16 s | 3.6X
This benchmark ran locally with 4 CPUs. For 3 UDFs, it launched 12 Python processes before this patch and 4 processes after it. With this patch, multiple UDFs also use less memory than before (less buffering).
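As a quick sanity check on the figures above, the following sketch recomputes the speedup column from the reported timings and the process counts from the benchmark setup. The helper names `speedup` and `python_workers` are hypothetical and not part of Spark. The process-count model (one Python process per UDF per parallel task before the patch, one per parallel task after it) is an assumption inferred from the 12 versus 4 figures, not a description of Spark internals.

```python
# Timings in seconds, copied from the results table: N UDFs -> (before, after).
timings = {1: (22, 7), 2: (38, 13), 3: (58, 16)}


def speedup(before_s, after_s):
    """Return the before/after ratio rounded to one decimal place."""
    return round(before_s / float(after_s), 1)


def python_workers(parallel_tasks, num_udfs, batched):
    """Estimate the number of Python processes launched.

    Assumption (hypothetical model): without batching, every UDF in every
    parallel task gets its own Python process; with batching, all UDFs in
    a task share one process.
    """
    return parallel_tasks if batched else parallel_tasks * num_udfs


for n, (before_s, after_s) in sorted(timings.items()):
    print("%d UDF(s): %.1fX" % (n, speedup(before_s, after_s)))

# 4 parallel tasks (4 CPUs, 4 partitions) and 3 UDFs, as in the benchmark.
print(python_workers(4, 3, batched=False))
print(python_workers(4, 3, batched=True))
```

Under that model, the computed values agree with the table and with the process counts quoted in the paragraph above.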
Author: Davies Liu <[email protected]>
Closes apache#12057 from davies/multi_udfs.
ash211 pushed a commit that referenced this pull request on Feb 16, 2017

* Added service name as prefix to executor pods to be able to tell them apart from kubectl output
* Addressed comments
mccheah pushed a commit that referenced this pull request on Apr 27, 2017

* Added service name as prefix to executor pods to be able to tell them apart from kubectl output
* Addressed comments