[SPARK-2652] [PySpark] Turning some default configs for PySpark #1568
Conversation
QA tests have started for PR 1568. This patch merges cleanly.
QA results for PR 1568:
python/pyspark/context.py (Outdated)
I've now merged #1051, so update this to do _conf.setIfMissing().
Also, you may want to move the "spark.rdd.compress" setting that that PR set into your map above.
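The suggestion above amounts to keeping a single map of defaults and applying each entry only when the user has not already configured it, which is what SparkConf.setIfMissing does. A minimal pure-Python sketch of that "set if missing" pattern (plain dicts here, illustrative only, not the actual SparkConf API):

```python
# Default configs proposed in this PR; applied only when the user
# has not already set the key, so explicit user settings always win.
DEFAULT_CONFIGS = {
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.serializer.objectStreamReset": "100",
    "spark.rdd.compress": "True",
}

def set_if_missing(conf, key, value):
    """Set key to value only if it is not already configured."""
    if key not in conf:
        conf[key] = value

def apply_defaults(user_conf):
    """Return a copy of user_conf with the defaults filled in."""
    conf = dict(user_conf)
    for key, value in DEFAULT_CONFIGS.items():
        set_if_missing(conf, key, value)
    return conf

# A user override takes precedence over the default:
merged = apply_defaults({"spark.rdd.compress": "False"})
```

Keeping all defaults in one map (rather than scattered setIfMissing calls) is the point of the review comment: a later duplicate line like the stray spark.rdd.compress one becomes impossible.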
@davies, you also need to remove the self._conf.setIfMissing("spark.rdd.compress", "true") line above. Otherwise it looks good.
Merged this, thanks.
Add several default configs for PySpark, related to serialization in the JVM:

spark.serializer = org.apache.spark.serializer.KryoSerializer
spark.serializer.objectStreamReset = 100
spark.rdd.compress = True

This will help to reduce the memory usage during RDD.partitionBy().

Author: Davies Liu <[email protected]>

Closes apache#1568 from davies/conf and squashes the following commits:

cd316f1 [Davies Liu] remove duplicated line
f71a355 [Davies Liu] rebase to master, add spark.rdd.compress = True
8f63f45 [Davies Liu] Merge branch 'master' into conf
8bc9f08 [Davies Liu] fix unittest
c04a83d [Davies Liu] some default configs for PySpark
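For reference, the same three settings could also be applied by hand through conf/spark-defaults.conf, Spark's standard file-based configuration mechanism (values sketched here to mirror the PR's defaults; the patch instead sets them programmatically so that any user-supplied configuration still takes precedence):

```
spark.serializer                    org.apache.spark.serializer.KryoSerializer
spark.serializer.objectStreamReset  100
spark.rdd.compress                  true
```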