[SPARK-3595] Respect configured OutputCommitters when calling saveAsHadoopFile #2450

themodernlife · 2014-09-18T20:56:26Z

Addresses the issue in https://issues.apache.org/jira/browse/SPARK-3595, namely saveAsHadoopFile hardcoding the OutputCommitter. This is not ideal when running Spark jobs that write to S3, especially when running them from an EMR cluster where the default OutputCommitter is a DirectOutputCommitter.
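For readers unfamiliar with the call, here is a minimal sketch of what a caller-side setup could look like once a configured committer is respected. It is not taken from the PR's own example: `ExampleCommitter`, `SaveWithCommitter`, `confWithCommitter`, the app name, and the local master are all hypothetical names chosen for illustration, and the output path is expected as the first program argument. The Hadoop calls used are `JobConf.setOutputCommitter` and the `saveAsHadoopFile` overload that accepts a `JobConf`.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.{IntWritable, Text}
import org.apache.hadoop.mapred.{FileOutputCommitter, JobConf, TextOutputFormat}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // pair-RDD implicits on Spark 1.x

// Hypothetical stand-in for a cluster-specific committer such as the
// DirectOutputCommitter mentioned above; it inherits FileOutputCommitter's
// behaviour unchanged and exists only so the sketch compiles anywhere.
class ExampleCommitter extends FileOutputCommitter

object SaveWithCommitter {
  // Copy the base Hadoop configuration and name the committer explicitly.
  def confWithCommitter(base: Configuration): JobConf = {
    val conf = new JobConf(base)
    conf.setOutputCommitter(classOf[ExampleCommitter])
    conf
  }

  def main(args: Array[String]): Unit = {
    // Assumption: local master and app name are placeholders.
    val sc = new SparkContext(
      new SparkConf().setAppName("committer-example").setMaster("local[2]"))
    try {
      val records = sc.parallelize(Seq("a" -> 1, "b" -> 2))
        .map { case (k, v) => (new Text(k), new IntWritable(v)) }

      // Pass the JobConf so the save uses the committer configured above.
      records.saveAsHadoopFile(
        args(0),
        classOf[Text],
        classOf[IntWritable],
        classOf[TextOutputFormat[Text, IntWritable]],
        confWithCommitter(sc.hadoopConfiguration))
    } finally {
      sc.stop()
    }
  }
}
```

On a real EMR cluster the committer class would presumably come from the cluster's own configuration rather than from a subclass defined in the job.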


AmplabJenkins · 2014-09-18T20:57:08Z

Can one of the admins verify this patch?

pwendell · 2014-09-19T18:05:42Z

core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala

For this comment I'd make it more general:

// Use configured output committer if already set

I'm guessing over time we'll run into many formats that require this.
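The check that the suggested comment describes could be sketched as follows. This is an illustration under stated assumptions, not necessarily the code that was merged: `CommitterDefaults`, `applyDefaultCommitter`, and `PresetCommitter` are hypothetical names, and the sketch tests the `mapred.output.committer.class` key (the key that `JobConf.setOutputCommitter` writes in the old `mapred` API) to decide whether a committer was already configured.

```scala
import org.apache.hadoop.mapred.{FileOutputCommitter, JobConf}

// Hypothetical committer used only to show that a preset choice survives.
class PresetCommitter extends FileOutputCommitter

object CommitterDefaults {
  // Key written by JobConf.setOutputCommitter in the old "mapred" API.
  val CommitterKey = "mapred.output.committer.class"

  // Use configured output committer if already set; otherwise fall back
  // to FileOutputCommitter.
  def applyDefaultCommitter(conf: JobConf): JobConf = {
    if (conf.get(CommitterKey) == null) {
      conf.setOutputCommitter(classOf[FileOutputCommitter])
    }
    conf
  }
}
```

Testing the raw key rather than calling `getOutputCommitter` is a deliberate choice in this sketch: `getOutputCommitter` falls back to `FileOutputCommitter` when the key is absent, so it cannot by itself distinguish "unset" from "explicitly set to the default".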

pwendell · 2014-09-19T18:21:11Z

Thanks for sending this. The approach seems solid. I made some small comments in a few places.

pwendell · 2014-09-21T00:36:09Z

Jenkins, this please.

pwendell · 2014-09-21T00:37:13Z

LGTM pending tests.

pwendell · 2014-09-21T17:35:35Z

Jenkins, test this please.

SparkQA · 2014-09-21T17:38:19Z

QA tests have started for PR 2450 at commit f37a0e5.

This patch merges cleanly.

SparkQA · 2014-09-21T18:45:54Z

QA tests have finished for PR 2450 at commit f37a0e5.

This patch passes unit tests.
This patch merges cleanly.
This patch adds no public classes.

…adoopFile

Addresses the issue in https://issues.apache.org/jira/browse/SPARK-3595, namely saveAsHadoopFile hardcoding the OutputCommitter. This is not ideal when running Spark jobs that write to S3, especially when running them from an EMR cluster where the default OutputCommitter is a DirectOutputCommitter.

Author: Ian Hummel <[email protected]>

Closes #2450 from themodernlife/spark-3595 and squashes the following commits:

f37a0e5 [Ian Hummel] Update based on comments from pwendell
a11d9f3 [Ian Hummel] Fix formatting
4359664 [Ian Hummel] Add an example showing usage
8b6be94 [Ian Hummel] Add ability to specify OutputCommitter, especially useful when writing to an S3 bucket from an EMR cluster


themodernlife added 3 commits September 18, 2014 16:31

Add ability to specify OutputCommitter, especially useful when writing to an S3 bucket from an EMR cluster

8b6be94

Add an example showing usage

4359664

Fix formatting

a11d9f3

pwendell reviewed Sep 19, 2014

Update based on comments from pwendell

f37a0e5

asfgit closed this in a0454ef Sep 21, 2014
