Skip to content

Conversation

@liancheng
Copy link
Contributor

This PR leverages the output commit coordinator introduced in #4066 to help committing Hive and Parquet tables.

This PR extracts output commit code in SparkHadoopWriter.commit to SparkHadoopMapRedUtil.commitTask, and reuses it for committing Parquet and Hive tables on executor side.

TODO

  • Add tests

Review on Reviewable

@SparkQA
Copy link

SparkQA commented Mar 23, 2015

Test build #29004 has started for PR 5139 at commit dfdf3ef.

  • This patch merges cleanly.

@marmbrus
Copy link
Contributor

@aarondav if you have time, I'd appreciate your input here.

@SparkQA
Copy link

SparkQA commented Mar 23, 2015

Test build #29004 has finished for PR 5139 at commit dfdf3ef.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29004/
Test PASSed.

@liancheng
Copy link
Contributor Author

@JoshRosen It would be great if you can also help reviewing this. Thanks in advance!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add documentation to this guy mentioning what it means to commitTask (i.e., we may contact the driver to become authorized to commit to ensure speculative tasks do not override each other, and that this may cause us to abort the task by throwing a CommitDeniedException if we cannot become authorized as such), pointing to the JIRA that this fixes (the original one).

@aarondav
Copy link
Contributor

LGTM from my side, but @JoshRosen should confirm the driver side should be happy with this. Only comment was that now that it's extracted and used in a common location, we need to make sure its API is well-documented.

@liancheng
Copy link
Contributor Author

@aarondav Thanks! Added javadoc for this.

@SparkQA
Copy link

SparkQA commented Mar 30, 2015

Test build #29405 has started for PR 5139 at commit 9a4b82b.

@SparkQA
Copy link

SparkQA commented Mar 30, 2015

Test build #29405 has finished for PR 5139 at commit 9a4b82b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29405/
Test PASSed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Commit output coordinator" -> "Output commit coordinator"

@JoshRosen
Copy link
Contributor

LGTM from a Spark core point-of-view. One of the biggest risks here is passing the Long-valued parameters in the wrong order, but it looks like we've done it correctly here. I suppose that another risk might be calling the commit function with values of jobId, splitId, and attemptId that don't match / correspond to the ones used in the MapReduceTaskAttemptContext (that would undermine the whole scheme because the coordination wouldn't necessarily be guarding the right output paths), but our usage here looks fine as far as I can tell.

@liancheng liancheng changed the title [SPARK-6369] [SQL] [WIP] Uses commit coordinator to help committing Hive and Parquet tables [SPARK-6369] [SQL] Uses commit coordinator to help committing Hive and Parquet tables Mar 30, 2015
@SparkQA
Copy link

SparkQA commented Mar 30, 2015

Test build #29434 has started for PR 5139 at commit 72eb628.

@liancheng
Copy link
Contributor Author

Thanks @JoshRosen! Fixed the typo. I'm merging this to master and 1.3.

asfgit pushed a commit that referenced this pull request Mar 30, 2015
…d Parquet tables

This PR leverages the output commit coordinator introduced in #4066 to help committing Hive and Parquet tables.

This PR extracts output commit code in `SparkHadoopWriter.commit` to `SparkHadoopMapRedUtil.commitTask`, and reuses it for committing Parquet and Hive tables on executor side.

TODO

- [ ] Add tests

<!-- Reviewable:start -->
[<img src="https://reviewable.io/review_button.png" height=40 alt="Review on Reviewable"/>](https://reviewable.io/reviews/apache/spark/5139)
<!-- Reviewable:end -->

Author: Cheng Lian <[email protected]>

Closes #5139 from liancheng/spark-6369 and squashes the following commits:

72eb628 [Cheng Lian] Fixes typo in javadoc
9a4b82b [Cheng Lian] Adds javadoc and addresses @aarondav's comments
dfdf3ef [Cheng Lian] Uses commit coordinator to help committing Hive and Parquet tables

(cherry picked from commit fde6945)
Signed-off-by: Cheng Lian <[email protected]>
@asfgit asfgit closed this in fde6945 Mar 30, 2015
@liancheng liancheng deleted the spark-6369 branch March 30, 2015 23:52
@SparkQA
Copy link

SparkQA commented Mar 31, 2015

Test build #29434 has finished for PR 5139 at commit 72eb628.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29434/
Test PASSed.

yu-iskw pushed a commit to yu-iskw/spark that referenced this pull request Apr 3, 2015
…d Parquet tables

This PR leverages the output commit coordinator introduced in apache#4066 to help committing Hive and Parquet tables.

This PR extracts output commit code in `SparkHadoopWriter.commit` to `SparkHadoopMapRedUtil.commitTask`, and reuses it for committing Parquet and Hive tables on executor side.

TODO

- [ ] Add tests

<!-- Reviewable:start -->
[<img src="https://reviewable.io/review_button.png" height=40 alt="Review on Reviewable"/>](https://reviewable.io/reviews/apache/spark/5139)
<!-- Reviewable:end -->

Author: Cheng Lian <[email protected]>

Closes apache#5139 from liancheng/spark-6369 and squashes the following commits:

72eb628 [Cheng Lian] Fixes typo in javadoc
9a4b82b [Cheng Lian] Adds javadoc and addresses @aarondav's comments
dfdf3ef [Cheng Lian] Uses commit coordinator to help committing Hive and Parquet tables
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants