[SPARK-29158][SQL] Expose SerializableConfiguration for DataSource V2 developers #25838

holdenk · 2019-09-18T23:41:58Z

What changes were proposed in this pull request?

Currently, SerializableConfiguration, which makes the Hadoop configuration serializable, is private. This PR makes it public, with a developer annotation.

Why are the changes needed?

Many data sources depend on the Hadoop configuration, which may have specific components on the driver. Spark's own DataSourceV2 implementations (Parquet, JSON, ORC, etc.) use it frequently.
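For readers unfamiliar with the pattern, here is a minimal sketch of the general wrapper technique: a non-serializable configuration object is held in a `transient` field, and custom `writeObject`/`readObject` hooks delegate to the configuration's own binary encoding. Everything named here is an assumption for illustration: `ToyConf` is a hypothetical stand-in that only mimics a `write`/`readFields` style API, `SerializableToyConf` and `roundTrip` are made-up names, and this is not Spark's or Hadoop's actual code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Map;
import java.util.TreeMap;

class SerializableWrapperSketch {

    // Hypothetical stand-in for a Hadoop-style configuration (NOT the real class).
    // It does not implement Serializable, but it can encode itself to a DataOutput.
    static final class ToyConf {
        private final Map<String, String> entries = new TreeMap<>();

        void set(String key, String value) { entries.put(key, value); }

        String get(String key) { return entries.get(key); }

        void write(DataOutput out) throws IOException {
            out.writeInt(entries.size());
            for (Map.Entry<String, String> e : entries.entrySet()) {
                out.writeUTF(e.getKey());
                out.writeUTF(e.getValue());
            }
        }

        void readFields(DataInput in) throws IOException {
            entries.clear();
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                String key = in.readUTF();
                entries.put(key, in.readUTF());
            }
        }
    }

    // Hypothetical wrapper: Java serialization skips the transient field,
    // and the private hooks below write and rebuild it by hand.
    static final class SerializableToyConf implements Serializable {
        private static final long serialVersionUID = 1L;

        private transient ToyConf value;

        SerializableToyConf(ToyConf value) { this.value = value; }

        ToyConf value() { return value; }

        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject();
            value.write(out); // ObjectOutputStream is also a DataOutput
        }

        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            value = new ToyConf();
            value.readFields(in); // ObjectInputStream is also a DataInput
        }
    }

    // Serializes the wrapper to bytes and back, as shipping it to a remote task would.
    static SerializableToyConf roundTrip(SerializableToyConf original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (SerializableToyConf) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        ToyConf conf = new ToyConf();
        conf.set("fs.defaultFS", "hdfs://example:8020");
        SerializableToyConf copy = roundTrip(new SerializableToyConf(conf));
        System.out.println(copy.value().get("fs.defaultFS"));
    }
}
```

A data source author would capture such a wrapper on the driver and read the wrapped value back inside the task, rather than capturing the configuration object directly.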

Does this PR introduce any user-facing change?

This provides a new developer API.

How was this patch tested?

No new tests are added, as this only exposes a previously developed, heavily used, and well-tested component.

holdenk · 2019-09-18T23:42:27Z

cc @dbtsai who I was talking about this with.

dongjoon-hyun

+1, LGTM.
cc @rdblue , @cloud-fan , @gatorsmile

core/src/main/scala/org/apache/spark/util/SerializableConfiguration.scala

HyukjinKwon · 2019-09-19T02:32:58Z

I guess it's fine.

SparkQA · 2019-09-19T03:13:01Z

Test build #110949 has finished for PR 25838 at commit 76b9f96.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

rdblue · 2019-09-19T17:30:22Z

+1 to this as it is. I think developer API is more appropriate. I'm not against unstable if that's what it takes to get this in, but it seems unlikely that this is actually unstable.

holdenk · 2019-09-19T18:23:59Z

I've marked it as unstable. I doubt we'll change it, but I don't feel strongly about that.
I've also added a test to make sure it's callable from Java.
If no one objects I'll merge this tomorrow.

SparkQA · 2019-09-19T21:04:32Z

Test build #111017 has finished for PR 25838 at commit 9f1f561.

This patch fails PySpark unit tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
class SerializableConfigurationSuite

HyukjinKwon · 2019-09-20T01:32:13Z

retest this please

SparkQA · 2019-09-20T04:23:07Z

Test build #111036 has finished for PR 25838 at commit 9f1f561.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
class SerializableConfigurationSuite

HyukjinKwon · 2019-09-20T05:39:04Z

Merged to master.

dongjoon-hyun · 2019-09-20T06:14:47Z

core/src/main/java/org/apache/spark/util/SerializableConfigurationSuite.java

+ * This test ensures that the API we've exposed for SerializableConfiguration is usable
+ * from Java. It does not test any of the serialization it's self.
+ */
+class SerializableConfigurationSuite {


Although this is a compilation test, a test suite should not be under core/src/main/java.
@HyukjinKwon . Could you make a follow up?

Otherwise, this will be a part of our core library.

Wait, I missed it too. yea will fix it

Here #25867

Thanks for catching all this. My bad, I thought it was in the test directory but didn't pay enough attention.

…est` and minor documentation correction

### What changes were proposed in this pull request?

This PR is a follow-up of #25838 and proposes to create an actual test case under `src/test`. Previously, only a compile-only test existed, under `src/main`. It also changes the wording in `SerializableConfiguration` so that it only describes what the class does (other words are removed).

### Why are the changes needed?

Test code should live in `src/test`, not `src/main`. It should also test basic functionality.

### Does this PR introduce any user-facing change?

No, except for a minor doc change.

### How was this patch tested?

A unit test was added.

Closes #25867 from HyukjinKwon/SPARK-29158.

Authored-by: HyukjinKwon <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>

holdenk added 2 commits September 18, 2019 14:27

Expose the SerializableConfiguration as a DeveloperAPI for data sourc…

fe4532c

…e / sink writers working on DataSource V2 who need access to the Hadoop configuration. This is used extensively inside of Spark's own DSV2 implementations.

long line fix

76b9f96

holdenk changed the title ~~[SPARK-29158] Expose SerializableConfiguration for DataSource V2 developers~~ [SPARK-29158][SQL] Expose SerializableConfiguration for DataSource V2 developers Sep 18, 2019

dongjoon-hyun approved these changes Sep 19, 2019

dongjoon-hyun added the SQL label Sep 19, 2019

HyukjinKwon reviewed Sep 19, 2019

dongjoon-hyun added the SPARK CORE label Sep 19, 2019

cloud-fan approved these changes Sep 19, 2019

rdblue approved these changes Sep 19, 2019

holdenk added 2 commits September 19, 2019 11:21

Mark SerializableConfiguration as Unstable

83ed56a

Make sure we can call the Scala API from Java

9f1f561

HyukjinKwon approved these changes Sep 20, 2019

HyukjinKwon closed this in bd05339 Sep 20, 2019

dongjoon-hyun reviewed Sep 20, 2019

HyukjinKwon mentioned this pull request Sep 20, 2019

[SPARK-29158][SQL][FOLLOW-UP] Create an actual test case under src/test and minor documentation correction #25867

Closed
