Skip to content

Conversation

@holdenk
Copy link
Contributor

@holdenk holdenk commented Sep 18, 2019

What changes were proposed in this pull request?

Currently the SerializableConfiguration, which makes the Hadoop configuration serializable is private. This makes it public, with a developer annotation.

Why are the changes needed?

Many data source depend on the Hadoop configuration which may have specific components on the driver. Inside of Spark's own DataSourceV2 implementations this is frequently used (Parquet, Json, Orc, etc.)

Does this PR introduce any user-facing change?

This provides a new developer API.

How was this patch tested?

No new tests are added as this only exposes a previously developed & thoroughly used + tested component.

…e / sink writers working on DataSource V2 who need access to the Hadoop configuration. This is used extensively inside of Spark's own DSV2 implementations.
@holdenk holdenk changed the title [SPARK-29158] Expose SerializableConfiguration for DataSource V2 developers [SPARK-29158][SQL] Expose SerializableConfiguration for DataSource V2 developers Sep 18, 2019
@holdenk
Copy link
Contributor Author

holdenk commented Sep 18, 2019

cc @dbtsai who I was talking about this with.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM.
cc @rdblue , @cloud-fan , @gatorsmile

@HyukjinKwon
Copy link
Member

I guess it's fine.

@SparkQA
Copy link

SparkQA commented Sep 19, 2019

Test build #110949 has finished for PR 25838 at commit 76b9f96.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rdblue
Copy link
Contributor

rdblue commented Sep 19, 2019

+1 to this as it is. I think developer API is more appropriate. I'm not against unstable if that's what it takes to get this in, but it seems unlikely that this is actually unstable.

@holdenk
Copy link
Contributor Author

holdenk commented Sep 19, 2019

I've marked it as unstable, I doubt we'll change it but I don't feel strongly about that.
I've also added a test to make sure it's callable from Java.
If no one objects I'll merge this tomorrow.

@SparkQA
Copy link

SparkQA commented Sep 19, 2019

Test build #111017 has finished for PR 25838 at commit 9f1f561.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class SerializableConfigurationSuite

@HyukjinKwon
Copy link
Member

retest this please

@SparkQA
Copy link

SparkQA commented Sep 20, 2019

Test build #111036 has finished for PR 25838 at commit 9f1f561.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class SerializableConfigurationSuite

@HyukjinKwon
Copy link
Member

Merged to master.

* This test ensures that the API we've exposed for SerializableConfiguration is usable
* from Java. It does not test any of the serialization it's self.
*/
class SerializableConfigurationSuite {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although this is a compilation test, a test suite should not be under core/src/main/java.
@HyukjinKwon . Could you make a follow up?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise, this will be a part of our core library.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, I missed it too. yea will fix it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here #25867

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching this all, my bad I thought was in the test directory but didn't pay enough attention.

dongjoon-hyun pushed a commit that referenced this pull request Sep 20, 2019
…est` and minor documentation correction

### What changes were proposed in this pull request?

This PR is a followup of #25838 and proposes to create an actual test case under `src/test`. Previously, compile only test existed at `src/main`.

Also, just changed the wordings in `SerializableConfiguration` just only to describe what it does (remove other words).

### Why are the changes needed?

Tests codes should better exist in `src/test` not `src/main`. Also, it should better test a basic functionality.

### Does this PR introduce any user-facing change?

No except minor doc change.

### How was this patch tested?

Unit test was added.

Closes #25867 from HyukjinKwon/SPARK-29158.

Authored-by: HyukjinKwon <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants