[SPARK-29158][SQL] Expose SerializableConfiguration for DataSource V2 developers #25838
…e / sink writers working on DataSource V2 who need access to the Hadoop configuration. This is used extensively inside of Spark's own DSV2 implementations.
cc @dbtsai who I was talking about this with.
dongjoon-hyun left a comment:
+1, LGTM.
cc @rdblue , @cloud-fan , @gatorsmile
core/src/main/scala/org/apache/spark/util/SerializableConfiguration.scala
I guess it's fine.
Test build #110949 has finished for PR 25838 at commit
+1 to this as it is. I think developer API is more appropriate. I'm not against unstable if that's what it takes to get this in, but it seems unlikely that this is actually unstable.
I've marked it as unstable. I doubt we'll change it, but I don't feel strongly about that.
Test build #111017 has finished for PR 25838 at commit
retest this please
Test build #111036 has finished for PR 25838 at commit
Merged to master.
 * This test ensures that the API we've exposed for SerializableConfiguration is usable
 * from Java. It does not test any of the serialization it's self.
 */
class SerializableConfigurationSuite {
Although this is a compilation test, a test suite should not be under core/src/main/java.
@HyukjinKwon, could you make a follow-up?
Otherwise, this will be a part of our core library.
Wait, I missed it too. yea will fix it
Here #25867
Thanks!
Thanks for catching this, all. My bad: I thought it was in the test directory but didn't pay enough attention.
…est` and minor documentation correction

### What changes were proposed in this pull request?

This PR is a followup of #25838 and proposes to create an actual test case under `src/test`. Previously, compile only test existed at `src/main`. Also, just changed the wordings in `SerializableConfiguration` just only to describe what it does (remove other words).

### Why are the changes needed?

Tests codes should better exist in `src/test` not `src/main`. Also, it should better test a basic functionality.

### Does this PR introduce any user-facing change?

No except minor doc change.

### How was this patch tested?

Unit test was added.

Closes #25867 from HyukjinKwon/SPARK-29158.

Authored-by: HyukjinKwon <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
What changes were proposed in this pull request?
Currently, SerializableConfiguration, which makes the Hadoop configuration serializable, is private. This PR makes it public, with a developer annotation.
Why are the changes needed?
Many data sources depend on the Hadoop configuration, which may have specific components on the driver. Spark's own DataSourceV2 implementations (Parquet, JSON, ORC, etc.) use it frequently.
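For context on why a wrapper is needed at all: Hadoop's Configuration does not implement java.io.Serializable, so it cannot be captured directly by objects that are shipped from the driver to executors. The plain-Java sketch below illustrates the general wrapper technique using only the JDK. `PlainConf`, `SerializableConf`, and `roundTrip` are hypothetical stand-ins invented for this illustration; they are not Spark or Hadoop classes, and this is not presented as Spark's actual implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

public class SerializableWrapperSketch {

  // Hypothetical stand-in for a Hadoop-style configuration: NOT Serializable,
  // but able to write itself to and read itself from a data stream.
  static final class PlainConf {
    private final Map<String, String> entries = new HashMap<>();

    void set(String key, String value) { entries.put(key, value); }
    String get(String key) { return entries.get(key); }

    void write(DataOutput out) throws IOException {
      out.writeInt(entries.size());
      for (Map.Entry<String, String> e : entries.entrySet()) {
        out.writeUTF(e.getKey());
        out.writeUTF(e.getValue());
      }
    }

    void readFields(DataInput in) throws IOException {
      entries.clear();
      int n = in.readInt();
      for (int i = 0; i < n; i++) {
        String key = in.readUTF();
        entries.put(key, in.readUTF());
      }
    }
  }

  // Hypothetical wrapper: the wrapped value is transient, and custom
  // writeObject/readObject hooks delegate to the value's own stream format.
  static final class SerializableConf implements Serializable {
    private static final long serialVersionUID = 1L;
    private transient PlainConf value;

    SerializableConf(PlainConf value) { this.value = value; }
    PlainConf value() { return value; }

    private void writeObject(ObjectOutputStream out) throws IOException {
      out.defaultWriteObject();
      value.write(out);
    }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
      in.defaultReadObject();
      value = new PlainConf();
      value.readFields(in);
    }
  }

  // Serializes and deserializes the wrapper in memory, mimicking a driver-to-executor hop.
  static SerializableConf roundTrip(SerializableConf conf) throws IOException, ClassNotFoundException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(conf);
    }
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
      return (SerializableConf) in.readObject();
    }
  }

  public static void main(String[] args) throws Exception {
    PlainConf conf = new PlainConf();
    conf.set("fs.defaultFS", "hdfs://example:8020");
    SerializableConf copy = roundTrip(new SerializableConf(conf));
    System.out.println(copy.value().get("fs.defaultFS")); // prints hdfs://example:8020
  }
}
```

In a data source, the analogous step would be to wrap the Hadoop configuration once on the driver and read the wrapped value back on the executor side; with this PR that wrapper no longer has to be re-implemented by each DataSource V2 developer.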
Does this PR introduce any user-facing change?
This provides a new developer API.
How was this patch tested?
No new tests are added as this only exposes a previously developed & thoroughly used + tested component.