[SPARK-24990][SQL] merge ReadSupport and ReadSupportWithSchema #21946
Conversation
|
@cloud-fan, thanks! I am a bot who has found some folks who might be able to help with the review: @gatorsmile, @zsxwing and @tdas |
|
Wouldn't the redo of the API that we're discussing obsolete this? |
|
In the new proposal, we just rename |
|
Isn't this unnecessary after the API redesign? For the redesign, the |
|
@cloud-fan, from your comment around the same time as mine, it sounds like the confusion may just be in how you're updating the current API to the proposed one. Can you post a migration plan? It sounds like something like this:
Is that right? The re-use of |
I'd prefer the current proposal in https://docs.google.com/document/d/1DDXCTCrup4bKWByTalkXWgavcPdvur8a4eEu8x1BzPM/edit?ts=5b613c42 :
|
|
@rdblue the plan is, I will have a big PR that implements the redesign. However, if there is something that makes sense even without the redesign, we should have a separate PR. I think merging |
Yea, this is what I'm doing in my local branch for the redesign. I'll push it soon when it's finished. |
|
Yeah, I'm fine with this, then. It may be better to combine this with the other change, or to add the context to the description. |
|
Test build #93888 has finished for PR 21946 at commit
|
|
@rdblue This change is pretty isolated. It also LGTM. Since you are fine with the change, I am assuming you are not blocking this. I will merge this soon. |
|
+1 |
|
Test build #93891 has finished for PR 21946 at commit
|
|
Test build #93896 has finished for PR 21946 at commit
|
|
Test build #93897 has finished for PR 21946 at commit
|
gatorsmile
left a comment
Thanks! Merged to master.
Regarding user-specified schema, data sources may have 3 different behaviors:
1. must have a user-specified schema
2. can't have a user-specified schema
3. can accept the user-specified if it's given, or infer the schema.
I added `ReadSupportWithSchema` to support these behaviors, following data source v1. But it turns out we don't need this extra interface. We can just add a `createReader(schema, options)` to `ReadSupport` and make it call `createReader(options)` by default.
TODO: also fix the streaming API in followup PRs.
existing tests.
Author: Wenchen Fan
Closes apache#21946 from cloud-fan/ds-schema.
(cherry picked from commit ce084d3)
Conflicts:
sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala
sql/core/src/test/scala/org/apache/spark/sql/sources/v2/DataSourceV2Suite.scala
What changes were proposed in this pull request?
Regarding user-specified schema, data sources may have 3 different behaviors:
1. must have a user-specified schema
2. can't have a user-specified schema
3. can accept the user-specified schema if it's given, or infer the schema
I added `ReadSupportWithSchema` to support these behaviors, following data source v1. But it turns out we don't need this extra interface. We can just add a `createReader(schema, options)` to `ReadSupport` and make it call `createReader(options)` by default.
TODO: also fix the streaming API in followup PRs.
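To make the description concrete, here is a minimal, self-contained sketch of how a single `ReadSupport` interface with a defaulted `createReader(schema, options)` could cover all three behaviors. It follows the wording of the description above and is not the code in this patch. `Schema`, `Reader`, the three example sources, and `ReadSupportSketch.load` are hypothetical stand-ins, not Spark's `StructType`, `DataSourceOptions`, `DataSourceReader`, or `DataFrameReader`, and the exception types are assumptions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a schema type; NOT Spark's StructType.
final class Schema {
    final List<String> fieldNames;
    Schema(String... names) { this.fieldNames = Arrays.asList(names); }
}

// Hypothetical stand-in for a reader; NOT Spark's DataSourceReader.
interface Reader {
    Schema readSchema();
}

// Single merged interface, as described: the schema-aware overload
// delegates to the schema-less one unless a source overrides it.
interface ReadSupport {
    Reader createReader(Map<String, String> options);

    default Reader createReader(Schema schema, Map<String, String> options) {
        return createReader(options);
    }
}

// Behavior 1 (assumed shape): a user-specified schema is mandatory.
final class SchemaRequiredSource implements ReadSupport {
    @Override
    public Reader createReader(Map<String, String> options) {
        throw new IllegalArgumentException("a user-specified schema is required");
    }

    @Override
    public Reader createReader(Schema schema, Map<String, String> options) {
        return () -> schema;
    }
}

// Behavior 2 (assumed shape): the schema is fixed; a user-specified one is rejected.
final class FixedSchemaSource implements ReadSupport {
    @Override
    public Reader createReader(Map<String, String> options) {
        Schema fixed = new Schema("id", "name");
        return () -> fixed;
    }

    @Override
    public Reader createReader(Schema schema, Map<String, String> options) {
        throw new UnsupportedOperationException("user-specified schema is not supported");
    }
}

// Behavior 3 (assumed shape): use the given schema, otherwise infer one.
final class FlexibleSource implements ReadSupport {
    @Override
    public Reader createReader(Map<String, String> options) {
        Schema inferred = new Schema("value"); // placeholder for real inference
        return () -> inferred;
    }

    @Override
    public Reader createReader(Schema schema, Map<String, String> options) {
        return () -> schema;
    }
}

final class ReadSupportSketch {
    // Hypothetical caller-side dispatch: choose the overload based on
    // whether the user supplied a schema (null means "not supplied").
    static Reader load(ReadSupport source, Schema userSchema, Map<String, String> options) {
        return userSchema == null
            ? source.createReader(options)
            : source.createReader(userSchema, options);
    }
}
```

With a default that simply delegates, a source that implements only `createReader(options)` would ignore a user-specified schema rather than reject it. This sketch therefore has `FixedSchemaSource` override the overload to fail explicitly, which is a design choice of the example, not something stated in the description.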
How was this patch tested?
existing tests.