-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-42555][CONNECT][FOLLOWUP] Add the new proto msg to support the remaining jdbc API #40277
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| * @since 1.4.0 | |
| * @since 3.4.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just put the predicates into the DataSource message?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But the transform path is very different from DataSource.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's simple to add a if-else in transformReadRel, if we can reuse existing DataSource message (with new field predicates )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. Let's put the predicates into the DataSource message.
7627a0a to
4d48895
Compare
| // (Optional) A list of path for file-system backed data sources. | ||
| repeated string paths = 4; | ||
|
|
||
| // (Optional) Condition in the where clause for each partition. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add the comment that this currently only works for jdbc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK
| table: String, | ||
| predicates: Array[String], | ||
| connectionProperties: Properties): DataFrame = { | ||
| sparkSession.newDataFrame { builder => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please set the format to JDBC? We are now relying the presence of predicates to figure out that something is a JDBC table. That is relying far too heavily on the client doing the right thing, for example what would happen if you set format = parquet and still define predicates?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. we can't rely on client.
| case s: StructType => reader.schema(s) | ||
| case other => throw InvalidPlanInput(s"Invalid schema $other") | ||
|
|
||
| if (rel.getDataSource.getPredicatesCount == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please make the logic a bit like this:
if (format == "jdbc" && rel.getDataSource.getPredicatesCount) {
// Plan JDBC with predicates
} else id (rel.getDataSource.getPredicatesCount == 0) {
// Plan datasource
} else {
throw InvalidPlan(s"Predicates are not supported for $format datasources.)"
}
|
|
||
| // (Optional) Condition in the where clause for each partition. | ||
| // | ||
| // Only work for JDBC data source. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is only supported by the JDBC data source.
|
@hvanhovell Do you have any other advice? cc @HyukjinKwon @zhengruifeng @dongjoon-hyun |
hvanhovell
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
… remaining jdbc API ### What changes were proposed in this pull request? #40252 supported some jdbc API that reuse the proto msg `DataSource`. The `DataFrameReader` also have another kind jdbc API that is unrelated to load data source. ### Why are the changes needed? This PR adds the new proto msg `PartitionedJDBC` to support the remaining jdbc API. ### Does this PR introduce _any_ user-facing change? 'No'. New feature. ### How was this patch tested? New test cases. Closes #40277 from beliefer/SPARK-42555_followup. Authored-by: Jiaan Geng <[email protected]> Signed-off-by: Herman van Hovell <[email protected]> (cherry picked from commit 39a5512) Signed-off-by: Herman van Hovell <[email protected]>
|
@hvanhovell @zhengruifeng Thank you. |
… remaining jdbc API ### What changes were proposed in this pull request? apache#40252 supported some jdbc API that reuse the proto msg `DataSource`. The `DataFrameReader` also have another kind jdbc API that is unrelated to load data source. ### Why are the changes needed? This PR adds the new proto msg `PartitionedJDBC` to support the remaining jdbc API. ### Does this PR introduce _any_ user-facing change? 'No'. New feature. ### How was this patch tested? New test cases. Closes apache#40277 from beliefer/SPARK-42555_followup. Authored-by: Jiaan Geng <[email protected]> Signed-off-by: Herman van Hovell <[email protected]> (cherry picked from commit 39a5512) Signed-off-by: Herman van Hovell <[email protected]>
What changes were proposed in this pull request?
#40252 supported some jdbc API that reuse the proto msg
DataSource. TheDataFrameReaderalso have another kind jdbc API that is unrelated to load data source.Why are the changes needed?
This PR adds the new proto msg
PartitionedJDBCto support the remaining jdbc API.Does this PR introduce any user-facing change?
'No'.
New feature.
How was this patch tested?
New test cases.