[SPARK-41238][CONNECT][PYTHON] Support more built-in datatypes #38770
Conversation
This is nice! LGTM, pending tests.
We should also probably fix up literal support together after this (see also #38462 (comment))
Thanks. Can you test both
Force-pushed from aa72c69 to 4c5f279
the tests added in
Thank you! This has caused some challenges in demoing, as we didn't support many types!
```
// This message describes the logical [[DataType]] of something. It does not carry the value
// itself but only describes it.
message DataType {
```
This is a severe breaking change; we need to make sure that at some point our protos become stable. In particular, the order of the members changes here.
We may also need to add some UDTs (like VectorUDT) in the future, but the SQL types are now complete; a rough sketch of the mapping follows below.
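For context, here is a minimal sketch of the kind of client-side mapping this thread is about: pyspark type classes mapped to proto `DataType` kinds. The `_ATOMIC_TYPES` table and `pyspark_type_to_proto_kind` helper are illustrative names under assumed proto field names, not the actual generated API:

```python
# Illustrative sketch only: maps pyspark atomic type classes to hypothetical
# proto DataType "kind" names. The real client walks the generated protobuf
# classes; this just shows the shape of the mapping.
from pyspark.sql import types

_ATOMIC_TYPES = {
    types.NullType: "null",
    types.BooleanType: "boolean",
    types.ByteType: "byte",
    types.ShortType: "short",
    types.IntegerType: "integer",
    types.LongType: "long",
    types.FloatType: "float",
    types.DoubleType: "double",
    types.StringType: "string",
    types.BinaryType: "binary",
    types.DateType: "date",
    types.TimestampType: "timestamp",
}

def pyspark_type_to_proto_kind(dt: types.DataType) -> str:
    """Return the assumed proto kind name for an atomic pyspark type (sketch)."""
    kind = _ATOMIC_TYPES.get(type(dt))
    if kind is None:
        # Parameterized types (DecimalType, ArrayType, MapType, StructType,
        # and future UDTs) would need dedicated handling.
        raise ValueError(f"Does not support convert {dt} to connect proto types.")
    return kind
```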
```
# test FloatType, DoubleType, DecimalType, StringType, BooleanType, NullType
query = """
```
Love the SQL test for type checking :)
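For reference, a sketch of what such a SQL-driven schema check can look like on the client. The query and expected types below are illustrative, not the PR's actual test, and `spark` is assumed to be a Spark Connect session:

```python
# Illustrative sketch: exercise several built-in types through SQL and read
# the schema back over Spark Connect.
query = """
SELECT
  CAST(1.5 AS FLOAT) AS f,
  CAST(2.5 AS DOUBLE) AS d,
  CAST(3.14 AS DECIMAL(10, 2)) AS dec_col,
  'hello' AS s,
  TRUE AS b,
  NULL AS n
"""
schema = spark.sql(query).schema
# Expected to be along the lines of:
# StructType([StructField('f', FloatType(), False),
#             StructField('d', DoubleType(), False), ...])
```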
LGTM. There is a duplicate JIRA: https://issues.apache.org/jira/browse/SPARK-41057. Please remember to mark that one as a duplicate when this PR is merged.
Merged into master.
Will take a look.
…` in the client

### What changes were proposed in this pull request?

Support `DayTimeIntervalType` in the client.

### Why are the changes needed?

In #38770, I forgot to deal with `DayTimeIntervalType`.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Added a test case; the schema should be:

```
In [1]: query = """ SELECT INTERVAL '100 10:30' DAY TO MINUTE AS interval """

In [2]: spark.sql(query).schema
Out[2]: StructType([StructField('interval', DayTimeIntervalType(0, 2), False)])
```

Closes #38818 from zhengruifeng/connect_type_time_stamp.

Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
### What changes were proposed in this pull request?

1. On the server side, make the `proto_datatype` <-> `catalyst_datatype` conversion support all the built-in SQL datatypes.
2. On the client side, make the `proto_datatype` <-> `pyspark_catalyst_datatype` conversion support [all the datatypes that are supported in pyspark now](https://github.com/apache/spark/blob/master/python/pyspark/sql/types.py#L60-L83).

### Why are the changes needed?

Right now, only `long`, `string`, and `struct` are supported:

```
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
  status = StatusCode.UNKNOWN
  details = "Does not support convert float to connect proto types."
  debug_error_string = "{"created":"1669206685.760099000","description":"Error received from peer ipv6:[::1]:15002","file":"src/core/lib/surface/call.cc","file_line":1064,"grpc_message":"Does not support convert float to connect proto types.","grpc_status":2}"
```

This PR makes the schema and literal expressions support more datatypes.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Added unit tests.
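As a rough illustration of what such a unit test can assert, a round-trip check over the client-side conversion might look like the sketch below. The `to_proto` and `from_proto` parameters stand in for the actual conversion helpers, whose real names are not shown in this page:

```python
# Illustrative sketch: the conversions should be inverses for every supported
# built-in type. `to_proto` and `from_proto` are placeholders for the actual
# client-side helpers.
from pyspark.sql import types

def check_round_trip(to_proto, from_proto):
    for dt in [
        types.FloatType(),
        types.DoubleType(),
        types.DecimalType(10, 2),
        types.StringType(),
        types.BooleanType(),
        types.NullType(),
        types.ArrayType(types.IntegerType()),
        types.MapType(types.StringType(), types.LongType()),
    ]:
        # pyspark DataType implements structural equality, so a strict
        # round trip can be asserted directly.
        assert from_proto(to_proto(dt)) == dt, dt
```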