Conversation

@zhengruifeng
Contributor

@zhengruifeng zhengruifeng commented Nov 23, 2022

What changes were proposed in this pull request?

1. On the server side, make the `proto_datatype` <-> `catalyst_datatype` conversion support all the built-in SQL datatypes.

2. On the client side, make the `proto_datatype` <-> `pyspark_catalyst_datatype` conversion support all the datatypes that PySpark currently supports (see the sketch below).
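
For illustration, here is a minimal sketch of the client-side direction of that mapping; the string keys are simplified placeholders for the generated `DataType` proto messages the real client works with:

```python
from pyspark.sql import types

# Hypothetical, simplified lookup table from proto type names to PySpark
# datatypes; the actual client dispatches on the generated DataType proto
# message rather than on plain strings.
_PROTO_NAME_TO_PYSPARK_TYPE = {
    "byte": types.ByteType(),
    "short": types.ShortType(),
    "integer": types.IntegerType(),
    "long": types.LongType(),
    "float": types.FloatType(),
    "double": types.DoubleType(),
    "string": types.StringType(),
    "boolean": types.BooleanType(),
    "binary": types.BinaryType(),
    "date": types.DateType(),
    "timestamp": types.TimestampType(),
    "null": types.NullType(),
}

def proto_name_to_pyspark_type(name: str) -> types.DataType:
    """Resolve a simplified proto type name to a PySpark datatype."""
    try:
        return _PROTO_NAME_TO_PYSPARK_TYPE[name]
    except KeyError:
        raise ValueError(f"Unsupported proto datatype: {name}") from None
```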

Why are the changes needed?

Right now, only `long`, `string`, and `struct` are supported:

```
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
        status = StatusCode.UNKNOWN
        details = "Does not support convert float to connect proto types."
        debug_error_string = "{"created":"@1669206685.760099000","description":"Error received from peer ipv6:[::1]:15002","file":"src/core/lib/surface/call.cc","file_line":1064,"grpc_message":"Does not support convert float to connect proto types.","grpc_status":2}"
```

This PR makes the schema and literal expressions support more datatypes.
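
For example, after this change a query mixing these types should round-trip its schema through the client (a sketch, assuming `spark` is an already-connected Spark Connect session):

```python
# Fetch a schema containing types that previously failed with the
# _InactiveRpcError shown above.
query = "SELECT CAST(1.5 AS FLOAT) AS f, CAST(1.5 AS DOUBLE) AS d, 'a' AS s"
print(spark.sql(query).schema)
# Expected along the lines of:
# StructType([StructField('f', FloatType(), False),
#             StructField('d', DoubleType(), False),
#             StructField('s', StringType(), False)])
```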

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added unit tests.

Member

@HyukjinKwon HyukjinKwon left a comment


This is nice! LGTM, pending tests.

We should also probably fix up literal support together after this (see also #38462 (comment))

@amaliujia
Contributor

amaliujia commented Nov 24, 2022

Thanks.

Can you test both the `nullable=true` and `nullable=false` cases? IIRC, only ARRAY, MAP, and STRUCT need to care about it.
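
For context, these are the nested nullability flags the proto round-trip has to preserve, illustrated with the standard PySpark type constructors:

```python
from pyspark.sql.types import (
    ArrayType,
    IntegerType,
    MapType,
    StringType,
    StructField,
    StructType,
)

# Each nested type carries its own nullability flag, independent of the
# enclosing StructField's `nullable`:
ArrayType(IntegerType(), containsNull=False)                   # element nullability
MapType(StringType(), IntegerType(), valueContainsNull=True)   # value nullability
StructType([StructField("a", IntegerType(), nullable=False)])  # field nullability
```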

@zhengruifeng zhengruifeng changed the title [SPARK-41238][CONNECT][PYTHON] Support more datatypes [SPARK-41238][CONNECT][PYTHON] Support more built-in datatypes Nov 24, 2022
@zhengruifeng zhengruifeng force-pushed the connect_support_more_datatypes branch from aa72c69 to 4c5f279 Compare November 24, 2022 04:10
@zhengruifeng
Contributor Author

zhengruifeng commented Nov 24, 2022

Can you test both the `nullable=true` and `nullable=false` cases?

The tests added in `test_schema` cover those cases, @amaliujia.

@grundprinzip
Contributor

Thank you! This has caused some challenges in demos, since we didn't support many types!


```
// This message describes the logical [[DataType]] of something. It does not carry the value
// itself but only describes it.
message DataType {
```
Contributor

This is a severe breaking change; we need to make sure that at some point our protos become stable. In particular, the order of the members changes here.

Contributor Author

We may also need to add some UDTs (like VectorUDT) in the future, but the SQL types are now complete.

```
)

# test FloatType, DoubleType, DecimalType, StringType, BooleanType, NullType
query = """
```
Contributor

Love the SQL test for type checking :)
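
A test in that style might look like the following sketch (the `self.connect` / `self.spark` harness attributes are assumptions, not necessarily the names used in the actual test suite):

```python
import unittest

class ConnectSchemaTest(unittest.TestCase):
    # Hypothetical harness: `self.connect` is a Spark Connect session and
    # `self.spark` a classic SparkSession, both assumed set up elsewhere.
    def test_simple_datatypes(self):
        query = """
            SELECT CAST(1.0 AS FLOAT) AS f,
                   CAST(1.0 AS DOUBLE) AS d,
                   CAST(1.0 AS DECIMAL(10, 2)) AS dec,
                   'a' AS s,
                   true AS b,
                   NULL AS n
            """
        # The client-reported schema must match the classic one exactly,
        # including nullability.
        self.assertEqual(self.connect.sql(query).schema,
                         self.spark.sql(query).schema)
```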

@amaliujia
Contributor

amaliujia commented Nov 24, 2022

LGTM

There is a duplicate JIRA: https://issues.apache.org/jira/browse/SPARK-41057. Please remember to mark that one as a duplicate when this PR is merged.

@zhengruifeng
Contributor Author

Merged into master.

@zhengruifeng zhengruifeng deleted the connect_support_more_datatypes branch November 25, 2022 01:27
@zhengruifeng
Contributor Author

We should also probably fix up literal support together after this (see also #38462 (comment))

will take a look

zhengruifeng added a commit that referenced this pull request Nov 29, 2022
…` in the client

### What changes were proposed in this pull request?
Support `DayTimeIntervalType` in the client

### Why are the changes needed?
In #38770, I forgot to deal with `DayTimeIntervalType`

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Added a test case; the schema should be:
```
In [1]: query = """ SELECT INTERVAL '100 10:30' DAY TO MINUTE AS interval """

In [2]: spark.sql(query).schema
Out[2]: StructType([StructField('interval', DayTimeIntervalType(0, 2), False)])
```
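
For reference, the `(0, 2)` arguments are the interval's start and end field codes, which line up with the PySpark constants:

```python
from pyspark.sql.types import DayTimeIntervalType

# Field codes: DAY=0, HOUR=1, MINUTE=2, SECOND=3 -- so DayTimeIntervalType(0, 2)
# is the type of an INTERVAL ... DAY TO MINUTE column.
assert DayTimeIntervalType.DAY == 0
assert DayTimeIntervalType.MINUTE == 2
DayTimeIntervalType(DayTimeIntervalType.DAY, DayTimeIntervalType.MINUTE)
```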

Closes #38818 from zhengruifeng/connect_type_time_stamp.

Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
beliefer pushed a commit to beliefer/spark that referenced this pull request Dec 15, 2022
beliefer pushed a commit to beliefer/spark that referenced this pull request Dec 15, 2022
beliefer pushed a commit to beliefer/spark that referenced this pull request Dec 18, 2022
beliefer pushed a commit to beliefer/spark that referenced this pull request Dec 18, 2022