[SPARK-52770][CONNECT] Support TIME type in connect proto
#51464
Conversation
Force-pushed 634b4ce to 68a1563
@dengziming Could you review this PR, since you are working on a similar one: #51462?
dengziming
left a comment
Thank you @peter-toth for this PR. I think we should add a test case for this change, and I don't think your current test case makes sense: I reproduced it locally and it ran smoothly even without this change; take a look at the image below.
Should we make precision optional, analogous to Decimal?
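A minimal sketch of the idea, with hypothetical names: the precision field would be optional in the proto message, and the server would fall back to a default when it is unset, the way Decimal handles its optional precision/scale.

```java
// Sketch only (names assumed, not Spark's actual code): resolving an
// optional proto precision field with a server-side default.
public class TimePrecision {
    // TIME defaults to microsecond precision, i.e. time(6)
    static final int DEFAULT_PRECISION = 6;

    // hasPrecision mirrors a proto3 optional field's presence flag
    static int resolve(boolean hasPrecision, int precision) {
        return hasPrecision ? precision : DEFAULT_PRECISION;
    }
}
```

An unset field would then resolve to `time(6)`, while an explicit `resolve(true, 3)` would yield precision 3.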
dongjoon-hyun
left a comment
Thank you, @peter-toth .
dongjoon-hyun
left a comment
BTW, the PR description looks wrong to me.
# How was this patch tested?
Manually:
scala> sql("SELECT TIME '12:13:14'").show()
+---------------+
|TIME '12:13:14'|
+---------------+
| 12:13:14|
+---------------+
As reported on the JIRA issue, show is handled differently and already works; collect was the problem.
$ bin/spark-connect-shell --remote sc://localhost:15002
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 4.1.0-preview1
/_/
Type in expressions to have them evaluated.
Spark connect server version 4.1.0-preview1.
Spark session available as 'spark'.
scala> sql("SELECT TIME '12:13:14'").show()
+---------------+
|TIME '12:13:14'|
+---------------+
| 12:13:14|
+---------------+
scala> sql("SELECT TIME '12:13:14'").collect()
org.apache.spark.SparkException: org.apache.spark.sql.connect.common.InvalidPlanInput: [INTERNAL_ERROR] Does not support convert time(6) to connect proto types. SQLSTATE: XX000
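For context, the failing path is a Catalyst-to-proto type mapping with no case for TIME; a minimal sketch of that shape (assumed names, not Spark's actual code), where the fix in this PR amounts to adding the missing case:

```java
// Sketch of a type-name mapping that throws the error above when a
// type has no corresponding proto case.
public class ProtoTypeMapper {
    static String toProtoName(String catalystTypeName) {
        switch (catalystTypeName) {
            case "string":
                return "string";
            case "time(6)":
                return "time"; // the newly supported case
            default:
                throw new IllegalArgumentException(
                    "Does not support convert " + catalystTypeName
                        + " to connect proto types");
        }
    }
}
```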
Please add a test case like the above comment.
The difference is that we need to return the result schema, which is represented by …
@dongjoon-hyun, ah indeed, I ran the wrong test. Proper type support needs a few more changes.
@dengziming, that's expected.
Force-pushed 68a1563 to 66361ef
Force-pushed 66361ef to b16793b
dongjoon-hyun
left a comment
Thank you for updating the PR, @peter-toth .
Sure, I will add some more tests and adjust the description as well, but I don't see yet what the best place would be for the …
// UnparsedDataType
Unparsed unparsed = 24;

Time time = 28;
Just for my understanding: do you know where 25, 26, and 27 are, @peter-toth?
Unfortunately these are not in order. A few lines up:
Variant variant = 25;
and down:
// Reserved for geometry and geography types
reserved 26, 27;
dongjoon-hyun
left a comment
+1, LGTM from my side. Thank you, @peter-toth .
cc @MaxGekk , @grundprinzip , @HyukjinKwon
HyukjinKwon
left a comment
cc @MaxGekk @hvanhovell @grundprinzip FYI
LGTM
  accessor = new TimestampAccessor(timeStampMicroTZVector);
} else if (vector instanceof TimeStampMicroVector timeStampMicroVector) {
  accessor = new TimestampNTZAccessor(timeStampMicroVector);
} else if (vector instanceof TimeNanoVector timeNanoVector) {
Just a stupid question: when are these methods used? Why should we change Arrow-related methods when adding a Connect data type?
No, it isn't a stupid question. Arrow is used for streaming data between the Connect server and clients: https://spark.apache.org/docs/latest/spark-connect-overview.html#how-spark-connect-works
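To illustrate (accessor class name assumed): Arrow's TimeNanoVector stores a TIME value as nanoseconds since midnight, so the client-side accessor's job is essentially this round trip back to a `java.time.LocalTime`.

```java
import java.time.LocalTime;

// Sketch of the nanoseconds-since-midnight conversion a TIME accessor
// would perform on values read from an Arrow TimeNanoVector.
public class TimeNanosRoundTrip {
    // decode: Arrow's long payload -> LocalTime
    static LocalTime fromNanoOfDay(long nanos) {
        return LocalTime.ofNanoOfDay(nanos);
    }

    // encode: LocalTime -> Arrow's long payload
    static long toNanoOfDay(LocalTime t) {
        return t.toNanoOfDay();
    }
}
```

For example, `TIME '12:13:14'` travels as `LocalTime.of(12, 13, 14).toNanoOfDay()` nanoseconds and decodes back to the same value.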
  checkAnswer(df, (0 until 6).map(i => Row(i)))
}

test("SPARK-52770: Support Time type in connect") {
The test seems good, but the test name seems a bit of an overkill; "Time type" would be enough.
dengziming
left a comment
Regarding the test name nitpick: it's OK to leave it as it is.
Merged. Thanks @dongjoon-hyun, @HyukjinKwon, @MaxGekk and @dengziming for the review!
Thank you, @peter-toth and all!
A commit in apache/spark-connect-swift referenced this pull request (authored and signed off by Dongjoon Hyun <[email protected]>, closes #250 from dongjoon-hyun/SPARK-53777):

[SPARK-53777] Update Spark Connect-generated Swift source code with `4.1.0-preview2`

### What changes were proposed in this pull request?
This PR aims to update Spark Connect-generated Swift source code with Apache Spark `4.1.0-preview2`.

### Why are the changes needed?
There are many changes from Apache Spark 4.1.0:
- apache/spark#52342
- apache/spark#52256
- apache/spark#52271
- apache/spark#52242
- apache/spark#51473
- apache/spark#51653
- apache/spark#52072
- apache/spark#51561
- apache/spark#51563
- apache/spark#51489
- apache/spark#51507
- apache/spark#51462
- apache/spark#51464
- apache/spark#51442

To use the latest bug fixes and new messages to develop new features for `4.1.0-preview2`:

```
$ git clone -b v4.1.0-preview2 https://github.com/apache/spark.git
$ cd spark/sql/connect/common/src/main/protobuf/
$ protoc --swift_out=. spark/connect/*.proto
$ protoc --grpc-swift_out=. spark/connect/*.proto

// Remove empty GRPC files
$ cd spark/connect
$ grep 'This file contained no services' *
catalog.grpc.swift:// This file contained no services.
commands.grpc.swift:// This file contained no services.
common.grpc.swift:// This file contained no services.
example_plugins.grpc.swift:// This file contained no services.
expressions.grpc.swift:// This file contained no services.
ml_common.grpc.swift:// This file contained no services.
ml.grpc.swift:// This file contained no services.
pipelines.grpc.swift:// This file contained no services.
relations.grpc.swift:// This file contained no services.
types.grpc.swift:// This file contained no services.
$ rm catalog.grpc.swift commands.grpc.swift common.grpc.swift example_plugins.grpc.swift expressions.grpc.swift ml_common.grpc.swift ml.grpc.swift pipelines.grpc.swift relations.grpc.swift types.grpc.swift
```

### Does this PR introduce _any_ user-facing change?
Pass the CIs.

### How was this patch tested?
Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?
No.
What changes were proposed in this pull request?

This PR adds TIME type support to Spark Connect.

Why are the changes needed?

TIME type is new in 4.1 and Spark Connect should support it.

Does this PR introduce any user-facing change?

Yes, it adds basic TIME type support.

How was this patch tested?

Added new UTs.

Was this patch authored or co-authored using generative AI tooling?

No.