[SPARK-52448][CONNECT] Add simplified Struct Expression.Literal #51561
Thank you for this improvement. So every element will now contain the field name, which makes the space complexity O(n) compared to O(1) in the deprecated approach? If so, what is the perf gain from this change?
My understanding is that the space complexity in the deprecated approach is also O(n), since every element includes the field name in the struct_type field, correct?
On second thought, it is not necessary to include the field name/data type in every element, since not setting the data_type field should be sufficient for saving space.
Merged to master.
  //
  // Whether data_type_struct.fields.data_type should be set depends on
  // whether each field's type can be inferred from the elements field.
  DataType.Struct data_type_struct = 3;
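The comment on `data_type_struct` implies that the sender decides, field by field, whether a type needs to be written. A minimal Python sketch of that decision follows. Everything in it is an assumption made for illustration: `Literal`, `StructField`, `SELF_DESCRIBING_KINDS`, and the inference rule are hypothetical stand-ins, not the generated protobuf classes or the actual rule used by Spark Connect.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class Literal:
    """Hypothetical stand-in for one element of the struct literal."""
    kind: str          # e.g. "integer", "string", "null", "array"
    value: Any = None


@dataclass
class StructField:
    """Hypothetical stand-in for one entry of data_type_struct.fields."""
    name: str
    data_type: Optional[str] = None  # left unset when the element implies it


# Assumption: literals of these kinds fully determine their own type.
SELF_DESCRIBING_KINDS = {"boolean", "integer", "long", "double", "string"}


def type_is_inferable(element: Literal) -> bool:
    """Return True when the receiver could recover the type from the element."""
    if element.kind == "array":
        # An empty array reveals nothing about its element type.
        return bool(element.value) and all(
            type_is_inferable(item) for item in element.value
        )
    return element.kind in SELF_DESCRIBING_KINDS


def build_fields(
    names: List[str], elements: List[Literal], declared_types: List[str]
) -> List[StructField]:
    """Always keep the field name; set data_type only when it is needed."""
    return [
        StructField(name, None if type_is_inferable(element) else declared)
        for name, element, declared in zip(names, elements, declared_types)
    ]
```

Under these assumptions, an integer field would be sent with only its name, while a null or an empty array would still carry an explicit type.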
@heyihong Sorry for the late comment. What about introducing a new message for this purpose, for example SimplifiedStruct or LightStruct? My feeling is that the conversion on the server side has become too complex.
@zhengruifeng FYI, I simplified the implementation in this PR: #52098, but it's not necessary to introduce a new message.
…th `4.1.0-preview2`

### What changes were proposed in this pull request?

This PR aims to update Spark Connect-generated Swift source code with Apache Spark `4.1.0-preview2`.

### Why are the changes needed?

There are many changes from Apache Spark 4.1.0.

- apache/spark#52342
- apache/spark#52256
- apache/spark#52271
- apache/spark#52242
- apache/spark#51473
- apache/spark#51653
- apache/spark#52072
- apache/spark#51561
- apache/spark#51563
- apache/spark#51489
- apache/spark#51507
- apache/spark#51462
- apache/spark#51464
- apache/spark#51442

To use the latest bug fixes and new messages to develop for new features of `4.1.0-preview2`.

```
$ git clone -b v4.1.0-preview2 https://github.com/apache/spark.git
$ cd spark/sql/connect/common/src/main/protobuf/
$ protoc --swift_out=. spark/connect/*.proto
$ protoc --grpc-swift_out=. spark/connect/*.proto

// Remove empty GRPC files
$ cd spark/connect

$ grep 'This file contained no services' *
catalog.grpc.swift:// This file contained no services.
commands.grpc.swift:// This file contained no services.
common.grpc.swift:// This file contained no services.
example_plugins.grpc.swift:// This file contained no services.
expressions.grpc.swift:// This file contained no services.
ml_common.grpc.swift:// This file contained no services.
ml.grpc.swift:// This file contained no services.
pipelines.grpc.swift:// This file contained no services.
relations.grpc.swift:// This file contained no services.
types.grpc.swift:// This file contained no services.

$ rm catalog.grpc.swift commands.grpc.swift common.grpc.swift example_plugins.grpc.swift expressions.grpc.swift ml_common.grpc.swift ml.grpc.swift pipelines.grpc.swift relations.grpc.swift types.grpc.swift
```

### Does this PR introduce _any_ user-facing change?

Pass the CIs.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #250 from dongjoon-hyun/SPARK-53777.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>

What changes were proposed in this pull request?
This PR adds a new `data_type_struct` field to the protobuf definition for struct literals in Spark Connect, addressing the ambiguity issues with the existing `struct_type` field. The changes include:

- Protobuf Schema Update: Added a new `data_type_struct` field of type `DataType.Struct` to the `Literal.Struct` message in `expressions.proto`, while marking the existing `struct_type` field as deprecated.
- Enhanced Struct Conversion Logic: Updated `LiteralValueProtoConverter.scala` to:
  - Use the `data_type_struct` field when available for more precise struct type definition
  - Fall back to the deprecated `struct_type` field otherwise

Why are the changes needed?
The current Expression.Struct literal is somewhat overcomplicated since it duplicates most of the information its fields already have. This is bulky to send over the wire, and it can be ambiguous.
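
The conversion order described in this PR (prefer `data_type_struct`, fall back to the deprecated `struct_type`) can be sketched from the receiving side roughly as follows. This is an illustrative Python model under stated assumptions, not the code in `LiteralValueProtoConverter.scala`: `StructLiteral`, `infer_type`, and `resolve_schema` are hypothetical names, and plain dictionaries stand in for the protobuf messages.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class StructLiteral:
    """Hypothetical model of Literal.Struct; attribute names mirror the proto fields."""
    elements: List[object] = field(default_factory=list)
    struct_type: Optional[dict] = None       # deprecated: every field fully typed
    data_type_struct: Optional[dict] = None  # new: names, types only where needed


def infer_type(value: object) -> str:
    """Assumed inference rule for the sketch; covers a few primitive types only."""
    if isinstance(value, bool):  # checked before int because bool subclasses int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    raise ValueError(f"cannot infer a type for {value!r}")


def resolve_schema(lit: StructLiteral) -> List[Tuple[str, str]]:
    """Return (name, type) pairs, preferring the new field over the deprecated one."""
    if lit.data_type_struct is not None:
        fields = lit.data_type_struct["fields"]
        if len(fields) != len(lit.elements):
            raise ValueError("field count does not match element count")
        # Use the explicit type when present, otherwise infer it from the element.
        return [
            (f["name"], f.get("data_type") or infer_type(value))
            for f, value in zip(fields, lit.elements)
        ]
    if lit.struct_type is not None:
        # Deprecated path: the full type is spelled out for every field.
        return [(f["name"], f["data_type"]) for f in lit.struct_type["fields"]]
    raise ValueError("struct literal carries no type information")
```

In this model, a literal that sets only field names in `data_type_struct` resolves to the same schema as one that spells out every type in `struct_type`, which is the duplication the new field is meant to avoid.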
Does this PR introduce any user-facing change?
No. This PR maintains backward compatibility with existing struct literal implementations. Existing code using the deprecated `struct_type` field will continue to work without modification.

How was this patch tested?
`build/sbt "connect/testOnly *LiteralExpressionProtoConverterSuite"`

Was this patch authored or co-authored using generative AI tooling?
Generated-by: Cursor 1.2.4