-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add substrait support for Interval types and literals #10646
Conversation
Signed-off-by: Ruihang Xia <[email protected]>
Signed-off-by: Ruihang Xia <[email protected]>
Signed-off-by: Ruihang Xia <[email protected]>
Signed-off-by: Ruihang Xia <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me @waynexia - thank you
/// - days: `i32` | ||
/// - nanoseconds: `i64` | ||
/// | ||
/// See also [`ScalarValue::IntervalMonthDayNano`] for the literal definition in DataFusion. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI the interval implementation is changing in the next arrow I think: apache/arrow-rs#5769
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for informing, I'd like to help migrate to the new arrow version. BTW, is there any plan for next bump?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think ETA is next week sometime apache/arrow-rs#5688
Maybe you can make a "pre-release" of DataFusion against the un-released version of arrow-rs (which @tustvold often does to make sure as a way to sanity check the release)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good 👍 The "pre-update" is #10765 (still WIP
@@ -1160,6 +1162,24 @@ pub(crate) fn from_substrait_type(dt: &substrait::proto::Type) -> Result<DataTyp | |||
"Unsupported Substrait type variation {v} of type {s_kind:?}" | |||
), | |||
}, | |||
r#type::Kind::UserDefined(u) => { | |||
match u.type_reference { | |||
INTERVAL_YEAR_MONTH_TYPE_REF => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may be totally wrong, but are you sure this is how type_reference
is supposed to work? I'd kind of expect the type_reference to map to an extension / MappingType::ExtensionType, kinda like function_reference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not at all sure how this is supposed to work -- when I was reviewing this PR I think I concluded it followed the existing patterns.
If you have additional information / knowledge that would help improve things I think we would welcome that help
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, actually I think both references (type_variation_reference and type_reference) have the same problem - I hadn't realized it affects the type_variation_reference too. Now, if each system defines the type/variation references as consts, then plans will look compatible, but may have different meaning (nothing tells another Substrait producer/consumer that type_variation_reference = 1 on a list means a large list, for example).
I believe both should be defined as SimpleExtension files, following the schema (like here https://github.com/apache/arrow/blob/main/format/substrait/extension_types.yaml) rather than as constants (kinda what
//! we make use of the [simple extensions](https://substrait.io/extensions/#simple-extensions) of substrait |
That way one can (in theory, at least) teach another Substrait producer/consumer about the DataFusion extensions and keep plans compatible, or at least the systems will recognize that the plans are incompatible as it refers to extension URIs that the other system doesn't know about.
Does that make sense? I'm not 100% sure of anything I'm saying here, so I may as be understanding something wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe both should be defined as SimpleExtension files
Yes, that's right. The last (I'm aware of, at least) big piece we are missing in substrait is those *.yaml spec for all extended things and related URL settings. At present, all the things are defined in the document.
From the substrait website, we'll need a yaml parsing component to support extensions from other systems as well, if we are going to implement the ability to consume plans from external systems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
going to file a tracking issue for tasks related to substrait support
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated at #8149
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah cool, so it was the plan all along :) Sg!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Document PR: #10719
…0646) * feat: support interval types Signed-off-by: Ruihang Xia <[email protected]> * impl literals Signed-off-by: Ruihang Xia <[email protected]> * fix deadlink in doc Signed-off-by: Ruihang Xia <[email protected]> --------- Signed-off-by: Ruihang Xia <[email protected]>
Which issue does this PR close?
Closes #.
Rationale for this change
What changes are included in this PR?
Support convert to/from three interval types and literals in substrait
Are these changes tested?
Yes. There is a new UT
roundtrip_interval_literal
that covers both type and literals.Are there any user-facing changes?