-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: switch to using proper Substrait types for IntervalYearMonth and IntervalDayTime #11471
Conversation
…nd IntervalDayTime also clean up IntervalMonthDayNano type - the type itself needs no parameters
da02530
to
488f6b7
Compare
false, // The inner map field is always non-nullable (Arrow #1697), | ||
)), false)) | ||
}, | ||
Ok(DataType::Map( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
auto-formatting
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like an improvement to me -- less code and more features 👍
cc @waynexia let us know if you want a chance to review
Thanks again @Blizzara |
…nd IntervalDayTime (apache#11471) also clean up IntervalMonthDayNano type - the type itself needs no parameters
…nd IntervalDayTime (apache#11471) also clean up IntervalMonthDayNano type - the type itself needs no parameters
…nd IntervalDayTime (apache#11471) also clean up IntervalMonthDayNano type - the type itself needs no parameters
Sorry for the huge delay! (I'm catching up on things in past weeks) I used to use custom interval types to match our definition of intervals. Because of two reasons:
For the consumer side, we can support both types and report errors when substrait's overflow for wider scenarios. But I'm not sure if we should only produce substrait types for the above reasons. Given this kind of type information should be static, we cannot decide which type to use depending on the value at runtime. One possible solution is not to emit |
Ah, thanks for the notes @waynexia !
Yeah.. however no-one cares about those limits currently afaik, and I don't know if it's good enough reason to reduce interoperability for values that fall within that (reasonably large :D) range. As the limit is just a written text, DF -> DF roundtrip would anyways work outside of those limits too.
Good catch - however, if I think this right, that means we may fail to consume some Substrait plans - but that's true anyways, even if we were to use Substrait UDTs for intervals, right? Now we fail to consume the too-large values, while with UDTs we'd fail to consume everything. Since it's DF that has a more restricted type, any DF-created plan is guaranteed to be within the limits of not overflowing. As such, I think it makes sense to both produce and consume the Substrait DayTime interval type, given it's able to represent all DF DayTime intervals. Did I miss something? 😅 |
Which issue does this PR close?
Closes #.
Rationale for this change
I'm looking at supporting date-time stuff properly through Substrait and DataFusion, part of that is introducing (🤞) a MonthDayNano type into Substrait. While looking into it, I noticed there already are types for YearMonth and DayTime, so we could (and should!) be using those types instead of the custom UDTs introduced in #10646 (cc @waynexia )
What changes are included in this PR?
Add consuming support for IntervalYear and IntervalDay types as well as IntervalYearToMonth literals (IntervalDayToSecond already existed). Keep existing consuming support for the UDT for both for backwards compatibility (can probs remove this soon as well, but keeping it for now allows reading plans created by earlier version of DF, in case that's useful?).
Switch producer to produce the Substrait types and literals instead of UDTs for IntervalYearMonth and IntervalDayTime
Remove the parameters from the UDT type that's still kept for IntervalMonthDayNano - parameters are for things like decimal precision and scale, ie. parameters modifying the type itself; we don't need that for the UDT here.
Are these changes tested?
Existing unit tests (in producer.rs)
Are there any user-facing changes?
Start producing Substrait's interval types instead of custom UDT types, where possible.