Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement date_part for intervals #6071

Merged
merged 2 commits into from
Jul 27, 2024
Merged

Conversation

nrc
Copy link
Contributor

@nrc nrc commented Jul 16, 2024

Which issue does this PR close?

Addresses apache/datafusion#6327

Mentioned in #4969.

Closes #6113

Rationale for this change

Will support using extract with intervals in DataFusion. It's also desirable functionality in Arrow for it's own sake, although I don't have a directly motivating use case.

Are there any user-facing changes?

The date_part function will now work with intervals

@github-actions github-actions bot added the arrow Changes to the arrow crate label Jul 16, 2024
@nrc
Copy link
Contributor Author

nrc commented Jul 16, 2024

apache/datafusion#11501 demonstrates how this change supports extract on intervals in DataFusion.

@nrc
Copy link
Contributor Author

nrc commented Jul 22, 2024

@alamb @Jefffrey would either of you be able to review this PR please? (Sorry if I'm missing process for requesting a review or you're not the best folk, it's my first Arrow PR).

@alamb
Copy link
Contributor

alamb commented Jul 25, 2024

@alamb @Jefffrey would either of you be able to review this PR please? (Sorry if I'm missing process for requesting a review or you're not the best folk, it's my first Arrow PR).

No problem @nrc -- I am sorry for the delay in reviewing. I think @tustvold has less time to review PRs than he used to, and the rest of us are just trying to work through reviews as much as possible. Sadly, review bandwidth is almost always our most limited resource

I am checking it out now

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @nrc -- I think this PR looks really nice . Sorry for the delay in reviewing.

impl ExtractDatePartExt for PrimitiveArray<IntervalYearMonthType> {
fn date_part(&self, part: DatePart) -> Result<Int32Array, ArrowError> {
match part {
DatePart::Year => Ok(self.unary_opt(|d| Some(d / 12))),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense as IntervalYearMonth stores the number of whole months: https://docs.rs/arrow/latest/arrow/datatypes/enum.IntervalUnit.html

👍

@@ -1500,4 +1595,203 @@ mod tests {
ensure_returns_error(&Time64MicrosecondArray::from(vec![0]));
ensure_returns_error(&Time64NanosecondArray::from(vec![0]));
}

// IntervalDayTimeType week, day, hour, miute, second, mili, u, nano; invalid month, year; ignores the other part
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these comments seem like they belong with the other tests test_interval_day_time_array and test_interval_month_day_nano_array 🤔

assert_eq!(0, actual.value(0));
assert_eq!(42, actual.value(1));
assert_eq!(1042, actual.value(2));
assert_eq!(MILLISECONDS_IN_DAY as i32 + 1, actual.value(3));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find it somewhat strange that only for MILLISECONDS_IN_DAY we have a defined constant 🤔

assert_eq!(10, actual.value(1));
assert_eq!(35, actual.value(2));

// Times follow from nanos, but are not affected by months or dats.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Times follow from nanos, but are not affected by months or dats.
// Times follow from nanos, but are not affected by months or days.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Proposed to fix in #6140

@alamb
Copy link
Contributor

alamb commented Jul 25, 2024

I took the liberty of merging up from main to pick up #6116

@alamb
Copy link
Contributor

alamb commented Jul 25, 2024

FYI @Jefffrey and @tustvold -- I'll plan to merge this tomorrow unless there are any concerns or anyone would like more time to reivew

@Dandandan Dandandan changed the title Implement data_part for intervals Implement date_part for intervals Jul 26, 2024
@alamb
Copy link
Contributor

alamb commented Jul 27, 2024

Thanks again @nrc

@alamb alamb merged commit e815d06 into apache:master Jul 27, 2024
14 checks passed
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement date_part for Interval
2 participants