Convert to local epoch when casting to timestamp without timezone #5831

Draft: wants to merge 1 commit into base: master
65 changes: 60 additions & 5 deletions arrow-cast/src/cast/mod.rs
@@ -1602,9 +1602,8 @@ pub fn cast_with_options(
};
// Normalize timezone
let adjusted = match (from_tz, to_tz) {
- // Only this case needs to be adjusted because we're casting from
- // unknown time offset to some time offset, we want the time to be
- // unchanged.
+ // We're casting from unknown time offset to some time offset,
+ // we want the time to be unchanged.
//
// i.e. Timestamp('2001-01-01T00:00', None) -> Timestamp('2001-01-01T00:00', '+0700')
(None, Some(to_tz)) => {
@@ -1632,6 +1631,35 @@ pub fn cast_with_options(
)?,
}
}
+ // We're casting from a time offset to a local time offset,
+ // we want the time to be unchanged.
+ //
+ // i.e. Timestamp('2001-01-01T00:00', '+0700') -> Timestamp('2001-01-01T00:00', None)
+ (Some(from_tz), None) => {
+ let from_tz: Tz = from_tz.parse()?;
+ match to_unit {
+ TimeUnit::Second => adjust_timestamp_from_timezone::<TimestampSecondType>(
+ converted,
+ from_tz,
+ cast_options,
+ )?,
+ TimeUnit::Millisecond => adjust_timestamp_from_timezone::<
+ TimestampMillisecondType,
+ >(
+ converted, from_tz, cast_options
+ )?,
+ TimeUnit::Microsecond => adjust_timestamp_from_timezone::<
+ TimestampMicrosecondType,
+ >(
+ converted, from_tz, cast_options
+ )?,
+ TimeUnit::Nanosecond => adjust_timestamp_from_timezone::<
+ TimestampNanosecondType,
+ >(
+ converted, from_tz, cast_options
+ )?,
+ }
+ }
_ => converted,
};
Ok(make_timestamp_array(
@@ -2072,6 +2100,27 @@ fn adjust_timestamp_to_timezone<T: ArrowTimestampType>(
Ok(adjusted)
}

+ fn adjust_timestamp_from_timezone<T: ArrowTimestampType>(
+ array: PrimitiveArray<Int64Type>,
+ tz: Tz,
+ cast_options: &CastOptions,
+ ) -> Result<PrimitiveArray<Int64Type>, ArrowError> {
+ let adjust = |o| {
+ let local = as_datetime_with_timezone::<T>(o, tz)?;
+ T::make_value(local.naive_local())
+ };
+ let adjusted = if cast_options.safe {
+ array.unary_opt::<_, Int64Type>(adjust)
+ } else {
+ array.try_unary::<_, Int64Type, _>(|o| {
+ adjust(o).ok_or_else(|| {
+ ArrowError::CastError("Cannot cast timezone to different timezone".to_string())
+ })
+ })?
+ };
+ Ok(adjusted)
+ }

/// Cast numeric types to Boolean
///
/// Any zero value returns `false` while non-zero returns `true`
@@ -4791,9 +4840,15 @@ mod tests {

let string_array = cast(&timestamp_array, &DataType::Utf8).unwrap();
let result = string_array.as_string::<i32>();
- assert_eq!("2000-01-01T00:00:00.123", result.value(0));
- assert_eq!("2010-01-01T00:00:00.123", result.value(1));
+ assert_eq!("2000-01-01T07:00:00.123", result.value(0));
+ assert_eq!("2010-01-01T07:00:00.123", result.value(1));
assert!(result.is_null(2));

+ let array = StringArray::from(vec!["2010-01-01T00:00:00.123456+08:00"]);
Contributor Author


We now have a difference in behaviour here, which I think is kind of unfortunate - #4201 (comment)

Contributor


To keep the same behavior, would we have to change the string parsing logic too?
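
The difference discussed in this thread can be sketched with plain epoch arithmetic. This is only an illustration of the two behaviours as the tests in this diff describe them, assuming a fixed offset in seconds; `string_to_naive` and `tz_timestamp_to_naive` are hypothetical names, not arrow-rs functions.

```rust
const HOUR: i64 = 3600;
/// Seconds since the Unix epoch for 2010-01-01T00:00:00 read as UTC.
const Y2010: i64 = 1_262_304_000;

/// String path (hypothetical helper): the offset written in the string is
/// applied, and the resulting UTC instant is stored in the zone-less column.
fn string_to_naive(wall_clock_secs: i64, offset_secs: i64) -> i64 {
    wall_clock_secs - offset_secs
}

/// Timestamp path under this PR (hypothetical helper): the UTC instant is
/// shifted by the source offset so the stored value keeps the wall-clock time.
fn tz_timestamp_to_naive(utc_secs: i64, offset_secs: i64) -> i64 {
    utc_secs + offset_secs
}

fn main() {
    // "2010-01-01T00:00:00+08:00" parsed from a string lands on 2009-12-31T16:00:00.
    let from_string = string_to_naive(Y2010, 8 * HOUR);
    assert_eq!(from_string, Y2010 - 8 * HOUR);

    // The same instant held as Timestamp(_, "+08:00") and cast to None
    // lands back on 2010-01-01T00:00:00.
    let from_timestamp = tz_timestamp_to_naive(from_string, 8 * HOUR);
    assert_eq!(from_timestamp, Y2010);

    println!("string path: {from_string}, timestamp path: {from_timestamp}");
}
```

The two paths start from the same instant and end on values eight hours apart, which is the inconsistency the comment above points at.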

+ let data_type = DataType::Timestamp(TimeUnit::Nanosecond, None);
+ let cast = cast(&array, &data_type).unwrap();
+ let value = cast.as_primitive::<TimestampNanosecondType>().value_as_datetime(0).unwrap();
+ assert_eq!(value.to_string(), "2009-12-31 16:00:00.123456");
+ }

// Cast Timestamp(_, Some(timezone)) -> Timestamp(_, Some(timezone))
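
A minimal sketch of the arithmetic behind the new `(Some(from_tz), None)` arm, assuming a fixed UTC offset in seconds rather than a parsed `Tz` (so daylight-saving rules are ignored). `to_local_epoch` and `adjust_all` are hypothetical names, not arrow-rs APIs; they stand in for `adjust_timestamp_from_timezone` and its split between `unary_opt` (safe: failures become null) and `try_unary` (unsafe: failures become an error).

```rust
/// Hypothetical stand-in for the per-value adjustment: shift a UTC epoch
/// (in seconds) by a fixed offset so the result encodes the local wall-clock
/// time. Returns `None` on overflow.
fn to_local_epoch(utc_secs: i64, offset_secs: i64) -> Option<i64> {
    utc_secs.checked_add(offset_secs)
}

/// Applies the adjustment to a nullable column. With `safe == true` a failed
/// conversion becomes a null; otherwise the whole cast fails with an error.
fn adjust_all(
    values: &[Option<i64>],
    offset_secs: i64,
    safe: bool,
) -> Result<Vec<Option<i64>>, String> {
    values
        .iter()
        .map(|v| match v {
            None => Ok(None), // existing nulls pass through untouched
            Some(x) => match to_local_epoch(*x, offset_secs) {
                Some(y) => Ok(Some(y)),
                None if safe => Ok(None),
                None => Err(format!("Cannot convert {x} to local time")),
            },
        })
        .collect()
}

fn main() {
    let plus_seven = 7 * 3600;
    // 946_684_800 is 2000-01-01T00:00:00Z; in +07:00 the wall clock reads 07:00.
    let input = [Some(946_684_800), None, Some(i64::MAX)];

    let safe = adjust_all(&input, plus_seven, true).unwrap();
    assert_eq!(safe, vec![Some(946_710_000), None, None]);

    // The overflowing value makes the unsafe cast fail as a whole.
    assert!(adjust_all(&input, plus_seven, false).is_err());
}
```

The first value matches the updated test expectation in this hunk, where midnight UTC in a `+07:00` column now prints as `07:00:00` after the zone is dropped.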