fix: replace_time_zone with single-null-element "ambiguous" was panicking #14971
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

@@           Coverage Diff           @@
##             main   #14971   +/-   ##
=======================================
  Coverage   80.98%   80.98%
=======================================
  Files        1333     1333
  Lines      173149   173144    -5
  Branches     2458     2458
=======================================
- Hits       140225   140221    -4
+ Misses      32457    32455    -2
- Partials      467      468    +1
This case should throw an error, right? `null` is not a valid input for `ambiguous`.
- if from_time_zone == "UTC" && ambiguous.len() == 1 && ambiguous.get(0).unwrap() == "raise" {
+ if from_time_zone == "UTC"
+     && ambiguous.len() == 1
+     && unsafe { ambiguous.get_unchecked(0) } == Some("raise")
I don't think we should use `unsafe` here. This isn't called repeatedly in a hot loop.
right, thanks
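The check discussed in the review above can be written with neither `unwrap` nor `unsafe` by comparing the `Option` directly. This is a minimal sketch, assuming a plain slice of `Option<&str>` as a stand-in for the polars string column; `is_single_raise` is a hypothetical helper name, not part of polars:

```rust
/// Hypothetical helper: true only when `ambiguous` holds exactly one
/// non-null element equal to "raise". A slice of `Option<&str>` stands in
/// for the polars string column (an assumption of this sketch).
fn is_single_raise(from_time_zone: &str, ambiguous: &[Option<&str>]) -> bool {
    from_time_zone == "UTC"
        && ambiguous.len() == 1
        // `first()` yields Option<&Option<&str>>; a null element becomes
        // `None` after flattening, so the comparison fails instead of panicking.
        && ambiguous.first().copied().flatten() == Some("raise")
}

fn main() {
    // A single "raise" element takes the fast path.
    assert!(is_single_raise("UTC", &[Some("raise")]));
    // A single null element no longer panics; it just skips the fast path.
    assert!(!is_single_raise("UTC", &[None]));
    println!("ok");
}
```

Comparing against `Some("raise")` keeps the null case on the ordinary code path, so no bounds or null assumptions have to be upheld by hand.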
In [4]: df
Out[4]:
shape: (4, 2)
┌─────────────────────┬───────────┐
│ ts ┆ ambiguous │
│ --- ┆ --- │
│ datetime[μs] ┆ str │
╞═════════════════════╪═══════════╡
│ 2018-10-28 01:30:00 ┆ earliest │
│ 2018-10-28 02:00:00 ┆ earliest │
│ 2018-10-28 02:30:00 ┆ latest │
│ 2018-10-28 02:00:00 ┆ latest │
└─────────────────────┴───────────┘
In [5]: df.with_columns(
...: ts_localized=pl.col("ts").dt.replace_time_zone(
...: "Europe/Brussels", ambiguous=pl.col("ambiguous")
...: )
...: )
Out[5]:
shape: (4, 3)
┌─────────────────────┬───────────┬───────────────────────────────┐
│ ts ┆ ambiguous ┆ ts_localized │
│ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ str ┆ datetime[μs, Europe/Brussels] │
╞═════════════════════╪═══════════╪═══════════════════════════════╡
│ 2018-10-28 01:30:00 ┆ earliest ┆ 2018-10-28 01:30:00 CEST │
│ 2018-10-28 02:00:00 ┆ earliest ┆ 2018-10-28 02:00:00 CEST │
│ 2018-10-28 02:30:00 ┆ latest ┆ 2018-10-28 02:30:00 CET │
│ 2018-10-28 02:00:00 ┆ latest ┆ 2018-10-28 02:00:00 CET │
└─────────────────────┴───────────┴───────────────────────────────┘

If one of those elements is missing, then the null value propagates:

In [7]: df
Out[7]:
shape: (4, 2)
┌─────────────────────┬───────────┐
│ ts ┆ ambiguous │
│ --- ┆ --- │
│ datetime[μs] ┆ str │
╞═════════════════════╪═══════════╡
│ 2018-10-28 01:30:00 ┆ earliest │
│ 2018-10-28 02:00:00 ┆ earliest │
│ 2018-10-28 02:30:00 ┆ latest │
│ 2018-10-28 02:00:00 ┆ null │
└─────────────────────┴───────────┘
In [8]: df.with_columns(
...: ts_localized=pl.col("ts").dt.replace_time_zone(
...: "Europe/Brussels", ambiguous=pl.col("ambiguous")
...: )
...: )
Out[8]:
shape: (4, 3)
┌─────────────────────┬───────────┬───────────────────────────────┐
│ ts ┆ ambiguous ┆ ts_localized │
│ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ str ┆ datetime[μs, Europe/Brussels] │
╞═════════════════════╪═══════════╪═══════════════════════════════╡
│ 2018-10-28 01:30:00 ┆ earliest ┆ 2018-10-28 01:30:00 CEST │
│ 2018-10-28 02:00:00 ┆ earliest ┆ 2018-10-28 02:00:00 CEST │
│ 2018-10-28 02:30:00 ┆ latest ┆ 2018-10-28 02:30:00 CET │
│ 2018-10-28 02:00:00 ┆ null ┆ null │
└─────────────────────┴───────────┴───────────────────────────────┘

So if there's a single null value, then that should also propagate. However, on the latest release, it panics:

In [9]: df[3:]
Out[9]:
shape: (1, 2)
┌─────────────────────┬───────────┐
│ ts ┆ ambiguous │
│ --- ┆ --- │
│ datetime[μs] ┆ str │
╞═════════════════════╪═══════════╡
│ 2018-10-28 02:00:00 ┆ null │
└─────────────────────┴───────────┘
In [10]: df[3:].with_columns(
...: ts_localized=pl.col("ts").dt.replace_time_zone(
...: "Europe/Brussels", ambiguous=pl.col("ambiguous")
...: )
...: )
thread '<unnamed>' panicked at crates/polars-ops/src/chunked_array/datetime/replace_time_zone.rs:71:76:
called `Option::unwrap()` on a `None` value
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
---------------------------------------------------------------------------
[...]
PanicException: called `Option::unwrap()` on a `None` value

With this PR, it just returns a single-row
Thanks for the explanation - I didn't know a null would propagate here.
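The propagation rule described in this thread can be modelled in a few lines. The following is a toy sketch only, not the polars implementation: `resolve` and `localize` are hypothetical names, timestamps are plain strings, and broadcasting a length-1 `ambiguous` to every row is an assumption of the sketch.

```rust
/// Hypothetical stand-in for the real localization: it only records which
/// strategy was applied to the timestamp.
fn resolve(ts: &str, strategy: &str) -> String {
    format!("{ts} ({strategy})")
}

/// Pairs each timestamp with its `ambiguous` entry. A null (`None`) entry
/// yields a null result rather than an error. A length-1 `ambiguous` is
/// broadcast to every row (an assumption of this sketch).
fn localize(ts: &[&str], ambiguous: &[Option<&str>]) -> Vec<Option<String>> {
    ts.iter()
        .enumerate()
        .map(|(i, t)| {
            let strategy = if ambiguous.len() == 1 {
                ambiguous[0]
            } else {
                // A missing entry is treated like a null in this toy model.
                ambiguous.get(i).copied().flatten()
            };
            strategy.map(|s| resolve(t, s))
        })
        .collect()
}

fn main() {
    // The single-row, single-null case from the thread: null in, null out.
    let out = localize(&["2018-10-28 02:00:00"], &[None]);
    assert_eq!(out, vec![None]);
    println!("{out:?}");
}
```

The point of the model is that the single-element case follows the same rule as the multi-row case: a null strategy maps to a null result.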
closes #14970