Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: replace_time_zone with single-null-element "ambiguous" was panicking #14971

Merged
merged 3 commits into from
Mar 11, 2024

Conversation

MarcoGorelli
Copy link
Collaborator

closes #14970

@github-actions github-actions bot added fix Bug fix python Related to Python Polars rust Related to Rust Polars labels Mar 10, 2024
@MarcoGorelli MarcoGorelli marked this pull request as ready for review March 10, 2024 21:06
Copy link

codecov bot commented Mar 10, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 80.98%. Comparing base (9ee7c9e) to head (3103639).

❗ Current head 3103639 differs from pull request most recent head 9374d73. Consider uploading reports for the commit 9374d73 to get more accurate results

Additional details and impacted files
@@           Coverage Diff           @@
##             main   #14971   +/-   ##
=======================================
  Coverage   80.98%   80.98%           
=======================================
  Files        1333     1333           
  Lines      173149   173144    -5     
  Branches     2458     2458           
=======================================
- Hits       140225   140221    -4     
+ Misses      32457    32455    -2     
- Partials      467      468    +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@stinodego stinodego left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This case should throw an error, right? null is not a valid input for ambiguous.

if from_time_zone == "UTC" && ambiguous.len() == 1 && ambiguous.get(0).unwrap() == "raise" {
if from_time_zone == "UTC"
&& ambiguous.len() == 1
&& unsafe { ambiguous.get_unchecked(0) } == Some("raise")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should use unsafe here. This isn't called repeatedly in a hot loop.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, thanks

@MarcoGorelli
Copy link
Collaborator Author

MarcoGorelli commented Mar 11, 2024

This case should throw an error, right? null is not a valid input for ambiguous.

ambiguous can take an expression, e.g.:

In [4]: df
Out[4]:
shape: (4, 2)
┌─────────────────────┬───────────┐
│ tsambiguous │
│ ------       │
│ datetime[μs]        ┆ str       │
╞═════════════════════╪═══════════╡
│ 2018-10-28 01:30:00earliest  │
│ 2018-10-28 02:00:00earliest  │
│ 2018-10-28 02:30:00latest    │
│ 2018-10-28 02:00:00latest    │
└─────────────────────┴───────────┘

In [5]: df.with_columns(
   ...:     ts_localized=pl.col("ts").dt.replace_time_zone(
   ...:         "Europe/Brussels", ambiguous=pl.col("ambiguous")
   ...:     )
   ...: )
Out[5]:
shape: (4, 3)
┌─────────────────────┬───────────┬───────────────────────────────┐
│ tsambiguousts_localized                  │
│ ---------                           │
│ datetime[μs]        ┆ strdatetime[μs, Europe/Brussels] │
╞═════════════════════╪═══════════╪═══════════════════════════════╡
│ 2018-10-28 01:30:00earliest2018-10-28 01:30:00 CEST      │
│ 2018-10-28 02:00:00earliest2018-10-28 02:00:00 CEST      │
│ 2018-10-28 02:30:00latest2018-10-28 02:30:00 CET       │
│ 2018-10-28 02:00:00latest2018-10-28 02:00:00 CET       │
└─────────────────────┴───────────┴───────────────────────────────┘

If one of those elements is missing, then the null value propagates:

In [7]: df
Out[7]:
shape: (4, 2)
┌─────────────────────┬───────────┐
│ tsambiguous │
│ ------       │
│ datetime[μs]        ┆ str       │
╞═════════════════════╪═══════════╡
│ 2018-10-28 01:30:00earliest  │
│ 2018-10-28 02:00:00earliest  │
│ 2018-10-28 02:30:00latest    │
│ 2018-10-28 02:00:00null      │
└─────────────────────┴───────────┘

In [8]: df.with_columns(
   ...:     ts_localized=pl.col("ts").dt.replace_time_zone(
   ...:         "Europe/Brussels", ambiguous=pl.col("ambiguous")
   ...:     )
   ...: )
Out[8]:
shape: (4, 3)
┌─────────────────────┬───────────┬───────────────────────────────┐
│ tsambiguousts_localized                  │
│ ---------                           │
│ datetime[μs]        ┆ strdatetime[μs, Europe/Brussels] │
╞═════════════════════╪═══════════╪═══════════════════════════════╡
│ 2018-10-28 01:30:00earliest2018-10-28 01:30:00 CEST      │
│ 2018-10-28 02:00:00earliest2018-10-28 02:00:00 CEST      │
│ 2018-10-28 02:30:00latest2018-10-28 02:30:00 CET       │
│ 2018-10-28 02:00:00nullnull                          │
└─────────────────────┴───────────┴───────────────────────────────┘

So if there's a single null value, then that should also propagate. However, on the latest release, it panics:

In [9]: df[3:]
Out[9]:
shape: (1, 2)
┌─────────────────────┬───────────┐
│ tsambiguous │
│ ------       │
│ datetime[μs]        ┆ str       │
╞═════════════════════╪═══════════╡
│ 2018-10-28 02:00:00null      │
└─────────────────────┴───────────┘

In [10]: df[3:].with_columns(
    ...:     ts_localized=pl.col("ts").dt.replace_time_zone(
    ...:         "Europe/Brussels", ambiguous=pl.col("ambiguous")
    ...:     )
    ...: )
thread '<unnamed>' panicked at crates/polars-ops/src/chunked_array/datetime/replace_time_zone.rs:71:76:
called `Option::unwrap()` on a `None` value
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
---------------------------------------------------------------------------
[...]

PanicException: called `Option::unwrap()` on a `None` value

With this PR, it just returns a single-row null (i.e. the missing value propagates, just like in the multi-row case)

shape: (1, 3)
┌─────────────────────┬───────────┬───────────────────────────────┐
│ ts                  ┆ ambiguous ┆ ts_localized                  │
│ ---                 ┆ ---       ┆ ---                           │
│ datetime[μs]        ┆ str       ┆ datetime[μs, Europe/Brussels] │
╞═════════════════════╪═══════════╪═══════════════════════════════╡
│ 2018-10-28 02:00:00 ┆ null      ┆ null                          │
└─────────────────────┴───────────┴───────────────────────────────┘

@stinodego
Copy link
Contributor

Thanks for the explanation - I didn't know a null would propagate here.

@ritchie46 ritchie46 merged commit 8c2eca9 into pola-rs:main Mar 11, 2024
23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fix Bug fix python Related to Python Polars rust Related to Rust Polars
Projects
None yet
Development

Successfully merging this pull request may close these issues.

replace_time_zone with ambiguous with single null value panics
3 participants