chore(typing): SparkLikeExpr properties #2152
Conversation
@MarcoGorelli if you merge this ahead of #2132, you'll be able to get more useful IDE feedback. I checked out the PR and was confused why I didn't get hints from #2051
narwhals/_spark_like/expr_str.py
Outdated
```python
def is_naive(obj: Any) -> bool:
    return obj is not None and {"%s", "%z", "Z"}.isdisjoint(obj)
```

Suggested change:

```diff
-def is_naive(obj: Any) -> bool:
-    return obj is not None and {"%s", "%z", "Z"}.isdisjoint(obj)
+def is_naive(format: str) -> bool:
+    return {"%s", "%z", "Z"}.isdisjoint(format)
```
Just to signal that this checks the datetime format (should we maybe call it `is_naive_format`?).
Also, based on how it is used, `format` will always be a non-null `str`, right?
Thanks @EdAbati, yeah `is_naive_format` sounds good to me.
I renamed `format` to `obj` to avoid a ruff warning for shadowing `builtins.format`.
Also, narrowing from `Any` to `str` sounds good as well.
I originally wrote it using a `TypeIs` but removed that since it gave a false positive on unreachability.
Happy for you to make those changes and merge 🙌
Actually, I think the current set needs to be changed to `{"s", "z", "Z"}`. I'm relying on `format` splitting into a sequence of characters, but two of the set's elements are 2-char sequences.
I wonder why that didn't fail a test?
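A quick standalone illustration of the bug (a sketch, not code from the PR): `set.isdisjoint` iterates a string character by character, so the 2-char elements `"%s"` and `"%z"` can never equal any single character, and the check silently passes even for tz-aware formats.

```python
fmt = "%Y-%m-%d %H:%M:%S%z"  # tz-aware strptime format

# Buggy check: "%s" and "%z" are two characters each, so they never
# match a single character of fmt -> the format wrongly looks naive.
print({"%s", "%z", "Z"}.isdisjoint(fmt))  # True

# Single-character set: the 'z' contributed by "%z" is found.
print({"s", "z", "Z"}.isdisjoint(fmt))  # False
```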
> I wonder why that didn't fail a test?
I actually haven't seen a test covering this 👀 double checking with the datetime wizard @MarcoGorelli , is it something that we are not testing on purpose?
Probably related to
Maybe I'm reading this wrong, but I don't think we have coverage for any backend with a tz-aware format for `.str.to_datetime(...)`
@MarcoGorelli really need some help on writing a test for this 🙏
Right now I've got something kinda working for a lot of backends, but it fails to parse for sqlframe and pyarrow.
Also, `polars.Expr.str.to_datetime` seems to have more options - those might be able to preserve the timezone offset?
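For reference (stdlib only, not part of the PR): `datetime.strptime` with `%z` keeps the parsed offset, which matches the pandas expectation in the test below, while converting to UTC gives the polars/duckdb value.

```python
from datetime import datetime, timezone

# Parse a tz-aware string; %z captures the "+0200" offset.
dt = datetime.strptime("2020-01-01 12:34:56+0200", "%Y-%m-%d %H:%M:%S%z")

print(dt.isoformat())                           # 2020-01-01T12:34:56+02:00
print(dt.astimezone(timezone.utc).isoformat())  # 2020-01-01T10:34:56+00:00
```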
test_to_datetime_tz_aware
```python
@pytest.mark.parametrize(
    ("fmt", "data", "expected_pandas", "expected_polars_duckdb"),
    [
        (
            "%Y-%m-%d %H:%M:%S%z",
            {"a": ["2020-01-01 12:34:56+0200"]},
            "2020-01-01 12:34:56+02:00",
            "2020-01-01 10:34:56+00:00",
        )
    ],
)
def test_to_datetime_tz_aware(
    constructor: Constructor,
    fmt: str,
    data: dict[str, Sequence[str]],
    expected_pandas: str,
    expected_polars_duckdb: str,
) -> None:
    constructor_str = str(constructor)
    if "pandas" in constructor_str or "dask" in constructor_str:
        expected = expected_pandas
    elif "polars" in constructor_str or "duckdb" in constructor_str:
        expected = expected_polars_duckdb
    else:
        expected = "fails"
    result = (
        nw.from_native(constructor(data))
        .lazy()
        .select(b=nw.col("a").str.to_datetime(fmt))
        .collect()
        .item(row=0, column="b")
    )
    assert str(result) == expected
```

I can fully understand nope-ing out of handling timezones for all these different packages 😒
Alright, so (e19dcdf) -> (194ef83) now has a single test.
sqlframe works, pyspark apparently doesn't: https://github.com/narwhals-dev/narwhals/actions/runs/13700361438/job/38312364899?pr=2152
Absolutely don't want to do timezones again 😩
@FBruzzesi I see you added `strptime_to_pyspark_format` in #1826.
AFAICT there were never any tests for tz-aware timestamps, and right now I can't get one to work 😞
Tried my best, but as of (ce3145d) I'm giving up.
Maybe this was just a TZDATA issue locally? https://github.com/narwhals-dev/narwhals/actions/runs/13699734154/job/38310256617?pr=2152
Not even sure what `pyarrow` is doing here https://github.com/narwhals-dev/narwhals/actions/runs/13700021595/job/38311197947?pr=2152
changed the title: SparkLikeExpr properties → SparkLikeExpr property typing, fix str.to_datetime
MarcoGorelli
left a comment
thanks for looking into this!
I think this PR is doing too much - please either address typing or fix `str.to_datetime`.
The former should be fine; some comments for the latter:
- can we keep `format` even if it overrides the builtin?
- we should probably make sure that we consistently parse into UTC, so there should only be a single `expected` for `test_to_datetime_tz_aware`

I can take care of the timezone handling separately, but it does need splitting out
changed the title: SparkLikeExpr property typing, fix str.to_datetime → SparkLikeExpr properties
Thanks for the review @MarcoGorelli. Caught me at a good time with the feedback - hopefully I've addressed it all in:
Phew - I'm in no hurry to deal w/ timezones again 😅

Edit: Just for some clarity - I touched that method to resolve these failures https://github.com/narwhals-dev/narwhals/actions/runs/13682912007/job/38259412675?pr=2152. Looking back now, I'm not sure that was super clear.
narwhals/_spark_like/expr_str.py
Outdated
```python
def is_naive_format(format: str) -> bool:  # noqa: A002
    return {"s", "z", "Z"}.isdisjoint(format)
```
I don't really like how we're changing the logic here: `'z'` in a format string means the `'z'` literal; it's `'%z'` we need to check for.
the rest looks good though, well done!
PR to address time zone aware parsing: #2166
Thanks @MarcoGorelli
I'm happy for that bit to be reverted and just keep the splitting out into the new function?
This kind of `any(x in (...))` check would probably be good.
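A minimal sketch of that substring-based check (an illustration, not the code that eventually landed): it matches whole directives like `"%z"` rather than single characters, so incidental literals such as `'s'` or `'z'` in the format no longer trip it, while `"Z"` still flags the ISO-8601 UTC suffix.

```python
def is_naive_format(fmt: str) -> bool:
    # Substring test: looks for the 2-char directives "%s"/"%z" and the
    # literal "Z", instead of comparing individual characters.
    return not any(x in fmt for x in ("%s", "%z", "Z"))

print(is_naive_format("%Y-%m-%dT%H:%M:%S"))    # True  (naive)
print(is_naive_format("%Y-%m-%d %H:%M:%S%z"))  # False (offset directive)
print(is_naive_format("%H hours"))             # True  (literal 's' no longer trips it)
```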
Sure! If you're happy with #2166 to add a test and adjust some logic, I'm happy to then take this?
MarcoGorelli
left a comment
thanks @dangotbanned much appreciated!

If you have comments or can explain your changes, please do so below
Porting over (#2051), didn't realize this was declared twice until (#2132)