-
Notifications
You must be signed in to change notification settings - Fork 172
feat: Add Series.from_iterable
#2933
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. Weβll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Series.from_iterable
I feel like we should be issuing a warning here at-a-minimum narwhals/narwhals/_pandas_like/utils.py Lines 504 to 511 in 1cfcafb
But really I'd want an exception raised and an update to the UpdateMoved to #2964 |
| def _dataframe(self) -> type[DataFrame[Any]]: | ||
| return DataFrame | ||
|
|
||
| # TODO @dangotbanned: Fix `from_numpy` override missing in `v2` in another PR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note
- Resolve missing
v2method(s) and tests in another PR
This comment was marked as resolved.
This comment was marked as resolved.
Going to parametrize more for `test_series_from_iterable` #2933 (comment)
`0.20.7` is close enough to our minimum and this conversion I can see being more common than the rest
|
Thanks for the review @MarcoGorelli π
Sure I could take a look! Would you be able to run They aren't showing in my top 100 slowest: Show durations
>>> pytest --durations=100
====== slowest 100 durations ===
2.10s call tests/tpch_q1_test.py::test_q1[dask]
1.35s call tests/frame/schema_test.py::test_nested_dtypes_dask
1.27s call tests/v1_test.py::test_gather_every_dask_v1[2-1]
1.12s call tests/no_imports_test.py::test_dask
1.11s call tests/v1_test.py::test_gather_every_dask_v1[1-2]
0.98s call tests/dependencies/imports_test.py::test_to_native_namespace_min_version[dask]
0.85s call tests/namespace_test.py::test_preserve_type_var[dask]
0.67s call tests/expr_and_series/over_test.py::test_over_unsupported_dask
0.61s call tests/new_series_test.py::test_new_series_dask
0.48s call tests/translate/from_native_test.py::test_eager_only_lazy_dask[False-context0]
0.44s call tests/frame/join_test.py::test_anti_join[sqlframe-join_key0-filter_expr0-expected0]
0.43s call tests/expr_and_series/rolling_mean_test.py::test_rolling_mean_expr_lazy_ungrouped[sqlframe-expected_a1-2-2-False]
0.42s call tests/frame/join_test.py::test_join_duplicate_column_names[sqlframe]
0.42s call tests/expr_and_series/rank_test.py::test_lazy_rank_expr_desc[sqlframe-data1-max]
0.41s call tests/frame/lazy_test.py::test_lazy_backend[pandas-dask1]
0.41s call tests/expr_and_series/quantile_test.py::test_quantile_expr[ibis-lower-expected0]
0.40s call tests/expr_and_series/diff_test.py::test_diff_lazy[sqlframe]
0.39s call tests/expr_and_series/rolling_sum_test.py::test_rolling_sum_expr_lazy_grouped[sqlframe-expected_a5-4-1-True]
0.37s call tests/expr_and_series/is_first_distinct_test.py::test_is_first_distinct_expr_lazy_grouped[sqlframe]
0.36s call tests/expr_and_series/operators_test.py::test_comparand_operators_expr[sqlframe-__gt__-expected5]
0.36s call tests/modern_polars/unpivot_test.py::test_unpivot[sqlframe]
0.35s call tests/expr_and_series/name/to_uppercase_test.py::test_to_uppercase_anonymous[sqlframe]
0.34s call tests/v1_test.py::test_toplevel
0.34s call tests/expr_and_series/rolling_mean_test.py::test_rolling_mean_expr_lazy_grouped[sqlframe-expected_a1-2-2-False]
0.33s call tests/expr_and_series/binary_test.py::test_expr_binary[sqlframe]
0.33s call tests/frame/join_test.py::test_anti_join[pandas[pyarrow]-join_key2-filter_expr2-expected2]
0.33s call tests/expr_and_series/name/prefix_test.py::test_suffix_after_alias[sqlframe]
0.33s call tests/expr_and_series/exp_test.py::test_exp_expr[sqlframe]
0.33s call tests/system_info_test.py::test_get_deps_info
0.32s call tests/frame/select_test.py::test_select_duplicates[ibis]
0.32s call tests/frame/join_test.py::test_inner_join_single_key[sqlframe]
0.32s call tests/expr_and_series/shift_test.py::test_shift_lazy[sqlframe]
0.32s call tests/dtypes_test.py::test_2d_array[sqlframe]
0.31s call tests/frame/join_test.py::test_joinasof_time[pandas[pyarrow]-forward-expected1]
0.31s call tests/v2_test.py::test_with_version[sqlframe]
0.31s call tests/expr_and_series/rank_test.py::test_lazy_rank_expr[sqlframe-data0-dense]
0.31s call tests/frame/join_test.py::test_inner_join_two_keys[sqlframe]
0.31s call tests/expr_and_series/null_count_test.py::test_null_count_expr[sqlframe]
0.30s call tests/expr_and_series/cum_sum_test.py::test_lazy_cum_sum_grouped[sqlframe-True-expected_a1]
0.30s call tests/selectors_test.py::test_datetime[polars[eager]]
0.30s call tests/expr_and_series/cum_sum_test.py::test_lazy_cum_sum_ordered_by_nulls[sqlframe-False-expected_a0]
0.30s call tests/translate/from_native_test.py::test_eager_only_pass_through_main[ibis-True-True-context2]
0.28s call tests/frame/to_native_test.py::test_to_native[sqlframe]
0.28s call tests/expr_and_series/rolling_sum_test.py::test_rolling_sum_expr_lazy_ungrouped[sqlframe-expected_a0-2-None-False]
0.27s call tests/expr_and_series/mean_test.py::test_expr_mean_expr[sqlframe-expr1]
0.27s call tests/frame/explode_test.py::test_explode_single_col[sqlframe-l3-expected_values1]
0.26s call tests/frame/join_test.py::test_left_join_overlapping_column[sqlframe]
0.26s call tests/expr_and_series/all_horizontal_test.py::test_allh[sqlframe]
0.25s call tests/expr_and_series/operators_test.py::test_comparand_operators_scalar_expr[ibis-__gt__-expected5]
0.25s call tests/expr_and_series/exclude_test.py::test_exclude[ibis-exclude_selector2-expected_cols2]
0.25s call tests/expr_and_series/str/zfill_test.py::test_str_zfill[sqlframe]
0.24s call tests/expr_and_series/clip_test.py::test_clip_expr[ibis-0-4-expected1]
0.22s call tests/frame/explode_test.py::test_explode_single_col[sqlframe-l2-expected_values0]
0.22s call tests/expr_and_series/name/to_uppercase_test.py::test_to_uppercase[ibis]
0.22s call tests/system_info_test.py::test_get_sys_info
0.22s call tests/expr_and_series/fill_null_test.py::test_fill_null_limits[sqlframe]
0.21s call tests/expr_and_series/clip_test.py::test_clip_expr[sqlframe-None-4-expected2]
0.21s call tests/expr_and_series/all_horizontal_test.py::test_allh_iterator[ibis]
0.21s call tests/expr_and_series/rolling_mean_test.py::test_rolling_mean_expr_lazy_ungrouped[ibis-expected_a5-4-1-True]
0.20s call tests/expr_and_series/null_count_test.py::test_null_count_expr[ibis]
0.19s call tests/expr_and_series/operators_test.py::test_comparand_operators_expr[sqlframe-__ne__-expected1]
0.19s call tests/frame/select_test.py::test_alias_invalid[sqlframe]
0.19s call tests/system_info_test.py::test_show_versions
0.18s call tests/expr_and_series/rank_test.py::test_lazy_rank_expr_desc[duckdb-data0-average]
0.18s call tests/expr_and_series/exclude_test.py::test_exclude[sqlframe-exclude_selector0-expected_cols0]
0.18s call tests/v2_test.py::test_with_version[ibis]
0.17s call tests/frame/join_test.py::test_left_join[sqlframe]
0.17s call tests/expr_and_series/rank_test.py::test_rank_expr_in_over_desc[ibis-max]
0.17s call tests/expr_and_series/cum_sum_test.py::test_lazy_cum_sum_grouped[ibis-True-expected_a1]
0.16s call tests/expr_and_series/operators_test.py::test_logic_operators_expr_scalar[duckdb-__ror__-expected3]
0.16s call tests/translate/from_native_test.py::test_eager_only_pass_through_main[sqlframe-False-False-context0]
0.16s call tests/modern_polars/unpivot_test.py::test_unpivot[ibis]
0.16s call tests/expr_and_series/binary_test.py::test_expr_binary[ibis]
0.16s call tests/expr_and_series/exp_test.py::test_exp_expr[ibis]
0.16s call tests/frame/group_by_test.py::test_fancy_functions[sqlframe]
0.16s call tests/expr_and_series/is_first_distinct_test.py::test_is_first_distinct_expr_lazy[ibis]
0.16s call tests/expr_and_series/operators_test.py::test_comparand_operators_scalar_expr[ibis-__le__-expected2]
0.15s call tests/expr_and_series/str/zfill_test.py::test_str_zfill[ibis]
0.15s call tests/expr_and_series/rolling_sum_test.py::test_rolling_sum_expr[polars[eager]]
0.15s call tests/expr_and_series/shift_test.py::test_shift_lazy_grouped[ibis]
0.15s call tests/expr_and_series/diff_test.py::test_diff_lazy_grouped[ibis]
0.15s call tests/expr_and_series/name/prefix_test.py::test_suffix_after_alias[ibis]
0.14s call tests/selectors_test.py::test_datetime[pandas]
0.14s call tests/dtypes_test.py::test_cast_decimal_to_native
0.13s call tests/expr_and_series/binary_test.py::test_expr_binary[duckdb]
0.13s call tests/expr_and_series/dt/truncate_test.py::test_truncate_tz_aware_duckdb
0.13s call tests/expr_and_series/mean_test.py::test_expr_mean_expr[ibis-expr0]
0.13s call tests/expr_and_series/rank_test.py::test_lazy_rank_expr_desc[sqlframe-data1-ordinal]
0.13s call tests/expr_and_series/dt/truncate_test.py::test_truncate[sqlframe-1d-expected6]
0.13s call tests/expr_and_series/exp_test.py::test_exp_expr[duckdb]
0.13s call tests/frame/to_native_test.py::test_to_native[ibis]
0.12s call tests/read_scan_test.py::test_scan_csv[sqlframe]
0.12s call tests/expr_and_series/cast_test.py::test_cast[sqlframe]
0.12s call tests/expr_and_series/cast_test.py::test_cast_datetime_tz_aware[pandas[pyarrow]]
0.12s call tests/v1_test.py::test_dask_order_dependent_ops
0.12s call tests/dtypes_test.py::test_2d_array[ibis]
0.12s call tests/expr_and_series/fill_null_test.py::test_fill_null_strategies_with_limit_as_none[sqlframe]
0.12s call tests/expr_and_series/operators_test.py::test_logic_operators_expr[duckdb-__or__-expected1]
0.11s call tests/read_scan_test.py::test_scan_parquet_kwargs
0.11s call tests/expr_and_series/shift_test.py::test_shift_lazy[duckdb]
======= 8402 passed, 359 skipped, 509 xfailed in 22.81s |
So I went ahead and did some experimenting locally: Show diff
diff --git a/tests/series_only/from_iterable_test.py b/tests/series_only/from_iterable_test.py
index 733c0728c..17f7fb105 100644
--- a/tests/series_only/from_iterable_test.py
+++ b/tests/series_only/from_iterable_test.py
@@ -21,6 +21,7 @@ if TYPE_CHECKING:
ValuesView,
)
+ from _pytest.mark import ParameterSet
from typing_extensions import TypeAlias
from narwhals._namespace import EagerAllowed
@@ -29,6 +30,10 @@ if TYPE_CHECKING:
IntoIterable: TypeAlias = Callable[..., Iterable[Any]]
+def param_slow(*args: object) -> ParameterSet:
+ return pytest.param(*args, marks=pytest.mark.slow)
+
+
class UserDefinedIterable:
def __init__(self, iterable: Iterable[Any]) -> None:
self.iterable: Iterable[Any] = iterable
@@ -53,7 +58,7 @@ def dict_values(iterable: Iterable[Any]) -> ValuesView[Any]:
return dict(enumerate(iterable)).values()
-_INTO_ITER_3RD_PARTY: list[IntoIterable] = []
+_INTO_ITER_3RD_PARTY: list[IntoIterable | ParameterSet] = []
if find_spec("numpy"): # pragma: no cover
import numpy as np
@@ -64,7 +69,7 @@ else: # pragma: no cover
if find_spec("pandas"): # pragma: no cover
import pandas as pd
- _INTO_ITER_3RD_PARTY.extend([pd.Index, pd.array, pd.Series])
+ _INTO_ITER_3RD_PARTY.extend([pd.Index, param_slow(pd.array), pd.Series])
else: # pragma: no cover
...
if find_spec("polars"): # pragma: no cover
@@ -83,16 +88,20 @@ if find_spec("pyarrow"): # pragma: no cover
else: # pragma: no cover
...
-_INTO_ITER_STDLIB: tuple[IntoIterable, ...] = (
+_INTO_ITER_STDLIB: tuple[IntoIterable | ParameterSet, ...] = (
list,
tuple,
iter,
- deque,
- generator_function,
+ param_slow(deque),
+ param_slow(generator_function),
generator_expression,
)
-_INTO_ITER_STDLIB_EXOTIC: tuple[IntoIterable, ...] = dict_keys, dict_values
-INTO_ITER: tuple[IntoIterable, ...] = (
+
+_INTO_ITER_STDLIB_EXOTIC: tuple[ParameterSet, ...] = (
+ param_slow(dict_keys),
+ param_slow(dict_values),
+)
+INTO_ITER: tuple[IntoIterable | ParameterSet, ...] = (
*_INTO_ITER_STDLIB,
*_INTO_ITER_STDLIB_EXOTIC,
UserDefinedIterable,
@@ -113,7 +122,7 @@ def _ids_into_iter(obj: Any) -> str:
@pytest.mark.parametrize(
("values", "dtype"),
[
- ((4, 1, 2), nw.Int32),
+ param_slow((4, 1, 2), nw.Int32),
((-1, 5, 100), None),
((2.1, 2.7, 2.0), nw.Float64),
(("one", "two"), nw.String),
@@ -167,10 +176,10 @@ def test_series_from_iterable(
@pytest.mark.parametrize(("values", "expected_dtype"), [((4, 1, 2), nw.Int64)])
def test_series_from_iterable_infer(
- eager_backend: EagerAllowed, values: Sequence[Any], expected_dtype: IntoDType
+ eager_implementation: EagerAllowed, values: Sequence[Any], expected_dtype: IntoDType
) -> None:
name = "b"
- result = nw.Series.from_iterable(name, values, backend=eager_backend)
+ result = nw.Series.from_iterable(name, values, backend=eager_implementation)
assert result.dtype == expected_dtype
assert_equal_series(result, values, name)
@@ -182,14 +191,14 @@ def test_series_from_iterable_not_eager() -> None:
nw.Series.from_iterable("", [1, 2, 3], backend=backend)
-def test_series_from_iterable_numpy_not_1d(eager_backend: EagerAllowed) -> None:
+def test_series_from_iterable_numpy_not_1d(eager_implementation: EagerAllowed) -> None:
pytest.importorskip("numpy")
import numpy as np
with pytest.raises(ValueError, match="only.+1D numpy arrays"):
- nw.Series.from_iterable("", np.array([[0], [2]]), backend=eager_backend)
+ nw.Series.from_iterable("", np.array([[0], [2]]), backend=eager_implementation)
-def test_series_from_iterable_not_iterable(eager_backend: EagerAllowed) -> None:
+def test_series_from_iterable_not_iterable(eager_implementation: EagerAllowed) -> None:
with pytest.raises(TypeError, match="iterable.+got.+int"):
- nw.Series.from_iterable("", 2000, backend=eager_backend) # type: ignore[arg-type]
+ nw.Series.from_iterable("", 2000, backend=eager_implementation) # type: ignore[arg-type]
So besides just outright removing cases or randomly generating them - the best option would be running with Something else we could try is adapting this into a It could be good to use this as a chance to get wider coverage - in addition to just making the testing feedback loop shorter π |
This comment was marked as resolved.
This comment was marked as resolved.
narwhals/_polars/series.py
Outdated
| iterable = cast("Sequence[Any]", data) | ||
| native = pl.Series(name=name, values=iterable) | ||
| if dtype: | ||
| native = native.cast(narwhals_to_native_dtype(dtype, version)) | ||
| return cls.from_native(native, context=context) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking this change is what's causing #2933 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well great, (revert: try not downcasting polars) did indeed fix tubular ci
But there's some version branching needed for at least polars==0.20.21
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MarcoGorelli I think this was fixed in this polars PR you reviewed
If so, should I re-add the cast fix I had for polars<0.20.26 only - or is an xfail preferred?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In (fix: backport respect dtype, optimize branching) - I'm taking the view that Series.from_iterable/nw.new_series is a critical entrypoint.
So the aim is to both:
- Make sure it has wide version-compatibility
- Add as little overhead as possible, especially on the happy path
The fix in (48cf17b) may have been the source of (#2933 (comment))
Trying to keep this entry point as lightweight as possible, while being reliable across versions #2933 (comment)
FBruzzesi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @dangotbanned - I left only one comment regarding the tests run time.
One question I have (which might be written somewhere but I missed it): are we planning to replace the code in new_series with this eventually? Everything (except the trial to see if UNKNOWN implementation which I commented below) is the same as _new_series_impl
| else: # pragma: no cover | ||
| if (not SERIES_ACCEPTS_PD_INDEX) and is_pandas_index(values): | ||
| values = values.to_series() | ||
| native = pl.Series(name, values) | ||
| if dtype_pl: | ||
| native = native.cast(dtype_pl) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh boy π©
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried something new here when thinking about (#2933 (comment)), which might not be visible from the diff alone
- Keeps the compat code fast (we evaulate this once)
- Doesn't clutter the impl with comments
- Can use markdown to explain the problem + link to issue/PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trying to find a compromise here to Marco's comment on these tests running for too long. To run half of them, one idea is to use only eager_implementation. At the end of the day it does not really matter how we instantiate the Implementation, and using eager_backend just duplicates the number of tests.
If anything, we should test somewhere else that Implementation.from_backend(Implementation.PANDAS) and Implementation.from_backend("pandas") return the same implementation. Once that guaranteed (which I am pretty sure it is), we can run half of these tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @FBruzzesi
Did you know that I've been doing that in this PR since (efc5c4e)?
I added eager_implementation in this PR see diff, and I agree that we should use it more elsewhere π
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My bad! There is still one use of eager_backend but it's for a small number of tests anyway
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No worries!
Btw, I am open to reducing the test runtime and investigated a little in (#2933 (comment))
My preference is to defer this to another issue - but if you see it as blocking - then I can work on it here?
Absolutely! (#2933 (comment)) should cover most of it I guess the plan in my head is:
So we'll then have this and the extension point will work everywhere π₯³ from typing import Any
from narwhals._utils import Version, Implementation
from narwhals.typing import IntoBackend
from narwhals.series import Series
some_version = Version.MAIN
def new_series(
name: str,
values: Any,
dtype: IntoDType | None = None,
*,
backend: IntoBackend, # <-- or whatever we end up annotating
) -> Series[Any]:
return some_version.series.from_iterable(name, values, dtype, backend=backend) |
|
@dangotbanned @MarcoGorelli what's left in this PR? Is the concern only on test speed? Marco maybe run it once again while not live streaming, I can expect the results to be much different π |
MarcoGorelli
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.


Closes #2925
What type of PR is this? (check all applicable)
Related issues
nw.new_seriesin favour ofSeries.from_iterableΒ #2642Checklist
Tasks
v1v2pyarrowversionSeries.from_iterableΒ #2933 (comment))