
Conversation

@dangotbanned
Member

@dangotbanned dangotbanned commented Aug 3, 2025

Closes #2925

What type of PR is this? (check all applicable)

  • πŸ’Ύ Refactor
  • ✨ Feature
  • πŸ› Bug Fix
  • πŸ”§ Optimization
  • πŸ“ Documentation
  • βœ… Test
  • 🐳 Other

Related issues

Checklist

  • Code follows style guide (ruff)
  • Tests added
  • Documented the changes

Tasks

@dangotbanned dangotbanned added the enhancement New feature or request label Aug 3, 2025
@dangotbanned dangotbanned changed the title from feat: Add `Series.from_iterable to feat: Add Series.from_iterable Aug 3, 2025
@dangotbanned
Member Author

dangotbanned commented Aug 3, 2025

I feel like we should be issuing a warning here, at a minimum:

if isinstance_or_issubclass(dtype, dtypes.Datetime):
    # Pandas does not support "ms" or "us" time units before version 2.0
    if implementation is Implementation.PANDAS and backend_version < (
        2,
    ):  # pragma: no cover
        dt_time_unit = "ns"
    else:
        dt_time_unit = dtype.time_unit
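A minimal sketch of what such a warning could look like, pulled out into a hypothetical helper (the function name, message, and warning class are illustrative; the real follow-up moved to #2964):

import warnings


def _coerce_time_unit(time_unit: str, backend_version: tuple[int, ...]) -> str:
    # Hypothetical helper: pandas < 2.0 only supports nanosecond precision,
    # so any other requested time unit is silently coerced today.
    if backend_version < (2,) and time_unit != "ns":
        warnings.warn(
            f"pandas<2.0 does not support Datetime time_unit={time_unit!r}; "
            "falling back to 'ns'.",
            UserWarning,
            stacklevel=2,
        )
        return "ns"
    return time_unit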

But really I'd want an exception raised and an update to the nw.Datetime docs πŸ€”

Update

Moved to #2964

def _dataframe(self) -> type[DataFrame[Any]]:
    return DataFrame

# TODO @dangotbanned: Fix `from_numpy` override missing in `v2` in another PR
Member Author

@dangotbanned dangotbanned Aug 3, 2025


Note

  • Resolve missing v2 method(s) and tests in another PR

@dangotbanned dangotbanned marked this pull request as ready for review August 3, 2025 14:58
@dangotbanned

This comment was marked as resolved.

@dangotbanned dangotbanned marked this pull request as draft August 4, 2025 14:59
@dangotbanned
Member Author

dangotbanned commented Aug 8, 2025

Thanks for the review @MarcoGorelli 😍

this from_iterable test takes nearly 3 seconds on my laptop, which I feel is quite a lot, is it possible to trim it down slightly?

Sure I could take a look!

Would you be able to run pytest --durations=0 on that module please?
I'm just hoping to have the biggest impact on your setup

They aren't showing in my top 100 slowest:

Show durations

>>> pytest --durations=100
====== slowest 100 durations ===
2.10s call     tests/tpch_q1_test.py::test_q1[dask]
1.35s call     tests/frame/schema_test.py::test_nested_dtypes_dask
1.27s call     tests/v1_test.py::test_gather_every_dask_v1[2-1]
1.12s call     tests/no_imports_test.py::test_dask
1.11s call     tests/v1_test.py::test_gather_every_dask_v1[1-2]
0.98s call     tests/dependencies/imports_test.py::test_to_native_namespace_min_version[dask]
0.85s call     tests/namespace_test.py::test_preserve_type_var[dask]
0.67s call     tests/expr_and_series/over_test.py::test_over_unsupported_dask
0.61s call     tests/new_series_test.py::test_new_series_dask
0.48s call     tests/translate/from_native_test.py::test_eager_only_lazy_dask[False-context0]
0.44s call     tests/frame/join_test.py::test_anti_join[sqlframe-join_key0-filter_expr0-expected0]
0.43s call     tests/expr_and_series/rolling_mean_test.py::test_rolling_mean_expr_lazy_ungrouped[sqlframe-expected_a1-2-2-False]
0.42s call     tests/frame/join_test.py::test_join_duplicate_column_names[sqlframe]
0.42s call     tests/expr_and_series/rank_test.py::test_lazy_rank_expr_desc[sqlframe-data1-max]
0.41s call     tests/frame/lazy_test.py::test_lazy_backend[pandas-dask1]
0.41s call     tests/expr_and_series/quantile_test.py::test_quantile_expr[ibis-lower-expected0]
0.40s call     tests/expr_and_series/diff_test.py::test_diff_lazy[sqlframe]
0.39s call     tests/expr_and_series/rolling_sum_test.py::test_rolling_sum_expr_lazy_grouped[sqlframe-expected_a5-4-1-True]
0.37s call     tests/expr_and_series/is_first_distinct_test.py::test_is_first_distinct_expr_lazy_grouped[sqlframe]
0.36s call     tests/expr_and_series/operators_test.py::test_comparand_operators_expr[sqlframe-__gt__-expected5]
0.36s call     tests/modern_polars/unpivot_test.py::test_unpivot[sqlframe]
0.35s call     tests/expr_and_series/name/to_uppercase_test.py::test_to_uppercase_anonymous[sqlframe]
0.34s call     tests/v1_test.py::test_toplevel
0.34s call     tests/expr_and_series/rolling_mean_test.py::test_rolling_mean_expr_lazy_grouped[sqlframe-expected_a1-2-2-False]
0.33s call     tests/expr_and_series/binary_test.py::test_expr_binary[sqlframe]
0.33s call     tests/frame/join_test.py::test_anti_join[pandas[pyarrow]-join_key2-filter_expr2-expected2]
0.33s call     tests/expr_and_series/name/prefix_test.py::test_suffix_after_alias[sqlframe]
0.33s call     tests/expr_and_series/exp_test.py::test_exp_expr[sqlframe]
0.33s call     tests/system_info_test.py::test_get_deps_info
0.32s call     tests/frame/select_test.py::test_select_duplicates[ibis]
0.32s call     tests/frame/join_test.py::test_inner_join_single_key[sqlframe]
0.32s call     tests/expr_and_series/shift_test.py::test_shift_lazy[sqlframe]
0.32s call     tests/dtypes_test.py::test_2d_array[sqlframe]
0.31s call     tests/frame/join_test.py::test_joinasof_time[pandas[pyarrow]-forward-expected1]
0.31s call     tests/v2_test.py::test_with_version[sqlframe]
0.31s call     tests/expr_and_series/rank_test.py::test_lazy_rank_expr[sqlframe-data0-dense]
0.31s call     tests/frame/join_test.py::test_inner_join_two_keys[sqlframe]
0.31s call     tests/expr_and_series/null_count_test.py::test_null_count_expr[sqlframe]
0.30s call     tests/expr_and_series/cum_sum_test.py::test_lazy_cum_sum_grouped[sqlframe-True-expected_a1]
0.30s call     tests/selectors_test.py::test_datetime[polars[eager]]
0.30s call     tests/expr_and_series/cum_sum_test.py::test_lazy_cum_sum_ordered_by_nulls[sqlframe-False-expected_a0]
0.30s call     tests/translate/from_native_test.py::test_eager_only_pass_through_main[ibis-True-True-context2]
0.28s call     tests/frame/to_native_test.py::test_to_native[sqlframe]
0.28s call     tests/expr_and_series/rolling_sum_test.py::test_rolling_sum_expr_lazy_ungrouped[sqlframe-expected_a0-2-None-False]
0.27s call     tests/expr_and_series/mean_test.py::test_expr_mean_expr[sqlframe-expr1]
0.27s call     tests/frame/explode_test.py::test_explode_single_col[sqlframe-l3-expected_values1]
0.26s call     tests/frame/join_test.py::test_left_join_overlapping_column[sqlframe]
0.26s call     tests/expr_and_series/all_horizontal_test.py::test_allh[sqlframe]
0.25s call     tests/expr_and_series/operators_test.py::test_comparand_operators_scalar_expr[ibis-__gt__-expected5]
0.25s call     tests/expr_and_series/exclude_test.py::test_exclude[ibis-exclude_selector2-expected_cols2]
0.25s call     tests/expr_and_series/str/zfill_test.py::test_str_zfill[sqlframe]
0.24s call     tests/expr_and_series/clip_test.py::test_clip_expr[ibis-0-4-expected1]
0.22s call     tests/frame/explode_test.py::test_explode_single_col[sqlframe-l2-expected_values0]
0.22s call     tests/expr_and_series/name/to_uppercase_test.py::test_to_uppercase[ibis]
0.22s call     tests/system_info_test.py::test_get_sys_info
0.22s call     tests/expr_and_series/fill_null_test.py::test_fill_null_limits[sqlframe]
0.21s call     tests/expr_and_series/clip_test.py::test_clip_expr[sqlframe-None-4-expected2]
0.21s call     tests/expr_and_series/all_horizontal_test.py::test_allh_iterator[ibis]
0.21s call     tests/expr_and_series/rolling_mean_test.py::test_rolling_mean_expr_lazy_ungrouped[ibis-expected_a5-4-1-True]
0.20s call     tests/expr_and_series/null_count_test.py::test_null_count_expr[ibis]
0.19s call     tests/expr_and_series/operators_test.py::test_comparand_operators_expr[sqlframe-__ne__-expected1]
0.19s call     tests/frame/select_test.py::test_alias_invalid[sqlframe]
0.19s call     tests/system_info_test.py::test_show_versions
0.18s call     tests/expr_and_series/rank_test.py::test_lazy_rank_expr_desc[duckdb-data0-average]
0.18s call     tests/expr_and_series/exclude_test.py::test_exclude[sqlframe-exclude_selector0-expected_cols0]
0.18s call     tests/v2_test.py::test_with_version[ibis]
0.17s call     tests/frame/join_test.py::test_left_join[sqlframe]
0.17s call     tests/expr_and_series/rank_test.py::test_rank_expr_in_over_desc[ibis-max]
0.17s call     tests/expr_and_series/cum_sum_test.py::test_lazy_cum_sum_grouped[ibis-True-expected_a1]
0.16s call     tests/expr_and_series/operators_test.py::test_logic_operators_expr_scalar[duckdb-__ror__-expected3]
0.16s call     tests/translate/from_native_test.py::test_eager_only_pass_through_main[sqlframe-False-False-context0]
0.16s call     tests/modern_polars/unpivot_test.py::test_unpivot[ibis]
0.16s call     tests/expr_and_series/binary_test.py::test_expr_binary[ibis]
0.16s call     tests/expr_and_series/exp_test.py::test_exp_expr[ibis]
0.16s call     tests/frame/group_by_test.py::test_fancy_functions[sqlframe]
0.16s call     tests/expr_and_series/is_first_distinct_test.py::test_is_first_distinct_expr_lazy[ibis]
0.16s call     tests/expr_and_series/operators_test.py::test_comparand_operators_scalar_expr[ibis-__le__-expected2]
0.15s call     tests/expr_and_series/str/zfill_test.py::test_str_zfill[ibis]
0.15s call     tests/expr_and_series/rolling_sum_test.py::test_rolling_sum_expr[polars[eager]]
0.15s call     tests/expr_and_series/shift_test.py::test_shift_lazy_grouped[ibis]
0.15s call     tests/expr_and_series/diff_test.py::test_diff_lazy_grouped[ibis]
0.15s call     tests/expr_and_series/name/prefix_test.py::test_suffix_after_alias[ibis]
0.14s call     tests/selectors_test.py::test_datetime[pandas]
0.14s call     tests/dtypes_test.py::test_cast_decimal_to_native
0.13s call     tests/expr_and_series/binary_test.py::test_expr_binary[duckdb]
0.13s call     tests/expr_and_series/dt/truncate_test.py::test_truncate_tz_aware_duckdb
0.13s call     tests/expr_and_series/mean_test.py::test_expr_mean_expr[ibis-expr0]
0.13s call     tests/expr_and_series/rank_test.py::test_lazy_rank_expr_desc[sqlframe-data1-ordinal]
0.13s call     tests/expr_and_series/dt/truncate_test.py::test_truncate[sqlframe-1d-expected6]
0.13s call     tests/expr_and_series/exp_test.py::test_exp_expr[duckdb]
0.13s call     tests/frame/to_native_test.py::test_to_native[ibis]
0.12s call     tests/read_scan_test.py::test_scan_csv[sqlframe]
0.12s call     tests/expr_and_series/cast_test.py::test_cast[sqlframe]
0.12s call     tests/expr_and_series/cast_test.py::test_cast_datetime_tz_aware[pandas[pyarrow]]
0.12s call     tests/v1_test.py::test_dask_order_dependent_ops
0.12s call     tests/dtypes_test.py::test_2d_array[ibis]
0.12s call     tests/expr_and_series/fill_null_test.py::test_fill_null_strategies_with_limit_as_none[sqlframe]
0.12s call     tests/expr_and_series/operators_test.py::test_logic_operators_expr[duckdb-__or__-expected1]
0.11s call     tests/read_scan_test.py::test_scan_parquet_kwargs
0.11s call     tests/expr_and_series/shift_test.py::test_shift_lazy[duckdb]
======= 8402 passed, 359 skipped, 509 xfailed in 22.81s

@dangotbanned
Member Author

dangotbanned commented Aug 8, 2025

#2933 (comment)

@MarcoGorelli

is it possible to trim it down slightly?

So I went ahead and did some experimenting locally:

Show diff

diff --git a/tests/series_only/from_iterable_test.py b/tests/series_only/from_iterable_test.py
index 733c0728c..17f7fb105 100644
--- a/tests/series_only/from_iterable_test.py
+++ b/tests/series_only/from_iterable_test.py
@@ -21,6 +21,7 @@ if TYPE_CHECKING:
         ValuesView,
     )
 
+    from _pytest.mark import ParameterSet
     from typing_extensions import TypeAlias
 
     from narwhals._namespace import EagerAllowed
@@ -29,6 +30,10 @@ if TYPE_CHECKING:
     IntoIterable: TypeAlias = Callable[..., Iterable[Any]]
 
 
+def param_slow(*args: object) -> ParameterSet:
+    return pytest.param(*args, marks=pytest.mark.slow)
+
+
 class UserDefinedIterable:
     def __init__(self, iterable: Iterable[Any]) -> None:
         self.iterable: Iterable[Any] = iterable
@@ -53,7 +58,7 @@ def dict_values(iterable: Iterable[Any]) -> ValuesView[Any]:
     return dict(enumerate(iterable)).values()
 
 
-_INTO_ITER_3RD_PARTY: list[IntoIterable] = []
+_INTO_ITER_3RD_PARTY: list[IntoIterable | ParameterSet] = []
 
 if find_spec("numpy"):  # pragma: no cover
     import numpy as np
@@ -64,7 +69,7 @@ else:  # pragma: no cover
 if find_spec("pandas"):  # pragma: no cover
     import pandas as pd
 
-    _INTO_ITER_3RD_PARTY.extend([pd.Index, pd.array, pd.Series])
+    _INTO_ITER_3RD_PARTY.extend([pd.Index, param_slow(pd.array), pd.Series])
 else:  # pragma: no cover
     ...
 if find_spec("polars"):  # pragma: no cover
@@ -83,16 +88,20 @@ if find_spec("pyarrow"):  # pragma: no cover
 else:  # pragma: no cover
     ...
 
-_INTO_ITER_STDLIB: tuple[IntoIterable, ...] = (
+_INTO_ITER_STDLIB: tuple[IntoIterable | ParameterSet, ...] = (
     list,
     tuple,
     iter,
-    deque,
-    generator_function,
+    param_slow(deque),
+    param_slow(generator_function),
     generator_expression,
 )
-_INTO_ITER_STDLIB_EXOTIC: tuple[IntoIterable, ...] = dict_keys, dict_values
-INTO_ITER: tuple[IntoIterable, ...] = (
+
+_INTO_ITER_STDLIB_EXOTIC: tuple[ParameterSet, ...] = (
+    param_slow(dict_keys),
+    param_slow(dict_values),
+)
+INTO_ITER: tuple[IntoIterable | ParameterSet, ...] = (
     *_INTO_ITER_STDLIB,
     *_INTO_ITER_STDLIB_EXOTIC,
     UserDefinedIterable,
@@ -113,7 +122,7 @@ def _ids_into_iter(obj: Any) -> str:
 @pytest.mark.parametrize(
     ("values", "dtype"),
     [
-        ((4, 1, 2), nw.Int32),
+        param_slow((4, 1, 2), nw.Int32),
         ((-1, 5, 100), None),
         ((2.1, 2.7, 2.0), nw.Float64),
         (("one", "two"), nw.String),
@@ -167,10 +176,10 @@ def test_series_from_iterable(
 
 @pytest.mark.parametrize(("values", "expected_dtype"), [((4, 1, 2), nw.Int64)])
 def test_series_from_iterable_infer(
-    eager_backend: EagerAllowed, values: Sequence[Any], expected_dtype: IntoDType
+    eager_implementation: EagerAllowed, values: Sequence[Any], expected_dtype: IntoDType
 ) -> None:
     name = "b"
-    result = nw.Series.from_iterable(name, values, backend=eager_backend)
+    result = nw.Series.from_iterable(name, values, backend=eager_implementation)
     assert result.dtype == expected_dtype
     assert_equal_series(result, values, name)
 
@@ -182,14 +191,14 @@ def test_series_from_iterable_not_eager() -> None:
         nw.Series.from_iterable("", [1, 2, 3], backend=backend)
 
 
-def test_series_from_iterable_numpy_not_1d(eager_backend: EagerAllowed) -> None:
+def test_series_from_iterable_numpy_not_1d(eager_implementation: EagerAllowed) -> None:
     pytest.importorskip("numpy")
     import numpy as np
 
     with pytest.raises(ValueError, match="only.+1D numpy arrays"):
-        nw.Series.from_iterable("", np.array([[0], [2]]), backend=eager_backend)
+        nw.Series.from_iterable("", np.array([[0], [2]]), backend=eager_implementation)
 
 
-def test_series_from_iterable_not_iterable(eager_backend: EagerAllowed) -> None:
+def test_series_from_iterable_not_iterable(eager_implementation: EagerAllowed) -> None:
     with pytest.raises(TypeError, match="iterable.+got.+int"):
-        nw.Series.from_iterable("", 2000, backend=eager_backend)  # type: ignore[arg-type]
+        nw.Series.from_iterable("", 2000, backend=eager_implementation)  # type: ignore[arg-type]

  • I can reproduce ~3s using pytest "tests/series_only/from_iterable_test.py" --numprocesses=1
  • No individual case is consistently slower than others
    • There's just a lot of them! (200ish)πŸ˜…
  • Arbitrarily marking half as slow only brings down the timing to 2.59s

So besides outright removing cases or randomly generating them, the best option would be running with --numprocesses=logical (which I'm guessing you already are?)

Something else we could try is adapting this into a hypothesis test?
I'm thinking we could mark the entire thing as slow, but then use some strategies to randomly sample constructors + DTypes
Maybe as a sub-issue on (#2959) or (#2818) as a follow-up?

It could be good to use this as a chance to get wider coverage - in addition to just making the testing feedback loop shorter πŸ™‚
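A rough, untested sketch of that hypothesis idea, just to make it concrete. The constructor/dtype stand-ins, the polars backend, and the exact from_iterable signature are assumptions here:

import hypothesis.strategies as st
from hypothesis import given

import narwhals as nw

# Stand-ins for the INTO_ITER constructors in from_iterable_test.py.
CONSTRUCTORS = [list, tuple, iter]
DTYPES = [None, nw.Int64, nw.Float64]


@given(
    into_iter=st.sampled_from(CONSTRUCTORS),
    dtype=st.sampled_from(DTYPES),
    values=st.lists(st.integers(min_value=-100, max_value=100), min_size=1, max_size=5),
)
def test_series_from_iterable_sampled(into_iter, dtype, values) -> None:
    # Randomly sample constructor + dtype combinations instead of
    # exhaustively parametrizing every pairing.
    result = nw.Series.from_iterable("a", into_iter(values), dtype, backend="polars")
    assert len(result) == len(values)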

@dangotbanned

This comment was marked as resolved.

Comment on lines 179 to 183
iterable = cast("Sequence[Any]", data)
native = pl.Series(name=name, values=iterable)
if dtype:
    native = native.cast(narwhals_to_native_dtype(dtype, version))
return cls.from_native(native, context=context)
Member Author


I'm thinking this change is what's causing #2933 (comment)

Member Author

@dangotbanned dangotbanned Aug 12, 2025


Well great, (revert: try not downcasting polars) did indeed fix the tubular CI

But there's some version branching needed for at least polars==0.20.21

Member Author


@MarcoGorelli I think this was fixed in this polars PR you reviewed

If so, should I re-add the cast fix I had for polars<0.20.26 only - or is an xfail preferred?

Member Author


In (fix: backport respect dtype, optimize branching) - I'm taking the view that Series.from_iterable/nw.new_series is a critical entrypoint.

So the aim is to both:

  • Make sure it has wide version-compatibility
  • Add as little overhead as possible, especially on the happy path

Member

@FBruzzesi FBruzzesi left a comment


Thanks @dangotbanned - I left only one comment regarding the tests run time.

One question I have (which might be written somewhere but I missed it): are we planning to replace the code in new_series with this eventually? Everything is the same as _new_series_impl, except the check for an UNKNOWN implementation, which I commented on below.

Comment on lines +189 to +194
else:  # pragma: no cover
    if (not SERIES_ACCEPTS_PD_INDEX) and is_pandas_index(values):
        values = values.to_series()
    native = pl.Series(name, values)
    if dtype_pl:
        native = native.cast(dtype_pl)
Member


Oh boy 😩

Member Author


I tried something new here when thinking about (#2933 (comment)), which might not be visible from the diff alone

  • Keeps the compat code fast (we evaluate this once)
  • Doesn't clutter the impl with comments
  • Can use markdown to explain the problem + link to issue/PR
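For readers following along, a minimal standalone sketch of the pattern being described: a module-level flag evaluated once at import time, documented with a markdown-capable docstring. The flag name matches the snippet above, but the parsing helper and the exact version bound are guesses:

import polars as pl


def _parse_version(version: str) -> tuple[int, ...]:
    # Simplistic stand-in for whatever version parsing the codebase already has.
    return tuple(int(part) for part in version.split(".")[:3] if part.isdigit())


# Evaluated once at import time, so the constructor's happy path pays no cost.
SERIES_ACCEPTS_PD_INDEX: bool = _parse_version(pl.__version__) >= (0, 20, 26)
"""Whether `pl.Series(name, values)` accepts a `pandas.Index` directly.

Being a docstring rather than an inline comment, this can hold markdown and a
link to the upstream issue/PR without cluttering the constructor itself.
"""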

Member


Trying to find a compromise here on Marco's comment about these tests running for too long. One idea that cuts them in half is to use only eager_implementation: at the end of the day it does not really matter how we instantiate the Implementation, and using eager_backend just duplicates the number of tests.

If anything, we should test somewhere else that Implementation.from_backend(Implementation.PANDAS) and Implementation.from_backend("pandas") return the same implementation. Once that is guaranteed (which I am pretty sure it is), we can run half of these tests.
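A tiny sketch of that suggested equivalence test (the import path is taken from the snippet later in this thread; otherwise illustrative):

from narwhals._utils import Implementation


def test_from_backend_equivalent() -> None:
    # The enum member and its string spelling should resolve identically,
    # so parametrizing constructor tests over both adds no extra coverage.
    assert Implementation.from_backend("pandas") == Implementation.from_backend(
        Implementation.PANDAS
    )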

Member Author


Thanks @FBruzzesi

Did you know that I've been doing that in this PR since (efc5c4e)?

I added eager_implementation in this PR (see diff), and I agree that we should use it more elsewhere πŸ™‚

Member


My bad! There is still one use of eager_backend but it's for a small number of tests anyway

Member Author


No worries!

Btw, I am open to reducing the test runtime and investigated a little in (#2933 (comment))

My preference is to defer this to another issue, but if you see it as blocking then I can work on it here?

@dangotbanned
Member Author

are we planning to replace the code in new_series with this eventually?

Absolutely!

(#2933 (comment)) should cover most of it

I guess the plan in my head is:

  1. Add all of (API: io functions for v2 #2116), which only contains the working parts
  2. Address (Establish safe patterns using Implementation.UNKNOWN #2786), which will be even easier after (feat(typing): Add IntoBackend alias #2971) πŸ˜‰
  3. Migrate the fixed nw.new_series, nw.from_numpy, etc. code into these new constructors

So we'll then have this and the extension point will work everywhere πŸ₯³

from typing import Any

from narwhals._utils import Version, Implementation
from narwhals.typing import IntoBackend, IntoDType
from narwhals.series import Series

some_version = Version.MAIN


def new_series(
    name: str,
    values: Any,
    dtype: IntoDType | None = None,
    *,
    backend: IntoBackend,  # <-- or whatever we end up annotating
) -> Series[Any]:
    return some_version.series.from_iterable(name, values, dtype, backend=backend)
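For reference, a quick usage example of the constructor this PR adds, mirroring the parametrized tests in the diff above; the positional dtype and string backend follow the sketch, so treat the exact signature as assumed:

import narwhals as nw

# Build an eager Series straight from a plain iterable, optionally casting.
s = nw.Series.from_iterable("a", (4, 1, 2), nw.Int32, backend="polars")
assert s.dtype == nw.Int32
assert s.to_list() == [4, 1, 2]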

@FBruzzesi
Member

@dangotbanned @MarcoGorelli what's left in this PR? Is the concern only about test speed? Marco, maybe run it once again while not live streaming; I'd expect the results to be much different πŸ˜‚

Member

@MarcoGorelli MarcoGorelli left a comment


thanks both, happy to ship it!

@dangotbanned
Member Author

Thank you both for getting this over the line

this narwhal loves you

@dangotbanned dangotbanned merged commit f57991c into main Aug 17, 2025
31 checks passed
@dangotbanned dangotbanned deleted the series-from-iterable branch August 17, 2025 16:26
dangotbanned added a commit that referenced this pull request Aug 17, 2025
dangotbanned added a commit that referenced this pull request Aug 18, 2025
@MarcoGorelli MarcoGorelli removed the fix label Aug 20, 2025