feat: Add narwhals.struct top level function
#3261
base: main
Conversation
So far this is what this PR does, I'll attempt polars/arrow next:

```python
df_native_pd = pd.DataFrame({
    "a": [1, 2, 3],
    "b": ["x", "y", "z"],
    "c": [True, False, True],
})
df_pd = nw.from_native(df_native_pd)
df_struct_pd = df_pd.select(nw.concat_struct([nw.col("a"), nw.col("b"), nw.col("c")]).alias("t"))
```

What I have not yet figured out is where to place the imports, nor where to add unit tests apart from the doctests.
narwhals/_pandas_like/namespace.py
Outdated
```python
import pandas as pd  # TODO: where pd.ArrowDtype should come from?
import pyarrow.compute as pc  # TODO: where to put this import?
```
Where should these imports go? Is `ArrowDtype` available through `self`?
As a reference, something like the following would be the preferred way:
narwhals/narwhals/_pandas_like/utils.py
Lines 553 to 560 in 01aab21
```python
if isinstance_or_issubclass(dtype, dtypes.Date):
    try:
        import pyarrow as pa  # ignore-banned-import
    except ModuleNotFoundError as exc:
        # BUG: Never re-raised?
        msg = "'pyarrow>=13.0.0' is required for `Date` dtype."
        raise ModuleNotFoundError(msg) from exc
    return "date32[pyarrow]"
```
At this point, we also get these results for polars df and arrow tables:

Polars:

```python
df_native_pl = pl.DataFrame({
    "a": [1, 2, 3],
    "b": ["x", "y", "z"],
    "c": [True, False, True],
})
df_pl = nw.from_native(df_native_pl)
df_struct_pl = df_pl.select(nw.concat_struct([nw.col("a"), nw.col("b"), nw.col("c")]).alias("t"))
```

Arrow:

```python
table_native_pa = pa.table({
    "a": [1, 2, 3],
    "b": ["x", "y", "z"],
    "c": [True, False, True],
})
df_pa = nw.from_native(table_native_pa)
df_struct_pa = df_pa.select(nw.concat_struct([nw.col("a"), nw.col("b"), nw.col("c")]).alias("t"))
```
@msalvany I think some wires may have been crossed 😅 This feature is narwhals.struct (#3247).
Hi @dangotbanned. I see that the original issue is narwhals.struct (#3247).
I have started with the tests. I see that there are more backends than pandas, polars and arrow. Should we also implement the missing ones?
Hey @msalvany - thanks for the contribution 🚀 As a little side note/to expand a bit more on Dan's comment - we try to mirror the polars API, therefore we will aim to have `nw.struct` to mirror `pl.struct`. In a similar way, `nw.concat_str` mirrors `pl.concat_str`. However:
Regarding other backends:
For now you can start by xfailing them in the tests. I can see you are already xfailing certain polars versions, so you can do something along the following lines:

```diff
 def test_dryrun(constructor: Constructor, *, request: pytest.FixtureRequest) -> None:
     if "polars" in str(constructor) and POLARS_VERSION < (1, 0, 0):
         # nth only available after 1.0
         request.applymarker(pytest.mark.xfail)
+    if any(x in str(constructor) for x in ("dask", "duckdb", "ibis", "pyspark", "sqlframe")):
+        reason = "Not supported/not implemented"
+        request.applymarker(pytest.mark.xfail(reason=reason))
```

and in those backend namespaces you can add a placeholder that marks `struct` as not implemented. I hope it helps! Let's get pandas, polars and pyarrow in first, and then we can iterate for the others 🤞🏼
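As a purely illustrative example of such a placeholder in one of those backend namespaces (the class name and message are a sketch, not the actual narwhals code or convention):

```python
class DuckDBNamespace:  # illustrative stand-in for a backend namespace
    def struct(self, *exprs):
        # Placeholder until a real implementation lands for this backend;
        # the xfail marks above keep the common tests green in the meantime.
        msg = "`struct` is not yet implemented for the DuckDB backend."
        raise NotImplementedError(msg)
```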
Hi, thanks for the clarification @FBruzzesi, I totally get it now! I have changed all `concat_struct` references to `struct`.
Hey @msalvany first and foremost, thanks for updating the PR - it looks close to the finish line 🙏🏼 I have a few comments, especially regarding tests:
Changed the title to: feat: Add narwhals.struct top level function
thanks all! just a comment on the other backends -

we should at least verify that this operation is feasible for spark/duckdb. fortunately, in this case, it looks like it's easily done with `struct_pack`:

```python
In [35]: rel = duckdb.sql("select * from values (1,4,0),(1,5,1),(2,6,2) df(a,b,i)")

In [36]: rel
Out[36]:
┌───────┬───────┬───────┐
│   a   │   b   │   i   │
│ int32 │ int32 │ int32 │
├───────┼───────┼───────┤
│     1 │     4 │     0 │
│     1 │     5 │     1 │
│     2 │     6 │     2 │
└───────┴───────┴───────┘

In [37]: rel.select('a', 'b', 'i', duckdb.FunctionExpression('struct_pack', 'a', 'b'))
Out[37]:
┌───────┬───────┬───────┬──────────────────────────────┐
│   a   │   b   │   i   │      struct_pack(a, b)       │
│ int32 │ int32 │ int32 │ struct(a integer, b integer) │
├───────┼───────┼───────┼──────────────────────────────┤
│     1 │     4 │     0 │ {'a': 1, 'b': 4}             │
│     1 │     5 │     1 │ {'a': 1, 'b': 5}             │
│     2 │     6 │     2 │ {'a': 2, 'b': 6}             │
└───────┴───────┴───────┴──────────────────────────────┘
```

in pyspark it looks like it's just `struct`
Hello @MarcoGorelli, I'm going to use your example here to ask if the output we expect after `nw.struct()` is a new column containing the struct inside the original dataframe (as you showed here), or rather a new independent df with a single column containing the struct. If I understand this right, what `polars.struct()` generates is the second option, but I might be mistaken. So far, this is what I was mimicking, just let me know if it should be changed. Thanks!
this depends on whether you use `select` or `with_columns`
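For illustration, a minimal polars sketch of that distinction (narwhals aims to mirror these `select`/`with_columns` semantics):

```python
import polars as pl

df = pl.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

# select keeps only what you ask for: a single-column frame holding the struct.
only_struct = df.select(pl.struct("a", "b").alias("t"))
print(only_struct.columns)  # ['t']

# with_columns appends the struct as an extra column of the original frame.
with_struct = df.with_columns(pl.struct("a", "b").alias("t"))
print(with_struct.columns)  # ['a', 'b', 't']
```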
I simply tested the pyspark `struct` function:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import struct

data = [(1, 4, 0), (1, 5, 1), (2, 6, 2)]
columns = ["a", "b", "i"]
spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(data, columns)
df_with_struct = df.select("a", "b", "i", struct("a", "b").alias("struct_col"))
df_with_struct.show(truncate=False)
```
```python
values = df[col].tolist()
non_null_values = [v for v in values if not pd.isna(v)]
```
quick note that tolist and iterating over values in Python isn't allowed here, as it's very inefficient - you'll need to look for a way to do this using the pandas api
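A rough sketch of the kind of vectorised alternative being suggested - the frame and column name here are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({"a": [1.0, None, 3.0]})
col = "a"

# Stay within the pandas API instead of materialising a Python list:
# dropna() removes the nulls without an element-by-element loop,
non_null_values = df[col].dropna()

# and notna() gives the equivalent boolean mask in one vectorised call.
mask = df[col].notna()
```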
What type of PR is this? (check all applicable)
Related issues
narwhals.struct #3247
Checklist
If you have comments or can explain your changes, please do so below
TODO: