feat: `ArrowDataFrame.explode` by FBruzzesi · Pull Request #1644 · narwhals-dev/narwhals

FBruzzesi · 2024-12-22T11:38:43Z

What type of PR is this? (check all applicable)

Checklist

- Code follows style guide (ruff)
- Tests added
- Documented the changes

If you have comments or can explain your changes, please do so below

I will leave this as draft until we decide how to move forward.

To summarize the discussion(s) in #1542:

- pyarrow's native methods skip null and empty lists when exploding (they produce no output row)
- the workaround here is a "fast_path" for when no nulls or empty lists are present, and a dedicated path for when they are
- the issue is that the dedicated path enters Python-land to build the index, via one `.to_pylist()` call
- pandas seems to drop into Cython anyway to explode a list


dangotbanned · 2025-03-25T11:04:11Z

@FBruzzesi I feel like this shouldn't have got lost!

`ArrowDataFrame.explode` is one of three remaining implementations we need:

narwhals/narwhals/_arrow/dataframe.py, line 350 in 2bcc6bb:

    explode = not_implemented()

narwhals/narwhals/_arrow/dataframe.py, line 466 in 2bcc6bb:

    join_asof = not_implemented()

I might open a PR for `ArrowDataFrame.clone`, since it can simply rely on Arrow data being immutable:

narwhals/narwhals/_arrow/dataframe.py, line 669 in 2bcc6bb:

    clone = not_implemented()

FBruzzesi · 2025-03-25T11:08:12Z

> I feel like this shouldn't have got lost!

Thanks @dangotbanned ♥️ The main concern was the conversion to Python objects via `filled_counts.to_pylist()` in:

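    if fast_path:
        # No nulls or empty lists: pyarrow's native kernels already match polars
        indices = pc.list_parent_indices(native_frame[to_explode[0]])
        flatten_func = pc.list_flatten

    else:
        # Null and empty lists must still yield one (null) row each, so clamp
        # every count to at least 1 (skip_nulls=True turns a null count into 1)
        filled_counts = pc.max_element_wise(counts, 1, skip_nulls=True)
        # Repeat each row index `count` times: this is the Python round-trip
        indices = pa.array(
            [
                i
                for i, count in enumerate(filled_counts.to_pylist())
                for _ in range(count)
            ]
        )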

dangotbanned · 2025-03-25T11:10:13Z

#1644 (comment)

Maybe we can figure out another path hidden somewhere in the stubs? 🤔

Mentioned in #1644 (comment) #2207

https://results.pre-commit.ci/run/github/760058710/1742905302.AsTci5pETIqquA1eJPcxNQ

`.to_pylist` being called on a scalar is all that is left

dangotbanned · 2025-03-25T13:20:53Z

pola-rs/polars#17664: `Series[list].explode()` should not return `None` for empty lists

@FBruzzesi @MarcoGorelli

It seems polars wants to make a breaking change in the next major version, resulting in the same behavior as pyarrow.

If that behavior were the goal, I think `pc.list_flatten(..., recursive=True)` would get us most of the way there.
Just something to keep in mind for the future 🙂

Just leaving as-is, since this'll probably change in the future #1644 (comment)

> error: Incompatible redefinition (redefinition with type "Callable[[ChunkedArray[ListScalar[Any]]], ChunkedArray[Any]]", original type overloaded function) [misc] https://github.com/narwhals-dev/narwhals/actions/runs/14060304329/job/39369169923?pr=1644

dangotbanned · 2025-12-07T13:21:11Z

@FBruzzesi I'm coming back to this after my recent fiddling with pyarrow 😄
There may still be a way to match polars while staying in pyarrow-land, but ~~I'm only just starting to look now 🤞~~

Updated: Yep, it works! 🥳 (#3347)

Regarding (#1644 (comment)):

Three weeks ago, on that issue (pola-rs/polars#17664 (comment)):

> Note that the option got added to resolve this in #25289.

pola-rs/polars#25289: feat: Add `empty_as_null` and `keep_nulls` flags to `Expr.explode`

So now `polars.DataFrame.explode` has this signature:

    def explode(
        self,
        columns: ColumnNameOrSelector | Iterable[ColumnNameOrSelector],
        *more_columns: ColumnNameOrSelector,
        empty_as_null: bool = True,
        keep_nulls: bool = True,
    ) -> DataFrame:
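As we read the two flags (this is our interpretation of the names, not polars source), their effect can be pinned down with a tiny pure-Python reference: both `True` gives the current polars result, both `False` gives pyarrow's native result. `explode_reference` is hypothetical and returns `(parent_row, value)` pairs.

```python
from typing import Any, List, Optional, Sequence, Tuple


def explode_reference(
    rows: Sequence[Optional[Sequence[Any]]],
    *,
    empty_as_null: bool = True,
    keep_nulls: bool = True,
) -> List[Tuple[int, Any]]:
    """Hypothetical pure-Python reference for the two explode flags."""
    out: List[Tuple[int, Any]] = []
    for i, row in enumerate(rows):
        if row is None:
            if keep_nulls:  # a null list becomes one null row
                out.append((i, None))
        elif len(row) == 0:
            if empty_as_null:  # an empty list becomes one null row
                out.append((i, None))
        else:
            out.extend((i, value) for value in row)
    return out


data = [[1, 2], None, [], [3]]
print(explode_reference(data))
# [(0, 1), (0, 2), (1, None), (2, None), (3, 3)]
print(explode_reference(data, empty_as_null=False, keep_nulls=False))
# [(0, 1), (0, 2), (3, 3)]
```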

Mentioned in (#1644 (comment)) Multi-column coming up next 😄

FBruzzesi and others added 19 commits December 8, 2024 22:46

feat: DataFrame and LazyFrame explode

3061fe9

arrow refactor

2326b08

raise for invalid type and docstrings

32af22e

Update narwhals/dataframe.py

3b52ab5

old versions

c3bf009

merge main

b427e79

Merge branch 'main' into feat/explode-method

c77dc62

almost all native

72314a2

doctest

7f04579

Merge branch 'main' into feat/explode-method

7be326e

Merge branch 'main' into feat/explode-method

5da1ad6

Merge branch 'main' into feat/explode-method

4a098b8

Merge branch 'feat/explode-method' of https://github.com/narwhals-dev… …/narwhals into feat/explode-method

380a6cb

Merge branch 'main' into feat/explode-method

c7a47c9

better error message, fail for arrow with nulls

864e932

doctest-modules

cc72f6b

completely remove pyarrow implementation

1156beb

feat: ArrowDataFrame explode method

03081cb

merge main

8fc8c0a

dangotbanned mentioned this pull request Mar 17, 2025

chore: Spec CompliantLazyFrame #2232

Merged


dangotbanned added the `enhancement` (New feature or request) and `pyarrow` (Issue is related to pyarrow backend) labels Mar 25, 2025

dangotbanned added a commit that referenced this pull request Mar 25, 2025

feat: Add DataFrame.clone for pyarrow

310b080


dangotbanned mentioned this pull request Mar 25, 2025

feat: Add DataFrame.clone for pyarrow #2288

Merged


dangotbanned added 3 commits March 25, 2025 12:21

Merge remote-tracking branch 'upstream/main' into feat/pyarrow-explode

7369925

fix: remove not_implemented

d04fc7d


refactor: move imports

fc79540

dangotbanned added 2 commits March 25, 2025 12:28

chore: use ArrowDataFrame.native

1f1ac63

fix(typing): Resolve most issues

79b8fd4


dangotbanned and others added 3 commits March 25, 2025 13:41

pyright ignore

80fcc02


Merge branch 'main' into feat/pyarrow-explode

8e1e025

dangotbanned added a commit that referenced this pull request Dec 7, 2025

feat: Support ArrowDataFrame.explode([column])

3afdaba


dangotbanned mentioned this pull request Dec 9, 2025

feat(expr-ir): Add {DataFrame,Series}.explode(empty_as_nulls, keep_nulls) #3347

Merged


FBruzzesi wants to merge 27 commits into main from feat/pyarrow-explode
