feat: Allow nested structures in lit#3424

Merged
FBruzzesi merged 15 commits into main from feat/allow-lit-nested
Jan 31, 2026

Conversation

@FBruzzesi
Member

@FBruzzesi FBruzzesi commented Jan 27, 2026

Description

I thought this was going to be more complex for more backends. The only troublesome one is, as you might expect, pandas. That said, I am quite OK with how it turned out.

Important

  1. It's totally possible that there is a simpler way to achieve it
  2. It uses pd.arrays.ArrowExtensionArray which is marked as experimental

TODO:

What type of PR is this? (check all applicable)

  • 💾 Refactor
  • ✨ Feature
  • 🐛 Bug Fix
  • 🔧 Optimization
  • 📝 Documentation
  • ✅ Test
  • 🐳 Other

Related issues

Checklist

  • Code follows style guide (ruff)
  • Tests added
  • Documented the changes

@FBruzzesi FBruzzesi added the enhancement New feature or request label Jan 27, 2026
Comment on lines +668 to +669
@no_type_check
def broadcast_series_to_index(
Member Author

@dangotbanned I hope you can forgive me one day!

As commented in the description, I added all the relevant methods and properties, yet, as in #3398, the implementation_test.py fails for Modin

Member

@dangotbanned dangotbanned Jan 30, 2026

Combining these guys should work 😄

It uses pd.arrays.ArrowExtensionArray which is marked as experimental

I wouldn't worry about pandas warning about experimental.

They've had that since introducing pyarrow stuff in (1.5.*), which was over 3 years ago

Member Author

Amazing! Thanks a ton @dangotbanned - applied all the suggestions in 8cf634b

I left one TODO regarding whether or not we should pass copy=False to pd.array

@FBruzzesi FBruzzesi added the nested data `list`, `struct`, etc label Jan 29, 2026
@FBruzzesi FBruzzesi marked this pull request as ready for review January 29, 2026 19:08
Member

@MarcoGorelli MarcoGorelli left a comment

seems reasonable, thanks @FBruzzesi !

Comment on lines +107 to +113
# Use ArrowExtensionArray to avoid pandas unpacking the nested structure
ns = self._implementation.to_native_namespace()
pandas_series_native = ns.Series(
    pd.arrays.ArrowExtensionArray(pa_array),  # type: ignore[attr-defined]
    name="literal",
    index=df._native_frame.index[0:1],
)
Member

I think the wrapping part of (#3424 (comment)) should be reused here.

Maybe broadcast_series_to_index is too specific of a function?

The two more useful parts IMO are:

  • repeat
  • something related to reconstruction?

Member

@dangotbanned dangotbanned Jan 30, 2026

(#3424 (comment))

Okay yeah so what I'm thinking is a new constructor (or two?) on PandasLikeSeries might work?

All of these are parts of a constructor for both native & compliant, and they're spread across 4 modules.

def _extract_comparand(self, other: PandasLikeSeries) -> pd.Series[Any]:

def lit(self, value: PythonLiteral, dtype: IntoDType | None) -> PandasLikeExpr:
def _lit_pandas_series(df: PandasLikeDataFrame) -> PandasLikeSeries:

@classmethod
def _align_full_broadcast(cls, *series: Self) -> Sequence[Self]:
Series = series[0].__native_namespace__().Series

def broadcast_series_to_index(

Comment on lines +263 to +267
def xfail_if_pyspark_connect(  # pragma: no cover
    constructor: Constructor, request: pytest.FixtureRequest, reason: str = ""
) -> None:
    if is_pyspark_connect(constructor):
        request.applymarker(pytest.mark.xfail(reason=reason))
Member

Suggested change
  def xfail_if_pyspark_connect(  # pragma: no cover
      constructor: Constructor, request: pytest.FixtureRequest, reason: str = ""
  ) -> None:
-     if is_pyspark_connect(constructor):
-         request.applymarker(pytest.mark.xfail(reason=reason))
+     request.applymarker(pytest.mark.xfail(is_pyspark_connect(constructor), reason=reason))

@dangotbanned dangotbanned self-requested a review January 30, 2026 19:14
@dangotbanned dangotbanned marked this pull request as draft January 30, 2026 19:24
@dangotbanned
Member

dangotbanned commented Jan 30, 2026

Note

Marked as draft while I fixed the typing (#3424 (comment))
(69abb91)

- Fixed typo
- Skip using `qualified_type_name` when we know it would be `builtins` (and get stripped anyway)
@dangotbanned dangotbanned marked this pull request as ready for review January 30, 2026 20:25
Member

@dangotbanned dangotbanned left a comment

Thanks @FBruzzesi!

Only a few suggestions from me, nice work 😎

applause narwhal

)

- def lit(self, value: NonNestedLiteral, dtype: IntoDType | None) -> ArrowExpr:
+ def lit(self, value: PythonLiteral, dtype: IntoDType | None) -> ArrowExpr:
Member

Well, isn't this a nice diff 😄

return self._expr._from_elementwise_horizontal_op(func, *exprs)

- def lit(self, value: Any, dtype: IntoDType | None) -> IbisExpr:
+ def lit(self, value: PythonLiteral, dtype: IntoDType | None) -> IbisExpr:
Member

Oh nice, no more Any too!

@FBruzzesi
Member Author

Thanks both for your reviews! I am going to merge so that it can make it into Monday's release (yes, it's time to make a new release!!!)

@dangotbanned regarding #3424 (comment), we can follow up with a refactor. I am not 100% sure of what you have in mind 😂 so feel free to open a PR for it

@FBruzzesi FBruzzesi merged commit b9b04d0 into main Jan 31, 2026
36 of 37 checks passed
@FBruzzesi FBruzzesi deleted the feat/allow-lit-nested branch January 31, 2026 10:25

Labels

enhancement New feature or request nested data `list`, `struct`, etc

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support nested structure in nw.lit

3 participants