@felixgwilliams
Contributor

@felixgwilliams felixgwilliams commented Sep 20, 2025

What type of PR is this? (check all applicable)

  • 💾 Refactor
  • ✨ Feature
  • 🐛 Bug Fix
  • 🔧 Optimization
  • 📝 Documentation
  • ✅ Test
  • 🐳 Other

Related issues

Checklist

  • Code follows style guide (ruff)
  • Tests added
  • Documented the changes

If you have comments or can explain your changes, please do so below

This PR adds a from_dicts function and corresponding methods that create a data frame from a sequence of dicts, each representing one row. It is available for eager backends.
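
To make the row-oriented input concrete, here is a minimal pure-Python sketch of pivoting a sequence of row dicts into a column-oriented mapping. `rows_to_columns` is a hypothetical helper for illustration only, not the function added in this PR, and filling missing keys with `None` is an assumption of the sketch.

```python
from __future__ import annotations

from typing import Any, Mapping, Sequence


def rows_to_columns(rows: Sequence[Mapping[str, Any]]) -> dict[str, list[Any]]:
    """Pivot row dicts into a column-oriented dict (hypothetical helper)."""
    # Collect column names in first-seen order across all rows.
    names: dict[str, None] = {}
    for row in rows:
        for key in row:
            names.setdefault(key, None)
    # Missing keys become None (an assumption of this sketch).
    return {name: [row.get(name) for row in rows] for name in names}


rows = [{"a": 1, "b": "x"}, {"a": 2, "b": "y"}]
columns = rows_to_columns(rows)
print(columns)  # {'a': [1, 2], 'b': ['x', 'y']}
```

Empty input yields an empty mapping here, which corresponds to a frame with no rows and no columns.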

Tracking

@felixgwilliams felixgwilliams marked this pull request as ready for review September 20, 2025 22:11
@FBruzzesi FBruzzesi changed the title from "Feat/from dicts" to "feat: Add support for {nw, DataFrame}.from_dicts" Sep 21, 2025
@FBruzzesi FBruzzesi added the enhancement ("New feature or request") label Sep 21, 2025
Member

@FBruzzesi FBruzzesi left a comment

Thanks @felixgwilliams this looks promising and already close to the finish line.

I left a few comments, arguably nitpicks.

A couple of things that are missing:

  • Exposing the functionalities in stable.v1
  • A test checking that empty data with no schema results in a shape=(0, 0) dataframe.

The same as:

def test_from_dict_empty(eager_backend: EagerAllowed) -> None:
    result = nw.DataFrame.from_dict({}, backend=eager_backend)
    assert result.shape == (0, 0)

- expose from_dicts in v1
change docstrings for consistency
- add test for empty data without a schema and get it running for polars
- remove native namespace argument so that tests pass
- Not done: replace Sequence[dict[str, Any]] with Sequence[dict[str, PythonLiteral]]
@FBruzzesi
Member

Thanks for all the adjustments @felixgwilliams - I would hold off for a moment while waiting for an opinion on #3148 (comment); aside from that, the feature seems ready 👌🏼

@dangotbanned

This comment was marked as resolved.

@felixgwilliams

This comment was marked as resolved.

@dangotbanned

This comment was marked as resolved.

dangotbanned added a commit to dangotbanned/polars that referenced this pull request Sep 23, 2025
Closes pola-rs#24583

Downstream in `narwhals`, we discovered the typing wasn't updated alongside the runtime support added in `1.30.0`

### Related
- pola-rs#22638
- pola-rs#19322
- narwhals-dev/narwhals#3148 (comment)
- narwhals-dev/narwhals#3148 (comment)
@felixgwilliams

This comment was marked as resolved.

- `pl.DataFrame(schema)` defaults to `None`
- Having a single branch for empty `data`, means we can safely index into it
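
The second point of that commit message can be illustrated with a small pure-Python sketch: once empty `data` is handled in one early branch, later code can index `data[0]` without a guard. `column_names` is a hypothetical helper, and falling back to the first row's keys is an assumption of this sketch rather than the PR's actual logic.

```python
from __future__ import annotations

from typing import Any, Mapping, Sequence


def column_names(
    data: Sequence[Mapping[str, Any]], schema: Mapping[str, Any] | None = None
) -> list[str]:
    """Return column names for row-dict input (hypothetical helper)."""
    if not data:
        # Single branch for empty input: names come from the schema, if any.
        return list(schema) if schema is not None else []
    if schema is not None:
        return list(schema)
    # Safe: the empty case returned above, so data[0] exists.
    return list(data[0])
```

With no data and no schema the helper returns no columns, which matches the shape=(0, 0) case requested in the review.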
@felixgwilliams
Contributor Author

I noticed that when the schemas of the dicts are not consistent and schema is not specified, the behavior differs between backends: PyArrow only looks at the first row to establish the schema (see here), whereas pandas uses all the rows and polars seems to use the first 100 by default.

Do you think it's worth highlighting the differences in the docstring or do you think "If not specified, the schema will be inferred by the native library." covers it? I think it's surprising enough to be worth mentioning, but I'm conscious of not wanting to be too verbose.
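
A toy model of the difference, assuming only what the comment above states about how many rows each library inspects. `inferred_columns` is a hypothetical helper and calls none of the libraries; it models which rows are inspected for column names, not how each library then treats unexpected keys or types.

```python
from __future__ import annotations

from typing import Any, Mapping, Sequence


def inferred_columns(
    rows: Sequence[Mapping[str, Any]], infer_rows: int | None
) -> list[str]:
    """Column names seen in the first `infer_rows` rows (None means all rows)."""
    window = rows if infer_rows is None else rows[:infer_rows]
    names: dict[str, None] = {}
    for row in window:
        for key in row:
            names.setdefault(key, None)
    return list(names)


# 150 rows; key "b" first appears at index 100, i.e. the 101st row.
rows = [{"a": i} if i < 100 else {"a": i, "b": i} for i in range(150)]

first_row_only = inferred_columns(rows, infer_rows=1)  # PyArrow-like
first_hundred = inferred_columns(rows, infer_rows=100)  # polars-like default
all_rows = inferred_columns(rows, infer_rows=None)  # pandas-like
```

In this model only the all-rows strategy discovers the late column `b`.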

@dangotbanned
Member

I noticed that when the schemas of the dicts are not consistent and schema is not specified, the behavior is different

  • pyarrow only looks at the first row to establish the schema (see here)
  • pandas uses all the rows
  • polars uses the first 100 by default.

Well spotted!

Do you think it's worth highlighting the differences in the docstring or do you think

"If not specified, the schema will be inferred by the native library."

covers it?
I think it's surprising enough to be worth mentioning, but I'm conscious of not wanting to be too verbose.

Agreed, this does seem like it would be helpful to document, considering:

Someone will get burned by this eventually 😳

What to do?

Docs

Although we could specify these differences in the docstring of from_dicts, I suspect we might also see the same kind of differences in:

If that's the case, then we could benefit from some narrative docs in the user guide.
That way we could do side-by-side comparisons without clogging up the docstrings 😅

Note

Might be best to split that out into a follow-up issue?

@dangotbanned
Member

Tests

I noticed that when the schemas of the dicts are not consistent and schema is not specified, the behavior is different

Add tests for this then 😄

Each should have data with a schema change in row:

  1. 2 (xfail pyarrow)
  2. 99 (xfail pyarrow)
  3. 101 (xfail pyarrow, polars)
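
As a rough sketch of how such test data could be generated, the helper below builds rows in which a new key first appears at a given 1-indexed row, then derives which backends would miss it. All names are hypothetical, and the rows-inspected figures are taken from the discussion earlier in this thread rather than from the libraries themselves.

```python
from __future__ import annotations

from typing import Any


def rows_with_change(change_row: int, n_rows: int = 120) -> list[dict[str, Any]]:
    """Rows where key "b" first appears at 1-indexed row `change_row`."""
    return [
        {"a": i, "b": float(i)} if i >= change_row else {"a": i}
        for i in range(1, n_rows + 1)
    ]


def change_is_seen(change_row: int, rows_inspected: int | None) -> bool:
    """True if inspecting the first `rows_inspected` rows reveals key "b"."""
    rows = rows_with_change(change_row)
    window = rows if rows_inspected is None else rows[:rows_inspected]
    return any("b" in row for row in window)


# Rows inspected per backend, as described earlier in this thread.
INSPECTED = {"pyarrow": 1, "polars": 100, "pandas": None}

expected_xfails = {
    change_row: [
        name for name, n in INSPECTED.items() if not change_is_seen(change_row, n)
    ]
    for change_row in (2, 99, 101)
}
print(expected_xfails)  # {2: ['pyarrow'], 99: ['pyarrow'], 101: ['pyarrow', 'polars']}
```

The derived mapping reproduces the xfail list above, which is a quick sanity check on the chosen row numbers.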

I'd recommend using this fixture instead of eager_backend, but in this case it's because the input data size is larger than normal

narwhals/tests/conftest.py

Lines 320 to 323 in 63c5022

@pytest.fixture(params=[el for el in TEST_EAGER_BACKENDS if not isinstance(el, str)])
def eager_implementation(request: pytest.FixtureRequest) -> EagerAllowed:
    """Use if a test is heavily parametric, skips `str` backend."""
    return request.param  # type: ignore[no-any-return]

@felixgwilliams
Contributor Author

Add tests for this then 😄

I'll write the tests this evening. Thanks for the pointers 👍.

As for the docstrings, are we happy with leaving them as is and covering the issue separately? Otherwise I wrote some bullet points explaining the differences that I haven't committed yet. I don't really have a good feeling for what is too long for a narwhals docstring.

@dangotbanned
Member

dangotbanned commented Sep 25, 2025

Otherwise I wrote some bullet points explaining the differences that I haven't committed yet. I don't really have a good feeling for what is too long for a narwhals docstring.

I'm not 100% sure if you need more permissions for it, but I'd normally use a suggestion (step 7) for that kinda thing

I'm happy to take a look

@dangotbanned
Member

@felixgwilliams sorry for the delay, I'm hoping to review this later today 🙏

Member

@dangotbanned dangotbanned left a comment

Thanks for the PR @felixgwilliams

I've only got a few minor suggestions - I think we're pretty much good to go

praise be narwhal

@felixgwilliams
Contributor Author

Thanks @dangotbanned for your helpful suggestions. I'll be sure to remember the tip about comments being part of the code.

Member

@dangotbanned dangotbanned left a comment

Thanks again @felixgwilliams, welcome aboard 😉

cat-ptain on a narwhal

@dangotbanned dangotbanned changed the title from "feat: Add support for {nw, DataFrame}.from_dicts" to "feat: Add {nw,DataFrame}.from_dicts" Sep 29, 2025
@dangotbanned dangotbanned merged commit 25447d3 into narwhals-dev:main Sep 29, 2025
28 of 31 checks passed
@felixgwilliams felixgwilliams deleted the feat/from_dicts branch September 30, 2025 17:21
Labels

eager-only enhancement New feature or request

Development

Successfully merging this pull request may close these issues.

[Enh]: Add nw.from_dicts to convert a sequence of dictionaries representing rows to a data frame

3 participants