-
Notifications
You must be signed in to change notification settings - Fork 172
feat: Add {nw,DataFrame}.from_dicts
#3148
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. Weβll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add {nw,DataFrame}.from_dicts
#3148
Conversation
86400b3 to
0e49901
Compare
{nw, DataFrame}.from_dicts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @felixgwilliams this looks promising and already close to the finish line.
I left a few comments, arguably nitpicks.
A couple of things that are missing:
- Exposing the functionalities in
stable.v1 - A test to check for both empty data and no schema resulting in a
shape=(0,0)dataframe.
The same as:
def test_from_dict_empty(eager_backend: EagerAllowed) -> None:
result = nw.DataFrame.from_dict({}, backend=eager_backend)
assert result.shape == (0, 0)- expose from_dicts in v1 change docstrings for consistency - add test for empty data without a schema and get it running for polars - remove native namespace argument so that tests pass - Not done: replace Sequence[dict[str, Any]] with Sequence[dict[str, PythonLiteral]]
|
Thanks for all the adjustments @felixgwilliams - I would stall for a moment waiting for an opinion on #3148 (comment), aside that the feature seems ready ππΌ |
β¦ tests for a non-dict mapping
β¦mappings with `from_dicts`
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
Closes pola-rs#24583 Downstream in `narwhals`, we discovered the typing wasn't updated alongside the runtime support added in `1.30.0` ### Related - pola-rs#22638 - pola-rs#19322 - narwhals-dev/narwhals#3148 (comment) - narwhals-dev/narwhals#3148 (comment)
This comment was marked as resolved.
This comment was marked as resolved.
- `pl.DataFrame(schema)` defaults to `None` - Having a single branch for empty `data`, means we can safely index into it
Co-authored-by: Dan Redding <[email protected]>
I should really update the others soon
Seems a few of these were missed in narwhals-dev#2981
Will add an IDE preview
The ids look like `"polars0-dict"`, `"pyarrow0-mappingproxy"`
|
I noticed that when the schemas of the dicts is not consistent and schema is not specified, the behavior is different when using the PyArrow backend, because PyArrow only looks at the first row to establish the (see here), whereas pandas uses all the rows and polars seems to use the first 100 by default. Do you think it's worth highlighting the differences in the docstring or do you think "If not specified, the schema will be inferred by the native library." covers it? I think it's surprising enough to be worth mentioning, but I'm conscious of not wanting to be too verbose. |
Co-authored-by: Dan Redding <[email protected]>
Well spotted!
Agreed, this does seem like it would be helpful to document, considering:
Someone will get burned by this eventually π³ What to do?DocsAlthough we could specify these differences in the docstring of
If that's the case, then we could benefit from some narrative docs in the user guide. Note Might be best to split that out into a follow-up issue? |
Tests
Add tests for this then π Each should have
I'd recommend using this fixture instead of Lines 320 to 323 in 63c5022
|
I'll write the tests this evening. Thanks for the pointers π. As for the docstrings, are we happy with leaving them as is and covering the issue separately? Otherwise I wrote some bullet points explaining the differences that I haven't committed yet. I don't really have a good feeling for what is too long for a narwhals docstring. |
I'm not 100% sure if you need more permissions for it, but I'd normally use a suggestion (step 7) for that kinda thing I'm happy to take a look |
β¦ctionary keys in from_dicts
|
@felixgwilliams sorry for the delay, I'm hoping to review this later today π |
dangotbanned
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR @felixgwilliams
I've only got a few minor suggestions - I think we're pretty much good to go
Co-authored-by: Dan Redding <[email protected]>
|
Thanks @dangotbanned for your helpful suggestions. I'll be sure to remember the tip about comments being part of the code. |
dangotbanned
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks again @felixgwilliams, welcome aboard π
{nw, DataFrame}.from_dicts{nw,DataFrame}.from_dicts

What type of PR is this? (check all applicable)
Related issues
nw.from_dictsto convert a sequence of dictionaries representing rows to a data frameΒ #3145nw.from_dictsto convert a sequence of dictionaries representing rows to a data frameΒ #3145Checklist
If you have comments or can explain your changes, please do so below
This PR adds a
from_dictsfunction and methods, that can be used to create a data frame from a sequence of dicts that represent rows. It is available for eager backends.Tracking
from_dictstoIterable[Mapping[str, Any]]Β pola-rs/polars#24584