ARROW-10151: [Python] Add support for MapArray conversion to Pandas #8337
Ping @pitrou @jorisvandenbossche to please take a look, thanks!
pitrou
left a comment
Thanks for doing this. Here are a couple comments.
No need to make this inline.
Why incref here?
Isn't the line below adding another reference to that object?
Hmm, right, it should be ok.
Why do you need this? I would expect py_items to contain a None entry already...?
Good point, let me try that
Actually, this doesn't seem to quite work. The backing array is NumPy, so if the dtype is object, it will return None, but if numeric, it will be nan. I think we want the conversion to have None for all null values, right?
Ah, ok, thank you.
Is this required? I would expect the above to be always successful?
Hmm, I'm not an expert on the Python/NumPy C APIs, so I was following ConvertStruct here. I thought that PyArray_GETITEM might set an error, but it looks like it would return null on failure. Is it possible the reset of the item_value PyObject could cause an error?
Ah, I thought PyArray_GETITEM was a simple macro but it seems more involved. So it can return an error indeed.
Also, you'll want to rebase.
28fb789 to 2ac0282
Thanks for reviewing @pitrou, I updated but still had a couple questions and I'm not totally sure I got the reference counting right. I'll have to take a closer look tomorrow.
*out_values = list_item.detach(), simply.
Ah yes, thanks for fixing that up
2ac0282 to 090177e
Will merge if CI is ok, thank you.
Thanks for all the help @pitrou!
This change adds conversion for a pyarrow.MapArray to Pandas as a column of lists of tuples, where each tuple is a key/item pair. Unit tests were added for Python to verify conversion for Pandas round-trip, chunked arrays and MapArray with NULL map and NULL items.
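For illustration, here is a rough plain-Python sketch of the row shape described above. It is not the C++ code from this PR. The function name `map_rows_as_tuples` and the `valid` list are hypothetical, and the inputs only mimic Arrow's offsets/keys/items layout for map arrays. It shows a NULL map becoming None and a NULL item staying None inside its tuple, as discussed in the review.

```python
from typing import Any, List, Optional, Sequence, Tuple


def map_rows_as_tuples(
    offsets: Sequence[int],
    keys: Sequence[Any],
    items: Sequence[Any],
    valid: Sequence[bool],
) -> List[Optional[List[Tuple[Any, Any]]]]:
    """Hypothetical helper: build one list of (key, item) tuples per map row."""
    rows: List[Optional[List[Tuple[Any, Any]]]] = []
    for i, is_valid in enumerate(valid):
        if not is_valid:
            # A NULL map is represented as None for the whole row.
            rows.append(None)
            continue
        start, end = offsets[i], offsets[i + 1]
        # NULL items are kept as None inside the tuple.
        rows.append(list(zip(keys[start:end], items[start:end])))
    return rows


# Three map rows: two entries, a NULL map, and one entry.
offsets = [0, 2, 2, 3]
keys = ["a", "b", "c"]
items = [1, None, 3]
valid = [True, False, True]

rows = map_rows_as_tuples(offsets, keys, items, valid)
print(rows)
```

Each element of `rows` corresponds to one cell of the resulting Pandas column.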