Skip to content

Conversation

@BryanCutler
Copy link
Member

This change adds conversion for a pyarrow.MapArray to Pandas as a column of lists of tuples, where each tuple is a key/item pair. Unit tests were added for python to verify conversion for Pandas round-trip, chunked arrays and MapArray with NULL map and NULL items.

@github-actions
Copy link

github-actions bot commented Oct 5, 2020

@BryanCutler
Copy link
Member Author

Ping @pitrou @jorisvandenbossche to please take a look, thanks!

Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this. Here are a couple comments.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to make this inline.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why incref here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the line below adding another reference to that object?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, right, it should be ok.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need this? I would expect py_items to contain a None entry already...?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, let me try that

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, this doesn't seem to quite work. The backing array is numpy so if the dtype is an object, it will return None, but if numeric, it will be nan. I think we want the conversion to have None for all null values right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, ok, thank you.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this required? I would expect the above to be always successful?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I'm not an expert on the python/numpy c-apis, so I was following ConvertStruct here. I thought that PyArray_GETITEM might set an error, but looks like it would return null on failure. Is it possible the reset of the item_value PyObject could cause an error?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I though PyArray_GETITEM was a simple macro but it seems more involved. So it can return an error indeed.

@pitrou
Copy link
Member

pitrou commented Oct 6, 2020

Also, you'll want to rebase.

@BryanCutler
Copy link
Member Author

Thanks for reviewing @pitrou , I updated but still had a couple questions and I'm not totally sure I got the reference counting right. I'll have to take a closer look tomorrow.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*out_values = list_item.detach(), simply.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, thanks for fixing that up

@pitrou pitrou force-pushed the map-type-to-pandas branch from 2ac0282 to 090177e Compare October 7, 2020 09:33
@pitrou
Copy link
Member

pitrou commented Oct 7, 2020

Will merge if CI is ok, thank you.

@pitrou pitrou closed this in 1a2d048 Oct 7, 2020
@BryanCutler BryanCutler deleted the map-type-to-pandas branch October 7, 2020 18:51
@BryanCutler
Copy link
Member Author

Thanks for all the help @pitrou !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants