Skip to content
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
16 changes: 4 additions & 12 deletions lib/iris/pandas.py
Original file line number Diff line number Diff line change
Expand Up @@ -130,26 +130,18 @@ def _as_pandas_coord(coord):

def _assert_shared(np_obj, pandas_obj):
"""Ensure the pandas object shares memory."""
if hasattr(pandas_obj, 'base'):
base = pandas_obj.base
if hasattr(pandas_obj, 'values'):
values = pandas_obj.values
else:
base = pandas_obj[0].base

# Prior to Pandas 0.17, when pandas_obj is a Series, pandas_obj.values
# returns a view of the underlying array, and pandas_obj.base, which calls
# pandas_obj.values.base, returns the underlying array. In 0.17 and 0.18
# pandas_obj.values returns the underlying array, so base may be None even
# if the array is shared.
if base is None:
base = pandas_obj.values
values = pandas_obj[0].values
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry @hdyson, but what is going on here? What are we getting back by getting the first item from the pandas object? What kind of object are you expecting this to be?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A good question. If we go back to the original implementation here:

if hasattr(pandas_obj, 'base'):
    base = pandas_obj.base
else:
    base = pandas_obj[0].base

then it's not clear to me what the if/else is protecting against either (bearing in mind that .base expands to .values.base, so the change in argument to hasattr from base to values has no affect on what pandas_obj refers to). I assumed it had a use, though, so didn't want to discard it blindly.

My interpretation is that assert_shared could be called with an argument of either a single pandas object (dataframe or series) or a sequence of them. The only uses seem to be a single object though, so if you'd prefer for me to discard the if/else, and go to a simpler:

base = pandas_obj.base

then I think we'll at least understand what the code is doing (and hopefully a test will fail if that breaks something).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your suggestion is appealing to be sure. I'm happy to go either way on this. On the whole, I'm pretty happy as is with the existing change. I'd like to get our Travis testing passing again before merging this, but otherwise 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A little code archaeology, and it looks as though it was inherited from when pandas Series and Dataframes had a quite different data structure: 19a5487#diff-53a1661a36736be8a6017847069f7319R117 via a change to support numpy 1.7: 97ca7e9

I'll try killing it - my suspicion is that if it had some utility at one point, it probably doesn't any more.


def _get_base(array):
# Chase the stack of NumPy `base` references back to the original array
while array.base is not None:
array = array.base
return array

base = _get_base(base)
base = _get_base(values)
np_base = _get_base(np_obj)
if base is not np_base:
msg = ('Pandas {} does not share memory'
Expand Down