Merged
Changes from 19 commits
98 commits
914d47b
Update as_data_frame()
hsteptoe Mar 29, 2022
26be4a5
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Mar 29, 2022
5c2fe98
Minor typo fixes
hsteptoe Jul 27, 2022
3bf85b2
Matching data raveling with dimension meshgrid
hsteptoe Jul 29, 2022
a0e280c
Revise test_simple to check long-style dataframe
hsteptoe Jul 29, 2022
d41cbb6
Revise NaN and 1D dataframe tests
hsteptoe Jul 29, 2022
d8c5b40
Better pandas.MultiIndex solution
hsteptoe Aug 2, 2022
03ded10
Add 3D test case
hsteptoe Aug 2, 2022
a46c08a
Fixes for cube with partially defined dims
hsteptoe Aug 2, 2022
2df0af3
Update tests for partially defined dims
hsteptoe Aug 2, 2022
1e0a5c8
Update time tests
hsteptoe Aug 2, 2022
26cbcdf
Reuse _as_pandas_coord()
hsteptoe Aug 2, 2022
a7ce983
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Aug 2, 2022
61b801a
Remove Series conversion
hsteptoe Aug 2, 2022
071d07f
Remove option for copy
hsteptoe Aug 2, 2022
2c20e87
Merge branch 'main' into better-pandas-conversion-issue-4526
hsteptoe Aug 2, 2022
9bbada5
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Aug 2, 2022
e9d5d53
Merge branch 'SciTools:main' into better-pandas-conversion-issue-4526
hsteptoe Aug 4, 2022
1df3072
First go at adding aux coords
hsteptoe Aug 9, 2022
2c2ce69
First go at adding global attributes
hsteptoe Aug 9, 2022
1c9fbff
Update doc string
hsteptoe Aug 9, 2022
ef55874
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Aug 9, 2022
4f5ed98
Fix for time based AuxCoords
hsteptoe Aug 11, 2022
4fb7631
Minor misc fixes
hsteptoe Aug 11, 2022
332e260
Fix copy issue
hsteptoe Aug 12, 2022
eba020b
Fix black weirdness and add copy tests
hsteptoe Aug 12, 2022
04d11a6
Fix conflicts
hsteptoe Aug 12, 2022
ce03290
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Aug 12, 2022
abc8437
Add attributes test
hsteptoe Aug 12, 2022
5785977
Name fixes
hsteptoe Aug 12, 2022
3afe27f
Add AuxCoord test
hsteptoe Aug 16, 2022
a608e47
Improved aux coord indexing
hsteptoe Aug 16, 2022
553417b
Blacking
hsteptoe Aug 16, 2022
81d60ca
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Aug 16, 2022
a8efb0f
Preserve index on aux coord merge
hsteptoe Aug 17, 2022
f6e0f48
Simplify adding AuxCoords
hsteptoe Aug 17, 2022
07d6d9b
Add assertRaises test for attributes
hsteptoe Aug 17, 2022
8301caa
Better dim coord making logic
hsteptoe Aug 17, 2022
03c3ee9
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Aug 17, 2022
2afbe32
Updates to docstring
hsteptoe Aug 17, 2022
08b61b5
Merge branch 'better-pandas-conversion-issue-4526' of github.com:hste…
hsteptoe Aug 17, 2022
54c7300
Handle multidim AuxCoords
hsteptoe Sep 9, 2022
0740273
Add handling for AuxCoords + scalar coord info
hsteptoe Sep 12, 2022
329c2b1
Fix STASH attribute handling
hsteptoe Sep 12, 2022
958ed28
Merge branch 'pandas_ndim' into better-pandas-conversion-issue-4526
hsteptoe Sep 22, 2022
c7a941f
Merge branch 'better-pandas-conversion-issue-4526' of github.com:hste…
hsteptoe Sep 22, 2022
4551343
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Sep 22, 2022
27b8419
Improve and simplify dim extraction
hsteptoe Sep 23, 2022
6ea4970
Re-fix copy behaviour
hsteptoe Sep 23, 2022
01b983d
Doc updates
hsteptoe Sep 23, 2022
c4470a7
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Sep 23, 2022
a927388
Add Series example
hsteptoe Sep 23, 2022
6f5cc5c
Merge branch 'better-pandas-conversion-issue-4526' of github.com:hste…
hsteptoe Sep 23, 2022
4c2b46c
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Sep 23, 2022
7e48ee1
What's New update
hsteptoe Sep 26, 2022
b929f86
Add `as_series` deprecation warning
hsteptoe Sep 26, 2022
3ac9072
Fix indent
hsteptoe Sep 26, 2022
43c1846
flake8 fixes
hsteptoe Sep 26, 2022
bff3e3a
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Sep 26, 2022
c46192d
Fix deprecation warning
hsteptoe Oct 24, 2022
8789478
Fix pytest styles
hsteptoe Oct 24, 2022
38efc6b
Fix pytest styles
hsteptoe Oct 24, 2022
dcf1b7a
Minor syntax change to list
hsteptoe Oct 24, 2022
df7d24f
Update error style
hsteptoe Oct 24, 2022
32917ed
Fix error type
hsteptoe Oct 26, 2022
582ff69
Reinstate copy warning behaviour
hsteptoe Oct 26, 2022
eab743a
Add author link
hsteptoe Oct 26, 2022
3b94276
Remove add_global_attribute code
hsteptoe Oct 27, 2022
bfe9b77
Further global attribute code removal
hsteptoe Oct 27, 2022
f85e284
Add instance checking test
hsteptoe Oct 27, 2022
a10449d
Improve _make_dim_coord_list efficiency
hsteptoe Oct 27, 2022
11f324f
Correct list output
hsteptoe Oct 27, 2022
31cc25b
Make tests more efficient
hsteptoe Oct 27, 2022
3d1e2f5
Add scalar coordinate test
hsteptoe Oct 27, 2022
78c1bc4
Docstring fixes
hsteptoe Oct 27, 2022
09ed019
Refactor making of aux_coord_list
hsteptoe Oct 27, 2022
a544486
Fix type error
hsteptoe Oct 27, 2022
99cb8c5
Add masked array -> nan warning in doc
hsteptoe Nov 2, 2022
afcfbef
Consolidate use of `_as_pandas_coord`
hsteptoe Nov 2, 2022
5f995b0
Roll back breaking _make_dim_coord_list changes
hsteptoe Nov 2, 2022
a1d30bc
Raise error for Ancillary variables without dims
hsteptoe Nov 2, 2022
5fff6a3
Ancillary variables tweaks
hsteptoe Nov 2, 2022
2fcb9bb
Ancillary variables tests
hsteptoe Nov 2, 2022
4e65380
Split out metadata for consistency with `as_cubes`
hsteptoe Nov 2, 2022
ea4502c
Add cell_measure tests
hsteptoe Nov 2, 2022
dcf8f63
Docstring fixes
hsteptoe Nov 4, 2022
f337c2d
Test kwarg fixes
hsteptoe Nov 4, 2022
897973f
`_make_aux_coord_list` optimisation
hsteptoe Nov 4, 2022
b25648e
Refactor metadata merging
hsteptoe Nov 4, 2022
213afb6
Roundabout :issue: fix (remove)
hsteptoe Nov 4, 2022
d39b052
Roundabout :issue: fix (re-add)
hsteptoe Nov 4, 2022
be4fee8
Docstring typo fixes
hsteptoe Nov 4, 2022
eca08e9
Fix doctests
hsteptoe Nov 4, 2022
9432890
origin vs remote fixes
hsteptoe Nov 4, 2022
6b1625d
Update docs/src/whatsnew/latest.rst
hsteptoe Nov 4, 2022
3d5acc4
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 4, 2022
64447f8
Further doctest fixes
hsteptoe Nov 4, 2022
9eff2a5
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 4, 2022
122 changes: 40 additions & 82 deletions lib/iris/pandas.py
@@ -124,65 +124,7 @@ def _as_pandas_coord(coord):
    return index


def _assert_shared(np_obj, pandas_obj):
    """Ensure the pandas object shares memory."""
    values = pandas_obj.values

    def _get_base(array):
        # Chase the stack of NumPy `base` references back to the original array
        while array.base is not None:
            array = array.base
        return array

    base = _get_base(values)
    np_base = _get_base(np_obj)
    if base is not np_base:
        msg = "Pandas {} does not share memory".format(
            type(pandas_obj).__name__
        )
        raise AssertionError(msg)


def as_series(cube, copy=True):
    """
    Convert a 1D cube to a Pandas Series.

    Args:

        * cube - The cube to convert to a Pandas Series.

    Kwargs:

        * copy - Whether to make a copy of the data.
                 Defaults to True. Must be True for masked data.

    .. note::

        This function will copy your data by default.
        If you have a large array that cannot be copied,
        make sure it is not masked and use copy=False.

    """
    data = cube.data
    if ma.isMaskedArray(data):
        if not copy:
            raise ValueError("Masked arrays must always be copied.")
        data = data.astype("f").filled(np.nan)
    elif copy:
        data = data.copy()

    index = None
    if cube.dim_coords:
        index = _as_pandas_coord(cube.dim_coords[0])

    series = pandas.Series(data, index)
    if not copy:
        _assert_shared(data, series)

    return series


def as_data_frame(cube, copy=True):
def as_data_frame(cube, dropna=True, asmultiindex=False, add_aux_coord=None):
    """
    Convert a 2D cube to a Pandas DataFrame.

@@ -192,39 +134,55 @@ def as_data_frame(cube, copy=True):

    Kwargs:

        * copy - Whether to make a copy of the data.
                 Defaults to True. Must be True for masked data
                 and some data types (see notes below).
        * dropna - Remove missing values from returned dataframe.
                   Defaults to True.

    .. note::

        This function will copy your data by default.
        If you have a large array that cannot be copied,
        make sure it is not masked and use copy=False.

    .. note::

        Pandas will sometimes make a copy of the array,
        for example when creating from an int32 array.
        Iris will detect this and raise an exception if copy=False.
        TBC

    """
    data = cube.data
    if ma.isMaskedArray(data):
        if not copy:
            raise ValueError("Masked arrays must always be copied.")
        data = data.astype("f").filled(np.nan)
    elif copy:
        data = data.copy()

    index = columns = None
    if cube.coords(dimensions=[0]):
        index = _as_pandas_coord(cube.coord(dimensions=[0]))
    if cube.coords(dimensions=[1]):
        columns = _as_pandas_coord(cube.coord(dimensions=[1]))

    data_frame = pandas.DataFrame(data, index, columns)
    if not copy:
        _assert_shared(data, data_frame)
    # Extract dim coord information
    if cube.ndim != len(cube.dim_coords):
        # Create dummy dim coord information if dim coords not defined
        coord_names = ["dim" + str(n) for n in range(cube.ndim)]
        coords = [range(dim) for dim in cube.shape]
        for c in cube.dim_coords:
            for i, dummyc in enumerate(coords):
                if len(dummyc) == len(c.points):
                    coords[i] = _as_pandas_coord(c)
                    coord_names[i] = c.name()
                else:
                    pass
    else:
        coord_names = list(map(lambda x: x.name(), cube.dim_coords))
        coords = list(map(lambda x: _as_pandas_coord(x), cube.dim_coords))

    index = pandas.MultiIndex.from_product(coords, names=coord_names)
    data_frame = pandas.DataFrame({cube.name(): data.flatten()}, index=index)

    # Add aux coord information
    if add_aux_coord:
        aux_coord_names = list(map(lambda x: x.name(), cube.aux_coords))
        for acoord in add_aux_coord:
            # Check aux coord exists
            assert acoord in aux_coord_names, f'"{acoord}" not in cube'
            aux_coord = cube.coord(acoord)
            # Which dim coords match aux coord length
            coord_bool = np.array(cube.shape) == aux_coord.shape[0]
            # Get corresponding dim coord
            aux_coord_index = np.array(coords)[coord_bool][0]
            # Build aux coord dataframe
            acoord_df = pandas.DataFrame(
                {acoord: aux_coord.points},
                index=pandas.Index(
                    data=aux_coord_index,
                    name=np.array(coord_names)[coord_bool][0],
                ),
            )
            # Join to main data frame
            data_frame = data_frame.join(
                acoord_df, on=np.array(coord_names)[coord_bool][0]
            )

    if dropna:
        data_frame.dropna(inplace=True)
    if not asmultiindex:
        data_frame.reset_index(inplace=True)

    return data_frame
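
Illustrative note (not part of the diff): the new `as_data_frame` body pairs `pandas.MultiIndex.from_product(coords, ...)` with `data.flatten()`. This works because the Cartesian product varies the last coordinate fastest, which is the same order a row-major flatten visits the values. The following is a minimal standard-library sketch of that pairing, assuming plain nested lists in place of a cube. The helper names `flatten` and `to_long_rows` and the sample values are hypothetical and do not appear in Iris or pandas.

```python
import itertools
import math


def flatten(nested):
    """Flatten an arbitrarily nested list in row-major (C) order."""
    if not isinstance(nested, list):
        return [nested]
    flat = []
    for item in nested:
        flat.extend(flatten(item))
    return flat


def to_long_rows(data, coords, names, value_name, dropna=True):
    """Pair every coordinate combination with its data value.

    itertools.product varies the last coordinate fastest, so its output
    lines up one-to-one with the row-major flattened data.
    """
    rows = []
    for labels, value in zip(itertools.product(*coords), flatten(data)):
        # Mimic the dropna option: skip rows whose value is NaN.
        if dropna and isinstance(value, float) and math.isnan(value):
            continue
        row = dict(zip(names, labels))
        row[value_name] = value
        rows.append(row)
    return rows


# Hypothetical 2 x 3 field with one missing value.
data = [
    [280.0, 281.5, float("nan")],
    [279.0, 280.5, 282.0],
]
rows = to_long_rows(
    data,
    coords=[[-10, 10], [0, 90, 180]],
    names=["latitude", "longitude"],
    value_name="air_temperature",
)
for row in rows:
    print(row)
```

Each printed row corresponds to one line of the long-style table, with one column per dimension plus one for the data value. The real function additionally handles coordinate conversion via `_as_pandas_coord` and optional auxiliary coordinates, which this sketch leaves out.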