Make `iris.pandas.as_data_frame()` n-dimensional behaviour opt-in #5059

Merged

lbdreyer merged 7 commits into SciTools:pandas_ndim from trexfeathers:pandas_ndim_optin

Nov 16, 2022

docs/src/common_links.inc

            
                  
    @@ -21,7 +21,7 @@
  
     .. _isort: https://pycqa.github.io/isort/
     .. _issue: https://github.com/SciTools/iris/issues
     .. _issues: https://github.com/SciTools/iris/issues
-    .. _legacy documentation: https://scitools.org.uk/iris/docs/v2.4.0/
+    .. _legacy documentation: https://github.com/SciTools/scitools.org.uk/tree/master/iris/docs/archive
     .. _matplotlib: https://matplotlib.org/stable/
     .. _napolean: https://sphinxcontrib-napoleon.readthedocs.io/en/latest/sphinxcontrib.napoleon.html
     .. _nox: https://nox.thea.codes/en/stable/

docs/src/index.rst

@@ -88,6 +88,8 @@ Icons made by `FreePik <https://www.freepik.com>`_ from
     `Flaticon <https://www.flaticon.com/>`_
+    .. _iris_support:
     Support
     ~~~~~~~
@@ -101,7 +103,11 @@ The legacy support resources:
     * `Users Google Group <https://groups.google.com/forum/#!forum/scitools-iris>`_
     * `Developers Google Group <https://groups.google.com/forum/#!forum/scitools-iris-dev>`_
-    * `Legacy Documentation`_ (Iris 2.4 or earlier)
+    * `Legacy Documentation`_ (Iris 2.4 or earlier).  This is an archive of zip
+      files of past documentation.  You can download, unzip and view the
+      documentation locally (index.html).  There may be some incorrect rendering
+      and older javascript (.js) files may show a warning when uncompressing, in
+      which case we suggest you use a different unzip tool.
     .. toctree::
@@ Expand Down @@

docs/src/whatsnew/latest.rst

@@ Expand Up @@
        non-existing paths, and added expansion functionality to :func:`~iris.io.save`.
        (:issue:`4772`, :pull:`4913`)
-    #. `@hsteptoe`_ and `@trexfeathers`_ (reviewer) added :func:`iris.pandas.as_data_frame`,
-       which provides improved conversion of :class:`~iris.cube.Cube`\s to
-       :class:`~pandas.DataFrame`\s. This includes better handling of multiple
-       :class:`~iris.cube.Cube` dimensions, auxiliary coordinates and attribute information.
+    #. `@hsteptoe`_ and `@trexfeathers`_ improved
+       :func:`iris.pandas.as_data_frame`\'s conversion of :class:`~iris.cube.Cube`\s to
+       :class:`~pandas.DataFrame`\s. This includes better handling of multiple
+       :class:`~iris.cube.Cube` dimensions, auxiliary coordinates and attribute
+       information. **Note:** the improvements are opt-in, by setting the
+       :obj:`iris.FUTURE.pandas_ndim` flag (see :class:`iris.Future` for more).
        (:issue:`4526`, :pull:`4669`)
@@ Expand Down Expand Up @@
     #. N/A
+    #. `@tkknight`_ updated the links for the Iris documentation to v2.4 and
+       earlier to point to the archive of zip files instead. (:pull:`5064`)
     💼 Internal
     ===========
@@ Expand Down @@

docs/src/why_iris.rst

@@ Expand Up @@
     from Iris' use of standard NumPy/dask arrays as its underlying data storage.
     Iris is part of SciTools, for more information see https://scitools.org.uk/.
-    For **Iris 2.4** and earlier documentation please see the
-    :link-badge:`https://scitools.org.uk/iris/docs/v2.4.0/,"legacy documentation",cls=badge-info text-white`.
+    For **Iris 2.4** and earlier documentation please see :ref:`iris_support`.

lib/iris/__init__.py

@@ Expand Up @@
             To adjust the values simply update the relevant attribute from
             within your code. For example::
+                # example_future_flag is a fictional example.
                 iris.FUTURE.example_future_flag = False
             If Iris code is executed with multiple threads, note the values of
             these options are thread-specific.
-            .. note::
-                iris.FUTURE.example_future_flag does not exist. It is provided
-                as an example.
-            .. todo::
-                Document the ``pandas_ndim`` flag once iris#4669 is merged - can
-                add cross-referencing documentation both here and in
-                iris.pandas.as_dataframe().
+            Parameters
+            ----------
+            datum_support : bool, default=False
+                Opts in to loading coordinate system datum information from NetCDF
+                files into :class:`~iris.coord_systems.CoordSystem`\\ s, wherever
+                this information is present.
+            pandas_ndim : bool, default=False
+                See :func:`iris.pandas.as_data_frame` for details - opts in to the
+                newer n-dimensional behaviour.
             """
             # The flag 'example_future_flag' is provided as a reference for the
@@ -218,14 +218,11 @@ def context(self, **kwargs):
             statement, the previous state is restored.
             For example::
+                # example_future_flag is a fictional example.
                 with iris.FUTURE.context(example_future_flag=False):
                     # ... code that expects some past behaviour
-            .. note::
-                iris.FUTURE.example_future_flag does not exist and is
-                provided only as an example.
             """
             # Save the current context
             current_state = self.__dict__.copy()
@@ Expand Down @@
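
The docstring changes above describe three properties of the flags: they default to `False`, they are thread-specific, and `context()` restores the previous state on exit. A minimal standard-library sketch of that pattern follows. `FutureFlags` is a hypothetical stand-in, not the actual `iris.Future` implementation, and the unknown-flag check is an assumption added for illustration.

```python
import threading
from contextlib import contextmanager


class FutureFlags(threading.local):
    """Hypothetical stand-in for iris.Future: thread-local opt-in flags."""

    def __init__(self, datum_support=False, pandas_ndim=False):
        # Both flags default to False, so new behaviour is strictly opt-in.
        self.datum_support = datum_support
        self.pandas_ndim = pandas_ndim

    @contextmanager
    def context(self, **kwargs):
        # Save the current state, apply the overrides, restore on exit.
        saved = self.__dict__.copy()
        try:
            for name, value in kwargs.items():
                if name not in saved:
                    # Assumption: reject flags that were never declared.
                    raise AttributeError(f"Unknown flag: {name!r}")
                setattr(self, name, value)
            yield
        finally:
            self.__dict__.clear()
            self.__dict__.update(saved)


FUTURE = FutureFlags()

# Temporary opt-in: the flag reverts when the block exits.
with FUTURE.context(pandas_ndim=True):
    inside = FUTURE.pandas_ndim
outside = FUTURE.pandas_ndim

# Thread-specific: a new thread starts from the defaults again.
FUTURE.pandas_ndim = True
seen_in_thread = []
worker = threading.Thread(
    target=lambda: seen_in_thread.append(FUTURE.pandas_ndim)
)
worker.start()
worker.join()
```

Because `threading.local` re-runs `__init__` in each thread, a flag set in the main thread is not visible to workers, which is why the docstring warns that the options are thread-specific.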

lib/iris/pandas.py

            
                  
    @@ -378,7 +378,10 @@ def as_cubes(
  
             )
             raise ValueError(message)

-        if not pandas_index.is_monotonic:
+        if not (
+            pandas_index.is_monotonic_increasing
+            or pandas_index.is_monotonic_decreasing
+        ):
             # Need monotonic index for use in DimCoord(s).
             # This function doesn't sort_index itself since that breaks the
             #  option to return a data view instead of a copy.
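
The hunk above swaps `Index.is_monotonic` for an explicit increasing-or-decreasing check. To my knowledge, `is_monotonic` was an alias of `is_monotonic_increasing` and was deprecated in pandas 1.5, so the new condition also accepts decreasing indexes. A pure-Python sketch of what the combined check accepts is below; `is_monotonic_either` is a hypothetical helper, and it assumes the pandas properties are non-strict (equal neighbours allowed).

```python
def is_monotonic_either(values):
    """Hypothetical helper: True if values never decrease or never increase."""
    pairs = list(zip(values, values[1:]))
    # Non-strict comparisons: repeated values do not break monotonicity.
    increasing = all(a <= b for a, b in pairs)
    decreasing = all(a >= b for a, b in pairs)
    return increasing or decreasing


ascending = is_monotonic_either([1, 2, 2, 3])
descending = is_monotonic_either([3, 2, 1])
unsorted = is_monotonic_either([1, 3, 2])
```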

    @@ -627,7 +630,7 @@ def as_data_frame(
  
         add_ancillary_variables=False,
     ):
         """
-        Convert a 2D cube to a Pandas DataFrame.
+        Convert a :class:`~iris.cube.Cube` to a :class:`pandas.DataFrame`.

         :attr:`~iris.cube.Cube.dim_coords` and :attr:`~iris.cube.Cube.data` are
         flattened into a long-style :class:`~pandas.DataFrame`.  Other

    @@ -658,6 +661,29 @@ def as_data_frame(
  
             A :class:`~pandas.DataFrame` with :class:`~iris.cube.Cube` dimensions
             forming a :class:`~pandas.MultiIndex`

+        Warnings
+        --------
+        #. This documentation is for the new ``as_data_frame()`` behaviour, which
+           is **currently opt-in** to preserve backwards compatibility. The default
+           legacy behaviour is documented in pre-``v3.4`` documentation (summary:
+           limited to 2-dimensional :class:`~iris.cube.Cube`\\ s, with only the
+           :attr:`~iris.cube.Cube.data` and :attr:`~iris.cube.Cube.dim_coords`
+           being added). The legacy behaviour will be removed in a future version
+           of Iris, so please opt-in to the new behaviour at your earliest
+           convenience, via :class:`iris.Future`:
+
+               >>> iris.FUTURE.pandas_ndim = True
+
+           **Breaking change:** to enable the improvements, the new opt-in
+           behaviour flattens multi-dimensional data into a single
+           :class:`~pandas.DataFrame` column (the legacy behaviour preserves 2
+           dimensions via rows and columns).
+
+           |
+
+        #. Where the :class:`~iris.cube.Cube` contains masked values, these become
+           :data:`numpy.nan` in the returned :class:`~pandas.DataFrame`.
+
         Notes
         -----
         Dask ``DataFrame``\\s are not supported.
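
The breaking change described in the warning (multi-dimensional data flattened into one column, keyed by every combination of dimension coordinates) can be sketched without Iris or pandas. The coordinate names and values below are made up for illustration, and `itertools.product` stands in for `pandas.MultiIndex.from_product`, which orders its entries the same way (last level varying fastest).

```python
from itertools import product

# Hypothetical 2 x 3 "cube": two dimension coordinates and row-major data.
coord_names = ["time", "latitude"]
coords = [[0, 1], [-10.0, 0.0, 10.0]]
data = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
]

# Equivalent of data.ravel(): flatten in row-major (C) order.
flat = [value for row in data for value in row]

# product() varies the last coordinate fastest, matching the flattened
# order, so each index tuple lines up with its data value.
long_rows = list(zip(product(*coords), flat))
```

Each entry of `long_rows` pairs a `(time, latitude)` key with one value, which is the long-style layout the new behaviour produces; the legacy behaviour would instead keep `time` as rows and `latitude` as columns.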

    @@ -669,11 +695,6 @@ def as_data_frame(
  
         :class:`~iris.cube.Cube` data `dtype` is preserved.

-        Warnings
-        --------
-        Where the :class:`~iris.cube.Cube` contains masked values, these become
-        :data:`numpy.nan` in the returned :class:`~pandas.DataFrame`.
-
         Examples
         --------

         >>> import iris

    @@ -817,37 +838,72 @@ def merge_metadata(meta_var_list):
  
                     )
             return data_frame

-        # Checks
-        if not isinstance(cube, iris.cube.Cube):
-            raise TypeError(
-                f"Expected input to be iris.cube.Cube instance, got: {type(cube)}"
+        if iris.FUTURE.pandas_ndim:
+            # Checks
+            if not isinstance(cube, iris.cube.Cube):
+                raise TypeError(
+                    f"Expected input to be iris.cube.Cube instance, got: {type(cube)}"
+                )
+
+            if copy:
+                data = cube.data.copy()
+            else:
+                data = cube.data
+            if ma.isMaskedArray(data):
+                if not copy:
+                    raise ValueError("Masked arrays must always be copied.")
+                data = data.astype("f").filled(np.nan)
+
+            # Extract dim coord information: separate lists for dim names and dim values
+            coord_names, coords = _make_dim_coord_list(cube)
+            # Make base DataFrame
+            index = pandas.MultiIndex.from_product(coords, names=coord_names)
+            data_frame = pandas.DataFrame(
+                data.ravel(), columns=[cube.name()], index=index
             )

-        if copy:
-            data = cube.data.copy()
+            if add_aux_coords:
+                data_frame = merge_metadata(_make_aux_coord_list(cube))
+            if add_ancillary_variables:
+                data_frame = merge_metadata(_make_ancillary_variables_list(cube))
+            if add_cell_measures:
+                data_frame = merge_metadata(_make_cell_measures_list(cube))
+
+            if copy:
+                result = data_frame.reorder_levels(coord_names).sort_index()
+            else:
+                data_frame.reorder_levels(coord_names).sort_index(inplace=True)
+                result = data_frame
+
         else:
+            message = (
+                "You are using legacy 2-dimensional behaviour in "
+                "'iris.pandas.as_data_frame()'. This will be removed in a future "
+                "version of Iris. Please opt-in to the improved "
+                "n-dimensional behaviour at your earliest convenience by setting: "
+                "'iris.FUTURE.pandas_ndim = True'. More info is in the "
+                "documentation."
+            )
+            warnings.warn(message, FutureWarning)
+
+            # The legacy behaviour.
             data = cube.data
-        if ma.isMaskedArray(data):
+            if ma.isMaskedArray(data):
+                if not copy:
+                    raise ValueError("Masked arrays must always be copied.")
+                data = data.astype("f").filled(np.nan)
+            elif copy:
+                data = data.copy()
+
+            index = columns = None
+            if cube.coords(dimensions=[0]):
+                index = _as_pandas_coord(cube.coord(dimensions=[0]))
+            if cube.coords(dimensions=[1]):
+                columns = _as_pandas_coord(cube.coord(dimensions=[1]))
+
+            data_frame = pandas.DataFrame(data, index, columns)
             if not copy:
-                raise ValueError("Masked arrays must always be copied.")
-            data = data.astype("f").filled(np.nan)
-
-        # Extract dim coord information: separate lists for dim names and dim values
-        coord_names, coords = _make_dim_coord_list(cube)
-        # Make base DataFrame
-        index = pandas.MultiIndex.from_product(coords, names=coord_names)
-        data_frame = pandas.DataFrame(
-            data.ravel(), columns=[cube.name()], index=index
-        )
+                _assert_shared(data, data_frame)
-
-        if add_aux_coords:
-            data_frame = merge_metadata(_make_aux_coord_list(cube))
-        if add_ancillary_variables:
-            data_frame = merge_metadata(_make_ancillary_variables_list(cube))
-        if add_cell_measures:
-            data_frame = merge_metadata(_make_cell_measures_list(cube))
+            result = data_frame
-
-        if copy:
-            return data_frame.reorder_levels(coord_names).sort_index()
-        else:
-            data_frame.reorder_levels(coord_names).sort_index(inplace=True)
-            return data_frame
+        return result
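
The control flow this hunk introduces (new behaviour behind a flag, legacy path emitting a `FutureWarning`) can be exercised in isolation. The sketch below uses only the standard library; `Flags`, `FLAGS` and `describe_conversion` are hypothetical names standing in for `iris.FUTURE` and `as_data_frame()`, and the warning text is abbreviated.

```python
import warnings


class Flags:
    """Hypothetical stand-in for iris.FUTURE."""

    pandas_ndim = False


FLAGS = Flags()


def describe_conversion():
    """Hypothetical function gated the same way as as_data_frame()."""
    if FLAGS.pandas_ndim:
        return "n-dimensional"
    # Legacy path: still works, but tells the caller how to opt in.
    warnings.warn(
        "Legacy behaviour will be removed; set pandas_ndim = True.",
        FutureWarning,
    )
    return "legacy"


# Default: legacy result plus exactly one FutureWarning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy_result = describe_conversion()

# Opted in: new result and no warning.
FLAGS.pandas_ndim = True
with warnings.catch_warnings(record=True) as caught_opted_in:
    warnings.simplefilter("always")
    new_result = describe_conversion()
```

Recording warnings this way is also a convenient pattern for tests that need to confirm the legacy path warns and the opt-in path stays silent.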
