Converting a cube to a Pandas dataframe  #4526

@kaedonkers

Description

✨ Feature Request

iris.pandas.as_data_frame should return a "long" DataFrame that retains all the metadata of the cube, either by default or via a kwarg such as table="long".

Motivation

Currently iris.pandas.as_data_frame turns a 2D cube (a requirement that is not stated in the documentation) into a "pivot table"/"wide table" DataFrame, with the values of one dim coord as the column names and the values of the other dim coord as the index. The values of the cube's data array become the table values.

I would argue that this result is unexpected for Pandas users, is not particularly useful, and loses a lot of metadata in the process.
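
As a rough illustration of the current behaviour, the sketch below rebuilds the wide layout with plain NumPy and pandas rather than Iris. The coordinate values, data and variable names are made up for the example, and this is not the actual iris.pandas implementation. It shows that the coord names are absent from the wide table and have to be supplied again by hand to reach a long layout.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for a 2D cube's dim coord values and data.
members = [1, 4, 5]
times = pd.to_datetime(["1980-12-01 12:00", "1980-12-02 12:00"])
data = np.array([[1.7, 0.0], [0.0, 0.2], [2.1, 0.1]])

# Wide layout: one dim coord is the index, the other the column labels.
# Neither axis carries a name, so the coord names are already lost here.
wide = pd.DataFrame(data, index=members, columns=times)

# Reaching a long layout means typing the lost names back in manually.
long_table = (
    wide.melt(ignore_index=False, var_name="time", value_name="precipitation_amount")
    .rename_axis(index="ensemble_member")
    .reset_index()
)
print(long_table)
```

The cube name, units, cell methods and attributes have no place in the wide table at all, which is the metadata loss described above.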

Feature list

To keep track of ideas as I find them:

  • Specify in the documentation that, currently, only a 2D cube can be converted to a DataFrame
  • Remove the dependence on cube.coord(dimensions=[0/1]), as this raises unnecessary errors when AuxCoords are present

Proposed change

A better default behaviour would be to generate a "long" table in which each coord of the cube gets its own column in the DataFrame, with the coord name as the column name. The cube's data values would go in another column named after the cube. Attributes could also be included as their own columns for maximum metadata retention, although this would probably need a toggle kwarg as it could clutter the resulting DataFrame.

This would also allow the conversion to handle more than 2D data (which should really be added to the current documentation as a requirement).

For example:

  • Current situation
>>> cube
    precipitation_amount / (kg m-2 day-1) (ensemble_member: 12; time: 7200)
        Dimension coordinates:
            ensemble_member                               x         -
            time                                          -         x
        Cell methods:
            mean                          time
        Attributes:
            Conventions                   CF-1.7

>>> iris.pandas.as_data_frame(cube)
        1980-12-01 12:00:00  1980-12-02 12:00:00  1980-12-03 12:00:00  ...  2000-11-30 12:00:00
    1              1.707561             0.000042             0.003254  ...             0.000034
    4              0.000037             0.001898             0.002343  ...             0.052522
    5              2.079478             0.080967             0.156937  ...             0.051903
    6             16.845169             3.951329             0.034234  ...             0.000010
    7              1.636697             0.000034             0.000013  ...             0.000002
    8              4.940392            15.944072             7.649919  ...             0.001480
    9              0.000037             0.458525             0.122213  ...             1.298610
    10            11.951665             0.038251             3.440227  ...             0.000024
    11             0.001125             0.000270             0.000000  ...             0.007789
    12             0.590116             3.575591            10.533336  ...             0.000010
    13             5.615687             0.000008             0.014623  ...             1.066165
    15             6.515756            17.816267             0.077115  ...             0.527009
    [12 rows x 7200 columns]
  • Proposed change
>>> cube
    precipitation_amount / (kg m-2 day-1) (ensemble_member: 12; time: 7200)
        Dimension coordinates:
            ensemble_member                               x         -
            time                                          -         x
        Cell methods:
            mean                          time
        Attributes:
            Conventions                   CF-1.7

>>> iris.pandas.as_data_frame(cube)
           ensemble_member                 time     precipitation_amount
    0                    1  1980-12-01 12:00:00                 1.707561
    1                    4  1980-12-01 12:00:00                 0.000037
    2                    5  1980-12-01 12:00:00                 2.079478
    3                    6  1980-12-01 12:00:00                16.845169
    4                    7  1980-12-01 12:00:00                 1.636697
    ...                ...                  ...                      ...
    86395               10  2000-11-30 12:00:00                 0.047245
    86396               11  2000-11-30 12:00:00                 0.000072
    86397               12  2000-11-30 12:00:00                 0.028169
    86398               13  2000-11-30 12:00:00                 3.638094
    86399               15  2000-11-30 12:00:00                 0.408061
    
    [86400 rows x 3 columns]
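
The long layout proposed above could be sketched roughly as follows, again using only NumPy and pandas so that it runs without Iris. The function to_long_frame, its parameters and the example values are hypothetical, and it only handles one 1D coordinate per dimension. It is meant to show that the same approach extends beyond 2D and that attributes can be added as optional columns, not to suggest how iris.pandas should be implemented.

```python
import numpy as np
import pandas as pd


def to_long_frame(data, coords, name, attributes=None):
    """Flatten an N-D array and its 1D coordinates into a long DataFrame.

    `coords` maps coordinate name -> 1D values, listed in the same order
    as the dimensions of `data` (hypothetical helper, not an Iris API).
    """
    data = np.asarray(data)
    if tuple(len(values) for values in coords.values()) != data.shape:
        raise ValueError("coordinate lengths do not match the data shape")

    # from_product iterates with the last coordinate varying fastest,
    # which matches the row-major order produced by ravel().
    index = pd.MultiIndex.from_product(list(coords.values()), names=list(coords.keys()))
    frame = pd.Series(data.ravel(), index=index, name=name).reset_index()

    # Optional: broadcast each attribute into its own constant column.
    for key, value in (attributes or {}).items():
        frame[key] = value
    return frame


# Made-up 3D example: 2 members x 3 times x 4 heights.
coords = {
    "ensemble_member": [1, 4],
    "time": pd.to_datetime(["1980-12-01 12:00", "1980-12-02 12:00", "1980-12-03 12:00"]),
    "height": [2.0, 10.0, 50.0, 100.0],
}
data = np.arange(24, dtype=float).reshape(2, 3, 4)

frame = to_long_frame(data, coords, "precipitation_amount", attributes={"Conventions": "CF-1.7"})
print(frame.head())
```

Note that rows come out with the last coordinate varying fastest, whereas the example table above has ensemble_member varying fastest; a sort_values call on the desired columns would reorder them.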
