Description
✨ Feature Request
Return a "long" DataFrame which retains all the metadata of the cube (either by default or using kwarg table="long").
Motivation
Currently, iris.pandas.as_data_frame accepts only a 2D cube (a restriction not stated in the documentation) and turns it into a "pivot table"/"wide table" DataFrame, with the values of one former dim coord as the column names and the values of the other dim coord as the index. The values of the cube's data array become the table values.
I would argue that this result is unexpected for Pandas users, not particularly useful and loses lots of metadata in the process.
Feature list
To keep track of ideas as I find them:
- Specify in the documentation that, currently, only a 2D cube can be converted to a DataFrame
- Remove dependence on cube.coord(dimensions=[0/1]), as this throws unnecessary errors related to the presence of AuxCoords
Proposed change
A better default behaviour would be to generate a "long" table in which all the coord values from the cube are in separate columns of the DataFrame, with the coord name as the name of the column. The DataArray values would be in another column named after the cube. Attributes could also be included as their own columns for maximum metadata retention, although this might want a toggle kwarg as it could clutter the resulting DataFrame.
This would also allow the conversion to handle more than 2D data (which should really be added to the current documentation as a requirement).
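To make the proposed layout concrete, here is a minimal sketch using only NumPy and pandas. It is not Iris code: the helper name `to_long_frame`, its signature, and the 3D sample data are all hypothetical, and it assumes each coordinate is 1D and spans exactly one dimension of the array.

```python
import numpy as np
import pandas as pd


def to_long_frame(data, coords, name):
    """Flatten an N-D array into a long DataFrame, one column per coordinate.

    `coords` maps coordinate name -> 1-D values, listed in the same order as
    the array's dimensions. Hypothetical helper, not part of Iris.
    """
    data = np.asarray(data)
    expected_shape = tuple(len(values) for values in coords.values())
    if data.shape != expected_shape:
        raise ValueError(f"data shape {data.shape} != coord lengths {expected_shape}")
    # from_product varies the last level fastest, which matches the C-order
    # (row-major) layout returned by data.ravel().
    index = pd.MultiIndex.from_product(list(coords.values()), names=list(coords.keys()))
    return pd.DataFrame({name: data.ravel()}, index=index).reset_index()


# Made-up 3-D example: 2 ensemble members x 3 times x 2 heights.
data = np.arange(12, dtype=float).reshape(2, 3, 2)
coords = {
    "ensemble_member": [1, 4],
    "time": pd.date_range("1980-12-01 12:00", periods=3, freq="D"),
    "height": [10.0, 20.0],
}
long_df = to_long_frame(data, coords, "precipitation_amount")
print(long_df.head())
```

Because nothing here depends on the number of dimensions, the same approach covers 2D and higher-dimensional data. Row order follows the array's dimension order, so a sort on the coordinate columns would be needed to get a different ordering.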
For example:
- Current situation
cube
>>> precipitation_amount / (kg m-2 day-1) (ensemble_member: 12; time: 7200)
Dimension coordinates:
ensemble_member x -
time - x
Cell methods:
mean time
Attributes:
Conventions CF-1.7
iris.pandas.as_data_frame(cube)
>>> 1980-12-01 12:00:00 1980-12-02 12:00:00 1980-12-03 12:00:00 ... 2000-11-30 12:00:00
1 1.707561 0.000042 0.003254 ... 0.000034
4 0.000037 0.001898 0.002343 ... 0.052522
5 2.079478 0.080967 0.156937 ... 0.051903
6 16.845169 3.951329 0.034234 ... 0.000010
7 1.636697 0.000034 0.000013 ... 0.000002
8 4.940392 15.944072 7.649919 ... 0.001480
9 0.000037 0.458525 0.122213 ... 1.298610
10 11.951665 0.038251 3.440227 ... 0.000024
11 0.001125 0.000270 0.000000 ... 0.007789
12 0.590116 3.575591 10.533336 ... 0.000010
13 5.615687 0.000008 0.014623 ... 1.066165
15 6.515756 17.816267 0.077115 ... 0.527009
[12 rows x 7200 columns]
- Proposed change
cube
>>> precipitation_amount / (kg m-2 day-1) (ensemble_member: 12; time: 7200)
Dimension coordinates:
ensemble_member x -
time - x
Cell methods:
mean time
Attributes:
Conventions CF-1.7
iris.pandas.as_data_frame(cube)
>>> ensemble_member time precipitation_amount
0 1 1980-12-01 12:00:00 1.707561
1 4 1980-12-01 12:00:00 0.000037
2 5 1980-12-01 12:00:00 2.079478
3 6 1980-12-01 12:00:00 16.845169
4 7 1980-12-01 12:00:00 1.636697
... ... ... ...
86395 10 2000-11-30 12:00:00 0.047245
86396 11 2000-11-30 12:00:00 0.000072
86397 12 2000-11-30 12:00:00 0.028169
86398 13 2000-11-30 12:00:00 3.638094
86399 15 2000-11-30 12:00:00 0.408061
[86400 rows x 3 columns]
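Until something like this exists, the long layout can be approximated from the current wide output with plain pandas. The sketch below rebuilds a small corner of the wide table shown above by hand (three members, three time steps) rather than calling Iris, and it assumes the index and column names, which are set explicitly here.

```python
import pandas as pd

# Hand-built stand-in for a corner of the wide table above; the axis names
# "ensemble_member" and "time" are assumptions set here for illustration.
times = pd.to_datetime(
    ["1980-12-01 12:00:00", "1980-12-02 12:00:00", "1980-12-03 12:00:00"]
)
wide = pd.DataFrame(
    [
        [1.707561, 0.000042, 0.003254],
        [0.000037, 0.001898, 0.002343],
        [2.079478, 0.080967, 0.156937],
    ],
    index=pd.Index([1, 4, 5], name="ensemble_member"),
    columns=pd.Index(times, name="time"),
)

# unstack() moves the column labels into the outer index level, giving one
# row per (time, ensemble_member) pair; reset_index() turns both levels into
# ordinary columns.
long_df = (
    wide.unstack()
    .rename("precipitation_amount")
    .reset_index()[["ensemble_member", "time", "precipitation_amount"]]
)
print(long_df)
```

This only recovers the two dim coords and the data values. Cell methods, attributes, and any other coords are already lost by the time the wide DataFrame is produced, which is the main argument for doing the conversion inside Iris.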