From 76a2de53ba190d1dfd026a0967ccc437f6a715be Mon Sep 17 00:00:00 2001
From: Peter Killick
Date: Fri, 3 Nov 2017 12:20:07 +0000
Subject: [PATCH 1/4] Added missing data whitepaper

---
 .../src/whitepapers/missing_data_handling.rst | 81 +++++++++++++++++++
 1 file changed, 81 insertions(+)
 create mode 100644 docs/iris/src/whitepapers/missing_data_handling.rst

diff --git a/docs/iris/src/whitepapers/missing_data_handling.rst b/docs/iris/src/whitepapers/missing_data_handling.rst
new file mode 100644
index 0000000000..956d48e38e
--- /dev/null
+++ b/docs/iris/src/whitepapers/missing_data_handling.rst
@@ -0,0 +1,81 @@
+=============================
+Missing Data Handling in Iris
+=============================
+
+This document provides a brief overview of how Iris handles missing data values
+when datasets are loaded as cubes, and when cubes are saved or modified.
+
+A fill-value or missing data value defines the value used within a dataset to
+indicate that data point is missing or not set.
+This value is included as part of a dataset's metadata.
+
+For example, in a gridded global ocean dataset, no data values will be recorded
+over land, so land point will be missing data.
+In such a case, land points could be indicated by being set to the dataset's
+missing data value.
+
+
+Loading
+-------
+
+On loading, any fill-value or missing data value defined in the loaded dataset
+should be used as the ``fill_value`` of the NumPy masked array data attribute of the
+:class:`~iris.cube.Cube`. This will only appear when the cube's data is realised.
+
+
+Saving
+------
+
+On save, the fill-value of a cube's masked data array is **not** used in saving data.
+Instead, Iris **always** uses the default fill-value for the file-format,
+**except** when a fill-value is specified by the user via a fileformat-specific saver.
+
+For example::
+
+    >>> iris.save(my_cube, 'my_file.nc', fill_value=-99999)
+
+.. note::
+    Not all savers accept the ``fill_value`` keyword argument.
+
+Iris will check for and issue warnings of fill-value 'collisions'.
+This basically means that whenever there are unmasked values that would read back
+as masked, we issue a warning and suggest a workaround.
+
+This will occur in the following cases:
+
+* where masked data contains _unmasked_ points matching the fill-value, or
+* where unmasked data contains the fill-value (either the format-specific default fill-value,
+  or a fill-value specified by the user in the save call).
+
+
+NetCDF
+~~~~~~
+
+NetCDF is a special case, because all ordinary variable data is "potentially masked",
+owing to the use of default fill values. The default fill-value used depends on the type
+of the variable data.
+
+The exceptions to this are:
+
+* One-byte values are not masked unless the variable has an explicit ``_FillValue`` attribute.
+  That is, there is no default fill-value for ``byte`` types in NetCDF.
+* Data may be tagged with a ``_NoFill`` attribute. This is not currently officially
+  documented or widely implemented.
+* Small integers create problems by _not_ having the exemption applied to byte data.
+  Thus, in principle, ``int32`` data cannot use the full range of 2**16 valid values.
+
+
+Merging
+-------
+
+Merged data should have a fill-value equal to that of the components, if they
+all have the same fill-value. If the components have differing fill-values, a
+default fill-value will be used instead.
+
+
+Other operations
+----------------
+
+Other operations, such as :class:`~iris.cube.Cube` arithmetic operations,
+generally produce output with a default (NumPy) fill-value. That is, these operations
+ignore the fill-values of the input(s) to the operation.
\ No newline at end of file

From d1d1a8620037522b977a73d940c4f43ff3c1b1a1 Mon Sep 17 00:00:00 2001
From: Peter Killick
Date: Fri, 3 Nov 2017 12:33:13 +0000
Subject: [PATCH 2/4] Tweaks

---
 docs/iris/src/whitepapers/index.rst                |  1 +
 .../iris/src/whitepapers/missing_data_handling.rst | 14 +++++++-------
 2 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/docs/iris/src/whitepapers/index.rst b/docs/iris/src/whitepapers/index.rst
index 313d3b7a27..bbad89064f 100644
--- a/docs/iris/src/whitepapers/index.rst
+++ b/docs/iris/src/whitepapers/index.rst
@@ -10,3 +10,4 @@ Extra information on specific technical issues.
    :numbered:
 
    um_files_loading.rst
+   missing_data_handling.rst
diff --git a/docs/iris/src/whitepapers/missing_data_handling.rst b/docs/iris/src/whitepapers/missing_data_handling.rst
index 956d48e38e..a1eef83535 100644
--- a/docs/iris/src/whitepapers/missing_data_handling.rst
+++ b/docs/iris/src/whitepapers/missing_data_handling.rst
@@ -5,7 +5,7 @@ Missing Data Handling in Iris
 This document provides a brief overview of how Iris handles missing data values
 when datasets are loaded as cubes, and when cubes are saved or modified.
 
-A fill-value or missing data value defines the value used within a dataset to
+A missing data value, or fill-value, defines the value used within a dataset to
 indicate that data point is missing or not set.
 This value is included as part of a dataset's metadata.
 
@@ -18,7 +18,7 @@ Missing Data Handling in Iris
 Loading
 -------
 
-On loading, any fill-value or missing data value defined in the loaded dataset
+On load, any fill-value or missing data value defined in the loaded dataset
 should be used as the ``fill_value`` of the NumPy masked array data attribute of the
 :class:`~iris.cube.Cube`. This will only appear when the cube's data is realised.
 
@@ -27,8 +27,8 @@ Saving
 ------
 
 On save, the fill-value of a cube's masked data array is **not** used in saving data.
-Instead, Iris **always** uses the default fill-value for the file-format,
-**except** when a fill-value is specified by the user via a fileformat-specific saver.
+Instead, Iris always uses the default fill-value for the fileformat, *except*
+when a fill-value is specified by the user via a fileformat-specific saver.
 
 For example::
 
@@ -43,7 +43,7 @@ as masked, we issue a warning and suggest a workaround.
 
 This will occur in the following cases:
 
-* where masked data contains _unmasked_ points matching the fill-value, or
+* where masked data contains *unmasked* points matching the fill-value, or
 * where unmasked data contains the fill-value (either the format-specific default fill-value,
   or a fill-value specified by the user in the save call).
 
@@ -61,7 +61,7 @@ The exceptions to this are:
   That is, there is no default fill-value for ``byte`` types in NetCDF.
 * Data may be tagged with a ``_NoFill`` attribute. This is not currently officially
   documented or widely implemented.
-* Small integers create problems by _not_ having the exemption applied to byte data.
+* Small integers create problems by *not* having the exemption applied to byte data.
   Thus, in principle, ``int32`` data cannot use the full range of 2**16 valid values.
 
 
@@ -78,4 +78,4 @@ Other operations
 
 Other operations, such as :class:`~iris.cube.Cube` arithmetic operations,
 generally produce output with a default (NumPy) fill-value. That is, these operations
-ignore the fill-values of the input(s) to the operation.
\ No newline at end of file
+ignore the fill-values of the input(s) to the operation.
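
The two fill-value collision cases listed under *Saving* in the whitepaper can be sketched with NumPy masked arrays. This is a minimal illustration under stated assumptions, not the Iris implementation: ``fill_value_collisions`` and ``check_fill_value`` are hypothetical names, and the warning text is invented. ``NC_FILL_SHORT`` is the netCDF-C default fill-value for 16-bit integer variables; it is used here to show how a default fill-value effectively reserves one of the 2**16 values an ``int16`` variable can hold.

```python
import warnings

import numpy as np
import numpy.ma as ma

# netCDF-C default fill-value for 16-bit ("short") variables (NC_FILL_SHORT).
NC_FILL_SHORT = -32767


def fill_value_collisions(data, fill_value):
    """Count unmasked points of ``data`` that equal ``fill_value``.

    Hypothetical helper, not part of the Iris API. Each such point would be
    read back as masked after saving with ``fill_value``. Accepts plain
    ndarrays (nothing masked) as well as masked arrays.
    """
    values = ma.getdata(data)
    # getmaskarray always gives a full boolean array, even for plain ndarrays.
    unmasked = ~ma.getmaskarray(data)
    return int(np.count_nonzero(unmasked & (values == fill_value)))


def check_fill_value(data, fill_value):
    """Warn (rather than fail) when a save would create fill-value collisions."""
    count = fill_value_collisions(data, fill_value)
    if count:
        warnings.warn(
            f"{count} unmasked point(s) equal the fill-value {fill_value!r} "
            "and would be read back as masked."
        )
    return count


# Case 1: masked data whose *unmasked* points include a user-supplied fill-value.
masked = ma.masked_array([1.0, -99999.0, 3.0], mask=[True, False, False])
# Case 2: unmasked int16 data containing the NetCDF default for "short".
shorts = np.array([-32768, NC_FILL_SHORT, 0, 32767], dtype=np.int16)

print(fill_value_collisions(masked, -99999.0))       # 1
print(fill_value_collisions(shorts, NC_FILL_SHORT))  # 1
```

A masked point that happens to equal the fill-value is not counted, since it is meant to read back as masked anyway.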

From f9d427f0e1dbc9ee12b9c7f6a7fe7da9481d634c Mon Sep 17 00:00:00 2001
From: Peter Killick
Date: Fri, 3 Nov 2017 13:38:11 +0000
Subject: [PATCH 3/4] Don't number whitepapers

---
 docs/iris/src/whitepapers/index.rst | 1 -
 1 file changed, 1 deletion(-)

diff --git a/docs/iris/src/whitepapers/index.rst b/docs/iris/src/whitepapers/index.rst
index bbad89064f..dd0876d257 100644
--- a/docs/iris/src/whitepapers/index.rst
+++ b/docs/iris/src/whitepapers/index.rst
@@ -7,7 +7,6 @@ Extra information on specific technical issues.
 
 .. toctree::
    :maxdepth: 1
-   :numbered:
 
    um_files_loading.rst
    missing_data_handling.rst

From 57572ef153b8dd12c0153d53b86618a1ccb9d1ef Mon Sep 17 00:00:00 2001
From: Peter Killick
Date: Fri, 3 Nov 2017 14:40:27 +0000
Subject: [PATCH 4/4] Review action

---
 docs/iris/src/whitepapers/missing_data_handling.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/iris/src/whitepapers/missing_data_handling.rst b/docs/iris/src/whitepapers/missing_data_handling.rst
index a1eef83535..cd6ef038c2 100644
--- a/docs/iris/src/whitepapers/missing_data_handling.rst
+++ b/docs/iris/src/whitepapers/missing_data_handling.rst
@@ -10,7 +10,7 @@ indicate that data point is missing or not set.
 This value is included as part of a dataset's metadata.
 
 For example, in a gridded global ocean dataset, no data values will be recorded
-over land, so land point will be missing data.
+over land, so land points will be missing data.
 In such a case, land points could be indicated by being set to the dataset's
 missing data value.
 
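
The rule under *Merging* in the whitepaper (keep the fill-value if every component shares it, otherwise use a default) is small enough to sketch directly. Assumptions: ``merged_fill_value`` is a hypothetical helper, the components are NumPy masked arrays, and NumPy's own ``numpy.ma.default_fill_value`` stands in for the default; Iris may choose its default differently.

```python
import numpy as np
import numpy.ma as ma


def merged_fill_value(components):
    """Pick a fill-value for data merged from the masked arrays ``components``.

    Hypothetical helper sketching the merge rule: keep the fill-value if all
    components share it, otherwise fall back to NumPy's default fill-value
    for the merged dtype (an assumption, not necessarily what Iris uses).
    """
    shared = {component.fill_value.item() for component in components}
    if len(shared) == 1:
        return shared.pop()
    merged_dtype = np.result_type(*(component.dtype for component in components))
    return ma.default_fill_value(merged_dtype)


a = ma.masked_array([1.0, 2.0], mask=[False, True], fill_value=-999.0)
b = ma.masked_array([3.0, 4.0], mask=[True, False], fill_value=-999.0)
c = ma.masked_array([5.0, 6.0], fill_value=-1.0)

# Components agree: the shared fill-value survives the merge.
merged = ma.concatenate([a, b])
merged.fill_value = merged_fill_value([a, b])
print(merged.fill_value)          # -999.0

# Components differ: fall back to the default for float64.
print(merged_fill_value([a, c]))  # 1e+20
```

Comparing plain Python scalars in a set keeps the check exact; note that two NaN fill-values never compare equal, so this sketch would treat them as differing and fall back to the default.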