Changes from all commits
50 commits
8f7524d
adopt conda-forge
marqh Feb 23, 2017
d23687c
unpin
marqh Feb 23, 2017
1a11aa3
Define @skip_biggus test decorator. (#2353)
pp-mo Feb 13, 2017
880574d
Generic lazy data handling. (#2356)
pp-mo Feb 13, 2017
d1eb253
Use _lazy_data functions for cube data.
pp-mo Feb 10, 2017
9d437fa
Hack for dual lazy support, i.e. biggus OR dask.
pp-mo Feb 10, 2017
5f1956d
Add mask/NaN translations into iris._lazy_data.
pp-mo Feb 12, 2017
f1ba629
Started skipping tests.
pp-mo Feb 12, 2017
6e1a463
Revert unnecessary change to integration/test_pp.
pp-mo Feb 13, 2017
b63fa1c
Various skips.
pp-mo Feb 13, 2017
44e73f2
Disable Travis example + docs tests for now.
pp-mo Feb 13, 2017
1dac2b7
dask based merge
marqh Feb 14, 2017
37a6a2d
skip all iris_grib tests
marqh Feb 14, 2017
94be3dd
Lazy pp loading
djkirkham Feb 14, 2017
eb7ae6d
switched netcdf loader from biggus to dask. untested. (#35)
corinnebosley Feb 14, 2017
4973527
skip failing netcdf unit mock tests: chunks do not add up to shape
marqh Feb 14, 2017
e1b3ce7
pp_load data property fix
marqh Feb 15, 2017
5a9dd2d
as_concrete_array always returns a masked array
marqh Feb 15, 2017
f33e4da
Use Dask for concatenate (#38)
AlexHilson Feb 15, 2017
50b8597
pp unit test
marqh Feb 15, 2017
c7e3390
Don't make lazy wrappers for cube shape and dtype. (#37)
pp-mo Feb 15, 2017
3ca81d9
biggus ArrayStack.multidim_array_stack with da.stack
marqh Feb 15, 2017
64c91dd
is_lazy_data over isinstance
marqh Feb 15, 2017
b0db97b
test_field_collection with dask
marqh Feb 15, 2017
49e7bc7
use np.dtype in mock tests
marqh Feb 15, 2017
1cf2152
typo fix and fill_value guarantee (#39)
corinnebosley Feb 15, 2017
1917099
Replace biggus ndarray with lazy as_concrete_data in pp pyke rules. (…
pp-mo Feb 15, 2017
7f1c893
remove biggus lazy data, skip netcdf save
marqh Feb 15, 2017
f04ad14
fix cube pickle test
marqh Feb 15, 2017
a34d556
skip netCDF save
marqh Feb 15, 2017
85e4bb1
Fixes for biggus array checks (#41)
corinnebosley Feb 15, 2017
8c45ea2
replace biggus lazy use for now, patch out netcdf save tests
marqh Feb 15, 2017
42a5990
skip as fill value lost
marqh Feb 15, 2017
e1cfe5b
Don't try and merge 0-d arrays (#42)
AlexHilson Feb 15, 2017
8d1c602
biggus skippers (#43)
corinnebosley Feb 16, 2017
a1eb87f
plot skippers (#45)
corinnebosley Feb 16, 2017
0608a91
skippers added for more not-serious failures (#44)
corinnebosley Feb 16, 2017
7d59bcf
header dates corrected (#46)
corinnebosley Feb 16, 2017
cfb370a
test implementation tweaks
marqh Feb 16, 2017
8c729d7
skip non lazy coord loading
marqh Feb 16, 2017
03bec3c
pickling test skip
marqh Feb 16, 2017
2bf1dfc
removed some unnecessary skippers, mostly on concatenate tests (#2388)
corinnebosley Feb 21, 2017
2af3603
Data first (#2392)
marqh Feb 23, 2017
6c65048
code migration
marqh Feb 24, 2017
f55012c
return grib to the flock
marqh Feb 24, 2017
eefd17c
test skippers: netcdf4 time
marqh Feb 24, 2017
fb1f740
pin minimal
marqh Feb 24, 2017
4b2c718
skip
marqh Feb 24, 2017
e044e0b
imports
marqh Feb 24, 2017
84153df
adopt eccodes
marqh Feb 24, 2017
6 changes: 2 additions & 4 deletions .travis.yml
@@ -15,8 +15,6 @@ env:
- TEST_TARGET=default
- TEST_TARGET=default TEST_MINIMAL=true
- TEST_TARGET=coding
- TEST_TARGET=example
- TEST_TARGET=doctest

git:
depth: 10000
@@ -49,12 +47,12 @@ install:

# Customise the testing environment
# ---------------------------------
- conda config --add channels scitools
- conda config --add channels conda-forge
- if [[ "$TEST_MINIMAL" == true ]]; then
conda install --quiet --file minimal-conda-requirements.txt;
else
if [[ "$TRAVIS_PYTHON_VERSION" == 3* ]]; then
sed -e '/ecmwf_grib/d' -e '/esmpy/d' -e '/iris_grib/d' -e 's/#.\+$//' conda-requirements.txt | xargs conda install --quiet;
sed -e '/python-ecmwf_grib/d' -e '/esmpy/d' -e 's/#.\+$//' conda-requirements.txt | xargs conda install --quiet;
else
conda install --quiet --file conda-requirements.txt;
fi
5 changes: 3 additions & 2 deletions conda-requirements.txt
@@ -10,6 +10,7 @@ numpy
pyke
udunits2
cf_units
dask

# Iris build dependencies
setuptools
@@ -19,14 +20,14 @@ mock
nose
pep8
sphinx
iris_sample_data
iris-sample-data
filelock
imagehash
requests

# Optional iris dependencies
nc_time_axis
iris_grib
python-eccodes
esmpy>=7.0
gdal
libmo_unpack
23 changes: 23 additions & 0 deletions docs/iris/src/developers_guide/dask_interface.rst
@@ -0,0 +1,23 @@
Iris Dask Interface
*******************

Iris uses dask (http://dask.pydata.org) to manage lazy data interfaces and processing graphs. The key principles which define this interface are:

* A call to `cube.data` will always load all of the data.
* Once this has happened:
* `cube.data` is a mutable numpy masked array or ndarray;
* `cube._numpy_array` is a private numpy masked array, accessible via `cube.data`, which may strip off the mask and return a reference to the bare ndarray.
* `cube.data` may be used to set the data; this accepts:
* a numpy array (including masked array), which is assigned to `cube._numpy_array`;
* a dask array, which is assigned to `cube._dask_array`, and `cube._numpy_array` is set to None.
* `cube._dask_array` may be None; otherwise it is expected to be a dask graph:
* this may wrap a proxy to a file collection;
* this may wrap the numpy array in `cube._numpy_array`.
* All dask graphs wrap array-like objects where missing data is represented by `nan`:
* masked arrays derived from these arrays shall create their mask using the nan locations, as sketched below;
* where dask-wrapped `int` arrays require masks, these will first be cast to `float`.
* In order to support this mask conversion, cubes have a `fill_value` as part of their metadata, which may be None.
* Array copying is kept to an absolute minimum:
* array references should always be passed, not new arrays created, unless an explicit copy operation is requested.
* To test for the presence of a dask array of any sort, we use:
* `iris._lazy_data.is_lazy_data` which is implemented as `hasattr(data, 'compute')`.
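
A minimal illustrative sketch of the NaN-for-mask convention described above (invented values; assumes only `numpy`, `numpy.ma` and `dask.array`, and is not part of this changeset):

    import dask.array as da
    import numpy as np
    import numpy.ma as ma

    # A masked integer array is first cast to float so that NaN can stand in
    # for the masked points, then wrapped as a dask array.
    masked = ma.masked_array([1, 2, 3], mask=[False, True, False])
    as_nans = masked.astype('f8').filled(np.nan)
    lazy = da.from_array(as_nans, chunks=as_nans.shape)

    # On realisation, the mask is rebuilt from the NaN locations.
    realised = ma.masked_invalid(lazy.compute())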
1 change: 1 addition & 0 deletions docs/iris/src/developers_guide/index.rst
@@ -38,3 +38,4 @@
tests.rst
deprecations.rst
release.rst
dask_interface.rst
6 changes: 3 additions & 3 deletions lib/iris/_concatenate.py
@@ -1,4 +1,4 @@
# (C) British Crown Copyright 2013 - 2016, Met Office
# (C) British Crown Copyright 2013 - 2017, Met Office
#
# This file is part of Iris.
#
@@ -26,7 +26,7 @@
from collections import defaultdict, namedtuple
from copy import deepcopy

import biggus
import dask.array as da
import numpy as np

import iris.coords
@@ -842,7 +842,7 @@ def _build_data(self):
skeletons = self._skeletons
data = [skeleton.data for skeleton in skeletons]

data = biggus.LinearMosaic(tuple(data), axis=self.axis)
data = da.concatenate(data, self.axis)

return data

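For orientation, a small hedged sketch of the `da.concatenate` call that `_build_data` now uses, with made-up stand-in arrays rather than real cube payloads:

    import dask.array as da
    import numpy as np

    # Two source-cube data payloads joined along axis 0.
    parts = [da.from_array(np.zeros((2, 4)), chunks=(2, 4)),
             da.from_array(np.ones((3, 4)), chunks=(3, 4))]
    result = da.concatenate(parts, 0)

    print(result.shape)      # (5, 4) -- still lazy
    print(result.compute())  # realises a plain numpy array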
54 changes: 54 additions & 0 deletions lib/iris/_lazy_data.py
@@ -0,0 +1,54 @@
# (C) British Crown Copyright 2017, Met Office
#
# This file is part of Iris.
#
# Iris is free software: you can redistribute it and/or modify it under
# the terms of the GNU Lesser General Public License as published by the
# Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Iris is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU Lesser General Public License for more details.
#
# You should have received a copy of the GNU Lesser General Public License
# along with Iris. If not, see <http://www.gnu.org/licenses/>.
"""
Routines for lazy data handling.

To avoid replicating implementation-dependent test and conversion code.

"""
from __future__ import (absolute_import, division, print_function)
from six.moves import (filter, input, map, range, zip) # noqa

import dask.array as da
import numpy as np


def is_lazy_data(data):
"""
Return whether the argument is an Iris 'lazy' data array.

At present, this means simply a Dask array.
We determine this by checking for a "compute" property.
NOTE: ***for now only***, Biggus arrays are also accepted.

"""
result = hasattr(data, 'compute')
return result


def array_masked_to_nans(array, mask=None):
"""
Convert a masked array to a normal array with NaNs at masked points.
This is used for dask integration, as dask does not support masked arrays.
Note that any fill value will be lost.
"""
if mask is None:
mask = array.mask
if array.dtype.kind == 'i':
array = array.astype(np.dtype('f8'))
array[mask] = np.nan
return array
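
A brief, hypothetical usage of the two helpers above (assuming the module imports as `iris._lazy_data`; the data values are invented):

    import dask.array as da
    import numpy as np
    import numpy.ma as ma

    from iris._lazy_data import array_masked_to_nans, is_lazy_data

    masked = ma.masked_array(np.arange(4), mask=[0, 1, 0, 0])

    # Masked points become NaN; the integer input is promoted to float64 first.
    nans = np.asarray(array_masked_to_nans(masked))

    # Only objects with a `compute` attribute (i.e. dask arrays) count as lazy.
    assert not is_lazy_data(nans)
    assert is_lazy_data(da.from_array(nans, chunks=(4,)))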
50 changes: 39 additions & 11 deletions lib/iris/_merge.py
@@ -1,4 +1,4 @@
# (C) British Crown Copyright 2010 - 2016, Met Office
# (C) British Crown Copyright 2010 - 2017, Met Office
#
# This file is part of Iris.
#
@@ -29,10 +29,11 @@
from collections import namedtuple, OrderedDict
from copy import deepcopy

import biggus
import dask.array as da
import numpy as np
import numpy.ma as ma

from iris._lazy_data import is_lazy_data, array_masked_to_nans
import iris.cube
import iris.coords
import iris.exceptions
@@ -1068,6 +1069,27 @@ def derive_space(groups, relation_matrix, positions, function_matrix=None):
return space


def _multidim_daskstack(stack):
"""
Recursively build a multidimensional stacked dask array.

The argument is an ndarray of dask arrays.
This is needed because dask.array.stack only accepts a 1-dimensional list.

"""
if stack.ndim == 0:
# A 0-d array cannot be merged.
result = stack.item()
elif stack.ndim == 1:
# Another base case: a simple 1-d list can be passed directly to dask.
result = da.stack(list(stack))
else:
# Recurse because dask.stack does not do multi-dimensional.
result = da.stack([_multidim_daskstack(subarray)
for subarray in stack])
return result


class ProtoCube(object):
"""
Framework for merging source-cubes into one or more higher
@@ -1192,10 +1214,10 @@ def merge(self, unique=True):
# Generate group-depth merged cubes from the source-cubes.
for level in range(group_depth):
# Stack up all the data from all of the relevant source
# cubes in a single biggus ArrayStack.
# cubes in a single dask "stacked" array.
# If it turns out that all the source cubes already had
# their data loaded then at the end we can convert the
# ArrayStack back to a numpy array.
# their data loaded then at the end we convert the stack back
# into a plain numpy array.
stack = np.empty(self._stack_shape, 'object')
all_have_data = True
for nd_index in nd_indexes:
@@ -1204,17 +1226,23 @@
group = group_by_nd_index[nd_index]
offset = min(level, len(group) - 1)
data = self._skeletons[group[offset]].data
# Ensure the data is represented as a biggus.Array and
# slot that Array into the stack.
if isinstance(data, biggus.Array):
# Ensure the data is represented as a dask array and
# slot that array into the stack.
if is_lazy_data(data):
all_have_data = False
else:
data = biggus.NumpyArrayAdapter(data)
if isinstance(data, ma.MaskedArray):
if ma.is_masked(data):
data = array_masked_to_nans(data)
data = data.data
data = da.from_array(data, chunks=data.shape)
stack[nd_index] = data

merged_data = biggus.ArrayStack(stack)
merged_data = _multidim_daskstack(stack)
if all_have_data:
merged_data = merged_data.masked_array()
# All inputs were concrete, so turn the result back into a
# normal array.
merged_data = merged_data.compute()
# Unmask the array only if it is filled.
if (ma.isMaskedArray(merged_data) and
ma.count_masked(merged_data) == 0):
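
For orientation only, a minimal sketch of the stacking pattern that `_multidim_daskstack` implements, using invented stand-in data and calling `dask.array.stack` directly rather than Iris itself:

    import dask.array as da
    import numpy as np

    # Build the same kind of object ndarray of dask arrays that merge() assembles.
    stack = np.empty((2, 2), dtype=object)
    for index in np.ndindex(stack.shape):
        stack[index] = da.from_array(np.full((3,), sum(index), dtype='f8'),
                                     chunks=(3,))

    # dask.array.stack only accepts a flat sequence, so stack the innermost
    # dimension first and then the outer one -- the same order as the recursion.
    merged = da.stack([da.stack(list(row)) for row in stack])

    print(merged.shape)      # (2, 2, 3)
    print(merged.compute())  # concrete numpy result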