diff --git a/docs/src/further_topics/dask_best_practices/dask_bags_and_greed.rst b/docs/src/further_topics/dask_best_practices/dask_bags_and_greed.rst
new file mode 100644
index 0000000000..007a58d400
--- /dev/null
+++ b/docs/src/further_topics/dask_best_practices/dask_bags_and_greed.rst
@@ -0,0 +1,235 @@
+.. _examples_bags_greed:
+
+3. Dask Bags and Greedy Parallelism
+-----------------------------------
+
+Here is a journey that demonstrates:
+
+* How to apply Dask bags to an existing script
+* The equal importance of optimising the non-parallel parts of a script
+* Protection against multiple pieces of software trying to manage parallelism
+  simultaneously
+
+
+3.1 The Problem - Slow Loading
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+We have ~7000 GRIB files spread across 256 dated directories::
+
+    .
+    |-- 20180401
+    |   |-- gfs.t00z.icing.0p25.grb2f006
+    |   |-- gfs.t00z.icing.0p25.grb2f006.1
+    |   |-- gfs.t00z.icing.0p25.grb2f012
+    |   |-- gfs.t00z.icing.0p25.grb2f018
+    |   |-- gfs.t00z.icing.0p25.grb2f024
+    |   |-- gfs.t00z.icing.0p25.grb2f030
+    |   `-- gfs.t00z.icing.0p25.grb2f036
+    |-- 20180402
+    |   `-- gfs.t00z.icing.0p25.grb2f006
+    |-- 20180403
+    |   |-- gfs.t12z.icing.0p25.grb2f006
+    |   |-- gfs.t12z.icing.0p25.grb2f012
+
+With this script, a sample of 11 GRIB files takes ~600 seconds to load::
+
+    import iris
+    import glob
+
+    def callback(cube, field, fname):
+        if field.sections[5]['bitsPerValue'] == 0:
+            raise iris.exceptions.IgnoreCubeException
+        if field.sections[4]['parameterNumber'] == 20:
+            raise iris.exceptions.IgnoreCubeException
+        elif field.sections[4]['parameterNumber'] == 234:
+            cube.long_name = 'Icing Severity'
+
+    fpaths = glob.glob('20190416/*t18z*f???')
+    cubes = iris.load(fpaths, callback=callback)
+
+3.2 Parallelisation
+^^^^^^^^^^^^^^^^^^^
+We'll try using `dask.bag `_ to
+parallelise the function calls. It's important that Dask is given the freedom
+to break the task down in an efficient manner - the function that is mapped
+across the bag should only load a single file, and the bag itself can
+iterate through the list of files. Here's the restructured script::
+
+    import glob
+    import multiprocessing
+    import os
+
+    import dask
+    import dask.bag as db
+    import iris
+
+    def callback(cube, field, fname):
+        if field.sections[5]['bitsPerValue'] == 0:
+            raise iris.exceptions.IgnoreCubeException
+        if field.sections[4]['parameterNumber'] == 20:
+            raise iris.exceptions.IgnoreCubeException
+        elif field.sections[4]['parameterNumber'] == 234:
+            cube.long_name = 'Icing Severity'
+
+    def func(fname):
+        return iris.load_cube(fname, callback=callback)
+
+    fpaths = list(glob.glob('20190416/*t18z*f???'))
+
+    # Determine the number of processors visible ..
+    cpu_count = multiprocessing.cpu_count()
+
+    # .. or as given by the Slurm allocation.
+    # Only relevant when using Slurm for job scheduling.
+    # Environment variables are strings, so convert to int.
+    if 'SLURM_NTASKS' in os.environ:
+        cpu_count = int(os.environ['SLURM_NTASKS'])
+
+    # Do not exceed the number of CPUs available, leaving 1 for the system.
+    num_workers = cpu_count - 1
+    print('Using {} workers from {} CPUs...'.format(num_workers, cpu_count))
+
+    # Now do the parallel load.
+    with dask.config.set(num_workers=num_workers):
+        bag = db.from_sequence(fpaths).map(func)
+        cubes = iris.cube.CubeList(bag.compute()).merge()
+
+This achieves approximately a 10-fold improvement if enough CPUs are
+available to have one per file.
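+
+If you want to benchmark a change like this on your own system, a simple
+approach is to wrap the load in a timer. The sketch below is purely
+illustrative: the ``parallel_load`` function is a hypothetical wrapper
+around the restructured script above, not part of Iris or Dask::
+
+    import time
+
+    start = time.perf_counter()
+    # 'parallel_load' is a hypothetical wrapper around the parallel loading
+    # script shown above, returning the merged CubeList.
+    cubes = parallel_load(fpaths, num_workers)
+    elapsed = time.perf_counter() - start
+    print('Loaded {} cubes in {:.1f}s using {} workers'.format(
+        len(cubes), elapsed, num_workers))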
+See the benchmark results below:
+
++---------------+-----------------------+---------------+---------------+
+| Machine       | CPUs Available        | CPUs Used     | Time Taken    |
++===============+=======================+===============+===============+
+| A             | 4                     | 3             | 4m 05s        |
+|               |                       +---------------+---------------+
+|               |                       | 4             | 3m 22s        |
++---------------+-----------------------+---------------+---------------+
+| B             | 8                     | 1             | 9m 10s        |
+|               |                       +---------------+---------------+
+|               |                       | 7             | 2m 35s        |
+|               |                       +---------------+---------------+
+|               |                       | 8             | 2m 20s        |
++---------------+-----------------------+---------------+---------------+
+
+
+.. _examples_bags_greed_profile:
+
+3.3 Profiling
+^^^^^^^^^^^^^
+2m 20s is still a surprisingly long time. When faced with a mystery like
+this it is helpful to profile the script to see if there are any steps that
+are taking more time than we would expect. For this we use a tool called
+`kapture `_ to produce a
+flame chart visualising the time spent performing each call:
+
+.. image:: images/grib-bottleneck.png
+   :width: 1000
+   :align: center
+
+From this we see that 96% of the runtime is taken by this call::
+
+    res = gribapi.grib_get_array(self._message_id, key)
+
+This call is made by the ``callback`` function when it inspects the GRIB
+messages in order to filter out cubes with certain unwanted properties.
+
+3.4 Improving GRIB Key Handling
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Even with parallelisation, we are still limited by the time it takes to run
+a single instance of the function. This will matter much more when running
+7000 files instead of 11: there will be nowhere near enough CPUs, even on a
+large multi-processing system, so each CPU will be running many instances
+of the function. **Parallelisation can only go so far towards solving speed
+issues** -- it's effectively the 'brute force' method.
+
+:ref:`examples_bags_greed_profile` showed us where the major bottleneck is.
+To improve efficiency we can re-write the script to filter on GRIB messages
+*before* converting the GRIB file to a cube::
+
+    import dask
+    import dask.bag as db
+    import glob
+    import iris
+    import multiprocessing
+    import os
+
+    def func(fname):
+        import iris
+        from iris_grib import load_pairs_from_fields
+        from iris_grib.message import GribMessage
+
+        # Perform GRIB message level filtering...
+        filtered_messages = []
+        for message in GribMessage.messages_from_filename(fname):
+            if (message.sections[5]['bitsPerValue'] != 0 and
+                    message.sections[4]['parameterNumber'] == 234):
+                filtered_messages.append(message)
+
+        # ... now convert the messages to cubes.
+        cubes = [cube for cube, message in load_pairs_from_fields(filtered_messages)]
+        return iris.cube.CubeList(cubes).merge_cube()
+
+    fpaths = list(glob.glob('/scratch/frcz/ICING/GFS_DATA/20190416/*t18z*f???'))
+    cpu_count = multiprocessing.cpu_count()
+
+    # Only relevant when using Slurm for job scheduling.
+    # Environment variables are strings, so convert to int.
+    if 'SLURM_NTASKS' in os.environ:
+        cpu_count = int(os.environ['SLURM_NTASKS'])
+
+    num_workers = cpu_count - 1
+
+    print('Using {} workers from {} CPUs...'.format(num_workers, cpu_count))
+    with dask.config.set(num_workers=num_workers):
+        bag = db.from_sequence(fpaths).map(func)
+        cubes = iris.cube.CubeList(bag.compute())
+
+This achieves a significant performance improvement - more than twice as
+fast as the previous benchmarks:
+
++---------------+-----------------------+---------------+---------------+-----------+
+| Machine       | CPUs Available        | CPUs Used     | Previous Time | New Time  |
++===============+=======================+===============+===============+===========+
+| Example       | 8                     | 7             | 2m 35s        | 1m 05s    |
+|               |                       +---------------+---------------+-----------+
+|               |                       | 8             | 2m 20s        | 1m 03s    |
++---------------+-----------------------+---------------+---------------+-----------+
+
+3.5 Managing External Factors
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+The speed still needs to be improved further before we can process 7000
+files. The main remaining gains come from making sure it is **only Dask**
+that manages multi-processing - if multi-processing is introduced from more
+than one place, clashes are inevitable.
+
+First, NumPy must be prevented from performing its own multi-processing by
+adding the following **before** ``import numpy`` is called. You can read more
+about this in :ref:`numpy_threads`.
+
+::
+
+    import os
+
+    os.environ["OMP_NUM_THREADS"] = "1"
+    os.environ["OPENBLAS_NUM_THREADS"] = "1"
+    os.environ["MKL_NUM_THREADS"] = "1"
+    os.environ["VECLIB_MAXIMUM_THREADS"] = "1"
+    os.environ["NUMEXPR_NUM_THREADS"] = "1"
+
+Lastly, if you are using Slurm on the computing cluster, Slurm must be
+configured to prevent it from optimising away the number of cores requested
+for the job. See the Slurm commands below, to be added before running the
+Python script. It's important that ``ntasks`` matches the number of CPUs
+specified in the Python script. You can read more about these points in
+:ref:`multi-pro_slurm`.
+
+::
+
+    #SBATCH --ntasks=12
+    #SBATCH --ntasks-per-core=1
+
+This has all been based on a real example. Once all the above had been set
+up correctly, the completion time dropped from an estimated **55 days** to
+**less than 1 day**.
+
+3.6 Lessons
+^^^^^^^^^^^
+* Dask isn't a magic switch - it's important to write your script so that
+  there is a way to create small sub-tasks. In this case that meant giving
+  dask.bag the list of files and keeping the per-file function separate.
+* Parallelism is not the only performance improvement to try - the script
+  will still be slow if the individual function is slow.
+* All multi-processing needs to be managed by Dask. Several other factors
+  may introduce multi-processing and these need to be configured not to do
+  so.
diff --git a/docs/src/further_topics/dask_best_practices/dask_parallel_loop.rst b/docs/src/further_topics/dask_best_practices/dask_parallel_loop.rst
new file mode 100644
index 0000000000..836503314c
--- /dev/null
+++ b/docs/src/further_topics/dask_best_practices/dask_parallel_loop.rst
@@ -0,0 +1,169 @@
+.. _examples_parallel_loop:
+
+2. Parallelising a Loop of Multiple Calls to a Third Party Library
+-------------------------------------------------------------------
+
+Whilst Iris does provide extensive functionality for performing statistical and
+mathematical operations on your data, it is sometimes necessary to use a third
+party library.
+
+The following example describes a real world use case of how to parallelise
+multiple calls to a third party library using dask bags.
+
+2.1 The Problem - Parallelising a Loop
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+In this particular example, the user is calculating a sounding parcel for each
+column in their dataset. The cubes that are used are of shape::
+
+    (model_level_number: 20; grid_latitude: 1536; grid_longitude: 1536)
+
+As a sounding is calculated for each column, this means there are 1536x1536
+individual calculations.
+
+In Python, it is common practice to vectorise calculations instead of looping
+over the array one element at a time. Vectorising means using NumPy to operate
+on the whole array at once. Unfortunately, not all operations can be
+vectorised, including the calculation in this example, and so we look to
+other methods to improve the performance.
+
+2.2 Original Code with Loop
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+We start out by loading cubes of pressure, temperature, dewpoint temperature and height::
+
+    import iris
+    import numpy as np
+    from skewt import SkewT as sk
+
+    pressure = iris.load_cube('a.press.19981109.pp')
+    temperature = iris.load_cube('a.temp.19981109.pp')
+    dewpoint = iris.load_cube('a.dewp.19981109.pp')
+    height = iris.load_cube('a.height.19981109.pp')
+
+We set up the NumPy arrays we will be filling with the output data::
+
+    # One 2D (grid_latitude, grid_longitude) output array per diagnostic.
+    # Masked arrays are used so that columns where the calculation fails
+    # can simply be masked.
+    output_arrays = [np.ma.zeros(pressure.shape[1:]) for _ in range(6)]
+    cape, cin, lcl, lfc, el, tpw = output_arrays
+
+Now we loop over the columns in the data to calculate the soundings::
+
+    # Number of points along each horizontal dimension (the grid is square).
+    # 'parcel_def' is the SkewT parcel definition, defined elsewhere in the
+    # user's script.
+    nlim = pressure.shape[1]
+
+    for y in range(nlim):
+        for x in range(nlim):
+            mydata = {'pres': pressure[:, y, x],
+                      'temp': temperature[:, y, x],
+                      'dwpt': dewpoint[:, y, x],
+                      'hght': height[:, y, x]}
+
+            # Calculate the sounding with the selected column of data.
+            S = sk.Sounding(soundingdata=mydata)
+            try:
+                startp, startt, startdp, type_ = S.get_parcel(parcel_def)
+                P_lcl, P_lfc, P_el, CAPE, CIN = S.get_cape(
+                    startp, startt, startdp, totalcape='tot')
+                TPW = S.precipitable_water()
+            except:
+                P_lcl, P_lfc, P_el, CAPE, CIN, TPW = [
+                    np.ma.masked for _ in range(6)]
+
+            # Fill the output arrays with the results
+            cape[y, x] = CAPE
+            cin[y, x] = CIN
+            lcl[y, x] = P_lcl
+            lfc[y, x] = P_lfc
+            el[y, x] = P_el
+            tpw[y, x] = TPW
+
+2.3 Profiling the Code with Kapture
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Kapture is a useful statistical profiler. For more information see `the
+Kapture repo `_.
+
+The results are shown below:
+
+.. image:: images/loop_third_party_kapture_results.png
+   :width: 1000
+   :align: center
+
+As we can see above (looking at the highlighted section of the red bar), most
+of the time is spent in the following call::
+
+    S.get_parcel(parcel_def)
+
+As there are over two million columns in the data, we would greatly benefit
+from parallelising this work.
+
+2.4 Parallelising with Dask Bags
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Dask bags are collections of Python objects that you can map a computation over
+in a parallel manner.
+
+For more information about dask bags, see the `Dask Bag Documentation
+`_.
+
+Dask bags work best with lightweight objects, so we will create a collection of
+indices into our data arrays.
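+
+Before applying this to the sounding calculation, here is a toy sketch of the
+basic pattern, purely for illustration (the ``process`` function and the
+numbers below have no connection to the real example): build a bag from a
+sequence of lightweight objects, map a function over it, then compute the
+results in parallel::
+
+    import dask
+    import dask.bag as db
+
+    def process(index):
+        # Stand-in for a real per-item calculation.
+        return index * index
+
+    bag = db.from_sequence(range(8), npartitions=4).map(process)
+
+    with dask.config.set(scheduler='processes'):
+        results = bag.compute()
+
+    print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]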
+
+First, we put the loop into a function that takes a slice object to index the
+appropriate section of the array::
+
+    def calculate_sounding(y_slice):
+        for y in range(y_slice.stop - y_slice.start):
+            for x in range(nlim):
+                mydata = {'pres': pressure[:, y_slice][:, y, x],
+                          'temp': temperature[:, y_slice][:, y, x],
+                          'dwpt': dewpoint[:, y_slice][:, y, x],
+                          'hght': height[:, y_slice][:, y, x]}
+
+                # Calculate the sounding with the selected column of data.
+                S = sk.Sounding(soundingdata=mydata)
+                try:
+                    startp, startt, startdp, type_ = S.get_parcel(parcel_def)
+                    P_lcl, P_lfc, P_el, CAPE, CIN = S.get_cape(
+                        startp, startt, startdp, totalcape='tot')
+                    TPW = S.precipitable_water()
+                except:
+                    P_lcl, P_lfc, P_el, CAPE, CIN, TPW = [
+                        np.ma.masked for _ in range(6)]
+
+                # Fill the output arrays with the results.
+                # The slice selects the y (grid_latitude) rows handled by
+                # this partition.
+                cape[y_slice][y, x] = CAPE
+                cin[y_slice][y, x] = CIN
+                lcl[y_slice][y, x] = P_lcl
+                lfc[y_slice][y, x] = P_lfc
+                el[y_slice][y, x] = P_el
+                tpw[y_slice][y, x] = TPW
+
+Then we create a dask bag of slice objects that will create multiple partitions
+along the y axis::
+
+    import dask
+    import dask.bag as db
+
+    num_of_workers = 4
+    len_of_y_axis = pressure.shape[1]
+
+    part_loc = [int(loc) for loc in np.floor(np.linspace(0, len_of_y_axis,
+                                                         num_of_workers + 1))]
+
+    dask_bag = db.from_sequence(
+        [slice(part_loc[i], part_loc[i+1]) for i in range(num_of_workers)])
+
+    with dask.config.set(scheduler='processes'):
+        dask_bag.map(calculate_sounding).compute()
+
+When this was run on a machine with 4 workers, a speedup of ~4x was achieved,
+as expected.
+
+Note that if using the processes scheduler there is some extra time spent
+serialising the data to pass it between workers. For more information on the
+different schedulers available in Dask, see `Dask Scheduler Overview
+`_.
+
+For a further speed up, it is possible to run the same code on a multi-processing
+system where you will have access to more CPUs.
+
+In this particular example, we are handling multiple NumPy arrays and so we use
+dask bags. If working with a single NumPy array, it may be more appropriate to
+use Dask Arrays (see `Dask Arrays
+`_ for more information).
+
+
+2.5 Lessons
+^^^^^^^^^^^
+* If possible, dask bags should contain lightweight objects
+* Minimise the number of tasks that are created
diff --git a/docs/src/further_topics/dask_best_practices/dask_pp_to_netcdf.rst b/docs/src/further_topics/dask_best_practices/dask_pp_to_netcdf.rst
new file mode 100644
index 0000000000..28784154b4
--- /dev/null
+++ b/docs/src/further_topics/dask_best_practices/dask_pp_to_netcdf.rst
@@ -0,0 +1,92 @@
+.. _examples_pp_to_ff:
+
+1. Speed up Converting PP Files to NetCDF
+-----------------------------------------
+
+Here is an example of how Dask objects can be tuned for better performance.
+
+1.1 The Problem - Slow Saving
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+We have ~300 PP files which we load as follows:
+
+.. code-block:: python
+
+    import iris
+    import glob
+
+    files = glob.glob("pp_files/*.pp")
+    cube = iris.load_cube(files, "mass_fraction_of_ozone_in_air")
+
+Note that loading here may also be parallelised in a similar manner as
+described in :ref:`examples_bags_greed`. Either way, the resulting cube looks
+as follows:
+
+.. code-block:: text
+
+    mass_fraction_of_ozone_in_air / (kg kg-1) (time: 276; model_level_number: 85; latitude: 144; longitude: 192)
+        Dimension coordinates:
+            time                            x                      -             -               -
+            model_level_number              -                      x             -               -
+            latitude                        -                      -             x               -
+            longitude                       -                      -             -               x
+        Auxiliary coordinates:
+            forecast_period                 x                      -             -               -
+            level_height                    -                      x             -               -
+            sigma                           -                      x             -               -
+        Scalar coordinates:
+            forecast_reference_time: 1850-01-01 00:00:00
+        Attributes:
+            STASH: m01s34i001
+            source: Data from Met Office Unified Model
+            um_version: 10.9
+        Cell methods:
+            mean: time (1 hour)
+
+The cube is then immediately saved as a netCDF file.
+
+.. code-block:: python
+
+    nc_chunks = [chunk[0] for chunk in cube.lazy_data().chunks]
+    iris.save(cube, "outfile.nc", chunksizes=nc_chunks)
+
+This operation was taking longer than expected and we would like to improve
+performance. Note that when this cube is being saved, the data is still lazy:
+the data is both read and written at the saving step, and this is done in
+chunks. The way this data is divided into chunks can affect performance. By
+tweaking the way these chunks are structured it may be possible to improve
+performance when saving.
+
+
+.. _dask_rechunking:
+
+1.2 Rechunking
+^^^^^^^^^^^^^^
+We may inspect the cube's lazy data before saving:
+
+.. code-block:: python
+
+    # We can access the cube's Dask array.
+    lazy_data = cube.lazy_data()
+    # We can find the shape of the chunks.
+    # Note that the chunksize of a Dask array is the shape of the chunk
+    # as a tuple.
+    print(lazy_data.chunksize)
+
+Doing so, we find that the chunks currently have the shape::
+
+    (1, 1, 144, 192)
+
+This is significantly smaller than the `size which Dask recommends
+`_. Bear in mind that the
+ideal chunk size depends on the platform you are running on (for this example,
+the code is being run on a desktop with 8 CPUs). In this case, we have 23460
+small chunks. We can reduce the number of chunks by rechunking before saving:
+
+.. code-block:: python
+
+    lazy_data = cube.lazy_data()
+    # Rechunk so that each chunk holds one time step, with all 85 model
+    # levels and the full horizontal grid.
+    lazy_data = lazy_data.rechunk((1, 85, 144, 192))
+    cube.data = lazy_data
+
+We now have 276 moderately sized chunks. When we try saving again, we find
+that it is approximately 4 times faster, saving in 2m13s rather than 10m33s.
diff --git a/docs/src/further_topics/dask_best_practices/images/grib-bottleneck.png b/docs/src/further_topics/dask_best_practices/images/grib-bottleneck.png
new file mode 100644
index 0000000000..c029d57e5e
Binary files /dev/null and b/docs/src/further_topics/dask_best_practices/images/grib-bottleneck.png differ
diff --git a/docs/src/further_topics/dask_best_practices/images/loop_third_party_kapture_results.png b/docs/src/further_topics/dask_best_practices/images/loop_third_party_kapture_results.png
new file mode 100644
index 0000000000..8f388bb89c
Binary files /dev/null and b/docs/src/further_topics/dask_best_practices/images/loop_third_party_kapture_results.png differ
diff --git a/docs/src/further_topics/dask_best_practices/index.rst b/docs/src/further_topics/dask_best_practices/index.rst
new file mode 100644
index 0000000000..eb3321345b
--- /dev/null
+++ b/docs/src/further_topics/dask_best_practices/index.rst
@@ -0,0 +1,221 @@
+.. _dask best practices:
+
+Dask Best Practices
+*******************
+
+This section outlines some of the best practices when using Dask with Iris. These
+practices involve improving performance through rechunking, making the best use of
+computing clusters and avoiding parallelisation conflicts between Dask and NumPy.
+
+
+.. note::
+
+    Here, we have collated advice and a handful of examples, from the topics most
+    relevant when using Dask with Iris, that we hope will assist users to make
+    the best start when using Dask. It is *not* a fully comprehensive guide
+    encompassing all best practices. You can find more general Dask information in the
+    `official Dask Documentation `_.
+
+
+Introduction
+============
+
+`Dask `_ is a powerful tool for speeding up data handling
+via lazy loading and parallel processing. To get the full benefit of using
+Dask, it is important to configure it correctly and supply it with
+appropriately structured data. For example, we may need to "chunk" data arrays
+into smaller pieces in order to process, read and write them; getting the
+"chunking" right can make a significant difference to performance!
+
+
+.. _numpy_threads:
+
+NumPy Threads
+=============
+
+In certain scenarios NumPy will attempt to perform threading using an
+external library - typically OpenMP, MKL or OpenBLAS - making use of **every**
+CPU available. This interacts badly with Dask:
+
+* Dask may create multiple instances of NumPy, each generating enough
+  threads to use **all** the available CPUs. The resulting sharing of CPUs
+  between threads greatly reduces performance. The more cores there are, the
+  more pronounced this problem is.
+* NumPy will generate enough threads to use all available CPUs even
+  if Dask is deliberately configured to only use a subset of CPUs. The
+  resulting sharing of CPUs between threads greatly reduces performance.
+* `Dask is already designed to parallelise with NumPy arrays `_, so adding NumPy's 'competing' layer of
+  parallelisation could cause unpredictable performance.
+
+Therefore it is best to prevent NumPy from performing its own parallelisation, `a
+suggestion made in Dask's own documentation `_.
+The following commands will ensure this in all scenarios:
+
+in Python...
+
+::
+
+    # Must be run before importing NumPy.
+    import os
+    os.environ["OMP_NUM_THREADS"] = "1"
+    os.environ["OPENBLAS_NUM_THREADS"] = "1"
+    os.environ["MKL_NUM_THREADS"] = "1"
+    os.environ["VECLIB_MAXIMUM_THREADS"] = "1"
+    os.environ["NUMEXPR_NUM_THREADS"] = "1"
+
+or in the Linux command line...
+
+::
+
+    export OMP_NUM_THREADS=1
+    export OPENBLAS_NUM_THREADS=1
+    export MKL_NUM_THREADS=1
+    export VECLIB_MAXIMUM_THREADS=1
+    export NUMEXPR_NUM_THREADS=1
+
+
+.. _multi-pro_systems:
+
+Dask on Computing Clusters
+==========================
+
+Dask is well suited for use on computing clusters, but there are some important
+factors you must be aware of. In particular, you will always need to explicitly
+control parallel operation, both in Dask and in NumPy.
+
+
+.. _multi-pro_slurm:
+
+CPU Allocation
+--------------
+
+When running on a computing cluster, unless configured otherwise, Dask will
+attempt to create one parallel 'worker' task for each CPU. However, when using
+a job scheduler such as Slurm, only *some* of these CPUs are actually
+accessible -- often, and by default, only one. This leads to a serious
+over-commitment unless it is controlled.
+
+So, **whenever Iris is used on a computing cluster, you must always set the
+number of Dask workers to a sensible value**, matching the Slurm allocation.
+You do this with::
+
+    dask.config.set(num_workers=N)
+
+For an example, see :doc:`dask_bags_and_greed`.
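+
+As a rough sketch of how the worker count might be tied to the job's actual
+allocation (this is only an illustration: it assumes a Linux system, and
+whether the affinity mask seen by the process reflects the Slurm allocation
+depends on how your cluster is configured)::
+
+    import os
+
+    import dask
+
+    # CPUs actually usable by this process; on Linux this respects any
+    # affinity mask applied by the batch scheduler.
+    available_cpus = len(os.sched_getaffinity(0))
+
+    # Prefer an explicit Slurm task count when one is set.
+    num_workers = int(os.environ.get("SLURM_NTASKS", available_cpus))
+
+    dask.config.set(num_workers=num_workers)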
+ +Alternatively, when there is only one CPU allocated, it may actually be more +efficient to use a "synchronous" scheduler instead, with:: + + dask.config.set(scheduler='synchronous') + +See the Dask documentation on `Single thread synchronous scheduler +`_. + + +.. _multi-pro_numpy: + +NumPy Threading +--------------- + +NumPy also interrogates the visible number of CPUs to multi-thread its operations. +The large number of CPUs available in a computing cluster will thus cause confusion if NumPy +attempts its own parallelisation, so this must be prevented. Refer back to +:ref:`numpy_threads` for more detail. + + +Distributed +----------- + +Even though allocations on a computing cluster are generally restricted to a single node, there +are still good reasons for using 'dask.distributed' in many cases. See `Single Machine: dask.distributed +`_ in the Dask documentation. + + +Chunking +======== + +Dask breaks down large data arrays into chunks, allowing efficient +parallelisation by processing several smaller chunks simultaneously. For more +information, see the documentation on +`Dask Array `_. + +Iris provides a basic chunking shape to Dask, attempting to set the shape for +best performance. The chunking that is used can depend on the file format that +is being loaded. See below for how chunking is performed for: + + * :ref:`chunking_netcdf` + * :ref:`chunking_pp_ff` + +It can in some cases be beneficial to re-chunk the arrays in Iris cubes. +For information on how to do this, see :ref:`dask_rechunking`. + + +.. _chunking_netcdf: + +NetCDF Files +------------ + +NetCDF files can include their own chunking specification. This is either +specified when creating the file, or is automatically assigned if one or +more of the dimensions is `unlimited `_. +Importantly, netCDF chunk shapes are **not optimised for Dask +performance**. + +Chunking can be set independently for any variable in a netCDF file. +When a netCDF variable uses an unlimited dimension, it is automatically +chunked: the chunking is the shape of the whole variable, but with '1' instead +of the length in any unlimited dimensions. + +When chunking is specified for netCDF data, Iris will set the dask chunking +to an integer multiple or fraction of that shape, such that the data size is +near to but not exceeding the dask array chunk size. + + +.. _chunking_pp_ff: + +PP and Fieldsfiles +------------------ + +PP and Fieldsfiles contain multiple 2D fields of data. When loading PP or +Fieldsfiles into Iris cubes, the chunking will automatically be set to a chunk +per field. + +For example, if a PP file contains 2D lat-lon fields for each of the +85 model level numbers, it will load in a cube that looks as follows:: + + (model_level_number: 85; latitude: 144; longitude: 192) + +The data in this cube will be partitioned with chunks of shape +:code:`(1, 144, 192)`. + +If the file(s) being loaded contain multiple fields, this can lead to an +excessive amount of chunks which will result in poor performance. + +When the default chunking is not appropriate, it is possible to rechunk. +:doc:`dask_pp_to_netcdf` provides a detailed demonstration of how Dask can optimise +that process. + + +Examples +======== + +We have written some examples of use cases for using Dask, that come with advice and +explanations for why and how the tasks are performed the way they are. + +If you feel you have an example of a Dask best practice that you think may be helpful to others, +please share them with us by raising a new `discussion on the Iris repository `_. 
+ + * :doc:`dask_pp_to_netcdf` + * :doc:`dask_parallel_loop` + * :doc:`dask_bags_and_greed` + +.. toctree:: + :hidden: + :maxdepth: 1 + + dask_pp_to_netcdf + dask_parallel_loop + dask_bags_and_greed diff --git a/docs/src/userguide/index.rst b/docs/src/userguide/index.rst index fdd0c4d03e..771aa450a3 100644 --- a/docs/src/userguide/index.rst +++ b/docs/src/userguide/index.rst @@ -45,4 +45,5 @@ they may serve as a useful reference for future exploration. ../further_topics/metadata ../further_topics/lenient_metadata ../further_topics/lenient_maths + ../further_topics/dask_best_practices/index ../further_topics/ugrid/index diff --git a/docs/src/whatsnew/latest.rst b/docs/src/whatsnew/latest.rst index 0e2896b7a1..9b62715be6 100644 --- a/docs/src/whatsnew/latest.rst +++ b/docs/src/whatsnew/latest.rst @@ -71,6 +71,10 @@ This document explains the changes made to Iris for this release to use it. By default the theme will be based on the users system settings, defaulting to ``light`` if no system setting is found. (:pull:`5299`) +#. `@HGWright`_ added a :doc:`/further_topics/dask_best_practices/index` + section into the user guide, containing advice and use cases to help users + get the best out of Dask with Iris. + 💼 Internal ===========