diff --git a/docs/src/further_topics/dask_best_practices/dask_bags_and_greed.rst b/docs/src/further_topics/dask_best_practices/dask_bags_and_greed.rst
new file mode 100644
index 0000000000..007a58d400
--- /dev/null
+++ b/docs/src/further_topics/dask_best_practices/dask_bags_and_greed.rst
@@ -0,0 +1,235 @@
+.. _examples_bags_greed:
+
+3. Dask Bags and Greedy Parallelism
+-----------------------------------
+
+Here is a journey that demonstrates:
+
+* How to apply Dask bags to an existing script
+* The equal importance of optimising the non-parallel parts of a script
+* Protection against multiple pieces of software trying to manage parallelism
+  simultaneously
+
+
+3.1 The Problem - Slow Loading
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+We have ~7000 GRIB files spread across 256 dated directories::
+
+    .
+    |-- 20180401
+    |   |-- gfs.t00z.icing.0p25.grb2f006
+    |   |-- gfs.t00z.icing.0p25.grb2f006.1
+    |   |-- gfs.t00z.icing.0p25.grb2f012
+    |   |-- gfs.t00z.icing.0p25.grb2f018
+    |   |-- gfs.t00z.icing.0p25.grb2f024
+    |   |-- gfs.t00z.icing.0p25.grb2f030
+    |   `-- gfs.t00z.icing.0p25.grb2f036
+    |-- 20180402
+    |   `-- gfs.t00z.icing.0p25.grb2f006
+    |-- 20180403
+    |   |-- gfs.t12z.icing.0p25.grb2f006
+    |   |-- gfs.t12z.icing.0p25.grb2f012
+
+With this script, a sample of 11 GRIB files takes ~600 seconds to load::
+
+    import iris
+    import glob
+
+    def callback(cube, field, fname):
+        if field.sections[5]['bitsPerValue'] == 0:
+            raise iris.exceptions.IgnoreCubeException
+        if field.sections[4]['parameterNumber'] == 20:
+            raise iris.exceptions.IgnoreCubeException
+        elif field.sections[4]['parameterNumber'] == 234:
+            cube.long_name = 'Icing Severity'
+
+    fpaths = glob.glob('20190416/*t18z*f???')
+    cubes = iris.load(fpaths, callback=callback)
+
+3.2 Parallelisation
+^^^^^^^^^^^^^^^^^^^
+We'll try using `dask.bag `_ to
+parallelise the function calls. It's important that Dask is given the freedom
+to break the task down in an efficient manner - the function that is mapped
+across the bag should only load a single file, and the bag itself can
+iterate through the list of files. Here's the restructured script::
+
+    import glob
+    import multiprocessing
+    import os
+
+    import dask
+    import dask.bag as db
+    import iris
+
+    def callback(cube, field, fname):
+        if field.sections[5]['bitsPerValue'] == 0:
+            raise iris.exceptions.IgnoreCubeException
+        if field.sections[4]['parameterNumber'] == 20:
+            raise iris.exceptions.IgnoreCubeException
+        elif field.sections[4]['parameterNumber'] == 234:
+            cube.long_name = 'Icing Severity'
+
+    def func(fname):
+        return iris.load_cube(fname, callback=callback)
+
+    fpaths = list(glob.glob('20190416/*t18z*f???'))
+
+    # Determine the number of processors visible ..
+    cpu_count = multiprocessing.cpu_count()
+
+    # .. or as given by the Slurm allocation.
+    # Only relevant when using Slurm for job scheduling.
+    # Environment variables are strings, so convert to int.
+    if 'SLURM_NTASKS' in os.environ:
+        cpu_count = int(os.environ['SLURM_NTASKS'])
+
+    # Do not exceed the number of CPUs available, leaving 1 for the system.
+    num_workers = cpu_count - 1
+    print('Using {} workers from {} CPUs...'.format(num_workers, cpu_count))
+
+    # Now do the parallel load.
+    with dask.config.set(num_workers=num_workers):
+        bag = db.from_sequence(fpaths).map(func)
+        cubes = iris.cube.CubeList(bag.compute()).merge()
+
+This achieves approximately a 10-fold improvement if enough CPUs are
+available to have one per file.
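+
+If you want to benchmark a change like this on your own system, a simple
+approach is to wrap the load in a timer. The sketch below is purely
+illustrative: the ``parallel_load`` function is a hypothetical wrapper
+around the restructured script above, not part of Iris or Dask::
+
+    import time
+
+    start = time.perf_counter()
+    # 'parallel_load' is a hypothetical wrapper around the parallel loading
+    # script shown above, returning the merged CubeList.
+    cubes = parallel_load(fpaths, num_workers)
+    elapsed = time.perf_counter() - start
+    print('Loaded {} cubes in {:.1f}s using {} workers'.format(
+        len(cubes), elapsed, num_workers))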
+See the benchmark results below:
+
++---------------+-----------------------+---------------+---------------+
+| Machine       | CPUs Available        | CPUs Used     | Time Taken    |
++===============+=======================+===============+===============+
+| A             | 4                     | 3             | 4m 05s        |
+|               |                       +---------------+---------------+
+|               |                       | 4             | 3m 22s        |
++---------------+-----------------------+---------------+---------------+
+| B             | 8                     | 1             | 9m 10s        |
+|               |                       +---------------+---------------+
+|               |                       | 7             | 2m 35s        |
+|               |                       +---------------+---------------+
+|               |                       | 8             | 2m 20s        |
++---------------+-----------------------+---------------+---------------+
+
+
+.. _examples_bags_greed_profile:
+
+3.3 Profiling
+^^^^^^^^^^^^^
+2m 20s is still a surprisingly long time. When faced with a mystery like
+this it is helpful to profile the script to see if there are any steps that
+are taking more time than we would expect. For this we use a tool called
+`kapture `_ to produce a
+flame chart visualising the time spent performing each call:
+
+.. image:: images/grib-bottleneck.png
+   :width: 1000
+   :align: center
+
+From this we see that 96% of the runtime is taken by this call::
+
+    res = gribapi.grib_get_array(self._message_id, key)
+
+This call is made by the ``callback`` function when it inspects the GRIB
+messages in order to filter out cubes with certain unwanted properties.
+
+3.4 Improving GRIB Key Handling
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Even with parallelisation, we are still limited by the time it takes to run
+a single instance of the function. This will matter much more when running
+7000 files instead of 11: there will be nowhere near enough CPUs, even on a
+large multi-processing system, so each CPU will be running many instances
+of the function. **Parallelisation can only go so far towards solving speed
+issues** -- it's effectively the 'brute force' method.
+
+:ref:`examples_bags_greed_profile` showed us where the major bottleneck is.
+To improve efficiency we can re-write the script to filter on GRIB messages
+*before* converting the GRIB file to a cube::
+
+    import dask
+    import dask.bag as db
+    import glob
+    import iris
+    import multiprocessing
+    import os
+
+    def func(fname):
+        import iris
+        from iris_grib import load_pairs_from_fields
+        from iris_grib.message import GribMessage
+
+        # Perform GRIB message level filtering...
+        filtered_messages = []
+        for message in GribMessage.messages_from_filename(fname):
+            if (message.sections[5]['bitsPerValue'] != 0 and
+                    message.sections[4]['parameterNumber'] == 234):
+                filtered_messages.append(message)
+
+        # ... now convert the messages to cubes.
+        cubes = [cube for cube, message in load_pairs_from_fields(filtered_messages)]
+        return iris.cube.CubeList(cubes).merge_cube()
+
+    fpaths = list(glob.glob('/scratch/frcz/ICING/GFS_DATA/20190416/*t18z*f???'))
+    cpu_count = multiprocessing.cpu_count()
+
+    # Only relevant when using Slurm for job scheduling.
+    # Environment variables are strings, so convert to int.
+    if 'SLURM_NTASKS' in os.environ:
+        cpu_count = int(os.environ['SLURM_NTASKS'])
+
+    num_workers = cpu_count - 1
+
+    print('Using {} workers from {} CPUs...'.format(num_workers, cpu_count))
+    with dask.config.set(num_workers=num_workers):
+        bag = db.from_sequence(fpaths).map(func)
+        cubes = iris.cube.CubeList(bag.compute())
+
+This achieves a significant performance improvement - more than twice as
+fast as the previous benchmarks:
+
++---------------+-----------------------+---------------+---------------+-----------+
+| Machine       | CPUs Available        | CPUs Used     | Previous Time | New Time  |
++===============+=======================+===============+===============+===========+
+| Example       | 8                     | 7             | 2m 35s        | 1m 05s    |
+|               |                       +---------------+---------------+-----------+
+|               |                       | 8             | 2m 20s        | 1m 03s    |
++---------------+-----------------------+---------------+---------------+-----------+
+
+3.5 Managing External Factors
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+The speed still needs to be improved further before we can process 7000
+files. The main remaining gains come from making sure it is **only Dask**
+that manages multi-processing - if multi-processing is introduced from more
+than one place, clashes are inevitable.
+
+First, NumPy must be prevented from performing its own multi-processing by
+adding the following **before** ``import numpy`` is called. You can read more
+about this in :ref:`numpy_threads`.
+
+::
+
+    import os
+
+    os.environ["OMP_NUM_THREADS"] = "1"
+    os.environ["OPENBLAS_NUM_THREADS"] = "1"
+    os.environ["MKL_NUM_THREADS"] = "1"
+    os.environ["VECLIB_MAXIMUM_THREADS"] = "1"
+    os.environ["NUMEXPR_NUM_THREADS"] = "1"
+
+Lastly, if you are using Slurm on the computing cluster, Slurm must be
+configured to prevent it from optimising away the number of cores requested
+for the job. See the Slurm commands below, to be added before running the
+Python script. It's important that ``ntasks`` matches the number of CPUs
+specified in the Python script. You can read more about these points in
+:ref:`multi-pro_slurm`.
+
+::
+
+    #SBATCH --ntasks=12
+    #SBATCH --ntasks-per-core=1
+
+This has all been based on a real example. Once all the above had been set
+up correctly, the completion time dropped from an estimated **55 days** to
+**less than 1 day**.
+
+3.6 Lessons
+^^^^^^^^^^^
+* Dask isn't a magic switch - it's important to write your script so that
+  there is a way to create small sub-tasks. In this case that meant giving
+  dask.bag the list of files and keeping the per-file function separate.
+* Parallelism is not the only performance improvement to try - the script
+  will still be slow if the individual function is slow.
+* All multi-processing needs to be managed by Dask. Several other factors
+  may introduce multi-processing and these need to be configured not to do
+  so.
diff --git a/docs/src/further_topics/dask_best_practices/dask_parallel_loop.rst b/docs/src/further_topics/dask_best_practices/dask_parallel_loop.rst
new file mode 100644
index 0000000000..836503314c
--- /dev/null
+++ b/docs/src/further_topics/dask_best_practices/dask_parallel_loop.rst
@@ -0,0 +1,169 @@
+.. _examples_parallel_loop:
+
+2. Parallelising a Loop of Multiple Calls to a Third Party Library
+-------------------------------------------------------------------
+
+Whilst Iris does provide extensive functionality for performing statistical and
+mathematical operations on your data, it is sometimes necessary to use a third
+party library.
+
+The following example describes a real world use case of how to parallelise
+multiple calls to a third party library using dask bags.
+
+2.1 The Problem - Parallelising a Loop
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+In this particular example, the user is calculating a sounding parcel for each
+column in their dataset. The cubes that are used are of shape::
+
+    (model_level_number: 20; grid_latitude: 1536; grid_longitude: 1536)
+
+As a sounding is calculated for each column, this means there are 1536x1536
+individual calculations.
+
+In Python, it is common practice to vectorise calculations instead of looping
+over the array one element at a time. Vectorising means using NumPy to operate
+on the whole array at once. Unfortunately, not all operations can be
+vectorised, including the calculation in this example, and so we look to
+other methods to improve the performance.
+
+2.2 Original Code with Loop
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+We start out by loading cubes of pressure, temperature, dewpoint temperature and height::
+
+    import iris
+    import numpy as np
+    from skewt import SkewT as sk
+
+    pressure = iris.load_cube('a.press.19981109.pp')
+    temperature = iris.load_cube('a.temp.19981109.pp')
+    dewpoint = iris.load_cube('a.dewp.19981109.pp')
+    height = iris.load_cube('a.height.19981109.pp')
+
+We set up the NumPy arrays we will be filling with the output data::
+
+    # One 2D (grid_latitude, grid_longitude) output array per diagnostic.
+    # Masked arrays are used so that columns where the calculation fails
+    # can simply be masked.
+    output_arrays = [np.ma.zeros(pressure.shape[1:]) for _ in range(6)]
+    cape, cin, lcl, lfc, el, tpw = output_arrays
+
+Now we loop over the columns in the data to calculate the soundings::
+
+    # Number of points along each horizontal dimension (the grid is square).
+    # 'parcel_def' is the SkewT parcel definition, defined elsewhere in the
+    # user's script.
+    nlim = pressure.shape[1]
+
+    for y in range(nlim):
+        for x in range(nlim):
+            mydata = {'pres': pressure[:, y, x],
+                      'temp': temperature[:, y, x],
+                      'dwpt': dewpoint[:, y, x],
+                      'hght': height[:, y, x]}
+
+            # Calculate the sounding with the selected column of data.
+            S = sk.Sounding(soundingdata=mydata)
+            try:
+                startp, startt, startdp, type_ = S.get_parcel(parcel_def)
+                P_lcl, P_lfc, P_el, CAPE, CIN = S.get_cape(
+                    startp, startt, startdp, totalcape='tot')
+                TPW = S.precipitable_water()
+            except:
+                P_lcl, P_lfc, P_el, CAPE, CIN, TPW = [
+                    np.ma.masked for _ in range(6)]
+
+            # Fill the output arrays with the results
+            cape[y, x] = CAPE
+            cin[y, x] = CIN
+            lcl[y, x] = P_lcl
+            lfc[y, x] = P_lfc
+            el[y, x] = P_el
+            tpw[y, x] = TPW
+
+2.3 Profiling the Code with Kapture
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Kapture is a useful statistical profiler. For more information see `the
+Kapture repo `_.
+
+The results are shown below:
+
+.. image:: images/loop_third_party_kapture_results.png
+   :width: 1000
+   :align: center
+
+As we can see above (looking at the highlighted section of the red bar), most
+of the time is spent in the following call::
+
+    S.get_parcel(parcel_def)
+
+As there are over two million columns in the data, we would greatly benefit
+from parallelising this work.
+
+2.4 Parallelising with Dask Bags
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Dask bags are collections of Python objects that you can map a computation over
+in a parallel manner.
+
+For more information about dask bags, see the `Dask Bag Documentation
+`_.
+
+Dask bags work best with lightweight objects, so we will create a collection of
+indices into our data arrays.
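+
+Before applying this to the sounding calculation, here is a toy sketch of the
+basic pattern, purely for illustration (the ``process`` function and the
+numbers below have no connection to the real example): build a bag from a
+sequence of lightweight objects, map a function over it, then compute the
+results in parallel::
+
+    import dask
+    import dask.bag as db
+
+    def process(index):
+        # Stand-in for a real per-item calculation.
+        return index * index
+
+    bag = db.from_sequence(range(8), npartitions=4).map(process)
+
+    with dask.config.set(scheduler='processes'):
+        results = bag.compute()
+
+    print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]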
+
+First, we put the loop into a function that takes a slice object to index the
+appropriate section of the array::
+
+    def calculate_sounding(y_slice):
+        for y in range(y_slice.stop - y_slice.start):
+            for x in range(nlim):
+                mydata = {'pres': pressure[:, y_slice][:, y, x],
+                          'temp': temperature[:, y_slice][:, y, x],
+                          'dwpt': dewpoint[:, y_slice][:, y, x],
+                          'hght': height[:, y_slice][:, y, x]}
+
+                # Calculate the sounding with the selected column of data.
+                S = sk.Sounding(soundingdata=mydata)
+                try:
+                    startp, startt, startdp, type_ = S.get_parcel(parcel_def)
+                    P_lcl, P_lfc, P_el, CAPE, CIN = S.get_cape(
+                        startp, startt, startdp, totalcape='tot')
+                    TPW = S.precipitable_water()
+                except:
+                    P_lcl, P_lfc, P_el, CAPE, CIN, TPW = [
+                        np.ma.masked for _ in range(6)]
+
+                # Fill the output arrays with the results.
+                # The slice selects the y (grid_latitude) rows handled by
+                # this partition.
+                cape[y_slice][y, x] = CAPE
+                cin[y_slice][y, x] = CIN
+                lcl[y_slice][y, x] = P_lcl
+                lfc[y_slice][y, x] = P_lfc
+                el[y_slice][y, x] = P_el
+                tpw[y_slice][y, x] = TPW
+
+Then we create a dask bag of slice objects that will create multiple partitions
+along the y axis::
+
+    import dask
+    import dask.bag as db
+
+    num_of_workers = 4
+    len_of_y_axis = pressure.shape[1]
+
+    part_loc = [int(loc) for loc in np.floor(np.linspace(0, len_of_y_axis,
+                                                         num_of_workers + 1))]
+
+    dask_bag = db.from_sequence(
+        [slice(part_loc[i], part_loc[i+1]) for i in range(num_of_workers)])
+
+    with dask.config.set(scheduler='processes'):
+        dask_bag.map(calculate_sounding).compute()
+
+When this was run on a machine with 4 workers, a speedup of ~4x was achieved,
+as expected.
+
+Note that if using the processes scheduler there is some extra time spent
+serialising the data to pass it between workers. For more information on the
+different schedulers available in Dask, see `Dask Scheduler Overview
+`_.
+
+For a further speed up, it is possible to run the same code on a multi-processing
+system where you will have access to more CPUs.
+
+In this particular example, we are handling multiple NumPy arrays and so we use
+dask bags. If working with a single NumPy array, it may be more appropriate to
+use Dask Arrays (see `Dask Arrays
+`_ for more information).
+
+
+2.5 Lessons
+^^^^^^^^^^^
+* If possible, dask bags should contain lightweight objects
+* Minimise the number of tasks that are created
diff --git a/docs/src/further_topics/dask_best_practices/dask_pp_to_netcdf.rst b/docs/src/further_topics/dask_best_practices/dask_pp_to_netcdf.rst
new file mode 100644
index 0000000000..28784154b4
--- /dev/null
+++ b/docs/src/further_topics/dask_best_practices/dask_pp_to_netcdf.rst
@@ -0,0 +1,92 @@
+.. _examples_pp_to_ff:
+
+1. Speed up Converting PP Files to NetCDF
+-----------------------------------------
+
+Here is an example of how Dask objects can be tuned for better performance.
+
+1.1 The Problem - Slow Saving
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+We have ~300 PP files which we load as follows:
+
+.. code-block:: python
+
+    import iris
+    import glob
+
+    files = glob.glob("pp_files/*.pp")
+    cube = iris.load_cube(files, "mass_fraction_of_ozone_in_air")
+
+Note that loading here may also be parallelised in a similar manner as
+described in :ref:`examples_bags_greed`. Either way, the resulting cube looks
+as follows:
+
+.. code-block:: text
+
+    mass_fraction_of_ozone_in_air / (kg kg-1) (time: 276; model_level_number: 85; latitude: 144; longitude: 192)
+        Dimension coordinates:
+            time                            x                      -             -               -
+            model_level_number              -                      x             -               -
+            latitude                        -                      -             x               -
+            longitude                       -                      -             -               x
+        Auxiliary coordinates:
+            forecast_period                 x                      -             -               -
+            level_height                    -                      x             -               -
+            sigma                           -                      x             -               -
+        Scalar coordinates:
+            forecast_reference_time: 1850-01-01 00:00:00
+        Attributes:
+            STASH: m01s34i001
+            source: Data from Met Office Unified Model
+            um_version: 10.9
+        Cell methods:
+            mean: time (1 hour)
+
+The cube is then immediately saved as a netCDF file.
+
+.. code-block:: python
+
+    nc_chunks = [chunk[0] for chunk in cube.lazy_data().chunks]
+    iris.save(cube, "outfile.nc", chunksizes=nc_chunks)
+
+This operation was taking longer than expected and we would like to improve
+performance. Note that when this cube is being saved, the data is still lazy:
+the data is both read and written at the saving step, and this is done in
+chunks. The way this data is divided into chunks can affect performance. By
+tweaking the way these chunks are structured it may be possible to improve
+performance when saving.
+
+
+.. _dask_rechunking:
+
+1.2 Rechunking
+^^^^^^^^^^^^^^
+We may inspect the cube's lazy data before saving:
+
+.. code-block:: python
+
+    # We can access the cube's Dask array.
+    lazy_data = cube.lazy_data()
+    # We can find the shape of the chunks.
+    # Note that the chunksize of a Dask array is the shape of the chunk
+    # as a tuple.
+    print(lazy_data.chunksize)
+
+Doing so, we find that the chunks currently have the shape::
+
+    (1, 1, 144, 192)
+
+This is significantly smaller than the `size which Dask recommends
+`_. Bear in mind that the
+ideal chunk size depends on the platform you are running on (for this example,
+the code is being run on a desktop with 8 CPUs). In this case, we have 23460
+small chunks. We can reduce the number of chunks by rechunking before saving:
+
+.. code-block:: python
+
+    lazy_data = cube.lazy_data()
+    # Rechunk so that each chunk holds one time step, with all 85 model
+    # levels and the full horizontal grid.
+    lazy_data = lazy_data.rechunk((1, 85, 144, 192))
+    cube.data = lazy_data
+
+We now have 276 moderately sized chunks. When we try saving again, we find
+that it is approximately 4 times faster, saving in 2m13s rather than 10m33s.
diff --git a/docs/src/further_topics/dask_best_practices/images/grib-bottleneck.png b/docs/src/further_topics/dask_best_practices/images/grib-bottleneck.png
new file mode 100644
index 0000000000..c029d57e5e
Binary files /dev/null and b/docs/src/further_topics/dask_best_practices/images/grib-bottleneck.png differ
diff --git a/docs/src/further_topics/dask_best_practices/images/loop_third_party_kapture_results.png b/docs/src/further_topics/dask_best_practices/images/loop_third_party_kapture_results.png
new file mode 100644
index 0000000000..8f388bb89c
Binary files /dev/null and b/docs/src/further_topics/dask_best_practices/images/loop_third_party_kapture_results.png differ
diff --git a/docs/src/further_topics/dask_best_practices/index.rst b/docs/src/further_topics/dask_best_practices/index.rst
new file mode 100644
index 0000000000..eb3321345b
--- /dev/null
+++ b/docs/src/further_topics/dask_best_practices/index.rst
@@ -0,0 +1,221 @@
+.. _dask best practices:
+
+Dask Best Practices
+*******************
+
+This section outlines some of the best practices when using Dask with Iris. These
+practices involve improving performance through rechunking, making the best use of
+computing clusters and avoiding parallelisation conflicts between Dask and NumPy.
+
+
+.. note::
+
+    Here, we have collated advice and a handful of examples, from the topics most
+    relevant when using Dask with Iris, that we hope will assist users to make
+    the best start when using Dask. It is *not* a fully comprehensive guide
+    encompassing all best practices. You can find more general Dask information in the
+    `official Dask Documentation `_.
+
+
+Introduction
+============
+
+`Dask `_ is a powerful tool for speeding up data handling
+via lazy loading and parallel processing. To get the full benefit of using
+Dask, it is important to configure it correctly and supply it with
+appropriately structured data. For example, we may need to "chunk" data arrays
+into smaller pieces in order to process, read and write them; getting the
+"chunking" right can make a significant difference to performance!
+
+
+.. _numpy_threads:
+
+NumPy Threads
+=============
+
+In certain scenarios NumPy will attempt to perform threading using an
+external library - typically OpenMP, MKL or OpenBLAS - making use of **every**
+CPU available. This interacts badly with Dask:
+
+* Dask may create multiple instances of NumPy, each generating enough
+  threads to use **all** the available CPUs. The resulting sharing of CPUs
+  between threads greatly reduces performance. The more cores there are, the
+  more pronounced this problem is.
+* NumPy will generate enough threads to use all available CPUs even
+  if Dask is deliberately configured to only use a subset of CPUs. The
+  resulting sharing of CPUs between threads greatly reduces performance.
+* `Dask is already designed to parallelise with NumPy arrays `_, so adding NumPy's 'competing' layer of
+  parallelisation could cause unpredictable performance.
+
+Therefore it is best to prevent NumPy from performing its own parallelisation, `a
+suggestion made in Dask's own documentation `_.
+The following commands will ensure this in all scenarios:
+
+in Python...
+
+::
+
+    # Must be run before importing NumPy.
+    import os
+    os.environ["OMP_NUM_THREADS"] = "1"
+    os.environ["OPENBLAS_NUM_THREADS"] = "1"
+    os.environ["MKL_NUM_THREADS"] = "1"
+    os.environ["VECLIB_MAXIMUM_THREADS"] = "1"
+    os.environ["NUMEXPR_NUM_THREADS"] = "1"
+
+or in the Linux command line...
+
+::
+
+    export OMP_NUM_THREADS=1
+    export OPENBLAS_NUM_THREADS=1
+    export MKL_NUM_THREADS=1
+    export VECLIB_MAXIMUM_THREADS=1
+    export NUMEXPR_NUM_THREADS=1
+
+
+.. _multi-pro_systems:
+
+Dask on Computing Clusters
+==========================
+
+Dask is well suited for use on computing clusters, but there are some important
+factors you must be aware of. In particular, you will always need to explicitly
+control parallel operation, both in Dask and in NumPy.
+
+
+.. _multi-pro_slurm:
+
+CPU Allocation
+--------------
+
+When running on a computing cluster, unless configured otherwise, Dask will
+attempt to create one parallel 'worker' task for each CPU. However, when using
+a job scheduler such as Slurm, only *some* of these CPUs are actually
+accessible -- often, and by default, only one. This leads to a serious
+over-commitment unless it is controlled.
+
+So, **whenever Iris is used on a computing cluster, you must always set the
+number of Dask workers to a sensible value**, matching the Slurm allocation.
+You do this with::
+
+    dask.config.set(num_workers=N)
+
+For an example, see :doc:`dask_bags_and_greed`.
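+
+As a rough sketch of how the worker count might be tied to the job's actual
+allocation (this is only an illustration: it assumes a Linux system, and
+whether the affinity mask seen by the process reflects the Slurm allocation
+depends on how your cluster is configured)::
+
+    import os
+
+    import dask
+
+    # CPUs actually usable by this process; on Linux this respects any
+    # affinity mask applied by the batch scheduler.
+    available_cpus = len(os.sched_getaffinity(0))
+
+    # Prefer an explicit Slurm task count when one is set.
+    num_workers = int(os.environ.get("SLURM_NTASKS", available_cpus))
+
+    dask.config.set(num_workers=num_workers)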
+ +Alternatively, when there is only one CPU allocated, it may actually be more +efficient to use a "synchronous" scheduler instead, with:: + + dask.config.set(scheduler='synchronous') + +See the Dask documentation on `Single thread synchronous scheduler +`_. + + +.. _multi-pro_numpy: + +NumPy Threading +--------------- + +NumPy also interrogates the visible number of CPUs to multi-thread its operations. +The large number of CPUs available in a computing cluster will thus cause confusion if NumPy +attempts its own parallelisation, so this must be prevented. Refer back to +:ref:`numpy_threads` for more detail. + + +Distributed +----------- + +Even though allocations on a computing cluster are generally restricted to a single node, there +are still good reasons for using 'dask.distributed' in many cases. See `Single Machine: dask.distributed +`_ in the Dask documentation. + + +Chunking +======== + +Dask breaks down large data arrays into chunks, allowing efficient +parallelisation by processing several smaller chunks simultaneously. For more +information, see the documentation on +`Dask Array `_. + +Iris provides a basic chunking shape to Dask, attempting to set the shape for +best performance. The chunking that is used can depend on the file format that +is being loaded. See below for how chunking is performed for: + + * :ref:`chunking_netcdf` + * :ref:`chunking_pp_ff` + +It can in some cases be beneficial to re-chunk the arrays in Iris cubes. +For information on how to do this, see :ref:`dask_rechunking`. + + +.. _chunking_netcdf: + +NetCDF Files +------------ + +NetCDF files can include their own chunking specification. This is either +specified when creating the file, or is automatically assigned if one or +more of the dimensions is `unlimited `_. +Importantly, netCDF chunk shapes are **not optimised for Dask +performance**. + +Chunking can be set independently for any variable in a netCDF file. +When a netCDF variable uses an unlimited dimension, it is automatically +chunked: the chunking is the shape of the whole variable, but with '1' instead +of the length in any unlimited dimensions. + +When chunking is specified for netCDF data, Iris will set the dask chunking +to an integer multiple or fraction of that shape, such that the data size is +near to but not exceeding the dask array chunk size. + + +.. _chunking_pp_ff: + +PP and Fieldsfiles +------------------ + +PP and Fieldsfiles contain multiple 2D fields of data. When loading PP or +Fieldsfiles into Iris cubes, the chunking will automatically be set to a chunk +per field. + +For example, if a PP file contains 2D lat-lon fields for each of the +85 model level numbers, it will load in a cube that looks as follows:: + + (model_level_number: 85; latitude: 144; longitude: 192) + +The data in this cube will be partitioned with chunks of shape +:code:`(1, 144, 192)`. + +If the file(s) being loaded contain multiple fields, this can lead to an +excessive amount of chunks which will result in poor performance. + +When the default chunking is not appropriate, it is possible to rechunk. +:doc:`dask_pp_to_netcdf` provides a detailed demonstration of how Dask can optimise +that process. + + +Examples +======== + +We have written some examples of use cases for using Dask, that come with advice and +explanations for why and how the tasks are performed the way they are. + +If you feel you have an example of a Dask best practice that you think may be helpful to others, +please share them with us by raising a new `discussion on the Iris repository `_. 
+ + * :doc:`dask_pp_to_netcdf` + * :doc:`dask_parallel_loop` + * :doc:`dask_bags_and_greed` + +.. toctree:: + :hidden: + :maxdepth: 1 + + dask_pp_to_netcdf + dask_parallel_loop + dask_bags_and_greed diff --git a/docs/src/userguide/index.rst b/docs/src/userguide/index.rst index fdd0c4d03e..771aa450a3 100644 --- a/docs/src/userguide/index.rst +++ b/docs/src/userguide/index.rst @@ -45,4 +45,5 @@ they may serve as a useful reference for future exploration. ../further_topics/metadata ../further_topics/lenient_metadata ../further_topics/lenient_maths + ../further_topics/dask_best_practices/index ../further_topics/ugrid/index diff --git a/docs/src/whatsnew/latest.rst b/docs/src/whatsnew/latest.rst index 0e2896b7a1..9b62715be6 100644 --- a/docs/src/whatsnew/latest.rst +++ b/docs/src/whatsnew/latest.rst @@ -71,6 +71,10 @@ This document explains the changes made to Iris for this release to use it. By default the theme will be based on the users system settings, defaulting to ``light`` if no system setting is found. (:pull:`5299`) +#. `@HGWright`_ added a :doc:`/further_topics/dask_best_practices/index` + section into the user guide, containing advice and use cases to help users + get the best out of Dask with Iris. + 💼 Internal ===========