Skip to content

Conversation

@wjbenfold
Copy link
Contributor

@wjbenfold wjbenfold commented Nov 5, 2021

🚀 Pull Request

Description

Aiming to speed up the regridding of chunked data with iris.analysis.AreaWeighted regridder

Moving more weight construction and construction of indices to be applied to the regridded cube all into the prepare step of the regridder, rather than the perform step (which gets repeated on chunked data)

See #4280 for context

To do:

  • Check benchmark performance
  • Consider additional testing and benchmarking
  • Explain here which lines have just had indentation changes rather than being rewritten
  • Add to whatsnew

Consult Iris pull request check list

@wjbenfold wjbenfold marked this pull request as draft November 5, 2021 13:58
@wjbenfold
Copy link
Contributor Author

More details on the changes in lib/iris/experimental/regrid.py

_regrid_area_weighted_array

The nested for loop in _regrid_area_weighted_array is no longer present - instead the arguments of this function now include index_info which allows the src_data to be indexed to generate the src_area_datas (with an additional "masking" process to account for some values needing to be 0).

The logic around src_masked is broadly the same as before, but now much more condensed rather than spread over the for loops.

_regrid_area_weighted_rectilinear_src_and_grid__prepare

In _regrid_area_weighted_rectilinear_src_and_grid__prepare we now, in addition to forming the weights, form the indices that will be passed to _regrid_area_weighted_array. This includes a nested for loop (as I can't see a straightforward way to get rid of it) but this now only happens once and is only doing the quick building of indices rather than any indexing.

I have also removed the dynamic definition of a function to do some of the work within this function as it seemed pointless, though I can put it back to reduce the diff if that's helpful.

@wjbenfold wjbenfold marked this pull request as ready for review November 10, 2021 13:51
Copy link
Contributor

@stephenworsley stephenworsley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good so far, just a couple comments for the time being.

@wjbenfold wjbenfold force-pushed the wjbenfold-glosea-regrid-slowness branch from 71de834 to 60fbd23 Compare November 11, 2021 16:32
@jamesp
Copy link
Member

jamesp commented Nov 11, 2021

From the benchmarking CI:

All benchmarks:

       before           after         ratio
     [2dda3a93]       [cfce454f]
     <_base>                    
              ...
        10.2±0.1s          619±8ms    ~0.06  regridding.HorizontalChunkedRegridding.time_regrid_area_w

😍

@rcomer
Copy link
Member

rcomer commented Nov 11, 2021

Am I reading this right? GloSea regridding will potentially run ~20 times faster?

@wjbenfold
Copy link
Contributor Author

When they pass in lazy data, yes @rcomer (I believe they're currently doing similar regrids by first realising the data as a workaround for this having previously been so slow)

@rcomer
Copy link
Member

rcomer commented Nov 11, 2021

That's awesome, thanks @wjbenfold! 🤩

@wjbenfold wjbenfold force-pushed the wjbenfold-glosea-regrid-slowness branch from 5816898 to 31b05a5 Compare November 11, 2021 17:40
Copy link
Contributor

@stephenworsley stephenworsley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

@jamesp
Copy link
Member

jamesp commented Nov 17, 2021

Well done @wjbenfold and @stephenworsley, really great to see this massive improvement in performance 💨

@wjbenfold wjbenfold force-pushed the wjbenfold-glosea-regrid-slowness branch from a2f7b7f to b60b6ad Compare November 17, 2021 11:14
wjbenfold pushed a commit to wjbenfold/iris that referenced this pull request Nov 17, 2021
wjbenfold pushed a commit to wjbenfold/iris that referenced this pull request Nov 17, 2021
@stephenworsley stephenworsley merged commit 9ba2add into SciTools:main Nov 17, 2021
@rcomer
Copy link
Member

rcomer commented Nov 17, 2021

On behalf of Team GloSea, thanks for all your work on this @wjbenfold @stephenworsley .

@wjbenfold wjbenfold deleted the wjbenfold-glosea-regrid-slowness branch November 17, 2021 11:28
stephenworsley pushed a commit that referenced this pull request Nov 17, 2021
tkknight added a commit to tkknight/iris that referenced this pull request Dec 7, 2021
* main: (23 commits)
  Suggest type hinting (SciTools#4390)
  area weight regrid test fixes (SciTools#4432)
  Update latest.rst (SciTools#4425)
  Added @wjbenfold to the core dev list (SciTools#4423)
  Removed addition of period from wrap_lons. (SciTools#4421)
  Add release docs sections describing the role of a Release Manager (SciTools#4413)
  Subset should always return None if no value matches are found (SciTools#4417)
  What's new for SciTools#4400 (SciTools#4422)
  `iris.analysis.AreaWeighted` regrid speedup (SciTools#4400)
  [pre-commit.ci] pre-commit autoupdate (SciTools#4419)
  Remove newline to satisfy setuptools (SciTools#4418)
  Updated environment lockfiles (SciTools#4416)
  NAME loader fixes (SciTools#4411)
  Updated whatsnew for PR 4402 (SciTools#4410)
  Support test data in benchmark workflows (SciTools#4402)
  What's new for pr 4387 (SciTools#4405)
  Make concat mismatch warning for scalar coords more accurate (SciTools#4387)
  Added line to latest release notes for updates to pp_save_rules.py (SciTools#4404)
  Update pp_save_rules.py (SciTools#4391)
  [pre-commit.ci] pre-commit autoupdate (SciTools#4403)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants