-
Notifications
You must be signed in to change notification settings - Fork 300
Stop PPDataProxy accessing the file when no data is needed. #3659
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
trexfeathers
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @pp-mo, I follow the logic of all this. There are a couple of changes to address 🙂
|
Hi @trexfeathers sorry for the delay on this -- I forgot to push it up ! Following offline discussion, I've now reduced the scope of the behaviour, in favour of simpler code that only supports the exact current need (e.g. it no longer catches "[1:1]" as an empty slice case) However, I've also rather increased the complexity of the testing. As usual, it's tough to be comprehensive because numpy has so many indexing styles.
I'm just not considering those as possibilities. |
|
Note: when we have agreed on an implementation here, we should also adopt it in iris-grib. |
docs/iris/src/whatsnew/contributions_2.4.1/bugfix_2020-Feb-14_pp_emptyslices.txt
Show resolved
Hide resolved
trexfeathers
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added some comments @pp-mo, although as we discussed you will be abstracting this code to be outside the PPDataProxy specifically, so that it can also serve for GRIB files too
|
Thanks @trexfeathers |
|
All looks great, great idea with the abstraction into util 👏 |
* Stop PPDataProxy accessing the file when no data is needed. (#3659) * Add 2.4 whatsnew into full whatsnew list. Co-authored-by: Martin Yeo <[email protected]>
* Fixed tests since Numpy 1.18 deprecation of non-int num arguments for linspace. (#3655) * remove redundant tests (#3650) * Remove obsolete test. (#3662) * Remove grib-specific test. (#3663) * Removed ununused skipIf. (#3632) * Remove test_grib_save_rules.py which has been moved to iris-grib (#3666) * 2v4 mergeback picks (#3668) * Stop PPDataProxy accessing the file when no data is needed. (#3659) * Add 2.4 whatsnew into full whatsnew list. Co-authored-by: Martin Yeo <[email protected]> * Remove uri callback test which is moved to iris-grib (#3665) * Remove test_grib2 integration tests (#3664) * Remove test_grib_save.py (#3669) * Remove cube iter (#3656) * Fixed asv project name to 'scitools-iris'. (#3660) * Removed grib-specific test to iris-grib. (#3671) * Removed iris.tests.integration.test_grib_load and related CML files. (#3670) * Remove TestGribMessage (#3672) * Switched use of datetime.weekday() to datetime.dayofwk. (#3687) * New image hashes for mpl 3x2 (#3682) * New image hash for iris.test.test_plot.TestSymbols.test_cloud_cover with matplotlib 3.2.0. * Further images changes for mpl3x2. * Yet more updated image results. * Correct and improve dev-guide section on fixing graphics-tests. (#3683) * Correct and improve dev-guide section on fixing graphics-tests. * Review changes + general rethink. * Reduce duplication between 'graphics-tests' and general 'tests' page. * Update docs/iris/src/developers_guide/graphics_tests.rst Co-Authored-By: Martin Yeo <[email protected]> * Update docs/iris/src/developers_guide/graphics_tests.rst Co-Authored-By: Martin Yeo <[email protected]> * Update docs/iris/src/developers_guide/graphics_tests.rst Co-Authored-By: Martin Yeo <[email protected]> * Update docs/iris/src/developers_guide/graphics_tests.rst Co-Authored-By: Martin Yeo <[email protected]> * Update docs/iris/src/developers_guide/graphics_tests.rst Co-Authored-By: Martin Yeo <[email protected]> Co-authored-by: Martin Yeo <[email protected]> Co-authored-by: Martin Yeo <[email protected]> Co-authored-by: Bill Little <[email protected]> Co-authored-by: lbdreyer <[email protected]> Co-authored-by: Jon Seddon <[email protected]>
* Stop PPDataProxy accessing the file when no data is needed. (SciTools#3659) * Add 2.4 whatsnew into full whatsnew list. Co-authored-by: Martin Yeo <[email protected]>
We just found that PP and Fieldsfile loading has been slow and heavy on memory, apparently since Dask 2.0.
In these cases, the PPDataProxy does not actually use the file data, so this fix allows it to return a suitable empty array object of the expected type and shape, without reading the file.
The reason appears to be that
dask.array.from_array(arraylike)now extracts an empty slice of the argument, likearraylike[0:0, 0:0...].Note, this is even worse than it sounds :
See this line in
meta_from_array, called from here inArray.__new__, as used infrom_array.Speculation : I think dask is doing this to capture the metadata of a slice from the argument, which it presumably believes can be different from that of the argument itself. I'm guessing this is needed for generalised array types e.g. dask, CUDA