
Conversation

@weiji14 weiji14 commented Aug 29, 2019

Split off the slow REMA gapfilling task from the bulky data_prep.selective_tile script, adhering to DRY principles (no more nesting a with inside a with statement when reading from a raster...). The lighter function will give us more room to play with parallel code, potentially making our tiling code perform faster and squeezing out more precision without worrying about speed!

TODO:

  • Deprecate selective_tile_old and ability to gapfill using another raster (690c365)
  • Refactor selective_tiling to be faster and even more precise
  • Write the one off script to gapfill 100m REMA with 200m filler version
  • Create new training tiles once more, consider resampling MEaSUREs Ice Velocity back to 500m

Setting fire to all that code to gapfill a raster with another raster as it's very messy, and we only need it for REMA now since we are no longer gapfilling MEaSUREs with #165 merged in. Still keeping the option to gapfill with a single floating point number, but we're removing the selective_tile_old function that has sat alongside selective_tile since aac21fb in #156 of v0.9.2. Temporarily using a bilinear resampled 200m REMA in deepbedmap.ipynb. Will follow up with code to produce a gapfilled 100m resolution REMA geotiff!
@weiji14 weiji14 added bug 🪲 Something isn't working enhancement ✨ New feature or request data 🗃️ Pull requests that update input datasets labels Aug 29, 2019
@weiji14 weiji14 self-assigned this Aug 29, 2019
@review-notebook-app

Check out this pull request on ReviewNB: https://app.reviewnb.com/weiji14/deepbedmap/pull/167

You'll be able to see notebook diffs and discuss changes. Powered by ReviewNB.

With great precision comes great need for optimization. We now force our data_prep.selective_tile function to bilinearly interpolate the Z value at every exact XY point, instead of 'slicing', which could still leave sub-pixel discrepancies. This extends the work done in 7fd3345. The out_shape option is replaced with a 'resolution' setting, and the geotiff/netcdf files are now loaded in as a dask array by default.
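To illustrate the difference (a minimal standalone sketch, not the actual selective_tile code): plain slicing snaps to the nearest whole pixel, while bilinear interpolation blends the four surrounding cells so the value lands exactly on the requested XY point.

```python
def bilinear(grid, x, y):
    """Bilinear interpolation on a 2D grid with unit spacing.

    grid[row][col] holds Z values; (x, y) are fractional column/row
    coordinates. A plain slice like grid[int(y)][int(x)] would snap
    to the nearest whole pixel, carrying a sub-pixel discrepancy.
    """
    x0, y0 = int(x), int(y)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * grid[y0][x0] + fx * grid[y0][x0 + 1]
    bottom = (1 - fx) * grid[y0 + 1][x0] + fx * grid[y0 + 1][x0 + 1]
    return (1 - fy) * top + fy * bottom

# Halfway between four cells: the blend of all of them.
z = bilinear([[0.0, 1.0], [2.0, 3.0]], 0.5, 0.5)  # 1.5
```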

Yes, this slows things down by a fair bit, but we've wrapped the computationally heavy interpolation and masking methods using dask.delayed, so that tasks can be gathered up and processed in one go (if I can get the right scheduler settings...). Technically the code should be able to run nicely in parallel now, but GeoTIFFs aren't exactly built for parallel reads and big ol' REMA is a RAM-hungry beast so we're stopping short here of using dask.distributed.
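The dask.delayed pattern described above, in a toy form (cut_tile here is a hypothetical stand-in for the heavy interpolation/masking work, operating on nested lists instead of a real raster): tasks are built lazily, then gathered and computed in one go so the scheduler can parallelise them.

```python
import dask


@dask.delayed
def cut_tile(array, row, col, size):
    # Placeholder for the computationally heavy interpolation/masking;
    # here we just slice a size x size window out of a nested-list "raster".
    return [r[col:col + size] for r in array[row:row + size]]


raster = [[c + 10 * r for c in range(6)] for r in range(6)]

# Build the task graph lazily, then process every tile in one go,
# letting dask's scheduler decide how to run the tasks in parallel.
tasks = [cut_tile(raster, r, c, 2) for r, c in [(0, 0), (2, 2), (4, 4)]]
tiles = dask.compute(*tasks)
```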
Write script for gapfilling the 100m REMA Ice Surface Elevation DEM with the seamless 200m version (bilinear interpolated), i.e. a more proper version of cd72a30 of #64. Based on the old selective_tiling code's gapfill_raster section that we deprecated in 690c365. I've experimented with alternative methods such as making a virtual (.vrt) GeoTIFF, mosaicking using pure GDAL and rasterio's merge tool, even considered GMT's grdblend, but nothing really merges the two together (with REMA_100 as highest priority, then REMA_200) properly the way I want it, in a reasonable-ish amount of time. Might be good to actually output this homemade 100m gapfilled REMA to a Cloud-Optimized GeoTIFF, NetCDF or Zarr, but we'll stick to good ol' GeoTIFF for now, even if it is 9.9GB.
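The merge priority described above (REMA_100 first, then the bilinear-resampled REMA_200 as filler) boils down to a masked overwrite. A hypothetical numpy sketch, assuming both rasters have already been resampled onto the same 100m grid:

```python
import numpy as np


def gapfill(primary, filler, nodata=np.nan):
    """Fill holes in `primary` with values from `filler` (same shape/grid).

    Hypothetical stand-in for the one-off REMA gapfilling script:
    primary = 100m REMA with voids, filler = 200m REMA bilinearly
    resampled up to the 100m grid.
    """
    mask = np.isnan(primary) if np.isnan(nodata) else primary == nodata
    return np.where(mask, filler, primary)


rema100 = np.array([[1.0, np.nan], [np.nan, 4.0]])
rema200_up = np.array([[9.0, 2.0], [3.0, 9.0]])
filled = gapfill(rema100, rema200_up)  # [[1., 2.], [3., 4.]]
```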
@ghost

ghost commented Aug 30, 2019

DeepCode Report (#1e7f69)

DeepCode analyzed this pull request.
There is 1 new info report.

Keeping things DRY again, moving the save_array_to_grid function from deepbedmap.ipynb to data_prep.ipynb since we're using it to write the REMA_100_dem_filled GeoTIFF. Made it into a more reusable function, though there's a lot of nasty hardcoded default settings specific to DeepBedMap (who uses a -2000m value for NaN?!) and far too many options that probably should be turned into kwargs. It now takes in an ndim=3 array in CHW format instead of NCHW as before, has optional saving to NetCDF, and defaults to creating a BigTIFF!
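To illustrate the interface change (array contents hypothetical): save_array_to_grid now expects a single CHW image rather than a batched NCHW stack, so callers drop the batch axis first, and the hardcoded -2000m nodata sentinel needs masking before any analysis.

```python
import numpy as np

# A batch of one single-band 4x4 image, as the old NCHW interface took it.
batch_nchw = np.arange(16, dtype=np.float32).reshape(1, 1, 4, 4)

# New interface: pass one image in CHW (channels, height, width) format.
image_chw = batch_nchw[0]  # shape (1, 4, 4)

# DeepBedMap's hardcoded nodata sentinel; mask it out as NaN.
NODATA = -2000.0
image_chw[0, 0, 0] = NODATA
masked = np.where(image_chw == NODATA, np.nan, image_chw)
```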

It was really just the GeoTIFF compression I wanted, because Quilt wasn't accepting the >9GB REMA_100m_dem_filled.tif file. Using LZW compression, the filesize is now down to about 4.7GB, and we are trading off size for speed here, i.e. reading from this compressed REMA GeoTIFF can be significantly slower than the original uncompressed version. Would preferably have used ZSTD compression (see https://gis.stackexchange.com/questions/1104/should-gdal-be-set-to-produce-geotiff-files-with-compression-which-algorithm-sh/333578#333578), if only the rasterio wheels actually had it (see rasterio/rasterio-wheels#23). I was also doing some detective work on why rasterio has GDAL 2.4.1 whereas our conda version is 2.4.2, might need to set a GDAL_DRIVER_PATH?
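For reference, the kind of creation profile involved (a sketch; the exact keys used in the notebook may differ) — with rasterio these dict entries map onto GDAL's GeoTIFF creation options:

```python
# GeoTIFF creation options, as would be passed via rasterio's profile.
# COMPRESS=LZW roughly halves REMA_100m_dem_filled.tif (~9.9GB -> ~4.7GB)
# at the cost of slower reads; ZSTD would be preferable if the rasterio
# wheels shipped with it. BIGTIFF=YES is needed for files over 4GB.
profile = {
    "driver": "GTiff",
    "compress": "lzw",  # would be "zstd" if the wheels supported it
    "bigtiff": "yes",
    "tiled": True,
}
```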
Bringing back the 1km padding to resolve lousy predictions at the border areas, and resample the MEaSUREs Ice Velocity dataset from 450m to 500m resolution again, sigh. Basically reverting 7cd28af, but with a bit of a twist. Also using the new REMA_100m_dem_filled.tif we've recently created. Quilt packages reuploaded, now with hash e9494ecd4b4fe1bc1f8b28d4d73a287d093c83dc76eb173822b7e90753d92f27.
Sorry, didn't upload the correct BEDMAP2 (X_data) tiles in my haste yesterday. Properly reuploading those padded tiles with shape (3379,1,11,11) instead of (3379,1,9,9), patching 50ddb5e. Correct quilt data hash to use now is d092c5c00e9b6ceaac3a3bf431dd070f0a2809ef4f552e55faa9d01c1d5dd270.
Towards even better pixel registration at the expense of more computing cost! Patches #150 properly. The xyz_to_grid function has been giving us gridline node registered grids since forever, because that's `gmt surface`'s default setting. Now converting it to pixel node registration to align better with rasterio and other geospatial packages. Also using simple slicing for our high resolution tiles to avoid interpolating to NaNs at the edges! Total tile counts have increased from 3379 to 4028, O.O, mostly from the Basler grid around the Siple Coast region. New training tiles updated on Quilt with hash af86cf135ffe5ed9f78fc65231e3aa0bfc90f45e33b6bccda9f08f392c090113.

Though `surface` does have an `-r` setting to set pixel node registration directly, I looked at it and it gave strange diagonal strips for 2007tx.nc, so nope, we'll use `grdsample` instead, the only fault being we lose data lineage/provenance when checking the grid using `gmt grdinfo`. Hopefully the half pixel ghost doesn't haunt us ever again. Also note to self, to write up wrappers for `gmt blockmedian` and `gmt grdsample` to reduce boilerplate code in xyz_to_grid function.

Reason for doing this was that the more precise selective_tile script (since e7936db) was so exact that it realized our bounding boxes were outside of the (gridline registered) groundtruth grids and gave us NULL values! This problem didn't surface until I was trying to train the neural network and was strangely getting NaN values in the discriminator loss after just one epoch, no matter what hyperparameters I used.
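The half-pixel offset at the heart of this bug can be shown with plain coordinates: for the same origin, spacing, and node count, gridline-registered nodes sit on cell corners while pixel-registered nodes sit on cell centres, half a pixel apart (a toy sketch, numbers hypothetical):

```python
def node_coords(x0, dx, n, pixel_registered):
    """X coordinates of n grid nodes from origin x0 with spacing dx."""
    shift = 0.5 * dx if pixel_registered else 0.0
    return [x0 + shift + i * dx for i in range(n)]


gridline = node_coords(0.0, 250.0, 4, pixel_registered=False)  # [0, 250, 500, 750]
pixel = node_coords(0.0, 250.0, 4, pixel_registered=True)      # [125, 375, 625, 875]
# A bounding box snapped to pixel centres sits half a pixel outside a
# gridline-registered grid's nodes, which is how the NULLs crept in.
```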
@weiji14 weiji14 changed the title Make tiling more efficient using parallelization Slicing and interpolating tiles as accurately as possible Sep 2, 2019
In order to get those border predictions better, we are very carefully reverting some aspects of 75b7493. Specifically, we're going for valid padding in our ESRGAN model's input block convolutions and directly using the MEaSUREs Ice Velocity we've resampled back to 500m in 50ddb5e instead of resampling on the fly. Keeping the 4km kernel/filters though! The amazing part is that the parameter counts all stay the same, +1 for efficiency of convolutions! Also made small adjustments to the hardcoded Topographic Loss Mean Absolute Error function in d599ee8. Test dataset reuploaded, so again we have a new quilt hash to use - e11988479975a091dd52e44b142370c37a03409f41cb6fec54fd7382ee1f99bc.
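Why the parameter counts all stay the same despite switching to valid padding (a back-of-the-envelope check, layer sizes hypothetical): a convolution's weights depend only on kernel size and channel counts, never on padding, which only changes the output's spatial size.

```python
def conv2d_params(in_channels, out_channels, kernel_size):
    """Weights + biases of a 2D conv layer; padding does not appear."""
    return out_channels * (in_channels * kernel_size * kernel_size + 1)


def conv2d_out_size(in_size, kernel_size, padding):
    """Output spatial size of a stride-1 2D convolution."""
    return in_size + 2 * padding - kernel_size + 1


# A 3x3 conv from 1 to 64 channels has the same parameter count whether
# padding is 'same' or 'valid'; only the output size differs, which is
# why the input tiles were re-padded to shape (3379, 1, 11, 11).
params = conv2d_params(1, 64, 3)        # 640
valid_out = conv2d_out_size(11, 3, 0)   # 9
same_out = conv2d_out_size(11, 3, 1)    # 11
```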
@weiji14 weiji14 force-pushed the refactor/selective_tile branch from 59ea059 to 1e7f699 on September 3, 2019 02:29
@weiji14 weiji14 marked this pull request as ready for review September 3, 2019 02:39
weiji14 added a commit that referenced this pull request Sep 3, 2019
Closes #167 Slicing and interpolating tiles as accurately as possible.
@weiji14 weiji14 merged commit 1e7f699 into enh/revise_deepbedmap Sep 3, 2019
@weiji14 weiji14 deleted the refactor/selective_tile branch September 3, 2019 02:47
@weiji14 weiji14 mentioned this pull request Sep 14, 2019