Slicing and interpolating tiles as accurately as possible #167
Merged
Setting fire to all that code for gapfilling a raster with another raster, as it's very messy, and we only need it for REMA now since we are no longer gapfilling MEaSUREs with #165 merged in. Still keeping the option to gapfill with a single floating point number (see the sketch below), but we're removing the selective_tile_old function that has sat alongside selective_tile since aac21fb in #156 of v0.9.2. Temporarily using a bilinearly resampled 200m REMA in deepbedmap.ipynb. Will follow up with code to produce a gapfilled 100m resolution REMA geotiff!
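For reference, the retained float-gapfill option boils down to something like this minimal sketch (the function name and default value here are hypothetical, not the actual selective_tile keyword):

```python
import numpy as np

# Minimal sketch of gapfilling with a single floating point number:
# every NaN pixel in the tile gets replaced by one constant value.
def gapfill(tile: np.ndarray, fill_value: float = -2000.0) -> np.ndarray:
    return np.where(np.isnan(tile), fill_value, tile)
```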
With great precision comes great need for optimization. Forcing our data_prep.selective_tile function to precisely bilinearly interpolate the Z value at every XY point, instead of 'slicing', which might still have sub-pixel discrepancies. Extends the work done in 7fd3345. The out_shape option is replaced with a 'resolution' setting, and the GeoTIFF/NetCDF files are now loaded in as a dask array by default. Yes, this slows things down by a fair bit, but we've wrapped the computationally heavy interpolation and masking methods using dask.delayed, so that tasks can be gathered up and processed in one go (if I can get the right scheduler settings...). Technically the code should be able to run nicely in parallel now, but GeoTIFFs aren't exactly built for parallel reads, and big ol' REMA is a RAM-hungry beast, so we're stopping short of using dask.distributed here.
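The dask.delayed pattern described above looks roughly like this sketch (the function name, coordinate conventions and 250m resolution are illustrative, not the actual selective_tile internals):

```python
import dask
import numpy as np
import xarray as xr

@dask.delayed
def interp_tile(grid: xr.DataArray, xmin, ymin, xmax, ymax, resolution):
    """Bilinearly interpolate a grid onto exact pixel-centre XY coordinates."""
    halfpix = resolution / 2
    x = np.arange(xmin + halfpix, xmax, resolution)
    y = np.arange(ymax - halfpix, ymin, -resolution)
    return grid.interp(x=x, y=y, method="linear").data

# Each bounding box becomes one lazy task; dask gathers them all up and
# processes them in one go under whichever scheduler is configured, e.g.:
# tasks = [interp_tile(grid, *bbox, resolution=250) for bbox in window_bounds]
# tiles = dask.compute(*tasks)
```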
Write script for gapfilling the 100m REMA Ice Surface Elevation DEM with the seamless 200m version (bilinearly interpolated), i.e. a more proper version of cd72a30 of #64. Based on the old selective_tile code's gapfill_raster section that we deprecated in 690c365. I've experimented with alternative methods: making a virtual (.vrt) GeoTIFF, mosaicking using pure GDAL and rasterio's merge tool, and I even considered GMT's grdblend, but nothing really merges the two together (with REMA_100 as highest priority, then REMA_200) properly the way I want it, in a reasonable-ish amount of time. Might be good to actually output this homemade 100m gapfilled REMA to a Cloud-Optimized GeoTIFF, NetCDF or Zarr, but we'll stick to good ol' GeoTIFF for now, even if it is 9.9GB.
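The priority-order merge amounts to something like this sketch (filenames are hypothetical, and it assumes both rasters share the same extent and use NaN for nodata):

```python
import numpy as np
import rasterio
from rasterio.enums import Resampling

# Read the high-priority 100m DEM and remember its grid shape/profile.
with rasterio.open("REMA_100m_dem.tif") as src100:
    dem100 = src100.read(1)
    profile = src100.profile
    shape = (src100.height, src100.width)

# Bilinearly resample the seamless 200m DEM onto the 100m grid
# (assumes the two rasters cover the same bounding region).
with rasterio.open("REMA_200m_dem.tif") as src200:
    dem200 = src200.read(1, out_shape=shape, resampling=Resampling.bilinear)

# REMA_100 takes highest priority; REMA_200 only fills in the NaN gaps.
filled = np.where(np.isnan(dem100), dem200, dem100)

with rasterio.open("REMA_100m_dem_filled.tif", "w", **profile) as dst:
    dst.write(filled, 1)
```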
Keeping things DRY again, moving the save_array_to_grid function from deepbedmap.ipynb to data_prep.ipynb since we're using it to write the REMA_100_dem_filled GeoTIFF. Made it into a more reusable function, though there's a lot of nasty hardcoded default settings specific to DeepBedMap (who uses a -2000m value for NaN?!) and far too many options that probably should be turned into kwargs. It now takes in an ndim=3 array in CHW format instead of NCHW as before, has optional saving to NetCDF, and defaults to creating a BigTIFF! It was really just the GeoTIFF compression I wanted, because Quilt wasn't accepting the >9GB REMA_100m_dem_filled.tif file. Using LZW compression, the filesize is now down to about 4.7GB, and we are trading speed for size here, i.e. reading from this compressed REMA GeoTIFF can be significantly slower than the original uncompressed version. Preferably would have used ZSTD compression (see https://gis.stackexchange.com/questions/1104/should-gdal-be-set-to-produce-geotiff-files-with-compression-which-algorithm-sh/333578#333578), if only the rasterio wheels actually had it (see rasterio/rasterio-wheels#23). Was also doing some detective work on why rasterio has GDAL 2.4.1 whereas our conda version is 2.4.2; might need to set a GDAL_DRIVER_PATH?
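The compressed BigTIFF writing boils down to something like this sketch (a minimal stand-in for save_array_to_grid, with a dummy array and the hardcoded values copied from the description above):

```python
import numpy as np
import rasterio
from rasterio.transform import from_origin

# Dummy single-band CHW (channels, height, width) array and affine transform,
# for illustration only; the real inputs come from the gapfilled REMA grid.
array = np.zeros(shape=(1, 100, 100), dtype=np.float32)
transform = from_origin(west=-2700000, north=2300000, xsize=100, ysize=100)

with rasterio.open(
    "REMA_100m_dem_filled.tif",
    mode="w",
    driver="GTiff",
    height=array.shape[1],
    width=array.shape[2],
    count=array.shape[0],  # CHW: one band per channel
    dtype=array.dtype,
    crs="EPSG:3031",  # assuming Antarctic Polar Stereographic
    transform=transform,
    nodata=-2000,  # DeepBedMap's hardcoded NaN stand-in value
    compress="lzw",  # ~4.7GB instead of 9.9GB, at the cost of read speed
    bigtiff="yes",  # default to BigTIFF for files over 4GB
) as dst:
    dst.write(array)
```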
Bringing back the 1km padding to resolve lousy predictions at the border areas, and resampling the MEaSUREs Ice Velocity dataset from 450m to 500m resolution again, sigh. Basically reverting 7cd28af, but with a bit of a twist. Also using the new REMA_100m_dem_filled.tif we've recently created. Quilt packages reuploaded, now with hash e9494ecd4b4fe1bc1f8b28d4d73a287d093c83dc76eb173822b7e90753d92f27.
Sorry, didn't upload the correct BEDMAP2 (X_data) tiles in my haste yesterday. Properly reuploading those padded tiles with shape (3379,1,11,11) instead of (3379,1,9,9), patching 50ddb5e. Correct quilt data hash to use now is d092c5c00e9b6ceaac3a3bf431dd070f0a2809ef4f552e55faa9d01c1d5dd270.
Towards even better pixel registration at the expense of more computing cost! Patches #150 properly. The xyz_to_grid function has been giving us gridline node registered grids since forever, because that's `gmt surface`'s default setting. Now converting it to pixel node registration to align better with rasterio and other geospatial packages. Also using simple slicing for our high resolution tiles to avoid interpolating to NaNs at the edges! Total tile counts have increased from 3379 to 4028, O.O, mostly from the Basler grid around the Siple Coast region. New training tiles updated on Quilt with hash af86cf135ffe5ed9f78fc65231e3aa0bfc90f45e33b6bccda9f08f392c090113.

Though `surface` does have an `-r` setting to set pixel node registration directly, I looked at it and it gave strange diagonal strips for 2007tx.nc, so nope, we'll use `grdsample` instead, the only fault being we lose data lineage/provenance when checking the grid using `gmt grdinfo`. Hopefully the half pixel ghost doesn't haunt us ever again. Also a note to self: write up wrappers for `gmt blockmedian` and `gmt grdsample` to reduce boilerplate code in the xyz_to_grid function.

The reason for doing this was that the more precise selective_tile script (since e7936db) was so exact, it realized our bounding boxes were outside of the (gridline registered) groundtruth grids and gave us NULL values! This problem didn't surface until I was trying to train the neural network, and I was strangely getting NaN values in the discriminator loss after just one epoch, no matter what hyperparameters I used.
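For the record, the registration conversion step is roughly this (a sketch assuming GMT's command-line grdsample, whose -T flag translates between gridline and pixel node registration; filenames are illustrative):

```python
import subprocess

# Convert a gridline node registered grid to pixel node registration.
# grdsample's -T flag toggles between the two registrations, shifting
# node locations by half a pixel without changing the data region.
subprocess.run(
    ["gmt", "grdsample", "2007tx.nc", "-Gout_pixel_registered.nc", "-T"],
    check=True,
)

# Inspect the result: `gmt grdinfo out_pixel_registered.nc` should now
# report pixel node registration (though the provenance history is lost).
```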
In order to get those border predictions right, we are very carefully reverting some aspects of 75b7493. Specifically, we're going for valid padding in our ESRGAN model's input block convolutions, and directly using the MEaSUREs Ice Velocity we've resampled back to 500m in 50ddb5e instead of resampling on the fly. Keeping the 4km kernel/filters though! The amazing part is that the parameter counts all stay the same, +1 for efficiency of convolutions! Also made small adjustments to the hardcoded Topographic Loss Mean Absolute Error function in d599ee8. Test dataset reuploaded, so again we have a new Quilt hash to use - e11988479975a091dd52e44b142370c37a03409f41cb6fec54fd7382ee1f99bc.
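Why the parameter counts stay identical: a convolution's learned weights depend only on kernel size and channel counts, not on the padding mode. A quick sketch (assuming Chainer, with illustrative channel sizes rather than the model's actual ones):

```python
import chainer.links as L

# 'same' padding (pad=1) vs 'valid' padding (pad=0) for a 3x3 convolution:
# the output spatial size differs, but the learned parameters do not.
conv_same = L.Convolution2D(in_channels=32, out_channels=64, ksize=3, pad=1)
conv_valid = L.Convolution2D(in_channels=32, out_channels=64, ksize=3, pad=0)

n_same = sum(param.size for param in conv_same.params())
n_valid = sum(param.size for param in conv_valid.params())
assert n_same == n_valid == 64 * (32 * 3 * 3 + 1)  # weights plus biases
```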
weiji14 added a commit that referenced this pull request on Sep 3, 2019:
Closes #167 Slicing and interpolating tiles as accurately as possible.
Labels: bug 🪲 · data 🗃️ · enhancement ✨
Split off the slow REMA gapfilling task from the bulky data_prep.selective_tile script, adhering to DRY principles (don't have a `with` statement within a `with` statement reading from a raster...). The lighter function will give us more room to play with parallel code, potentially making our tiling code perform faster, squeezing out more precision without worrying about speed!

TODO: