🧐 Implement test_step on independent hold-out set of images#1

Merged: weiji14 merged 6 commits into main from test_step on May 24, 2022


@weiji14 weiji14 commented May 19, 2022

To allow a fair comparison of different models, compute F1 and IoU metrics on 8 independent images from a hold-out set (see 00a1c5b), handpicked to come from different geographic localities than the train/val set. The `evaluate` function now has a boolean `calc_loss` parameter, so that the expensive loss computation can be skipped during the `test_step` loop (i.e. compute metrics only).
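A minimal, framework-free sketch of the metrics-only idea, in case it helps reviewers. This is not the actual `evaluate` implementation: the function body, the flat 0/1 pixel lists, and the placeholder loss (mean absolute error) are all assumptions for illustration. Only the `calc_loss` flag and the F1/IoU metrics come from the description above.

```python
def confusion_counts(pred, target):
    """Count true positives, false positives and false negatives for binary masks."""
    tp = fp = fn = 0
    for p, t in zip(pred, target):
        if p and t:
            tp += 1
        elif p and not t:
            fp += 1
        elif t and not p:
            fn += 1
    return tp, fp, fn


def evaluate(pred, target, calc_loss=True):
    """Hypothetical stand-in for the PR's evaluate function.

    pred and target are flat sequences of 0/1 pixel labels (an assumption).
    """
    tp, fp, fn = confusion_counts(pred, target)
    f1_denom = 2 * tp + fp + fn
    iou_denom = tp + fp + fn
    metrics = {
        "f1": 2 * tp / f1_denom if f1_denom else 0.0,
        "iou": tp / iou_denom if iou_denom else 0.0,
    }
    if calc_loss:
        # Placeholder loss (mean absolute error); the real loss is not
        # described in this PR text and is likely far more expensive.
        metrics["loss"] = sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)
    return metrics
```

With `calc_loss=False` the returned dictionary simply has no `loss` entry, which is the behaviour a test loop would rely on when only metrics are wanted.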

Note on issues and workarounds:

  • Computing the test metrics means the full-size masks need to be loaded, and for some reason the geographical extent of the mask doesn't align with that of the image, so a `rio.clip_box` call is needed.
  • Predicted tensor shapes are sometimes not exactly 5x that of the input image, so they need to be interpolated to the right size (which isn't actually proper, I know).
  • Needed to run `test_step` on the CPU because the GPU did not have enough memory, and with 32-bit precision too. Edit: fixed with model sharding, see ⚡ DeepSpeed ZeRO Stage 2 model parallel training #2.
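To illustrate the resize workaround from the second bullet, here is a small sketch using nearest-neighbour resampling on nested lists. It is an assumption that nearest-neighbour is an acceptable stand-in; the PR only says the prediction is interpolated to the right size, and the real code presumably works on tensors. The function name and the 9x10 example shape are hypothetical.

```python
def resize_nearest(grid, out_h, out_w):
    """Resize a 2D nested list to (out_h, out_w) by nearest-neighbour lookup."""
    in_h, in_w = len(grid), len(grid[0])
    return [
        # Map each output pixel back to its source row/column by integer scaling.
        [grid[(r * in_h) // out_h][(c * in_w) // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


# Example: a 2x2 input should give a 10x10 prediction at 5x scale,
# but suppose the model returned 9x10 instead (hypothetical shapes).
pred = [[row * 10 + col for col in range(10)] for row in range(9)]
fixed = resize_nearest(pred, out_h=10, out_w=10)
```

The output always has exactly the requested height and width, at the cost of duplicating or dropping some rows and columns, which is the sense in which this is not quite proper.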

TODO:

@weiji14 weiji14 added the enhancement New feature or request label May 19, 2022
@weiji14 weiji14 self-assigned this May 19, 2022
weiji14 added 5 commits May 21, 2022 16:26
With the DeepSpeed ZeRO strategy, the test_step computation can now happen on the GPU instead of the CPU. However, this needs at least 2 GPUs (i.e. run only on the HPC server's A40s) to have enough GPU memory for a full-size Sentinel-2 image. Also made a one-line change to get height and width from the segmmask tensor instead of the superres tensor at L362, and forced test_s2s2net to use one device only (CPU or GPU).
Clip Sentinel-2 image to bounding box extent of the binary mask instead of the other way around! This code has actually been living in a separate file for months, but now it's all in the S2S2Dataset class! The mask itself is first clipped to its own non-NaN bbox extent, and this extent is rounded to the Sentinel-2 image's 10m resolution instead of the mask's 2m resolution. Then the Sentinel-2 image is clipped to this clipped mask's bbox extent. There are quite a few chained operations to prevent a memory leak that causes GPU out-of-memory issues.
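The bbox rounding step can be sketched in a few lines. This is only an illustration, not the code in S2S2Dataset: the function name is hypothetical, the bounds are assumed to be in metres as (minx, miny, maxx, maxy), and snapping outward (floor the minimums, ceil the maximums) is an assumption, since the commit message only says the extent is rounded to 10m.

```python
import math


def snap_bounds(bounds, resolution=10.0):
    """Snap a (minx, miny, maxx, maxy) extent outward onto a regular grid.

    Assumes projected coordinates in metres; resolution=10.0 matches the
    10m Sentinel-2 pixel size mentioned in the commit message.
    """
    minx, miny, maxx, maxy = bounds
    return (
        math.floor(minx / resolution) * resolution,  # move left edge outward
        math.floor(miny / resolution) * resolution,  # move bottom edge outward
        math.ceil(maxx / resolution) * resolution,   # move right edge outward
        math.ceil(maxy / resolution) * resolution,   # move top edge outward
    )
```

Snapping to the coarser 10m grid means the clipped image and mask extents share pixel edges, so the 2m mask can cover each Sentinel-2 pixel with a whole number of its own pixels.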
Pull in the fix that ensures checkpoint states are saved in a common filepath with DeepSpeed, see Lightning-AI/pytorch-lightning#12887.
Save the best model (based on highest validation F1) while training! Previously we just saved the neural network model at the end of the training run, but that might not be the best one as the metrics fluctuate up and down, so now we save the model with the maximum validation F1 instead. The model checkpoint when using the DeepSpeed ZeRO Stage 2 strategy is actually a folder (?), so it needs to be converted into a regular single-file model checkpoint using a PyTorch Lightning utility function. Increased the number of training epochs from 27 to 52, which takes about 1hr10min to run with DeepSpeed :D Also increased num_workers from 1 to 4 for the predict and test dataloaders.
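The selection rule (keep the epoch with the maximum validation F1 rather than the last one) boils down to the loop below. It is a plain-Python sketch with a hypothetical function name and made-up F1 values; in the actual training run this is presumably delegated to a PyTorch Lightning checkpoint callback rather than written by hand.

```python
def best_epoch(val_f1_history):
    """Return (epoch index, F1) for the highest validation F1 seen so far."""
    best_idx, best_f1 = None, float("-inf")
    for epoch, f1 in enumerate(val_f1_history):
        if f1 > best_f1:
            # A real training loop would write the checkpoint here.
            best_idx, best_f1 = epoch, f1
    return best_idx, best_f1


# Illustrative values only: the metric fluctuates, so the last epoch
# is not necessarily the best one.
history = [0.61, 0.74, 0.70, 0.72]
epoch, f1 = best_epoch(history)
```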
@weiji14 weiji14 marked this pull request as ready for review May 24, 2022 01:58
@weiji14 weiji14 merged commit 71e83a8 into main May 24, 2022
@weiji14 weiji14 deleted the test_step branch May 24, 2022 02:06