
Conversation


@weiji14 weiji14 commented Mar 19, 2019

Retuning hyperparameters as there have been some significant changes made since #92. The main push was because I noticed the discriminator was implemented somewhat differently compared with ESRGAN's. But also, a new dataset was added at #112 and there's a new Tesla V100 GPU (0a1f719) to train on, so might as well!

TODO:

Noticed that the discriminator network doesn't quite follow ESRGAN's (it is more like SRGAN's). Patching #78 by increasing the depth of the discriminator from 9 to 10 blocks, using a kernel size of 4 in some Conv2D layers, and setting the penultimate fully connected layer to 100 instead of 1024 neurons. See https://github.com/xinntao/BasicSR/blame/902b4ae1f4beec7359de6e62ed0aebfc335d8dfd/codes/models/modules/architecture.py#L86-L129 for the original Pytorch implementation details.

The discriminator has become stronger, and it actually took a few experiments to get a good RMSE test result. That means there will be a need to retune our hyperparameters. This commit references the experiment at https://www.comet.ml/weiji14/deepbedmap/80c51658b2074743ba5151cde7d24560 with a test RMSE of 46.46.
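For reference, the spatial arithmetic behind that smaller penultimate layer can be sketched in a few lines. This is an illustrative sketch (assuming a 128x128 input patch and stride-2 downsampling convolutions with kernel size 4 and padding 1, as in BasicSR's Discriminator_VGG_128; the function name is made up here):

```python
def conv_out(size, ksize, stride, pad):
    """Standard convolution output-size formula."""
    return (size + 2 * pad - ksize) // stride + 1


size = 128
for block in range(5):  # five stride-2 stages each halve the spatial dims
    size = conv_out(size, ksize=4, stride=2, pad=1)
print(size)  # 4
```

With the feature map shrunk to 4x4, the flattened features can feed a much narrower fully connected layer than SRGAN's 1024-neuron one.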
@weiji14 weiji14 added the model 🏗️ Pull requests that update neural network model label Mar 19, 2019
@weiji14 weiji14 added this to the v0.7.0 milestone Mar 19, 2019
@weiji14 weiji14 self-assigned this Mar 19, 2019
@review-notebook-app

Check out this pull request on ReviewNB: https://app.reviewnb.com/weiji14/deepbedmap/pull/129


weiji14 added 3 commits March 22, 2019 10:28
All this time we had a higher adversarial weighting than the content loss?!! Critical patch for a2d9749 in #78. Unit tests updated, and rightly so: the loss is higher, meaning we have some optimization work to do!
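As a minimal sketch of what the fix amounts to, the generator's objective is a weighted sum of loss terms, and the content term should carry the larger weight (the weights and names below are purely illustrative, not the repository's actual values):

```python
def generator_loss(content_loss, adversarial_loss,
                   content_weight=1.0, adversarial_weight=0.001):
    """Weighted sum of generator loss terms. The patch restores the
    intended ordering: content_weight > adversarial_weight."""
    return content_weight * content_loss + adversarial_weight * adversarial_loss


print(generator_loss(content_loss=1.0, adversarial_loss=1.0))  # 1.001
```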
Fine-tuning hyperparameters in light of all the recent fixes earlier in #129. The reason we widened the hyperparameter search space was that an earlier round of tuning (prior to d3a5b74) kind of suggested it. The jupyter notebook here reports only the best of the last 200 experiments run (a small syntax error bug meant the n_trials=50 setting was not respected)...

Using a fresh new sqlite db, we tune for 200 rounds over these hyperparameter ranges: num_residual_blocks 10 to 14; residual_scaling 0.1 to 0.5; learning rate 5e-4 to 1e-3; num_epochs 80 to 120; batch size exponent 7 only, i.e. a batch size of 128. Will decide on new default settings in a following commit.
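The search space above can be expressed as a plain random-search sketch (a pure-Python stand-in for the actual tuning library; the function and key names are illustrative):

```python
import random


def sample_trial(rng):
    """Draw one hyperparameter set from the search space quoted above."""
    return {
        "num_residual_blocks": rng.randint(10, 14),
        "residual_scaling": rng.uniform(0.1, 0.5),
        "learning_rate": rng.uniform(5e-4, 1e-3),
        "num_epochs": rng.randint(80, 120),
        "batch_size": 2 ** 7,  # batch size exponent fixed at 7
    }


rng = random.Random(42)  # seeded for reproducibility
trials = [sample_trial(rng) for _ in range(200)]  # 200 tuning rounds
```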
Setting new default hyperparameters for our Super-Resolution Generative Adversarial Network. Going deeper from 10 to 12 Residual-in-Residual Dense Blocks (RRDB), raising the learning rate from 5e-4 to 6e-4, and num_epochs from 100 to 110.

Looking at the results, I quite like the sharper peak from the deepbedmap model's prediction (i.e. smaller standard deviation) compared with the flat ones before. Visual inspection of the 3D topography is also not looking too shabby.
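Collected in one place, the new defaults read roughly as follows (the dict and key names are illustrative, not the notebook's actual variable names):

```python
# New default hyperparameters as stated in the commit above;
# previous values noted in comments.
DEFAULT_HYPERPARAMETERS = {
    "num_residual_blocks": 12,  # was 10
    "learning_rate": 6e-4,      # was 5e-4
    "num_epochs": 110,          # was 100
}
```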
@weiji14 weiji14 marked this pull request as ready for review March 27, 2019 14:00

weiji14 commented Mar 27, 2019

Weird problem with rasterio breaking the build, even though we have not changed any software versions?!! Error message as follows:

---------------------------------------------------------------------------
CPLE_OpenFailedError                      Traceback (most recent call last)
rasterio/_base.pyx in rasterio._base.DatasetBase.__init__()

rasterio/_shim.pyx in rasterio._shim.open_dataset()

rasterio/_err.pyx in rasterio._err.exc_wrap_pointer()

CPLE_OpenFailedError: 'highres/2010tr.nc' not recognized as a supported file format.

During handling of the above exception, another exception occurred:

RasterioIOError                           Traceback (most recent call last)
<ipython-input-13-34f5dd97040f> in <module>
      1 hireses = [
      2     selective_tile(filepath=f"highres/{f}", window_bounds=w)
----> 3     for f, w in zip(filepaths, window_bounds)
      4 ]
      5 hires = np.concatenate(hireses)

<ipython-input-13-34f5dd97040f> in <listcomp>(.0)
      1 hireses = [
      2     selective_tile(filepath=f"highres/{f}", window_bounds=w)
----> 3     for f, w in zip(filepaths, window_bounds)
      4 ]
      5 hires = np.concatenate(hireses)

<ipython-input-3-07c3751fe850> in selective_tile(filepath, window_bounds, padding, out_shape, gapfill_raster_filepath)
     32     array_list = []
     33 
---> 34     with rasterio.open(filepath) as dataset:
     35         print(f"Tiling: {filepath}")
     36         for window_bound in window_bounds:

~/.local/share/virtualenvs/deepbedmap-qAKEkYde/lib/python3.6/site-packages/rasterio/env.py in wrapper(*args, **kwds)
    419 
    420         with env_ctor(session=session):
--> 421             return f(*args, **kwds)
    422 
    423     return wrapper

~/.local/share/virtualenvs/deepbedmap-qAKEkYde/lib/python3.6/site-packages/rasterio/__init__.py in open(fp, mode, driver, width, height, count, crs, transform, dtype, nodata, sharing, **kwargs)
    214         # None.
    215         if mode == 'r':
--> 216             s = DatasetReader(path, driver=driver, **kwargs)
    217         elif mode == 'r+':
    218             s = get_writer_for_path(path)(path, mode, driver=driver, **kwargs)

rasterio/_base.pyx in rasterio._base.DatasetBase.__init__()

RasterioIOError: 'highres/2010tr.nc' not recognized as a supported file format.
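GDAL's "not recognized as a supported file format" usually means the bytes on disk aren't what the driver expects (e.g. a truncated or placeholder download). One quick way to check, without involving rasterio at all, is to look at the file's magic bytes: classic netCDF files begin with b"CDF", and netCDF-4 files carry the HDF5 signature. This helper is a diagnostic sketch, not part of the repository:

```python
import tempfile


def looks_like_netcdf(path):
    """Crude format check: classic netCDF starts with b'CDF';
    netCDF-4 files are HDF5 containers with the HDF5 signature."""
    with open(path, "rb") as f:
        magic = f.read(8)
    return magic.startswith(b"CDF") or magic.startswith(b"\x89HDF\r\n\x1a\n")


# Demo on a fake classic-netCDF header (illustrative only).
with tempfile.NamedTemporaryFile(suffix=".nc", delete=False) as tmp:
    tmp.write(b"CDF\x01" + b"\x00" * 28)
print(looks_like_netcdf(tmp.name))  # True
```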

@weiji14 weiji14 merged commit 465e95d into master Mar 27, 2019
weiji14 added a commit that referenced this pull request Mar 27, 2019
Closes #129 Retune ESRGAN hyperparameters on stronger discriminator.
@weiji14 weiji14 deleted the retune_on_new_discriminator branch March 27, 2019 15:02
weiji14 added a commit that referenced this pull request Mar 28, 2019
Yet another critical patch for #78 and #129, can't believe it... Change the discriminator to use HeNormal initialization instead of GlorotUniform, a hangover from using Keras (see #81). Refer to the relevant code in ESRGAN's original Pytorch implementation at https://github.com/xinntao/BasicSR/blob/477e14e97eca4cb776d3b37667d42f8484b8b68b/codes/models/networks.py (where it's called kaiming initialization). This initializer change was recorded in one successful training round, reviewable at https://www.comet.ml/weiji14/deepbedmap/17cfbfd5a54043c3a39b5ba183b1cc68.
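The practical difference between the two initializers is just the variance of the distribution weights are drawn from. A sketch of the standard formulas (HeNormal/kaiming is derived for ReLU-family activations such as the discriminator's LeakyReLU, while GlorotUniform assumes roughly linear activations):

```python
import math


def he_normal_std(fan_in):
    """Standard deviation of He/kaiming normal initialization."""
    return math.sqrt(2.0 / fan_in)


def glorot_uniform_limit(fan_in, fan_out):
    """Half-width of the Glorot/Xavier uniform distribution."""
    return math.sqrt(6.0 / (fan_in + fan_out))


print(he_normal_std(512))             # larger weights to offset dying ReLUs
print(glorot_uniform_limit(512, 512))
```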

Also noticed that Chainer's BatchNormalization behaves differently when the global_config.train flag is set to True/False. My assumption that simply not passing in an optimizer during the evaluation stage would be enough was incorrect. To be precise, the setting should only affect the discriminator neural network since that's where we have BatchNormalization layers, but we've added the config flag to both the train_eval_generator and train_eval_discriminator functions to be extra sure. With that, the final recorded Comet.ML experiment this commit references is at https://www.comet.ml/weiji14/deepbedmap/44acdbc1127f4440891ed905846401cf.
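A minimal numpy sketch of why that flag matters: in training mode batch normalization normalizes with the current batch's statistics, while in evaluation mode it should use the stored running statistics (illustrative code, not Chainer's actual implementation):

```python
import numpy as np


def batch_norm(x, running_mean, running_var, train, eps=1e-5):
    """Minimal batch-norm sketch: batch statistics in training mode,
    stored running statistics in evaluation mode."""
    if train:
        mean, var = x.mean(axis=0), x.var(axis=0)
    else:
        mean, var = running_mean, running_var
    return (x - mean) / np.sqrt(var + eps)


x = np.array([[0.0], [2.0]])
out_train = batch_norm(x, np.zeros(1), np.ones(1), train=True)
out_eval = batch_norm(x, np.zeros(1), np.ones(1), train=False)
```

Forgetting to flip the flag at evaluation time means the network keeps normalizing with per-batch statistics, silently changing its outputs.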

Note that we have not retuned any hyperparameters, though that would be a smart thing to do. If you review the Comet.ML experiments, you'll notice that there were two cases of exploding gradients when these hotfixes were implemented. The discriminator's loss and accuracy charts look very different now, and in particular, there is a significant gap between the training and validation metrics.