
Conversation


@weiji14 weiji14 commented Mar 19, 2019

Retuning hyperparameters as there have been some significant changes made since #92. The main push was because I noticed the discriminator was implemented somewhat differently compared with ESRGAN's. But also, a new dataset was added at #112 and there's a new Tesla V100 GPU (0a1f719) to train on, so might as well!

TODO:

Noticed that the discriminator network doesn't quite follow ESRGAN's (it is more like SRGAN's). Patching #78 by increasing the depth of the discriminator from 9 to 10 blocks, using a kernel size of 4 in some Conv2D layers, and setting the penultimate fully connected layer to 100 instead of 1024 neurons. See https://github.com/xinntao/BasicSR/blame/902b4ae1f4beec7359de6e62ed0aebfc335d8dfd/codes/models/modules/architecture.py#L86-L129 for the original Pytorch implementation details.

The discriminator has become stronger, and it actually took a few experiments to get a good RMSE test result. That means there will be a need to retune our hyperparameters. This commit references the experiment at https://www.comet.ml/weiji14/deepbedmap/80c51658b2074743ba5151cde7d24560 with a test RMSE of 46.46.
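For reference, the spatial arithmetic behind that smaller penultimate layer can be sketched in a few lines. This is an illustrative sketch (assuming a 128x128 input patch and stride-2 downsampling convolutions with kernel size 4 and padding 1, as in BasicSR's Discriminator_VGG_128; the function name is made up here):

```python
def conv_out(size, ksize, stride, pad):
    """Standard convolution output-size formula."""
    return (size + 2 * pad - ksize) // stride + 1


size = 128
for block in range(5):  # five stride-2 stages each halve the spatial dims
    size = conv_out(size, ksize=4, stride=2, pad=1)
print(size)  # 4
```

With the feature map shrunk to 4x4, the flattened features can feed a much narrower fully connected layer than SRGAN's 1024-neuron one.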
@weiji14 weiji14 added the model 🏗️ Pull requests that update neural network model label Mar 19, 2019
@weiji14 weiji14 added this to the v0.7.0 milestone Mar 19, 2019
@weiji14 weiji14 self-assigned this Mar 19, 2019
@review-notebook-app

Check out this pull request on ReviewNB: https://app.reviewnb.com/weiji14/deepbedmap/pull/129


weiji14 added 3 commits March 22, 2019 10:28
All this time we had a higher adversarial weighting than the content loss?!! Critical patch for a2d9749 in #78. Unit tests updated, and rightly so: the loss is higher, meaning we have some optimization work to do!
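As a minimal sketch of what the fix amounts to, the generator's objective is a weighted sum of loss terms, and the content term should carry the larger weight (the weights and names below are purely illustrative, not the repository's actual values):

```python
def generator_loss(content_loss, adversarial_loss,
                   content_weight=1.0, adversarial_weight=0.001):
    """Weighted sum of generator loss terms. The patch restores the
    intended ordering: content_weight > adversarial_weight."""
    return content_weight * content_loss + adversarial_weight * adversarial_loss


print(generator_loss(content_loss=1.0, adversarial_loss=1.0))  # 1.001
```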
Fine-tuning hyperparameters in light of all the recent fixes earlier in #129. The reason we widened the hyperparameter search space was that an earlier round of tuning (prior to d3a5b74) kind of suggested it. The jupyter notebook here reports only the best of the last 200 experiments run (a small syntax error bug meant the n_trials=50 setting was not respected)...

Using a fresh new sqlite db, we tune for 200 rounds over these hyperparameter ranges: num_residual_blocks 10 to 14; residual_scaling 0.1 to 0.5; learning rate 5e-4 to 1e-3; num_epochs 80 to 120; batch size exponent 7 only, i.e. a batch size of 128. Will decide on new default settings in a following commit.
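The search space above can be expressed as a plain random-search sketch (a pure-Python stand-in for the actual tuning library; the function and key names are illustrative):

```python
import random


def sample_trial(rng):
    """Draw one hyperparameter set from the search space quoted above."""
    return {
        "num_residual_blocks": rng.randint(10, 14),
        "residual_scaling": rng.uniform(0.1, 0.5),
        "learning_rate": rng.uniform(5e-4, 1e-3),
        "num_epochs": rng.randint(80, 120),
        "batch_size": 2 ** 7,  # batch size exponent fixed at 7
    }


rng = random.Random(42)  # seeded for reproducibility
trials = [sample_trial(rng) for _ in range(200)]  # 200 tuning rounds
```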
Setting new default hyperparameters for our Super-Resolution Generative Adversarial Network. Going deeper from 10 to 12 Residual-in-Residual Dense Blocks (RRDB), raising the learning rate from 5e-4 to 6e-4, and num_epochs from 100 to 110.

Looking at the results, I quite like the sharper peak from the deepbedmap model's prediction (i.e. smaller standard deviation) compared with the flat ones before. Visual inspection of the 3D topography is also not looking too shabby.
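Collected in one place, the new defaults read roughly as follows (the dict and key names are illustrative, not the notebook's actual variable names):

```python
# New default hyperparameters as stated in the commit above;
# previous values noted in comments.
DEFAULT_HYPERPARAMETERS = {
    "num_residual_blocks": 12,  # was 10
    "learning_rate": 6e-4,      # was 5e-4
    "num_epochs": 110,          # was 100
}
```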
@weiji14 weiji14 marked this pull request as ready for review March 27, 2019 14:00

weiji14 commented Mar 27, 2019

Weird problem with rasterio breaking the build, even though we have not changed any software versions?!! Error message as follows:

---------------------------------------------------------------------------
CPLE_OpenFailedError                      Traceback (most recent call last)
rasterio/_base.pyx in rasterio._base.DatasetBase.__init__()

rasterio/_shim.pyx in rasterio._shim.open_dataset()

rasterio/_err.pyx in rasterio._err.exc_wrap_pointer()

CPLE_OpenFailedError: 'highres/2010tr.nc' not recognized as a supported file format.

During handling of the above exception, another exception occurred:

RasterioIOError                           Traceback (most recent call last)
<ipython-input-13-34f5dd97040f> in <module>
      1 hireses = [
      2     selective_tile(filepath=f"highres/{f}", window_bounds=w)
----> 3     for f, w in zip(filepaths, window_bounds)
      4 ]
      5 hires = np.concatenate(hireses)

<ipython-input-13-34f5dd97040f> in <listcomp>(.0)
      1 hireses = [
      2     selective_tile(filepath=f"highres/{f}", window_bounds=w)
----> 3     for f, w in zip(filepaths, window_bounds)
      4 ]
      5 hires = np.concatenate(hireses)

<ipython-input-3-07c3751fe850> in selective_tile(filepath, window_bounds, padding, out_shape, gapfill_raster_filepath)
     32     array_list = []
     33 
---> 34     with rasterio.open(filepath) as dataset:
     35         print(f"Tiling: {filepath}")
     36         for window_bound in window_bounds:

~/.local/share/virtualenvs/deepbedmap-qAKEkYde/lib/python3.6/site-packages/rasterio/env.py in wrapper(*args, **kwds)
    419 
    420         with env_ctor(session=session):
--> 421             return f(*args, **kwds)
    422 
    423     return wrapper

~/.local/share/virtualenvs/deepbedmap-qAKEkYde/lib/python3.6/site-packages/rasterio/__init__.py in open(fp, mode, driver, width, height, count, crs, transform, dtype, nodata, sharing, **kwargs)
    214         # None.
    215         if mode == 'r':
--> 216             s = DatasetReader(path, driver=driver, **kwargs)
    217         elif mode == 'r+':
    218             s = get_writer_for_path(path)(path, mode, driver=driver, **kwargs)

rasterio/_base.pyx in rasterio._base.DatasetBase.__init__()

RasterioIOError: 'highres/2010tr.nc' not recognized as a supported file format.
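GDAL's "not recognized as a supported file format" usually means the bytes on disk aren't what the driver expects (e.g. a truncated or placeholder download). One quick way to check, without involving rasterio at all, is to look at the file's magic bytes: classic netCDF files begin with b"CDF", and netCDF-4 files carry the HDF5 signature. This helper is a diagnostic sketch, not part of the repository:

```python
import tempfile


def looks_like_netcdf(path):
    """Crude format check: classic netCDF starts with b'CDF';
    netCDF-4 files are HDF5 containers with the HDF5 signature."""
    with open(path, "rb") as f:
        magic = f.read(8)
    return magic.startswith(b"CDF") or magic.startswith(b"\x89HDF\r\n\x1a\n")


# Demo on a fake classic-netCDF header (illustrative only).
with tempfile.NamedTemporaryFile(suffix=".nc", delete=False) as tmp:
    tmp.write(b"CDF\x01" + b"\x00" * 28)
print(looks_like_netcdf(tmp.name))  # True
```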

@weiji14 weiji14 merged commit 465e95d into master Mar 27, 2019
weiji14 added a commit that referenced this pull request Mar 27, 2019
Closes #129 Retune ESRGAN hyperparameters on stronger discriminator.
@weiji14 weiji14 deleted the retune_on_new_discriminator branch March 27, 2019 15:02
weiji14 added a commit that referenced this pull request Mar 28, 2019
Yet another critical patch for #78 and #129, can't believe it... Change the discriminator to use HeNormal initialization instead of GlorotUniform, a hangover from using Keras (see #81). Refer to the relevant code in ESRGAN's original Pytorch implementation at https://github.com/xinntao/BasicSR/blob/477e14e97eca4cb776d3b37667d42f8484b8b68b/codes/models/networks.py (where it's called kaiming initialization). This initializer change was recorded in one successful training round, reviewable at https://www.comet.ml/weiji14/deepbedmap/17cfbfd5a54043c3a39b5ba183b1cc68.
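The practical difference between the two initializers is just the variance of the distribution weights are drawn from. A sketch of the standard formulas (HeNormal/kaiming is derived for ReLU-family activations such as the discriminator's LeakyReLU, while GlorotUniform assumes roughly linear activations):

```python
import math


def he_normal_std(fan_in):
    """Standard deviation of He/kaiming normal initialization."""
    return math.sqrt(2.0 / fan_in)


def glorot_uniform_limit(fan_in, fan_out):
    """Half-width of the Glorot/Xavier uniform distribution."""
    return math.sqrt(6.0 / (fan_in + fan_out))


print(he_normal_std(512))             # larger weights to offset dying ReLUs
print(glorot_uniform_limit(512, 512))
```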

Also noticed that Chainer's BatchNormalization behaves differently when the global_config.train flag is set to True/False. My assumption that simply not passing in an optimizer during the evaluation stage would be enough was incorrect. To be precise, the setting should only affect the discriminator neural network since that's where we have BatchNormalization layers, but we've added the config flag to both the train_eval_generator and train_eval_discriminator functions to be extra sure. With that, the final recorded Comet.ML experiment this commit references is at https://www.comet.ml/weiji14/deepbedmap/44acdbc1127f4440891ed905846401cf.
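A minimal numpy sketch of why that flag matters: in training mode batch normalization normalizes with the current batch's statistics, while in evaluation mode it should use the stored running statistics (illustrative code, not Chainer's actual implementation):

```python
import numpy as np


def batch_norm(x, running_mean, running_var, train, eps=1e-5):
    """Minimal batch-norm sketch: batch statistics in training mode,
    stored running statistics in evaluation mode."""
    if train:
        mean, var = x.mean(axis=0), x.var(axis=0)
    else:
        mean, var = running_mean, running_var
    return (x - mean) / np.sqrt(var + eps)


x = np.array([[0.0], [2.0]])
out_train = batch_norm(x, np.zeros(1), np.ones(1), train=True)
out_eval = batch_norm(x, np.zeros(1), np.ones(1), train=False)
```

Forgetting to flip the flag at evaluation time means the network keeps normalizing with per-batch statistics, silently changing its outputs.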

Note that we have not retuned any hyperparameters, though that would be a smart thing to do. If you review the Comet.ML experiments, you'll notice that there were two cases of exploding gradients when these hotfixes were implemented. The discriminator's loss and accuracy charts look very different now, and in particular, there is a significant gap between the training and validation metrics.