Retune ESRGAN hyperparameters on stronger discriminator #129
Merged
Conversation
Noticed that the discriminator network doesn't quite follow ESRGAN (it is more like SRGAN's). Patching #78 by increasing the depth of the discriminator from 9 to 10 blocks, using a kernel size of 4 in some Conv2D layers, and setting the penultimate fully connected layer to 100 instead of 1024 neurons. See https://github.com/xinntao/BasicSR/blame/902b4ae1f4beec7359de6e62ed0aebfc335d8dfd/codes/models/modules/architecture.py#L86-L129 for the original Pytorch implementation details. The discriminator has become stronger, and it actually took a few experiments to get a good RMSE test result. That means there will be a need to retune our hyperparameters. This commit references the experiment at https://www.comet.ml/weiji14/deepbedmap/80c51658b2074743ba5151cde7d24560 with a test RMSE of 46.46.
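For reference, the block pattern being described looks roughly like this in Chainer. This is just an illustrative sketch, not the notebook's actual code: the class name, channel widths and number of blocks shown are assumptions, only the kernel-size-4 strided convolutions and the 100-neuron penultimate dense layer come from the description above.

```python
import chainer
import chainer.functions as F
import chainer.links as L


class DiscriminatorSketch(chainer.Chain):
    """VGG-style discriminator block pattern: pairs of (ksize=3, stride=1)
    and (ksize=4, stride=2) convolutions, ending in a 100-neuron dense layer."""

    def __init__(self, in_channels: int = 1, base_channels: int = 64):
        super().__init__()
        with self.init_scope():
            self.conv0_0 = L.Convolution2D(in_channels, base_channels, ksize=3, stride=1, pad=1)
            self.conv0_1 = L.Convolution2D(base_channels, base_channels, ksize=4, stride=2, pad=1)
            self.bn0_1 = L.BatchNormalization(base_channels)
            self.conv1_0 = L.Convolution2D(base_channels, base_channels * 2, ksize=3, stride=1, pad=1)
            self.bn1_0 = L.BatchNormalization(base_channels * 2)
            self.conv1_1 = L.Convolution2D(base_channels * 2, base_channels * 2, ksize=4, stride=2, pad=1)
            self.bn1_1 = L.BatchNormalization(base_channels * 2)
            # ...more conv blocks would follow to reach the full 10-block depth...
            self.linear0 = L.Linear(None, 100)  # penultimate layer: 100, not 1024, neurons
            self.linear1 = L.Linear(100, 1)

    def forward(self, x):
        h = F.leaky_relu(self.conv0_0(x), slope=0.2)
        h = F.leaky_relu(self.bn0_1(self.conv0_1(h)), slope=0.2)
        h = F.leaky_relu(self.bn1_0(self.conv1_0(h)), slope=0.2)
        h = F.leaky_relu(self.bn1_1(self.conv1_1(h)), slope=0.2)
        h = F.leaky_relu(self.linear0(h), slope=0.2)
        return self.linear1(h)
```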
Fine-tuning hyperparameters in light of all the recent fixes earlier in #129. The reason we widened the hyperparameter search space was because an earlier round of tuning (prior to d3a5b74) kinda suggested it. The jupyter notebook here reports only the best of the last 200 experiments run (a small syntax error bug meant the n_trials=50 setting was not being respected)... Using a fresh new sqlite db, we tune for 200 rounds over these hyperparameters: num_residual_blocks 10 to 14; residual_scaling 0.1 to 0.5; learning rate 1e-3 to 5e-4; num_epochs 80 to 120; batch size exponent 7 only, i.e. a batch size of 128. Will decide on new default settings in a following commit.
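Roughly, that search space corresponds to something like the sketch below, assuming a recent Optuna release with a SQLite storage backend. The study name, database path and the `train_and_evaluate` helper are made-up placeholders for illustration, not the notebook's actual code.

```python
import optuna


def train_and_evaluate(**hyperparameters) -> float:
    """Placeholder for the notebook's training loop; returns the test RMSE."""
    raise NotImplementedError


def objective(trial: optuna.trial.Trial) -> float:
    hyperparameters = dict(
        num_residual_blocks=trial.suggest_int("num_residual_blocks", 10, 14),
        residual_scaling=trial.suggest_float("residual_scaling", 0.1, 0.5),
        learning_rate=trial.suggest_float("learning_rate", 5e-4, 1e-3, log=True),
        num_epochs=trial.suggest_int("num_epochs", 80, 120),
        # batch size exponent fixed at 7, i.e. a batch size of 2**7 = 128
        batch_size_exponent=trial.suggest_int("batch_size_exponent", 7, 7),
    )
    return train_and_evaluate(**hyperparameters)


study = optuna.create_study(
    study_name="esrgan_retune",
    storage="sqlite:///esrgan_retune.db",  # fresh new sqlite db
    load_if_exists=True,
)
study.optimize(objective, n_trials=200)  # 200 tuning rounds, minimizing test RMSE
```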
Setting new default hyperparameters for our Super-Resolution Generative Adversarial Network. Going deeper from 10 to 12 Residual-in-Residual Dense Blocks (RRDB), learning rate from 5e-4 to 6e-4, and num_epochs from 100 to 110. Looking at the results, I quite like the sharper peak from the deepbedmap model's prediction (i.e. smaller standard deviation) compared with the flat ones before. Visual inspection of the 3D topography is also not looking too shabby.
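In other words, the defaults move to something like this (the dict name is illustrative; only the values mentioned above are shown):

```python
default_hyperparameters = dict(
    num_residual_blocks=12,  # was 10
    learning_rate=6e-4,      # was 5e-4
    num_epochs=110,          # was 100
)
```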
Weird problem with rasterio breaking the build, even though we have not changed any software versions?!! Error message as follows:

```
---------------------------------------------------------------------------
CPLE_OpenFailedError                      Traceback (most recent call last)
rasterio/_base.pyx in rasterio._base.DatasetBase.__init__()

rasterio/_shim.pyx in rasterio._shim.open_dataset()

rasterio/_err.pyx in rasterio._err.exc_wrap_pointer()

CPLE_OpenFailedError: 'highres/2010tr.nc' not recognized as a supported file format.

During handling of the above exception, another exception occurred:

RasterioIOError                           Traceback (most recent call last)
<ipython-input-13-34f5dd97040f> in <module>
      1 hireses = [
      2     selective_tile(filepath=f"highres/{f}", window_bounds=w)
----> 3     for f, w in zip(filepaths, window_bounds)
      4 ]
      5 hires = np.concatenate(hireses)

<ipython-input-13-34f5dd97040f> in <listcomp>(.0)
      1 hireses = [
      2     selective_tile(filepath=f"highres/{f}", window_bounds=w)
----> 3     for f, w in zip(filepaths, window_bounds)
      4 ]
      5 hires = np.concatenate(hireses)

<ipython-input-3-07c3751fe850> in selective_tile(filepath, window_bounds, padding, out_shape, gapfill_raster_filepath)
     32     array_list = []
     33
---> 34     with rasterio.open(filepath) as dataset:
     35         print(f"Tiling: {filepath}")
     36         for window_bound in window_bounds:

~/.local/share/virtualenvs/deepbedmap-qAKEkYde/lib/python3.6/site-packages/rasterio/env.py in wrapper(*args, **kwds)
    419
    420         with env_ctor(session=session):
--> 421             return f(*args, **kwds)
    422
    423     return wrapper

~/.local/share/virtualenvs/deepbedmap-qAKEkYde/lib/python3.6/site-packages/rasterio/__init__.py in open(fp, mode, driver, width, height, count, crs, transform, dtype, nodata, sharing, **kwargs)
    214     # None.
    215     if mode == 'r':
--> 216         s = DatasetReader(path, driver=driver, **kwargs)
    217     elif mode == 'r+':
    218         s = get_writer_for_path(path)(path, mode, driver=driver, **kwargs)

rasterio/_base.pyx in rasterio._base.DatasetBase.__init__()

RasterioIOError: 'highres/2010tr.nc' not recognized as a supported file format.
```
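If this turns out to be a missing GDAL netCDF driver rather than a corrupt file, a quick diagnostic along these lines might help pin it down. This is only a guess at the cause, not a confirmed fix:

```python
import rasterio

# Check which rasterio/GDAL versions the build actually pulled in,
# even though we haven't pinned anything differently.
print(rasterio.__version__, rasterio.__gdal_version__)

# Try forcing GDAL's netCDF driver explicitly; if this also fails, the GDAL
# build behind the rasterio wheel probably lacks netCDF support.
with rasterio.open("highres/2010tr.nc", driver="netCDF") as dataset:
    print(dataset.profile)
```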
weiji14 added a commit that referenced this pull request on Mar 27, 2019:
Closes #129 Retune ESRGAN hyperparameters on stronger discriminator.
weiji14 added a commit that referenced this pull request on Mar 28, 2019:
Yet another critical patch for #78 and #129, can't believe it... Change the discriminator to use HeNormal initialization instead of GlorotUniform, a hangover from using Keras, see #81. Refer to the relevant code in ESRGAN's original Pytorch implementation at https://github.com/xinntao/BasicSR/blob/477e14e97eca4cb776d3b37667d42f8484b8b68b/codes/models/networks.py (where it's called kaiming initialization). This initializer change was recorded in one successful training round reviewable at https://www.comet.ml/weiji14/deepbedmap/17cfbfd5a54043c3a39b5ba183b1cc68.

Also noticed that Chainer's BatchNormalization behaves differently depending on whether the global_config.train flag is set to True or False. My assumption that simply not passing in an optimizer during the evaluation stage would handle this was incorrect. To be precise, the setting should only affect the Discriminator neural network since that's where we have BatchNormalization layers, but we've added the config flag to both the train_eval_generator and train_eval_discriminator functions to be extra sure. With that, the final recorded Comet.ML experiment this commit references is at https://www.comet.ml/weiji14/deepbedmap/44acdbc1127f4440891ed905846401cf.

Note that we have not retuned any hyperparameters, though that would be a smart thing to do. If you review the Comet.ML experiments, you'll notice that there were two cases of exploding gradients when these hotfixes were implemented. The discriminator's loss and accuracy charts look very different now, and in particular, there is a significant gap between the training and validation metrics.
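For illustration, the two fixes amount to something like this in Chainer. The layer shapes and the empty bodies are placeholders, not the notebook's actual code:

```python
import chainer
import chainer.links as L

# HeNormal ("kaiming") weight initialization instead of the Keras-era GlorotUniform.
he_normal = chainer.initializers.HeNormal()
conv = L.Convolution2D(64, 64, ksize=4, stride=2, pad=1, initialW=he_normal)

# BatchNormalization uses per-batch statistics when chainer.config.train is True,
# and its accumulated moving averages when it is False, so the flag has to be
# set explicitly around the training and evaluation passes.
with chainer.using_config("train", True):
    pass  # forward/backward pass inside train_eval_discriminator goes here

with chainer.using_config("train", False), chainer.using_config("enable_backprop", False):
    pass  # evaluation-only forward pass goes here
```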
Retuning hyperparameters as there have been some significant changes made since #92. The main push was because I noticed that the discriminator was implemented somewhat differently compared with ESRGAN's. But also, there's been a new dataset added at #112 and a new Tesla V100 GPU (0a1f719) to train on, so might as well!
TODO: