Hyperparameter Optimization Initialization #92
Merged
Hyperparameter Optimization framework! https://github.com/pfnet/optuna
Enabling hyperparameter optimization in the srgan_train.ipynb script by putting pretty much everything inside of functions, all called from one huge objective() function. Using a Tree-structured Parzen Estimator, something Bayesian :) Suggested hyperparameters are declared at runtime using optuna.trial.Trial, and experiments are logged to Comet.ML. Datasets are now loaded using load_data_into_memory() with a better GPU_enabled check. Train/Dev iterators are made using get_train_dev_iterators(). Generator/Discriminator models and optimizers are declared using compile_srgan_model(), similar to what was removed in ee1e9df. Training of one epoch is now wrapped inside a trainer() function, which will need to be refactored later into a nicer Chainer Updater. Some refactoring tweaks made to how the model weights and architecture are saved and loaded, basically making it a lot more explicit, with some changes made to deepbedmap.ipynb to reflect that.
Bumps [comet-ml](https://www.comet.ml) from 1.0.42 to 1.0.45. Signed-off-by: dependabot[bot] <support@dependabot.com>
60f235f to c65128b
Storing the optuna hyperparameter optimization study details in an sqlite database for resuming later. Also improve efficiency in searching for optimal hyperparameters by pruning unpromising experimental trials with vanishing/exploding gradients. Specifically, the pruning occurs when PSNR becomes negative, or when the Generator's Loss becomes NaN (usually a big number), or when the Discriminator's Loss becomes NaN (usually 0 or a small number). Note that these new 50 experimental runs with code sha 'faa96f1b' (e.g. at https://www.comet.ml/weiji14/deepbedmap/d6b2bf37408a45ad8331ae587c6aeb99) are resumed from the previous 100 runs stored in train2.db (renamed to train.db here), i.e. those recorded in comet.ml experiment with code sha 'f3ff3d57' (e.g. at https://www.comet.ml/weiji14/deepbedmap/18aeab738d9a4c56815d24cd712aafc6). Tentatively, the best hyperparameters so far are {'batch_size': 64, 'learning_rate': 0.0006500000000000001, 'num_epochs': 43, 'num_residual_blocks': 8}. Also made a quick update of the ONNX model architecture opset version.
Setting our hyperparameters to those found in the best trial, and focus our hyperparameter search around that. Namely, we are using a deeper ESRGAN model with num_residual_blocks=8, batch_size=64, learning_rate=6.5e-4, num_epochs=45 (close to 43). Ran 10 experimental runs on a fresh train.db database (i.e. discarding the old one) to check that things work, and it does seem to be quite stable within the hyperparameter search space of learning_rate between 4e-4 and 8e-4 and num_epochs between 30 and 60. Also updated a test on generator model parameter count, and set cupy.random.seed if cupy is available (when using GPU) for slightly better reproducibility. The .npz weight file download had some issues because we hardcoded retrieving asset at index 0, but it recently switched to 1. So I just added an assert check to make sure file extension ends with ".npz" and used the opportunity to refactor the code to use the new Comet_ML API library instead of requests.
Some sensible code refactoring for stuff coming up like brute force training, adding integration tests and modularizing code blocks better. Added an enable_livelossplot boolean flag to objective function, defaulting to False so that we can train faster. Added an enable_comet_logging boolean flag that may be useful for testing the objective function in some unit/integration test. The get_train_dev_iterators function now has a proper seed argument in place (was using the global seed before...). Final notebook cell prints top ten RMSE values (smallest ones) instead of last ten by time.
Training model on 2 GPUs in parallel e.g. via `CUDA_VISIBLE_DEVICES=1 jupyter nbconvert --ExecutePreprocessor.timeout=None --execute srgan_train.ipynb --to notebook --output model/logs/srgan_train_device1.ipynb &` for device1, swapping 1 to 0 for device0. Set cupy seed more properly, instead of only for device 0 as before. Remove potential problem with hardcoded sending of neural network model to gpu_id=0. Note that there is a chance of collision when two processes access the same database... Results after ~90 training runs are still not that great, may need to deepen the network more or start tuning more hyperparameters again. Also patch d569727's _download_deepbedmap_model_weights_from_comet() which had a hardcoded fix for a hardcoded problem. Developed a for-loop check to replace the hardcoded way of downloading the npz parameter weights file.
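The for-loop check might look something like this sketch. The `find_npz_asset` helper and the asset dictionary keys (`fileName`, `assetId`) are assumptions made for illustration, not the actual code in _download_deepbedmap_model_weights_from_comet().

```python
def find_npz_asset(assets):
    """Return the first asset whose file name ends with .npz.

    `assets` is assumed to be a list of dicts with a "fileName" key
    (hypothetical layout), in whatever order the server returned them.
    """
    for asset in assets:
        if asset["fileName"].endswith(".npz"):
            return asset
    raise FileNotFoundError("No .npz weights file found among the assets")


# The weights file is found regardless of its position in the list.
assets = [
    {"fileName": "srgan_generator_model.onnx", "assetId": "a"},
    {"fileName": "srgan_generator_model_weights.npz", "assetId": "b"},
]
print(find_npz_asset(assets)["assetId"])
```

Searching by extension avoids depending on the asset's index, which is what broke when it moved from 0 to 1.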
Make things go faster by skipping quilt download and using cached track data after first run. Patch a2866b6 to prevent parallel GPU experimental trials colliding with one another, simply by using a unique trial_id when creating files for testing. Also changed batch_size suggestion to use an integer exponent instead of categorical, which might help with the Bayesian model? Now trying batch sizes 64 and 128, with range of num_residual_blocks between 8 and 12. Was going to train for 25 runs but killed at 10 as there were a lot of pruned ones... Also fix get_deepbedmap_test_result() returning an np.float64 instead of a float, which somehow turns into a string when uploaded to comet.ml, causing sorting issues?!!
Adding residual scaling as a new hyperparameter to tune, and brute force training 400 times. The residual scaling factor was previously set at 0.2 as suggested by [Wang et al. 2018](https://arxiv.org/abs/1809.00219)'s ESRGAN paper, but here we try a range between 0.1 and 0.3 as mentioned in [Szegedy et al., 2016](https://arxiv.org/abs/1602.07261)'s paper. [Lim et al., 2017](https://arxiv.org/abs/1707.02921)'s EDSR paper actually used a residual scaling of 0.1. In our case, it appears that 0.3 is better? Best result in the 400 trials is an RMSE_test of 36.7995 using params {'batch_size_exponent': 7, 'learning_rate': 5e-4, 'num_epochs': 46, 'num_residual_blocks': 11, 'residual_scaling': 0.3}. However, another good hyperparameter setting that gives good RMSE_test values below 50 is batch_size=128, learning rate=5e-4, num_epochs=~45, num_residual_block=10, residual_scaling=0.3. For reference, our cubic interpolation benchmark is 62.24. Also patch ace6aa1 to prune unpromising experimental trials based on NaN metrics from the training set instead of the dev/validation set, a problem noticed in the last commit at 2cf27d4 which saw some validation metrics return NaN while training metrics were still valid numbers. Another patch for a2866b6 to set a different Tree-Structured Parzen Estimator (TPE) seed for each GPU so that there is more variety in each Optuna trial.
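For readers unfamiliar with residual scaling, here is a minimal NumPy sketch of the idea: the output of the residual branch is multiplied by a small constant before being added back to the input. The `scaled_residual_block` function and the toy branch are hypothetical illustrations, not the Chainer layers used in the model.

```python
import numpy as np


def scaled_residual_block(x, branch, residual_scaling=0.2):
    # Identity shortcut plus a scaled-down residual branch.
    return x + residual_scaling * branch(x)


# Toy input shaped like (batch, channels, height, width).
x = np.ones((1, 4, 8, 8), dtype=np.float32)


def branch(t):
    # Hypothetical stand-in for the convolutional layers of a block.
    return 2.0 * t


out = scaled_residual_block(x, branch, residual_scaling=0.3)
print(out.shape)
```

A smaller factor damps each block's contribution, which is why it tends to matter more as num_residual_blocks grows.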
Setting new hyperparameters from our previous tuning at 9c65fb4, and testing our objective function (at least a mirror of it) in a behave integration test (for one epoch). Using a bigger batch size (128) and deeper model with more training epochs (for better convergence), specifically {"batch_size_exponent": 7, "num_residual_blocks": 10, "residual_scaling": 0.3, "learning_rate": 5e-4, "num_epochs": 100}. We report here an RMSE_test of 49.26 from experiment at https://www.comet.ml/weiji14/deepbedmap/315bd591ab944c1ebf87bce44cb83c21. Visual inspection of results already shows a pretty good surface, except for the strange striped artifact some distance away from the border. May need to train and test this configuration a few more times? The objective function can now run properly as a standalone function using the optuna.trial.FixedTrial! For speed, and because the Continuous Integration server does not have a GPU, we run our srgan_train integration test on a test dataset with a total of only 1 tile. This required a hacky modification to the load_dataset_into_memory() function, which now outputs a dev_iter of size 1.
weiji14
added a commit
that referenced
this pull request
Feb 21, 2019
Closes #92 Hyperparameter Optimization Initialization - using the Optuna framework.
3 tasks
Start fine-tuning our Super Resolution Generative Adversarial Network's hyperparameters for better results. Using a Tree-Structured Parzen Estimator (TPE) which is a Bayesian Optimization approach.
List of hyperparameters to tune:
TODO: