[MXNET-563] Refactor R optimizers to fix memory leak #11374
Conversation
@jeremiedb Can you add tests for the optimizers? See #7196; none of the optimizers have tests.
Let me know if you want me to pitch in; we can collaborate on this. I can begin work on fixing some of the broken optimizers starting next month.
@hetong007 Please take a look at this.
@anirudhacharya Sure, I'll add tests.
@jeremiedb sure!
@hetong007 With Adadelta and Adagrad, the same functionalities as before are now supported (and non-centered rmsprop has been added within rmsprop). Tests to be added.
R-package/R/optimizer.R (Outdated)
count <- 0
num_update <- 0
please eliminate the trailing whitespace.
R-package/R/optimizer.R (Outdated)
#' Step size.
#' @param gamma1 float, default=0.95
#'
#' @param learning.rate float, default=1e-3
Is there a strong reason to change the default values? It may break other people's code.
I wanted to align with the Python package's defaults. I'll revert to the existing defaults if you think this change does more harm than good.
I suggest keeping the default values as they are in the R package. Default values don't need to match across interfaces, and changing them may break users' scripts, especially for optimizer parameters.
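For illustration, a user's existing script may rely on the current documented default without ever spelling it out, so a changed default silently changes training behaviour (a hypothetical example; mx.opt.create is the R package's optimizer factory):

library(mxnet)

# No learning.rate given: this call picks up whatever the package default is,
# so moving the rmsprop default from 0.002 to 1e-3 would alter results with
# no visible change in the user's code.
optimizer <- mx.opt.create("rmsprop", wd = 1e-5)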
R-package/R/optimizer.R
Outdated
mx.opt.get.updater <- function(optimizer, weights, ctx) { | ||
|
||
exec_list <- lapply(seq_along(weights), function(i) { | ||
if (is.null(weights[[i]])) return(NULL) else |
please format here and below as
if (condition) {
xxx
} else {
yyy
}
Your editor can help you eliminate trailing spaces. Also, please add tests for the new optimizers.
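Applied to the quoted line, the requested style would look roughly like this (a formatting sketch only, with a toy stand-in for the real weight list):

weights <- list(NULL, matrix(0, 2, 2))  # toy stand-in for the weight arrays

exec_list <- lapply(seq_along(weights), function(i) {
  if (is.null(weights[[i]])) {
    NULL
  } else {
    weights[[i]]  # placeholder for the update executor built from this weight
  }
})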
@hetong007 Tests added and trailing space fixed.
R-package/R/optimizer.R
Outdated
#' @param learning.rate float, default=0.002 | ||
#' Step size. | ||
#' The initial learning rate. | ||
#' @param gamma1 float, default=0.95 | ||
#' decay factor of moving average for gradient, gradient^2. | ||
#' @param gamm2 float, default=0.9 |
gamm2 -> gamma2
epsilon = 1e-8,
wd = 0,
rescale.grad = 1,
clip_gradient = -1,
What is the difference between setting it to 1 and -1?
When clip_gradient is < 0, no clipping is applied.
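In other words, the rule is roughly the following (a base-R sketch of the semantics; the actual clipping happens inside the update operator):

# Negative clip_gradient disables clipping; otherwise the gradient is
# clipped elementwise to [-clip_gradient, clip_gradient].
clip_grad <- function(grad, clip_gradient) {
  if (clip_gradient < 0) {
    grad
  } else {
    pmax(pmin(grad, clip_gradient), -clip_gradient)
  }
}

clip_grad(c(-5, 0.3, 5), 1)   # -1.0 0.3 1.0
clip_grad(c(-5, 0.3, 5), -1)  # unchanged: -5.0 0.3 5.0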
Please add that to the docstring, otherwise people may feel confused without looking at the code.
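Something along these lines in the roxygen block would cover it (suggested wording only, not necessarily the exact text committed):

#' @param clip_gradient float, default=-1
#'      Clip gradient to the range of [-clip_gradient, clip_gradient].
#'      If clip_gradient < 0, gradient clipping is turned off.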
Hi, I have the same problem with GPU memory (package mxnet, function mx.mlp, just started with that). But as I'm a traffic engineer, I have a little trouble finding my way around the provided solution (I've used R for a long time, but not at this level of detail). Could you please be so kind as to provide a "Refactor R optimizers for dummies" version of the solution? :)
@hetong007 Is there any blocking element remaining?
@aplikaplik Is your issue specifically about the mxnet MLP function, or is it also related to gpuR? As for mx.mlp, this PR should fix the memory issue encountered with both mx.mlp and mx.model.FeedForward.create, since they both rely on the same optimizer update routine.
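For a quick check of that shared path, a toy run along these lines should exercise the refactored optimizers through mx.mlp (a hedged example; argument names follow the usual mx.mlp interface and the values are purely illustrative):

library(mxnet)

# Toy binary classification problem; mx.mlp delegates to
# mx.model.FeedForward.create, which in turn uses the optimizer updaters
# that this PR refactors.
x <- matrix(runif(1000 * 10), nrow = 1000, ncol = 10)
y <- sample(0:1, 1000, replace = TRUE)

model <- mx.mlp(x, y, hidden_node = 16, out_node = 2,
                out_activation = "softmax",
                num.round = 5, array.batch.size = 50,
                learning.rate = 0.1, momentum = 0.9,
                eval.metric = mx.metric.accuracy,
                ctx = mx.cpu())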
@jeremiedb Hi, I referenced gpuR because it could be useful for @cdeterman. My problem is much simpler: I don't know how to apply this code (optimizer.R) to my R installation or to the mxnet package. I'm sorry for such an elementary question.
* refactor R optimizers to fix memory leak
* add Adadelta and Adagrad
* fix comments
* fix comments
* fix comments
* add tests
* fix whitespaces
* fix whitespaces
* fix typo
* fix typo
* add doc on clipping
@hetong007 @jeremiedb
Fix the R memory leak by refactoring the optimizers.
Since mutable NDArrays aren't supported in R, the optimizers have been ported to symbolic updates, with an executor created for each weight.
Note: only the optimizer update symbols have been used for now, so only SGD, rmsprop, and Adam are currently supported. The manual updates for Adagrad and Adadelta will need to be reimplemented.
Memory now stays low even for very large networks and embeddings.
Tested on CPU and a single GPU, not on multiple GPUs.
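For readers unfamiliar with the approach, the idea is roughly the following (a simplified sketch, not the exact PR code; names such as mx.symbol.sgd_update and mx.simple.bind follow the MXNet R API as I understand it, and the shapes are made up):

library(mxnet)

# Build the update graph once: weight and gradient come in as variables,
# and the sgd_update operator produces the updated weight.
weight <- mx.symbol.Variable("weight")
grad   <- mx.symbol.Variable("grad")
update <- mx.symbol.sgd_update(weight = weight, grad = grad,
                               lr = 0.01, wd = 0,
                               rescale_grad = 1, clip_gradient = -1)

# One executor is bound per weight array; re-running it each iteration
# performs the update without growing the computation graph, which is
# what keeps memory flat.
w_shape <- c(64, 128)  # hypothetical shape of one weight array
exec <- mx.simple.bind(update, ctx = mx.cpu(), grad.req = "null",
                       weight = w_shape, grad = w_shape)

At each step the current weight and gradient are written into the executor's argument arrays, the executor is run forward, and the output is copied back into the weight, so iterations reuse the same memory.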