New way of finding optimal batch size #744
Conversation
- is it true that with `benchmark=False` the predictive optimal batch size computation can be run with much lower `n`, maybe even `n=1`?
- documentation is lacking, both in the master branch and in the newly implemented methods;
- overall looks good, just a few things to polish :)
and "DefaultCPUAllocator: can't allocate memory" in exception.args[0] | ||
) | ||
|
||
def is_oom_error(self, exception): |
The entire error parsing part can be simplified to just a few lines in one method
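For instance, a minimal sketch of such a consolidated check (the message substrings are assumptions based on the helpers referenced in this diff, e.g. `is_cudnn_snafu` and `is_out_of_cpu_memory`):

```python
def is_oom_error(self, exception):
    """ Check whether `exception` signals CUDA, cuDNN or CPU running out of memory. """
    if not (isinstance(exception, RuntimeError) and len(exception.args) == 1):
        return False
    message = exception.args[0]
    return ('CUDA out of memory' in message                              # CUDA OOM
            or 'cuDNN error: CUDNN_STATUS_NOT_SUPPORTED' in message      # cuDNN snafu
            or "DefaultCPUAllocator: can't allocate memory" in message)  # CPU OOM
```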
raise

def _compute_optimal_batch_size(self, method='train', inputs=None, targets=None,
It is quite strange that this method is "private", while it uses "public" methods inside
should rename things into:
- `compute_optimal_batch_size`
- `_compute_batch_size_predictive`, maybe with no underscore
- `_compute_batch_size_bruteforce`, maybe with no underscore
Also, as there are now more methods, documentation should be moved from the Mixin to the specific methods
if high[-1] - low[-1] <= 1:
    break

self.garbage_collection_cuda()
move this before the `if`/`break`: this way, you would not need to call gc after the for-loop
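That is, roughly (a sketch; the loop body is elided):

```python
for _ in range(n_iters):
    ...                               # probe current batch size, update `low`/`high`
    self.garbage_collection_cuda()    # free memory after every probe
    if high[-1] - low[-1] <= 1:       # interval collapsed: search converged
        break
# no trailing garbage_collection_cuda() needed after the loop
```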
return batch_size[-1], batch_size[-1] * factor

def _run_binary_scaling(self, inputs=None, targets=None, low=2, high=None,
function is pretty much identical to the previous one: only the way to compute the next tested batch size differs
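A possible way to deduplicate, assuming a hypothetical `_probe_batch_size` helper that runs one step and reports whether it hit OOM (not the PR's actual API):

```python
def _scale_batch_size(self, next_size, low=2, high=1024, n_iters=25):
    """ Shared search loop; `next_size(low, high)` picks the next batch size to try. """
    for _ in range(n_iters):
        batch_size = next_size(low, high)
        if self._probe_batch_size(batch_size):  # OOM: shrink the interval from above
            high = batch_size
        else:                                    # fits: raise the lower bound
            low = batch_size
        self.garbage_collection_cuda()
        if high - low <= 1:
            break
    return low

# The two methods would then differ only in the step rule:
# binary search:   next_size = lambda low, high: (low + high) // 2
# bruteforce scan: next_size = lambda low, high: low * factor
```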
notifier = Notifier(n_iters=n_iters, bar=pbar,
                    monitors=[{'source': low, 'name': 'low'},
                              {'source': high, 'name': 'high'},
                              {'source': batch_size, 'name': 'batch_size'}])
making all of the variables an instance of `list` so that `Notifier` can track them is not intuitive; while I like the idea, it should be commented in code
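Something along these lines would do (assuming `Notifier` reads the last element of each `source`; check against its actual behavior):

```python
# `low`, `high` and `batch_size` are lists on purpose: Notifier stores a
# reference to each `source`, so appending new values in place lets the
# progress bar show the latest state without re-registering the monitors.
low, high, batch_size = [low], [high], [None]
```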
self.is_cudnn_snafu(exception) or
self.is_out_of_cpu_memory(exception))

def garbage_collection_cuda(self):
hey, if you copy some things from other places, at least re-write them to your (and the project's) code-style
""" Compute memory usage for multiple batch sizes. """ | ||
|
||
# first calculate optimal batch_size estimation | ||
if use_estimation: |
would rather have a parameter `estimation_method='predictive'/'bruteforce'`
I do like this version a lot more!
What do you think about moving this mixin into a separate .py file?
                 max_memory=100, pbar='n', tail_size=20,
                 frequency=0.05, delta_batch_size=16,
                 max_batch_size=1024, max_iters_estimation=2):
    """ Computes optimal batch size in two steps:
`Compute` (docstrings should use the imperative mood)
Parameters
----------
method: str
if the argument should be one of the predefined string values, document it as `method: str, {'option1', 'option2'}`
estimation_method: str
    Whether `bruteforce` or `predictive` estimation method will be used.

inputs: np.array
`np.ndarray`
inputs: np.array
    The inputs to the model, that will be used in optimal batch computation.
    If none, then the placeholder will be created
`If not provided`. Also, if you want to explicitly mention the `None` value, capitalize the first letter.
    Maximum number of binary search iterations.

factor: int
    Value by which we multiply batch_size at each iteration. Uses in bruteforce estimation.
Use quotes to mention other parameters: `batch_size`
spread: int
    Defines how wide the interval for binary search will be. Uses in predictive estimation.
This value is not an int; also, the documentation gives very little insight into how it works and what values the user should pass
Returns
-------
optimal_batch_size: int
    Batch size that perfectly fits in max_memory
UwU
@@ -0,0 +1,336 @@
""" Contains mixin for :class:`~.torch.TorchModel` to provide textual and graphical visualizations. """
It's not: this file is not about visualizations.
Also, I would rather name this file `base_batchsize_mixin.py`
What's wrong with `batch_opt_mixin.py`? I think it's more self-explanatory
All of the mixins for `TorchModel` are now in `base_*.py` files: that is the main reason
Fair enough
""" Compute optimal batch size in two steps: | ||
1. Calculate batch size estimation using `predictive` or `bruteforce` method. | ||
2. Calculate exact optimal batch size using binary search. | ||
|
||
Parameters | ||
---------- | ||
method: str, {`train`, `predict`} | ||
Defines in which method (`train` or `predict`) the optimal batch size will | ||
be computed. Default: `train`. | ||
|
Please fix the docstyle: look at newlines in other docs. (There should be an empty line between the header and the body text, and no empty lines between parameters.)
Also, the first line should be a short statement of the method's purpose. All details must be provided below it.
Parameters
----------
method: str, {`train`, `predict`}
The set is exhaustive, no need to write `str`. Example from https://numpydoc.readthedocs.io/en/latest/format.html#parameters:

order : {'C', 'F', 'A'}
    Description of `order`.
                 frequency=0.05, delta_batch_size=16,
                 max_batch_size=1024, max_iters_estimation=2):
    """ Compute optimal batch size in two steps:
    1. Calculate batch size estimation using `predictive` or `bruteforce` method.
Not clear enough on a first read.
    Whether `bruteforce` or `predictive` estimation method will be used.

inputs: np.ndarray
    The inputs to the model, that will be used in optimal batch computation.
Try to write words in direct order: `The inputs to the model` -> `Model inputs`. It is shorter and easier to understand. Likewise, `that will be used in optimal batch computation` -> `to use in optimal batch computation`.
inputs: np.ndarray
    The inputs to the model, that will be used in optimal batch computation.
    If not provided, then the placeholder will be created
If the parameter can be omitted, then you should add `, optional` or `None` to the datatype line.
    break

# Make and solve a system of equations for `item_size`, `model_size`
matrix = np.array([[batch_size, 1] for batch_size in table.keys()])
Maybe it would be better to use preallocation with items assignment instead of comprehension?
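Perhaps something like this (a sketch reusing the names from the diff):

```python
import numpy as np

# Preallocate the design matrix, then fill the columns in place
matrix = np.empty((len(table), 2))
matrix[:, 0] = list(table.keys())  # probed batch sizes
matrix[:, 1] = 1.0                 # intercept column for `model_size`
```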
optimal_batch_size = (max_memory - model_size) / item_size
optimal_batch_size = int(optimal_batch_size)
Suggested change:
- optimal_batch_size = (max_memory - model_size) / item_size
- optimal_batch_size = int(optimal_batch_size)
+ optimal_batch_size = (max_memory - model_size) // item_size

If you wanted floor rounding
In your implementation `optimal_batch_size` will be a float, because both the numerator and the denominator are floats. So I have to convert it to `int` anyway.
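Indeed, floor division of Python floats yields a float, so the cast is still needed (illustrative numbers):

```python
>>> (1000.0 - 400.0) // 96.0   # floor division of floats
6.0
>>> int((1000.0 - 400.0) // 96.0)
6
```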
You are right, my mistake
def get_memory_utilization(self, batch_size, method='train', inputs=None, targets=None, n=20, frequency=0.05,
                           time_threshold=3, tail_size=20, std_threshold=0.1):
    """ For a given `batch_size`, make `inputs` and `targets` and compute memory utilization. """
    inputs = inputs or self.make_placeholder_data(batch_size, to_device=False)  # maybe use to_device=False
I can't find the method `make_placeholder_data` in this file, and this class has no parent classes from which this method could be inherited.
If it should be defined in the child class, please provide an empty method with such a comment.
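For example, a stub along these lines (the signature is inferred from the call site above and is an assumption):

```python
def make_placeholder_data(self, batch_size, to_device=True):
    """ Create placeholder inputs for a given `batch_size`.

    Defined in `TorchModel`; this mixin expects the host class to provide it.
    """
    raise NotImplementedError('must be provided by the class using this mixin')
```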
The `make_placeholder_data` method lives in the `TorchModel` class. The main idea of mixins for `TorchModel` is to split one huge class into several modules for convenience. So, following the example of the other mixins for this class, I use the child's method inside a mixin's method.
def _get_memory_utilization(self, method, inputs, targets, n, frequency,
                            time_threshold, tail_size, std_threshold):
    """ Run method `n` times and make sure that memory measurements are stable. """
The description is not clear.
_ = self.predict(inputs=inputs, microbatch_size=False)

if not self.config.get('benchmark'):
    data = monitor.data
Should be placed before the `if`
    n_iters = None
    generator = self._bruteforce_batch_size_generator(factor=factor, max_memory=max_memory)
else:
    raise ValueError("Wrong update method! Could be `bruteforce` or `binary`")
Unknown `update_method`: select either `'predictive'` or `'binary'`.
Now the optimal `batch_size` can be found by running the `compute_optimal_batch_size` function, which does two steps:

1. Batch size estimation:
   - `predictive` method: runs the model with various batch sizes, then calculates the optimal batch size by solving a linear system of equations, `measured_memory = batch_size * item_size + model_size + eps`, to find both `item_size` and `model_size`.
   - `bruteforce` method: runs the model with progressively larger batch sizes until an Out Of Memory (OOM) error occurs.
2. Optimal batch size calculation: the exact optimal batch size is determined using binary search.
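A minimal sketch of the `predictive` estimation step (variable names follow the diff above; the memory numbers are made up for illustration):

```python
import numpy as np

# Measured peak memory (MB) for a few probe batch sizes:
# memory ~= batch_size * item_size + model_size + eps
table = {8: 1230.0, 16: 2050.0, 32: 3700.0}

# Each row of the linear system is [batch_size, 1]
matrix = np.array([[batch_size, 1] for batch_size in table])
memory = np.array(list(table.values()))

# Least-squares solve for per-item memory and fixed model memory
(item_size, model_size), *_ = np.linalg.lstsq(matrix, memory, rcond=None)

max_memory = 11000.0  # memory budget, MB
optimal_batch_size = int((max_memory - model_size) // item_size)

# This estimate is then refined with binary search around it.
```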