
New way of finding optimal batch size #744

Merged: 13 commits merged into master from optimal_batch_size on Jul 29, 2024

Conversation

@BulatVakhitov (Contributor) commented Jun 17, 2024

The optimal batch_size can now be found by running the compute_optimal_batch_size function, which performs two steps:

  1. Batch size estimation:

    • predictive method: runs the model with several batch sizes, then estimates the optimal batch size
      by solving a linear system of equations,
      measured_memory = batch_size * item_size + model_size + eps,
      for both item_size and model_size.
    • bruteforce method: runs the model with progressively larger batch sizes
      until an Out Of Memory (OOM) error occurs.
  2. Optimal batch size calculation:
    The exact optimal batch size is then determined with a binary search (see the sketch after this list).
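
To make the two steps concrete, here is a minimal, self-contained sketch of the idea (hypothetical helper names, not the PR's actual implementation; measure_memory stands in for running the model once at a given batch size and reading peak memory):

    import numpy as np

    def estimate_batch_size_predictive(measure_memory, probe_batch_sizes, max_memory):
        """ Fit measured_memory ~ batch_size * item_size + model_size by least squares
        and return the batch size that should just fit into `max_memory`. """
        matrix = np.array([[batch_size, 1.0] for batch_size in probe_batch_sizes])
        memory = np.array([measure_memory(batch_size) for batch_size in probe_batch_sizes])
        (item_size, model_size), *_ = np.linalg.lstsq(matrix, memory, rcond=None)
        return int((max_memory - model_size) / item_size)

    def refine_batch_size_binary(fits_in_memory, low, high):
        """ Binary search for the largest batch size for which `fits_in_memory` is True. """
        while high - low > 1:
            middle = (low + high) // 2
            if fits_in_memory(middle):
                low = middle
            else:
                high = middle
        return low

In the PR the estimate from step 1 is used to pick the (low, high) interval for the binary search in step 2, and the memory measurements come from actually running train or predict under a resource monitor.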

@SergeyTsimfer (Member) left a comment:

  • is it true that with benchmark=False the predictive optimal batch size computation can be run with much lower n, maybe even n=1?
  • documentation is lacking, both in the master branch and in the newly implemented methods;
  • overall looks good, just a few things to polish:)

and "DefaultCPUAllocator: can't allocate memory" in exception.args[0]
)

def is_oom_error(self, exception):
Member commented:

The entire error parsing part can be simplified to just a few lines in one method
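
For illustration, a consolidated check could look roughly like this (a sketch, not the code that ended up in the PR; the matched strings are the usual PyTorch / cuDNN / CPU-allocator out-of-memory messages):

    def is_oom_error(self, exception):
        """ Treat CUDA, cuDNN and CPU-allocator failures uniformly as out-of-memory errors. """
        message = str(exception)
        return isinstance(exception, RuntimeError) and any(
            pattern in message
            for pattern in ("CUDA out of memory",
                            "cuDNN error",
                            "DefaultCPUAllocator: can't allocate memory"))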

raise


def _compute_optimal_batch_size(self, method='train', inputs=None, targets=None,
Member commented:

It is quite strange that this method is "private", while it uses "public" methods inside

Member commented:

These should be renamed to:

  • compute_optimal_batch_size
  • _compute_batch_size_predictive, maybe with no underscore
  • _compute_batch_size_bruteforce, maybe with no underscore

Also, as there are now more methods, documentation should be moved from the Mixin to the specific methods

if high[-1] - low[-1] <= 1:
break

self.garbage_collection_cuda()
Member commented:

move this before the if/break: this way, you would not need to call gc after the for-loop
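
A standalone sketch of the suggested loop structure (hypothetical names; the point is only that the cleanup call moves above the break, so nothing needs to run after the loop):

    def shrink_interval(measure_memory, cleanup, low, high, max_memory, max_iters=25):
        """ Bisect [low, high]; cleanup (e.g. CUDA garbage collection) runs on every
        iteration, including the last one, so no extra call is needed after the loop. """
        for _ in range(max_iters):
            batch_size = (low + high) // 2
            if measure_memory(batch_size) <= max_memory:
                low = batch_size
            else:
                high = batch_size
            cleanup()
            if high - low <= 1:
                break
        return low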

return batch_size[-1], batch_size[-1] * factor


def _run_binary_scaling(self, inputs=None, targets=None, low=2, high=None,
Member commented:

This function is pretty much identical to the previous one: only the way the next tested batch size is computed differs.
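
One way to fold the two functions together, sketched with hypothetical names (the PR's actual refactoring may differ): keep a single measurement loop and pass in the rule that proposes the next batch size.

    def run_scaling(measure_memory, propose, max_memory, max_iters=25):
        """ Shared loop; `propose(previous, fitted)` returns the next batch size to try
        (or None to stop) and is the only part that differs between strategies. """
        best, batch_size, fitted = None, None, None
        for _ in range(max_iters):
            batch_size = propose(batch_size, fitted)
            if batch_size is None:
                break
            fitted = measure_memory(batch_size) <= max_memory
            if fitted:
                best = batch_size
        return best

    def make_bruteforce(start=2, factor=2):
        """ Keep multiplying the batch size by `factor` until it no longer fits. """
        def propose(previous, fitted):
            if previous is None:
                return start
            return previous * factor if fitted else None
        return propose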

Comment on lines 362 to 365
notifier = Notifier(n_iters=n_iters, bar=pbar,
monitors=[{'source': low, 'name': 'low'},
{'source': high, 'name': 'high'},
{'source': batch_size, 'name': 'batch_size'}])
Member commented:

Making all of these variables instances of list so that Notifier can track them is not intuitive; while I like the idea, it should be commented in the code.
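
For instance, the snippet above could carry a comment along these lines (the stated reason is an assumption about Notifier's behaviour and should be double-checked):

    # `low`, `high` and `batch_size` are single-element lists rather than plain ints:
    # Notifier keeps a reference to each monitored source, so only values mutated
    # in place (list[-1] = ...) are visible to it; rebinding an int would not be.
    notifier = Notifier(n_iters=n_iters, bar=pbar,
                        monitors=[{'source': low, 'name': 'low'},
                                  {'source': high, 'name': 'high'},
                                  {'source': batch_size, 'name': 'batch_size'}])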

self.is_cudnn_snafu(exception) or
self.is_out_of_cpu_memory(exception))

def garbage_collection_cuda(self):
Member commented:

Hey, if you copy some things from other places, at least re-write them to your (and the project's) code style.

""" Compute memory usage for multiple batch sizes. """

# first calculate optimal batch_size estimation
if use_estimation:
Member commented:

I would rather have a parameter estimation_method='predictive'/'bruteforce'.
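
That is, roughly (a hypothetical dispatch, using the method names suggested earlier in the review):

    def compute_optimal_batch_size(self, estimation_method='predictive', **kwargs):
        """ Dispatch on a string instead of a boolean `use_estimation` flag. """
        if estimation_method == 'predictive':
            estimate = self.compute_batch_size_predictive(**kwargs)
        elif estimation_method == 'bruteforce':
            estimate = self.compute_batch_size_bruteforce(**kwargs)
        else:
            raise ValueError(f"Unknown `estimation_method` '{estimation_method}', "
                             "expected 'predictive' or 'bruteforce'!")
        return estimate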

@SergeyTsimfer (Member) left a comment:

I do like this version a lot more!
What do you think about moving this mixin into a separate .py file?

max_memory=100, pbar='n', tail_size=20,
frequency=0.05, delta_batch_size=16,
max_batch_size=1024, max_iters_estimation=2):
""" Computes optimal batch size in two steps:
Member commented:

Should be "Compute", not "Computes".


Parameters
----------
method: str
Member commented:

if the argument should be one of the predefined string values, document it as method: str, {'option1', 'option2'}

estimation_method: str
Whether `bruteforce` or `predictive` estimation method will be used.

inputs: np.array
Member commented:

np.ndarray, not np.array.


inputs: np.array
The inputs to the model, that will be used in optimal batch computation.
If none, then the placeholder will be created
Member commented:

"If not provided" reads better here. Also, if you want to explicitly mention the None value, capitalize the first letter: None.

Maximum number of binary search iterations.

factor: int
Value by which we multiply batch_size at each iteration. Uses in bruteforce estimation.
Member commented:

Use quotes to mention other parameters: `batch_size`

Comment on lines 283 to 284
spread: int
Defines how wide the interval for binary search will be. Uses in predictive estimation.
Member commented:

This value is not an int; also, the documentation gives very little insight into how it works or what values the user should pass.

Returns
-------
optimal_batch_size: int
Batch size that perfectly fits in max_memory
Member commented:

UwU

@@ -0,0 +1,336 @@
""" Contains mixin for :class:`~.torch.TorchModel` to provide textual and graphical visualizations. """
Member commented:

It's not: this mixin is not about visualizations.

Member commented:

Also, I would rather name this file base_batchsize_mixin.py

Contributor (author) commented:

What's wrong with batch_opt_mixin.py? I think it's more self-explanatory

Member commented:

All of the mixins for TorchModel are now in base_*.py files: that is the main reason

Contributor (author) commented:

Fair enough

Comment on lines 41 to 50
""" Compute optimal batch size in two steps:
1. Calculate batch size estimation using `predictive` or `bruteforce` method.
2. Calculate exact optimal batch size using binary search.

Parameters
----------
method: str, {`train`, `predict`}
Defines in which method (`train` or `predict`) the optimal batch size will
be computed. Default: `train`.

@HollowPrincess (Contributor) commented Jul 17, 2024:

Please fix the docstyle: look at the newlines in other docstrings.
(There should be an empty line between the header and the body text, and no empty lines between parameters.)

Contributor commented:

Also, the first line should be a short statement of the method's purpose; all details must be provided below it.
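
Applied to the docstring quoted above, the layout could look roughly like this (a sketch of the suggested formatting only, not the final wording):

    def compute_optimal_batch_size(self, method='train', **kwargs):
        """ Compute the optimal batch size.

        Works in two steps: an estimate is obtained with the `predictive` or
        `bruteforce` method, then the exact value is found with a binary search.

        Parameters
        ----------
        method : {'train', 'predict'}
            Method in which the optimal batch size will be computed. Default is 'train'.
        """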


Parameters
----------
method: str, {`train`, `predict`}
Contributor commented:

The set is exhaustive, so there is no need to write str.
Example from https://numpydoc.readthedocs.io/en/latest/format.html#parameters:

order : {'C', 'F', 'A'}
    Description of `order`.

frequency=0.05, delta_batch_size=16,
max_batch_size=1024, max_iters_estimation=2):
""" Compute optimal batch size in two steps:
1. Calculate batch size estimation using `predictive` or `bruteforce` method.
Contributor commented:

Not clear enough on a first reading.

Whether `bruteforce` or `predictive` estimation method will be used.

inputs: np.ndarray
The inputs to the model, that will be used in optimal batch computation.
@HollowPrincess (Contributor) commented Jul 17, 2024:

Try to write words in direct order: "The inputs to the model" -> "Model inputs". It is shorter and easier to understand.
Similarly, "that will be used in optimal batch computation" -> "to use in optimal batch computation".


inputs: np.ndarray
The inputs to the model, that will be used in optimal batch computation.
If not provided, then the placeholder will be created
@HollowPrincess (Contributor) commented Jul 17, 2024:

If the parameter can be omitted, then you should add `, optional` or `None` to the datatype line.
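
For example (just the convention, applied to the parameter above):

    inputs : np.ndarray, optional
        Model inputs to use in the optimal batch size computation.
        If not provided, a placeholder will be created.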

break

# Make and solve a system of equations for `item_size`, `model_size`
matrix = np.array([[batch_size, 1] for batch_size in table.keys()])
Contributor commented:

Maybe it would be better to use preallocation with item assignment instead of a comprehension?
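
Roughly like this (a sketch of the alternative; here `table` maps the probed batch sizes to their measured memory, as in the snippet above):

    import numpy as np

    matrix = np.empty((len(table), 2))
    matrix[:, 0] = list(table.keys())   # probed batch sizes
    matrix[:, 1] = 1.0                  # constant column for the `model_size` term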

Comment on lines 286 to 287
optimal_batch_size = (max_memory - model_size) / item_size
optimal_batch_size = int(optimal_batch_size)
Contributor commented:

Suggested change:

    - optimal_batch_size = (max_memory - model_size) / item_size
    - optimal_batch_size = int(optimal_batch_size)
    + optimal_batch_size = (max_memory - model_size) // item_size

If you wanted floor rounding.

Contributor (author) commented:

In your implementation optimal_batch_size will be a float, because both the numerator and the denominator are floats, so I have to convert it to int anyway.

Contributor commented:

You are right, my mistake

def get_memory_utilization(self, batch_size, method='train', inputs=None, targets=None, n=20, frequency=0.05,
time_threshold=3, tail_size=20, std_threshold=0.1):
""" For a given `batch_size`, make `inputs` and `targets` and compute memory utilization. """
inputs = inputs or self.make_placeholder_data(batch_size, to_device=False) # maybe use to_device=False
@HollowPrincess (Contributor) commented Jul 17, 2024:

I can't find the make_placeholder_data method in this file, and this class has no parent classes from which this method could come.

Contributor commented:

If it is supposed to be defined in the child class, please provide an empty method with a comment saying so.

Contributor (author) commented:

The make_placeholder_data method lives in the TorchModel class. The main idea of the TorchModel mixins is to split one huge class into several modules for convenience, so, following the example of the other mixins of this class, I used the child's method in a mixin's method.


def _get_memory_utilization(self, method, inputs, targets, n, frequency,
time_threshold, tail_size, std_threshold):
""" Run method `n` times and make sure that memory measurements are stable. """
Contributor commented:

The description is not clear.

_ = self.predict(inputs=inputs, microbatch_size=False)

if not self.config.get('benchmark'):
data = monitor.data
Contributor commented:

This should be placed before the if.

n_iters = None
generator = self._bruteforce_batch_size_generator(factor=factor, max_memory=max_memory)
else:
raise ValueError("Wrong update method! Could be `bruteforce` or `binary`")
Member commented:

Unknown `update_method`: select either `'predictive'` or `'binary'`.

@SergeyTsimfer merged commit cbe3f54 into master on Jul 29, 2024; 37 checks passed.
@SergeyTsimfer deleted the optimal_batch_size branch on July 29, 2024 at 08:49.