
New way of finding optimal batch size #744

Merged: 13 commits merged into master from optimal_batch_size on Jul 29, 2024

Conversation

@BulatVakhitov (Contributor) commented Jun 17, 2024

The optimal batch_size can now be found by running the compute_optimal_batch_size function, which performs two steps:

  1. Batch size estimation:

    • predictive method: runs the model with several batch sizes, then estimates the optimal batch size
      by solving a linear system of equations,
      measured_memory = batch_size * item_size + model_size + eps,
      for both item_size and model_size.
    • bruteforce method: runs the model with progressively larger batch sizes
      until an Out Of Memory (OOM) error occurs.
  2. Optimal batch size calculation:
    The exact optimal batch size is then determined with a binary search (see the sketch after this list).
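
To make the two steps concrete, here is a minimal, self-contained sketch of the idea (hypothetical helper names, not the PR's actual implementation; measure_memory stands in for running the model once at a given batch size and reading peak memory):

    import numpy as np

    def estimate_batch_size_predictive(measure_memory, probe_batch_sizes, max_memory):
        """ Fit measured_memory ~ batch_size * item_size + model_size by least squares
        and return the batch size that should just fit into `max_memory`. """
        matrix = np.array([[batch_size, 1.0] for batch_size in probe_batch_sizes])
        memory = np.array([measure_memory(batch_size) for batch_size in probe_batch_sizes])
        (item_size, model_size), *_ = np.linalg.lstsq(matrix, memory, rcond=None)
        return int((max_memory - model_size) / item_size)

    def refine_batch_size_binary(fits_in_memory, low, high):
        """ Binary search for the largest batch size for which `fits_in_memory` is True. """
        while high - low > 1:
            middle = (low + high) // 2
            if fits_in_memory(middle):
                low = middle
            else:
                high = middle
        return low

In the PR the estimate from step 1 is used to pick the (low, high) interval for the binary search in step 2, and the memory measurements come from actually running train or predict under a resource monitor.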

@SergeyTsimfer (Member) left a comment:

  • is it true that with benchmark=False the predictive optimal batch size computation can be run with much lower n, maybe even n=1?
  • documentation is lacking, both in the master branch and in the newly implemented methods;
  • overall looks good, just a few things to polish:)

and "DefaultCPUAllocator: can't allocate memory" in exception.args[0]
)

def is_oom_error(self, exception):
Member commented:

The entire error parsing part can be simplified to just a few lines in one method
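
For illustration, a consolidated check could look roughly like this (a sketch, not the code that ended up in the PR; the matched strings are the usual PyTorch / cuDNN / CPU-allocator out-of-memory messages):

    def is_oom_error(self, exception):
        """ Treat CUDA, cuDNN and CPU-allocator failures uniformly as out-of-memory errors. """
        message = str(exception)
        return isinstance(exception, RuntimeError) and any(
            pattern in message
            for pattern in ("CUDA out of memory",
                            "cuDNN error",
                            "DefaultCPUAllocator: can't allocate memory"))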

raise


def _compute_optimal_batch_size(self, method='train', inputs=None, targets=None,
Member commented:

It is quite strange that this method is "private", while it uses "public" methods inside

Member commented:

These should be renamed to:

  • compute_optimal_batch_size
  • _compute_batch_size_predictive, maybe with no underscore
  • _compute_batch_size_bruteforce, maybe with no underscore

Also, as there are now more methods, documentation should be moved from the Mixin to the specific methods

if high[-1] - low[-1] <= 1:
break

self.garbage_collection_cuda()
Member commented:

move this before the if/break: this way, you would not need to call gc after the for-loop
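
A standalone sketch of the suggested loop structure (hypothetical names; the point is only that the cleanup call moves above the break, so nothing needs to run after the loop):

    def shrink_interval(measure_memory, cleanup, low, high, max_memory, max_iters=25):
        """ Bisect [low, high]; cleanup (e.g. CUDA garbage collection) runs on every
        iteration, including the last one, so no extra call is needed after the loop. """
        for _ in range(max_iters):
            batch_size = (low + high) // 2
            if measure_memory(batch_size) <= max_memory:
                low = batch_size
            else:
                high = batch_size
            cleanup()
            if high - low <= 1:
                break
        return low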

return batch_size[-1], batch_size[-1] * factor


def _run_binary_scaling(self, inputs=None, targets=None, low=2, high=None,
Member commented:

This function is pretty much identical to the previous one: only the way the next tested batch size is computed differs.
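
One way to fold the two functions together, sketched with hypothetical names (the PR's actual refactoring may differ): keep a single measurement loop and pass in the rule that proposes the next batch size.

    def run_scaling(measure_memory, propose, max_memory, max_iters=25):
        """ Shared loop; `propose(previous, fitted)` returns the next batch size to try
        (or None to stop) and is the only part that differs between strategies. """
        best, batch_size, fitted = None, None, None
        for _ in range(max_iters):
            batch_size = propose(batch_size, fitted)
            if batch_size is None:
                break
            fitted = measure_memory(batch_size) <= max_memory
            if fitted:
                best = batch_size
        return best

    def make_bruteforce(start=2, factor=2):
        """ Keep multiplying the batch size by `factor` until it no longer fits. """
        def propose(previous, fitted):
            if previous is None:
                return start
            return previous * factor if fitted else None
        return propose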

Comment on lines 362 to 365
notifier = Notifier(n_iters=n_iters, bar=pbar,
monitors=[{'source': low, 'name': 'low'},
{'source': high, 'name': 'high'},
{'source': batch_size, 'name': 'batch_size'}])
Member commented:

Making all of these variables instances of list so that Notifier can track them is not intuitive; while I like the idea, it should be commented in the code.
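
For instance, the snippet above could carry a comment along these lines (the stated reason is an assumption about Notifier's behaviour and should be double-checked):

    # `low`, `high` and `batch_size` are single-element lists rather than plain ints:
    # Notifier keeps a reference to each monitored source, so only values mutated
    # in place (list[-1] = ...) are visible to it; rebinding an int would not be.
    notifier = Notifier(n_iters=n_iters, bar=pbar,
                        monitors=[{'source': low, 'name': 'low'},
                                  {'source': high, 'name': 'high'},
                                  {'source': batch_size, 'name': 'batch_size'}])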

self.is_cudnn_snafu(exception) or
self.is_out_of_cpu_memory(exception))

def garbage_collection_cuda(self):
Member commented:

Hey, if you copy some things from other places, at least re-write them to your (and the project's) code style.

""" Compute memory usage for multiple batch sizes. """

# first calculate optimal batch_size estimation
if use_estimation:
Member commented:

I would rather have a parameter estimation_method='predictive'/'bruteforce'.
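
That is, roughly (a hypothetical dispatch, using the method names suggested earlier in the review):

    def compute_optimal_batch_size(self, estimation_method='predictive', **kwargs):
        """ Dispatch on a string instead of a boolean `use_estimation` flag. """
        if estimation_method == 'predictive':
            estimate = self.compute_batch_size_predictive(**kwargs)
        elif estimation_method == 'bruteforce':
            estimate = self.compute_batch_size_bruteforce(**kwargs)
        else:
            raise ValueError(f"Unknown `estimation_method` '{estimation_method}', "
                             "expected 'predictive' or 'bruteforce'!")
        return estimate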

@SergeyTsimfer (Member) left a comment:

I do like this version a lot more!
What do you think about moving this mixin into a separate .py file?

max_memory=100, pbar='n', tail_size=20,
frequency=0.05, delta_batch_size=16,
max_batch_size=1024, max_iters_estimation=2):
""" Computes optimal batch size in two steps:
Member commented:

Should be "Compute", not "Computes".


Parameters
----------
method: str
Member commented:

if the argument should be one of the predefined string values, document it as method: str, {'option1', 'option2'}

estimation_method: str
Whether `bruteforce` or `predictive` estimation method will be used.

inputs: np.array
Member commented:

np.ndarray, not np.array.


inputs: np.array
The inputs to the model, that will be used in optimal batch computation.
If none, then the placeholder will be created
Member commented:

"If not provided" reads better here. Also, if you want to explicitly mention the None value, capitalize the first letter: None.

Maximum number of binary search iterations.

factor: int
Value by which we multiply batch_size at each iteration. Uses in bruteforce estimation.
Member commented:

Use quotes to mention other parameters: `batch_size`

Comment on lines 283 to 284
spread: int
Defines how wide the interval for binary search will be. Uses in predictive estimation.
Member commented:

This value is not an int; also, the documentation gives very little insight into how it works or what values the user should pass.

Returns
-------
optimal_batch_size: int
Batch size that perfectly fits in max_memory
Member commented:

UwU

@@ -0,0 +1,336 @@
""" Contains mixin for :class:`~.torch.TorchModel` to provide textual and graphical visualizations. """
Member commented:

It's not: this mixin is not about visualizations.

Member commented:

Also, I would rather name this file base_batchsize_mixin.py

Contributor (author) commented:

What's wrong with batch_opt_mixin.py? I think it's more self-explanatory

Member commented:

All of the mixins for TorchModel are now in base_*.py files: that is the main reason

Contributor (author) commented:

Fair enough

Comment on lines 41 to 50
""" Compute optimal batch size in two steps:
1. Calculate batch size estimation using `predictive` or `bruteforce` method.
2. Calculate exact optimal batch size using binary search.

Parameters
----------
method: str, {`train`, `predict`}
Defines in which method (`train` or `predict`) the optimal batch size will
be computed. Default: `train`.

@HollowPrincess (Contributor) commented Jul 17, 2024:

Please fix the docstyle: look at the newlines in other docstrings.
(There should be an empty line between the header and the body text, and no empty lines between parameters.)

Contributor commented:

Also, the first line should be a short statement of the method's purpose; all details must be provided below it.
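
Applied to the docstring quoted above, the layout could look roughly like this (a sketch of the suggested formatting only, not the final wording):

    def compute_optimal_batch_size(self, method='train', **kwargs):
        """ Compute the optimal batch size.

        Works in two steps: an estimate is obtained with the `predictive` or
        `bruteforce` method, then the exact value is found with a binary search.

        Parameters
        ----------
        method : {'train', 'predict'}
            Method in which the optimal batch size will be computed. Default is 'train'.
        """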


Parameters
----------
method: str, {`train`, `predict`}
Contributor commented:

The set is exhaustive, so there is no need to write str.
Example from https://numpydoc.readthedocs.io/en/latest/format.html#parameters:

order : {'C', 'F', 'A'}
    Description of `order`.

frequency=0.05, delta_batch_size=16,
max_batch_size=1024, max_iters_estimation=2):
""" Compute optimal batch size in two steps:
1. Calculate batch size estimation using `predictive` or `bruteforce` method.
Contributor commented:

Not clear enough on a first reading.

Whether `bruteforce` or `predictive` estimation method will be used.

inputs: np.ndarray
The inputs to the model, that will be used in optimal batch computation.
@HollowPrincess (Contributor) commented Jul 17, 2024:

Try to write words in direct order: "The inputs to the model" -> "Model inputs". It is shorter and easier to understand.
Similarly, "that will be used in optimal batch computation" -> "to use in optimal batch computation".


inputs: np.ndarray
The inputs to the model, that will be used in optimal batch computation.
If not provided, then the placeholder will be created
@HollowPrincess (Contributor) commented Jul 17, 2024:

If the parameter can be omitted, then you should add `, optional` or `None` to the datatype line.
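
For example (just the convention, applied to the parameter above):

    inputs : np.ndarray, optional
        Model inputs to use in the optimal batch size computation.
        If not provided, a placeholder will be created.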

break

# Make and solve a system of equations for `item_size`, `model_size`
matrix = np.array([[batch_size, 1] for batch_size in table.keys()])
Contributor commented:

Maybe it would be better to use preallocation with item assignment instead of a comprehension?
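
Roughly like this (a sketch of the alternative; here `table` maps the probed batch sizes to their measured memory, as in the snippet above):

    import numpy as np

    matrix = np.empty((len(table), 2))
    matrix[:, 0] = list(table.keys())   # probed batch sizes
    matrix[:, 1] = 1.0                  # constant column for the `model_size` term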

Comment on lines 286 to 287
optimal_batch_size = (max_memory - model_size) / item_size
optimal_batch_size = int(optimal_batch_size)
Contributor commented:

Suggested change:

    - optimal_batch_size = (max_memory - model_size) / item_size
    - optimal_batch_size = int(optimal_batch_size)
    + optimal_batch_size = (max_memory - model_size) // item_size

If you wanted floor rounding.

Contributor (author) commented:

In your implementation optimal_batch_size will be a float, because both the numerator and the denominator are floats, so I have to convert it to int anyway.

Contributor commented:

You are right, my mistake

def get_memory_utilization(self, batch_size, method='train', inputs=None, targets=None, n=20, frequency=0.05,
time_threshold=3, tail_size=20, std_threshold=0.1):
""" For a given `batch_size`, make `inputs` and `targets` and compute memory utilization. """
inputs = inputs or self.make_placeholder_data(batch_size, to_device=False) # maybe use to_device=False
@HollowPrincess (Contributor) commented Jul 17, 2024:

I can't find the make_placeholder_data method in this file, and this class has no parent classes from which this method could come.

Contributor commented:

If it is supposed to be defined in the child class, please provide an empty method with a comment saying so.

Contributor (author) commented:

The make_placeholder_data method lives in the TorchModel class. The main idea of the TorchModel mixins is to split one huge class into several modules for convenience, so, following the example of the other mixins of this class, I used the child's method in a mixin's method.


def _get_memory_utilization(self, method, inputs, targets, n, frequency,
time_threshold, tail_size, std_threshold):
""" Run method `n` times and make sure that memory measurements are stable. """
Contributor commented:

The description is not clear.

_ = self.predict(inputs=inputs, microbatch_size=False)

if not self.config.get('benchmark'):
data = monitor.data
Contributor commented:

This should be placed before the if.

n_iters = None
generator = self._bruteforce_batch_size_generator(factor=factor, max_memory=max_memory)
else:
raise ValueError("Wrong update method! Could be `bruteforce` or `binary`")
Member commented:

Unknown `update_method`: select either `'predictive'` or `'binary'`.

@SergeyTsimfer merged commit cbe3f54 into master on Jul 29, 2024; 37 checks passed.
@SergeyTsimfer deleted the optimal_batch_size branch on July 29, 2024 at 08:49.