Memory Profile

There is also an option to profile the memory usage on GPU while training sparsechem. This branch requires the installation of the python modules: torchinfo and pytorch_memlab.

How to use Memory Profile option

To enable memory profiling the following option needs to be activated when running train.py:

  --profile PROFILE     Set this to 1 to output memory profile information

This will enable an extra output after the training step has finished:

Peak GPU memory used: 3630.65185546875MB

For more details an extra file memprofile.txt is also produced in the ouput folder. In this file you can have more details on the different parts of the memory usage, for example:

Storage on cuda:0
Tensor148                                               (1,)   512.00B
Tensor149                                               (1,)   512.00B
Tensor150                                       (6805, 2890)    37.51M
Tensor151                                         (2, 61205)   956.50K
Tensor152                                           (61205,)   239.50K
Tensor153                                           (61205,)   120.00K
Tensor154                                               (1,)   512.00B
Tensor155                                               (1,)   512.00B
Tensor1                                              (2890,)    11.50K
net.2.net.2.weight                              (2890, 6000)    66.15M
net.2.net.2.weight.grad                         (2890, 6000)    66.15M
net.2.net.2.bias                                     (2890,)    11.50K
net.2.net.2.bias.grad                                (2890,)    11.50K
Tensor82                                                (1,)   512.00B
Tensor83                                                (1,)   512.00B
Tensor67                                       (32000, 6000)   732.42M
Tensor69                                             (6000,)    23.50K
Tensor71                                        (2890, 6000)    66.15M
Tensor73                                             (2890,)    11.50K
Tensor66                                       (32000, 6000)   732.42M
Tensor68                                             (6000,)    23.50K
Tensor70                                        (2890, 6000)    66.15M
Tensor72                                             (2890,)    11.50K
net.0.net_freq.weight                          (32000, 6000)   732.42M
net.0.net_freq.weight.grad                     (32000, 6000)   732.42M
net.0.net_freq.bias                                  (6000,)    23.50K
net.0.net_freq.bias.grad                             (6000,)    23.50K
-------------------------------------------------------------------------------
Total Tensors: 857309726        Used Memory: 3.16G
The allocated memory on cuda:0: 3.47G
Memory differs due to the matrix alignment or invisible gradient buffer tensors
-------------------------------------------------------------------------------

You can also open tensorboard using:

tensorboard --logdir <outputdir>

and opening

http://localhost:6006/#scalars&_smoothingWeight=0

You can also see how GPU memory behaves over training time:

Mixed Precision

To use sparsechem in mixed precision mode you can enable it using this option:

--mixed_precision MIXED_PRECISION
                        Set this to 1 to run in mixed precision mode (vs
                        single precision)

Remark

When profiling sparsechem runs it is recommended to use unique run names for every run as some calculations like peak memory GPU usage are based on logs.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

memprofile.md

memprofile.md

Memory Profile

How to use Memory Profile option

Mixed Precision

Remark

Files

memprofile.md

Latest commit

History

memprofile.md

File metadata and controls

Memory Profile

How to use Memory Profile option

Mixed Precision

Remark