Merged
8 changes: 8 additions & 0 deletions deepspeed/runtime/utils.py
@@ -8,6 +8,7 @@

import os
import psutil
import gc
from math import ceil
from math import floor
from bisect import bisect_left, bisect_right
@@ -551,6 +552,9 @@ def see_memory_usage(message, force=False):
if torch.distributed.is_initialized() and not torch.distributed.get_rank() == 0:
return

    # Python doesn't collect garbage in real time, so trigger collection explicitly to get accurate RAM reports
gc.collect()

# Print message except when distributed but not rank 0
logger.info(message)
logger.info(
@@ -564,6 +568,10 @@ def see_memory_usage(message, force=False):
logger.info(
f'CPU Virtual Memory: used = {used_GB} GB, percent = {vm_stats.percent}%')

    # the peak-memory counters are cumulative, so reset them here so the next call reports its own peak
if hasattr(torch.cuda, "reset_peak_memory_stats"): # pytorch 1.4+
torch.cuda.reset_peak_memory_stats()


def call_to_str(base, *args, **kwargs):
"""Construct a string representation of a call.
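The pattern this patch applies can be sketched in isolation: collect cyclic garbage before reading memory stats, and reset cumulative peak counters after reporting so the next call measures only its own interval. The sketch below uses a `SimpleNamespace` stand-in for `torch.cuda` so the `hasattr` version guard can be shown without requiring PyTorch; the stand-in and `see_memory_usage_sketch` are illustrative names, not DeepSpeed's API.

```python
import gc
from types import SimpleNamespace

# Hypothetical stand-in for torch.cuda; records calls so we can see the
# guarded reset fire. reset_peak_memory_stats exists in PyTorch 1.4+.
calls = []
cuda = SimpleNamespace(reset_peak_memory_stats=lambda: calls.append("reset"))

def see_memory_usage_sketch(message, force=False):
    if not force:
        return
    # Collect cyclic garbage first so the reported RAM reflects live objects.
    gc.collect()
    print(message)
    # Peak counters accumulate across calls; reset so the next report
    # measures only the interval since this call.
    if hasattr(cuda, "reset_peak_memory_stats"):  # version guard, as in the patch
        cuda.reset_peak_memory_stats()

see_memory_usage_sketch("after init", force=True)
assert calls == ["reset"]
```

The `hasattr` guard keeps the helper working on older PyTorch releases that predate the peak-reset API, at the cost of silently skipping the reset there.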
4 changes: 2 additions & 2 deletions docs/_tutorials/getting-started.md
@@ -265,8 +265,8 @@ local machine to discover the number of slots available. The `--include` and
`--exclude` arguments work as normal, but the user should specify 'localhost'
as the hostname.

Also note that `CUDA_VISIBLE_DEVICES` can't be used with DeepSpeed to control
which devices should be used. For example, to use only gpu1 of the current
Also note that `CUDA_VISIBLE_DEVICES` can't be used with DeepSpeed to control
which devices should be used. For example, to use only gpu1 of the current
node, do:
```bash
deepspeed --include localhost:1 ...
2 changes: 1 addition & 1 deletion docs/_tutorials/pipeline.md
@@ -277,7 +277,7 @@ For example, a machine with 16 GPUs must have as much local CPU memory as 16 times the model size.
DeepSpeed provides a `LayerSpec` class that delays the construction of
modules until the model layers have been partitioned across workers.
Then each worker will allocate only the layers it's assigned to. So, compared to the
example from the previous paragraph, using `LayerSpec` a machine with 16 GPUs will need to
example from the previous paragraph, using `LayerSpec` a machine with 16 GPUs will need to
allocate a total of 1x model size on its CPU memory and not 16x.
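The delayed-construction idea behind `LayerSpec` can be illustrated with a minimal stand-alone class (a hypothetical stand-in named `LazySpec`, not DeepSpeed's actual implementation): store the layer class and its constructor arguments, and build the module only when a worker claims that layer.

```python
# Hypothetical LazySpec: records the class and constructor args instead of
# instantiating immediately, so unpartitioned layers cost almost no memory.
class LazySpec:
    def __init__(self, cls, *args, **kwargs):
        self.cls, self.args, self.kwargs = cls, args, kwargs

    def build(self):
        # Actual allocation happens here, on the worker that owns the layer.
        return self.cls(*self.args, **self.kwargs)

# Describe four layers; nothing is allocated yet.
specs = [LazySpec(dict, a=1) for _ in range(4)]

# Each worker builds only its own partition (here, the first two layers).
mine = [s.build() for s in specs[:2]]
assert mine == [{"a": 1}, {"a": 1}]
```

With this scheme each rank materializes roughly `1/N` of the model instead of a full copy, which is what lets the 16-GPU machine above hold 1x rather than 16x the model in CPU memory.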

Here is an example of the abbreviated AlexNet model, but expressed only