Merged
2 changes: 1 addition & 1 deletion docs/_tutorials/zero.md
@@ -227,7 +227,7 @@ class ParallelTransformerLayer(MegatronModule):
 #### Allocating Massive Megatron-LM Models
 
 We make two further changes to model initalization in order to support models
-that exceed *local* system memory, but not not *total* system memory.
+that exceed *local* system memory, but not *total* system memory.
 
 1. Allocate the model in a memory-scalable fashion. The model parameters will
    be allocated and immediately partitioned across the data parallel group. If
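For context on the tutorial text touched by this hunk: the memory-scalable allocation it describes is exposed through the ``deepspeed.zero.Init`` context manager. Below is a minimal sketch of that pattern; ``MyLargeModel`` and its sizes are hypothetical placeholders, not part of this diff.

```python
# Sketch of the memory-scalable allocation described above: parameters are
# partitioned across the data parallel group as each submodule is built,
# so no single rank ever holds the full model. MyLargeModel is hypothetical.
import torch
import deepspeed

class MyLargeModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = torch.nn.Linear(8192, 8192)

with deepspeed.zero.Init():
    model = MyLargeModel()  # allocated and immediately partitioned
```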
8 changes: 4 additions & 4 deletions docs/code-docs/source/zero3.rst
@@ -21,13 +21,13 @@ Getting Started
 
 If you are new to DeepSpeed, check out our `Getting Started <https://www.deepspeed.ai/getting-started/>`_ page.
 
-Once you are training with DeepSpeed, enabling ZeRO-3 offload is as simple as enabling it
+Once you are training with DeepSpeed, enabling ZeRO-3 Offload is as simple as enabling it
 in your DeepSpeed configuration! Below are a few examples of ZeRO-3 configurations. Please see
 our `config guide <https://www.deepspeed.ai/docs/config-json/#zero-optimizations-for-fp16-training>`_
 for a complete list of options for configuration and performance tuning.
 
 .. note::
-    ZeRO-Offload works best with our heavily optimized
+    ZeRO-3 Offload works best with our heavily optimized
     :class:`deepspeed.ops.adam.DeepSpeedCPUAdam` optimizer. We recommend using
     our `optimizer config <https://www.deepspeed.ai/docs/config-json/#optimizer-parameters>`_
     to instruct :meth:`deepspeed.initialize` to build the optimizer for you.
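A configuration of the kind this hunk refers to might look like the sketch below. The values are illustrative only, and the ``offload_optimizer``/``offload_param`` keys follow the config guide linked above; the ``optimizer`` block lets ``deepspeed.initialize`` build the optimizer for you, per the note.

```python
# Sketch only: a minimal ZeRO-3 Offload configuration passed to
# deepspeed.initialize as a dict. Values are illustrative, not tuned.
import torch
import deepspeed

ds_config = {
    "train_batch_size": 8,
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,                              # ZeRO stage 3
        "offload_optimizer": {"device": "cpu"},  # optimizer state to CPU
        "offload_param": {"device": "cpu"},      # parameters to CPU
    },
    # Letting DeepSpeed build the optimizer from the config allows it to
    # substitute its optimized DeepSpeedCPUAdam, as the note recommends.
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
}

model = torch.nn.Linear(512, 512)  # placeholder model for the sketch
engine, optimizer, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)
```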
@@ -149,8 +149,8 @@ DeepSpeed provides mechanisms for collecting (or *gathering*) a partitioned para
 
 Some models partitioned with :class:`deepspeed.zero.Init` may need to access
 a module’s weights outside of the class constructor or its ``forward()``
-method. We refer to these weights as **external parameters**, since they
-parameters are accessed outside of the module that created it. To do so, use
+method. We refer to these weights as **external parameters**, since these
+parameters are accessed outside of the module that created them. To do so, use
 :class:`deepspeed.zero.GatheredParameters` or :meth:`deepspeed.zero.register_external_parameter`.
 
 .. autoclass:: deepspeed.zero.GatheredParameters
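A sketch of the external-parameter pattern the corrected sentence describes, using a hypothetical tied-embedding module (none of these names come from the diff): a weight created by one module is consumed by another, so it is registered with ``deepspeed.zero.register_external_parameter`` so that ZeRO-3 gathers it before use.

```python
# Sketch only: a parameter created by one module but accessed in another.
# TiedDecoder is a hypothetical illustration of the pattern.
import torch
import deepspeed

class TiedDecoder(torch.nn.Module):
    def __init__(self, embedding: torch.nn.Embedding):
        super().__init__()
        # embedding.weight was created elsewhere, so from this module's
        # point of view it is an *external* parameter; registering it tells
        # ZeRO-3 to gather it before this module's forward() runs.
        deepspeed.zero.register_external_parameter(self, embedding.weight)
        self.weight = embedding.weight

    def forward(self, hidden):
        # Project hidden states back onto the (tied) embedding matrix.
        return torch.matmul(hidden, self.weight.t())
```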