From 632f190cdf5517feb7bed6969facd280d4a1b549 Mon Sep 17 00:00:00 2001
From: Stas Bekman 
Date: Mon, 8 Mar 2021 20:38:38 -0800
Subject: [PATCH] small tweaks

---
 docs/_tutorials/zero.md         | 2 +-
 docs/code-docs/source/zero3.rst | 8 ++++----
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/docs/_tutorials/zero.md b/docs/_tutorials/zero.md
index cbcdf417bb24..e594427f460f 100644
--- a/docs/_tutorials/zero.md
+++ b/docs/_tutorials/zero.md
@@ -227,7 +227,7 @@ class ParallelTransformerLayer(MegatronModule):
 #### Allocating Massive Megatron-LM Models
 
 We make two further changes to model initalization in order to support models
-that exceed *local* system memory, but not not *total* system memory.
+that exceed *local* system memory, but not *total* system memory.
 
 1. Allocate the model in a memory-scalable fashion. The model parameters will
 be allocated and immediately partitioned across the data parallel group. If
diff --git a/docs/code-docs/source/zero3.rst b/docs/code-docs/source/zero3.rst
index 047aa08d684d..c986990444f3 100644
--- a/docs/code-docs/source/zero3.rst
+++ b/docs/code-docs/source/zero3.rst
@@ -21,13 +21,13 @@ Getting Started
 
 If you are new to DeepSpeed, check out our `Getting Started `_ page.
 
-Once you are training with DeepSpeed, enabling ZeRO-3 offload is as simple as enabling it
+Once you are training with DeepSpeed, enabling ZeRO-3 Offload is as simple as enabling it
 in your DeepSpeed configuration! Below are a few examples of ZeRO-3 configurations. Please see
 our `config guide `_
 for a complete list of options for configuration and performance tuning.
 
 .. note::
-    ZeRO-Offload works best with our heavily optimized
+    ZeRO-3 Offload works best with our heavily optimized
     :class:`deepspeed.ops.adam.DeepSpeedCPUAdam` optimizer. We recommend using
     our `optimizer config `_
     to instruct :meth:`deepspeed.initialize` to build the optimizer for you.
@@ -149,8 +149,8 @@ DeepSpeed provides mechanisms for collecting (or *gathering*) a partitioned para
 
 Some models partitioned with :class:`deepspeed.zero.Init` may need to access a
 module’s weights outside of the class constructor or its ``forward()``
-method. We refer to these weights as **external parameters**, since they
-parameters are accessed outside of the module that created it. To do so, use
+method. We refer to these weights as **external parameters**, since these
+parameters are accessed outside of the module that created them. To do so, use
 :class:`deepspeed.zero.GatheredParameters` or :meth:`deepspeed.zero.register_external_parameter`.
 
 .. autoclass:: deepspeed.zero.GatheredParameters