mcore customization doc minor fix (NVIDIA#8421) (NVIDIA#8437)
Signed-off-by: Huiying Li <[email protected]>
Co-authored-by: Huiying <[email protected]>
github-actions[bot] and HuiyingLi authored Feb 16, 2024
1 parent 3fda655 commit 88126f3
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion docs/source/nlp/nemo_megatron/mcore_customization.rst
@@ -1,7 +1,7 @@
 Megatron Core Customization
 ---------------------------
 
-Megatron Core (Mcore) offers a range of functionalities, one of the most notable being the ability for users to train GPT models on an epic scale. Users can use ``megatron.core.models.gpt.GPTModel`` (Mcore GPTModel) to initialize the model, and then pretrain/load weights into the model. Mcore GPTModel adopts the typical GPT structure, beginning with embedding layer, positional encoding, followed by a series of transformer layers and finally output layer.
+Megatron Core (Mcore) offers a range of functionalities, one of the most notable being the ability for users to train Transformer models on an epic scale. Users can enable decoder/GPT variants by using ``megatron.core.models.gpt.GPTModel`` (Mcore GPTModel) to initialize the model, and then pretrain/load weights into the model. Mcore GPTModel adopts the typical GPT structure, beginning with embedding layer, positional encoding, followed by a series of transformer layers and finally output layer.
 
 In the rapidly advancing world of LLM, it is increasingly important to experiment with various configurations of the transformer block within each transformer layer. Some of these configurations involve the use of different module classes. While it is possible to achieve this with “if else” statements in Mcore, doing so makes Mcore less readable and less maintainable in the long term. Mcore spec intends to solve this challenge by allowing users to specify a customization of the transformer block in each layer, without modifying code in mcore.
 We will dive more into the details of mcore spec in the first section of this blog. Then, we will demonstrate the usefulness of mcore spec using Falcon as an example.
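The paragraph changed by this commit describes initializing Mcore GPTModel and then customizing the transformer block through a layer spec. A minimal sketch of that flow, assuming the megatron-core APIs available around the time of this commit (``TransformerConfig``, ``get_gpt_layer_local_spec``, and the ``GPTModel`` constructor) and illustrative hyperparameter values not taken from the commit:

# Minimal sketch, not part of the commit. Assumes megatron-core (circa early
# 2024) is installed and that torch.distributed plus Mcore's model-parallel
# state are already initialized, e.g. via
# megatron.core.parallel_state.initialize_model_parallel().
from megatron.core.models.gpt.gpt_model import GPTModel
from megatron.core.models.gpt.gpt_layer_specs import get_gpt_layer_local_spec
from megatron.core.transformer.transformer_config import TransformerConfig

# Transformer hyperparameters (illustrative values only).
config = TransformerConfig(
    num_layers=2,
    hidden_size=256,
    num_attention_heads=8,
    use_cpu_initialization=True,
)

# The layer spec selects the module classes used inside every transformer
# layer; swapping in a different spec (e.g. a Falcon-style one, as the doc
# later demonstrates) is the customization point that Mcore spec provides.
layer_spec = get_gpt_layer_local_spec()

model = GPTModel(
    config=config,
    transformer_layer_spec=layer_spec,
    vocab_size=32000,
    max_sequence_length=1024,
)

Because the spec is passed in at construction time, a custom transformer block needs no "if else" branches inside Mcore itself, which is the maintainability argument the changed paragraph makes.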