Distributed optimizer support for experimental FP8 tensors #7885
What does this PR do?
This PR integrates with Transformer Engine's experimental Float8Tensor, which allows the model to store only FP8 weight matrices. The distributed optimizer keeps an FP32 master copy of the weights and performs the param all-gathers in FP8.

Collection: NLP
Changelog
Usage
Run GPT, e.g. with the config at https://github.com/NVIDIA/NeMo/blob/main/examples/nlp/language_modeling/conf/megatron_gpt_config.yaml.
Enable FP8 support with model.fp8=True, FP8 parameters with model.fp8_params=True, and the distributed optimizer with model.optim.name=distributed_fused_adam.
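For illustration only, the same overrides could also be applied programmatically with OmegaConf; this sketch assumes a local copy of the linked megatron_gpt_config.yaml and uses the option names given above:

```python
# Sketch: applying the overrides above with OmegaConf, assuming the example
# config has been downloaded locally as megatron_gpt_config.yaml.
from omegaconf import OmegaConf

cfg = OmegaConf.load("megatron_gpt_config.yaml")
cfg.model.fp8 = True                             # enable FP8 compute via Transformer Engine
cfg.model.fp8_params = True                      # store weights as experimental FP8 tensors
cfg.model.optim.name = "distributed_fused_adam"  # use the distributed optimizer
print(OmegaConf.to_yaml(cfg.model.optim))
```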
Before your PR is "Ready for review"
Pre checks:
PR Type:
If you haven't finished some of the above items, you can still open a "Draft" PR.
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs to various areas.
Additional Information
Float8Tensor was added in [PyTorch] Experimental FP8 tensor class (TransformerEngine#452).