Distributed optimizer support for experimental FP8 tensors #7885

Closed
timmoon10 wants to merge 2 commits


timmoon10
Collaborator

What does this PR do?

This PR integrates the distributed optimizer with Transformer Engine's experimental Float8Tensor, so the model stores its weight matrices only in FP8. The distributed optimizer keeps an FP32 master copy of the weights and performs the parameter all-gathers in FP8.
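
To make the master-copy idea concrete, here is a toy, single-process sketch. It is not the code in this PR and uses no NeMo, Apex, or Transformer Engine APIs: `toy_fp8_cast` and `ToyShardedOptimizer` are hypothetical names, the "FP8 cast" is just rounding to 3 mantissa bits with no scaling factor or saturation, Python floats stand in for FP32, and the "all-gather" is a list concatenation.

```python
import math


def toy_fp8_cast(x: float, mantissa_bits: int = 3) -> float:
    """Round x to `mantissa_bits` explicit mantissa bits.

    Toy stand-in for an FP8 cast: it ignores the exponent range,
    saturation, and the scaling factor a real FP8 tensor carries.
    """
    if x == 0.0:
        return 0.0
    # x == mantissa * 2**exponent with 0.5 <= |mantissa| < 1
    mantissa, exponent = math.frexp(x)
    steps = 2 ** (mantissa_bits + 1)
    return math.ldexp(round(mantissa * steps) / steps, exponent)


class ToyShardedOptimizer:
    """Hypothetical SGD optimizer; each rank owns one shard of master weights."""

    def __init__(self, init_weights, world_size, lr):
        self.lr = lr
        self.shard_len = math.ceil(len(init_weights) / world_size)
        # High-precision master copy, split across ranks.
        self.master_shards = [
            list(init_weights[r * self.shard_len:(r + 1) * self.shard_len])
            for r in range(world_size)
        ]

    def step(self, grads):
        gathered = []
        for rank, master in enumerate(self.master_shards):
            offset = rank * self.shard_len
            # Update the high-precision master shard owned by this rank.
            for i in range(len(master)):
                master[i] -= self.lr * grads[offset + i]
            # Cast the shard to low precision, then "all-gather" it.
            gathered.extend(toy_fp8_cast(w) for w in master)
        return gathered  # what the model would store as its parameters


opt = ToyShardedOptimizer([1.0, 2.0, 3.0, 4.0], world_size=2, lr=0.01)
params_after_1 = opt.step([1.0] * 4)
for _ in range(9):
    params_after_10 = opt.step([1.0] * 4)
```

In this toy, a single small update does not change the low-precision parameters at all, while the master copy still records it; after ten steps the first two parameters have moved to 0.875 and 1.875. That accumulation of updates too small to survive the cast is the usual reason for keeping a higher-precision master copy.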

Collection: NLP

Changelog

  • Adds distributed optimizer support for FP8 parameters
  • Adds the option to initialize GPT with FP8 parameters

Usage

Run GPT, e.g. with the config at https://github.com/NVIDIA/NeMo/blob/main/examples/nlp/language_modeling/conf/megatron_gpt_config.yaml.

Enable FP8 support with model.fp8=True, FP8 parameters with model.fp8_params=True, and the distributed optimizer with model.optim.name=distributed_fused_adam.
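
As a rough sketch, the three overrides above can be assembled into a Hydra-style command line as follows. The script path is an assumption (the usual GPT pretraining entry point in the NeMo examples directory); the PR description does not name it, and a real run needs further settings such as data and trainer options.

```python
import shlex

# Assumed entry point; not stated in the PR description.
script = "examples/nlp/language_modeling/megatron_gpt_pretraining.py"

# Overrides taken from the usage notes above.
overrides = {
    "model.fp8": True,                              # enable FP8 support
    "model.fp8_params": True,                       # store parameters in FP8
    "model.optim.name": "distributed_fused_adam",   # use the distributed optimizer
}

args = ["python", script] + [f"{key}={value}" for key, value in overrides.items()]
command = shlex.join(args)
print(command)
```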

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items, you can still open a "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs in various areas.

Additional Information

github-actions bot added the core (Changes to NeMo Core), NLP, and CI labels on Nov 14, 2023
Contributor

This PR is stale because it has been open for 14 days with no activity. Remove stale label or comment or update or this will be closed in 7 days.

github-actions bot added the stale label on Nov 30, 2023
Contributor

github-actions bot commented Dec 7, 2023

This PR was closed because it has been inactive for 7 days since being marked as stale.

github-actions bot closed this on Dec 7, 2023