Test AMP and Apex checkpointing #11885

Closed
awaelchli opened this issue Feb 11, 2022 · 4 comments
Labels
checkpointing Related to checkpointing precision: amp Automatic Mixed Precision precision: apex (removed) NVIDIA/apex precision tests

Comments

@awaelchli
Contributor

awaelchli commented Feb 11, 2022

🚀 Feature

We currently don't have any tests verifying that the AMP/Apex states (i.e. the scaler) are saved and restored correctly. I searched and couldn't find any.

Motivation

PRs like #11638, which change the loading and saving behavior, risk introducing bugs, especially when complicated logic is needed to remain backward compatible.

Pitch

Add tests.
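
A minimal sketch of the kind of round-trip test this asks for. It is not Lightning's actual test code: `ToyGradScaler`, `save_checkpoint`, `load_checkpoint`, and the `"scaler_state"` key are hypothetical stand-ins invented for illustration, so the example runs without a GPU or any third-party package. A real test would presumably exercise the Trainer with the AMP or Apex precision plugin instead.

```python
import io
import pickle


class ToyGradScaler:
    """Hypothetical stand-in for a loss scaler (NOT torch's GradScaler)."""

    def __init__(self, scale=65536.0, growth_interval=2000):
        self.scale = scale
        self.growth_interval = growth_interval
        self.growth_tracker = 0

    def update(self, found_inf):
        # Halve the scale on overflow; double it after enough clean steps.
        if found_inf:
            self.scale *= 0.5
            self.growth_tracker = 0
        else:
            self.growth_tracker += 1
            if self.growth_tracker == self.growth_interval:
                self.scale *= 2.0
                self.growth_tracker = 0

    def state_dict(self):
        return {
            "scale": self.scale,
            "growth_interval": self.growth_interval,
            "growth_tracker": self.growth_tracker,
        }

    def load_state_dict(self, state):
        self.scale = state["scale"]
        self.growth_interval = state["growth_interval"]
        self.growth_tracker = state["growth_tracker"]


def save_checkpoint(scaler, buffer):
    # Assumed checkpoint layout: scaler state stored under one key.
    pickle.dump({"scaler_state": scaler.state_dict()}, buffer)


def load_checkpoint(scaler, buffer):
    checkpoint = pickle.load(buffer)
    scaler.load_state_dict(checkpoint["scaler_state"])


def test_scaler_state_round_trip():
    # Move the scaler away from its defaults before saving.
    scaler = ToyGradScaler(scale=1024.0, growth_interval=2)
    for found_inf in (True, True, False):
        scaler.update(found_inf)

    buffer = io.BytesIO()
    save_checkpoint(scaler, buffer)
    buffer.seek(0)

    # Restore into a fresh object constructed with different defaults.
    restored = ToyGradScaler()
    load_checkpoint(restored, buffer)
    assert restored.state_dict() == scaler.state_dict()

    # The restored scaler should also behave identically going forward.
    scaler.update(False)
    restored.update(False)
    assert restored.state_dict() == scaler.state_dict()
```

The second assertion matters as much as the first: comparing state right after loading catches missing keys, while stepping both objects once more catches state that was restored but is not actually used.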

Alternatives

None

Additional context


cc @awaelchli @ananthsub @ninginthecloud @rohitgr7 @Borda @akihironitta @carmocca @justusschock

@awaelchli awaelchli added feature Is an improvement or enhancement tests labels Feb 11, 2022
@awaelchli awaelchli added this to the 1.6 milestone Feb 11, 2022
@awaelchli awaelchli added precision: apex (removed) NVIDIA/apex precision precision: amp Automatic Mixed Precision labels Feb 11, 2022
@carmocca carmocca removed the feature Is an improvement or enhancement label Feb 12, 2022
@Borda
Member

Borda commented Feb 14, 2022

I think it is quite an important extension to our legacy checkpoint testing... #11403

@carmocca carmocca modified the milestones: 1.6, future Mar 21, 2022
@akihironitta akihironitta added the checkpointing Related to checkpointing label Mar 31, 2022
@awaelchli
Contributor Author

Since the apex plugin is the only one dumping state in checkpoints, we would probably close this issue if we decide to move forward with #14416.

@akihironitta
Contributor

Given the comment #14416 (comment), shall we close this issue?

@awaelchli
Contributor Author

Yes!

@carmocca carmocca removed this from the pl:future milestone Aug 31, 2022