Add throughput utilities to Fabric and the Trainer #18848
Conversation
Title changed: `SpeedMonitor` and `measure_flops` → `ThroughputMonitor` and `measure_flops`
The feature is certainly very valuable, thank you for adding it. My only gripe is with the way things get logged in Fabric. It is not flexible enough currently, and I would probably struggle to use the monitor together with e.g. the WandB logger if the stepping is coded into the monitor itself.
What does this PR do?
Ports https://github.com/Lightning-AI/lit-gpt/blob/main/lit_gpt/speed_monitor.py.

The API has changed: on Fabric it now follows the torchmetrics style of `update` and `compute`. On the Trainer, a regular `Callback` is kept. Careful consideration of edge cases was added to minimize user errors, for instance the addition of the `_MonotonicWindow` class instead of a regular `deque`.

(Fabric) `Throughput` example:
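The original snippet is not preserved in this capture; below is a minimal sketch of the torchmetrics-style usage described above (`update` with cumulative counters, `compute` for the rates). The `train_step` body, batch size, and `window_size` value are placeholders, and the exact argument names are assumptions.

```python
import time
import torch
from lightning.fabric.utilities.throughput import Throughput


def train_step():  # placeholder for real work
    torch.randn(1024, 1024) @ torch.randn(1024, 1024)


throughput = Throughput(window_size=50)  # sliding window over recent measurements
t0 = time.perf_counter()
samples = 0
for i in range(1, 101):
    train_step()
    samples += 32  # assumed batch size
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # make sure GPU work is included in the timing
    # cumulative counters, torchmetrics-style
    throughput.update(time=time.perf_counter() - t0, batches=i, samples=samples)
    if i % 10 == 0:
        print(throughput.compute())  # e.g. samples per second over the window
```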
(Fabric) `ThroughputMonitor` example:
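Again a sketch rather than the original example: the monitor variant takes the `Fabric` object so results can be logged through the configured loggers. The `CSVLogger` choice and the `compute_and_log(step=...)` call are assumptions about how the logging hook is exposed.

```python
import time
import torch
from lightning.fabric import Fabric
from lightning.fabric.loggers import CSVLogger
from lightning.fabric.utilities.throughput import ThroughputMonitor


def train_step():  # placeholder for real work
    torch.randn(1024, 1024) @ torch.randn(1024, 1024)


fabric = Fabric(loggers=CSVLogger("logs"))
monitor = ThroughputMonitor(fabric, window_size=50)

t0 = time.perf_counter()
samples = 0
for i in range(1, 101):
    train_step()
    samples += 32  # assumed batch size
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    monitor.update(time=time.perf_counter() - t0, batches=i, samples=samples)
    if i % 10 == 0:
        monitor.compute_and_log(step=i)  # forwards the computed metrics to the Fabric loggers
```

Note that the caller decides when to log and which step to use, which addresses the review comment above about the stepping otherwise being hard-coded into the monitor.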
(Trainer) `ThroughputMonitor` example:
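For the Trainer, the utility stays a regular `Callback`. The sketch below also shows `measure_flops` from the PR title; the `BoringModel` definition and the use of `setup` to assign `flops_per_batch` are illustrative assumptions, not code taken from the PR.

```python
import torch
from lightning.pytorch import LightningModule, Trainer
from lightning.pytorch.callbacks import ThroughputMonitor
from lightning.pytorch.utilities import measure_flops


class BoringModel(LightningModule):  # illustrative model, not from the PR
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def setup(self, stage):
        # measure the FLOPs of one forward + backward on a meta-device copy
        with torch.device("meta"):
            model = torch.nn.Linear(32, 2)
            x = torch.randn(4, 32)
        self.flops_per_batch = measure_flops(model, lambda: model(x), lambda y: y.sum())

    def training_step(self, batch, batch_idx):
        return self.layer(batch).sum()

    def train_dataloader(self):
        return torch.utils.data.DataLoader(torch.randn(64, 32), batch_size=4)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)


# batch_size_fn tells the callback how many samples are in a batch
monitor = ThroughputMonitor(batch_size_fn=lambda batch: batch.size(0))
trainer = Trainer(max_epochs=1, log_every_n_steps=10, callbacks=[monitor])
trainer.fit(BoringModel())
```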
📚 Documentation preview 📚: https://pytorch-lightning--18848.org.readthedocs.build/en/18848/

cc @Borda @awaelchli @carmocca @justusschock