
Add MFU (Model FLOPs Utilization) logging #2

Closed
Conversation

xingyaoww
Owner

This PR implements MFU (Model FLOPs Utilization) logging as requested in pytorch#2100.

Changes

  1. Added an MFU utility module (`torchtune/utils/mfu.py`) with functions to:

    • Calculate the GPU's theoretical peak FLOPS
    • Calculate the achieved MFU percentage
    • Calculate the model's FLOPs for one forward pass
  2. Modified `LoRAFinetuneRecipeDistributed` to:

    • Calculate model FLOPs after initialization
    • Log MFU alongside other metrics in the training loop
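The utility functions above could be sketched roughly as follows. Function names, the peak-FLOPS table, and the 2-FLOPs-per-parameter-per-token forward-pass estimate are illustrative assumptions, not torchtune's actual implementation:

```python
# Hypothetical sketch of mfu.py-style helpers (names and values assumed).

def get_peak_flops(device_name: str) -> float:
    """Theoretical peak bf16 FLOPS for a few common GPUs (dense, from vendor specs)."""
    peak = {
        "a100": 312e12,  # NVIDIA A100 bf16
        "h100": 989e12,  # NVIDIA H100 SXM bf16
    }
    for key, flops in peak.items():
        if key in device_name.lower():
            return flops
    raise ValueError(f"unknown device: {device_name}")

def model_forward_flops(num_params: int, num_tokens: int) -> float:
    """Approximate FLOPs for one forward pass: ~2 FLOPs per parameter per token."""
    return 2.0 * num_params * num_tokens

def compute_mfu(flops_per_step: float, step_time_s: float, peak_flops: float) -> float:
    """Achieved FLOPS as a percentage of the hardware's theoretical peak."""
    return 100.0 * (flops_per_step / step_time_s) / peak_flops
```

For example, a step that achieves half of an A100's bf16 peak would report `compute_mfu(156e12, 1.0, 312e12) == 50.0`.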

The MFU metric is now logged with the same frequency as other metrics (controlled by `log_every_n_steps`) and is available in all supported logging backends (terminal, disk, WandB, TensorBoard).
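The training-loop integration might look like the excerpt below. This is a stand-alone sketch: the model size, token count, peak-FLOPS value, and the commented-out logger call are all assumptions, not the recipe's real API, and the `time.sleep` stands in for the forward/backward/optimizer work:

```python
# Hypothetical excerpt of the training-loop change: MFU is computed from the
# measured step time and logged at the log_every_n_steps cadence.
import time

PEAK_FLOPS = 312e12                      # assumed A100 bf16 peak
FLOPS_PER_STEP = 6.0 * 7e9 * 2048        # ~6 FLOPs/param/token (fwd+bwd), 7B params, 2048 tokens

def train(num_steps: int = 3, log_every_n_steps: int = 1) -> list:
    logged_mfu = []
    for step in range(1, num_steps + 1):
        t0 = time.perf_counter()
        time.sleep(0.01)                 # stand-in for forward/backward/optimizer work
        step_time = time.perf_counter() - t0
        if step % log_every_n_steps == 0:
            mfu = 100.0 * (FLOPS_PER_STEP / step_time) / PEAK_FLOPS
            logged_mfu.append(mfu)       # real recipe would call the metric logger here
    return logged_mfu
```

The point of gating on `log_every_n_steps` is that MFU rides along with the existing metric dictionary, so every configured backend receives it without backend-specific changes.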

Closes pytorch#2100

Successfully merging this pull request may close these issues.

[Feature Request]: Support logging MFU as a metrics