Add additional references in compile guides #19550

Merged (2 commits) on Mar 4, 2024
19 changes: 18 additions & 1 deletion docs/source-fabric/advanced/compile.rst
@@ -3,7 +3,7 @@ Speed up models by compiling them
#################################

Compiling your PyTorch model can result in significant speedups, especially on the latest generations of GPUs.
-This guide shows you how to apply ``torch.compile`` correctly in your code.
+This guide shows you how to apply `torch.compile <https://pytorch.org/docs/2.2/generated/torch.compile.html>`_ correctly in your code.

.. note::

@@ -223,6 +223,9 @@ On PyTorch 2.2 and later, ``torch.compile`` will detect dynamism automatically a
Numbers produced with NVIDIA A100 SXM4 40GB, PyTorch 2.2.0, CUDA 12.1.


If you still see recompilation issues after addressing the cases above, use the `Compile Profiler in PyTorch <https://pytorch.org/docs/stable/torch.compiler_troubleshooting.html#excessive-recompilation>`_ to investigate further.


----


@@ -301,4 +304,18 @@ However, should you have issues compiling DDP and FSDP models, you can opt out o
model = fabric.setup(model, _reapply_compile=False)


----


********************
Additional Resources
********************

Here are a few resources for further reading after you complete this tutorial:

- `PyTorch 2.0 Paper <https://pytorch.org/blog/pytorch-2-paper-tutorial/>`_
- `GenAI with PyTorch 2.0 blog post series <https://pytorch.org/blog/accelerating-generative-ai-4/>`_
- `Training Production AI Models with PyTorch 2.0 <https://pytorch.org/blog/training-production-ai-models/>`_
- `Empowering Models with Performance: The Art of Generalized Model Transformation Approach <https://pytorch.org/blog/empowering-models-performance/>`_

|
23 changes: 20 additions & 3 deletions docs/source-pytorch/advanced/compile.rst
@@ -3,7 +3,7 @@ Speed up models by compiling them
#################################

Compiling your LightningModule can result in significant speedups, especially on the latest generations of GPUs.
-This guide shows you how to apply ``torch.compile`` correctly in your code.
+This guide shows you how to apply `torch.compile <https://pytorch.org/docs/2.2/generated/torch.compile.html>`_ correctly in your code.

.. note::

@@ -192,6 +192,8 @@ However, when this is not possible, you can request PyTorch to compile the code
A model compiled with ``dynamic=True`` will typically be slower than a model compiled with static shapes, but it will avoid the extreme cost of recompilation every iteration.
On PyTorch 2.2 and later, ``torch.compile`` will detect dynamism automatically and you should no longer need to set this.
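A minimal sketch of the ``dynamic=True`` option discussed above (the ``Linear`` model and batch sizes are illustrative, not from the docs):

```python
import torch

model = torch.nn.Linear(16, 4)

# ``dynamic=True`` asks the compiler for shape-polymorphic kernels, trading some
# peak speed for not recompiling on every new batch size.
compiled_model = torch.compile(model, dynamic=True)

for batch_size in (2, 8, 32):
    out = compiled_model(torch.randn(batch_size, 16))
    print(out.shape)
```

On PyTorch 2.2+ you can usually omit the flag and let automatic dynamism detection handle varying shapes, as the paragraph above notes.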

If you still see recompilation issues after addressing the cases above, use the `Compile Profiler in PyTorch <https://pytorch.org/docs/stable/torch.compiler_troubleshooting.html#excessive-recompilation>`_ to investigate further.


----

@@ -251,9 +253,9 @@ Always compare the speed and memory usage of the compiled model against the orig
Limitations
***********

-There are a few limitations you should be aware of when using ``torch.compile`` in conjunction with the Trainer:
+There are a few limitations you should be aware of when using ``torch.compile`` **in conjunction with the Trainer**:

-* ``torch.compile`` currently does not get reapplied over DDP/FSDP, meaning distributed operations can't benefit from speedups at the moment.
+* The Trainer currently does not reapply ``torch.compile`` over DDP/FSDP, meaning distributed operations can't benefit from speedups at the moment.
This limitation will be lifted in the future.

* In some cases, using ``self.log()`` in your LightningModule will cause compilation errors.
Expand All @@ -270,4 +272,19 @@ There are a few limitations you should be aware of when using ``torch.compile``
self.model = torch.compile(self.model)
...


----


********************
Additional Resources
********************

Here are a few resources for further reading after you complete this tutorial:

- `PyTorch 2.0 Paper <https://pytorch.org/blog/pytorch-2-paper-tutorial/>`_
- `GenAI with PyTorch 2.0 blog post series <https://pytorch.org/blog/accelerating-generative-ai-4/>`_
- `Training Production AI Models with PyTorch 2.0 <https://pytorch.org/blog/training-production-ai-models/>`_
- `Empowering Models with Performance: The Art of Generalized Model Transformation Approach <https://pytorch.org/blog/empowering-models-performance/>`_

|