ModuleNotFoundError(str(_TRANSFORMER_ENGINE_AVAILABLE)) #809
Comments
Given that TE is installed, I think it is a bug in the requirement check.
@wprazuch Can you show the command you used to install your version of TE so I can replicate?
I sent a fix here Lightning-AI/utilities#292 such that the utility can parse dev/pre-release versions like the one from transformer engine (e.g., `1.10.0.dev0+931b44f`).

How I installed TE to verify this:

Then I checked this works:

```python
from lightning_utilities.core.imports import RequirementCache

assert RequirementCache("transformer-engine>=0.11.0")
```

The installed version of TE was: `1.10.0.dev0+931b44f`
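For context (this example is not from the original thread), dev/local builds like TE's are valid PEP 440 versions, but `packaging`'s specifier matching excludes pre-releases unless they are explicitly allowed, which is the kind of behavior a requirement-check utility has to handle:

```python
from packaging.version import Version
from packaging.specifiers import SpecifierSet

v = Version("1.10.0.dev0+931b44f")  # TE's dev build parses as a valid PEP 440 version
print(v.is_prerelease)              # True: .dev releases count as pre-releases

spec = SpecifierSet(">=0.11.0")
print(spec.contains(v))                    # False: pre-releases are excluded by default
print(spec.contains(v, prereleases=True))  # True once pre-releases are allowed
```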
This should solve the requirement check. Whether the TE plugin in Lightning is compatible with recent versions of TE remains to be seen. It hasn't been touched/used in some time.
Is this resolved? @wprazuch
@tfogal resolved 👍 |
🐛 Bug
With the newer transformer-engine version in our container, fp8 TE benchmarking with the `benchmark_litgpt.py` script is not possible.

Version where the script works:

Version where the script does not work:
With this in mind, I would like to suggest some workarounds, or even changes in the `benchmark_litgpt.py` script. We could get rid of the `from lightning.fabric.plugins.precision.transformer_engine import TransformerEnginePrecision` dependency entirely, and instead use a simple function (we used it in our internal benchmarks and it did just fine):
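(The snippet itself is not included above; what follows is a minimal sketch of what such a helper might look like, assuming `transformer_engine.pytorch.Linear` as the drop-in replacement. The function name `swap_linear_layers` is hypothetical, not necessarily the one used internally.)

```python
import torch.nn as nn
import transformer_engine.pytorch as te


def swap_linear_layers(module: nn.Module) -> nn.Module:
    """Recursively replace every nn.Linear with a te.Linear of the same shape.

    Note: the TE layers are freshly initialized (weights are not copied),
    which is usually acceptable for throughput benchmarking.
    """
    for name, child in module.named_children():
        if isinstance(child, nn.Linear):
            te_linear = te.Linear(
                child.in_features,
                child.out_features,
                bias=child.bias is not None,
            )
            setattr(module, name, te_linear)
        else:
            swap_linear_layers(child)
    return module
```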
We could have a similar function for `nn.LayerNorm`, or even add it to the above snippet as a flag to swap `LayerNorm` as well, or even swap both layers by default.

And while we are on the topic of layer swapping, I think it would make sense to revisit the scope of swapping layers, which we discussed with @tfogal. Actually, when swapping only `Linear` without `LayerNorm`, we get many more successful benchmarks overall because we hit fewer OOM errors. I would say it is worth considering something like a fallback mechanism (cc: @tfogal): try the full swap first and, if it runs out of memory, fall back to swapping `Linear` only and see if that works.
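A rough sketch of what that fallback could look like (all names here are hypothetical; the swap callables stand in for helpers like the one sketched above, and `torch.cuda.OutOfMemoryError` is the exception PyTorch raises on CUDA OOM):

```python
import torch


def run_with_te_fallback(make_model, run_benchmark, swap_full, swap_linear_only):
    """Benchmark with the aggressive swap first; on CUDA OOM, retry with Linear only.

    make_model()        -> builds a fresh model
    run_benchmark(m)    -> runs the measured iterations on model m
    swap_full(m)        -> swaps both Linear and LayerNorm to TE layers
    swap_linear_only(m) -> swaps only Linear layers to TE
    """
    try:
        return run_benchmark(swap_full(make_model()))
    except torch.cuda.OutOfMemoryError:
        # Release memory held by the failed attempt before retrying.
        torch.cuda.empty_cache()
        return run_benchmark(swap_linear_only(make_model()))
```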
I understand this might not be ideal, but at this point fp8 functionality is nothing like a switch; it is still a matter of different configurations and approaches to running fp8. I am happy to learn what others think about this.

If the decision is to stay with the current wrapper class `TransformerEnginePrecision`, please let me know, and I will try to fix it and release a PR in the respective `pytorch-lightning` repository.

To Reproduce
Steps to reproduce the behavior:
Code sample
As in: `thunder/benchmarks/benchmark_litgpt.py`
Expected behavior
The code should run because the requirement is met.
Environment
As in the `20240719` container.

Additional context
None