## 🐛 Bug

I ran out of memory (GPU) when computing the perplexity metric and would like to propose a small optimization to decrease its memory utilization.

## To Reproduce

When running the following code, PyTorch tries to allocate 1024 GB of GPU memory on my system:

```python
import torch
from torchmetrics.text import Perplexity

gen = torch.manual_seed(42)
preds = torch.rand(512, 1024, 12, generator=gen).cuda()
target = torch.randint(12, (512, 1024), generator=gen).cuda()
perp = Perplexity().cuda()
print(perp(preds, target))
```

## Memory Inefficiency

I think the inefficiency is in this line:

torchmetrics/src/torchmetrics/functional/text/perplexity.py, line 94 in a68455a

`probs[:, target]` results in a large temporary tensor with `(512*1024)^2` elements, of which only the diagonal values are used afterwards. At 4 bytes per float32 element, that temporary is 2^40 bytes, which is exactly the 1024 GB allocation above.

## Potential Solution

In contrast,

```python
probs = probs[torch.arange(target.numel()), target][mask]
```

would only require memory of the size of `target`.

Would you consider accepting a pull request with this optimization? Or was the previous implementation chosen for another reason?
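The equivalence of the two indexing forms can be checked on CPU with small stand-in shapes (the 4×8 batch/sequence and vocabulary of 12 below are illustrative assumptions, not the issue's original sizes; the `[mask]` filtering for ignored tokens is omitted here):

```python
import torch

# Small stand-ins for the issue's shapes so everything fits in memory.
gen = torch.manual_seed(42)
probs = torch.rand(4 * 8, 12, generator=gen)          # flattened (batch*seq, vocab)
target = torch.randint(12, (4 * 8,), generator=gen)   # flattened target indices

# Current approach: advanced indexing builds an (N, N) temporary,
# of which only the diagonal (probs[i, target[i]]) is kept.
old = probs[:, target].diagonal()

# Proposed approach: pick row i's entry at column target[i] directly,
# producing an (N,)-sized result with no (N, N) temporary.
new = probs[torch.arange(target.numel()), target]

assert torch.equal(old, new)
```

With the issue's real sizes, N = 512·1024, so the `(N, N)` temporary is what triggers the terabyte-scale allocation, while the proposed form allocates only `N` elements.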
Hi! Thanks for your contribution, great first issue!
Commit a45a3b0 ("multi-gpu perplexity metric") required an optimization in torchmetrics; see Lightning-AI/torchmetrics#2337.
Just created PR #2346 with the (small) change. Feel free to merge it if you like.