Maximum frames/steps etc for 24GB card? Keep getting OOM #189
As title, just wondering what we should be looking at; even at 720 I'm getting OOMs (sometimes it works if I restart Comfy), so maybe something isn't being released after generation.

edit 1:
I don't see how people are generating decent size/length videos; I'm only able to get to 624x832 with 45 frames.

edit 2:
Best I can generate so far with a 3090 and Sage with block swap. Is this the best to be expected?

Comments
I only have experience with a 4090: 129 frames at 960x544 uses about 22GB with torch.compile; without torch.compile it will OOM, in both Comfy native and this wrapper. Compile seems to have a huge effect on VRAM use and is about 30% faster, but from what I hear compiling isn't working at fp8 on a 3090 and requires a 40xx card. With the wrapper, sage/flash have proper memory use; the sdpa implementation is highly inefficient and much better in Comfy native. You can additionally enable swapping for up to 40 single blocks. As to releasing the VRAM, it's always done when the …
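For anyone wanting to experiment outside the node graph, here is a minimal sketch of applying torch.compile to a module. The dummy block is purely illustrative (it stands in for the wrapper's video transformer, which is wired up through its own nodes rather than called like this):

```python
import torch
import torch.nn as nn

# Stand-in module; in practice this would be the video transformer.
class DummyBlock(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):
        return torch.nn.functional.gelu(self.proj(x))

device = "cuda" if torch.cuda.is_available() else "cpu"
model = DummyBlock().to(device)

# torch.compile traces the module and generates fused kernels via
# TorchInductor; fused kernels can avoid materializing some large
# intermediates, which is one plausible reason compiled runs fit in
# VRAM where eager runs OOM.
compiled = torch.compile(model)

x = torch.randn(4, 64, device=device)
out = compiled(x)
```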
Hey, I have a GTX 1080 Ti and I can do those settings; it takes a long time though because of the old GPU. If you use the same settings, how high can you crank the resolution/frames? (The 1080 Ti only has 11GB of VRAM.) By the way, these settings take me 160s per iteration; I'm curious how long it takes on a 3090, as I may buy one soon.
Currently I've only been testing for a few hours, but the below takes 3.2 min: sampling 129 frames in 33 latents at 512x384 with 20 inference steps.
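The 129 frames → 33 latents relationship falls out of the VAE's 4x temporal compression (the first frame maps to one latent frame, then each group of four frames maps to one more); a quick check, assuming that stride:

```python
# Assumes a 4x temporal VAE compression: first frame kept, then every
# group of 4 input frames collapses into one latent frame.
def latent_frames(frames: int, temporal_stride: int = 4) -> int:
    return (frames - 1) // temporal_stride + 1

assert latent_frames(129) == 33  # matches the 129 frames / 33 latents above
```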
@Fredd4e If you are on Linux, installing Sage is pretty simple and apparently gives good time savings.
@kijai Block swap keeps me under 24GB, but then I get an OOM on VideoDecode, which kind of defeats the point unless I'm missing something. I also see that all of these methods are for low VRAM; I can't get anywhere near the sizes suggested by the model creators, and I wouldn't consider 24GB to be low VRAM for an fp8 model.
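For anyone new to the term, this is roughly what block swapping amounts to; the wrapper's real implementation is more involved (configurable block counts, keeping some blocks resident), so treat this as the naive idea only:

```python
import torch
import torch.nn as nn

class SwappedStack(nn.Module):
    """Naive block swapping: weights live on CPU, and each block visits
    the GPU only while it runs. Trades PCIe transfer time for peak VRAM."""

    def __init__(self, blocks: nn.ModuleList, device: str = "cuda"):
        super().__init__()
        self.blocks = blocks.to("cpu")
        self.device = device

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for block in self.blocks:
            block.to(self.device)  # bring this block's weights onto the GPU
            x = block(x)
            block.to("cpu")        # evict before loading the next block
        return x
```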
You can reduce the tile size on the decode node; it works fine with 128 spatial (halving the VRAM use compared to the default 256), but keep the temporal at 64 to avoid stuttering/ghosting in the result. You have to disable auto_size for the adjustments to take effect. The max resolution is very heavy; they did say it takes something like 60GB initially, after all, so we are very much in "low VRAM" territory with 24GB.
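For intuition, here is what spatial tiling of a decode amounts to. This is a bare sketch with no overlap or seam blending (the real node blends tile edges), and the `decode` callable and tile unit are assumptions, not the node's API:

```python
import torch

def tiled_decode(decode, latent: torch.Tensor, tile: int = 128):
    # latent: (B, C, T, H, W). Each tile is decoded on its own, so peak
    # activation memory scales with the tile size, not the full frame.
    *_, H, W = latent.shape
    rows = []
    for y in range(0, H, tile):
        row = [decode(latent[..., y:y + tile, x:x + tile])
               for x in range(0, W, tile)]
        rows.append(torch.cat(row, dim=-1))  # stitch tiles along width
    return torch.cat(rows, dim=-2)           # stitch rows along height
```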
Makes sense, can't wait for a dual 5090 setup! For me, for now, it seems I max out at 1024x1024 with 109 frames = 50 minutes! Not really worth it, but I'm sure improvements will come soon.
I wish. Currently I am on Windows 10. I did give SageAttention a try; if I understand correctly I need Triton to run it, and Triton does not seem to support my 1080 Ti. However, if you still think it should work, I'd love to go deeper.
It looks like a bug in torch on Windows; it's probably going to be fixed in 2.6.0. For now you can manually edit the code as this PR indicates: https://github.com/pytorch/pytorch/pull/138992/files That file would be in your venv or python_embeded folder, for example: \python_embeded\Lib\site-packages\torch\_inductor
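If you're unsure where that folder lives in your install, Python can tell you directly:

```python
# Prints the torch/_inductor directory for whichever interpreter runs
# this, e.g. the one inside python_embeded.
import os
import torch._inductor as inductor

print(os.path.dirname(inductor.__file__))
```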
That did it. Thank you.