nodes: mmap detach weights before they are pinned #355
Conversation
Comfy core recently introduced a feature where weights may be pinned when loading, particularly for the case of offloading. Intercept this, and immediately detach each weight before the pinning. This avoids a crash that at least some users are experiencing. Use a little dict on the modules to keep track of what's already done, and when the catch-all detacher loop comes through, use this dict (which has already-done modules removed) as the iterator basis.
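For readers skimming the thread, here is a minimal sketch of the idea described above, written against plain PyTorch rather than the actual ComfyUI-GGUF internals. The function names and the `_gguf_pending` bookkeeping dict are illustrative assumptions, not the PR's real identifiers:

```python
import torch

def register_pending(module: torch.nn.Module) -> None:
    # Record which weights on this module still need detaching; entries are
    # removed as they are handled, mirroring the "little dict" in the PR text.
    module._gguf_pending = {
        name: True for name, _ in module.named_parameters(recurse=False)
    }

def detach_before_pin(module: torch.nn.Module, name: str) -> None:
    # Called when core is about to pin a weight: clone() pulls the data off
    # the mmap'd GGUF file into ordinary host memory, so pinning afterwards
    # operates on a normal pageable tensor instead of a file-backed one.
    param = getattr(module, name)
    detached = torch.nn.Parameter(param.detach().clone(), requires_grad=False)
    setattr(module, name, detached)
    getattr(module, "_gguf_pending", {}).pop(name, None)  # mark as handled

def detach_remaining(model: torch.nn.Module) -> None:
    # Catch-all pass: anything the pinning path never touched is still in the
    # per-module dict, so iterate over what remains and detach it now.
    for module in model.modules():
        for name in list(getattr(module, "_gguf_pending", {})):
            detach_before_pin(module, name)
```

The point is just the ordering: detach (copy off the mmap) first, let pinning run afterwards, and sweep up whatever the pinning hook never visited.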
Thanks for this. It seems to have successfully fixed the CUDA OOM I was getting when using
I can confirm on my setup that it also solved the recent CUDA crash when using GGUF with the

Edit: Anyway, I have a question, maybe it's still related? Using --fast basically enables all performance optimizations like fp16_accum, including pinned_memory, right? Does using pinned_memory increase VRAM usage, or should it only increase RAM usage?

I tried running the same GGUF workflow (WAN I2V with Torch Compile + PyTorch 2.9) using exactly the same settings but on different ComfyUI builds. First, I tested on

However, on the latest Comfy (which already includes pinned_memory), the VRAM usage increased to 98% on the first run, which caused a noticeable slowdown.
This shouldn't have any effect on VRAM. If you have a look at all the available optimizations under --fast, try manually listing all except --pinned_memory, and then test again with pinned_memory to isolate that single variable.
Never mind, I'm dumb. I forgot I made a local edit, so I had commented out

Now, in the latest Comfy, performance is indeed faster with pinned_memory (VRAM usage stays similar). On the 1st run I usually got ~90s/it, and now it's around 65s/it. 🤯

Everything looks good now! Thanks for this quick PR, hopefully it will be merged soon.
Thanks for the PR. I did a quick test and don't see any regression on old versions either. There are some small nitpick comments that could be made, like using an empty dict instead of