
Conversation

@rattus128
Contributor

Comfy core recently introduced a feature where weights may be pinned when loading, particularly for the offloading case.

Intercept this, and immediately detach each weight before the pinning. This avoids a crash that at least some users are experiencing.

Use a small dict on the modules to keep track of what's already been done, and when the catch-all detacher loop comes through, use this dict (with already-handled modules removed) as the basis for iteration.
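
For readers unfamiliar with the pattern, here is a minimal hedged sketch of the idea, not the PR's actual code; the hook points and names such as `track_weights`, `on_pin`, and `_pending_detach` are hypothetical. The shape of it: detach each weight at pin time, record the work in a small per-module dict, and have the later catch-all loop iterate only over whatever is still pending.

```python
# Illustrative sketch only; the hook points and names are hypothetical.
import torch

def track_weights(module: torch.nn.Module):
    """Record every weight on the module that may eventually need detaching."""
    module._pending_detach = dict(module.named_parameters(recurse=False))

def on_pin(module: torch.nn.Module, name: str) -> torch.Tensor:
    """Intercept a pin request: detach the weight immediately, remove it from
    the bookkeeping dict, then pin the detached tensor instead."""
    weight = module._pending_detach.pop(name)
    detached = weight.detach()
    return detached.pin_memory() if torch.cuda.is_available() else detached

def catch_all_detach(module: torch.nn.Module):
    """The catch-all loop iterates only over what is still pending, so weights
    already handled at pin time are not processed twice."""
    for name in list(getattr(module, "_pending_detach", {})):
        module._pending_detach.pop(name).detach()
```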

@thrnz

thrnz commented Nov 2, 2025

Thanks for this. It seems to have fixed the CUDA OOM I was getting when using pinned_memory with GGUF models.

@jprsyt5

jprsyt5 commented Nov 3, 2025

I can confirm on my setup that it also solved the recent CUDA crash when using GGUF with the --fast argument.

Edit: Anyway, I have a question; maybe it's still related? Using --fast basically enables all performance optimizations like fp16_accum, including pinned_memory, right? Does using pinned_memory increase VRAM usage, or should it only increase RAM usage?

I tried running the same GGUF workflow (WAN I2V with Torch Compile + PyTorch 2.9) using exactly the same settings but on different ComfyUI builds.

First, I tested on 9da397ea, and VRAM usage was around 73% on the first run.

However, on the latest comfy (which already includes pinned_memory), the VRAM usage increased to 98% on the first run, causing a noticeable slowdown.

@rattus128
Contributor Author

> Does using pinned_memory increase VRAM usage, or should it only increase RAM usage? [...] On the latest comfy (which already includes pinned_memory), the VRAM usage increased to 98% on the first run.

This shouldn't have any effect on VRAM. If you have a look at all the optimizations that --fast enables, try manually listing all of them except pinned_memory, then re-run with pinned_memory added to isolate the single variable.

@jprsyt5

jprsyt5 commented Nov 3, 2025

> This shouldn't have any effect on VRAM. If you have a look at all the optimizations that --fast enables, try manually listing all of them except pinned_memory, then re-run with pinned_memory added to isolate the single variable.

Never mind, I'm dumb.

I forgot I had made a local edit: I had commented out @torch.compiler.disable in comfy/ops.py to enable Torch Compile on my old Comfy build (as instructed for the Sage Attention nightly build), but I forgot to comment it out again on the latest comfy. That's why the VRAM usage was higher.

Now, in the latest comfy, performance is indeed faster with pinned_memory (VRAM usage stays similar). On the first run, I usually got ~90s/it, and now it's around 65s/it. 🤯

Everything looks good now! Thanks for this quick PR, hopefully it will be merged soon.

@city96
Owner

city96 commented Nov 3, 2025

Thanks for the PR. I did a quick test and don't see any regression on old versions either. There are a few small nitpick comments that could be made, like using an empty dict instead of None, or defining a default for it next to mmap_released so it always exists, but I doubt anything will hit those edge cases, so I'll just merge it and push an updated version to the comfy registry as well.
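
For context, a hedged sketch of the style that nitpick suggests; apart from `mmap_released`, which is mentioned above, the class and attribute names are made up for illustration. The idea is to give the bookkeeping dict an always-present empty-dict default next to the existing flag instead of lazily creating it from None.

```python
# Hypothetical illustration of the review nitpick; only `mmap_released` is a
# name taken from the comment above, everything else is invented for the sketch.
class LoaderState:
    def __init__(self):
        self.mmap_released = False  # existing default mentioned in the review
        self.detached = {}          # empty-dict default, so it always exists

    def mark_detached(self, name, tensor):
        # No `if self.detached is None: self.detached = {}` guard needed.
        self.detached[name] = tensor
```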

city96 merged commit 100c06c into city96:main on Nov 3, 2025