Conversation

@rattus128
Contributor

Default this to handle the case of cloning a GGUFModelPatcher, which then expects it to exist on the clone.

I could never reproduce this condition; however, 4 users have hit the error. This error path is very sensitive to VRAM conditions and workflow, and I never got the details, so it's a bit of a guess.

But from a code point of view, it's the right thing to do anyway.
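A minimal sketch of the idea, using the attribute names from this thread (the real clone logic in nodes.py may differ):

import comfy.model_patcher  # available when running inside ComfyUI

# Sketch only: give the clone safe defaults so it never hits AttributeError,
# even when the source patcher was created before these fields existed.
class GGUFModelPatcher(comfy.model_patcher.ModelPatcher):
    def clone(self, *args, **kwargs):
        n = super().clone(*args, **kwargs)
        n.mmap_released = getattr(self, "mmap_released", False)
        n.named_modules_to_munmap = getattr(self, "named_modules_to_munmap", {})
        return n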

@RYG81

RYG81 commented Nov 6, 2025

worked for me

@city96
Owner

city96 commented Nov 6, 2025

This is mostly what I was concerned about in my comment on the original PR as well:

defining a default for it next to mmap_released so it always exists

Could you try something like this instead? We should use mmap_released as the control logic and keep named_modules_to_munmap as a dictionary to avoid other edge cases by separating the two. I think that should work as well:

[screenshot of the suggested patch]

@phaserblast

phaserblast commented Nov 7, 2025

Could you try something like this instead? We should use mmap_released as the control logic and keep named_modules_to_munmap as a dictionary to avoid other edge cases by separating the two. I think that should work as well:

I am also having this problem. The above patch causes the following error for me:

torch.AcceleratorError: CUDA error: invalid argument
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

When I revert the patch, the error goes away.

@rattus128
Contributor Author

Could you try something like this instead? We should use mmap_released as the control logic and keep named_modules_to_munmap as a dictionary to avoid other edge cases by separating the two. I think that should work as well:

I am also having this problem. The above patch causes the following error for me:

torch.AcceleratorError: CUDA error: invalid argument
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

When I revert the patch, the error goes away.

Are you saying your workflow fully works without the patch and this causes a regression, or are you saying you have the same error as the report, and now with this you get a different error?

Default this to handle the case of cloning a GGUFModelPatcher which then
expects it to exist on the clone.

I could never reproduce this condition; however, 4 users have hit the error.
This error path is very sensitive to VRAM conditions and workflow, and I
never got the details, so it's a bit of a guess.

But from a code point of view, it's the right thing to do anyway.
@phaserblast

phaserblast commented Nov 7, 2025

Are you saying your workflow fully works without the patch and this causes a regression, or are you saying you have the same error as the report, and now with this you get a different error?

I get the following error now (without the above patch):
CLIPTextEncode 'GGUFModelPatcher' object has no attribute 'named_modules_to_munmap'

With the above patch, when the workflow gets to the sampler, I get this:

torch.AcceleratorError: CUDA error: invalid argument
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Undoing the patch gets me back up-and-running, but then I get the GGUFModelPatcher error again.

A possible workaround may be to run ComfyUI in --lowvram mode to avoid the GGUFModelPatcher error.
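For reference, that workaround is a launch flag on the stock ComfyUI entrypoint, e.g.:

python main.py --lowvram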

@rattus128
Contributor Author

This is mostly what I was concerned about in my comment on the original PR as well:

defining a default for it next to mmap_released so it always exists

Could you try something like this instead? We should use mmap_released as the control logic and keep named_modules_to_munmap as a dictionary to avoid other edge cases by separating the two. I think that should work as well:
[screenshot of the suggested patch]

Fully updated to do this. Thanks

@rattus128
Contributor Author

Are you saying your workflow fully works without the patch and this causes a regression, or are you saying you have the same error as the report, and now with this you get a different error?

I get the following error now (without the above patch): CLIPTextEncode 'GGUFModelPatcher' object has no attribute 'named_modules_to_munmap'

With the above patch, when the workflow gets to the sampler, I get this:

torch.AcceleratorError: CUDA error: invalid argument
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Undoing the patch gets me back up-and-running, but then I get the GGUFModelPatcher error again.

A possible workaround may be to run ComfyUI in --lowvram mode to avoid the GGUFModelPatcher error.

OK, there is another report of this at large here:

Comfy-Org/ComfyUI#10662 (comment)

I think you are getting further with this change here but are blocked by this second bug behind it.

Can you update your comfy core to the latest git, as a fix was merged earlier today? It worked for some people, but the OP in that link still has a repro. If you can still reproduce this second bug, can I get your workflow, log, and HW specs, as I am short on reliable reproducers?

@phaserblast

phaserblast commented Nov 7, 2025

Can you update your comfy core to the latest git

git rev-parse HEAD
cf97b033ee80cf245b4592d42f89e6de67e409a4

Already on the latest. I haven't yet encountered the OP's error with my Wan 2.2 workflow since I set ComfyUI to --lowvram mode, but I still get the CUDA error if I patch nodes.py.

I'm on an AMD system, 64GB RAM + 4090 running Debian.

@rattus128 rattus128 marked this pull request as draft November 7, 2025 10:59
There are workflows where the same underlying model will be loaded
twice by two different ModelPatchers.

With the current code as-is, the first patcher will load fine and
intercept the pins while removing already-done pins from the free
modules list.

It will then load again with the second MP; however, this one will
not get the callbacks for the already-pinned stuff and will then rip
through and .to().to() them all. This destroys the pinning setup
while comfy core still keeps the modules registered as pinned.
This causes a crash on EINVAL when core goes to unpin the not-pinned
tensors.

Comfy core now has a guard against unpinning the not-pinned; however,
what I think is happening is that the RAM of the pin is freed in the
.to().to() and then malloc re-allocates it to new tensors while CUDA
keeps its records of the previous pinning. EINVAL can happen when
you try to unpin a tensor in the middle of the block, which is
definitely possible with this use-after-free pattern.
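A toy illustration of the shared-module hazard described above (hypothetical code, not ComfyUI's): two patcher-like wrappers hold the same nn.Module, so reallocating a parameter's storage through one (analogous to the second load's .to().to()) invalidates whatever the other was tracking about the old storage.

import torch
import torch.nn as nn

model = nn.Linear(2, 2)
patcher_a = {"model": model}  # stand-ins for two ModelPatchers
patcher_b = {"model": model}  # same module object, not a copy

old_ptr = patcher_b["model"].weight.data_ptr()
# Swap in fresh storage via one wrapper; the other wrapper's records about the
# old pointer (e.g. "this range is pinned") are now stale.
patcher_a["model"].weight.data = torch.empty_like(model.weight)
assert patcher_b["model"].weight.data_ptr() != old_ptr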
@rattus128 rattus128 marked this pull request as ready for review November 7, 2025 14:09
@rattus128
Contributor Author

I have updated this pull with a second change that fixes my reproducer of that CUDA EINVAL from

kijai/ComfyUI-KJNodes#430

@phaserblast if you want to give it a test to see if it helps your use case let me know either way.

@nestflow

nestflow commented Nov 7, 2025

Hi everyone here, this PR has fixed the error for my test workflow in kijai/ComfyUI-KJNodes#430, but it still gives the same error if I add a LoraLoader node as below: bugged_3.json

[workflow screenshot: bug3]

and the log is pretty much the same. I also suggest checking whether TorchCompileModel (native) and TorchCompileModelWanVideoV2 (in KJNodes) work with it, since those are other common nodes in WanVideo workflows, but I don't use them right now.

@city96
Owner

city96 commented Nov 7, 2025

I can't re-test at the moment, but the current changes look good to me. I think we can merge this and maybe do a follow-up PR for the LoRA/torch compile code for any fixes that are needed there, just to get main into a working state for the time being if that works for you @rattus128

@rattus128
Contributor Author

Hi everyone here, this PR has fixed the error for my test workflow in kijai/ComfyUI-KJNodes#430, but it still gives the same error if I add a LoraLoader node as below: bugged_3.json
[workflow screenshot: bug3]

and the log is pretty much the same. I also suggest checking whether TorchCompileModel (native) and TorchCompileModelWanVideoV2 (in KJNodes) work with it, since those are other common nodes in WanVideo workflows, but I don't use them right now.

Hi, I have a reproducer with the workflow, although it's only on a rerun when I change settings.

Can I get a repaste of your full error since the changes? Also include the model load statistics, as this issue is sensitive to these numbers (and it helps me adapt from your 4090, which I don't have).

Something like:

got prompt
Using pytorch attention in VAE
Using pytorch attention in VAE
VAE load device: cuda:0, offload device: cpu, dtype: torch.bfloat16
Using scaled fp8: fp8 matrix mult: False, scale input: False
CLIP/text encoder model load device: cuda:0, offload device: cpu, current: cpu, dtype: torch.float16
Requested to load WanTEModel
loaded completely; 30235.11 MB usable, 6419.48 MB loaded, full load: True
Requested to load WanVAE
loaded completely; 13917.84 MB usable, 242.03 MB loaded, full load: True
gguf qtypes: F16 (694), Q8_0 (400), F32 (1)
model weight dtype torch.float16, manual cast: None
model_type FLOW
Requested to load WAN21
loaded partially; 347.74 MB usable, 347.73 MB loaded, 14477.73 MB offloaded, lowvram patches: 0
100%|██████████| 1/1 [00:26<00:00, 26.43s/it]
Requested to load WAN21
0 models unloaded.
loaded partially; 347.73 MB usable, 347.73 MB loaded, 14477.73 MB offloaded, lowvram patches: 0
100%|██████████| 1/1 [00:26<00:00, 26.95s/it]
Requested to load WanVAE
loaded completely; 18185.77 MB usable, 242.03 MB loaded, full load: True
Prompt executed in 94.00 seconds
got prompt
Requested to load WAN21
loaded completely; 16902.20 MB usable, 14825.47 MB loaded, full load: True
100%|██████████| 1/1 [00:03<00:00,  3.41s/it]
Requested to load WAN21
loaded completely; 17061.20 MB usable, 14825.47 MB loaded, full load: True
100%|██████████| 1/1 [00:06<00:00,  6.95s/it]
Requested to load WanVAE
loaded completely; 3864.66 MB usable, 242.03 MB loaded, full load: True
got prompt
got prompt
Prompt executed in 35.49 seconds
Prompt executed in 0.00 seconds
Prompt executed in 0.00 seconds
got prompt
got prompt
Requested to load WAN21
0 models unloaded.
loaded completely; 14827.84 MB usable, 14825.47 MB loaded, full load: True
  0%|          | 0/1 [00:00<?, ?it/s]
!!! Exception during processing !!! Allocation on device 
Traceback (most recent call last):
  File "/home/rattus/ComfyUI/execution.py", line 510, in execute
    output_data, output_ui, has_subgraph, has_pending_tasks = await get_output_data(prompt_id, unique_id, obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, hidden_inputs=hidden_inputs)
                                                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rattus/ComfyUI/execution.py", line 324, in get_output_data
    return_values = await _async_map_node_over_list(prompt_id, unique_id, obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, hidden_inputs=hidden_inputs)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rattus/ComfyUI/execution.py", line 298, in _async_map_node_over_list
    await process_inputs(input_dict, i)
  File "/home/rattus/ComfyUI/execution.py", line 286, in process_inputs
    result = f(**inputs)
             ^^^^^^^^^^^
  File "/home/rattus/ComfyUI/comfy_extras/nodes_custom_sampler.py", line 658, in sample
    samples = comfy.sample.sample_custom(model, noise, cfg, sampler, sigmas, positive, negative, latent_image, noise_mask=noise_mask, callback=callback, disable_pbar=disable_pbar, seed=noise_seed)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rattus/ComfyUI/comfy/sample.py", line 65, in sample_custom
    samples = comfy.samplers.sample(model, noise, positive, negative, cfg, model.load_device, sampler, sigmas, model_options=model.model_options, latent_image=latent_image, denoise_mask=noise_mask, callback=callback, disable_pbar=disable_pbar, seed=seed)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rattus/ComfyUI/comfy/samplers.py", line 1053, in sample
    return cfg_guider.sample(noise, latent_image, sampler, sigmas, denoise_mask, callback, disable_pbar, seed)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rattus/ComfyUI/comfy/samplers.py", line 1035, in sample
    output = executor.execute(noise, latent_image, sampler, sigmas, denoise_mask, callback, disable_pbar, seed, latent_shapes=latent_shapes)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rattus/ComfyUI/comfy/patcher_extension.py", line 112, in execute
    return self.original(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rattus/ComfyUI/comfy/samplers.py", line 997, in outer_sample
    output = self.inner_sample(noise, latent_image, device, sampler, sigmas, denoise_mask, callback, disable_pbar, seed, latent_shapes=latent_shapes)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rattus/ComfyUI/comfy/samplers.py", line 980, in inner_sample
    samples = executor.execute(self, sigmas, extra_args, callback, noise, latent_image, denoise_mask, disable_pbar)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rattus/ComfyUI/comfy/patcher_extension.py", line 112, in execute
    return self.original(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rattus/ComfyUI/comfy/samplers.py", line 752, in sample
    samples = self.sampler_function(model_k, noise, sigmas, extra_args=extra_args, callback=k_callback, disable=disable_pbar, **self.extra_options)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rattus/venv2/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/rattus/ComfyUI/comfy/k_diffusion/sampling.py", line 199, in sample_euler
    denoised = model(x, sigma_hat * s_in, **extra_args)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rattus/ComfyUI/comfy/samplers.py", line 401, in __call__
    out = self.inner_model(x, sigma, model_options=model_options, seed=seed)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rattus/ComfyUI/comfy/samplers.py", line 953, in __call__
    return self.outer_predict_noise(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rattus/ComfyUI/comfy/samplers.py", line 960, in outer_predict_noise
    ).execute(x, timestep, model_options, seed)
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rattus/ComfyUI/comfy/patcher_extension.py", line 112, in execute
    return self.original(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rattus/ComfyUI/comfy/samplers.py", line 963, in predict_noise
    return sampling_function(self.inner_model, x, timestep, self.conds.get("negative", None), self.conds.get("positive", None), self.cfg, model_options=model_options, seed=seed)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rattus/ComfyUI/comfy/samplers.py", line 381, in sampling_function
    out = calc_cond_batch(model, conds, x, timestep, model_options)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rattus/ComfyUI/comfy/samplers.py", line 206, in calc_cond_batch
    return _calc_cond_batch_outer(model, conds, x_in, timestep, model_options)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rattus/ComfyUI/comfy/samplers.py", line 214, in _calc_cond_batch_outer
    return executor.execute(model, conds, x_in, timestep, model_options)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rattus/ComfyUI/comfy/patcher_extension.py", line 112, in execute
    return self.original(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rattus/ComfyUI/comfy/samplers.py", line 326, in _calc_cond_batch
    output = model.apply_model(input_x, timestep_, **c).chunk(batch_chunks)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rattus/ComfyUI/comfy/model_base.py", line 161, in apply_model
    return comfy.patcher_extension.WrapperExecutor.new_class_executor(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rattus/ComfyUI/comfy/patcher_extension.py", line 112, in execute
    return self.original(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rattus/ComfyUI/comfy/model_base.py", line 203, in _apply_model
    model_output = self.diffusion_model(xc, t, context=context, control=control, transformer_options=transformer_options, **extra_conds)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rattus/venv2/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rattus/venv2/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rattus/ComfyUI/comfy/ldm/wan/model.py", line 627, in forward
    return comfy.patcher_extension.WrapperExecutor.new_class_executor(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rattus/ComfyUI/comfy/patcher_extension.py", line 112, in execute
    return self.original(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rattus/ComfyUI/comfy/ldm/wan/model.py", line 647, in _forward
    return self.forward_orig(x, timestep, context, clip_fea=clip_fea, freqs=freqs, transformer_options=transformer_options, **kwargs)[:, :, :t, :h, :w]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rattus/ComfyUI/comfy/ldm/wan/model.py", line 580, in forward_orig
    x = block(x, e=e0, freqs=freqs, context=context, context_img_len=context_img_len, transformer_options=transformer_options)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rattus/venv2/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rattus/venv2/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rattus/ComfyUI/comfy/ldm/wan/model.py", line 245, in forward
    y = self.ffn(torch.addcmul(repeat_e(e[3], x), self.norm2(x), 1 + repeat_e(e[4], x)))
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rattus/venv2/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rattus/venv2/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rattus/venv2/lib/python3.12/site-packages/torch/nn/modules/container.py", line 250, in forward
    input = module(input)
            ^^^^^^^^^^^^^
  File "/home/rattus/venv2/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rattus/venv2/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rattus/venv2/lib/python3.12/site-packages/torch/nn/modules/activation.py", line 816, in forward
    return F.gelu(input, approximate=self.approximate)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
torch.OutOfMemoryError: Allocation on device 

Got an OOM, unloading all loaded models.
Prompt executed in 22.44 seconds
Requested to load WAN21
loaded partially; 3653.83 MB usable, 3650.10 MB loaded, 11175.37 MB offloaded, lowvram patches: 0
!!! Exception during processing !!! CUDA error: part or all of the requested memory range is already mapped
Search for `cudaErrorHostMemoryAlreadyRegistered' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Traceback (most recent call last):
  File "/home/rattus/ComfyUI/execution.py", line 510, in execute
    output_data, output_ui, has_subgraph, has_pending_tasks = await get_output_data(prompt_id, unique_id, obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, hidden_inputs=hidden_inputs)
                                                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rattus/ComfyUI/execution.py", line 324, in get_output_data
    return_values = await _async_map_node_over_list(prompt_id, unique_id, obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, hidden_inputs=hidden_inputs)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rattus/ComfyUI/execution.py", line 298, in _async_map_node_over_list
    await process_inputs(input_dict, i)
  File "/home/rattus/ComfyUI/execution.py", line 286, in process_inputs
    result = f(**inputs)
             ^^^^^^^^^^^
  File "/home/rattus/ComfyUI/comfy_extras/nodes_custom_sampler.py", line 658, in sample
    samples = comfy.sample.sample_custom(model, noise, cfg, sampler, sigmas, positive, negative, latent_image, noise_mask=noise_mask, callback=callback, disable_pbar=disable_pbar, seed=noise_seed)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rattus/ComfyUI/comfy/sample.py", line 65, in sample_custom
    samples = comfy.samplers.sample(model, noise, positive, negative, cfg, model.load_device, sampler, sigmas, model_options=model.model_options, latent_image=latent_image, denoise_mask=noise_mask, callback=callback, disable_pbar=disable_pbar, seed=seed)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rattus/ComfyUI/comfy/samplers.py", line 1053, in sample
    return cfg_guider.sample(noise, latent_image, sampler, sigmas, denoise_mask, callback, disable_pbar, seed)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rattus/ComfyUI/comfy/samplers.py", line 1035, in sample
    output = executor.execute(noise, latent_image, sampler, sigmas, denoise_mask, callback, disable_pbar, seed, latent_shapes=latent_shapes)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rattus/ComfyUI/comfy/patcher_extension.py", line 112, in execute
    return self.original(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rattus/ComfyUI/comfy/samplers.py", line 990, in outer_sample
    noise = noise.to(device)
            ^^^^^^^^^^^^^^^^
torch.AcceleratorError: CUDA error: part or all of the requested memory range is already mapped
Search for `cudaErrorHostMemoryAlreadyRegistered' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

@rattus128
Contributor Author

I can't re-test at the moment, but the current changes look good to me. I think we can merge this and maybe do a follow-up PR for the LoRA/torch compile code for any fixes that are needed there, just to get main into a working state for the time being if that works for you @rattus128

Yes, I think so. If we need to go again here I'll do another PR. Just this much will help a significant number of users.

@city96 city96 merged commit 02dac86 into city96:main Nov 7, 2025
@nestflow

nestflow commented Nov 8, 2025

Hi @rattus128, thanks again for your help! I updated the main Comfy repo and this custom node to the latest and did another test. (Besides, I actually have a 5070 Ti rather than a 4090.) Here is a more complete log, which errors on the first run:

got prompt
Using pytorch attention in VAE
Using pytorch attention in VAE
VAE load device: cuda:0, offload device: cpu, dtype: torch.bfloat16
CLIP/text encoder model load device: cuda:0, offload device: cpu, current: cpu, dtype: torch.float16
Requested to load WanTEModel
loaded completely; 13401.80 MB usable, 10835.48 MB loaded, full load: True
Requested to load WanVAE
loaded completely; 327.00 MB usable, 242.03 MB loaded, full load: True
gguf qtypes: F16 (694), Q8_0 (400), F32 (1)
model weight dtype torch.float16, manual cast: None
model_type FLOW
Requested to load WAN21
loaded partially; 218.89 MB usable, 215.06 MB loaded, 14610.41 MB offloaded, lowvram patches: 0
100%|███████████████████████████████████████████████████████████████████████████████████| 1/1 [02:30<00:00, 150.49s/it]
Requested to load WAN21
!!! Exception during processing !!! CUDA error: invalid argument
Search for 'cudaErrorInvalidValue' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Traceback (most recent call last):
  File "D:\Projects\ComfyUI\execution.py", line 510, in execute
    output_data, output_ui, has_subgraph, has_pending_tasks = await get_output_data(prompt_id, unique_id, obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, hidden_inputs=hidden_inputs)
                                                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Projects\ComfyUI\execution.py", line 324, in get_output_data
    return_values = await _async_map_node_over_list(prompt_id, unique_id, obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, hidden_inputs=hidden_inputs)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Projects\ComfyUI\execution.py", line 298, in _async_map_node_over_list
    await process_inputs(input_dict, i)
  File "D:\Projects\ComfyUI\execution.py", line 286, in process_inputs
    result = f(**inputs)
  File "D:\Projects\ComfyUI\comfy_extras\nodes_custom_sampler.py", line 658, in sample
    samples = comfy.sample.sample_custom(model, noise, cfg, sampler, sigmas, positive, negative, latent_image, noise_mask=noise_mask, callback=callback, disable_pbar=disable_pbar, seed=noise_seed)
  File "D:\Projects\ComfyUI\comfy\sample.py", line 65, in sample_custom
    samples = comfy.samplers.sample(model, noise, positive, negative, cfg, model.load_device, sampler, sigmas, model_options=model.model_options, latent_image=latent_image, denoise_mask=noise_mask, callback=callback, disable_pbar=disable_pbar, seed=seed)
  File "D:\Projects\ComfyUI\comfy\samplers.py", line 1053, in sample
    return cfg_guider.sample(noise, latent_image, sampler, sigmas, denoise_mask, callback, disable_pbar, seed)
           ~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Projects\ComfyUI\comfy\samplers.py", line 1035, in sample
    output = executor.execute(noise, latent_image, sampler, sigmas, denoise_mask, callback, disable_pbar, seed, latent_shapes=latent_shapes)
  File "D:\Projects\ComfyUI\comfy\patcher_extension.py", line 112, in execute
    return self.original(*args, **kwargs)
           ~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "D:\Projects\ComfyUI\comfy\samplers.py", line 984, in outer_sample
    self.inner_model, self.conds, self.loaded_models = comfy.sampler_helpers.prepare_sampling(self.model_patcher, noise.shape, self.conds, self.model_options)
                                                       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Projects\ComfyUI\comfy\sampler_helpers.py", line 130, in prepare_sampling
    return executor.execute(model, noise_shape, conds, model_options=model_options)
           ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Projects\ComfyUI\comfy\patcher_extension.py", line 112, in execute
    return self.original(*args, **kwargs)
           ~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "D:\Projects\ComfyUI\comfy\sampler_helpers.py", line 138, in _prepare_sampling
    comfy.model_management.load_models_gpu([model] + models, memory_required=memory_required + inference_memory, minimum_memory_required=minimum_memory_required + inference_memory)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Projects\ComfyUI\comfy\model_management.py", line 697, in load_models_gpu
    loaded_model.model_load(lowvram_model_memory, force_patch_weights=force_patch_weights)
    ~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Projects\ComfyUI\comfy\model_management.py", line 506, in model_load
    self.model_use_more_vram(use_more_vram, force_patch_weights=force_patch_weights)
    ~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Projects\ComfyUI\comfy\model_management.py", line 535, in model_use_more_vram
    return self.model.partially_load(self.device, extra_memory, force_patch_weights=force_patch_weights)
           ~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Projects\ComfyUI\comfy\model_patcher.py", line 919, in partially_load
    self.unpatch_model(self.offload_device, unpatch_weights=unpatch_weights)
    ~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Projects\ComfyUI\custom_nodes\ComfyUI-GGUF\nodes.py", line 77, in unpatch_model
    return super().unpatch_model(device_to=device_to, unpatch_weights=unpatch_weights)
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Projects\ComfyUI\comfy\model_patcher.py", line 832, in unpatch_model
    self.model.to(device_to)
    ~~~~~~~~~~~~~^^^^^^^^^^^
  File "D:\Projects\ComfyUI\.venv\Lib\site-packages\torch\nn\modules\module.py", line 1371, in to
    return self._apply(convert)
           ~~~~~~~~~~~^^^^^^^^^
  File "D:\Projects\ComfyUI\.venv\Lib\site-packages\torch\nn\modules\module.py", line 930, in _apply
    module._apply(fn)
    ~~~~~~~~~~~~~^^^^
  File "D:\Projects\ComfyUI\.venv\Lib\site-packages\torch\nn\modules\module.py", line 930, in _apply
    module._apply(fn)
    ~~~~~~~~~~~~~^^^^
  File "D:\Projects\ComfyUI\.venv\Lib\site-packages\torch\nn\modules\module.py", line 930, in _apply
    module._apply(fn)
    ~~~~~~~~~~~~~^^^^
  File "D:\Projects\ComfyUI\.venv\Lib\site-packages\torch\nn\modules\module.py", line 957, in _apply
    param_applied = fn(param)
  File "D:\Projects\ComfyUI\.venv\Lib\site-packages\torch\nn\modules\module.py", line 1357, in convert
    return t.to(
           ~~~~^
        device,
        ^^^^^^^
        dtype if t.is_floating_point() or t.is_complex() else None,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        non_blocking,
        ^^^^^^^^^^^^^
    )
    ^
  File "D:\Projects\ComfyUI\custom_nodes\ComfyUI-GGUF\ops.py", line 58, in to
    new = super().to(*args, **kwargs)
  File "D:\Projects\ComfyUI\.venv\Lib\site-packages\torch\_tensor.py", line 1654, in __torch_function__
    ret = func(*args, **kwargs)
torch.AcceleratorError: CUDA error: invalid argument
Search for `cudaErrorInvalidValue' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.


Prompt executed in 194.29 seconds

@phaserblast

@phaserblast if you want to give it a test to see if it helps your use case let me know either way.

Looks good. Tested it on both a 4090 and a GB10, no problems. This is the line that fixed it for me:

n.mmap_released = getattr(self, "mmap_released", False)

@rattus128
Contributor Author

Hi @rattus128, thanks again for your help! I updated the main Comfy repo and this custom node to the latest and did another test. (Besides, I actually have a 5070 Ti rather than a 4090.)

@nestflow I never got a first-run reproducer or the EINVAL you pasted here. I'm back in no-reproducer purgatory.

I did reproduce an OOM that I have since PR'd, but it's not your error case.

I have implemented something of a sledgehammer fix to lock down the pinned memory pieces to the core and avoid custom node packs leaking memory or creating use-after-frees.

Can you give this branch of comfy core a go?

https://github.com/rattus128/ComfyUI/tree/wip/pin-lockdown

If it still fails, paste the log; if it works, can you expand the console in the ComfyUI GUI and paste me the success log too? (I added warning prints for the conditions we don't want to see but have compensated for.)

Thanks for your help in tracking these down.

Depending on what happens we may want to cut a fresh issue somewhere.
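For anyone else wanting to test that branch, a standard git checkout sequence would be (remote name is arbitrary):

git remote add rattus128 https://github.com/rattus128/ComfyUI.git
git fetch rattus128
git checkout -b pin-lockdown rattus128/wip/pin-lockdown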

@nestflow

nestflow commented Nov 8, 2025

Can you give this branch of comfy core a go?

https://github.com/rattus128/ComfyUI/tree/wip/pin-lockdown

If it still fails, paste the log; if it works, can you expand the console in the ComfyUI GUI and paste me the success log too? (I added warning prints for the conditions we don't want to see but have compensated for.)

Hi @rattus128, thanks for your continued follow-up on this issue. Yes, this fix seemed to work, and here is the success log:

got prompt
Using pytorch attention in VAE
Using pytorch attention in VAE
VAE load device: cuda:0, offload device: cpu, dtype: torch.bfloat16
CLIP/text encoder model load device: cuda:0, offload device: cpu, current: cpu, dtype: torch.float16
Requested to load WanTEModel
loaded completely; 13401.80 MB usable, 10835.48 MB loaded, full load: True
Requested to load WanVAE
Unloading WanTEModel
loaded completely; 327.00 MB usable, 242.03 MB loaded, full load: True
gguf qtypes: F16 (694), Q8_0 (400), F32 (1)
model weight dtype torch.float16, manual cast: None
model_type FLOW
Requested to load WAN21
Unloading WanTEModel
Unloading WanVAE
2 idle models unloaded.
loaded partially; 218.89 MB usable, 215.06 MB loaded, 14610.41 MB offloaded, lowvram patches: 0
100%|███████████████████████████████████████████████████████████████████████████████████| 1/1 [02:32<00:00, 152.74s/it]
Requested to load WAN21
Unloading WAN21
1 active models unloaded for increased offloading.
loaded partially; 212.89 MB usable, 212.89 MB loaded, 14612.58 MB offloaded, lowvram patches: 0
100%|███████████████████████████████████████████████████████████████████████████████████| 1/1 [02:37<00:00, 157.89s/it]
Requested to load WanVAE
loaded completely; 1766.30 MB usable, 242.03 MB loaded, full load: True
Prompt executed in 362.12 seconds

@RYG81

RYG81 commented Nov 9, 2025

I am having an issue with crashing at VAE decode. Can someone guide me on how to fix it? I am not able to work on my workflow.

@rattus128
Contributor Author

I am having an issue with crashing at VAE decode. Can someone guide me on how to fix it? I am not able to work on my workflow.

What is your PyTorch version? There is a theory that PyTorch 2.7 has an issue. We have yet to confirm this one, but we could use the data, and it's something you could try.
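A quick way to check, using the same Python environment that runs ComfyUI:

python -c "import torch; print(torch.__version__)"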

@RYG81

RYG81 commented Nov 11, 2025

I had PyTorch 2.7 only.
I just installed 2.8 and it still crashes the same way.
