Enable Convolution AutoTuning #9301
Conversation

The overhead in my case during the first inference on a Blackwell RTX 6000 is ~15 seconds for a 1280x1680 SDXL workflow. Will have to decide whether this should be enabled by default or put behind --fast.

Was this on Windows or Linux? I retested on an RTX 6000 (SDXL @ 1280x1680, Ubuntu 24.04) and saw the following SDXL timings:
One thing I noticed is that the allocation backend has a large influence on this - I was using the native backend for all my testing. When using cuda-malloc the first run takes ~12006.10 ms.

I added an option to the

can you rebase? there's something weird with your branch.

Sounds like a very good improvement. If it adds up to ~5-10 seconds on the first inference, that's not worth worrying about and it should be on by default, since it saves so much time in general. People will make up that time after 2 images. But it seems like it adds less than a second to the first run if a proper allocation backend is used. I guess the reason for the longer delays is that cuda-malloc has some overhead that really adds up with repeated tests?
PS: The pull request shows a ton of changes. Needs a rebase. :')

It was already rebased; GitHub had issues updating the PR. I reset the target branch, so it should now show up correctly.

Thank you, I can finally see the diff now. :) Oh, so cuDNN benchmarking is a built-in PyTorch feature. That's awesome. I'll try to make time to test this on my 3090 on Linux (the 5090 arrives this week so I can try both). I'm working on a lot of projects and my Comfy is currently outdated. I'll try to make time for it!
Edit: I didn't have time to test it before merge, oops. I was super busy. :')
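
For readers following along, the built-in PyTorch feature mentioned here is presumably the cuDNN benchmark switch; a minimal sketch, assuming that is the mechanism the PR enables:

```python
import torch

# When this flag is set, cuDNN benchmarks the available convolution algorithms
# for each new input shape it sees and caches the fastest one for later calls.
torch.backends.cudnn.benchmark = True
```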

The overhead is a lot higher on ROCm, at least on my 7900 XTX, taking over a minute to tune a basic SDXL kernel. So this should probably be turned off on AMD unless explicitly requested.
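
A minimal sketch of what such a platform gate could look like, assuming ROCm is detected via torch.version.hip; the helper name is hypothetical and this is not the PR's actual code:

```python
import torch

def enable_conv_autotuning(force: bool = False):
    # Hypothetical helper, not the PR's implementation.
    # torch.version.hip is a version string on ROCm builds and None on CUDA builds.
    is_rocm = torch.version.hip is not None
    # Skip autotuning on ROCm unless explicitly requested, since the reported
    # first-run tuning cost there (over a minute for SDXL) outweighs the benefit.
    if torch.cuda.is_available() and (force or not is_rocm):
        torch.backends.cudnn.benchmark = True
```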

…e-update
* commit '4f5812b93712e0f52ae8fe80a89e8b5e7d0fa309': (77 commits)
  Update template to 0.1.73 (comfyanonymous#9686)
  ImageScaleToMaxDimension node. (comfyanonymous#9689)
  Accept prompt_id in interrupt handler (comfyanonymous#9607)
  uso -> uxo/uno as requested. (comfyanonymous#9688)
  USO style reference. (comfyanonymous#9677)
  Enable Convolution AutoTuning (comfyanonymous#9301)
  Implement the USO subject identity lora. (comfyanonymous#9674)
  Probably not necessary anymore. (comfyanonymous#9646)
  SEEDS: update noise decomposition and refactor (comfyanonymous#9633)
  convert Primitive nodes to V3 schema (comfyanonymous#9372)
  convert nodes_stability.py to V3 schema (comfyanonymous#9497)
  convert Video nodes to V3 schema (comfyanonymous#9489)
  convert Stable Cascade nodes to V3 schema (comfyanonymous#9373)
  ComfyUI version 0.3.56
  Lower ram usage on windows. (comfyanonymous#9628)
  ComfyUI v0.3.55
  Update template to 0.1.70 (comfyanonymous#9620)
  Trim audio to video when saving video. (comfyanonymous#9617)
  Support the 5B fun inpaint model. (comfyanonymous#9614)
  Support wan2.2 5B fun control model. (comfyanonymous#9611)
  ...

With autotune enabled, the task starts very slowly: on an RTX 5090, preparation takes ~10 s and sampling ~3 s.

This will test the top-10 algorithms returned by the cuDNN heuristic and select the fastest. On a 5090 I'm seeing:
This does add a small overhead during the first inference - in my case this was ~200 ms.
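
For reference, the behaviour described above maps onto two standard PyTorch knobs; a minimal sketch, assuming these are the settings the PR toggles:

```python
import torch

# Enable cuDNN convolution autotuning: benchmark candidate algorithms on the
# first call for each new input shape and cache the fastest one.
torch.backends.cudnn.benchmark = True

# Limit the search to the ten algorithms ranked best by the cuDNN heuristic
# (a value of 0 would try every available algorithm).
torch.backends.cudnn.benchmark_limit = 10
```

Only the first run with a given input shape pays the benchmarking cost (the ~200 ms mentioned above); subsequent runs with the same shape reuse the cached choice.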