AMD generating takes 25 minutes #1958
Comments
Your Fooocus is generating images with CPU only. That's the reason it takes so long. Have you read the readme concerning AMD GPUs?
Yes, I followed the instructions. This is my run.bat:
Note: GPU memory is being used during the generating process. I've seen others running on AMD GPUs on Windows, so I'm not sure what is happening.
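For context, the AMD-on-Windows setup documented in the Fooocus readme amounts to replacing the bundled torch with the DirectML build and launching with `--directml`. A sketch of what such a run.bat looks like (reconstructed from the readme's documented flow, not this user's actual file — verify against the current readme before using):

```shell
:: Sketch of the readme's AMD setup (verify against the current Fooocus readme).
:: One-time: swap the bundled torch for the DirectML build.
.\python_embeded\python.exe -m pip uninstall torch torchvision torchaudio torchtext functorch xformers -y
.\python_embeded\python.exe -m pip install torch-directml
:: Every launch: start Fooocus on the DirectML backend.
.\python_embeded\python.exe -s Fooocus\entry_with_update.py --directml
```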
I'm having the same problem on Windows with my AMD GPU too and still couldn't find out what's wrong. There's an open issue in the DirectML project here: I switched over to Google Colab, as I couldn't get good results with my AMD GPU on either Windows or Linux.
I'll keep an eye on that issue, thanks
Maybe someone from the dev team will join and have a solution for that. I personally gave up on harassing my AMD GPU :-D
@mpirescarvalho it looks like it is using a GPU, but one with only 1024 MB of VRAM? (It says it uses low-vram mode)
Negative, my processor doesn't have onboard graphics
VRAM being "1024" is normal; it's the same in ComfyUI. That's just how DirectML reports it.

First of all, I have to thank the devs for finally building an app that I can use on Windows to generate with SDXL models without crashing instantly, or at best on the second try. I tried sdwebui, sdnext, and comfyui, and only with sdnext was I able to generate at all — and even there the app gives out-of-memory errors instantly or, in the best scenario, on the second try. With Fooocus, if I switch between a lot of models the same out-of-memory errors eventually pop up, but if I stick to one or two models I just get slow generation, no crashes at all. I am using an RX 6600 8 GB and did various things to speed up generation.
Also, the error "the operator 'aten::std_mean.correction' is not currently supported on the DML backend and will fall back to run on the CPU" pops up from time to time in ComfyUI too, and as far as I know it didn't actually affect the speed there — maybe a bit. I am not sure how much it affects SDXL generation, though; as far as I know that is already slow.

Finally, I should note that I run both my GPU and CPU underpowered and underclocked: the 6600 is power-limited to 48 W and my 3600 to 40 W. With these limits in mind: before I moved my swap to NVMe, an 8-step LCM-sampled "Extreme Speed" run using the default Juggernaut model... After moving the swap and making it two times my system memory (16 GB RAM, 32 GB swap file on NVMe), things improved. So a swap file on NVMe at twice system RAM is very effective, and so is LCM or using turbo models. At the moment I am using normal models with the LCM LoRA and LCM samplers at around 10-12 steps: CFG 2 for turbo and 1.5 for LCM, so that I can still use negatives.
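On the swap-on-NVMe tip: on Windows the swap file is the pagefile, which can be inspected and, after disabling automatic management, resized from an elevated prompt. A hedged sketch using `wmic` — the 32768 MB size mirrors the 2× 16 GB setup above, and the `C:\pagefile.sys` path is an assumption; adjust the drive letter to your NVMe volume:

```shell
:: Inspect current pagefile settings (run from an elevated prompt).
wmic pagefile list /format:list
:: Take manual control, then pin the pagefile at 32768 MB (2x a 16 GB system).
:: Assumes the pagefile lives at C:\pagefile.sys; point it at your NVMe drive.
wmic computersystem where name="%computername%" set AutomaticManagedPagefile=False
wmic pagefileset where name="C:\\pagefile.sys" set InitialSize=32768,MaximumSize=32768
```

A reboot is needed before the new pagefile size takes effect.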
@f0n51 thank you for providing the reference to DirectML, and @patientx for your insights. microsoft/DirectML#536 (comment) already references #1321, which I closed 2 weeks ago in #1321 (comment), as this is not an issue of Fooocus. We can still keep this issue open, but I'd suggest closing it, as there's nothing we can actively do.
One thing to do: once the first generation starts and that error pops up, just skip or stop it; the next ones won't have it. I tested a bit more and the first run always has that error, with a step time of around 40 s here too, but if I cancel it and start again, the step time starts around 30 s and drops to around 20 s for me (remember, a 48 W power-limited 6600), so with 12 GB VRAM and a full-power 6700 XT everything would probably be much faster.
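For reference, the operator named in that warning, `torch.std_mean`, simply computes a standard deviation and mean in one call; the Fooocus line in the traceback applies it per sample over the channel and spatial dimensions. A stdlib-only sketch of the flattened computation (illustrative only, not the actual kernel — `torch.std_mean` defaults to the Bessel-corrected standard deviation, i.e. correction=1):

```python
import statistics

def std_mean(values, correction=1):
    # Mean over all elements, then the corrected (sample) standard deviation,
    # matching torch.std_mean's default correction=1 (Bessel's correction).
    m = statistics.fmean(values)
    var = sum((v - m) ** 2 for v in values) / (len(values) - correction)
    return var ** 0.5, m

s, m = std_mean([1.0, 2.0, 3.0, 4.0])
print(s, m)  # std ~= 1.2910, mean = 2.5
```

The point of the warning is only that this reduction runs on the CPU under DirectML, forcing a device-to-host copy each step; the math itself is trivial.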
@mashb1t agreed |
Read Troubleshoot
[x] I admit that I have read the Troubleshoot before making this issue.
Describe the problem
It's working, but it's taking SUPER long to generate the images.
CPU: AMD Ryzen 7 5700X
RAM: 16 GB
SWAP: 44GB on M.2 SSD
GPU: AMD Radeon RX 6700 XT 12 GB VRAM
Full Console Log
C:\www\stable-diffusion\Fooocus>.\python_embeded\python.exe -s Fooocus\entry_with_update.py --directml
Already up-to-date
Update succeeded.
[System ARGV] ['Fooocus\entry_with_update.py', '--directml']
Python 3.10.9 (tags/v3.10.9:1dd9be6, Dec 6 2022, 20:01:21) [MSC v.1934 64 bit (AMD64)]
Fooocus version: 2.1.862
Running on local URL: http://127.0.0.1:7865
To create a public link, set `share=True` in `launch()`
Using directml with device:
Total VRAM 1024 MB, total RAM 16310 MB
Set vram state to: NORMAL_VRAM
Always offload VRAM
Device: privateuseone
VAE dtype: torch.float32
Using sub quadratic optimization for cross attention, if you have memory or speed issues try using: --attention-split
Refiner unloaded.
model_type EPS
UNet ADM Dimension 2816
Using split attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using split attention in VAE
extra {'cond_stage_model.clip_l.logit_scale', 'cond_stage_model.clip_g.transformer.text_model.embeddings.position_ids', 'cond_stage_model.clip_l.text_projection'}
Base model loaded: C:\www\stable-diffusion\Fooocus\Fooocus\models\checkpoints\juggernautXL_version6Rundiffusion.safetensors
Request to load LoRAs [['sd_xl_offset_example-lora_1.0.safetensors', 0.1], ['None', 1.0], ['None', 1.0], ['None', 1.0], ['None', 1.0]] for model [C:\www\stable-diffusion\Fooocus\Fooocus\models\checkpoints\juggernautXL_version6Rundiffusion.safetensors].
Loaded LoRA [C:\www\stable-diffusion\Fooocus\Fooocus\models\loras\sd_xl_offset_example-lora_1.0.safetensors] for UNet [C:\www\stable-diffusion\Fooocus\Fooocus\models\checkpoints\juggernautXL_version6Rundiffusion.safetensors] with 788 keys at weight 0.1.
Fooocus V2 Expansion: Vocab with 642 words.
Fooocus Expansion engine loaded for cpu, use_fp16 = False.
Requested to load SDXLClipModel
Requested to load GPT2LMHeadModel
Loading 2 new models
App started successful. Use the app with http://127.0.0.1:7865/ or 127.0.0.1:7865
[Parameters] Adaptive CFG = 7
[Parameters] Sharpness = 2
[Parameters] ADM Scale = 1.5 : 0.8 : 0.3
[Parameters] CFG = 4.0
[Parameters] Seed = 3435339128246104584
[Parameters] Sampler = dpmpp_2m_sde_gpu - karras
[Parameters] Steps = 30 - 15
[Fooocus] Initializing ...
[Fooocus] Loading models ...
Refiner unloaded.
[Fooocus] Processing prompts ...
[Fooocus] Preparing Fooocus text #1 ...
[Prompt Expansion] cat in spacesuit, light shining, intricate, elegant, sharp focus, professional color, highly detailed, sublime, innocent, dramatic, cinematic, new classic, beautiful, dynamic, attractive, cute, epic, stunning, brilliant, creative, positive, artistic, awesome, confident, colorful, shiny, iconic, cool, best, pure, quiet, lovely, great, relaxed
[Fooocus] Preparing Fooocus text #2 ...
[Prompt Expansion] cat in spacesuit, light flowing colors, extremely detailed, beautiful, intricate, elegant, sharp focus, highly detail, dramatic cinematic perfect, open color, inspired, rich deep vivid vibrant scenic full atmosphere, professional composition, stunning, magical, amazing, creative, wonderful, epic, hopeful, awesome, brilliant, surreal, symmetry, ambient, best, pure, fine, very
[Fooocus] Encoding positive #1 ...
[Fooocus] Encoding positive #2 ...
[Fooocus] Encoding negative #1 ...
[Fooocus] Encoding negative #2 ...
[Parameters] Denoising Strength = 1.0
[Parameters] Initial Latent shape: Image Space (896, 1152)
Preparation time: 12.77 seconds
[Sampler] refiner_swap_method = joint
[Sampler] sigma_min = 0.0291671771556139, sigma_max = 14.614643096923828
Requested to load SDXL
Loading 1 new model
loading in lowvram mode 64.0
[Fooocus Model Management] Moving model(s) has taken 70.08 seconds
0%| | 0/30 [00:00<?, ?it/s]C:\www\stable-diffusion\Fooocus\Fooocus\modules\anisotropic.py:132: UserWarning: The operator 'aten::std_mean.correction' is not currently supported on the DML backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at D:\a_work\1\s\pytorch-directml-plugin\torch_directml\csrc\dml\dml_cpu_fallback.cpp:17.)
s, m = torch.std_mean(g, dim=(1, 2, 3), keepdim=True)
100%|██████████████████████████████████████████████████████████████████████████████████| 30/30 [11:59<00:00, 23.98s/it]
Requested to load AutoencoderKL
Loading 1 new model
loading in lowvram mode 64.0
[Fooocus Model Management] Moving model(s) has taken 1.60 seconds
Image generated with private log at: C:\www\stable-diffusion\Fooocus\Fooocus\outputs\2024-01-17\log.html
Generating and saving time: 795.84 seconds
[Sampler] refiner_swap_method = joint
[Sampler] sigma_min = 0.0291671771556139, sigma_max = 14.614643096923828
Requested to load SDXL
Loading 1 new model
loading in lowvram mode 64.0
[Fooocus Model Management] Moving model(s) has taken 58.36 seconds
100%|██████████████████████████████████████████████████████████████████████████████████| 30/30 [24:17<00:00, 48.57s/it]
Requested to load AutoencoderKL
Loading 1 new model
loading in lowvram mode 64.0
[Fooocus Model Management] Moving model(s) has taken 1.25 seconds
Image generated with private log at: C:\www\stable-diffusion\Fooocus\Fooocus\outputs\2024-01-17\log.html
Generating and saving time: 1519.52 seconds