Device type privateuseone is not supported for torch.Generator() api #970
I have an AMD Radeon RX 6600, using Fooocus on Windows 10. Same issue. RuntimeError: Device type privateuseone is not supported for torch.Generator() api.
I have an AMD Radeon RX 7800 XT, using Fooocus on Windows 10. Same issue. RuntimeError: Device type privateuseone is not supported for torch.Generator() api.
Same problem for me:
Thanks for confirming that I'm not the only one with the problem. I was thinking that maybe something was wrong with my PC configuration, but it looks like it's not an isolated case.
What were the exact steps you used to solve the problem, @JarekDerp? Did just a "fresh install" work?
... and that's it? Or were there other steps involved?
@sappelhoff Sorry, I was a bit unclear about what I wanted to say. I modified my previous comment. A fresh installation of ComfyUI works fine while this one doesn't. I'm not good with Python, but I'll try to compare the packages and see which ones have different versions.
As a temporary solution, manually patch the file "./python_embeded/Lib/site-packages/torchsde/_brownian/brownian_interval.py". Find (line 31):

```python
def _randn(size, dtype, device, seed):
    generator = torch.Generator(device).manual_seed(int(seed))
    return torch.randn(size, dtype=dtype, device=device, generator=generator)
```

and change it to:

```python
def _randn(size, dtype, device, seed):
    generator = torch.Generator("cpu").manual_seed(int(seed))
    return torch.randn(size, dtype=dtype, device=device, generator=generator)
```
One more possible issue in the installation (which I had) could be this one:

```shell
\python_embeded\python.exe -m pip uninstall torch torchvision torchaudio torchtext functorch xformers -y
.\python_embeded\python.exe -m pip install torch-directml
.\python_embeded\python.exe -s Fooocus\entry_with_update.py --directml
pause
```

Fix: add the missing dot on the first line.

```shell
.\python_embeded\python.exe -m pip uninstall torch torchvision torchaudio torchtext functorch xformers -y
.\python_embeded\python.exe -m pip install torch-directml
.\python_embeded\python.exe -s Fooocus\entry_with_update.py --directml
pause
```
Did that; a new error appears now, at "Enter LCM mode".
Yeah, I already tried that days ago and had the same result. My card has 12GB, so 112MB shouldn't be a problem. Forcing it to CPU in this one place doesn't fix the whole script. I tried a couple of things (I'm a complete Python noob, by the way), and in some instances I got an error saying that part of the work was assigned to the GPU and part to the CPU, and that it had some problems with tensors, so it's way above my head.
I ran ComfyUI, and its "model_management.py" looks nearly identical; that's the file I suspected was wrong. The output at the beginning is the same:
But ComfyUI works and this one doesn't. Even the installed packages are almost the same; only torchsde is a different version.
I also tried running it with different parameters. I have one more suspicion. ComfyUI doesn't give any messages when rendering 512x512 pictures, but when I selected 896×1152, like Fooocus likes to use, it started complaining a lot and then decided to do it anyway, although it took about 5x longer than a regular 512x512 image (5.5 s/it instead of 1 s/it). I don't know how to inject a 512x512 resolution into Fooocus to test whether it would work with a 1:1 aspect ratio.
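As a rough sanity check (my arithmetic, not from the thread), the resolution jump alone accounts for most of that slowdown:

```python
# Pixel-count ratio between Fooocus's default 896x1152 resolution and a
# 512x512 render, to compare against the observed ~5.5x slowdown in s/it.
fooocus_px = 896 * 1152    # 1,032,192 pixels
baseline_px = 512 * 512    # 262,144 pixels
ratio = fooocus_px / baseline_px
print(f"{ratio:.2f}x the pixels")  # 3.94x the pixels
```

The remaining gap between ~3.9x and the observed ~5.5x is plausible overhead: attention cost grows faster than linearly with pixel count, and memory pressure adds its own penalty.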
Well, as I was typing my previous comment, ComfyUI gave me the same error!
The weird thing is that KSampler generated the image, but the VAE Decode node failed to display it, which only confirms my theory that irregular or too-large image sizes fail on AMD with torch-directml.
If you (like me) just want to run the model on the CPU, change the function (line 90) in the file ...\Fooocus\launch.py to:
Unfortunately, I don't know how to make an AMD GPU work :(
I managed to run it on a 6700 XT GPU. It was quite slow, 3-4 s/it when generating a 512x512 image. But it only generates 1-2 images and then stops working due to lack of VRAM, because it does a poor job of clearing the VRAM after each run. Even setting --normalvram, --lowvram, or even --novram doesn't help. It either fills up your entire VRAM and then fails to run, or ignores your config and tries to allocate work to CUDA. This is rubbish. I'm uninstalling it and will be using ComfyUI instead. Not worth my time.
In some specific situations the same appears in ComfyUI. I'm running into this problem because torch-directml reserves almost all of the GPU's VRAM for the checkpoints and LoRAs when it starts. So when you then try to run the encoder/decoder, it gives you an error saying it cannot allocate enough VRAM, because all of it is reserved ('reserved', not necessarily 'used': even if you load a checkpoint that is only 2GB on a 12GB card, it still reserves about 97% of the card's VRAM).
Wow, nice. Works quite well on the "Extra Speed" setting. Thanks for the hard work. I would just mention somewhere that you'd still need about 32GB of RAM to run it in DirectML mode. |
@JarekDerp According to https://github.com/lllyasviel/Fooocus?tab=readme-ov-file#minimal-requirement you should only need 8GB of RAM. Is the resource consumption of Fooocus significantly off on your machine?
@mashb1t Well, yes. It's using as much RAM as if I were running it on the CPU, even though I'm running it on a 6700 XT with 12GB of VRAM. I have 32GB of RAM and it gets filled up almost completely when running image generation. One time I even noticed memory thrashing, where Windows saves some data into virtual memory on my SSD because it ran out of RAM. Basically, it's using up my 32GB of RAM and 12GB of VRAM. But at least the image generation is quite fast and it doesn't give me "out of memory" errors anymore.

Since posting the initial question I've learned a lot about Python, DirectML, PyTorch and Stable Diffusion in general. I managed to avoid these problems in ComfyUI by using a tiled decoder, but it still fails sometimes with bigger images, so I'm curious how you managed to make it work here. I'll probably have a look at the code once I have some spare time.

BTW, I can paste you the content of the log; maybe I have something wrong with my settings. I have a feeling it's loading the models multiple times or something. I tested many things: image generation, interrupting it when I noticed I had some wrong settings, restarting it again, then trying inpainting and outpainting, image variations and so on.
@JarekDerp thank you for the analysis and insights. It would be great if you could provide the terminal output with reference to your issue comment in #1690, so this issue doesn't drift even more off-topic. |
Describe the problem
I have an AMD card, a 6700 XT, and I'm running on Windows 11.
When trying to generate an image I get "Device type privateuseone is not supported for torch.Generator() api." In the console log below, line 11 says my device was recognized as "Device: privateuseone", and I think that might be the issue.
The script "brownian_interval.py" checks in line 52 whether the device is None and then assigns device = torch.device("cpu"), but that's not working, since it's recognizing a device that it shouldn't recognize.
I think the problem is in the Fooocus/backend/headless/fcbh/model_management.py script, either lines 69(nice):83 or 241:259.
Also, there's an issue with calculating available VRAM. My card has 12GB of VRAM, but the console log below reports 1024 MB, probably caused by line 95, which says "mem_total = 1024 * 1024 * 1024 #TODO".
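That hardcoded fallback explains the reported number exactly. The expression below is copied from the quoted line; the MB conversion is my own check:

```python
# The TODO fallback from model_management.py line 95: 1 GiB, in bytes.
mem_total = 1024 * 1024 * 1024
# Converting to MB reproduces the "Total VRAM 1024 MB" line in the log.
print(mem_total // (1024 * 1024), "MB")  # 1024 MB
```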
Any chance of getting it fixed? I have no idea about Python, so I can't do much :/
Full Console Log
[System ARGV] ['E:\StabilityMatrix-win-x64\Data\Packages\Fooocus\launch.py', '--preset', 'realistic', '--normalvram', '--directml', '--disable-xformers', '--auto-launch']
Python 3.10.11 (tags/v3.10.11:7d4cc5a, Apr 5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)]
Fooocus version: 2.1.820
Running on local URL: http://127.0.0.1:7865
To create a public link, set `share=True` in `launch()`.
Using directml with device:
Total VRAM 1024 MB, total RAM 32637 MB
Set vram state to: NORMAL_VRAM
Disabling smart memory management
Device: privateuseone
VAE dtype: torch.float32
Using sub quadratic optimization for cross attention, if you have memory or speed issues try using: --use-split-cross-attention
Refiner unloaded.
model_type EPS
adm 2816
Using split attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using split attention in VAE
extra keys {'cond_stage_model.clip_l.text_projection', 'cond_stage_model.clip_g.transformer.text_model.embeddings.position_ids', 'cond_stage_model.clip_l.logit_scale'}
Base model loaded: E:\StabilityMatrix-win-x64\Data\Models\StableDiffusion\realisticStockPhoto_v10.safetensors
Request to load LoRAs [['SDXL_FILM_PHOTOGRAPHY_STYLE_BetaV0.4.safetensors', 0.25], ['None', 1.0], ['None', 1.0], ['None', 1.0], ['None', 1.0]] for model [E:\StabilityMatrix-win-x64\Data\Models\StableDiffusion\realisticStockPhoto_v10.safetensors].
Loaded LoRA [E:\StabilityMatrix-win-x64\Data\Packages\Fooocus\models\loras\SDXL_FILM_PHOTOGRAPHY_STYLE_BetaV0.4.safetensors] for model [E:\StabilityMatrix-win-x64\Data\Models\StableDiffusion\realisticStockPhoto_v10.safetensors] with 1052 keys at weight 0.25.
Fooocus V2 Expansion: Vocab with 642 words.
Fooocus Expansion engine loaded for cpu, use_fp16 = False.
Requested to load SDXLClipModel
Requested to load GPT2LMHeadModel
Loading 2 new models
[Fooocus Model Management] Moving model(s) has taken 2.07 seconds
App started successful. Use the app with http://127.0.0.1:7865/ or 127.0.0.1:7865
[Parameters] Adaptive CFG = 7
[Parameters] Sharpness = 2
[Parameters] ADM Scale = 1.5 : 0.8 : 0.3
[Parameters] CFG = 3.0
[Parameters] Seed = 7948768698594532830
[Parameters] Sampler = dpmpp_2m_sde_gpu - karras
[Parameters] Steps = 30 - 15
[Fooocus] Initializing ...
[Fooocus] Loading models ...
Refiner unloaded.
Request to load LoRAs [('SDXL_FILM_PHOTOGRAPHY_STYLE_BetaV0.4.safetensors', 0.25), ('None', 1), ('None', 1), ('None', 1), ('None', 1)] for model [E:\StabilityMatrix-win-x64\Data\Models\StableDiffusion\realisticStockPhoto_v10.safetensors].
Loaded LoRA [E:\StabilityMatrix-win-x64\Data\Packages\Fooocus\models\loras\SDXL_FILM_PHOTOGRAPHY_STYLE_BetaV0.4.safetensors] for model [E:\StabilityMatrix-win-x64\Data\Models\StableDiffusion\realisticStockPhoto_v10.safetensors] with 1052 keys at weight 0.25.
Requested to load SDXLClipModel
Loading 1 new model
unload clone 1
[Fooocus Model Management] Moving model(s) has taken 1.95 seconds
[Fooocus] Processing prompts ...
[Fooocus] Preparing Fooocus text #1 ...
[Prompt Expansion] xxxxxxxxx
[Fooocus] Preparing Fooocus text #2 ...
[Prompt Expansion] xxxxxxxxx
[Fooocus] Encoding positive #1 ...
[Fooocus] Encoding positive #2 ...
[Fooocus] Encoding negative #1 ...
[Fooocus] Encoding negative #2 ...
Preparation time: 15.39 seconds
[Sampler] refiner_swap_method = joint
[Sampler] sigma_min = 0.02916753850877285, sigma_max = 14.614643096923828
Traceback (most recent call last):
  File "E:\StabilityMatrix-win-x64\Data\Packages\Fooocus\modules\async_worker.py", line 733, in worker
    handler(task)
  File "E:\StabilityMatrix-win-x64\Data\Packages\Fooocus\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "E:\StabilityMatrix-win-x64\Data\Packages\Fooocus\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "E:\StabilityMatrix-win-x64\Data\Packages\Fooocus\modules\async_worker.py", line 665, in handler
    imgs = pipeline.process_diffusion(
  File "E:\StabilityMatrix-win-x64\Data\Packages\Fooocus\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "E:\StabilityMatrix-win-x64\Data\Packages\Fooocus\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "E:\StabilityMatrix-win-x64\Data\Packages\Fooocus\modules\default_pipeline.py", line 312, in process_diffusion
    modules.patch.BrownianTreeNoiseSamplerPatched.global_init(
  File "E:\StabilityMatrix-win-x64\Data\Packages\Fooocus\modules\patch.py", line 169, in global_init
    BrownianTreeNoiseSamplerPatched.tree = BatchedBrownianTree(x, t0, t1, seed, cpu=cpu)
  File "E:\StabilityMatrix-win-x64\Data\Packages\Fooocus\backend\headless\fcbh\k_diffusion\sampling.py", line 85, in __init__
    self.trees = [torchsde.BrownianTree(t0, w0, t1, entropy=s, **kwargs) for s in seed]
  File "E:\StabilityMatrix-win-x64\Data\Packages\Fooocus\backend\headless\fcbh\k_diffusion\sampling.py", line 85, in <listcomp>
    self.trees = [torchsde.BrownianTree(t0, w0, t1, entropy=s, **kwargs) for s in seed]
  File "E:\StabilityMatrix-win-x64\Data\Packages\Fooocus\venv\lib\site-packages\torchsde\_brownian\derived.py", line 155, in __init__
    self._interval = brownian_interval.BrownianInterval(t0=t0,
  File "E:\StabilityMatrix-win-x64\Data\Packages\Fooocus\venv\lib\site-packages\torchsde\_brownian\brownian_interval.py", line 554, in __init__
    W = self._randn(initial_W_seed) * math.sqrt(t1 - t0)
  File "E:\StabilityMatrix-win-x64\Data\Packages\Fooocus\venv\lib\site-packages\torchsde\_brownian\brownian_interval.py", line 248, in _randn
    return _randn(size, self._top._dtype, self._top._device, seed)
  File "E:\StabilityMatrix-win-x64\Data\Packages\Fooocus\venv\lib\site-packages\torchsde\_brownian\brownian_interval.py", line 31, in _randn
    generator = torch.Generator(device).manual_seed(int(seed))
RuntimeError: Device type privateuseone is not supported for torch.Generator() api.
Total time: 77.84 seconds