
Device type privateuseone is not supported for torch.Generator() api #970

Closed
JarekDerp opened this issue Nov 17, 2023 · 20 comments
Labels
duplicate (This issue or pull request already exists)

Comments

@JarekDerp

Describe the problem
I have an AMD card, a 6700 XT, and I'm running on Windows 11.
When trying to generate an image I get "Device type privateuseone is not supported for torch.Generator() api." In the console log below, at line 11, it says that it recognized my device as "Device: privateuseone", and I think that might be the issue.

script "brownian_interval.py" checks in line 52 if the "device is none" and then assigns "device = torch.device("cpu")" but it's not working since it's recognizing some device that it shouldn't recognize.

I think the problem is in the Fooocus/backend/headless/fcbh/model_management.py script, either lines 69(nice):83 or 241:259.
Also, there's an issue with calculating available VRAM. My card has 12 GB of VRAM, but the console log below reports that I have 1024 MB, probably caused by line 95, which says "mem_total = 1024 * 1024 * 1024 #TODO".
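As for the VRAM number, a minimal sketch of how I read that placeholder (an assumption, not the exact Fooocus source): since there's no portable way to query VRAM through DirectML, a hard-coded 1 GiB is used and then printed in megabytes, which is exactly the "Total VRAM 1024 MB" line in the log.

mem_total = 1024 * 1024 * 1024  # TODO  <- the placeholder quoted above (1 GiB)
print(f"Total VRAM {mem_total // (1024 * 1024)} MB")  # prints "Total VRAM 1024 MB"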

Any chance of getting it fixed? I have no idea about python so I can't do much :/

Full Console Log
[System ARGV] ['E:\StabilityMatrix-win-x64\Data\Packages\Fooocus\launch.py', '--preset', 'realistic', '--normalvram', '--directml', '--disable-xformers', '--auto-launch']
Python 3.10.11 (tags/v3.10.11:7d4cc5a, Apr 5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)]
Fooocus version: 2.1.820
Running on local URL: http://127.0.0.1:7865

To create a public link, set share=True in launch().
Using directml with device:
Total VRAM 1024 MB, total RAM 32637 MB
Set vram state to: NORMAL_VRAM
Disabling smart memory management
Device: privateuseone
VAE dtype: torch.float32
Using sub quadratic optimization for cross attention, if you have memory or speed issues try using: --use-split-cross-attention
Refiner unloaded.
model_type EPS
adm 2816
Using split attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using split attention in VAE
extra keys {'cond_stage_model.clip_l.text_projection', 'cond_stage_model.clip_g.transformer.text_model.embeddings.position_ids', 'cond_stage_model.clip_l.logit_scale'}
Base model loaded: E:\StabilityMatrix-win-x64\Data\Models\StableDiffusion\realisticStockPhoto_v10.safetensors
Request to load LoRAs [['SDXL_FILM_PHOTOGRAPHY_STYLE_BetaV0.4.safetensors', 0.25], ['None', 1.0], ['None', 1.0], ['None', 1.0], ['None', 1.0]] for model [E:\StabilityMatrix-win-x64\Data\Models\StableDiffusion\realisticStockPhoto_v10.safetensors].
Loaded LoRA [E:\StabilityMatrix-win-x64\Data\Packages\Fooocus\models\loras\SDXL_FILM_PHOTOGRAPHY_STYLE_BetaV0.4.safetensors] for model [E:\StabilityMatrix-win-x64\Data\Models\StableDiffusion\realisticStockPhoto_v10.safetensors] with 1052 keys at weight 0.25.
Fooocus V2 Expansion: Vocab with 642 words.
Fooocus Expansion engine loaded for cpu, use_fp16 = False.
Requested to load SDXLClipModel
Requested to load GPT2LMHeadModel
Loading 2 new models
[Fooocus Model Management] Moving model(s) has taken 2.07 seconds
App started successful. Use the app with http://127.0.0.1:7865/ or 127.0.0.1:7865
[Parameters] Adaptive CFG = 7
[Parameters] Sharpness = 2
[Parameters] ADM Scale = 1.5 : 0.8 : 0.3
[Parameters] CFG = 3.0
[Parameters] Seed = 7948768698594532830
[Parameters] Sampler = dpmpp_2m_sde_gpu - karras
[Parameters] Steps = 30 - 15
[Fooocus] Initializing ...
[Fooocus] Loading models ...
Refiner unloaded.
Request to load LoRAs [('SDXL_FILM_PHOTOGRAPHY_STYLE_BetaV0.4.safetensors', 0.25), ('None', 1), ('None', 1), ('None', 1), ('None', 1)] for model [E:\StabilityMatrix-win-x64\Data\Models\StableDiffusion\realisticStockPhoto_v10.safetensors].
Loaded LoRA [E:\StabilityMatrix-win-x64\Data\Packages\Fooocus\models\loras\SDXL_FILM_PHOTOGRAPHY_STYLE_BetaV0.4.safetensors] for model [E:\StabilityMatrix-win-x64\Data\Models\StableDiffusion\realisticStockPhoto_v10.safetensors] with 1052 keys at weight 0.25.
Requested to load SDXLClipModel
Loading 1 new model
unload clone 1
[Fooocus Model Management] Moving model(s) has taken 1.95 seconds
[Fooocus] Processing prompts ...
[Fooocus] Preparing Fooocus text #1 ...
[Prompt Expansion] xxxxxxxxx
[Fooocus] Preparing Fooocus text #2 ...
[Prompt Expansion] xxxxxxxxx
[Fooocus] Encoding positive #1 ...
[Fooocus] Encoding positive #2 ...
[Fooocus] Encoding negative #1 ...
[Fooocus] Encoding negative #2 ...
Preparation time: 15.39 seconds
[Sampler] refiner_swap_method = joint
[Sampler] sigma_min = 0.02916753850877285, sigma_max = 14.614643096923828
Traceback (most recent call last):
File "E:\StabilityMatrix-win-x64\Data\Packages\Fooocus\modules\async_worker.py", line 733, in worker
handler(task)
File "E:\StabilityMatrix-win-x64\Data\Packages\Fooocus\venv\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "E:\StabilityMatrix-win-x64\Data\Packages\Fooocus\venv\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "E:\StabilityMatrix-win-x64\Data\Packages\Fooocus\modules\async_worker.py", line 665, in handler
imgs = pipeline.process_diffusion(
File "E:\StabilityMatrix-win-x64\Data\Packages\Fooocus\venv\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "E:\StabilityMatrix-win-x64\Data\Packages\Fooocus\venv\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
Total time: 77.84 seconds
return func(*args, **kwargs)
File "E:\StabilityMatrix-win-x64\Data\Packages\Fooocus\modules\default_pipeline.py", line 312, in process_diffusion
modules.patch.BrownianTreeNoiseSamplerPatched.global_init(
File "E:\StabilityMatrix-win-x64\Data\Packages\Fooocus\modules\patch.py", line 169, in global_init
BrownianTreeNoiseSamplerPatched.tree = BatchedBrownianTree(x, t0, t1, seed, cpu=cpu)
File "E:\StabilityMatrix-win-x64\Data\Packages\Fooocus\backend\headless\fcbh\k_diffusion\sampling.py", line 85, in init
self.trees = [torchsde.BrownianTree(t0, w0, t1, entropy=s, **kwargs) for s in seed]
File "E:\StabilityMatrix-win-x64\Data\Packages\Fooocus\backend\headless\fcbh\k_diffusion\sampling.py", line 85, in
self.trees = [torchsde.BrownianTree(t0, w0, t1, entropy=s, **kwargs) for s in seed]
File "E:\StabilityMatrix-win-x64\Data\Packages\Fooocus\venv\lib\site-packages\torchsde_brownian\derived.py", line 155, in init
self._interval = brownian_interval.BrownianInterval(t0=t0,
File "E:\StabilityMatrix-win-x64\Data\Packages\Fooocus\venv\lib\site-packages\torchsde_brownian\brownian_interval.py", line 554, in init
W = self._randn(initial_W_seed) * math.sqrt(t1 - t0)
File "E:\StabilityMatrix-win-x64\Data\Packages\Fooocus\venv\lib\site-packages\torchsde_brownian\brownian_interval.py", line 248, in _randn
return _randn(size, self._top._dtype, self._top._device, seed)
File "E:\StabilityMatrix-win-x64\Data\Packages\Fooocus\venv\lib\site-packages\torchsde_brownian\brownian_interval.py", line 31, in _randn
generator = torch.Generator(device).manual_seed(int(seed))
RuntimeError: Device type privateuseone is not supported for torch.Generator() api.

@MorningKek

MorningKek commented Nov 21, 2023

I have an AMD Radeon RX 6600, using Fooocus on Windows 10. Same issue.

RuntimeError: Device type privateuseone is not supported for torch.Generator() api.

@llnqdx

llnqdx commented Nov 21, 2023

I have an AMD Radeon RX 7800xt, using Fooocus on Windows 10. Same issue.

RuntimeError: Device type privateuseone is not supported for torch.Generator() api.

@sappelhoff

Same problem for me:

  • AMD Radeon RX 5700
  • Windows 11

RuntimeError: Device type privateuseone is not supported for torch.Generator() api.

@JarekDerp
Author

JarekDerp commented Nov 21, 2023

Thanks for confirming that I'm not the only one with the problem. I was thinking that maybe I have something wrong with my PC configuration, but it looks like it's not an isolated case.
People who have had it installed for months may not have this issue. I made a fresh install, so maybe one of the dependencies updated and stopped working or something.
I have installed ComfyUI with DirectML in a separate environment and it's working fine, even though it's using the same packages as Fooocus.

@sappelhoff

What were the exact steps you used to solve the problem @JarekDerp? Just a "fresh install" worked?

... and that's it? Or were there other steps you did?

@JarekDerp
Author

@sappelhoff Sorry, I was a bit unclear about what I wanted to say. I've modified my previous comment.
Basically, what I meant was that I made a fresh installation a couple of days ago and it's not working. Maybe there are people who installed it a couple of months ago and it's working for them because they didn't upgrade any pip packages.

A fresh installation of ComfyUI works fine while this one doesn't. I'm not good with Python, but I'll try to compare the packages and see which ones have different versions.
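One quick way to compare them from each environment's own interpreter is something like this (a small sketch using only the standard library; the package names are the ones from the listings further down):

import importlib.metadata as md

# print the installed version of each torch-related package, or note if it's missing
for name in ("torch", "torch-directml", "torchsde", "torchvision", "torchmetrics"):
    try:
        print(name, md.version(name))
    except md.PackageNotFoundError:
        print(name, "not installed")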

@MikeLP

MikeLP commented Nov 22, 2023

As a temporary solution, manually patch the file "./python_embeded/Lib/site-packages/torchsde/_brownian/brownian_interval.py".

Find (line 31):

def _randn(size, dtype, device, seed):
    generator = torch.Generator(device).manual_seed(int(seed))
    return torch.randn(size, dtype=dtype, device=device, generator=generator)

and change to

def _randn(size, dtype, device, seed):
    # build the seeded generator on the CPU, where torch.Generator() is always supported
    generator = torch.Generator("cpu").manual_seed(int(seed))
    # the noise tensor itself is still requested on the original device
    return torch.randn(size, dtype=dtype, device=device, generator=generator)

@MikeLP

MikeLP commented Nov 22, 2023

One more possible installation issue (which I had) could be this one:

\python_embeded\python.exe -m pip uninstall torch torchvision torchaudio torchtext functorch xformers -y
.\python_embeded\python.exe -m pip install torch-directml
.\python_embeded\python.exe -s Fooocus\entry_with_update.py --directml
pause

Fix: add the missing dot at the start of the first line.

.\python_embeded\python.exe -m pip uninstall torch torchvision torchaudio torchtext functorch xformers -y
.\python_embeded\python.exe -m pip install torch-directml
.\python_embeded\python.exe -s Fooocus\entry_with_update.py --directml
pause

@MorningKek

As a temporary solution manually patch file "./python_embeded/Lib/site-packages/torchsde/_brownian/brownian_interval.py"

Find (31 line)

def _randn(size, dtype, device, seed):
    generator = torch.Generator(device).manual_seed(int(seed))
    return torch.randn(size, dtype=dtype, device=device, generator=generator)

and change to

def _randn(size, dtype, device, seed):
    generator = torch.Generator("cpu").manual_seed(int(seed))
    return torch.randn(size, dtype=dtype, device=device, generator=generator)

Did that, a new error appears now.

Enter LCM mode.
[Fooocus] Downloading LCM components ...
[Parameters] Adaptive CFG = 1.0
[Parameters] Sharpness = 0.0
[Parameters] ADM Scale = 1.0 : 1.0 : 0.0
[Parameters] CFG = 1.0
[Parameters] Seed = 1228787800952501620
[Parameters] Sampler = lcm - lcm
[Parameters] Steps = 8 - 8
[Fooocus] Initializing ...
[Fooocus] Loading models ...
Refiner unloaded.
[Fooocus] Processing prompts ...
[Fooocus] Preparing Fooocus text #1 ...
[Prompt Expansion] banana, highly detailed, vibrant colors, light, strong crisp, sharp focus, intricate, cinematic, full background, excellent composition, dynamic dramatic futuristic atmosphere, precise, aesthetic, very inspirational, stunning, rich vivid color, ambient epic, professional fine detail, clear, beautiful, creative, positive, attractive, unique, cute, artistic, wonderful, perfect, focused, confident
[Fooocus] Encoding positive #1 ...
[Parameters] Denoising Strength = 1.0
[Parameters] Initial Latent shape: Image Space (1024, 1024)
Preparation time: 5.17 seconds
Using lcm scheduler.
[Sampler] refiner_swap_method = joint
[Sampler] sigma_min = 0.39970144629478455, sigma_max = 14.614640235900879
Requested to load SDXL
Loading 1 new model
ERROR diffusion_model.output_blocks.0.0.in_layers.2.weight Could not allocate tensor with 117964800 bytes. There is not enough GPU video memory available!
ERROR diffusion_model.output_blocks.0.0.in_layers.2.weight Could not allocate tensor with 117964800 bytes. There is not enough GPU video memory available!
Traceback (most recent call last):
File "D:\Downloads\Fooocus_win64_2-1-791\Fooocus\modules\async_worker.py", line 803, in worker
handler(task)
File "D:\Downloads\Fooocus_win64_2-1-791\python_embeded\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "D:\Downloads\Fooocus_win64_2-1-791\python_embeded\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "D:\Downloads\Fooocus_win64_2-1-791\Fooocus\modules\async_worker.py", line 735, in handler
imgs = pipeline.process_diffusion(
File "D:\Downloads\Fooocus_win64_2-1-791\python_embeded\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "D:\Downloads\Fooocus_win64_2-1-791\python_embeded\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "D:\Downloads\Fooocus_win64_2-1-791\Fooocus\modules\default_pipeline.py", line 361, in process_diffusion
sampled_latent = core.ksampler(
File "D:\Downloads\Fooocus_win64_2-1-791\python_embeded\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "D:\Downloads\Fooocus_win64_2-1-791\python_embeded\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "D:\Downloads\Fooocus_win64_2-1-791\Fooocus\modules\core.py", line 315, in ksampler
samples = fcbh.sample.sample(model, noise, steps, cfg, sampler_name, scheduler, positive, negative, latent_image,
File "D:\Downloads\Fooocus_win64_2-1-791\Fooocus\backend\headless\fcbh\sample.py", line 93, in sample
real_model, positive_copy, negative_copy, noise_mask, models = prepare_sampling(model, noise.shape, positive, negative, noise_mask)
File "D:\Downloads\Fooocus_win64_2-1-791\Fooocus\backend\headless\fcbh\sample.py", line 86, in prepare_sampling
fcbh.model_management.load_models_gpu([model] + models, model.memory_required(noise_shape) + inference_memory)
File "D:\Downloads\Fooocus_win64_2-1-791\Fooocus\modules\patch.py", line 496, in patched_load_models_gpu
y = fcbh.model_management.load_models_gpu_origin(*args, **kwargs)
File "D:\Downloads\Fooocus_win64_2-1-791\Fooocus\backend\headless\fcbh\model_management.py", line 410, in load_models_gpu
cur_loaded_model = loaded_model.model_load(lowvram_model_memory)
File "D:\Downloads\Fooocus_win64_2-1-791\Fooocus\backend\headless\fcbh\model_management.py", line 293, in model_load
raise e
File "D:\Downloads\Fooocus_win64_2-1-791\Fooocus\backend\headless\fcbh\model_management.py", line 289, in model_load
self.real_model = self.model.patch_model(device_to=patch_model_to) #TODO: do something with loras and offloading to CPU
File "D:\Downloads\Fooocus_win64_2-1-791\Fooocus\backend\headless\fcbh\model_patcher.py", line 191, in patch_model
temp_weight = fcbh.model_management.cast_to_device(weight, device_to, torch.float32, copy=True)
File "D:\Downloads\Fooocus_win64_2-1-791\Fooocus\backend\headless\fcbh\model_management.py", line 532, in cast_to_device
return tensor.to(device, copy=copy).to(dtype)
RuntimeError: Could not allocate tensor with 117964800 bytes. There is not enough GPU video memory available!
Total time: 9.84 seconds

@JarekDerp
Author

JarekDerp commented Nov 22, 2023

Could not allocate tensor with 117964800 bytes. There is not enough GPU video memory available!

Yeah, I already tried that days ago and had the same result. My card has 12 GB, so 112 MB shouldn't be a problem. Forcing it to use the CPU in this one place doesn't fix the whole script.
I think the problem is in finding the GPU card you have, its address or hardware ID or whatever, and then making it available for image generation.

I tried a couple of things (I'm a complete Python noob, btw) and in some instances I got an error saying that part of the work was assigned to the GPU and part to the CPU and it had some problems with tensors, so it's way above my head.

@JarekDerp
Author

I ran ComfyUI and its "model_management.py" file looks nearly identical; it's the file I suspected was wrong. The output at the beginning is the same:

Using directml with device:
Total VRAM 1024 MB, total RAM 32637 MB
Set vram state to: NORMAL_VRAM
Device: privateuseone

But ComfyUI works and this one doesn't. Even the installed packages are almost the same, only torchsde is a different version.

Fooocus
torch
torch_directml
torch_directml_native.cp310-win_amd64.pyd
torch_directml-0.2.0.dev230426.dist-info
torch-2.0.0.dist-info
torchgen
torchmetrics
torchmetrics-1.2.0.dist-info
torchsde
torchsde-0.2.5.dist-info
torchvision
torchvision-0.15.1.dist-info

Comfy
torch
torch_directml
torch_directml_native.cp310-win_amd64.pyd
torch_directml-0.2.0.dev230426.dist-info
torch-2.0.0.dist-info
torchgen
torchsde
torchsde-0.2.6.dist-info
torchvision
torchvision-0.15.1.dist-info

But even after running pip install --upgrade torchsde==0.2.6 it doesn't work. I searched through both codebases and the DirectML implementation is nearly identical.

I also tried running it with the parameters 'E:\\StabilityMatrix-win-x64\\Data\\Packages\\Fooocus\\launch.py', '--preset', 'realistic', '--disable-xformers', '--cpu' and it ran fine, although it took 55 s/it on my Ryzen 5 5500 CPU and used 29-31 of the 32 GB of RAM on my system.

I have one more suspicion. ComfyUI doesn't give any messages when rendering 512x512 pictures, but when I selected 896×1152 like Fooocus likes to use, it started complaining a lot and then decided to do it anyway, although it took about 5x longer than a regular 512x512 image (5.5 s/it instead of 1 s/it). I don't know how to inject a 512x512 resolution into Fooocus to test whether it would work with that 1:1 aspect ratio.
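If anyone wants to sanity-check the allocation side without running the full pipeline, here is a rough probe I would try (a sketch, assuming the torch-directml package and its documented device() helper; the latent sizes are just SDXL's usual 1/8 of the image resolution):

import torch
import torch_directml  # assumes the torch-directml package is installed

dml = torch_directml.device()
# SDXL latents have 4 channels at 1/8 of the image resolution,
# so 896x1152 corresponds to a (1, 4, 144, 112) tensor; this only probes allocation, not speed.
for w, h in [(512, 512), (896, 1152)]:
    latent = torch.randn(1, 4, h // 8, w // 8, device=dml)
    print(w, h, latent.shape, latent.device)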

@JarekDerp
Author

JarekDerp commented Nov 22, 2023

Well, as I was typing my previous comment, ComfyUI gave me the same error!

Error occurred when executing VAEDecode:
Could not allocate tensor with 264241152 bytes. There is not enough GPU video memory available!

The weird thing is that KSampler generated the image but the VAE Decode node failed to display it, which only confirms my theory that irregular/too-large image sizes fail on AMD with torch-directml.

@Lira2423

If you (like me) just want to run the model on the CPU, change the function (line 90) in the file ...\Fooocus\launch.py to

def ini_fcbh_args():
    from args_manager import args
    args.cpu = True  # force CPU mode regardless of the command-line flags
    return args

Unfortunately I don't know how to make the AMD GPU work :(

@JarekDerp
Author

I managed to run it on a 6700 XT GPU; it was quite slow, 3-4 s/it when generating a 512x512 image.

But it only generates 1-2 images and then stops working due to lack of VRAM, because it does a poor job of clearing the VRAM after each run. Even setting --normalvram or --lowvram or even --novram doesn't work. It either fills up your entire VRAM and then fails to run, or ignores your config and tries to allocate work to CUDA.

This is rubbish. I'm uninstalling it and I will be using ComfyUI instead. Not worth my time.

@JarekDerp
Author

In some specific situations, the same thing appears in ComfyUI. I'm running into this problem because torch-directml reserves almost all of the GPU's VRAM when it starts. So when you then try to run the encoder/decoder, it gives you an error saying it cannot allocate enough VRAM because all of it is reserved ('reserved', not necessarily 'used'; even if you load a checkpoint that is only 2GB and you have a 12GB card, it still reserves something like 97% of the card's VRAM) for the checkpoints and LoRAs.
I'm looking into the problem and trying to find a solution, but I'm not a pro and don't even know Python, so I probably won't be able to figure it out.

@mashb1t
Collaborator

mashb1t commented Dec 31, 2023

Duplicate of #763.
Please be aware that in 8e62a72 (latest Fooocus version 2.1.857) AMD with >= 8GB VRAM is now supported.
Please try with min. 8GB VRAM allocated.

@mashb1t closed this as not planned (duplicate) Dec 31, 2023
@mashb1t added the "duplicate" label Dec 31, 2023
@JarekDerp
Author

JarekDerp commented Jan 1, 2024

Duplicate of #763. Please be aware that in 8e62a72 (latest Fooocus version 2.1.857) AMD with >= 8GB VRAM is now supported. Please try with min. 8GB VRAM allocated.

Wow, nice. Works quite well on the "Extra Speed" setting. Thanks for the hard work. I would just mention somewhere that you'd still need about 32GB of RAM to run it in DirectML mode.

@mashb1t
Collaborator

mashb1t commented Jan 1, 2024

@JarekDerp as of https://github.com/lllyasviel/Fooocus?tab=readme-ov-file#minimal-requirement you should only need 8GB RAM. Is the resource consumption of Fooocus significantly off on your machine?

@JarekDerp
Author

@mashb1t well, yes. It's using as much RAM as if I were running it on the CPU, even though I'm running it on a 6700 XT that has 12 GB of VRAM. I have 32 GB of RAM and it gets filled up almost completely when running image generation. One time I even noticed memory thrashing, where Windows pages some data out to virtual memory on my SSD because it ran out of RAM.

Basically it's using up my 32 GB of RAM and 12 GB of VRAM. But at least image generation is quite fast and it doesn't give me "out of memory" errors anymore. Since posting the initial question I've learned a lot about Python, DirectML, PyTorch and Stable Diffusion in general. I managed to avoid these problems in ComfyUI by using a tiled decoder, but it still fails sometimes with bigger images, so I'm curious how you managed to make it work here. I'll probably have a look at the code once I have some spare time.

BTW, I can paste you the content of the log; maybe I have something wrong with my settings. I have a feeling that it's loading the models multiple times or something. I tested many things: image generation, then interrupting it when I noticed I had some wrong settings, restarting it again, then trying inpainting and outpainting, image variations and so on.

@mashb1t
Collaborator

mashb1t commented Jan 2, 2024

@JarekDerp thank you for the analysis and insights. It would be great if you could provide the terminal output with reference to your issue comment in #1690, so this issue doesn't drift even more off-topic.
Much appreciated!
