Tesla P4 and M60 forced into low VRAM mode [Bug]: #2661

Closed
jhemley opened this issue Mar 29, 2024 · 21 comments
Labels
bug Something isn't working feedback pending Waiting for further information

Comments

@jhemley

jhemley commented Mar 29, 2024

Checklist

  • The issue has not been resolved by following the troubleshooting guide
  • The issue exists on a clean installation of Fooocus
  • The issue exists in the current version of Fooocus
  • The issue has not been reported before recently
  • The issue has been reported before but has not been fixed yet

What happened?

I have a Tesla M60 and a P4 running in a Linux VM (the same problem occurred on Windows). I've tried running them, but Fooocus always runs in low VRAM mode.

Steps to reproduce the problem

run conda activate fooocus
python entry_with_update.py --listen

What should have happened?

I think it shouldn't run in low VRAM mode (correct me if I'm wrong). It runs just fine on my 2080 Max-Q, but has these lowvram problems on the Tesla cards I have tested with.

What browsers do you use to access Fooocus?

Mozilla Firefox

Where are you running Fooocus?

Locally with virtualization (e.g. Docker)

What operating system are you using?

Ubuntu 20.04 and Windows 10

Console logs

python entry_with_update.py --always-normal-vram --listen
Already up-to-date
Update succeeded.
[System ARGV] ['entry_with_update.py', '--always-normal-vram', '--listen']
Python 3.10.14 (main, Mar 21 2024, 16:24:04) [GCC 11.2.0]
Fooocus version: 2.3.1
[Cleanup] Attempting to delete content of temp dir /tmp/fooocus
[Cleanup] Cleanup successful
Total VRAM 8116 MB, total RAM 64308 MB
Set vram state to: NORMAL_VRAM
Always offload VRAM
Device: cuda:0 Tesla P4 : native
VAE dtype: torch.float32
Using pytorch cross attention
Refiner unloaded.
Running on local URL:  http://0.0.0.0:7865

To create a public link, set `share=True` in `launch()`.
model_type EPS
UNet ADM Dimension 2816
Using pytorch attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using pytorch attention in VAE
extra {'cond_stage_model.clip_l.text_projection', 'cond_stage_model.clip_l.logit_scale', 'cond_stage_model.clip_g.transformer.text_model.embeddings.position_ids'}
Base model loaded: /home/jared/Fooocus/models/checkpoints/juggernautXL_v8Rundiffusion.safetensors
Request to load LoRAs [['sd_xl_offset_example-lora_1.0.safetensors', 0.1], ['None', 1.0], ['None', 1.0], ['None', 1.0], ['None', 1.0]] for model [/home/jared/Fooocus/models/checkpoints/juggernautXL_v8Rundiffusion.safetensors].
Loaded LoRA [/home/jared/Fooocus/models/loras/sd_xl_offset_example-lora_1.0.safetensors] for UNet [/home/jared/Fooocus/models/checkpoints/juggernautXL_v8Rundiffusion.safetensors] with 788 keys at weight 0.1.
Fooocus V2 Expansion: Vocab with 642 words.
Fooocus Expansion engine loaded for cpu, use_fp16 = False.
Requested to load SDXLClipModel
Requested to load GPT2LMHeadModel
Loading 2 new models
Started worker with PID 2184
App started successful. Use the app with http://localhost:7865/ or 0.0.0.0:7865
[Parameters] Adaptive CFG = 7
[Parameters] Sharpness = 2
[Parameters] ControlNet Softness = 0.25
[Parameters] ADM Scale = 1.5 : 0.8 : 0.3
[Parameters] CFG = 4.0
[Parameters] Seed = 59858353226061117
[Parameters] Sampler = dpmpp_2m_sde_gpu - karras
[Parameters] Steps = 30 - 15
[Fooocus] Initializing ...
[Fooocus] Loading models ...
Refiner unloaded.
[Fooocus] Processing prompts ...
[Fooocus] Preparing Fooocus text #1 ...
[Prompt Expansion] make a picture of a lambo, cinematic, phenomenal, creative, dynamic, dramatic, thought, epic, elegant, intricate, detailed, extremely light, shining, complimentary colors, shiny, glowing, winning, grand elaborate complex, highly decorated, open flowing, deep color, very beautiful, symmetry, great composition, atmosphere, perfect, artistic, innocent, inspiring, unique
[Fooocus] Preparing Fooocus text #2 ...
[Prompt Expansion] make a picture of a lambo, detailed, elegant, holy, impressive, noble, gorgeous, amazing, fancy, dramatic, colorful, very inspirational, beautiful, illuminated background, epic composition, magical atmosphere, cinematic, symmetry, pure, solid colors, extremely, highly complex, determined, imposing, futuristic, professional, artistic, creative, vibrant, fine detail, color
[Fooocus] Encoding positive #1 ...
[Fooocus] Encoding positive #2 ...
[Fooocus] Encoding negative #1 ...
[Fooocus] Encoding negative #2 ...
[Parameters] Denoising Strength = 1.0
[Parameters] Initial Latent shape: Image Space (896, 1152)
Preparation time: 9.55 seconds
[Sampler] refiner_swap_method = joint
[Sampler] sigma_min = 0.0291671771556139, sigma_max = 14.614643096923828
Requested to load SDXL
Loading 1 new model
loading in lowvram mode 5363.8427734375
[Fooocus Model Management] Moving model(s) has taken 9.02 seconds
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [04:40<00:00,  9.35s/it]
Requested to load AutoencoderKL
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 1.39 seconds
Image generated with private log at: /home/jared/Fooocus/outputs/2024-03-29/log.html
Generating and saving time: 294.14 seconds
[Sampler] refiner_swap_method = joint
[Sampler] sigma_min = 0.0291671771556139, sigma_max = 14.614643096923828
Requested to load SDXL
Loading 1 new model
loading in lowvram mode 5331.772085189819
[Fooocus Model Management] Moving model(s) has taken 8.64 seconds
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 30/30 [04:44<00:00,  9.49s/it]
Requested to load AutoencoderKL
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 1.33 seconds
Image generated with private log at: /home/jared/Fooocus/outputs/2024-03-29/log.html
Generating and saving time: 297.74 seconds
Total time: 601.53 seconds

Additional information

current version
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.147.05   Driver Version: 525.147.05   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla P4            Off  | 00000000:0B:00.0 Off |                  Off |
| N/A   75C    P0    41W /  75W |   6458MiB /  8192MiB |     47%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2184      C   python                           6456MiB |
+-----------------------------------------------------------------------------+
I have also tried driver 550.

@jhemley jhemley added bug Something isn't working triage This needs an (initial) review labels Mar 29, 2024
@mashb1t
Collaborator

mashb1t commented Mar 31, 2024

As you can see in https://github.com/lllyasviel/Fooocus/blob/main/ldm_patched/modules/model_management.py#L429-L430, the trigger for lowvram mode is model_size > (current_free_mem - inference_memory).
Please check the model size and debug the other parameters in the given code, either by adding a breakpoint and using the Python debugger or by printing the values. Thanks!
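For example, temporary prints just before that check could look like this (a sketch; the variable names are taken from the expression above, everything else is illustration):

# Hypothetical debug prints; add just before the lowvram condition in
# ldm_patched/modules/model_management.py and remove afterwards.
print(f"[debug] model_size       = {model_size / 1024**2:.1f} MiB")
print(f"[debug] current_free_mem = {current_free_mem / 1024**2:.1f} MiB")
print(f"[debug] inference_memory = {inference_memory / 1024**2:.1f} MiB")
print(f"[debug] lowvram trigger  = {model_size > (current_free_mem - inference_memory)}")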

@mashb1t mashb1t added feedback pending Waiting for further information and removed triage This needs an (initial) review labels Mar 31, 2024
@jhemley
Author

jhemley commented Mar 31, 2024

Is there any way to simply force the model to load normally? I believe I have enough VRAM. I tried forcing normal and high VRAM modes, but they didn't work.

@mashb1t
Collaborator

mashb1t commented Mar 31, 2024

> Please check the model size and debug the other parameters in the given code, either by adding a breakpoint and using the Python debugger or by printing the values. Thanks!

Please debug this yourself and provide further information.

@jhemley
Author

jhemley commented Mar 31, 2024

I think I located the problem: the Tesla driver seems to limit VRAM usage to 8102 MiB instead of the card's 8192 MiB. I found this by disabling lowvram mode, changing the condition to model_size > (99999999999999), as in the sketch below.
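The edit looked roughly like this (a throwaway hack for testing, not a proper fix):

# In ldm_patched/modules/model_management.py, replace the lowvram condition
# with an impossibly large threshold so the branch never fires:
if model_size > (99999999999999):  # was: model_size > (current_free_mem - inference_memory)
    ...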
Now it outputs this:
python entry_with_update.py --listen
Already up-to-date
Update succeeded.
[System ARGV] ['entry_with_update.py', '--listen']
Python 3.10.14 (main, Mar 21 2024, 16:24:04) [GCC 11.2.0]
Fooocus version: 2.3.1
[Cleanup] Attempting to delete content of temp dir /tmp/fooocus
[Cleanup] Cleanup successful
Total VRAM 8123 MB, total RAM 32100 MB
Set vram state to: NORMAL_VRAM
Always offload VRAM
Device: cuda:0 Tesla M60 : native
VAE dtype: torch.float32
Using pytorch cross attention
Refiner unloaded.
Running on local URL: http://0.0.0.0:7865

Thanks for being a Gradio user! If you have questions or feedback, please join our Discord server and chat with us: https://discord.gg/feTf9x3ZSB

To create a public link, set share=True in launch().
model_type EPS
UNet ADM Dimension 2816
Using pytorch attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using pytorch attention in VAE
extra {'cond_stage_model.clip_l.text_projection', 'cond_stage_model.clip_g.transformer.text_model.embeddings.position_ids', 'cond_stage_model.clip_l.logit_scale'}
Base model loaded: /home/jared/Fooocus/models/checkpoints/juggernautXL_v8Rundiffusion.safetensors
Request to load LoRAs [['sd_xl_offset_example-lora_1.0.safetensors', 0.1], ['None', 1.0], ['None', 1.0], ['None', 1.0], ['None', 1.0]] for model [/home/jared/Fooocus/models/checkpoints/juggernautXL_v8Rundiffusion.safetensors].
Loaded LoRA [/home/jared/Fooocus/models/loras/sd_xl_offset_example-lora_1.0.safetensors] for UNet [/home/jared/Fooocus/models/checkpoints/juggernautXL_v8Rundiffusion.safetensors] with 788 keys at weight 0.1.
Fooocus V2 Expansion: Vocab with 642 words.
Fooocus Expansion engine loaded for cpu, use_fp16 = False.
Requested to load SDXLClipModel
Requested to load GPT2LMHeadModel
Loading 2 new models
Started worker with PID 1658
App started successful. Use the app with http://localhost:7865/ or 0.0.0.0:7865
[Parameters] Adaptive CFG = 7
[Parameters] Sharpness = 2
[Parameters] ControlNet Softness = 0.25
[Parameters] ADM Scale = 1.5 : 0.8 : 0.3
[Parameters] CFG = 4.0
[Parameters] Seed = 291429156536229784
[Parameters] Sampler = dpmpp_2m_sde_gpu - karras
[Parameters] Steps = 30 - 15
[Fooocus] Initializing ...
[Fooocus] Loading models ...
Refiner unloaded.
[Fooocus] Processing prompts ...
[Fooocus] Preparing Fooocus text #1 ...
[Prompt Expansion] make a car, bright colors, elegant, highly detailed, sharp focus, beautiful, intricate, cinematic, new classic, sunny, shining, deep aesthetic, appealing, artistic, fine detail, awesome color, dynamic light, great composition, clear professional background, creative, innocent, scenic, positive, unique, attractive, cute, perfect, focused, vibrant, epic, best
[Fooocus] Preparing Fooocus text #2 ...
[Prompt Expansion] make a car, expressive, dynamic composition, dramatic, elegant, highly detailed, sharp focus, beautiful, perfect light, attractive, innocent, divine, sublime, epic, stunning, inspired, vibrant, intricate, brilliant, thought, cinematic, background, illuminated, professional, best, creative, winning, romantic, fantastic, scenic, artistic, fabulous, bright, hopeful, cute
[Fooocus] Encoding positive #1 ...
[Fooocus] Encoding positive #2 ...
[Fooocus] Encoding negative #1 ...
[Fooocus] Encoding negative #2 ...
[Parameters] Denoising Strength = 1.0
[Parameters] Initial Latent shape: Image Space (896, 1152)
Preparation time: 12.90 seconds
[Sampler] refiner_swap_method = joint
[Sampler] sigma_min = 0.0291671771556139, sigma_max = 14.614643096923828
Requested to load SDXL
Loading 1 new model
ERROR diffusion_model.output_blocks.1.1.transformer_blocks.9.ff.net.0.proj.weight CUDA out of memory. Tried to allocate 50.00 MiB. GPU 0 has a total capacity of 7.93 GiB of which 17.62 MiB is free. Including non-PyTorch memory, this process has 7.91 GiB memory in use. Of the allocated memory 7.53 GiB is allocated by PyTorch, and 306.44 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
Traceback (most recent call last):
  File "/home/jared/Fooocus/modules/async_worker.py", line 913, in worker
    handler(task)
  File "/home/jared/anaconda3/envs/fooocus/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/jared/anaconda3/envs/fooocus/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/jared/Fooocus/modules/async_worker.py", line 816, in handler
    imgs = pipeline.process_diffusion(
  File "/home/jared/anaconda3/envs/fooocus/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/jared/anaconda3/envs/fooocus/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/jared/Fooocus/modules/default_pipeline.py", line 362, in process_diffusion
    sampled_latent = core.ksampler(
  File "/home/jared/anaconda3/envs/fooocus/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/jared/anaconda3/envs/fooocus/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/jared/Fooocus/modules/core.py", line 308, in ksampler
    samples = ldm_patched.modules.sample.sample(model,
  File "/home/jared/Fooocus/ldm_patched/modules/sample.py", line 93, in sample
    real_model, positive_copy, negative_copy, noise_mask, models = prepare_sampling(model, noise.shape, positive, negative, noise_mask)
  File "/home/jared/Fooocus/ldm_patched/modules/sample.py", line 86, in prepare_sampling
    ldm_patched.modules.model_management.load_models_gpu([model] + models, model.memory_required([noise_shape[0] * 2] + list(noise_shape[1:])) + inference_memory)
  File "/home/jared/Fooocus/modules/patch.py", line 447, in patched_load_models_gpu
    y = ldm_patched.modules.model_management.load_models_gpu_origin(*args, **kwargs)
  File "/home/jared/Fooocus/ldm_patched/modules/model_management.py", line 437, in load_models_gpu
    cur_loaded_model = loaded_model.model_load(lowvram_model_memory)
  File "/home/jared/Fooocus/ldm_patched/modules/model_management.py", line 304, in model_load
    raise e
  File "/home/jared/Fooocus/ldm_patched/modules/model_management.py", line 300, in model_load
    self.real_model = self.model.patch_model(device_to=patch_model_to) #TODO: do something with loras and offloading to CPU
  File "/home/jared/Fooocus/ldm_patched/modules/model_patcher.py", line 199, in patch_model
    temp_weight = ldm_patched.modules.model_management.cast_to_device(weight, device_to, torch.float32, copy=True)
  File "/home/jared/Fooocus/ldm_patched/modules/model_management.py", line 615, in cast_to_device
    return tensor.to(device, copy=copy, non_blocking=non_blocking).to(dtype, non_blocking=non_blocking)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 26.00 MiB. GPU 0 has a total capacity of 7.93 GiB of which 17.62 MiB is free. Including non-PyTorch memory, this process has 7.91 GiB memory in use. Of the allocated memory 7.53 GiB is allocated by PyTorch, and 306.79 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
Total time: 16.25 seconds
nvidia-smi shows this:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.147.05   Driver Version: 525.147.05   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla M60           Off  | 00000000:0B:00.0 Off |                  Off |
| N/A   43C    P0    39W / 150W |   8105MiB /  8192MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1658      C   python                           8102MiB |
+-----------------------------------------------------------------------------+

@jhemley
Author

jhemley commented Mar 31, 2024

But I still don't understand why it works on my 2080 Max-Q, because it only utilizes about 6785 MiB and runs just fine. Here's that report:
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 537.13                 Driver Version: 537.13       CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                     TCC/WDDM  | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf           Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 2080 ...  WDDM  | 00000000:01:00.0  On |                  N/A |
| N/A   52C    P0             81W /  80W  |   6785MiB /  8192MiB |     98%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

Already up-to-date
Update succeeded.
[System ARGV] ['Fooocus\entry_with_update.py']
Python 3.10.9 (tags/v3.10.9:1dd9be6, Dec 6 2022, 20:01:21) [MSC v.1934 64 bit (AMD64)]
Fooocus version: 2.3.1
[Cleanup] Attempting to delete content of temp dir C:\Users\hemle\AppData\Local\Temp\fooocus
[Cleanup] Cleanup successful
Total VRAM 8192 MB, total RAM 65397 MB
Set vram state to: NORMAL_VRAM
Always offload VRAM
Device: cuda:0 NVIDIA GeForce RTX 2080 with Max-Q Design : native
VAE dtype: torch.float32
Using pytorch cross attention
Refiner unloaded.
Running on local URL: http://127.0.0.1:7865

To create a public link, set share=True in launch().
model_type EPS
UNet ADM Dimension 2816
Using pytorch attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using pytorch attention in VAE
extra {'cond_stage_model.clip_g.transformer.text_model.embeddings.position_ids', 'cond_stage_model.clip_l.logit_scale', 'cond_stage_model.clip_l.text_projection'}
Base model loaded: C:\Users\hemle\Downloads\Fooocus_win64_2-1-831\Fooocus\models\checkpoints\juggernautXL_v8Rundiffusion.safetensors
Request to load LoRAs [['sd_xl_offset_example-lora_1.0.safetensors', 0.1], ['None', 1.0], ['None', 1.0], ['None', 1.0], ['None', 1.0]] for model [C:\Users\hemle\Downloads\Fooocus_win64_2-1-831\Fooocus\models\checkpoints\juggernautXL_v8Rundiffusion.safetensors].
Loaded LoRA [C:\Users\hemle\Downloads\Fooocus_win64_2-1-831\Fooocus\models\loras\sd_xl_offset_example-lora_1.0.safetensors] for UNet [C:\Users\hemle\Downloads\Fooocus_win64_2-1-831\Fooocus\models\checkpoints\juggernautXL_v8Rundiffusion.safetensors] with 788 keys at weight 0.1.
Fooocus V2 Expansion: Vocab with 642 words.
Fooocus Expansion engine loaded for cuda:0, use_fp16 = True.
Requested to load SDXLClipModel
Requested to load GPT2LMHeadModel
Loading 2 new models
[Fooocus Model Management] Moving model(s) has taken 0.62 seconds
Started worker with PID 12820
App started successful. Use the app with http://127.0.0.1:7865/ or 127.0.0.1:7865
[Parameters] Adaptive CFG = 7
[Parameters] Sharpness = 2
[Parameters] ControlNet Softness = 0.25
[Parameters] ADM Scale = 1.5 : 0.8 : 0.3
[Parameters] CFG = 4.0
[Parameters] Seed = 3296201712917260942
[Parameters] Sampler = dpmpp_2m_sde_gpu - karras
[Parameters] Steps = 30 - 15
[Fooocus] Initializing ...
[Fooocus] Loading models ...
Refiner unloaded.
[Fooocus] Processing prompts ...
[Fooocus] Preparing Fooocus text #1 ...
[Prompt Expansion] make a car, cinematic, dynamic, dramatic ambient light, detailed, intricate, elegant, highly saturated colors, strong, epic, stunning, heroic, amazing detail, creative, positive, attractive, cute, beautiful, confident, inspired, pretty, perfect, coherent, trendy, best, awesome, futuristic, cool, inspirational, vibrant, loving, full, color, complex
[Fooocus] Preparing Fooocus text #2 ...
[Prompt Expansion] make a car, colorful, vivid, detailed, breathtaking, beautiful, emotional, shiny, shining, highly detail, amazing, flowing, light, complex, color, surreal, ambient, pristine, dynamic, symmetry, sharp focus, epic, fine, very strong, winning, perfect, artistic, innocent, confident, attractive, incredible, creative, positive, unique, loving
[Fooocus] Encoding positive #1 ...
[Fooocus Model Management] Moving model(s) has taken 0.15 seconds
[Fooocus] Encoding positive #2 ...
[Fooocus] Encoding negative #1 ...
[Fooocus] Encoding negative #2 ...
[Parameters] Denoising Strength = 1.0
[Parameters] Initial Latent shape: Image Space (896, 1152)
Preparation time: 3.90 seconds
[Sampler] refiner_swap_method = joint
[Sampler] sigma_min = 0.0291671771556139, sigma_max = 14.614643096923828
Requested to load SDXL
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 2.06 seconds
100%|██████████████████████████████████████████████████████████████████████████████████| 30/30 [00:23<00:00, 1.30it/s]
Requested to load AutoencoderKL
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 0.20 seconds
Image generated with private log at: C:\Users\hemle\Downloads\Fooocus_win64_2-1-831\Fooocus\outputs\2024-03-31\log.html
Generating and saving time: 27.49 seconds
[Sampler] refiner_swap_method = joint
[Sampler] sigma_min = 0.0291671771556139, sigma_max = 14.614643096923828
Requested to load SDXL
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 1.37 seconds
100%|██████████████████████████████████████████████████████████████████████████████████| 30/30 [00:23<00:00, 1.30it/s]
Requested to load AutoencoderKL
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 0.18 seconds
Image generated with private log at: C:\Users\hemle\Downloads\Fooocus_win64_2-1-831\Fooocus\outputs\2024-03-31\log.html
Generating and saving time: 26.77 seconds
Requested to load SDXLClipModel
Requested to load GPT2LMHeadModel
Loading 2 new models
Total time: 63.45 seconds
[Fooocus Model Management] Moving model(s) has taken 0.59 seconds
[Parameters] Adaptive CFG = 7
[Parameters] Sharpness = 2
[Parameters] ControlNet Softness = 0.25
[Parameters] ADM Scale = 1.5 : 0.8 : 0.3
[Parameters] CFG = 4.0
[Parameters] Seed = 208600173302938237
[Parameters] Sampler = dpmpp_2m_sde_gpu - karras
[Parameters] Steps = 30 - 15
[Fooocus] Initializing ...
[Fooocus] Loading models ...
Refiner unloaded.
[Fooocus] Processing prompts ...
[Fooocus] Preparing Fooocus text #1 ...
[Prompt Expansion] make a car, cool color, perfect shiny deep background, sharp focus, intricate, elegant, highly detailed, dramatic light, professional still, dynamic composition, ambient atmosphere, vivid colors, beautiful, epic, stunning, creative, cinematic, fine detail, full clear, great quality, attractive, cheerful, novel, romantic, scenic, rich, hopeful, cute, radiant, colorful
[Fooocus] Preparing Fooocus text #2 ...
[Prompt Expansion] make a car, colorful, shiny, vivid, detailed, amazing, flowing, infinite, light, color, epic, atmosphere, new, dynamic, ambient, cinematic, elegant, intricate, highly focused, creative, pure, artistic, romantic, sunny, beautiful, deep, unique, vibrant, coherent, colors, perfect, illuminated, pretty, clear, shining, flawless
[Fooocus] Encoding positive #1 ...
[Fooocus Model Management] Moving model(s) has taken 0.13 seconds
[Fooocus] Encoding positive #2 ...
[Fooocus] Encoding negative #1 ...
[Fooocus] Encoding negative #2 ...
[Parameters] Denoising Strength = 1.0
[Parameters] Initial Latent shape: Image Space (896, 1152)
Preparation time: 3.24 seconds
[Sampler] refiner_swap_method = joint
[Sampler] sigma_min = 0.0291671771556139, sigma_max = 14.614643096923828
Requested to load SDXL
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 2.20 seconds
100%|██████████████████████████████████████████████████████████████████████████████████| 30/30 [00:22<00:00, 1.33it/s]
Requested to load AutoencoderKL
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 0.23 seconds
Image generated with private log at: C:\Users\hemle\Downloads\Fooocus_win64_2-1-831\Fooocus\outputs\2024-03-31\log.html
Generating and saving time: 27.06 seconds
[Sampler] refiner_swap_method = joint
[Sampler] sigma_min = 0.0291671771556139, sigma_max = 14.614643096923828
Requested to load SDXL
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 1.36 seconds
100%|██████████████████████████████████████████████████████████████████████████████████| 30/30 [00:23<00:00, 1.30it/s]
Requested to load AutoencoderKL
Loading 1 new model
[Fooocus Model Management] Moving model(s) has taken 0.18 seconds
Image generated with private log at: C:\Users\hemle\Downloads\Fooocus_win64_2-1-831\Fooocus\outputs\2024-03-31\log.html
Generating and saving time: 26.80 seconds
Requested to load SDXLClipModel
Requested to load GPT2LMHeadModel
Loading 2 new models
Total time: 57.17 seconds
[Fooocus Model Management] Moving model(s) has taken 0.58 seconds

@jhemley
Author

jhemley commented Mar 31, 2024

It looks like I need to somehow shave ~100 MiB of VRAM off the program. Is there any way to run the GPT-2 part on the CPU?

@mashb1t
Collaborator

mashb1t commented Mar 31, 2024

In general yes, but please first check whether it works with the Fooocus V2 style disabled.

@jhemley
Author

jhemley commented Mar 31, 2024

OK, I'll try that.

@jhemley
Author

jhemley commented Mar 31, 2024

I disabled the Fooocus V2 style, but the same error still occurred.

@jhemley
Author

jhemley commented Mar 31, 2024

From my testing, I believe the Tesla drivers for the M60 and P4 limit the maximum usable VRAM to 8094 MiB instead of 8192. You can check what the driver reports with the query below.
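For example (a sketch using nvidia-smi's documented --query-gpu flags; see nvidia-smi --help-query-gpu for the field names):

nvidia-smi --query-gpu=name,memory.total,memory.used,ecc.mode.current --format=csv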


@mashb1t
Collaborator

mashb1t commented Apr 1, 2024

So this issue can be closed as this is a driver issue with your cards?

@jhemley
Author

jhemley commented Apr 1, 2024

Is there a way to change the GPT-2 model to run on the CPU or on another GPU to limit VRAM usage? Also, it could still be a bug, I'm not sure, because the behavior is odd: it runs on my 2080 Max-Q without ever filling the GPU to more than 7 GiB, but on the Teslas it initially tries to fill the VRAM to 8 GiB, which fails, as I believe they are limited to 8100 MiB.

@mashb1t
Collaborator

mashb1t commented Apr 1, 2024

You can force it to be on CPU by setting

load_device = model_management.text_encoder_device()

to torch.device("cpu"), or by adding a line in

def text_encoder_device():
    if args.always_gpu:
        return get_torch_device()
    elif vram_state == VRAMState.HIGH_VRAM or vram_state == VRAMState.NORMAL_VRAM:
        if is_intel_xpu():
            return torch.device("cpu")
        if should_use_fp16(prioritize_performance=False):
            return get_torch_device()
        else:
            return torch.device("cpu")
    else:
        return torch.device("cpu")

to always return torch.device("cpu").
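The bluntest version of that change would be (a sketch, editing ldm_patched/modules/model_management.py directly):

def text_encoder_device():
    # Force text encoders (CLIP and the GPT-2 prompt expansion model) onto
    # system RAM so their weights never compete with the UNet for VRAM.
    return torch.device("cpu")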

But keep in mind that prompt expansion is only used when setting style Fooocus V2, so this might not be the right place to begin with.

@jhemley
Author

jhemley commented Apr 1, 2024

> But keep in mind that prompt expansion is only used when setting style Fooocus V2, so this might not be the right place to begin with.

This would only make the text model run on the CPU, not the image model, correct?

@mashb1t
Collaborator

mashb1t commented Apr 1, 2024

yes

@jhemley
Author

jhemley commented Apr 1, 2024

I tried that, but it didn't really work. Now that you are aware of this problem, are there any plans to trim the VRAM requirements by about 200 MiB so it can run on 8 GB Tesla GPUs?

@mashb1t
Collaborator

mashb1t commented Apr 1, 2024

No plans for in-depth testing on P4 and M60 cards; Fooocus works on 4 GB VRAM, so this must be an issue with your driver reporting wrong numbers.

@mashb1t mashb1t closed this as not planned Apr 1, 2024
@jhemley
Author

jhemley commented Apr 1, 2024

Yeah, that sucks. But I did just buy a Tesla M40, which has 24 GB of VRAM, so hopefully that works. Last question: is there any possibility of adding multi-GPU support like Ollama has?

@mashb1t
Collaborator

mashb1t commented Apr 1, 2024

See #2292

What you can do instead is start multiple instances of Fooocus.
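A sketch of what that could look like on Linux, assuming Fooocus's --port argument and one GPU per instance pinned via CUDA_VISIBLE_DEVICES:

# Hypothetical two-instance setup; adjust ports and GPU indices as needed.
CUDA_VISIBLE_DEVICES=0 python entry_with_update.py --listen --port 7865 &
CUDA_VISIBLE_DEVICES=1 python entry_with_update.py --listen --port 7866 &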

@CultusMechanicus

This is a weird driver setting issue with P4s. By default, they run with ECC memory enabled. Disable ECC with "nvidia-smi -e 0"; that should release the full 8 GB of VRAM. See the commands below.
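For reference, a sketch of the commands (nvidia-smi's documented -e/--ecc-config flag; an ECC mode change only takes effect after a reboot):

# Check the current ECC mode:
nvidia-smi -q -d ECC
# Disable ECC on GPU 0, then reboot to apply:
sudo nvidia-smi -i 0 -e 0
sudo reboot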
