Replies: 7 comments
-
It's probably running on CPU instead of GPU. Also, an RTX 3080 doesn't have enough VRAM to train a Flux LoRA without quantization; most RTX 3080 cards have either 10 GB or 12 GB. You would need int4 quantization to squeeze the model down to that size, or maybe wait and see if NF4 quantization support eventually lands here.
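For reference, a minimal sketch of what int8 weight quantization of the Flux transformer looks like using optimum-quanto. This is a generic illustration, not SimpleTuner's internals; the model id and the assumption that the weights are already cached locally are mine.

```python
# Sketch only: quantize the Flux transformer weights to int8 with optimum-quanto.
# qint4 / qfloat8 are also available for tighter or looser memory budgets.
import torch
from diffusers import FluxTransformer2DModel
from optimum.quanto import quantize, freeze, qint8

# Assumes the FLUX.1-dev weights are already in the local cache.
transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev", subfolder="transformer", torch_dtype=torch.bfloat16
)

quantize(transformer, weights=qint8)  # replace linear weights with int8 versions
freeze(transformer)                   # materialize the quantized weights
transformer.to("cuda")                # now needs far less VRAM than bf16
```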
-
Sorry, I put the wrong number: it is a 3090 with 24 GB, so normally OK for Flux. I also have a second, older NVIDIA card with 8 GB on the machine, but that should not affect things.
-
So this is a multi-GPU machine? That could also be causing these issues if it's trying to use both GPUs and the slower GPU is holding everything back. You could pin the run to a single GPU to rule that out (a rough sketch follows). And yes, the RTX 3090 is better, but you still need either fp8 or int8 quantization.
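One generic way to pin a run to a single card (not SimpleTuner-specific; the device index 0 for the 3090 is an assumption about this machine) is to hide the other GPU before anything initializes CUDA:

```python
# Hypothetical snippet: make only the 3090 visible so the older 8 GB card
# cannot be picked up. Must run before torch/CUDA is imported or initialized.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # assumed index of the 3090 on this machine

import torch
print(torch.cuda.device_count())        # should now report 1
print(torch.cuda.get_device_name(0))    # should name the RTX 3090
```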
-
And since you don't always have internet access, you should probably run in offline mode once the model files are cached locally (one possible approach is sketched below).
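A sketch of one way to handle intermittent internet access, assuming the standard Hugging Face caching mechanism; the repo id is an example and this is not necessarily what was originally suggested:

```python
# Pre-download the model while online, then force offline mode afterwards.
import os
from huggingface_hub import snapshot_download

# While online: pull everything into the local cache.
snapshot_download("black-forest-labs/FLUX.1-dev")

# Later, without internet: tell Hugging Face libraries to use only the local cache.
os.environ["HF_HUB_OFFLINE"] = "1"
```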
-
I'm not using wandb or TensorBoard, but maybe I will try TensorBoard if I manage to get past this embed pre-computation. It is only using one CPU core at 100% and not doing much on the GPU.
-
One CPU core at 100% is normal, but the GPU should be busy when pre-computing the text embeds, so there's probably something wrong with your system specifically. SimpleTuner is using CUDA 12.4, so the minimum Linux driver version is 550.54.14.
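A quick sanity check that the training environment actually sees the GPU, and which CUDA runtime PyTorch was built against:

```python
# Verify that torch can reach CUDA at all; if is_available() is False,
# everything silently falls back to CPU.
import torch

print(torch.cuda.is_available())        # must be True
print(torch.version.cuda)               # CUDA runtime torch was compiled with (e.g. 12.4)
print(torch.cuda.get_device_name(0))    # should report the RTX 3090
```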
-
OK, I'm on 555.58.02 and CUDA Version: 12.5. I made different tests. What I do not understand is that it seems to be happily moving the text encoders to the GPU, but then the GPU is not used.
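To check whether an encoder really lives on the GPU and whether a forward pass exercises it, something like the following can help. This is a hedged sketch: the CLIP model id stands in for whichever text encoder the trainer loads, and GPU utilisation in nvidia-smi should spike while the forward pass runs.

```python
# Confirm the text encoder's device and push a tiny forward pass through it.
import torch
from transformers import CLIPTextModel, CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14").to("cuda")

print(next(text_encoder.parameters()).device)   # should print cuda:0

tokens = tokenizer(["a test caption"], return_tensors="pt").to("cuda")
with torch.no_grad():
    out = text_encoder(**tokens)
print(out.last_hidden_state.shape)              # GPU should show activity during this call
```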
-
I'm on a machine where I do not often have internet access, and there are strange behaviors when I try to run locally. Hardware is a 3080.
At some point I had this error:
But now it is not doing it anymore, and I have not understood what I changed.
Now the "Pre-computing null embedding" step is extremely slow, but I get past it.
More importantly, "Initialize text embed pre-computation" is running at > 1000 s/it,
so it would take more than 160 days to complete!
Here is the log:
the conf.env is as follows:
and the multidatabackend is here