Is training FLUX with NF4 possible? #798
Replies: 6 comments 1 reply
-
We rely on third-party libraries to do the quantisation because of how complex this gets during training, so you'll have to ask the optimum-quanto devs about that. Also, it already trains on consumer GPUs.
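On the consumer-GPU point, here is a back-of-the-envelope memory estimate for the weights alone. The numbers are assumptions for illustration, not measurements: the commonly cited ~12B-parameter FLUX.1 transformer, a 64-element quantisation block size, and one fp32 absmax scale per block (real NF4 in bitsandbytes can also double-quantize the scales, which shrinks them further).

```python
# Back-of-the-envelope weight memory for a FLUX-sized model at two precisions.
# Assumptions: ~12e9 parameters, 64-element NF4 blocks, one fp32 scale per block.
params = 12e9

# fp16/bf16: 2 bytes per parameter.
fp16_gb = params * 2 / 2**30

# NF4: 4 bits (0.5 bytes) per parameter, plus one 4-byte absmax scale per 64-block.
nf4_gb = (params * 0.5 + (params / 64) * 4) / 2**30

print(f"fp16: {fp16_gb:.1f} GiB, NF4: {nf4_gb:.1f} GiB")
# → fp16: 22.4 GiB, NF4: 6.3 GiB
```

This only covers the frozen weights; optimizer state, activations, and gradients for the trainable parameters come on top, which is part of why quantized *training* is harder than quantized inference.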
-
Hi, thank you for your reply. I would like to ask if it is possible to use
-
Everything else the trainer depends on has to support it too; these are core dependencies like Accelerate, PEFT, and Diffusers.
-
I roughly understand what you mean.
-
Yes, that is inference-related. Trust me, if it worked here, sayak would open a pull request.
-
@sayakpaul is indeed working on this :)
-
I saw on Reddit that someone achieved significant memory savings and speed improvements using the NF4 version of FLUX on Forge.
The quality impact wasn't very significant.
I wanted to ask if introducing NF4 in training is feasible.
Perhaps it could enable FLUX training on consumer-grade GPUs.
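For readers unfamiliar with NF4: roughly, it stores each weight as a 4-bit index into a fixed 16-level codebook derived from the normal distribution, plus one absmax scale per block of weights. A stdlib-only sketch of that idea follows. Note the codebook here is built from Gaussian quantiles purely for illustration — it is NOT the exact NF4 table that bitsandbytes ships, which is asymmetric and reserves an exact zero level.

```python
# Illustrative sketch of 4-bit "NormalFloat"-style blockwise quantization,
# in the spirit of NF4 (QLoRA). Not the exact bitsandbytes codebook.
from statistics import NormalDist

def nf4_style_codebook(bits=4):
    """Build a symmetric codebook from quantiles of N(0, 1), normalised to [-1, 1]."""
    n = 2 ** bits                      # 16 levels for 4 bits
    nd = NormalDist()
    # Evenly spaced probabilities, offset by 0.5/n to avoid inv_cdf(0) = -inf.
    qs = [nd.inv_cdf((i + 0.5) / n) for i in range(n)]
    scale = max(abs(q) for q in qs)
    return [q / scale for q in qs]     # extremes map to -1 and +1

def quantize_block(values, codebook):
    """Absmax-scale a block into [-1, 1], then snap each value to the nearest level."""
    absmax = max(abs(v) for v in values) or 1.0
    idxs = []
    for v in values:
        x = v / absmax
        idxs.append(min(range(len(codebook)), key=lambda i: abs(codebook[i] - x)))
    return idxs, absmax                # 4-bit indices + one float scale per block

def dequantize_block(idxs, absmax, codebook):
    """Recover approximate weights: codebook lookup, then rescale by the absmax."""
    return [codebook[i] * absmax for i in idxs]

codebook = nf4_style_codebook()
block = [0.12, -0.5, 0.33, 0.0, 0.9, -0.07, 0.41, -0.88]
idxs, scale = quantize_block(block, codebook)
recon = dequantize_block(idxs, scale, codebook)
err = max(abs(a - b) for a, b in zip(block, recon))
print(idxs, round(err, 3))
```

This also hints at why training is harder than inference: the snap-to-nearest-level step is not differentiable, so frameworks like bitsandbytes keep the quantized weights frozen and train small adapters (QLoRA-style) on top instead of backpropagating through them.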