Replies: 2 comments
-
It's a 12B-parameter model. An H100 doesn't hit the issue because of its dispatch layer.
-
Not sure why SD3 would behave like that; it's only 2B. You should see per-step time grow sublinearly with batch size while per-step throughput grows linearly. On Flux with a 4090, a batch size of 1 at 1024px takes about 3.5 seconds per step, while a batch size of 2 takes about 6.5 seconds. That's less than 2x the time for 2x the throughput. Maybe you're looking at the runtime estimate in the progress bar? Or are you setting a max step count and then expecting the estimate to reflect full epochs?
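For reference, here is a minimal, self-contained sketch of that kind of measurement. The model is a toy transformer standing in for the actual diffusion transformer (none of this is the repo's training loop); it times full forward/backward/optimizer steps and prints seconds per step and samples per second for batch sizes 1 and 2.

```python
import time
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.bfloat16 if device == "cuda" else torch.float32

# Toy stand-in model: a small transformer, not the real diffusion model.
model = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=1024, nhead=16, dim_feedforward=4096,
                               batch_first=True, device=device, dtype=dtype),
    num_layers=8,
)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

def time_steps(batch_size, steps=10, seq_len=1024):
    # Random inputs/targets so no data pipeline is involved in the timing.
    x = torch.randn(batch_size, seq_len, 1024, device=device, dtype=dtype)
    target = torch.randn_like(x)

    def step():
        optimizer.zero_grad(set_to_none=True)
        loss = (model(x) - target).pow(2).mean()
        loss.backward()
        optimizer.step()

    step()  # warm-up so one-time kernel selection is excluded from timing
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(steps):
        step()
    if device == "cuda":
        torch.cuda.synchronize()
    per_step = (time.perf_counter() - start) / steps
    print(f"bs={batch_size}: {per_step:.3f} s/step, "
          f"{batch_size / per_step:.2f} samples/s")

for bs in (1, 2):
    time_steps(bs)
```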
-
I have been using this repo for a while, but here is a problem I have seen from day 1: no matter what model I use (SD3/FLUX), what training I do (LoRA, full model, or some of my customized architectures), what GPUs I use (A6000s, A100s), or what precision (mostly bf16, but I tested fp16 too), training time always scales linearly with batch size. I tried to debug this but have gotten nowhere so far. Is this a common issue?
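One way to narrow this down (a hedged sketch, not the repo's code; `dataloader` and `train_step` here are placeholders for whatever the training script actually runs) is to time the data fetch and the GPU step separately. If the fetch share grows with batch size while the GPU step stays sublinear, the bottleneck is the input pipeline rather than the model.

```python
import time
import torch

def profile_iters(dataloader, train_step, max_iters=20, device="cuda"):
    """Split each iteration into data-fetch time and GPU-step time.

    `dataloader` yields batches and `train_step(batch)` is assumed to run
    forward/backward/optimizer for one batch; both are placeholders.
    """
    fetch_total, step_total = 0.0, 0.0
    it = iter(dataloader)
    for _ in range(max_iters):
        t0 = time.perf_counter()
        batch = next(it)              # CPU-side fetch: decode, augment, collate
        t1 = time.perf_counter()
        train_step(batch)             # forward / backward / optimizer update
        if device == "cuda":
            torch.cuda.synchronize()  # wait for queued GPU work to finish
        t2 = time.perf_counter()
        fetch_total += t1 - t0
        step_total += t2 - t1
    print(f"data fetch: {fetch_total / max_iters:.3f} s/iter, "
          f"GPU step:   {step_total / max_iters:.3f} s/iter")
```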