I have a question about the loss function used in both training scripts for Diffusion-DPO.
In `train_diffusion_dpo.py` (line 863) and `train_diffusion_dpo_sdxl.py` (line 978), the loss is computed as `loss = -1 * F.logsigmoid(inside_term.mean())`.
From my understanding, this mean is the empirical average over the batch, approximating the expectation in formula (14) of the paper. Since that expectation sits outside the log-sigmoid, and log-sigmoid is nonlinear (so `logsigmoid(mean(x)) != mean(logsigmoid(x))` in general), I think the loss should instead be `loss = -1 * F.logsigmoid(inside_term).mean()`.
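To illustrate the difference, here is a minimal sketch (the name `inside_term` is taken from the scripts; the batch values are made-up for demonstration):

```python
import torch
import torch.nn.functional as F

# Hypothetical per-example values of inside_term for a batch of 4 pairs.
inside_term = torch.tensor([2.0, -1.5, 0.5, -3.0])

# Current implementation: average over the batch first, then log-sigmoid.
loss_current = -1 * F.logsigmoid(inside_term.mean())

# Proposed fix: log-sigmoid per example, then average, so the empirical
# expectation stays outside the nonlinearity as in formula (14).
loss_proposed = -1 * F.logsigmoid(inside_term).mean()

print(loss_current.item(), loss_proposed.item())
# The two differ whenever inside_term is not constant across the batch.
# Since log-sigmoid is concave, Jensen's inequality gives
# logsigmoid(mean(x)) >= mean(logsigmoid(x)),
# so loss_current <= loss_proposed for any batch.
```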
Also, may I know whether there is anything else I need to modify, beyond the training configuration, to reproduce the results in the paper (for example, the 28.16 HPSv2 reward reported for the DPO-SDXL HPSv2 generations)? Thank you very much!