I have a question about the loss function used in both training scripts for Diffusion-DPO.
In `train_diffusion_dpo.py` (line 863) and `train_diffusion_dpo_sdxl.py` (line 978), the loss is computed as `loss = -1 * F.logsigmoid(inside_term.mean())`.
From my understanding, this mean is the empirical average over the batch, approximating the expectation in formula (14) of the paper. Since that expectation sits outside the log-sigmoid, and log-sigmoid is nonlinear (so `logsigmoid(mean(x)) != mean(logsigmoid(x))` in general), I think the loss should instead be `loss = -1 * F.logsigmoid(inside_term).mean()`.
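To illustrate the difference, here is a minimal sketch (the name `inside_term` is taken from the scripts; the batch values are made-up for demonstration):

```python
import torch
import torch.nn.functional as F

# Hypothetical per-example values of inside_term for a batch of 4 pairs.
inside_term = torch.tensor([2.0, -1.5, 0.5, -3.0])

# Current implementation: average over the batch first, then log-sigmoid.
loss_current = -1 * F.logsigmoid(inside_term.mean())

# Proposed fix: log-sigmoid per example, then average, so the empirical
# expectation stays outside the nonlinearity as in formula (14).
loss_proposed = -1 * F.logsigmoid(inside_term).mean()

print(loss_current.item(), loss_proposed.item())
# The two differ whenever inside_term is not constant across the batch.
# Since log-sigmoid is concave, Jensen's inequality gives
# logsigmoid(mean(x)) >= mean(logsigmoid(x)),
# so loss_current <= loss_proposed for any batch.
```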
Also, may I know whether there is anything else I need to modify, beyond the training configuration, to reproduce the results in the paper (for example, the 28.16 HPSv2 reward reported for the DPO-SDXL HPSv2 generations)? Thank you very much!