
[Diffusion DPO] Loss function and reproducing results in the paper #6702

@yigu1008

Description

Hi,

I have a question about the loss function used in both Diffusion DPO training scripts.

In train_diffusion_dpo.py (line 863) and train_diffusion_dpo_sdxl.py (line 978), the loss is computed as
loss = -1 * F.logsigmoid(inside_term.mean())
From my understanding, this mean is the empirical average over the batch that approximates the expectation in formula (14) of the paper.
[Screenshot of formula (14) from the Diffusion-DPO paper]
So I think the loss should instead be
loss = -1 * F.logsigmoid(inside_term).mean()
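
For what it's worth, here is a minimal sketch of how the two reductions differ (the inside_term values are made up for illustration; in the scripts this tensor is built from the model and reference losses):

```python
import torch
import torch.nn.functional as F

# Hypothetical per-example "inside term" for a batch of 4 preference pairs.
inside_term = torch.tensor([2.0, -1.0, 0.5, -3.0])

# Current implementation: average first, then apply log-sigmoid,
# i.e. -log sigma(mean(inside_term)) -- the expectation ends up inside the sigmoid.
loss_mean_inside = -F.logsigmoid(inside_term.mean())

# Form that matches the written objective: log-sigmoid per example, then average,
# i.e. -mean(log sigma(inside_term)) -- the expectation stays outside the sigmoid.
loss_mean_outside = -F.logsigmoid(inside_term).mean()

print(loss_mean_inside.item(), loss_mean_outside.item())
# Since log-sigmoid is concave, Jensen's inequality gives
# logsigmoid(mean(x)) >= mean(logsigmoid(x)), so the first loss is never larger
# than the second; they only coincide when inside_term is constant across the batch.
```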

Also, is there anything else I need to modify, beyond the training configuration, to reproduce the results in the paper, e.g. the 28.16 HPSv2 reward for the DPO-SDXL HPSv2 generations? Thank you very much!
