Hello, I have a quick question.

I know most RLHF implementations use a KL-divergence penalty, as this code does:

https://github.com/nebuly-ai/nebullvm/blob/aad1c09ce20946294df3ec83569bad9496f58d0e/apps/accelerate/chatllama/chatllama/rlhf/trainer.py#L871-L877

However, in the InstructGPT paper (link), they actually divide the RL policy by the SFT policy, which corresponds to `action_log_prob - old_action_log_probs` in your code.
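For reference, the relevant term in the InstructGPT objective (Eq. 2 of the paper, with the pretraining term omitted) is:

```math
\operatorname{objective}(\phi) = \mathbb{E}_{(x,y) \sim D_{\pi_{\phi}^{\mathrm{RL}}}} \left[ r_{\theta}(x, y) - \beta \, \log \frac{\pi_{\phi}^{\mathrm{RL}}(y \mid x)}{\pi^{\mathrm{SFT}}(y \mid x)} \right]
```

So the log-ratio is penalized, i.e. subtracted from the reward.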
I have also seen implementations that follow this convention, such as ColossalAI, but you seem to be doing the opposite. I understand that you negate the value because you are using an RL reward as a deep-learning loss. But even accounting for that, you are adding the KL term rather than subtracting it, which is still the opposite of `action_log_prob - old_action_log_probs`.
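To make the sign question concrete, here is a minimal sketch of the reward shaping I would expect from the paper. This is not your implementation; the function name, tensor names like `sft_log_probs`, and the `kl_coef` default are my own:

```python
import torch

def shape_rewards(
    rewards: torch.Tensor,           # (batch, seq_len) reward-model scores
    action_log_probs: torch.Tensor,  # (batch, seq_len) log pi_RL(a_t | s_t)
    sft_log_probs: torch.Tensor,     # (batch, seq_len) log pi_SFT(a_t | s_t)
    kl_coef: float = 0.1,            # beta in the InstructGPT objective
) -> torch.Tensor:
    # Per-token log-ratio log(pi_RL / pi_SFT); positive where the RL
    # policy has drifted toward tokens the SFT policy finds unlikely.
    kl = action_log_probs - sft_log_probs
    # InstructGPT convention: drift is penalized, i.e. reward - beta * KL.
    return rewards - kl_coef * kl
```

Flipping either the order of the subtraction or the sign of the penalty would reward divergence from the SFT policy instead of penalizing it, which is what your code appears to do.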
Do you have any special reason for this?
Thank you for reading :)
Hi @mountinyy, I think it's more coherent now, but we are still having some issues with RLHF.
We are digging deeper to find the root cause.
Thanks for the advice, you should already see the change in #306.