
Support Chunked DPO Loss Kernel #378

Merged: 6 commits into linkedin:main on Nov 15, 2024

Conversation

@austin362667 (Contributor) commented Nov 13, 2024

Summary

Add support for a fused, torch-compiled, and chunked DPO (Direct Preference Optimization) loss kernel, as requested in #371.
This implementation is largely based on the excellent work done on ORPO (#362) by @shivam15s.

DPO Loss Formulation

In a reference-free setting, the implicit reward difference reduces to the policy log-probabilities:

$$r_\theta(x,y_c) - r_\theta(x,y_r) = \log(\pi_\theta(y_c|x)) - \log(\pi_\theta(y_r|x))$$

With a reference model, the DPO loss is:

$$-\log\sigma\Big(\beta\big(\log\pi_\theta(y_c|x) - \log\pi_\theta(y_r|x) - \log\pi_{\theta_{\text{ref}}}(y_c|x) + \log\pi_{\theta_{\text{ref}}}(y_r|x)\big)\Big)$$

This corresponds to:

import torch.nn.functional as F

# log_probs() stands in for summing the per-token log-probabilities of the
# target sequence under the given logits.

# Policy model log probabilities
policy_chosen_logps = log_probs(policy_chosen_logits)
policy_rejected_logps = log_probs(policy_rejected_logits)

# Reference model log probabilities
ref_chosen_logps = log_probs(ref_chosen_logits)
ref_rejected_logps = log_probs(ref_rejected_logits)

# Advantages of the policy over the reference model
chosen_advantages = policy_chosen_logps - ref_chosen_logps
rejected_advantages = policy_rejected_logps - ref_rejected_logps

# policy_chosen_logps - ref_chosen_logps - policy_rejected_logps + ref_rejected_logps
logits_diff = (chosen_advantages - rejected_advantages) * beta

# DPO loss
losses = -F.logsigmoid(logits_diff)
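
For intuition, here is a minimal sketch of the chunking idea (hypothetical names and shapes; the actual kernel additionally fuses the lm_head projection into the chunk step and wraps it with torch.compile, recomputing activations in backward):

import torch
import torch.nn.functional as F

def seq_logps(hidden, weight, labels):
    # Project one chunk of hidden states through the LM head and sum the
    # per-token log-probabilities of the target sequence.
    logits = hidden @ weight.t()  # (chunk, T, V)
    logps = logits.log_softmax(dim=-1)
    return logps.gather(-1, labels.unsqueeze(-1)).squeeze(-1).sum(dim=-1)

def chunked_dpo_loss(chosen_h, rejected_h, weight,
                     chosen_labels, rejected_labels,
                     ref_chosen_logps, ref_rejected_logps,
                     beta=0.1, chunk=8):
    losses = []
    for i in range(0, chosen_h.shape[0], chunk):
        s = slice(i, i + chunk)
        # Full-vocab logits exist only for the current chunk, bounding
        # peak memory at (chunk, T, V) instead of (B, T, V).
        c = seq_logps(chosen_h[s], weight, chosen_labels[s])
        r = seq_logps(rejected_h[s], weight, rejected_labels[s])
        diff = (c - ref_chosen_logps[s]) - (r - ref_rejected_logps[s])
        losses.append(-F.logsigmoid(beta * diff))
    return torch.cat(losses).mean()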

Testing Done

[Benchmark figures: dpo_loss_memory and dpo_loss_speed]

  • Hardware Type: NVIDIA L40S (48G)
  • run make test to ensure correctness
  • run make checkstyle to ensure code style
  • run make test-convergence to ensure convergence

Review thread (on the benchmark script imports):

run_benchmarks,
)

from liger_kernel.alignment.dpo_loss import HF_DPO_Loss, LigerFusedLinearDPOFunction
@austin362667 (Contributor, Author) · Nov 13, 2024

Should I use the HF DPO implementation here in the benchmark for the sake of function reusability, or write another naive implementation in pure torch?

A collaborator replied:

HF DPO should be fine
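
For context, the "naive impl in pure torch" alternative would only be a few lines; a sketch for illustration (the PR benchmarks against HF_DPO_Loss instead):

import torch.nn.functional as F

def naive_dpo_loss(policy_chosen_logps, policy_rejected_logps,
                   ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Advantage of the policy over the reference model, per example
    chosen_adv = policy_chosen_logps - ref_chosen_logps
    rejected_adv = policy_rejected_logps - ref_rejected_logps
    # DPO loss: -log sigmoid(beta * preference margin)
    return -F.logsigmoid(beta * (chosen_adv - rejected_adv)).mean()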

Review thread (on class HF_DPO_Loss):

return grad_input, grad_weight, None, grad_bias, None, None, None


class HF_DPO_Loss:
@austin362667 (Contributor, Author)

Should I move this HF implementation to test_dpo_loss.py?

A collaborator replied:

Yes, since the HF implementation is only for testing purposes.

@lancerts (Collaborator)

Can we modify

logits_diff = (chosen_logps - rejected_logps) / beta

to

logits_diff = (chosen_logps - rejected_logps) * beta

to align with the convention in the paper as well as the trl implementation here?
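
For reference, β enters the DPO loss inside the sigmoid as a multiplier (an inverse temperature on the preference margin), so * beta matches the paper and trl; a quick illustration with made-up values:

import torch
import torch.nn.functional as F

margin = torch.tensor(2.0)  # illustrative (chosen - rejected) margin
beta = 0.1
loss_mul = -F.logsigmoid(beta * margin)   # paper/trl convention
loss_div = -F.logsigmoid(margin / beta)   # previous code: amplifies the margin
# With small beta, multiplying flattens the loss toward log(2), while
# dividing sharpens it, inverting beta's intended role.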

@pramodith (Collaborator) left a comment:

Just an FYI: I think we should wait until @shivam15s pushes a generic/inheritable class that handles all the chunking and other repetitive logic common to different loss functions before pushing new loss functions.

@shivam15s (Collaborator)

Great work @austin362667! The additional summing of the NLL loss is going to be useful for the IRPO loss as well :). I'll be creating a simple base class that adds the boilerplate code (backward/torch.compile logic) that you can inherit from, as @pramodith mentioned.
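
As a rough illustration of what such an inheritable base might look like (hypothetical class and method names, not the eventual API):

import torch

class ChunkedLossBase:
    # Hypothetical sketch: shared chunking and torch.compile boilerplate.
    # Subclasses implement chunk_loss(), mapping one chunk of tensors to
    # per-example losses; the base splits the inputs and compiles the step.

    def __init__(self, chunk_size=1024, compile_chunk=True):
        self.chunk_size = chunk_size
        self._step = torch.compile(self.chunk_loss) if compile_chunk else self.chunk_loss

    def chunk_loss(self, *chunk_tensors):
        raise NotImplementedError

    def __call__(self, *tensors):
        chunks = zip(*(t.split(self.chunk_size) for t in tensors))
        return torch.cat([self._step(*c) for c in chunks]).mean()

A DPO subclass would then only define chunk_loss over (policy, reference) log-prob chunks, and ORPO/IRPO variants could reuse the same scaffolding.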

@austin362667 marked this pull request as ready for review on November 14, 2024 07:56
@austin362667 (Contributor, Author)

Issues addressed. Thanks @Tcc0403, @lancerts, @pramodith, @shivam15s, and @ByronHsu for the review!

@Tcc0403 (Collaborator) commented Nov 14, 2024

I think we should make the chunked_loss functions nn.Module (like flce and fljsd) for users. Same for ORPO? cc @shivam15s @ByronHsu

@ByronHsu (Collaborator)

@Tcc0403 that is the plan!
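
A sketch of what such an nn.Module wrapper could look like, mirroring the flce-style modules (the argument order of LigerFusedLinearDPOFunction.apply below is assumed for illustration, not the actual signature):

import torch.nn as nn
from liger_kernel.alignment.dpo_loss import LigerFusedLinearDPOFunction

class LigerFusedLinearDPOLoss(nn.Module):
    # Hypothetical wrapper: holds the hyperparameters and forwards to the
    # autograd Function, the way the fused linear cross-entropy module does.

    def __init__(self, beta=0.1, compiled=True):
        super().__init__()
        self.beta = beta
        self.compiled = compiled

    def forward(self, lin_weight, _input, target, bias=None):
        # NOTE: argument order is assumed for illustration.
        return LigerFusedLinearDPOFunction.apply(
            _input, lin_weight, target, bias, self.beta, self.compiled
        )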

Commits (6 total) include "Fix benchmark script"; each is Signed-off-by: Austin Liu <[email protected]>.
@ByronHsu merged commit 1aa3d83 into linkedin:main on Nov 15, 2024 (1 of 3 checks passed).

@austin362667 mentioned this pull request on Nov 15, 2024.