Integrate f-divergence to DPO #1339
Closed
1485840691 wants to merge 7 commits into huggingface:main from
Conversation
Does it make sense to explore a similar change to the KTO loss, to allow trading off alignment for diversity there?
Contributor
Author
@kmn1024 I'm not sure whether the divergence function works for KTO, since the KTO loss, in my understanding, is closer to a point-wise loss, while this divergence function is applied in DPO as part of a pair-wise loss. cc @kashif @younesbelkada for their comments on this. As for this PR, I will speed up and get the remaining tests complete by the end of this week or early next week.
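To make the pair-wise structure concrete, here is a minimal sketch of how the f-divergence choice changes the DPO logits. This is an illustration of the technique, not the exact code in this PR; the function name and tensor arguments are hypothetical.

```python
import torch
import torch.nn.functional as F

def f_dpo_logits(chosen_logratio, rejected_logratio, divergence="reverse_kl", alpha=0.5):
    """Pair-wise DPO logits under different f-divergences.

    Both arguments are log(pi_theta / pi_ref) for the chosen and rejected
    completions. "reverse_kl" recovers the standard DPO logits.
    """
    if divergence == "alpha_divergence":
        # Alpha-divergence generator (1 - u^-alpha) / alpha; the chosen-minus-
        # rejected difference is (u_l^-alpha - u_w^-alpha) / alpha, computed
        # from the log-ratios via exp(-alpha * log u) = u^-alpha.
        return (torch.exp(-alpha * rejected_logratio)
                - torch.exp(-alpha * chosen_logratio)) / alpha
    logits = chosen_logratio - rejected_logratio
    if divergence == "js_divergence":
        # JS term log(2u / (1 + u)): since log(1 + u) = softplus(log u),
        # it subtracts a softplus of each log-ratio from the KL logits.
        logits = logits - (F.softplus(chosen_logratio) - F.softplus(rejected_logratio))
    return logits
```

A point-wise loss like KTO scores each completion independently, so the chosen-minus-rejected difference above has no direct analogue there, which is why the question isn't trivial.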
Contributor
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Related issue: #1259
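For reference, the generalized objective this PR targets (a sketch following the f-DPO formulation, where $f$ defines $D_f(\pi_\theta \,\|\, \pi_{\mathrm{ref}})$ and $f'$ is its derivative):

$$\mathcal{L}_{f\text{-DPO}} = -\,\mathbb{E}_{(x,\, y_w,\, y_l)}\left[\log \sigma\!\left(\beta\, f'\!\left(\frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}\right) - \beta\, f'\!\left(\frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right)\right]$$

With reverse KL, $f(u) = u \log u$ gives $f'(u) = 1 + \log u$; the constants cancel in the difference, recovering standard DPO.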
reverse-kl (current default)


command: python examples/scripts/dpo.py --output_dir=dpo_anthropic_hh --model_name_or_path=gpt2 --per_device_train_batch_size 4 --max_steps 1000 --learning_rate 1e-5 --gradient_accumulation_steps 1 --logging_steps 10 --eval_steps 500 --warmup_steps 150 --report_to wandb --logging_first_step --no_remove_unused_columns
alpha-divergence w/ alpha=0.5
command: python examples/scripts/dpo.py --output_dir=dpo_anthropic_hh --model_name_or_path=gpt2 --per_device_train_batch_size 4 --max_steps 1000 --learning_rate 1e-5 --gradient_accumulation_steps 1 --logging_steps 10 --eval_steps 500 --warmup_steps 150 --report_to wandb --logging_first_step --no_remove_unused_columns --f_divergence_type alpha_divergence --f_alpha_divergence_coef 0.5
https://wandb.ai/open_source/huggingface/runs/b943bky2?workspace=user-1485840691





js-divergence
command: python examples/scripts/dpo.py --output_dir=dpo_anthropic_hh --model_name_or_path=gpt2 --per_device_train_batch_size 4 --max_steps 1000 --learning_rate 1e-5 --gradient_accumulation_steps 1 --logging_steps 10 --eval_steps 500 --warmup_steps 150 --report_to wandb --logging_first_step --no_remove_unused_columns --f_divergence_type js_divergence
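The three runs above differ only in the logits term; reusing the `f_dpo_logits` sketch from earlier in this thread, the loss they all optimize looks like the following (dummy tensors, and β = 0.1 assumed as the script default):

```python
import torch
import torch.nn.functional as F

# Dummy log(pi_theta / pi_ref) values for a batch of 4 preference pairs.
chosen_logratio = torch.randn(4)
rejected_logratio = torch.randn(4)

beta = 0.1  # assumed default of examples/scripts/dpo.py; not set by the runs above
logits = f_dpo_logits(chosen_logratio, rejected_logratio, divergence="js_divergence")
loss = -F.logsigmoid(beta * logits).mean()
```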