feat: Importance sampling trick #174
Merged
30 commits
5883608 Importance sampling (yfw)
34d4d8b Docs (yfw)
927f968 No math* in latex (yfw)
d750676 More doc fix (yfw)
726356e Rename config to use_importance_sampling_correction (yfw)
ad69440 Add use_online_kl_approximation and assertions in test (yfw)
eb800b2 Docs (yfw)
a4489b5 Remove tag (yfw)
04732fd Capitalization (yfw)
2d47a43 Typo (yfw)
d28883c Merge remote-tracking branch 'origin' into yifu/importance_sampling (yfw)
cdfe7d6 on_policy (yfw)
060e662 Add use_on_policy_kl_approximation to config (yfw)
35052a7 Handle nan importance weights (yfw)
5144526 Detach kl importance weights (yfw)
495f259 Merge remote-tracking branch 'origin' into yifu/importance_sampling (yfw)
b80a3b1 ruff (yfw)
1e99327 Didn't commit by accident (yfw)
0932ee9 Fix docs (yfw)
4656f2e Missed one (yfw)
a15878c Merge remote-tracking branch 'origin' into yifu/importance_sampling (yfw)
b3a784b Merge branch 'main' into yifu/importance_sampling (yfw)
3146bd1 Update docs/guides/grpo.md (yfw)
24011aa Update docs/guides/grpo.md (yfw)
9bad18d Update docs/guides/grpo.md (yfw)
d2f682e Update docs/guides/grpo.md (yfw)
049ba18 Update docs/guides/grpo.md (yfw)
f468be7 Update examples/configs/grpo_math_1B.yaml (yfw)
12d33bc Update nemo_reinforcer/algorithms/loss_functions.py (yfw)
fc0f7bc Update nemo_reinforcer/algorithms/loss_functions.py (yfw)