
[fix] Preserve <eos> token and in-place it after trimming #401

Merged · 2 commits into main from fix-preserve-eos-token · Mar 29, 2023

Conversation

maxreciprocate (Collaborator)

This PR fixes #399 and #373 by keeping the <|endoftext|> token when the model outputs it (previously it was treated as if it were one of stop_sequences), and by replacing whatever is trimmed after-and-including a stop sequence with <|endoftext|> instead of simply discarding it. As a result, PPO's make_experience never has to filter out empty outputs, which in the worst case could get stuck and time out.

https://wandb.ai/sorry/trlx-references/reports/fix-preserve-eos-token-v-main--VmlldzozOTE1MDU2
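
For illustration, here is a minimal sketch (not the actual trlx code; the function name and signature are hypothetical) of the trimming rule described above: a detected stop sequence and everything after it is replaced with the eos token, while an eos emitted by the model itself is left in place, so the trimmed output is never empty.

```python
# Hypothetical sketch of the trimming behaviour described in this PR.
def trim_output(text: str, stop_sequences: list[str], eos_token: str = "<|endoftext|>") -> str:
    for stop in stop_sequences:
        idx = text.find(stop)
        if idx != -1:
            # Put eos in place of the trimmed tail instead of discarding it,
            # so the result is never an empty string.
            return text[:idx] + eos_token
    return text  # an eos output by the model itself is preserved as-is

# Even an output that starts with a stop sequence stays non-empty:
assert trim_output("\nUser: hi", ["\nUser:"]) == "<|endoftext|>"
assert trim_output("42<|endoftext|>", ["\nUser:"]) == "42<|endoftext|>"
```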

@jon-tow (Collaborator) left a comment:

The dreaded filter is gone, hehe. Great work! I left a small nit, but feel free to ignore it and merge whenever you'd like.

(Also, first time seeing your benchmark util in practice. Love it 👍)

```diff
@@ -445,7 +445,7 @@ def make_experience(self, num_rollouts: int = 1024, iter_count: int = 0):  # noqa:
 sample_outputs = sample_outputs.cpu()
 values = values.cpu()[:, :-1]

-ends = start + attention_mask[:, start:].sum(1)
+ends = start + attention_mask[:, start:].sum(1) + 1
```
@jon-tow (Collaborator) · Mar 28, 2023:
Nit: maybe add a comment saying something like "add 1 to account for the appended eos token", since the decode call is a few blocks up.
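
For context, a toy illustration (hypothetical tensor values, not the actual make_experience code) of what the + 1 in the diff accounts for: once an eos token is appended to each sampled output, the end index derived from the attention mask has to move one position further so the slice includes that appended token.

```python
import torch

# Hypothetical example: one sample whose prompt occupies positions 0-1 and whose
# response occupies positions 2-3; positions 4-5 are padding.
attention_mask = torch.tensor([[1, 1, 1, 1, 0, 0]])
start = 2  # index of the first response token

ends_old = start + attention_mask[:, start:].sum(1)      # tensor([4]) -> slice stops before the appended eos
ends_new = start + attention_mask[:, start:].sum(1) + 1  # tensor([5]) -> slice includes the appended eos
print(ends_old, ends_new)
```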

maxreciprocate merged commit a369dad into main on Mar 29, 2023.
maxreciprocate deleted the fix-preserve-eos-token branch on March 29, 2023 at 18:50.
Successfully merging this pull request may close these issues.

PPO training stuck on make_experience