Prompt tokenization bugfix #4197
Conversation
Signed-off-by: Virginia Adams <[email protected]>
Only a tiny comment. Good job using the enum type.
```diff
@@ -71,8 +71,8 @@ def get_task_templates():
         "task_id_num": 0,
     }
     task_templates['task name B'] = {
-        "prompt_template": "<|VIRTUAL_PROMPT_0|>{question}<|VIRTUAL_PROMPT_1|>{answer}{extra}",
+        "prompt_template": "<|VIRTUAL_PROMPT_0|> {question} <|VIRTUAL_PROMPT_1|> {answer}{extra}",
         "prompt_template_fields": ['question', 'answer'],
```
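The template strings above are resolved by substituting each declared field into the prompt. A minimal sketch of how such a template could be filled (the helper name and the empty-string fallback for missing fields like `{extra}` are assumptions for illustration, not the PR's actual code):

```python
def fill_prompt_template(template: str, fields: list, values: dict) -> str:
    """Substitute each declared field into the prompt template.

    Fields missing from `values` (e.g. an optional {extra} field) are
    replaced with the empty string in this sketch.
    """
    filled = template
    for field in fields + ["extra"]:
        filled = filled.replace("{" + field + "}", values.get(field, ""))
    return filled


template = "<|VIRTUAL_PROMPT_0|>{question}<|VIRTUAL_PROMPT_1|>{answer}{extra}"
result = fill_prompt_template(
    template, ["question", "answer"], {"question": "Q?", "answer": "A."}
)
# result == "<|VIRTUAL_PROMPT_0|>Q?<|VIRTUAL_PROMPT_1|>A."
```

Note how any whitespace placed around the virtual-token placeholders in the template ends up verbatim in the final prompt string, which is what the review comment below is about.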
Why did you add spaces around the virtual tokens? If a space is needed, shouldn't the virtual prompt learn to be a space embedding?
I was playing around with the unit tests to help diagnose a bug with the huggingface tokenizer. This change isn't needed, I just forgot to put it back. I removed the spaces so they should be back the way they were.
```python
    ]
    return pseudo_tokens
```
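The `return pseudo_tokens` fragment above suggests a helper that builds the placeholder strings for each virtual prompt token. A hedged sketch of what such a helper might look like (the function name and the `<prompt_i>` token format are assumptions for illustration, not taken from this PR):

```python
def get_pseudo_tokens(num_virtual_tokens: int) -> list:
    """Build one placeholder string per virtual prompt token.

    The <prompt_i> format here is illustrative only; the point of the
    review comment is to keep this logic in a single shared location.
    """
    return [f"<prompt_{i}>" for i in range(num_virtual_tokens)]
```

For example, `get_pseudo_tokens(3)` would produce `['<prompt_0>', '<prompt_1>', '<prompt_2>']`. Centralizing the helper avoids two copies of the same token-format logic drifting apart.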
I see you have similar logic in the prompt_learning_model file. Could you put the function there?
Sure, I can do that
Signed-off-by: Virginia Adams <[email protected]>
LGTM!
* Updated default virtual token placeholder
  Signed-off-by: Virginia Adams <[email protected]>
* Python style fix
  Signed-off-by: Virginia Adams <[email protected]>
* Addressed reviewer comments
  Signed-off-by: Virginia Adams <[email protected]>
What does this PR do ?
Collection: BigNLP
Changelog
Updated megatron_gpt_prompt_learning_model
Updated prompt learning dataset unit tests
Added Enum for virtual prompt token string
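The changelog mentions an Enum for the virtual prompt token string. A minimal sketch of what such an Enum could look like (the class name, member names, and composition helper are illustrative assumptions, not the PR's actual definitions):

```python
from enum import Enum


class VirtualPromptPlaceholder(Enum):
    """Pieces of the virtual prompt placeholder string.

    Values are illustrative; the benefit of an Enum is that the
    placeholder format lives in one named, typo-proof place instead
    of being repeated as string literals across the codebase.
    """
    BASE = "<|VIRTUAL_PROMPT_"
    END = "|>"


def placeholder(idx: int) -> str:
    """Compose the full placeholder, e.g. <|VIRTUAL_PROMPT_0|>."""
    return VirtualPromptPlaceholder.BASE.value + str(idx) + VirtualPromptPlaceholder.END.value
```

With this, `placeholder(0)` yields `<|VIRTUAL_PROMPT_0|>`, matching the template strings used in the unit tests above.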
Usage
Same as before
Before your PR is "Ready for review"
Pre checks:
PR Type:
If you haven't finished some of the above items, you can still open a "Draft" PR.
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
The contributor guidelines list specific people who can review PRs to various areas.
Additional Information