Preprocess hf #10
Conversation
      eod = self.tokenizer.eod
    - pad = self.tokenizer.tokenizer.special_tokens[FIM_PAD]
    + pad = self.tokenizer.special_tokens[FIM_PAD]
Why?
The local GPT2Tokenizer implementation has this special_tokens attribute, but the PreTrainedTokenizerFast from the transformers library does not. So the code here relies instead on the wrappers around these tokenizers:
Megatron-LM/megatron/tokenizer/tokenizer.py, line 330 (commit 7457e32):
    self.special_tokens = {
Megatron-LM/megatron/tokenizer/tokenizer.py, line 293 (commit 7457e32):
    self.special_tokens = self.tokenizer.special_tokens
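For context, a minimal sketch of the wrapper pattern being relied on here. The class names and the way the HF mapping is built are simplified assumptions for illustration, not the exact Megatron code; only the special_tokens forwarding mirrors the lines quoted above.

    FIM_PAD = "<fim-pad>"  # assumed token string, for illustration only

    class GPT2BPETokenizerWrapper:
        """Wraps the local GPT2Tokenizer, which already exposes .special_tokens."""
        def __init__(self, gpt2_tokenizer):
            self.tokenizer = gpt2_tokenizer
            # Forward the inner tokenizer's mapping (cf. tokenizer.py line 293).
            self.special_tokens = self.tokenizer.special_tokens

    class HFTokenizerWrapper:
        """Wraps a transformers PreTrainedTokenizerFast, which has no .special_tokens dict."""
        def __init__(self, hf_tokenizer):
            self.tokenizer = hf_tokenizer
            # Build the mapping ourselves (the real construction is at tokenizer.py
            # line 330; this comprehension is an assumption for illustration).
            self.special_tokens = {
                tok: hf_tokenizer.convert_tokens_to_ids(tok)
                for tok in hf_tokenizer.additional_special_tokens
            }

    # Either wrapper then supports the call site from the diff above:
    #     pad = wrapper.special_tokens[FIM_PAD]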
        scaled_softmax_warp_forward<input_t, output_t, acc_t, 12>
            <<<blocks, threads, 0, at::cuda::getCurrentCUDAStream()>>>(dst, src, scale, batch_count, key_seq_len);
        break;
    case 13: // 8192
Interesting. Did you double-check that it works as intended? Things can get tricky when kernels grow too big (registers, shared memory, etc.); I don't know if that's relevant here.
I admit that I did not double-check this. I relied on the code from https://github.com/NVIDIA/Megatron-LM/pull/243/files
I tried to train a model with seq-length 4092 and it seemed to work fine.
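For what it's worth, one way to double-check the new 8192 branch is to compare the fused kernel's output against a plain PyTorch softmax at that sequence length. A minimal sketch, assuming fused_scaled_softmax is a Python handle bound to the compiled extension (the name and signature are assumptions, not the actual Megatron binding):

    import torch

    def reference_scaled_softmax(x, scale):
        # Plain PyTorch reference: scale, then softmax over the key dimension.
        return torch.softmax(x.float() * scale, dim=-1)

    def check_extended_kernel(fused_scaled_softmax, seq_len=8192, scale=0.125):
        # Small batch/head counts keep the 8192 x 8192 score tensor manageable.
        x = torch.randn(1, 2, seq_len, seq_len, device="cuda", dtype=torch.half)
        out_fused = fused_scaled_softmax(x, scale)           # exercises the case 13 path
        out_ref = reference_scaled_softmax(x, scale).half()  # reference computed in fp32, cast back
        torch.testing.assert_close(out_fused, out_ref, rtol=1e-3, atol=1e-3)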
    @@ -0,0 +1,201 @@
    # coding=utf-8
OK for now, but we'll want to switch to the common implementation soon.
    @@ -0,0 +1,323 @@
    import argparse
Is this an exact copy from transformers? Same question for push_checkpoints.py.
convert_checkpoints.py is largely copied from transformers, but includes support for the MQA models.
push_checkpoints.py is new code.
This folder is named hf_transformers because these are tools to facilitate the use of Megatron models in transformers.