Add gpt-sw3 model to transformers #20209
Conversation
Hey! Feel free to ping me if you need any pointers! :)
Thank you for your feedback, we're happy to follow your lead on how to proceed! So, if we understand you correctly, we should remove the model and config files and map gpt-sw3 to the existing GPT2 classes instead? Should we await further review or simply get started on this?

Yes, that would be easier. Just remove the model and config files and, in the auto mapping, use the GPT2 classes.
| ("fsmt", "FSMTModel"), | ||
| ("funnel", ("FunnelModel", "FunnelBaseModel")), | ||
| ("glpn", "GLPNModel"), | ||
| ("gpt-sw3", "GPT2Model"), |
You will need to map all the other models with heads too. Also, you get the TF and Flax models for free if you add the same things in modeling_tf_auto and modeling_flax_auto :-)
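(For context, since only the base-model diff excerpt is visible above: once "gpt-sw3" is also mapped to the GPT2 head classes in modeling_auto.py, and to the TF/Flax equivalents in modeling_tf_auto.py / modeling_flax_auto.py, the auto classes should resolve a gpt-sw3 config to the GPT2 architectures. A rough sanity-check sketch, assuming a transformers build that already includes those entries; the tiny config values are arbitrary and require PyTorch.)

```python
# Sketch: verify that the gpt-sw3 auto mappings resolve to the GPT2 classes.
from transformers import AutoConfig, AutoModel, AutoModelForCausalLM, GPT2LMHeadModel, GPT2Model

# "gpt-sw3" resolves to GPT2Config in the config mapping; build a tiny config just for the check.
config = AutoConfig.for_model("gpt-sw3", n_layer=2, n_head=2, n_embd=32)

# Base model and LM-head model should both come back as GPT2 instances.
assert isinstance(AutoModel.from_config(config), GPT2Model)
assert isinstance(AutoModelForCausalLM.from_config(config), GPT2LMHeadModel)
```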
Thank you, hopefully fixed now!
Thank you again for your help, I hope we have now resolved all of your issues. Do you see anything else required from our side in this PR? @sgugger @ArthurZucker
ArthurZucker left a comment
Well great work! It's super clean and LGTM, let's wait for @sgugger and I think we can merge!
        output_type=BaseModelOutputWithPastAndCrossAttentions,
        config_class=_CONFIG_FOR_DOC,
    )
    def forward(
Okay good job, exactly what needed to be done 😄
    def test_vocab_size(self):
        self.assertEqual(self.get_tokenizer().vocab_size, 2_000)

    # TODO: these tests will differ with our 2 tokenizers, might be able to hard-code it for one
To remove?
(the TODO)
    - `alpha`: Smoothing parameter for unigram sampling, and dropout probability of merge operations for
      BPE-dropout.
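(For readers wondering what the `alpha` knob does in practice: it controls subword regularization. A small sketch, assuming GPTSw3Tokenizer follows the usual `sp_model_kwargs` pattern of SentencePiece-based tokenizers in transformers; the repo id is a placeholder, not a name confirmed in this thread.)

```python
from transformers import GPTSw3Tokenizer  # requires the sentencepiece package

tok = GPTSw3Tokenizer.from_pretrained(
    "AI-Sweden/gpt-sw3-tokenizer",  # placeholder repo id
    sp_model_kwargs={"enable_sampling": True, "nbest_size": -1, "alpha": 0.1},
)

# With sampling enabled, repeated calls can segment the same text differently
# (unigram sampling / BPE-dropout), which is what `alpha` controls.
print(tok.tokenize("Det här är en mening."))
print(tok.tokenize("Det här är en mening."))
```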
Very small nit, would be cool to have an example here of importing the tokenizer and tokenizing a Swedish sentence! 😉
Agree, added! :)
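(For reference, a minimal sketch of the kind of example being discussed; the checkpoint name is a placeholder and the tokenizer requires the sentencepiece package.)

```python
from transformers import GPTSw3Tokenizer

tokenizer = GPTSw3Tokenizer.from_pretrained("AI-Sweden/gpt-sw3-tokenizer")  # placeholder repo id

text = "Svenska är roligt!"  # "Swedish is fun!"
ids = tokenizer(text)["input_ids"]

print(tokenizer.convert_ids_to_tokens(ids))  # subword pieces
print(tokenizer.decode(ids))                 # round-trips back to the original text
```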
sgugger left a comment
Thanks for bearing with us, it all looks good to me!
A last nit @JoeyOhman, could you add an example of a pretrained model that you released being loaded in the doc? Like what is done with …
@sgugger @ArthurZucker Thank you for all the help and guidance! We have made all the tokenizers referred to in the PR public. We encountered some internal issues with the model sharing at the last minute, very sorry for that. Currently we are not allowed to share the model files publicly. However, we can share the tokenizer and would very much like for it to be included in Hugging Face, since those with private access to the model can then easily use the full HF ecosystem. We hope to be able to make the models fully public in the near future. Hopefully our PR can still be included in the release now that the tests should pass.
No problem, I was thinking about the tokenizer rather than the actual checkpoints! You were mostly adding a tokenizer, so I don't really see an issue with this 😉 Thanks for the contribution!
* Add templates for gpt-sw3
* Add templates for gpt-sw3
* Added sentencepiece tokenizer
* intermediate commit with many changes
* fixed conflicts
* Init commit for tokenization port
* Tokenization progress
* Remove fast tokenizer
* Clean up and rename spm.model -> spiece.model
* Remove TF -> PT conversion script template, Clean up Megatron -> PT script
* Optimize encode & decode performance
* added new attention
* added new attention
* attention for gpt-sw3 working
* attention good
* Cache is now working
* fixed attention mask so that it works with causal attention
* fixed badbmm bug for cpu and caching
* updated config with correct parameters
* Refactor and leave optimizations as separate functions to avoid breaking expected functionality
* Fix special tokens mapping for both tokenizers
* cleaning up of code and comments
* HF compatible attention outputs
* Tokenizer now passing tests, add documentation
* Update documentation
* reverted back to base implementation after checking that it is identical to pretrained model
* updated gpt-sw3 config
* updated conversion script
* aligned parameters with gpt-sw3 config
* changed default scale_attn_by_inverse_layer_idx to true
* removed flag from conversion script
* added temporary model path
* reverted back to functioning convert script
* small changes to default config
* updated tests for gpt-sw3
* make style, make quality, minor cleanup
* Change local paths to testing online repository
* Change name: GptSw3 -> GPTSw3
* Remove GPTSw3TokenizerFast references
* Use official model repository and add more model sizes
* Added reference to 6.7b model
* Add GPTSw3DoubleHeadsModel to IGNORE_NON_AUTO_CONFIGURED, like GPT2DoubleHeadsModel
* Remove pointers to non-existing TFGPTSw3
* Add GPTSw3 to docs/_toctree.yml
* Remove TF artifacts from GPTSw3 in __init__ files
* Update README:s with 'make fix-copies'
* Add 20b model to archive list
* Add documentation for GPT-Sw3
* Fix typo in documentation for GPT-Sw3
* Do 'make fix-copies' again after having updated docs
* Fix some typos in docs
* Update src/transformers/models/gpt_sw3/configuration_gpt_sw3.py (Co-authored-by: Arthur <[email protected]>)
* Update src/transformers/models/gpt_sw3/configuration_gpt_sw3.py (Co-authored-by: Arthur <[email protected]>)
* Update src/transformers/models/gpt_sw3/__init__.py (Co-authored-by: Arthur <[email protected]>)
* Update src/transformers/models/gpt_sw3/__init__.py (Co-authored-by: Arthur <[email protected]>)
* Update src/transformers/models/gpt_sw3/convert_megatron_to_pytorch.py (Co-authored-by: Arthur <[email protected]>)
* Update src/transformers/models/gpt_sw3/modeling_gpt_sw3.py (Co-authored-by: Arthur <[email protected]>)
* Update tests/models/gpt_sw3/test_tokenization_gpt_sw3.py (Co-authored-by: Arthur <[email protected]>)
* Update src/transformers/models/gpt_sw3/modeling_gpt_sw3.py (Co-authored-by: Arthur <[email protected]>)
* Update src/transformers/models/gpt_sw3/modeling_gpt_sw3.py (Co-authored-by: Arthur <[email protected]>)
* Resolve comments from PR feedback
* Resolve more comments from PR feedback, also set use_cache=True in convert script
* Add '# Copied from' comments for GPTSw3 modeling
* Set 'is_parallelizable = False'
* Remove '# Copied from' where code was modified and add 'with x->y' when appropriate
* Remove parallelize in mdx
* make style, make quality
* Update GPTSw3Config default values and corresponding documentation
* Update src/transformers/models/gpt_sw3/tokenization_gpt_sw3.py (Co-authored-by: Sylvain Gugger <[email protected]>)
* Update src/transformers/models/gpt_sw3/__init__.py (Co-authored-by: Sylvain Gugger <[email protected]>)
* Clean up and protect GPTSw3Tokenizer imports with is_sentencepiece_available
* Make style, make quality
* Add dummy object for GPTSw3Tokenizer via 'make fix-copies'
* make fix-copies
* Remove GPTSw3 modeling classes
* make style, make quality
* Add GPTSw3 auto-mappings for other GPT2 heads
* Update docs/source/en/model_doc/gpt-sw3.mdx (Co-authored-by: Arthur <[email protected]>)
* Update src/transformers/models/gpt_sw3/convert_megatron_to_pytorch.py (Co-authored-by: Arthur <[email protected]>)
* Update src/transformers/models/gpt_sw3/tokenization_gpt_sw3.py (Co-authored-by: Arthur <[email protected]>)
* Remove old TODO-comment
* Add example usage to GPTSw3Tokenizer docstring
* make style, make quality
* Add implementation details and example usage to gpt-sw3.mdx

Co-authored-by: JoeyOhman <[email protected]>
Co-authored-by: Arthur <[email protected]>
Co-authored-by: Sylvain Gugger <[email protected]>
This adds the GPT-SW3 models and tokenizer to Hugging Face Transformers. The models are developed by AI Sweden and partners. They are GPT models trained from scratch with the NeMo Megatron framework and will initially range in size from 128M to 20B parameters. The models are multilingual, covering English, Swedish, Norwegian, Danish, and Icelandic.
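(For users with access to the checkpoints, usage should look like any other causal LM in the library: the auto classes resolve the tokenizer to GPTSw3Tokenizer and the model to the GPT2 architecture via the mappings added in this PR. A hedged sketch; the repo id and generation settings are illustrative only.)

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "AI-Sweden/gpt-sw3-356m"  # placeholder repo id, not confirmed in this thread

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)  # loads with the GPT2 architecture

inputs = tokenizer("Träd är fina för att", return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=20, do_sample=True, top_p=0.9)

print(tokenizer.decode(output_ids[0]))
```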
Fixes #20176
@ArthurZucker