Conversation

@ekgren (Contributor) commented on Nov 14, 2022:

This PR adds the GPT-SW3 models and tokenizer to Hugging Face. The models are developed by AI Sweden and others. They are GPT models trained from scratch with the NeMo-Megatron framework and will initially range in size from 128M to 20B parameters. The models are multilingual, covering English, Swedish, Norwegian, Danish and Icelandic.

Fixes #20176

@ArthurZucker

@ekgren marked this pull request as a draft on November 14, 2022 at 14:04.
@ArthurZucker (Collaborator) commented:

Hey! Feel free to ping me if you need any pointers! :)
Seems like the history is a bit broken at this point; rebasing with a force push should help.

@JoeyOhman (Contributor) commented:

> Actually, it seems like the modeling code is exactly the same as for GPT2? In this case you can just add a correspondence ("gpt-sw3", "GPT2Model") in the auto-mappings, without needing to add a new model module.

Thank you for your feedback, we're happy to follow your lead on how to proceed! So, if we understand you correctly, we should remove modeling_gpt_sw3.py and configuration_gpt_sw3.py entirely?

> Yep, sorry for the late reply! Let's do the same as what was done with BertJapanese. I'll review again, sorry for not realising the `# Copied from` sooner 😅

Should we await further review, or simply get started on this? @sgugger @ArthurZucker

@sgugger (Collaborator) commented on Dec 9, 2022:

Yes, that would be easier. Just remove the model and config files and, in the auto mapping, use the GPT2 classes.

("fsmt", "FSMTModel"),
("funnel", ("FunnelModel", "FunnelBaseModel")),
("glpn", "GLPNModel"),
("gpt-sw3", "GPT2Model"),
Review comment (Collaborator):

You will need to map all the other model with heads too. Also you get the TF and Flax models for free if you add the same things in modeling_tf_auto and modeling_flax_auto :-)
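
For reference, a hedged sketch of what those additional mappings could look like. The dict name `GPT_SW3_AUTO_ENTRIES` is purely illustrative and not part of the PR; the GPT2 head classes named here are the existing ones in transformers, and this is not the exact merged diff.

```python
from collections import OrderedDict

# Illustrative sketch only: each auto-mapping in modeling_auto.py gets a
# ("gpt-sw3", <GPT2 class>) entry, so the existing GPT2 implementations are
# reused for the base model and every head.
GPT_SW3_AUTO_ENTRIES = OrderedDict(
    [
        ("MODEL_MAPPING_NAMES", ("gpt-sw3", "GPT2Model")),
        ("MODEL_FOR_CAUSAL_LM_MAPPING_NAMES", ("gpt-sw3", "GPT2LMHeadModel")),
        ("MODEL_FOR_SEQUENCE_CLASSIFICATION_MAPPING_NAMES", ("gpt-sw3", "GPT2ForSequenceClassification")),
        ("MODEL_FOR_TOKEN_CLASSIFICATION_MAPPING_NAMES", ("gpt-sw3", "GPT2ForTokenClassification")),
    ]
)

# Adding the same entries in modeling_tf_auto.py / modeling_flax_auto.py
# (e.g. "TFGPT2LMHeadModel", "FlaxGPT2LMHeadModel") provides TF and Flax support.
```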

Reply (Contributor):

Thank you, hopefully fixed now!

@JoeyOhman (Contributor) commented:

Thank you again for your help, I hope we have now resolved all of your issues. Do you see anything else required from our side in this PR? @sgugger @ArthurZucker

@ArthurZucker (Collaborator) left a review:

Well, great work! It's super clean and LGTM; let's wait for @sgugger and then I think we can merge!

```python
    output_type=BaseModelOutputWithPastAndCrossAttentions,
    config_class=_CONFIG_FOR_DOC,
)
def forward(
```

Review comment (Collaborator):

Okay good job, exactly what needed to be done 😄

```python
    def test_vocab_size(self):
        self.assertEqual(self.get_tokenizer().vocab_size, 2_000)

    # TODO: these tests will differ with our 2 tokenizers, might be able to hard-code it for one
```

Review comment (Collaborator):

To remove? (the TODO)

```
    - `alpha`: Smoothing parameter for unigram sampling, and dropout probability of merge operations for
      BPE-dropout.
```

Review comment (Collaborator):

Very small nit: it would be cool to have an example here of importing the tokenizer and tokenizing a Swedish sentence! 😉

Reply (Contributor):

Agree, added! :)
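
For reference, a minimal sketch of the kind of tokenizer example that was added; the checkpoint name and the Swedish sentence below are illustrative, not necessarily what ended up in the docstring.

```python
from transformers import GPTSw3Tokenizer

# Illustrative repo id; substitute one of the released GPT-SW3 tokenizer checkpoints.
tokenizer = GPTSw3Tokenizer.from_pretrained("AI-Sweden/gpt-sw3-126m")

text = "Träd är fina för att de är gröna"  # a Swedish example sentence
input_ids = tokenizer(text)["input_ids"]
print(input_ids)
print(tokenizer.decode(input_ids))
```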

@sgugger (Collaborator) left a review:

Thanks for bearing with us, it all looks good to me!

@ArthurZucker (Collaborator) commented:

One last nit, @JoeyOhman: could you add an example to the doc of loading one of the pretrained models you released? Like what is done with BertJapanese here. It would help people understand that they can use the GPT2 model with this tokenizer 😉
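
For reference, a sketch of the kind of doc example being requested: loading a released GPT-SW3 checkpoint with the new tokenizer and the plain GPT2/auto model classes. The repo id is illustrative, not a confirmed checkpoint name.

```python
from transformers import AutoModelForCausalLM, GPTSw3Tokenizer

# Illustrative repo id; substitute an actual released GPT-SW3 checkpoint.
model_id = "AI-Sweden/gpt-sw3-356m"

tokenizer = GPTSw3Tokenizer.from_pretrained(model_id)
# Via the auto-mappings added in this PR, "gpt-sw3" resolves to the GPT2 classes.
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Idag är det en fin dag", return_tensors="pt")
generated = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(generated[0]))
```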

@sgugger merged commit 5f94855 into huggingface:main on Dec 12, 2022.
@ekgren deleted the add_gpt_sw3 branch on December 13, 2022 at 09:30.

@ArthurZucker (Collaborator) commented:

Hey @ekgren, could you add the correct checkpoints? They are probably private. See our CI failure here.

@ekgren (Contributor, Author) commented on Dec 14, 2022:

@sgugger @ArthurZucker Thank you for all the help and guidance! We have made all the tokenizers referred to in this PR public.

We encountered some internal issues with model sharing at the last minute; we are very sorry for that. Currently we are not allowed to share the model files publicly. However, we can share the tokenizer and would very much like for it to be included in Hugging Face, since those with private access to the models can then easily use the full HF ecosystem. We hope to be able to share the models fully publicly in the near future.

Hopefully our PR can still be included in the release now that the tests should pass.

@ArthurZucker (Collaborator) commented:

No problem, I was thinking about the tokenizer rather than the actual checkpoints! You were mostly adding a tokenizer so I don't really see an issue with this 😉 Thanks for the contribution!
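
For reference, a hedged sketch of the private-access workflow described above: the publicly shared tokenizer combined with model weights that only approved users can download. The repo id is illustrative, and `use_auth_token=True` simply reads the token stored by `huggingface-cli login`.

```python
from transformers import AutoModelForCausalLM, GPTSw3Tokenizer

# Illustrative repo id; at the time of this PR the model weights were private
# while the tokenizers had been made public.
repo_id = "AI-Sweden/gpt-sw3-1.3b"

# Users who have been granted access authenticate with the token stored by
# `huggingface-cli login` and can then use the full HF ecosystem as usual.
tokenizer = GPTSw3Tokenizer.from_pretrained(repo_id, use_auth_token=True)
model = AutoModelForCausalLM.from_pretrained(repo_id, use_auth_token=True)
```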

mpierrau pushed a commit to mpierrau/transformers that referenced this pull request on Dec 15, 2022:
* Add templates for gpt-sw3

* Add templates for gpt-sw3

* Added sentencepiece tokenizer

* intermediate commit with many changes

* fixed conflicts

* Init commit for tokenization port

* Tokenization progress

* Remove fast tokenizer

* Clean up and rename spm.model -> spiece.model

* Remove TF -> PT conversion script template, Clean up Megatron -> PT script

* Optimize encode & decode performance

* added new attention

* added new attention

* attention for gpt-sw3 working

* attention good

* Cache is now working

* fixed attention mask so that it works with causal attention

* fixed badbmm bug for cpu and caching

* updated config with correct parameters

* Refactor and leave optimizations as separate functions to avoid breaking expected functionality

* Fix special tokens mapping for both tokenizers

* cleaning up of code and comments

* HF compatible attention outputs

* Tokenizer now passing tests, add documentation

* Update documentation

* reverted back to base implementation after checking that it is identical to pretrained model

* updated gpt-sw3 config

* updated conversion script

* aligned parameters with gpt-sw3 config

* changed default scale_attn_by_inverse_layer_idx to true

* removed flag from conversion script

* added temporary model path

* reverted back to functioning convert script

* small changes to default config

* updated tests for gpt-sw3

* make style, make quality, minor cleanup

* Change local paths to testing online repository

* Change name: GptSw3 -> GPTSw3

* Remove GPTSw3TokenizerFast references

* Use official model repository and add more model sizes

* Added reference to 6.7b model

* Add GPTSw3DoubleHeadsModel to IGNORE_NON_AUTO_CONFIGURED, like GPT2DoubleHeadsModel

* Remove pointers to non-existing TFGPTSw3

* Add GPTSw3 to docs/_toctree.yml

* Remove TF artifacts from GPTSw3 in __init__ files

* Update README:s with 'make fix-copies'

* Add 20b model to archive list

* Add documentation for GPT-Sw3

* Fix typo in documentation for GPT-Sw3

* Do 'make fix-copies' again after having updated docs

* Fix some typos in docs

* Update src/transformers/models/gpt_sw3/configuration_gpt_sw3.py

Co-authored-by: Arthur <[email protected]>

* Update src/transformers/models/gpt_sw3/configuration_gpt_sw3.py

Co-authored-by: Arthur <[email protected]>

* Update src/transformers/models/gpt_sw3/__init__.py

Co-authored-by: Arthur <[email protected]>

* Update src/transformers/models/gpt_sw3/__init__.py

Co-authored-by: Arthur <[email protected]>

* Update src/transformers/models/gpt_sw3/convert_megatron_to_pytorch.py

Co-authored-by: Arthur <[email protected]>

* Update src/transformers/models/gpt_sw3/modeling_gpt_sw3.py

Co-authored-by: Arthur <[email protected]>

* Update tests/models/gpt_sw3/test_tokenization_gpt_sw3.py

Co-authored-by: Arthur <[email protected]>

* Update src/transformers/models/gpt_sw3/modeling_gpt_sw3.py

Co-authored-by: Arthur <[email protected]>

* Update src/transformers/models/gpt_sw3/modeling_gpt_sw3.py

Co-authored-by: Arthur <[email protected]>

* Resolve comments from PR feedback

* Resolve more comments from PR feedback, also set use_cache=True in convert script

* Add '# Copied from' comments for GPTSw3 modeling

* Set 'is_parallelizable = False'

* Remove '# Copied from' where code was modified and add 'with x->y' when appropriate

* Remove parallelize in mdx

* make style, make quality

* Update GPTSw3Config default values and corresponding documentation

* Update src/transformers/models/gpt_sw3/tokenization_gpt_sw3.py

Co-authored-by: Sylvain Gugger <[email protected]>

* Update src/transformers/models/gpt_sw3/__init__.py

Co-authored-by: Sylvain Gugger <[email protected]>

* Clean up and protect GPTSw3Tokenizer imports with is_sentencepiece_available

* Make style, make quality

* Add dummy object for GPTSw3Tokenizer via 'make fix-copies'

* make fix-copies

* Remove GPTSw3 modeling classes

* make style, make quality

* Add GPTSw3 auto-mappings for other GPT2 heads

* Update docs/source/en/model_doc/gpt-sw3.mdx

Co-authored-by: Arthur <[email protected]>

* Update src/transformers/models/gpt_sw3/convert_megatron_to_pytorch.py

Co-authored-by: Arthur <[email protected]>

* Update src/transformers/models/gpt_sw3/tokenization_gpt_sw3.py

Co-authored-by: Arthur <[email protected]>

* Remove old TODO-comment

* Add example usage to GPTSw3Tokenizer docstring

* make style, make quality

* Add implementation details and example usage to gpt-sw3.mdx

Co-authored-by: JoeyOhman <[email protected]>
Co-authored-by: Arthur <[email protected]>
Co-authored-by: Sylvain Gugger <[email protected]>