Add gpt-sw3 model to transformers #20209
Merged
Changes from 58 commits (94 commits in total)
05ef62e
Add templates for gpt-sw3
ekgren aa2fb95
Add templates for gpt-sw3
ekgren 1778a3f
Added sentencepiece tokenizer
ekgren a327bc9
intermediate commit with many changes
ekgren 85a8643
gpt-sw3 updates
ekgren 6e05043
fixed conflicts
ekgren e333cd2
Init commit for tokenization port
JoeyOhman 268b116
Tokenization progress
JoeyOhman e6d806a
Remove fast tokenizer
JoeyOhman e5b05e4
Clean up and rename spm.model -> spiece.model
JoeyOhman 17bbc59
Remove TF -> PT conversion script template, Clean up Megatron -> PT s…
JoeyOhman 4167192
Optimize encode & decode performance
JoeyOhman 26d522b
added new attention
ekgren 96f5d0e
added new attention
ekgren a94aa0e
Merge branch 'add_gpt_sw3' of github.com:ekgren/transformers-hf into …
ekgren 39a8e8f
attention for gpt-sw3 working
ekgren 1d7759f
attention good
ekgren b7ef07a
Cache is now working
JoeyOhman 39892bf
fixed attention mask so that it works with causal attention
ekgren 891cfb0
fixed badbmm bug for cpu and caching
ekgren 8ed7fb2
updated config with correct parameters
ekgren eb1336b
Refactor and leave optimizations as separate functions to avoid break…
JoeyOhman b9be87f
Fix special tokens mapping for both tokenizers
JoeyOhman 6d8de24
cleaning up of code and comments
ekgren 65577c2
fixed conflicts in convert script
ekgren 285b33b
HF compatible attention outputs
JoeyOhman 682556f
Tokenizer now passing tests, add documentation
JoeyOhman d3a143e
Update documentation
JoeyOhman 5e40481
reverted back to base implementation after checking that it is identi…
ekgren eb1d4cb
updated gpt-sw3 config
ekgren 01192f0
updated conversion script
ekgren 29bebce
Merge branch 'add_gpt_sw3' of github.com:ekgren/transformers-hf into …
ekgren bfde918
aligned parameters with gpt-sw3 config
ekgren 988a9ca
changed default scale_attn_by_inverse_layer_idx to true
ekgren 4abb731
removed flag from conversion script
ekgren fe2e353
added temporary model path
ekgren b0f94a9
reverted back to functioning convert script
ekgren cfef112
small changes to default config
ekgren cefe37b
updated tests for gpt-sw3
ekgren fa815f6
Merge remote-tracking branch 'upstream/main' into add_gpt_sw3
ekgren a76f00f
Merge remote-tracking branch 'upstream/main' into add_gpt_sw3
ekgren 9e9742a
make style, make quality, minor cleanup
JoeyOhman 4524076
Change local paths to testing online repository
JoeyOhman 1ef7b47
Change name: GptSw3 -> GPTSw3
JoeyOhman f50cf3d
Remove GPTSw3TokenizerFast references
JoeyOhman 07caf98
Use official model repository and add more model sizes
JoeyOhman 199f260
Added reference to 6.7b model
ekgren 964227f
Add GPTSw3DoubleHeadsModel to IGNORE_NON_AUTO_CONFIGURED, like GPT2Do…
JoeyOhman cfa82f7
Remove pointers to non-existing TFGPTSw3
JoeyOhman e22608e
Add GPTSw3 to docs/_toctree.yml
JoeyOhman d6d8eb2
Remove TF artifacts from GPTSw3 in __init__ files
JoeyOhman 65ae666
Merge remote-tracking branch 'upstream/main' into add_gpt_sw3
JoeyOhman fca7bcf
Update README:s with 'make fix-copies'
JoeyOhman bc5ae81
Add 20b model to archive list
JoeyOhman 05fb9d8
Add documentation for GPT-Sw3
JoeyOhman 26783cd
Fix typo in documentation for GPT-Sw3
JoeyOhman d8d8ff5
Do 'make fix-copies' again after having updated docs
JoeyOhman 7795ce4
Fix some typos in docs
JoeyOhman b353072
Update src/transformers/models/gpt_sw3/configuration_gpt_sw3.py
JoeyOhman 30706f7
Update src/transformers/models/gpt_sw3/configuration_gpt_sw3.py
JoeyOhman 9e6f545
Update src/transformers/models/gpt_sw3/__init__.py
JoeyOhman 2e44e88
Update src/transformers/models/gpt_sw3/__init__.py
JoeyOhman a15ee21
Update src/transformers/models/gpt_sw3/convert_megatron_to_pytorch.py
JoeyOhman 5e18908
Update src/transformers/models/gpt_sw3/modeling_gpt_sw3.py
JoeyOhman 0d02ec5
Update tests/models/gpt_sw3/test_tokenization_gpt_sw3.py
JoeyOhman 195bd0c
Update src/transformers/models/gpt_sw3/modeling_gpt_sw3.py
JoeyOhman 5140f52
Update src/transformers/models/gpt_sw3/modeling_gpt_sw3.py
JoeyOhman 718c441
Resolve comments from PR feedback
JoeyOhman 3ab3643
Merge branch 'add_gpt_sw3' of github.com:ekgren/transformers-hf into …
JoeyOhman d309d22
Resolve more comments from PR feedback, also set use_cache=True in co…
JoeyOhman 98002ab
Add '# Copied from' comments for GPTSw3 modeling
JoeyOhman 5bafb6a
Set 'is_parallelizable = False'
JoeyOhman cc2b702
Remove '# Copied from' where code was modified and add 'with x->y' wh…
JoeyOhman 81bf9ca
Remove parallelize in mdx
JoeyOhman 714d7fb
make style, make quality
JoeyOhman 2cfaa1c
Update GPTSw3Config default values and corresponding documentation
JoeyOhman 2fcee22
Update src/transformers/models/gpt_sw3/tokenization_gpt_sw3.py
JoeyOhman c97dca8
Update src/transformers/models/gpt_sw3/__init__.py
JoeyOhman 1d09a6b
Clean up and protect GPTSw3Tokenizer imports with is_sentencepiece_av…
JoeyOhman 62a41e8
Make style, make quality
JoeyOhman 26930a4
Add dummy object for GPTSw3Tokenizer via 'make fix-copies'
JoeyOhman c6b754d
Merge remote-tracking branch 'upstream/main' into add_gpt_sw3
JoeyOhman 0201440
make fix-copies
JoeyOhman dc6ce32
Remove GPTSw3 modeling classes
JoeyOhman ef1ec13
make style, make quality
JoeyOhman c475766
Add GPTSw3 auto-mappings for other GPT2 heads
JoeyOhman 609b47c
Update docs/source/en/model_doc/gpt-sw3.mdx
JoeyOhman f247b6c
Update src/transformers/models/gpt_sw3/convert_megatron_to_pytorch.py
JoeyOhman b5bc165
Update src/transformers/models/gpt_sw3/tokenization_gpt_sw3.py
JoeyOhman 965bd5e
Remove old TODO-comment
JoeyOhman 6790be4
Add example usage to GPTSw3Tokenizer docstring
JoeyOhman 81006ea
make style, make quality
JoeyOhman d9a1d9e
Add implementation details and example usage to gpt-sw3.mdx
JoeyOhman df4278c
Merge remote-tracking branch 'upstream/main' into add_gpt_sw3
JoeyOhman
docs/source/en/model_doc/gpt-sw3.mdx (new file, +72 lines):

<!--Copyright 2022 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# GPT-Sw3

## Overview

The GPT-Sw3 model was first proposed in
[Lessons Learned from GPT-SW3: Building the First Large-Scale Generative Language Model for Swedish](http://www.lrec-conf.org/proceedings/lrec2022/pdf/2022.lrec-1.376.pdf)
by Ariel Ekgren, Amaru Cuba Gyllensten, Evangelia Gogoulou, Alice Heiman, Severine Verlinden, Joey Öhman,
Fredrik Carlsson, and Magnus Sahlgren.

Since that first paper, we have extended our work and trained new models on our new 1.2 TB corpus named The Nordic Pile.

GPT-SW3 is a collection of large decoder-only pretrained transformer language models developed by AI Sweden
in collaboration with RISE and the WASP WARA for Media and Language. GPT-SW3 was trained on a dataset containing
320B tokens in Swedish, Norwegian, Danish, Icelandic, English, and programming code. The model was pretrained using a
causal language modeling (CLM) objective with the NeMo Megatron GPT implementation.

This model was contributed by [AI Sweden](https://huggingface.co/AI-Sweden).

## GPTSw3Config

[[autodoc]] GPTSw3Config

## GPTSw3Tokenizer

[[autodoc]] GPTSw3Tokenizer
    - save_vocabulary

## GPTSw3 specific outputs

[[autodoc]] models.gpt_sw3.modeling_gpt_sw3.GPTSw3DoubleHeadsModelOutput

## GPTSw3Model

[[autodoc]] GPTSw3Model
    - forward
    - parallelize
    - deparallelize

## GPTSw3LMHeadModel

[[autodoc]] GPTSw3LMHeadModel
    - forward
    - parallelize
    - deparallelize

## GPTSw3DoubleHeadsModel

[[autodoc]] GPTSw3DoubleHeadsModel
    - forward

## GPTSw3ForSequenceClassification

[[autodoc]] GPTSw3ForSequenceClassification
    - forward

## GPTSw3ForTokenClassification

[[autodoc]] GPTSw3ForTokenClassification
    - forward
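The causal language modeling (CLM) objective mentioned in the overview, and the causal attention mask that several commits in this PR wrestle with, can be sketched in a few lines. This is a framework-free illustration, not the model's actual implementation: the token ids are invented, and real code builds the mask as a tensor inside the attention layer.

```python
# Sketch of the two ingredients of causal language modeling:
# 1) training pairs are the sequence shifted by one (predict the next token);
# 2) a lower-triangular mask so position i attends only to positions <= i.
# Toy token ids; real models operate on tensors of tokenizer output.

def clm_pairs(token_ids):
    """Return (inputs, labels) for next-token prediction."""
    inputs = token_ids[:-1]   # model sees tokens up to position i
    labels = token_ids[1:]    # and is trained to predict position i + 1
    return inputs, labels

def causal_mask(seq_len):
    """Lower-triangular mask: row i marks which positions i may attend to."""
    return [[1 if j <= i else 0 for j in range(seq_len)]
            for i in range(seq_len)]

ids = [5, 17, 42, 8, 99]          # a toy tokenized sequence
inputs, labels = clm_pairs(ids)
print(inputs)                      # [5, 17, 42, 8]
print(labels)                      # [17, 42, 8, 99]
print(causal_mask(3))              # [[1, 0, 0], [1, 1, 0], [1, 1, 1]]
```

Because every position trains against its successor in parallel, the mask is what prevents the model from "cheating" by looking at the very token it must predict.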