Add qwen2 #28436
Merged
Changes from 5 commits
76 commits
f5bf018 add config, modeling, and tokenization (JustinLin610)
3226f3a add auto and init (JustinLin610)
52ab139 update readme (JustinLin610)
0fe20cd update readme (JustinLin610)
8299266 update team name (JustinLin610)
b8a8e22 fixup (JustinLin610)
55802f3 fixup (JustinLin610)
6b587b8 update config (JustinLin610)
dfe7d2e update code style (JustinLin610)
525de40 update for fixup (JustinLin610)
7f6f5e2 update for fixup (JustinLin610)
36fcf65 update for fixup (JustinLin610)
42f561a update for testing (JustinLin610)
105d98f update for testing (JustinLin610)
47e6e9e fix bug for config and tokenization (JustinLin610)
5061a26 fix bug for bos token (JustinLin610)
0987dab not doctest (JustinLin610)
bf4e928 debug tokenizer (JustinLin610)
2658baf not doctest (JustinLin610)
69b03e7 debug tokenization (JustinLin610)
725ab32 debug init for tokenizer (JustinLin610)
57264f3 fix style (JustinLin610)
0489d0f update init (JustinLin610)
e3b60a6 delete if in token auto (JustinLin610)
e1b4a8c add tokenizer doc (JustinLin610)
22d3402 add tokenizer in init (JustinLin610)
c05a7be Update dummy_tokenizers_objects.py (jklj077)
783eec8 Merge pull request #1 from jklj077/patch-1 (JustinLin610)
c24ee57 update (JustinLin610)
0f83bc1 update (JustinLin610)
e68a65c debug (JustinLin610)
03ff228 Merge branch 'huggingface:main' into add_qwen2 (JustinLin610)
2e00431 Update tokenization_qwen2.py (jklj077)
73695ff Merge pull request #3 from jklj077/patch-1 (JustinLin610)
e041c79 debug (JustinLin610)
e96c5c2 Merge branch 'huggingface:main' into add_qwen2 (JustinLin610)
9cda260 Update convert_slow_tokenizer.py (jklj077)
a81deed add copies
e652aa9 Merge pull request #4 from jklj077/patch-3 (JustinLin610)
985fe0f add copied from and make style (JustinLin610)
f7beb77 update files map
35a7480 update test
26a24db Merge pull request #5 from jklj077/patch-4 (JustinLin610)
f419098 fix style (JustinLin610)
6b3247b fix merge reading and update tests
68968f5 Merge pull request #6 from jklj077/patch-6 (JustinLin610)
9c45e77 fix tests
93bae64 fix tests
1345aae Merge pull request #7 from jklj077/patch-6 (JustinLin610)
bd57c63 fix style (JustinLin610)
25d43f0 debug a variable in readme (JustinLin610)
e68a1b8 Update src/transformers/models/qwen2/configuration_qwen2.py (JustinLin610)
4135193 update test and copied from (JustinLin610)
8708e50 fix style (JustinLin610)
db82439 update qwen2 tokenization and tests
e0c7750 Update tokenization_qwen2.py (JustinLin610)
7c13f74 Merge pull request #8 from jklj077/patch-7 (JustinLin610)
8ebacad delete the copied from after property (JustinLin610)
7b670a8 fix style (JustinLin610)
304f0dc update tests
8b144e1 Merge branch 'patch-7' of https://github.com/jklj077/transformers int…
ca4a8c5 Merge pull request #9 from jklj077/patch-7 (JustinLin610)
1ce4e4d update tests
04c4a75 Merge pull request #10 from jklj077/patch-7 (JustinLin610)
93cbb5d Merge branch 'huggingface:main' into add_qwen2 (JustinLin610)
1e0b937 add copied from (JustinLin610)
22f7b1e fix bugs (JustinLin610)
69d3f89 update doc (JustinLin610)
2840f0a add warning for sliding window attention
87f6cf7 update qwen2 tokenization
0a47113 Merge pull request #11 from jklj077/patch-8 (JustinLin610)
047e372 fix style (JustinLin610)
6b6dae0 Merge branch 'huggingface:main' into add_qwen2 (JustinLin610)
a4e80ee Update src/transformers/models/qwen2/modeling_qwen2.py (JustinLin610)
39cc60b Merge branch 'huggingface:main' into add_qwen2 (JustinLin610)
a8b0c3c fix tokenizer fast (JustinLin610)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,74 @@ | ||
| <!--Copyright 2024 The Qwen Team and The HuggingFace Team. All rights reserved. | ||
|
|
||
| Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with | ||
| the License. You may obtain a copy of the License at | ||
|
|
||
| http://www.apache.org/licenses/LICENSE-2.0 | ||
|
|
||
| Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on | ||
| an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the | ||
| specific language governing permissions and limitations under the License. | ||
|
|
||
| ⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be | ||
| rendered properly in your Markdown viewer. | ||
|
|
||
| --> | ||
|
|
||
| # Qwen2 | ||
|
|
||
| ## Overview | ||
|
|
||
| Qwen2 is the new model series of large language models from the Qwen team. Previously, we released the Qwen series, including Qwen-72B, Qwen-1.8B, Qwen-VL, Qwen-Audio, etc. | ||
|
|
||
| ### Model Details | ||
|
|
||
| Qwen2 is a language model series including decoder language models of different model sizes. For each size, we release the base language model and the aligned chat model. The models are based on the Transformer architecture with SwiGLU activation, bias on the attention query/key/value projections (QKV bias), grouped-query attention (GQA), a mixture of sliding window attention (SWA) and full attention, etc. Additionally, we have an improved tokenizer adapted to multiple natural languages and code. | ||
|
|
||
|
|
||
| ## Usage tips | ||
|
|
||
| `Qwen2-7B-beta` and `Qwen2-7B-Chat-beta` can be found on the [Hugging Face Hub](https://huggingface.co/Qwen). | ||
|
|
||
| In the following, we demonstrate how to use `Qwen2-7B-Chat-beta` for inference. Note that we use the ChatML format for dialog; in this demo we show how to leverage `apply_chat_template` for this purpose. | ||
|
|
||
| ```python | ||
| >>> from transformers import AutoModelForCausalLM, AutoTokenizer | ||
| >>> device = "cuda" # the device to load the model onto | ||
|
|
||
| >>> model = AutoModelForCausalLM.from_pretrained("Qwen2/Qwen2-7B-Chat-beta", device_map="auto") | ||
| >>> tokenizer = AutoTokenizer.from_pretrained("Qwen2/Qwen2-7B-Chat-beta") | ||
|
|
||
| >>> prompt = "Give me a short introduction to large language model." | ||
|
|
||
| >>> messages = [{"role": "user", "content": prompt}] | ||
|
|
||
| >>> text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True) | ||
|
|
||
| >>> model_inputs = tokenizer([text], return_tensors="pt").to(device) | ||
|
|
||
| >>> generated_ids = model.generate(model_inputs.input_ids, max_new_tokens=100, do_sample=True) | ||
| >>> # Drop the prompt tokens so that only the newly generated tokens are decoded | ||
| >>> generated_ids = [output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)] | ||
| >>> response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0] | ||
| ``` | ||
|
|
||
| ## Qwen2Config | ||
|
|
||
| [[autodoc]] Qwen2Config | ||
|
|
||
| ## Qwen2Model | ||
|
|
||
| [[autodoc]] Qwen2Model | ||
| - forward | ||
|
|
||
| ## Qwen2ForCausalLM | ||
|
|
||
| [[autodoc]] Qwen2ForCausalLM | ||
| - forward | ||
|
|
||
| ## Qwen2ForSequenceClassification | ||
|
|
||
| [[autodoc]] Qwen2ForSequenceClassification | ||
| - forward | ||
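The usage tips in the documentation diff above rely on the ChatML dialog format through `apply_chat_template`. As a rough illustration of what such a prompt looks like, here is a minimal sketch that renders messages by hand. The helper names `render_chatml` and `trim_end_marker` are hypothetical and not part of transformers, and the real chat template shipped with a checkpoint may differ (for example, it may add a default system message), so `apply_chat_template` remains the thing to call in practice.

```python
IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def render_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML-style string (hypothetical helper)."""
    parts = [f"{IM_START}{m['role']}\n{m['content']}{IM_END}\n" for m in messages]
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append(f"{IM_START}assistant\n")
    return "".join(parts)


def trim_end_marker(text, marker=IM_END):
    """Remove one trailing end-of-turn marker, if present (hypothetical helper)."""
    # str.strip(marker) treats `marker` as a set of characters, so it can also
    # remove legitimate leading or trailing letters such as "i", "m", "e", "n", "d".
    return text[: -len(marker)] if text.endswith(marker) else text


prompt_text = render_chatml([{"role": "user", "content": "Hi"}])
```

When the end-of-turn marker is trimmed from decoded text by hand, matching it as a suffix avoids the character-set behavior of `str.strip`; decoding only the newly generated token ids with `skip_special_tokens=True` sidesteps the issue entirely.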
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -176,6 +176,7 @@ | ||
| prophetnet, | ||
| pvt, | ||
| qdqbert, | ||
| qwen2, | ||
| rag, | ||
| realm, | ||
| reformer, | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,62 @@ | ||
| # Copyright 2024 The Qwen Team and The HuggingFace Inc. team. All rights reserved. | ||
| # | ||
| # Licensed under the Apache License, Version 2.0 (the "License"); | ||
| # you may not use this file except in compliance with the License. | ||
| # You may obtain a copy of the License at | ||
| # | ||
| # http://www.apache.org/licenses/LICENSE-2.0 | ||
| # | ||
| # Unless required by applicable law or agreed to in writing, software | ||
| # distributed under the License is distributed on an "AS IS" BASIS, | ||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| # See the License for the specific language governing permissions and | ||
| # limitations under the License. | ||
| from typing import TYPE_CHECKING | ||
|
|
||
| from ...utils import ( | ||
|     OptionalDependencyNotAvailable, | ||
|     _LazyModule, | ||
|     is_torch_available, | ||
| ) | ||
|
|
||
|
|
||
| _import_structure = { | ||
|     "configuration_qwen2": ["QWEN2_PRETRAINED_CONFIG_ARCHIVE_MAP", "Qwen2Config"], | ||
| } | ||
|
|
||
|
|
||
| try: | ||
|     if not is_torch_available(): | ||
|         raise OptionalDependencyNotAvailable() | ||
| except OptionalDependencyNotAvailable: | ||
|     pass | ||
| else: | ||
|     _import_structure["modeling_qwen2"] = [ | ||
|         "Qwen2ForCausalLM", | ||
|         "Qwen2Model", | ||
|         "Qwen2PreTrainedModel", | ||
|         "Qwen2ForSequenceClassification", | ||
|     ] | ||
|
|
||
|
|
||
| if TYPE_CHECKING: | ||
|     from .configuration_qwen2 import QWEN2_PRETRAINED_CONFIG_ARCHIVE_MAP, Qwen2Config | ||
|
|
||
|     try: | ||
|         if not is_torch_available(): | ||
|             raise OptionalDependencyNotAvailable() | ||
|     except OptionalDependencyNotAvailable: | ||
|         pass | ||
|     else: | ||
|         from .modeling_qwen2 import ( | ||
|             Qwen2ForCausalLM, | ||
|             Qwen2ForSequenceClassification, | ||
|             Qwen2Model, | ||
|             Qwen2PreTrainedModel, | ||
|         ) | ||
|
|
||
|
|
||
| else: | ||
|     import sys | ||
|
|
||
|     sys.modules[__name__] = _LazyModule(__name__, globals()["__file__"], _import_structure, module_spec=__spec__) |
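The `__init__.py` above declares an `_import_structure` mapping and hands it to `_LazyModule`, so that heavy submodules are imported only when one of their names is first accessed. The following is a minimal sketch of that general idea, not the transformers implementation: `LazyNamespace` is a hypothetical class, and standard-library modules stand in for the model submodules so the example runs on its own.

```python
import importlib


class LazyNamespace:
    """Resolve names to objects from other modules on first attribute access (hypothetical)."""

    def __init__(self, import_structure):
        # Invert {module: [names]} into {name: module} for quick lookup.
        self._name_to_module = {
            name: module for module, names in import_structure.items() for name in names
        }

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, i.e. before the name is cached.
        try:
            module_name = self._name_to_module[name]
        except KeyError:
            raise AttributeError(name) from None
        value = getattr(importlib.import_module(module_name), name)
        setattr(self, name, value)  # cache so later lookups skip __getattr__
        return value


# Standard-library modules are used here purely as stand-ins.
lazy = LazyNamespace({"json": ["dumps", "loads"], "fractions": ["Fraction"]})
```

Accessing `lazy.dumps` triggers the import of `json` and caches the function on the instance; names missing from the mapping raise `AttributeError`, as they would on a regular module.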
acronyms here might not be evident for everyone!
Replaced acronyms with full names