Changes from 5 commits
76 commits
f5bf018
add config, modeling, and tokenization
JustinLin610 Jan 10, 2024
3226f3a
add auto and init
JustinLin610 Jan 10, 2024
52ab139
update readme
JustinLin610 Jan 10, 2024
0fe20cd
update readme
JustinLin610 Jan 10, 2024
8299266
update team name
JustinLin610 Jan 10, 2024
b8a8e22
fixup
JustinLin610 Jan 10, 2024
55802f3
fixup
JustinLin610 Jan 10, 2024
6b587b8
update config
JustinLin610 Jan 10, 2024
dfe7d2e
update code style
JustinLin610 Jan 10, 2024
525de40
update for fixup
JustinLin610 Jan 10, 2024
7f6f5e2
update for fixup
JustinLin610 Jan 10, 2024
36fcf65
update for fixup
JustinLin610 Jan 10, 2024
42f561a
update for testing
JustinLin610 Jan 10, 2024
105d98f
update for testing
JustinLin610 Jan 11, 2024
47e6e9e
fix bug for config and tokenization
JustinLin610 Jan 11, 2024
5061a26
fix bug for bos token
JustinLin610 Jan 11, 2024
0987dab
not doctest
JustinLin610 Jan 11, 2024
bf4e928
debug tokenizer
JustinLin610 Jan 11, 2024
2658baf
not doctest
JustinLin610 Jan 11, 2024
69b03e7
debug tokenization
JustinLin610 Jan 11, 2024
725ab32
debug init for tokenizer
JustinLin610 Jan 11, 2024
57264f3
fix style
JustinLin610 Jan 11, 2024
0489d0f
update init
JustinLin610 Jan 11, 2024
e3b60a6
delete if in token auto
JustinLin610 Jan 11, 2024
e1b4a8c
add tokenizer doc
JustinLin610 Jan 11, 2024
22d3402
add tokenizer in init
JustinLin610 Jan 11, 2024
c05a7be
Update dummy_tokenizers_objects.py
jklj077 Jan 11, 2024
783eec8
Merge pull request #1 from jklj077/patch-1
JustinLin610 Jan 11, 2024
c24ee57
update
JustinLin610 Jan 11, 2024
0f83bc1
update
JustinLin610 Jan 11, 2024
e68a65c
debug
JustinLin610 Jan 11, 2024
03ff228
Merge branch 'huggingface:main' into add_qwen2
JustinLin610 Jan 11, 2024
2e00431
Update tokenization_qwen2.py
jklj077 Jan 11, 2024
73695ff
Merge pull request #3 from jklj077/patch-1
JustinLin610 Jan 11, 2024
e041c79
debug
JustinLin610 Jan 11, 2024
e96c5c2
Merge branch 'huggingface:main' into add_qwen2
JustinLin610 Jan 12, 2024
9cda260
Update convert_slow_tokenizer.py
jklj077 Jan 13, 2024
a81deed
add copies
Jan 13, 2024
e652aa9
Merge pull request #4 from jklj077/patch-3
JustinLin610 Jan 13, 2024
985fe0f
add copied from and make style
JustinLin610 Jan 13, 2024
f7beb77
update files map
Jan 13, 2024
35a7480
update test
Jan 13, 2024
26a24db
Merge pull request #5 from jklj077/patch-4
JustinLin610 Jan 13, 2024
f419098
fix style
JustinLin610 Jan 13, 2024
6b3247b
fix merge reading and update tests
Jan 13, 2024
68968f5
Merge pull request #6 from jklj077/patch-6
JustinLin610 Jan 13, 2024
9c45e77
fix tests
Jan 13, 2024
93bae64
fix tests
Jan 13, 2024
1345aae
Merge pull request #7 from jklj077/patch-6
JustinLin610 Jan 13, 2024
bd57c63
fix style
JustinLin610 Jan 13, 2024
25d43f0
debug a variable in readme
JustinLin610 Jan 14, 2024
e68a1b8
Update src/transformers/models/qwen2/configuration_qwen2.py
JustinLin610 Jan 15, 2024
4135193
update test and copied from
JustinLin610 Jan 15, 2024
8708e50
fix style
JustinLin610 Jan 15, 2024
db82439
update qwen2 tokenization and tests
Jan 15, 2024
e0c7750
Update tokenization_qwen2.py
JustinLin610 Jan 15, 2024
7c13f74
Merge pull request #8 from jklj077/patch-7
JustinLin610 Jan 15, 2024
8ebacad
delete the copied from after property
JustinLin610 Jan 15, 2024
7b670a8
fix style
JustinLin610 Jan 15, 2024
304f0dc
update tests
Jan 15, 2024
8b144e1
Merge branch 'patch-7' of https://github.com/jklj077/transformers int…
Jan 15, 2024
ca4a8c5
Merge pull request #9 from jklj077/patch-7
JustinLin610 Jan 15, 2024
1ce4e4d
update tests
Jan 16, 2024
04c4a75
Merge pull request #10 from jklj077/patch-7
JustinLin610 Jan 16, 2024
93cbb5d
Merge branch 'huggingface:main' into add_qwen2
JustinLin610 Jan 16, 2024
1e0b937
add copied from
JustinLin610 Jan 16, 2024
22f7b1e
fix bugs
JustinLin610 Jan 16, 2024
69d3f89
update doc
JustinLin610 Jan 16, 2024
2840f0a
add warning for sliding window attention
Jan 17, 2024
87f6cf7
update qwen2 tokenization
Jan 17, 2024
0a47113
Merge pull request #11 from jklj077/patch-8
JustinLin610 Jan 17, 2024
047e372
fix style
JustinLin610 Jan 17, 2024
6b6dae0
Merge branch 'huggingface:main' into add_qwen2
JustinLin610 Jan 17, 2024
a4e80ee
Update src/transformers/models/qwen2/modeling_qwen2.py
JustinLin610 Jan 17, 2024
39cc60b
Merge branch 'huggingface:main' into add_qwen2
JustinLin610 Jan 17, 2024
a8b0c3c
fix tokenizer fast
JustinLin610 Jan 17, 2024
1 change: 1 addition & 0 deletions README.md
@@ -458,6 +458,7 @@ Current number of checkpoints: ![](https://img.shields.io/endpoint?url=https://h
1. **[ProphetNet](https://huggingface.co/docs/transformers/model_doc/prophetnet)** (from Microsoft Research) released with the paper [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
1. **[PVT](https://huggingface.co/docs/transformers/model_doc/pvt)** (from Nanjing University, The University of Hong Kong etc.) released with the paper [Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions](https://arxiv.org/pdf/2102.12122.pdf) by Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, Ling Shao.
1. **[QDQBert](https://huggingface.co/docs/transformers/model_doc/qdqbert)** (from NVIDIA) released with the paper [Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation](https://arxiv.org/abs/2004.09602) by Hao Wu, Patrick Judd, Xiaojie Zhang, Mikhail Isaev and Paulius Micikevicius.
1. **[Qwen2](https://huggingface.co/docs/main/transformers/model_doc/qwen2)** (from the Qwen team, Alibaba Group) released with the paper [Qwen Technical Report](https://arxiv.org/abs/2309.16609) by Jinze Bai, Shuai Bai, Yunfei Chu, Zeyu Cui, Kai Dang, Xiaodong Deng, Yang Fan, Wenbin Ge, Yu Han, Fei Huang, Binyuan Hui, Luo Ji, Mei Li, Junyang Lin, Runji Lin, Dayiheng Liu, Gao Liu, Chengqiang Lu, Keming Lu, Jianxin Ma, Rui Men, Xingzhang Ren, Xuancheng Ren, Chuanqi Tan, Sinan Tan, Jianhong Tu, Peng Wang, Shijie Wang, Wei Wang, Shengguang Wu, Benfeng Xu, Jin Xu, An Yang, Hao Yang, Jian Yang, Shusheng Yang, Yang Yao, Bowen Yu, Hongyi Yuan, Zheng Yuan, Jianwei Zhang, Xingxuan Zhang, Yichang Zhang, Zhenru Zhang, Chang Zhou, Jingren Zhou, Xiaohuan Zhou and Tianhang Zhu.
1. **[RAG](https://huggingface.co/docs/transformers/model_doc/rag)** (from Facebook) released with the paper [Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks](https://arxiv.org/abs/2005.11401) by Patrick Lewis, Ethan Perez, Aleksandara Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela.
1. **[REALM](https://huggingface.co/docs/transformers/model_doc/realm.html)** (from Google Research) released with the paper [REALM: Retrieval-Augmented Language Model Pre-Training](https://arxiv.org/abs/2002.08909) by Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat and Ming-Wei Chang.
1. **[Reformer](https://huggingface.co/docs/transformers/model_doc/reformer)** (from Google Research) released with the paper [Reformer: The Efficient Transformer](https://arxiv.org/abs/2001.04451) by Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya.
2 changes: 2 additions & 0 deletions docs/source/en/_toctree.yml
@@ -446,6 +446,8 @@
title: ProphetNet
- local: model_doc/qdqbert
title: QDQBert
- local: model_doc/qwen2
title: Qwen2
- local: model_doc/rag
title: RAG
- local: model_doc/realm
74 changes: 74 additions & 0 deletions docs/source/en/model_doc/qwen2.md
@@ -0,0 +1,74 @@
<!--Copyright 2024 The Qwen Team and The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->

# Qwen2

## Overview

Qwen2 is the new model series of large language models from the Qwen team. Previously, we released the Qwen series, including Qwen-72B, Qwen-1.8B, Qwen-VL, Qwen-Audio, etc.

### Model Details

Qwen2 is a language model series including decoder language models of different model sizes. For each size, we release the base language model and the aligned chat model. It is based on the Transformer architecture with SwiGLU activation, attention QKV bias, group query attention, a mixture of sliding window attention and full attention, etc. Additionally, we have an improved tokenizer adaptive to multiple natural languages and code.

Review comment (Collaborator): acronyms here might not be evident for everyone!

Reply (Contributor Author): Replaced acronyms with full names

## Usage tips

`Qwen2-7B-beta` and `Qwen2-7B-Chat-beta` can be found on the [Hugging Face Hub](https://huggingface.co/Qwen).

The following example demonstrates how to use `Qwen2-7B-Chat-beta` for inference. The chat model uses the ChatML format for dialog, and the example shows how to build the prompt in that format with `apply_chat_template`.

```python
>>> from transformers import AutoModelForCausalLM, AutoTokenizer
>>> device = "cuda" # the device to load the model onto

>>> model = AutoModelForCausalLM.from_pretrained("Qwen2/Qwen2-7B-Chat-beta", device_map="auto")
>>> tokenizer = AutoTokenizer.from_pretrained("Qwen2/Qwen2-7B-Chat-beta")

>>> prompt = "Give me a short introduction to large language model."

>>> messages = [{"role": "user", "content": prompt}]

>>> text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

>>> model_inputs = tokenizer([text], return_tensors="pt").to(device)

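>>> generated_ids = model.generate(model_inputs.input_ids, max_new_tokens=100, do_sample=True)

>>> # Keep only the newly generated tokens by dropping the prompt tokens from each sequence.
>>> generated_ids = [output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)]

>>> response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]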
```
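
For readers unfamiliar with ChatML, the sketch below shows roughly what kind of string a ChatML-style chat template produces and how to trim a trailing end-of-turn marker safely. The helper names `render_chatml` and `trim_end_marker` are hypothetical, and the layout is the generic ChatML one; the template actually shipped with the Qwen2 tokenizer may differ (for example, by adding a default system message), so `apply_chat_template` remains the source of truth.

```python
IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def render_chatml(messages, add_generation_prompt=True):
    """Render messages in a generic ChatML layout (assumed, not the tokenizer's real template)."""
    parts = [f"{IM_START}{m['role']}\n{m['content']}{IM_END}\n" for m in messages]
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append(f"{IM_START}assistant\n")
    return "".join(parts)


def trim_end_marker(text, marker=IM_END):
    """Remove one trailing end-of-turn marker, matching it as a whole string."""
    text = text.rstrip()
    return text[: -len(marker)] if text.endswith(marker) else text


prompt = render_chatml([{"role": "user", "content": "Hi"}])
reply = trim_end_marker("mid<|im_end|>")
```

Note that `str.strip("<|im_end|>")` treats its argument as a set of characters rather than a suffix, so it can also remove legitimate leading or trailing letters such as `m`, `i`, `d`, or `e`. Matching the whole marker, or decoding with `skip_special_tokens=True`, avoids that.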

## Qwen2Config

[[autodoc]] Qwen2Config

## Qwen2Model

[[autodoc]] Qwen2Model
- forward

## Qwen2ForCausalLM

[[autodoc]] Qwen2ForCausalLM
- forward

## Qwen2ForSequenceClassification

[[autodoc]] Qwen2ForSequenceClassification
- forward
16 changes: 16 additions & 0 deletions src/transformers/__init__.py
@@ -711,6 +711,7 @@
],
"models.pvt": ["PVT_PRETRAINED_CONFIG_ARCHIVE_MAP", "PvtConfig"],
"models.qdqbert": ["QDQBERT_PRETRAINED_CONFIG_ARCHIVE_MAP", "QDQBertConfig"],
"models.qwen2": ["QWEN2_PRETRAINED_CONFIG_ARCHIVE_MAP", "Qwen2Config"],
"models.rag": ["RagConfig", "RagRetriever", "RagTokenizer"],
"models.realm": [
"REALM_PRETRAINED_CONFIG_ARCHIVE_MAP",
@@ -2971,6 +2972,14 @@
"load_tf_weights_in_qdqbert",
]
)
_import_structure["models.qwen2"].extend(
[
"Qwen2ForCausalLM",
"Qwen2ForSequenceClassification",
"Qwen2Model",
"Qwen2PreTrainedModel",
]
)
_import_structure["models.rag"].extend(
[
"RagModel",
@@ -5405,6 +5414,7 @@
)
from .models.pvt import PVT_PRETRAINED_CONFIG_ARCHIVE_MAP, PvtConfig
from .models.qdqbert import QDQBERT_PRETRAINED_CONFIG_ARCHIVE_MAP, QDQBertConfig
from .models.qwen2 import QWEN2_PRETRAINED_CONFIG_ARCHIVE_MAP, Qwen2Config
from .models.rag import RagConfig, RagRetriever, RagTokenizer
from .models.realm import (
REALM_PRETRAINED_CONFIG_ARCHIVE_MAP,
@@ -7374,6 +7384,12 @@
QDQBertPreTrainedModel,
load_tf_weights_in_qdqbert,
)
from .models.qwen2 import (
Qwen2ForCausalLM,
Qwen2ForSequenceClassification,
Qwen2Model,
Qwen2PreTrainedModel,
)
from .models.rag import (
RagModel,
RagPreTrainedModel,
1 change: 1 addition & 0 deletions src/transformers/models/__init__.py
@@ -176,6 +176,7 @@
prophetnet,
pvt,
qdqbert,
qwen2,
rag,
realm,
reformer,
3 changes: 3 additions & 0 deletions src/transformers/models/auto/configuration_auto.py
@@ -182,6 +182,7 @@
("prophetnet", "ProphetNetConfig"),
("pvt", "PvtConfig"),
("qdqbert", "QDQBertConfig"),
("qwen2", "Qwen2Config"),
("rag", "RagConfig"),
("realm", "RealmConfig"),
("reformer", "ReformerConfig"),
@@ -405,6 +406,7 @@
("prophetnet", "PROPHETNET_PRETRAINED_CONFIG_ARCHIVE_MAP"),
("pvt", "PVT_PRETRAINED_CONFIG_ARCHIVE_MAP"),
("qdqbert", "QDQBERT_PRETRAINED_CONFIG_ARCHIVE_MAP"),
("qwen2", "QWEN2_PRETRAINED_CONFIG_ARCHIVE_MAP"),
("realm", "REALM_PRETRAINED_CONFIG_ARCHIVE_MAP"),
("regnet", "REGNET_PRETRAINED_CONFIG_ARCHIVE_MAP"),
("rembert", "REMBERT_PRETRAINED_CONFIG_ARCHIVE_MAP"),
@@ -649,6 +651,7 @@
("prophetnet", "ProphetNet"),
("pvt", "PVT"),
("qdqbert", "QDQBert"),
("qwen2", "Qwen2"),
("rag", "RAG"),
("realm", "REALM"),
("reformer", "Reformer"),
3 changes: 3 additions & 0 deletions src/transformers/models/auto/modeling_auto.py
@@ -177,6 +177,7 @@
("prophetnet", "ProphetNetModel"),
("pvt", "PvtModel"),
("qdqbert", "QDQBertModel"),
("qwen2", "Qwen2Model"),
("reformer", "ReformerModel"),
("regnet", "RegNetModel"),
("rembert", "RemBertModel"),
@@ -449,6 +450,7 @@
("plbart", "PLBartForCausalLM"),
("prophetnet", "ProphetNetForCausalLM"),
("qdqbert", "QDQBertLMHeadModel"),
("qwen2", "Qwen2ForCausalLM"),
("reformer", "ReformerModelWithLMHead"),
("rembert", "RemBertForCausalLM"),
("roberta", "RobertaForCausalLM"),
@@ -792,6 +794,7 @@
("phi", "PhiForSequenceClassification"),
("plbart", "PLBartForSequenceClassification"),
("qdqbert", "QDQBertForSequenceClassification"),
("qwen2", "Qwen2ForSequenceClassification"),
("reformer", "ReformerForSequenceClassification"),
("rembert", "RemBertForSequenceClassification"),
("roberta", "RobertaForSequenceClassification"),
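
The three hunks above register `qwen2` in the base-model, causal-LM, and sequence-classification mappings. As a rough illustration of how a name-based registry like this can dispatch on a config's `model_type`, here is a simplified sketch. The `TASK_MAPPINGS` dict and the `resolve_class_name` helper are hypothetical and far simpler than the real auto-class machinery in `modeling_auto.py`.

```python
from collections import OrderedDict

# Illustrative subset of the mappings touched by this diff (task keys are made up).
TASK_MAPPINGS = {
    "base": OrderedDict([("qdqbert", "QDQBertModel"), ("qwen2", "Qwen2Model")]),
    "causal-lm": OrderedDict([("qdqbert", "QDQBertLMHeadModel"), ("qwen2", "Qwen2ForCausalLM")]),
    "sequence-classification": OrderedDict(
        [
            ("qdqbert", "QDQBertForSequenceClassification"),
            ("qwen2", "Qwen2ForSequenceClassification"),
        ]
    ),
}


def resolve_class_name(model_type, task):
    """Return the class name registered for (model_type, task), or raise ValueError."""
    mapping = TASK_MAPPINGS[task]
    if model_type not in mapping:
        supported = ", ".join(mapping)
        raise ValueError(f"{model_type!r} has no {task!r} class; supported: {supported}")
    return mapping[model_type]


causal_lm_name = resolve_class_name("qwen2", "causal-lm")
```

A model type that is missing from one task's mapping simply cannot be loaded through that task's auto class, which is why the diff adds `qwen2` to each mapping it supports.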
8 changes: 8 additions & 0 deletions src/transformers/models/auto/tokenization_auto.py
@@ -333,6 +333,14 @@
("plbart", ("PLBartTokenizer" if is_sentencepiece_available() else None, None)),
("prophetnet", ("ProphetNetTokenizer", None)),
("qdqbert", ("BertTokenizer", "BertTokenizerFast" if is_tokenizers_available() else None)),
(
"qwen2",
(
# The auto mapping expects a (slow, fast) pair of tokenizer class names.
"Qwen2Tokenizer",
"Qwen2TokenizerFast" if is_tokenizers_available() else None,
),
),
("rag", ("RagTokenizer", None)),
("realm", ("RealmTokenizer", "RealmTokenizerFast" if is_tokenizers_available() else None)),
(
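
Entries in this mapping, such as the `rag` and `realm` ones shown in the hunk, pair a slow (pure-Python) tokenizer class name with a fast one backed by the `tokenizers` library, and either slot may be `None` when the class or its optional dependency is unavailable. A minimal sketch of how such a pair might be consumed follows; `pick_tokenizer_name` is a hypothetical helper, not the `AutoTokenizer` implementation.

```python
def pick_tokenizer_name(pair, use_fast=True):
    """Choose a tokenizer class name from a (slow, fast) pair."""
    # Unpacking enforces the two-slot shape: a tuple of any other length raises ValueError.
    slow_name, fast_name = pair
    if use_fast and fast_name is not None:
        return fast_name
    if slow_name is not None:
        return slow_name
    if fast_name is not None:
        # No slow class registered: fall back to the fast one.
        return fast_name
    raise ValueError("no tokenizer class is available for this model type")


realm_name = pick_tokenizer_name(("RealmTokenizer", "RealmTokenizerFast"))
rag_name = pick_tokenizer_name(("RagTokenizer", None))
```

Because the pair is unpacked into exactly two names, an entry with three elements would fail at lookup time rather than at registration time.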
62 changes: 62 additions & 0 deletions src/transformers/models/qwen2/__init__.py
@@ -0,0 +1,62 @@
# Copyright 2024 The Qwen Team and The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import TYPE_CHECKING

from ...utils import (
OptionalDependencyNotAvailable,
_LazyModule,
is_torch_available,
)


_import_structure = {
"configuration_qwen2": ["QWEN2_PRETRAINED_CONFIG_ARCHIVE_MAP", "Qwen2Config"],
}


try:
if not is_torch_available():
raise OptionalDependencyNotAvailable()
except OptionalDependencyNotAvailable:
pass
else:
_import_structure["modeling_qwen2"] = [
"Qwen2ForCausalLM",
"Qwen2Model",
"Qwen2PreTrainedModel",
"Qwen2ForSequenceClassification",
]


if TYPE_CHECKING:
from .configuration_qwen2 import QWEN2_PRETRAINED_CONFIG_ARCHIVE_MAP, Qwen2Config

try:
if not is_torch_available():
raise OptionalDependencyNotAvailable()
except OptionalDependencyNotAvailable:
pass
else:
from .modeling_qwen2 import (
Qwen2ForCausalLM,
Qwen2ForSequenceClassification,
Qwen2Model,
Qwen2PreTrainedModel,
)


else:
import sys

sys.modules[__name__] = _LazyModule(__name__, globals()["__file__"], _import_structure, module_spec=__spec__)
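
The `_import_structure` dict plus the final `sys.modules[__name__] = _LazyModule(...)` line defer the heavy `modeling_qwen2` import until one of its names is first accessed. The sketch below illustrates the general idea with a module subclass that imports on first attribute access. `LazyModuleSketch` is a hypothetical, simplified stand-in and not the actual `_LazyModule` code, and it uses standard-library modules so that it runs anywhere.

```python
import importlib
import types


class LazyModuleSketch(types.ModuleType):
    """Simplified illustration of lazy attribute loading (not transformers' _LazyModule)."""

    def __init__(self, name, import_structure):
        super().__init__(name)
        # Map each exported attribute to the module that defines it.
        self._attr_to_module = {
            attr: module_name
            for module_name, attrs in import_structure.items()
            for attr in attrs
        }
        self.loaded = []  # modules imported on demand, recorded for illustration

    def __getattr__(self, attr):
        # Only called when normal lookup fails, i.e. before the attribute is cached.
        mapping = self.__dict__.get("_attr_to_module", {})
        if attr not in mapping:
            raise AttributeError(f"module {self.__name__!r} has no attribute {attr!r}")
        module = importlib.import_module(mapping[attr])
        value = getattr(module, attr)
        setattr(self, attr, value)  # cache so later accesses skip __getattr__
        self.loaded.append(mapping[attr])
        return value


lazy = LazyModuleSketch("demo", {"math": ["sqrt"], "json": ["dumps"]})
root = lazy.sqrt(9.0)  # triggers the import of "math" only
```

Nothing is imported when the object is constructed; each backing module is loaded the first time one of its names is requested, which keeps the top-level package import cheap.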