Merged

add Glm #33823

75 commits
6366077
Create modular_glm.py
Cyrilvallez Sep 24, 2024
fa83dab
Update modular_glm.py
Cyrilvallez Sep 25, 2024
11d74d9
Finalize architecture without all attentions
Cyrilvallez Sep 25, 2024
a3587d3
Add all attentions modules
Cyrilvallez Sep 25, 2024
9430273
Finalize modular
Cyrilvallez Sep 25, 2024
4efd782
Update given last version
Cyrilvallez Sep 25, 2024
e9efed1
Last update
Cyrilvallez Sep 25, 2024
a411a38
Finalize model
Cyrilvallez Sep 26, 2024
c4d3c21
Finalize converter
Cyrilvallez Sep 26, 2024
1aba7c4
Update convert_glm_weights_to_hf.py
Cyrilvallez Sep 26, 2024
ea4d61a
style
Cyrilvallez Sep 26, 2024
b8eed8c
style
Cyrilvallez Sep 26, 2024
4f4e6f6
Create __init__.py
Cyrilvallez Sep 26, 2024
de2e058
Aff all inits
Cyrilvallez Sep 26, 2024
9392380
Update convert_glm_weights_to_hf.py
Cyrilvallez Sep 26, 2024
0e76b9d
Update convert_glm_weights_to_hf.py
Cyrilvallez Sep 26, 2024
02ae570
Update convert_glm_weights_to_hf.py
Cyrilvallez Sep 26, 2024
dee65b1
Update convert_glm_weights_to_hf.py
Cyrilvallez Sep 26, 2024
65e393d
Update convert_glm_weights_to_hf.py
Cyrilvallez Sep 26, 2024
60f0ca0
Update convert_glm_weights_to_hf.py
Cyrilvallez Sep 26, 2024
fa58a78
Update convert_glm_weights_to_hf.py
Cyrilvallez Sep 26, 2024
a39465a
Update convert_glm_weights_to_hf.py
Cyrilvallez Sep 26, 2024
db141cc
Update convert_glm_weights_to_hf.py
Cyrilvallez Sep 26, 2024
7f2427c
Correct the rotary embeddings
Cyrilvallez Sep 27, 2024
0dc9819
Remove apply_residual_connection_post_layernorm (always false)
Cyrilvallez Sep 27, 2024
931e28d
remove use_rms_norm (always true)
Cyrilvallez Sep 27, 2024
49c15d8
remove past_layer_norm (always true)
Cyrilvallez Sep 27, 2024
fc5cd41
Update __init__.py
Cyrilvallez Sep 27, 2024
f7220ba
Update config and license
Cyrilvallez Sep 27, 2024
7c026f8
start adding tests and doc
Cyrilvallez Sep 27, 2024
3331d55
Add doc + style
Cyrilvallez Sep 27, 2024
a1fe067
Update test_modeling_glm.py
Cyrilvallez Sep 27, 2024
7c26a24
Add dummies
Cyrilvallez Sep 27, 2024
239e91b
Apply correct modeling
Cyrilvallez Sep 30, 2024
89624b4
Refactor attention to follow llama
Cyrilvallez Sep 30, 2024
2d887c5
Update __init__.py
Cyrilvallez Sep 30, 2024
20ac604
Update convert_glm_weights_to_hf.py
Cyrilvallez Sep 30, 2024
cd92f6c
Correct bias
Cyrilvallez Sep 30, 2024
e9c16e8
remove linear_bias and pdrop (never used)
Cyrilvallez Sep 30, 2024
fcd47ca
apply modular
Cyrilvallez Sep 30, 2024
9c41850
Simplify converter
Cyrilvallez Oct 1, 2024
c286c6b
remove dummies + style
Cyrilvallez Oct 1, 2024
63028c9
add model_input_names
Cyrilvallez Oct 1, 2024
cce541b
Add pretraining_tp to config for when eager attention is used
Cyrilvallez Oct 1, 2024
307668a
Update modular to remove all pretraining_tp
Cyrilvallez Oct 1, 2024
acf92e0
Update test_modeling_glm.py
Cyrilvallez Oct 1, 2024
ca6ba5e
Update the __all__
Cyrilvallez Oct 1, 2024
0c6c8ef
Update __all__
Cyrilvallez Oct 1, 2024
5f76628
Update __init__.py
Cyrilvallez Oct 1, 2024
3bca672
Update test_modeling_glm.py
Cyrilvallez Oct 1, 2024
4c8ee11
add revisions
Cyrilvallez Oct 1, 2024
82b0757
Add the correct repos and revisions
Cyrilvallez Oct 1, 2024
e394eb8
style
Cyrilvallez Oct 1, 2024
ad68911
Update __init__.py
Cyrilvallez Oct 1, 2024
fd73ff9
update exports
Cyrilvallez Oct 1, 2024
f4494e9
remove import of modular files
Cyrilvallez Oct 1, 2024
e510503
style
Cyrilvallez Oct 1, 2024
2a6b0e3
Apply Llama changes + refine converter
Cyrilvallez Oct 2, 2024
7affb64
Update convert_glm_weights_to_hf.py
Cyrilvallez Oct 2, 2024
c8762d6
Update convert_glm_weights_to_hf.py
Cyrilvallez Oct 2, 2024
4792118
Update convert_glm_weights_to_hf.py
Cyrilvallez Oct 2, 2024
6b2f46b
Update convert_glm_weights_to_hf.py
Cyrilvallez Oct 2, 2024
e97978b
Update convert_glm_weights_to_hf.py
Cyrilvallez Oct 2, 2024
806d035
Update convert_glm_weights_to_hf.py
Cyrilvallez Oct 2, 2024
efe17f8
Update convert_glm_weights_to_hf.py
Cyrilvallez Oct 2, 2024
aab9dcf
Update convert_glm_weights_to_hf.py
Cyrilvallez Oct 2, 2024
4c14a98
style
Cyrilvallez Oct 2, 2024
409ca3e
Use new modular converter
Cyrilvallez Oct 8, 2024
35e9623
add pretrainedmodel to init
Cyrilvallez Oct 8, 2024
684016c
style
Cyrilvallez Oct 8, 2024
764f755
Update test_modeling_glm.py
Cyrilvallez Oct 8, 2024
2739181
Move config outside modular to please CI about docstrings
Cyrilvallez Oct 18, 2024
1d95488
Add dummies to please CI
Cyrilvallez Oct 18, 2024
828509c
Update glm.md
Cyrilvallez Oct 18, 2024
ceec1f6
Update glm.md
Cyrilvallez Oct 18, 2024
2 changes: 2 additions & 0 deletions docs/source/en/_toctree.yml
Original file line number Diff line number Diff line change
@@ -412,6 +412,8 @@
title: Gemma
- local: model_doc/gemma2
title: Gemma2
- local: model_doc/glm
title: GLM
- local: model_doc/openai-gpt
title: GPT
- local: model_doc/gpt_neo
1 change: 1 addition & 0 deletions docs/source/en/index.md
@@ -150,6 +150,7 @@ Flax), PyTorch, and/or TensorFlow.
| [Gemma](model_doc/gemma) | ✅ | ❌ | ✅ |
| [Gemma2](model_doc/gemma2) | ✅ | ❌ | ❌ |
| [GIT](model_doc/git) | ✅ | ❌ | ❌ |
| [GLM](model_doc/glm) | ✅ | ❌ | ❌ |
| [GLPN](model_doc/glpn) | ✅ | ❌ | ❌ |
| [GPT Neo](model_doc/gpt_neo) | ✅ | ❌ | ✅ |
| [GPT NeoX](model_doc/gpt_neox) | ✅ | ❌ | ❌ |
99 changes: 99 additions & 0 deletions docs/source/en/model_doc/glm.md
@@ -0,0 +1,99 @@
<!--Copyright 2024 The GLM & ZhipuAI team and The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->

# GLM

## Overview

The GLM Model was proposed
in [ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools](https://arxiv.org/html/2406.12793v1)
by GLM Team, THUDM & ZhipuAI.

The abstract from the paper is the following:

*We introduce ChatGLM, an evolving family of large language models that we have been developing over time. This report
primarily focuses on the GLM-4 language series, which includes GLM-4, GLM-4-Air, and GLM-4-9B. They represent our most
capable models that are trained with all the insights and lessons gained from the preceding three generations of
ChatGLM. To date, the GLM-4 models are pre-trained on ten trillions of tokens mostly in Chinese and English, along with
a small set of corpus from 24 languages, and aligned primarily for Chinese and English usage. The high-quality alignment
is achieved via a multi-stage post-training process, which involves supervised fine-tuning and learning from human
feedback. Evaluations show that GLM-4 1) closely rivals or outperforms GPT-4 in terms of general metrics such as MMLU,
GSM8K, MATH, BBH, GPQA, and HumanEval, 2) gets close to GPT-4-Turbo in instruction following as measured by IFEval, 3)
matches GPT-4 Turbo (128K) and Claude 3 for long context tasks, and 4) outperforms GPT-4 in Chinese alignments as
measured by AlignBench. The GLM-4 All Tools model is further aligned to understand user intent and autonomously decide
when and which tool(s) to use—including web browser, Python interpreter, text-to-image model, and user-defined
functions—to effectively complete complex tasks. In practical applications, it matches and even surpasses GPT-4 All
Tools in tasks like accessing online information via web browsing and solving math problems using Python interpreter.
Over the course, we have open-sourced a series of models, including ChatGLM-6B (three generations), GLM-4-9B (128K, 1M),
GLM-4V-9B, WebGLM, and CodeGeeX, attracting over 10 million downloads on Hugging face in the year 2023 alone.*

Tips:

- This model was contributed by [THUDM](https://huggingface.co/THUDM). The most recent code can be
found [here](https://github.com/thudm/GLM-4).


## Usage tips

`GLM-4` can be found on the [Hugging Face Hub](https://huggingface.co/collections/THUDM/glm-4-665fcf188c414b03c2f7e3b7).

In the following, we demonstrate how to use `glm-4-9b-chat` for inference. Note that we use the ChatML format for dialogue; in this demo, we show how to leverage `apply_chat_template` for this purpose.

```python
>>> from transformers import AutoModelForCausalLM, AutoTokenizer
>>> device = "cuda" # the device to load the model onto

>>> model = AutoModelForCausalLM.from_pretrained("THUDM/glm-4-9b-chat", device_map="auto")
>>> tokenizer = AutoTokenizer.from_pretrained("THUDM/glm-4-9b-chat")

>>> prompt = "Give me a short introduction to large language model."

>>> messages = [{"role": "user", "content": prompt}]

>>> text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

>>> model_inputs = tokenizer([text], return_tensors="pt").to(device)

>>> generated_ids = model.generate(model_inputs.input_ids, max_new_tokens=512, do_sample=True)

>>> generated_ids = [output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)]

>>> response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
```
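The list comprehension above strips each prompt from its generated sequence, since `generate` returns the prompt tokens followed by the newly generated tokens. That slicing step can be sketched with plain lists and placeholder token ids (not real GLM vocabulary ids):

```python
def strip_prompt(input_ids, generated_ids):
    # For each sequence, drop the leading prompt tokens: generate() echoes
    # the input ids before the new tokens, so slicing past len(inp) leaves
    # only the model's continuation.
    return [output[len(inp):] for inp, output in zip(input_ids, generated_ids)]


prompt = [[101, 7592, 102]]                    # one prompt of 3 tokens
full_output = [[101, 7592, 102, 2054, 2003]]   # prompt + 2 newly generated tokens
print(strip_prompt(prompt, full_output))       # [[2054, 2003]]
```

The same slicing works batched, one slice per row, because `zip` pairs each input sequence with its own output sequence.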

## GlmConfig

[[autodoc]] GlmConfig

## GlmModel

[[autodoc]] GlmModel
- forward

## GlmForCausalLM

[[autodoc]] GlmForCausalLM
- forward

## GlmForSequenceClassification

[[autodoc]] GlmForSequenceClassification
- forward

## GlmForTokenClassification

[[autodoc]] GlmForTokenClassification
- forward
2 changes: 2 additions & 0 deletions docs/source/en/perf_infer_gpu_one.md
@@ -42,6 +42,7 @@ FlashAttention-2 is currently supported for the following architectures:
* [Chameleon](https://huggingface.co/docs/transformers/model_doc/chameleon#transformers.Chameleon)
* [CLIP](https://huggingface.co/docs/transformers/model_doc/clip#transformers.CLIPModel)
* [Cohere](https://huggingface.co/docs/transformers/model_doc/cohere#transformers.CohereModel)
* [GLM](https://huggingface.co/docs/transformers/model_doc/glm#transformers.GlmModel)
* [Dbrx](https://huggingface.co/docs/transformers/model_doc/dbrx#transformers.DbrxModel)
* [DistilBert](https://huggingface.co/docs/transformers/model_doc/distilbert#transformers.DistilBertModel)
* [Gemma](https://huggingface.co/docs/transformers/model_doc/gemma#transformers.GemmaModel)
@@ -215,6 +216,7 @@ For now, Transformers supports SDPA inference and training for the following architectures:
* [CamemBERT](https://huggingface.co/docs/transformers/model_doc/camembert#transformers.CamembertModel)
* [Chameleon](https://huggingface.co/docs/transformers/model_doc/chameleon#transformers.Chameleon)
* [CLIP](https://huggingface.co/docs/transformers/model_doc/clip#transformers.CLIPModel)
* [GLM](https://huggingface.co/docs/transformers/model_doc/glm#transformers.GlmModel)
* [Cohere](https://huggingface.co/docs/transformers/model_doc/cohere#transformers.CohereModel)
* [data2vec_audio](https://huggingface.co/docs/transformers/main/en/model_doc/data2vec#transformers.Data2VecAudioModel)
* [Dbrx](https://huggingface.co/docs/transformers/model_doc/dbrx#transformers.DbrxModel)
18 changes: 18 additions & 0 deletions src/transformers/__init__.py
@@ -452,6 +452,7 @@
"GitProcessor",
"GitVisionConfig",
],
"models.glm": ["GlmConfig"],
"models.glpn": ["GLPNConfig"],
"models.gpt2": [
"GPT2Config",
@@ -2288,6 +2289,15 @@
"GitVisionModel",
]
)
_import_structure["models.glm"].extend(
[
"GlmForCausalLM",
"GlmForSequenceClassification",
"GlmForTokenClassification",
"GlmModel",
"GlmPreTrainedModel",
]
)
_import_structure["models.glpn"].extend(
[
"GLPNForDepthEstimation",
@@ -5286,6 +5296,7 @@
GitProcessor,
GitVisionConfig,
)
from .models.glm import GlmConfig
from .models.glpn import GLPNConfig
from .models.gpt2 import (
GPT2Config,
@@ -7002,6 +7013,13 @@
GitPreTrainedModel,
GitVisionModel,
)
from .models.glm import (
GlmForCausalLM,
GlmForSequenceClassification,
GlmForTokenClassification,
GlmModel,
GlmPreTrainedModel,
)
from .models.glpn import (
GLPNForDepthEstimation,
GLPNModel,
1 change: 1 addition & 0 deletions src/transformers/models/__init__.py
@@ -97,6 +97,7 @@
gemma,
gemma2,
git,
glm,
glpn,
gpt2,
gpt_bigcode,
2 changes: 2 additions & 0 deletions src/transformers/models/auto/configuration_auto.py
@@ -114,6 +114,7 @@
("gemma", "GemmaConfig"),
("gemma2", "Gemma2Config"),
("git", "GitConfig"),
("glm", "GlmConfig"),
("glpn", "GLPNConfig"),
("gpt-sw3", "GPT2Config"),
("gpt2", "GPT2Config"),
@@ -415,6 +416,7 @@
("gemma", "Gemma"),
("gemma2", "Gemma2"),
("git", "GIT"),
("glm", "GLM"),
("glpn", "GLPN"),
("gpt-sw3", "GPT-Sw3"),
("gpt2", "OpenAI GPT-2"),
4 changes: 4 additions & 0 deletions src/transformers/models/auto/modeling_auto.py
@@ -111,6 +111,7 @@
("gemma", "GemmaModel"),
("gemma2", "Gemma2Model"),
("git", "GitModel"),
("glm", "GlmModel"),
("glpn", "GLPNModel"),
("gpt-sw3", "GPT2Model"),
("gpt2", "GPT2Model"),
@@ -485,6 +486,7 @@
("gemma", "GemmaForCausalLM"),
("gemma2", "Gemma2ForCausalLM"),
("git", "GitForCausalLM"),
("glm", "GlmForCausalLM"),
("gpt-sw3", "GPT2LMHeadModel"),
("gpt2", "GPT2LMHeadModel"),
("gpt_bigcode", "GPTBigCodeForCausalLM"),
@@ -939,6 +941,7 @@
("funnel", "FunnelForSequenceClassification"),
("gemma", "GemmaForSequenceClassification"),
("gemma2", "Gemma2ForSequenceClassification"),
("glm", "GlmForSequenceClassification"),
("gpt-sw3", "GPT2ForSequenceClassification"),
("gpt2", "GPT2ForSequenceClassification"),
("gpt_bigcode", "GPTBigCodeForSequenceClassification"),
@@ -1125,6 +1128,7 @@
("funnel", "FunnelForTokenClassification"),
("gemma", "GemmaForTokenClassification"),
("gemma2", "Gemma2ForTokenClassification"),
("glm", "GlmForTokenClassification"),
("gpt-sw3", "GPT2ForTokenClassification"),
("gpt2", "GPT2ForTokenClassification"),
("gpt_bigcode", "GPTBigCodeForTokenClassification"),
1 change: 1 addition & 0 deletions src/transformers/models/auto/tokenization_auto.py
@@ -204,6 +204,7 @@
),
),
("git", ("BertTokenizer", "BertTokenizerFast" if is_tokenizers_available() else None)),
("glm", (None, "PreTrainedTokenizerFast" if is_tokenizers_available() else None)),
("gpt-sw3", ("GPTSw3Tokenizer" if is_sentencepiece_available() else None, None)),
("gpt2", ("GPT2Tokenizer", "GPT2TokenizerFast" if is_tokenizers_available() else None)),
("gpt_bigcode", ("GPT2Tokenizer", "GPT2TokenizerFast" if is_tokenizers_available() else None)),
27 changes: 27 additions & 0 deletions src/transformers/models/glm/__init__.py
@@ -0,0 +1,27 @@
# Copyright 2024 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import TYPE_CHECKING

from ...utils import _LazyModule
from ...utils.import_utils import define_import_structure


if TYPE_CHECKING:
from .configuration_glm import *
from .modeling_glm import *
else:
import sys

_file = globals()["__file__"]
sys.modules[__name__] = _LazyModule(__name__, _file, define_import_structure(_file), module_spec=__spec__)
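The `_LazyModule` replacement above defers importing `configuration_glm` and `modeling_glm` until one of their attributes is first accessed. A simplified standalone sketch of the same idea, using a PEP 562 module-level `__getattr__` (`make_lazy_module` is a hypothetical helper, not the transformers implementation, and the demo resolves stdlib names instead of GLM submodules):

```python
import importlib
import types


def make_lazy_module(name, attr_to_module):
    # Hypothetical simplified stand-in for transformers' _LazyModule:
    # attr_to_module maps an attribute name to the module that defines it,
    # and the import happens only on first attribute access.
    mod = types.ModuleType(name)

    def __getattr__(attr):  # PEP 562 fallback, called when normal lookup fails
        if attr not in attr_to_module:
            raise AttributeError(f"module {name!r} has no attribute {attr!r}")
        submodule = importlib.import_module(attr_to_module[attr])
        value = getattr(submodule, attr)
        setattr(mod, attr, value)  # cache so later accesses skip __getattr__
        return value

    mod.__getattr__ = __getattr__
    return mod


# Demo: nothing is imported until the attribute is touched.
lazy = make_lazy_module("demo", {"sqrt": "math", "OrderedDict": "collections"})
print(lazy.sqrt(9.0))  # 3.0
```

Registering such an object in `sys.modules[__name__]`, as the `else` branch above does, makes `from transformers.models.glm import GlmModel` cheap at import time while keeping the `TYPE_CHECKING` branch available for static analyzers.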