179 commits
c8cf718
First version - OPT model
younesbelkada May 4, 2022
9ee623d
Final changes
younesbelkada May 4, 2022
0484ca1
few changes
younesbelkada May 4, 2022
b931db8
few changes
younesbelkada May 4, 2022
681dfc5
fix style issues
younesbelkada May 4, 2022
1e21983
few changes
younesbelkada May 4, 2022
1363221
Update src/transformers/models/auto/tokenization_auto.py
younesbelkada May 4, 2022
8427279
add gen tests
younesbelkada May 4, 2022
5e8e2f5
few changes
younesbelkada May 4, 2022
be0e434
few changes
younesbelkada May 4, 2022
51db79e
some changes
younesbelkada May 5, 2022
99001d3
fix code quality
younesbelkada May 5, 2022
a777bbc
major changes
younesbelkada May 6, 2022
38f7463
rm useless classes
younesbelkada May 6, 2022
c6f3a69
Removed autodoc calls to non-existent classes
ArthurZucker May 6, 2022
30d3db2
Update src/transformers/__init__.py
younesbelkada May 6, 2022
f903445
Update src/transformers/__init__.py
younesbelkada May 6, 2022
bb4ab4a
Update src/transformers/models/auto/modeling_tf_auto.py
younesbelkada May 6, 2022
2a6e288
Replaced OPTTokenizer with GPT2 tokenizer
ArthurZucker May 6, 2022
cb853fd
added GPT2Tokenizer.from_pretrained("patrickvonplaten/opt_gpt2_tokeni…
ArthurZucker May 6, 2022
337e71f
Removed OPTTokenizer
ArthurZucker May 6, 2022
0d9130f
make style
ArthurZucker May 6, 2022
290b7f0
Make style replaces
ArthurZucker May 6, 2022
096eb74
make repo consistency
ArthurZucker May 6, 2022
020843a
Removed PretrainedOPTModel
ArthurZucker May 6, 2022
c63d9f8
fix opt.mdx, removed other heads
ArthurZucker May 6, 2022
8b6e496
fix init, removed 3 heads
ArthurZucker May 6, 2022
0303f2b
removed heads
ArthurZucker May 6, 2022
2c0327d
finished cleaning head
ArthurZucker May 6, 2022
4aa6ab2
removed sequence classification and question answering
ArthurZucker May 6, 2022
752f512
removed unused imports
ArthurZucker May 6, 2022
14eeb13
removed useless dummy object for QA, SC and CG
ArthurZucker May 6, 2022
9c96f09
removed tests for removed useless dummy object for QA, SC and CG
ArthurZucker May 6, 2022
54fc962
Removed head_mask for encoder layers, which don't exist
ArthurZucker May 6, 2022
06f42ca
fixed test
ArthurZucker May 6, 2022
76e52ac
fix line
ArthurZucker May 6, 2022
556c2f4
added OPT to toctree
ArthurZucker May 6, 2022
1460025
Updated model path with pushed weights
ArthurZucker May 6, 2022
db100a5
fix model path
ArthurZucker May 6, 2022
d16d40d
Merge branch 'main' of https://github.com/huggingface/transformers in…
ArthurZucker May 6, 2022
c10f347
fixed code quality
ArthurZucker May 6, 2022
f1fe820
fixed embeddings and generation tests
ArthurZucker May 6, 2022
9b9c65b
update paths
ArthurZucker May 7, 2022
4fb9608
clean comments
ArthurZucker May 7, 2022
ab57047
removed OPTClassificationHead for sentence classification
ArthurZucker May 8, 2022
0c1c791
renamed hidden layer
ArthurZucker May 9, 2022
ac50b44
renamed num layers to standard num_hidden_layers
ArthurZucker May 9, 2022
1505de5
num_attention_heads fix
ArthurZucker May 9, 2022
8ace67b
changes for 125m
younesbelkada May 9, 2022
80296cb
Merge branch 'opt-350-m' of https://github.com/younesbelkada/transfor…
younesbelkada May 9, 2022
752c1d2
add first version for 125m
younesbelkada May 9, 2022
77e6e04
add first version - flax
younesbelkada May 9, 2022
1564dac
Merge branch 'opt-350-m' of https://github.com/younesbelkada/transfor…
younesbelkada May 9, 2022
abd1f3c
add new version
younesbelkada May 9, 2022
23ff89c
Merge branch 'opt-350-m' of https://github.com/younesbelkada/transfor…
younesbelkada May 9, 2022
5c5c858
causal LM output
ArthurZucker May 9, 2022
41fad01
Merge branch 'opt-350-m' of github.com:younesbelkada/transformers int…
ArthurZucker May 9, 2022
27b55c9
replace output type with BaseModelOutputWithPastAndCrossAttentions
ArthurZucker May 9, 2022
aebd19e
revert working config from 150m to 350m
ArthurZucker May 9, 2022
d0723aa
clean
ArthurZucker May 9, 2022
7575749
removed decoder input ids
ArthurZucker May 9, 2022
66e8298
fixed embed dim
ArthurZucker May 9, 2022
8d4920e
more embed_dim issues
ArthurZucker May 9, 2022
c005840
make style + removed enc_dec test
ArthurZucker May 9, 2022
84eb497
update flax model
ArthurZucker May 9, 2022
043a109
removed troublesome copy
ArthurZucker May 9, 2022
8ba7cbc
added is_encoder_decoder=False to config
ArthurZucker May 9, 2022
2099b5f
added set_input_embeddings function to model class
ArthurZucker May 9, 2022
1c9580f
requires torch on embed test
ArthurZucker May 9, 2022
9f6291d
use head mask instead of decoder head mask input param; solves a test
ArthurZucker May 9, 2022
740fcf5
8 tests remaining, update
ArthurZucker May 9, 2022
f8c276b
Updated create_and_check_decoder_model_past_large_inputs
ArthurZucker May 9, 2022
fff035f
Make style
ArthurZucker May 9, 2022
30ed9f6
update opt tokenizer with condition
ArthurZucker May 9, 2022
69c7ae6
make style
ArthurZucker May 9, 2022
ff09958
See if I can push
patrickvonplaten May 10, 2022
0555b92
some clean up
patrickvonplaten May 10, 2022
5491431
remove linear head hack
patrickvonplaten May 10, 2022
521822f
save intermediate
patrickvonplaten May 10, 2022
61e8023
save correct attention
patrickvonplaten May 10, 2022
7b27a91
add copied from from bart
patrickvonplaten May 10, 2022
26729d7
Merge branch 'main' of https://github.com/huggingface/transformers in…
patrickvonplaten May 10, 2022
7661453
Update src/transformers/models/opt/modeling_opt.py
ArthurZucker May 11, 2022
25a40b1
fix part of the reviews
ArthurZucker May 11, 2022
aefa63d
Merge pull request #2 from younesbelkada/opt_branch/opt-350-m
ArthurZucker May 11, 2022
f3b5e24
same changes in naming / conversion
patrickvonplaten May 11, 2022
0365e27
correct mask
patrickvonplaten May 11, 2022
929be23
more fixes
patrickvonplaten May 11, 2022
85ce8e8
fix mask
patrickvonplaten May 11, 2022
72c30c0
get 125m, 6.7b to work
patrickvonplaten May 11, 2022
c969fbf
fixed positional embedding length when past key value is provided
ArthurZucker May 11, 2022
8a066fa
Added do_layer_norm
ArthurZucker May 11, 2022
c2f79c7
solved mismatch in load dictionary
ArthurZucker May 11, 2022
b01b78c
clean up prepare opt input dict
ArthurZucker May 11, 2022
94ebdce
fixed past key value as bool
ArthurZucker May 11, 2022
dda1063
fix previous
ArthurZucker May 11, 2022
2c7ccae
fixed return_dict=False tuple issue
ArthurZucker May 11, 2022
372378f
All tests are passing
ArthurZucker May 11, 2022
bc0f722
Make style
ArthurZucker May 11, 2022
7344338
Ignore non-tested OPTDecoder
ArthurZucker May 11, 2022
ff32af1
make fix-copies
ArthurZucker May 11, 2022
50702e3
make repo consistency
ArthurZucker May 11, 2022
1127c90
small fix
ArthurZucker May 11, 2022
70c977e
removed useless @torch.no_grad decorator
ArthurZucker May 11, 2022
6c911c8
make style
ArthurZucker May 11, 2022
1443b19
fix previous opt test
ArthurZucker May 11, 2022
9d0986d
style
ArthurZucker May 11, 2022
c5189b8
make style
ArthurZucker May 11, 2022
55eac1d
added opt documentation
ArthurZucker May 11, 2022
6cc44d1
update OPT_PRETRAINED_MODEL_ARCHIVE_LIST
ArthurZucker May 11, 2022
e9fb2d8
up
patrickvonplaten May 11, 2022
3cdac3b
more fixes
patrickvonplaten May 11, 2022
e9dae8e
model & config work
patrickvonplaten May 11, 2022
d274b01
Update src/transformers/models/opt/modeling_opt.py
ArthurZucker May 11, 2022
2d64db2
Update src/transformers/models/opt/modeling_opt.py
ArthurZucker May 11, 2022
51d0817
Update src/transformers/models/opt/modeling_opt.py
ArthurZucker May 11, 2022
ba9cc6c
added comment on padding hack (+2)
ArthurZucker May 11, 2022
81a6bd2
cleanup
ArthurZucker May 11, 2022
b62cb76
review update
ArthurZucker May 11, 2022
451793e
docstring for missing arg
ArthurZucker May 11, 2022
a7b5200
update pretrained map
ArthurZucker May 11, 2022
e81d5e9
Update docs/source/en/model_doc/opt.mdx
ArthurZucker May 11, 2022
9bc53b1
Update docs/source/en/model_doc/opt.mdx
ArthurZucker May 11, 2022
fcc77f4
Update docs/source/en/model_doc/opt.mdx
ArthurZucker May 11, 2022
3486a22
Update src/transformers/models/opt/__init__.py
ArthurZucker May 11, 2022
787825d
update path and tests
ArthurZucker May 11, 2022
28c68ea
make style
ArthurZucker May 11, 2022
648b3b8
styling
ArthurZucker May 11, 2022
2bc543d
make consistency
ArthurZucker May 11, 2022
bc136fb
Update based on reviews
ArthurZucker May 11, 2022
68e1f79
add gpt2 tok new
patrickvonplaten May 11, 2022
2089155
more tok fixes
patrickvonplaten May 11, 2022
3d4ece4
Update docs/source/en/model_doc/opt.mdx
ArthurZucker May 11, 2022
1dd7006
Update docs/source/en/model_doc/opt.mdx
ArthurZucker May 11, 2022
3fecdd5
Update docs/source/en/model_doc/opt.mdx
ArthurZucker May 11, 2022
1ab56a3
Update src/transformers/models/opt/modeling_opt.py
ArthurZucker May 11, 2022
b7fffb7
Update tests/models/opt/test_modeling_opt.py
ArthurZucker May 11, 2022
2ef13f2
Update src/transformers/models/opt/modeling_opt.py
ArthurZucker May 11, 2022
a3549a0
Update src/transformers/models/opt/modeling_opt.py
ArthurZucker May 11, 2022
98a55a1
Update src/transformers/models/opt/modeling_opt.py
ArthurZucker May 11, 2022
136ac08
Update src/transformers/models/opt/modeling_opt.py
ArthurZucker May 11, 2022
d37e397
Update src/transformers/models/opt/modeling_opt.py
ArthurZucker May 11, 2022
63657bd
Apply suggestions from code review
patrickvonplaten May 12, 2022
74efd29
make style
patrickvonplaten May 12, 2022
9d852a8
make tokenizer auto tests pass
patrickvonplaten May 12, 2022
de1a94d
apply Lysandre suggestion
patrickvonplaten May 12, 2022
72018c2
[trainer] sharded _load_best_model (#17150)
stas00 May 10, 2022
b81e148
[Deepspeed] add many more models to the model zoo test (#12695)
stas00 May 10, 2022
b08145b
Fixing the output of code examples in the preprocessing chapter (#17162)
HallerPatrick May 10, 2022
b310357
missing file (#17164)
stas00 May 10, 2022
1f4ed0f
Add MLFLOW_FLATTEN_PARAMS support in MLflowCallback (#17148)
orieg May 10, 2022
5544db3
Fix template init (#17163)
sgugger May 10, 2022
4bf7b9d
Add DebertaV2ForMultipleChoice (#17135)
zphang May 10, 2022
2b75f63
MobileBERT tokenizer tests (#16896)
leondz May 10, 2022
6e1942c
[M2M100 doc] remove duplicate example (#17175)
patil-suraj May 11, 2022
f537df3
Extend Transformers Trainer Class to Enable PyTorch SGD/Adagrad Optim…
jianan-gu May 11, 2022
5ca32d7
propagate "attention_mask" dtype for "use_past" in OnnxConfig.generat…
arampacha May 11, 2022
6d54aef
Fix repo consistency
sgugger May 11, 2022
cd5d51c
Convert image to rgb for clip model (#17101)
hengkuanwee May 11, 2022
7de0278
Add missing RetriBERT tokenizer tests (#17017)
mpoemsl May 11, 2022
aef4e0c
[WIP] Enable reproducibility for distributed trainings (#16907)
hasansalimkanmaz May 11, 2022
1b98a3e
Remove unnecessary columns for all dataset types in `Trainer` (#17166)
Yard1 May 11, 2022
5d2880a
Fix LED documentation (#17181)
manuelciosici May 11, 2022
320cc6d
Ensure tensors are at least 1d for pad and concat (#17179)
Yard1 May 11, 2022
69fde79
add shift_tokens_right in FlaxMT5 (#17188)
patil-suraj May 11, 2022
b591cfb
Remove columns before passing to data collator (#17187)
Yard1 May 11, 2022
682995b
[feat] Add FLAVA model (#16654)
apsdehal May 11, 2022
6c22b82
Remove duplicated os.path.join (#17192)
shijie-wu May 12, 2022
bf5a316
Spanish translation of philosophy.mdx #15947 (#16922)
jkmg May 12, 2022
54fb499
Added es version of language_modeling.mdx doc (#17021)
jQuinRivero May 12, 2022
afddbb2
Documentation: Spanish translation of fast_tokenizers.mdx (#16882)
jloayza10 May 12, 2022
01dc139
Translate index.mdx (to ES) and add Spanish models to quicktour.mdx e…
omarespejel May 12, 2022
47e9918
Fix style error in Spanish docs (#17197)
osanseviero May 12, 2022
5518389
Fix contents in index.mdx to match docs' sidebar (#17198)
omarespejel May 12, 2022
c48413d
finish tests
patrickvonplaten May 12, 2022
1341b5f
add some good tokenizer tests
patrickvonplaten May 12, 2022
4384047
update flax code
ArthurZucker May 13, 2022
02b2500
update and clean
ArthurZucker May 13, 2022
e2cbaf0
Merge branch 'main' of https://github.com/huggingface/transformers in…
ArthurZucker May 13, 2022
23 changes: 23 additions & 0 deletions docs/source/en/model_doc/opt.mdx
@@ -45,3 +45,26 @@ The original code can be found [here](https://github.com/facebookresearch/metase
[[autodoc]] OPTForCausalLM
- forward

## TFOPTModel

[[autodoc]] TFOPTModel
- call

## TFOPTPretrainedModel

[[autodoc]] TFOPTPretrainedModel
- call


## FlaxOPTModel

[[autodoc]] FlaxOPTModel
- __call__
- encode
- decode


## FlaxOPTForCausalLM

[[autodoc]] FlaxOPTForCausalLM
- __call__
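
For orientation, here is a minimal usage sketch for the Flax class documented above. It is illustrative only: the checkpoint name `facebook/opt-350m` and the use of `GPT2Tokenizer` are assumptions drawn from this PR's commit messages, not from the diff itself.

```python
# Hedged sketch — checkpoint name and tokenizer choice are assumptions
# based on this PR's commits, not guaranteed by the diff above.
from transformers import GPT2Tokenizer, FlaxOPTForCausalLM

tokenizer = GPT2Tokenizer.from_pretrained("facebook/opt-350m")
model = FlaxOPTForCausalLM.from_pretrained("facebook/opt-350m")

inputs = tokenizer("Hello, my dog is cute", return_tensors="np")
outputs = model(**inputs)
print(outputs.logits.shape)  # (batch_size, sequence_length, vocab_size)
```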
11 changes: 11 additions & 0 deletions src/transformers/__init__.py
@@ -2147,6 +2147,7 @@
"TFOpenAIGPTPreTrainedModel",
]
)
_import_structure["models.opt"].extend(["TFOPTModel", "TFOPTPretrainedModel"])
_import_structure["models.pegasus"].extend(
["TFPegasusForConditionalGeneration", "TFPegasusModel", "TFPegasusPreTrainedModel"]
)
@@ -2485,6 +2486,14 @@
]
)
_import_structure["models.mt5"].extend(["FlaxMT5ForConditionalGeneration", "FlaxMT5Model"])
_import_structure["models.opt"].extend(
[
"FlaxOPTDecoderPreTrainedModel",
"FlaxOPTForCausalLM",
"FlaxOPTModel",
"FlaxOPTPreTrainedModel",
]
)
_import_structure["models.pegasus"].extend(
[
"FlaxPegasusForConditionalGeneration",
@@ -4319,6 +4328,7 @@
TFOpenAIGPTModel,
TFOpenAIGPTPreTrainedModel,
)
from .models.opt import TFOPTModel, TFOPTPretrainedModel
from .models.pegasus import TFPegasusForConditionalGeneration, TFPegasusModel, TFPegasusPreTrainedModel
from .models.rag import TFRagModel, TFRagPreTrainedModel, TFRagSequenceForGeneration, TFRagTokenForGeneration
from .models.rembert import (
@@ -4581,6 +4591,7 @@
FlaxMBartPreTrainedModel,
)
from .models.mt5 import FlaxMT5ForConditionalGeneration, FlaxMT5Model
from .models.opt import FlaxOPTDecoderPreTrainedModel, FlaxOPTForCausalLM, FlaxOPTModel, FlaxOPTPreTrainedModel
from .models.pegasus import FlaxPegasusForConditionalGeneration, FlaxPegasusModel, FlaxPegasusPreTrainedModel
from .models.roberta import (
FlaxRobertaForCausalLM,
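
With these `_import_structure` entries and their `TYPE_CHECKING` counterparts in place, the new classes become importable from the top-level package. A quick sanity check (assuming TensorFlow and Flax are installed):

```python
# These are exactly the names registered above; nothing else is assumed.
from transformers import TFOPTModel, FlaxOPTForCausalLM, FlaxOPTModel
```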
32 changes: 32 additions & 0 deletions src/transformers/modeling_flax_outputs.py
@@ -302,6 +302,38 @@ class FlaxCausalLMOutputWithCrossAttentions(ModelOutput):
attentions: Optional[Tuple[jnp.ndarray]] = None
cross_attentions: Optional[Tuple[jnp.ndarray]] = None

@flax.struct.dataclass
class FlaxCausalLMOutputWithPast(ModelOutput):
"""
Base class for causal language model (or autoregressive) outputs.

Args:
logits (`jnp.ndarray` of shape `(batch_size, sequence_length, config.vocab_size)`):
Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
hidden_states (`tuple(jnp.ndarray)`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`):
Tuple of `jnp.ndarray` (one for the output of the embeddings + one for the output of each layer) of shape
`(batch_size, sequence_length, hidden_size)`.

Hidden-states of the model at the output of each layer plus the initial embedding outputs.
attentions (`tuple(jnp.ndarray)`, *optional*, returned when `output_attentions=True` is passed or when `config.output_attentions=True`):
Tuple of `jnp.ndarray` (one for each layer) of shape `(batch_size, num_heads, sequence_length,
sequence_length)`.

Attention weights after the attention softmax, used to compute the weighted average in the self-attention
heads.
past_key_values (`tuple(tuple(jnp.ndarray))`, *optional*, returned when `use_cache=True` is passed or when `config.use_cache=True`):
Tuple of `jnp.ndarray` tuples of length `config.n_layers`, with each tuple containing the cached key, value
states of the self-attention and the cross-attention layers if the model is used in an encoder-decoder setting.
Only relevant if `config.is_decoder = True`.

Contains pre-computed hidden-states (key and values in the attention blocks) that can be used (see
`past_key_values` input) to speed up sequential decoding.
"""

logits: jnp.ndarray = None
past_key_values: Optional[Tuple[Tuple[jnp.ndarray]]] = None
hidden_states: Optional[Tuple[jnp.ndarray]] = None
attentions: Optional[Tuple[jnp.ndarray]] = None

@flax.struct.dataclass
class FlaxMaskedLMOutput(ModelOutput):
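
A minimal sketch of the new output class in isolation (constructed directly here for illustration; in practice a model's forward pass returns it). The shapes are arbitrary examples:

```python
import jax.numpy as jnp

from transformers.modeling_flax_outputs import FlaxCausalLMOutputWithPast

# Direct construction for illustration only — real models fill these fields.
out = FlaxCausalLMOutputWithPast(
    logits=jnp.zeros((1, 5, 50272)),  # (batch_size, sequence_length, vocab_size)
    past_key_values=None,  # populated when use_cache=True
)
print(out.logits.shape)  # (1, 5, 50272)
```

Because the class is a `flax.struct.dataclass`, instances are immutable pytrees, so they pass through `jax.jit`-compiled functions unchanged.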
2 changes: 2 additions & 0 deletions src/transformers/models/auto/modeling_flax_auto.py
@@ -40,6 +40,7 @@
("beit", "FlaxBeitModel"),
("big_bird", "FlaxBigBirdModel"),
("bart", "FlaxBartModel"),
("opt", "FlaxOPTModel"),
("gpt2", "FlaxGPT2Model"),
("gpt_neo", "FlaxGPTNeoModel"),
("gptj", "FlaxGPTJModel"),
@@ -127,6 +128,7 @@
("gptj", "FlaxGPTJForCausalLM"),
("xglm", "FlaxXGLMForCausalLM"),
("bart", "FlaxBartForCausalLM"),
("opt", "FlaxOPTForCausalLM"),
("bert", "FlaxBertForCausalLM"),
("roberta", "FlaxRobertaForCausalLM"),
("big_bird", "FlaxBigBirdForCausalLM"),
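
These mapping entries are what let the auto classes dispatch an OPT config to the new Flax implementation. A hedged sketch (the checkpoint name `facebook/opt-125m` is assumed from the commit messages):

```python
from transformers import FlaxAutoModelForCausalLM

# The ("opt", "FlaxOPTForCausalLM") entry above routes OPT checkpoints here.
model = FlaxAutoModelForCausalLM.from_pretrained("facebook/opt-125m")
print(type(model).__name__)  # FlaxOPTForCausalLM
```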
1 change: 1 addition & 0 deletions src/transformers/models/auto/modeling_tf_auto.py
@@ -45,6 +45,7 @@
("distilbert", "TFDistilBertModel"),
("albert", "TFAlbertModel"),
("bart", "TFBartModel"),
("opt", "TFOPTModel"),
("camembert", "TFCamembertModel"),
("xlm-roberta", "TFXLMRobertaModel"),
("longformer", "TFLongformerModel"),
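
The TF mapping works the same way; again a sketch with an assumed checkpoint name:

```python
from transformers import TFAutoModel

# The ("opt", "TFOPTModel") entry above routes OPT checkpoints here.
model = TFAutoModel.from_pretrained("facebook/opt-350m")
print(type(model).__name__)  # TFOPTModel
```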
23 changes: 22 additions & 1 deletion src/transformers/models/opt/__init__.py
@@ -17,7 +17,7 @@
# limitations under the License.
from typing import TYPE_CHECKING

from ...utils import _LazyModule, is_tokenizers_available, is_torch_available
from ...utils import _LazyModule, is_tokenizers_available, is_torch_available, is_tf_available, is_flax_available


_import_structure = {
@@ -33,13 +33,34 @@
"OPTPreTrainedModel",
]

if is_tf_available():
_import_structure["modeling_tf_opt"] = ["TFOPTModel", "TFOPTPretrainedModel"]

if is_flax_available():
_import_structure["modeling_flax_opt"] = [
"FlaxOPTDecoderPreTrainedModel",
"FlaxOPTForCausalLM",
"FlaxOPTModel",
"FlaxOPTPreTrainedModel",
]

if TYPE_CHECKING:
from .configuration_opt import OPT_PRETRAINED_CONFIG_ARCHIVE_MAP, OPTConfig

if is_torch_available():
from .modeling_opt import OPT_PRETRAINED_MODEL_ARCHIVE_LIST, OPTForCausalLM, OPTModel, OPTPreTrainedModel

if is_tf_available():
from .modeling_tf_opt import TFOPTModel, TFOPTPretrainedModel

if is_flax_available():
from .modeling_flax_opt import (
FlaxOPTDecoderPreTrainedModel,
FlaxOPTForCausalLM,
FlaxOPTModel,
FlaxOPTPreTrainedModel,
)

else:
import sys

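
The `_import_structure` / `TYPE_CHECKING` split above is the library's lazy-import pattern: submodules (and their heavy TF/Flax dependencies) are imported only when one of their names is first accessed. A simplified, runnable reduction of the idea — not the actual `_LazyModule` implementation:

```python
import importlib


class LazyModule:
    """Illustrative reduction of transformers.utils._LazyModule."""

    def __init__(self, import_structure):
        # Invert {submodule: [names]} into {name: submodule}.
        self._class_to_module = {
            name: mod for mod, names in import_structure.items() for name in names
        }

    def __getattr__(self, attr):
        # Called only when normal attribute lookup fails, i.e. on first access.
        if attr not in self._class_to_module:
            raise AttributeError(attr)
        module = importlib.import_module(self._class_to_module[attr])
        return getattr(module, attr)


# Demo with a stdlib module standing in for e.g. "modeling_flax_opt":
lazy = LazyModule({"json": ["JSONDecoder"]})
print(lazy.JSONDecoder)  # `import json` happens only on this line
```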