Merged
114 commits
2ecd9f5
initial commit
zucchini-nlp Oct 1, 2025
5758b7c
just push for now
zucchini-nlp Oct 1, 2025
e9f940b
maybe not do it for all models, let's see how many models fail now
zucchini-nlp Oct 1, 2025
684e799
update
zucchini-nlp Oct 2, 2025
43eb47c
let's see what else fails now
zucchini-nlp Oct 3, 2025
868cac6
nit
zucchini-nlp Oct 8, 2025
b7db732
merge main
zucchini-nlp Oct 9, 2025
17739ff
style
zucchini-nlp Oct 9, 2025
19a2ba1
delete rope validation
zucchini-nlp Oct 9, 2025
6095a39
bart
zucchini-nlp Oct 9, 2025
26892c1
rebase
zucchini-nlp Feb 3, 2026
bfe2998
make style
zucchini-nlp Feb 3, 2026
d039be1
private rope validation for now, hub complains
zucchini-nlp Feb 3, 2026
a82d894
more updates
zucchini-nlp Feb 3, 2026
b7b0492
i love backwards compatibility! Let's check if this will work with re…
zucchini-nlp Feb 3, 2026
b9aec45
pin hf hub 1.4.0
zucchini-nlp Feb 4, 2026
7edc1a2
merge main
zucchini-nlp Feb 6, 2026
40d2128
want to check tests
zucchini-nlp Feb 6, 2026
e241202
why do we even keep `use_return_dict` from 6 years ago?
zucchini-nlp Feb 9, 2026
7b24e38
special eos token can be a list in many cases, fix type hints
zucchini-nlp Feb 9, 2026
b79200f
batch
zucchini-nlp Feb 9, 2026
b4e93e3
batch
zucchini-nlp Feb 10, 2026
f011bd4
batch
zucchini-nlp Feb 11, 2026
d3a35b6
another small batch
zucchini-nlp Feb 11, 2026
c6b41da
more
zucchini-nlp Feb 12, 2026
4106bb7
more models
zucchini-nlp Feb 13, 2026
d9a9a77
batch
zucchini-nlp Feb 13, 2026
5dcfc55
batch
zucchini-nlp Feb 13, 2026
2a3459a
annoying typings
zucchini-nlp Feb 13, 2026
902ef4e
batch
zucchini-nlp Feb 15, 2026
c75c5f6
batch
zucchini-nlp Feb 16, 2026
40125dc
batch
zucchini-nlp Feb 16, 2026
818bb18
last batch
zucchini-nlp Feb 17, 2026
192e06c
fix repo
zucchini-nlp Feb 17, 2026
ed2fff5
rebase
zucchini-nlp Feb 17, 2026
54b8bca
fix some
zucchini-nlp Feb 17, 2026
ef11216
many many fixes
zucchini-nlp Feb 18, 2026
9e26fb2
fix more
zucchini-nlp Feb 18, 2026
c2d9414
commit a small batch of fixes
zucchini-nlp Feb 18, 2026
2d2c8ec
more fixes
zucchini-nlp Feb 18, 2026
41d3f46
merge main
zucchini-nlp Feb 18, 2026
417a4b5
fix repo and the new model
zucchini-nlp Feb 18, 2026
3f83236
clean up config files from unused imports
zucchini-nlp Feb 18, 2026
5e5ce6a
revert this one
zucchini-nlp Feb 18, 2026
4c88d54
more new models in main branch
zucchini-nlp Feb 18, 2026
b8f055a
let dropouts be float AND int, who knows what we have in the hub!
zucchini-nlp Feb 18, 2026
ce29eee
fix a few more non-modeling tests
zucchini-nlp Feb 18, 2026
523ad39
rope validation is now part of hub strict
zucchini-nlp Feb 18, 2026
6a850bf
oops
zucchini-nlp Feb 18, 2026
9dcfb22
rope and text config
zucchini-nlp Feb 18, 2026
0d8deb9
Merge remote-tracking branch 'upstream/main' into config-validation
zucchini-nlp Feb 18, 2026
8fac6b3
when does this end?
zucchini-nlp Feb 18, 2026
61db1ab
comment out for now
zucchini-nlp Feb 18, 2026
684cbbc
okay, now done I think
zucchini-nlp Feb 19, 2026
a0a2deb
dropout can be int in saved ckpt, fix again
zucchini-nlp Feb 19, 2026
e76a9e1
fix repo again
zucchini-nlp Feb 19, 2026
bb806c5
processor tests
zucchini-nlp Feb 19, 2026
7ab67fd
nit
zucchini-nlp Feb 19, 2026
982fe62
remove `| None` in typing when not needed!
zucchini-nlp Feb 19, 2026
661a2b2
merge main
zucchini-nlp Mar 2, 2026
eb64d48
fix style
zucchini-nlp Mar 2, 2026
24b4d51
new models
zucchini-nlp Mar 2, 2026
967ff62
subconfig is a cls attr
zucchini-nlp Mar 3, 2026
af6ae64
merge main
zucchini-nlp Mar 3, 2026
9b00a04
fix some tests
zucchini-nlp Mar 3, 2026
bdb0e5f
cosmetic stuff
zucchini-nlp Mar 3, 2026
8ff0243
.
zucchini-nlp Mar 3, 2026
96cf2ff
fix repo
zucchini-nlp Mar 3, 2026
d11d795
the test
zucchini-nlp Mar 3, 2026
3aba70c
please be fixed!
zucchini-nlp Mar 4, 2026
3387dbf
this time is the real final fix. before merging docs
zucchini-nlp Mar 4, 2026
b44b0f4
merge main
zucchini-nlp Mar 6, 2026
b155dea
fix style
zucchini-nlp Mar 6, 2026
731f6c8
fix repo
zucchini-nlp Mar 6, 2026
42c045a
why auto-doc can't resolve inheritance and just copy???
zucchini-nlp Mar 6, 2026
4f0d5cd
fix some tests
zucchini-nlp Mar 6, 2026
15b4ac3
fix the auto-docstring
zucchini-nlp Mar 6, 2026
bee92b3
Merge remote-tracking branch 'upstream/main' into config-validation
zucchini-nlp Mar 6, 2026
f8711ea
oh pls!
zucchini-nlp Mar 6, 2026
3db960d
Merge branch 'main' into config-validation
zucchini-nlp Mar 11, 2026
4a5baad
last fix
zucchini-nlp Mar 6, 2026
124ddc4
repr is false by default
zucchini-nlp Mar 6, 2026
42c1da0
check docstring attr
zucchini-nlp Mar 11, 2026
10ac0b3
fix slow CI
zucchini-nlp Mar 12, 2026
6454e17
fix repo
zucchini-nlp Mar 12, 2026
6922028
merge main
zucchini-nlp Mar 12, 2026
6404f5e
fix style and copies after rebase
zucchini-nlp Mar 12, 2026
f4c6d43
pin 1.5.0
zucchini-nlp Mar 12, 2026
6888cb3
init subclass doesn't help with dataclass decorator, revert
zucchini-nlp Mar 12, 2026
a7bc9cd
style
zucchini-nlp Mar 12, 2026
07095f3
regex replace doesn't always just work, fix!
zucchini-nlp Mar 12, 2026
5106e5b
Fix incorrect default values in config dataclass migration (PR #41250)
ArthurZucker Mar 12, 2026
82fedab
Fix 3 more config default regressions (round 2)
ArthurZucker Mar 12, 2026
86577fb
Fix check-repo: regenerate higgs_audio_v2_tokenizer from modular
ArthurZucker Mar 13, 2026
0ea4586
higgs nit?
ArthurZucker Mar 13, 2026
eaa6f2e
Merge branch 'config-validation' of github.com:zucchini-nlp/transform…
ArthurZucker Mar 13, 2026
7690991
Merge remote-tracking branch 'origin/main' into config-validation
ArthurZucker Mar 13, 2026
78f77a6
fix higgs
zucchini-nlp Mar 16, 2026
145a659
style
zucchini-nlp Mar 16, 2026
ef12271
rebase
zucchini-nlp Mar 16, 2026
dca4158
new models
zucchini-nlp Mar 16, 2026
1c7961c
xcodec is same as higgs, fix
zucchini-nlp Mar 16, 2026
78e7672
forgot
zucchini-nlp Mar 16, 2026
9a2b361
love it when modular complains about newline
zucchini-nlp Mar 16, 2026
801bf4a
fix new models' typing hints
zucchini-nlp Mar 16, 2026
6f96d54
oops, that is a property
zucchini-nlp Mar 16, 2026
cf39486
Merge remote-tracking branch 'upstream/main' into config-validation
zucchini-nlp Mar 16, 2026
b101899
and one more new model just merged
zucchini-nlp Mar 16, 2026
b027d8f
actually, non-dataclass child is really not the way to go
zucchini-nlp Mar 16, 2026
81cd131
don't replace all matches!
zucchini-nlp Mar 16, 2026
125624a
Apply repo consistency fixes
github-actions[bot] Mar 16, 2026
43187f4
Revert "Apply repo consistency fixes"
zucchini-nlp Mar 16, 2026
d31003d
fix repo, would be great to fix this in `style`
zucchini-nlp Mar 16, 2026
d561953
why I can't fix all failures from repo at once
zucchini-nlp Mar 16, 2026
2 changes: 1 addition & 1 deletion examples/modular-transformers/modeling_new_task_model.py
@@ -336,7 +336,7 @@ def forward(
output_hidden_states = (
output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
)
return_dict = return_dict if return_dict is not None else self.config.use_return_dict
return_dict = return_dict if return_dict is not None else self.config.return_dict

# Replace image id with PAD if the image token if OOV, to avoid index-errors
if input_ids is not None and self.config.image_token_id >= self.vocab_size:
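A minimal, self-contained sketch (not part of the diff) of the fallback pattern in the hunk above: an explicit argument wins, otherwise the config default applies. `DummyConfig` and `resolve_return_dict` are hypothetical stand-ins, not transformers APIs; the only detail taken from the diff is that the config attribute is now named `return_dict` rather than `use_return_dict`.

from dataclasses import dataclass

@dataclass
class DummyConfig:
    return_dict: bool = True  # stand-in for the renamed config attribute

def resolve_return_dict(config: DummyConfig, return_dict: bool | None = None) -> bool:
    # Same resolution order as the modeling code above: caller override first, config default second.
    return return_dict if return_dict is not None else config.return_dict

assert resolve_return_dict(DummyConfig()) is True
assert resolve_return_dict(DummyConfig(), return_dict=False) is False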
2 changes: 1 addition & 1 deletion setup.py
@@ -86,7 +86,7 @@
"fugashi>=1.0",
"GitPython<3.1.19",
"hf-doc-builder>=0.3.0",
"huggingface-hub>=1.3.0,<2.0",
"huggingface-hub>=1.5.0,<2.0",
"ipadic>=1.0.0,<2.0",
"jinja2>=3.1.0",
"jmespath>=1.0.1",
402 changes: 208 additions & 194 deletions src/transformers/configuration_utils.py

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion src/transformers/dependency_versions_table.py
@@ -18,7 +18,7 @@
"fugashi": "fugashi>=1.0",
"GitPython": "GitPython<3.1.19",
"hf-doc-builder": "hf-doc-builder>=0.3.0",
"huggingface-hub": "huggingface-hub>=1.3.0,<2.0",
"huggingface-hub": "huggingface-hub>=1.5.0,<2.0",
"ipadic": "ipadic>=1.0.0,<2.0",
"jinja2": "jinja2>=3.1.0",
"jmespath": "jmespath>=1.0.1",
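The raised floor lines up with the `from huggingface_hub.dataclasses import strict` import that the migrated configs use later in this diff. As a hedged sketch (not part of the diff), a defensive check for the pinned minimum could look like the following; it assumes only that `huggingface_hub` exposes `__version__` and that `packaging` is installed.

from packaging import version
import huggingface_hub

# Fail early with a clear message if the installed hub client predates the pin above.
if version.parse(huggingface_hub.__version__) < version.parse("1.5.0"):
    raise ImportError("This branch expects huggingface-hub>=1.5.0,<2.0 (see setup.py).")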
17 changes: 7 additions & 10 deletions src/transformers/modeling_rope_utils.py
@@ -628,8 +628,9 @@ class RotaryEmbeddingConfigMixin:
"""

default_theta = 10_000.0
ignore_keys_at_rope_validation = set()

def convert_rope_params_to_dict(self, ignore_keys_at_rope_validation: set | None = None, **kwargs):
def convert_rope_params_to_dict(self, **kwargs):
rope_scaling = kwargs.pop("rope_scaling", None)
self.rope_parameters = rope_scaling or self.rope_parameters
self.rope_parameters = self.rope_parameters if self.rope_parameters is not None else {}
@@ -645,13 +646,9 @@ def convert_rope_params_to_dict(self, ignore_keys_at_rope_validation: set | None
partial_rotary_factor = kwargs.get("partial_rotary_factor", getattr(self, "partial_rotary_factor", None))
if partial_rotary_factor is not None:
self.rope_parameters.setdefault("partial_rotary_factor", partial_rotary_factor)
ignore_keys_at_rope_validation = (
set() if ignore_keys_at_rope_validation is None else set(ignore_keys_at_rope_validation)
)
ignore_keys_at_rope_validation = ignore_keys_at_rope_validation | {"partial_rotary_factor"}
self.ignore_keys_at_rope_validation = self.ignore_keys_at_rope_validation | {"partial_rotary_factor"}

self.standardize_rope_params()
self.validate_rope(ignore_keys=ignore_keys_at_rope_validation)
return kwargs

def standardize_rope_params(self):
Expand Down Expand Up @@ -702,11 +699,11 @@ def standardize_rope_params(self):

self.rope_parameters = rope_parameters

def validate_rope(self: "PreTrainedConfig", ignore_keys: set | None = None):
def validate_rope(self: "PreTrainedConfig"):
"""
Validate the RoPE config arguments, given a `"PreTrainedConfig"` object
"""
rope_parameters_dict = self.rope_parameters
rope_parameters_dict = getattr(self, "rope_parameters", None)
if rope_parameters_dict is None:
return

@@ -723,7 +720,7 @@ def validate_rope(self: "PreTrainedConfig", ignore_keys: set | None = None):
rope_parameters["rope_type"] = rope_type

if validation_fn is not None:
validation_fn(rope_parameters, ignore_keys=ignore_keys)
validation_fn(rope_parameters, ignore_keys=self.ignore_keys_at_rope_validation)
else:
logger.warning(
f"Missing validation function in 'RotaryEmbeddingConfigMixin' for 'rope_type'='{rope_type}'"
@@ -942,4 +939,4 @@ def rope_config_validation(config: RotaryEmbeddingConfigMixin, ignore_keys: set
FutureWarning,
)
config.standardize_rope_params()
config.validate_rope(ignore_keys=ignore_keys)
config.validate_rope()
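A short sketch (not part of the diff) of the pattern the hunks above move to: instead of threading an `ignore_keys` argument through `validate_rope()`, a config class declares the keys to skip via the class attribute `ignore_keys_at_rope_validation`. The classes below are toy stand-ins, not the real mixin; the attribute name and the `partial_rotary_factor` example come from the diff, everything else is assumed.

class RopeMixinSketch:
    # Class-level default mirrors the new `ignore_keys_at_rope_validation = set()` above.
    ignore_keys_at_rope_validation: set = set()
    rope_parameters: dict = {}

    def validate_rope(self):
        for key, value in self.rope_parameters.items():
            if key in self.ignore_keys_at_rope_validation:
                continue  # e.g. "partial_rotary_factor" is skipped rather than rejected
            print(f"validating {key}={value}")  # real code dispatches to per-rope-type validators

class MyModelConfigSketch(RopeMixinSketch):
    # A subclass opts keys out of validation declaratively instead of passing `ignore_keys`.
    ignore_keys_at_rope_validation = {"partial_rotary_factor"}

cfg = MyModelConfigSketch()
cfg.rope_parameters = {"rope_type": "default", "partial_rotary_factor": 0.5}
cfg.validate_rope()  # only "rope_type" reaches the validation branch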
118 changes: 40 additions & 78 deletions src/transformers/models/afmoe/configuration_afmoe.py
@@ -13,14 +13,14 @@
# limitations under the License.
"""AFMoE model configuration"""

from ...configuration_utils import PreTrainedConfig, layer_type_validation
from ...modeling_rope_utils import RopeParameters
from ...utils import auto_docstring, logging

from huggingface_hub.dataclasses import strict

logger = logging.get_logger(__name__)
from ...configuration_utils import PreTrainedConfig
from ...modeling_rope_utils import RopeParameters
from ...utils import auto_docstring


@strict(accept_kwargs=True)
@auto_docstring(
custom_intro="""
AFMoE is an Adaptive Feedforward MoE (Mixture of Experts) model with token-choice routing, shared experts, and a
@@ -64,85 +64,47 @@ class AfmoeConfig(PreTrainedConfig):
"norm": (["hidden_states"], ["hidden_states"]),
}

def __init__(
self,
vocab_size: int | None = 200192,
hidden_size: int | None = 2048,
intermediate_size: int | None = 6144,
moe_intermediate_size: int | None = 1408,
num_hidden_layers: int | None = 32,
num_dense_layers: int | None = 1,
num_attention_heads: int | None = 16,
num_key_value_heads: int | None = None,
head_dim: int | None = 128,
hidden_act: str | None = "silu",
max_position_embeddings: int | None = 16384,
initializer_range: float | None = 0.02,
rms_norm_eps: float | None = 1e-5,
use_cache: bool | None = True,
tie_word_embeddings: bool | None = False,
rope_theta: float | None = 10000.0,
rope_parameters: RopeParameters | dict[str, RopeParameters] | None = None,
num_experts: int | None = 64,
num_experts_per_tok: int | None = 6,
num_shared_experts: int | None = 2,
route_scale: float | None = 1.0,
global_attn_every_n_layers: int | None = 4,
sliding_window: int | None = 1024,
layer_types: list | None = None,
attention_dropout: float | None = 0.0,
mup_enabled: bool | None = False,
eos_token_id: bool | None = None,
pad_token_id: bool | None = None,
bos_token_id: bool | None = None,
**kwargs,
):
self.vocab_size = vocab_size
self.max_position_embeddings = max_position_embeddings
self.hidden_size = hidden_size
self.intermediate_size = intermediate_size
self.num_hidden_layers = num_hidden_layers
self.num_dense_layers = num_dense_layers
self.num_attention_heads = num_attention_heads
self.head_dim = head_dim
self.hidden_act = hidden_act
self.initializer_range = initializer_range
self.rms_norm_eps = rms_norm_eps
self.use_cache = use_cache
self.rope_theta = rope_theta
self.rope_parameters = rope_parameters

# MoE specific
self.moe_intermediate_size = moe_intermediate_size
self.num_experts_per_tok = num_experts_per_tok
self.num_experts = num_experts
self.num_shared_experts = num_shared_experts
self.route_scale = route_scale
self.attention_bias = False

# Attention specific
self.attention_dropout = attention_dropout
self.global_attn_every_n_layers = global_attn_every_n_layers
self.sliding_window = sliding_window
self.mup_enabled = mup_enabled
self.layer_types = layer_types
vocab_size: int = 200192
hidden_size: int = 2048
intermediate_size: int = 6144
moe_intermediate_size: int = 1408
num_hidden_layers: int = 32
num_dense_layers: int | None = 1
num_attention_heads: int = 16
num_key_value_heads: int | None = None
head_dim: int | None = 128
hidden_act: str = "silu"
max_position_embeddings: int = 16384
initializer_range: float = 0.02
rms_norm_eps: float = 1e-5
use_cache: bool = True
tie_word_embeddings: bool = False
rope_parameters: RopeParameters | dict | None = None
num_experts: int | None = 64
num_experts_per_tok: int | None = 6
num_shared_experts: int | None = 2
route_scale: float | None = 1.0
global_attn_every_n_layers: int | None = 4
sliding_window: int | None = 1024
layer_types: list | None = None
attention_dropout: float | int | None = 0.0
mup_enabled: bool | None = False
eos_token_id: int | list[int] | None = None
pad_token_id: int | None = None
bos_token_id: int | None = None
attention_bias: bool = False

def __post_init__(self, **kwargs):
if self.layer_types is None:
self.layer_types = [
"sliding_attention" if bool((i + 1) % global_attn_every_n_layers) else "full_attention"
"sliding_attention" if bool((i + 1) % self.global_attn_every_n_layers) else "full_attention"
for i in range(self.num_hidden_layers)
]
layer_type_validation(self.layer_types)

if num_key_value_heads is None:
num_key_value_heads = num_attention_heads

self.num_key_value_heads = num_key_value_heads
self.eos_token_id = eos_token_id
self.pad_token_id = pad_token_id
self.bos_token_id = bos_token_id
self.tie_word_embeddings = tie_word_embeddings
if self.num_key_value_heads is None:
self.num_key_value_heads = self.num_attention_heads

super().__init__(**kwargs)
super().__post_init__(**kwargs)


__all__ = ["AfmoeConfig"]
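A toy analogue (not the real `AfmoeConfig`) of the dataclass-style pattern this file migrates to: `__init__` parameters become class-level annotated fields with defaults, and derived values move into `__post_init__`. `ToyConfig` below is a hypothetical stand-in using plain stdlib dataclasses; the real class additionally relies on `@strict(accept_kwargs=True)` from `huggingface_hub` and on `PreTrainedConfig`, which are omitted here.

from dataclasses import dataclass

@dataclass
class ToyConfig:
    num_hidden_layers: int = 8
    num_attention_heads: int = 16
    num_key_value_heads: int | None = None
    global_attn_every_n_layers: int = 4
    layer_types: list | None = None

    def __post_init__(self):
        # Derived defaults, analogous to the migrated AfmoeConfig.__post_init__ above.
        if self.layer_types is None:
            self.layer_types = [
                "sliding_attention" if (i + 1) % self.global_attn_every_n_layers else "full_attention"
                for i in range(self.num_hidden_layers)
            ]
        if self.num_key_value_heads is None:
            self.num_key_value_heads = self.num_attention_heads

cfg = ToyConfig(num_hidden_layers=4)
assert cfg.layer_types == ["sliding_attention"] * 3 + ["full_attention"]
assert cfg.num_key_value_heads == 16

Construction stays keyword-based as before; the difference is that defaults and derived values now live on the class itself rather than in a hand-written `__init__`.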