Merged
52 commits
f48a47b
remove attributes and add all missing sub processors to their auto cl…
yonigozlan Oct 15, 2025
d5d5c58
remove all mentions of .attributes
yonigozlan Oct 15, 2025
dd505b5
cleanup
yonigozlan Oct 15, 2025
6a1448f
fix processor tests
yonigozlan Oct 15, 2025
a292900
fix modular
yonigozlan Oct 15, 2025
63a255d
remove last attributes
yonigozlan Oct 16, 2025
ef73759
fixup
yonigozlan Oct 16, 2025
b5e8b2e
Merge remote-tracking branch 'upstream/main' into remove-attributes-f…
yonigozlan Oct 16, 2025
f14ff3c
fixes after merge
yonigozlan Oct 16, 2025
0306430
fix wrong tokenizer in auto florence2
yonigozlan Oct 16, 2025
01cb815
fix missing audio_processor + nits
yonigozlan Oct 17, 2025
49ec906
Override __init__ in NewProcessor and change hf-internal-testing-repo…
yonigozlan Oct 17, 2025
7dd5682
Merge remote-tracking branch 'upstream/main' into remove-attributes-f…
yonigozlan Oct 17, 2025
946cc5c
fix auto tokenizer test
yonigozlan Oct 17, 2025
b0cb3e0
add init to markup_lm
yonigozlan Oct 17, 2025
3b9e846
update CustomProcessor in custom_processing
yonigozlan Oct 17, 2025
53de7a4
remove print
yonigozlan Oct 17, 2025
93d2c4d
Merge branch 'main' into remove-attributes-from-processors
yonigozlan Oct 17, 2025
feeec28
Merge remote-tracking branch 'upstream/main' into remove-attributes-f…
yonigozlan Oct 22, 2025
4a6b080
nit
yonigozlan Oct 22, 2025
02402a0
Merge branch 'remove-attributes-from-processors' of https://github.co…
yonigozlan Oct 22, 2025
9204b4c
refactor processor tests first part
yonigozlan Oct 21, 2025
1ed7c56
refactor part 2
yonigozlan Oct 22, 2025
757e1f1
fix test modeling owlv2
yonigozlan Oct 22, 2025
bf763b2
fix test_processing_layoutxlm
yonigozlan Oct 22, 2025
0799a0a
Fix owlv2, wav2vec2, markuplm, voxtral issues
yonigozlan Oct 22, 2025
98ead2c
part3
yonigozlan Oct 23, 2025
59234ee
refactor all processor with mixin
yonigozlan Oct 23, 2025
54bf8e0
Merge branch 'remove-attributes-from-processors' into simplify-proces…
yonigozlan Oct 23, 2025
bf1a4b6
Merge remote-tracking branch 'upstream/main' into remove-attributes-f…
yonigozlan Oct 31, 2025
e3f130d
add support for loading and saving multiple tokenizer natively
yonigozlan Oct 31, 2025
cc45a7e
remove exclude_attributes from save_pretrained
yonigozlan Oct 31, 2025
3810196
Merge branch 'remove-attributes-from-processors' into simplify-proces…
yonigozlan Oct 31, 2025
34bfc74
get processor from pretrained instead of components in tests
yonigozlan Oct 31, 2025
a0c5c1a
skip tests in colqwen2, pixtral
yonigozlan Oct 31, 2025
8979645
modifs after review
yonigozlan Nov 7, 2025
6cc30f9
Merge remote-tracking branch 'upstream/main' into remove-attributes-f…
yonigozlan Nov 7, 2025
447b598
Merge branch 'remove-attributes-from-processors' into simplify-proces…
yonigozlan Nov 7, 2025
ac72ba2
Merge remote-tracking branch 'upstream/main' into simplify-processor-…
yonigozlan Nov 7, 2025
d5bf14a
fix style and copies
yonigozlan Nov 7, 2025
773342b
Fix after review
yonigozlan Nov 11, 2025
12c854c
Merge remote-tracking branch 'upstream/main' into simplify-processor-…
yonigozlan Nov 11, 2025
12a01fd
Merge remote-tracking branch 'upstream/main' into simplify-processor-…
yonigozlan Nov 24, 2025
7d7c6b2
add test_processor_from_pretrained_vs_from_components, fix failing tests
yonigozlan Nov 24, 2025
fa94bcb
fix overflowing_tokens tests
yonigozlan Nov 24, 2025
74492e5
add config for layoutxlm
yonigozlan Nov 24, 2025
9bd9da1
fix ci
yonigozlan Nov 24, 2025
e4e36d9
use modular
yonigozlan Nov 24, 2025
1fd0cd5
fic docstring
yonigozlan Nov 24, 2025
1c21d90
Standardize mgp_str tests
yonigozlan Nov 25, 2025
d931a2b
Merge remote-tracking branch 'upstream/main' into simplify-processor-…
yonigozlan Nov 25, 2025
572b26d
fix after review
yonigozlan Nov 26, 2025
6 changes: 6 additions & 0 deletions docs/source/en/model_doc/layoutxlm.md
@@ -70,6 +70,12 @@ data for the model.
As LayoutXLM's architecture is equivalent to that of LayoutLMv2, one can refer to [LayoutLMv2's documentation page](layoutlmv2) for all tips, code examples and notebooks.
</Tip>


+## LayoutXLMConfig

+[[autodoc]] LayoutXLMConfig


## LayoutXLMTokenizer

[[autodoc]] LayoutXLMTokenizer
8 changes: 7 additions & 1 deletion src/transformers/models/auto/configuration_auto.py
@@ -222,7 +222,7 @@
("layoutlm", "LayoutLMConfig"),
("layoutlmv2", "LayoutLMv2Config"),
("layoutlmv3", "LayoutLMv3Config"),
-("layoutxlm", "LayoutLMv2Config"),
+("layoutxlm", "LayoutXLMConfig"),
("led", "LEDConfig"),
("levit", "LevitConfig"),
("lfm2", "Lfm2Config"),
@@ -915,12 +915,14 @@
[
("audioflamingo3_encoder", "audioflamingo3"),
("openai-gpt", "openai"),
+("blip-2", "blip_2"),
yonigozlan marked this conversation as resolved.
("data2vec-audio", "data2vec"),
("data2vec-text", "data2vec"),
("data2vec-vision", "data2vec"),
("donut-swin", "donut"),
("kosmos-2", "kosmos2"),
("kosmos-2.5", "kosmos2_5"),
+("omdet-turbo", "omdet_turbo"),
("maskformer-swin", "maskformer"),
("xclip", "x_clip"),
("clip_vision_model", "clip"),
@@ -936,7 +938,10 @@
("glm4v_moe_vision", "glm4v_moe"),
("glm4v_text", "glm4v"),
("glm4v_moe_text", "glm4v_moe"),
+("grounding-dino", "grounding_dino"),
+("mm-grounding-dino", "mm_grounding_dino"),
("idefics3_vision", "idefics3"),
+("mgp-str", "mgp_str"),
("siglip_vision_model", "siglip"),
("siglip2_vision_model", "siglip2"),
("aimv2_vision_model", "aimv2"),
@@ -962,6 +967,7 @@
("video_llama_3_vision", "video_llama_3"),
("parakeet_encoder", "parakeet"),
("parakeet_ctc", "parakeet"),
+("wav2vec2-bert", "wav2vec2_bert"),
]
)

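The second through fourth hunks extend a table that maps model types whose names do not match their module directory (mostly hyphenated ones) to that directory. The sketch below shows how such a table might be consulted, with a hyphen-to-underscore fallback. The dict holds a few entries copied from this diff, and `resolve_module_name` is a hypothetical helper, not the library's own function.

```python
# Hypothetical subset of the special-case table in this diff:
# model_type -> name of the module (directory) that implements it.
SPECIAL_MODEL_TYPE_TO_MODULE = {
    "blip-2": "blip_2",
    "kosmos-2": "kosmos2",
    "kosmos-2.5": "kosmos2_5",
    "omdet-turbo": "omdet_turbo",
    "grounding-dino": "grounding_dino",
    "mm-grounding-dino": "mm_grounding_dino",
    "mgp-str": "mgp_str",
    "wav2vec2-bert": "wav2vec2_bert",
    "clip_vision_model": "clip",
}


def resolve_module_name(model_type: str) -> str:
    """Map a model_type to its module name (illustrative helper)."""
    # Explicit entries win: some names cannot be derived mechanically,
    # e.g. "kosmos-2" lives in "kosmos2", not "kosmos_2".
    if model_type in SPECIAL_MODEL_TYPE_TO_MODULE:
        return SPECIAL_MODEL_TYPE_TO_MODULE[model_type]
    # Fallback: replace hyphens with underscores.
    return model_type.replace("-", "_")


print(resolve_module_name("kosmos-2"))  # kosmos2
```

In this sketch, some added entries such as `"mgp-str"` resolve the same way through the fallback, while `"kosmos-2"` genuinely needs its entry.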
2 changes: 1 addition & 1 deletion src/transformers/models/auto/image_processing_auto.py
@@ -130,7 +130,7 @@
("levit", ("LevitImageProcessor", "LevitImageProcessorFast")),
("lfm2_vl", (None, "Lfm2VlImageProcessorFast")),
("lightglue", ("LightGlueImageProcessor", "LightGlueImageProcessorFast")),
-("llama4", ("Llama4ImageProcessor", "Llama4ImageProcessorFast")),
+("llama4", (None, "Llama4ImageProcessorFast")),
("llava", ("LlavaImageProcessor", "LlavaImageProcessorFast")),
("llava_next", ("LlavaNextImageProcessor", "LlavaNextImageProcessorFast")),
("llava_next_video", ("LlavaNextImageProcessor", "LlavaNextImageProcessorFast")),
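After this change `llama4` registers only a fast image processor, with `None` in the slow slot (as `lfm2_vl` already did). The sketch below shows one way a loader could pick a class from such `(slow, fast)` pairs. `pick_image_processor` and its fallback policy are assumptions for illustration, not the library's selection logic.

```python
from typing import Optional

# (slow, fast) class names per model type, copied from the mapping above.
IMAGE_PROCESSOR_NAMES: dict[str, tuple[Optional[str], Optional[str]]] = {
    "lightglue": ("LightGlueImageProcessor", "LightGlueImageProcessorFast"),
    "lfm2_vl": (None, "Lfm2VlImageProcessorFast"),
    "llama4": (None, "Llama4ImageProcessorFast"),
}


def pick_image_processor(model_type: str, use_fast: bool = False) -> str:
    """Return a class name, preferring the requested variant (hypothetical policy)."""
    slow, fast = IMAGE_PROCESSOR_NAMES[model_type]
    preferred, other = (fast, slow) if use_fast else (slow, fast)
    # Honor the caller's preference when that variant is registered...
    if preferred is not None:
        return preferred
    # ...otherwise fall back to whichever variant exists.
    if other is not None:
        return other
    raise ValueError(f"No image processor registered for {model_type!r}")
```

With `None` in the slow slot, a request for the slow variant quietly resolves to the fast class under this policy.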
1 change: 1 addition & 0 deletions src/transformers/models/auto/processing_auto.py
@@ -95,6 +95,7 @@
("kyutai_speech_to_text", "KyutaiSpeechToTextProcessor"),
("layoutlmv2", "LayoutLMv2Processor"),
("layoutlmv3", "LayoutLMv3Processor"),
+("layoutxlm", "LayoutXLMProcessor"),
("lfm2_vl", "Lfm2VlProcessor"),
("llama4", "Llama4Processor"),
("llava", "LlavaProcessor"),
2 changes: 1 addition & 1 deletion src/transformers/models/auto/tokenization_auto.py
@@ -393,7 +393,7 @@
("llava", ("LlamaTokenizer", "LlamaTokenizerFast" if is_tokenizers_available() else None)),
("llava_next", ("LlamaTokenizer", "LlamaTokenizerFast" if is_tokenizers_available() else None)),
("llava_next_video", ("LlamaTokenizer", "LlamaTokenizerFast" if is_tokenizers_available() else None)),
-("llava_onevision", ("LlamaTokenizer", "LlamaTokenizerFast" if is_tokenizers_available() else None)),
+("llava_onevision", ("Qwen2Tokenizer", "Qwen2TokenizerFast" if is_tokenizers_available() else None)),
("longformer", ("LongformerTokenizer", "LongformerTokenizerFast" if is_tokenizers_available() else None)),
(
"longt5",
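Each tokenizer entry pairs a slow class with a fast class that is registered only when the `tokenizers` package can be imported; `llava_onevision` now points at the Qwen2 pair. A minimal sketch of that conditional pattern follows, using `importlib.util.find_spec` as a stand-in for the library's `is_tokenizers_available()` helper (an assumption, not its actual implementation).

```python
import importlib.util


def tokenizers_installed() -> bool:
    # Stand-in for is_tokenizers_available(): True if `tokenizers` is importable.
    return importlib.util.find_spec("tokenizers") is not None


# The (slow, fast) pair is built once at import time, as in the mapping above;
# the fast slot is None when the Rust-backed package is missing.
TOKENIZER_NAMES = {
    "llava_onevision": (
        "Qwen2Tokenizer",
        "Qwen2TokenizerFast" if tokenizers_installed() else None,
    ),
}
```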
2 changes: 1 addition & 1 deletion src/transformers/models/edgetam/modeling_edgetam.py
@@ -1103,7 +1103,7 @@ def forward(

>>> # Postprocess masks
>>> masks = processor.post_process_masks(
-... outputs.pred_masks, inputs["original_sizes"], inputs["reshaped_input_sizes"]
+... outputs.pred_masks, inputs["original_sizes"]
... )
```
"""
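The docstring example now passes only `outputs.pred_masks` and `inputs["original_sizes"]` to `post_process_masks`. Roughly, this kind of post-processing rescales each low-resolution mask to its image's original size and binarizes it. The pure-Python sketch below illustrates only that idea; nearest-neighbour resizing and a threshold of 0.0 on logits are assumptions, and the real processor works on tensors and may interpolate differently.

```python
def resize_nearest(mask: list[list[float]], out_h: int, out_w: int) -> list[list[float]]:
    """Nearest-neighbour resize of a 2D grid of mask logits."""
    in_h, in_w = len(mask), len(mask[0])
    return [
        [mask[(y * in_h) // out_h][(x * in_w) // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def post_process_mask(logits, original_size, threshold=0.0):
    """Resize one mask to (height, width) and binarize it (illustrative only)."""
    height, width = original_size
    resized = resize_nearest(logits, height, width)
    return [[value > threshold for value in row] for row in resized]


print(post_process_mask([[1.0, -1.0], [-1.0, 1.0]], (4, 4)))
```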
8 changes: 8 additions & 0 deletions src/transformers/models/gemma3n/processing_gemma3n.py
@@ -147,5 +147,13 @@ def __call__(
text_inputs["token_type_ids"] = token_type_ids.tolist()
return BatchFeature(data={**text_inputs, **image_inputs, **audio_inputs}, tensor_type=return_tensors)

+@property
+def model_input_names(self):
+tokenizer_input_names = self.tokenizer.model_input_names + ["token_type_ids"]
+image_processor_input_names = self.image_processor.model_input_names
+audio_processor_input_names = self.feature_extractor.model_input_names
+image_processor_input_names = [name for name in image_processor_input_names if name != "num_crops"]
+return list(tokenizer_input_names + image_processor_input_names + audio_processor_input_names)


__all__ = ["Gemma3nProcessor"]
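The new `model_input_names` property concatenates the input names of the three sub-processors, appends `token_type_ids` on the tokenizer side, and drops `num_crops` from the image side. The sketch below reproduces that logic with stand-in components so it runs without the library; the `SimpleNamespace` stubs and their name lists are made up for illustration.

```python
from types import SimpleNamespace


class ProcessorSketch:
    """Stand-in with the same model_input_names logic as the diff above."""

    def __init__(self, tokenizer, image_processor, feature_extractor):
        self.tokenizer = tokenizer
        self.image_processor = image_processor
        self.feature_extractor = feature_extractor

    @property
    def model_input_names(self):
        # `+` builds a new list, so the tokenizer's own list is not mutated.
        tokenizer_input_names = self.tokenizer.model_input_names + ["token_type_ids"]
        # Filter num_crops out of the image-side names, as the diff does.
        image_names = [
            name for name in self.image_processor.model_input_names if name != "num_crops"
        ]
        audio_names = self.feature_extractor.model_input_names
        return list(tokenizer_input_names + image_names + audio_names)


# Hypothetical component stubs; real name lists may differ.
processor = ProcessorSketch(
    tokenizer=SimpleNamespace(model_input_names=["input_ids", "attention_mask"]),
    image_processor=SimpleNamespace(model_input_names=["pixel_values", "num_crops"]),
    feature_extractor=SimpleNamespace(model_input_names=["input_features", "input_features_mask"]),
)
print(processor.model_input_names)
```

Note that if a tokenizer already listed `token_type_ids`, this logic would include it twice; the sketch keeps the diff's behavior rather than deduplicating.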
2 changes: 0 additions & 2 deletions src/transformers/models/glm46v/modeling_glm46v.py
@@ -562,8 +562,6 @@ def forward(
The temporal, height and width of feature shape of each image in LLM.
video_grid_thw (`torch.LongTensor` of shape `(num_videos, 3)`, *optional*):
The temporal, height and width of feature shape of each video in LLM.
-rope_deltas (`torch.LongTensor` of shape `(batch_size, )`, *optional*):
-The rope index difference between sequence length and multimodal rope.

Example:

2 changes: 0 additions & 2 deletions src/transformers/models/glm4v/modeling_glm4v.py
@@ -1410,8 +1410,6 @@ def forward(
The temporal, height and width of feature shape of each image in LLM.
video_grid_thw (`torch.LongTensor` of shape `(num_videos, 3)`, *optional*):
The temporal, height and width of feature shape of each video in LLM.
-rope_deltas (`torch.LongTensor` of shape `(batch_size, )`, *optional*):
-The rope index difference between sequence length and multimodal rope.

Example:

2 changes: 0 additions & 2 deletions src/transformers/models/glm4v/modular_glm4v.py
@@ -1350,8 +1350,6 @@ def forward(
The temporal, height and width of feature shape of each image in LLM.
video_grid_thw (`torch.LongTensor` of shape `(num_videos, 3)`, *optional*):
The temporal, height and width of feature shape of each video in LLM.
-rope_deltas (`torch.LongTensor` of shape `(batch_size, )`, *optional*):
-The rope index difference between sequence length and multimodal rope.
Comment on lines -1353 to -1354

Oh thanks, the repo check complains on the docs order. Just noticed that the arg isn't even in the signature 😆


Example:

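The review comment explains why the `rope_deltas` entries were dropped: the argument was documented but is not in the `forward` signature, and the repository checks complained about the docstring order. The sketch below shows the kind of consistency check involved, comparing documented argument names (and their order) with `inspect.signature`. It is a simplified stand-in, not the repository's actual checker, and its parsing assumes the `name (`type`, ...):` layout used in these docstrings.

```python
import inspect
import re

# An argument entry in a cleaned docstring: exactly four spaces, a name, then " (".
ARG_LINE = re.compile(r"^ {4}(\w+) \(")


def documented_args(func) -> list[str]:
    """Collect argument names listed under the docstring's "Args:" section."""
    names = []
    in_args = False
    for line in (inspect.getdoc(func) or "").splitlines():
        if line.strip() == "Args:":
            in_args = True
        elif in_args and line and not line.startswith(" "):
            break  # next top-level section, e.g. "Example:"
        elif in_args:
            match = ARG_LINE.match(line)
            if match:
                names.append(match.group(1))
    return names


def check_docstring(func) -> list[str]:
    """Report documented args missing from the signature, and order mismatches."""
    params = [name for name in inspect.signature(func).parameters if name != "self"]
    documented = documented_args(func)
    problems = [f"not in signature: {name}" for name in documented if name not in params]
    present = [name for name in documented if name in params]
    if present != [name for name in params if name in present]:
        problems.append("documented args are not in signature order")
    return problems


def forward(self, input_ids=None, image_grid_thw=None, video_grid_thw=None):
    """
    Args:
        image_grid_thw (`torch.LongTensor` of shape `(num_images, 3)`, *optional*):
            The temporal, height and width of feature shape of each image in LLM.
        video_grid_thw (`torch.LongTensor` of shape `(num_videos, 3)`, *optional*):
            The temporal, height and width of feature shape of each video in LLM.
        rope_deltas (`torch.LongTensor` of shape `(batch_size, )`, *optional*):
            The rope index difference between sequence length and multimodal rope.

    Example:
    """


print(check_docstring(forward))
```

On this toy `forward`, the only reported problem is the stray `rope_deltas` entry, which is what the diff removes.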
2 changes: 0 additions & 2 deletions src/transformers/models/glm4v_moe/modeling_glm4v_moe.py
@@ -1630,8 +1630,6 @@ def forward(
The temporal, height and width of feature shape of each image in LLM.
video_grid_thw (`torch.LongTensor` of shape `(num_videos, 3)`, *optional*):
The temporal, height and width of feature shape of each video in LLM.
-rope_deltas (`torch.LongTensor` of shape `(batch_size, )`, *optional*):
-The rope index difference between sequence length and multimodal rope.

Example:

12 changes: 8 additions & 4 deletions src/transformers/models/layoutlmv2/configuration_layoutlmv2.py
@@ -39,7 +39,7 @@ class LayoutLMv2Config(PreTrainedConfig):
Args:
vocab_size (`int`, *optional*, defaults to 30522):
Vocabulary size of the LayoutLMv2 model. Defines the number of different tokens that can be represented by
-the `inputs_ids` passed when calling [`LayoutLMv2Model`] or [`TFLayoutLMv2Model`].
+the `inputs_ids` passed when calling [`LayoutLMv2Model`].
hidden_size (`int`, *optional*, defaults to 768):
Dimension of the encoder layers and the pooler layer.
num_hidden_layers (`int`, *optional*, defaults to 12):
@@ -59,12 +59,13 @@ class LayoutLMv2Config(PreTrainedConfig):
The maximum sequence length that this model might ever be used with. Typically set this to something large
just in case (e.g., 512 or 1024 or 2048).
type_vocab_size (`int`, *optional*, defaults to 2):
-The vocabulary size of the `token_type_ids` passed when calling [`LayoutLMv2Model`] or
-[`TFLayoutLMv2Model`].
+The vocabulary size of the `token_type_ids` passed when calling [`LayoutLMv2Model`].
initializer_range (`float`, *optional*, defaults to 0.02):
The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
layer_norm_eps (`float`, *optional*, defaults to 1e-12):
The epsilon used by the layer normalization layers.
+pad_token_id (`int`, *optional*, defaults to 0):
+Padding token id.
max_2d_position_embeddings (`int`, *optional*, defaults to 1024):
The maximum value that the 2D position embedding might ever be used with. Typically set this to something
large just in case (e.g., 1024).
@@ -78,7 +79,9 @@ class LayoutLMv2Config(PreTrainedConfig):
The maximum number of relative 2D positions in the self-attention mechanism.
rel_2d_pos_bins (`int`, *optional*, defaults to 64):
The number of 2D relative position bins in the self-attention mechanism.
-image_feature_pool_shape (`list[int]`, *optional*, defaults to [7, 7, 256]):
+convert_sync_batchnorm (`bool`, *optional*, defaults to `True`):
+Whether or not to convert batch normalization layers to synchronized batch normalization layers.
+image_feature_pool_shape (`list[int]`, *optional*, defaults to `[7, 7, 256]`):
The shape of the average-pooled feature map.
coordinate_size (`int`, *optional*, defaults to 128):
Dimension of the coordinate embeddings.
@@ -95,6 +98,7 @@ class LayoutLMv2Config(PreTrainedConfig):
file](https://github.com/microsoft/unilm/blob/master/layoutlmft/layoutlmft/models/layoutlmv2/detectron2_config.py)
for details regarding default values.


Example:

```python
1 change: 1 addition & 0 deletions src/transformers/models/layoutxlm/__init__.py
@@ -18,6 +18,7 @@


if TYPE_CHECKING:
+from .configuration_layoutxlm import *
from .processing_layoutxlm import *
from .tokenization_layoutxlm import *
from .tokenization_layoutxlm_fast import *
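The star imports under `if TYPE_CHECKING:` only run for static type checkers; at runtime a transformers model package is normally exposed through a lazily importing module object, so a submodule such as `configuration_layoutxlm` is imported only when one of its names is first accessed. A minimal sketch of that idea follows, built on a `types.ModuleType` subclass with `__getattr__`. The `LazyModule` class is a simplified illustration, not the library's own lazy-module implementation, and it maps names to standard-library modules so it runs anywhere.

```python
import importlib
import types


class LazyModule(types.ModuleType):
    """Module whose attributes are imported on first access (simplified)."""

    def __init__(self, name: str, attr_to_module: dict[str, str]):
        super().__init__(name)
        self._attr_to_module = attr_to_module

    def __getattr__(self, attr: str):
        # Only called when normal lookup fails, i.e. attr is not loaded yet.
        module_name = self._attr_to_module.get(attr)
        if module_name is None:
            raise AttributeError(f"module {self.__name__!r} has no attribute {attr!r}")
        value = getattr(importlib.import_module(module_name), attr)
        setattr(self, attr, value)  # cache so later lookups skip __getattr__
        return value


# Illustrative mapping: exported name -> module that defines it.
lazy = LazyModule("demo_lazy", {"dumps": "json", "sqrt": "math"})
```

Caching the resolved object on the module means the import cost is paid once per name, and `hasattr` on an unknown name returns `False` because `__getattr__` raises `AttributeError`.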