Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
27 changes: 27 additions & 0 deletions src/transformers/models/auto/tokenization_auto.py
Original file line number Diff line number Diff line change
Expand Up @@ -185,6 +185,12 @@
("megatron-bert", "BertTokenizer" if is_tokenizers_available() else None),
("metaclip_2", "XLMRobertaTokenizer" if is_tokenizers_available() else None),
("mgp-str", "MgpstrTokenizer"),
(
"ministral",
"MistralCommonBackend"
if is_mistral_common_available()
else ("TokenizersBackend" if is_tokenizers_available() else None),
),
(
"ministral3",
"MistralCommonBackend"
Expand Down Expand Up @@ -331,6 +337,27 @@
]
)

# Models with incorrect tokenizer_class in their Hub tokenizer_config.json files.
# These models will be forced to use TokenizersBackend.
# NOTE: the original span had pull-request review chrome pasted into the middle
# of this literal, which made it invalid Python; the set is rebuilt here clean
# and sorted alphabetically for easier maintenance.
MODELS_WITH_INCORRECT_HUB_TOKENIZER_CLASS: set[str] = {
    "arctic",
    "deepseek_vl",
    "deepseek_vl_hybrid",
    "hyperclovax_vlm",
    "jamba",
    "janus",
    "llava",
    "llava_next",
    "opencua",
    "phi3",
    "step3p5",
    "vipllava",
}

# For the models flagged above, register a TokenizersBackend entry (or None
# when the tokenizers library is unavailable) unless the model type already
# has an explicit mapping — setdefault preserves any existing entry.
for _model_type in MODELS_WITH_INCORRECT_HUB_TOKENIZER_CLASS:
    TOKENIZER_MAPPING_NAMES.setdefault(
        _model_type, "TokenizersBackend" if is_tokenizers_available() else None
    )

# Lazy mapping tying config names to tokenizer class names; presumably the
# concrete classes are resolved on first access by _LazyAutoMapping — its
# definition is outside this view, so confirm against the helper.
TOKENIZER_MAPPING = _LazyAutoMapping(CONFIG_MAPPING_NAMES, TOKENIZER_MAPPING_NAMES)

# Inverse of CONFIG_MAPPING_NAMES: maps each config name back to its model-type key.
CONFIG_TO_TYPE = {v: k for k, v in CONFIG_MAPPING_NAMES.items()}
Expand Down