Merged

Changes from all commits (43 commits)
- 71f301b: draft (zucchini-nlp, Jan 16, 2026)
- d0f4762: first make it work, then make pretty (zucchini-nlp, Jan 23, 2026)
- 7ab4252: :eyes: (zucchini-nlp, Jan 23, 2026)
- 56d3d0c: unused attributes? (zucchini-nlp, Jan 23, 2026)
- ded5d7b: everythgin we need is in the config! (zucchini-nlp, Jan 23, 2026)
- cb302df: push (zucchini-nlp, Jan 26, 2026)
- a725f67: update (zucchini-nlp, Jan 26, 2026)
- 3eae234: update (zucchini-nlp, Jan 26, 2026)
- 0e7792b: fixes (zucchini-nlp, Jan 26, 2026)
- 2323559: forgot (zucchini-nlp, Jan 26, 2026)
- a69cbc6: push more changes (zucchini-nlp, Jan 27, 2026)
- 5459482: move it out from mxin (zucchini-nlp, Jan 27, 2026)
- 754b616: last test fixes i hope (zucchini-nlp, Jan 27, 2026)
- dc3d676: delete backbone utils from utils (zucchini-nlp, Jan 27, 2026)
- f7306fc: merge main (zucchini-nlp, Jan 27, 2026)
- 49eda76: docs (zucchini-nlp, Jan 27, 2026)
- a4f4b51: more docs updates (zucchini-nlp, Jan 27, 2026)
- 06eb80c: remove backbone from args (zucchini-nlp, Jan 27, 2026)
- a5d987d: docstring (zucchini-nlp, Jan 27, 2026)
- 5063b0b: get rid of circular import and revert utils/backbonutils for BC (zucchini-nlp, Jan 27, 2026)
- cd46e95: fix more tests (zucchini-nlp, Jan 27, 2026)
- 35061b2: modular (zucchini-nlp, Jan 28, 2026)
- 61acbc9: style run (zucchini-nlp, Jan 29, 2026)
- a2f5b3b: Update src/transformers/modeling_backbone_utils.py (zucchini-nlp, Jan 29, 2026)
- e68d5c1: Update src/transformers/modeling_backbone_utils.py (zucchini-nlp, Jan 29, 2026)
- 5e20535: set and align output features from single entrypoint (zucchini-nlp, Jan 29, 2026)
- 8c7864a: move init_backbone to '__init__' (zucchini-nlp, Jan 29, 2026)
- ab2c275: calling init with timm backbone (zucchini-nlp, Jan 29, 2026)
- 46c245c: maybe fix test (zucchini-nlp, Jan 29, 2026)
- e8b063d: Merge branch 'main' into backbone (zucchini-nlp, Jan 30, 2026)
- dea1f0a: fix tests (zucchini-nlp, Jan 30, 2026)
- 4b1142a: fix more and new models as well (zucchini-nlp, Jan 30, 2026)
- e8ef800: fix last test, maybe will move them to a common file (zucchini-nlp, Jan 30, 2026)
- 3770829: fix repo (zucchini-nlp, Jan 30, 2026)
- f020eff: fix repo (zucchini-nlp, Feb 2, 2026)
- f34bb0d: tests unified, cannot be in common because models are very different (zucchini-nlp, Feb 2, 2026)
- 72c223c: add in init (zucchini-nlp, Feb 2, 2026)
- eb88a4c: make comment more detailed for future us (zucchini-nlp, Feb 2, 2026)
- ac68691: Merge branch 'main' into backbone (zucchini-nlp, Feb 3, 2026)
- 1f3d29b: fix modular (zucchini-nlp, Feb 4, 2026)
- ea9ac12: update tests after DETR refactor (zucchini-nlp, Feb 4, 2026)
- 223591c: rename `backbone_utils` (zucchini-nlp, Feb 4, 2026)
- 509ffc4: delete bare `is_timm_available` (zucchini-nlp, Feb 4, 2026)
18 changes: 9 additions & 9 deletions docs/source/en/backbones.md
@@ -36,8 +36,8 @@ This guide describes the backbone class, backbones from the [timm](https://hf.co

There are two backbone classes.

- [`~transformers.utils.BackboneMixin`] allows you to load a backbone and includes functions for extracting the feature maps and indices.
- [`~transformers.utils.BackboneConfigMixin`] allows you to set the feature map and indices of a backbone configuration.
- [`~transformers.utils.BackboneMixin`] allows you to load a backbone and includes functions for extracting the feature maps and indices from the config.
- [`~transformers.utils.BackboneConfigMixin`] allows you to set, align, and verify the feature maps and indices of a backbone configuration.

Refer to the [Backbone](./main_classes/backbones) API documentation to check which models support a backbone.
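The set/align/verify behavior these mixins provide can be sketched in plain Python. This is an illustrative standalone helper, not the actual Transformers API; it assumes a backbone exposes an ordered list of stage names:

```python
def align_output_features_output_indices(out_features, out_indices, stage_names):
    """Sketch of how a backbone config aligns `out_features` (stage names)
    with `out_indices` (stage positions). Illustrative only; the real logic
    lives in the Transformers backbone config mixin."""
    if out_features is None and out_indices is None:
        # Default to the last stage, as Transformers backbones do
        out_features = [stage_names[-1]]
        out_indices = [len(stage_names) - 1]
    elif out_features is None:
        # Derive names from indices (negative indices wrap, e.g. -1 -> last)
        out_features = [stage_names[i] for i in out_indices]
        out_indices = [stage_names.index(name) for name in out_features]
    elif out_indices is None:
        # Derive indices from names
        out_indices = [stage_names.index(name) for name in out_features]
    # Verify the two views agree before returning them
    assert [stage_names[i] for i in out_indices] == list(out_features)
    return out_features, out_indices
```

For example, given stages `["stem", "stage1", "stage2", "stage3"]` and only `out_indices=[1, 2]`, the helper fills in `out_features=["stage1", "stage2"]`.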

@@ -69,12 +69,13 @@ When you know a model supports a backbone, you can load the backbone and neck di

The example below loads a [ResNet](./model_doc/resnet) backbone and neck for use in a [MaskFormer](./model_doc/maskformer) instance segmentation head.

Set `backbone` to a pretrained model and `use_pretrained_backbone=True` to use pretrained weights instead of randomly initialized weights.
Note that initializing from a config creates the model with random weights. If you want to load a pretrained model, use the `from_pretrained` API.

```py
from transformers import AutoConfig, MaskFormerConfig, MaskFormerForInstanceSegmentation

config = MaskFormerConfig(backbone="microsoft/resnet-50", use_pretrained_backbone=True)
backbone_config = AutoConfig.from_pretrained("microsoft/resnet-50")
config = MaskFormerConfig(backbone_config=backbone_config)
model = MaskFormerForInstanceSegmentation(config)
```

@@ -96,14 +97,13 @@ model = MaskFormerForInstanceSegmentation(config)

## timm backbones

[timm](https://hf.co/docs/timm/index) is a collection of vision models for training and inference. Transformers supports timm models as backbones with the [`TimmBackbone`] and [`TimmBackboneConfig`] classes.

Set `use_timm_backbone=True` to load pretrained timm weights, and `use_pretrained_backbone` to use pretrained or randomly initialized weights.
[timm](https://hf.co/docs/timm/index) is a collection of vision models for training and inference. Transformers supports timm models as backbones with the [`TimmBackbone`] and [`TimmBackboneConfig`] classes. Set the necessary backbone checkpoint in `backbone` to create a model with a timm backbone and randomly initialized weights.

```py
from transformers import MaskFormerConfig, MaskFormerForInstanceSegmentation, TimmBackboneConfig

config = MaskFormerConfig(backbone="resnet50", use_timm_backbone=True, use_pretrained_backbone=True)
backbone_config = TimmBackboneConfig(backbone="resnet50", out_indices=[-1])
config = MaskFormerConfig(backbone_config=backbone_config)
model = MaskFormerForInstanceSegmentation(config)
```

Review comment from zucchini-nlp (Member, Author): `use_timm_backbone` is not really needed imo. We can infer if the requested checkpoint is from timm or HF by checking if the repo exists on the Hub with a valid config. Deleted it, as well as a redundant arg.
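The inference the review comment describes (timm name vs. Hub repo) can be sketched as follows. The function name and the injected `repo_has_valid_config` callable are hypothetical, not the PR's actual implementation; in practice the callable could be backed by something like `huggingface_hub.file_exists(repo_id, "config.json")`, injected here so the sketch stays offline-testable:

```python
def resolve_backbone_source(checkpoint, repo_has_valid_config):
    """Decide whether `checkpoint` names a Hub repo or a timm model id.

    repo_has_valid_config: callable(repo_id) -> bool that reports whether a
    repo with a valid config exists on the Hub for that id.
    """
    if repo_has_valid_config(checkpoint):
        return "hub"
    # No matching Hub repo with a config: treat the name as a timm model id
    return "timm"
```

Usage with a stub checker: `resolve_backbone_source("resnet50", lambda repo: False)` yields `"timm"`.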

@@ -112,7 +112,7 @@ You could also explicitly call the [`TimmBackboneConfig`] class to load and crea
```py
from transformers import TimmBackboneConfig

backbone_config = TimmBackboneConfig("resnet50", use_pretrained_backbone=True)
backbone_config = TimmBackboneConfig("resnet50")
```

Pass the backbone configuration to the model configuration and instantiate the model head, [`MaskFormerForInstanceSegmentation`], with the backbone.
8 changes: 4 additions & 4 deletions docs/source/en/main_classes/backbones.md
@@ -18,8 +18,8 @@ rendered properly in your Markdown viewer.

A backbone is a model used for feature extraction for higher level computer vision tasks such as object detection and image classification. Transformers provides an [`AutoBackbone`] class for initializing a Transformers backbone from pretrained model weights, and two utility classes:

* [`~utils.BackboneMixin`] enables initializing a backbone from Transformers or [timm](https://hf.co/docs/timm/index) and includes functions for returning the output features and indices.
* [`~utils.BackboneConfigMixin`] sets the output features and indices of the backbone configuration.
* [`~backbone_utils.BackboneMixin`] enables initializing a backbone from Transformers or [timm](https://hf.co/docs/timm/index) and includes functions for returning the output features and indices.
* [`~backbone_utils.BackboneConfigMixin`] sets the output features and indices of the backbone configuration.

[timm](https://hf.co/docs/timm/index) models are loaded with the [`TimmBackbone`] and [`TimmBackboneConfig`] classes.

@@ -45,11 +45,11 @@ Backbones are supported for the following models:

## BackboneMixin

[[autodoc]] utils.BackboneMixin
[[autodoc]] backbone_utils.BackboneMixin

## BackboneConfigMixin

[[autodoc]] utils.BackboneConfigMixin
[[autodoc]] backbone_utils.BackboneConfigMixin

## TimmBackbone

2 changes: 1 addition & 1 deletion docs/source/en/model_doc/dab-detr.md
@@ -110,7 +110,7 @@ Option 2: Instantiate DAB-DETR with randomly initialized weights for Transformer
Option 3: Instantiate DAB-DETR with randomly initialized weights for backbone + Transformer

```py
>>> config = DabDetrConfig(use_pretrained_backbone=False)
>>> config = DabDetrConfig()
>>> model = DabDetrForObjectDetection(config)
```

2 changes: 1 addition & 1 deletion docs/source/en/model_doc/detr.md
@@ -132,7 +132,7 @@ model = DetrForObjectDetection(config)
- Option 3: Instantiate DETR with randomly initialized weights for backbone + Transformer

```python
config = DetrConfig(use_pretrained_backbone=False)
config = DetrConfig()
model = DetrForObjectDetection(config)
```

3 changes: 1 addition & 2 deletions docs/source/en/model_doc/pvt_v2.md
@@ -64,7 +64,7 @@ processed = image_processor(image)
outputs = model(torch.tensor(processed["pixel_values"]))
```

To use the PVTv2 as a backbone for more complex architectures like DeformableDETR, you can use AutoBackbone (this model would need fine-tuning as you're replacing the backbone in the pretrained model):
To use the PVTv2 as a backbone for more complex architectures like DeformableDETR, you can use AutoBackbone (this model would need fine-tuning as you're replacing the backbone in the pretrained model and it is initialized with random weights):

```python
import requests
@@ -77,7 +77,6 @@ model = AutoModelForObjectDetection.from_config(
config=AutoConfig.from_pretrained(
"SenseTime/deformable-detr",
backbone_config=AutoConfig.from_pretrained("OpenGVLab/pvt_v2_b5"),
use_timm_backbone=False
),
)

13 changes: 9 additions & 4 deletions docs/source/en/tasks/training_vision_backbone.md
@@ -38,13 +38,18 @@ Initialize [`DetrConfig`] with the pre-trained DINOv3 ConvNext backbone. Use `nu
```py
from transformers import AutoBackbone, AutoConfig, AutoImageProcessor, DetrConfig, DetrForObjectDetection

config = DetrConfig(backbone="facebook/dinov3-convnext-large-pretrain-lvd1689m",
use_pretrained_backbone=True, use_timm_backbone=False,
# Create a model with randomly initialized weights
backbone_config = AutoConfig.from_pretrained("facebook/dinov3-convnext-large-pretrain-lvd1689m")
backbone = AutoBackbone.from_pretrained("facebook/dinov3-convnext-large-pretrain-lvd1689m")

config = DetrConfig(backbone_config=backbone_config,
num_labels=1, id2label={0: "license_plate"}, label2id={"license_plate": 0})
model = DetrForObjectDetection(config)

for param in model.model.backbone.parameters():
param.requires_grad = False
# Assign pretrained backbone checkpoint and freeze the weights
model.model.backbone = backbone
model.model.freeze_backbone()

image_processor = AutoImageProcessor.from_pretrained("facebook/detr-resnet-50")
```

Review comment from zucchini-nlp (Member, Author) on lines +49 to +51: backbones are usually set under different module names, so I'm thinking we can add a `self.get_backbone` and `self.set_backbone` for easier manipulation. Maybe next PR, this one is already big.

2 changes: 1 addition & 1 deletion docs/source/ja/model_doc/detr.md
@@ -140,7 +140,7 @@ There are three ways to instantiate a DETR model:
Option 3: Instantiate DETR with randomly initialized weights for the backbone + Transformer.

```py
>>> config = DetrConfig(use_pretrained_backbone=False)
>>> config = DetrConfig()
>>> model = DetrForObjectDetection(config)
```

2 changes: 1 addition & 1 deletion examples/modular-transformers/modeling_test_detr.py
@@ -15,14 +15,14 @@

from ... import initialization as init
from ...activations import ACT2FN
from ...backbone_utils import load_backbone
from ...integrations import use_kernel_forward_from_hub
from ...modeling_attn_mask_utils import _prepare_4d_attention_mask
from ...modeling_layers import GradientCheckpointingLayer
from ...modeling_outputs import BaseModelOutput
from ...modeling_utils import PreTrainedModel
from ...pytorch_utils import meshgrid
from ...utils import ModelOutput, auto_docstring, is_timm_available, requires_backends, torch_compilable_check
from ...utils.backbone_utils import load_backbone
from .configuration_test_detr import TestDetrConfig


5 changes: 3 additions & 2 deletions src/transformers/__init__.py
@@ -439,6 +439,7 @@
_import_structure["modeling_flash_attention_utils"] = []
_import_structure["modeling_layers"] = ["GradientCheckpointingLayer"]
_import_structure["modeling_outputs"] = []
_import_structure["backbone_utils"] = ["BackboneConfigMixin", "BackboneMixin"]
_import_structure["modeling_rope_utils"] = ["ROPE_INIT_FUNCTIONS", "dynamic_rope_update", "RopeParameters"]
_import_structure["modeling_utils"] = ["PreTrainedModel", "AttentionInterface"]
_import_structure["masking_utils"] = ["AttentionMaskInterface"]
@@ -467,6 +468,8 @@
# Direct imports for type-checking
if TYPE_CHECKING:
# All modeling imports
# Models
from .backbone_utils import BackboneConfigMixin, BackboneMixin
from .cache_utils import Cache as Cache
from .cache_utils import DynamicCache as DynamicCache
from .cache_utils import DynamicLayer as DynamicLayer
@@ -609,8 +612,6 @@
from .integrations.executorch import convert_and_export_with_cache as convert_and_export_with_cache
from .masking_utils import AttentionMaskInterface as AttentionMaskInterface
from .model_debugging_utils import model_addition_debugger_context as model_addition_debugger_context

# Models
from .modeling_layers import GradientCheckpointingLayer as GradientCheckpointingLayer
from .modeling_rope_utils import ROPE_INIT_FUNCTIONS as ROPE_INIT_FUNCTIONS
from .modeling_rope_utils import RopeParameters as RopeParameters