Merged
53 commits
18e7172
refactor attention and many other
yonigozlan Oct 13, 2025
a931c26
Merge remote-tracking branch 'upstream/main' into refactor-detr
yonigozlan Oct 13, 2025
f7536b9
remove return_dict interface
yonigozlan Oct 13, 2025
c016021
improve variable names
yonigozlan Oct 14, 2025
7f21985
use _can_record_outputs and add real support for pixel and queries ma…
yonigozlan Oct 14, 2025
d0494e9
split self attention and cross attention
yonigozlan Oct 14, 2025
2f0752f
nits
yonigozlan Oct 14, 2025
67a007a
standardize mask handling
yonigozlan Oct 16, 2025
eab4e85
update DetrMHAttentionMap
yonigozlan Oct 17, 2025
ac8d387
refactor mlp detr
yonigozlan Oct 17, 2025
b82b172
Merge remote-tracking branch 'upstream/main' into refactor-detr
yonigozlan Nov 5, 2025
4181e50
Merge remote-tracking branch 'upstream/main' into refactor-detr
yonigozlan Jan 9, 2026
efb8070
make style
yonigozlan Jan 9, 2026
1fc5fe3
remove outdated tests which used ".bin" checkpoints
yonigozlan Jan 9, 2026
ed595af
Updates modeling_detr to newest standards
yonigozlan Jan 9, 2026
0827d6b
Review + fix detr weight conversion
yonigozlan Jan 10, 2026
9aeded8
replace einsum, reorder
yonigozlan Jan 10, 2026
1d65bd9
refactor rt_detr rt_detr_v2 d_fine to updated library standards
yonigozlan Jan 12, 2026
0d6115c
Merge remote-tracking branch 'upstream/main' into refactor-detr
yonigozlan Jan 12, 2026
4bc6b5e
fix repo
yonigozlan Jan 12, 2026
e5cf22a
Fix test_reverse_loading_mapping test
yonigozlan Jan 13, 2026
b2419e4
use modular for RT-DETR
yonigozlan Jan 13, 2026
1dddb27
refactor conditional and deformable detr
yonigozlan Jan 14, 2026
534dbd3
Merge remote-tracking branch 'upstream/main' into refactor-detr
yonigozlan Jan 14, 2026
e11657f
use modular for deformable_detr
yonigozlan Jan 14, 2026
8625827
use modular for conditional_detr
yonigozlan Jan 14, 2026
dd02700
fix repo
yonigozlan Jan 14, 2026
bad9c69
Merge remote-tracking branch 'upstream/main' into refactor-detr
yonigozlan Jan 14, 2026
6f2b302
Merge remote-tracking branch 'upstream/main' into refactor-detr
yonigozlan Jan 21, 2026
d2c74bb
refactor DetrMaskHeadSmallConv
yonigozlan Jan 21, 2026
57a318d
fix modular
yonigozlan Jan 21, 2026
d7892e9
Temporarily remove outdated copied from
yonigozlan Jan 21, 2026
0e8d262
fix consistency
yonigozlan Jan 21, 2026
63dc531
Merge remote-tracking branch 'upstream/main' into refactor-detr
yonigozlan Jan 21, 2026
65d9d1b
Improve DetrMHAttentionMap
yonigozlan Jan 21, 2026
ca317d1
Merge remote-tracking branch 'upstream/main' into refactor-detr
yonigozlan Jan 21, 2026
3ce2110
Merge remote-tracking branch 'upstream/main' into refactor-detr
yonigozlan Jan 21, 2026
99b87d4
Fix torch functional import aliases
yonigozlan Jan 21, 2026
4f2fa85
Merge remote-tracking branch 'upstream/main' into refactor-detr
yonigozlan Jan 22, 2026
52827f4
fix after merge with main
yonigozlan Jan 22, 2026
41a10c2
Fix missing copyrights
yonigozlan Jan 22, 2026
7821c47
Refactor HybridEncoder rt_detr
yonigozlan Jan 24, 2026
90741ba
Merge remote-tracking branch 'upstream/main' into refactor-detr
yonigozlan Jan 28, 2026
175d4cd
tie weights fix deformable detr
yonigozlan Jan 28, 2026
66ec416
Merge branch 'main' into refactor-detr
yonigozlan Jan 28, 2026
b254841
Merge branch 'main' into refactor-detr
yonigozlan Jan 28, 2026
209f4e6
Merge branch 'main' into refactor-detr
yonigozlan Jan 30, 2026
5b7a3f1
Merge remote-tracking branch 'upstream/main' into refactor-detr
yonigozlan Feb 2, 2026
4fe521b
Refactor pp docs layout + fix fp16 overflow
yonigozlan Feb 2, 2026
08a7308
fix modular
yonigozlan Feb 2, 2026
45a9b62
fix modular
yonigozlan Feb 2, 2026
f15aef7
fix deformable detr tests
yonigozlan Feb 2, 2026
037bbc9
Merge branch 'main' into refactor-detr
yonigozlan Feb 2, 2026
49 changes: 49 additions & 0 deletions src/transformers/conversion_mapping.py
@@ -62,6 +62,8 @@
"hunyuan_v1_moe": "qwen2_moe",
"flex_olmo": "qwen2_moe",
"olmoe": "qwen2_moe",
"rt_detr_v2": "rt_detr",
"pp_doclayout_v3": "rt_detr",
}


@@ -224,6 +226,52 @@ def _build_checkpoint_conversion_mapping():
operations=[ErnieFuseAndSplitTextVisionExperts(stack_dim=0, concat_dim=1)],
),
],
"detr": [
WeightRenaming("backbone.conv_encoder", "backbone"),
WeightRenaming("out_proj", "o_proj"),
WeightRenaming(r"layers.(\d+).fc1", r"layers.\1.mlp.fc1"),
WeightRenaming(r"layers.(\d+).fc2", r"layers.\1.mlp.fc2"),
Comment on lines +230 to +233
Contributor
If we rename all keys, we can be pretty liberal with the modifications on detr and the like, which is good. Are there any other keys that are misnamed or could cause annoyances?

Member Author
@yonigozlan Jan 13, 2026
Yes agreed! I think the more we refactor, the more we'll see patterns and the more we can try to standardize weight names, especially as the new weight loaders make it ok to iterate on this imo and not make all the modifications at once. What do you think?

Contributor
my dream would be all keys identical across all models when relevant hehe. Purely aesthetic though.

],
"rt_detr": [
WeightRenaming("out_proj", "o_proj"),
WeightRenaming(r"layers.(\d+).fc1", r"layers.\1.mlp.fc1"),
WeightRenaming(r"layers.(\d+).fc2", r"layers.\1.mlp.fc2"),
WeightRenaming(r"encoder.encoder.(\d+).layers", r"encoder.aifi.\1.layers"),
],
"conditional_detr": [
WeightRenaming("backbone.conv_encoder", "backbone"),
WeightRenaming("self_attn.out_proj", "self_attn.o_proj"),
WeightRenaming("encoder_attn.out_proj", "encoder_attn.o_proj"),
WeightRenaming(r"layers.(\d+).fc1", r"layers.\1.mlp.fc1"),
WeightRenaming(r"layers.(\d+).fc2", r"layers.\1.mlp.fc2"),
# Decoder self-attention projections moved into self_attn module
WeightRenaming(r"decoder.layers.(\d+).sa_qcontent_proj", r"decoder.layers.\1.self_attn.q_content_proj"),
WeightRenaming(r"decoder.layers.(\d+).sa_qpos_proj", r"decoder.layers.\1.self_attn.q_pos_proj"),
WeightRenaming(r"decoder.layers.(\d+).sa_kcontent_proj", r"decoder.layers.\1.self_attn.k_content_proj"),
WeightRenaming(r"decoder.layers.(\d+).sa_kpos_proj", r"decoder.layers.\1.self_attn.k_pos_proj"),
WeightRenaming(r"decoder.layers.(\d+).sa_v_proj", r"decoder.layers.\1.self_attn.v_proj"),
# Decoder cross-attention projections moved into encoder_attn module
WeightRenaming(r"decoder.layers.(\d+).ca_qcontent_proj", r"decoder.layers.\1.encoder_attn.q_content_proj"),
WeightRenaming(r"decoder.layers.(\d+).ca_qpos_proj", r"decoder.layers.\1.encoder_attn.q_pos_proj"),
WeightRenaming(r"decoder.layers.(\d+).ca_kcontent_proj", r"decoder.layers.\1.encoder_attn.k_content_proj"),
WeightRenaming(r"decoder.layers.(\d+).ca_kpos_proj", r"decoder.layers.\1.encoder_attn.k_pos_proj"),
WeightRenaming(r"decoder.layers.(\d+).ca_v_proj", r"decoder.layers.\1.encoder_attn.v_proj"),
WeightRenaming(
r"decoder.layers.(\d+).ca_qpos_sine_proj", r"decoder.layers.\1.encoder_attn.q_pos_sine_proj"
),
],
"deformable_detr": [
WeightRenaming("backbone.conv_encoder", "backbone"),
WeightRenaming("self_attn.out_proj", "self_attn.o_proj"),
WeightRenaming(r"layers.(\d+).fc1", r"layers.\1.mlp.fc1"),
WeightRenaming(r"layers.(\d+).fc2", r"layers.\1.mlp.fc2"),
],
"d_fine": [
WeightRenaming("out_proj", "o_proj"),
WeightRenaming(r"layers.(\d+).fc1", r"layers.\1.mlp.layers.0"),
WeightRenaming(r"layers.(\d+).fc2", r"layers.\1.mlp.layers.1"),
WeightRenaming(r"encoder.encoder.(\d+).layers", r"encoder.aifi.\1.layers"),
],
vasqu marked this conversation as resolved.
"jamba": [
WeightConverter(
source_patterns=[
@@ -344,6 +392,7 @@ def register_checkpoint_conversion_mapping(
"sam3_tracker_video",
"paddleocrvl",
"ernie4_5_vl_moe",
"detr",
]


Contributor
Is it really necessary to touch the conversion scripts? I'd rather we know that the original conversion will work with our current renaming ops
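
For context, the renaming ops registered above are regex-based: a pattern such as layers.(\d+).fc1 captures the layer index and re-inserts it via \1 in the replacement. A minimal sketch of that mechanism using plain re.sub (illustrative only; the actual WeightRenaming op in transformers may match and apply patterns differently):

import re

# Illustrative only: how a single regex rename rewrites one checkpoint key.
# This is NOT the WeightRenaming implementation, just the capture-group mechanics.
def rename_key(key: str, pattern: str, replacement: str) -> str:
    return re.sub(pattern, replacement, key)

old_key = "decoder.layers.3.sa_qcontent_proj.weight"
new_key = rename_key(
    old_key,
    r"decoder.layers.(\d+).sa_qcontent_proj",
    r"decoder.layers.\1.self_attn.q_content_proj",
)
print(new_key)  # decoder.layers.3.self_attn.q_content_proj.weight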

@@ -93,54 +93,92 @@
rename_keys.append((f"transformer.decoder.layers.{i}.norm3.bias", f"decoder.layers.{i}.final_layer_norm.bias"))

# q, k, v projections in self/cross-attention in decoder for conditional DETR
# Self-attention projections moved into self_attn module
rename_keys.append(
(f"transformer.decoder.layers.{i}.sa_qcontent_proj.weight", f"decoder.layers.{i}.sa_qcontent_proj.weight")
(
f"transformer.decoder.layers.{i}.sa_qcontent_proj.weight",
f"decoder.layers.{i}.self_attn.q_content_proj.weight",
)
)
rename_keys.append(
(
f"transformer.decoder.layers.{i}.sa_kcontent_proj.weight",
f"decoder.layers.{i}.self_attn.k_content_proj.weight",
)
)
rename_keys.append(
(f"transformer.decoder.layers.{i}.sa_qpos_proj.weight", f"decoder.layers.{i}.self_attn.q_pos_proj.weight")
)
rename_keys.append(
(f"transformer.decoder.layers.{i}.sa_kcontent_proj.weight", f"decoder.layers.{i}.sa_kcontent_proj.weight")
(f"transformer.decoder.layers.{i}.sa_kpos_proj.weight", f"decoder.layers.{i}.self_attn.k_pos_proj.weight")
)
rename_keys.append(
(f"transformer.decoder.layers.{i}.sa_qpos_proj.weight", f"decoder.layers.{i}.sa_qpos_proj.weight")
(f"transformer.decoder.layers.{i}.sa_v_proj.weight", f"decoder.layers.{i}.self_attn.v_proj.weight")
)
# Cross-attention projections moved into encoder_attn module
rename_keys.append(
(f"transformer.decoder.layers.{i}.sa_kpos_proj.weight", f"decoder.layers.{i}.sa_kpos_proj.weight")
(
f"transformer.decoder.layers.{i}.ca_qcontent_proj.weight",
f"decoder.layers.{i}.encoder_attn.q_content_proj.weight",
)
)
rename_keys.append((f"transformer.decoder.layers.{i}.sa_v_proj.weight", f"decoder.layers.{i}.sa_v_proj.weight"))
# rename_keys.append((f"transformer.decoder.layers.{i}.ca_qpos_proj.weight", f"decoder.layers.{i}.encoder_attn.q_pos_proj.weight"))
rename_keys.append(
(f"transformer.decoder.layers.{i}.ca_qcontent_proj.weight", f"decoder.layers.{i}.ca_qcontent_proj.weight")
(
f"transformer.decoder.layers.{i}.ca_kcontent_proj.weight",
f"decoder.layers.{i}.encoder_attn.k_content_proj.weight",
)
)
# rename_keys.append((f"transformer.decoder.layers.{i}.ca_qpos_proj.weight", f"decoder.layers.{i}.ca_qpos_proj.weight"))
rename_keys.append(
(f"transformer.decoder.layers.{i}.ca_kcontent_proj.weight", f"decoder.layers.{i}.ca_kcontent_proj.weight")
(f"transformer.decoder.layers.{i}.ca_kpos_proj.weight", f"decoder.layers.{i}.encoder_attn.k_pos_proj.weight")
)
rename_keys.append(
(f"transformer.decoder.layers.{i}.ca_kpos_proj.weight", f"decoder.layers.{i}.ca_kpos_proj.weight")
(f"transformer.decoder.layers.{i}.ca_v_proj.weight", f"decoder.layers.{i}.encoder_attn.v_proj.weight")
)
rename_keys.append((f"transformer.decoder.layers.{i}.ca_v_proj.weight", f"decoder.layers.{i}.ca_v_proj.weight"))
rename_keys.append(
(f"transformer.decoder.layers.{i}.ca_qpos_sine_proj.weight", f"decoder.layers.{i}.ca_qpos_sine_proj.weight")
(
f"transformer.decoder.layers.{i}.ca_qpos_sine_proj.weight",
f"decoder.layers.{i}.encoder_attn.q_pos_sine_proj.weight",
)
)

rename_keys.append(
(f"transformer.decoder.layers.{i}.sa_qcontent_proj.bias", f"decoder.layers.{i}.sa_qcontent_proj.bias")
(f"transformer.decoder.layers.{i}.sa_qcontent_proj.bias", f"decoder.layers.{i}.self_attn.q_content_proj.bias")
)
rename_keys.append(
(f"transformer.decoder.layers.{i}.sa_kcontent_proj.bias", f"decoder.layers.{i}.sa_kcontent_proj.bias")
(f"transformer.decoder.layers.{i}.sa_kcontent_proj.bias", f"decoder.layers.{i}.self_attn.k_content_proj.bias")
)
rename_keys.append((f"transformer.decoder.layers.{i}.sa_qpos_proj.bias", f"decoder.layers.{i}.sa_qpos_proj.bias"))
rename_keys.append((f"transformer.decoder.layers.{i}.sa_kpos_proj.bias", f"decoder.layers.{i}.sa_kpos_proj.bias"))
rename_keys.append((f"transformer.decoder.layers.{i}.sa_v_proj.bias", f"decoder.layers.{i}.sa_v_proj.bias"))
rename_keys.append(
(f"transformer.decoder.layers.{i}.ca_qcontent_proj.bias", f"decoder.layers.{i}.ca_qcontent_proj.bias")
(f"transformer.decoder.layers.{i}.sa_qpos_proj.bias", f"decoder.layers.{i}.self_attn.q_pos_proj.bias")
)
# rename_keys.append((f"transformer.decoder.layers.{i}.ca_qpos_proj.bias", f"decoder.layers.{i}.ca_qpos_proj.bias"))
rename_keys.append(
(f"transformer.decoder.layers.{i}.ca_kcontent_proj.bias", f"decoder.layers.{i}.ca_kcontent_proj.bias")
(f"transformer.decoder.layers.{i}.sa_kpos_proj.bias", f"decoder.layers.{i}.self_attn.k_pos_proj.bias")
)
rename_keys.append((f"transformer.decoder.layers.{i}.ca_kpos_proj.bias", f"decoder.layers.{i}.ca_kpos_proj.bias"))
rename_keys.append((f"transformer.decoder.layers.{i}.ca_v_proj.bias", f"decoder.layers.{i}.ca_v_proj.bias"))
rename_keys.append((f"transformer.decoder.layers.{i}.sa_v_proj.bias", f"decoder.layers.{i}.self_attn.v_proj.bias"))
rename_keys.append(
(f"transformer.decoder.layers.{i}.ca_qpos_sine_proj.bias", f"decoder.layers.{i}.ca_qpos_sine_proj.bias")
(
f"transformer.decoder.layers.{i}.ca_qcontent_proj.bias",
f"decoder.layers.{i}.encoder_attn.q_content_proj.bias",
)
)
# rename_keys.append((f"transformer.decoder.layers.{i}.ca_qpos_proj.bias", f"decoder.layers.{i}.encoder_attn.q_pos_proj.bias"))
rename_keys.append(
(
f"transformer.decoder.layers.{i}.ca_kcontent_proj.bias",
f"decoder.layers.{i}.encoder_attn.k_content_proj.bias",
)
)
rename_keys.append(
(f"transformer.decoder.layers.{i}.ca_kpos_proj.bias", f"decoder.layers.{i}.encoder_attn.k_pos_proj.bias")
)
rename_keys.append(
(f"transformer.decoder.layers.{i}.ca_v_proj.bias", f"decoder.layers.{i}.encoder_attn.v_proj.bias")
)
rename_keys.append(
(
f"transformer.decoder.layers.{i}.ca_qpos_sine_proj.bias",
f"decoder.layers.{i}.encoder_attn.q_pos_sine_proj.bias",
)
)

# convolutional projection + query embeddings + layernorm of decoder + class and bounding box heads
@@ -168,8 +206,8 @@
("transformer.decoder.query_scale.layers.0.bias", "decoder.query_scale.layers.0.bias"),
("transformer.decoder.query_scale.layers.1.weight", "decoder.query_scale.layers.1.weight"),
("transformer.decoder.query_scale.layers.1.bias", "decoder.query_scale.layers.1.bias"),
("transformer.decoder.layers.0.ca_qpos_proj.weight", "decoder.layers.0.ca_qpos_proj.weight"),
("transformer.decoder.layers.0.ca_qpos_proj.bias", "decoder.layers.0.ca_qpos_proj.bias"),
("transformer.decoder.layers.0.ca_qpos_proj.weight", "decoder.layers.0.encoder_attn.q_pos_proj.weight"),
("transformer.decoder.layers.0.ca_qpos_proj.bias", "decoder.layers.0.encoder_attn.q_pos_proj.bias"),
]
)
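
As a reminder of how these conversion scripts consume the list: each (source, destination) tuple in rename_keys is typically applied by popping the old key out of the original state dict and re-inserting the tensor under the new name. A hedged sketch under that assumption; the helper below is illustrative, not the script's exact function:

# Illustrative sketch, not the conversion script's exact helper:
# apply the (source_key, destination_key) pairs collected in rename_keys.
def apply_rename_keys(state_dict, rename_keys):
    for src, dest in rename_keys:
        if src in state_dict:
            state_dict[dest] = state_dict.pop(src)
    return state_dict

# e.g. {"transformer.decoder.layers.0.sa_v_proj.weight": tensor}
# becomes {"decoder.layers.0.self_attn.v_proj.weight": tensor}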

Contributor
format?

@@ -4,6 +4,20 @@
# the file from the modular. If any change should be done, please apply the change to the
# modular_conditional_detr.py file directly. One of our CI enforces this.
# 🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨
# Copyright 2022 Microsoft Research Asia and The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import pathlib
from typing import Any, Optional
