Merged

Commits (37)
b591887
add: segformer utils and img. classification.
sayakpaul Jun 28, 2022
a6f8781
add: segmentation layer.
sayakpaul Jun 28, 2022
4ebb62a
feat: working implementation of segformer.
sayakpaul Jun 28, 2022
15c7271
Merge branch 'main' into tf-segformer
sayakpaul Jun 28, 2022
b994e35
chore: remove unused variable.
sayakpaul Jun 28, 2022
d472c38
add test, remaining modifications.
sayakpaul Jun 28, 2022
22db5d0
remove: unnecessary files.
sayakpaul Jun 28, 2022
90629ab
add: rest of the files.
sayakpaul Jun 30, 2022
59e365c
Merge branch 'main' into tf-segformer
sayakpaul Jun 30, 2022
8d913e6
chore: remove ModuleList comment.
sayakpaul Jun 30, 2022
3152f05
chore: apply make style.
sayakpaul Jun 30, 2022
129db92
chore: apply make fixup-copies.
sayakpaul Jun 30, 2022
93fafd4
add to check_repo.py
sayakpaul Jun 30, 2022
eb33e0f
add decode head to IGNORE_NON_TESTED
sayakpaul Jun 30, 2022
48f836f
chore: run make style.
sayakpaul Jun 30, 2022
828960d
chore: PR comments.
sayakpaul Jul 1, 2022
942bec1
chore: minor changes to model doc.
sayakpaul Jul 2, 2022
c5bf93b
tests: reduction across samples.
sayakpaul Jul 2, 2022
a641451
add a note on the space.
sayakpaul Jul 3, 2022
6f59aa0
Merge branch 'main' into tf-segformer
sayakpaul Jul 3, 2022
4770b5d
Merge branch 'main' into tf-segformer
sayakpaul Jul 5, 2022
a9f7ec8
sort importats.
sayakpaul Jul 5, 2022
d414f24
fix: reduction in loss computation.
sayakpaul Jul 5, 2022
4d7f5a1
chore: align loss function with that of NER.
sayakpaul Jul 6, 2022
ac49cef
chore: correct utils/documentation_tests.txt
sayakpaul Jul 7, 2022
ba93bb4
chore: simplify the interpolation of logits in loss computation.
sayakpaul Jul 8, 2022
6c97e8d
chore: return transposed logits when return_dict=False.
sayakpaul Jul 8, 2022
754b145
Merge branch 'main' into tf-segformer
sayakpaul Jul 13, 2022
4c85484
chore: add link to the tf fine-tuning repo.
sayakpaul Jul 14, 2022
4afb097
address pr comments.
sayakpaul Jul 18, 2022
05edc7b
Merge branch 'main' into tf-segformer
sayakpaul Jul 18, 2022
59c530e
Merge branch 'main' into tf-segformer
sayakpaul Jul 19, 2022
8dd8b46
address niels's comments.
sayakpaul Jul 20, 2022
52affaa
remove from_pt=True since tf weights are in.
sayakpaul Jul 20, 2022
4a41bdc
remove comment from pt model.
sayakpaul Jul 20, 2022
9c1584c
address niels's comments.
sayakpaul Jul 20, 2022
ebaec84
Merge branch 'main' into tf-segformer
sayakpaul Jul 20, 2022
docs/source/en/index.mdx (2 changes: 1 addition & 1 deletion)
@@ -278,7 +278,7 @@ Flax), PyTorch, and/or TensorFlow.
| RetriBERT | ✅ | ✅ | ✅ | ❌ | ❌ |
| RoBERTa | ✅ | ✅ | ✅ | ✅ | ✅ |
| RoFormer | ✅ | ✅ | ✅ | ✅ | ✅ |
| SegFormer | ❌ | ❌ | ✅ | ❌ | ❌ |
| SegFormer | ❌ | ❌ | ✅ | ✅ | ❌ |
| SEW | ❌ | ❌ | ✅ | ❌ | ❌ |
| SEW-D | ❌ | ❌ | ✅ | ❌ | ❌ |
| Speech Encoder decoder | ❌ | ❌ | ✅ | ❌ | ✅ |
docs/source/en/model_doc/segformer.mdx (37 changes: 33 additions & 4 deletions)
@@ -36,13 +36,14 @@ The figure below illustrates the architecture of SegFormer. Taken from the [orig

<img width="600" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/segformer_architecture.png"/>

This model was contributed by [nielsr](https://huggingface.co/nielsr). The original code can be found [here](https://github.com/NVlabs/SegFormer).
This model was contributed by [nielsr](https://huggingface.co/nielsr). The TensorFlow version
of the model was contributed by [sayakpaul](https://huggingface.co/sayakpaul). The original code can be found [here](https://github.com/NVlabs/SegFormer).

Tips:

- SegFormer consists of a hierarchical Transformer encoder, and a lightweight all-MLP decode head.
- SegFormer consists of a hierarchical Transformer encoder, and a lightweight all-MLP decoder head.
[`SegformerModel`] is the hierarchical Transformer encoder (which in the paper is also referred to
as Mix Transformer or MiT). [`SegformerForSemanticSegmentation`] adds the all-MLP decode head on
as Mix Transformer or MiT). [`SegformerForSemanticSegmentation`] adds the all-MLP decoder head on
top to perform semantic segmentation of images. In addition, there's
[`SegformerForImageClassification`] which can be used to - you guessed it - classify images. The
authors of SegFormer first pre-trained the Transformer encoder on ImageNet-1k to classify images. Next, they throw
@@ -51,6 +52,9 @@ Tips:
found on the [hub](https://huggingface.co/models?other=segformer).
- The quickest way to get started with SegFormer is by checking the [example notebooks](https://github.com/NielsRogge/Transformers-Tutorials/tree/master/SegFormer) (which showcase both inference and
fine-tuning on custom data). One can also check out the [blog post](https://huggingface.co/blog/fine-tune-segformer) introducing SegFormer and illustrating how it can be fine-tuned on custom data.
- TensorFlow users should refer to [this repository](https://github.com/deep-diver/segformer-tf-transformers) that shows off-the-shelf inference and fine-tuning.
- One can also check out [this interactive demo on Hugging Face Spaces](https://huggingface.co/spaces/chansung/segformer-tf-transformers)
to try out a SegFormer model on custom images.
- SegFormer works on any input size, as it pads the input to be divisible by `config.patch_sizes`.
- One can use [`SegformerFeatureExtractor`] to prepare images and corresponding segmentation maps
for the model. Note that this feature extractor is fairly basic and does not include all data augmentations used in
@@ -65,7 +69,8 @@ Tips:
used by [`SegformerForSemanticSegmentation`]). However, other datasets use the 0 index as
background class and include this class as part of all labels. In that case, `reduce_labels` should be set to
`False`, as loss should also be computed for the background class.
- As most models, SegFormer comes in different sizes, the details of which can be found in the table below.
- As most models, SegFormer comes in different sizes, the details of which can be found in the table below
(taken from Table 7 of the [original paper](https://arxiv.org/abs/2105.15203)).

| **Model variant** | **Depths** | **Hidden sizes** | **Decoder hidden size** | **Params (M)** | **ImageNet-1k Top 1** |
| :---------------: | ------------- | ------------------- | :---------------------: | :------------: | :-------------------: |
@@ -76,6 +81,10 @@ Tips:
| MiT-b4 | [3, 8, 27, 3] | [64, 128, 320, 512] | 768 | 62.6 | 83.6 |
| MiT-b5 | [3, 6, 40, 3] | [64, 128, 320, 512] | 768 | 82.0 | 83.8 |

Note that MiT in the above table refers to the Mix Transformer encoder backbone introduced in SegFormer. For
SegFormer's results on the segmentation datasets like ADE20k, refer to the [paper](https://arxiv.org/abs/2105.15203).
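
As a quick illustration (not part of this PR's diff), the table rows map directly onto `SegformerConfig` arguments. A minimal sketch for a MiT-b2-sized model, assuming the `depths`, `hidden_sizes`, `decoder_hidden_size`, and `num_labels` parameters and leaving everything else at the library defaults:

```python
from transformers import SegformerConfig, SegformerForSemanticSegmentation

# Sketch: a MiT-b2-sized configuration following the table above.
# Other settings (attention heads, sr_ratios, etc.) stay at library defaults
# and may differ from the released b2 checkpoints.
config = SegformerConfig(
    depths=[3, 4, 6, 3],               # encoder layers per stage
    hidden_sizes=[64, 128, 320, 512],  # hidden size per encoder stage
    decoder_hidden_size=768,           # width of the all-MLP decode head
    num_labels=150,                    # e.g. ADE20k has 150 classes
)
model = SegformerForSemanticSegmentation(config)
```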


## SegformerConfig

[[autodoc]] SegformerConfig
@@ -104,3 +113,23 @@ Tips:

[[autodoc]] SegformerForSemanticSegmentation
- forward

## TFSegformerDecodeHead

[[autodoc]] TFSegformerDecodeHead
- call

## TFSegformerModel

[[autodoc]] TFSegformerModel
- call

## TFSegformerForImageClassification

[[autodoc]] TFSegformerForImageClassification
- call

## TFSegformerForSemanticSegmentation

[[autodoc]] TFSegformerForSemanticSegmentation
- call
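
For context on the TF classes documented above, a minimal inference sketch (not taken from this diff); the checkpoint name and the expected logits shape are assumptions carried over from the PyTorch example in `modeling_segformer.py`:

```python
import requests
from PIL import Image

from transformers import SegformerFeatureExtractor, TFSegformerForSemanticSegmentation

# Sketch: TF counterpart of the PyTorch docstring example.
feature_extractor = SegformerFeatureExtractor.from_pretrained("nvidia/segformer-b0-finetuned-ade-512-512")
model = TFSegformerForSemanticSegmentation.from_pretrained("nvidia/segformer-b0-finetuned-ade-512-512")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = feature_extractor(images=image, return_tensors="tf")
outputs = model(**inputs)
logits = outputs.logits  # expected shape: (batch_size, num_labels, height/4, width/4), here (1, 150, 128, 128)
```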
src/transformers/__init__.py (18 changes: 18 additions & 0 deletions)
@@ -2430,6 +2430,16 @@
"TFRoFormerPreTrainedModel",
]
)
_import_structure["models.segformer"].extend(
[
"TF_SEGFORMER_PRETRAINED_MODEL_ARCHIVE_LIST",
"TFSegformerDecodeHead",
"TFSegformerForImageClassification",
"TFSegformerForSemanticSegmentation",
"TFSegformerModel",
"TFSegformerPreTrainedModel",
]
)
_import_structure["models.speech_to_text"].extend(
[
"TF_SPEECH_TO_TEXT_PRETRAINED_MODEL_ARCHIVE_LIST",
@@ -4789,6 +4799,14 @@
            TFRoFormerModel,
            TFRoFormerPreTrainedModel,
        )
        from .models.segformer import (
            TF_SEGFORMER_PRETRAINED_MODEL_ARCHIVE_LIST,
            TFSegformerDecodeHead,
            TFSegformerForImageClassification,
            TFSegformerForSemanticSegmentation,
            TFSegformerModel,
            TFSegformerPreTrainedModel,
        )
        from .models.speech_to_text import (
            TF_SPEECH_TO_TEXT_PRETRAINED_MODEL_ARCHIVE_LIST,
            TFSpeech2TextForConditionalGeneration,
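
Once these entries are in, the new classes should resolve through the package's top-level lazy module. A small sanity-check sketch, purely illustrative and assuming TensorFlow is installed:

```python
# Sketch: the top-level lazy module only materializes these classes on first access.
from transformers import TFSegformerForSemanticSegmentation, TFSegformerModel

print(TFSegformerModel.__module__)  # expected: transformers.models.segformer.modeling_tf_segformer
```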
src/transformers/models/auto/modeling_tf_auto.py (3 changes: 3 additions & 0 deletions)
@@ -68,6 +68,7 @@
("resnet", "TFResNetModel"),
("roberta", "TFRobertaModel"),
("roformer", "TFRoFormerModel"),
("segformer", "TFSegformerModel"),
("speech_to_text", "TFSpeech2TextModel"),
("swin", "TFSwinModel"),
("t5", "TFT5Model"),
@@ -180,6 +181,7 @@
("deit", ("TFDeiTForImageClassification", "TFDeiTForImageClassificationWithTeacher")),
("regnet", "TFRegNetForImageClassification"),
("resnet", "TFResNetForImageClassification"),
("segformer", "TFSegformerForImageClassification"),
("swin", "TFSwinForImageClassification"),
("vit", "TFViTForImageClassification"),
]
@@ -189,6 +191,7 @@
    [
        # Model for Semantic Segmentation mapping
        ("data2vec-vision", "TFData2VecVisionForSemanticSegmentation"),
        ("segformer", "TFSegformerForSemanticSegmentation"),
    ]
)

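With the mappings above in place, the model should also be reachable through the TF auto classes. A rough sketch; the checkpoint name comes from the existing PyTorch docs, and `TFAutoModelForSemanticSegmentation` is assumed to be the auto class backed by this mapping:

```python
from transformers import AutoFeatureExtractor, TFAutoModelForSemanticSegmentation

# Sketch: resolve the "segformer" architecture via the TF auto mappings added above.
feature_extractor = AutoFeatureExtractor.from_pretrained("nvidia/segformer-b0-finetuned-ade-512-512")
model = TFAutoModelForSemanticSegmentation.from_pretrained("nvidia/segformer-b0-finetuned-ade-512-512")
```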
src/transformers/models/segformer/__init__.py (38 changes: 36 additions & 2 deletions)
@@ -17,7 +17,13 @@
# limitations under the License.
from typing import TYPE_CHECKING

from ...utils import OptionalDependencyNotAvailable, _LazyModule, is_torch_available, is_vision_available
from ...utils import (
    OptionalDependencyNotAvailable,
    _LazyModule,
    is_tf_available,
    is_torch_available,
    is_vision_available,
)


_import_structure = {"configuration_segformer": ["SEGFORMER_PRETRAINED_CONFIG_ARCHIVE_MAP", "SegformerConfig"]}
@@ -46,6 +52,21 @@
"SegformerPreTrainedModel",
]

try:
if not is_tf_available():
raise OptionalDependencyNotAvailable()
except OptionalDependencyNotAvailable:
pass
else:
_import_structure["modeling_tf_segformer"] = [
"TF_SEGFORMER_PRETRAINED_MODEL_ARCHIVE_LIST",
"TFSegformerDecodeHead",
"TFSegformerForImageClassification",
"TFSegformerForSemanticSegmentation",
"TFSegformerModel",
"TFSegformerPreTrainedModel",
]


if TYPE_CHECKING:
    from .configuration_segformer import SEGFORMER_PRETRAINED_CONFIG_ARCHIVE_MAP, SegformerConfig
@@ -73,7 +94,20 @@
            SegformerModel,
            SegformerPreTrainedModel,
        )

    try:
        if not is_tf_available():
            raise OptionalDependencyNotAvailable()
    except OptionalDependencyNotAvailable:
        pass
    else:
        from .modeling_tf_segformer import (
            TF_SEGFORMER_PRETRAINED_MODEL_ARCHIVE_LIST,
            TFSegformerDecodeHead,
            TFSegformerForImageClassification,
            TFSegformerForSemanticSegmentation,
            TFSegformerModel,
            TFSegformerPreTrainedModel,
        )

else:
    import sys
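
The guarded imports above follow the library's usual optional-dependency pattern, so the submodule stays importable even when TensorFlow is absent. Downstream code can apply the same idea; a minimal sketch, assuming only the public `is_tf_available` helper and that the `nvidia/mit-b0` checkpoint ships TF weights:

```python
from transformers.utils import is_tf_available

# Sketch: only touch the TF classes when TensorFlow is actually installed,
# mirroring the lazy-import guard used in this __init__.
if is_tf_available():
    from transformers import TFSegformerForImageClassification

    model = TFSegformerForImageClassification.from_pretrained("nvidia/mit-b0")
else:
    print("TensorFlow is not installed; use the PyTorch SegFormer classes instead.")
```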
src/transformers/models/segformer/modeling_segformer.py (4 changes: 3 additions & 1 deletion)
@@ -785,6 +785,8 @@ def forward(
        >>> inputs = feature_extractor(images=image, return_tensors="pt")
        >>> outputs = model(**inputs)
        >>> logits = outputs.logits  # shape (batch_size, num_labels, height/4, width/4)
        >>> list(logits.shape)
        [1, 150, 128, 128]
        ```"""
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict
        output_hidden_states = (
@@ -804,7 +806,7 @@

        loss = None
        if labels is not None:
            if self.config.num_labels == 1:
            if not self.config.num_labels > 1:
                raise ValueError("The number of labels should be greater than one")
            else:
                # upsample logits to the images' original size
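
For reference, the "upsample logits to the images' original size" step that follows the guard above interpolates the (batch_size, num_labels, height/4, width/4) logits back to the label map's resolution before computing cross-entropy with the configured ignore index. A rough PyTorch sketch of that idea, not the exact code in this file:

```python
import torch
from torch import nn


def segmentation_loss(logits, labels, ignore_index=255):
    """Sketch: upsample logits to the label size, then cross-entropy that
    skips the ignore index (e.g. a reduced background class)."""
    upsampled = nn.functional.interpolate(
        logits, size=labels.shape[-2:], mode="bilinear", align_corners=False
    )
    loss_fct = nn.CrossEntropyLoss(ignore_index=ignore_index)
    return loss_fct(upsampled, labels)


# Toy usage: 2 images, 150 classes, 128x128 logits, 512x512 label maps.
logits = torch.randn(2, 150, 128, 128)
labels = torch.randint(0, 150, (2, 512, 512))
print(segmentation_loss(logits, labels).item())
```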