Binary file removed assets/56_fine_tune_segformer/output.png
Binary file not shown.
Binary file removed assets/56_fine_tune_segformer/pizza-scene.png
Binary file not shown.
Binary file removed assets/56_fine_tune_segformer/segformer.png
Binary file not shown.
Binary file removed assets/56_fine_tune_segformer/sidewalk-examples.png
Binary file not shown.
Binary file removed assets/56_fine_tune_segformer/sidewalk-labeling-crop-poster.png
Binary file not shown.
Binary file removed assets/56_fine_tune_segformer/sidewalk-labeling-crop.mp4
Binary file not shown.
Binary file removed assets/56_fine_tune_segformer/widget-poster.png
Binary file not shown.
Binary file removed assets/56_fine_tune_segformer/widget.mp4
Binary file not shown.
22 changes: 11 additions & 11 deletions fine-tune-segformer.md
@@ -14,7 +14,7 @@ authors:

<script async defer src="https://unpkg.com/medium-zoom-element@0/dist/medium-zoom-element.min.js"></script>

<a target="_blank" href="https://colab.research.google.com/drive/1MdkavsjGHYcuGyjmsf9wmeAK3WvtYLty?usp=sharing">
<a target="_blank" href="https://colab.research.google.com/github/huggingface/blog/blob/main/notebooks/56_fine_tune_segformer.ipynb">
<img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

@@ -27,7 +27,7 @@ Because semantic segmentation is a type of classification, the network architect
[SegFormer](https://huggingface.co/docs/transformers/model_doc/segformer) is a model for semantic segmentation introduced by Xie et al. in 2021. It has a hierarchical Transformer encoder that doesn't use positional encodings (in contrast to ViT) and a simple multi-layer perceptron decoder. SegFormer achieves state-of-the-art performance on multiple common datasets. Let's see how our pizza delivery robot performs for sidewalk images.
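
If you'd like to get a feel for SegFormer before fine-tuning anything, the minimal sketch below runs an off-the-shelf checkpoint on a single image. The checkpoint name and image path are assumptions for illustration; any SegFormer checkpoint on the Hub works the same way.

```python
from PIL import Image
from transformers import SegformerFeatureExtractor, SegformerForSemanticSegmentation

# Assumed checkpoint and local image path, purely for illustration
checkpoint = "nvidia/segformer-b0-finetuned-ade-512-512"
feature_extractor = SegformerFeatureExtractor.from_pretrained(checkpoint)
model = SegformerForSemanticSegmentation.from_pretrained(checkpoint)

image = Image.open("sidewalk.jpg")
inputs = feature_extractor(images=image, return_tensors="pt")
outputs = model(**inputs)

# The logits have shape (batch_size, num_labels, height / 4, width / 4)
print(outputs.logits.shape)
```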

<figure class="image table text-center m-0 w-full">
<medium-zoom background="rgba(0,0,0,.7)" alt="Pizza delivery robot segmenting a scene" src="assets/56_fine_tune_segformer/pizza-scene.png"></medium-zoom>
<medium-zoom background="rgba(0,0,0,.7)" alt="Pizza delivery robot segmenting a scene" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/56_fine_tune_segformer/pizza-scene.png"></medium-zoom>
</figure>

Let's get started by installing the necessary dependencies. Because we're going to push our dataset and model to the Hugging Face Hub, we need to install [Git LFS](https://git-lfs.github.com/) and log in to Hugging Face.
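
As a rough sketch of that setup, a notebook cell along these lines installs the Python dependencies and logs in to the Hub. The exact package list is an assumption for illustration, not the canonical install cell.

```python
# In a notebook cell; the package list is an assumption for illustration
# !pip install -q transformers datasets segments-ai
# Git LFS itself is installed at the system level, e.g. via your OS package manager

from huggingface_hub import notebook_login

notebook_login()  # prompts for a Hugging Face access token with write access
```
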
@@ -59,7 +59,7 @@ To create your semantic segmentation dataset, you'll need two things:
We went ahead and captured a thousand images of sidewalks in Belgium. Collecting and labeling such a dataset can take a long time, so you can start with a smaller dataset and expand it if the model does not perform well enough.

<figure class="image table text-center m-0 w-full">
<medium-zoom background="rgba(0,0,0,.7)" alt="Example images from the sidewalk dataset" src="assets/56_fine_tune_segformer/sidewalk-examples.png"></medium-zoom>
<medium-zoom background="rgba(0,0,0,.7)" alt="Example images from the sidewalk dataset" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/56_fine_tune_segformer/sidewalk-examples.png"></medium-zoom>
<figcaption>Some examples of the raw images in the sidewalk dataset.</figcaption>
</figure>

@@ -68,7 +68,7 @@ To obtain segmentation labels, we need to indicate the classes of all the region
### Set up the labeling task on Segments.ai

First, create an account at [https://segments.ai/join](https://segments.ai/join?utm_source=hf&utm_medium=colab&utm_campaign=sem_seg).
Next, create a new dataset and upload your images. You can either do this from the web interface or via the Python SDK (see the [notebook](https://colab.research.google.com/drive/1BImTyBjW3KtvHGVcjGpYYFZdRGXzM3-j?usp=sharing)).
Next, create a new dataset and upload your images. You can either do this from the web interface or via the Python SDK (see the [notebook](https://colab.research.google.com/github/huggingface/blog/blob/main/notebooks/56_fine_tune_segformer.ipynb)).
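
If you go the SDK route, a minimal sketch looks roughly like the following. The API key, dataset name, and file names are placeholders, and the exact keyword arguments can differ between versions of the `segments-ai` package.

```python
from segments import SegmentsClient

client = SegmentsClient("YOUR_SEGMENTS_API_KEY")  # placeholder API key

# Create a dataset configured for bitmap (semantic) segmentation labels
client.add_dataset(
    "sidewalk-imagery",  # placeholder dataset name
    description="Sidewalk scenes for semantic segmentation",
    task_type="segmentation-bitmap",
)

# Upload an image as an asset, then register it as a sample
with open("image_0001.jpg", "rb") as f:  # placeholder file name
    asset = client.upload_asset(f, filename="image_0001.jpg")

client.add_sample(
    "your-username/sidewalk-imagery",  # full dataset identifier
    name="image_0001.jpg",
    attributes={"image": {"url": asset.url}},
)
```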


### Label the images
@@ -81,7 +81,7 @@ Now that the raw data is loaded, go to [segments.ai/home](https://segments.ai/ho
style="max-width: 70%; margin: auto;"
autoplay loop autobuffer muted playsinline
>
<source src="assets/56_fine_tune_segformer/sidewalk-labeling-crop.mp4" poster="assets/56_fine_tune_segformer/sidewalk-labeling-crop-poster.png" type="video/mp4">
<source src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/56_fine_tune_segformer/sidewalk-labeling-crop.mp4" poster="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/56_fine_tune_segformer/sidewalk-labeling-crop-poster.png" type="video/mp4">
</video>
<figcaption>Tip: when using the superpixel tool, scroll to change the superpixel size, and click and drag to select segments.</figcaption>
</figure>
@@ -92,7 +92,7 @@ When you're done labeling, create a new dataset release containing the labeled d

Note that creating the release can take a few seconds. You can check the releases tab on Segments.ai to see whether your release is still being created.

Now, we'll convert the release to a [Hugging Face dataset](https://huggingface.co/docs/datasets/package_reference/main_classes.html#datasets.Dataset) via the Segments.ai Python SDK. If you haven't set up the Segments Python client yet, follow the instructions in the "Set up the labeling task on Segments.ai" section of the [notebook](https://colab.research.google.com/drive/1BImTyBjW3KtvHGVcjGpYYFZdRGXzM3-j#scrollTo=9T2Jr9t9y4HD).
Now, we'll convert the release to a [Hugging Face dataset](https://huggingface.co/docs/datasets/package_reference/main_classes.html#datasets.Dataset) via the Segments.ai Python SDK. If you haven't set up the Segments Python client yet, follow the instructions in the "Set up the labeling task on Segments.ai" section of the [notebook](https://colab.research.google.com/github/huggingface/blog/blob/main/notebooks/56_fine_tune_segformer.ipynb#scrollTo=9T2Jr9t9y4HD).

*Note that the conversion can take a while, depending on the size of your dataset.*
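
As a rough sketch of that conversion (the API key, dataset identifier, and release name below are placeholders), the SDK's Hugging Face integration looks roughly like this:

```python
from segments import SegmentsClient
from segments.huggingface import release2dataset

client = SegmentsClient("YOUR_SEGMENTS_API_KEY")  # placeholder API key
release = client.get_release("your-username/sidewalk-imagery", "v0.1")  # placeholder identifiers

# Convert the Segments.ai release into a Hugging Face Dataset
hf_dataset = release2dataset(release)
print(hf_dataset)
```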

@@ -192,7 +192,7 @@ repo_id = f"datasets/{hf_dataset_identifier}"
filename = "id2label.json"
id2label = json.load(open(hf_hub_download(repo_id=hf_dataset_identifier, filename=filename, repo_type="dataset"), "r"))
id2label = {int(k): v for k, v in id2label.items()}
label2id = {v: k for k, v in id2label.items()
label2id = {v: k for k, v in id2label.items()}

num_labels = len(id2label)
```
@@ -237,7 +237,7 @@ test_ds.set_transform(val_transforms)
The SegFormer authors define 5 models with increasing sizes: B0 to B5. The following chart (taken from the original paper) shows the performance of these different models on the ADE20K dataset, compared to other models.
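
To make the size trade-off concrete, here is a sketch of how a pre-trained encoder of your chosen size is loaded for fine-tuning. The label mappings come from the `id2label.json` step above, and the checkpoint names follow the `nvidia/mit-b*` pattern on the Hub.

```python
from transformers import SegformerForSemanticSegmentation

# "nvidia/mit-b0" is the smallest encoder; "nvidia/mit-b1" through "nvidia/mit-b5"
# trade extra accuracy for more parameters and slower training and inference.
model = SegformerForSemanticSegmentation.from_pretrained(
    "nvidia/mit-b0",
    id2label=id2label,  # mappings built from id2label.json earlier
    label2id=label2id,
)
```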

<figure class="image table text-center m-0 w-full">
<medium-zoom background="rgba(0,0,0,.7)" alt="SegFormer model variants compared with other segmentation models" src="assets/56_fine_tune_segformer/segformer.png"></medium-zoom>
<medium-zoom background="rgba(0,0,0,.7)" alt="SegFormer model variants compared with other segmentation models" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/56_fine_tune_segformer/segformer.png"></medium-zoom>
<figcaption><a href="https://arxiv.org/abs/2105.15203">Source</a></figcaption>
</figure>

@@ -324,7 +324,7 @@ def compute_metrics(eval_pred):
references=labels,
num_labels=len(id2label),
ignore_index=0,
reduce_labels=feature_extractor.reduce_labels,
reduce_labels=feature_extractor.do_reduce_labels,
)

# add per category metrics as individual key-value pairs
@@ -387,7 +387,7 @@ However, you can also try out your model directly on the Hugging Face Hub, thank
style="max-width: 70%; margin: auto;"
autoplay loop autobuffer muted playsinline
>
<source src="assets/56_fine_tune_segformer/widget.mp4" poster="assets/56_fine_tune_segformer/widget-poster.png" type="video/mp4">
<source src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/56_fine_tune_segformer/widget.mp4" poster="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/56_fine_tune_segformer/widget-poster.png" type="video/mp4">
</video>
</figure>
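
You can also query a model on the Hub from Python. Here is a minimal sketch using the `image-segmentation` pipeline; the model id and image path are placeholders, so point them at your own fine-tuned checkpoint and image.

```python
from transformers import pipeline

# Placeholder model id; replace with your own fine-tuned checkpoint on the Hub
segmenter = pipeline(
    "image-segmentation",
    model="your-username/segformer-b0-finetuned-segments-sidewalk",
)

results = segmenter("sidewalk.jpg")  # placeholder image path
# Each entry contains a class label and a PIL mask for that class
for result in results:
    print(result["label"], result["mask"].size)
```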

@@ -438,7 +438,7 @@ pred_seg = upsampled_logits.argmax(dim=1)[0]
Now it's time to display the result. We'll show the prediction next to the ground-truth mask.
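
A simple way to produce that side-by-side view is a small matplotlib sketch. Here, `pred_seg` comes from the cell above and `gt_seg` stands in for the ground-truth segmentation map of the same image (both 2-D arrays of label ids); mapping the label ids to a color palette makes the plot easier to read, but the basic layout is:

```python
import matplotlib.pyplot as plt
import numpy as np

# pred_seg: predicted label map from the previous cell
# gt_seg: assumed ground-truth label map for the same image
fig, axs = plt.subplots(1, 2, figsize=(12, 6))
axs[0].imshow(np.array(pred_seg))
axs[0].set_title("Prediction")
axs[1].imshow(np.array(gt_seg))
axs[1].set_title("Ground truth")
for ax in axs:
    ax.axis("off")
plt.show()
```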

<figure class="image table text-center m-0 w-full">
<medium-zoom background="rgba(1,1,1,1)" alt="SegFormer prediction vs the ground truth" src="assets/56_fine_tune_segformer/output.png"></medium-zoom>
<medium-zoom background="rgba(1,1,1,1)" alt="SegFormer prediction vs the ground truth" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/56_fine_tune_segformer/output.png"></medium-zoom>
</figure>

What do you think? Would you send our pizza delivery robot on the road with this segmentation information?