Interactive Video Object Segmentation (iVOS) has become an essential task for efficiently obtaining object segmentations in videos, often guided by user inputs like scribbles, clicks, or bounding boxes. In this tutorial, you'll learn how to leverage the video tracking feature of SAM2 on X-AnyLabeling to accomplish iVOS tasks.
Let's get started!
Before you begin, make sure you have the following prerequisites installed:
Step 0: Download and install Miniconda from the official website.
Step 1: Create a new Conda environment with Python version 3.10 or higher, and activate it:
conda create -n x-anylabeling-sam2 python=3.10 -y
conda activate x-anylabeling-sam2
You'll need to install SAM2 first. The code requires torch>=2.3.1 and torchvision>=0.18.1. Follow the instructions here to install both PyTorch and TorchVision dependencies.
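If you want to confirm that the active environment satisfies the version requirements above, a small check like the following may help. This is a minimal sketch: the helper names (`parse_version`, `meets_minimum`, `check_requirements`) are illustrative and not part of SAM2 or X-AnyLabeling, and the version parsing is deliberately simple (it only compares the first three numeric components).

```python
import re
from importlib import metadata

# Minimum versions stated in this tutorial.
REQUIRED = {"torch": (2, 3, 1), "torchvision": (0, 18, 1)}


def parse_version(text):
    """Turn a string such as '2.3.1+cu121' into (2, 3, 1)."""
    core = text.split("+")[0]  # drop local build tags like '+cu121'
    parts = []
    for piece in core.split(".")[:3]:
        match = re.match(r"\d+", piece)  # keep only the leading digits
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if the installed version string is at least `minimum`."""
    return parse_version(installed) >= minimum


def check_requirements():
    """Report the status of each required package in this environment."""
    report = {}
    for name, minimum in REQUIRED.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
            continue
        status = "ok" if meets_minimum(installed, minimum) else "too old"
        report[name] = f"{installed} ({status})"
    return report


if __name__ == "__main__":
    for name, status in check_requirements().items():
        print(f"{name}: {status}")
```

Run it inside the activated `x-anylabeling-sam2` environment; a "not installed" or "too old" entry means the PyTorch installation step needs to be repeated.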
Afterward, you can install SAM2 on a GPU-enabled machine using:
git clone https://github.com/CVHub520/segment-anything-2
cd segment-anything-2
pip install -e .
Finally, install the necessary dependencies for X-AnyLabeling (v2.4.2+):
cd ..
git clone https://github.com/CVHub520/X-AnyLabeling
cd X-AnyLabeling
# For Windows or Linux
pip install -r requirements.txt
# For macOS
pip install -r requirements-macos.txt
conda install -c conda-forge pyqt=5.15.9
Step 0: Launch the app:
python3 anylabeling/app.py
Step 1: Load the SAM 2 Video model
Note: If the model fails to load due to network issues, please refer to the following settings.
First, you'll need to download a model checkpoint. For this tutorial, we'll use the sam2_hiera_large.pt checkpoint as an example.
After downloading, place the checkpoint file in the corresponding model folder within your user directory (create the folder if it doesn't exist):
# Windows
C:\Users\${User}\xanylabeling_data\models\sam2_hiera_large_video-r20240901
# Linux or macOS
~/xanylabeling_data/models/sam2_hiera_large_video-r20240901
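The folder can also be created from Python, which avoids typos in the long folder name. The sketch below only builds the path shown above and checks whether the checkpoint file is there. The function names are illustrative, and the assumption that the file keeps its original name `sam2_hiera_large.pt` inside the folder comes from this tutorial's example rather than from the application's loader.

```python
from pathlib import Path

# Folder name taken from the paths listed in this tutorial.
MODEL_FOLDER = "sam2_hiera_large_video-r20240901"


def model_dir(home=None):
    """Build <home>/xanylabeling_data/models/<MODEL_FOLDER>."""
    base = Path(home) if home is not None else Path.home()
    return base / "xanylabeling_data" / "models" / MODEL_FOLDER


def ensure_model_dir(home=None):
    """Create the model folder if it does not exist and return its path."""
    target = model_dir(home)
    target.mkdir(parents=True, exist_ok=True)
    return target


def checkpoint_present(home=None, filename="sam2_hiera_large.pt"):
    """Return True if the checkpoint file is already in the model folder."""
    return (model_dir(home) / filename).is_file()


if __name__ == "__main__":
    folder = ensure_model_dir()
    print(f"Model folder: {folder}")
    print(f"Checkpoint found: {checkpoint_present()}")
```

`Path.home()` resolves to the user directory on Windows, Linux, and macOS, so the same script covers both path variants listed above.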
Additionally, if you want to use other sizes of SAM2 models or modify the model loading path, refer to this documentation for custom settings: Simplified Chinese | English.
Step 2: Add a video file (Ctrl + O) or a folder of split video frames (Ctrl + U).
Note
Currently, only [*.jpg, *.jpeg, *.JPG, *.JPEG] files are supported. Video files are automatically converted to jpg format on loading by default.
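Before loading a folder of split frames, it can be useful to see which files match the supported extensions. The following is a minimal sketch: `split_frames` is a hypothetical helper, and it assumes that files with other extensions are simply not usable, which is inferred from the note above rather than from the application's code.

```python
from pathlib import Path

# Extensions listed as supported in the note above (case-sensitive match).
SUPPORTED_SUFFIXES = {".jpg", ".jpeg", ".JPG", ".JPEG"}


def split_frames(folder):
    """Return (supported, skipped) file names in `folder`, sorted by name."""
    supported, skipped = [], []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue  # ignore sub-folders
        if path.suffix in SUPPORTED_SUFFIXES:
            supported.append(path.name)
        else:
            skipped.append(path.name)
    return supported, skipped


if __name__ == "__main__":
    ok, other = split_frames(".")
    print(f"{len(ok)} supported frame(s), {len(other)} other file(s)")
```

Any file reported in the second list would need to be converted to jpg before it can serve as a frame.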
Step 0: Add Prompts
Tip
- Point (q): Add a positive point.
- Point (e): Add a negative point.
- +Rect: Draw a rectangle around the object.
- Clear (b): Erase all added marks.
- Finish Object (f): Confirm the object.
For the initial frame, you can add prompts such as positive points, negative points, and rectangles (Marks) to guide the tracking of the desired object. Follow these steps:
- If the segmentation result meets your expectations, click the Finish Object (f) button at the top of the screen or press the f key to confirm the object. If not, click the Clear (b) button or press the b key to quickly clear any invalid marks.
- Then, you can sequentially assign custom labels and track IDs to each added target.
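To make the prompt types concrete, the marks for one object can be pictured as a small data structure. This is a conceptual sketch only: `PromptSet` is a hypothetical container and not X-AnyLabeling's internal representation. The label values follow the convention used by SAM-style point prompts, where 1 marks a positive (foreground) point and 0 a negative (background) point.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class PromptSet:
    """Marks collected for one object on one frame (hypothetical container)."""

    points: List[Tuple[float, float]] = field(default_factory=list)
    labels: List[int] = field(default_factory=list)  # 1 = positive, 0 = negative
    rect: Optional[Tuple[float, float, float, float]] = None  # x1, y1, x2, y2

    def add_point(self, x, y, positive=True):
        """Mirror of Point (q) / Point (e): add a positive or negative point."""
        self.points.append((x, y))
        self.labels.append(1 if positive else 0)

    def set_rect(self, x1, y1, x2, y2):
        """Mirror of +Rect: store the box with (x1, y1) as the top-left corner."""
        self.rect = (min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2))

    def clear(self):
        """Mirror of Clear (b): drop every mark added so far."""
        self.points.clear()
        self.labels.clear()
        self.rect = None

    def is_empty(self):
        return not self.points and self.rect is None


if __name__ == "__main__":
    prompts = PromptSet()
    prompts.add_point(120, 80)                  # positive point on the object
    prompts.add_point(40, 30, positive=False)   # negative point on background
    prompts.set_rect(200, 150, 50, 20)          # corners given in any order
    print(prompts)
```

Confirming an object with Finish Object (f) corresponds to handing such a set of marks to the tracker and starting a fresh, empty set for the next object.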
Warning
If you need to delete a confirmed object, follow these steps:
a. Open the edit mode (Ctrl + J) and remove all added objects from the current frame;
b. Click the Reset Tracker button at the top of the screen to reset the tracker;
c. Reapply the prompts (Marks) as described above.
Alternatively, if you only want to set up object detection tracking, you simply need to filter the output mode to Rectangle.
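For intuition about how a rectangle relates to a segmentation result, one common approach is to take the tightest axis-aligned box around the mask. The sketch below illustrates that idea on a plain nested list. Whether X-AnyLabeling derives its Rectangle output this way is an assumption, and `mask_to_rect` is a hypothetical helper.

```python
def mask_to_rect(mask):
    """Return (x1, y1, x2, y2) enclosing all truthy cells, or None if empty.

    `mask` is a list of rows; each row is a list of 0/1 (or False/True).
    Coordinates are inclusive pixel indices.
    """
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        return None  # nothing segmented on this frame
    cols = [x for row in mask for x, value in enumerate(row) if value]
    return (min(cols), rows[0], max(cols), rows[-1])


if __name__ == "__main__":
    mask = [
        [0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 0, 1, 0],
        [0, 0, 0, 0],
    ]
    print(mask_to_rect(mask))
```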
Step 1: Propagate the prompts to get the tracklet across the video
Once you've finished setting the prompts, you can start the video tracking by either clicking the video start button on the left-hand menu or using the shortcut Ctrl+M to get the tracklet throughout the entire video.
Step 2: Add New Prompts to Further Refine the tracklet
After tracking the entire video, if you notice any of the following issues in the middle frames:
- Target is lost
- Imperfections in boundary details
- New objects need to be tracked
You can treat the current frame as the starting frame and follow these steps:
a. Open the edit mode (Ctrl + J) and remove all added objects from the current frame.
b. Click the Reset Tracker button at the top of the screen to reset the tracker.
c. Reapply the prompts (Marks) as described earlier.
Then, repeat the steps in Step 0 and Step 1.
After completing all tasks, you can:
- Use the Tool -> Label Manager option from the top menu to assign specific class names.
- Press Alt+G to open the GroupIDs manager and modify the track IDs if needed.
Note
Remember to click the Reset Tracker button at the top of the screen after loading a new video file to reset the tracker.
Congratulations! 🎉 You’ve now mastered the basics of X-AnyLabeling. Feel free to experiment with it on your own videos and various use cases!