Camouflaged object segmentation presents unique challenges compared to traditional segmentation tasks, primarily due to the high similarity in pattern and color between camouflaged objects and their backgrounds. Effective solutions have significant implications for critical applications such as pest control, defect detection, and lesion segmentation in medical imaging. Prior research has predominantly emphasized supervised or unsupervised pre-training, leaving zero-shot approaches significantly underdeveloped. Existing zero-shot techniques commonly run the Segment Anything Model (SAM) in automatic mode or rely on vision-language models to generate cues for segmentation; however, their performance remains unsatisfactory, likely because camouflaged objects are so similar to their backgrounds. Optical flow, commonly used to detect moving objects, has proven effective even on camouflaged targets. Our method integrates optical flow, a vision-language model, and SAM 2 into a sequential pipeline. Evaluated on the MoCA-Mask dataset, our approach significantly outperforms existing zero-shot methods, raising the weighted F-measure ($F_\beta^w$) from 0.296 to 0.628.
"SV Tr" denotes supervised training, and "SV Te" denotes supervised testing, where one frame from the video was provided to the model along with prompts. "ZS" indicates zero-shot learning, while ZS w/ PK means zero-shot with prior knowledge (since the model already knows it is looking for animals). Our method significantly outperforms all zero-shot and even supervised methods.
| Method | Pub. | Setting | $S_\alpha$ ↑ | $F_\beta^w$ ↑ | MAE ↓ |
| --- | --- | --- | --- | --- | --- |
| SLT-Net | CVPR 22 | SV Tr | 0.656 | 0.357 | 0.021 |
| ZoomNeXt | TPAMI 24 | SV Tr | 0.734 | 0.476 | 0.010 |
| TSP-SAM (M+B) | CVPR 24 | SV Tr | 0.689 | 0.444 | 0.008 |
| Gao et al. | arXiv 25 | SV Tr | 0.709 | 0.451 | 0.008 |
| SAM2 Tracking | arXiv 24 | SV Te* | 0.804 | 0.691 | 0.004 |
| SAM-PM | CVPRW 24 | SV Tr+Te* | 0.728 | 0.567 | 0.009 |
| Finetuned SAM2-T + Prompts | arXiv 24 | SV Tr+Te* | 0.832 | 0.726 | 0.005 |
| CVP | ACM MM 24 | ZS | 0.569 | 0.196 | 0.031 |
| SAM-2-L Auto | arXiv 24 | ZS | 0.447 | 0.198 | 0.250 |
| LLaVA + SAM2-L | arXiv 24 | ZS w/ PK | 0.622 | 0.296 | 0.047 |
| Shikra + SAM2-L | arXiv 24 | ZS w/ PK | 0.495 | 0.132 | 0.107 |
| **Ours** | - | ZS w/ PK | **0.776** | **0.628** | **0.008** |
Warning: Different ways of calculating IoU can produce inconsistent results. Previous work lacked a standardized evaluation protocol, so we do not report IoU for cross-method comparison. For internal comparisons among our own variants, we use the SLT-Net evaluation code, which computes IoU per frame, averages within each video, and then averages over the entire dataset.
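A minimal sketch of that aggregation scheme, assuming boolean NumPy masks (names are illustrative; this is not the actual SLT-Net evaluation code):

```python
import numpy as np

def frame_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """IoU between two boolean masks for a single frame."""
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # assumed convention: both masks empty counts as a perfect match
    return float(np.logical_and(pred, gt).sum() / union)

def dataset_iou(videos: dict[str, list[tuple[np.ndarray, np.ndarray]]]) -> float:
    """Per-frame IoU -> mean within each video -> mean over all videos."""
    video_means = [
        float(np.mean([frame_iou(pred, gt) for pred, gt in frames]))
        for frames in videos.values()
    ]
    return float(np.mean(video_means))
```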
```bash
wget https://zs-vcos.weasoft.com/FMOCA.zip
```

If the server is down, download it from Google Drive: https://drive.google.com/file/d/10D-K2jXZ96BeznXuYcHwom90g6cp_L6Q/view?usp=sharing
Verify file integrity with SHA-512:

```
eda88bd52daf0b44e20d5c1c545c3f3759e5368c6101a594396f4b1acf3034f812ee7aa19b3eca9203232aa0af922a2d252feec79914b125ccb2d52cf94829cf
```
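If you prefer Python to `sha512sum`, a minimal check looks like this:

```python
import hashlib

EXPECTED = (
    "eda88bd52daf0b44e20d5c1c545c3f3759e5368c6101a594396f4b1acf3034f8"
    "12ee7aa19b3eca9203232aa0af922a2d252feec79914b125ccb2d52cf94829cf"
)

h = hashlib.sha512()
with open("FMOCA.zip", "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):  # read in 1 MiB chunks
        h.update(chunk)

assert h.hexdigest() == EXPECTED, "Checksum mismatch: re-download FMOCA.zip"
print("FMOCA.zip OK")
```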
```bash
git clone https://github.com/facebookresearch/sam2.git
mv sam2 .sam2
cd .sam2
pip3 install -e .
```
If installation fails, run:

```bash
echo -e '[build-system]\nrequires = [\n "setuptools>=62.3.0,<75.9",\n "torch>=2.5.1",\n ]\nbuild-backend = "setuptools.build_meta"' > pyproject.toml
```

(See facebookresearch/sam2#611 for more.) Then run:

```bash
pip3 install -e .
```
Download the checkpoints:

```bash
cd checkpoints
bash download_ckpts.sh
```

More details: https://github.com/facebookresearch/sam2
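To confirm the installation, a minimal sanity check run from inside the `.sam2` directory (assuming the `sam2.1_hiera_large` checkpoint fetched by `download_ckpts.sh`; adjust the paths if you downloaded a different model size):

```python
import torch
from sam2.build_sam import build_sam2_video_predictor

ckpt = "checkpoints/sam2.1_hiera_large.pt"   # fetched by download_ckpts.sh
cfg = "configs/sam2.1/sam2.1_hiera_l.yaml"   # config shipped with the sam2 repo

device = "cuda" if torch.cuda.is_available() else "cpu"
predictor = build_sam2_video_predictor(cfg, ckpt, device=device)
print(f"SAM 2 video predictor loaded on {device}")
```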
Run `run.py` with the following runtime arguments (see the example invocation after this list):

- `--video_name`: name of the input video (required)
- `--log_path`: log file output path (default: `output.log`)
- `--use_motion_detection`: enable motion detection support
- `--output_dir`: output directory for the processed video (default: `output`)
- `--positive_prompt`: prompt to guide object detection (default: `"an animal or insect being highlighted in blue"`)
- `--threshold`: object detection confidence threshold (default: `0.12`)
- `--use_bgs`: enable background subtraction
- `--no_back_tracking`: enable forward-only tracking
- `--momentum`: optical flow momentum (default: `0`)
- `--no_mean_sub`: disable mean subtraction in optical flow
- `--no_negative_prompt`: disable negative prompts in the VLM
- `--box_only`: use only box prompts for SAM 2
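For example, a typical invocation might look like the following (the video name here is illustrative; use a sequence name from the extracted dataset):

```bash
python3 run.py --video_name arctic_fox \
    --use_motion_detection \
    --threshold 0.12 \
    --output_dir output
```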
Open `eval/main_MoCa.m`, update the file paths to match your local setup, and run the script in MATLAB.
For questions, contact: [email protected]