Camouflaged object segmentation presents unique challenges compared to traditional segmentation tasks, primarily due to the high similarity in pattern and color between camouflaged objects and their backgrounds. Effective solutions have significant implications for critical applications such as pest control, defect detection, and lesion segmentation in medical imaging. Prior research has predominantly emphasized supervised or unsupervised pre-training, leaving zero-shot approaches significantly underdeveloped. Existing zero-shot techniques commonly run the Segment Anything Model (SAM) in automatic mode or rely on vision-language models to generate cues for segmentation; however, their performance remains unsatisfactory, likely because camouflaged objects are so similar to their backgrounds. Optical flow, commonly used to detect moving objects, has proven effective even on camouflaged targets. Our method integrates optical flow, a vision-language model, and SAM 2 into a sequential pipeline. Evaluated on the MoCA-Mask dataset, our approach significantly outperforms existing zero-shot methods, raising the weighted F-measure ($F_\beta^w$) from 0.296 to 0.628.
"SV Tr" denotes supervised training, and "SV Te" denotes supervised testing, where one frame from the video was provided to the model along with prompts. "ZS" indicates zero-shot learning, while ZS w/ PK means zero-shot with prior knowledge (since the model already knows it is looking for animals). Our method significantly outperforms all zero-shot and even supervised methods.
| Method | Pub. | Setting | $S_\alpha$ ↑ | $F_\beta^w$ ↑ | MAE ↓ |
| --- | --- | --- | --- | --- | --- |
| SLT-Net | CVPR 22 | SV Tr | 0.656 | 0.357 | 0.021 |
| ZoomNeXt | TPAMI 24 | SV Tr | 0.734 | 0.476 | 0.010 |
| TSP-SAM (M+B) | CVPR 24 | SV Tr | 0.689 | 0.444 | 0.008 |
| Gao et al. | arXiv 25 | SV Tr | 0.709 | 0.451 | 0.008 |
| SAM2 Tracking | arXiv 24 | SV Te* | 0.804 | 0.691 | 0.004 |
| SAM-PM | CVPRW 24 | SV Tr+Te* | 0.728 | 0.567 | 0.009 |
| Finetuned SAM2-T + Prompts | arXiv 24 | SV Tr+Te* | 0.832 | 0.726 | 0.005 |
| CVP | ACM MM 24 | ZS | 0.569 | 0.196 | 0.031 |
| SAM-2-L Auto | arXiv 24 | ZS | 0.447 | 0.198 | 0.250 |
| LLaVA + SAM2-L | arXiv 24 | ZS w/ PK | 0.622 | 0.296 | 0.047 |
| Shikra + SAM2-L | arXiv 24 | ZS w/ PK | 0.495 | 0.132 | 0.107 |
| **Ours** | - | ZS w/ PK | **0.776** | **0.628** | **0.008** |
Warning: Different ways of calculating IoU can produce inconsistent results. Previous work lacked a standardized evaluation protocol, so we do not report IoU for cross-method comparison. For internal comparisons among our own variants, we use the SLT-Net evaluation code, which computes IoU per frame, averages within each video, and then averages over the entire dataset.
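A minimal sketch of that aggregation scheme, assuming boolean NumPy masks (names are illustrative; this is not the actual SLT-Net evaluation code):

```python
import numpy as np

def frame_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """IoU between two boolean masks for a single frame."""
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # assumed convention: both masks empty counts as a perfect match
    return float(np.logical_and(pred, gt).sum() / union)

def dataset_iou(videos: dict[str, list[tuple[np.ndarray, np.ndarray]]]) -> float:
    """Per-frame IoU -> mean within each video -> mean over all videos."""
    video_means = [
        float(np.mean([frame_iou(pred, gt) for pred, gt in frames]))
        for frames in videos.values()
    ]
    return float(np.mean(video_means))
```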
```bash
wget https://zs-vcos.weasoft.com/FMOCA.zip
```

If the server is down, download it from Google Drive: https://drive.google.com/file/d/10D-K2jXZ96BeznXuYcHwom90g6cp_L6Q/view?usp=sharing
Verify file integrity with SHA-512:

```
eda88bd52daf0b44e20d5c1c545c3f3759e5368c6101a594396f4b1acf3034f812ee7aa19b3eca9203232aa0af922a2d252feec79914b125ccb2d52cf94829cf
```
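If you prefer Python to `sha512sum`, a minimal check looks like this:

```python
import hashlib

EXPECTED = (
    "eda88bd52daf0b44e20d5c1c545c3f3759e5368c6101a594396f4b1acf3034f8"
    "12ee7aa19b3eca9203232aa0af922a2d252feec79914b125ccb2d52cf94829cf"
)

h = hashlib.sha512()
with open("FMOCA.zip", "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):  # read in 1 MiB chunks
        h.update(chunk)

assert h.hexdigest() == EXPECTED, "Checksum mismatch: re-download FMOCA.zip"
print("FMOCA.zip OK")
```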
```bash
git clone https://github.com/facebookresearch/sam2.git
mv sam2 .sam2
cd .sam2
pip3 install -e .
```
If installation fails, run:

```bash
echo -e '[build-system]\nrequires = [\n "setuptools>=62.3.0,<75.9",\n "torch>=2.5.1",\n ]\nbuild-backend = "setuptools.build_meta"' > pyproject.toml
```

(See facebookresearch/sam2#611 for more.) Then run:

```bash
pip3 install -e .
```
Download the checkpoints:

```bash
cd checkpoints
bash download_ckpts.sh
```

More details: https://github.com/facebookresearch/sam2
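To confirm the installation, a minimal sanity check run from inside the `.sam2` directory (assuming the `sam2.1_hiera_large` checkpoint fetched by `download_ckpts.sh`; adjust the paths if you downloaded a different model size):

```python
import torch
from sam2.build_sam import build_sam2_video_predictor

ckpt = "checkpoints/sam2.1_hiera_large.pt"   # fetched by download_ckpts.sh
cfg = "configs/sam2.1/sam2.1_hiera_l.yaml"   # config shipped with the sam2 repo

device = "cuda" if torch.cuda.is_available() else "cpu"
predictor = build_sam2_video_predictor(cfg, ckpt, device=device)
print(f"SAM 2 video predictor loaded on {device}")
```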
Run `run.py` with the following runtime arguments (see the example invocation after this list):

- `--video_name`: name of the input video (required)
- `--log_path`: log file output path (default: `output.log`)
- `--use_motion_detection`: enable motion detection support
- `--output_dir`: output directory for the processed video (default: `output`)
- `--positive_prompt`: prompt to guide object detection (default: `"an animal or insect being highlighted in blue"`)
- `--threshold`: object detection confidence threshold (default: `0.12`)
- `--use_bgs`: enable background subtraction
- `--no_back_tracking`: enable forward-only tracking
- `--momentum`: optical flow momentum (default: `0`)
- `--no_mean_sub`: disable mean subtraction in optical flow
- `--no_negative_prompt`: disable negative prompts in the VLM
- `--box_only`: use only box prompts for SAM 2
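For example, a typical invocation might look like the following (the video name here is illustrative; use a sequence name from the extracted dataset):

```bash
python3 run.py --video_name arctic_fox \
    --use_motion_detection \
    --threshold 0.12 \
    --output_dir output
```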
Open `eval/main_MoCa.m`, update the file paths to match your local setup, and run the script in MATLAB.
For questions, contact: [email protected]