
weathon/vcos


ZS-VCOS: Zero-Shot Outperforms Supervised Video Camouflaged Object Segmentation


Preprint

Camouflaged object segmentation presents unique challenges compared with traditional segmentation tasks, primarily because camouflaged objects closely match their backgrounds in pattern and color. Effective solutions have significant implications in areas such as pest control, defect detection, and lesion segmentation in medical imaging. Prior research has predominantly emphasized supervised or unsupervised pre-training methods, leaving zero-shot approaches underdeveloped. Existing zero-shot techniques commonly run the Segment Anything Model (SAM) in automatic mode or rely on vision-language models to generate segmentation cues; however, their performance remains unsatisfactory, likely because the camouflaged object and its background are so similar. Optical flow, commonly used to detect moving objects, has proven effective even for camouflaged entities. Our method integrates optical flow, a vision-language model, and SAM 2 into a sequential pipeline. On the MoCA-Mask dataset, our approach significantly outperforms existing zero-shot methods, raising the weighted F-measure ($F_\beta^w$) from 0.296 to 0.628. It also surpasses methods that rely on supervised training, raising $F_\beta^w$ from 0.476 to 0.628. On the MoCA-Filter dataset, it increases the success rate from 0.628 to 0.697 compared with FlowSAM, a supervised transfer method. An ablation study validates the contribution of each component. More details are available at https://github.com/weathon/vcos.

Leaderboard

[Figure: leaderboard]

Method Overview

[Figure: method flowchart]
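
The pipeline starts from optical flow. As a rough, hypothetical illustration (not the repository's implementation), the sketch below turns a dense flow field into a binary motion mask. It assumes that "mean subtraction" means removing the per-frame average flow vector to cancel global camera motion, and that "momentum" means an exponential moving average of motion magnitude across frames; both readings of the run.py `--no_mean_sub` and `--momentum` options are assumptions, and `motion_mask` and its parameters are illustrative names.

```python
import math
from typing import List, Optional, Tuple

Flow = List[List[Tuple[float, float]]]  # flow[y][x] = (dx, dy), hypothetical layout
Grid = List[List[float]]


def motion_mask(
    flow: Flow,
    prev_mag: Optional[Grid] = None,
    momentum: float = 0.0,
    mean_sub: bool = True,
    threshold: float = 1.0,
) -> Tuple[List[List[bool]], Grid]:
    """Hypothetical sketch: mark pixels whose flow magnitude exceeds a threshold."""
    h, w = len(flow), len(flow[0])
    # Assumed mean subtraction: remove the average flow vector (global camera motion).
    mx = my = 0.0
    if mean_sub:
        mx = sum(dx for row in flow for dx, _ in row) / (h * w)
        my = sum(dy for row in flow for _, dy in row) / (h * w)
    mag = [[math.hypot(dx - mx, dy - my) for dx, dy in row] for row in flow]
    # Assumed momentum: blend with the previous frame's smoothed magnitude map.
    if prev_mag is not None and momentum > 0:
        mag = [
            [momentum * p + (1 - momentum) * m for p, m in zip(prow, mrow)]
            for prow, mrow in zip(prev_mag, mag)
        ]
    mask = [[m > threshold for m in row] for row in mag]
    # Return the (possibly smoothed) magnitudes so they can feed the next frame.
    return mask, mag


# Camera pans right by 2 px everywhere; one pixel moves faster than the pan.
pan = [[(2.0, 0.0)] * 3 for _ in range(3)]
pan[1][1] = (5.0, 0.0)
mask, mag = motion_mask(pan)
print(mask[1][1], mask[0][0])  # True False
```

Without mean subtraction every pixel in this toy example exceeds the threshold, because the camera pan alone produces a magnitude of 2; subtracting the mean flow leaves only the independently moving pixel.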

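Performance comparison on the MoCA-Mask dataset

"SV Tr" denotes supervised training, and "SV Te" denotes supervised testing, where one frame from the video was provided to the model along with prompts; "SV Tr+Te" combines both. "ZS" denotes zero-shot, and "ZS w/ PK" denotes zero-shot with prior knowledge (the model is told it is looking for animals). $S_{\alpha}$ is the structure measure and $F_{\beta}^{w}$ the weighted F-measure (higher is better); MAE is the mean absolute error (lower is better). Our method outperforms all zero-shot methods and all methods that use supervised training alone (SV Tr) on $S_{\alpha}$ and $F_{\beta}^{w}$.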

Method                      Pub.       Setting    $S_{\alpha}$  $F_{\beta}^{w}$  MAE
--------------------------  ---------  ---------  ------------  ---------------  -----
SLT-Net                     CVPR 22    SV Tr      0.656         0.357            0.021
ZoomNeXt                    TPAMI 24   SV Tr      0.734         0.476            0.010
TSP-SAM(M+B)                CVPR 24    SV Tr      0.689         0.444            0.008
Gao et al.                  arXiv 25   SV Tr      0.709         0.451            0.008
--------------------------  ---------  ---------  ------------  ---------------  -----
SAM2 Tracking               arXiv 24   SV Te*     0.804         0.691            0.004
--------------------------  ---------  ---------  ------------  ---------------  -----
SAM-PM                      CVPRW 24   SV Tr+Te*  0.728         0.567            0.009
Finetuned SAM2-T + Prompts  arXiv 24   SV Tr+Te*  0.832         0.726            0.005
--------------------------  ---------  ---------  ------------  ---------------  -----
CVP                         ACM MM 24  ZS         0.569         0.196            0.031
SAM-2-L Auto                arXiv 24   ZS         0.447         0.198            0.250
LLaVA + SAM2-L              arXiv 24   ZS w/ PK   0.622         0.296            0.047
Shikra + SAM2-L             arXiv 24   ZS w/ PK   0.495         0.132            0.107
Ours                        -          ZS w/ PK   0.776         0.628            0.008
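
Of the three metrics, MAE is the simplest: the mean absolute difference between the predicted map and the ground-truth mask over all pixels, with both scaled to [0, 1]. A minimal single-frame sketch follows; `mae` is an illustrative name, and the reported numbers come from the MATLAB code in eval/, whose aggregation across frames and videos may differ from anything implied here.

```python
from typing import Sequence


def mae(pred: Sequence[Sequence[float]], gt: Sequence[Sequence[float]]) -> float:
    """Mean absolute error between a predicted map and a ground-truth mask.

    Both inputs are assumed to be same-sized 2-D grids with values in [0, 1].
    """
    if len(pred) != len(gt) or any(len(p) != len(g) for p, g in zip(pred, gt)):
        raise ValueError("prediction and ground truth must have the same shape")
    total = sum(abs(p - g) for prow, grow in zip(pred, gt) for p, g in zip(prow, grow))
    count = sum(len(row) for row in gt)
    return total / count


pred = [[0.0, 1.0], [0.5, 0.0]]
gt = [[0.0, 1.0], [1.0, 0.0]]
print(mae(pred, gt))  # 0.125
```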

Warning: Different methods of calculating IoU can produce inconsistent results. Previous work lacked a standardized evaluation approach, so we did not report IoU for cross-method comparison. For internal comparisons among our methods, we used the SLT-Net evaluation code, which computes IoU per frame, averages across each video, and then averages over the entire dataset.
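
The averaging order described above (IoU per frame, mean per video, then mean over videos) can be sketched as follows. This is a minimal illustration, not the SLT-Net evaluation code: the function names are hypothetical, masks are assumed to be nested lists of 0/1 values, and a frame where both masks are empty is assumed to score 1.0.

```python
from statistics import mean
from typing import Dict, List, Tuple

Mask = List[List[int]]


def frame_iou(pred: Mask, gt: Mask) -> float:
    """IoU of two binary masks; assumes an empty union counts as a perfect match."""
    inter = union = 0
    for prow, grow in zip(pred, gt):
        for p, g in zip(prow, grow):
            inter += int(bool(p) and bool(g))
            union += int(bool(p) or bool(g))
    return 1.0 if union == 0 else inter / union


def dataset_miou(videos: Dict[str, List[Tuple[Mask, Mask]]]) -> float:
    """Average IoU within each video first, then average the per-video means."""
    per_video = [mean(frame_iou(p, g) for p, g in frames) for frames in videos.values()]
    return mean(per_video)


videos = {
    "a": [([[1, 1]], [[1, 1]]), ([[1, 1]], [[1, 0]])],  # frame IoUs 1.0 and 0.5
    "b": [([[1, 0]], [[0, 1]])],                        # frame IoU 0.0
}
print(dataset_miou(videos))  # 0.375
```

Pooling all three frames instead would give (1.0 + 0.5 + 0.0) / 3 = 0.5, which illustrates why different averaging orders make IoU values hard to compare across papers.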

Setup Instructions

Step 1: Download MoCA-Mask with Precomputed Optical Flow

wget https://zs-vcos.weasoft.com/FMOCA.zip

If the server is down, download the archive from Google Drive: https://drive.google.com/file/d/10D-K2jXZ96BeznXuYcHwom90g6cp_L6Q/view?usp=sharing

Verify file integrity with SHA-512:

eda88bd52daf0b44e20d5c1c545c3f3759e5368c6101a594396f4b1acf3034f812ee7aa19b3eca9203232aa0af922a2d252feec79914b125ccb2d52cf94829cf
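
On Linux, `sha512sum FMOCA.zip` prints the digest for a manual comparison. A portable alternative is the Python sketch below, which hashes the archive in chunks with the standard-library `hashlib` so large files are never fully loaded into memory. The helper names `sha512_of` and `verify` are illustrative, and the script assumes FMOCA.zip sits in the current directory.

```python
import hashlib
from pathlib import Path

EXPECTED_SHA512 = "eda88bd52daf0b44e20d5c1c545c3f3759e5368c6101a594396f4b1acf3034f812ee7aa19b3eca9203232aa0af922a2d252feec79914b125ccb2d52cf94829cf"


def sha512_of(path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-512 hex digest of a file, reading it in 1 MiB chunks."""
    h = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path, expected: str = EXPECTED_SHA512) -> bool:
    """Return True if the file's digest matches the expected value."""
    return sha512_of(path) == expected.strip().lower()


if __name__ == "__main__":
    archive = Path("FMOCA.zip")
    if archive.exists():
        print("OK" if verify(archive) else "MISMATCH: re-download the archive")
    else:
        print(f"{archive} not found")
```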

Step 2: Download and Install SAM-2

git clone https://github.com/facebookresearch/sam2.git
mv sam2 .sam2
cd .sam2
pip3 install -e .

If installation fails, run the following inside .sam2 (it overwrites pyproject.toml with pinned build requirements):

echo -e '[build-system]\nrequires = [\n    "setuptools>=62.3.0,<75.9",\n    "torch>=2.5.1",\n    ]\nbuild-backend = "setuptools.build_meta"' > pyproject.toml

(See facebookresearch/sam2#611 for details.) Then run:

pip3 install -e .

Download the checkpoints:

cd checkpoints
bash download_ckpts.sh

More details: https://github.com/facebookresearch/sam2

Step 3: Configure and Run

run.py accepts the following runtime arguments:

  • --video_name: name of the input video (required)
  • --log_path: log file output path (default: output.log)
  • --use_motion_detection: enable motion detection support
  • --output_dir: output directory for processed video (default: output)
  • --positive_prompt: prompt to guide object detection (default: "an animal or insect being highlighted in blue")
  • --threshold: object detection confidence threshold (default: 0.12)
  • --use_bgs: enable background subtraction
  • --no_back_tracking: disable backward tracking (track forward only)
  • --momentum: set optical flow momentum (default: 0)
  • --no_mean_sub: disable mean subtraction in optical flow
  • --no_negative_prompt: disable negative prompts in VLM
  • --box_only: use only box prompts for SAM2
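
The options above map naturally onto Python's `argparse`. The parser below is a hypothetical reconstruction from this list, not run.py's actual code: flag names and defaults come from the list, while the types (`float` for `--threshold` and `--momentum`) and the use of `store_true` for on/off switches are assumptions. The video name `arctic_fox` is a placeholder.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented run.py options."""
    p = argparse.ArgumentParser(description="ZS-VCOS runner (sketch)")
    p.add_argument("--video_name", required=True, help="name of the input video")
    p.add_argument("--log_path", default="output.log", help="log file output path")
    p.add_argument("--use_motion_detection", action="store_true")
    p.add_argument("--output_dir", default="output", help="output directory")
    p.add_argument(
        "--positive_prompt",
        default="an animal or insect being highlighted in blue",
    )
    p.add_argument("--threshold", type=float, default=0.12)
    p.add_argument("--use_bgs", action="store_true", help="background subtraction")
    p.add_argument("--no_back_tracking", action="store_true", help="forward-only tracking")
    p.add_argument("--momentum", type=float, default=0.0, help="optical flow momentum")
    p.add_argument("--no_mean_sub", action="store_true")
    p.add_argument("--no_negative_prompt", action="store_true")
    p.add_argument("--box_only", action="store_true", help="box prompts only for SAM2")
    return p


# Equivalent to: python3 run.py --video_name arctic_fox --use_motion_detection --threshold 0.2
args = build_parser().parse_args(
    ["--video_name", "arctic_fox", "--use_motion_detection", "--threshold", "0.2"]
)
print(args.threshold, args.use_bgs)  # 0.2 False
```

Using `store_true` keeps every switch off by default, so the "--no_*" flags read as opt-in ablations of the full pipeline.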

Step 4: Evaluation

Open eval/main_MoCa.m, update the file paths to match your local setup, and run the script using MATLAB.

For questions, contact: [email protected]

Testing Visualizations

  • Arctic Fox – mIoU: 0.842
  • Arctic Fox 3 – mIoU: 0.787
  • Black Cat 1 – mIoU: 0.479
  • Copperhead Snake – mIoU: 0.575
  • Flower Crab Spider 0 – mIoU: 0.761
  • Flower Crab Spider 1 – mIoU: 0.783
  • Flower Crab Spider 2 – mIoU: 0.758
  • Hedgehog 3 – mIoU: 0.502
  • Ibex – mIoU: 0.615
  • Mongoose – mIoU: 0.388
  • Moth – mIoU: 0.774
  • Pygmy Seahorse 0 – mIoU: 0.000
  • Rusty Spotted Cat 0 – mIoU: 0.217
  • Sand Cat 0 – mIoU: 0.613
  • Snow Leopard 10 – mIoU: 0.468
  • Stick Insect 1 – mIoU: 0.246
