ZISVFM: Zero-Shot Object Instance Segmentation in Indoor Robotic Environments with Vision Foundation Models
Service robots operating in unstructured environments must effectively recognize and segment unknown objects to enhance their functionality. Traditional supervised learning-based segmentation techniques require extensive annotated datasets, which are impractical given the diversity of objects encountered in real-world scenarios. Unseen Object Instance Segmentation (UOIS) methods aim to address this by training models on synthetic data to generalize to novel objects, but they often suffer from the simulation-to-reality gap. This paper proposes a novel approach (ZISVFM) for solving UOIS by leveraging the powerful zero-shot capability of the Segment Anything Model (SAM) and explicit visual representations from a self-supervised vision transformer (ViT). The proposed framework operates in three stages: (1) generating object-agnostic mask proposals from colorized depth images using SAM, (2) refining these proposals using attention-based features from the self-supervised ViT to filter non-object masks, and (3) applying K-Medoids clustering to generate point prompts that guide SAM towards precise object segmentation. Experimental validation on two benchmark datasets and a self-collected dataset demonstrates ZISVFM's superior performance in complex environments, including hierarchical settings such as cabinets, drawers, and handheld objects.
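As a rough illustration of the colorized-depth input used in stage (1), the snippet below normalizes a raw depth map and applies an OpenCV colormap. The exact normalization and colormap used in ZISVFM may differ; this is only an illustrative choice.

```python
import cv2
import numpy as np

def colorize_depth(depth, min_depth=None, max_depth=None):
    """Map a single-channel depth image to a 3-channel colorized image
    suitable as input to SAM.  The colormap and normalization here are
    illustrative choices, not necessarily the ones used in the paper."""
    depth = depth.astype(np.float32)
    valid = depth > 0                                  # zeros are missing readings
    lo = min_depth if min_depth is not None else depth[valid].min()
    hi = max_depth if max_depth is not None else depth[valid].max()
    norm = np.clip((depth - lo) / max(hi - lo, 1e-6), 0.0, 1.0)
    norm[~valid] = 0.0
    return cv2.applyColorMap((norm * 255).astype(np.uint8), cv2.COLORMAP_JET)
```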
Overview of the proposed methodology. This approach employs two vision foundation models: SAM for segmentation and a ViT trained with DINO for feature description of the scene. The process consists of three main stages: 1) generating object-agnostic mask proposals by applying SAM to colorized depth images; 2) refining object masks by removing non-object masks based on explicit visual representations from the self-supervised ViT; 3) deriving point prompts from cluster centers within each object proposal to further refine the segmentation. A sketch of how these stages fit together is shown below.
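The following sketch strings the three stages together with off-the-shelf components: the official `segment_anything` package, the DINO ViT-S/8 from `torch.hub`, and `KMedoids` from scikit-learn-extra. The checkpoint path, the attention threshold, and the helper `dino_attention_map` are illustrative assumptions; the repository's actual implementation scores non-object masks differently, so treat this as a minimal sketch rather than the released pipeline. It assumes a registered RGB frame and a colorized depth image of the same resolution (e.g. produced by the snippet above).

```python
import numpy as np
import torch
import torch.nn.functional as F
import torchvision.transforms as T
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator, SamPredictor
from sklearn_extra.cluster import KMedoids

device = "cuda" if torch.cuda.is_available() else "cpu"

# Foundation models (the SAM checkpoint path below is a placeholder).
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth").to(device)
mask_generator = SamAutomaticMaskGenerator(sam)
predictor = SamPredictor(sam)
dino = torch.hub.load("facebookresearch/dino:main", "dino_vits8").to(device).eval()


@torch.no_grad()
def dino_attention_map(rgb):
    """CLS self-attention from the last DINO block, upsampled to image size.
    ZISVFM's actual object-ness scoring is more involved; this is a stand-in."""
    h, w = rgb.shape[:2]
    x = T.Compose([T.ToTensor(),
                   T.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225))])(rgb)
    # Resize so spatial dims are multiples of the 8-pixel patch size.
    x = F.interpolate(x[None], size=(h // 8 * 8, w // 8 * 8), mode="bilinear")
    attn = dino.get_last_selfattention(x.to(device))    # 1 x heads x tokens x tokens
    cls_attn = attn[0, :, 0, 1:].mean(0)                # average heads, CLS -> patches
    grid = cls_attn.reshape(x.shape[-2] // 8, x.shape[-1] // 8)
    grid = (grid - grid.min()) / (grid.max() - grid.min() + 1e-6)
    return F.interpolate(grid[None, None], size=(h, w), mode="bilinear")[0, 0].cpu().numpy()


def segment(rgb, colorized_depth, num_prompts=3, attn_thresh=0.2):
    """Illustrative three-stage inference; thresholds and helper names are
    assumptions, not the repository's actual API."""
    # Stage 1: object-agnostic mask proposals from the colorized depth image.
    proposals = mask_generator.generate(colorized_depth)

    # Stage 2: keep proposals whose mean DINO attention on the RGB image is
    # high enough to look like an object (the threshold here is arbitrary).
    attn = dino_attention_map(rgb)
    kept = [p for p in proposals if attn[p["segmentation"]].mean() > attn_thresh]

    # Stage 3: K-Medoids cluster centers inside each kept mask become point
    # prompts that guide SAM to a refined mask on the RGB image.
    predictor.set_image(rgb)
    refined = []
    for p in kept:
        ys, xs = np.nonzero(p["segmentation"])
        pts = np.stack([xs, ys], axis=1)
        if len(pts) > 2000:                              # subsample for speed
            pts = pts[np.random.choice(len(pts), 2000, replace=False)]
        k = min(num_prompts, len(pts))
        centers = KMedoids(n_clusters=k, random_state=0).fit(pts).cluster_centers_
        masks, _, _ = predictor.predict(
            point_coords=centers.astype(np.float32),
            point_labels=np.ones(len(centers)),
            multimask_output=False)
        refined.append(masks[0])
    return refined
```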
To install and run this project using a Conda environment, follow these steps:
- Clone the Repository
```bash
git clone https://github.com/Yinmlmaoliang/zisvfm.git
cd zisvfm
```
- Create and Activate a Conda Environment
```bash
conda create --name zisvfm python=3.9
conda activate zisvfm
```
- Install Dependencies
```bash
pip install -r requirements.txt
```
We provide a demo.ipynb Jupyter notebook for easily running predictions with our model.
The code used to evaluate our model's performance is from UOAIS. Thanks to the authors for sharing the code!
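For reference, UOAIS-style evaluation matches predicted and ground-truth instance masks with the Hungarian algorithm and aggregates matched-pixel counts into object-level overlap precision, recall, and F-measure. The sketch below follows that spirit but is simplified (the official code matches on pairwise F-measure and also reports boundary metrics and the %75 statistic), so use the UOAIS code for published numbers.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def overlap_prf(pred_masks, gt_masks):
    """Simplified object-level overlap precision / recall / F-measure:
    Hungarian-match predictions to ground truth on shared pixel counts,
    then accumulate the matched overlaps.  Illustrative only."""
    if not pred_masks or not gt_masks:
        return 0.0, 0.0, 0.0
    overlap = np.zeros((len(pred_masks), len(gt_masks)))
    for i, p in enumerate(pred_masks):
        for j, g in enumerate(gt_masks):
            overlap[i, j] = np.logical_and(p, g).sum()
    rows, cols = linear_sum_assignment(-overlap)         # maximize matched pixels
    matched = overlap[rows, cols].sum()
    precision = matched / max(sum(p.sum() for p in pred_masks), 1)
    recall = matched / max(sum(g.sum() for g in gt_masks), 1)
    f = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f
```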
We thank the authors of the following repositories for their outstanding work and open-source contributions:
- DINOv2 - The official implementation of the DINOv2 self-supervised learning method.
- Segment Anything - The official implementation of the Segment Anything Model (SAM).
- Dinov2_public - A public implementation of DINOv2 with additional features.