
AnyTouch: Learning Unified Static-Dynamic Representation across Multiple Visuo-tactile Sensors

The official PyTorch implementation for "AnyTouch: Learning Unified Static-Dynamic Representation across Multiple Visuo-tactile Sensors", ICLR 2025

Authors: Ruoxuan Feng, Jiangyu Hu, Wenke Xia, Tianci Gao, Ao Shen, Yuhao Sun, Bin Fang, Di Hu

Accepted by: International Conference on Learning Representations (ICLR 2025)

Resources: [Project Page], [arXiv], [Checkpoint], [Dataset]

If you have any questions, please open an issue or send an email to [email protected].


Introduction

Tactile perception is crucial for humans to perceive the physical world. Over the years, various visuo-tactile sensors have been designed to endow robots with human-like tactile perception abilities. However, the low standardization of visuo-tactile sensors has hindered the development of a powerful tactile perception system. In this work, we present TacQuad, an aligned multi-modal multi-sensor tactile dataset that enables the explicit integration of sensors. Building on this foundation and other open-sourced tactile datasets, we propose learning unified representations from both static and dynamic perspectives to accommodate a range of tasks. We introduce AnyTouch, a unified static-dynamic multi-sensor tactile representation learning framework with a multi-level architecture, enabling comprehensive static and real-world dynamic tactile perception.

TacQuad Dataset

TacQuad is an aligned multi-modal multi-sensor tactile dataset collected from four types of visuo-tactile sensors (GelSight Mini, DIGIT, DuraGel, and Tac3D). It addresses the low standardization of visuo-tactile sensors by providing multi-sensor aligned data accompanied by text and visual images. The dataset includes two subsets of paired data with different levels of alignment:

  • Fine-grained spatio-temporally aligned data: collected by pressing the same location on the same object at the same speed with each of the four sensors. It contains a total of 17,524 contact frames from 25 objects and can be used for fine-grained tasks such as cross-sensor generation.
  • Coarse-grained spatially aligned data: collected by hand, with the four sensors pressing the same location on the same object, though temporal alignment is not guaranteed. It contains 55,082 contact frames from 99 objects, spanning both indoor and outdoor scenes, and can be used for the cross-sensor matching task.

We also use GPT-4o to generate or expand the text modality for several open-source tactile datasets. The TacQuad dataset and the text prompts for the other datasets are hosted on HuggingFace.
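
As a purely hypothetical illustration of how the spatio-temporally aligned subset could be indexed once downloaded (all folder names and paths below are assumptions, not the released structure; see the HuggingFace page for the actual layout):

# Hypothetical pairing of aligned contact frames from the four sensors.
# Folder names and paths are illustrative only; follow the HuggingFace release.
from pathlib import Path

SENSORS = ["gelsight_mini", "digit", "duragel", "tac3d"]   # assumed folder names
ROOT = Path("tactile_datasets/tacquad/fine_grained")       # assumed local path

def aligned_frames(object_id: str, frame_idx: int) -> dict:
    """Return the same contact frame as captured by each of the four sensors."""
    return {s: ROOT / s / object_id / f"{frame_idx:06d}.png" for s in SENSORS}

print(aligned_frames("object_001", 0))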

AnyTouch Model

AnyTouch is a unified static-dynamic multi-sensor tactile representation learning framework that integrates tactile images and videos into a single input format. Through a multi-level architecture, it learns both fine-grained pixel-level details for refined tasks and semantic-level, sensor-agnostic features for understanding object properties and building a unified representation space.
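
One way to picture the unified static-dynamic input format is that a tactile image is handled as a single-frame tactile video, so both share one tensor layout. The snippet below only illustrates that convention with shapes; it is not the model code, and the exact layout used internally may differ:

# A static tactile image viewed as a 1-frame video: both map to [B, T, C, H, W].
import torch

images = torch.randn(8, 3, 224, 224)       # batch of static tactile images
videos = torch.randn(8, 5, 3, 224, 224)    # batch of 5-frame tactile videos

images_as_videos = images.unsqueeze(1)     # [8, 1, 3, 224, 224], same layout as videos
print(images_as_videos.shape, videos.shape)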

The checkpoint for AnyTouch is provided below:

AnyTouch checkpoint (Download)
  • Training data: TAG, VisGel, Cloth, TVL, SSVTP, YCB-Slide, OF Real, Octopi, TacQuad
  • TAG (M/R/H)*: 80.82 / 86.74 / 94.68
  • Feel (Grasp): 80.53
  • OF 1.0: 49.62
  • OF 2.0: 76.02

*M: Material, R: Roughness, H: Hardness

Setup

This code has been tested on Ubuntu 20.04 with PyTorch 2.1.0 and CUDA 11.8.

Install the requirements

# Optionally create a conda environment
conda create -n anytouch python=3.9
conda activate anytouch
conda install pytorch==2.1.0 torchvision==0.16.0 pytorch-cuda=11.8 -c pytorch -c nvidia
pip install -r requirements.txt
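
A quick sanity check that the installed build matches the tested setup (the script name is just illustrative):

# check_env.py -- verify the PyTorch / CUDA build after installation
import torch

print(torch.__version__)           # expected: 2.1.0 (a +cu118 build)
print(torch.version.cuda)          # expected: 11.8
print(torch.cuda.is_available())   # should be True on a machine with a CUDA GPU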

Linear Probing

0. Initialization

The AnyTouch model is initialized from CLIP-ViT-L-14-DataComp.XL-s13B-b90K.

Then download the AnyTouch checkpoint to log/checkpoint.pth.
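
If you want to confirm the download before running the probing scripts, the checkpoint can be inspected with torch.load. The key layout inside checkpoint.pth is an assumption here; the run_probe_*.sh scripts handle the actual loading:

# Illustrative inspection of the downloaded checkpoint (key names may differ).
import torch

ckpt = torch.load("log/checkpoint.pth", map_location="cpu")
state_dict = ckpt.get("model", ckpt)   # some checkpoints nest weights under "model"
print(f"{len(state_dict)} entries")
for name in list(state_dict)[:5]:
    print(name, tuple(state_dict[name].shape))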

1. Data Preparation

Download and process the Touch and Go, ObjectFolder 1.0, ObjectFolder 2.0, and Feel datasets into tactile_datasets/.

2. Run Linear Probing

To evaluate the AnyTouch checkpoint through linear probing:

./run_probe_TAG.sh
./run_probe_OF1.sh
./run_probe_OF2.sh
./run_probe_Feel.sh
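
For context, linear probing follows the standard recipe of freezing the pretrained encoder and training only a linear classifier on its features. The sketch below assumes a hypothetical encoder that maps a batch of tactile images to [B, feat_dim] features; the provided run_probe_*.sh scripts are the actual entry points:

# Generic linear-probing sketch (not the repo's script): freeze the encoder,
# train a single linear layer on top of its features.
import torch
import torch.nn as nn

def linear_probe(encoder, feat_dim, num_classes, loader, epochs=10, device="cuda"):
    encoder.eval().to(device)
    for p in encoder.parameters():
        p.requires_grad = False            # keep the pretrained weights frozen

    head = nn.Linear(feat_dim, num_classes).to(device)
    opt = torch.optim.Adam(head.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    for _ in range(epochs):
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            with torch.no_grad():
                feats = encoder(images)    # assumed to return [B, feat_dim] features
            loss = loss_fn(head(feats), labels)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return head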

Train

0. Initialization

The AnyTouch model is initialized from CLIP-ViT-L-14-DataComp.XL-s13B-b90K. To train AnyTouch, you first need to download CLIP-ViT-L-14-DataComp.XL-s13B-b90K/pytorch_model.bin.
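
One way to fetch those weights is via huggingface_hub; the repo id below is assumed to be the LAION release of this model, so verify it and place the file wherever the training scripts expect it:

# Assumed way to download the CLIP initialization weights.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="laion/CLIP-ViT-L-14-DataComp.XL-s13B-b90K",  # assumed HF repo id
    filename="pytorch_model.bin",
)
print(path)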

1. Data Preparation

Download and process the Touch and Go, VisGel, Cloth, TVL, SSVTP, YCB-Slide, ObjectFolder Real, Octopi, and TacQuad datasets into tactile_datasets/. The TacQuad dataset and the text prompts for the other datasets can be downloaded here.

2. Train

To train the AnyTouch model:

# First Stage (MAE)
./train_stage1.sh

# Second Stage (Align + Match)
./train_stage2.sh
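
As a reminder of what the first stage does, MAE-style pretraining randomly masks a large fraction of patch tokens and reconstructs the masked content. The sketch below shows only the generic masking step, not this repo's implementation:

# Generic MAE-style random masking: keep a subset of patch tokens, mark the rest as masked.
import torch

def random_masking(tokens, mask_ratio=0.75):
    # tokens: [B, N, D] patch embeddings
    B, N, D = tokens.shape
    n_keep = int(N * (1 - mask_ratio))
    noise = torch.rand(B, N, device=tokens.device)
    ids_shuffle = noise.argsort(dim=1)       # a random permutation per sample
    ids_keep = ids_shuffle[:, :n_keep]
    visible = torch.gather(tokens, 1, ids_keep.unsqueeze(-1).expand(-1, -1, D))
    mask = torch.ones(B, N, device=tokens.device)
    mask.scatter_(1, ids_keep, 0)            # 1 = masked, 0 = visible
    return visible, mask

tokens = torch.randn(4, 196, 768)            # e.g. 14x14 patches from a ViT encoder
visible, mask = random_masking(tokens)
print(visible.shape, int(mask.sum(dim=1)[0]))  # torch.Size([4, 49, 768]) 147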

Citation

@inproceedings{feng2025learning,
	title={Learning Unified Static-Dynamic Representation across Multiple Visuo-tactile Sensors},
	author={Ruoxuan Feng and Jiangyu Hu and Wenke Xia and Tianci Gao and Ao Shen and Yuhao Sun and Bin Fang and Di Hu},
	booktitle={The Thirteenth International Conference on Learning Representations},
	year={2025},
	url={https://openreview.net/forum?id=XToAemis1h}
}
