Tokenize Anything via Prompting

Ting Pan¹,²*, Lulu Tang²*, Xinlong Wang²¶, Shiguang Shan¹

¹ICT-CAS, ²BAAI
* Equal Contribution, ¶ Project Lead

[Paper] [🤗 Demo]

We present Tokenize Anything via Prompting, a unified and promptable model capable of simultaneously segmenting, recognizing, and captioning arbitrary regions, with flexible visual prompts (point, box and sketch). The model is trained with exhaustive segmentation masks sourced from SA-1B, coupled with semantic priors from a pre-trained EVA-CLIP with 5 billion parameters.

Installation

Preliminaries

torch >= 2.1

flash-attn >= 2.3.3 (for TextGeneration)

gradio-image-prompter (for GradioApp; install from URL)
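
The version requirements above can be checked before installing the package. The following is a minimal sketch, not part of this repository: the helper names (`parse_version`, `meets_minimum`, `check_requirements`) are hypothetical, and it assumes the packages are published under the distribution names `torch` and `flash-attn`. It reads installed package metadata only, so it runs even when neither package is present.

```python
from importlib import metadata

# Minimum versions taken from the Preliminaries list above.
MINIMUMS = {"torch": (2, 1), "flash-attn": (2, 3, 3)}


def parse_version(text):
    """Turn '2.1.0+cu121' into (2, 1, 0), ignoring local and post/pre suffixes."""
    parts = []
    for piece in text.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare version tuples after padding the shorter one with zeros."""
    width = max(len(installed), len(minimum))
    pad = lambda v: v + (0,) * (width - len(v))
    return pad(installed) >= pad(minimum)


def check_requirements(minimums=MINIMUMS):
    """Return {package: status string} without importing the packages."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
            continue
        ok = meets_minimum(parse_version(installed), minimum)
        report[name] = f"{installed} ({'ok' if ok else 'too old'})"
    return report


if __name__ == "__main__":
    for name, status in check_requirements().items():
        print(f"{name}: {status}")
```

The parser is deliberately simple and treats pre-release builds such as `2.1.0a0` as `2.1.0`; for strict comparisons a dedicated version-parsing library would be a better fit.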

Installing Package

Clone this repository to local disk and install:

cd tokenize-anything && pip install .

You can also install from the remote repository:

pip install git+ssh://git@github.com/baaivision/tokenize-anything.git

Quick Start

Development

The TAP models can be used for diverse vision and language tasks.

We adopt a modular design that decouples all components and predictors.

As a best practice, implement your custom predictor and asynchronous pipeline as follows:

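from tokenize_anything import model_registry

# Names in angle brackets are placeholders to replace with your own objects.
with <distributed_actor>:
    # Build the model from the registry, then pass it to your predictor.
    model = model_registry["<model_type>"](checkpoint="<path/to/checkpoint>")
    results = <custom_predictor>(model, *args, **kwargs)

# Collect the results from the pipeline.
server.collect_results()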

See the built-in examples (web demo and evaluations) provided in scripts for more details.
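
To make the shape of the template concrete, here is a minimal sketch that runs without the package or a checkpoint. Everything in it is a hypothetical stand-in: `fake_model_registry` and `DummyModel` imitate the registry lookup shown in the template, `box_area_predictor` is a toy predictor, and a standard-library thread pool is swapped in for the distributed actor and result collection. It illustrates only the control flow (build a model per worker, run a predictor that takes the model as its first argument, gather results), not the behavior of the real models.

```python
from concurrent.futures import ThreadPoolExecutor


class DummyModel:
    """Hypothetical placeholder for a model built from a checkpoint."""

    def __init__(self, checkpoint):
        self.checkpoint = checkpoint


# Hypothetical stand-in for tokenize_anything.model_registry: it maps a
# model type to a builder that accepts `checkpoint=...`, as in the template.
fake_model_registry = {"dummy": DummyModel}


def box_area_predictor(model, boxes):
    """Toy custom predictor: model first, then the task inputs.

    A real predictor would run the model on the prompts; this one only
    computes box areas so the pipeline can be checked end to end.
    """
    return [(x1 - x0) * (y1 - y0) for x0, y0, x1, y1 in boxes]


def worker(model_type, checkpoint, boxes):
    """One worker: build its own model, then run the predictor on its shard."""
    model = fake_model_registry[model_type](checkpoint=checkpoint)
    return box_area_predictor(model, boxes)


def run_pipeline(shards, model_type="dummy", checkpoint="<path/to/checkpoint>"):
    """Submit one job per shard and collect results in submission order."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(worker, model_type, checkpoint, s) for s in shards]
        return [f.result() for f in futures]


if __name__ == "__main__":
    shards = [[(0, 0, 2, 3)], [(1, 1, 4, 5), (0, 0, 1, 1)]]
    print(run_pipeline(shards))
```

Keeping the predictor a plain function of `(model, inputs)` is what lets the same predictor be reused under a different executor without changes.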

Inference

See Inference Guide.

See Concept Guide.

Evaluation

See Evaluation Guide for TAP-H.

See Evaluation Guide for TAP-L.

See Evaluation Guide for TAP-B.

Models

Model weights

V1.1 Release Notes

Three versions of the model are available, each with a different image encoder.
A longer pre-training and fine-tuning schedule is used, which improves segmentation and captioning performance.
Weight decay is applied to all bias parameters to avoid FP16 overflow in the QK matmul.
Point prompts are sampled from the predicted mask instead of the GT box during VG training.

Model	Description	Schedule	MD5	Weights
tap_vit_h	ViT-H TAP v1.1 model	(100% SA-1B, 180k), (VG, 50ep)	4bdfb9	🤗 HF link
tap_vit_l	ViT-L TAP v1.1 model	(100% SA-1B, 180k), (VG, 50ep)	c1d41f	🤗 HF link
tap_vit_b	ViT-B TAP v1.1 model	(100% SA-1B, 180k), (VG, 50ep)	707f80	🤗 HF link

V1.0 Release Notes

Two versions of the model are available, each with a different image encoder.
These weights correspond to the results in the original paper.

Model	Description	Schedule	MD5	Weights
tap_vit_l	ViT-L TAP v1.0 model	(50% SA-1B, 90k), (VG, 25ep)	03f8ec	🤗 HF link
tap_vit_b	ViT-B TAP v1.0 model	(50% SA-1B, 90k), (VG, 25ep)	b45cbf	🤗 HF link
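
A downloaded checkpoint can be checked against the MD5 column in the tables above. The sketch below assumes that the six-character values are the leading hex digits of each file's MD5 digest; the tables do not state this explicitly. The helper names (`md5_of_file`, `checkpoint_matches`) are hypothetical and not part of the package.

```python
import hashlib

# MD5 values copied from the v1.1 and v1.0 tables above. Keys include the
# release because tap_vit_l and tap_vit_b appear in both releases.
# Assumption: each value is the prefix of the checkpoint's full MD5 digest.
MD5_PREFIXES = {
    ("v1.1", "tap_vit_h"): "4bdfb9",
    ("v1.1", "tap_vit_l"): "c1d41f",
    ("v1.1", "tap_vit_b"): "707f80",
    ("v1.0", "tap_vit_l"): "03f8ec",
    ("v1.0", "tap_vit_b"): "b45cbf",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Hash the file in chunks so a large checkpoint need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checkpoint_matches(path, release, model):
    """True if the file's MD5 digest starts with the value listed for the model."""
    return md5_of_file(path).startswith(MD5_PREFIXES[(release, model)])
```

For example, `checkpoint_matches("<path/to/checkpoint>", "v1.1", "tap_vit_l")` would return `False` for a truncated or mismatched download.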

Concept weights

Note: You can generate these weights by following the Concept Guide.

Concept	Description	Weights
Merged-2560	Merged concepts	🤗 HF link
LVIS-1203	LVIS concepts	🤗 HF link
COCO-80	COCO concepts	🤗 HF link

License

Apache License 2.0

Citation

@article{pan2023tap,
  title={Tokenize Anything via Prompting},
  author={Pan, Ting and Tang, Lulu and Wang, Xinlong and Shan, Shiguang},
  journal={arXiv preprint arXiv:2312.09128},
  year={2023}
}

Acknowledgement

We thank the repositories: SAM, EVA, LLaMA, FlashAttention, Gradio, Detectron2 and CodeWithGPU.
