PyTorch implementation and pretrained models of TEAM. A new dataset containing over 100M Chinese image-text pairs will also be released.
We provide three pre-trained models:
- pretrained_4m.pth: TEAM with ViT-B/16 (initialized from DeiT-base) as the image encoder, pre-trained on 4 million image-text pairs.
- pretrained_14m_clip_large.pth: TEAM with ViT-L/14 (initialized from CLIP-L/14) as the image encoder, pre-trained on 14 million image-text pairs.

Both checkpoints can be found here.
In addition, we release a TEAM model trained on our collected Chinese image-text dataset; please refer to TEAM图文检索模型-中文-large for more details.
To evaluate pretrained_14m_clip_large.pth on the COCO Retrieval task, run:
python -m eval configs/pretrain_5m/team_clipl14.py
Note that the results of the second stage are the final results.
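Before running the evaluation, it can be useful to sanity-check a downloaded checkpoint. The sketch below is illustrative only and not part of the repo: whether the weights sit at the top level of the .pth file or are wrapped under a "model" key is an assumption and may differ from the actual checkpoint layout.

```python
import torch

# Illustrative sanity check of a downloaded checkpoint (not part of this repo).
# Whether the weights sit at the top level or under a "model" key is assumed.
ckpt = torch.load("pretrained_14m_clip_large.pth", map_location="cpu")
state_dict = ckpt["model"] if isinstance(ckpt, dict) and "model" in ckpt else ckpt
print(f"loaded {len(state_dict)} tensors")
print(list(state_dict.keys())[:10])  # peek at the first few parameter names
```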
To train TEAM with ViT-L/14 as the image encoder on 4 million image-text pairs, run:
python -m torch.distributed.launch --nproc_per_node=8 train.py configs/pretrain_5m/team_clipl14.py
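The launcher starts one process per GPU (eight here) and hands each process its local rank via the --local_rank argument or the LOCAL_RANK environment variable. As a rough sketch of how a typical DDP entry point consumes that information (the actual train.py in this repo may be organized differently):

```python
import argparse
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Sketch of a generic entry point for torch.distributed.launch; the real
# train.py may differ. Shown only to illustrate how the launcher's
# per-process rank is consumed.
parser = argparse.ArgumentParser()
parser.add_argument("--local_rank", type=int,
                    default=int(os.environ.get("LOCAL_RANK", 0)))
args = parser.parse_args()

dist.init_process_group(backend="nccl")  # MASTER_ADDR/PORT, RANK, WORLD_SIZE come from the launcher
torch.cuda.set_device(args.local_rank)   # bind this process to its GPU

model = torch.nn.Linear(10, 10).cuda()            # placeholder for the actual TEAM model
model = DDP(model, device_ids=[args.local_rank])  # synchronizes gradients across the 8 processes
```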
COCO Retrieval results (R@K, %):

| Model | Setting | Text R@1 | Text R@5 | Text R@10 | Image R@1 | Image R@5 | Image R@10 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| pretrained_4m | Zero-shot | 74.9 | 91.8 | 95.3 | 54.7 | 79.5 | 86.6 |
| pretrained_4m | Finetune | 77.3 | 93.6 | 96.5 | 59.7 | 83.2 | 89.4 |
| pretrained_14m_clip_large | Zero-shot | 82.8 | 95.6 | 97.6 | 63.9 | 85.1 | 90.4 |
| pretrained_14m_clip_large | Finetune | 84.0 | 96.1 | 98.0 | 66.9 | 87.0 | 92.1 |
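For reference, text-retrieval R@K counts an image query as correct if any of its ground-truth captions appears among the top-K retrieved texts (COCO has about five captions per image). Below is a minimal sketch of that computation from an image-text similarity matrix, purely for illustration; the repo's evaluation script has its own implementation, and the names sims and img2txt are hypothetical.

```python
import torch

def recall_at_k(sims: torch.Tensor, img2txt: list, k: int) -> float:
    """Text-retrieval R@K from a [num_images, num_texts] similarity matrix.

    img2txt[i] is the list of ground-truth caption indices for image i.
    Illustration only; names and layout are assumptions, not the repo's API.
    """
    topk = sims.topk(k, dim=1).indices  # top-k text indices per image query
    hits = [
        bool(set(topk[i].tolist()) & set(img2txt[i]))
        for i in range(sims.size(0))
    ]
    return 100.0 * sum(hits) / len(hits)
```

Image-retrieval R@K is computed analogously on the transposed similarity matrix with a text-to-image ground-truth mapping.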
If you find this repository useful, please consider citing our paper:
@inproceedings{TEAM2022MM,
title = {Token Embeddings Alignment for Cross-Modal Retrieval},
author = {Xie, Chen-Wei and Wu, Jianmin and Zheng, Yun and Pan, Pan and Hua, Xian-Sheng},
booktitle = {ACMMM},
year = {2022}
}
Some code is borrowed from ALBEF and CLIP. Many thanks to their authors.