To set up for training, please download the datasets and update the paths in the dataset configs, e.g., `groma/data/configs/vl_finetune.py`.
| Dataset | Images | Annotations |
|---|---|---|
| VL Pretrain & Finetune | | |
| COCO | coco_train_2017 | instances_train2017.json |
| RefCOCO | | refcoco_train.json |
| RefCOCO+ | | refcoco+_train.json |
| RefCOCOg | | refcocog_train.json |
| LLaVA Instruct | | llava_conversation_reasoning.json |
| Flickr30k Entities | flickr_images | flickr_entities_train.json |
| Visual Genome* | vg_part1&2 | vg_train_single.json, vg_train_multi.json |
| Groma Instruct | | groma_instruct.json |
| ShareGPT4V-PT | sharegpt4v_data | share-captioner_coco_lcs_sam_1246k_1107.json |
| ShareGPT4V | | sharegpt4v_instruct_gpt4-vision_cap100k.json |
| GRIT-20m | grit_images | grit_filtered_10m.json |
| Detection Pretrain | | |
| COCO | coco_train_2017 | class_agnostic_coco_instances_train2017.json |
| Objects365 | objects365_v2 | class_agnostic_obj365v2_train_new.json |
| OpenImages | openimages_v6 | class_agnostic_openimages_v6_train_bbox.json |
| V3Det | v3det_v1 | class_agnostic_v3det_2023_v1_train.json |
| SA1B | sa1b_images | class_agnostic_sa1b_2m.json |
*Note: Please put part_1 and part_2 images under the same folder.
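As a rough illustration of the path updates described above, the sketch below shows how dataset entries might be laid out in a config like `vl_finetune.py`. The variable names (`data_root`, `img_dir`, `ann_file`) are assumptions for illustration and may not match the actual config keys in the repo; adjust them to your local directory layout. Note how datasets without an Images entry in the table (e.g., RefCOCO) reuse the COCO image folder.

```python
# Hypothetical sketch of dataset path entries -- the real keys in
# groma/data/configs/vl_finetune.py may differ; edit to match your setup.
data_root = '/path/to/datasets'  # assumed local download root

coco = dict(
    img_dir=f'{data_root}/coco_train_2017',
    ann_file=f'{data_root}/annotations/instances_train2017.json',
)
refcoco = dict(
    img_dir=coco['img_dir'],  # RefCOCO annotations refer to COCO images
    ann_file=f'{data_root}/annotations/refcoco_train.json',
)
vg = dict(
    # part_1 and part_2 images merged into one folder, per the note above
    img_dir=f'{data_root}/vg_images',
    ann_file=[
        f'{data_root}/annotations/vg_train_single.json',
        f'{data_root}/annotations/vg_train_multi.json',
    ],
)
```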