🤖 Multi-modal GPT

Train a multi-modal chatbot with visual and language instructions!

Building on the open-source multi-modal model OpenFlamingo, we create visual instruction data from open datasets, covering VQA, Image Captioning, Visual Reasoning, Text OCR, and Visual Dialogue. We also train the language model component of OpenFlamingo on language-only instruction data.

Joint training on visual and language instructions improves the performance of the model!

Features

Supports various vision and language instruction data
Parameter-efficient fine-tuning with LoRA
Tunes vision and language at the same time so that the two complement each other
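
To give a sense of why LoRA is parameter efficient, the sketch below compares trainable parameter counts for a single weight matrix. The 4096 x 4096 shape and the rank of 16 are illustrative assumptions, not values read from this repository's configs/lora_config.py.

```python
def full_finetune_params(d_out: int, d_in: int) -> int:
    """Trainable parameters when the whole d_out x d_in weight matrix is updated."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters for a LoRA update W + B @ A,
    where B is d_out x rank and A is rank x d_in."""
    return rank * (d_out + d_in)


# Illustrative values only (assumed, not taken from configs/lora_config.py).
d_out, d_in, rank = 4096, 4096, 16
full = full_finetune_params(d_out, d_in)
lora = lora_params(d_out, d_in, rank)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.4%}")
```

The frozen base weights are still needed for the forward pass; only the two small factors receive gradients.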

Installation

To install the package in an existing environment, run:

git clone https://github.com/vermaprince17/FloRA.git
cd FloRA
pip install -r requirements.txt
pip install -v -e .

Alternatively, create a new conda environment:

conda env create -f environment.yml

Launch Demo Locally

Download the pre-trained weights.

Use this script for converting LLaMA weights to Hugging Face format.

Download the OpenFlamingo pre-trained model from openflamingo/OpenFlamingo-9B.

Download our LoRA weights from here.

Then place these models in the checkpoints folder as follows:
```
checkpoints
├── llama-7b_hf
│   ├── config.json
│   ├── pytorch_model-00001-of-00002.bin
│   ├── ......
│   └── tokenizer.model
├── OpenFlamingo-9B
│   └── checkpoint.pt
└── mmgpt-lora-v0-release.pt
```
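
Before launching the demo, it can help to confirm that the files are where the tree above puts them. The following is a minimal sketch under that assumption; the helper name `missing_checkpoints` is hypothetical and not part of this repository, and the sharded `.bin` files are skipped because their exact names are elided in the tree.

```python
from pathlib import Path

# Paths taken from the directory tree above. The sharded .bin files are not
# listed because their exact names are elided in the tree.
EXPECTED_FILES = [
    "llama-7b_hf/config.json",
    "llama-7b_hf/tokenizer.model",
    "OpenFlamingo-9B/checkpoint.pt",
    "mmgpt-lora-v0-release.pt",
]


def missing_checkpoints(root: str = "checkpoints") -> list[str]:
    """Return the expected files that are absent under `root`."""
    base = Path(root)
    return [rel for rel in EXPECTED_FILES if not (base / rel).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("Checkpoint layout looks complete.")
```
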
Launch the Gradio demo:
```
python app.py
```
Single-example inference (run the command in inference_cmd.txt; this assumes the OpenFlamingo checkpoint is at checkpoints/OpenFlamingo-9B/checkpoint.pt):

python inference.py <path to language ckpts> <path to fine tuned ckpts> <text input> <path to image input>

Example: python inference.py openlm-research/open_llama_3B_V2 prod/run_LLama-aokvaq-train_ds_8k/ckpt_per_steps/checkpoint_0_2176.pt "What is this image content?" ./docs/images/demo_image.jpg
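
When the same call is scripted from Python, the four positional arguments must stay in the order shown above and the text prompt must remain a single argument. The sketch below only assembles and prints the command; `build_inference_command` is a hypothetical helper, not something shipped with the repository.

```python
import shlex
import sys


def build_inference_command(lm_path, finetuned_ckpt, text, image_path):
    """Assemble the four positional arguments in the order inference.py expects."""
    return [sys.executable, "inference.py", lm_path, finetuned_ckpt, text, image_path]


cmd = build_inference_command(
    "openlm-research/open_llama_3B_V2",
    "prod/run_LLama-aokvaq-train_ds_8k/ckpt_per_steps/checkpoint_0_2176.pt",
    "What is this image content?",
    "./docs/images/demo_image.jpg",
)
# shlex.join quotes the prompt so it survives as one shell argument.
print(shlex.join(cmd))
# To actually run it from the repository root: subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when the prompt contains spaces or punctuation.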

Examples

Recipe:

Travel plan:

Movie:

Famous person:

Fine-tuning

Prepare datasets

A-OKVQA

Download the annotations from this link and unzip them to data/aokvqa/annotations.

It also requires images from the COCO dataset, which can be downloaded from here.

COCO Caption

Download from this link and unzip to data/coco.

It also requires images from the COCO dataset, which can be downloaded from here.

OCR VQA

Download from this link and place in data/OCR_VQA/.

LLaVA

Download from liuhaotian/LLaVA-Instruct-150K and place in data/llava/.

It also requires images from the COCO dataset, which can be downloaded from here.

MiniGPT-4

Download from Vision-CAIR/cc_sbu_align and place in data/cc_sbu_align/.

Dolly 15k

Download from databricks/databricks-dolly-15k and place it in data/dolly/databricks-dolly-15k.jsonl.

Alpaca GPT4

Download it from this link and place it in data/alpaca_gpt4/alpaca_gpt4_data.json.

Baize

Download it from this link and place it in data/baize/quora_chat_data.json.

PubMedQA

Download it from this link and place it in data/pubmedqa/ori_pqal.json.

You can also customize the data paths in configs/dataset_config.py.
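
The default locations listed above can be checked in one pass before training starts. This is a minimal sketch that assumes the default paths are used unchanged; `check_dataset_layout` is a hypothetical helper, and paths overridden in configs/dataset_config.py would need to be adjusted here as well.

```python
from pathlib import Path

# Default locations as listed above; configs/dataset_config.py may override them.
DATASET_DIRS = [
    "data/aokvqa/annotations",
    "data/coco",
    "data/OCR_VQA",
    "data/llava",
    "data/cc_sbu_align",
]
DATASET_FILES = [
    "data/dolly/databricks-dolly-15k.jsonl",
    "data/alpaca_gpt4/alpaca_gpt4_data.json",
    "data/baize/quora_chat_data.json",
    "data/pubmedqa/ori_pqal.json",
]


def check_dataset_layout(repo_root: str = ".") -> dict[str, bool]:
    """Map each expected dataset path to whether it exists under repo_root."""
    root = Path(repo_root)
    status = {d: (root / d).is_dir() for d in DATASET_DIRS}
    status.update({f: (root / f).is_file() for f in DATASET_FILES})
    return status


for path, present in check_dataset_layout().items():
    print(f"{'ok     ' if present else 'MISSING'}  {path}")
```
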

Start training

torchrun --nproc_per_node=8 mmgpt/train/instruction_finetune.py \
  --lm_path checkpoints/llama-7b_hf \
  --tokenizer_path checkpoints/llama-7b_hf \
  --pretrained_path checkpoints/OpenFlamingo-9B/checkpoint.pt \
  --run_name train-my-gpt4 \
  --learning_rate 1e-5 \
  --lr_scheduler cosine \
  --batch_size 1 \
  --tuning_config configs/lora_config.py \
  --dataset_config configs/dataset_config.py \
  --report_to_wandb
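
The command sets `--lr_scheduler cosine` with a peak learning rate of 1e-5. As a rough illustration of what a cosine schedule does, the sketch below implements the generic cosine decay formula. It does not reproduce the repository's scheduler, which may add warmup or differ in other details, and `total_steps = 1000` is a made-up value.

```python
import math


def cosine_lr(step: int, total_steps: int, peak_lr: float = 1e-5, min_lr: float = 0.0) -> float:
    """Generic cosine decay from peak_lr at step 0 to min_lr at total_steps."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


total_steps = 1000  # hypothetical; the real value depends on dataset size and epochs
for step in (0, 250, 500, 750, 1000):
    print(f"step {step:4d}: lr = {cosine_lr(step, total_steps):.2e}")
```

The rate starts at the peak, passes through half the peak at the midpoint, and reaches the minimum at the final step.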
