Unable to use BLIP2 with caption_coco_opt6.7b at HEAD via salesforce-lavis (also HEAD) #21713

@AstraliteHeart

Description

System Info

working:

  • transformers version: 4.26.1
  • Platform: Linux-6.0.12-x86_64-with-glibc2.10
  • Python version: 3.8.16
  • Huggingface_hub version: 0.12.0
  • PyTorch version (GPU?): 1.13.1+cu117 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: yes
  • Using distributed or parallel set-up in script?: no

broken:

  • transformers version: 4.27.0.dev0
  • Platform: Linux-6.0.12-x86_64-with-glibc2.10
  • Python version: 3.8.16
  • Huggingface_hub version: 0.12.0
  • PyTorch version (GPU?): 1.13.1+cu117 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: yes
  • Using distributed or parallel set-up in script?: no

Who can help?

@gante @NielsRogge

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

  1. Start with a clean env set up via https://github.com/salesforce/LAVIS/blob/main/requirements.txt (this installs transformers 4.26.1)
  2. Run python test_simple.py; the model loads correctly and prints a caption
  3. Run pip install --upgrade git+https://github.com/huggingface/transformers (I wanted the shiny new BLIP-2 conversion script so I can convert my finetuned model to the HF format)
  4. This resolved https://github.com/huggingface/transformers to commit 8b3db33a763ccef828fca89bac7e6cbff314f131
  5. Run python test_simple.py again
  6. RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 25 but got size 5 for tensor number 1 in the list.
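
test_simple.py (the script run in steps 2 and 5):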
import requests
import torch
from PIL import Image

from lavis.models import load_model_and_preprocess

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the finetuned COCO captioning checkpoint through LAVIS.
model, vis_processors, _ = load_model_and_preprocess(
    name="blip2_opt", model_type="caption_coco_opt6.7b", is_eval=True, device=device
)

url = "..."
raw_image = Image.open(requests.get(url, stream=True).raw).convert("RGB")
image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)

# Fails on transformers 4.27.0.dev0 with the RuntimeError from step 6.
data = model.generate({"image": image})
print(data)
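
For what it's worth, LAVIS's captioning generate() defaults to num_beams=5, and 25 = 5 × 5, so the mismatch looks like a beam-expanded image attention mask being concatenated with a text mask that was expanded a second time; that would point at a change in how generate() expands its inputs between 4.26.1 and 4.27.0.dev0. This is only a guess from the shapes, though.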

Expected behavior

BLIP-2 models loaded through salesforce-lavis should keep working with the latest HF transformers.
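
For reference, the HF-native BLIP-2 API that landed in 4.27 can produce a caption directly. Below is a minimal sketch, assuming the public Salesforce/blip2-opt-2.7b checkpoint as a stand-in for the finetuned caption_coco_opt6.7b weights (which would first have to go through the conversion script mentioned in step 3):

import requests
import torch
from PIL import Image

from transformers import Blip2ForConditionalGeneration, Blip2Processor

device = "cuda" if torch.cuda.is_available() else "cpu"

# Public checkpoint used as a stand-in for the finetuned model.
processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-opt-2.7b").to(device)

url = "..."  # any test image URL
raw_image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

# The processor turns the PIL image into pixel_values for the vision encoder.
inputs = processor(images=raw_image, return_tensors="pt").to(device)
out = model.generate(**inputs)
print(processor.batch_decode(out, skip_special_tokens=True)[0].strip())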
