@vasqu commented Jul 22, 2025

Continuation of #39228 for the VL models

Current inference script for testing (torch 2.6):

import requests
from PIL import Image

from transformers import AutoModelForImageTextToText, AutoProcessor


use_fast = False
model_path = "/raid/anton/code/forks/transformers/src/transformers/models/ernie4_5_vl/AntonV/ErnieVL"

processor_kwargs = {} if not use_fast else {"use_fast": True}
processor = AutoProcessor.from_pretrained(model_path, **processor_kwargs)

model = AutoModelForImageTextToText.from_pretrained(
    model_path,
    device_map="auto",
    dtype="auto",
)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Only use English during your responses and describe the following image."},
            {"type": "image"},
        ]
    },
]
# Render the chat messages to a prompt string; the image itself is passed to the processor below.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image = Image.open(requests.get("https://paddlenlp.bj.bcebos.com/datasets/paddlemix/demo_images/example1.jpg", stream=True).raw)
# Single-sample variant:
# inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)
# Batched variant: the same prompt and image twice.
inputs = processor(text=[text, text], images=[image, image], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=False,
)
# Decode only the newly generated tokens by slicing off the prompt tokens of each sample.
print(processor.decode(generated_ids[0][len(inputs['input_ids'][0]):]))
print(processor.decode(generated_ids[1][len(inputs['input_ids'][1]):]))

Output:
The image features a person sitting on a hilltop, gazing out at a vast mountain range. The person is wrapped in a colorful, striped blanket, and their head is covered with a red headscarf. The foreground includes vibrant pink flowers, adding a pop of color to the scene. The background show
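
The two print calls in the script slice each generated sequence at the prompt length so that only the newly generated tokens get decoded. Below is a minimal sketch of that slicing on plain Python lists, assuming every prompt row in the batch has the same (padded) length, as is the case for the duplicated sample here. The helper name `strip_prompt` and the toy token ids are hypothetical and not part of this PR.

```python
from typing import List, Sequence


def strip_prompt(generated_ids: Sequence[Sequence[int]], prompt_len: int) -> List[List[int]]:
    """Drop the first `prompt_len` token ids from every row, keeping only new tokens."""
    return [list(row[prompt_len:]) for row in generated_ids]


# Toy stand-ins for `inputs["input_ids"]` and `generated_ids` (hypothetical values).
prompt_ids = [[101, 7, 8, 9], [101, 7, 8, 9]]
generated = [[101, 7, 8, 9, 42, 43], [101, 7, 8, 9, 42, 44]]

# Every prompt row has the same length, so one cut-off works for the whole batch.
new_tokens = strip_prompt(generated, prompt_len=len(prompt_ids[0]))
print(new_tokens)
```

With real tensors the same idea would likely be written as `generated_ids[:, inputs["input_ids"].shape[1]:]`, which avoids indexing each sample separately.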

@github-actions
[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, ernie4_5_moe, ernie4_5_vl
