Inference Script #10

jia11112727 · 2025-01-20T04:21:07Z

Hello, and thank you for the great work! I ran into an issue while using Pangea-7B-hf, the fine-tuned multilingual model. The model is multimodal, but I only need text input and do not need the image functionality. When I run the model, I get an error saying that image input is required, even though I do not intend to pass an image. Is there a way to disable image input and use only the text functionality? Below is the code template I am using:

# Assuming that you have text_input and image_path

from transformers import LlavaNextForConditionalGeneration, AutoProcessor
import torch
from PIL import Image

image_input = Image.open(image_path)

model = LlavaNextForConditionalGeneration.from_pretrained(
    "neulab/Pangea-7B-hf",
    torch_dtype=torch.float16
).to(0)
processor = AutoProcessor.from_pretrained("neulab/Pangea-7B-hf")
model.resize_token_embeddings(len(processor.tokenizer))

text_input = f"<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\n<image>\n{text_input}<|im_end|>\n<|im_start|>assistant\n"
model_inputs = processor(images=image_input, text=text_input, return_tensors='pt').to("cuda", torch.float16)
output = model.generate(**model_inputs, max_new_tokens=1024, min_new_tokens=32, temperature=1.0, top_p=0.9, do_sample=True)
output = output[0]
result = processor.decode(output, skip_special_tokens=True, clean_up_tokenization_spaces=False)

print(result)
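
For comparison, here is an untested sketch of what a text-only variant of this template might look like. It assumes that the model's `generate` accepts inputs without `pixel_values`, and that leaving the `<image>` placeholder used by LLaVA-style prompts out of the prompt is enough to skip the vision path. I have not confirmed either assumption for Pangea-7B-hf. The helper names `build_chat_prompt` and `generate_text_only` are hypothetical and are not part of the repository.

```python
def build_chat_prompt(user_text, with_image=False,
                      system="You are a helpful assistant."):
    """Build the chat prompt from the template, optionally without an image slot.

    Hypothetical helper: the <image> placeholder is only emitted when
    with_image=True, so a text-only prompt contains no image token.
    """
    image_part = "<image>\n" if with_image else ""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{image_part}{user_text}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )


def generate_text_only(model, processor, user_text, max_new_tokens=256):
    """Run generation from text alone (assumes the model tolerates no pixel_values)."""
    prompt = build_chat_prompt(user_text, with_image=False)
    # Tokenize with the processor's tokenizer only, so no image is requested.
    inputs = processor.tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens and decode only the newly generated part.
    new_tokens = output[0][inputs["input_ids"].shape[1]:]
    return processor.tokenizer.decode(new_tokens, skip_special_tokens=True)
```

With `model` and `processor` loaded as in the template, the call would be `generate_text_only(model, processor, "Translate 'hello' into French.")`.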
