Img2txt result is pretty bad on 16bit #114

billzhao9 · 2023-09-25T19:21:29Z

Don't know if I did anything wrong, but the result was not like the example.

import os
from core.models.model_module_infer import model_module

model_load_paths = ['CoDi_encoders.pth', 'CoDi_text_diffuser.pth', 'CoDi_audio_diffuser_m.pth', 'CoDi_video_diffuser_8frames.pth']
inference_tester = model_module(data_dir='checkpoints/', pth=model_load_paths, fp16=True) # turn on fp16=True if loading fp16 weights
inference_tester = inference_tester.cuda()
inference_tester = inference_tester.eval()

from PIL import Image
im = Image.open('./assets/demo_files/house.jpeg').resize((224,224))
im
text = inference_tester.inference(
    xtype=['text'],
    condition=[im],
    condition_types=['image'],
    n_samples=4,
    ddim_steps=50,
    scale=7.5,
)
text[0]
Data shape for DDIM sampling is [[4, 768]], eta 0.0
DDIM Sampler: 100%|██████████| 50/50 [00:01<00:00, 29.62it/s]
['oriental book examines a bag with bright spots.',
'a street view of kitchen toys and tv.',
'a is also carrying a ton of blue and green spandex around and pedals.',
'woman and a white bird crossing in the field.']

The sample image is a house, but the generated captions have nothing to do with it.

zinengtang · 2023-09-25T21:40:30Z

What is your transformers version? Did you install from requirements.txt?

billzhao9 · 2023-09-25T22:00:20Z

Yes, I have all the modules listed in requirements.txt installed. However, my transformers version is 4.33.2. Could that affect the result?

zinengtang · 2023-09-26T03:10:19Z

I think so. The transformers version in requirements.txt is 4.26.0. A newer version can cause mismatches with the code.
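
A quick way to confirm which version is actually installed in the environment is sketched below. This is only an illustration: the helper name `check_pinned_version` is made up for this example and is not part of CoDi, and the pinned value 4.26.0 is taken from the comment above.

```python
from importlib.metadata import PackageNotFoundError, version


def check_pinned_version(package: str, expected: str) -> str:
    """Compare the installed version of `package` with the pinned one.

    Hypothetical helper for illustration; it is not part of the CoDi repo.
    """
    try:
        installed = version(package)
    except PackageNotFoundError:
        return f"{package} is not installed; expected {expected}"
    if installed == expected:
        return f"{package} {installed} matches the pinned version"
    return (
        f"{package} {installed} differs from pinned {expected}; "
        f"try: pip install {package}=={expected}"
    )


# 4.26.0 is the version stated for requirements.txt in this thread.
print(check_pinned_version("transformers", "4.26.0"))
```

If the versions differ, reinstalling the pinned version in a fresh environment is the simplest way to test whether the mismatch explains the bad captions.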

jacklishufan · 2023-10-26T06:08:13Z

@billzhao9 I might have a similar issue with the image encoder. Were you able to fix it by pinning the transformers version?
