Img2txt result is pretty bad on 16bit #114

Open
billzhao9 opened this issue Sep 25, 2023 · 4 comments
Comments

@billzhao9

I don't know if I did anything wrong, but the result does not look like the example.

import os
from core.models.model_module_infer import model_module

model_load_paths = ['CoDi_encoders.pth', 'CoDi_text_diffuser.pth', 'CoDi_audio_diffuser_m.pth', 'CoDi_video_diffuser_8frames.pth']
inference_tester = model_module(data_dir='checkpoints/', pth=model_load_paths, fp16=True) # turn on fp16=True if loading fp16 weights
inference_tester = inference_tester.cuda()
inference_tester = inference_tester.eval()

from PIL import Image
im = Image.open('./assets/demo_files/house.jpeg').resize((224,224))
im
text = inference_tester.inference(
    xtype=['text'],
    condition=[im],
    condition_types=['image'],
    n_samples=4,
    ddim_steps=50,
    scale=7.5,
)
text[0]
Data shape for DDIM sampling is [[4, 768]], eta 0.0
DDIM Sampler: 100%|██████████| 50/50 [00:01<00:00, 29.62it/s]
['oriental book examines a bag with bright spots.',
'a street view of kitchen toys and tv.',
'a is also carrying a ton of blue and green spandex around and pedals.',
'woman and a white bird crossing in the field.']

The sample image is a house, but none of the generated captions describe it.

@zinengtang
Collaborator

What is your transformers version? Did you install from requirements.txt?

@billzhao9
Author

Yes, I have all the modules listed in requirements.txt installed. However, my transformers version is 4.33.2. Could that affect the result?

@zinengtang
Collaborator

I think so. The transformers version in requirements.txt is 4.26.0. A newer version can cause mismatches with the code.
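
A quick way to confirm whether the installed version matches the pin is a small check like the one below. This is only a sketch: the pinned value 4.26.0 is taken from the comment above rather than read from requirements.txt, and the helper names `installed_version` and `matches_pin` are made up for illustration.

```python
from importlib import metadata

# Version quoted in this thread as the requirements.txt pin (assumption: not re-read from the file).
PINNED_TRANSFORMERS = "4.26.0"


def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches_pin(installed, pinned):
    """True only when the installed version string equals the pinned one exactly."""
    return installed is not None and installed.strip() == pinned.strip()


if __name__ == "__main__":
    found = installed_version("transformers")
    if matches_pin(found, PINNED_TRANSFORMERS):
        print(f"transformers {found} matches the pinned version")
    else:
        print(
            f"transformers is {found}, expected {PINNED_TRANSFORMERS}; "
            f"try: pip install transformers=={PINNED_TRANSFORMERS}"
        )
```

If the versions differ, reinstalling the pinned version in a fresh environment and rerunning the snippet from the first post would show whether the mismatch explains the captions.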

@jacklishufan

@billzhao9 I might have a similar issue with the image encoder. Were you able to fix it by pinning the transformers version?


3 participants