Add CLIP score #1311
Comments
Hi! Thanks for your contribution, great first issue!
Hi @chinoll, this sounds like a nice addition as the first multimodal metric. Do you have a reference implementation you could point to?
Here is at least one reference implementation:
A simple PyTorch implementation (reference: CLIP-score-vs-FID-pareto-curves):

from transformers import CLIPModel, CLIPTokenizer, CLIPFeatureExtractor
import torch
import PIL.Image

version = "openai/clip-vit-large-patch14"
tokenizer = CLIPTokenizer.from_pretrained(version)
model = CLIPModel.from_pretrained(version)
feature_extractor = CLIPFeatureExtractor.from_pretrained(version)

def clip_score(text: str, image: PIL.Image.Image) -> torch.Tensor:
    with torch.no_grad():
        # Embed the caption and the image with the same CLIP model
        txt_features = model.get_text_features(tokenizer(text, return_tensors="pt")["input_ids"])
        img_features = model.get_image_features(feature_extractor(image, return_tensors="pt")["pixel_values"])
        # L2-normalize both embeddings and return their cosine similarity
        img_features, txt_features = [
            x / torch.linalg.norm(x, dim=-1, keepdim=True)
            for x in [img_features, txt_features]
        ]
        return (img_features * txt_features).sum(dim=-1)
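For instance, scoring a generated image against its prompt could look like the following (the file name and caption are just placeholders):

from PIL import Image

# Placeholder inputs: any RGB image and a caption describing it
image = Image.open("generated_sample.png").convert("RGB")
score = clip_score("a photograph of an astronaut riding a horse", image)
print(score)  # one-element tensor; higher similarity = better text-image agreement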
I started the work of adding the metric in #1314
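For anyone following along, here is a rough sketch of how such a score could be wrapped in the usual torchmetrics update/compute pattern. The class name and the update signature are illustrative only and not the API proposed in #1314:

import torch
from torchmetrics import Metric

class CLIPScoreSketch(Metric):
    # Illustrative only: accumulates per-sample scores and averages them on compute()
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.add_state("score_sum", default=torch.tensor(0.0), dist_reduce_fx="sum")
        self.add_state("n_samples", default=torch.tensor(0), dist_reduce_fx="sum")

    def update(self, scores: torch.Tensor) -> None:
        # `scores` is a batch of per-sample image-text similarities,
        # e.g. produced by the clip_score function above
        self.score_sum += scores.sum()
        self.n_samples += scores.numel()

    def compute(self) -> torch.Tensor:
        return self.score_sum / self.n_samples

The real metric would presumably accept images and captions directly and run the CLIP model internally; this sketch only shows the accumulation pattern.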
🚀 Feature
Calculate the correlation (similarity) between an image and its text description.
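For reference, the formulation from the CLIPScore paper (Hessel et al., 2021) rescales the cosine similarity between the CLIP image embedding $E_I$ and text embedding $E_C$ and clips it at zero:

$$\mathrm{CLIPScore}(I, C) = w \cdot \max\big(\cos(E_I, E_C),\ 0\big)$$

with $w = 2.5$ in the paper (some implementations use $w = 100$).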
Motivation
Evaluating the performance of text-to-image models.
Pitch
PyTorch-like pseudocode (see the reference implementation posted in the comments above).
Alternatives
Additional context
CLIP score