I checked the prediction code and the paper. It seems you added the quantized image tokens to the pretrained language tokenizer. In other papers, some people keep separate tokenizers for language and images, and the image features are concatenated with the language embeddings through a linear layer. Have you tried this method?
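For reference, here is a minimal sketch of the projection-based alternative mentioned above: continuous image features from a vision encoder are mapped into the LLM's embedding space by a linear layer and then concatenated with the text embeddings. All names and dimensions (`vision_dim`, `llm_dim`, the encoder and LLM) are illustrative assumptions, not from this repo.

```python
import torch
import torch.nn as nn

class LinearProjector(nn.Module):
    """Maps vision-encoder features into the LLM embedding space."""

    def __init__(self, vision_dim: int, llm_dim: int):
        super().__init__()
        self.proj = nn.Linear(vision_dim, llm_dim)

    def forward(self, image_features: torch.Tensor) -> torch.Tensor:
        # image_features: (batch, num_patches, vision_dim)
        return self.proj(image_features)  # (batch, num_patches, llm_dim)

# Usage sketch (vision_encoder, llm, ids are assumed to exist):
# img_feats  = vision_encoder(pixels)             # (B, N, vision_dim)
# img_embeds = projector(img_feats)               # (B, N, llm_dim)
# txt_embeds = llm.get_input_embeddings()(ids)    # (B, T, llm_dim)
# inputs     = torch.cat([img_embeds, txt_embeds], dim=1)
# out        = llm(inputs_embeds=inputs)
```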
We added the quantized image tokens to the pretrained language tokenizer to unify the representation of image and text tokens, and the LLM is trained to optimize the visual embeddings. We did not try the latter method.
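As a rough illustration of this unified-vocabulary approach (not this repo's actual code), quantized image codes can be registered as ordinary tokens so the LLM learns their embeddings jointly with text. The codebook size, base model, and token naming below are assumptions:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

codebook_size = 8192  # assumed VQ codebook size
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Register one new token per VQ code, e.g. "<img_0>" ... "<img_8191>".
image_tokens = [f"<img_{i}>" for i in range(codebook_size)]
tokenizer.add_tokens(image_tokens)

# Grow the embedding matrix (and tied LM head) so the new visual tokens
# get trainable embedding rows, optimized during training like text tokens.
model.resize_token_embeddings(len(tokenizer))

# An image is then quantized by the VQ encoder into code indices and
# rendered as a token string interleaved with text, e.g.:
# "<img_412><img_7><img_1033> ... describe this image:"
```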