
[Question] distiluse-base-multilingual-cased-v2 - wrong vector dimension (768 vs 512) in onnx version? #230

Closed
do-me opened this issue Jul 30, 2023 · 3 comments · Fixed by #545
Labels
question Further information is requested

Comments

@do-me (Contributor) commented Jul 30, 2023

I was just playing around with the model distiluse-base-multilingual-cased-v2 and noticed that your onnx versions both (quantized and normal) produce embeddings with 768-dimensional vectors instead of 512.

Example:

index.html

<!DOCTYPE html>
<html>
  <head>
    <title>Transformers.js Example</title>
  </head>
  <body>
    <h1>Transformers.js Example</h1>
    <script type="module" src="main.js"></script>
  </body>
</html>

main.js

import { pipeline } from 'https://cdn.jsdelivr.net/npm/@xenova/transformers';

async function allocatePipeline() {
  let pipe = await pipeline("feature-extraction",
                             "Xenova/distiluse-base-multilingual-cased-v2");
  let out = await pipe("test", { pooling: 'mean', normalize: true });
  console.log(out);
}
allocatePipeline();

That gives me

Proxy(s) {dims: Array(2), type: 'float32', data: Float32Array(768), size: 768}

However, the model page states

This is a sentence-transformers model: It maps sentences & paragraphs to a 512 dimensional dense vector space and can be used for tasks like clustering or semantic search.

Also, I used the Python package

from sentence_transformers import SentenceTransformer
model = SentenceTransformer('sentence-transformers/distiluse-base-multilingual-cased-v2')
model.encode("test") 

which gives me a correct 512-dimensional embedding.

Am I missing some option here or overlooking the obvious?

@do-me added the question label on Jul 30, 2023
@xenova (Collaborator) commented Jul 30, 2023

Here's the model architecture according to their README:

SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: DistilBertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False})
  (2): Dense({'in_features': 768, 'out_features': 512, 'bias': True, 'activation_function': 'torch.nn.modules.activation.Tanh'})
)

It appears they store the final "dense" layer in a separate folder (https://huggingface.co/sentence-transformers/distiluse-base-multilingual-cased-v2/tree/main/2_Dense), and the ONNX model you're loading was converted only from the pytorch_model.bin in the root directory.

If you use the HF transformers Python library (not sbert), you should also get 768 dimensions, simply because it doesn't know the final dense layer exists.

Regarding a fix: you could perhaps convert the dense layer to ONNX, then use another AutoModel and pass the transformer's outputs through it (after pooling/normalisation).
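For reference, the missing step is small: per the architecture above, the 2_Dense module is just Linear(768 → 512) followed by Tanh. A minimal numpy sketch of applying it to a mean-pooled 768-dimensional output; the weights here are random placeholders (the real ones live in 2_Dense/pytorch_model.bin), so this only illustrates the shape transformation:

```python
import numpy as np

# Hypothetical stand-in for the sentence-transformers "2_Dense" module:
# Linear(in_features=768, out_features=512) followed by Tanh.
# Random placeholder weights; the real ones are in 2_Dense/pytorch_model.bin.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.02, size=(512, 768))
b = np.zeros(512)

def dense_tanh(pooled):
    """Project a mean-pooled 768-d embedding down to 512 dimensions."""
    return np.tanh(W @ pooled + b)

pooled = rng.normal(size=768)            # stand-in for the pooled transformer output
sentence_embedding = dense_tanh(pooled)
print(sentence_embedding.shape)          # (512,)
```

With the real weights loaded in place of `W` and `b`, this reproduces the 512-dimensional output sbert gives.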

@do-me (Contributor, Author) commented Jul 31, 2023

Thanks for your answer!

You're right about the transformers library in Python; it returns a 768-dimensional vector too.

from transformers import AutoModel, AutoTokenizer

model_name = "sentence-transformers/distiluse-base-multilingual-cased-v2"
model = AutoModel.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
outputs = model(**inputs)

# first token ([CLS]) of the last hidden state
sentence_embedding = outputs[0][0][0]

len(sentence_embedding)
# 768

or simply

from transformers import pipeline
pipe = pipeline('feature-extraction', model="sentence-transformers/distiluse-base-multilingual-cased-v2")
out = pipe('I love transformers!')
len(out[0][0])
# 768

where the first token's vector ([CLS]) should be the sentence embedding (afaik) according to the BERT paper (right?).
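As a side note, per the architecture quoted above, this model's Pooling module uses mean pooling (pooling_mode_mean_tokens: True) rather than the [CLS] token. A minimal sketch of mask-aware mean pooling with made-up token embeddings:

```python
import numpy as np

# Mask-aware mean pooling, matching Pooling({'pooling_mode_mean_tokens': True}).
# token_embeddings and attention_mask are placeholders for real model outputs.
rng = np.random.default_rng(0)
token_embeddings = rng.normal(size=(6, 768))   # (seq_len, hidden_dim)
attention_mask = np.array([1, 1, 1, 1, 0, 0])  # last two positions are padding

# zero out the padded positions, then average over the real tokens only
masked = token_embeddings * attention_mask[:, None]
mean_pooled = masked.sum(axis=0) / attention_mask.sum()
print(mean_pooled.shape)  # (768,)
```

This 768-d mean-pooled vector is what then gets fed into the dense 768 → 512 projection.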

I suppose the dense layer in sentence-transformers models serves only to shorten the vectors and save memory. It's certainly a nice banana skin to slip on. :D

Are you aware of any way to add the dense layer to the ONNX model so I could create it once for my purpose? I want to avoid loading two models and piping data around.

Also (for anyone reading this in the future), I am not aware of any parameter to ignore the dense layer in sentence transformer models.

@xenova (Collaborator) commented Aug 1, 2023

Are you aware of any way to add the dense layer to the ONNX model so I could create it once for my purpose? I want to avoid loading two models and piping data around.

Maybe @fxmarty or @michaelbenayoun can help with this? It most likely will require some custom config.
