
[Question] distiluse-base-multilingual-cased-v2 - wrong vector dimension (768 vs 512) in onnx version? #230

Closed
do-me opened this issue Jul 30, 2023 · 3 comments · Fixed by #545
Labels
question Further information is requested

Comments

@do-me (Contributor) commented Jul 30, 2023

I was just playing around with the model distiluse-base-multilingual-cased-v2 and noticed that your onnx versions both (quantized and normal) produce embeddings with 768-dimensional vectors instead of 512.

Example:

index.html

<!DOCTYPE html>
<html>
  <head>
    <title>Transformers.js Example</title>
  </head>
  <body>
    <h1>Transformers.js Example</h1>
    <script type="module" src="main.js"></script>
  </body>
</html>

main.js

import { pipeline } from 'https://cdn.jsdelivr.net/npm/@xenova/transformers';

async function allocatePipeline() {
  let pipe = await pipeline("feature-extraction",
                             "Xenova/distiluse-base-multilingual-cased-v2");
  let out = await pipe("test", { pooling: 'mean', normalize: true });
  console.log(out);
}
allocatePipeline();

That gives me

Proxy(s) {dims: Array(2), type: 'float32', data: Float32Array(768), size: 768}

However, the model page states

This is a sentence-transformers model: It maps sentences & paragraphs to a 512 dimensional dense vector space and can be used for tasks like clustering or semantic search.

Also, I used the Python package

from sentence_transformers import SentenceTransformer
model = SentenceTransformer('sentence-transformers/distiluse-base-multilingual-cased-v2')
model.encode("test") 

which gives me a correct 512-dimensional embedding.

Am I missing some option here or overlooking the obvious?

@do-me added the question label on Jul 30, 2023
@xenova (Collaborator) commented Jul 30, 2023

Here's the model architecture according to their README:

SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: DistilBertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False})
  (2): Dense({'in_features': 768, 'out_features': 512, 'bias': True, 'activation_function': 'torch.nn.modules.activation.Tanh'})
)

It appears they store the final "dense" layer in a separate folder (https://huggingface.co/sentence-transformers/distiluse-base-multilingual-cased-v2/tree/main/2_Dense), and the ONNX model you're loading was converted only from the pytorch_model.bin in the root directory.

If you use the HF transformers Python library (not sbert), you should also get 768 dimensions, simply because it doesn't know the final dense layer exists.

Regarding a fix: you could perhaps convert the dense layer to ONNX, then use another AutoModel and pass the transformer's outputs through it (after pooling/normalisation).
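For reference, the missing step is small: per the architecture above, the 2_Dense module is just Linear(768 → 512) followed by Tanh. A minimal numpy sketch of applying it to a mean-pooled 768-dimensional output; the weights here are random placeholders (the real ones live in 2_Dense/pytorch_model.bin), so this only illustrates the shape transformation:

```python
import numpy as np

# Hypothetical stand-in for the sentence-transformers "2_Dense" module:
# Linear(in_features=768, out_features=512) followed by Tanh.
# Random placeholder weights; the real ones are in 2_Dense/pytorch_model.bin.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.02, size=(512, 768))
b = np.zeros(512)

def dense_tanh(pooled):
    """Project a mean-pooled 768-d embedding down to 512 dimensions."""
    return np.tanh(W @ pooled + b)

pooled = rng.normal(size=768)            # stand-in for the pooled transformer output
sentence_embedding = dense_tanh(pooled)
print(sentence_embedding.shape)          # (512,)
```

With the real weights loaded in place of `W` and `b`, this reproduces the 512-dimensional output sbert gives.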

@do-me (Contributor, Author) commented Jul 31, 2023

Thanks for your answer!

You're right about the transformers library in Python; it returns a 768-dimensional vector too.

from transformers import AutoModel, AutoTokenizer

model_name = "sentence-transformers/distiluse-base-multilingual-cased-v2"
model = AutoModel.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
outputs = model(**inputs)

# first token ([CLS]) of the last hidden state
sentence_embedding = outputs[0][0][0]

len(sentence_embedding)
# 768

or simply

from transformers import pipeline
pipe = pipeline('feature-extraction', model="sentence-transformers/distiluse-base-multilingual-cased-v2")
out = pipe('I love transformers!')
len(out[0][0])
# 768

where the first token's vector ([CLS]) should be the sentence embedding (afaik) according to the BERT paper (right?).
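As a side note, per the architecture quoted above, this model's Pooling module uses mean pooling (pooling_mode_mean_tokens: True) rather than the [CLS] token. A minimal sketch of mask-aware mean pooling with made-up token embeddings:

```python
import numpy as np

# Mask-aware mean pooling, matching Pooling({'pooling_mode_mean_tokens': True}).
# token_embeddings and attention_mask are placeholders for real model outputs.
rng = np.random.default_rng(0)
token_embeddings = rng.normal(size=(6, 768))   # (seq_len, hidden_dim)
attention_mask = np.array([1, 1, 1, 1, 0, 0])  # last two positions are padding

# zero out the padded positions, then average over the real tokens only
masked = token_embeddings * attention_mask[:, None]
mean_pooled = masked.sum(axis=0) / attention_mask.sum()
print(mean_pooled.shape)  # (768,)
```

This 768-d mean-pooled vector is what then gets fed into the dense 768 → 512 projection.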

I suppose the dense layer in sentence-transformers models serves only to shorten the vectors and save memory. It's certainly a nice banana skin to slip on. :D

Are you aware of any way to add the dense layer to the ONNX model so I could create it once for my purpose? I want to avoid loading two models and piping data around.

Also (for anyone reading this in the future), I am not aware of any parameter to ignore the dense layer in sentence transformer models.

@xenova (Collaborator) commented Aug 1, 2023

Are you aware of any way to add the dense layer to the ONNX model so I could create it once for my purpose? I want to avoid loading two models and piping data around.

Maybe @fxmarty or @michaelbenayoun can help with this? It most likely will require some custom config.
