python sentence transformer all-MiniLM-L6-v2 is almost 2x faster than candle #2418

@AbhishekBose

Here is my candle implementation (taken from the examples):

```rust
pub fn encode(&self, prompt: &str) -> Result<(Tensor, Tensor)> {
    // Tokenize the prompt (with special tokens) and collect the token ids.
    let tokens = self
        .tokenizer
        .encode(prompt, true)
        .map_err(E::msg)?
        .get_ids()
        .to_vec();
    // Add a batch dimension of 1; token type ids are all zeros for a single segment.
    let token_ids = Tensor::new(&tokens[..], &self.device)?.unsqueeze(0)?;
    let token_type_ids = token_ids.zeros_like()?;
    let embeddings = self.model.forward(&token_ids, &token_type_ids)?;
    Ok((embeddings, token_ids))
}
```

And here is the Python implementation:

    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
    embeddings = model.encode([request.query])

Comparing just the encoding times, this is what I get:

Encoding time for rust:

Time taken for encoding: 57.054333ms
Time taken for encoding: 59.913916ms
Time taken for encoding: 55.118625ms
Time taken for encoding: 51.580917ms
Time taken for encoding: 60.823625ms
Time taken for encoding: 56.318333ms
Time taken for encoding: 52.357875ms
Time taken for encoding: 82.0645ms
Time taken for encoding: 52.349709ms
Time taken for encoding: 63.768209ms
Time taken for encoding: 55.508666ms

Encoding time for python:

Time taken for encoding: 33.95 ms
Time taken for encoding: 124.68 ms
Time taken for encoding: 54.5 ms
Time taken for encoding: 30.46 ms
Time taken for encoding: 20.73 ms
Time taken for encoding: 26.07 ms
Time taken for encoding: 37.49 ms
Time taken for encoding: 24.42 ms
Time taken for encoding: 36.08 ms
Time taken for encoding: 24.55 ms
Time taken for encoding: 36.13 ms
Time taken for encoding: 29.97 ms
Time taken for encoding: 35.69 ms
Time taken for encoding: 26.8 ms
Time taken for encoding: 31.32 ms
Time taken for encoding: 30.12 ms
Time taken for encoding: 32.37 ms
Time taken for encoding: 34.27 ms
Time taken for encoding: 31.85 ms
Time taken for encoding: 35.78 ms
Time taken for encoding: 44.09 ms
Time taken for encoding: 19.15 ms
Time taken for encoding: 23.83 ms
Time taken for encoding: 33.09 ms
Time taken for encoding: 31.65 ms
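
The Python sample contains one large outlier (124.68 ms), so the median is a fairer single-number summary than the mean. The sketch below computes the median of both samples exactly as listed above; the `median` helper and the constant names are illustrative and not part of candle or sentence-transformers.

```rust
/// Median of a sample; sorts a copy so the caller's slice is left unchanged.
fn median(samples: &[f64]) -> f64 {
    assert!(!samples.is_empty(), "median of an empty sample is undefined");
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.partial_cmp(b).expect("timings must not be NaN"));
    let mid = sorted.len() / 2;
    if sorted.len() % 2 == 1 {
        sorted[mid]
    } else {
        (sorted[mid - 1] + sorted[mid]) / 2.0
    }
}

// Timings in milliseconds, copied from the two lists above.
const RUST_MS: [f64; 11] = [
    57.054333, 59.913916, 55.118625, 51.580917, 60.823625, 56.318333,
    52.357875, 82.0645, 52.349709, 63.768209, 55.508666,
];
const PYTHON_MS: [f64; 25] = [
    33.95, 124.68, 54.5, 30.46, 20.73, 26.07, 37.49, 24.42, 36.08,
    24.55, 36.13, 29.97, 35.69, 26.8, 31.32, 30.12, 32.37, 34.27,
    31.85, 35.78, 44.09, 19.15, 23.83, 33.09, 31.65,
];

fn main() {
    let rust = median(&RUST_MS);
    let python = median(&PYTHON_MS);
    println!("candle median: {rust:.2} ms");
    println!("python median: {python:.2} ms");
    println!("ratio: {:.2}x", rust / python);
}
```

On these numbers the medians are about 56.32 ms for candle and 31.85 ms for Python, a ratio of roughly 1.77x.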

Is this an issue with the candle implementation, or am I doing something wrong here?

The experiment was run on an M1 MacBook Air.
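
The timing code itself is not shown above. As a rough sketch of how per-call numbers like these could be collected on the Rust side, the helper below discards a few warm-up calls and then times each remaining call with `std::time::Instant`. The `time_calls` name, the warm-up and run counts, and the stand-in workload are all assumptions for illustration; in the real comparison the closure would wrap the `encode` call, and the binary would be built with `cargo run --release`, since debug builds are typically much slower.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Calls `f` `warmup` times without timing, then times `runs` further calls.
/// Hypothetical helper, not part of candle.
fn time_calls<T>(mut f: impl FnMut() -> T, warmup: usize, runs: usize) -> Vec<Duration> {
    for _ in 0..warmup {
        // black_box keeps the optimizer from discarding the unused result.
        black_box(f());
    }
    (0..runs)
        .map(|_| {
            let start = Instant::now();
            let out = f();
            let elapsed = start.elapsed();
            black_box(out);
            elapsed
        })
        .collect()
}

fn main() {
    // Stand-in workload; replace with the call under test.
    let work = || (0..100_000u64).map(|x| x.wrapping_mul(x)).sum::<u64>();
    for d in time_calls(work, 3, 5) {
        println!("Time taken for encoding: {:?}", d);
    }
}
```

Discarding warm-up calls keeps one-off costs such as lazy initialization out of the reported times, which matters when individual calls take only tens of milliseconds.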
