
Precision does not work when requesting token_embeddings #2882


Description

@kacperlukawski

Hi! I've been experimenting with token embeddings recently and found that I cannot reduce the precision when requesting them. The model below is only an example; any SentenceTransformer model reproduces the problem.

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # example model; any model shows the issue

emb = model.encode(
    [
        "Hello World, a test sentence",
        "Here comes another sentence",
        "My final sentence",
    ],
    output_value="token_embeddings",
    precision="uint8",
)

The quantize_embeddings function raises a ValueError because the token embedding arrays have one row per token, so sentences of different lengths produce arrays of different shapes that cannot be combined into a single NumPy array. Here is part of the stack trace:

  File "/home/kacper/.cache/pypoetry/virtualenvs/beir-qdrant-wY0sLiQM-py3.10/lib/python3.10/site-packages/sentence_transformers/SentenceTransformer.py", line 553, in encode
    all_embeddings = quantize_embeddings(all_embeddings, precision=precision)
  File "/home/kacper/.cache/pypoetry/virtualenvs/beir-qdrant-wY0sLiQM-py3.10/lib/python3.10/site-packages/sentence_transformers/quantization.py", line 400, in quantize_embeddings
    embeddings = np.array(embeddings)
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (4,) + inhomogeneous part.
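The failure can be reproduced with plain NumPy (a minimal sketch; the token counts and the 384-dimensional size are made up for illustration, and it assumes a recent NumPy that rejects ragged inputs):

import numpy as np

# One row per token: sentences of different lengths yield arrays with
# different first dimensions.
a = np.zeros((7, 384))  # 7 tokens
b = np.zeros((5, 384))  # 5 tokens

# np.array() cannot combine them into one homogeneous array and raises
# "ValueError: setting an array element with a sequence. ..."
np.array([a, b])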

@tomaarsen I'm happy to provide a PR fixing this.
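Until this is fixed, one possible workaround is to quantize each sentence's token-embedding matrix separately (a minimal sketch, not the library's documented path for this case; the model name is only an example):

from sentence_transformers import SentenceTransformer
from sentence_transformers.quantization import quantize_embeddings

model = SentenceTransformer("all-MiniLM-L6-v2")  # example model

token_embeddings = model.encode(
    [
        "Hello World, a test sentence",
        "Here comes another sentence",
        "My final sentence",
    ],
    output_value="token_embeddings",  # one (num_tokens, dim) matrix per sentence
)

# Quantize each matrix on its own; they cannot be stacked because the
# token counts differ between sentences.
quantized = [quantize_embeddings(emb, precision="uint8") for emb in token_embeddings]
for q in quantized:
    print(q.shape, q.dtype)

Note that without shared calibration ranges, each call presumably derives the uint8 ranges from that sentence's tokens alone, so the quantization scales differ between sentences; a proper fix inside quantize_embeddings would need to handle the ragged input (or shared ranges) directly.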
