
Conversation

@iamlemec (Contributor) commented Mar 8, 2024

Due to updates in ggml-org/llama.cpp#5796, sequence-level embeddings are now output through a separate channel from token-level embeddings, and they are accessed with llama_get_embeddings_seq.
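
For reference, a minimal sketch of how the new binding might be called from the low-level API. This is not code from the PR: it assumes `ctx` and `model` handles created earlier with embeddings enabled, and that a batch tagged with sequence id 0 has already been decoded.

```python
import llama_cpp

# Assumes ctx/model were set up with embeddings enabled and a batch
# using seq_id 0 has been decoded (setup not shown).
seq_id = 0
ptr = llama_cpp.llama_get_embeddings_seq(ctx, seq_id)
if ptr:  # NULL when the context does no pooling
    n_embd = llama_cpp.llama_n_embd(model)
    embedding = [ptr[i] for i in range(n_embd)]
```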

@abetlen (Owner) commented Mar 9, 2024

@iamlemec thank you! Just so I understand: the sequence-level embeddings are the ones pooled from the last processed batch?

Also, I think the new function in llama_cpp.py is duplicated by accident.

@iamlemec (Contributor, Author) commented Mar 9, 2024

Oh yeah, I didn't see that you'd added it already! Yup, it's the pooled embeddings by sequence for the last batch. It works for both mean pooling and CLS (first-token) pooling; with no pooling it returns null.
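
To make that concrete, a sketch of the fallback logic a caller might use; this is illustrative, with `ctx`, `seq_id`, and `i` assumed to come from an earlier embeddings-enabled decode, and llama_get_embeddings_ith being llama.cpp's existing per-token accessor:

```python
import llama_cpp

# Pooled per-sequence embedding; returns NULL if the context was
# created with LLAMA_POOLING_TYPE_NONE.
ptr = llama_cpp.llama_get_embeddings_seq(ctx, seq_id)
if not ptr:
    # No pooling: fall back to token-level embeddings, e.g. the
    # embedding of the i-th token in the batch.
    ptr = llama_cpp.llama_get_embeddings_ith(ctx, i)
```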

abetlen merged commit 2811014 into abetlen:main on Mar 9, 2024