Embedding models from Jina AI
The post "Jina AI Launches World's First Open-Source 8K Text Embedding, Rivaling OpenAI" introduces these models.
See also "Embeddings: What they are and why they matter" for background on embeddings and an explanation of the LLM embeddings tool.
Here's my blog post about how I built this plugin.
Install this plugin in the same environment as LLM.
llm install llm-embed-jina
This plugin adds support for three new embedding models:
jina-embeddings-v2-small-en: 33 million parameters.
jina-embeddings-v2-base-en: 137 million parameters.
jina-embeddings-v2-large-en: 435 million parameters; not yet released, but it will work once it has been released.
The models will be downloaded the first time you try to use them.
See the LLM documentation for everything you can do.
To get started embedding a single string, run the following:
llm embed -m jina-embeddings-v2-small-en -c 'Hello world'
This will output a JSON array of 512 floating point numbers to your terminal.
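To confirm the size of that array without reading 512 numbers, you can count its elements with jq. This is a small sketch: `embedding_dims` is a hypothetical helper name, and it assumes jq is installed. The `llm` call only runs if LLM is on your PATH, and with this plugin installed it should print 512, matching the figure above.

```shell
# Hypothetical helper: count the floats in a JSON array read from stdin
embedding_dims() {
  jq 'length'
}

# Only call LLM if it is installed; the small model should report 512
if command -v llm >/dev/null 2>&1; then
  llm embed -m jina-embeddings-v2-small-en -c 'Hello world' | embedding_dims
fi
```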
To calculate and store embeddings for every README.md file in the current directory and its subdirectories (try this somewhere with a node_modules directory to get lots of READMEs), run this:
llm embed-multi jina-readmes \
-m jina-embeddings-v2-small-en \
--files . '**/README.md' --store
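Before embedding a large tree, it can help to preview which files will be picked up. The sketch below uses find, which should roughly match what the recursive '**/README.md' pattern selects (a README.md at the top level or in any subdirectory). `preview_readmes` is a hypothetical helper, and the exact matching rules LLM applies to the pattern may differ in edge cases such as symlinked directories.

```shell
# Hypothetical helper: list README.md files under a directory (default: .),
# approximating the '**/README.md' pattern passed to --files
preview_readmes() {
  find "${1:-.}" -type f -name 'README.md' | sort
}

# How many files would be embedded from the current directory
preview_readmes . | wc -l
```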
Then you can run searches against them like this:
llm similar jina-readmes -c 'utility functions'
Add | jq to pipe the results through jq for pretty-printed output, or | jq .id to see just the matching filenames.
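The .id filter works because each result is a JSON object with an id field, which is what the tip above implies. The sketch below assumes `llm similar` prints one such object per line. The sample lines are made up for illustration, not real output, and `similar_ids` is a hypothetical helper. Adding jq's -r flag prints the filenames without surrounding quotes.

```shell
# Illustrative input only: two made-up lines shaped like 'llm similar' results
sample='{"id": "node_modules/a/README.md", "score": 0.81}
{"id": "node_modules/b/README.md", "score": 0.77}'

# Hypothetical helper: print each result's id as a raw string (no JSON quotes)
similar_ids() {
  jq -r '.id'
}

printf '%s\n' "$sample" | similar_ids
```

With real data, the equivalent would be llm similar jina-readmes -c 'utility functions' | jq -r .id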
To set up this plugin locally, first check out the code. Then create a new virtual environment:
cd llm-embed-jina
python3 -m venv venv
source venv/bin/activate
Now install the dependencies and test dependencies:
llm install -e '.[test]'
To run the tests:
pytest