simonw/llm-embed-jina

llm-embed-jina

Embedding models from Jina AI

Background

The announcement "Jina AI Launches World's First Open-Source 8K Text Embedding, Rivaling OpenAI" introduces these models.

See also "Embeddings: What they are and why they matter" for background on embeddings and an explanation of the LLM embeddings tool.

Here's my blog post about how I built this plugin.

Installation

Install this plugin in the same environment as LLM.

llm install llm-embed-jina

Usage

This plugin adds support for three new embedding models, including jina-embeddings-v2-small-en, which the examples below use.

The models will be downloaded the first time you try to use them.

See the LLM documentation for everything you can do.

To get started embedding a single string, run the following:

llm embed -m jina-embeddings-v2-small-en -c 'Hello world'

This will output a JSON array of 512 floating point numbers to your terminal.
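If you capture that output in a script, it can be read as ordinary JSON. The following is a minimal sketch, not part of the plugin: the helper name parse_embedding is hypothetical, and the three-value sample string only stands in for real output, which would contain 512 values.

```python
import json


def parse_embedding(raw: str) -> list[float]:
    """Parse a JSON array of numbers (as printed by `llm embed`) into floats.

    Hypothetical helper for illustration; not provided by LLM or this plugin.
    """
    values = json.loads(raw)
    if not isinstance(values, list):
        raise ValueError("expected a JSON array of numbers")
    return [float(v) for v in values]


# Stand-in for captured command output; a real embedding has 512 values.
sample_output = "[0.25, -0.5, 1.0]"
vector = parse_embedding(sample_output)
print(len(vector))
```

Checking len(vector) against the expected dimension is a cheap way to confirm that the model you intended was actually used.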

To calculate and store embeddings for every README in the current directory, run this (try it somewhere with a node_modules directory to get lots of READMEs):

llm embed-multi jina-readmes \
    -m jina-embeddings-v2-small-en \
    --files . '**/README.md' --store

Then you can run searches against them like this:

llm similar jina-readmes -c 'utility functions'

Add | jq to pipe it through jq for pretty-printed output, or | jq .id to just see the matching filenames.
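To give a feel for what a similarity search does, here is a small sketch that ranks stored vectors against a query vector. It assumes cosine similarity is the scoring metric; the function names, the two-dimensional vectors, and the file paths are all made up for illustration and are not taken from LLM's implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank(query, stored, n=10):
    """Return the n (id, score) pairs with the highest similarity to query."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in stored.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:n]


# Toy 2-dimensional "embeddings"; real ones from this model have 512 values.
stored = {
    "a/README.md": [1.0, 0.0],
    "b/README.md": [0.0, 1.0],
    "c/README.md": [1.0, 1.0],
}
results = rank([1.0, 0.1], stored, n=2)
for doc_id, score in results:
    print(doc_id, round(score, 3))
```

Because cosine similarity depends only on direction and not on vector length, a short README and a long one can still score as close matches if they cover the same topic.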

Development

To set up this plugin locally, first check out the code. Then create a new virtual environment:

cd llm-embed-jina
python3 -m venv venv
source venv/bin/activate

Now install the dependencies and test dependencies:

llm install -e '.[test]'

To run the tests:

pytest