vectorstore

Pure JavaScript implementation of a vector store with similarity search. Runs locally, in Node/Bun/Deno and soon even in your browser. Supports various embedding models, default to nomic-embed-text-v1. Open-source, fast and cost-free.

Features

✅ Search for text similarities, locally, without API key, free of charge
✅ Best in class performance; better than OpenAI (see "The Science" section)
✅ Downloads the model automatically, caches it, executes offline afterwards
✅ Runs Node.js and (soon) in the browser (large download though ~500 MB)
✅ Uses the open-source nomic-embed-text-v1 text embedding model, 8192 token context window
✅ Benchmarked: ~1 GiB memory usage at runtime
✅ Fast! Inference < 0.05 sec. on average (per document)
✅ Available as a simple API
✅ Tree-shakable and side-effect free
✅ Runs on Windows, Mac, Linux, CI tested
✅ First class TypeScript support
✅ Well tested (soon to be... ;-)

Roadmap

HNSW (Hierarchical Navigable Small World graphs) based vector search with complexity scaling O(log(N))
- in best case, based on WebAssembly, using SMID vector instruction set
alternativem WebGPU backend to run expensive vector operations (making use of wasm_webgpu)

Initial results

Demo/Setup

npm install
npm run demo

The Science

If you came here to understand the math behind the scenes, please head on to: https://towardsdatascience.com/text-embeddings-comprehensive-guide-afd97fce8fb5 where Mariya Mansurova wrote an excellent article on Text Embeddings.

Now let's dive deeper into metrics and open-source models: https://towardsdatascience.com/openai-vs-open-source-multilingual-embedding-models-e5ccb7c90f05

This is why I decided to use nomic-embed-text-v1. (Nomic-Embed): The model was designed by Nomic, and claims better performances than OpenAI Ada-002 and text-embedding-3-small while being only 0.55GB in size. Interestingly, the model is the first to be fully reproducible and auditable (open data and open-source training code).

https://huggingface.co/nomic-ai/nomic-embed-text-v1

Example usage (API, as a library)

Setup

yarn: yarn add vectorstore
npm: npm install vectorstore

ESM

import { createDocument, search, type Document } from "vectorstore";
// This text corpus is a collection of documents in different languages, each describing the ocean.
// They share the meaning, but the words and even the symbols used to describe it are different.
// However, using vector embeddings, we can compare the documents and find similarities,
// which allows for cross-lingual search - a search that is made for humans, not machines.
// This quality is, for an open-source model, a major breakthrough.
// Combined with vector embedding search, everyone has access to local, powerful text search now.
// And the best news: It's fast, it's available, it's possible, now, and for free!
const myDocuments = [
  {
    text: "Exploring the depths of the ocean reveals a world beyond imagination.",
    metaData: { id: 1, language: "English" },
  },
  {
    text: "La exploración de las profundidades del océano revela un mundo más allá de la imaginación.",
    metaData: { id: 2, language: "Spanish" },
  },
  {
    text: "探索海洋的深处揭示了一个超乎想象的世界。",
    metaData: { id: 3, language: "Chinese" },
  },
  {
    text: "L'exploration des profondeurs de l'océan révèle un monde au-delà de l'imagination.",
    metaData: { id: 4, language: "French" },
  },
  {
    text: "Die Erforschung der Tiefen des Ozeans offenbart eine Welt jenseits der Vorstellungskraft.",
    metaData: { id: 5, language: "German" },
  },
];

console.log("Text corpus:", myDocuments);

const haystack: Array<Document> = [];

// vectorization (text to embeddings)
for (const doc of myDocuments) {
  haystack.push(await createDocument(doc.text, doc.metaData));
}

// vecotrization of the search string (which doesn't share much text similarity, BUT MEANING)
const needle = await createDocument(
  "Unveiling the mysteries beneath the sea",
);

console.log("Searching for:", "Unveiling the mysteries beneath the sea");

// running cosine similarity search in vector space
const searchResults = search(haystack, needle);

// displaying search results
console.log(
  searchResults.map((result) => ({
    score: result.score,
    id: result.doc.metadata.id,
    language: result.doc.metadata.language, // include language in the result for better context
  })),
);

/** Prints something like this:
 * Searching for: Unveiling the mysteries beneath the sea
  [
    { score: 0.6855015563968822, id: 1, language: 'English' },
    { score: 0.5687096727474149, id: 4, language: 'French' },
    { score: 0.5426440067625005, id: 2, language: 'Spanish' },
    { score: 0.4697886145316811, id: 3, language: 'Chinese' },
    { score: 0.34714563173592217, id: 5, language: 'German' }
  ]
  benchmarked: elapsed secs 0.292
  benchmarked: total memory usage was: 1073.42 MiB
 */

return searchResults;

You can run this exact code as a demo when checking out this repository using git clone, run npm i followed by npm run demo

CommonJS

const { createDocument, search } = require('vectorstore')

// same API like ESM variant

Name		Name	Last commit message	Last commit date
Latest commit History 28 Commits
data		data
scripts		scripts
src		src
test		test
.editorconfig		.editorconfig
.gitignore		.gitignore
.npmignore		.npmignore
CHANGELOG.md		CHANGELOG.md
LICENSE		LICENSE
README.md		README.md
biome.json		biome.json
bun.lockb		bun.lockb
demo_results.png		demo_results.png
package-lock.json		package-lock.json
package.json		package.json
tsconfig.json		tsconfig.json
vitest.config.ts		vitest.config.ts

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

vectorstore

Features

Roadmap

Initial results

Demo/Setup

The Science

Example usage (API, as a library)

Setup

ESM

CommonJS

About

Releases

Packages

Languages

License

kyr0/vectorstore

Folders and files

Latest commit

History

Repository files navigation

vectorstore

Features

Roadmap

Initial results

Demo/Setup

The Science

Example usage (API, as a library)

Setup

ESM

CommonJS

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages