
TuskVector - API Platform 🐘

This API framework first transforms your data into 1536-dimensional vectors (as RAG pipelines do), then employs HNSW indexing via pgvector for efficient retrieval (again, as RAG pipelines do). In short - it adds vector search to your database before plugging the results into further LLM queries. Check it out at https://tuskvector.com

Tech Stack

TuskVector is built with:

  • Python for the backend (no surprises there)
  • pgvector for PostgreSQL vector functionality (elephants and vectors, get it?)
  • HNSW for fast approximate nearest neighbor search
  • OpenAI's text-embedding-ada-002 for text embeddings
  • GPT-4o for LLM queries
  • FastAPI for building APIs
  • HTMX as frontend to dodge JavaScript (because apparently, that's a thing now)

Setup

It's packaged and uploaded to PyPI! - check out https://pypi.org/project/tuskvector/ or install it with

pip install tuskvector

API Endpoints - also documented at https://tuskvector.com/docs

  1. Vector Embedding (POST /api/embed_text)

    • Utilizes OpenAI's text-embedding-ada-002 model
    • Generates 1536-dimensional embeddings
    • Automatically stores embeddings in pgvector-enabled PostgreSQL database
  2. Similarity Search (POST /api/similarity_search)

    • Implements cosine similarity metric
    • Utilizes HNSW (Hierarchical Navigable Small World) index for approximate nearest neighbor search
    • Configurable search parameters:
      • ef_search: Controls the trade-off between search speed and accuracy
      • Distance threshold: Filters results based on maximum allowed cosine distance
  3. Context-Aware LLM Queries (POST /api/query)

    • Integrates with OpenAI's GPT models
    • Enhances LLM responses with relevant context from the vector database
    • Implements a two-stage retrieval process:
      1. Vector similarity search to find relevant facts
      2. LLM query augmented with retrieved context
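The two-stage retrieval above can be sketched in plain Python. This is a minimal illustration, not TuskVector's actual internals: the cosine-distance math and threshold filter mirror what pgvector computes server-side, the embeddings are toy 3-D vectors instead of ada-002's 1536 dimensions, and the exhaustive scan stands in for the approximate HNSW lookup.

```python
import math

MAX_DISTANCE = 0.1  # cosine distance threshold, as in the config below

def cosine_distance(a, b):
    # pgvector's <=> operator: 1 - cosine similarity
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

def similarity_search(query_vec, stored):
    # Stage 1: keep only facts within the distance threshold, nearest
    # first (the real index answers this approximately via HNSW)
    hits = [(cosine_distance(query_vec, vec), fact) for fact, vec in stored.items()]
    return [fact for dist, fact in sorted(hits) if dist <= MAX_DISTANCE]

def build_prompt(question, facts):
    # Stage 2: augment the LLM query with the retrieved context
    context = "\n".join(facts)
    return f"Context:\n{context}\n\nQuestion: {question}"

# Toy 3-D "embeddings" (the real model emits 1536 dimensions)
stored = {
    "Elephants are the largest land animals.": [0.99, 0.1, 0.0],
    "Tusks are elongated incisor teeth.": [0.0, 0.1, 0.99],
}
query_vec = [1.0, 0.1, 0.0]
facts = similarity_search(query_vec, stored)
prompt = build_prompt("How big are elephants?", facts)
```

Only the first fact survives the 0.1 distance cutoff, so the LLM prompt is augmented with relevant context and nothing else.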

Configuration Options

  • HNSW_M: Maximum number of connections per layer in HNSW index (we went with 16)
  • HNSW_EF_CONSTRUCTION: Size of the dynamic candidate list for constructing the HNSW graph (we went with 64)
  • MAX_DISTANCE: Cosine distance threshold for similarity search (we went with 0.1)
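With pgvector, the first two settings end up in the index DDL, while ef_search is tuned per session. A hedged sketch of the SQL involved - the table and column names here are made up for illustration, not TuskVector's actual schema:

```python
HNSW_M = 16
HNSW_EF_CONSTRUCTION = 64

# Hypothetical table ("facts") and column ("embedding") names
create_index_sql = (
    "CREATE INDEX ON facts USING hnsw (embedding vector_cosine_ops) "
    f"WITH (m = {HNSW_M}, ef_construction = {HNSW_EF_CONSTRUCTION});"
)

# ef_search is set at query time, trading speed for recall
tune_sql = "SET hnsw.ef_search = 40;"
```

`vector_cosine_ops` matches the cosine similarity metric used by the similarity search endpoint; a larger `hnsw.ef_search` widens the candidate list for better recall at the cost of speed.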