give GPTs access to project files, docs and websites that will be stored in a db for long term memory

elhenro/code-seek

code-seek

Give GPT access to project files, docs, and websites.

This is accomplished with the retrieval plugin, which allows models to perform semantic searches against a vector database (Chroma).

The seek.py script scrapes web and local content, stores it in a vector database, and then uses it with OpenAI models to answer queries.

First, it reads a list of URLs from a file, scrapes the content of each URL, and stores the text. It then builds an index from the stored data and the other files in the project directory, and uses that index to power a question-answering system that answers questions based on the indexed data.
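The scraping step described above can be sketched with the standard library alone. This is a minimal illustration, not the code in seek.py: the names TextExtractor, html_to_text, read_urls and scrape are hypothetical, and the real script may well use a dedicated scraping library instead.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.request import urlopen


class TextExtractor(HTMLParser):
    """Collects visible text and skips the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html: str) -> str:
    """Reduce an HTML page to its text, one text node per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def read_urls(path: str) -> list[str]:
    """Read one URL per line, ignoring blank lines (as in urls.txt)."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def scrape(url: str) -> str:
    """Download a page and return its text (hypothetical helper, needs network)."""
    with urlopen(url, timeout=30) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return html_to_text(response.read().decode(charset, errors="replace"))
```

Under these assumptions, the text for every listed page would be collected with something like `[scrape(u) for u in read_urls("urls.txt")]` before it is handed to the indexing step.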

Features

  • Web Scraping
  • Data Storage
  • SQLite Database (for history, with custom loader)
  • Recursive Document Finding and Loading
  • Index Creation
  • Retrieval-Based Question Answering
  • Interactive Chat Interface
  • Command Line Arguments
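As an illustration of the "Recursive Document Finding and Loading" feature, the sketch below walks a directory tree and loads text files. It is only one way this could be done: the suffix list, the skipped directory names, and the function names find_documents and load_documents are assumptions, not taken from seek.py.

```python
from pathlib import Path

# Assumed filters; the real loader may use different rules.
TEXT_SUFFIXES = {".py", ".md", ".txt", ".js", ".ts", ".json"}
SKIP_DIRS = {".git", "node_modules", "__pycache__"}


def find_documents(root):
    """Return every text-like file below root, in sorted order."""
    root_path = Path(root)
    found = []
    for path in sorted(root_path.rglob("*")):
        if not path.is_file():
            continue
        # Ignore anything that lives inside a skipped directory.
        if SKIP_DIRS.intersection(path.relative_to(root_path).parts):
            continue
        if path.suffix.lower() in TEXT_SUFFIXES:
            found.append(path)
    return found


def load_documents(root):
    """Map each discovered file path to its text content."""
    return {
        str(path): path.read_text(encoding="utf-8", errors="replace")
        for path in find_documents(root)
    }
```

Calling `load_documents("project")` would then cover the symlinked project directory created during installation.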

Requirements

  • Python version 3.10.12 or higher

Installation

  1. Install the required Python packages:

pip install -r requirements.txt

  2. Add your OpenAI API key to constants.py.

  3. Create a symbolic link to your project directory:

ln -s ~/path/to/your-project project

  4. Add any URLs for documentation that should be scraped to the urls.txt file (one URL per line).

Usage

Basic usage

This command builds a new vector store each time it's run:

python seek.py -q "Summarise what my web application does in 3 sentences."

Usage with persistent vector store

This command saves the vector store locally for faster subsequent queries:

python seek.py -q "Summarise what my web application does in 3 sentences." --persist

The --persist flag defaults to False if not specified.

Specify Model

python seek.py -q ".." -m gpt-4

or

python seek.py -q ".." -m gpt-3.5-turbo

Defaults to gpt-3.5-turbo if not specified.

Specify Number of Results to Return from Chroma

python seek.py -q ".." -k 10

Note that high values can lead to long runtimes, and GPT models have context length limits, so a large number of results may not fit into a single prompt.

Defaults to 5 if not specified.
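To make the effect of -k concrete, here is a toy top-k retrieval in plain Python. It is a deliberately simplified stand-in: it ranks chunks by bag-of-words cosine similarity, whereas Chroma searches learned embeddings, and the names embed, cosine and top_k are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': lowercase word counts (real stores use learned vectors)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def top_k(query, chunks, k=5):
    """Return the k chunks most similar to the query, best match first."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:k]
```

Every returned chunk is added to the prompt, which is why a larger k means more text for the model to read and a higher chance of hitting the context limit.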

History

To deactivate history, use the --no-history flag:

python seek.py -q ".." --no-history

History is enabled by default.

It is stored in a local SQLite database.
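A history store of this kind could look roughly like the sketch below, which uses Python's built-in sqlite3 module. The file name history.db, the table layout, and the function names are assumptions made for illustration; the schema used by seek.py is not documented here.

```python
import sqlite3


def open_history(path="history.db"):
    """Open (or create) the history database; the file name is an assumption."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS history ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "question TEXT NOT NULL, "
        "answer TEXT NOT NULL, "
        "created_at TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    return conn


def save_exchange(conn, question, answer):
    """Store one question/answer pair; the 'with' block commits the insert."""
    with conn:
        conn.execute(
            "INSERT INTO history (question, answer) VALUES (?, ?)",
            (question, answer),
        )


def recent_exchanges(conn, limit=10):
    """Return the latest exchanges, oldest first, ready to prepend to a prompt."""
    rows = conn.execute(
        "SELECT question, answer FROM history ORDER BY id DESC LIMIT ?",
        (limit,),
    ).fetchall()
    return rows[::-1]
```

With --no-history, a script built this way would simply skip both the save and the lookup.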

Check size of vector store

du -sh persist
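The flags documented in this usage section, together with their stated defaults, could be declared with argparse roughly as follows. This is a sketch for orientation only: the attribute names (query, model, k, history) and the help texts are assumptions, and seek.py may define its arguments differently.

```python
import argparse


def build_parser():
    """Declare the documented flags with the defaults stated in this README."""
    parser = argparse.ArgumentParser(prog="seek.py")
    parser.add_argument("-q", dest="query", help="question to ask")
    parser.add_argument(
        "-m", dest="model", default="gpt-3.5-turbo", help="OpenAI model name"
    )
    parser.add_argument(
        "-k", dest="k", type=int, default=5, help="number of results from Chroma"
    )
    parser.add_argument(
        "--persist", action="store_true", help="save the vector store locally"
    )
    # store_false makes args.history True unless --no-history is passed.
    parser.add_argument(
        "--no-history", dest="history", action="store_false", help="disable history"
    )
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```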

Additional Tools

The generateMetadata.py script can be used to read project files and generate a summary and tags for them using GPT.

  • generateMetadata.py

Docker

Build

docker build -t code-seek .

Run

docker run -it --rm code-seek "Summarise what my web application does in 3 sentences."

Privacy

Be aware of data usage policies when using this tool. Use it for fun/hobby projects only. See the OpenAI API data usage policies:

... OpenAI will not use data submitted by customers via our API to train or improve our models, unless you explicitly decide to share your data with us for this purpose...
