This is a Chrome extension and Flask server that lets you query llama-cpp-python models from the browser. A local server handles the queries, and the results are displayed in a popup.
I built it to learn a little about Chrome extensions and Flask, and to make a tool that I can use to query the models while browsing.
The extension uses the Chrome API to get the selected text and sends it to the server. The server queries the model and returns the results to the extension, which displays them in a popup. Conversations are stored in local storage and can be cleared with the clear button in the popup.
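The request flow described above can be sketched roughly as follows. This is not the project's actual server code: the `/query` route, the `text` and `result` JSON keys, and the helper names `answer`, `create_app`, and `llama_generate` are all assumptions made for illustration. Flask and llama-cpp-python are imported lazily so the request handling logic can be tried without either installed.

```python
from typing import Callable, Tuple

# Hypothetical type: any function that turns a prompt into generated text.
Generate = Callable[[str], str]


def answer(payload, generate: Generate) -> Tuple[dict, int]:
    """Validate the extension's JSON payload; return (body, HTTP status)."""
    if not isinstance(payload, dict):
        return {"error": "expected a JSON object"}, 400
    text = payload.get("text", "")
    if not isinstance(text, str) or not text.strip():
        return {"error": "no text supplied"}, 400
    return {"result": generate(text.strip())}, 200


def create_app(generate: Generate):
    """Wire answer() into a Flask app (route name is an assumption)."""
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.route("/query", methods=["POST"])
    def query():
        # silent=True yields None instead of raising on a non-JSON body.
        body, status = answer(request.get_json(silent=True), generate)
        return jsonify(body), status

    return app


def llama_generate(model_path: str) -> Generate:
    """Build a generate() function backed by llama-cpp-python."""
    from llama_cpp import Llama

    llm = Llama(model_path=model_path)

    def generate(prompt: str) -> str:
        # The completion call returns a dict with a list of choices.
        out = llm(prompt, max_tokens=256)
        return out["choices"][0]["text"]

    return generate
```

With both packages installed, something like `create_app(llama_generate("/path/to/model.gguf")).run()` would start a development server on Flask's default port. Keeping the model behind a plain callable makes it easy to swap in a stub while working on the extension.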
llama-cpp-python must be installed and at least one model must be downloaded. See llama-cpp-python for more information. Models are available for download from Hugging Face:
- TheBlokeAI. I've been using TheBlokeAI/Llama-2-7B for my testing, but most of the GGUF models should work. The bigger the model, the slower the query and the more RAM it will use.
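Since model size drives both speed and memory use, it can help to see what you have downloaded before picking one. The sketch below is only an illustration: `list_gguf_models` is a hypothetical helper, not part of this project, and file size is a rough proxy for RAM use rather than an exact figure.

```python
from pathlib import Path
from typing import List, Tuple


def list_gguf_models(models_dir: str) -> List[Tuple[str, float]]:
    """Return (file name, size in GiB) for each .gguf file, smallest first."""
    files = Path(models_dir).glob("*.gguf")
    sized = [(f.name, f.stat().st_size / 2**30) for f in files if f.is_file()]
    return sorted(sized, key=lambda item: item[1])
```

For example, `for name, gib in list_gguf_models("models"): print(f"{name}: {gib:.1f} GiB")` lists the contents of a hypothetical `models` folder.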
- Download the extension from the chrome store
- Install the server with pip:
pip install local-llama
- Start the server with
local-llama
- Go to any page and click on the extension icon
- Type in your query and press Enter
- The results will be displayed in the popup
- Clone this repo
- Open Chrome and go to
chrome://extensions/
- Enable developer mode
- Click on Load unpacked and select the folder where you cloned this repo
- Go to any page and click on the extension icon
- Build the package with
python setup.py sdist bdist_wheel
- Install the package with
pip install .
- Start the server with
local-llama
- If this is the first time you are using the extension, you will be prompted to enter the path to your default model
- Type in the query and press Enter
- The results will be displayed in the popup
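The first-run prompt for the default model path could work along these lines. This is a minimal sketch under stated assumptions, not the package's real behavior: the config file location `~/.local_llama.json`, the `model_path` key, and all function names are hypothetical.

```python
import json
from pathlib import Path

# Hypothetical location; the real package may store its settings elsewhere.
CONFIG_PATH = Path.home() / ".local_llama.json"


def load_model_path(config_path: Path = CONFIG_PATH):
    """Return the saved model path, or None if nothing usable is saved."""
    try:
        data = json.loads(config_path.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        return None
    return data.get("model_path") if isinstance(data, dict) else None


def save_model_path(model_path: str, config_path: Path = CONFIG_PATH) -> None:
    """Persist the chosen model path as a small JSON file."""
    config_path.write_text(json.dumps({"model_path": model_path}))


def get_default_model(config_path: Path = CONFIG_PATH, ask=input) -> str:
    """Prompt once on first run, then reuse the stored answer."""
    path = load_model_path(config_path)
    if path is None:
        path = ask("Path to your default model: ").strip()
        save_model_path(path, config_path)
    return path
```

Passing `ask` as a parameter keeps the prompt replaceable, so the same logic could be driven from a settings page instead of the terminal.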
- add a server to handle the queries
- add a popup to display the results
- store and retrieve conversations
- clear saved conversations
- add a settings page
- add a way to change the model easily
- turn server.py into a proper Python package, to make it easier to install and use when the extension is downloaded from the Chrome store
- add a way to change the server address
- handle the case where HTML is added to the code block
- add a way to download models from huggingface
- add a way to start the server from the extension