
# Ollama Copilot

Ollama Copilot is a UI for Ollama on Windows, built with Windows Forms.

Copilot responses can be automatically forwarded to other applications, just like with other paid copilots.

Ollama Copilot also offers speech to text, text to speech, and OCR, all using free, open-source software.

Check out Releases for the latest installer.

## Screenshots

*(screenshot)*

## Videos

- Playlist
- Overview of Ollama Copilot
- Ollama Copilot v1.0.0
- YouTube Transcripts v1.0.1
- Speech to Text v1.0.2
- Text to Speech v1.0.3
- Optical Character Recognition v1.0.4

## Dependencies

### Visual Studio Build Dependencies


- The project uses Newtonsoft.Json, so right-click the solution in Solution Explorer and select **Restore NuGet Packages**.


- Build and run the application.


### Feature Dependencies

#### Ollama with Windows preview

- Install the llama3 model

  ```
  ollama run llama3
  ```

- Install the llama2 model

  ```
  ollama run llama2
  ```

- Install the qwen model

  ```
  ollama run qwen:4b
  ```

- Install the llava model

  ```
  ollama run llava
  ```

- Install the Phi 3 model

  ```
  ollama run phi3
  ```

- Install the gemma model (7B default)

  ```
  ollama run gemma
  ```

- You can remove the gemma model (7B)

  ```
  ollama rm gemma
  ```

  and install the smaller gemma 2B model instead:

  ```
  ollama run gemma:2b
  ```
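Ollama serves a local HTTP API on port 11434, and a quick way to confirm a model is installed and responding is to call its generate endpoint directly. A minimal sketch using only the Python standard library; swap `llama3` for whichever model you pulled above:

```python
# Sanity check against Ollama's local HTTP API (default port 11434).
import json
import urllib.request

payload = {
    "model": "llama3",  # any model installed above
    "prompt": "Why is the sky blue?",
    "stream": False,    # request a single JSON object instead of a stream
}

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```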

#### Ollama with Docker

Start the Ollama container with GPU support:

```
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
```

- Install the llama2 model to enable the Chat API.

  ```
  docker exec -it ollama ollama run llama2
  ```

- Install the llava model

  ```
  docker exec -it ollama ollama run llava
  ```

- Install the gemma model

  ```
  docker exec -it ollama ollama run gemma
  ```

- Install the mixtral model (requires 48 GB of VRAM)

  ```
  docker exec -it ollama ollama run mixtral
  ```
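Once the container is running, you can list the models it has pulled by querying Ollama's tags endpoint, for example:

```python
# List the models available from the running Ollama container.
import json
import urllib.request

with urllib.request.urlopen("http://localhost:11434/api/tags") as resp:
    for model in json.loads(resp.read())["models"]:
        print(model["name"])
```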

### Launch Whisper Server to enable local dictation

#### WSL2

- Install Ubuntu 22.04.3 LTS with WSL2

- Set up Ubuntu for hosting the local Whisper server

  ```
  sudo apt-get update
  sudo apt install python3-pip
  sudo apt install uvicorn
  pip3 install FastAPI[all]
  pip3 install uvloop
  pip3 install numpy
  sudo apt-get install curl
  sudo apt-get install ffmpeg
  pip3 install ffmpeg
  pip3 install scipy
  pip3 install git+https://github.com/openai/whisper.git
  ```

- Run the server

  ```
  python3 -m uvicorn WhisperServer:app --reload --port 11437
  ```
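The repository provides `WhisperServer.py`; as a rough illustration of the shape such a FastAPI wrapper takes, here is a hypothetical minimal sketch (the endpoint path, model size, and response shape are assumptions, not the repo's actual code):

```python
# WhisperServer.py -- hypothetical sketch; the repository's actual server may differ.
import shutil
import tempfile

import whisper
from fastapi import FastAPI, UploadFile

app = FastAPI()
model = whisper.load_model("base")  # model size is an assumption; see the table below

@app.post("/whisper")  # endpoint path is an assumption
async def transcribe(file: UploadFile):
    # Whisper decodes audio from a file path via ffmpeg, so spool the upload to disk.
    with tempfile.NamedTemporaryFile(suffix=".mp3", delete=False) as tmp:
        shutil.copyfileobj(file.file, tmp)
        path = tmp.name
    result = model.transcribe(path)
    return {"text": result["text"]}
```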


Useful information on Whisper model sizes:

**Available models and languages**

There are five model sizes, four with English-only versions, offering speed and accuracy tradeoffs. Below are the names of the available models and their approximate memory requirements and inference speed relative to the large model; actual speed may vary depending on many factors including the available hardware.

| Size   | Parameters | English-only model | Multilingual model | Required VRAM | Relative speed |
|--------|------------|--------------------|--------------------|---------------|----------------|
| tiny   | 39 M       | `tiny.en`          | `tiny`             | ~1 GB         | ~32x           |
| base   | 74 M       | `base.en`          | `base`             | ~1 GB         | ~16x           |
| small  | 244 M      | `small.en`         | `small`            | ~2 GB         | ~6x            |
| medium | 769 M      | `medium.en`        | `medium`           | ~5 GB         | ~2x            |
| large  | 1550 M     | N/A                | `large`            | ~10 GB        | 1x             |

Test the Whisper model's MP3-to-text conversion:

```
python3 WhisperTest.py audio.mp3
```
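`WhisperTest.py` comes with the repository; a hypothetical minimal equivalent, to show what the test does (the model size and output handling are assumptions):

```python
# WhisperTest.py -- hypothetical sketch; the repository's actual script may differ.
import sys

import whisper

if len(sys.argv) < 2:
    sys.exit("Usage: python3 WhisperTest.py <audio file>")

# Load a model; larger sizes trade speed for accuracy (see the table above).
model = whisper.load_model("base")
result = model.transcribe(sys.argv[1])
print(result["text"])
```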

### Launch Pyttsx3 Server to enable text to speech

#### Windows

- Install Python from the Microsoft Store app on the Windows host machine, which has access to the sound card.

- Open the Windows command prompt and install the dependencies

  ```
  pip3 install uvicorn
  pip3 install FastAPI[all]
  pip3 install pyttsx3
  ```

- Launch the Pyttsx3 Server in the Windows command prompt

  ```
  python3 -m uvicorn Pyttsx3Server:app --reload --port 11438
  ```
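For reference, a text-to-speech endpoint of this kind can be only a few lines; the sketch below is hypothetical (the route and parameter names are assumptions, not the repo's actual `Pyttsx3Server.py`):

```python
# Pyttsx3Server.py -- hypothetical sketch; the repository's actual server may differ.
from fastapi import FastAPI
import pyttsx3

app = FastAPI()

@app.get("/say")  # route and query parameter are assumptions
def say(text: str):
    # pyttsx3 drives the host speech engine (SAPI5 on Windows), which is why
    # this server must run on the Windows host with access to the sound card.
    engine = pyttsx3.init()
    engine.say(text)
    engine.runAndWait()  # block until playback finishes
    return {"status": "ok"}
```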

### Speech Commands

- "Prompt clear" - Clears the prompt text area
- "Prompt submit" - Submits the prompt
- "Response play" - Speaks the response
- "Response clear" - Clears the response text area

### Launch Tesseract-OCR server for real-time OCR

#### Windows

- Install pytesseract and the server dependencies (note that pytesseract also needs the Tesseract-OCR engine itself installed on Windows)

  ```
  pip3 install uvicorn
  pip3 install FastAPI[all]
  pip install pytesseract
  ```

- Launch the server in the Windows command prompt

  ```
  python3 -m uvicorn TesseractOCRServer:app --reload --port 11439 --log-level error
  ```
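As with the other helper servers, the OCR endpoint is a thin wrapper; a hypothetical sketch of what `TesseractOCRServer.py` might look like (the route and response shape are assumptions):

```python
# TesseractOCRServer.py -- hypothetical sketch; the repository's actual server may differ.
import io

import pytesseract
from fastapi import FastAPI, UploadFile
from PIL import Image  # Pillow is installed as a pytesseract dependency

app = FastAPI()

@app.post("/ocr")  # route is an assumption
async def ocr(file: UploadFile):
    # Decode the uploaded screen capture and run Tesseract over it.
    image = Image.open(io.BytesIO(await file.read()))
    text = pytesseract.image_to_string(image)
    return {"text": text}
```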