vnc-lm is a Discord bot that integrates leading large language model APIs.
Load and manage language models through local or hosted API endpoints. Configure parameters, split conversations, and refine prompts to improve responses.
Load models using the `/model` command. Configure model behavior by adjusting the `num_ctx` (context length), `system_prompt` (base instructions), and `temperature` (response randomness) parameters. The bot sends a notification upon successful model loading. When a model loads, a thread is created for the conversation. `/model` cannot be used in threads.
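As a rough illustration of how these parameters relate to a request (assuming the ollama backend; the model tag, values, and `chat` helper below are placeholders, not the bot's internal code):

```typescript
// Sketch only: how num_ctx, system_prompt, and temperature could map onto an
// ollama /api/chat request. OLLAMAURL is the value configured in the .env;
// the model tag and parameter values are placeholders.
import axios from "axios";

async function chat(prompt: string): Promise<string> {
  const response = await axios.post(`${process.env.OLLAMAURL}/api/chat`, {
    model: "llama3.2:1b-instruct-q8_0",
    stream: false,
    options: {
      num_ctx: 4096,    // context length
      temperature: 0.4, // response randomness
    },
    messages: [
      { role: "system", content: "You are a concise assistant." }, // system_prompt
      { role: "user", content: prompt },
    ],
  });
  return response.data.message.content;
}
```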
The initial prompt in a thread is scraped for keywords, which are used to rename the thread. To change models inside a thread, send `+` followed by part of the name of the model you wish to switch to. For example, to switch to Claude 3.5 Sonnet, send `+ Claude`, `+ Sonnet`, or `+ 3.5`.
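A minimal sketch of the kind of partial-name matching this implies (the model list and matching rule here are illustrative, not the bot's exact logic):

```typescript
// Match a "+ <text>" query against loaded model names by case-insensitive substring.
function matchModel(query: string, models: string[]): string | undefined {
  const q = query.trim().toLowerCase();
  return models.find((name) => name.toLowerCase().includes(q));
}

const models = ["anthropic/claude-3.5-sonnet:beta", "google/gemini-flash-1.5"];
matchModel("sonnet", models); // "anthropic/claude-3.5-sonnet:beta"
matchModel("gemini", models); // "google/gemini-flash-1.5"
```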
When you switch models mid-conversation, your current conversation history and settings (`system_prompt` and `temperature`) remain unchanged.
Resume any conversation just by sending a new message.
Edit any prompt to refine the subsequent model response. The bot generates a new response using your edited prompt, replacing the previous output.
When you edit or delete messages in Discord, these changes are immediately synchronized with the conversation cache and incorporated into the model's context for future responses.
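A rough sketch of how this kind of synchronization can be wired up with standard discord.js v14 events (the in-memory cache shape here is hypothetical; the bot's real cache lives under `managers/cache`):

```typescript
// Sketch: keep a conversation cache in sync with Discord edits and deletions.
import { Client, Events, GatewayIntentBits, Partials } from "discord.js";

const cache = new Map<string, string>(); // messageId -> latest prompt text

const client = new Client({
  intents: [
    GatewayIntentBits.Guilds,
    GatewayIntentBits.GuildMessages,
    GatewayIntentBits.MessageContent,
  ],
  partials: [Partials.Message],
});

client.on(Events.MessageUpdate, (_old, updated) => {
  if (updated.content) cache.set(updated.id, updated.content);
});

client.on(Events.MessageDelete, (deleted) => {
  cache.delete(deleted.id);
});
```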
Download new models by sending a model tag link in a channel. Models may not be downloaded inside threads.
https://ollama.com/library/llama3.2:1b-instruct-q8_0
https://huggingface.co/bartowski/Llama-3.2-1B-Instruct-GGUF/blob/main/Llama-3.2-1B-Instruct-Q8_0.gguf
Local models can be removed with the `remove` parameter of `/model`.
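For reference, a hedged sketch of what the download and removal calls can look like against the ollama HTTP API (the tag matches the example link above; this is not the bot's own code):

```typescript
// Sketch: pulling and deleting a local model through the ollama HTTP API.
import axios from "axios";

const OLLAMAURL = process.env.OLLAMAURL ?? "http://localhost:11434";

async function pullModel(tag: string): Promise<void> {
  // stream: false waits for the pull to finish instead of streaming progress
  await axios.post(`${OLLAMAURL}/api/pull`, { name: tag, stream: false });
}

async function removeModel(tag: string): Promise<void> {
  await axios.delete(`${OLLAMAURL}/api/delete`, { data: { name: tag } });
}

// pullModel("llama3.2:1b-instruct-q8_0");
```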
💡 Model downloading and removal is turned off by default and can be enabled by configuring the `.env`.
Messages longer than 1500 characters are automatically paginated during generation. Message streaming is available with ollama. Other APIs handle responses quickly without streaming. The context window accepts text files, web links, and images. Deploy using Docker for a simplified setup.
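A simple sketch of the pagination idea (this splits on a fixed character count matching the `CHARACTER_LIMIT` default; the bot's actual page handling presumably lives in `managers/generation/pages.ts` and may differ):

```typescript
// Illustrative: split a long response into pages no longer than the limit.
function paginate(text: string, limit = 1500): string[] {
  const pages: string[] = [];
  for (let i = 0; i < text.length; i += limit) {
    pages.push(text.slice(i, i + limit));
  }
  return pages;
}

paginate("a".repeat(3200)).length; // 3 pages: 1500 + 1500 + 200 characters
```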
Messages are cached and organized in `bot_cache.json`. The `entrypoint.sh` script maintains conversation history across Docker container restarts.
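As a rough sketch, persisting such a cache amounts to reading and writing JSON on disk (the schema below is hypothetical; the real cache logic lives under `managers/cache`):

```typescript
// Sketch: load and save a JSON message cache like bot_cache.json.
import { existsSync, readFileSync, writeFileSync } from "fs";

const CACHE_PATH = "bot_cache.json";

function loadCache(): Record<string, unknown> {
  return existsSync(CACHE_PATH)
    ? JSON.parse(readFileSync(CACHE_PATH, "utf8"))
    : {};
}

function saveCache(cache: Record<string, unknown>): void {
  writeFileSync(CACHE_PATH, JSON.stringify(cache, null, 2));
}
```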
While both hosted APIs and Ollama support vision functionality, not all models have vision capabilities.
💡 Send `stop` to end message generation early.
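One way early termination can work is cancelling the in-flight request with an `AbortController`; the sketch below is illustrative only (the bot's actual stop handling lives in `stop-command.ts`, and the payload here mirrors the ollama chat endpoint):

```typescript
// Illustrative: cancel an in-flight streamed generation with an AbortController.
import axios from "axios";

const controller = new AbortController();

const request = axios.post(
  `${process.env.OLLAMAURL}/api/chat`,
  { model: "llama3.2:1b-instruct-q8_0", messages: [{ role: "user", content: "Hello" }] },
  { responseType: "stream", signal: controller.signal }
);

// Later, for example when a user sends "stop":
controller.abort();

request.catch((err) => {
  if (axios.isCancel(err)) console.log("Generation stopped early.");
});
```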
Docker: Docker is a platform designed to help developers build, share, and run container applications. We handle the tedious setup, so you can focus on the code.
| Provider | Description |
|---|---|
| ollama | Get up and running with Llama 3.2, Mistral, Gemma 2, and other large language models. |
| OpenRouter | A unified interface for LLMs. Find the best models & prices for your prompts. Use the latest state-of-the-art models from OpenAI, Anthropic, Google, and Meta. |
| Mistral | Mistral AI is a research lab building the best open source models in the world. La Plateforme enables developers and enterprises to build new products and applications, powered by Mistral's open source and commercial LLMs. |
| Cohere | The Cohere platform builds natural language processing and generation into your product with a few lines of code. Our large language models can solve a broad spectrum of natural language use cases, including classification, semantic search, paraphrasing, summarization, and content generation. |
| Groq | Groq technology can be accessed by anyone via GroqCloud™, while enterprises and partners can choose between cloud or on-prem AI compute center deployment. |
| GitHub Models | If you want to develop a generative AI application, you can use GitHub Models to find and experiment with AI models for free. Once you are ready to bring your application to production, you can switch to a token from a paid Azure account. |
💡 Each API offers a free tier.
# clone the repository
git clone https://github.com/jake83741/vnc-lm.git
# enter the directory
cd vnc-lm
# rename the env file
mv .env.example .env
Configure the fields below in the `.env`:
- `TOKEN`: Discord bot token from the Discord Developer Portal. Set the required bot permissions.
- `ADMIN`: Discord user ID for model management permissions.
- `CHARACTER_LIMIT`: Page embed character limit. Default: 1500
- `REQUIRE_MENTION`: Toggle mention requirement. Default: false
- `MESSAGE_UPDATE_INTERVAL`: Discord message update frequency; specific to ollama. Lower values may trigger rate limits. Default: 10
- `USE_VISION`: Turn vision on or off. Turning vision off turns OCR on. Default: false
- `OLLAMAURL`: ollama server URL. See the API documentation. For Docker: http://host.docker.internal:11434
- `OPENROUTER_API_KEY`: OpenRouter API key from the OpenRouter Dashboard
- `OPENROUTER_MODELS`: Comma-separated OpenRouter model list
- `MISTRAL_API_KEY`: Mistral API key from the Mistral Dashboard
- `MISTRAL_MODELS`: Comma-separated Mistral model list
- `COHERE_API_KEY`: Cohere API key from the Cohere Dashboard
- `COHERE_MODELS`: Comma-separated Cohere model list
- `GROQ_API_KEY`: Groq API key from the Groq Dashboard
- `GROQ_MODELS`: Comma-separated Groq model list
- `GITHUB_API_KEY`: GitHub API key from the GitHub Models Dashboard
- `GITHUB_MODELS`: Comma-separated GitHub model list
Configure at least one API.
## Discord configuration
# Discord bot token
TOKEN=bKZ57JqLJq...
# Administrator Discord user ID
ADMIN=qddSPlT9MG...
# Character limit for page embeds (default: 1500)
CHARACTER_LIMIT=1500
# Require bot mention (default: false)
REQUIRE_MENTION=false
# API response chunk size before message update (default: 10)
MESSAGE_UPDATE_INTERVAL=10
## Generic model configuration
# Turn vision on or off. Turning vision off will turn OCR on. (default: false)
USE_VISION=true
## API configurations
# ollama server URL (default: http://localhost:11434)
# For Docker: http://host.docker.internal:11434
# Leave blank to not use ollama
OLLAMAURL=http://host.docker.internal:11434
# OpenRouter API Key
OPENROUTER_API_KEY=Zmda38qlVH...
# OpenRouter models (comma-separated)
OPENROUTER_MODELS="meta-llama/llama-3.1-405b-instruct:free, anthropic/claude-3.5-sonnet:beta, google/gemini-flash-1.5, "
# Mistral API Key
MISTRAL_API_KEY=0bbFJgRiPg...
# Mistral models (comma-separated)
MISTRAL_MODELS="mistral-large-latest, "
# Cohere API Key
COHERE_API_KEY=ijpyasLrFw...
# Cohere models (comma-separated)
COHERE_MODELS="command-r-plus-08-2024, "
# Groq API Key
GROQ_API_KEY=WRDSyLb11g...
# Groq models (comma-separated)
GROQ_MODELS="llama-3.1-70b-versatile, "
# GitHub API Key
GITHUB_API_KEY=0gHrfg6RZD...
# GitHub models (comma-separated)
GITHUB_MODELS="gpt-4o, "
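Once the file is in place, the comma-separated model lists above can be read at runtime roughly like this (variable names mirror the `.env`; the `parseModels` helper is illustrative, not the bot's code):

```typescript
// Sketch: read a comma-separated model list from the .env with dotenv.
import "dotenv/config";

function parseModels(value: string | undefined): string[] {
  return (value ?? "")
    .split(",")
    .map((name) => name.trim())
    .filter(Boolean);
}

const openRouterModels = parseModels(process.env.OPENROUTER_MODELS);
// ["meta-llama/llama-3.1-405b-instruct:free", "anthropic/claude-3.5-sonnet:beta", "google/gemini-flash-1.5"]
```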
# build the container with Docker
docker compose up --build
💡 Send `/help` for instructions on how to use the bot.
npm install
npm run build
npm start
Use `/model` to load and configure models and to create threads. Quickly adjust model behavior using the optional parameters `num_ctx`, `system_prompt`, and `temperature`. Note that `num_ctx` only works with local ollama models.
Switch between models mid-conversation in threads by sending `+` followed by part of the model name. For example: `+ gpt` for GPT-4o, `+ mistral` for Mistral-Large-Latest, or `+ google` for Gemini-Flash-1.5.
Reply `split` to any message to create a new thread of the conversation from that point. A diagram of the thread relationship and a summary of the conversation up to the split point will also be sent in the new thread.
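A hedged sketch of the underlying Discord mechanics, using the standard discord.js `Message#startThread` API (the thread name and reply handling here are illustrative, not the bot's implementation):

```typescript
// Sketch: branch the conversation into a new thread when a user replies "split".
import { Client, Events, GatewayIntentBits, Message } from "discord.js";

const client = new Client({
  intents: [
    GatewayIntentBits.Guilds,
    GatewayIntentBits.GuildMessages,
    GatewayIntentBits.MessageContent,
  ],
});

client.on(Events.MessageCreate, async (message: Message) => {
  if (message.content.trim().toLowerCase() !== "split" || !message.reference) return;

  const target = await message.fetchReference(); // the message being split from
  const thread = await target.startThread({ name: "split conversation" });
  await thread.send("Conversation branched from this point.");
});
```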
Hop between different threads while maintaining separate conversation histories, allowing you to explore different directions with the same or different models.
Edit any prompt to refine a model's response. Each edit automatically generates a new response that replaces the previous one. Your latest edits are saved and will be used for context in future responses.
.
├── api-connections
│   ├── base-client.ts
│   ├── factory.ts
│   └── provider
│       ├── hosted
│       │   └── client.ts
│       └── ollama
│           └── client.ts
├── bot.ts
├── commands
│   ├── command-registry.ts
│   ├── help-command.ts
│   ├── loading-comand.ts
│   ├── model-command.ts
│   ├── remove-command.ts
│   ├── services
│   │   ├── ocr.ts
│   │   └── scraper.ts
│   ├── stop-command.ts
│   └── thread-command.ts
├── managers
│   ├── cache
│   │   ├── entrypoint.sh
│   │   ├── manager.ts
│   │   └── store.ts
│   └── generation
│       ├── controller.ts
│       ├── messages.ts
│       ├── pages.ts
│       ├── processor.ts
│       └── stream.ts
└── utilities
    ├── index.ts
    ├── settings.ts
    └── types.ts
{
"dependencies": {
"@azure-rest/ai-inference": "latest",
"@azure/core-auth": "latest",
"@mozilla/readability": "^0.5.0",
"@types/xlsx": "^0.0.35",
"axios": "^1.7.2",
"cohere-ai": "^7.14.0",
"discord.js": "^14.15.3",
"dotenv": "^16.4.5",
"jsdom": "^24.1.3",
"keyword-extractor": "^0.0.27",
"puppeteer": "^22.14.0",
"sharp": "^0.33.5",
"tesseract.js": "^5.1.0"
},
"devDependencies": {
"@types/jsdom": "^21.1.7",
"@types/node": "^18.15.25",
"typescript": "^5.1.3"
}
}
- Set higher `num_ctx` values when using attachments with large amounts of text.
- Vision models may have difficulty with follow-up questions.
This project is licensed under the MIT License.