koboldcpp-1.63
*(Video: kobo_gif.mp4. Enable sound, press play.)*
- Added support for special tokens in `stop_sequences`. Thus, if you set `<|eot_id|>` as a stop sequence and it can be tokenized into a single token, it will just work and function like the EOS token, allowing multiple EOS-like tokens (see the request sketch after this list).
- Reworked the automatic RoPE scaling calculations to support Llama3 (just specify the desired `--contextsize` and it will trigger automatically).
- Added a console warning if another program is already using the desired port (a minimal sketch of such a check follows this list).
- Improved server handling for bad or empty requests, which fixes a potential flooding vulnerability.
- Fixed a scenario where the BOS token could get lost, potentially resulting in lower quality especially during context-shifting.
- Pulled and merged new model support, improvements and fixes from upstream.
- Updated Kobold Lite: Fixed markdown, reworked the memory layout, added a regex replacer feature, added aesthetic background color settings, added more save slots, added usermod saving, and added a Llama3 prompt template.
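Since 1.63 treats a single-token stop string like an extra EOS token, a stop such as `<|eot_id|>` can be passed straight through the HTTP API. Here is a minimal sketch, assuming the standard KoboldAI generate endpoint and payload fields (`/api/v1/generate`, `stop_sequence`, `results[0].text`); verify these against your build's API docs if anything differs:

```python
import json
import urllib.request

payload = {
    "prompt": "User: Hello!\nAssistant:",
    "max_length": 200,
    # With 1.63, a special token like <|eot_id|> is tokenized into a single
    # token and then behaves like an additional EOS token.
    "stop_sequence": ["<|eot_id|>"],
}

req = urllib.request.Request(
    "http://localhost:5001/api/v1/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["results"][0]["text"])
```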
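The new port warning is internal to koboldcpp, but for reference, this is a minimal sketch of how such a check can work in Python (not the project's actual code):

```python
import socket

def port_in_use(port: int, host: str = "localhost") -> bool:
    # connect_ex returns 0 if something is already listening on the port.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        return s.connect_ex((host, port)) == 0

if port_in_use(5001):
    print("Warning: port 5001 is already in use by another program.")
```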
Edit: Windows Defender seems to be flagging the CI-built binary. I've replaced it with a locally built one until I can figure out why.
To use, download and run koboldcpp.exe, which is a one-file pyinstaller build.
If you don't need CUDA, you can use koboldcpp_nocuda.exe, which is much smaller.
If you're using AMD, you can try koboldcpp_rocm from YellowRoseCx's fork here.
Run it from the command line with the desired launch parameters (see `--help`), or manually select the model in the GUI. Once the model is loaded, you can connect in your browser (or use the full KoboldAI client) at:
http://localhost:5001
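You can also sanity-check the connection from a script. A minimal sketch, assuming the standard KoboldAI `/api/v1/model` endpoint is available (verify against your build's API docs):

```python
import json
import urllib.request

# Adjust the port if you launched koboldcpp with a different one.
with urllib.request.urlopen("http://localhost:5001/api/v1/model") as resp:
    print("Connected, model:", json.load(resp)["result"])
```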
For more information, be sure to run the program from the command line with the `--help` flag.