v6.0.0
Summary
This is a major release that introduces an OpenAI-compatible server in a completely new serve
tool, support for Quark quantization in the new quark
tool, and many other fixes/improvements.
Breaking Changes
New OpenAI-Compatible Server
The previous serve
Tool
has been replaced by a new standalone serving command. This new server has OpenAI API compatibility and will add Ollama compatibility in the near future.
- Old usage:
lemoande -i CHECKPOINT oga-load --args serve
- New usage:
lemonade serve
, then use REST APIs to control model loading, completions, etc. See https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/server_spec.md to learn more.
The server can also be installed and used with no-code by running Lemonade_Server_Installer.exe
, which is provided as a release asset in this and all future releases.
The server code was also moved out of tools/chat.py into its own file in tools/serve.py. We also renamed chat.py to prompt.py for clarity, since that file now only contains the prompting tool.
The LEAP name has been deprecated
In the interest of reducing naming confusion, the "LEAP API" is now simply the "high-level lemonade API".
- Old usage:
from lemonade.leap import from_pretrained
- New usage:
from lemonade.api import from_pretrained
Summary of Contributions
- The base checkpoint for models is retrieved from the Hugging Face API at loading time (@ramkrishna2910)
- The benchmarking tools (huggingface-bench, oga-bench, and llamacpp-bench) have been refactored to reduce code duplication and improve maintainability. They now also support a list of prompts (or prompt lengths) to be benchmarked:
--prompts 128 256 512
(@amd-pworfolk) - The
avg_accuracy
stats has been renamed toaverage_mmlu_accuracy
for clarity with respect to non-MMLU accuracy tests (@jeremyfowers), (attn @apsonawane) - Introduce
Lemonade_Server_Installer.exe
(@jeremyfowers) - Implement an OpenAI-compatible server and remove the old
serve
tool (@danielholanda) - Rename
chat
module toprompt
(@jeremyfowers) - Improved lemonade getting started documentation and remove the "LEAP" branding (@jeremyfowers)
- OGA 0.6.0 is the default package for CPU, CUDA, and DML (@jeremyfowers)
- Add support for Quark quantization with a new
quark-quantize
tool (@iswaryaalex) - Clean up the lemonade getting started docs and remove some deprecated tools (@jeremyfowers)
New Contributors
- @iswaryaalex made their first contribution in #290
Full Changelog: v5.1.1...v6.0.0