Skip to content

v6.0.0

Compare
Choose a tag to compare
@jeremyfowers jeremyfowers released this 27 Feb 20:35
· 6 commits to main since this release

Summary

This is a major release that introduces an OpenAI-compatible server in a completely new serve tool, support for Quark quantization in the new quark tool, and many other fixes/improvements.

Breaking Changes

New OpenAI-Compatible Server

The previous serve Tool has been replaced by a new standalone serving command. This new server has OpenAI API compatibility and will add Ollama compatibility in the near future.

The server can also be installed and used with no-code by running Lemonade_Server_Installer.exe, which is provided as a release asset in this and all future releases.

The server code was also moved out of tools/chat.py into its own file in tools/serve.py. We also renamed chat.py to prompt.py for clarity, since that file now only contains the prompting tool.

The LEAP name has been deprecated

In the interest of reducing naming confusion, the "LEAP API" is now simply the "high-level lemonade API".

  • Old usage: from lemonade.leap import from_pretrained
  • New usage: from lemonade.api import from_pretrained

Summary of Contributions

  • The base checkpoint for models is retrieved from the Hugging Face API at loading time (@ramkrishna2910)
  • The benchmarking tools (huggingface-bench, oga-bench, and llamacpp-bench) have been refactored to reduce code duplication and improve maintainability. They now also support a list of prompts (or prompt lengths) to be benchmarked: --prompts 128 256 512 (@amd-pworfolk)
  • The avg_accuracy stats has been renamed to average_mmlu_accuracy for clarity with respect to non-MMLU accuracy tests (@jeremyfowers), (attn @apsonawane)
  • Introduce Lemonade_Server_Installer.exe (@jeremyfowers)
  • Implement an OpenAI-compatible server and remove the old serve tool (@danielholanda)
  • Rename chat module to prompt (@jeremyfowers)
  • Improved lemonade getting started documentation and remove the "LEAP" branding (@jeremyfowers)
  • OGA 0.6.0 is the default package for CPU, CUDA, and DML (@jeremyfowers)
  • Add support for Quark quantization with a new quark-quantize tool (@iswaryaalex)
  • Clean up the lemonade getting started docs and remove some deprecated tools (@jeremyfowers)

New Contributors

Full Changelog: v5.1.1...v6.0.0